CS31 Weekly Lab: Week 13

top and gdb for pthreads programs

Week 13 lab goals:

  1. Look at further examples of pthread synchronization
  2. Learn about top for analyzing threads
  3. Learn about gdb for debugging threads
  4. See an example run of a multithreaded program

Create a week13 subdirectory in your weeklylab subdirectory and copy over some files:
    cd cs31/weeklylab
    mkdir week13
    cd week13
    cp ~lammert/public/cs31/week13/* .
    ls
    Makefile  deadlock.c  racecond.c  synch.c
Then type make to build the executable files.

example pthread synchronization primitives
The file synch.c contains some examples using pthreads mutex and barrier synchronization.

top and threads
top is a Unix utility that lists a lot of information about processes running in the system and how they are using resources like memory and CPU. If you run top with no command line options, then it will display per-process statistics. If you run top with -H, top will display statistics for individual threads:
top -H
While top is running, you can change what it is displaying in each column. To do this, type f in the top window, and you should get something like this:
Fields Management for window 1:Def, whose current sort field is %CPU
   Navigate with Up/Dn, Right selects for move then Enter or Left commits,
   'd' or Space toggles display, 's' sets sort.  Use 'q' or Esc to end!
* PID        = Process Id
* USER       = User Name
* PR         = Priority
* NI         = Nice value
* VIRT       = Virtual Image (kb)
* RES        = Resident size (kb)
* SHR        = Shared Mem size (kb)
* S          = Process Status
* %CPU       = CPU usage
* %MEM       = Memory usage (RES)
* TIME+      = CPU Time, hundredths
  PPID       = Parent Process Pid
  RUSER      = Real user name
  UID        = User Id
  GROUP      = Group Name
  TTY        = Controlling Tty
  P          = Last used cpu (SMP)
  SWAP       = Swapped size (kb)
  TIME       = CPU Time
...
* X: COMMAND    = Command name/line
The starred items are the fields currently displayed. To select different items or units for top to display, navigate up and down with the arrow keys. When you have highlighted the item you want, hit the space bar. For example, to get top to print the last CPU each process or thread ran on, navigate down to P and then hit Space. When you are done changing options, hit q to return to the main top window. The top window will now have a new column, P, that lists this information.
 PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+   P COMMAND          
 8249 newhall   20   0  153m 4488  472 R   47  0.0   0:02.78 6 gol              
 8250 newhall   20   0  153m 4488  472 R   47  0.0   0:02.76 2 gol              
 8236 newhall   20   0  153m 4488  472 R   46  0.0   0:02.77 6 gol              
 8237 newhall   20   0  153m 4488  472 S   46  0.0   0:02.77 4 gol              
 8243 newhall   20   0  153m 4488  472 R   46  0.0   0:02.76 1 gol              
 8239 newhall   20   0  153m 4488  472 S   46  0.0   0:02.76 7 gol              
 8240 newhall   20   0  153m 4488  472 R   46  0.0   0:02.76 5 gol              
 8244 newhall   20   0  153m 4488  472 R   46  0.0   0:02.72 2 gol              
 8251 newhall   20   0  153m 4488  472 R   46  0.0   0:02.78 4 gol              
Type q to exit top.

Let's run synch with a bunch of threads, and then top -H in another window to see what we can see.

gdb and pthreads
Debugging threaded programs can be tricky because there are multiple streams of execution. In general, try to debug with as few threads as possible, and if you use printfs, print out a thread id and call fflush after.

We are going to look at using gdb to debug threaded programs today, and here is a link to some more information about:
gdb and pthreads

gdb has support for debugging threaded programs. One thing to keep in mind as you debug pthreaded programs on our system is that there are at least three different identifiers for the same thread as you run it in gdb:

  1. the pthread library's id for the thread (its pthread_t value)
  2. the operating system's id for the thread (its LWP id value). This is used in part for the OS to keep track of this thread for scheduling purposes.
  3. the gdb id for the thread: this is the id you should use when specifying gdb commands for a single thread.
The correspondence between these ids can differ from one OS and pthread library implementation to another, but on our systems there is a one-to-one-to-one correspondence between a pthread id, an LWP id, and a gdb thread id.

A few gdb thread-specific commands:

  set print thread-events   # prints out thread start and exit events
  info threads              # list all existing threads in program 
                            # the gdb threadno is the first value listed
                            # the thread that hit the break point is *'ed 
  thread threadno           # switch to thread threadno's context
                            # (see its stack when type where, for example)
  break [where] thread [threadno] # set a breakpoint at [where] just for 
                                  # thread threadno
                            
  thread apply [threadno|all] command  # apply the gdb command to all or a subset of threads
Basically, to apply a gdb command to all threads or to just a subset of threads (ex. 2-5), prefix the command with thread apply and the gdb thread ids:
thread apply [thread_id | all]  command
This doesn't seem to work for setting breakpoints on a single thread, so use the other way:
break line_no thread thread_no

The default behavior of gdb when a thread hits a breakpoint is that all threads are suspended wherever they happen to be until the user types cont. You can change this default behavior to have threads that are not at a breakpoint continue executing while you debug the ones that hit their breakpoints (but it is hard to think of scenarios where doing this would make debugging easier, so I'd say probably stick with the default).


A simple example run

Let's try running racecond in gdb. We will set a breakpoint for all threads in worker_loop and then set a breakpoint at line 76 just for thread 3. REMEMBER: gdb's thread number 3 may not correspond to a logical thread number in your program (i.e. myid may not be 3 for gdb thread 3).
$ gdb ./racecond
(gdb) delete all
(gdb) break worker_loop
(gdb) run 5
(gdb) info threads
(gdb) break 76 thread 3     # sets the breakpoint just for thread 3
(gdb) display myid
(gdb) cont ...

Here is some more output from using gdb on the racecond program that shows how to use some of the thread commands and what their output might look like:
% gdb ./racecond
  ...
(gdb) set print thread-events on
(gdb) run 5

Starting program: /home/newhall/public/cs31/week13/racecond 5
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

[New Thread 0x7ffff77fd700 (LWP 17471)]
hello I'm thread 0 with pthread_id 140737345738496
# LWP 17471 means Light Weight Process with id number 17471.
# An LWP is a thread the OS knows about: 17471 is the OS's id number for
# the thread, and 140737345738496 is the pthread library's id number for it
# (the same value as the 0x7ffff77fd700 that gdb prints, written in decimal).

[New Thread 0x7ffff6ffc700 (LWP 17472)]
hello I'm thread 1 with pthread_id 140737337345792
[New Thread 0x7ffff67fa700 (LWP 17473)]
hello I'm thread 2 with pthread_id 140737328948992
[New Thread 0x7ffff5ff9700 (LWP 17474)]
hello I'm thread 3 with pthread_id 140737320556288
[New Thread 0x7ffff57f8700 (LWP 17475)]
hello I'm thread 4 with pthread_id 140737312163584
[Thread 0x7ffff6ffc700 (LWP 17472) exited]
[Thread 0x7ffff77fd700 (LWP 17471) exited]
[Thread 0x7ffff67fa700 (LWP 17473) exited]
[Thread 0x7ffff57f8700 (LWP 17475) exited]
count = 141335712
[Thread 0x7ffff5ff9700 (LWP 17474) exited]
[Inferior 1 (process 17451) exited normally]


(gdb) break worker_loop
(gdb) run 3

(gdb) break 76     # sets the breakpoint for every thread 

Breakpoint 2, worker_loop (arg=0x602030) at racecond.c:76
76	      count += i; 

(gdb) info threads     # the starred thread is the current one
  Id   Target Id         Frame 
  4    Thread 0x7ffff67fa700 (LWP 17587) "racecond" worker_loop (arg=0x602038)
    at racecond.c:68
  3    Thread 0x7ffff6ffc700 (LWP 17549) "racecond" __lll_lock_wait ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:132
* 2    Thread 0x7ffff77fd700 (LWP 17548) "racecond" worker_loop (arg=0x602030)
    at racecond.c:76
  1    Thread 0x7ffff7fcd700 (LWP 17539) "racecond" 0x00007ffff7bc6148 in 
    pthread_join (threadid=140737345738496, thread_return=0x0) at pthread_join.c:89

# thread 2 is the current thread, so where shows thread 2's stack trace:
(gdb) where
#0  worker_loop (arg=0x602030) at racecond.c:76
#1  0x00007ffff7bc4e9a in start_thread (arg=0x7ffff77fd700)
    at pthread_create.c:308
#2  0x00007ffff78f1dbd in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#3  0x0000000000000000 in ?? ()

# switch to thread three's context
(gdb) thread 3
[Switching to thread 3 (Thread 0x7ffff6ffc700 (LWP 17549))]
#0  __lll_lock_wait ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:132
132	../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S: No such file or directory.

# get thread 3's stack trace
(gdb) where
#0  __lll_lock_wait ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:132
#1  0x00007ffff7bc7065 in _L_lock_858 ()
   from /lib/x86_64-linux-gnu/libpthread.so.0
#2  0x00007ffff7bc6eba in __pthread_mutex_lock (mutex=0x6010c0)
    at pthread_mutex_lock.c:61
#3  0x0000000000400aa2 in worker_loop (arg=0x602034) at racecond.c:75
#4  0x00007ffff7bc4e9a in start_thread (arg=0x7ffff6ffc700)
    at pthread_create.c:308
#5  0x00007ffff78f1dbd in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#6  0x0000000000000000 in ?? ()

# move into stack frame 3 of thread 3
(gdb) frame 3
#3  0x0000000000400aa2 in worker_loop (arg=0x602034) at racecond.c:75
75	      pthread_mutex_lock(&my_mutex);

(gdb) print my_mutex