1. Goals for this week:
-
Learn about manual pages and using man and apropos.
-
Learn tools for examining binary files.
-
Practice examining a binary program file to discover what it’s doing.
-
Introduction to Lab 6.
2. Starting Point Code
Start by creating a week07 directory in your cs31/WeeklyLabs subdirectory and copying over some files:
$ cd ~/cs31/WeeklyLabs
$ mkdir week07
$ cd week07
$ pwd
/home/you/cs31/WeeklyLabs/week07
$ cp ~sukrit/public/cs31/week07/* ./
$ ls
Makefile mystery* simplefuncs.c README
3. man and manpages
First, we are going to learn how to use man to read manual pages and how to use apropos to find commands (reference: man and apropos).
Next, let’s look at the man pages for strcmp and scanf to see what they tell us about these functions.
$ man scanf
$ man 3 scanf # or explicitly specify the manual section:
# (C library function scanf is in section 3 of the manual)
$ man strcmp
apropos is a command for finding the names of other commands or library functions. It searches the names and one-line descriptions of the manual pages for a keyword, so it is useful if you cannot remember the name of a library function or command but you know what it does. Suppose that we cannot remember the name strcmp; we could try to find it using apropos:
$ apropos compare
4. gdb for debugging at the assembly code level
With gdb you can debug and trace through a program execution at the assembly code level. This includes executing individual assembly instructions, examining register values, and disassembling functions.
First, let’s open up simplefuncs.c in an editor. Then, let’s try some things out in gdb:
$ gdb simplefuncs
(gdb) break main
(gdb) run
In gdb you can disassemble code using the disas command:
(gdb) disas
(gdb) disas func1
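Tired of typing disas? You can switch to gdb’s assembly layout, which keeps the disassembly on screen:
(gdb) layout asm
You can also use the register layout, which adds a window showing register values:
(gdb) layout reg
The only caveat is that it doesn’t always play nicely when the program you’re debugging produces output.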
You can break at a particular offset into a function:
(gdb) break *main+77 # set breakpoint at offset +77 in main
And you can step or next at the instruction level using ni or si (si steps into function calls, ni skips over them):
(gdb) ni # execute the next instruction then gdb gets control again
(gdb) ni
(gdb) ni
(gdb) disas
(gdb) cont # continue to next break point
Now we are at the call to func1. Let’s step into this function using si (we also have a breakpoint at this function; let’s see when it is hit):
(gdb) si # step into instructions in the called function (func1)
(gdb) disas
(gdb) ni
(gdb) where
(gdb) disas
(gdb) cont
The difference between si and ni shows up in what each does on a call instruction. si gives gdb control again at the first instruction of the called function. ni gives gdb control again at the instruction immediately after the call instruction (the instruction at the return address). In other words, si "steps into" the called function, whereas ni lets the called function code continue, and only after the function returns does gdb get control again.
You can print out the values of individual registers like this:
(gdb) p $rax
You can also view all register values:
(gdb) info registers
You can use the display command to automatically display values each time a breakpoint is reached:
(gdb) display $rax
(gdb) display $rdx
4.1. Examining memory
Let’s reset the state of the program to just before the call to func:
(gdb) run
(gdb) cont
(gdb) disas
At this point in the program, we can see that in addition to being in registers, the values 2 and 200 have been stored on the stack at addresses -0x10(%rbp) and -0x8(%rbp), respectively.
If you want to check the contents of memory, you could do something like:
(gdb) p *(int *)($rbp - 0x10)
That’s a really nasty statement! Alternatively, you can use the examine command (x) to display the contents of a memory location. The memory address operand to x can be specified as the name of the register storing the address value or as an absolute memory address value. Here are some examples:
(gdb) p $rbp-0x10 # p prints the address value itself
(gdb) x $rbp-0x10 # x displays the contents of memory at that address
The examine command also takes formatting options to tell it how to interpret the memory at the address:
(gdb) x/wd $rbp-0x10 # examine memory at specified address and display it in decimal
(gdb) x/wx $rbp-0x10 # examine memory at specified address and display it in hex
(gdb) x/s $rbp-0x10 # examine memory at specified address and display it as a string
Examine’s formatting is sticky, which means that its last format specification is the one used for subsequent calls. To change it, explicitly specify an option again. This behavior is different from print, which picks its default format from the type of the expression each time.
(gdb) x/wd $rbp-0x10 # examine memory at address ($rbp-0x10) as an int in decimal
(gdb) x $rbp-0x8 # examine memory at ($rbp-0x8), reusing the /wd format (sticky formatting)
Let’s move forward until we’re about to call printf at main+103:
(gdb) break *main+103
(gdb) cont # breaks when we get to main+103
(gdb) disas
We know that printf always receives a format string as its first argument, so let’s see if we can find it. The first six integer or pointer parameters to a function get passed in registers, starting with register rdi (or in this case, edi, because the compiler only needs to set 32 bits of the register). If we look right above the call to printf, we see mov $0x47f01c, %edi. Let’s look at that address:
(gdb) p 0x47f01c
Hmm, that didn’t do anything helpful: p treats the value as a plain number and just prints it back. Since it’s a memory address, let’s try examining it as a string:
(gdb) x/s 0x47f01c
There we go! That’s the first argument to printf. We can also print the value of the second argument (y) using p to see the value that was set just prior to setting register rdi:
(gdb) p $rsi
This strategy of printing or examining function arguments just prior to calling a function should help you a lot when deciphering what mysterious assembly code is doing.
5. Try out some of these tools on a program binary
Run the mystery binary a few times and see what it is doing:
$ ./mystery
The program is asking you for input, but there is really not a lot of information provided to guess the right input, and this executable was not compiled with -g, so there is no C code information we can get from it when we run it in gdb.
Let’s examine the assembly code to see if we can figure out what to enter.
Let’s try running it in gdb and disassembling the code:
$ gdb ./mystery
(gdb) layout asm # optional: turns on the ASM layout
(gdb) break main
(gdb) run
(gdb) disas # you only need to do this if you didn't turn on ASM layout
Consider the following things when examining a program like this:
-
What does main’s control flow look like? Do you recognize any patterns that look like loops or if statements?
-
What is the state of the registers and memory around function calls?
-
Are there any constant addresses that might be strings? If so, you can try printing them using x/s.
(gdb) x/s base_addr_of_string
6. Lab 6
Finally, let’s take a look at Lab 6.
7. Handy References
-
GDB for Assembly (from the gdb Guide). (assembly debugging and x command)
-
Section 3.2 and Section 3.5 of the textbook (assembly debugging, print, display, info and x commands)
-
Tools for examining phases of compiling and running C programs