1. Goals for this week:
-
Learn tools for examining binary files.
-
Practice examining a binary program file to discover what it’s doing
-
Introduction to Lab 5.
2. Starting Point Code
Start by creating a week06 directory in your cs31/weeklylab subdirectory and copying over some files:
$ cd ~/cs31/weeklylab
$ mkdir week06
$ cd week06
$ pwd
/home/you/cs31/weeklylab/week06
$ cp ~richardw/public/cs31/week06/* ./
$ ls
Makefile mystery* README simplefuncs.c
Compile simplefuncs.c using the provided Makefile:
$ make
3. Tools for examining binary files
Some tools for examining binary files:
-
strings dumps all the strings in a binary file:
$ strings simplefuncs
-
objdump -t or nm to list the symbol table contents:
$ objdump -t simplefuncs # list symbol table in the executable file
$ nm --format sysv simplefuncs # list symbol table in the executable file
The symbol table includes the names of all functions and global variables in
the program. There is a lot of information in the symbol table that looks
odd, but you should be able to see an entry for the two functions main and func1, and see where their start addresses are in memory.
-
gdb: for debugging programs at the assembly code level and examining the state of CPU registers and memory as the program runs. These will be the most useful tools for the next lab assignment.
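To make the first tool less magical, here is a minimal sketch in C of the idea behind strings: scan a buffer of bytes and print every run of printable characters that is long enough. The function name print_strings and its parameters are made up for this illustration, and the real strings program has more options and details than this sketch covers.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not part of the lab files): print every run of at
 * least min_len printable characters found in buf, one run per line, and
 * return how many runs were printed. min_len should be at least 1. */
size_t print_strings(const unsigned char *buf, size_t len, size_t min_len,
                     FILE *out)
{
    size_t count = 0;
    size_t start = 0;   /* index where the current run began */

    for (size_t i = 0; i <= len; i++) {
        /* A run ends at a non-printable byte or at the end of the buffer. */
        int printable = (i < len) && (isprint(buf[i]) || buf[i] == '\t');
        if (!printable) {
            if (i - start >= min_len) {
                fwrite(buf + start, 1, i - start, out);
                fputc('\n', out);
                count++;
            }
            start = i + 1;   /* the next run starts after this byte */
        }
    }
    return count;
}
```

Calling print_strings(buf, n, 4, stdout) on the bytes read from a file roughly mimics the default behavior of strings, which reports runs of at least 4 printable characters.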
3.1. gdb for debugging at the assembly code level
With gdb you can debug and trace through a program execution at the
assembly code level. This includes executing individual IA32
instructions, examining register values, and disassembling functions.
First, let’s open up simplefuncs.c in an editor. Then, let’s try some
things out in gdb:
$ gdb simplefuncs
(gdb) break main
(gdb) break func1
(gdb) run
In gdb you can disassemble code using the disas command:
(gdb) disas
(gdb) disas func1
You can break at a particular offset into a function:
(gdb) break *main+58 # set breakpoint at offset +58 in main
And you can step or next at the instruction level using ni or si (si steps into function calls, ni skips over them):
(gdb) ni # execute the next instruction then gdb gets control again
(gdb) ni
(gdb) ni
(gdb) disas
(gdb) cont # continue to next break point
Now that we are at the call to func1, let’s step into this function using si (we also have a breakpoint at this function; let’s see when it is hit):
(gdb) si # step into instructions in the called function (func1)
(gdb) disas
(gdb) ni
(gdb) where
(gdb) disas
(gdb) cont
The difference between si and ni shows up in what each does on a call
instruction. si gives gdb control again at the first instruction of
the called function. ni gives gdb control again at the instruction
immediately after the call instruction (the instruction at the return
address). In other words, si "steps into" the called function, whereas
ni lets the called function code run, and only after the function
returns does gdb get control again.
You can print out the values of individual registers like this:
(gdb) p $eax
You can also view all register values:
(gdb) info registers
You can use the display command to automatically display values each time a breakpoint is reached:
(gdb) display $eax
(gdb) display $edx
Tired of typing disas? Switch gdb to its assembly layout:
(gdb) layout asm
The only caveat is that it doesn’t always play nicely when the program you’re debugging produces output.
3.2. Examining memory
Let’s reset the state of the program to just before the call to func1:
(gdb) run
(gdb) cont
(gdb) disas
At this point in the program, we can see that in addition to being in
registers, the values 2 and 200 have been stored on the stack at addresses
-0x10(%ebp) and -0xc(%ebp), respectively.
If you want to check the contents of memory, you could do something like:
(gdb) p *(int *)($ebp - 0x10)
That’s a really nasty statement! Alternatively, you can use the examine
command (x) to display the contents of a memory location. The memory address
operand to x can be specified as the name of the register storing the
address value or as an absolute memory address value. Here are some examples:
(gdb) p $ebp-0x10 # p prints the computed address itself
(gdb) x $ebp-0x10 # x shows the contents of memory at that address
The examine command also takes formatting options to tell it how to interpret the memory at the address:
(gdb) x/wd $ebp-0x10 # examine memory at specified address and display it in decimal
(gdb) x/wx $ebp-0x10 # examine memory at specified address and display it in hex
(gdb) x/s $ebp-0x10 # examine memory at specified address and display it as a string
Examine’s formatting is sticky: the last format you specified is reused by subsequent x commands until you explicitly give a new one. This is different from print, which by default formats each value according to the type of the expression.
(gdb) x/wd $ebp-0x10 # examine memory at address ($ebp-0x10) as an int in decimal
(gdb) x $ebp-0xc # examine memory ($ebp-0xc) with /d formatting (sticky formatting)
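The point of the format options is that memory holds only bytes, and the format decides how those bytes are shown. The following C sketch mimics that idea; the helper names (read_word, format_wd, format_wx, format_s) are invented for this illustration and are not gdb functions. read_word does the same job as the cast *(int *)addr in the print statement shown earlier.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helpers, not part of gdb or the lab files. */

/* Read the 4-byte word stored at addr (gdb's "w" unit size). */
int32_t read_word(const void *addr)
{
    int32_t word;
    memcpy(&word, addr, sizeof word);   /* works for any alignment */
    return word;
}

/* Roughly what x/wd shows: the word as a signed decimal integer. */
void format_wd(const void *addr, char *out, size_t n)
{
    snprintf(out, n, "%d", (int)read_word(addr));
}

/* Roughly what x/wx shows: the same word in zero-padded hex. */
void format_wx(const void *addr, char *out, size_t n)
{
    snprintf(out, n, "0x%08x", (unsigned)read_word(addr));
}

/* Roughly what x/s shows: the bytes up to the first 0 byte, as text. */
void format_s(const void *addr, char *out, size_t n)
{
    snprintf(out, n, "\"%s\"", (const char *)addr);
}
```

For an int variable holding 200, format_wd produces 200 and format_wx produces 0x000000c8: the same four bytes, displayed two ways. format_s is only meaningful when the address really holds a 0-terminated string, which is also why x/s on an integer's address tends to show junk.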
Let’s move forward until we’re about to call printf at main+80:
(gdb) break *main+80
(gdb) cont # breaks when we enter func1
(gdb) cont # breaks when we get to main+80
(gdb) disas
We know that printf always receives a format string as its first argument, so
let’s see if we can find it. The parameters to the function should have been pushed onto the stack. If we look right above the call to printf, we see push $0x80b8014. Let’s look at that:
(gdb) p 0x80b8014
Hmm, that didn’t do anything helpful — maybe because it’s a memory address. Let’s try examining it as a string:
(gdb) x/s 0x080b8014
There we go! That’s the first argument to printf. We can also print the value of the second argument (y) using x/wd to examine the previously pushed item as an integer:
(gdb) x/wd $ebp-0xc
This strategy of printing function arguments just prior to calling a function should help you a lot when deciphering what mysterious assembly code is doing.
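As a rough mental model of what we just inspected, the C sketch below builds a toy snapshot of the argument words as they sit just before a call under the IA32 calling convention, where arguments are pushed right to left so the first argument ends up at the lowest address. Everything here is hypothetical: the struct call_args type, the function names, and the format string in the usage note are made up for illustration and are not taken from simplefuncs.c.

```c
#include <stdint.h>
#include <stdio.h>

/* Toy model, not real stack access: the argument words just before a
 * call, lowest address first. */
struct call_args {
    uintptr_t words[2];
};

/* Mimic the caller: the second argument is pushed first, and the first
 * argument (the address of the format string) is pushed last. */
struct call_args push_printf_args(const char *fmt, int y)
{
    struct call_args a;
    a.words[1] = (uintptr_t)y;     /* pushed first: second argument */
    a.words[0] = (uintptr_t)fmt;   /* pushed last: first argument   */
    return a;
}

/* Like x/s on the pushed address: follow it to the string it names. */
const char *format_string_of(const struct call_args *a)
{
    return (const char *)a->words[0];
}

/* Like x/wd on the next word: read it as an integer. */
int int_arg_of(const struct call_args *a)
{
    return (int)a->words[1];
}
```

For example, after struct call_args a = push_printf_args("y is %d\n", 200), the first word is just a number (an address), which is why printing it directly was unhelpful; format_string_of(&a) follows that address to the text, and int_arg_of(&a) returns 200.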
4. Try out some of these tools on a program binary
Run the mystery binary a few times and see what it is doing:
$ ./mystery
The program is asking you for input, but there is really not a lot of
information provided to guess the right input, and this executable was
not compiled with -g, so there is no C code information we can get from
it when we run it in gdb.
Let’s examine the assembly code to see if we can figure out what to enter.
Let’s try running it in gdb and disassembling some code.
$ gdb ./mystery
(gdb) layout asm # optional: turns on the ASM layout
(gdb) break main
(gdb) run
(gdb) disas # you only need to do this if you didn't turn on ASM layout
Let’s consider some questions about this program:
-
what does main’s control flow look like?
-
let’s add some break points around function calls and in functions
-
let’s examine some state around functions
-
we can print out strings using x/s
(gdb) x/s base_addr_of_string
5. Lab 5
Finally, let’s take a look at Lab 5.
6. Handy References
-
gdb for IA32 assembly debugging: IA32 gdb debugging guide
-
GDB for Assembly (from the gdb Guide) (assembly debugging and x command)
-
Sections 3.2 and 3.5 of textbook (assembly debugging, print, display, info and x commands)
-
Tools for examining phases of compiling and running C programs