1. Goals for this week:

  1. Learn tools for examining binary files.

  2. Practice examining a binary program file to discover what it’s doing.

  3. Introduction to Lab 5.

2. Starting Point Code

Start by creating a week06 directory in your cs31/weeklylab subdirectory and copying over some files:

$ cd ~/cs31/weeklylab
$ mkdir week06
$ cd week06
$ pwd
/home/you/cs31/weeklylab/week06
$ cp ~sukrit/public/cs31/week06/* ./
$ ls
Makefile  mystery*  README  simplefuncs.c

Compile simplefuncs.c using the provided Makefile:

$ make

3. gdb for debugging at the assembly code level

With gdb you can debug and trace through a program execution at the assembly code level. This includes executing individual assembly instructions, examining register values, and disassembling functions.

First, let’s open up simplefuncs.c in an editor. Then, let’s try some things out in gdb:

$ gdb simplefuncs
(gdb) break main
(gdb) run

In gdb you can disassemble code using the disas command:

(gdb) disas
(gdb) disas func1

Tired of typing disas all the time? gdb has an option to always show the current assembly code in part of the window:

(gdb) layout asm

The only caveat is that it doesn’t always play nicely when the program you’re debugging produces output (e.g., with printf). If you’re using this mode and the display gets scrambled, try pressing CTRL-L or resizing the terminal window.

You can break at a particular offset into a function:

(gdb) break *main+77    # set breakpoint at offset +77 in main

And you can step or next at the instruction level using si or ni (si steps into function calls, ni steps over them):

(gdb) ni        # execute the next instruction then gdb gets control again
(gdb) ni
(gdb) ni
(gdb) disas
(gdb) cont      # continue to next break point

Now we are at the call to func1. Let’s step into this function using si (we also have a breakpoint at this function, so let’s see when it is hit):

(gdb) si        # step into instructions in the called function (func1)
(gdb) disas
(gdb) ni
(gdb) where
(gdb) disas
(gdb) cont

The difference between si and ni shows up in what each does on a call instruction. si gives gdb control again at instructions at the beginning of the called function. ni gives gdb control again at the instruction immediately after the call instruction (the instruction at the return address). In other words, si "steps into" the called function, whereas ni lets the called function code continue, and only after the function returns does gdb get control again.

You can print out the values of individual registers like this:

(gdb) p $rax

You can also view all register values:

(gdb) info registers

You can use the display command to automatically display values each time the program stops, whether at a breakpoint or after a single ni or si:

(gdb) display $rax
(gdb) display $rdx

3.1. Examining memory

Let’s reset the state of the program to just before the call to func1:

(gdb) run
(gdb) cont
(gdb) disas

At this point in the program, we can see that in addition to being in registers, the values 2 and 200 have been stored on the stack at addresses -0x10(%rbp) and -0x8(%rbp), respectively.

If you want to check the contents of memory, you could do something like:

(gdb) p *(int *)($rbp - 0x10)

That’s a really nasty statement! Alternatively, you can use the examine command (x) to display the contents of a memory location. The address operand to x can be an expression built from a register that holds an address, or an absolute memory address value. Here are some examples:

(gdb) p $rbp-0x10   # p prints the address value itself
(gdb) x $rbp-0x10   # x displays the contents of memory at that address

The examine command also takes formatting options to tell it how to interpret the memory at the address:

(gdb) x/wd $rbp-0x10 # examine memory at specified address and display it in decimal
(gdb) x/wx $rbp-0x10 # examine memory at specified address and display it in hex
(gdb) x/s $rbp-0x10  # examine memory at specified address and display it as a string

Examine’s formatting is sticky, which means that its last format specification is the one used for subsequent calls. To change it, explicitly specify an option again. This behavior is different from print, which falls back to the natural format for the expression’s type whenever no format is given.

(gdb) x/wd $rbp-0x10 # examine memory at address ($rbp-0x10) as an int in decimal
(gdb) x $rbp-0x8     # examine memory at ($rbp-0x8), reusing the /wd formatting (sticky formatting)

Let’s move forward until we’re about to call printf at main+107:

(gdb) break *main+107
(gdb) cont   # breaks when we get to main+107
(gdb) disas

We know that printf always receives a format string as its first argument, so let’s see if we can find it. The first six integer or pointer arguments to a function get passed in registers, starting with register rdi (or in this case, edi because the compiler only needs to set 32 bits of the register). If we look right above the call to printf, we see mov $0x498010, %edi. Let’s look at that address:

(gdb) p 0x498010

Hmm, that didn’t do anything helpful: p treats the value as a plain number and simply prints it back in decimal. Since it’s a memory address, let’s try examining it as a string:

(gdb) x/s 0x498010

There we go! That’s the first argument to printf. The second argument (y) is passed in register rsi, which was set just before rdi. We can print its value using p:

(gdb) p $rsi

This strategy of printing or examining function arguments just prior to calling a function should help you a lot when deciphering what mysterious assembly code is doing.

4. Try out some of these tools on a program binary

Run the mystery binary a few times and see what it is doing:

$ ./mystery

The program asks you for input, but it gives very little information to help you guess the right answer. The executable was also not compiled with -g, so gdb has no C source information to show us when we run it.

Let’s examine the assembly code to see if we can figure out what to enter.

Let’s try running it in gdb and disassembling the code:

$ gdb ./mystery
(gdb) layout asm    # optional: turns on the ASM layout
(gdb) break main
(gdb) run
(gdb) disas         # you only need to do this if you didn't turn on ASM layout

Consider the following things when examining a program like this:

  1. What does main’s control flow look like? Do you recognize any patterns that look like loops or if statements?

  2. What is the state of the registers and memory around function calls?

  3. Are there any constant addresses that might be strings? If so, you can try printing them using x/s.

(gdb) x/s base_addr_of_string

5. Lab 5

Finally, let’s take a look at Lab 5.

6. Handy References