CS31 Weekly Lab Warm-Up: tools for examining binary files

1. Goals for this week

Learn tools for examining binary files (gdb and ddd in particular)
Practice examining a binary program file to discover what it is doing
Introduction to Lab 5: lab assignment.

2. Handy References

gdb for IA32 assembly debugging IA32 gdb debugging guide
GDB for Assembly (from the gdb Guide). (assembly debugging and x command)
Sections 3.2 and 3.5 of textbook (assembly debugging, print, display, info and x commands)
IA32 Reference Sheet
Tools for examining phases of compiling and running C programs
Tools for examining .o and executable files

3. Getting your code

Your warm-up code for this week has been pushed to your Lab5 repo. Follow the instructions below to get your code:

cd ~/cs31/labs/
cd Lab5-user1-user2/warmup-code/
git pull
ls
# you should see the following files in
# your warm-up code folder.
 Makefile  README  mystery*  simplefuncs.c

You are highly encouraged to make your own copy of the warm-up code folder as follows:
```
cd ~/cs31/labs/Lab5-user1-user2/
cp -r warmup-code/ warmup-code-yourusername/
```
You can now modify your own copy of the warm-up code.
Once you are done with your changes, you can git add, commit and push. If you are the first person in your partnership to push, your partner should git pull before doing their own add, commit and push.

4. The leal instruction

Load effective address: leal S,D # D ← &S, where D must be a register and S is a memory operand. It’s often used to implement C’s address-of (&) operator.

leal looks like a mov instruction, but it does not access memory. Instead, it uses the CPU’s address-computation circuitry to do arithmetic, so a single leal can replace several separate arithmetic instructions.

For example, suppose you write C code that looks like the following:

int *values = malloc(15 * sizeof(int));
if (values == NULL) {
    //Error handling
}

int *index12 = &values[12];

When this example is converted to assembly code by the compiler, values (the memory block’s base address) will be assigned to a register.
Suppose it’s put into %eax. Let’s say the compiler wants to preserve %eax as the base address, but it also wants to store index12, the address of a bucket from the middle of the memory block, in %ecx.

One way that the compiler might compute &values[12] is to use a leal:

leal 48(%eax), %ecx   # compute an address equal to the value in eax + 48 and store the result in ecx

Figure: leal instruction with parentheses

The key thing about interpreting the leal instruction is that it breaks our usual rule that putting () around a register means dereferencing it. For just this instruction, the () does not perform a dereference. Instead, the instruction computes an address using the memory addressing hardware (e.g., displacement + register) and stores the resulting address, not the contents of memory at that address, in the destination register.

leal appears a lot in compiler-generated code. The compiler sometimes abuses leal to perform basic arithmetic, since it’s another way to perform an add or subtract.

So… if it’s just performing basic add/subtract arithmetic, why use leal then? The answer is that it cuts down on the number of instructions you need. In the example above, there’s no other single instruction that expresses "add 48 to eax and store the result in ecx". Here’s an alternative, but it’s twice as many instructions!

# Alternative
movl $48, %ecx   # Overwrite ecx by setting it to the constant value 48
addl %eax, %ecx  # Add eax and ecx, store the result in ecx

5. Tools for examining binary files

Some tools for examining binary files:

strings dumps all the strings in a binary file:

  strings simplefuncs

objdump -d to see the instructions and their encodings in memory:
```
  objdump -d  simplefuncs
```
gdb (and ddd): for debugging programs at the assembly code level and examining the state of CPU registers and memory as the program runs. These will be the most useful tools for the next lab assignment.

5.1. gdb (and ddd) for debugging at the assembly code level

With gdb you can debug and trace through a program execution at the assembly code level. This includes executing individual IA32 instructions, examining register values, and disassembling functions.

Let’s try it out again with the simplefuncs program, but first do 'make clean' and then 'make' to rebuild an IA32 version of the simplefuncs executable file.

First, let’s open up simplefuncs.c in vim. Then, let’s try some things out in gdb:

gdb simplefuncs
(gdb) break main
(gdb) break func1
(gdb) run

In gdb you can disassemble code using the disass command:

gdb simplefuncs
(gdb) disass main

You can set a breakpoint at a specific instruction:

(gdb) break *0x08048477 # set breakpoint at specified address

And you can step or next at the instruction level using ni or si (si steps into function calls, ni skips over them):

(gdb) ni	  # execute the next instruction then gdb gets control again
(gdb) ni
(gdb) ni
(gdb) ni
(gdb) ni
(gdb) disass
(gdb) cont      # continue to next break point

Now that we are at the call to func1, let’s step into this function using si (we also have a breakpoint at this function; let’s see when it is hit):

(gdb) si	  # step into instructions in the called function (func1)
(gdb) disass
(gdb) ni
(gdb) where
(gdb) disass
(gdb) cont

The difference between si and ni shows up in what each does on a call instruction. si gives gdb control again at the first instruction of the called function. ni gives gdb control again at the instruction immediately after the call instruction (the instruction at the return address). In other words, si "steps into" the called function, while ni lets the called function run and gives gdb control only after the function returns (unless a breakpoint inside the called function is hit first).

You can print out the values of individual registers like this:

(gdb) print $eax

Or the memory contents at a given address, giving the address either as an absolute numeric value or as an expression involving registers:

(gdb) p *(int *)($ebp + 8)
(gdb) x      $ebp + 8
(gdb) x/wd   $ebp + 8   # x/wd: display as an int (4-byte) decimal value

You can also view all register values:

(gdb) info registers

You can also use the display command to automatically display values each time gdb regains control (for example, at a breakpoint or after each ni or si):

(gdb) display $eax
(gdb) display $edx

You can use the examine command (x) to display the contents of a memory location. The memory address operand to (x) can be specified as the name of the register storing the address value or as an absolute memory address value. Here are some examples (x is shorthand for examine, and p is shorthand for the print command):

x $esp-0x8              # see what p and x display for the same value
p $esp-0x8

p *(int *)($ebp-0x8)    # here is how to print value at memory location

x $ebp-0x8              # or a much easier way using x

# here is an example of examining the contents at a memory location
# specifying the address in two different ways (the exact address
# value in the second depends on what $esp + 0x1c is; it can vary run to run)
x $esp + 0x1c
x 0xffffd2fc

The examine command also takes formatting options to tell it how to interpret the memory at the address:

x/wd $ebp-0x8    # examine memory at address ($ebp-8) as an int in decimal
                 # w: word size (32 bit on IA32)  d: signed decimal
x/wx $ebp-0x8    # examine memory at address ($ebp-8) as a 4-byte value in hex
x/s $ebp-0x8     # examine memory at address ($ebp-8) as a string

Examine’s formatting is sticky, which means that its last format specification is the one used for subsequent calls. To change it, explicitly specify an option again. This is different from print, which does not remember a format: without one, it displays a value according to the type of the expression (int for a register like $eax).

x/wd $ebp-0x8    # examine memory at address ($ebp-8) as an int in decimal
x   $ebp-0xc     # examine memory at address ($ebp-0xc) as an int (sticky formatting)

The sticky formatting also applies to the size of the value stored at the address (i.e., whether it is the address of a 1-byte, 2-byte, or 4-byte value). This "size stickiness" can result in some seemingly strange behavior when switching between formats, and fixing it sometimes requires specifying the size in the format options to x (e.g., x/wd in the example below).

x/wd $ebp-0x8    # examine memory at address ($ebp-8) as a 4-byte int in decimal
x/s $ebp-0x8     # examine memory at address ($ebp-8) as a string
x/d $ebp-0x8     # examine memory at address ($ebp-8) as a 1-byte decimal
x/wd $ebp-0x8    # examine memory at address ($ebp-8) as an int
                 # NEED to specify /wd to say interpret this as the address of a
                 # 4-byte word rather than a 1-byte value (x/s set the size to 1 byte)

Because of this behavior, we recommend that you always specify the byte width (w for 4 bytes) when you specify int or hex formatting for an int or unsigned int value: x/wd or x/wx

For more information about the x command see the IA32 debugging links in Section 2.

5.1.1. ddd

We are going to try running this in ddd instead of gdb, because ddd has a nicer interface for viewing assembly, registers, and stepping through program execution:

ddd simplefuncs

The gdb prompt is in the bottom window. There are also menu options and buttons for gdb commands, but I find the gdb prompt at the bottom easier to use.

Choose View→Machine Code Window to view the IA32 assembly code.

You can view the register values as the program runs (choose Status→Registers to open the register window).

For more information see the IA32 debugging links in Section 2.

6. Try out some of these tools on a program binary

Run the mystery binary a few times and see what it is doing:

./mystery

There is really not a lot of information to help us guess the right input, and this executable was not compiled with -g, so there is no C source information available when we run it in gdb.

Let’s examine the assembly code to see if we can figure out what to enter.

Let’s try running it in ddd and disassembling some code:

ddd ./mystery
(gdb) break main
(gdb) run
(gdb) disass

Let’s consider some questions about this program:

What does main’s control flow look like?
Let’s add some breakpoints around function calls and inside functions.
Let’s examine some state around function calls.
We can print out strings using x/s:

(gdb) x/s base_addr_of_string