The different phases examined below are:
1. the C preprocessor
2. the compiler
3. the assembler
4. the linker
More information and examples of using some of these tools (hexdump, strings, objdump, gdb) to examine .o and a.out files are also available.
// simple.c:
#include <stdio.h>   // sprintf
#include <string.h>  // strlen
#include <unistd.h>  // write

#define MAX 10

int foo(int y);

int main(void) {
    int x, i;
    char buf[10];

    for (i = 0; i < MAX; i++) {
        x = foo(i);
        // a crazy way to print to stdout (file descriptor 1)
        sprintf(buf, "%d", x);
        write(1, buf, strlen(buf));
        buf[0] = '\n';
        write(1, buf, 1);
    }
    return 0;
}

int foo(int y) {
    return y * y;
}
# the C source file:
$ file simple.c
simple.c: ASCII C program text

# the object file: produces relocatable machine code
# ELF: stands for Executable and Linking Format, and is the format for
#   .o, a.out, and .so files produced by gcc. The format is necessary
#   so that programs that process these files, and the OS, know how
#   to find different parts of the code and data in this file
# Intel 80386: is the target architecture
# not stripped: means that this .o file includes a symbol table
$ file simple.o
simple.o: ELF 32-bit LSB relocatable, Intel 80386, version 1 (SYSV), not stripped

# the executable file:
$ file simple
simple: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), for GNU/Linux 2.6.8, dynamically linked (uses shared libs), not stripped

# a shared object file (dynamically linked library):
$ file /lib/libc-2.7.so
/lib/libc-2.7.so: ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV), for GNU/Linux 2.6.8, stripped
# run cpp:
$ cpp simple.c | less
# run just the preprocessor part of gcc:
$ gcc -E simple.c | less
# look at the output to see what happens to the #includes and #defines from simple.c

Here is a very detailed reference about the preprocessor.
The compilation phase does the bulk of the work, translating a program written in the high-level C programming language into low-level instructions for a specific instruction set architecture (ISA). Any processor microarchitecture that implements this ISA can execute these instructions; for example, both Intel and AMD make processors that can execute the IA32 ISA.
Use the -S option to gcc to produce a .s file:
$ gcc -S simple.c
This creates a text file, simple.s, that contains the assembly code the compiler produced from the C program. You can view simple.s with a text editor:
$ vim simple.s
Use the -c option to gcc to produce a .o file:
$ gcc -c simple.c
You can view the machine code in simple.o, disassembled back into assembly code, using either objdump or gdb (all addresses are listed in hexadecimal, base 16):
$ objdump -d simple.o
...
00000000 <main>:
   0:  8d 4c 24 04           lea    0x4(%esp),%ecx
   4:  83 e4 f0              and    $0xfffffff0,%esp
   7:  ff 71 fc              pushl  -0x4(%ecx)
   a:  55                    push   %ebp
   b:  89 e5                 mov    %esp,%ebp
   d:  51                    push   %ecx
   e:  83 ec 34              sub    $0x34,%esp
  11:  65 a1 14 00 00 00     mov    %gs:0x14,%eax
  17:  89 45 f8              mov    %eax,-0x8(%ebp)
  1a:  31 c0                 xor    %eax,%eax
...
$ gdb simple.o
(gdb) disass main
(gdb) disass foo
(gdb) quit
# create an executable file from simple.o and some standard libraries that gcc automatically links in:
$ gcc -o simple simple.o
$ objdump -d simple
...
08048434 <main>:
 8048434:  8d 4c 24 04              lea    0x4(%esp),%ecx
 8048438:  83 e4 f0                 and    $0xfffffff0,%esp
 804843b:  ff 71 fc                 pushl  -0x4(%ecx)
 804843e:  55                       push   %ebp
 804843f:  89 e5                    mov    %esp,%ebp
 8048441:  51                       push   %ecx
 8048442:  83 ec 34                 sub    $0x34,%esp
 8048445:  65 a1 14 00 00 00        mov    %gs:0x14,%eax
 804844b:  89 45 f8                 mov    %eax,-0x8(%ebp)
 804844e:  31 c0                    xor    %eax,%eax
 8048450:  c7 45 e4 00 00 00 00     movl   $0x0,-0x1c(%ebp)
 8048457:  eb 6d                    jmp    80484c6
...
$ nm --format sysv simple   # System V format is easier to read than BSD format, which is the default
Name                Value     Class  Type    Size      Line  Section
...
foo                |080484e6|   T  |  FUNC|0000000c|     |.text
frame_dummy        |08048410|   t  |  FUNC|        |     |.text
main               |08048434|   T  |  FUNC|000000b2|     |.text
p.5841             |080496dc|   d  |OBJECT|        |     |.data
sprintf@@GLIBC_2.0 |        |   U  |  FUNC|00000034|     |*UND*
strlen@@GLIBC_2.0  |        |   U  |  FUNC|000000af|     |*UND*
write@@GLIBC_2.0   |        |   U  |  FUNC|00000076|     |*UND*

Section *UND* means that these symbols are undefined in simple itself: they come from shared libraries (.so files) that are loaded at run time. Section .text means that these symbols are in the .text section (the code section) of the executable file. Class T and t are functions, D and d are data (global and static variables), R is read-only data, and U is an undefined symbol; an uppercase class means the symbol is global, and a lowercase class means it is local to its file (for example, a static function or variable). The Value column gives the address of the function or data.
ldd lists the shared libraries that simple depends on, and the addresses at which they are loaded:
$ ldd simple
    linux-gate.so.1 =>  (0xb7ef2000)
    libc.so.6 => /lib/tls/i686/cmov/libc.so.6 (0xb7d8a000)
    /lib/ld-linux.so.2 (0xb7ef3000)

Use objdump -T to see the dynamic symbol table entries of a .so file (here we are just finding the one for write):
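$ objdump -T /lib/libc.so.6 | grep write
000b6ab0  w   DF .text  00000076  GLIBC_2.0   write

In this entry, 000b6ab0 is the address of write relative to the start of libc, w marks it as a weak symbol, DF marks a dynamic function symbol, and 00000076 is its size in bytes, the same size that nm reported for write@@GLIBC_2.0 in simple.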
If you run objdump -d simple, you can see that the call to write in main is a call into the .plt section of the executable, which holds the Procedure Linkage Table (PLT):
08048434 <main>:
 ...
 804849e:  e8 c9 fe ff ff        call   804836c <write@plt>
 ...
Disassembly of section .plt:
 ...
0804836c <write@plt>:
 804836c:  ff 25 c4 96 04 08     jmp    *0x80496c4
 8048372:  68 10 00 00 00        push   $0x10
 8048377:  e9 c0 ff ff ff        jmp    804833c <_init+0x30>

The jmp *0x80496c4 instruction jumps to the address stored at 0x80496c4, an entry in the Global Offset Table (GOT), which starts at address 0x80496b0. The value in this GOT entry is filled in at run time by the dynamic linker.
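To see what this value is set to at run time, stop the program in gdb at the call to write. With lazy binding the GOT entry is filled in only when write is first called, so continue past the first stop to reach the same call on the second loop iteration, then disassemble:

$ gdb simple
(gdb) break *0x0804849e
(gdb) run
(gdb) cont
(gdb) disass main
...
0x0804849e <main+106>: call 0x804836c <write@plt>
...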
(gdb) disass 0x804836c
Dump of assembler code for function write@plt:
0x0804836c <write@plt+0>:  jmp *0x80496c4
0x08048372 <write@plt+6>:  push $0x10
0x08048377 <write@plt+11>: jmp 0x804833c <_init+48>
(gdb) disass 0x80496c4
Dump of assembler code for function _GLOBAL_OFFSET_TABLE_:
0x080496b0 <_GLOBAL_OFFSET_TABLE_+0>:  fcoml 0x66680804(%ebp)
0x080496b6 <_GLOBAL_OFFSET_TABLE_+6>:  icebp
0x080496b7 <_GLOBAL_OFFSET_TABLE_+7>:  mov $0x30,%bh
0x080496b9 <_GLOBAL_OFFSET_TABLE_+9>:  fdiv %st,%st(0)
0x080496bb <_GLOBAL_OFFSET_TABLE_+11>: mov $0xb0,%bh
0x080496bd <_GLOBAL_OFFSET_TABLE_+13>: xchg %eax,%ebx
0x080496be <_GLOBAL_OFFSET_TABLE_+14>: fnsave 0x8048362(%edi)
0x080496c4 <_GLOBAL_OFFSET_TABLE_+20>: rclb -0x7c8f481b(%edx)
0x080496ca <_GLOBAL_OFFSET_TABLE_+26>: fidivl -0x481fbdb0(%edi)
0x080496d0 <_GLOBAL_OFFSET_TABLE_+32>: mov %al,0x80483
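Because the GOT holds addresses (data) rather than instructions, gdb's disassembly of it is meaningless. Instead, print the 4-byte value stored in the GOT entry at 0x80496c4, which is the address the PLT stub jumps to:

(gdb) print/x *0x80496c4
$2 = 0xb7e592d0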
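Disassembling that address shows that it is the start of write in the C library:

(gdb) disass 0xb7e592d0
Dump of assembler code for function write:
0xb7e592d0 <write+0>:  cmpl $0x0,%gs:0xc
0xb7e592d8 <write+8>:  jne 0xb7e592fc <write+44>
0xb7e592da <write+10>: push %ebx
0xb7e592db <write+11>: mov 0x10(%esp),%edx
0xb7e592df <write+15>: mov 0xc(%esp),%ecx
0xb7e592e3 <write+19>: mov 0x8(%esp),%ebx
...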
Here is some more information about readelf, objdump, and other tools: