We can walk through the steps taken to build a simple application step by step.
Note that when you type gcc
that actually runs a driver program that hides most of the steps
from you. Under normal circumstances this is exactly what you
want, because the exact commands and options to get a real life
working executable on a real system can be quite complicated and
architecture specific.
We will show the compliation process with the two following
examples. Both are C source files, one defined the
main()
function for the inital
program entry point, and another declares a helper type function.
There is one global variable too, just for illustration.
1 #include <stdio.h> /* We need a prototype so the compiler knows what types function() takes */ 5 int function(char *input); /* Since this is static, we can define it in both hello.c and function.c */ static int i = 100; 10 /* This is a global variable */ int global = 10; int main(void) { 15 /* function() should return the value of global */ int ret = function("Hello, World!"); exit(ret); } 20
1 #include <stdio.h> static int i = 100; 5 /* Declard as extern since defined in hello.c */ extern int global; int function(char *input) 10 { printf("%s\n", input); return global; } 15
All compilers have an option to only execute the first
step of compilation. Usually this is something like
-S
and the output will
generally be put into a file with the same name as the input
file but with a .s
extension.
Thus we can show the first step with gcc
-S
as illustrated in the example below.
1 ianw@lime:~/programs/csbu/wk7/code$ gcc -S hello.c ianw@lime:~/programs/csbu/wk7/code$ gcc -S function.c ianw@lime:~/programs/csbu/wk7/code$ cat function.s 5 .file "function.c" .pred.safe_across_calls p1-p5,p16-p63 .section .sdata,"aw",@progbits .align 4 .type i#, @object 10 .size i#, 4 i: data4 100 .section .rodata .align 8 15 .LC0: stringz "%s\n" .text .align 16 .global function# 20 .proc function# function: .prologue 14, 33 .save ar.pfs, r34 alloc r34 = ar.pfs, 1, 4, 2, 0 25 .vframe r35 mov r35 = r12 adds r12 = -16, r12 mov r36 = r1 .save rp, r33 30 mov r33 = b0 .body ;; st8 [r35] = r32 addl r14 = @ltoffx(.LC0), r1 35 ;; ld8.mov r37 = [r14], .LC0 ld8 r38 = [r35] br.call.sptk.many b0 = printf# mov r1 = r36 40 ;; addl r15 = @ltoffx(global#), r1 ;; ld8.mov r14 = [r15], global# ;; 45 ld4 r14 = [r14] ;; mov r8 = r14 mov ar.pfs = r34 mov b0 = r33 50 .restore sp mov r12 = r35 br.ret.sptk.many b0 ;; .endp function# 55 .ident "GCC: (GNU) 3.3.5 (Debian 1:3.3.5-11)"
The assembly is a little to complex to fully describe, but
you should be able to see where i is defined as a
data4
(i.e. 4 bytes or 32 bits,
the size of an int
), where
function
is defined
(function:
) and a call to
printf()
.
We now have two assembly files ready to be assembled into machine code!
Assembly is a fairly straight forward process. The
assembler is usually called as
and takes arguments in a similar fasion to
gcc
1 ianw@lime:~/programs/csbu/wk7/code$ as -o function.o function.s ianw@lime:~/programs/csbu/wk7/code$ as -o hello.o hello.s ianw@lime:~/programs/csbu/wk7/code$ ls 5 function.c function.o function.s hello.c hello.o hello.s
After assembling we have object code,
which is ready to be linked together into the final executable.
You can usually skip having to use the assembler by hand by
calling the compiler with -c
,
which will directly convert the input file to object code,
putting it in a file with the same prefix but
.o
as an extension.
We can't inspect the object code directly, as it is in a
binary format (in future weeks we will learn about this binary
format). However we can use some tools to inspect the object
files, for example readelf
--symbols
will show us symbols in the object
file.
1 ianw@lime:~/programs/csbu/wk7/code$ readelf --symbols ./hello.o Symbol table '.symtab' contains 15 entries: 5 Num: Value Size Type Bind Vis Ndx Name 0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND 1: 0000000000000000 0 FILE LOCAL DEFAULT ABS hello.c 2: 0000000000000000 0 SECTION LOCAL DEFAULT 1 3: 0000000000000000 0 SECTION LOCAL DEFAULT 3 10 4: 0000000000000000 0 SECTION LOCAL DEFAULT 4 5: 0000000000000000 0 SECTION LOCAL DEFAULT 5 6: 0000000000000000 4 OBJECT LOCAL DEFAULT 5 i 7: 0000000000000000 0 SECTION LOCAL DEFAULT 6 8: 0000000000000000 0 SECTION LOCAL DEFAULT 7 15 9: 0000000000000000 0 SECTION LOCAL DEFAULT 8 10: 0000000000000000 0 SECTION LOCAL DEFAULT 10 11: 0000000000000004 4 OBJECT GLOBAL DEFAULT 5 global 12: 0000000000000000 96 FUNC GLOBAL DEFAULT 1 main 13: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND function 20 14: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND exit ianw@lime:~/programs/csbu/wk7/code$ readelf --symbols ./function.o Symbol table '.symtab' contains 14 entries: 25 Num: Value Size Type Bind Vis Ndx Name 0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND 1: 0000000000000000 0 FILE LOCAL DEFAULT ABS function.c 2: 0000000000000000 0 SECTION LOCAL DEFAULT 1 3: 0000000000000000 0 SECTION LOCAL DEFAULT 3 30 4: 0000000000000000 0 SECTION LOCAL DEFAULT 4 5: 0000000000000000 0 SECTION LOCAL DEFAULT 5 6: 0000000000000000 4 OBJECT LOCAL DEFAULT 5 i 7: 0000000000000000 0 SECTION LOCAL DEFAULT 6 8: 0000000000000000 0 SECTION LOCAL DEFAULT 7 35 9: 0000000000000000 0 SECTION LOCAL DEFAULT 8 10: 0000000000000000 0 SECTION LOCAL DEFAULT 10 11: 0000000000000000 128 FUNC GLOBAL DEFAULT 1 function 12: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND printf 13: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND global 40
Although the output is quite complicated (again!) you should be able to understand much of it. For example
In the output of
hello.o
have a look at the
symbol with name i
. Notice
how it says it is LOCAL
?
That is because we declared it
static
and as such it has
been flagged as being local to this object file.
In the same output, notice that the
global
variable is defined
as a GLOBAL
, meaning that
it is visible outside this file. Similarly the
main()
function is
externally visable.
Notice that the
function
symbol (for the
call to function()
is left
has UND
or
undefined. This means that it has been
left for the linker to find the address of the
function.
Have a look at the symbols in the
function.c
file and how
they fit into the output.
Actually invoking the linker, called
ld
, is a very complicated
process on a real system (are you sick of hearing this yet?).
This is why we leave the linking process up to
gcc
.
But of course we can spy on what
gcc
is doing under the hood
with the -v
(verbose)
flag.
1 /usr/lib/gcc-lib/ia64-linux/3.3.5/collect2 -static /usr/lib/gcc-lib/ia64-linux/3.3.5/../../../crt1.o /usr/lib/gcc-lib/ia64-linux/3.3.5/../../../crti.o 5 /usr/lib/gcc-lib/ia64-linux/3.3.5/crtbegin.o -L/usr/lib/gcc-lib/ia64-linux/3.3.5 -L/usr/lib/gcc-lib/ia64-linux/3.3.5/../../.. hello.o function.o 10 --start-group -lgcc -lgcc_eh -lunwind -lc 15 --end-group /usr/lib/gcc-lib/ia64-linux/3.3.5/crtend.o /usr/lib/gcc-lib/ia64-linux/3.3.5/../../../crtn.o
The first thing you notice is that a program called collect2 is being called. This is a simple wrapper around ld that is used internally by gcc.
The next thing you notice is object files starting with
crt
being specified to the
linker. These functions are provided by gcc and the system
libraries and contain code required to start the program. In
actuality, the main()
function
is not the first one called when a program runs, but a function
called _start
which is in the
crt
object files. This
function does some generic setup which application programmers
do not need to worry about.
The path heirarchy is quite complicated, but in essence we can see that the final step is to link in some extra object files, namely
crt1.o
: provided
by the system libraries (libc) this object file contains
the _start
function
which is actually the first thing called within the program.
crti.o
: provided
by the system libraries
crtbegin.o
crtsaveres.o
crtend.o
crtn.o
We discuss how these are used to start the program a little later.
Next you can see that we link in our two object files,
hello.o
and
function.o
. After that we
specify some extra libraries with
-l
flags. These libraries are
system specific and required for every program. The major one
is -lc
which brings in the C
library, which has all common functions like
printf()
.
After that we again link in some more system object files which do some cleanup after programs exit.
Although the details are complicated, the concept is straight forward. All the object files will be linked together into a single executable file, ready to run!
We will go into more details about the executable in the short future, but we can do some inspection in a similar fashion to the object files to see what has happened.
1 ianw@lime:~/programs/csbu/wk7/code$ gcc -o program hello.c function.c ianw@lime:~/programs/csbu/wk7/code$ readelf --symbols ./program 5 Symbol table '.dynsym' contains 11 entries: Num: Value Size Type Bind Vis Ndx Name 0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND 1: 6000000000000de0 0 OBJECT GLOBAL DEFAULT ABS _DYNAMIC 2: 0000000000000000 176 FUNC GLOBAL DEFAULT UND printf@GLIBC_2.2 (2) 10 3: 600000000000109c 0 NOTYPE GLOBAL DEFAULT ABS __bss_start 4: 0000000000000000 704 FUNC GLOBAL DEFAULT UND exit@GLIBC_2.2 (2) 5: 600000000000109c 0 NOTYPE GLOBAL DEFAULT ABS _edata 6: 6000000000000fe8 0 OBJECT GLOBAL DEFAULT ABS _GLOBAL_OFFSET_TABLE_ 7: 60000000000010b0 0 NOTYPE GLOBAL DEFAULT ABS _end 8: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _Jv_RegisterClasses 15 9: 0000000000000000 544 FUNC GLOBAL DEFAULT UND __libc_start_main@GLIBC_2.2 (2) 10: 0000000000000000 0 NOTYPE WEAK DEFAULT UND __gmon_start__ Symbol table '.symtab' contains 127 entries: Num: Value Size Type Bind Vis Ndx Name 20 0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND 1: 40000000000001c8 0 SECTION LOCAL DEFAULT 1 2: 40000000000001e0 0 SECTION LOCAL DEFAULT 2 3: 4000000000000200 0 SECTION LOCAL DEFAULT 3 4: 4000000000000240 0 SECTION LOCAL DEFAULT 4 25 5: 4000000000000348 0 SECTION LOCAL DEFAULT 5 6: 40000000000003d8 0 SECTION LOCAL DEFAULT 6 7: 40000000000003f0 0 SECTION LOCAL DEFAULT 7 8: 4000000000000410 0 SECTION LOCAL DEFAULT 8 9: 4000000000000440 0 SECTION LOCAL DEFAULT 9 30 10: 40000000000004a0 0 SECTION LOCAL DEFAULT 10 11: 40000000000004e0 0 SECTION LOCAL DEFAULT 11 12: 40000000000005e0 0 SECTION LOCAL DEFAULT 12 13: 4000000000000b00 0 SECTION LOCAL DEFAULT 13 14: 4000000000000b40 0 SECTION LOCAL DEFAULT 14 35 15: 4000000000000b60 0 SECTION LOCAL DEFAULT 15 16: 4000000000000bd0 0 SECTION LOCAL DEFAULT 16 17: 4000000000000ce0 0 SECTION LOCAL DEFAULT 17 18: 6000000000000db8 0 SECTION LOCAL DEFAULT 18 19: 6000000000000dd0 0 SECTION LOCAL DEFAULT 19 40 20: 6000000000000dd8 0 SECTION LOCAL DEFAULT 20 21: 6000000000000de0 0 SECTION LOCAL DEFAULT 21 22: 6000000000000fc0 0 SECTION LOCAL DEFAULT 22 23: 6000000000000fd0 0 SECTION LOCAL DEFAULT 23 24: 6000000000000fe0 0 SECTION LOCAL DEFAULT 24 45 25: 6000000000000fe8 0 SECTION LOCAL DEFAULT 25 26: 6000000000001040 0 SECTION LOCAL DEFAULT 26 27: 6000000000001080 0 SECTION LOCAL DEFAULT 27 28: 60000000000010a0 0 SECTION LOCAL DEFAULT 28 29: 60000000000010a8 0 SECTION LOCAL DEFAULT 29 50 30: 0000000000000000 0 SECTION LOCAL DEFAULT 30 31: 0000000000000000 0 SECTION LOCAL DEFAULT 31 32: 0000000000000000 0 SECTION LOCAL DEFAULT 32 33: 0000000000000000 0 SECTION LOCAL DEFAULT 33 34: 0000000000000000 0 SECTION LOCAL DEFAULT 34 55 35: 0000000000000000 0 SECTION LOCAL DEFAULT 35 36: 0000000000000000 0 SECTION LOCAL DEFAULT 36 37: 0000000000000000 0 SECTION LOCAL DEFAULT 37 38: 0000000000000000 0 SECTION LOCAL DEFAULT 38 39: 0000000000000000 0 SECTION LOCAL DEFAULT 39 60 40: 0000000000000000 0 FILE LOCAL DEFAULT ABS /build/buildd/glibc-2.3.2 41: 0000000000000000 0 FILE LOCAL DEFAULT ABS /build/buildd/glibc-2.3.2 42: 0000000000000000 0 FILE LOCAL DEFAULT ABS /build/buildd/glibc-2.3.2 43: 0000000000000000 0 FILE LOCAL DEFAULT ABS /build/buildd/glibc-2.3.2 44: 0000000000000000 0 FILE LOCAL DEFAULT ABS /build/buildd/glibc-2.3.2 65 45: 0000000000000000 0 FILE LOCAL DEFAULT ABS /build/buildd/glibc-2.3.2 46: 0000000000000000 0 FILE LOCAL DEFAULT ABS <command line> 47: 0000000000000000 0 FILE LOCAL DEFAULT ABS /build/buildd/glibc-2.3.2 48: 0000000000000000 0 FILE LOCAL DEFAULT ABS <command line> 49: 0000000000000000 0 FILE LOCAL DEFAULT ABS <built-in> 70 50: 0000000000000000 0 FILE LOCAL DEFAULT ABS abi-note.S 51: 0000000000000000 0 FILE LOCAL DEFAULT ABS /build/buildd/glibc-2.3.2 52: 0000000000000000 0 FILE LOCAL DEFAULT ABS abi-note.S 53: 0000000000000000 0 FILE LOCAL DEFAULT ABS /build/buildd/glibc-2.3.2 54: 0000000000000000 0 FILE LOCAL DEFAULT ABS abi-note.S 75 55: 0000000000000000 0 FILE LOCAL DEFAULT ABS <command line> 56: 0000000000000000 0 FILE LOCAL DEFAULT ABS /build/buildd/glibc-2.3.2 57: 0000000000000000 0 FILE LOCAL DEFAULT ABS <command line> 58: 0000000000000000 0 FILE LOCAL DEFAULT ABS <built-in> 59: 0000000000000000 0 FILE LOCAL DEFAULT ABS abi-note.S 80 60: 0000000000000000 0 FILE LOCAL DEFAULT ABS init.c 61: 0000000000000000 0 FILE LOCAL DEFAULT ABS /build/buildd/glibc-2.3.2 62: 0000000000000000 0 FILE LOCAL DEFAULT ABS /build/buildd/glibc-2.3.2 63: 0000000000000000 0 FILE LOCAL DEFAULT ABS initfini.c 64: 0000000000000000 0 FILE LOCAL DEFAULT ABS /build/buildd/glibc-2.3.2 85 65: 0000000000000000 0 FILE LOCAL DEFAULT ABS <command line> 66: 0000000000000000 0 FILE LOCAL DEFAULT ABS /build/buildd/glibc-2.3.2 67: 0000000000000000 0 FILE LOCAL DEFAULT ABS <command line> 68: 0000000000000000 0 FILE LOCAL DEFAULT ABS <built-in> 69: 0000000000000000 0 FILE LOCAL DEFAULT ABS /build/buildd/glibc-2.3.2 90 70: 4000000000000670 128 FUNC LOCAL DEFAULT 12 gmon_initializer 71: 0000000000000000 0 FILE LOCAL DEFAULT ABS /build/buildd/glibc-2.3.2 72: 0000000000000000 0 FILE LOCAL DEFAULT ABS /build/buildd/glibc-2.3.2 73: 0000000000000000 0 FILE LOCAL DEFAULT ABS initfini.c 74: 0000000000000000 0 FILE LOCAL DEFAULT ABS /build/buildd/glibc-2.3.2 95 75: 0000000000000000 0 FILE LOCAL DEFAULT ABS <command line> 76: 0000000000000000 0 FILE LOCAL DEFAULT ABS /build/buildd/glibc-2.3.2 77: 0000000000000000 0 FILE LOCAL DEFAULT ABS <command line> 78: 0000000000000000 0 FILE LOCAL DEFAULT ABS <built-in> 79: 0000000000000000 0 FILE LOCAL DEFAULT ABS /build/buildd/glibc-2.3.2 100 80: 0000000000000000 0 FILE LOCAL DEFAULT ABS auto-host.h 81: 0000000000000000 0 FILE LOCAL DEFAULT ABS <command line> 82: 0000000000000000 0 FILE LOCAL DEFAULT ABS <built-in> 83: 6000000000000fc0 0 NOTYPE LOCAL DEFAULT 22 __CTOR_LIST__ 84: 6000000000000fd0 0 NOTYPE LOCAL DEFAULT 23 __DTOR_LIST__ 105 85: 6000000000000fe0 0 NOTYPE LOCAL DEFAULT 24 __JCR_LIST__ 86: 6000000000001088 8 OBJECT LOCAL DEFAULT 27 dtor_ptr 87: 40000000000006f0 128 FUNC LOCAL DEFAULT 12 __do_global_dtors_aux 88: 4000000000000770 128 FUNC LOCAL DEFAULT 12 __do_jv_register_classes 89: 0000000000000000 0 FILE LOCAL DEFAULT ABS hello.c 110 90: 6000000000001090 4 OBJECT LOCAL DEFAULT 27 i 91: 0000000000000000 0 FILE LOCAL DEFAULT ABS function.c 92: 6000000000001098 4 OBJECT LOCAL DEFAULT 27 i 93: 0000000000000000 0 FILE LOCAL DEFAULT ABS auto-host.h 94: 0000000000000000 0 FILE LOCAL DEFAULT ABS <command line> 115 95: 0000000000000000 0 FILE LOCAL DEFAULT ABS <built-in> 96: 6000000000000fc8 0 NOTYPE LOCAL DEFAULT 22 __CTOR_END__ 97: 6000000000000fd8 0 NOTYPE LOCAL DEFAULT 23 __DTOR_END__ 98: 6000000000000fe0 0 NOTYPE LOCAL DEFAULT 24 __JCR_END__ 99: 6000000000000de0 0 OBJECT GLOBAL DEFAULT ABS _DYNAMIC 120 100: 4000000000000a70 144 FUNC GLOBAL HIDDEN 12 __do_global_ctors_aux 101: 6000000000000dd8 0 NOTYPE GLOBAL DEFAULT ABS __fini_array_end 102: 60000000000010a8 8 OBJECT GLOBAL HIDDEN 29 __dso_handle 103: 40000000000009a0 208 FUNC GLOBAL DEFAULT 12 __libc_csu_fini 104: 0000000000000000 176 FUNC GLOBAL DEFAULT UND printf@@GLIBC_2.2 125 105: 40000000000004a0 32 FUNC GLOBAL DEFAULT 10 _init 106: 4000000000000850 128 FUNC GLOBAL DEFAULT 12 function 107: 40000000000005e0 144 FUNC GLOBAL DEFAULT 12 _start 108: 6000000000001094 4 OBJECT GLOBAL DEFAULT 27 global 109: 6000000000000dd0 0 NOTYPE GLOBAL DEFAULT ABS __fini_array_start 130 110: 40000000000008d0 208 FUNC GLOBAL DEFAULT 12 __libc_csu_init 111: 600000000000109c 0 NOTYPE GLOBAL DEFAULT ABS __bss_start 112: 40000000000007f0 96 FUNC GLOBAL DEFAULT 12 main 113: 6000000000000dd0 0 NOTYPE GLOBAL DEFAULT ABS __init_array_end 114: 6000000000000dd8 0 NOTYPE WEAK DEFAULT 20 data_start 135 115: 4000000000000b00 32 FUNC GLOBAL DEFAULT 13 _fini 116: 0000000000000000 704 FUNC GLOBAL DEFAULT UND exit@@GLIBC_2.2 117: 600000000000109c 0 NOTYPE GLOBAL DEFAULT ABS _edata 118: 6000000000000fe8 0 OBJECT GLOBAL DEFAULT ABS _GLOBAL_OFFSET_TABLE_ 119: 60000000000010b0 0 NOTYPE GLOBAL DEFAULT ABS _end 140 120: 6000000000000db8 0 NOTYPE GLOBAL DEFAULT ABS __init_array_start 121: 6000000000001080 4 OBJECT GLOBAL DEFAULT 27 _IO_stdin_used 122: 60000000000010a0 8 OBJECT GLOBAL DEFAULT 28 __libc_ia64_register_back 123: 6000000000000dd8 0 NOTYPE GLOBAL DEFAULT 20 __data_start 124: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _Jv_RegisterClasses 145 125: 0000000000000000 544 FUNC GLOBAL DEFAULT UND __libc_start_main@@GLIBC_ 126: 0000000000000000 0 NOTYPE WEAK DEFAULT UND __gmon_start__
Some things to note
Note I built the executable the "easy" way!
See there are two symbol tables; the
dynsym
and
symtab
ones. We explain
how the dynsym
symbols work
soon, but notice that some of them are
versioned with an
@
symbol.
Note the many symbols that have been included from the
extra object files. Many of them start with
__
to avoid clashing with
any names the programmer might choose. Read through and
pick out the symbols we mentioned before from the object
files and see if they have changed in any way.