Often in a large program, you will separate out code into multiple files to keep related functions together. Each of these files can be compiled into object code, but your final goal is to create a single executable! There needs to be some way of combining each of these object files into a single executable. We call this linking.
Note that even if your program does fit in one file, it still needs to be linked against certain system libraries to operate correctly. For example, the printf call is kept in a library which must be combined with your executable to work. So although you do not explicitly have to worry about linking in this case, there is most certainly still a linking process happening to create your executable.
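As a minimal sketch of this, the fragment below calls printf without defining it anywhere. The helper name greet is made up for this illustration; only printf and its declaration in <stdio.h> come from the C library.

```c
#include <stdio.h>

/* greet is a hypothetical helper used only for this illustration.
 * This file merely declares printf (via <stdio.h>); the code for
 * printf lives in the C library, so the object file built from this
 * source refers to printf as a symbol that the linker must resolve. */
int greet(void)
{
    /* printf returns the number of characters it wrote. */
    return printf("hello, linker\n");
}
```

On a typical Unix-like toolchain, compiling this file to object code and listing its symbols with a tool such as nm would be expected to show printf marked as undefined, which is exactly the gap the linker fills from the library.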
In the following sections we explain some terms essential to understanding linking.
Variables and functions all have names in source code by which we refer to them. One way of thinking of a statement declaring a variable, int a, is that you are telling the compiler "set aside some memory of sizeof(int) and from now on when I use a it will refer to this allocated memory". Similarly a function says "store this code in memory, and when I call function() jump to and execute this code".
In this case, we call both a and function symbols, since they are a symbolic representation of an area of memory.
Symbols help humans to understand programming. You could say that the primary job of the compilation process is to remove symbols -- the processor doesn't know what a represents; all it knows is that it has some data at a particular memory address. The compilation process needs to convert a += 2 to something like "increment the value in memory at 0xABCDE by 2".
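The idea that a symbol is just a name for a memory location can be sketched in C itself. The function names below are invented for this illustration; both functions modify the same memory, one through the symbol and one through its address.

```c
int a = 40;   /* the symbol a names sizeof(int) bytes of memory */

/* Using the symbol: the compiler turns this into an operation on
 * the memory location that a stands for. */
int add_two_by_name(void)
{
    a += 2;
    return a;
}

/* Using the address directly: the same memory, without the name. */
int add_two_by_address(void)
{
    int *where = &a;
    *where += 2;
    return *where;
}
```

Calling the first function and then the second leaves a at 44: both end up as "increment the value at this address by 2" once the name has been compiled away.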
In some C programs, you may have seen the terms static and extern used with variables. These modifiers can affect what we call the visibility of symbols.
Imagine you have split up your program into two files, but some functions need to share a variable. You only want one definition (i.e. memory location) of the shared variable (otherwise it wouldn't be shared!), but both files need to reference it.
To enable this, we define the variable in one file, and then in the other file declare a variable of the same name but with the prefix extern. The keyword extern stands for external, and to a human means that this variable is defined somewhere else.
What extern says to a compiler is that it should not allocate any space in memory for this variable, and should leave this symbol in the object code where it will be fixed up later. The compiler cannot possibly know where the symbol is actually defined, but the linker does, since it is its job to look at all object files together and combine them into a single executable. So the linker will see the symbol left over in the second file, and say "I've seen that symbol before in file 1, and I know that it refers to memory location 0x12345". Thus it can modify the symbol value to be the memory address of the variable in the first file.
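A rough sketch of the pattern follows. The file names file1.c and file2.c and the names shared_counter and bump_counter are assumptions made for this illustration. Both parts are shown in one listing so it compiles as-is; in a real program they would sit in separate files and be joined by the linker.

```c
/* ---- would live in file1.c: the one and only definition ---- */
int shared_counter = 0;       /* storage is allocated here */

/* ---- would live in file2.c: a declaration, not a definition ---- */
extern int shared_counter;    /* no storage; the name is left for the
                                 linker to connect to the definition */

int bump_counter(void)
{
    shared_counter += 1;      /* refers to the memory defined above */
    return shared_counter;
}
```

Because there is only one definition, every call to bump_counter updates the same memory location, no matter which file the caller lives in.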
The static keyword is almost the opposite of extern. It places restrictions on the visibility of the symbol it modifies. If you declare a variable with static, that says to the compiler "don't leave any symbol for this in the object code that other files can link against". This means that when the linker is linking together object files it will never match that symbol against a reference from another file (and so can't make that "I've seen this before!" connection). static is good for separation and reducing conflicts -- by declaring a variable static you can reuse the variable name in other files and not end up with symbol clashes. We say we are restricting the visibility of the symbol, because we are not allowing the linker to use it outside its own file. Contrast this with a more visible symbol (one not declared with static), which the linker can connect to references in any file.
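The difference in visibility can be sketched as below. All of the names (counter, next_value, next_id) are hypothetical and chosen only for this example.

```c
/* File-local state: static restricts the name counter to this
 * file, so other object files cannot link against it. */
static int counter = 0;

/* Also file-local: only code in this file can call next_value. */
static int next_value(void)
{
    counter += 1;
    return counter;
}

/* Not static: this symbol stays visible to the linker, so other
 * files can declare next_id and call it. */
int next_id(void)
{
    return 100 + next_value();
}
```

Another file could define its own static int counter without any clash, while a second non-static definition of next_id elsewhere would normally be reported by the linker as a duplicate symbol.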
Thus the linking process is really two steps: combining all object files into one executable file, and then going through each object file to resolve any symbols. This usually requires two passes: one to read all the symbol definitions and take note of unresolved symbols, and a second to fix up all those unresolved symbols to the right place.
The final executable should end up with no unresolved symbols; the linker will fail with an error if there are any.[22]
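The two passes can be illustrated with a deliberately tiny model. This is a toy sketch under stated assumptions, not how any real linker is implemented: the structures, the name toy_link and the fixed-size table are all invented here, objects are simply laid out one after another, and duplicate definitions are not checked.

```c
#include <stddef.h>
#include <string.h>

#define MAX_SYMBOLS 16

/* A symbol an object defines, at an offset inside that object. */
struct definition { const char *name; unsigned offset; };

/* A use of a symbol; address is filled in by the toy linker. */
struct reference { const char *name; unsigned address; };

struct object {
    unsigned size;                  /* bytes this object occupies */
    unsigned base;                  /* assigned during pass 1 */
    struct definition *defs; size_t ndefs;
    struct reference *refs;  size_t nrefs;
};

/* Returns 0 on success, -1 if a symbol cannot be resolved or the
 * (hypothetical) fixed-size symbol table overflows. */
int toy_link(struct object *objs, size_t nobjs)
{
    struct { const char *name; unsigned address; } table[MAX_SYMBOLS];
    size_t ntable = 0;
    unsigned next_base = 0;

    /* Pass 1: place each object and record every definition. */
    for (size_t i = 0; i < nobjs; i++) {
        objs[i].base = next_base;
        next_base += objs[i].size;
        for (size_t d = 0; d < objs[i].ndefs; d++) {
            if (ntable == MAX_SYMBOLS)
                return -1;
            table[ntable].name = objs[i].defs[d].name;
            table[ntable].address = objs[i].base + objs[i].defs[d].offset;
            ntable++;
        }
    }

    /* Pass 2: fix up every reference using the recorded addresses. */
    for (size_t i = 0; i < nobjs; i++) {
        for (size_t r = 0; r < objs[i].nrefs; r++) {
            size_t t = 0;
            while (t < ntable &&
                   strcmp(table[t].name, objs[i].refs[r].name) != 0)
                t++;
            if (t == ntable)
                return -1;          /* unresolved symbol: linking fails */
            objs[i].refs[r].address = table[t].address;
        }
    }
    return 0;
}
```

The second pass is what lets an object refer to a symbol defined in a later object: by the time references are fixed up, every definition has already been seen.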
[22] We call this static linking. Dynamic linking is a similar concept done at executable runtime, and is described a little later on.