We know that the operating system treats code as read-only and keeps it separate from data. If programs cannot modify code, and many programs contain large amounts of common code, it seems logical to share that code between executables rather than replicate it in each one.
With virtual memory this can be easily done. The physical pages of memory the library code is loaded into can be easily referenced by any number of virtual pages in any number of address spaces. So while you only have one physical copy of the library code in system memory, every process can have access to that library code at any virtual address it likes.
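The idea of one physical copy seen through several virtual mappings can be observed from user space. The sketch below is only an analogy, not what the loader does: it assumes a Unix-like system, uses Python's mmap module on a temporary file, and the helper name shared_mapping_demo is made up for illustration. The same file page is mapped twice, and bytes written through one mapping are read back through the other.

```python
import mmap
import tempfile


def shared_mapping_demo(message: bytes = b"library code") -> bytes:
    """Map the same file page twice and read through the second mapping
    what was written through the first (hypothetical helper)."""
    with tempfile.TemporaryFile() as f:
        # Back the mappings with one page of zero bytes.
        f.write(b"\0" * mmap.PAGESIZE)
        f.flush()

        # Two independent virtual mappings of the same underlying page.
        view_a = mmap.mmap(f.fileno(), mmap.PAGESIZE, access=mmap.ACCESS_WRITE)
        view_b = mmap.mmap(f.fileno(), mmap.PAGESIZE, access=mmap.ACCESS_READ)
        try:
            view_a[: len(message)] = message   # write via the first mapping
            return view_b[: len(message)]      # read via the second mapping
        finally:
            view_a.close()
            view_b.close()
```

Library code is mapped read-only rather than written to, but the sharing mechanism is the same: several virtual ranges refer to one set of physical pages.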
Thus people quickly came up with the idea of a shared library which, as the name suggests, is shared by multiple executables. Each executable contains a reference essentially saying "I need library foo". When the program is loaded, the system checks whether some other program has already loaded the code for library foo into memory. If so, it shares that code by mapping the same physical pages into the new executable's address space; otherwise it loads the library into memory for the executable.
This process is called dynamic linking because it does part of the linking process "on the fly" as programs are executed in the system.
Libraries are very much like a program that never gets started. They have code and data sections (functions and variables) just like every executable, but nowhere to start running. They just provide a library of functions for developers to call.
Thus ELF can represent a dynamic library just as it does an executable. There are some fundamental differences, such as there is no pointer to where execution should start, but all shared libraries are just ELF objects like any other executable.
The ELF header's type field takes one of a set of mutually exclusive values; two of them, ET_EXEC and ET_DYN, mark an ELF file as either an executable or a shared object file.
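To make the header field concrete, here is a minimal sketch that reads the type from the start of an ELF file. It relies on the layout in the ELF specification (a 16-byte identification block followed by the 2-byte e_type field, with byte 5 of the identification giving the byte order). The function names elf_type and elf_type_of_file are hypothetical, chosen for this example.

```python
import struct

# Values of the e_type field as defined by the ELF specification.
ET_NAMES = {0: "ET_NONE", 1: "ET_REL", 2: "ET_EXEC", 3: "ET_DYN", 4: "ET_CORE"}


def elf_type(header: bytes) -> str:
    """Return the symbolic name of e_type given the first 18+ bytes of a file."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # e_ident[EI_DATA] is byte 5: 1 means little-endian, 2 means big-endian.
    endian = "<" if header[5] == 1 else ">"
    # e_type is a 16-bit field immediately after the 16-byte e_ident block.
    (e_type,) = struct.unpack_from(endian + "H", header, 16)
    return ET_NAMES.get(e_type, hex(e_type))


def elf_type_of_file(path: str) -> str:
    """Read just enough of the file at path to classify it."""
    with open(path, "rb") as f:
        return elf_type(f.read(18))
```

Calling elf_type_of_file on a shared library should report ET_DYN. Note that many current Linux distributions build ordinary programs as position-independent executables, which are also marked ET_DYN, so a binary such as /bin/ls may report either value depending on the system.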
When you compile your program that uses a dynamic library, object files are left with references to the library functions just as for any other external reference.
You need to include the header for the library so that the compiler knows the specific types of the functions you are calling. Note the compiler only needs to know the types associated with a function (for example, that it takes an int and returns a char *) so that it can correctly allocate space for the function call.[29]
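Any caller of a library function is in the same position as the compiler: it can locate the function, but it has to be told the types. The sketch below shows this by analogy using Python's ctypes module, which assumes an int return value unless told otherwise. It assumes a Unix-like system where ctypes.CDLL(None) gives access to the C library already loaded into the process; the wrapper name tail_from is hypothetical. The declared types mirror the C prototype char *strchr(const char *s, int c).

```python
import ctypes

# Assumption: on a Unix-like system, passing None yields a handle to the
# symbols already loaded into this process, which include the C library.
libc = ctypes.CDLL(None)

# State the types, as a header would: char *strchr(const char *s, int c);
libc.strchr.argtypes = [ctypes.c_char_p, ctypes.c_int]
libc.strchr.restype = ctypes.c_char_p  # without this, ctypes assumes int


def tail_from(text: bytes, ch: bytes):
    """Return text from the first occurrence of the single byte ch,
    or None if ch does not occur (hypothetical wrapper)."""
    return libc.strchr(text, ord(ch))
```

Only the types are declared here; nothing about how strchr is implemented is needed to call it, which is the same division of labour as between a header file and the library itself.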
Even though the dynamic linker does a lot of the work for shared libraries, the traditional linker still has a role to play in creating the executable.
The traditional linker needs to leave a pointer in the executable so that the dynamic linker knows what library will satisfy the dependencies at runtime.
The dynamic section of the executable requires a NEEDED entry for each shared library that the executable depends on.
Again, we can inspect these fields with the readelf program. Below we have a look at a very standard binary, /bin/ls.
    $ readelf --dynamic /bin/ls

    Dynamic segment at offset 0x22f78 contains 27 entries:
      Tag        Type                         Name/Value
     0x0000000000000001 (NEEDED)             Shared library: [librt.so.1]
     0x0000000000000001 (NEEDED)             Shared library: [libacl.so.1]
     0x0000000000000001 (NEEDED)             Shared library: [libc.so.6.1]
     0x000000000000000c (INIT)               0x4000000000001e30
    ... snip ...
You can see that it specifies three libraries. The most common library, shared by most if not all programs on the system, is libc. There are also some other libraries that the program needs to run correctly.
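If you want the NEEDED entries in a script, one simple approach is to run readelf and pick the library names out of its output. The sketch below assumes readelf is installed and prints NEEDED lines in the format shown above; the function names parse_needed and needed_libraries are hypothetical.

```python
import re
import subprocess
from typing import List

# Matches lines such as:
#  0x0000000000000001 (NEEDED)   Shared library: [libc.so.6.1]
NEEDED_RE = re.compile(r"\(NEEDED\)\s+Shared library: \[([^\]]+)\]")


def parse_needed(readelf_output: str) -> List[str]:
    """Extract library names from the text printed by readelf --dynamic."""
    return NEEDED_RE.findall(readelf_output)


def needed_libraries(path: str) -> List[str]:
    """Run readelf on path and return its NEEDED entries in order."""
    result = subprocess.run(
        ["readelf", "--dynamic", path],
        capture_output=True,
        text=True,
        check=True,  # raise if readelf reports an error
    )
    return parse_needed(result.stdout)
```

Parsing tool output like this is fragile compared with reading the dynamic section directly, but it is enough for a quick inspection.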
Reading the ELF file directly is sometimes useful, but the usual way to inspect a dynamically linked executable is via ldd. ldd "walks" the dependencies of libraries for you; that is, if a library depends on another library, ldd shows that library too.
    $ ldd /bin/ls
            librt.so.1 => /lib/tls/librt.so.1 (0x2000000000058000)
            libacl.so.1 => /lib/libacl.so.1 (0x2000000000078000)
            libc.so.6.1 => /lib/tls/libc.so.6.1 (0x2000000000098000)
            libpthread.so.0 => /lib/tls/libpthread.so.0 (0x20000000002e0000)
            /lib/ld-linux-ia64.so.2 => /lib/ld-linux-ia64.so.2 (0x2000000000000000)
            libattr.so.1 => /lib/libattr.so.1 (0x2000000000310000)
    $ readelf --dynamic /lib/librt.so.1

    Dynamic segment at offset 0xd600 contains 30 entries:
      Tag        Type                         Name/Value
     0x0000000000000001 (NEEDED)             Shared library: [libc.so.6.1]
     0x0000000000000001 (NEEDED)             Shared library: [libpthread.so.0]
    ... snip ...
We can see above that libpthread has been required from somewhere. If we do a little digging, we can see that the requirement comes from librt.
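The "walk" can be sketched as a breadth-first traversal over each object's NEEDED entries. The table below contains only the entries visible in the two readelf listings in this section; any library without an entry is treated as having no further dependencies, which is a simplification. This illustrates the idea only and is not how ldd is implemented; the names DIRECT_NEEDED and walk_dependencies are hypothetical.

```python
from collections import deque
from typing import Dict, List

# Direct NEEDED entries, taken from the readelf listings in this section.
DIRECT_NEEDED: Dict[str, List[str]] = {
    "/bin/ls": ["librt.so.1", "libacl.so.1", "libc.so.6.1"],
    "librt.so.1": ["libc.so.6.1", "libpthread.so.0"],
}


def walk_dependencies(root: str, needed: Dict[str, List[str]]) -> List[str]:
    """Return every library reachable from root, each listed once,
    in breadth-first order."""
    seen: List[str] = []
    queue = deque(needed.get(root, []))
    while queue:
        lib = queue.popleft()
        if lib in seen:
            continue  # already reported via another path
        seen.append(lib)
        # Follow this library's own NEEDED entries, if we know them.
        queue.extend(needed.get(lib, []))
    return seen
```

With this table, walk_dependencies("/bin/ls", DIRECT_NEEDED) includes libpthread.so.0 even though /bin/ls never names it directly, because it is reached through librt.so.1.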
[29] This has not always been the case with the C standard. Previously, compilers would assume that any function they did not know about returned an int. On a 32-bit system, a pointer is the same size as an int, so there was no problem. However, on a 64-bit system a pointer is generally twice the size of an int, so if the function actually returns a pointer, its value will be truncated. This is clearly not acceptable, as the pointer will no longer point to valid memory. The C99 standard removed this assumption, so you are now required to declare the types of the functions you call.