Libraries may contain many functions, and a program may end up including many libraries to get its work done. A program may only use one or two functions from each library of the many available, and depending on the run-time path through the code may use some functions and not others.
As we have seen, the process of dynamic linking is a fairly computationally intensive one, since it involves looking up and searching through many tables. Anything that can be done to reduce the overheads will increase performance.
The Procedure Lookup Table (PLT) facilitates what is called lazy binding in programs. Binding is synonymous with the fix-up process described above for variables located in the GOT. When an entry has been "fixed-up" it is said to be "bound" to its real address.
As we mentioned, sometimes a program will include a function from a library but never actually call that function, depending on user input. The process of binding this function is quite intensive, involving loading code, searching through tables and writing memory. To go through the process of binding a function that is not used is simply a waste of time.
Lazy binding defers this expense until the actual function is called by using a PLT.
Each library function has an entry in the PLT, which initially points to some special dummy code. When the program calls the function, it actually calls the PLT entry (in the same was as variables are referenced through the GOT).
This dummy function will load a few parameters that need to be passed to the dynamic linker for it to resolve the function and then call into a special lookup function of the dynamic linker. The dynamic linker finds the real address of the function, and writes that location into the calling binary over the top of the dummy function call.
Thus, the next time the function is called the address can be loaded without having to go back into the dynamic loader again. If a function is never called, then the PLT entry will never be modified but there will be no runtime overhead.
Things start to get a bit hairy here! If nothing else, you should begin to appreciate that there is a fair bit of work in resolving a dynamic symbol!
Let us consider the simple "hello World" application.
This will only make one library call to
printf
to output the
string to the user.
1 $ cat hello.c #include <stdio.h> 5 int main(void) { printf("Hello, World!\n"); return 0; } 10 $ gcc -o hello hello.c $ readelf --relocs ./hello 15 Relocation section '.rela.dyn' at offset 0x3f0 contains 2 entries: Offset Info Type Sym. Value Sym. Name + Addend 6000000000000ed8 000700000047 R_IA64_FPTR64LSB 0000000000000000 _Jv_RegisterClasses + 0 6000000000000ee0 000900000047 R_IA64_FPTR64LSB 0000000000000000 __gmon_start__ + 0 20 Relocation section '.rela.IA_64.pltoff' at offset 0x420 contains 3 entries: Offset Info Type Sym. Value Sym. Name + Addend 6000000000000f10 000200000081 R_IA64_IPLTLSB 0000000000000000 printf + 0 6000000000000f20 000800000081 R_IA64_IPLTLSB 0000000000000000 __libc_start_main + 0 6000000000000f30 000900000081 R_IA64_IPLTLSB 0000000000000000 __gmon_start__ + 0 25
We can see above that we have a
R_IA64_IPLTLSB
relocation for
our printf
symbol. This is
saying "put the address of symbol printf into memory address
0x6000000000000f10". We have to start digging deeper to find
the exact procedure that gets us the function.
Below we have a look at the disassembly of the
main()
function of the
program.
1 4000000000000790 <main>: 4000000000000790: 00 08 15 08 80 05 [MII] alloc r33=ar.pfs,5,4,0 4000000000000796: 20 02 30 00 42 60 mov r34=r12 5 400000000000079c: 04 08 00 84 mov r35=r1 40000000000007a0: 01 00 00 00 01 00 [MII] nop.m 0x0 40000000000007a6: 00 02 00 62 00 c0 mov r32=b0 40000000000007ac: 81 0c 00 90 addl r14=72,r1;; 40000000000007b0: 1c 20 01 1c 18 10 [MFB] ld8 r36=[r14] 10 40000000000007b6: 00 00 00 02 00 00 nop.f 0x0 40000000000007bc: 78 fd ff 58 br.call.sptk.many b0=4000000000000520 <_init+0xb0> 40000000000007c0: 02 08 00 46 00 21 [MII] mov r1=r35 40000000000007c6: e0 00 00 00 42 00 mov r14=r0;; 40000000000007cc: 01 70 00 84 mov r8=r14 15 40000000000007d0: 00 00 00 00 01 00 [MII] nop.m 0x0 40000000000007d6: 00 08 01 55 00 00 mov.i ar.pfs=r33 40000000000007dc: 00 0a 00 07 mov b0=r32 40000000000007e0: 1d 60 00 44 00 21 [MFB] mov r12=r34 40000000000007e6: 00 00 00 02 00 80 nop.f 0x0 20 40000000000007ec: 08 00 84 00 br.ret.sptk.many b0;;
The call to 0x4000000000000520 must be us calling the
printf
function. We can find
out where this is by looking at the sections with
readelf
.
1 $ readelf --sections ./hello There are 40 section headers, starting at offset 0x25c0: 5 Section Headers: [Nr] Name Type Address Offset Size EntSize Flags Link Info Align [ 0] NULL 0000000000000000 00000000 0000000000000000 0000000000000000 0 0 0 10 ... [11] .plt PROGBITS 40000000000004c0 000004c0 00000000000000c0 0000000000000000 AX 0 0 32 [12] .text PROGBITS 4000000000000580 00000580 00000000000004a0 0000000000000000 AX 0 0 32 15 [13] .fini PROGBITS 4000000000000a20 00000a20 0000000000000040 0000000000000000 AX 0 0 16 [14] .rodata PROGBITS 4000000000000a60 00000a60 000000000000000f 0000000000000000 A 0 0 8 [15] .opd PROGBITS 4000000000000a70 00000a70 20 0000000000000070 0000000000000000 A 0 0 16 [16] .IA_64.unwind_inf PROGBITS 4000000000000ae0 00000ae0 00000000000000f0 0000000000000000 A 0 0 8 [17] .IA_64.unwind IA_64_UNWIND 4000000000000bd0 00000bd0 00000000000000c0 0000000000000000 AL 12 c 8 25 [18] .init_array INIT_ARRAY 6000000000000c90 00000c90 0000000000000018 0000000000000000 WA 0 0 8 [19] .fini_array FINI_ARRAY 6000000000000ca8 00000ca8 0000000000000008 0000000000000000 WA 0 0 8 [20] .data PROGBITS 6000000000000cb0 00000cb0 30 0000000000000004 0000000000000000 WA 0 0 4 [21] .dynamic DYNAMIC 6000000000000cb8 00000cb8 00000000000001e0 0000000000000010 WA 5 0 8 [22] .ctors PROGBITS 6000000000000e98 00000e98 0000000000000010 0000000000000000 WA 0 0 8 35 [23] .dtors PROGBITS 6000000000000ea8 00000ea8 0000000000000010 0000000000000000 WA 0 0 8 [24] .jcr PROGBITS 6000000000000eb8 00000eb8 0000000000000008 0000000000000000 WA 0 0 8 [25] .got PROGBITS 6000000000000ec0 00000ec0 40 0000000000000050 0000000000000000 WAp 0 0 8 [26] .IA_64.pltoff PROGBITS 6000000000000f10 00000f10 0000000000000030 0000000000000000 WAp 0 0 16 [27] .sdata PROGBITS 6000000000000f40 00000f40 0000000000000010 0000000000000000 WAp 0 0 8 45 [28] .sbss NOBITS 6000000000000f50 00000f50 0000000000000008 0000000000000000 WA 0 0 8 [29] .bss NOBITS 6000000000000f58 00000f50 0000000000000008 0000000000000000 WA 0 0 8 [30] .comment PROGBITS 0000000000000000 00000f50 50 00000000000000b9 0000000000000000 0 0 1 [31] .debug_aranges PROGBITS 0000000000000000 00001010 0000000000000090 0000000000000000 0 0 16 [32] .debug_pubnames PROGBITS 0000000000000000 000010a0 0000000000000025 0000000000000000 0 0 1 55 [33] .debug_info PROGBITS 0000000000000000 000010c5 00000000000009c4 0000000000000000 0 0 1 [34] .debug_abbrev PROGBITS 0000000000000000 00001a89 0000000000000124 0000000000000000 0 0 1 [35] .debug_line PROGBITS 0000000000000000 00001bad 60 00000000000001fe 0000000000000000 0 0 1 [36] .debug_str PROGBITS 0000000000000000 00001dab 00000000000006a1 0000000000000001 MS 0 0 1 [37] .shstrtab STRTAB 0000000000000000 0000244c 000000000000016f 0000000000000000 0 0 1 65 [38] .symtab SYMTAB 0000000000000000 00002fc0 0000000000000b58 0000000000000018 39 60 8 [39] .strtab STRTAB 0000000000000000 00003b18 0000000000000479 0000000000000000 0 0 1 Key to Flags: 70 W (write), A (alloc), X (execute), M (merge), S (strings) I (info), L (link order), G (group), x (unknown) O (extra OS processing required) o (OS specific), p (processor specific)
That address is (unsurprisingly) in the
.plt
section. So there we
have our call into the PLT! But we're not satisfied with that,
let's keep digging further to see what we can uncover. We
disassemble the .plt
section to see what that call actually does.
1 40000000000004c0 <.plt>: 40000000000004c0: 0b 10 00 1c 00 21 [MMI] mov r2=r14;; 40000000000004c6: e0 00 08 00 48 00 addl r14=0,r2 5 40000000000004cc: 00 00 04 00 nop.i 0x0;; 40000000000004d0: 0b 80 20 1c 18 14 [MMI] ld8 r16=[r14],8;; 40000000000004d6: 10 41 38 30 28 00 ld8 r17=[r14],8 40000000000004dc: 00 00 04 00 nop.i 0x0;; 40000000000004e0: 11 08 00 1c 18 10 [MIB] ld8 r1=[r14] 10 40000000000004e6: 60 88 04 80 03 00 mov b6=r17 40000000000004ec: 60 00 80 00 br.few b6;; 40000000000004f0: 11 78 00 00 00 24 [MIB] mov r15=0 40000000000004f6: 00 00 00 02 00 00 nop.i 0x0 40000000000004fc: d0 ff ff 48 br.few 40000000000004c0 <_init+0x50>;; 15 4000000000000500: 11 78 04 00 00 24 [MIB] mov r15=1 4000000000000506: 00 00 00 02 00 00 nop.i 0x0 400000000000050c: c0 ff ff 48 br.few 40000000000004c0 <_init+0x50>;; 4000000000000510: 11 78 08 00 00 24 [MIB] mov r15=2 4000000000000516: 00 00 00 02 00 00 nop.i 0x0 20 400000000000051c: b0 ff ff 48 br.few 40000000000004c0 <_init+0x50>;; 4000000000000520: 0b 78 40 03 00 24 [MMI] addl r15=80,r1;; 4000000000000526: 00 41 3c 70 29 c0 ld8.acq r16=[r15],8 400000000000052c: 01 08 00 84 mov r14=r1;; 4000000000000530: 11 08 00 1e 18 10 [MIB] ld8 r1=[r15] 25 4000000000000536: 60 80 04 80 03 00 mov b6=r16 400000000000053c: 60 00 80 00 br.few b6;; 4000000000000540: 0b 78 80 03 00 24 [MMI] addl r15=96,r1;; 4000000000000546: 00 41 3c 70 29 c0 ld8.acq r16=[r15],8 400000000000054c: 01 08 00 84 mov r14=r1;; 30 4000000000000550: 11 08 00 1e 18 10 [MIB] ld8 r1=[r15] 4000000000000556: 60 80 04 80 03 00 mov b6=r16 400000000000055c: 60 00 80 00 br.few b6;; 4000000000000560: 0b 78 c0 03 00 24 [MMI] addl r15=112,r1;; 4000000000000566: 00 41 3c 70 29 c0 ld8.acq r16=[r15],8 35 400000000000056c: 01 08 00 84 mov r14=r1;; 4000000000000570: 11 08 00 1e 18 10 [MIB] ld8 r1=[r15] 4000000000000576: 60 80 04 80 03 00 mov b6=r16 400000000000057c: 60 00 80 00 br.few b6;; 40
Let us step through the instructions. Firstly, we add 80 to the value in r1, storing it in r15. We know from before that r1 will be pointing to the GOT, so this is saying "store in r15 80 bytes into the GOT". The next thing we do is load into r16 the value stored in this location in the GOT, and post increment the value in r15 by 8 bytes. We then store r1 (the location of the GOT) in r14 and set r1 to be the value in the next 8 bytes after r15. Then we branch to r16.
In the previous chapter we discussed how functions are actually called through a function descriptor which contains the function address and the address of the global pointer. Here we can see that the PLT entry is first loading the function value, moving on 8 bytes to the second part of the function descriptor and then loading that value into the gp register before calling the function.
But what exactly are we loading? We know that r1 will be pointing to the GOT. We go 80 bytes past the got (0x50)
1 $ objdump --disassemble-all ./hello Disassembly of section .got: 5 6000000000000ec0 <.got>: ... 6000000000000ee8: 80 0a 00 00 00 00 data8 0x02a000000 6000000000000eee: 00 40 90 0a dep r0=r0,r0,63,1 6000000000000ef2: 00 00 00 00 00 40 [MIB] (p20) break.m 0x1 10 6000000000000ef8: a0 0a 00 00 00 00 data8 0x02a810000 6000000000000efe: 00 40 50 0f br.few 6000000000000ef0 <_GLOBAL_OFFSET_TABLE_+0x30> 6000000000000f02: 00 00 00 00 00 60 [MIB] (p58) break.m 0x1 6000000000000f08: 60 0a 00 00 00 00 data8 0x029818000 6000000000000f0e: 00 40 90 06 br.few 6000000000000f00 <_GLOBAL_OFFSET_TABLE_+0x40> 15 Disassembly of section .IA_64.pltoff: 6000000000000f10 <.IA_64.pltoff>: 6000000000000f10: f0 04 00 00 00 00 [MIB] (p39) break.m 0x0 6000000000000f16: 00 40 c0 0e 00 00 data8 0x03b010000 20 6000000000000f1c: 00 00 00 60 data8 0xc000000000 6000000000000f20: 00 05 00 00 00 00 [MII] (p40) break.m 0x0 6000000000000f26: 00 40 c0 0e 00 00 data8 0x03b010000 6000000000000f2c: 00 00 00 60 data8 0xc000000000 6000000000000f30: 10 05 00 00 00 00 [MIB] (p40) break.m 0x0 25 6000000000000f36: 00 40 c0 0e 00 00 data8 0x03b010000 6000000000000f3c: 00 00 00 60 data8 0xc000000000
0x6000000000000ec0 + 0x50 = 0x6000000000000f10, or the
.IA_64.pltoff
section. Now
we're starting to get somewhere!
We can decode the objdump
output so we can see exactly what is being loaded here.
Swapping the byte order of the first 8 bytes
f0 04 00 00 00 00 00 40
we
end up with
0x4000000000004f0
. Now that
address looks familiar! Looking back up at the assemble
output of the PLT we see that address.
The code at
0x4000000000004f0
firstly
puts a zero value into r15, and then branches back to
0x40000000000004c0
. Wait a
minute! That's the start of our PLT section.
We can trace this code through too. Firstly we save the
value of the global pointer
(r2
) then we load three 8
byte values into r16
,
r17
and finally,
r1
. We then branch to the
address in r17
. What we are
seeing here is the actual call into the dynamic linker!
We need to delve into the ABI to understand exactly what
is being loaded at this point. The ABI says two things --
dynamically linked programs must have a special section
(called the
DT_IA_64_PLT_RESERVE
section)
that can hold three 8 byte values. There is a pointer where
this reserved area in the dynamic segment of the binary.
1 Dynamic segment at offset 0xcb8 contains 25 entries: Tag Type Name/Value 5 0x0000000000000001 (NEEDED) Shared library: [libc.so.6.1] 0x000000000000000c (INIT) 0x4000000000000470 0x000000000000000d (FINI) 0x4000000000000a20 0x0000000000000019 (INIT_ARRAY) 0x6000000000000c90 0x000000000000001b (INIT_ARRAYSZ) 24 (bytes) 10 0x000000000000001a (FINI_ARRAY) 0x6000000000000ca8 0x000000000000001c (FINI_ARRAYSZ) 8 (bytes) 0x0000000000000004 (HASH) 0x4000000000000200 0x0000000000000005 (STRTAB) 0x4000000000000330 0x0000000000000006 (SYMTAB) 0x4000000000000240 15 0x000000000000000a (STRSZ) 138 (bytes) 0x000000000000000b (SYMENT) 24 (bytes) 0x0000000000000015 (DEBUG) 0x0 0x0000000070000000 (IA_64_PLT_RESERVE) 0x6000000000000ec0 -- 0x6000000000000ed8 0x0000000000000003 (PLTGOT) 0x6000000000000ec0 20 0x0000000000000002 (PLTRELSZ) 72 (bytes) 0x0000000000000014 (PLTREL) RELA 0x0000000000000017 (JMPREL) 0x4000000000000420 0x0000000000000007 (RELA) 0x40000000000003f0 0x0000000000000008 (RELASZ) 48 (bytes) 25 0x0000000000000009 (RELAENT) 24 (bytes) 0x000000006ffffffe (VERNEED) 0x40000000000003d0 0x000000006fffffff (VERNEEDNUM) 1 0x000000006ffffff0 (VERSYM) 0x40000000000003ba 0x0000000000000000 (NULL) 0x0 30
Do you notice anything about it? It's the same value as the GOT. This means that the first three 8 byte entries in the GOT are actually the reserved area; thus will always be pointed to by the global pointer.
When the dynamic linker starts it is its duty to fill these values in. The ABI says that the first value will be filled in by the dynamic linker giving this module a unique ID. The second value is the global pointer value for the dynamic linker, and the third value is the address of the function that finds and fixes up the symbol.
sysdeps/ia64/dl-machine.h
)1 /* Set up the loaded object described by L so its unrelocated PLT entries will jump to the on-demand fixup code in dl-runtime.c. */ 5 static inline int __attribute__ ((unused, always_inline)) elf_machine_runtime_setup (struct link_map *l, int lazy, int profile) { extern void _dl_runtime_resolve (void); extern void _dl_runtime_profile (void); 10 if (lazy) { register Elf64_Addr gp __asm__ ("gp"); Elf64_Addr *reserve, doit; 15 /* * Careful with the typecast here or it will try to add l-l_addr * pointer elements */ 20 reserve = ((Elf64_Addr *) (l->l_info[DT_IA_64 (PLT_RESERVE)]->d_un.d_ptr + l->l_addr)); /* Identify this shared object. */ reserve[0] = (Elf64_Addr) l; 25 /* This function will be called to perform the relocation. */ if (!profile) doit = (Elf64_Addr) ((struct fdesc *) &_dl_runtime_resolve)->ip; else { 30 if (GLRO(dl_profile) != NULL && _dl_name_match_p (GLRO(dl_profile), l)) { /* This is the object we are looking for. Say that we really want profiling and the timers are started. */ 35 GL(dl_profile_map) = l; } doit = (Elf64_Addr) ((struct fdesc *) &_dl_runtime_profile)->ip; } 40 reserve[1] = doit; reserve[2] = gp; } return lazy; 45 }
We can see how this gets setup by the dynamic linker by
looking at the function that does this for the binary. The
reserve
variable is set from
the PLT_RESERVE section pointer in the binary. The unique
value (put into reserve[0]
)
is the address of the link map for this
object. Link maps are the internal representation within
glibc
for shared objects. We
then put in the address of
_dl_runtime_resolve
to the
second value (assuming we are not using profiling).
reserve[2]
is finally set to
gp, which has been found from r2 with the
__asm__
call.
Looking back at the ABI, we see that the
relocation index
for the
entry must be placed in r15
and the unique identifier must be passed in
r16
.
r15
has previously been
set in the stub code, before we jumped back to the start of
the PLT. Have a look down the entries, and notice how each
PLT entry loads r15
with an
incremented value? It should come as no surprise if you look
at the relocations the printf
relocation is number zero.
r16
we load up from the
values that have been initialised by the dynamic linker, as
previously discussed. Once that is ready, we can load the
function address and global pointer and branch into the
function.
What happens at this point is the dynamic linker
function _dl_runtime_resolve
is run. It finds the relocation; remember how the relocation
specified the name of the symbol? It uses this name to find
the right function; this might involve loading the library
from disk if it is not already in memory, or otherwise sharing
the code.
The relocation record provides the dynamic linker with the address it needs to "fix up"; remember it was in the GOT and loaded by the initial PLT stub? This means that after the first time the function is called, the second time it is loaded it will get the direct address of the function; short circuiting the dynamic linker.
You've seen the exact mechanism behind the PLT, and consequently the inner workings of the dynamic linker. The important points to remember are
Library calls in your program actually call a stub of code in the PLT of the binary.
That stub code loads an address and jumps to it.
Initially, that address points to a function in the dynamic linker which is capable of looking up the "real" function, given the information in the relocation entry for that function.
The dynamic linker re-writes the address that the stub code reads, so that the next time the function is called it will go straight to the right address.