ELF is an extremely flexible format for representing binary code in a system. By following the ELF standard you can represent a kernel binary just as easily as a normal executable or a system library. The same tools can be used to inspect and operate on all ELF files and developers who understand the ELF file format can translate their skills to most modern UNIX systems.
ELF extends COFF and gives the header sufficient flexibility to define an arbitrary number of sections, each with its own properties. This facilitates easier dynamic linking and debugging.
Overall, the file has a file header which describes the file in general and then has pointers to each of the individual sections that make up the file.
    typedef struct {
        unsigned char e_ident[EI_NIDENT];
        Elf32_Half    e_type;
        Elf32_Half    e_machine;
        Elf32_Word    e_version;
        Elf32_Addr    e_entry;
        Elf32_Off     e_phoff;
        Elf32_Off     e_shoff;
        Elf32_Word    e_flags;
        Elf32_Half    e_ehsize;
        Elf32_Half    e_phentsize;
        Elf32_Half    e_phnum;
        Elf32_Half    e_shentsize;
        Elf32_Half    e_shnum;
        Elf32_Half    e_shstrndx;
    } Elf32_Ehdr;
Above is the description as given in the API documentation. This is the layout of the C structure which defines an ELF header.
    $ readelf --header /bin/ls

    ELF Header:
      Magic:   7f 45 4c 46 01 02 01 00 00 00 00 00 00 00 00 00
      Class:                             ELF32
      Data:                              2's complement, big endian
      Version:                           1 (current)
      OS/ABI:                            UNIX - System V
      ABI Version:                       0
      Type:                              EXEC (Executable file)
      Machine:                           PowerPC
      Version:                           0x1
      Entry point address:               0x10002640
      Start of program headers:          52 (bytes into file)
      Start of section headers:          87460 (bytes into file)
      Flags:                             0x0
      Size of this header:               52 (bytes)
      Size of program headers:           32 (bytes)
      Number of program headers:         8
      Size of section headers:           40 (bytes)
      Number of section headers:         29
      Section header string table index: 28

    [...]
Above is a more human-readable form as presented by the readelf program, which is part of GNU binutils.
The e_ident array is the first thing at the start of any ELF file, and always starts with a few "magic" bytes. The first byte is 0x7F and then the next three bytes are "ELF". You can inspect an ELF binary to see this for yourself with something like the hexdump command.
    ianw@mingus:~$ hexdump -C /bin/ls | more
    00000000  7f 45 4c 46 01 02 01 00  00 00 00 00 00 00 00 00  |.ELF............|

    ... (rest of the program follows) ...
Note the 0x7F to start, then the ASCII encoded "ELF" string. Have a look at the standard and see what the rest of the array defines and what the values are in a binary.
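The magic-byte check described above can be sketched in a few lines of C. This is only an illustration: has_elf_magic is a hypothetical helper name, and it works on a buffer already read into memory rather than opening a file itself.

```c
#include <stddef.h>
#include <string.h>

/* The four bytes every ELF file starts with: 0x7F, then "ELF" in ASCII. */
static const unsigned char ELF_MAGIC[4] = { 0x7f, 'E', 'L', 'F' };

/* Return 1 if the buffer begins with the ELF magic bytes, 0 otherwise.
 * has_elf_magic is a hypothetical helper, not part of any library. */
int has_elf_magic(const unsigned char *buf, size_t len)
{
    if (buf == NULL || len < sizeof ELF_MAGIC)
        return 0;
    return memcmp(buf, ELF_MAGIC, sizeof ELF_MAGIC) == 0;
}
```

Passing the first 16 bytes shown in the hexdump output would return 1, while a buffer starting with something else (a shell script's "#!", say) would return 0.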
Next we have some flags for the type of machine this binary is created for. The first thing we can see is that ELF defines differently sized versions, one for 32-bit and one for 64-bit machines; here we inspect the 32-bit version. The difference is mostly that on 64-bit machines addresses need to be held in 64-bit variables. We can see that the binary has been created for a big endian machine that uses 2's complement to represent negative numbers. Skipping down a bit we can see that Machine tells us this is a PowerPC binary.
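As a rough sketch of how a tool might decode these fields, the functions below read the class and data-encoding bytes of e_ident and then use the byte order to read a two-byte header field. The byte positions and values follow the ELF specification (on Linux, <elf.h> names them EI_CLASS, ELFCLASS32 and so on); the DEMO_ names and the function names are assumptions made so the example stands alone.

```c
#include <stdint.h>

/* Positions within e_ident, per the ELF specification. The DEMO_ prefix
 * marks these as local stand-ins for the EI_CLASS / EI_DATA constants. */
enum { DEMO_EI_CLASS = 4, DEMO_EI_DATA = 5 };

/* Return 32 or 64 for a valid class byte, 0 for anything else. */
int elf_word_bits(const unsigned char *e_ident)
{
    switch (e_ident[DEMO_EI_CLASS]) {
    case 1:  return 32;   /* ELFCLASS32 */
    case 2:  return 64;   /* ELFCLASS64 */
    default: return 0;
    }
}

/* Return 1 for big endian, 0 for little endian, -1 if invalid. */
int elf_is_big_endian(const unsigned char *e_ident)
{
    switch (e_ident[DEMO_EI_DATA]) {
    case 1:  return 0;    /* ELFDATA2LSB */
    case 2:  return 1;    /* ELFDATA2MSB */
    default: return -1;
    }
}

/* Read a two-byte field (an Elf32_Half) in the file's own byte order. */
uint16_t read_half(const unsigned char *p, int big_endian)
{
    return big_endian ? (uint16_t)(p[0] << 8 | p[1])
                      : (uint16_t)(p[1] << 8 | p[0]);
}
```

For the /bin/ls header, bytes 4 and 5 of the magic line are 01 and 02, which decode to ELF32 and big endian, matching the Class and Data lines printed by readelf.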
The apparently innocuous entry point address seems straightforward enough; this is the address in memory at which execution of the program begins.
Beginning C programmers are told that main() is the first function called in your program. Using the entry point address we can actually verify that it isn't.
    $ cat test.c
    #include <stdio.h>

    int main(void)
    {
            printf("main is : %p\n", &main);
            return 0;
    }

    $ gcc -Wall -o test test.c

    $ ./test
    main is : 0x10000430

    $ readelf --headers ./test | grep 'Entry point'
      Entry point address:               0x100002b0

    $ objdump --disassemble ./test | grep 100002b0
    100002b0 <_start>:
    100002b0:       7c 29 0b 78     mr      r9,r1
Above we can see that the entry point is actually a function called _start. Our program didn't define this at all, and the leading underscore suggests that it is in a separate namespace. We examine how a program starts up below.
After that the header contains pointers to where in the file other important parts of the ELF file start, like a table of contents.
The ELF specification provides for symbol tables which are simply mappings of strings (symbols) to locations in the file. Symbols are required for linking; for example assigning a value to a variable foo declared as extern int foo; would require the linker to find the address of foo, which would involve looking up "foo" in the symbol table and finding the address.
Closely related to symbols are relocations. A relocation is simply a blank space left to be patched up later. In the previous example, until the address of foo is known it cannot be used. However, on a 32-bit system, we know the address of foo must be a 4-byte value, so any time the compiler needs to use that address (say, to assign a value) it can simply leave 4 bytes of blank space and keep a relocation that essentially says to the linker "place the real value of "foo" into the 4 bytes at this address". As mentioned, this requires the symbol "foo" to be resolved. The section called “Relocations” contains further information on relocations.
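To make the idea concrete, here is a deliberately simplified sketch of a linker resolving a symbol and patching a relocation. The toy_symbol and toy_reloc structures and both functions are invented for this illustration; real ELF symbol and relocation entries carry more fields, and real relocations come in many types.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A toy symbol: a name mapped to an address. */
struct toy_symbol {
    const char *name;
    uint32_t    addr;
};

/* A toy relocation: "put the address of `symbol` at `offset` in the image". */
struct toy_reloc {
    size_t      offset;
    const char *symbol;
};

/* Look a name up in the symbol table; return NULL if it is not defined. */
const struct toy_symbol *toy_lookup(const struct toy_symbol *tab, size_t n,
                                    const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(tab[i].name, name) == 0)
            return &tab[i];
    return NULL;
}

/* Fill the 4-byte hole described by `r`. Returns 0 on success, -1 if the
 * symbol is unresolved or the hole would run past the end of the image. */
int toy_apply_reloc(unsigned char *image, size_t image_len,
                    const struct toy_reloc *r,
                    const struct toy_symbol *tab, size_t n)
{
    const struct toy_symbol *sym = toy_lookup(tab, n, r->symbol);
    if (sym == NULL || image_len < 4 || r->offset > image_len - 4)
        return -1;
    /* Stored in the host's byte order; a real linker uses the target's. */
    memcpy(image + r->offset, &sym->addr, sizeof sym->addr);
    return 0;
}
```

With a table containing only "foo", a relocation naming "foo" is patched successfully, while one naming an undefined symbol fails, which is the situation a real linker reports as an unresolved symbol.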
The ELF format specifies two "views" of an ELF file -- that which is used for linking and that which is used for execution. This affords significant flexibility for systems designers.
We talk about sections in object code waiting to be linked into an executable. One or more sections map to a segment in the executable.
As we have done before, it is sometimes easier to look at the higher level of abstraction (segments) before inspecting the lower layers.
As we mentioned, the ELF file has a header that describes the overall layout of the file. The ELF header actually points to another group of headers called the program headers. These headers describe to the operating system anything that might be required for it to load the binary into memory and execute it. Segments are described by program headers, but so are some other things required to get the executable running.
    typedef struct {
        Elf32_Word p_type;
        Elf32_Off  p_offset;
        Elf32_Addr p_vaddr;
        Elf32_Addr p_paddr;
        Elf32_Word p_filesz;
        Elf32_Word p_memsz;
        Elf32_Word p_flags;
        Elf32_Word p_align;
    } Elf32_Phdr;
The definition of the program header is seen above.
You might have noticed from the ELF header definition above how there were fields e_phoff, e_phnum and e_phentsize; these are simply the offset in the file where the program headers start, how many program headers there are and how big each program header is. With these three bits of information you can easily find and read the program headers.
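The arithmetic involved is small enough to show directly. The function below is a sketch (phdr_file_offset is a made-up name) that computes where a given program header sits in the file from those three header fields.

```c
#include <stdint.h>

/* File offset of program header number `index`, or -1 if `index` is out
 * of range. The parameters mirror e_phoff, e_phentsize and e_phnum. */
int64_t phdr_file_offset(uint32_t e_phoff, uint16_t e_phentsize,
                         uint16_t e_phnum, uint16_t index)
{
    if (index >= e_phnum)
        return -1;
    return (int64_t)e_phoff + (int64_t)index * e_phentsize;
}
```

Using the values readelf printed for /bin/ls earlier (program headers start at byte 52, each is 32 bytes, and there are 8 of them), the first header is at offset 52 and the last at 52 + 7 × 32 = 276.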
As we mentioned, program headers describe more than just segments. The p_type field defines just what the program header is describing. For example, if this field is PT_INTERP the header is defined as meaning a string pointer to an interpreter for the binary file. We discussed compiled versus interpreted languages previously and made the distinction that a compiler builds a binary which can be run in a standalone fashion. Why should it need an interpreter? As always, the true picture is a little more complicated. There are several reasons why a modern system wants flexibility when loading executable files, and to do this some information can only be adequately acquired at the actual time the program is set up to run. We see this in future chapters where we look into dynamic linking. Consequently some minor changes might need to be made to the binary to allow it to work properly at runtime. Thus the usual interpreter of a binary file is the dynamic loader, so called because it takes the final steps to complete loading of the executable and prepare the binary image for running.
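As a sketch of what "a string pointer to an interpreter" means in practice, the function below pulls the interpreter path out of a file image given the p_offset and p_filesz of a PT_INTERP header. The function name is hypothetical, and the file is assumed to have been read into memory already.

```c
#include <stddef.h>
#include <string.h>

/* Return a pointer to the interpreter path inside the file image, or NULL
 * if the range is out of bounds or the string is not NUL-terminated.
 * p_offset and p_filesz are the fields of the PT_INTERP program header. */
const char *interp_path(const unsigned char *file, size_t file_len,
                        size_t p_offset, size_t p_filesz)
{
    if (p_filesz == 0 || p_offset > file_len || p_filesz > file_len - p_offset)
        return NULL;
    /* The string must end with a NUL byte inside the segment. */
    if (memchr(file + p_offset, '\0', p_filesz) == NULL)
        return NULL;
    return (const char *)(file + p_offset);
}
```

The bounds and termination checks matter because the header fields come from the file itself and cannot be trusted blindly. A path such as /lib/ld.so.1 occupies 13 bytes including its terminating NUL.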
Segments are described with a value of PT_LOAD in the p_type field. Each segment is then described by the other fields in the program header. The p_offset field tells you how far into the file on disk the data for the segment is. The p_vaddr field tells you what address that data is to live at in virtual memory (p_paddr describes the physical address, which is only really useful for small embedded systems that do not implement virtual memory). The two fields p_filesz and p_memsz work to tell you how big the segment is on disk and how big it should be in memory. If the memory size is greater than the disk size, then the difference should be filled with zeros. In this way you can save considerable space in your binaries by not having to waste space for empty global variables. Finally p_flags indicates the permissions on the segment. Execute, read and write permissions can be specified in any combination; for example code segments should be marked as read and execute only, data segments as read and write with no execute.
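The two calculations a loader makes from these fields can be sketched as follows. The permission bit values are those given in the ELF specification (named PF_X, PF_W and PF_R in <elf.h>); the DEMO_ constants and function names are local to this example.

```c
#include <stdint.h>

/* Permission bits in p_flags, per the ELF specification. */
enum { DEMO_PF_X = 0x1, DEMO_PF_W = 0x2, DEMO_PF_R = 0x4 };

/* Bytes the loader must zero-fill after the file-backed part of a segment. */
uint32_t segment_zero_fill(uint32_t p_filesz, uint32_t p_memsz)
{
    return p_memsz > p_filesz ? p_memsz - p_filesz : 0;
}

/* Write the permissions into out[4] in the "RWE" style readelf uses,
 * with a space standing in for each permission that is absent. */
void segment_perms(uint32_t p_flags, char out[4])
{
    out[0] = (p_flags & DEMO_PF_R) ? 'R' : ' ';
    out[1] = (p_flags & DEMO_PF_W) ? 'W' : ' ';
    out[2] = (p_flags & DEMO_PF_X) ? 'E' : ' ';
    out[3] = '\0';
}
```

For instance, a segment with a file size of 0x2b0 and a memory size of 0xb7c needs 0x8cc bytes of zeros appended in memory, and a p_flags value of 5 (read plus execute) is rendered as "R E".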
There are a few other segment types defined in the program headers; they are described more fully in the standards specification (XXX).
As we have mentioned, sections make up segments. Sections are a way to organise the binary into logical areas to communicate information between the compiler and the linker. In some special binaries, such as the Linux kernel, sections are used in more specific ways.
We've seen how segments ultimately come down to a blob of data in a file on disk with some descriptions about where it should be loaded and what permissions it has. (XXX)
Sections have a similar header to segments.
    typedef struct {
        Elf32_Word sh_name;
        Elf32_Word sh_type;
        Elf32_Word sh_flags;
        Elf32_Addr sh_addr;
        Elf32_Off  sh_offset;
        Elf32_Word sh_size;
        Elf32_Word sh_link;
        Elf32_Word sh_info;
        Elf32_Word sh_addralign;
        Elf32_Word sh_entsize;
    } Elf32_Shdr;
Sections have a few more types defined for the sh_type field; for example a section of type SHT_PROGBITS is defined as a section that holds binary data for use by the program. Other types say if this section is a symbol table (used by the linker or debugger for example) or maybe something for the dynamic loader.
There are also more attributes, such as the allocate attribute which flags that this section will need memory allocated for it.
It is probably best to examine sections through an example of them in use. Consider the following program.
    #include <stdio.h>

    int big_big_array[10*1024*1024];

    char *a_string = "Hello, World!";

    int a_var_with_value = 0x100;

    int main(void)
    {
            big_big_array[0] = 100;
            printf("%s\n", a_string);
            a_var_with_value += 20;
    }
    $ readelf --all ./sections
    ELF Header:
     ...
      Size of section headers:           40 (bytes)
      Number of section headers:         37
      Section header string table index: 34

    Section Headers:
      [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
      [ 0]                   NULL            00000000 000000 000000 00      0   0  0
      [ 1] .interp           PROGBITS        10000114 000114 00000d 00   A  0   0  1
      [ 2] .note.ABI-tag     NOTE            10000124 000124 000020 00   A  0   0  4
      [ 3] .hash             HASH            10000144 000144 00002c 04   A  4   0  4
      [ 4] .dynsym           DYNSYM          10000170 000170 000060 10   A  5   1  4
      [ 5] .dynstr           STRTAB          100001d0 0001d0 00005e 00   A  0   0  1
      [ 6] .gnu.version      VERSYM          1000022e 00022e 00000c 02   A  4   0  2
      [ 7] .gnu.version_r    VERNEED         1000023c 00023c 000020 00   A  5   1  4
      [ 8] .rela.dyn         RELA            1000025c 00025c 00000c 0c   A  4   0  4
      [ 9] .rela.plt         RELA            10000268 000268 000018 0c   A  4  25  4
      [10] .init             PROGBITS        10000280 000280 000028 00  AX  0   0  4
      [11] .text             PROGBITS        100002b0 0002b0 000560 00  AX  0   0 16
      [12] .fini             PROGBITS        10000810 000810 000020 00  AX  0   0  4
      [13] .rodata           PROGBITS        10000830 000830 000024 00   A  0   0  4
      [14] .sdata2           PROGBITS        10000854 000854 000000 00   A  0   0  4
      [15] .eh_frame         PROGBITS        10000854 000854 000004 00   A  0   0  4
      [16] .ctors            PROGBITS        10010858 000858 000008 00  WA  0   0  4
      [17] .dtors            PROGBITS        10010860 000860 000008 00  WA  0   0  4
      [18] .jcr              PROGBITS        10010868 000868 000004 00  WA  0   0  4
      [19] .got2             PROGBITS        1001086c 00086c 000010 00  WA  0   0  1
      [20] .dynamic          DYNAMIC         1001087c 00087c 0000c8 08  WA  5   0  4
      [21] .data             PROGBITS        10010944 000944 000008 00  WA  0   0  4
      [22] .got              PROGBITS        1001094c 00094c 000014 04 WAX  0   0  4
      [23] .sdata            PROGBITS        10010960 000960 000008 00  WA  0   0  4
      [24] .sbss             NOBITS          10010968 000968 000000 00  WA  0   0  1
      [25] .plt              NOBITS          10010968 000968 000060 00 WAX  0   0  4
      [26] .bss              NOBITS          100109c8 000968 2800004 00  WA  0   0  4
      [27] .comment          PROGBITS        00000000 000968 00018f 00      0   0  1
      [28] .debug_aranges    PROGBITS        00000000 000af8 000078 00      0   0  8
      [29] .debug_pubnames   PROGBITS        00000000 000b70 000025 00      0   0  1
      [30] .debug_info       PROGBITS        00000000 000b95 0002e5 00      0   0  1
      [31] .debug_abbrev     PROGBITS        00000000 000e7a 000076 00      0   0  1
      [32] .debug_line       PROGBITS        00000000 000ef0 0001de 00      0   0  1
      [33] .debug_str        PROGBITS        00000000 0010ce 0000f0 01  MS  0   0  1
      [34] .shstrtab         STRTAB          00000000 0011be 00013b 00      0   0  1
      [35] .symtab           SYMTAB          00000000 0018c4 000c90 10     36  65  4
      [36] .strtab           STRTAB          00000000 002554 000909 00      0   0  1
    Key to Flags:
      W (write), A (alloc), X (execute), M (merge), S (strings)
      I (info), L (link order), G (group), x (unknown)
      O (extra OS processing required) o (OS specific), p (processor specific)

    There are no section groups in this file.

     ...

    Symbol table '.symtab' contains 201 entries:
       Num:    Value  Size Type    Bind   Vis      Ndx Name
    ...
        99: 100109cc 0x2800000 OBJECT  GLOBAL DEFAULT   26 big_big_array
    ...
       110: 10010960     4 OBJECT  GLOBAL DEFAULT   23 a_string
    ...
       130: 10010964     4 OBJECT  GLOBAL DEFAULT   23 a_var_with_value
    ...
       144: 10000430    96 FUNC    GLOBAL DEFAULT   11 main
Above we have stripped some parts of the readelf output for clarity. We can analyse each part of our simple program and see what happens to it.
Firstly, let us look at the variable big_big_array, which as the name suggests is a fairly large global array. If we skip down to the symbol table we can see that the variable is at location 0x100109cc which we can correlate to the .bss section in the section listing, since that section starts just below it at 0x100109c8. Note the size, and how it is quite large. We mentioned that BSS is a standard part of a binary image since it would be silly to require that the binary on disk have 40 megabytes of space (10 × 1024 × 1024 four-byte integers) allocated to it, when all of that space is going to be zero. Note that this section has a type of NOBITS meaning that it does not have any bytes on disk.
Thus the .bss
section
is defined for global variables whose value should be zero
when the program starts. We have seen how the memory size
can be different to the on disk size in our discussion of
segments; variables being in the
.bss
section are an
indication that they will be given zero value on program
start.
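The correlation between a symbol's address and its section can be checked mechanically. The sketch below uses two hypothetical helpers: one tests whether an address falls inside a section given its sh_addr and sh_size, and the other works out the size of the array, assuming 4-byte ints as on the 32-bit PowerPC target in the listing.

```c
#include <stdint.h>

/* Return 1 if `addr` lies inside the section occupying
 * [sh_addr, sh_addr + sh_size), 0 otherwise. */
int section_contains(uint32_t sh_addr, uint32_t sh_size, uint32_t addr)
{
    return addr >= sh_addr && addr - sh_addr < sh_size;
}

/* Size in bytes of an array of `count` ints, assuming 4-byte ints. */
uint32_t int_array_bytes(uint32_t count)
{
    return count * 4u;
}
```

Plugging in the numbers from the readelf output, 0x100109cc falls inside .bss (address 0x100109c8, size 0x2800004) and the array of 10 × 1024 × 1024 ints comes to 0x2800000 bytes, which is the size shown in the symbol table.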
The a_string variable lives in the .sdata section, which stands for small data. Small data (and the corresponding .sbss section) are sections that can be reached by an offset from some known pointer. This means it is much faster to get to data in these sections as there are no extra lookups and loading of addresses into memory required. On the other hand, most architectures limit the size of immediate values you can add to a register (e.g. in r1 = add r2, 70; the 70 is an immediate value, as opposed to, say, adding two values stored in registers with r1 = add r2,r3) and can thus only offset a certain "small" distance from an address (XXX).
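As a small illustration of that limit, the check below assumes the immediate is a 16-bit signed field, which is the case for PowerPC load and store displacements; other architectures use different widths, so treat the constant as an assumption rather than a general rule. The function name is made up for this sketch.

```c
#include <stdint.h>

/* Return 1 if `offset` can be encoded as a 16-bit signed immediate,
 * i.e. lies in the range -32768..32767, and 0 otherwise. */
int fits_simm16(int32_t offset)
{
    return offset >= INT16_MIN && offset <= INT16_MAX;
}
```

Under that assumption, data within roughly 32 kilobytes either side of the base pointer can be reached with a single instruction, which is why only "small" data is placed in these sections.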
We can also see that our a_var_with_value lives in the same place. main however lives in the .text section, as we expect (remember the names "text" and "code" are used interchangeably to refer to a program in memory).
    $ readelf --segments /bin/ls

    Elf file type is EXEC (Executable file)
    Entry point 0x100026c0
    There are 8 program headers, starting at offset 52

    Program Headers:
      Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
      PHDR           0x000034 0x10000034 0x10000034 0x00100 0x00100 R E 0x4
      INTERP         0x000154 0x10000154 0x10000154 0x0000d 0x0000d R   0x1
          [Requesting program interpreter: /lib/ld.so.1]
      LOAD           0x000000 0x10000000 0x10000000 0x14d5c 0x14d5c R E 0x10000
      LOAD           0x014d60 0x10024d60 0x10024d60 0x002b0 0x00b7c RWE 0x10000
      DYNAMIC        0x014f00 0x10024f00 0x10024f00 0x000d8 0x000d8 RW  0x4
      NOTE           0x000164 0x10000164 0x10000164 0x00020 0x00020 R   0x4
      GNU_EH_FRAME   0x014d30 0x10014d30 0x10014d30 0x0002c 0x0002c R   0x4
      GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RWE 0x4

     Section to Segment mapping:
      Segment Sections...
       00
       01     .interp
       02     .interp .note.ABI-tag .hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt .init .text .fini .rodata .eh_frame_hdr
       03     .data .eh_frame .got2 .dynamic .ctors .dtors .jcr .got .sdata .sbss .plt .bss
       04     .dynamic
       05     .note.ABI-tag
       06     .eh_frame_hdr
       07
readelf shows us the segments and section mappings in the ELF file for the binary /bin/ls. Skipping to the bottom of the output, we can see what sections have been moved into what segments. So, for example the .interp section is placed into an INTERP flagged segment. Notice that readelf tells us it is requesting the interpreter /lib/ld.so.1; this is the dynamic linker which is run to prepare the binary for execution.
Looking at the two LOAD segments we can see the distinction between text and data. Notice how the first one has only "read" and "execute" permissions, whilst the next one has read, write and execute permissions? These describe the code (r/x) and data (r/w/x) segments.
But data should not need to be executable! Indeed, on most architectures (for example, the most common x86) the data segment will not be marked as executable. However, the example output above was taken from a PowerPC machine which has a slightly different programming model (ABI, see below) requiring that the data segment be executable [25]. Such is the life of a systems programmer, where rules were made to be broken!
The other interesting thing to note is that the file size is the same as the memory size for the code segment, whereas the memory size is greater than the file size for the data segment. This comes from the BSS section which holds zeroed global variables.
Traditionally the primary method of post mortem debugging is referred to as the core dump. The term core comes from the original physical characteristics of magnetic core memory, which uses the orientation of small magnetic rings to store state.
Thus a core dump is simply a complete snapshot of the program as it was running at a particular time. A debugger can then be used to examine this dump and reconstruct the program state. Example 8.10, “Example of creating a core dump and using it with gdb™” shows a sample program that writes to an arbitrary invalid memory location in order to force a crash. At this point the process is halted and a dump of its current state is recorded.
    $ cat coredump.c
    int main(void) {
            char *foo = (char*)0x12345;
            *foo = 'a';

            return 0;
    }

    $ gcc -Wall -g -o coredump coredump.c

    $ ./coredump
    Segmentation fault (core dumped)

    $ file ./core
    ./core: ELF 32-bit LSB core file Intel 80386, version 1 (SYSV), SVR4-style, from './coredump'

    $ gdb ./coredump
    ...
    (gdb) core core
    [New LWP 31614]
    Core was generated by `./coredump'.
    Program terminated with signal 11, Segmentation fault.
    #0  0x080483c4 in main () at coredump.c:3
    3               *foo = 'a';
    (gdb)
As the example shows, the debugger gdb™ requires the original executable and the core dump to provide the debugging session. Note that the original executable was built with the -g flag, which instructs the compiler to include all debugging information. Debugging information is created by the compiler and kept in special sections of the ELF file. It describes in detail things like which registers currently hold which variables used in the code, the size of variables, the length of arrays, etc. It is generally in the standard DWARF format (a pun on the name ELF).
Including debugging information can make executable files and libraries very large; although this data need not be resident in memory to run the program, it can still take up considerable disk space. Thus the usual process is to strip this information from the ELF file. While it is possible to arrange for shipping of both stripped and unstripped files, almost all current binary distribution methods provide the debugging information in separate files. The objcopy™ tool can be used to extract the debugging information (--only-keep-debug) and then add a link in the original executable to this extracted information (--add-gnu-debuglink). After this is done, a special section called .gnu_debuglink will be present in the original executable, which contains the name of the debug file and a checksum so that when a debugging session starts the debugger can be sure it associates the right debugging information with the right executable.
    $ gcc -g -shared -o libtest.so libtest.c
    $ objcopy --only-keep-debug libtest.so libtest.debug
    $ objcopy --add-gnu-debuglink=libtest.debug libtest.so
    $ objdump -s -j .gnu_debuglink libtest.so

    libtest.so:     file format elf32-i386

    Contents of section .gnu_debuglink:
     0000 6c696274 6573742e 64656275 67000000  libtest.debug...
     0010 52a7fd0a                             R...
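The bytes in that dump can be decoded by hand or with a few lines of code. The layout is a NUL-terminated file name, zero padding up to the next 4-byte boundary, then a 4-byte CRC checksum of the debug file. The sketch below assumes a little-endian target, as in the i386 example, and parse_debuglink is a hypothetical name rather than a function from any library.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decode the contents of a .gnu_debuglink section. Returns the file name
 * and stores the checksum in *crc, or returns NULL if the data is
 * malformed. Assumes the checksum is stored little-endian. */
const char *parse_debuglink(const unsigned char *sec, size_t len, uint32_t *crc)
{
    const unsigned char *nul = memchr(sec, '\0', len);
    if (nul == NULL)
        return NULL;
    /* Step past the NUL, then round up to a multiple of four. */
    size_t crc_off = ((size_t)(nul - sec) + 1 + 3) & ~(size_t)3;
    if (crc_off > len || len - crc_off < 4)
        return NULL;
    *crc = (uint32_t)sec[crc_off]
         | (uint32_t)sec[crc_off + 1] << 8
         | (uint32_t)sec[crc_off + 2] << 16
         | (uint32_t)sec[crc_off + 3] << 24;
    return (const char *)sec;
}
```

Applied to the 20 bytes shown by objdump, this yields the name libtest.debug and the checksum 0x0afda752 (the bytes 52 a7 fd 0a read in little-endian order).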
Symbols take up much less space, but are also targets for removal from final output. Once the individual object files of an executable are linked into the single final image there is generally no need for most symbols to remain. As discussed in the section called “Symbols and Relocations” symbols are required to fix up relocation entries, but once this is done the symbols are not strictly necessary for running the final program. On Linux the GNU toolchain strip™ program provides options to remove symbols. Note that some symbols are required to be resolved at run-time (for dynamic linking, the focus of Chapter 9, Dynamic Linking) but these are put in separate dynamic symbol tables so they will not be removed and render the final output useless.
A core dump is really just another ELF file; this illustrates the flexibility of ELF as a binary format.
    $ readelf --all ./core
    ELF Header:
      Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
      Class:                             ELF32
      Data:                              2's complement, little endian
      Version:                           1 (current)
      OS/ABI:                            UNIX - System V
      ABI Version:                       0
      Type:                              CORE (Core file)
      Machine:                           Intel 80386
      Version:                           0x1
      Entry point address:               0x0
      Start of program headers:          52 (bytes into file)
      Start of section headers:          0 (bytes into file)
      Flags:                             0x0
      Size of this header:               52 (bytes)
      Size of program headers:           32 (bytes)
      Number of program headers:         15
      Size of section headers:           0 (bytes)
      Number of section headers:         0
      Section header string table index: 0

    There are no sections in this file.

    There are no sections to group in this file.

    Program Headers:
      Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz   Flg Align
      NOTE           0x000214 0x00000000 0x00000000 0x0022c 0x00000      0
      LOAD           0x001000 0x08048000 0x00000000 0x01000 0x01000  R E 0x1000
      LOAD           0x002000 0x08049000 0x00000000 0x01000 0x01000  RW  0x1000
      LOAD           0x003000 0x489fc000 0x00000000 0x01000 0x1b000  R E 0x1000
      LOAD           0x004000 0x48a17000 0x00000000 0x01000 0x01000  R   0x1000
      LOAD           0x005000 0x48a18000 0x00000000 0x01000 0x01000  RW  0x1000
      LOAD           0x006000 0x48a1f000 0x00000000 0x01000 0x153000 R E 0x1000
      LOAD           0x007000 0x48b72000 0x00000000 0x00000 0x01000      0x1000
      LOAD           0x007000 0x48b73000 0x00000000 0x02000 0x02000  R   0x1000
      LOAD           0x009000 0x48b75000 0x00000000 0x01000 0x01000  RW  0x1000
      LOAD           0x00a000 0x48b76000 0x00000000 0x03000 0x03000  RW  0x1000
      LOAD           0x00d000 0xb771c000 0x00000000 0x01000 0x01000  RW  0x1000
      LOAD           0x00e000 0xb774d000 0x00000000 0x02000 0x02000  RW  0x1000
      LOAD           0x010000 0xb774f000 0x00000000 0x01000 0x01000  R E 0x1000
      LOAD           0x011000 0xbfeac000 0x00000000 0x22000 0x22000  RW  0x1000

    There is no dynamic section in this file.

    There are no relocations in this file.

    There are no unwind sections in this file.

    No version information found in this file.

    Notes at offset 0x00000214 with length 0x0000022c:
      Owner         Data size       Description
      CORE          0x00000090      NT_PRSTATUS (prstatus structure)
      CORE          0x0000007c      NT_PRPSINFO (prpsinfo structure)
      CORE          0x000000a0      NT_AUXV (auxiliary vector)
      LINUX         0x00000030      Unknown note type: (0x00000200)

    $ eu-readelf -n ./core

    Note segment of 556 bytes at offset 0x214:
      Owner          Data size  Type
      CORE                 144  PRSTATUS
        info.si_signo: 11, info.si_code: 0, info.si_errno: 0, cursig: 11
        sigpend: <>
        sighold: <>
        pid: 31614, ppid: 31544, pgrp: 31614, sid: 31544
        utime: 0.000000, stime: 0.000000, cutime: 0.000000, cstime: 0.000000
        orig_eax: -1, fpvalid: 0
        ebx:     1219973108  ecx:     1243440144  edx:              1
        esi:              0  edi:              0  ebp:     0xbfecb828
        eax:          74565  eip:     0x080483c4  eflags:  0x00010286
        esp:     0xbfecb818
        ds: 0x007b  es: 0x007b  fs: 0x0000  gs: 0x0033  cs: 0x0073  ss: 0x007b
      CORE                 124  PRPSINFO
        state: 0, sname: R, zomb: 0, nice: 0, flag: 0x00400400
        uid: 1000, gid: 1000, pid: 31614, ppid: 31544, pgrp: 31614, sid: 31544
        fname: coredump, psargs: ./coredump
      CORE                 160  AUXV
        SYSINFO: 0xb774f414
        SYSINFO_EHDR: 0xb774f000
        HWCAP: 0xafe8fbff  <fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov clflush dts acpi mmx fxsr sse sse2 ss tm pbe>
        PAGESZ: 4096
        CLKTCK: 100
        PHDR: 0x8048034
        PHENT: 32
        PHNUM: 8
        BASE: 0
        FLAGS: 0
        ENTRY: 0x8048300
        UID: 1000
        EUID: 1000
        GID: 1000
        EGID: 1000
        SECURE: 0
        RANDOM: 0xbfecba1b
        EXECFN: 0xbfecdff1
        PLATFORM: 0xbfecba2b
        NULL
      LINUX                 48  386_TLS
        index: 6, base: 0xb771c8d0, limit: 0x000fffff, flags: 0x00000051
        index: 7, base: 0x00000000, limit: 0x00000000, flags: 0x00000028
        index: 8, base: 0x00000000, limit: 0x00000000, flags: 0x00000028
In Example 8.12, “Example of using readelf™
and eu-readelf™ to examine a
coredump.” we can see
an examination of the core file produced by Example 8.10, “Example of creating a core dump and using it with gdb™” using firstly the
readelf™ tool. There are no
sections, relocations or other extraneous information in the
file that may be required for loading an executable or
library; it simply consists of a series of program headers
describing LOAD
segments.
These segments are raw data dumps, created by the kernel, of
the current memory allocations.
The other component of the core dump is the NOTE segment, which contains data necessary for debugging but not necessarily captured in a straight snapshot of the memory allocations. The eu-readelf™ program used in the second part of the example provides a more complete view of the data by decoding it.
The PRSTATUS
note gives
a range of interesting information about the process as it was
running; for example we can see from
cursig
that the program
received a signal 11, or segmentation fault, as we would
expect. Along with process number information, it also
includes a dump of all the current registers. Given the
register values, the debugger can reconstruct the stack state
and hence provide a backtrace; combined
with the symbol and debugging information from the original
binary the debugger can show exactly how you reached the
current point of execution.
Another interesting output is the current auxiliary vector (AUXV), discussed in the section called “Kernel communication to programs”. The 386_TLS note describes global descriptor table entries used for the x86 implementation of thread-local storage (see the section called “Fast System Calls” for more information on use of segmentation, and the section called “Threads” for information on threads[26]).
The kernel creates the core dump file within the bounds of the current ulimit settings. Since a program using a lot of memory could result in a very large dump, potentially filling up the disk and making problems even worse, the ulimit is generally set low or even at zero, as most non-developers have little use for a core dump file. However the core dump remains the single most useful way to debug an unexpected situation in a postmortem fashion.
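A program can query this limit itself through the POSIX getrlimit() call with the RLIMIT_CORE resource, which is the same limit the shell's ulimit -c reports. The wrapper below is a small sketch; core_dumps_enabled is a name chosen for this example.

```c
#include <sys/resource.h>

/* Return 1 if the soft RLIMIT_CORE limit permits a core file to be
 * written, 0 if it is zero (core dumps disabled), or -1 if the limit
 * could not be queried. */
int core_dumps_enabled(void)
{
    struct rlimit lim;
    if (getrlimit(RLIMIT_CORE, &lim) != 0)
        return -1;
    return lim.rlim_cur != 0;
}
```

A developer chasing a crash would typically raise the limit in the shell before running the program; a check like this simply makes it visible from inside the process whether a dump can be expected.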
[25] For those that are curious, the PowerPC ABI calls stubs for functions in dynamic libraries directly in the GOT, rather than having them bounce through a separate PLT entry. Thus the processor needs execute permissions for the GOT section, which you can see is embedded in the data segment. This should make sense after reading the dynamic linking chapter!
[26] For a multi-threaded
application, there would be duplicate entries for each
thread running. The debugger will understand this, and it
is how gdb™ implements the
thread
command to show and
switch between threads.