ELF

ELF is an extremely flexible format for representing binary code in a system. By following the ELF standard you can represent a kernel binary just as easily as a normal executable or a system library. The same tools can be used to inspect and operate on all ELF files, and developers who understand the ELF file format can transfer their skills to most modern UNIX systems.

ELF in depth

ELF extends COFF and gives the header sufficient flexibility to define an arbitrary number of sections, each with its own properties. This facilitates easier dynamic linking and debugging.

Figure 8.1. ELF Overview

ELF File Header

Overall, the file has a file header which describes the file in general and then has pointers to each of the individual sections that make up the file.

Example 8.1. The ELF Header

    typedef struct {
            unsigned char e_ident[EI_NIDENT];
            Elf32_Half    e_type;
            Elf32_Half    e_machine;
            Elf32_Word    e_version;
            Elf32_Addr    e_entry;
            Elf32_Off     e_phoff;
            Elf32_Off     e_shoff;
            Elf32_Word    e_flags;
            Elf32_Half    e_ehsize;
            Elf32_Half    e_phentsize;
            Elf32_Half    e_phnum;
            Elf32_Half    e_shentsize;
            Elf32_Half    e_shnum;
            Elf32_Half    e_shstrndx;
    } Elf32_Ehdr;

Above is the description as given in the API documentation. This is the layout of the C structure which defines an ELF header.

Example 8.2. The ELF Header, as shown by readelf

    $ readelf --header /bin/ls

    ELF Header:
      Magic:   7f 45 4c 46 01 02 01 00 00 00 00 00 00 00 00 00
      Class:                             ELF32
      Data:                              2's complement, big endian
      Version:                           1 (current)
      OS/ABI:                            UNIX - System V
      ABI Version:                       0
      Type:                              EXEC (Executable file)
      Machine:                           PowerPC
      Version:                           0x1
      Entry point address:               0x10002640
      Start of program headers:          52 (bytes into file)
      Start of section headers:          87460 (bytes into file)
      Flags:                             0x0
      Size of this header:               52 (bytes)
      Size of program headers:           32 (bytes)
      Number of program headers:         8
      Size of section headers:           40 (bytes)
      Number of section headers:         29
      Section header string table index: 28

    [...]

Above is a more human-readable form as presented by the readelf program, which is part of GNU binutils.

The e_ident array is the first thing at the start of any ELF file, and always starts with a few "magic" bytes. The first byte is 0x7F and then the next three bytes are "ELF". You can inspect an ELF binary to see this for yourself with something like the hexdump command.

Example 8.3. Inspecting the ELF magic number

    ianw@mingus:~$ hexdump -C /bin/ls | more
    00000000  7f 45 4c 46 01 02 01 00  00 00 00 00 00 00 00 00  |.ELF............|

    ... (rest of the program follows) ...

Note the 0x7F to start, then the ASCII encoded "ELF" string. Have a look at the standard and see what the rest of the array defines and what the values are in a binary.

Next come some fields describing the type of machine this binary was created for. The first thing we can see is that ELF defines two classes of file, one for 32-bit and one for 64-bit systems; here we inspect the 32-bit version. The difference is mostly that on 64-bit machines addresses need to be held in 64-bit variables. We can see that the binary has been created for a big endian machine that uses 2's complement to represent negative numbers. Skipping down a bit, the Machine field tells us this is a PowerPC binary.
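The first few bytes of e_ident can be decoded by hand. The following is a minimal sketch rather than a definitive implementation: the index and value constants are copied from the ELF specification and redefined locally so the sketch does not depend on a system header, and the function names are our own inventions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Indices into e_ident and their values, as given in the ELF
 * specification. Redefined here so the sketch stands alone; do not
 * combine with a system <elf.h>, which defines the same names. */
enum { EI_CLASS = 4, EI_DATA = 5, EI_NIDENT = 16 };
enum { ELFCLASS32 = 1, ELFCLASS64 = 2 };
enum { ELFDATA2LSB = 1, ELFDATA2MSB = 2 };

/* Returns 1 if buf holds a full e_ident array that starts with the
 * four magic bytes 0x7f 'E' 'L' 'F', otherwise 0. */
int has_elf_magic(const unsigned char *buf, size_t len)
{
    static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };

    assert(buf != NULL);
    return len >= (size_t)EI_NIDENT && memcmp(buf, magic, sizeof magic) == 0;
}

/* Hypothetical helper: name the file class (32 or 64 bit). */
const char *elf_class_name(const unsigned char *ident)
{
    switch (ident[EI_CLASS]) {
    case ELFCLASS32: return "ELF32";
    case ELFCLASS64: return "ELF64";
    default:         return "invalid";
    }
}

/* Hypothetical helper: name the byte order of the file's data. */
const char *elf_data_name(const unsigned char *ident)
{
    switch (ident[EI_DATA]) {
    case ELFDATA2LSB: return "little endian";
    case ELFDATA2MSB: return "big endian";
    default:          return "invalid";
    }
}
```

Fed the sixteen bytes shown by hexdump above (7f 45 4c 46 01 02 01 ...), these helpers report an ELF32, big endian file, matching the Class and Data lines of the readelf output.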

The apparently innocuous entry point address seems straightforward enough; this is the address in memory at which execution of the program starts.
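To make the field concrete, here is a hedged sketch of pulling the entry point out of the raw bytes of a 32-bit ELF header. It assumes the Elf32_Ehdr layout shown earlier, in which e_entry follows 16 bytes of e_ident, two 2-byte fields and one 4-byte field, and so sits at byte offset 24. The SK_ prefixed names and both functions are ours, not part of any API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* SK_ marks names local to this sketch. Values follow the ELF spec. */
enum {
    SK_EI_DATA        = 5,   /* index of the byte-order flag in e_ident */
    SK_ELFDATA2MSB    = 2,   /* value meaning "big endian"              */
    SK_E_ENTRY_OFFSET = 24,  /* 16 (e_ident) + 2 + 2 + 4                */
    SK_EHDR32_SIZE    = 52   /* size of a 32-bit ELF header             */
};

/* Assemble a 32-bit value from four bytes in the given byte order. */
uint32_t read_u32(const unsigned char *p, int big_endian)
{
    if (big_endian)
        return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
               ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
    return ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16) |
           ((uint32_t)p[1] << 8)  |  (uint32_t)p[0];
}

/* Entry point of a 32-bit ELF image whose header is held in buf. */
uint32_t elf32_entry_point(const unsigned char *buf, size_t len)
{
    assert(buf != NULL && len >= (size_t)SK_EHDR32_SIZE);
    return read_u32(buf + SK_E_ENTRY_OFFSET,
                    buf[SK_EI_DATA] == SK_ELFDATA2MSB);
}
```

The byte order has to be taken from e_ident first, because the same four bytes decode to different addresses on big and little endian files.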

Beginning C programmers are told that main() is the first function called in your program. Using the entry point address we can actually verify that it isn't.

Example 8.4. Investigating the entry point

    $ cat test.c
    #include <stdio.h>

    int main(void)
    {
            printf("main is : %p\n", &main);
            return 0;
    }

    $ gcc -Wall -o test test.c

    $ ./test
    main is : 0x10000430

    $ readelf --headers ./test | grep 'Entry point'
      Entry point address:               0x100002b0

    $ objdump --disassemble ./test | grep 100002b0
    100002b0 <_start>:
    100002b0:       7c 29 0b 78     mr      r9,r1

Above we can see that the entry point is actually a function called _start. Our program didn't define this at all, and the leading underscore suggests that it is in a separate namespace. We examine how a program starts up below.

After that the header contains pointers to where in the file other important parts of the ELF file start, like a table of contents.

Symbols and Relocations

The ELF specification provides for symbol tables which are simply mappings of strings (symbols) to locations in the file. Symbols are required for linking; for example assigning a value to a variable foo declared as extern int foo; would require the linker to find the address of foo, which would involve looking up "foo" in the symbol table and finding the address.

Closely related to symbols are relocations. A relocation is simply a blank space left to be patched up later. In the previous example, until the address of foo is known it cannot be used. However, on a 32-bit system, we know the address of foo must be a 4-byte value, so any time the compiler needs to use that address (say, to assign a value) it can simply leave 4 bytes of blank space and keep a relocation that essentially says to the linker "place the real value of foo into the 4 bytes at this address". As mentioned, this requires the symbol foo to be resolved. The section called “Relocations” contains further information on relocations.
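The idea of "leave a gap, patch it later" can be sketched in a few lines of C. This is a toy model only: the toy_symbol and toy_reloc structures and both functions are hypothetical stand-ins, far simpler than the real ELF symbol and relocation entries, and the address is written in the host's byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy stand-ins for real ELF structures; all names are hypothetical. */
struct toy_symbol {
    const char *name;      /* the symbol, e.g. "foo"         */
    uint32_t    address;   /* where the linker placed it     */
};

struct toy_reloc {
    size_t      offset;    /* where in the code the 4-byte gap is  */
    const char *symbol;    /* which symbol's address belongs there */
};

/* Look a name up in the symbol table; returns NULL if it is absent. */
const struct toy_symbol *toy_lookup(const struct toy_symbol *syms,
                                    size_t nsyms, const char *name)
{
    for (size_t i = 0; i < nsyms; i++)
        if (strcmp(syms[i].name, name) == 0)
            return &syms[i];
    return NULL;
}

/* Patch one relocation. Returns 0 on success, or -1 if the symbol is
 * unresolved or the gap would fall outside the code buffer. */
int toy_apply_reloc(unsigned char *code, size_t code_len,
                    const struct toy_reloc *rel,
                    const struct toy_symbol *syms, size_t nsyms)
{
    const struct toy_symbol *sym;

    assert(code != NULL && rel != NULL);
    sym = toy_lookup(syms, nsyms, rel->symbol);
    if (sym == NULL || rel->offset > code_len || code_len - rel->offset < 4)
        return -1;
    /* Fill the 4-byte gap with the resolved address (host byte order). */
    memcpy(code + rel->offset, &sym->address, sizeof sym->address);
    return 0;
}
```

Note how the patch fails cleanly when the symbol cannot be found; a real linker reports this case as an undefined reference.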

Sections and Segments

The ELF format specifies two "views" of an ELF file -- that which is used for linking and that which is used for execution. This affords significant flexibility for systems designers.

We talk about sections in object code waiting to be linked into an executable. One or more sections map to a segment in the executable.

Segments

As we have done before, it is sometimes easier to look at the higher level of abstraction (segments) before inspecting the lower layers.

As we mentioned, the ELF file has a header that describes the overall layout of the file. The ELF header actually points to another group of headers called the program headers. These headers describe to the operating system anything that might be required for it to load the binary into memory and execute it. Segments are described by program headers, but so are some other things required to get the executable running.

Example 8.5. The Program Header

    typedef struct {
              Elf32_Word p_type;
              Elf32_Off  p_offset;
              Elf32_Addr p_vaddr;
              Elf32_Addr p_paddr;
              Elf32_Word p_filesz;
              Elf32_Word p_memsz;
              Elf32_Word p_flags;
              Elf32_Word p_align;
    } Elf32_Phdr;

The definition of the program header is seen above. You might have noticed from the ELF header definition above how there were fields e_phoff, e_phnum and e_phentsize; these are simply the offset in the file where the program headers start, how many program headers there are and how big each program header is. With these three bits of information you can easily find and read the program headers.
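As a small worked sketch of that arithmetic, the function below computes where a given program header sits in the file. The function name and the convention of returning 0 for an out-of-range index are our own assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* File offset of program header number `index`. Returns 0 if the index
 * is out of range; 0 can never be a valid answer because the ELF header
 * itself lives at the start of the file. */
uint32_t phdr_file_offset(uint32_t e_phoff, uint16_t e_phentsize,
                          uint16_t e_phnum, uint16_t index)
{
    assert(e_phentsize != 0);
    if (index >= e_phnum)
        return 0;
    return e_phoff + (uint32_t)index * e_phentsize;
}
```

With the values readelf reported for /bin/ls earlier (table at offset 52, 32 bytes per entry, 8 entries), the last header, index 7, starts 52 + 7 × 32 = 276 bytes into the file.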

As we mentioned, program headers describe more than just segments. The p_type field defines just what the program header is describing. For example, if this field is PT_INTERP the header points to a string naming an interpreter for the binary file. We discussed compiled versus interpreted languages previously and made the distinction that a compiler builds a binary which can be run in a stand-alone fashion. Why should it need an interpreter? As always, the true picture is a little more complicated. There are several reasons why a modern system wants flexibility when loading executable files, and to achieve this some information can only be adequately acquired at the actual time the program is set up to run. We see this in future chapters where we look into dynamic linking. Consequently some minor changes might need to be made to the binary to allow it to work properly at runtime. Thus the usual interpreter of a binary file is the dynamic loader, so called because it takes the final steps to complete loading of the executable and prepare the binary image for running.

Segments are described with a value of PT_LOAD in the p_type field. Each segment is then described by the other fields in the program header. The p_offset field tells you how far into the file on disk the data for the segment is. The p_vaddr field tells you what address that data is to live at in virtual memory (p_paddr describes the physical address, which is only really useful for small embedded systems that do not implement virtual memory). The two fields p_filesz and p_memsz work to tell you how big the segment is on disk and how big it should be in memory. If the memory size is greater than the disk size, then the difference should be filled with zeros. In this way you can save considerable space in your binaries by not having to waste space for empty global variables. Finally p_flags indicates the permissions on the segment. Execute, read and write permissions can be specified in any combination; for example code segments should be marked as read and execute only, data segments as read and write with no execute.

There are a few other segment types defined in the program headers; they are described more fully in the standards specification (XXX).

Sections

As we have mentioned, sections make up segments. Sections are a way to organise the binary into logical areas to communicate information between the compiler and the linker. In some special binaries, such as the Linux kernel, sections are used in more specific ways.

We've seen how segments ultimately come down to a blob of data in a file on disk with some descriptions about where it should be loaded and what permissions it has. (XXX)

Sections have a similar header to segments.

Example 8.6. Sections

    typedef struct {
              Elf32_Word sh_name;
              Elf32_Word sh_type;
              Elf32_Word sh_flags;
              Elf32_Addr sh_addr;
              Elf32_Off  sh_offset;
              Elf32_Word sh_size;
              Elf32_Word sh_link;
              Elf32_Word sh_info;
              Elf32_Word sh_addralign;
              Elf32_Word sh_entsize;
    } Elf32_Shdr;

Sections have a few more types defined for the sh_type field; for example a section of type SHT_PROGBITS is defined as a section that holds binary data for use by the program. Other types say if this section is a symbol table (used by the linker or debugger for example) or maybe something for the dynamic loader.

There are also more attributes, such as the allocate attribute which flags that this section will need memory allocated for it.
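These attributes are stored as bits in the sh_flags field. As a hedged sketch, the function below turns the three most common bits into the letters that readelf prints in its Flg column; the SHF_ values come from the ELF specification, while the function name is hypothetical and the remaining flag bits are deliberately ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The three most common sh_flags bits, from the ELF specification. */
enum { SHF_WRITE = 0x1, SHF_ALLOC = 0x2, SHF_EXECINSTR = 0x4 };

/* Write the flag letters (W, A, X, in that order) into out, which must
 * hold at least 4 bytes. Returns the number of letters written. Other
 * flag bits (merge, strings and so on) are ignored in this sketch. */
int section_flags_string(uint32_t sh_flags, char out[4])
{
    int n = 0;

    assert(out != NULL);
    if (sh_flags & SHF_WRITE)     out[n++] = 'W';
    if (sh_flags & SHF_ALLOC)     out[n++] = 'A';
    if (sh_flags & SHF_EXECINSTR) out[n++] = 'X';
    out[n] = '\0';
    return n;
}
```

A section carrying only the allocate attribute decodes to "A", code sections typically decode to "AX", and writable data to "WA".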

It is probably best to examine sections through an example of them in use. Consider the following program.

Example 8.7. Sections

    #include <stdio.h>

    int big_big_array[10*1024*1024];

    char *a_string = "Hello, World!";

    int a_var_with_value = 0x100;

    int main(void)
    {
            big_big_array[0] = 100;
            printf("%s\n", a_string);
            a_var_with_value += 20;
    }

Example 8.8. Sections readelf output

    $ readelf --all ./sections
    ELF Header:
     ...
      Size of section headers:           40 (bytes)
      Number of section headers:         37
      Section header string table index: 34

    Section Headers:
      [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
      [ 0]                   NULL            00000000 000000 000000 00      0   0  0
      [ 1] .interp           PROGBITS        10000114 000114 00000d 00   A  0   0  1
      [ 2] .note.ABI-tag     NOTE            10000124 000124 000020 00   A  0   0  4
      [ 3] .hash             HASH            10000144 000144 00002c 04   A  4   0  4
      [ 4] .dynsym           DYNSYM          10000170 000170 000060 10   A  5   1  4
      [ 5] .dynstr           STRTAB          100001d0 0001d0 00005e 00   A  0   0  1
      [ 6] .gnu.version      VERSYM          1000022e 00022e 00000c 02   A  4   0  2
      [ 7] .gnu.version_r    VERNEED         1000023c 00023c 000020 00   A  5   1  4
      [ 8] .rela.dyn         RELA            1000025c 00025c 00000c 0c   A  4   0  4
      [ 9] .rela.plt         RELA            10000268 000268 000018 0c   A  4  25  4
      [10] .init             PROGBITS        10000280 000280 000028 00  AX  0   0  4
      [11] .text             PROGBITS        100002b0 0002b0 000560 00  AX  0   0 16
      [12] .fini             PROGBITS        10000810 000810 000020 00  AX  0   0  4
      [13] .rodata           PROGBITS        10000830 000830 000024 00   A  0   0  4
      [14] .sdata2           PROGBITS        10000854 000854 000000 00   A  0   0  4
      [15] .eh_frame         PROGBITS        10000854 000854 000004 00   A  0   0  4
      [16] .ctors            PROGBITS        10010858 000858 000008 00  WA  0   0  4
      [17] .dtors            PROGBITS        10010860 000860 000008 00  WA  0   0  4
      [18] .jcr              PROGBITS        10010868 000868 000004 00  WA  0   0  4
      [19] .got2             PROGBITS        1001086c 00086c 000010 00  WA  0   0  1
      [20] .dynamic          DYNAMIC         1001087c 00087c 0000c8 08  WA  5   0  4
      [21] .data             PROGBITS        10010944 000944 000008 00  WA  0   0  4
      [22] .got              PROGBITS        1001094c 00094c 000014 04 WAX  0   0  4
      [23] .sdata            PROGBITS        10010960 000960 000008 00  WA  0   0  4
      [24] .sbss             NOBITS          10010968 000968 000000 00  WA  0   0  1
      [25] .plt              NOBITS          10010968 000968 000060 00 WAX  0   0  4
      [26] .bss              NOBITS          100109c8 000968 2800004 00  WA  0   0  4
      [27] .comment          PROGBITS        00000000 000968 00018f 00      0   0  1
      [28] .debug_aranges    PROGBITS        00000000 000af8 000078 00      0   0  8
      [29] .debug_pubnames   PROGBITS        00000000 000b70 000025 00      0   0  1
      [30] .debug_info       PROGBITS        00000000 000b95 0002e5 00      0   0  1
      [31] .debug_abbrev     PROGBITS        00000000 000e7a 000076 00      0   0  1
      [32] .debug_line       PROGBITS        00000000 000ef0 0001de 00      0   0  1
      [33] .debug_str        PROGBITS        00000000 0010ce 0000f0 01  MS  0   0  1
      [34] .shstrtab         STRTAB          00000000 0011be 00013b 00      0   0  1
      [35] .symtab           SYMTAB          00000000 0018c4 000c90 10     36  65  4
      [36] .strtab           STRTAB          00000000 002554 000909 00      0   0  1
    Key to Flags:
      W (write), A (alloc), X (execute), M (merge), S (strings)
      I (info), L (link order), G (group), x (unknown)
      O (extra OS processing required) o (OS specific), p (processor specific)

    There are no section groups in this file.
     ...

    Symbol table '.symtab' contains 201 entries:
       Num:    Value  Size Type    Bind   Vis      Ndx Name
    ...
        99: 100109cc 0x2800000 OBJECT  GLOBAL DEFAULT   26 big_big_array
    ...
       110: 10010960     4 OBJECT  GLOBAL DEFAULT   23 a_string
    ...
       130: 10010964     4 OBJECT  GLOBAL DEFAULT   23 a_var_with_value
    ...
       144: 10000430    96 FUNC    GLOBAL DEFAULT   11 main

Above we have stripped some parts of the readelf output for clarity. We can analyse each part of our simple program and see what happens to it.

Firstly, let us look at the variable big_big_array, which as the name suggests is a fairly large global array. If we skip down to the symbol table we can see that the variable is at location 0x100109cc, which we can correlate to the .bss section in the section listing, since that section starts just below it at 0x100109c8. Note the size, and how it is quite large: 0x2800000 bytes, or 40 megabytes, for the 10 × 1024 × 1024 four-byte integers. We mentioned that BSS is a standard part of a binary image, since it would be silly to require that the binary on disk have 40 megabytes of space allocated to it when all of that space is going to be zero. Note that this section has a type of NOBITS, meaning that it does not have any bytes on disk.

Thus the .bss section is defined for global variables whose value should be zero when the program starts. We have seen how the memory size can be different to the on disk size in our discussion of segments; variables being in the .bss section are an indication that they will be given zero value on program start.
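The correlation we just did by eye, matching a symbol's address against the section listing, can be sketched as a simple range search. The section_range structure is a hypothetical, pared-down stand-in for a section header that keeps only a name, sh_addr and sh_size; the function name is likewise our own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, pared-down view of a section header. */
struct section_range {
    const char *name;
    uint32_t    addr;   /* sh_addr: where the section starts in memory */
    uint32_t    size;   /* sh_size: how many bytes it covers           */
};

/* Return the name of the section whose address range contains addr,
 * or NULL if no listed section does. Empty sections never match. */
const char *section_for_address(const struct section_range *secs,
                                size_t n, uint32_t addr)
{
    assert(secs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (secs[i].size != 0 &&
            addr >= secs[i].addr &&
            addr - secs[i].addr < secs[i].size)
            return secs[i].name;
    }
    return NULL;
}
```

Given the .text, .sdata and .bss rows from the listing above, the address 0x100109cc of big_big_array falls 4 bytes into .bss, which starts at 0x100109c8 and is 0x2800004 bytes long.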

The a_string variable lives in the .sdata section, which stands for small data. Small data (and the corresponding .sbss section) are sections that can be reached by an offset from some known pointer. This means it is much faster to get to data in these sections, as no extra lookups or loading of addresses are required. On the other hand, most architectures are limited in the size of immediate values you can add to a register (e.g. in r1 = add r2, 70; the 70 is an immediate value, as opposed to, say, adding two values stored in registers, r1 = add r2,r3) and can thus only offset a certain "small" distance from an address (XXX).

We can also see that our a_var_with_value lives in the same place.

main, however, lives in the .text section, as we expect (remember the names "text" and "code" are used interchangeably to refer to a program in memory).

Sections and Segments together

Example 8.9. Sections and Segments

    $ readelf --segments /bin/ls

    Elf file type is EXEC (Executable file)
    Entry point 0x100026c0
    There are 8 program headers, starting at offset 52

    Program Headers:
      Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
      PHDR           0x000034 0x10000034 0x10000034 0x00100 0x00100 R E 0x4
      INTERP         0x000154 0x10000154 0x10000154 0x0000d 0x0000d R   0x1
          [Requesting program interpreter: /lib/ld.so.1]
      LOAD           0x000000 0x10000000 0x10000000 0x14d5c 0x14d5c R E 0x10000
      LOAD           0x014d60 0x10024d60 0x10024d60 0x002b0 0x00b7c RWE 0x10000
      DYNAMIC        0x014f00 0x10024f00 0x10024f00 0x000d8 0x000d8 RW  0x4
      NOTE           0x000164 0x10000164 0x10000164 0x00020 0x00020 R   0x4
      GNU_EH_FRAME   0x014d30 0x10014d30 0x10014d30 0x0002c 0x0002c R   0x4
      GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RWE 0x4

     Section to Segment mapping:
      Segment Sections...
       00
       01     .interp
       02     .interp .note.ABI-tag .hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt .init .text .fini .rodata .eh_frame_hdr
       03     .data .eh_frame .got2 .dynamic .ctors .dtors .jcr .got .sdata .sbss .plt .bss
       04     .dynamic
       05     .note.ABI-tag
       06     .eh_frame_hdr
       07

readelf shows us the segments and section mappings in the ELF file for the binary /bin/ls.

Skipping to the bottom of the output, we can see what sections have been moved into what segments. So, for example the .interp section is placed into an INTERP flagged segment. Notice that readelf tells us it is requesting the interpreter /lib/ld.so.1; this is the dynamic linker which is run to prepare the binary for execution.

Looking at the two LOAD segments we can see the distinction between text and data. Notice how the first one has only "read" and "execute" permissions, whilst the next one has read, write and execute permissions. These describe the code (r/e) and data (r/w/e) segments.

But data should not need to be executable! Indeed, on most architectures (for example, the most common, x86) the data segment will not be marked executable. However, the example output above was taken from a PowerPC machine, which has a slightly different programming model (ABI, see below) requiring that the data segment be executable ^[25]. Such is the life of a systems programmer, where rules were made to be broken!

The other interesting thing to note is that the file size is the same as the memory size for the code segment, whereas the memory size is greater than the file size for the data segment. This comes from the BSS section, which holds zeroed global variables.

Debugging

Traditionally the primary method of post mortem debugging is referred to as the core dump. The term core comes from the original physical characteristics of magnetic core memory, which uses the orientation of small magnetic rings to store state.

Thus a core dump is simply a complete snapshot of the program as it was running at a particular time. A debugger can then be used to examine this dump and reconstruct the program state. Example 8.10, “Example of creating a core dump and using it with gdb™” shows a sample program that writes to an arbitrary memory location in order to force a crash. At this point the process is halted and a dump of its current state is recorded.

Example 8.10. Example of creating a core dump and using it with gdb™

    $ cat coredump.c
    int main(void) {
            char *foo = (char*)0x12345;
            *foo = 'a';

            return 0;
    }

    $ gcc -Wall -g -o coredump coredump.c

    $ ./coredump
    Segmentation fault (core dumped)

    $ file ./core
    ./core: ELF 32-bit LSB core file Intel 80386, version 1 (SYSV), SVR4-style, from './coredump'

    $ gdb ./coredump
    ...
    (gdb) core core
    [New LWP 31614]
    Core was generated by `./coredump'.
    Program terminated with signal 11, Segmentation fault.
    #0  0x080483c4 in main () at coredump.c:3
    3               *foo = 'a';
    (gdb)

Symbols and Debugging Information

As the example shows, the debugger gdb™ requires the original executable and the core dump to provide the debugging session. Note that the original executable was built with the -g flag, which instructs the compiler to include all debugging information. Debugging information is created by the compiler and kept in special sections of the ELF file. It describes in detail things like which registers currently hold which variables used in the code, the size of variables, the length of arrays, etc. It is generally in the standard DWARF format (a pun on the name ELF).

Including debugging information can make executable files and libraries very large; although this data is not required to be resident in memory for actually running the program, it can still take up considerable disk space. Thus the usual process is to strip this information from the ELF file. While it is possible to arrange for shipping of both stripped and unstripped files, almost all current binary distribution methods provide the debugging information in separate files. The objcopy™ tool can be used to extract the debugging information (--only-keep-debug) and then add a link in the original executable to this extracted information (--add-gnu-debuglink). After this is done, a special section called .gnu_debuglink will be present in the original executable, which contains a checksum so that when a debugging session starts the debugger can be sure it associates the right debugging information with the right executable.

Example 8.11. Example of stripping debugging information into separate files using objcopy™

    $ gcc -g -shared -o libtest.so libtest.c
    $ objcopy --only-keep-debug libtest.so libtest.debug
    $ objcopy --add-gnu-debuglink=libtest.debug libtest.so
    $ objdump -s -j .gnu_debuglink libtest.so

    libtest.so:     file format elf32-i386

    Contents of section .gnu_debuglink:
     0000 6c696274 6573742e 64656275 67000000  libtest.debug...
     0010 52a7fd0a                             R...

Symbols take up much less space, but are also targets for removal from final output. Once the individual object files of an executable are linked into the single final image there is generally no need for most symbols to remain. As discussed in the section called “Symbols and Relocations” symbols are required to fix up relocation entries, but once this is done the symbols are not strictly necessary for running the final program. On Linux the GNU toolchain strip™ program provides options to remove symbols. Note that some symbols are required to be resolved at run-time (for dynamic linking, the focus of Chapter 9, Dynamic Linking) but these are put in separate dynamic symbol tables so they will not be removed and render the final output useless.

Inside coredumps

A coredump is really just another ELF file; this illustrates the flexibility of ELF as a binary format.

Example 8.12. Example of using readelf™ and eu-readelf™ to examine a coredump.

    $ readelf --all ./core
    ELF Header:
      Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
      Class:                             ELF32
      Data:                              2's complement, little endian
      Version:                           1 (current)
      OS/ABI:                            UNIX - System V
      ABI Version:                       0
      Type:                              CORE (Core file)
      Machine:                           Intel 80386
      Version:                           0x1
      Entry point address:               0x0
      Start of program headers:          52 (bytes into file)
      Start of section headers:          0 (bytes into file)
      Flags:                             0x0
      Size of this header:               52 (bytes)
      Size of program headers:           32 (bytes)
      Number of program headers:         15
      Size of section headers:           0 (bytes)
      Number of section headers:         0
      Section header string table index: 0

    There are no sections in this file.

    There are no sections to group in this file.

    Program Headers:
      Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
      NOTE           0x000214 0x00000000 0x00000000 0x0022c 0x00000     0
      LOAD           0x001000 0x08048000 0x00000000 0x01000 0x01000 R E 0x1000
      LOAD           0x002000 0x08049000 0x00000000 0x01000 0x01000 RW  0x1000
      LOAD           0x003000 0x489fc000 0x00000000 0x01000 0x1b000 R E 0x1000
      LOAD           0x004000 0x48a17000 0x00000000 0x01000 0x01000 R   0x1000
      LOAD           0x005000 0x48a18000 0x00000000 0x01000 0x01000 RW  0x1000
      LOAD           0x006000 0x48a1f000 0x00000000 0x01000 0x153000 R E 0x1000
      LOAD           0x007000 0x48b72000 0x00000000 0x00000 0x01000     0x1000
      LOAD           0x007000 0x48b73000 0x00000000 0x02000 0x02000 R   0x1000
      LOAD           0x009000 0x48b75000 0x00000000 0x01000 0x01000 RW  0x1000
      LOAD           0x00a000 0x48b76000 0x00000000 0x03000 0x03000 RW  0x1000
      LOAD           0x00d000 0xb771c000 0x00000000 0x01000 0x01000 RW  0x1000
      LOAD           0x00e000 0xb774d000 0x00000000 0x02000 0x02000 RW  0x1000
      LOAD           0x010000 0xb774f000 0x00000000 0x01000 0x01000 R E 0x1000
      LOAD           0x011000 0xbfeac000 0x00000000 0x22000 0x22000 RW  0x1000

    There is no dynamic section in this file.

    There are no relocations in this file.

    There are no unwind sections in this file.

    No version information found in this file.

    Notes at offset 0x00000214 with length 0x0000022c:
      Owner                 Data size   Description
      CORE                 0x00000090   NT_PRSTATUS (prstatus structure)
      CORE                 0x0000007c   NT_PRPSINFO (prpsinfo structure)
      CORE                 0x000000a0   NT_AUXV (auxiliary vector)
      LINUX                0x00000030   Unknown note type: (0x00000200)

    $ eu-readelf -n ./core

    Note segment of 556 bytes at offset 0x214:
      Owner          Data size  Type
      CORE                 144  PRSTATUS
        info.si_signo: 11, info.si_code: 0, info.si_errno: 0, cursig: 11
        sigpend: <>
        sighold: <>
        pid: 31614, ppid: 31544, pgrp: 31614, sid: 31544
        utime: 0.000000, stime: 0.000000, cutime: 0.000000, cstime: 0.000000
        orig_eax: -1, fpvalid: 0
        ebx:     1219973108  ecx:     1243440144  edx:              1
        esi:              0  edi:              0  ebp:     0xbfecb828
        eax:          74565  eip:     0x080483c4  eflags:  0x00010286
        esp:     0xbfecb818
        ds: 0x007b  es: 0x007b  fs: 0x0000  gs: 0x0033  cs: 0x0073  ss: 0x007b
      CORE                 124  PRPSINFO
        state: 0, sname: R, zomb: 0, nice: 0, flag: 0x00400400
        uid: 1000, gid: 1000, pid: 31614, ppid: 31544, pgrp: 31614, sid: 31544
        fname: coredump, psargs: ./coredump
      CORE                 160  AUXV
        SYSINFO: 0xb774f414
        SYSINFO_EHDR: 0xb774f000
        HWCAP: 0xafe8fbff  <fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov clflush dts acpi mmx fxsr sse sse2 ss tm pbe>
        PAGESZ: 4096
        CLKTCK: 100
        PHDR: 0x8048034
        PHENT: 32
        PHNUM: 8
        BASE: 0
        FLAGS: 0
        ENTRY: 0x8048300
        UID: 1000
        EUID: 1000
        GID: 1000
        EGID: 1000
        SECURE: 0
        RANDOM: 0xbfecba1b
        EXECFN: 0xbfecdff1
        PLATFORM: 0xbfecba2b
        NULL
      LINUX                 48  386_TLS
        index: 6, base: 0xb771c8d0, limit: 0x000fffff, flags: 0x00000051
        index: 7, base: 0x00000000, limit: 0x00000000, flags: 0x00000028
        index: 8, base: 0x00000000, limit: 0x00000000, flags: 0x00000028

In Example 8.12, “Example of using readelf™ and eu-readelf™ to examine a coredump.” we can see an examination of the core file produced by Example 8.10, “Example of creating a core dump and using it with gdb™” using firstly the readelf™ tool. There are no sections, relocations or other extraneous information in the file that may be required for loading an executable or library; it simply consists of a series of program headers describing LOAD segments. These segments are raw data dumps, created by the kernel, of the current memory allocations.

The other component of the core dump is the NOTE segment, which contains data necessary for debugging but not necessarily captured in a straight snapshot of the memory allocations. The eu-readelf™ program used in the second part of the example provides a more complete view of the data by decoding it.
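The layout of the note data can be checked with a little arithmetic. In a 32-bit ELF file each note starts with a header of three 4-byte words (name size, descriptor size and type), followed by the owner name and then the descriptor, each padded up to a multiple of 4 bytes. The sketch below computes the space one note occupies; the function names are our own, and it assumes the name size counts the terminating NUL byte (so "CORE" is 5 and "LINUX" is 6).

```c
#include <assert.h>
#include <stdint.h>

/* Round n up to the next multiple of 4. */
uint32_t note_align4(uint32_t n)
{
    assert(n <= UINT32_MAX - 3u);   /* guard against wrap-around */
    return (n + 3u) & ~3u;
}

/* Bytes occupied by one note: a 12-byte header (name size, descriptor
 * size, type), then the padded name, then the padded descriptor. */
uint32_t note_total_size(uint32_t namesz, uint32_t descsz)
{
    return 12u + note_align4(namesz) + note_align4(descsz);
}
```

Applied to the four notes listed above (data sizes 144, 124, 160 and 48), this gives 164 + 144 + 180 + 68 = 556 bytes, which matches the 0x22c length readelf reports for the note data.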

The PRSTATUS note gives a range of interesting information about the process as it was running; for example we can see from cursig that the program received a signal 11, or segmentation fault, as we would expect. Along with process number information, it also includes a dump of all the current registers. Given the register values, the debugger can reconstruct the stack state and hence provide a backtrace; combined with the symbol and debugging information from the original binary the debugger can show exactly how you reached the current point of execution.

Another interesting output is the current auxiliary vector (AUXV), discussed in the section called “Kernel communication to programs”. The 386_TLS describes global descriptor table entries used for the x86 implementation of thread-local storage (see the section called “Fast System Calls” for more information on use of segmentation, and the section called “Threads” for information on threads^[26]).

The kernel creates the core dump file within the bounds of the current ulimit settings. Since a program using a lot of memory could result in a very large dump, potentially filling up the disk and making problems even worse, the ulimit is generally set low or even at zero; most non-developers have little use for a core dump file. However, the core dump remains the single most useful way to debug an unexpected situation in a postmortem fashion.

^[25]For those that are curious, the PowerPC ABI calls stubs for functions in dynamic libraries directly in the GOT, rather than having them bounce through a separate PLT entry. Thus the processor needs execute permissions for the GOT section, which you can see is embedded in the data segment. This should make sense after reading the dynamic linking chapter!

^[26]For a multi-threaded application, there would be duplicate entries for each thread running. The debugger will understand this, and it is how gdb™ implements the thread command to show and switch between threads.