ELF (OS side)

References:

ELF Data Types

(Elf32_Half, Elf32_Off, Elf32_Addr, Elf32_Word, Elf32_Sword)

Three important headers

  • ELF Header
  • Section Header table. Used during Linking.
  • Program Header table. Used during Execution.

ELF Header

An ELF file has many headers, but only one has a fixed position: the ELF header, which is present at the beginning of every file.

The ELF header provides information about the file, such as the machine type, architecture, and byte order, as well as a means of identifying the file and checking whether it is valid. It also provides information about the other parts of the file, such as the offsets of the program header table and the section header table (e_phoff and e_shoff).

// ?
# define ELF_NIDENT	16
 
typedef struct {
	uint8_t		e_ident[ELF_NIDENT];
	Elf32_Half	e_type;
	Elf32_Half	e_machine;
	Elf32_Word	e_version;
	Elf32_Addr	e_entry;
	Elf32_Off	e_phoff;
	Elf32_Off	e_shoff;
	Elf32_Word	e_flags;
	Elf32_Half	e_ehsize;
	Elf32_Half	e_phentsize;
	Elf32_Half	e_phnum;
	Elf32_Half	e_shentsize;
	Elf32_Half	e_shnum;
	Elf32_Half	e_shstrndx;
} Elf32_Ehdr;
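
A minimal sketch of how a loader might use e_ident to identify and validate a file. The EI_* indices and the ELFCLASS32 / ELFDATA2LSB values follow the ELF specification; the function names elf_check_magic and elf_is_32bit_le are made up for illustration and are not part of any standard API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ELF_NIDENT 16

/* Indices into e_ident, as defined by the ELF specification. */
enum { EI_MAG0 = 0, EI_MAG1 = 1, EI_MAG2 = 2, EI_MAG3 = 3, EI_CLASS = 4, EI_DATA = 5 };

#define ELFCLASS32  1 /* 32-bit objects */
#define ELFDATA2LSB 1 /* little-endian encoding */

/* Hypothetical helper: true if the buffer starts with 0x7f 'E' 'L' 'F'. */
static bool elf_check_magic(const uint8_t *ident, size_t len) {
    if (ident == NULL || len < ELF_NIDENT)
        return false;
    return ident[EI_MAG0] == 0x7f && ident[EI_MAG1] == 'E' &&
           ident[EI_MAG2] == 'L'  && ident[EI_MAG3] == 'F';
}

/* Hypothetical helper: true for a 32-bit, little-endian ELF image. */
static bool elf_is_32bit_le(const uint8_t *ident, size_t len) {
    return elf_check_magic(ident, len) &&
           ident[EI_CLASS] == ELFCLASS32 &&
           ident[EI_DATA] == ELFDATA2LSB;
}
```

A loader would typically run checks like these before trusting any other field of the header.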

ELF Section Header Table

An ELF file contains many different types of sections and their relevant headers. Not all of them are present in every file, and there is no guarantee about the order in which they appear. Thus, in order to parse and process these sections, the ELF format also defines section headers, which contain information such as section names, sizes, locations and other relevant information. The list of all the section headers in an ELF image is referred to as the section header table.

// https://github.com/freebsd/freebsd/blob/master/sys/sys/elf32.h

/*
 * Section header.
 */

typedef struct {
	Elf32_Word	sh_name;	/* Section name (index into the
					   section header string table). */
	Elf32_Word	sh_type;	/* Section type. */
	Elf32_Word	sh_flags;	/* Section flags. */
	Elf32_Addr	sh_addr;	/* Address in memory image. */
	Elf32_Off	sh_offset;	/* Offset in file. */
	Elf32_Word	sh_size;	/* Size in bytes. */
	Elf32_Word	sh_link;	/* Index of a related section. */
	Elf32_Word	sh_info;	/* Depends on section type. */
	Elf32_Word	sh_addralign;	/* Alignment in bytes. */
	Elf32_Word	sh_entsize;	/* Size of each entry in section. */
} Elf32_Shdr;

sh_name does not point directly to a string. Instead it holds the offset of a string in the section name string table. The section header index of that table is given in the ELF header by the field e_shstrndx.
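
The indirection through e_shstrndx can be sketched as follows. This is an illustration under assumptions: the whole file is assumed to be in memory at image, and elf_section_name is a hypothetical helper, not a standard function. The struct repeats the Elf32_Shdr layout so the snippet stands on its own.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Elf32_Word;
typedef uint32_t Elf32_Addr;
typedef uint32_t Elf32_Off;

typedef struct {
    Elf32_Word sh_name, sh_type, sh_flags;
    Elf32_Addr sh_addr;
    Elf32_Off  sh_offset;
    Elf32_Word sh_size, sh_link, sh_info, sh_addralign, sh_entsize;
} Elf32_Shdr;

/* Hypothetical helper: resolve the name of section `sec`.
 * image/image_size: the whole file in memory.
 * shdrs/shnum:      the section header table and its entry count.
 * shstrndx:         e_shstrndx from the ELF header.
 * Returns NULL if the string table or the offset is out of range. */
static const char *elf_section_name(const uint8_t *image, size_t image_size,
                                    const Elf32_Shdr *shdrs, uint16_t shnum,
                                    uint16_t shstrndx, const Elf32_Shdr *sec) {
    if (shstrndx == 0 || shstrndx >= shnum)
        return NULL; /* index 0 (SHN_UNDEF) means there is no name table */
    const Elf32_Shdr *strtab = &shdrs[shstrndx];
    if ((uint64_t)strtab->sh_offset + strtab->sh_size > image_size)
        return NULL; /* string table does not fit in the file */
    if (sec->sh_name >= strtab->sh_size)
        return NULL; /* name offset lies outside the string table */
    return (const char *)(image + strtab->sh_offset + sec->sh_name);
}
```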

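sh_addr is the address of the section in the memory image, or 0 if the section is not loaded (described in more detail further down).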

sh_offset is the position in the ELF image file, as an offset from the beginning of the file.

sh_type stores the type of the section. The value is one of the values of the enum ShT_Types (see below).

// 
enum ShT_Types {
	SHT_NULL	= 0,   // Null section
	SHT_PROGBITS	= 1,   // Program information
	SHT_SYMTAB	= 2,   // Symbol table
	SHT_STRTAB	= 3,   // String table
	SHT_RELA	= 4,   // Relocation (w/ addend)
	SHT_HASH	= 5,   // Symbol hash table
	SHT_DYNAMIC	= 6,   // Dynamic linking information
	SHT_NOTE	= 7,   // Note section
	SHT_NOBITS	= 8,   // Not present in file
	SHT_REL		= 9,   // Relocation (no addend)
	SHT_SHLIB	= 10,  // Reserved
	SHT_DYNSYM	= 11,  // Dynamic linking symbol table
	// Values in the inclusive range SHT_LOPROC..SHT_HIPROC are
	// reserved for processor-specific semantics.
	SHT_LOPROC	= 0x70000000,
	SHT_HIPROC	= 0x7fffffff,
	// Section types between SHT_LOUSER and SHT_HIUSER may be used by
	// the application, without conflicting with current or future
	// system-defined section types.
	SHT_LOUSER	= 0x80000000, // Lower bound of the range reserved for application programs
	SHT_HIUSER	= 0xffffffff, // Upper bound of the range reserved for application programs
};

A full list of section types in LLVM:

Section Types in LLVM

sh_addr: If the section will appear in the memory image of a process, this member gives the address at which the section’s first byte should reside. Otherwise, the member contains 0.

sh_link: This member holds a section header table index link, whose interpretation depends on the section type.

sh_info: This member holds extra information, whose interpretation depends on the section type.

Note: by default, SHT_PROGBITS sections do not use sh_link and sh_info.

sh_link and sh_info

Another table from Oracle book:

section types and sh_link/info
sh_entsize and sh_flags

sh_entsize: Some sections hold a table of fixed-size entries, such as a symbol table. For such a section, this member gives the size in bytes of each entry. The member contains 0 if the section does not hold a table of fixed-size entries.

sh_flags stores bit flags that describe the section attributes.

A list of section flags/attributes that can be set in sh_flags in LLVM:

Section Flags in LLVM
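
As a small illustration of how the bit flags are tested, the sketch below renders the three basic attributes as letters, similar in spirit to the flag column that tools such as readelf print. The SHF_* values are the ones defined by the ELF specification; shf_to_string is a hypothetical helper name.

```c
#include <stddef.h>
#include <stdint.h>

/* Section attribute flags from the ELF specification. */
#define SHF_WRITE     0x1 /* writable during execution */
#define SHF_ALLOC     0x2 /* occupies memory during execution */
#define SHF_EXECINSTR 0x4 /* contains executable instructions */

/* Hypothetical helper: write "W", "A", "X" letters for the set flags.
 * `out` must have room for at least 4 bytes (3 letters + NUL). */
static void shf_to_string(uint32_t sh_flags, char out[4]) {
    size_t n = 0;
    if (sh_flags & SHF_WRITE)     out[n++] = 'W';
    if (sh_flags & SHF_ALLOC)     out[n++] = 'A';
    if (sh_flags & SHF_EXECINSTR) out[n++] = 'X';
    out[n] = '\0';
}
```

For example, a typical .text section has SHF_ALLOC and SHF_EXECINSTR set, while a typical .data section has SHF_WRITE and SHF_ALLOC.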

To access section header:

  • e_shoff in the ELF header gives the file offset of the first section header (the NULL entry).
  • e_shnum in the ELF header gives the total number of section headers in the file.
  • Section headers are contiguous. Given a pointer to the first entry, subsequent entries can be accessed with simple pointer arithmetic or array operations.

    // ?
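    static inline Elf32_Shdr *elf_sheader(Elf32_Ehdr *hdr) {
    	// Byte-wise pointer arithmetic; casting the pointer to int would truncate it on 64-bit hosts.
    	return (Elf32_Shdr *)((uint8_t *)hdr + hdr->e_shoff);
    }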
     
    static inline Elf32_Shdr *elf_section(Elf32_Ehdr *hdr, int idx) {
    	return &elf_sheader(hdr)[idx];
    }
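
Before doing pointer arithmetic like the above on an untrusted file, a loader would probably want to check that the whole table lies inside the file. A minimal sketch, assuming the file size is known; elf_shtab_in_bounds and ELF32_SHDR_SIZE are names invented for this example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Size of one Elf32_Shdr: ten 4-byte fields. */
#define ELF32_SHDR_SIZE 40u

/* Hypothetical helper: true if the section header table described by
 * e_shoff / e_shnum / e_shentsize fits inside a file of file_size bytes. */
static bool elf_shtab_in_bounds(size_t file_size, uint32_t e_shoff,
                                uint16_t e_shnum, uint16_t e_shentsize) {
    if (e_shnum == 0)
        return true; /* no section header table to check */
    if (e_shentsize < ELF32_SHDR_SIZE)
        return false; /* entries too small to hold an Elf32_Shdr */
    /* Use 64-bit arithmetic so the sum cannot wrap around. */
    uint64_t end = (uint64_t)e_shoff + (uint64_t)e_shnum * e_shentsize;
    return end <= (uint64_t)file_size;
}
```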

ELF Program Header Table

A program header defines information about how the ELF program behaves once it’s been loaded, as well as runtime linking information.

Files used to build a process image (execute a program) must have a program header table; relocatable files do not need one.

ELF program headers (much like section headers) are all grouped together to make up the program header table.

// ?
typedef struct {
	Elf32_Word		p_type;
	Elf32_Off		p_offset;
	Elf32_Addr		p_vaddr;
	Elf32_Addr		p_paddr;
	Elf32_Word		p_filesz;
	Elf32_Word		p_memsz;
	Elf32_Word		p_flags;
	Elf32_Word		p_align;
} Elf32_Phdr;

Reference: Program Header (Linker and Libraries Guide)

“An executable or shared object file’s program header table is an array of structures, each describing a segment or other information that the system needs to prepare the program for execution. An object file segment contains one or more sections.”

Segment Contents: Text segments contain read-only instructions and data. Data segments contain writable data and instructions. See more about Segment Contents.

A PT_DYNAMIC program header element points at the .dynamic section. The .got and .plt sections also hold information related to position-independent code and dynamic linking.

The .plt can reside in a text or a data segment, depending on the processor. See processor specific GOT, and Processor specific PLT.

The .bss section has the type SHT_NOBITS. Normally, this uninitialized data resides at the end of the segment, thereby making p_memsz larger than p_filesz in the associated program header element.

  • p_type. The kind of segment this array element describes or how to interpret the array element’s information. Example types:
    • PT_NULL, 0.
    • PT_LOAD, 1. A loadable segment. Described by p_filesz and p_memsz. The bytes from the file are mapped to the beginning of the memory segment. If the segment’s memory size (p_memsz) is larger than the file size (p_filesz), the extra bytes are defined to hold the value 0 and to follow the segment’s initialized area. The file size cannot be larger than the memory size. Loadable segment entries in the program header table appear in ascending order, sorted on the p_vaddr member.
    • PT_DYNAMIC, 2.
    • PT_INTERP, 3.
    • PT_NOTE, 4.
    • PT_SHLIB, 5.
    • PT_PHDR, 6.
    • PT_LOSUNW, 0x6ffffffa.
  • p_offset. The offset from the beginning of the file at which the first byte of the segment resides.
  • p_vaddr. The virtual address at which the first byte of the segment resides in memory.
  • p_paddr. The segment’s physical address for systems in which physical addressing is relevant. Because the system ignores physical addressing for application programs, this member has unspecified contents for executable files and shared objects.
  • p_filesz. The number of bytes in the file image of the segment, which can be zero.
  • p_memsz. The number of bytes in the memory image of the segment, which can be zero.
  • p_flags. Flags relevant to the segment. Examples:
    • PF_X, 0x1, Execute
    • PF_W, 0x2, Write
    • PF_R, 0x4, Read
    • PF_MASKPROC, 0xf0000000. Unspecified.
  • p_align. Values 0 and 1 mean no alignment is required. Otherwise p_align should be a positive, integral power of 2, and p_vaddr % p_align should equal p_offset % p_align.
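
The PT_LOAD rules (copy p_filesz bytes, zero the rest up to p_memsz) and the p_align congruence can be sketched like this. It is only an illustration: a real loader maps pages at p_vaddr rather than copying into a caller-supplied buffer, and load_segment and segment_alignment_ok are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy one loadable segment from `file` into `dst`.
 * Bytes [0, p_filesz) come from the file; bytes [p_filesz, p_memsz) are
 * zero-filled. Returns 0 on success, -1 if the header is inconsistent. */
static int load_segment(uint8_t *dst, size_t dst_size,
                        const uint8_t *file, size_t file_size,
                        uint32_t p_offset, uint32_t p_filesz, uint32_t p_memsz) {
    if (p_filesz > p_memsz)
        return -1; /* file size may not exceed memory size */
    if (p_memsz > dst_size)
        return -1; /* destination buffer too small */
    if ((uint64_t)p_offset + p_filesz > file_size)
        return -1; /* segment extends past the end of the file */
    memcpy(dst, file + p_offset, p_filesz);
    memset(dst + p_filesz, 0, p_memsz - p_filesz);
    return 0;
}

/* Hypothetical helper: check the p_align constraints described above. */
static bool segment_alignment_ok(uint32_t p_vaddr, uint32_t p_offset,
                                 uint32_t p_align) {
    if (p_align <= 1)
        return true; /* 0 and 1 mean no alignment is required */
    if ((p_align & (p_align - 1)) != 0)
        return false; /* not a power of 2 */
    return (p_vaddr % p_align) == (p_offset % p_align);
}
```

The zero-filled tail is where .bss ends up when it sits at the end of a data segment.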

Special Sections

Various sections in ELF are predefined and hold program and control information. These sections are used by the operating system and have different types and attributes on different operating systems.

Section names with a dot . prefix are reserved for the system. Applications may use names without the prefix to avoid conflicts with system sections.

An object file may have more than one section with the same name.

Executables are created from individual object files and libraries through the linking process. The linker’s tasks include:

  • resolving the references (including subroutine and data references) among the different object files
  • adjusting the absolute references in the object files
  • relocating instructions.

The linking and loading processes require information defined in the object files; this information is stored in specific sections such as .dynamic.

There are also sections for program control, including .bss, .data, .data1, .rodata, and .rodata1, and sections for debugging, such as .debug, .line, etc.

A list of special sections for the ELF specification:

special sections

Symbol Table Section

A symbol table is a section (or a number of sections) that defines the location, type, visibility and other traits of various symbols declared in the original source, created during compilation or linking, or otherwise present in the file.

More info Symbol Table
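
For reference, a sketch of what one symbol table entry looks like and how its binding and type are packed into st_info. The Elf32_Sym layout, the ELF32_ST_* macros, and the STB_* / STT_* values are taken from the ELF specification (they are not shown elsewhere in these notes); sym_is_global_function is a hypothetical helper.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint16_t Elf32_Half;
typedef uint32_t Elf32_Word;
typedef uint32_t Elf32_Addr;

/* Symbol table entry, per the ELF specification (16 bytes). */
typedef struct {
    Elf32_Word    st_name;  /* offset into the associated string table */
    Elf32_Addr    st_value; /* value (address or offset) of the symbol */
    Elf32_Word    st_size;  /* size of the associated object, if any */
    unsigned char st_info;  /* binding (high 4 bits) and type (low 4 bits) */
    unsigned char st_other; /* visibility */
    Elf32_Half    st_shndx; /* index of the section the symbol is defined in */
} Elf32_Sym;

#define ELF32_ST_BIND(i)    ((i) >> 4)
#define ELF32_ST_TYPE(i)    ((i) & 0xf)
#define ELF32_ST_INFO(b, t) (((b) << 4) + ((t) & 0xf))

enum { STB_LOCAL = 0, STB_GLOBAL = 1, STB_WEAK = 2 };
enum { STT_NOTYPE = 0, STT_OBJECT = 1, STT_FUNC = 2, STT_SECTION = 3, STT_FILE = 4 };

/* Hypothetical helper: true if the symbol is a globally bound function. */
static bool sym_is_global_function(const Elf32_Sym *sym) {
    return ELF32_ST_BIND(sym->st_info) == STB_GLOBAL &&
           ELF32_ST_TYPE(sym->st_info) == STT_FUNC;
}
```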

String Table Section

A string table is a number of consecutive zero-terminated strings.

The object file uses these strings to represent symbol and section names.

.strtab, the default string table.

.shstrtab, the section string table.

.dynstr, the string table for dynamic linking.

Anytime the loading process needs access to a string, it uses an offset into one of the string tables.

sh_size in the corresponding section header entry specifies the size of the string table.

The simplest program loader may copy all string tables into memory, but a more complete solution would omit any that are not necessary during runtime, notably those not flagged with SHF_ALLOC in their respective section header (such as .shstrtab, since section names aren’t used in program runtime).
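
A minimal sketch of an offset lookup into a string table, plus the SHF_ALLOC test a loader could use to decide whether a table is needed at runtime. Both function names are hypothetical. The check on the last byte relies on the ELF specification's rule that a string table ends with a null byte, which guarantees every string in it is terminated.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SHF_ALLOC 0x2 /* section occupies memory during execution */

/* Hypothetical helper: return the string at offset `off` in a string
 * table of `size` bytes, or NULL if the table or the offset is invalid. */
static const char *strtab_get(const char *tab, size_t size, size_t off) {
    if (tab == NULL || size == 0)
        return NULL;
    if (tab[size - 1] != '\0')
        return NULL; /* table is not terminated, so strings could overrun it */
    if (off >= size)
        return NULL; /* offset lies outside the table */
    return tab + off;
}

/* Hypothetical helper: only sections flagged SHF_ALLOC are needed in memory. */
static bool section_needed_at_runtime(uint32_t sh_flags) {
    return (sh_flags & SHF_ALLOC) != 0;
}
```

Note that an offset may point into the middle of a string; the specification allows references to substrings.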

BSS

.bss: a block of memory that is zero-filled when the program is loaded. It holds global variables that are uninitialized or initialized to 0 or NULL.

Its type (sh_type) is SHT_NOBITS, which means the section occupies no space in the object file but must be allocated at runtime.

BSS should be allocated before performing any operation that relies on relative addressing (such as relocation), as failing to do so can cause code to reference garbage memory or fault.

Any section that is of type SHT_NOBITS and has the attribute SHF_ALLOC should be allocated early on during program loading.

// ?
static int elf_load_stage1(Elf32_Ehdr *hdr) {
	Elf32_Shdr *shdr = elf_sheader(hdr);
 
	unsigned int i;
	// Iterate over section headers
	for(i = 0; i < hdr->e_shnum; i++) {
		Elf32_Shdr *section = &shdr[i];
 
		// If the section isn't present in the file
		if(section->sh_type == SHT_NOBITS) {
			// Skip if the section is empty
			if(!section->sh_size) continue;
			// If the section should appear in memory
			if(section->sh_flags & SHF_ALLOC) {
				// Allocate and zero some memory; bail out if allocation fails
				void *mem = malloc(section->sh_size);
				if(!mem) return -1;
				memset(mem, 0, section->sh_size);
 
				// Store the memory location as an offset from the image base.
				// Note: the difference only fits in the 32-bit sh_offset on a 32-bit target.
				section->sh_offset = (Elf32_Off)((uintptr_t)mem - (uintptr_t)hdr);
				DEBUG("Allocated memory for a section (%lu).\n", (unsigned long)section->sh_size);
			}
		}
	}
	return 0;
}

Relocation Sections

Position independent code.

A relocation section is a table of relocation entries.

There are two types of relocation section: SHT_RELA, whose entries carry an explicit addend, and SHT_REL, whose entries do not. A given relocation section holds only one type of entry.

// ?
typedef struct {
	Elf32_Addr		r_offset;
	Elf32_Word		r_info;
} Elf32_Rel;
 
typedef struct {
	Elf32_Addr		r_offset;
	Elf32_Word		r_info;
	Elf32_Sword		r_addend;
} Elf32_Rela;

The upper 24 bits of r_info hold the index of the symbol in the symbol table to which the relocation applies; the lowest byte stores the type of relocation. sh_link in the relocation section header stores the index of the symbol table section header.

Number of entries = sh_size (section size) / sh_entsize (entry size)

Each relocation table is specific to a single section.
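
The r_info packing and the entry count can be written down as a short sketch. The ELF32_R_* macros are the ones defined by the ELF specification for 32-bit files (symbol index in the upper 24 bits, relocation type in the low 8 bits); rel_entry_count is a hypothetical helper.

```c
#include <stdint.h>

/* Macros from the ELF specification (32-bit class). */
#define ELF32_R_SYM(info)       ((info) >> 8)
#define ELF32_R_TYPE(info)      ((uint8_t)(info))
#define ELF32_R_INFO(sym, type) (((sym) << 8) + (uint8_t)(type))

/* Hypothetical helper: number of entries in a relocation section,
 * computed from its section header fields. Returns 0 if sh_entsize is 0. */
static uint32_t rel_entry_count(uint32_t sh_size, uint32_t sh_entsize) {
    return sh_entsize ? sh_size / sh_entsize : 0;
}
```

For example, an r_info of 0x502 refers to symbol index 5 with relocation type 2; what type 2 means depends on the processor.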

CTOR/DTOR

The .ctors and .dtors sections store the addresses of global constructors and destructors.

Global constructors are supposed to run before your main function.

Each section is a table of pointers, and each pointer refers to a function that must be executed as a global constructor or destructor.
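
Walking such a table of function pointers could look like the sketch below. This is an assumption-laden illustration: the order in which entries run and any sentinel values at the table boundaries vary by toolchain, and many current toolchains emit .init_array / .fini_array instead. run_ctors, demo_ctor_a, demo_ctor_b and demo_counter are hypothetical names used only here.

```c
#include <stddef.h>

typedef void (*ctor_fn)(void);

/* Hypothetical helper: call every non-NULL entry in [start, end). */
static void run_ctors(const ctor_fn *start, const ctor_fn *end) {
    for (const ctor_fn *fn = start; fn < end; fn++) {
        if (*fn != NULL)
            (*fn)();
    }
}

/* Demo constructors standing in for compiler-generated ones. */
static int demo_counter = 0;
static void demo_ctor_a(void) { demo_counter += 1; }
static void demo_ctor_b(void) { demo_counter += 10; }
```

In a kernel or freestanding loader, start and end would typically come from linker-script symbols that bracket the section; here they are just the bounds of an ordinary array.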

More at CTOR/DTOR

  • Loading
    • References: How programs get run: ELF binaries; How programs get run: execve() system calls; ELF文件的加载过程(load_elf_binary函数详解)–Linux (The ELF file loading process: load_elf_binary() explained, Linux).
    • Linux src: fs/binfmt_elf.c: load_elf_binary(); load_elf_phdrs(), load the program headers; load_elf_interp(), load_elf_library(), ?; elf_core_dump().

  • Relocation Section
    • References: ELF specification; Computer Systems: A Programmer’s Perspective, Chapter 7.7; PLT and GOT - the key to code sharing and dynamic libraries; GOT and PLT for pwning.
    • Relocation is the process of connecting symbolic references with symbolic definitions. For example, when a program calls a function, the associated call instruction must transfer control to the proper destination address at execution. In other words, relocatable files must have information that describes how to modify their section contents, thus allowing executable and shared object files to hold the right information for a process’s program image.

  • Text
    • Reference reference

  • Dwarf Debugging Information
    • References: DWARF Debugging Information Format; .debug_frame section; DWARF in LLVM; How debuggers work: Part 3 - Debugging information.
    • Machine code -> source code file, function name, and line numbers.
    • DWARF sections: the .debug section, and all the sections whose names begin with .debug: .debug_info, .debug_loc, .debug_frame, …
    • DWARF Format: Debugging Information Entry (DIE). Each DIE has a tag (its type) and a set of attributes.

  • Symbol Table Section
    • Q&A: How does LLVM generate Symbol Table section in an object file?
    • References: ELF specification; Computer Systems: A Programmer’s Perspective, Chapter 7.6.
    • Symbol Table Section: The section .symtab holds a symbol table. The object file uses the symbol table to locate and relocate a program’s symbolic definitions and references. The first entry is always the undefined symbol. If a file has a loadable segment that includes the symbol table, this symbol section’s attributes will include the SHF_ALLOC bit; otherwise the bit will be off.

  • How Global is stored and accessed in Binary?
    • Reference

1. Calling Global Constructors – CTOR/DTOR

Created Jul 12, 2020 // Last Updated Aug 5, 2020
