Loading

Reference

Program Header Table

An ELF file for an executable program (rather than a shared library or an object file) must always contain a program header table near the start of the file, after the ELF header; each entry in this table provides information that is needed to run the program. The kernel only really cares about three types of program header entries:

  • Entry with PT_LOAD type. This type of entry describes areas of the new program’s running memory. This includes code and data sections that come from the executable file, together with the size of a BSS section.
  • Entry with PT_INTERP type. This type of entry identifies the run-time linker needed to assemble the complete program during dynamic linking.
  • Entry with PT_GNU_STACK type. This type of entry stores a one-bit value (the PF_X flag in its p_flags field) flagging whether the program’s stack should be made executable or not.
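To make the three entry types concrete, here is a minimal sketch that scans the program header table of a 64-bit ELF image already read into memory, using the Elf64_Ehdr/Elf64_Phdr types and constants from glibc’s <elf.h>. The function summarize_phdrs() and struct phdr_summary are hypothetical names for illustration. The sketch assumes an ELF64 file in the host byte order, and it does not model what the kernel does when no PT_GNU_STACK entry is present.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical summary of the three entry types the kernel cares about. */
struct phdr_summary {
    int load_count;    /* number of PT_LOAD entries */
    int has_interp;    /* 1 if a PT_INTERP entry exists */
    int has_gnu_stack; /* 1 if a PT_GNU_STACK entry exists */
    int exec_stack;    /* 1 if PT_GNU_STACK carries the PF_X flag */
};

/* Scan the program header table of a 64-bit ELF image held in memory.
 * Returns 0 on success, -1 if the buffer is not a usable ELF64 file. */
static int summarize_phdrs(const unsigned char *buf, size_t len,
                           struct phdr_summary *out)
{
    Elf64_Ehdr eh;

    assert(buf != NULL && out != NULL);
    if (len < sizeof eh)
        return -1;
    memcpy(&eh, buf, sizeof eh);   /* copy to avoid unaligned access */

    /* Basic header checks: ELF magic, 64-bit class, expected entry size. */
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 ||
        eh.e_phentsize != sizeof(Elf64_Phdr))
        return -1;

    /* Make sure the whole table lies inside the buffer. */
    if (eh.e_phoff > len ||
        (size_t)eh.e_phnum * sizeof(Elf64_Phdr) > len - eh.e_phoff)
        return -1;

    memset(out, 0, sizeof *out);
    for (size_t i = 0; i < eh.e_phnum; i++) {
        Elf64_Phdr ph;
        memcpy(&ph, buf + eh.e_phoff + i * sizeof ph, sizeof ph);
        switch (ph.p_type) {
        case PT_LOAD:      /* a segment to map into memory */
            out->load_count++;
            break;
        case PT_INTERP:    /* path of the run-time linker */
            out->has_interp = 1;
            break;
        case PT_GNU_STACK: /* stack permissions live in p_flags */
            out->has_gnu_stack = 1;
            out->exec_stack = (ph.p_flags & PF_X) != 0;
            break;
        default:           /* other entry types are ignored here */
            break;
        }
    }
    return 0;
}
```

In practice the buffer would hold the start of a file such as /proc/self/exe; copying headers with memcpy() keeps the sketch safe against misaligned buffers.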

Initial Load: struct linux_binprm

Execution starts in do_execve() in fs/exec.c. The main purpose of this function is to build a new struct linux_binprm instance that describes the current program invocation operation. Its key fields and setup steps are:

  • char buf[BINPRM_BUF_SIZE] is filled with the first chunk of data from the program file (128 bytes in the kernels described here; BINPRM_BUF_SIZE was raised to 256 in Linux 5.1). This data will be used later to detect the binary format so it can be processed appropriately.
  • struct file *file: the opened program file.
  • const char *filename: the name of the binary as seen by procps.
  • const char *interp: the name of the binary really executed. Most of the time this is the same as filename, but it can be different for binfmt_misc and binfmt_script.
  • bprm_mm_init() allocates and sets up the struct mm_struct and struct vm_area_struct that will manage the new program’s virtual memory.
  • p is set to point at the end of memory space. The value of p will be updated (downward) as more information is added to the new program’s stack.
  • cred is a separately allocated object of type struct cred.

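Next, the information about the program invocation is copied to the top of the new program’s stack, using the local copy_strings() and copy_strings_kernel() utility functions. The stack will contain the file name, the environment variables, and the command-line arguments; they are copied in that order, so the file name ends up at the highest address.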
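The downward-moving p pointer and the copy order can be illustrated with a small user-space toy model. Everything here (STACK_SIZE, struct toy_bprm, toy_copy_string(), toy_copy_strings()) is hypothetical; the real kernel copies into pages of the new mm, and the argv/envp pointer arrays are built later (for ELF, by create_elf_tables()). The sketch only shows that strings are pushed below p, each array is copied last-to-first, and so each array ends up in order in memory.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model of the top of a new program's stack (illustrative only). */
#define STACK_SIZE 4096

struct toy_bprm {
    char stack[STACK_SIZE];
    size_t p;              /* offset of the lowest byte used so far */
};

static void toy_bprm_init(struct toy_bprm *b)
{
    b->p = STACK_SIZE;     /* p starts at the very end and moves down */
}

/* Copy one NUL-terminated string just below p.
 * Returns the new value of p, or -1 if the toy stack is full. */
static long toy_copy_string(struct toy_bprm *b, const char *s)
{
    size_t n = strlen(s) + 1;      /* include the terminating NUL */
    if (n > b->p)
        return -1;
    b->p -= n;
    memcpy(b->stack + b->p, s, n);
    return (long)b->p;
}

/* Copy an array of strings last-to-first, so that v[0] ends up at the
 * lowest address and the strings read in order going upward. */
static int toy_copy_strings(struct toy_bprm *b, int count,
                            const char *const *v)
{
    for (int i = count - 1; i >= 0; i--)
        if (toy_copy_string(b, v[i]) < 0)
            return -1;
    return 0;
}
```

Calling toy_copy_string() for the file name, then toy_copy_strings() for the environment and the arguments, mirrors the order described above; afterwards p points at argv[0].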

References: struct linux_binprm definition; do_execve() definition.

Determine Binary Handler: struct linux_binfmt

With a completed struct linux_binprm, program execution is performed in exec_binprm() and (more importantly) search_binary_handler(). search_binary_handler() iterates over a list of struct linux_binfmt objects, each of which provides a handler for a particular format of binary programs.

For each struct linux_binfmt handler object, the load_binary() function pointer is called, passing in the linux_binprm object. If the handler code supports the binary format, it does whatever is needed to prepare the program for execution and returns success (>=0). Otherwise, the handler returns a failure code (<0, normally -ENOEXEC) and iteration continues with the next handler.
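The handler loop can be sketched as follows. The types toy_binprm and toy_binfmt, the two handlers, and toy_search_binary_handler() are hypothetical simplifications: the kernel walks a linked list of registered formats under a lock and manages module references, whereas this sketch uses a static array. It does follow the kernel’s rule that only -ENOEXEC means “try the next handler”; any other result is returned immediately.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for struct linux_binprm / struct linux_binfmt. */
struct toy_binprm {
    unsigned char buf[128];   /* first bytes of the program file */
    const char *handled_by;   /* set by whichever handler accepts */
};

struct toy_binfmt {
    const char *name;
    int (*load_binary)(struct toy_binprm *bprm);
};

static int load_elf(struct toy_binprm *bprm)
{
    if (memcmp(bprm->buf, "\177ELF", 4) != 0)
        return -ENOEXEC;          /* not ours: let the next handler try */
    bprm->handled_by = "elf";
    return 0;
}

static int load_script(struct toy_binprm *bprm)
{
    if (bprm->buf[0] != '#' || bprm->buf[1] != '!')
        return -ENOEXEC;
    bprm->handled_by = "script";
    return 0;
}

static const struct toy_binfmt formats[] = {
    { "elf", load_elf },
    { "script", load_script },
};

/* Try each handler in turn; stop at success or at any error other
 * than -ENOEXEC. */
static int toy_search_binary_handler(struct toy_binprm *bprm)
{
    int retval = -ENOEXEC;
    for (size_t i = 0; i < sizeof formats / sizeof formats[0]; i++) {
        retval = formats[i].load_binary(bprm);
        if (retval != -ENOEXEC)
            return retval;
    }
    return retval;
}
```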

If no format that can handle the program has been found (and the program appears to be binary rather than text, at least according to the first four bytes), then the code will also attempt to load a module named “binfmt-XXXX”, where XXXX is the hex value of bytes three and four in the program file. This is an old mechanism (added in 1996 for Linux 1.3.57) to allow for a more dynamic way of associating binary format handlers with formats; the more recent binfmt_misc mechanism allows a more flexible way of doing something similar.
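A sketch of how the module name could be derived. The helper names printable() and binfmt_module_name() are hypothetical here; the printable test mirrors the idea of “looks like text” (tab, newline, or 0x20 to 0x7e). One subtlety: the kernel reads bytes three and four as a native-endian unsigned short, so on a little-endian machine the fourth byte appears first in the hex string. This sketch assumes little-endian order explicitly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* A byte counts as text if it is a tab, a newline, or printable ASCII. */
static bool printable(unsigned char c)
{
    return c == '\t' || c == '\n' || (c >= 0x20 && c <= 0x7e);
}

/* Build the "binfmt-XXXX" module name for an unrecognised binary.
 * Returns false (no module requested) if the first four bytes all look
 * like text. Bytes 3 and 4 (offsets 2 and 3) are combined as a
 * little-endian 16-bit value, as on a little-endian host. */
static bool binfmt_module_name(const unsigned char buf[4],
                               char *out, size_t outlen)
{
    if (printable(buf[0]) && printable(buf[1]) &&
        printable(buf[2]) && printable(buf[3]))
        return false;
    unsigned value = (unsigned)buf[2] | ((unsigned)buf[3] << 8);
    snprintf(out, outlen, "binfmt-%04x", value);
    return true;
}
```

For a file starting with the bytes 7f 01 34 12, this yields “binfmt-1234”.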

Loading ELF binary

This is handled by the load_elf_binary() function in fs/binfmt_elf.c:

  • check that the ELF header is valid (magic number, file type, architecture).
  • read the full ELF program header table into some scratch space.
  • prepare the attributes of the new program:
    • loop over the program header entries, checking for a PT_INTERP entry and, from the PT_GNU_STACK entry, whether the program’s stack should be executable.
    • initialize the attributes of the new program that are not inherited from the old program.
    • for details of how each attribute behaves, see the exec specification in the Single UNIX Specification.
    • table 28-4 of The Linux Programming Interface summarizes the attributes involved.
  • call flush_old_exec(): clear out the kernel state that refers to the previous program:
    • previous threads,
    • signal-handling info,
    • timers,
    • location of the exe file (visible at /proc/pid/exe)
    • virtual memory mappings, I/O operations, uprobes.
    • personality
  • call setup_new_exec(): set up the kernel’s internal state for the new program.
    • call __set_task_comm(): set the task’s comm field to the basename of the invoked file name. It is used as the thread name and is accessible to user space via the PR_GET_NAME and PR_SET_NAME prctl() operations.
    • call flush_signal_handlers(): reset the signal handlers for the new program (caught signals revert to their default disposition).
    • call do_close_on_exec(): close all of the old program’s file descriptors that have the O_CLOEXEC flag set; other file descriptors will be inherited by the new program.
  • set up virtual memory: map the PT_LOAD segments, set up the stack and the BSS/heap, and, if there is a PT_INTERP entry, load the dynamic linker.

  • set up credentials with install_exec_creds(). This function lets any Linux Security Module (LSM) know about the change in credentials (through the bprm_committing_creds and bprm_committed_creds LSM hooks), and its inner commit_creds() call performs the actual assignment.

  • launch the program.

    • the saved user-space CPU registers are overwritten with suitable values for the start of the new program.
    • start_thread() sets the saved instruction pointer to the entry point of the program (or of the dynamic linker), and sets the saved stack pointer to the top of the stack (from the p field in linux_binprm).
    • execve() returns to user space.
    • the user-space process starts executing in the new memory space, with the new program loaded.
  • Init and Fini sections
  • Reference: “Initialization and Termination Sections”. A dynamic object can supply code that provides runtime initialization and termination processing. The initialization code of a dynamic object is executed once each time the object is loaded in a process. The termination code is executed once each time the object is unloaded from a process, or at process termination. This code can be encapsulated in one of two section types: either an array of function pointers or a single code block.
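The “array of function pointers” form is what C programs usually end up with. Below is a minimal sketch using the GCC/Clang constructor and destructor attributes (a compiler extension, not standard C). On current ELF toolchains these functions are typically emitted as entries in .init_array and .fini_array, which the runtime walks before main() and at exit. The priorities 101 and 102 are arbitrary example values; a smaller priority runs first.

```c
#include <assert.h>
#include <stdio.h>

static int order[2];  /* records the order in which initializers ran */
static int calls;     /* number of initializers that have run */

/* Initialization code: runs before main(), lower priority first. */
__attribute__((constructor(101)))
static void init_first(void)
{
    order[calls++] = 101;
}

__attribute__((constructor(102)))
static void init_second(void)
{
    order[calls++] = 102;
}

/* Termination code: runs after main() returns or exit() is called. */
__attribute__((destructor))
static void fini(void)
{
    fputs("termination code ran\n", stderr);
}
```

By the time main() starts, both initializers have run in priority order; the destructor’s message appears on stderr when the process exits.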

  • FreeBSD
    • The overview of how an execve system call loads and starts executing a binary file was discussed previously based on Linux. Here, this post aims to track in more detail how the FreeBSD source code parses an ELF file and maps it into the address space.
    • Q&A: How do you load a section from a file and read its contents? See the example code parse_notes() in sys/kern/imgact_elf.c.
    • References: sys/kern/kern_exec.c, sys/sys/elf_generic.h, sys/kern/imgact_elf.c.
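Once a note section or PT_NOTE segment has been read into memory, its contents are a sequence of records: a header (name size, descriptor size, type), then the name and the descriptor, each padded to 4 bytes. The sketch below walks that layout. It is not FreeBSD’s parse_notes(); find_note() and note_align() are hypothetical names, and it uses the Linux/glibc <elf.h> type Elf64_Nhdr with 4-byte padding, which is an assumption about the notes being read.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Round up to the 4-byte alignment used by name and descriptor fields. */
static size_t note_align(size_t n)
{
    return (n + 3) & ~(size_t)3;
}

/* Walk the notes in an in-memory buffer. Returns a pointer to the
 * descriptor of the first note whose name and type match (storing its
 * size in *desc_len), or NULL if none matches or the data is truncated. */
static const unsigned char *find_note(const unsigned char *buf, size_t len,
                                      const char *name, Elf64_Word type,
                                      size_t *desc_len)
{
    size_t off = 0;
    size_t want_namesz = strlen(name) + 1;   /* n_namesz counts the NUL */

    while (len - off >= sizeof(Elf64_Nhdr)) {
        Elf64_Nhdr nh;
        memcpy(&nh, buf + off, sizeof nh);
        size_t name_off = off + sizeof nh;
        size_t name_pad = note_align(nh.n_namesz);
        size_t desc_pad = note_align(nh.n_descsz);
        if (name_pad > len - name_off ||
            desc_pad > len - name_off - name_pad)
            return NULL;                     /* truncated note */
        size_t desc_off = name_off + name_pad;
        if (nh.n_type == type && nh.n_namesz == want_namesz &&
            memcmp(buf + name_off, name, want_namesz) == 0) {
            *desc_len = nh.n_descsz;
            return buf + desc_off;
        }
        off = desc_off + desc_pad;           /* advance to the next note */
    }
    return NULL;
}
```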

Created Jul 30, 2020 // Last Updated Jul 30, 2020
