Reference
load_elf_binary()
load_elf_phdrs(): load the program headers
load_elf_interp()
load_elf_library(): ?
elf_core_dump()
An ELF file for an executable program (rather than a shared library or an object file) must always contain a program header table near the start of the file, after the ELF header; each entry in this table provides information that is needed to run the program. The kernel only really cares about three types of program header entries:
PT_LOAD type. This type of entry describes areas of the new program’s running memory. This includes code and data sections that come from the executable file, together with the size of a BSS section.
PT_INTERP type. This type of entry identifies the run-time linker needed to assemble the complete program during the dynamic linking process.
PT_GNU_STACK type. This type of entry stores a one-bit value flagging whether the program’s stack should be made executable or not.
do_execve() is defined in fs/exec.c. The main purpose of this function is to build a new struct linux_binprm instance that describes the current program invocation operation.
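To make the three program header types listed above concrete, here is a minimal user-space sketch (not kernel code) that classifies entries the way the text describes. It assumes a Linux system whose <elf.h> provides Elf64_Phdr, PT_LOAD, PT_INTERP, PT_GNU_STACK and PF_X; the helper names phdr_role() and stack_is_executable() are hypothetical and exist only for illustration.

```c
#include <elf.h>
#include <stddef.h>

/* Hypothetical helper: describe what an entry type is used for. */
static const char *phdr_role(Elf64_Word p_type)
{
    switch (p_type) {
    case PT_LOAD:      return "memory area to map";
    case PT_INTERP:    return "run-time linker path";
    case PT_GNU_STACK: return "stack executability flag";
    default:           return "not needed to start the program";
    }
}

/*
 * Hypothetical helper: return 1 if a PT_GNU_STACK entry asks for an
 * executable stack, 0 if it asks for a non-executable one, and -1 if
 * the table has no PT_GNU_STACK entry at all.
 */
static int stack_is_executable(const Elf64_Phdr *phdrs, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (phdrs[i].p_type == PT_GNU_STACK)
            return (phdrs[i].p_flags & PF_X) ? 1 : 0;
    }
    return -1;
}
```

The one-bit value mentioned for the stack entry is the PF_X bit in p_flags; the other two entry types carry offsets and sizes instead.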
char buf[BINPRM_BUF_SIZE]; is filled with the first chunk of data from the program file (BINPRM_BUF_SIZE bytes: 128 in older kernels, 256 since Linux 5.1). This data will be used later to detect the binary format so it can be processed appropriately.
struct file *file.
const char *filename; name of the binary as seen by procps.
const char *interp; name of the binary really executed. Most of the time the same as filename, but could be different for binfmt_{misc,script}.
bprm_mm_init() function. (not found in latest Linux)
p is set to point at the end of memory space. The value of p will be updated (downward) as more information is added to the new program’s stack.
cred is a separately allocated object of type struct cred.
Next, the information about the program invocation is copied into the top of the new program’s stack, using the local copy_strings() and copy_strings_kernel() utility functions. The stack will contain the file name, the environment strings, and the argument strings, copied in that order.
// include/linux/binfmts.h
/*
 * This structure is used to hold the arguments that are used when loading binaries.
 */
struct linux_binprm {
#ifdef CONFIG_MMU
	struct vm_area_struct *vma;
	unsigned long vma_pages;
#else
# define MAX_ARG_PAGES	32
	struct page *page[MAX_ARG_PAGES];
#endif
	struct mm_struct *mm;
	unsigned long p; /* current top of mem */
	unsigned long argmin; /* rlimit marker for copy_strings() */
	unsigned int
		/*
		 * True after the bprm_set_creds hook has been called once
		 * (multiple calls can be made via prepare_binprm() for
		 * binfmt_script/misc).
		 */
		called_set_creds:1,
		/*
		 * True if most recent call to the commoncaps bprm_set_creds
		 * hook (due to multiple prepare_binprm() calls from the
		 * binfmt_script/misc handlers) resulted in elevated
		 * privileges.
		 */
		cap_elevated:1,
		/*
		 * Set by bprm_set_creds hook to indicate a privilege-gaining
		 * exec has happened. Used to sanitize execution environment
		 * and to set AT_SECURE auxv for glibc.
		 */
		secureexec:1,
		/*
		 * Set by flush_old_exec, when exec_mmap has been called.
		 * This is past the point of no return, when the
		 * exec_update_mutex has been taken.
		 */
		called_exec_mmap:1;
#ifdef __alpha__
	unsigned int taso:1;
#endif
	unsigned int recursion_depth; /* only for search_binary_handler() */
	struct file * file;
	struct cred *cred;	/* new credentials */
	int unsafe;		/* how unsafe this exec is (mask of LSM_UNSAFE_*) */
	unsigned int per_clear;	/* bits to clear in current->personality */
	int argc, envc;
	const char * filename;	/* Name of binary as seen by procps */
	const char * interp;	/* Name of the binary really executed. Most
				   of the time same as filename, but could be
				   different for binfmt_{misc,script} */
	unsigned interp_flags;
	unsigned interp_data;
	unsigned long loader, exec;
	struct rlimit rlim_stack; /* Saved RLIMIT_STACK used during exec. */
	char buf[BINPRM_BUF_SIZE];
} __randomize_layout;
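The role of the p field can be illustrated with a small user-space sketch. This is not the kernel's copy_strings(); it is a simplified model with hypothetical names (struct toy_bprm, toy_bprm_init(), toy_push_string()) that only shows how each copied string moves p downward from the end of a memory area.

```c
#include <stddef.h>
#include <string.h>

#define TOY_STACK_SIZE 256

/* Toy stand-in for the parts of struct linux_binprm used here. */
struct toy_bprm {
    char stack[TOY_STACK_SIZE];
    size_t p;   /* offset of the current top of the stack */
};

static void toy_bprm_init(struct toy_bprm *b)
{
    memset(b->stack, 0, sizeof(b->stack));
    b->p = TOY_STACK_SIZE;  /* start at the end of the memory area */
}

/*
 * Copy one NUL-terminated string just below the current top and move
 * p down. Returns 0 on success, -1 if the string does not fit.
 */
static int toy_push_string(struct toy_bprm *b, const char *s)
{
    size_t len = strlen(s) + 1;   /* include the terminating NUL */

    if (len > b->p)
        return -1;
    b->p -= len;
    memcpy(&b->stack[b->p], s, len);
    return 0;
}
```

Pushing the file name first and the other strings afterwards, as described above, leaves the file name at the highest address and later strings below it.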
do_execve definition:
// fs/exec.c
// https://elixir.bootlin.com/linux/latest/source/fs/exec.c
// Call chain: do_execve() -> do_execveat_common() -> __do_execve_file()
/*
 * sys_execve() executes a new program.
 */
static int __do_execve_file(int fd, struct filename *filename,
			    struct user_arg_ptr argv,
			    struct user_arg_ptr envp,
			    int flags, struct file *file)
{
	char *pathbuf = NULL;
	struct linux_binprm *bprm;
	struct files_struct *displaced;
	int retval;

	if (IS_ERR(filename))
		return PTR_ERR(filename);

	/*
	 * We move the actual failure in case of RLIMIT_NPROC excess from
	 * set*uid() to execve() because too many poorly written programs
	 * don't check setuid() return code. Here we additionally recheck
	 * whether NPROC limit is still exceeded.
	 */
	if ((current->flags & PF_NPROC_EXCEEDED) &&
	    atomic_read(&current_user()->processes) > rlimit(RLIMIT_NPROC)) {
		retval = -EAGAIN;
		goto out_ret;
	}

	/* We're below the limit (still or again), so we don't want to make
	 * further execve() calls fail. */
	current->flags &= ~PF_NPROC_EXCEEDED;

	retval = unshare_files(&displaced);
	if (retval)
		goto out_ret;

	retval = -ENOMEM;
	bprm = kzalloc(sizeof(*bprm), GFP_KERNEL);
	if (!bprm)
		goto out_files;

	retval = prepare_bprm_creds(bprm);
	if (retval)
		goto out_free;

	check_unsafe_exec(bprm);
	current->in_execve = 1;

	if (!file)
		file = do_open_execat(fd, filename, flags);
	retval = PTR_ERR(file);
	if (IS_ERR(file))
		goto out_unmark;

	sched_exec();

	bprm->file = file;
	if (!filename) {
		bprm->filename = "none";
	} else if (fd == AT_FDCWD || filename->name[0] == '/') {
		bprm->filename = filename->name;
	} else {
		if (filename->name[0] == '\0')
			pathbuf = kasprintf(GFP_KERNEL, "/dev/fd/%d", fd);
		else
			pathbuf = kasprintf(GFP_KERNEL, "/dev/fd/%d/%s",
					    fd, filename->name);
		if (!pathbuf) {
			retval = -ENOMEM;
			goto out_unmark;
		}
		/*
		 * Record that a name derived from an O_CLOEXEC fd will be
		 * inaccessible after exec. Relies on having exclusive access to
		 * current->files (due to unshare_files above).
		 */
		if (close_on_exec(fd, rcu_dereference_raw(current->files->fdt)))
			bprm->interp_flags |= BINPRM_FLAGS_PATH_INACCESSIBLE;
		bprm->filename = pathbuf;
	}
	bprm->interp = bprm->filename;

	retval = bprm_mm_init(bprm);
	if (retval)
		goto out_unmark;

	retval = prepare_arg_pages(bprm, argv, envp);
	if (retval < 0)
		goto out;

	retval = prepare_binprm(bprm);
	if (retval < 0)
		goto out;

	retval = copy_strings_kernel(1, &bprm->filename, bprm);
	if (retval < 0)
		goto out;

	bprm->exec = bprm->p;
	retval = copy_strings(bprm->envc, envp, bprm);
	if (retval < 0)
		goto out;

	retval = copy_strings(bprm->argc, argv, bprm);
	if (retval < 0)
		goto out;

	retval = exec_binprm(bprm);
	if (retval < 0)
		goto out;

	/* execve succeeded */
	current->fs->in_exec = 0;
	current->in_execve = 0;
	rseq_execve(current);
	acct_update_integrals(current);
	task_numa_free(current, false);
	free_bprm(bprm);
	kfree(pathbuf);
	if (filename)
		putname(filename);
	if (displaced)
		put_files_struct(displaced);
	return retval;

out:
	if (bprm->mm) {
		acct_arg_size(bprm, 0);
		mmput(bprm->mm);
	}

out_unmark:
	current->fs->in_exec = 0;
	current->in_execve = 0;

out_free:
	free_bprm(bprm);
	kfree(pathbuf);

out_files:
	if (displaced)
		reset_files_struct(displaced);
out_ret:
	if (filename)
		putname(filename);
	return retval;
}
With a completed struct linux_binprm, program execution is performed in exec_binprm() and (more importantly) search_binary_handler().
search_binary_handler() iterates over a list of struct linux_binfmt objects, each of which provides a handler for a particular format of binary programs.
For each struct linux_binfmt handler object, the load_binary() function pointer is called, passing in the linux_binprm object. If the handler code supports the binary format, it does whatever is needed to prepare the program for execution and returns success (>=0). Otherwise, the handler returns a failure code (<0) and iteration continues with the next handler.
If no format that can handle the program has been found (and the program appears to be binary rather than text, at least according to the first four bytes), then the code will also attempt to load a module named “binfmt-XXXX”, where XXXX is the hex value of bytes three and four in the program file. This is an old mechanism (added in 1996 for Linux 1.3.57) to allow for a more dynamic way of associating binary format handlers with formats; the more recent binfmt_misc mechanism allows a more flexible way of doing something similar.
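The handler iteration described above can be sketched in user space as follows. This is a simplified model, not the kernel's search_binary_handler(): all toy_* names are hypothetical, the two handlers only look at the leading magic bytes, and every failure is reported as -ENOEXEC so that iteration simply moves on to the next handler.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-ins for struct linux_binprm and struct linux_binfmt. */
struct toy_binprm {
    char buf[128];      /* first bytes of the program file */
};

struct toy_binfmt {
    const char *name;
    int (*load_binary)(const struct toy_binprm *bprm);
};

static int toy_load_elf(const struct toy_binprm *bprm)
{
    /* ELF files start with the bytes 0x7f 'E' 'L' 'F'. */
    return memcmp(bprm->buf, "\177ELF", 4) == 0 ? 0 : -ENOEXEC;
}

static int toy_load_script(const struct toy_binprm *bprm)
{
    /* Scripts start with "#!". */
    return memcmp(bprm->buf, "#!", 2) == 0 ? 0 : -ENOEXEC;
}

static const struct toy_binfmt toy_formats[] = {
    { "elf",    toy_load_elf },
    { "script", toy_load_script },
};

/*
 * Try each handler in turn and stop at the first one that accepts the
 * file. On success, *chosen is set to the name of that handler.
 */
static int toy_search_binary_handler(const struct toy_binprm *bprm,
                                     const char **chosen)
{
    size_t n = sizeof(toy_formats) / sizeof(toy_formats[0]);

    for (size_t i = 0; i < n; i++) {
        int ret = toy_formats[i].load_binary(bprm);

        if (ret >= 0) {
            *chosen = toy_formats[i].name;
            return ret;
        }
    }
    return -ENOEXEC;    /* no handler recognised the format */
}
```

A table of function pointers keeps the dispatch loop independent of any particular format, which is what lets new handlers be registered without touching the loop itself.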
The ELF format is handled by the load_elf_binary() function.
Read the program headers: the run-time linker from the PT_INTERP entry, and whether the program’s stack should be executable from the PT_GNU_STACK entry.
exec specification in Single Unix Specification
flush_old_exec(): clears up the state in the kernel that refers to the previous program.
(/proc/pid/exe)
setup_new_exec(): sets up the kernel’s internal state for the new program.
__set_task_comm(): sets the task’s comm field, the basename of the invoked file name. It is used as the thread name and is accessible to user space via the PR_GET_NAME and PR_SET_NAME prctl() operations.
flush_signal_handlers(): resets the signal handlers for the new program (handlers installed by the old program no longer exist, so caught signals revert to their default action).
do_close_on_exec(): closes all of the old program’s file descriptors that have the O_CLOEXEC flag set; other file descriptors will be inherited by the new program.
Set up virtual memory.
setup_arg_pages(): sets up the kernel’s memory tracking structures and updates the new location of the stack.
Map the PT_LOAD segments in the program file.
Set up credentials: install_exec_creds(). This function lets any Linux Security Module (LSM) know about the change in credentials (through the bprm_committing_creds and bprm_committed_creds LSM hooks), and the inner commit_creds() function performs the assignment.
Launch the program.
start_thread() sets the saved instruction pointer to the entry point of the program (or dynamic linker), and sets the saved stack pointer to the current top of the stack (from the p field in linux_binprm).
execve() returns to user space.

References: Initialization and Termination Sections
A dynamic object can supply code that provides for runtime initialization and termination processing. The initialization code of a dynamic object is executed once each time the dynamic object is loaded in a process. The termination code of a dynamic object is executed once each time the dynamic object is unloaded from a process or at process termination. This code can be encapsulated in one of two section types, either an array of function pointers or a single code block.
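On GCC and Clang toolchains, one common way to get a function into the initialization or termination processing described above is the constructor and destructor function attributes. The following is a minimal sketch under that assumption; the names init_calls, demo_init(), demo_fini() and demo_init_count() are hypothetical, and which section the compiler actually emits for them depends on the toolchain.

```c
#include <stdio.h>

static int init_calls;   /* counts how often the initializer ran */

/* Runs before main() when this object is loaded into the process. */
__attribute__((constructor))
static void demo_init(void)
{
    init_calls++;
}

/* Runs when the object is unloaded or the process terminates normally. */
__attribute__((destructor))
static void demo_fini(void)
{
    fputs("demo_fini: termination code ran\n", stderr);
}

/* Report how many times the initialization code has executed. */
static int demo_init_count(void)
{
    return init_calls;
}
```

Because the object is loaded once per process, demo_init_count() is expected to report a single call by the time main() starts.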
Q&A
How to load a section from file and read its contents? See the example code parse_notes() in sys/kern/imgact_elf.c.
———————–
References: sys/kern/kern_exec.c, sys/sys/elf_generic.h, sys/kern/imgact_elf.c
An overview of how the execve system call loads a binary file and starts executing it was discussed previously, based on Linux. This post aims to track, in more detail and at the source-code level, how an ELF file is parsed and mapped into the address space in FreeBSD.
If you could revise the fundamental principles of computer system design to improve security... what would you change?