The .symtab section holds a symbol table. An object file’s symbol table holds the information needed to locate and relocate a program’s symbolic definitions and references.
The first entry (index 0) is always the undefined symbol.
If a file has a loadable segment that includes the symbol table, the section’s attributes include the SHF_ALLOC bit; otherwise the bit is off.
// One symbol table entry.
typedef struct {
    Elf32_Word st_name;   // Index into the symbol string table section, where the symbol's name is stored.
    Elf32_Addr st_value;  // Value of the associated symbol: may be an address, an absolute value, etc.
    Elf32_Word st_size;   // Symbol's size, e.g. a data object's size; 0 if the size is unknown or the symbol has none.
    uint8_t    st_info;   // Symbol's type and binding attributes.
    uint8_t    st_other;
    Elf32_Half st_shndx;  // Every symbol is defined in relation to some section; this member holds the relevant section header table index.
} Elf32_Sym;
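The st_info byte packs two fields: the binding sits in the high four bits and the type in the low four bits. The sketch below mirrors the ELF32_ST_BIND, ELF32_ST_TYPE and ELF32_ST_INFO macros from the ELF specification, defined locally so the block stands alone. The helper binding_name is a hypothetical name introduced only for illustration.

```c
#include <stdint.h>

/* These mirror the macros given in the ELF specification. */
#define ELF32_ST_BIND(i)    ((i) >> 4)
#define ELF32_ST_TYPE(i)    ((i) & 0xf)
#define ELF32_ST_INFO(b, t) (((b) << 4) + ((t) & 0xf))

enum { STB_LOCAL = 0, STB_GLOBAL = 1, STB_WEAK = 2 };
enum { STT_NOTYPE = 0, STT_OBJECT = 1, STT_FUNC = 2, STT_SECTION = 3, STT_FILE = 4 };

/* Hypothetical helper: map the binding stored in st_info to a printable name. */
static const char *binding_name(uint8_t st_info) {
    switch (ELF32_ST_BIND(st_info)) {
    case STB_LOCAL:  return "LOCAL";
    case STB_GLOBAL: return "GLOBAL";
    case STB_WEAK:   return "WEAK";
    default:         return "OTHER";  /* reserved or processor-specific range */
    }
}
```

For example, a global function has binding 1 and type 2, so its st_info is ELF32_ST_INFO(STB_GLOBAL, STT_FUNC), which is 0x12.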
The st_info member holds the binding and type attributes for the symbol.

The binding is one of:
- STB_LOCAL/0: local symbols are not visible outside the object file containing their definition. Local symbols of the same name may exist in multiple files without interfering with each other.
- STB_GLOBAL/1: global symbols are visible to all object files being combined. One file’s definition of a global symbol will satisfy another file’s undefined reference to the same global symbol.
- STB_WEAK/2: weak symbols resemble global symbols, but their definitions have lower precedence.
- STB_LOPROC/13 -- STB_HIPROC/15: processor-specific semantics.

The type is one of:
- STT_NOTYPE/0: no type is specified for the symbol.
- STT_OBJECT/1: a data object, such as a variable, an array, and so on.
- STT_FUNC/2: a function or other executable code.
- STT_SECTION/3: a section symbol. Entries of this type are primarily used for relocation and normally have STB_LOCAL binding.
- STT_FILE/4: a file symbol. It has STB_LOCAL binding, its section index is SHN_ABS, and it precedes the other STB_LOCAL symbols for the file, if it is present.
- STT_LOPROC/13 -- STT_HIPROC/15: processor-specific semantics.

Symbol table entries for different object file types interpret the st_value member slightly differently.
- In relocatable files, st_value holds alignment constraints for a symbol whose section index is SHN_COMMON. In this case, the symbol labels a common block that has not yet been allocated, and the constraint is similar to a section’s sh_addralign member. That is, the link editor will allocate the storage for the symbol at an address that is a multiple of st_value. The symbol’s size tells how many bytes are required.
- In relocatable files, st_value holds a section offset for a defined symbol. That is, st_value is an offset from the beginning of the section that st_shndx identifies.
- In executable and shared object files, st_value holds a virtual address. To make these files’ symbols more useful for the dynamic linker, the section offset (file interpretation) gives way to a virtual address (memory interpretation) for which the section number is irrelevant.

Each symbol is assigned to some section of the object file, denoted by the st_shndx field, which is an index into the section header table.
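However, there are three special pseudosections that do not have entries in the section header table (note that these pseudosections exist only in relocatable object files; executable object files do not have them):
- ABS: symbols that should not be relocated.
- UNDEF: undefined symbols, that is, symbols that are referenced in this object module but defined elsewhere.
- COMMON: uninitialized data objects that are not yet allocated. For COMMON symbols, the st_value field gives the alignment requirement, and st_size gives the minimum size.

The distinction between COMMON and the .bss section is subtle. When -fcommon is in effect (the default before GCC 10, which switched the default to -fno-common), GCC assigns symbols in relocatable object files to COMMON and .bss using the following convention:
- COMMON: uninitialized global variables.
- .bss: uninitialized static variables, and global or static variables that are initialized to zero.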
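A minimal sketch of how that convention plays out for a handful of declarations. The placement comments assume GCC compiling with -fcommon in effect; with -fno-common the uninitialized global lands in .bss as well. All variable and function names here are hypothetical.

```c
/* Placement noted on each line assumes GCC with -fcommon in effect. */
int uninit_global;           /* COMMON (or .bss when built with -fno-common) */
int zero_global = 0;         /* .bss: global initialized to zero */
int init_global = 7;         /* .data: global with a nonzero initializer */
static int uninit_static;    /* .bss: uninitialized static */
static int zero_static = 0;  /* .bss: static initialized to zero */

/* Touch every variable so none of them is discarded as unused. */
int sum_all(void) {
    return uninit_global + zero_global + init_global
         + uninit_static + zero_static;
}
```

One way to check the result is to compile the file with gcc -c -fcommon and inspect it with readelf -s: a symbol assigned to COMMON shows COM in the Ndx column instead of a section number.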
The linker resolves symbol references by associating each reference with exactly one symbol definition from the symbol tables of its input relocatable object files.
Symbol resolution is straightforward for references to local symbols that are defined in the same module as the reference. The compiler allows only one definition of each local symbol per module. The compiler also ensures that static local variables, which get local linker symbols, have unique names.
Resolving references to global symbols, however, is trickier. When the compiler encounters a symbol (either a variable or function name) that is not defined in the current module, it assumes that it is defined in some other module, generates a linker symbol table entry, and leaves it for the linker to handle.
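Both cases can be seen in one small module. In the sketch below (function names are hypothetical), two functions each declare a static local named x; the compiler gives each its own local linker symbol, typically by appending a compiler-chosen suffix to the name. The call to puts, by contrast, refers to a symbol this module does not define, so the compiler emits an undefined symbol table entry and leaves it to the linker.

```c
#include <stdio.h>

/* Each function owns a separate static local called x. They get distinct
   local linker symbols, so the two counters never interfere. */
int next_id(void)    { static int x = 100; return x++; }
int next_token(void) { static int x = 500; return x++; }

/* puts is not defined in this module: the compiler records an undefined
   symbol for it and the linker resolves it against the C library. */
void greet(void) { puts("hello"); }
```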
Reference: Computer Systems: A Programmer’s Perspective, Chapter 7.6.1
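The input to the linker is a collection of relocatable object modules. Each of these modules defines a set of symbols, some of which are local (visible only to the module that defines them), and some of which are global (visible to other modules).
If multiple modules define global symbols with the same name, the Linux compilation system uses the following rules (which can cause unintentional bugs). Functions and initialized global variables are strong symbols; uninitialized global variables are weak symbols.
- Rule 1: multiple strong symbols with the same name are not allowed.
- Rule 2: given a strong symbol and multiple weak symbols with the same name, choose the strong symbol.
- Rule 3: given multiple weak symbols with the same name, choose any of the weak symbols.

Compiling with -fno-common, which makes the linker report an error for multiply-defined global symbols, or with -Werror, which turns all warnings into errors, can help avoid some of these bugs.

Reference: Computer Systems: A Programmer’s Perspective, Chapter 7.6.2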
Related modules can be packed into a single file called a static library, which can then be supplied as input to the linker. When the system builds the output executable, the linker copies only the object modules in the library that are referenced by the application program.
In a static library, related functions are compiled into separate object modules and then packaged in a single static library file.
At link time, the linker will only copy the object modules that are referenced by the program, which reduces the size of the executable on disk and in memory.
On Linux systems, static libraries are stored on disk in a particular file format known as an archive. An archive is a collection of concatenated relocatable object files, with a header that describes the size and location of each member object file. Archive filenames are denoted with the .a suffix.
Use the ar tool to create a static library: ar rcs libvector.a addvec.o multvec.o
When a static library (say libvector.a) is linked into a program (say main.c), the linker decides member by member what to copy:
- If a symbol defined by addvec.o is referenced by main.o, the linker copies addvec.o into the executable.
- If nothing defined by multvec.o is referenced by main.o, the linker does not copy that module into the executable.

To resolve references using static libraries, the linker scans the relocatable object files and archives left to right in the same sequential order that they appear on the compiler driver’s command line. During this scan, the linker maintains a set E of relocatable object files that will be merged to form the executable, a set U of unresolved symbols (i.e., symbols referred to but not yet defined), and a set D of symbols that have been defined in previous input files:
- Initially, E, U, and D are empty.
- For each input file f on the command line, the linker determines whether f is an object file or an archive.
  - If f is an object file, the linker adds f to E, updates U and D to reflect the symbol definitions and references in f, and proceeds to the next input file.
  - If f is an archive, the linker attempts to match the unresolved symbols in U against the symbols defined by the members of the archive. If any member m defines a symbol that resolves a reference in U, then m is added to E, and the linker updates U and D to reflect the symbol definitions and references in m. This process iterates over the member object files in the archive until a fixed point is reached where U and D no longer change. At this point, any member object files not contained in E are simply discarded and the linker proceeds to the next file.
- If U is nonempty when the linker finishes scanning the input files on the command line, it prints an error and terminates. Otherwise, it merges and relocates the object files in E to build the output executable file.

Common symbols exist for backward compatibility. Using them today is considered bad practice.
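The left-to-right scan over the sets E, U, and D described earlier can be sketched as a small simulation. This is a toy model under stated assumptions, not how any real linker is implemented: objects are reduced to lists of symbol names, the sets are fixed-size arrays, and every type and function name (object, input, link_state, scan_inputs and the helpers) is hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define MAX_NAMES 16  /* capacity of each set; enough for a toy example */
#define MAX_SYMS  4   /* max definitions or references per object */

/* A relocatable object: its name, the symbols it defines, and the symbols
   it references. Unused array slots are NULL. */
struct object {
    const char *name;
    const char *defs[MAX_SYMS];
    const char *refs[MAX_SYMS];
};

/* One command-line input: a lone object (is_archive == 0, one member)
   or an archive holding several member objects. */
struct input {
    int is_archive;
    const struct object *members;
    size_t n_members;
};

struct name_set { const char *names[MAX_NAMES]; size_t count; };
struct link_state { struct name_set E, U, D; };

static int set_has(const struct name_set *s, const char *name) {
    for (size_t i = 0; i < s->count; i++)
        if (strcmp(s->names[i], name) == 0) return 1;
    return 0;
}

static void set_add(struct name_set *s, const char *name) {
    if (!set_has(s, name) && s->count < MAX_NAMES)
        s->names[s->count++] = name;
}

static void set_remove(struct name_set *s, const char *name) {
    for (size_t i = 0; i < s->count; i++) {
        if (strcmp(s->names[i], name) == 0) {
            s->count--;
            s->names[i] = s->names[s->count];  /* fill the gap with the last entry */
            return;
        }
    }
}

/* Add o to E, then update D and U from its definitions and references. */
static void take_object(struct link_state *st, const struct object *o) {
    set_add(&st->E, o->name);
    for (size_t k = 0; k < MAX_SYMS && o->defs[k]; k++) {
        set_add(&st->D, o->defs[k]);
        set_remove(&st->U, o->defs[k]);
    }
    for (size_t k = 0; k < MAX_SYMS && o->refs[k]; k++)
        if (!set_has(&st->D, o->refs[k]))
            set_add(&st->U, o->refs[k]);
}

/* Does o define at least one symbol that is currently unresolved? */
static int resolves_something(const struct link_state *st, const struct object *o) {
    for (size_t k = 0; k < MAX_SYMS && o->defs[k]; k++)
        if (set_has(&st->U, o->defs[k])) return 1;
    return 0;
}

/* Scan the inputs left to right; return how many symbols remain in U. */
size_t scan_inputs(const struct input *in, size_t n, struct link_state *st) {
    *st = (struct link_state){0};
    for (size_t i = 0; i < n; i++) {
        if (!in[i].is_archive) {
            take_object(st, &in[i].members[0]);
            continue;
        }
        int changed;
        do {  /* iterate over the archive until U and D stop changing */
            changed = 0;
            for (size_t m = 0; m < in[i].n_members; m++) {
                const struct object *o = &in[i].members[m];
                if (!set_has(&st->E, o->name) && resolves_something(st, o)) {
                    take_object(st, o);
                    changed = 1;
                }
            }
        } while (changed);
    }
    return st->U.count;
}
```

With a main.o that references addvec listed before an archive containing addvec.o and multvec.o, scan_inputs returns 0 and E ends up holding main.o and addvec.o only. Listing the archive first leaves addvec unresolved, because U was still empty when the archive was scanned; this is why the order of libraries on the command line matters.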
Common symbols are a feature that allows a programmer to define several variables of the same name in different source files. This is in contrast with the more usual approach of using extern to reference a variable defined in another file.
Use the GCC flag -fno-common to avoid these multiply-defined symbols by preventing the compiler from placing variables in the common section.
// lld/ELF/Symbols.h
// Represents a common symbol.
//
// On Unix, it is traditionally allowed to write variable definitions
// without initialization expressions (such as "int foo;") to header
// files. Such definition is called "tentative definition".
//
// Using tentative definition is usually considered a bad practice
// because you should write only declarations (such as "extern int
// foo;") to header files. Nevertheless, the linker and the compiler
// have to do something to support bad code by allowing duplicate
// definitions for this particular case.
//
// Common symbols represent variable definitions without initializations.
// The compiler creates common symbols when it sees varaible definitions
// without initialization (you can suppress this behavior and let the
// compiler create a regular defined symbol by -fno-common).
//
// The linker allows common symbols to be replaced by regular defined
// symbols. If there are remaining common symbols after name resolution is
// complete, they are converted to regular defined symbols in a .bss
// section. (Therefore, the later passes don't see any CommonSymbols.)
class CommonSymbol : public Symbol {
public:
  CommonSymbol(InputFile *file, StringRefZ name, uint8_t binding,
               uint8_t stOther, uint8_t type, uint64_t alignment, uint64_t size)
      : Symbol(CommonKind, file, name, binding, stOther, type),
        alignment(alignment), size(size) {}

  static bool classof(const Symbol *s) { return s->isCommon(); }

  uint32_t alignment;
  uint64_t size;
};