How are the .text sections from two object (relocatable) files combined into one executable binary?
The location counter (.dot) is advanced according to each section in LinkerScript::assignAddresses(), reached via:
Writer<ELFT>::finalizeSections()
=> Writer<ELFT>::finalizeAddressDependentContent()
=> LinkerScript::assignAddresses()
.text sections?
References:
Mapping between an InputSection and an OutputSection: N to 1 mapping.
// lld/ELF/InputSection.h
// This is a section that is added directly to an output section
// instead of needing special combination via a synthetic section. This
// includes all input sections with the exceptions of SHF_MERGE and
// .eh_frame. It also includes the synthetic sections themselves.
class InputSection : public InputSectionBase {
public:
...
};
Each input section is assigned to at most one output section. One output section contains one or more input sections.
The function that returns the output section holding an input section is getOutputSection(), or getParent().
// lld/ELF/InputSection.cpp
OutputSection *SectionBase::getOutputSection() {
InputSection *sec;
if (auto *isec = dyn_cast<InputSection>(this))
sec = isec;
else if (auto *ms = dyn_cast<MergeInputSection>(this))
sec = ms->getParent();
else if (auto *eh = dyn_cast<EhInputSection>(this))
sec = eh->getParent();
else
return cast<OutputSection>(this);
return sec ? sec->getParent() : nullptr;
}
//
OutputSection *InputSection::getParent() const {
return cast_or_null<OutputSection>(parent);
}
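As a minimal sketch of this N-to-1 relationship, the following uses simplified, hypothetical types (SimpleInputSection, SimpleOutputSection and addInputSection are illustrative names, not LLD's API). It only shows how one parent pointer per input section gives the mapping in one direction while a member list gives it in the other.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for LLD's classes (not the real API).
struct SimpleInputSection;

struct SimpleOutputSection {
  std::string name;
  std::vector<SimpleInputSection *> members; // N input sections
};

struct SimpleInputSection {
  std::string file;
  SimpleOutputSection *parent = nullptr; // at most one owning output section

  SimpleOutputSection *getParent() const { return parent; }
};

// Attach an input section to an output section, keeping both directions in sync.
inline void addInputSection(SimpleOutputSection &os, SimpleInputSection &is) {
  assert(is.parent == nullptr && "an input section has at most one parent");
  is.parent = &os;
  os.members.push_back(&is);
}
```

With two input sections added to the same output section, getParent() on either one returns that output section, and its member list has two entries.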
Tracking the parent of an InputSection (InputSectionBase):
// This corresponds to a section of an input file.
class InputSectionBase : public SectionBase {
// Input sections are part of an output section. Special sections
// like .eh_frame and merge sections are first combined into a
// synthetic section that is then added to an output section. In all
// cases this points one level up.
SectionBase *parent = nullptr;
OutputSection *getParent() const;
};
// lld/ELF/OutputSection.h
// This represents a section in an output file.
// It is composed of multiple InputSections.
// The writer creates multiple OutputSections and assign them unique,
// non-overlapping file offsets and VAs.
class OutputSection final : public BaseCommand, public SectionBase {
public:
OutputSection(StringRef name, uint32_t type, uint64_t flags);
static bool classof(const SectionBase *s) {
return s->kind() == SectionBase::Output;
}
...
};
In LinkerScript, the sectionCommands vector stores all the output sections as BaseCommand *.
Another vector, outputSections, also stores the output sections.
What is the difference here?
sectionCommands is filled in during addOrphanSections. How sectionCommands relates to outputSections at that point is still an open question (??).
Section counts observed around finalizeSections():
Before finalizeSections(): sectionCommands contains 49 sections, each with addr 0x0, the same as in the input file (e.g. .got); outputSections contains 0 sections.
Inside finalizeSections(), after sortSections();, sectionCommands is scanned and each entry of type OutputSection is added to outputSections. See the code below.
At createPhdrs(): outputSections has 39 sections.
After finalizeSections(): sectionCommands contains 39 sections and outputSections contains 39 sections.
Code picking an output section from sectionCommands to put it in outputSections:
// Create output section objects and add them to OutputSections.
template <class ELFT> void Writer<ELFT>::finalizeSections() {
...
sortSections();
// Now that we have the final list, create a list of all the
// OutputSections for convenience.
for (BaseCommand *base : script->sectionCommands)
if (auto *sec = dyn_cast<OutputSection>(base))
outputSections.push_back(sec);
}
// lld/ELF/LinkerScript.h
class LinkerScript final{
// SECTIONS command list.
std::vector<BaseCommand *> sectionCommands;
};
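The filtering step can be illustrated in isolation. This is a sketch with hypothetical types (Command, OutputCmd, collectOutputSections), assuming a kind tag plus a classof() check in the spirit of LLVM's dyn_cast; it is not LLD's actual class hierarchy.

```cpp
#include <vector>

// Hypothetical, simplified command hierarchy (not LLD's real classes).
struct Command {
  enum Kind { Assignment, Output };
  Kind kind;
  explicit Command(Kind k) : kind(k) {}
};

struct OutputCmd : Command {
  OutputCmd() : Command(Output) {}
  // Identify the dynamic type by its kind tag, as classof() does in LLVM-style RTTI.
  static bool classof(const Command *c) { return c->kind == Output; }
};

// Collect only the commands that are output sections, preserving their order.
inline std::vector<OutputCmd *>
collectOutputSections(const std::vector<Command *> &commands) {
  std::vector<OutputCmd *> result;
  for (Command *c : commands)
    if (OutputCmd::classof(c))
      result.push_back(static_cast<OutputCmd *>(c));
  return result;
}
```

Entries that are not output sections (symbol assignments, for instance) are skipped, so the resulting list can be shorter than the command list.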
Sections that have the same name in different object files (input sections) will be merged into one output section. The list of input sections can be retrieved from each output section. For example, the following code in LLVM LLD iterates over each output section and lists all the input sections it holds:
// lld/ELF/Writer.cpp
// check ownership section
// refer code from checkExecuteOnly for iteration through all sections
for (OutputSection *os : outputSections){
DebugLL("Output Section " + os->name +
" has the following input sections:\n");
for (InputSection *isec : getInputSections(os)){
DebugLLS("\t\tInput Section:" + toString(isec) + "\n");
}
}
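The merge-by-name idea can be sketched independently of LLD. The example below assumes name equality is the only criterion, which is a simplification: the real linker also considers section flags, types and linker script patterns. InputSec and groupByName are hypothetical names.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record: (object file, section name), e.g. {"a.o", ".text"}.
using InputSec = std::pair<std::string, std::string>;

// Group input sections by name; each key plays the role of one output section.
inline std::map<std::string, std::vector<std::string>>
groupByName(const std::vector<InputSec> &inputs) {
  std::map<std::string, std::vector<std::string>> out;
  for (const InputSec &in : inputs)
    out[in.second].push_back(in.first + ":(" + in.second + ")");
  return out;
}
```

Passing the .text sections of a.o and b.o plus the .data section of a.o yields two groups, with both .text entries under the same key.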
In LLD, a program header entry is defined as PhdrEntry:
// lld/ELF/Writer.h
// This describes a program header entry.
// Each contains type, access flags and range of output sections that will be
// placed in it.
struct PhdrEntry {
PhdrEntry(unsigned type, unsigned flags)
: p_align(type == llvm::ELF::PT_LOAD ? config->maxPageSize : 0),
p_type(type), p_flags(flags) {}
void add(OutputSection *sec);
uint64_t p_paddr = 0;
uint64_t p_vaddr = 0;
uint64_t p_memsz = 0;
uint64_t p_filesz = 0;
uint64_t p_offset = 0;
uint32_t p_align = 0;
uint32_t p_type = 0;
uint32_t p_flags = 0;
OutputSection *firstSec = nullptr;
OutputSection *lastSec = nullptr;
bool hasLMA = false;
uint64_t lmaOffset = 0;
};
The method add(OutputSection *sec) puts one output section into the segment this entry describes.
The class only tracks:
lastSec, the last section added to this segment;
firstSec, the first section added to this segment;
p_align, the alignment of this segment, which is the largest alignment required by any output section it holds.
For a PT_LOAD entry it also updates the output section’s ptLoad field. The ptLoad field in the output section is used to compute the section offset for the sections in a PT_LOAD segment. Formula: Off = Off_first + VA - VA_first, where Off_first and VA_first are the file offset and VA of the first section in PT_LOAD.
Questions:
Where is this section offset used and stored? Can the OS access it?
// lld/ELF/Writer.cpp
void PhdrEntry::add(OutputSection *sec) {
lastSec = sec;
if (!firstSec)
firstSec = sec;
p_align = std::max(p_align, sec->alignment);
if (p_type == PT_LOAD)
sec->ptLoad = this;
}
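To make the formula Off = Off_first + VA - VA_first concrete, here is a small sketch. The Sec struct and offsetInLoad function are hypothetical names for illustration and assume the section lies in the same PT_LOAD segment as the first section.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified section record: virtual address and file offset.
struct Sec {
  uint64_t va;
  uint64_t offset;
};

// Off = Off_first + VA - VA_first, for a section in the same PT_LOAD as `first`.
inline uint64_t offsetInLoad(const Sec &first, uint64_t va) {
  assert(va >= first.va && "section must not start before the segment");
  return first.offset + (va - first.va);
}
```

For example, if the first section of the segment sits at VA 0x201000 and file offset 0x1000, a section at VA 0x201a40 gets file offset 0x1a40, so the distance between sections is the same in the file and in memory.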
Program header entries are grouped into partitions in LLD, but only one partition was seen in the test code. See LLD Partitions below.
Call Path:
// lld/ELF/{Driver|Writer}.cpp
LinkerDriver::link<ELFT>() // lld/ELF/Driver.cpp
=> writeResult<ELFT>() // lld/ELF/Driver.cpp
=> Writer<ELFT>().run() // lld/ELF/Writer.cpp
=> Writer<ELFT>::finalizeSections() // lld/ELF/Writer.cpp
=> ...
=> for (Partition &part: partitions)
part.phdrs = script->hasPhdrsCommands()?
script->createPhdrs() : createPhdrs(part)
=> createPhdrs(part)
{
// lld/ELF/Writer.cpp: Writer<ELFT>::createPhdrs(Partition &part)
addHdr(PT_PHDR, PF_R);
..
addHdr(PT_INTERP, flags);
..
addHdr(PT_LOAD, flags);
..
// create a new PT_LOAD segment when a section has new flags.
uint64_t newFlags = computeFlags(sec->getPhdrFlags());
load = addHdr(PT_LOAD, newFlags);
load->add(sec);
..
addHdr(PT_GNU_EH_FRAME, ...);
..
addHdr(PT_GNU_STACK, perm)->p_memsz = config->zStackSize;
..
note = addHdr(PT_NOTE, PF_R);
note->add(sec);
..
// here we add program header entry for capsule ownership
addHdr(PT_CAPSULE_OWNERSHIP, own_flag)->add(own_text);
addHdr(PT_CAPSULE_OWNERSHIP, own_flag)->add(own_data);
}
If the linker script does not give program header commands (which is the usual case),
then the writer will create program headers according to the default policies.
This is implemented in Writer<ELFT>::createPhdrs(Partition &part).
Steps in the code to determine which section goes to which program header entry (entries are created with addHdr):
// lld/ELF/Writer.cpp
// Decide which program headers to create and which sections to include in each
// one.
template <class ELFT>
std::vector<PhdrEntry *> Writer<ELFT>::createPhdrs(Partition &part) {
std::vector<PhdrEntry *> ret;
auto addHdr = [&](unsigned type, unsigned flags) -> PhdrEntry * {
ret.push_back(make<PhdrEntry>(type, flags));
return ret.back();
};
unsigned partNo = part.getNumber();
bool isMain = partNo == 1;
// Add the first PT_LOAD segment for regular output sections.
uint64_t flags = computeFlags(PF_R);
PhdrEntry *load = nullptr;
// nmagic or omagic output does not have PT_PHDR, PT_INTERP, or the readonly
// PT_LOAD.
if (!config->nmagic && !config->omagic) {
// The first phdr entry is PT_PHDR which describes the program header
// itself.
if (isMain)
addHdr(PT_PHDR, PF_R)->add(Out::programHeaders);
else
addHdr(PT_PHDR, PF_R)->add(part.programHeaders->getParent());
// PT_INTERP must be the second entry if exists.
if (OutputSection *cmd = findSection(".interp", partNo))
addHdr(PT_INTERP, cmd->getPhdrFlags())->add(cmd);
// Add the headers. We will remove them if they don't fit.
// In the other partitions the headers are ordinary sections, so they don't
// need to be added here.
if (isMain) {
load = addHdr(PT_LOAD, flags);
load->add(Out::elfHeader);
load->add(Out::programHeaders);
}
}
// PT_GNU_RELRO includes all sections that should be marked as
// read-only by dynamic linker after processing relocations.
// Current dynamic loaders only support one PT_GNU_RELRO PHDR, give
// an error message if more than one PT_GNU_RELRO PHDR is required.
PhdrEntry *relRo = make<PhdrEntry>(PT_GNU_RELRO, PF_R);
bool inRelroPhdr = false;
OutputSection *relroEnd = nullptr;
for (OutputSection *sec : outputSections) {
if (sec->partition != partNo || !needsPtLoad(sec))
continue;
if (isRelroSection(sec)) {
inRelroPhdr = true;
if (!relroEnd)
relRo->add(sec);
else
error("section: " + sec->name + " is not contiguous with other relro" +
" sections");
} else if (inRelroPhdr) {
inRelroPhdr = false;
relroEnd = sec;
}
}
for (OutputSection *sec : outputSections) {
if (!(sec->flags & SHF_ALLOC))
break;
if (!needsPtLoad(sec))
continue;
// Normally, sections in partitions other than the current partition are
// ignored. But partition number 255 is a special case: it contains the
// partition end marker (.part.end). It needs to be added to the main
// partition so that a segment is created for it in the main partition,
// which will cause the dynamic loader to reserve space for the other
// partitions.
if (sec->partition != partNo) {
if (isMain && sec->partition == 255)
addHdr(PT_LOAD, computeFlags(sec->getPhdrFlags()))->add(sec);
continue;
}
// Segments are contiguous memory regions that has the same attributes
// (e.g. executable or writable). There is one phdr for each segment.
// Therefore, we need to create a new phdr when the next section has
// different flags or is loaded at a discontiguous address or memory
// region using AT or AT> linker script command, respectively. At the same
// time, we don't want to create a separate load segment for the headers,
// even if the first output section has an AT or AT> attribute.
uint64_t newFlags = computeFlags(sec->getPhdrFlags());
bool sameLMARegion =
load && !sec->lmaExpr && sec->lmaRegion == load->firstSec->lmaRegion;
if (!(load && newFlags == flags && sec != relroEnd &&
sec->memRegion == load->firstSec->memRegion &&
(sameLMARegion || load->lastSec == Out::programHeaders))) {
load = addHdr(PT_LOAD, newFlags);
flags = newFlags;
}
load->add(sec);
}
// Add a TLS segment if any.
PhdrEntry *tlsHdr = make<PhdrEntry>(PT_TLS, PF_R);
for (OutputSection *sec : outputSections)
if (sec->partition == partNo && sec->flags & SHF_TLS)
tlsHdr->add(sec);
if (tlsHdr->firstSec)
ret.push_back(tlsHdr);
// Add an entry for .dynamic.
if (OutputSection *sec = part.dynamic->getParent())
addHdr(PT_DYNAMIC, sec->getPhdrFlags())->add(sec);
if (relRo->firstSec)
ret.push_back(relRo);
// PT_GNU_EH_FRAME is a special section pointing on .eh_frame_hdr.
if (part.ehFrame->isNeeded() && part.ehFrameHdr &&
part.ehFrame->getParent() && part.ehFrameHdr->getParent())
addHdr(PT_GNU_EH_FRAME, part.ehFrameHdr->getParent()->getPhdrFlags())
->add(part.ehFrameHdr->getParent());
// PT_OPENBSD_RANDOMIZE is an OpenBSD-specific feature. That makes
// the dynamic linker fill the segment with random data.
if (OutputSection *cmd = findSection(".openbsd.randomdata", partNo))
addHdr(PT_OPENBSD_RANDOMIZE, cmd->getPhdrFlags())->add(cmd);
if (config->zGnustack != GnuStackKind::None) {
// PT_GNU_STACK is a special section to tell the loader to make the
// pages for the stack non-executable. If you really want an executable
// stack, you can pass -z execstack, but that's not recommended for
// security reasons.
unsigned perm = PF_R | PF_W;
if (config->zGnustack == GnuStackKind::Exec)
perm |= PF_X;
addHdr(PT_GNU_STACK, perm)->p_memsz = config->zStackSize;
}
// PT_OPENBSD_WXNEEDED is a OpenBSD-specific header to mark the executable
// is expected to perform W^X violations, such as calling mprotect(2) or
// mmap(2) with PROT_WRITE | PROT_EXEC, which is prohibited by default on
// OpenBSD.
if (config->zWxneeded)
addHdr(PT_OPENBSD_WXNEEDED, PF_X);
if (OutputSection *cmd = findSection(".note.gnu.property", partNo))
addHdr(PT_GNU_PROPERTY, PF_R)->add(cmd);
// Create one PT_NOTE per a group of contiguous SHT_NOTE sections with the
// same alignment.
PhdrEntry *note = nullptr;
for (OutputSection *sec : outputSections) {
if (sec->partition != partNo)
continue;
if (sec->type == SHT_NOTE && (sec->flags & SHF_ALLOC)) {
if (!note || sec->lmaExpr || note->lastSec->alignment != sec->alignment)
note = addHdr(PT_NOTE, PF_R);
note->add(sec);
} else {
note = nullptr;
}
}
return ret;
}
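The core rule for PT_LOAD creation, starting a new segment when the next section has different flags, can be sketched on its own. This deliberately ignores the RELRO boundary, memory regions, LMA handling and partitions that the real function checks; Section, Segment and groupIntoLoads are hypothetical names.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified types (the real PhdrEntry tracks far more).
struct Section {
  std::string name;
  uint32_t flags; // ELF segment permissions: PF_X = 1, PF_W = 2, PF_R = 4
};

struct Segment {
  uint32_t flags;
  std::vector<std::string> sections;
};

// Walk sections in order; start a new segment whenever the flags change.
inline std::vector<Segment> groupIntoLoads(const std::vector<Section> &secs) {
  std::vector<Segment> loads;
  for (const Section &s : secs) {
    if (loads.empty() || loads.back().flags != s.flags)
      loads.push_back(Segment{s.flags, {}});
    loads.back().sections.push_back(s.name);
  }
  return loads;
}
```

Given sections sorted as read-only, then executable, then writable, this produces three segments, which matches the usual R, RX, RW layout of a simple executable.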
References:
LLD’s partition feature allows a program (which may be an executable or a shared library) to be split into multiple pieces, or partitions. A partitioned program consists of a main partition together with a number of loadable partitions. The loadable partitions depend on the main partition in a similar way to a regular ELF shared object dependency, but unlike a shared object, the main partition and the loadable partitions share a virtual address space at link time, and each loadable partition is assigned a fixed offset from the main partition. This allows the loadable partitions to refer to code and data in the main partition directly without the binary size and performance overhead of PLTs, GOTs or symbol table entries.
Call Path:
// lld/ELF/{Driver|Writer}.cpp
LinkerDriver::link<ELFT>() // lld/ELF/Driver.cpp
=> writeResult<ELFT>() // lld/ELF/Driver.cpp
=> Writer<ELFT>().run() // lld/ELF/Writer.cpp
=> Writer<ELFT>::finalizeSections() // lld/ELF/Writer.cpp
=> ...
// Some symbols are defined in term of program headers. Now that we
// have the headers, we can find out which sections they point to.
setReservedSymbolSections();
finalizeSynthetic(in.bss);
finalizeSynthetic(in.bssRelRo);
finalizeSynthetic(in.symTabShndx);
finalizeSynthetic(in.shStrTab);
finalizeSynthetic(in.strTab);
finalizeSynthetic(in.got);
finalizeSynthetic(in.mipsGot);
finalizeSynthetic(in.igotPlt);
finalizeSynthetic(in.gotPlt);
finalizeSynthetic(in.relaIplt);
finalizeSynthetic(in.relaPlt);
finalizeSynthetic(in.plt);
finalizeSynthetic(in.iplt);
finalizeSynthetic(in.ppc32Got2);
finalizeSynthetic(in.partIndex);
“This function assigns addresses as instructed by linker script SECTIONS sub-commands. Doing that allows us to use final VA values.”
After this, every output section has its addr field set to its virtual address, which is the address the loader uses when loading the section.
The location counter (.dot) is advanced according to each section in LinkerScript::assignAddresses(), reached via:
Writer<ELFT>::finalizeSections()
=> Writer<ELFT>::finalizeAddressDependentContent()
=> LinkerScript::assignAddresses()
Call Path:
Writer::finalizeSections()
=> Writer<ELFT>::finalizeAddressDependentContent()
=> script->assignAddresses(); // multiple places
Writer::finalizeSections()
=> Writer<ELFT>::optimizeBasicBlockJumps() // if (config->optimizeBBJumps)
=> script->assignAddresses();
const Defined *LinkerScript::assignAddresses() // lld/ELF/LinkerScript.cpp
=>
// lld/ELF/LinkerScript.cpp
// Here we assign addresses as instructed by linker script SECTIONS
// sub-commands. Doing that allows us to use final VA values, so here
// we also handle rest commands like symbol assignments and ASSERTs.
// Returns a symbol that has changed its section or value, or nullptr if no
// symbol has changed.
const Defined *LinkerScript::assignAddresses() {
if (script->hasSectionsCommand) {
// With a linker script, assignment of addresses to headers is covered by
// allocateHeaders().
dot = config->imageBase.getValueOr(0);
} else {
// Assign addresses to headers right now.
dot = target->getImageBase();
Out::elfHeader->addr = dot;
Out::programHeaders->addr = dot + Out::elfHeader->size;
dot += getHeaderSize();
}
auto deleter = std::make_unique<AddressState>();
ctx = deleter.get();
errorOnMissingSection = true;
switchTo(aether);
SymbolAssignmentMap oldValues = getSymbolAssignmentValues(sectionCommands);
for (BaseCommand *base : sectionCommands) {
if (auto *cmd = dyn_cast<SymbolAssignment>(base)) {
cmd->addr = dot;
assignSymbol(cmd, false);
cmd->size = dot - cmd->addr;
continue;
}
assignOffsets(cast<OutputSection>(base));
}
ctx = nullptr;
return getChangedSymbolAssignment(oldValues);
}
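The role of the location counter can be shown with a stripped-down sketch: each section is placed at the current value of dot, rounded up to the section's alignment, and dot then moves past the section. OutSec, alignUp and assignAll are hypothetical names, and the sketch assumes power-of-two alignments; the real code additionally handles memory regions, LMA, TLS and symbol assignments.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified output section: only what address assignment needs.
struct OutSec {
  uint64_t size;
  uint64_t alignment; // must be a power of two
  uint64_t addr;
};

// Round `value` up to the next multiple of `align` (a power of two).
inline uint64_t alignUp(uint64_t value, uint64_t align) {
  assert(align != 0 && (align & (align - 1)) == 0);
  return (value + align - 1) & ~(align - 1);
}

// Assign addresses in order, advancing the location counter past each section.
inline uint64_t assignAll(std::vector<OutSec> &secs, uint64_t dot) {
  for (OutSec &s : secs) {
    dot = alignUp(dot, s.alignment);
    s.addr = dot;
    dot += s.size;
  }
  return dot; // final value of the location counter
}
```

Starting from an image base of 0x200000, a 0x123-byte section followed by an 8-byte-aligned section places the second one at 0x200128, the first 8-byte boundary after the end of the first section.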
// lld/ELF/LinkerScript.h
// Temporary state used in processSectionCommands() and assignAddresses()
// that must be reinitialized for each call to the above functions, and must
// not be used outside of the scope of a call to the above functions.
struct AddressState {
AddressState();
uint64_t threadBssOffset = 0;
OutputSection *outSec = nullptr;
MemoryRegion *memRegion = nullptr;
MemoryRegion *lmaRegion = nullptr;
uint64_t lmaOffset = 0;
};
Reference
phdrsCommands (empty for most binaries)
In LinkerScript:
// lld/ELF/LinkerScript.h
class LinkerScript final {
// PHDRS command list.
std::vector<PhdrsCommand> phdrsCommands;
};
In LLD, the program headers are updated in one of two ways:
based on the linker script, if one exists;
by the linker itself, if no linker script is given (true for binary and relocatable outputs).
The driver function that updates the program header commands is void Writer<ELFT>::finalizeSections():
// lld/ELF/Writer.cpp
// Create output section objects and add them to OutputSections.