Linking


Q&A

  • How to merge the .text sections from two object (relocatable) files into one executable binary?
    • How/When to determine the virtual address of each .text segment?
      • Scan and parse each output section command, updating dot (the location counter) according to each section.
      • See LinkerScript::assignAddresses().
      • Called from Writer<ELFT>::finalizeSections() => Writer<ELFT>::finalizeAddressDependentContent() => LinkerScript::assignAddresses().
    • How/When to update the other sections that are related to the relocated .text sections?
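
One way to picture the first question is the toy layout below. It is a minimal sketch under stated assumptions, not LLD's code: InSec, alignTo, layoutOutputSection and inputSectionVA are hypothetical names. Same-named input sections are simply concatenated in input order, each padded to its own alignment, and the output section's VA is assumed to be known already (as it is after assignAddresses()). Sorting, thunks and linker-script rules are ignored.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified input section (not LLD's InputSection).
struct InSec {
  std::string file;        // object file the section came from
  uint64_t size;           // size of the section contents in bytes
  uint64_t alignment;      // sh_addralign; must be a power of two
  uint64_t outSecOff = 0;  // offset inside the output section (set below)
};

// Round x up to the next multiple of align (a power of two).
uint64_t alignTo(uint64_t x, uint64_t align) {
  assert(align != 0 && (align & (align - 1)) == 0);
  return (x + align - 1) & ~(align - 1);
}

// Concatenate same-named input sections into one output section, padding
// each one to its own alignment. Returns the resulting output section size.
uint64_t layoutOutputSection(std::vector<InSec> &members) {
  uint64_t off = 0;
  for (InSec &s : members) {
    off = alignTo(off, s.alignment);
    s.outSecOff = off;
    off += s.size;
  }
  return off;
}

// Once the output section has a VA, every member's VA follows directly.
uint64_t inputSectionVA(uint64_t outSecVA, const InSec &s) {
  return outSecVA + s.outSecOff;
}
```

For example, a 0x13-byte .text from a.o (4-byte aligned) followed by a 0x20-byte .text from b.o (16-byte aligned) lands at offsets 0x0 and 0x20, giving a 0x40-byte output section; if that output section is placed at 0x201000, b.o's code starts at 0x201020.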

References:

Handling In/Out Sections

The mapping from InputSections to OutputSections is N-to-1.

// lld/ELF/InputSection.h

// This is a section that is added directly to an output section
// instead of needing special combination via a synthetic section. This
// includes all input sections with the exceptions of SHF_MERGE and
// .eh_frame. It also includes the synthetic sections themselves.
class InputSection : public InputSectionBase {
public:
   ...
};

Each input section is assigned to at most one output section, and one output section contains one or more input sections.

The function that returns the output section holding an input section is getOutputSection() (for an InputSection, also getParent()).

// lld/ELF/InputSection.cpp

OutputSection *SectionBase::getOutputSection() {
  InputSection *sec;
  if (auto *isec = dyn_cast<InputSection>(this))
    sec = isec;
  else if (auto *ms = dyn_cast<MergeInputSection>(this))
    sec = ms->getParent();
  else if (auto *eh = dyn_cast<EhInputSection>(this))
    sec = eh->getParent();
  else
    return cast<OutputSection>(this);
  return sec ? sec->getParent() : nullptr;
}


OutputSection *InputSection::getParent() const {
  return cast_or_null<OutputSection>(parent);
}
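
To make the "one level up" parent pointer concrete, here is a toy model of how getOutputSection() reaches the output section. The Section struct and its kind tag are hypothetical stand-ins for LLD's class hierarchy and LLVM's classof/dyn_cast; Merge stands for both merge and .eh_frame input sections, which are first combined into a synthetic section.

```cpp
#include <cassert>
#include <string>

// Hypothetical toy model of LLD's section hierarchy. An explicit kind tag
// stands in for LLVM-style RTTI (classof/dyn_cast).
struct Section {
  enum Kind { Regular, Merge, Synthetic, Output } kind;
  std::string name;
  Section *parent = nullptr;  // points exactly one level up
};

// Mirrors the shape of SectionBase::getOutputSection():
//  - an output section is its own answer;
//  - a merge (or .eh_frame) piece goes up to its synthetic section first;
//  - a regular or synthetic section's parent is the output section.
Section *getOutputSection(Section *s) {
  if (s->kind == Section::Output)
    return s;
  Section *sec = (s->kind == Section::Merge) ? s->parent : s;
  return sec ? sec->parent : nullptr;
}
```

A merge section such as .rodata.str1.1 therefore takes two hops (merge section to synthetic section to output section), while a plain input section takes one; a section not yet placed anywhere yields nullptr.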

Tracking the parent of an InputSection (InputSectionBase):

// This corresponds to a section of an input file.
class InputSectionBase : public SectionBase {
  // Input sections are part of an output section. Special sections
  // like .eh_frame and merge sections are first combined into a
  // synthetic section that is then added to an output section. In all
  // cases this points one level up.
  SectionBase *parent = nullptr;
  OutputSection *getParent() const;
};

OutputSection

// lld/ELF/OutputSection.h

// This represents a section in an output file.
// It is composed of multiple InputSections.
// The writer creates multiple OutputSections and assign them unique,
// non-overlapping file offsets and VAs.
class OutputSection final : public BaseCommand, public SectionBase {
public:
  OutputSection(StringRef name, uint32_t type, uint64_t flags);

  static bool classof(const SectionBase *s) {
    return s->kind() == SectionBase::Output;
  }
  ...
};

In LinkerScript, the sectionCommands vector stores all the output sections as BaseCommand * (together with other commands, such as symbol assignments). Another vector, outputSections, also stores the output sections.

What is the difference here?

  • Creation/filling time:
    • sectionCommands is filled in during addOrphanSections(), which:
      • scans all the input sections and maps each one to a proper output section;
      • creates the output section if it does not exist yet and adds it to sectionCommands.
    • outputSections: filled later, during finalizeSections() (see below).
  • Before finalizeSections():
    • sectionCommands contains 49 sections (for the test binary), and every section's addr is 0x0, as in the input files.
    • It contains synthetic sections such as .got.
    • outputSections is empty.
  • During finalizeSections():
    • After sortSections(), sectionCommands is scanned and every entry of type OutputSection is added to outputSections. See the code below.
    • Before createPhdrs(), outputSections has 39 sections.
  • After finalizeSections():
    • sectionCommands contains 39 sections.
    • outputSections contains 39 sections.

Code that picks the output sections out of sectionCommands and puts them into outputSections:

// Create output section objects and add them to OutputSections.
template <class ELFT> void Writer<ELFT>::finalizeSections() {
  ...
  sortSections();

  // Now that we have the final list, create a list of all the
  // OutputSections for convenience.
  for (BaseCommand *base : script->sectionCommands)
    if (auto *sec = dyn_cast<OutputSection>(base))
      outputSections.push_back(sec);
}

// lld/ELF/LinkerScript.h

class LinkerScript final {

  // SECTIONS command list.
  std::vector<BaseCommand *> sectionCommands;

};

Input sections with the same name from different object files are merged into one output section, and the list of input sections can be retrieved from each output section. For example, the following code in LLVM LLD iterates over each output section and lists all the input sections it holds:

iterating output sections
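
The LLD snippet is not reproduced here, but the idea can be illustrated with a self-contained sketch. InputSec, OutputSec, groupByName and listSections are hypothetical; the sketch matches names exactly, whereas LLD's default rules also fold names such as .text.foo into .text. In LLD itself, as far as I know, an OutputSection reaches its input sections through the InputSectionDescription entries in its own sectionCommands (worth checking against the LLD version in use).

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified types; not LLD's InputSection/OutputSection.
struct InputSec {
  std::string file;  // object file it came from
  std::string name;  // section name, e.g. ".text"
};

struct OutputSec {
  std::string name;
  std::vector<const InputSec *> inputs;  // N input sections -> 1 output section
};

// Put every input section into the output section with the same name,
// creating output sections in first-seen order. The returned pointers
// refer into `inputs`, so that vector must outlive the result.
std::vector<OutputSec> groupByName(const std::vector<InputSec> &inputs) {
  std::vector<OutputSec> outs;
  std::map<std::string, std::size_t> index;  // name -> position in outs
  for (const InputSec &in : inputs) {
    auto it = index.find(in.name);
    if (it == index.end()) {
      it = index.emplace(in.name, outs.size()).first;
      outs.push_back(OutputSec{in.name, {}});
    }
    outs[it->second].inputs.push_back(&in);
  }
  return outs;
}

// Iterate the output sections and list the files of their input sections.
std::string listSections(const std::vector<OutputSec> &outs) {
  std::string s;
  for (const OutputSec &os : outs) {
    s += os.name + ":";
    for (const InputSec *in : os.inputs)
      s += " " + in->file;
    s += "\n";
  }
  return s;
}
```

With inputs a.o:.text, a.o:.data and b.o:.text, listSections(groupByName(inputs)) returns ".text: a.o b.o\n.data: a.o\n".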

SyntheticSection

Program Header Generation

Program Header Entry class

In LLD, a program header entry is defined as PhdrEntry:

// lld/ELF/Writer.h

// This describes a program header entry.
// Each contains type, access flags and range of output sections that will be
// placed in it.
struct PhdrEntry {
  PhdrEntry(unsigned type, unsigned flags)
      : p_align(type == llvm::ELF::PT_LOAD ? config->maxPageSize : 0),
        p_type(type), p_flags(flags) {}
  void add(OutputSection *sec);

  uint64_t p_paddr = 0;
  uint64_t p_vaddr = 0;
  uint64_t p_memsz = 0;
  uint64_t p_filesz = 0;
  uint64_t p_offset = 0;
  uint32_t p_align = 0;
  uint32_t p_type = 0;
  uint32_t p_flags = 0;

  OutputSection *firstSec = nullptr;
  OutputSection *lastSec = nullptr;
  bool hasLMA = false;

  uint64_t lmaOffset = 0;
};

The method add(OutputSection *sec) adds one output section to the segment this entry describes. The entry does not keep a list of its sections; add() only:

  • sets lastSec, the last section added to this segment;
  • sets firstSec, the first section added to this segment, if it is not set yet;
  • raises p_align, the alignment of this segment, to the largest alignment required by any output section it holds;
  • for a PT_LOAD segment, sets the output section’s ptLoad field to this entry. The ptLoad field is used to compute the file offset of each section in a PT_LOAD segment with the formula Off = Off_first + (VA - VA_first), where Off_first and VA_first are the file offset and VA of the first section in the PT_LOAD. Questions:

    1. When is a section’s offset determined using the above formula?
    2. Why not use the section header to determine the section offset?
    3. Where is this section offset used and stored? Can the OS access it?

      // lld/ELF/Writer.cpp

      void PhdrEntry::add(OutputSection *sec) {
        lastSec = sec;
        if (!firstSec)
          firstSec = sec;
        p_align = std::max(p_align, sec->alignment);
        if (p_type == PT_LOAD)
          sec->ptLoad = this;
      }
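
The offset formula can be sketched as follows. This is a minimal illustration under assumptions: OutSec and assignOffsetsInLoad are hypothetical, sections are given in ascending VA order, Off_first has already been chosen, and SHT_NOBITS sections such as .bss (which occupy no file space) are not handled. As far as I can tell, LLD also aligns the first section's offset so that offset and VA are congruent modulo the page size; this sketch leaves that to the caller.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical output section: addr is already final, offset is computed here.
struct OutSec {
  uint64_t addr;        // virtual address assigned by assignAddresses()
  uint64_t size;
  uint64_t offset = 0;  // file offset (computed below)
};

// Sections in one PT_LOAD keep the same distance in the file as in memory:
//   Off = Off_first + (VA - VA_first)
// firstOff is the offset already chosen for the first section.
void assignOffsetsInLoad(std::vector<OutSec> &load, uint64_t firstOff) {
  if (load.empty())
    return;
  const uint64_t vaFirst = load.front().addr;
  for (OutSec &sec : load) {
    assert(sec.addr >= vaFirst);  // sections are in ascending VA order
    sec.offset = firstOff + (sec.addr - vaFirst);
  }
}
```

With a first section at VA 0x201000 placed at file offset 0x1000, a second section at VA 0x201040 gets offset 0x1040: the gap in the file equals the gap in memory.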
      

Create Program Header Entries

In LLD, program header entries are grouped by partition, but there is only one partition in the testing code (see LLD Partitions below).

Call Path:

// lld/ELF/{Driver|Writer}.cpp
LinkerDriver::link<ELFT>() // lld/ELF/Driver.cpp
  => writeResult<ELFT>() // lld/ELF/Driver.cpp
     => Writer<ELFT>().run() // lld/ELF/Writer.cpp
        => Writer<ELFT>::finalizeSections() // lld/ELF/Writer.cpp
           => ...
           => for (Partition &part: partitions)
                  part.phdrs = script->hasPhdrsCommands()
                        ? script->createPhdrs() : createPhdrs(part);
              => createPhdrs(part)
                {
                  // lld/ELF/Writer.cpp: Writer<ELFT>::createPhdrs(Partition &part)
                  addHdr(PT_PHDR, PF_R);
                  ..
                  addHdr(PT_INTERP, flags);
                  ..
                  addHdr(PT_LOAD, flags);
                  .. 
                  // create a new PT_LOAD segment when a section has new flags.
                  uint64_t newFlags = computeFlags(sec->getPhdrFlags());
                  load = addHdr(PT_LOAD, newFlags);
                  load->add(sec);
                  ..
                  addHdr(PT_GNU_EH_FRAME, ...);
                  ..
                  addHdr(PT_GNU_STACK, perm)->p_memsz = config->zStackSize;
                  ..
                  note = addHdr(PT_NOTE, PF_R);
                  note->add(sec);
                  ..
                  // here we add program header entry for capsule ownership
                  addHdr(PT_CAPSULE_OWNERSHIP, own_flag)->add(own_text);
                  addHdr(PT_CAPSULE_OWNERSHIP, own_flag)->add(own_data);
                }

If the linker script does not give program header commands (which is the usual case), then the writer will create program headers according to the default policies. This is implemented in Writer<ELFT>::createPhdrs(Partition &part).

Steps in the code to determine which section goes to which program header entry:

  1. To add a header entry, the code uses the lambda addHdr.
code Writer::createPhdrs()
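
The default PT_LOAD policy from the pseudocode above (open a new PT_LOAD when a section needs different flags) can be reduced to the sketch below. It is a simplification, not LLD's createPhdrs(): Sec, Load and createLoads are hypothetical, only SHF_ALLOC sections are considered, and the other reasons LLD starts a new segment (for example sections placed at a discontiguous address with AT or AT>) are ignored. The PF_* values match the ELF specification.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// ELF segment permission bits (values from the ELF specification).
constexpr uint32_t PF_X = 1, PF_W = 2, PF_R = 4;

// Hypothetical, simplified output section.
struct Sec {
  std::string name;
  bool alloc;      // SHF_ALLOC: only allocated sections are loaded
  uint32_t flags;  // segment permissions the section needs
};

// Hypothetical PT_LOAD entry: its flags and the sections it covers.
struct Load {
  uint32_t flags;
  std::vector<std::string> sections;
};

// Walk sections in output order; open a new PT_LOAD whenever the
// required flags differ from those of the current one.
std::vector<Load> createLoads(const std::vector<Sec> &secs) {
  std::vector<Load> loads;
  for (const Sec &s : secs) {
    if (!s.alloc)
      continue;  // e.g. .comment, .symtab: not part of any segment
    if (loads.empty() || loads.back().flags != s.flags)
      loads.push_back(Load{s.flags, {}});
    loads.back().sections.push_back(s.name);
  }
  return loads;
}
```

For .rodata (R), .text (R+X), .data and .bss (both R+W) plus a non-allocated .comment, this yields three PT_LOAD segments, with .data and .bss sharing the last one.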

LLD Partitions

References:

LLD’s partition feature allows a program (which may be an executable or a shared library) to be split into multiple pieces, or partitions. A partitioned program consists of a main partition together with a number of loadable partitions. The loadable partitions depend on the main partition in a similar way to a regular ELF shared object dependency, but unlike a shared object, the main partition and the loadable partitions share a virtual address space at link time, and each loadable partition is assigned a fixed offset from the main partition. This allows the loadable partitions to refer to code and data in the main partition directly without the binary size and performance overhead of PLTs, GOTs or symbol table entries.

Address Resolution

  1. GOT/PLT tables: the linker redirects all external references to a dynamic library through entries in these tables.
  2. For the capsule ownership section, a similar approach will be used: recreate each entry with an address pointing to the .text/.data section, together with its ownership values.
  3. To understand how these work, the questions to answer are:
    • How/Where does the linker create these GOT/PLT entries?
    • How/Where does the linker update the corresponding sections’ references to the global variables/function ptrs?
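
A simplified picture of both questions, assuming x86-64 and 8-byte GOT slots: while scanning relocations the linker reserves one GOT slot per distinct symbol, and when applying relocations it rewrites each reference as the PC-relative distance to that slot, G + GOT + A - P as for R_X86_64_GOTPCREL. Got, addEntry, slotVA and gotPcRel here are hypothetical. In LLD I believe the two steps live roughly in scanRelocations() (lld/ELF/Relocations.cpp) and InputSectionBase::relocateAlloc(), but that mapping is an assumption to verify.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified GOT (not LLD's GotSection): 8-byte slots, x86-64.
struct Got {
  uint64_t addr = 0;                        // VA of .got once assigned
  std::map<std::string, uint64_t> slotIdx;  // symbol -> slot index
  std::vector<std::string> entries;         // symbols in slot order

  // Relocation scanning: reserve one slot per distinct symbol.
  uint64_t addEntry(const std::string &sym) {
    auto res = slotIdx.emplace(sym, entries.size());
    if (res.second)
      entries.push_back(sym);
    return res.first->second;
  }

  // VA of the slot holding sym's address (GOT + G in psABI terms).
  uint64_t slotVA(const std::string &sym) const {
    return addr + 8 * slotIdx.at(sym);
  }
};

// Relocation application for a GOTPCREL-style reference at address p:
// value = G + GOT + A - P, the PC-relative distance to the symbol's slot.
int64_t gotPcRel(const Got &got, const std::string &sym, int64_t addend,
                 uint64_t p) {
  return static_cast<int64_t>(got.slotVA(sym)) + addend -
         static_cast<int64_t>(p);
}
```

Referencing printf, errno and printf again yields slots 0, 1 and 0; with .got at 0x3000, a reference to errno at 0x1010 with addend -4 is patched with 0x3008 - 4 - 0x1010 = 0x1ff4.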

Call Path:

// lld/ELF/{Driver|Writer}.cpp
LinkerDriver::link<ELFT>() // lld/ELF/Driver.cpp
  => writeResult<ELFT>() // lld/ELF/Driver.cpp
     => Writer<ELFT>().run() // lld/ELF/Writer.cpp
        => Writer<ELFT>::finalizeSections() // lld/ELF/Writer.cpp
           => ...
              // Some symbols are defined in term of program headers. Now that we
              // have the headers, we can find out which sections they point to.
              setReservedSymbolSections();

              finalizeSynthetic(in.bss);
              finalizeSynthetic(in.bssRelRo);
              finalizeSynthetic(in.symTabShndx);
              finalizeSynthetic(in.shStrTab);
              finalizeSynthetic(in.strTab);
              finalizeSynthetic(in.got);
              finalizeSynthetic(in.mipsGot);
              finalizeSynthetic(in.igotPlt);
              finalizeSynthetic(in.gotPlt);
              finalizeSynthetic(in.relaIplt);
              finalizeSynthetic(in.relaPlt);
              finalizeSynthetic(in.plt);
              finalizeSynthetic(in.iplt);
              finalizeSynthetic(in.ppc32Got2);
              finalizeSynthetic(in.partIndex);

assignAddresses()

“Here we assign addresses as instructed by linker script SECTIONS sub-commands. Doing that allows us to use final VA values.”

After this, every output section's addr field holds its virtual address, i.e. the address at which the loader places the section in memory.

  • Scan and parse each output section command, updating dot (the location counter) according to each section.
  • See LinkerScript::assignAddresses().
  • Called from Writer<ELFT>::finalizeSections() => Writer<ELFT>::finalizeAddressDependentContent() => LinkerScript::assignAddresses().

Call Path:

Writer::finalizeSections()
=> Writer<ELFT>::finalizeAddressDependentContent()
   => script->assignAddresses(); // multiple places

Writer::finalizeSections()
=> Writer<ELFT>::optimizeBasicBlockJumps() // if (config->optimizeBBJumps)
   => script->assignAddresses();

// lld/ELF/LinkerScript.cpp

// Here we assign addresses as instructed by linker script SECTIONS
// sub-commands. Doing that allows us to use final VA values, so here
// we also handle rest commands like symbol assignments and ASSERTs.
// Returns a symbol that has changed its section or value, or nullptr if no
// symbol has changed.
const Defined *LinkerScript::assignAddresses() {
  if (script->hasSectionsCommand) {
    // With a linker script, assignment of addresses to headers is covered by
    // allocateHeaders().
    dot = config->imageBase.getValueOr(0);
  } else {
    // Assign addresses to headers right now.
    dot = target->getImageBase();
    Out::elfHeader->addr = dot;
    Out::programHeaders->addr = dot + Out::elfHeader->size;
    dot += getHeaderSize();
  }

  auto deleter = std::make_unique<AddressState>();
  ctx = deleter.get();
  errorOnMissingSection = true;
  switchTo(aether);

  SymbolAssignmentMap oldValues = getSymbolAssignmentValues(sectionCommands);
  for (BaseCommand *base : sectionCommands) {
    if (auto *cmd = dyn_cast<SymbolAssignment>(base)) {
      cmd->addr = dot;
      assignSymbol(cmd, false);
      cmd->size = dot - cmd->addr;
      continue;
    }
    assignOffsets(cast<OutputSection>(base));
  }

  ctx = nullptr;
  return getChangedSymbolAssignment(oldValues);
}
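
Stripped of memory regions, LMAs, TLS and page alignment between segments, the loop above boils down to the toy model below: walk the commands in order, let a symbol assignment take the current value of dot, and align then advance dot for each output section. Cmd and this assignAddresses() are hypothetical stand-ins, and the starting value assumes the no-linker-script branch (image base plus header size).

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical SECTIONS-style command: "sym = ." or an output section.
struct Cmd {
  bool isSymbol;       // true: symbol assignment; false: output section
  std::string name;
  uint64_t size = 0;   // output section size in bytes
  uint64_t align = 1;  // output section alignment (power of two)
  uint64_t addr = 0;   // result: section VA or symbol value
};

// Walk the commands in order, moving the location counter dot forward.
// Returns the final value of dot.
uint64_t assignAddresses(std::vector<Cmd> &cmds, uint64_t imageBase,
                         uint64_t headerSize) {
  uint64_t dot = imageBase + headerSize;  // ELF header + program headers first
  for (Cmd &c : cmds) {
    if (c.isSymbol) {
      c.addr = dot;  // a symbol assignment captures the current dot
      continue;
    }
    assert(c.align != 0 && (c.align & (c.align - 1)) == 0);
    dot = (dot + c.align - 1) & ~(c.align - 1);  // align dot up
    c.addr = dot;                                  // section starts here
    dot += c.size;                                 // and occupies size bytes
  }
  return dot;
}
```

With an image base of 0x200000 and 0x40 bytes of headers, a 0x123-byte .text (16-aligned), a symbol etext and a 0x40-byte .data (64-aligned) get 0x200040, 0x200163 and 0x200180, and dot ends at 0x2001c0.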

AddressState

// lld/ELF/LinkerScript.h

  // Temporary state used in processSectionCommands() and assignAddresses()
  // that must be reinitialized for each call to the above functions, and must
  // not be used outside of the scope of a call to the above functions.
  struct AddressState {
    AddressState();
    uint64_t threadBssOffset = 0;
    OutputSection *outSec = nullptr;
    MemoryRegion *memRegion = nullptr;
    MemoryRegion *lmaRegion = nullptr;
    uint64_t lmaOffset = 0;
  };

finalizeInputSections()

Ending

  • Phdrs_cmds
  • Reference: phdrsCommands (empty for most binaries), declared in LinkerScript:

      // lld/ELF/LinkerScript.h
      class LinkerScript final {
        // PHDRS command list.
        std::vector<PhdrsCommand> phdrsCommands;
      };

  • In LLD, the program headers are produced in one of two ways:
    • based on the linker script's PHDRS commands, if they exist;
    • by the linker itself when no linker script is given (true for most binaries and relocatable outputs).
  • The driver function that updates the program header commands is void Writer<ELFT>::finalizeSections() in lld/ELF/Writer.cpp, commented "Create output section objects and add them to OutputSections."

Created Jul 28, 2020 // Last Updated Aug 6, 2020
