The `MC` Layer in LLVM


Q & A

  • Where is the symbol table section being composed and written to the object file?

References:

The LLVM MC Project

This section is mostly based on llvm-mc, with some information from LLVM CodeGen.

  • Instruction Printer.
    • MCInstPrinter: MCInst -> textual representation.
    • Not aware of sections or directives, so it is independent of the object file format.
    • Requires a new lowering pass: MachineInstr -> MCInst.
  • Instruction Encoder.
    • MCCodeEmitter: MCInst -> bytes & relocations.
    • Reuses the same MachineInstr -> MCInst lowering code.
  • Instruction Parser.
    • TargetAsmParser: parses .s files.
    • The lexer is largely shared code.
    • The instruction parser is entirely target-specific; it determines the concrete instruction by matching the opcode and operands.
    • Operands are first parsed into a form that is more abstract than MCInst.
  • Instruction Decoder.
    • MCDisassembler: turns an abstract series of bytes (implemented with the MemoryObject API, to handle remote disassembly) into an MCInst and a size.
  • Assembly Parser.

    • Takes a MemoryBuffer as input and drives the MCStreamer interface.
    • Handles all the directives and other gunk in a .s file that are not instructions.
  • MCStreamer: an assembler API.

    • One virtual method per directive, such as:
      • EmitLabel,
      • EmitSymbolAttribute,
      • SwitchSection,
      • EmitValue, for .byte, .word
      • EmitInstruction, which outputs an MCInst to the streamer.
    • One implementation is MCAsmStreamer, the assembly printer:
      • It implements printing to a .s file.
      • It uses the Instruction Printer to format the MCInsts.
      • It simply prints out a directive for each method (e.g. EmitValue -> .byte).
    • Another implementation is MCObjectStreamer (or …):
      • It writes out a .o file.
      • It implements a full assembler.
    • For target-specific directives, the MCStreamer holds an MCTargetStreamer instance.
      • Each target that needs it defines a class FooTargetStreamer that inherits from MCTargetStreamer and is a lot like MCStreamer itself.
      • It has one method per directive.
      • Two classes inherit from FooTargetStreamer:
        • FooTargetAsmStreamer : public FooTargetStreamer: the target asm streamer just prints the directive (emitFnStart -> .fnstart), and
        • FooTargetELFStreamer : public FooTargetStreamer: the target object streamer implements the assembler logic for it.
  • Assembler Backend.

  • Compiler Integration.

    • The compiler backend now invokes the same MCStreamer interface to emit code.
    • Code read back with the asm parser results in the same MCStreamer calls as when the code generator invokes them directly.
  • Core classes: MCInst, MCSymbol (a label in a .s file), MCSection, MCExpr.

  • Usage 01: disassembler library

    • llvm/include/llvm-c/Disassembler.h
  • Usage 02: standalone assembler

  • Usage 03: Compiler-integrated assembler

    • Compiler -> MCStreamer -> MCInsts -> assembler backend.
    • Inline assembly: a new temporary assembly parser is created to handle the inline asm, and it talks to the same MCStreamer instance.
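The MCStreamer design above (one virtual method per directive, with a printing implementation and an object-writing implementation behind the same interface) can be illustrated with a minimal, self-contained C++ sketch. It uses no LLVM headers; ToyStreamer, ToyAsmStreamer, ToyObjectStreamer, and their methods are hypothetical names with simplified signatures, and the "object" streamer only records bytes and label offsets (no fragments, relocations, or layout), so it is a rough stand-in for MCObjectStreamer, not a model of it.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for MCStreamer: one virtual method per directive.
class ToyStreamer {
public:
  virtual ~ToyStreamer() = default;
  virtual void switchSection(const std::string &Name) = 0; // .section
  virtual void emitLabel(const std::string &Name) = 0;     // "name:"
  virtual void emitByteValue(uint8_t V) = 0;               // .byte
};

// Like MCAsmStreamer: prints one directive per method call.
class ToyAsmStreamer : public ToyStreamer {
public:
  std::ostringstream OS;
  void switchSection(const std::string &Name) override { OS << "\t.section\t" << Name << "\n"; }
  void emitLabel(const std::string &Name) override { OS << Name << ":\n"; }
  void emitByteValue(uint8_t V) override { OS << "\t.byte\t" << unsigned(V) << "\n"; }
};

// Rough stand-in for an object streamer: stores bytes per section and the
// offset of each label within its section, instead of printing text.
class ToyObjectStreamer : public ToyStreamer {
public:
  std::map<std::string, std::vector<uint8_t>> SectionData;
  std::map<std::string, size_t> LabelOffsets;
  std::string Current;
  void switchSection(const std::string &Name) override {
    Current = Name;
    SectionData[Name]; // make sure the section exists, even if empty
  }
  void emitLabel(const std::string &Name) override { LabelOffsets[Name] = SectionData[Current].size(); }
  void emitByteValue(uint8_t V) override { SectionData[Current].push_back(V); }
};

// Producer code only sees the abstract interface, as the code generator and
// the asm parser only see MCStreamer.
void emitExample(ToyStreamer &S) {
  S.switchSection(".data");
  S.emitByteValue(1);
  S.emitLabel("foo");
  S.emitByteValue(42);
}
```

Passing a ToyAsmStreamer to emitExample yields the text of a small .s file, while passing a ToyObjectStreamer yields two bytes in .data with foo at offset 1; choosing the output format is just a matter of which streamer is handed to the producer.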

Adding a New Target to MC

Reference

  1. Add a subclass of AsmPrinter for your target that converts MachineFunctions into MC label constructs.
    • For ELF, COFF, or MachO targets, many reusable functions are available in the TargetLoweringObjectFile class.
  2. Add an instruction printer for your target that converts an MCInst into your ISA's textual assembly, written to a raw_ostream.
    • Most of this is automatically generated from the .td file (when you specify something like add $dst, $src1, $src2 in the file), but you need to implement routines to print operands.
  3. Add code to lower a MachineInstr to an MCInst.
    • Usually implemented in <Target>MCInstLower.cpp.
    • Responsible for translating the following into the corresponding MCLabels:
      • jump table entries
      • constant pool indices
      • global variable addresses
    • Also responsible for expanding pseudo ops used by the code generator into the actual machine instructions they correspond to.
    • The resulting MCInst is then fed into the instruction printer or the encoder.
  4. Optionally, add a subclass of MCCodeEmitter to lower MCInsts into machine code bytes and relocations.
    • This is important if you want to support direct .o file emission, or would like to implement an assembler for your target.
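Step 3 can be sketched with toy types. The structs and the lower() function below are hypothetical simplifications (LLVM's real MachineInstr, MCInst, and MCOperand have much richer APIs, and symbolic operands there are MCExpr objects); they only show the two jobs named above: turning a global-address operand into a symbol reference, and expanding a pseudo op into real instructions. The opcodes PSEUDO_LOAD_GLOBAL, LUI, and ADDI and the %hi/%lo spelling are invented for illustration, loosely in the style of RISC-V address materialization.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical simplified operand: a register, an immediate, or a symbol reference.
struct ToyOperand {
  enum Kind { Reg, Imm, Sym } K;
  int Value = 0;      // register number or immediate value
  std::string Symbol; // used when K == Sym
};

// Hypothetical simplified MCInst: an opcode plus a list of operands.
struct ToyMCInst {
  std::string Opcode;
  std::vector<ToyOperand> Ops;
};

// Code-generator-level instruction; may be a pseudo op.
struct ToyMachineInstr {
  std::string Opcode;
  int DstReg = 0;
  std::string GlobalName; // set when the instruction references a global
};

// Lowers one ToyMachineInstr into one or more ToyMCInsts.
std::vector<ToyMCInst> lower(const ToyMachineInstr &MI) {
  if (MI.Opcode == "PSEUDO_LOAD_GLOBAL") {
    // Expand the pseudo into two real instructions: high part, then low part.
    // The global's address becomes symbolic operands rather than a number.
    ToyOperand Dst{ToyOperand::Reg, MI.DstReg, ""};
    ToyOperand Hi{ToyOperand::Sym, 0, "%hi(" + MI.GlobalName + ")"};
    ToyOperand Lo{ToyOperand::Sym, 0, "%lo(" + MI.GlobalName + ")"};
    return {ToyMCInst{"LUI", {Dst, Hi}}, ToyMCInst{"ADDI", {Dst, Dst, Lo}}};
  }
  // Non-pseudo: pass the opcode and its register operand through unchanged.
  return {ToyMCInst{MI.Opcode, {ToyOperand{ToyOperand::Reg, MI.DstReg, ""}}}};
}
```

The resulting ToyMCInsts carry only opcodes and operands, which is what lets the same list feed either a printer or an encoder.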

Emitting function stack size information

Reference

A section containing metadata on function stack sizes will be emitted when TargetLoweringObjectFile::StackSizeSection is not null and TargetOptions::EmitStackSizeSection is set (via -stack-size-section).

The section will contain an array of pairs of function symbol values (pointer size) and stack size (unsigned LEB128). The stack size values only include the space allocated in the function prologue. Functions with dynamic stack allocations are not included.
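The entry layout described above can be illustrated with a small encoder. This sketch assumes a 64-bit little-endian target (8-byte function address) and writes one entry as the raw address followed by the stack size in unsigned LEB128. The function names are hypothetical, and a real stack size section stores a relocated symbol value rather than a literal address.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Appends Value as unsigned LEB128: 7 bits per byte, low-order group first,
// with the high bit set on every byte except the last.
void encodeULEB128(uint64_t Value, std::vector<uint8_t> &Out) {
  do {
    uint8_t Byte = Value & 0x7f;
    Value >>= 7;
    if (Value != 0)
      Byte |= 0x80; // more bytes follow
    Out.push_back(Byte);
  } while (Value != 0);
}

// Appends one (function address, stack size) entry, assuming an 8-byte
// little-endian address field.
void appendStackSizeEntry(uint64_t FuncAddr, uint64_t StackSize,
                          std::vector<uint8_t> &Out) {
  for (int I = 0; I < 8; ++I)
    Out.push_back(static_cast<uint8_t>(FuncAddr >> (8 * I)));
  encodeULEB128(StackSize, Out);
}
```

For example, a stack size of 624 (0x270) takes two LEB128 bytes, 0xF0 0x04, so the whole entry is 10 bytes; small frames under 128 bytes need only one byte, which is why LEB128 keeps the section compact.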

Handling Sections

Reference: The LLVM Target-Independent Code Generator

The MCStreamer API can be implemented to output an ELF .o file (or a .s file, etc.). Its API corresponds directly to what you see in a .s file.

The MCContext class: the owner of a variety of uniqued data structures at the MC layer, including symbols, sections, etc.

The MCSymbol class: represents a symbol (aka label) in the assembly file. Symbols are created by MCContext and uniqued there, so they can be compared for pointer equivalence to find out if they are the same symbol. Note that pointer inequality does not guarantee that two labels end up at different addresses, since two different labels can point to the same address. There are two kinds of symbols:

  • Assembler temporary symbols. These are used and processed by the assembler but are discarded when the object file is produced.
    • Usually distinguished by a prefix on the label name, for example L labels in MachO.
  • Normal symbols.
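The uniquing behaviour described above can be sketched with toy classes: the context owns every symbol, so asking twice for the same name returns the same pointer, and a name prefix marks assembler-temporary symbols. ToyContext, ToySymbol, and getOrCreateSymbol are hypothetical stand-ins, not LLVM's MCContext/MCSymbol API, and the sketch assumes the MachO-style "L" prefix from the text.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for MCSymbol.
struct ToySymbol {
  std::string Name;
  bool IsTemporary; // assembler-temporary: not written to the object file
};

// Hypothetical stand-in for MCContext: owns symbols and uniques them by name.
class ToyContext {
  std::map<std::string, std::unique_ptr<ToySymbol>> Symbols;
  std::string TempPrefix;

public:
  explicit ToyContext(std::string Prefix) : TempPrefix(std::move(Prefix)) {}

  ToySymbol *getOrCreateSymbol(const std::string &Name) {
    std::unique_ptr<ToySymbol> &Slot = Symbols[Name];
    if (!Slot) {
      // Names starting with the temporary prefix become assembler temporaries.
      bool Temp = Name.compare(0, TempPrefix.size(), TempPrefix) == 0;
      Slot = std::make_unique<ToySymbol>(ToySymbol{Name, Temp});
    }
    return Slot.get(); // same name -> same pointer
  }
};
```

Because all lookups go through the context, comparing ToySymbol pointers is enough to tell whether two references name the same symbol.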

The MCSection class: represents an object-file-specific section.

  • It is subclassed by object-file-specific implementations (MCSectionMachO, MCSectionCOFF, MCSectionELF), which are created and uniqued by MCContext.
  • MCStreamer has a notion of the current section (SectionStack.back().first, or an empty MCSectionSubPair() if the stack is empty), which can be changed via the SwitchToSection() method (corresponding to a .section directive in a .s file). See also the .section directive.
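The "current section" bookkeeping can be sketched as a stack of (current, previous) pairs, which is roughly how .section, .previous, .pushsection and .popsection interact. ToySectionState and its methods are hypothetical; the real MCStreamer stores MCSectionSubPair values (section plus subsection) and also notifies the streamer when the section changes, both of which this sketch leaves out. Section names are plain strings here, and an empty string plays the role of an empty MCSectionSubPair().

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical simplification of MCStreamer's section stack.
class ToySectionState {
  // Each entry: (current section, previous section).
  std::vector<std::pair<std::string, std::string>> SectionStack;

public:
  ToySectionState() { SectionStack.push_back({"", ""}); } // starts with no section

  std::string getCurrentSection() const {
    return SectionStack.empty() ? std::string() : SectionStack.back().first;
  }
  // .section NAME: remember the old current section as "previous".
  void switchSection(const std::string &Name) {
    auto &Top = SectionStack.back();
    Top.second = Top.first;
    Top.first = Name;
  }
  // .previous: swap current and previous.
  void switchToPrevious() {
    std::swap(SectionStack.back().first, SectionStack.back().second);
  }
  // .pushsection: duplicate the top entry so a later pop restores it.
  void pushSection() { SectionStack.push_back(SectionStack.back()); }
  // .popsection: fails if it would remove the initial entry.
  bool popSection() {
    if (SectionStack.size() <= 1)
      return false;
    SectionStack.pop_back();
    return true;
  }
};
```

A usage sequence: switch to .text, then .data, then .previous brings .text back; a push, switch to .rodata, and pop also returns to .text.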

The MCInst class: a target-independent representation of an instruction.

  • ELF Writer
    • Reference
    • Class member:
        struct ELFWriter { ELFObjectWriter &OWriter; }
    • Parent -> child: MCObjectWriter -> ELFObjectWriter
    • States:
      • ELFObjectWriter &OWriter
      • support::endian::Writer W: an adapter to write values to a stream in a particular byte order.
      • unsigned LastLocalSymbolIndex: the symbol table index of the last local symbol.
      • unsigned StringTableIndex: the .strtab section index.
      • unsigned SymbolTableIndex: the .symtab section index.
      • std::vector<const MCSectionELF *> SectionTable
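Related to the Q&A question about the symbol table and to LastLocalSymbolIndex: the ELF specification requires all STB_LOCAL symbols in .symtab to precede the global and weak ones, and the section header's sh_info to be one greater than the index of the last local symbol (index 0 is the null symbol). The sketch below shows only that ordering rule with hypothetical toy types; it is not LLVM's ELF writer, which also handles section symbols, file symbols, extended section indices, and more.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy symbol: a name and whether its binding is STB_LOCAL.
struct ToyELFSymbol {
  std::string Name;
  bool IsLocal;
};

struct ToySymtab {
  std::vector<ToyELFSymbol> Entries; // index 0 is the null symbol
  unsigned ShInfo;                   // one greater than the last local's index
};

ToySymtab buildSymtab(std::vector<ToyELFSymbol> Syms) {
  // Locals must come first; stable_partition keeps the original relative
  // order within the local group and within the non-local group.
  std::stable_partition(Syms.begin(), Syms.end(),
                        [](const ToyELFSymbol &S) { return S.IsLocal; });
  ToySymtab T;
  T.Entries.push_back({"", true}); // index 0: null symbol, which is local
  T.Entries.insert(T.Entries.end(), Syms.begin(), Syms.end());
  unsigned LastLocal = 0;
  for (unsigned I = 0; I < T.Entries.size(); ++I)
    if (T.Entries[I].IsLocal)
      LastLocal = I;
  T.ShInfo = LastLocal + 1;
  return T;
}
```

With two locals and two globals, the locals land at indices 1 and 2 and sh_info becomes 3, which a linker can use to skip straight to the non-local symbols.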

  • New Section
    • Reference
    • To emit a new section, e.g. .newsection, to an object (ELF) file:
      • Add an option to clang to enable it. The option needs to be passed from the clang frontend to the backend, where it finally ends up in TargetOptions::EmitNewSection.
      • Access the option from different classes:
        • via a TargetMachine instance: TM.Options.EmitNewSection (usable in AsmPrinter, etc.);
        • via a MachineFunction instance: MF.getTarget().Options.EmitNewSection.
      • Update the AsmPrinter class in LLVM CodeGen to emit the new section.
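The two access paths above can be illustrated with mock classes reduced to the option plumbing. MockTargetOptions, MockTargetMachine, and MockMachineFunction are hypothetical and only mirror the member names used in the text (Options, getTarget(), EmitNewSection); the point is that both paths read the same TargetOptions object owned by the target machine.

```cpp
#include <cassert>

// Hypothetical mocks, reduced to the option plumbing only.
struct MockTargetOptions {
  bool EmitNewSection = false; // the example option from the text
};

struct MockTargetMachine {
  MockTargetOptions Options;
};

class MockMachineFunction {
  const MockTargetMachine &Target; // a function refers back to its target machine

public:
  explicit MockMachineFunction(const MockTargetMachine &T) : Target(T) {}
  const MockTargetMachine &getTarget() const { return Target; }
};

// Path 1: e.g. from an AsmPrinter that holds the TargetMachine.
bool viaTargetMachine(const MockTargetMachine &TM) { return TM.Options.EmitNewSection; }

// Path 2: from code that only has the MachineFunction.
bool viaMachineFunction(const MockMachineFunction &MF) {
  return MF.getTarget().Options.EmitNewSection;
}
```

Since the function holds a reference rather than a copy, changing the option on the target machine is visible through both paths.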

  • Asm Printer
    • Reference
      • llvm/include/llvm/CodeGen/AsmPrinter.h
      • llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
    • AsmPrinter is a MachineFunction-level pass: class AsmPrinter : public MachineFunctionPass.
    • It acts as a driver for the MC layer streamers: the pass calls OutStreamer->emitXXX() to emit everything to the object or assembly file.
    • The (misnamed) target-independent AsmPrinter class implements the general lowering process that converts MachineFunctions into MC label constructs.
    • Target-specific subclasses of AsmPrinter, such as SparcAsmPrinter and MipsAsmPrinter, implement target-specific functions such as emitFunctionBodyStart()/End().
    • Additionally, for code generation other than MachineFunction labels:
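The split between the generic AsmPrinter driver and target subclasses is a template-method pattern: the base class runs the per-function sequence and calls virtual hooks that targets may override. The sketch below is hypothetical (ToyAsmPrinter, ToyFooAsmPrinter, runOnFunction, and a string vector standing in for OutStreamer); only the hook names emitFunctionBodyStart/End come from the text, and the .set directives are illustrative, not copied from any real target printer.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy version of the AsmPrinter driver pattern.
class ToyAsmPrinter {
public:
  std::vector<std::string> Out; // stands in for OutStreamer
  virtual ~ToyAsmPrinter() = default;

  // Generic per-function sequence shared by all targets.
  void runOnFunction(const std::string &Name, const std::vector<std::string> &Body) {
    Out.push_back(Name + ":"); // function label
    emitFunctionBodyStart();   // target hook
    for (const std::string &I : Body)
      Out.push_back("\t" + I); // each instruction
    emitFunctionBodyEnd();     // target hook
  }

protected:
  // Default hooks do nothing; targets override what they need.
  virtual void emitFunctionBodyStart() {}
  virtual void emitFunctionBodyEnd() {}
};

// A hypothetical target subclass that only customizes the hooks.
class ToyFooAsmPrinter : public ToyAsmPrinter {
protected:
  void emitFunctionBodyStart() override { Out.push_back("\t.set\tnoreorder"); }
  void emitFunctionBodyEnd() override { Out.push_back("\t.set\treorder"); }
};
```

The base class alone emits just the label and instructions; the target subclass adds its directives around the body without reimplementing the driver.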

  • MC Assembler
    • Reference

        // lib/MC/MCAssembler.cpp
        bool MCAssembler::registerSection(MCSection &Section) {
          if (Section.isRegistered())
            return false;
          Sections.push_back(&Section);
          Section.setIsRegistered(true);
          return true;
        }

        // include/llvm/MC/MCAssembler.h
        /// \name Section List Access
        /// @{
        iterator begin() { return Sections.begin(); }
        const_iterator begin() const { return Sections.begin(); }
        iterator end() { return Sections.end(); }
        const_iterator end() const { return Sections.end(); }
        size_t size() const { return Sections.size(); }

  • MCContext
    • Reference: include/llvm/MC/MCContext.h, lib/MC/MCContext.cpp
    • Context object for machine code objects. This class owns all the sections that it creates.
    • MCSectionELF *MCContext::getELFSection():
      • Search ELFUniquingMap and return the existing section on a hit.
      • On a miss, create a new section using createELFSectionImpl().
    • MCSectionELF *MCContext::createELFSectionImpl():
      • Get an existing or create a new MCSymbolELF *R.
      • Set its binding to ELF::STB_LOCAL.
      • Set its type to ELF::STT_SECTION.
      • Create a new MCSectionELF *Ret.
      • Create a new MCDataFragment *F.
      • Insert F into Ret->getFragmentList().
      • F->setParent(Ret).
      • R->setFragment(F).
      • Return Ret, the MCSectionELF.
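The getELFSection()/createELFSectionImpl() flow above (look up, create on a miss, give each new section a local STT_SECTION symbol) can be sketched with toy types. ToyELFContext, ToySection, and ToySectionSymbol are hypothetical; this sketch keys sections by name only, whereas LLVM's real uniquing key has more components (for example the group name), and it omits the data fragment. The STB_LOCAL and STT_SECTION values are the standard ELF ones.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Standard ELF values for symbol binding and type.
constexpr unsigned char STB_LOCAL = 0;
constexpr unsigned char STT_SECTION = 3;

struct ToySectionSymbol {
  std::string Name;
  unsigned char Binding = 0;
  unsigned char Type = 0;
};

struct ToySection {
  std::string Name;
  ToySectionSymbol *Sym; // the section's own STT_SECTION symbol
};

// Hypothetical toy model of MCContext's ELF section uniquing.
class ToyELFContext {
  std::map<std::string, std::unique_ptr<ToySection>> UniquingMap; // like ELFUniquingMap
  std::map<std::string, std::unique_ptr<ToySectionSymbol>> Symbols;

  // Like createELFSectionImpl(): set up the section symbol, then the section.
  std::unique_ptr<ToySection> createSectionImpl(const std::string &Name) {
    std::unique_ptr<ToySectionSymbol> &Sym = Symbols[Name]; // existing or new
    if (!Sym)
      Sym = std::make_unique<ToySectionSymbol>();
    Sym->Name = Name;
    Sym->Binding = STB_LOCAL;
    Sym->Type = STT_SECTION;
    return std::make_unique<ToySection>(ToySection{Name, Sym.get()});
  }

public:
  // Like getELFSection(): return the uniqued section, creating it on a miss.
  ToySection *getELFSection(const std::string &Name) {
    std::unique_ptr<ToySection> &Slot = UniquingMap[Name];
    if (!Slot)
      Slot = createSectionImpl(Name);
    return Slot.get();
  }
};
```

Because the context owns every section, two requests for the same section name return the same pointer, matching the "created and uniqued by MCContext" rule for MCSection.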

  • Object File
    • Reference: LLVM source code
    • Parent -> child: MCObjectFileInfo -> TargetLoweringObjectFile -> TargetLoweringObjectFileELF -> MipsTargetObjectFile
    • MCObjectFileInfo
      • Source code of class MCObjectFileInfo: llvm/MC/MCObjectFileInfo.h, llvm/lib/MC/MCObjectFileInfo.cpp
      • States:
        • MCSection *TextSection, *DataSection, *BSSSection
        • MCSection *ReadOnlySection. Not required.
        • MCSection *LSDASection: section for the Language Specific Data Area (LSDA), to support languages with exception handling.
        • DWARF sections: DwarfLineSection, DwarfLineStrSection, DwarfStrSection, …
      • Sections initialization: initialized in the function MCObjectFileInfo::initELFMCObjectFileInfo(const Triple &T, bool Large).
      • Code: // lib/MC/MCObjectFileInfo.

  • MCStreamer
    • References: LLVM MCStreamer Class Reference; llvm/include/llvm/MC/MCStreamer.h; llvm/lib/MC/MCStreamer.cpp
    • The MCStreamer class is an abstract API that is implemented in different ways (e.g. to output a .s file, output an ELF .o file, etc.). It is effectively an “assembler API”.
    • MCStreamer has one method per directive, such as EmitLabel, EmitSymbolAttribute, SwitchSection, etc., which directly correspond to assembly-level directives.
    • Two implementations of MCStreamer:
      • MCAsmStreamer is a straightforward implementation that prints out a directive for each method.

  • ELF (LLVM Side)
    • ELF Basics
    • llvm-objcopy: object copy and editing tool
    • Relocation wiki
    • Object File Editing
      • llvm-objcopy [options] input [output]:
        • --add-section <section=file>: add a section named <section> with the contents of <file> to the output.
        • --dump-section <section>=<file>: dump the contents of section <section> into the file <file>.
        • --discard-all, -x: remove most local symbols from the output.
    • ELF Section Flags

        // include/llvm/BinaryFormat/ELF.h
        // Section flags.
        enum : unsigned {
          // Section data should be writable during execution.
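The section flags are bit masks that are OR-ed together in a section header's sh_flags field. The sketch below uses the three basic flag values from the ELF specification (LLVM's ELF.h defines the same values) and renders them as letters in the style of readelf's flag column; flagLetters is a hypothetical helper, and all other flags are ignored.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Basic ELF section flag values from the ELF specification.
constexpr uint64_t SHF_WRITE = 0x1;     // section data writable during execution
constexpr uint64_t SHF_ALLOC = 0x2;     // section occupies memory during execution
constexpr uint64_t SHF_EXECINSTR = 0x4; // section contains executable instructions

// Renders the three basic flags as letters; other bits are ignored.
std::string flagLetters(uint64_t Flags) {
  std::string S;
  if (Flags & SHF_WRITE)
    S += 'W';
  if (Flags & SHF_ALLOC)
    S += 'A';
  if (Flags & SHF_EXECINSTR)
    S += 'X';
  return S;
}
```

A typical .text section carries SHF_ALLOC | SHF_EXECINSTR ("AX"), while .data carries SHF_WRITE | SHF_ALLOC ("WA").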

Created Nov 23, 2019 // Last Updated Jul 27, 2020
