Learning LLVM

Q & A

  • What is the LLVM address space?

    • John: ‘namespace’ would be a better name: one namespace for memory loads/stores, another namespace for I/O port loads/stores.
  • What is the data layout string in LLVM?

  • What kinds of changes do we need to make to enable fat pointers in legacy code?

  • Exceptions
    • Landingpad
      • References: LLVM Exception Handling
      • A landing pad corresponds roughly to the code found in the catch portion of a try/catch sequence. When execution resumes at a landing pad, it receives an exception structure and a selector value corresponding to the type of exception thrown. The selector is then used to determine which catch should actually process the exception.
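As a rough source-level illustration of the landing pad description, the C++ sketch below has one call site guarded by two catch clauses. A compiler such as Clang would typically lower the guarded call to an `invoke` whose unwind destination is a landing pad, and the selector value decides which catch clause runs. The names `mayThrow` and `classify` are made up for this example and do not come from the LLVM documentation.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: throws a different exception type depending on `kind`.
void mayThrow(int kind) {
  if (kind == 1)
    throw 42;                                   // exception of type int
  if (kind == 2)
    throw std::runtime_error("runtime error");  // exception of class type
  // Any other value: return normally, no unwinding happens.
}

// The guarded call is the part a compiler would typically route through a
// landing pad that covers both catch clauses.
std::string classify(int kind) {
  try {
    mayThrow(kind);
    return "no exception";
  } catch (int) {
    // Runs when the thrown type matches `int`.
    return "caught int";
  } catch (const std::runtime_error &) {
    // Runs when the thrown type matches std::runtime_error.
    return "caught runtime_error";
  }
}
```

Calling `classify(1)` takes the first handler and `classify(2)` the second; compiling the file with `clang -S -emit-llvm` is one way to inspect the generated landing pad.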

  • Add New Pass to LLVM pipeline
    • References: Writing LLVM Pass in 2018 – Part III
    • Steps:
      • createXXXPass function
      • initializeXXXPass function
      • INITIALIZE_PASS_BEGIN/END/DEPENDENCY code
      • Put initializeXXXPass in the right place.
      • Update LinkAllPasses.h.
      • Put createXXXPass in the right place.

  • Table Gen
    • References:
      • TableGen Overview
      • TableGen Programmer’s Reference
      • TableGen Backend Developer’s Guide

  • Debug Info
    • References: Source Level Debugging with LLVM
    • Debugger Intrinsic Functions: LLVM uses several intrinsic functions (names prefixed with “llvm.dbg”) to track source local variables through optimization and code generation.
    • void @llvm.dbg.addr(metadata, metadata, metadata): provides information about a local element (e.g., a variable). The first argument is metadata holding the address of the variable, typically a static alloca in the function entry block. The second argument is a local variable containing a description of the variable.

  • Unique Naming Mechanisms in LLVM
    • References: llvm/include/llvm/IR/ValueSymbolTable.h
    • ValueSymbolTable: provides a symbol table of name/value pairs. It is essentially a std::map, but has a controlled interface provided by LLVM and also ensures uniqueness of names.
    • Excerpt from the header:
        class ValueSymbolTable {
          ...
        private:
          ValueName *makeUniqueName(Value *V, SmallString<256> &UniqueName);
          /// This method adds the provided value \p N to the symbol table. The Value
          /// must have a name which is used to place the value in the symbol table.
          ...
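The idea of a map-like table that enforces unique names can be sketched without LLVM. The class below is a simplified stand-in, not the actual ValueSymbolTable code: `ToySymbolTable`, its `insert` method, and the choice of appending a shared counter on a clash are assumptions made for illustration.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for a symbol table; the mapped value is just an id.
class ToySymbolTable {
public:
  // Inserts `id` under `name`, or under a uniqued variant if `name` is taken.
  // Returns the name that was actually used.
  std::string insert(const std::string &name, int id) {
    if (table_.emplace(name, id).second)
      return name;  // the name was free
    // Name clash: append an increasing counter until the insertion succeeds.
    for (;;) {
      std::string candidate = name + std::to_string(++lastUnique_);
      if (table_.emplace(candidate, id).second)
        return candidate;
    }
  }

  bool contains(const std::string &name) const {
    return table_.count(name) != 0;
  }

private:
  std::map<std::string, int> table_;
  unsigned lastUnique_ = 0;  // one counter shared by all clashes
};
```

With this sketch, inserting "x" twice yields the names "x" and "x1". Because the counter is shared across all names, the suffixes are unique but not consecutive per base name.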

  • LTO
    • Q&A: How does LTO help the optimizer avoid relying on conservative escape analysis?
    • References: LLVM Link Time Optimization
    • LTO is a tight integration between the linker and the LLVM optimizer. The linker treats LLVM bitcode files like native object files and allows mixing and matching among them. The linker uses libLTO, a shared object, to handle LLVM bitcode files. The input from the linker allows the optimizer to avoid relying on conservative escape analysis.

  • Extending
    • Add a new SelectionDAG node
      • References: Extending LLVM: Adding instructions, intrinsics, types, etc
    • Add a new Instruction to LLVM
      • References: Extending LLVM: Adding instructions, intrinsics, types
    • Cheri CSetBounds Intrinsics
      • References: Extending LLVM: Adding instructions, intrinsics, types
      • Mips::CSetBounds: new instruction support in LLVM
      • IR intrinsics td file: llvm/include/llvm/IR/IntrinsicsCHERICap.td
          def int_cheri_cap_bounds_set :
            Intrinsic<[llvm_fatptr_ty], [llvm_fatptr_ty, llvm_anyint_ty], [IntrNoMem, IntrWillReturn]>;
          def int_cheri_cap_bounds_set_exact :
            Intrinsic<[llvm_fatptr_ty], [llvm_fatptr_ty, llvm_anyint_ty], [IntrNoMem, IntrWillReturn]>;
      • Target td file: llvm/lib/Target/Mips/MipsInstrCheri.

  • Metadata
  • References: llvm/unittests/IR/MetadataTest.cpp

  • Tools
    • References: LLVM Command Guide
    • An Example Using the LLVM Tool Chain:
        clang hello.c -o hello
        clang -O3 -emit-llvm hello.c -c -o hello.bc
        clang -S -emit-llvm hello.c -o hello.ll
        lli hello.bc                  # invoke LLVM JIT
        llvm-dis < hello.bc | less    # .bc to .ll
        llvm-as hello.ll -o hello.bc  # .ll to .bc
        llc hello.bc -o hello.s
        llc hello.ll -o hello.s
    • llvm-lit: LLVM integrated tester.
    • Regression Tests
      • References: LLVM doc: Writing new regression tests; Lit documentation
    • Cheri
      • # llvm/utils/lit/lit/llvm/config.

  • Asm
    • Q & A: Can assembly language be compiled to LLVM IR so that it can be analysed? Yes, lots of tools can do this ‘lifting’; see the assembly-to-LLVM-IR translators below.
    • Assembly to LLVM IR: llvm-mctoll, remill, libqemu, llvm-qemu
    • Mc Toll
    • Remill
      • To add new instructions to Remill: https://github.com/lifting-bits/remill/blob/master/docs/ADD_AN_INSTRUCTION.md
      • SEM: the semantics of an instruction.
      • ISEL: an instruction ‘selection’.

  • Proj
    • Root dir Makefile
      • Variables: PROJECT_NAME, LLVM_SRC_ROOT, LLVM_OBJ_ROOT, PROJ_SRC_ROOT, PROJ_OBJ_ROOT, PROJ_INSTALL_ROOT, LEVEL
      • Include Makefile.config from $(LLVM_OBJ_ROOT)
      • Include Makefile.rules from $(LLVM_SRC_ROOT)
      • Two ways to set all these variables: write your own Makefiles, or start from the pre-made LLVM sample project
    • Source Tree Layout: lib/, include/, tools/, test/
    • Variables in subdir Makefile: LEVEL, DIRS, PARALLEL_DIRS, OPTIONAL_DIRS
    • Variables for Building Libraries (lib/): LIBRARYNAME, BUILD_ARCHIVE, SHARED_LIBRARY
    • Variables for Building Programs (tools/)

  • Hacking
    • Types: to print type info, use dump(). Reference: Type

  • Attr
    • Q&A: How do we add new attributes and propagate them using LLVM? What about attributes in assembly language?
    • LLVM References: How to use attributes; Adding an annotation that gets to the backend
    • How to add an attribute: An attribute can be a single “enum” value (the enum being the Attribute::AttrKind enum), a string representing a target-dependent attribute, or an attribute-value pair. Some examples:

  • Addrspace
    • Cite: relation between address spaces and physical memory locations
      On Wed, Mar 23, 2016 at 8:01 PM, David Chisnall via llvm-dev <llvm-dev at lists.llvm.org> wrote:
      On 23 Mar 2016, at 11:35, Mohammad Norouzi wrote:
      > Thanks for the reply.
      >
      > On Wed, Mar 23, 2016 at 10:43 AM, James Molloy wrote:
      > > Hi,
      > >
      > > Address spaces in LLVM are an abstract concept and LLVM attaches no internal meaning to address spaces, apart from:
      > >
      > > - Location 0 in address space 0 is ‘nullptr’ and a pointer to this cannot be dereferenced in a well formed program.

  • Instvisitor
    • Class InstVisitor: base class for instruction visitors.
    • Instruction visitors are used when you want to perform different actions for different kinds of instructions without having to use lots of casts and a big switch statement (in your code, that is).
    • To define your own visitor, inherit from this class, specifying your new type for the ‘SubClass’ template parameter, and “override” visitXXX functions in your class. I say “override” because this class is defined in terms of statically resolved overloading, not virtual functions.
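The “statically resolved overloading” described here is the curiously recurring template pattern (CRTP). The following is a minimal sketch of that pattern without LLVM headers; `ToyInst`, `ToyInstVisitor`, and `MemAccessCounter` are hypothetical names, and the real InstVisitor covers far more instruction kinds and fallbacks than this toy version.

```cpp
#include <vector>

// Toy instruction kinds; stand-ins for LLVM's instruction classes.
enum class Opcode { Load, Store, Add };

struct ToyInst {
  Opcode op;
};

// CRTP base: switches on the opcode once, then calls the subclass's visitXXX
// through a static_cast, so no virtual functions are involved.
template <typename SubClass> class ToyInstVisitor {
public:
  void visit(ToyInst &I) {
    SubClass *self = static_cast<SubClass *>(this);
    switch (I.op) {
    case Opcode::Load:  self->visitLoad(I);  break;
    case Opcode::Store: self->visitStore(I); break;
    case Opcode::Add:   self->visitAdd(I);   break;
    }
  }

  void visit(std::vector<ToyInst> &Insts) {
    for (ToyInst &I : Insts)
      visit(I);
  }

  // Defaults: each specific handler falls back to the generic one, which does
  // nothing. A subclass hides only the handlers it cares about.
  void visitLoad(ToyInst &I) { static_cast<SubClass *>(this)->visitInst(I); }
  void visitStore(ToyInst &I) { static_cast<SubClass *>(this)->visitInst(I); }
  void visitAdd(ToyInst &I) { static_cast<SubClass *>(this)->visitInst(I); }
  void visitInst(ToyInst &) {}
};

// Example visitor: counts memory accesses and ignores everything else.
struct MemAccessCounter : ToyInstVisitor<MemAccessCounter> {
  unsigned loads = 0;
  unsigned stores = 0;
  void visitLoad(ToyInst &) { ++loads; }
  void visitStore(ToyInst &) { ++stores; }
};
```

A caller builds a `std::vector<ToyInst>`, creates a `MemAccessCounter`, and calls `visit` on the vector; the `Add` instructions fall through to the empty `visitInst` default.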

  • LLVM IR
    • Q&A
    • Example (C++ source):
        class A { public: int f; };
        A* __capability a = new A;
        a->f = 42;
    • Resulting LLVM IR:
        %call = tail call i8 addrspace(200)* @operator new(unsigned long)(i64 zeroext 4)
        %f = bitcast i8 addrspace(200)* %call to i32 addrspace(200)*
        store i32 42, i32 addrspace(200)* %f
    • Reference: LLVM Language Reference Manual

  • Data Layout
    • (From 1) Layout specification: A module may specify a target-specific data layout string that specifies how data is to be laid out in memory. The IR syntax for the data layout is simply:
        target datalayout = "layout specification"
    • (From 2) The XXXTargetMachine constructor will specify a TargetDescription string that determines the data layout for the target machine, including characteristics such as pointer size, alignment, and endianness. For example, the constructor for SparcTargetMachine contains the following: ...
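Separately from the SparcTargetMachine excerpt, which is cut off here, the sketch below gives a rough idea of what a layout string encodes. It reads only two of the characteristics mentioned above: endianness (`E` or `e`) and the pointer size for address space 0 (`p:<size>:...`). `ToyLayout` and `parseLayout` are hypothetical names, the strings in the usage note are made-up examples rather than any real target's layout, and LLVM's own parser handles many more specifications.

```cpp
#include <sstream>
#include <string>

// Subset of what a layout string encodes; field names are hypothetical.
struct ToyLayout {
  bool bigEndian = false;    // 'E' = big-endian, 'e' = little-endian
  unsigned pointerBits = 0;  // from "p:<size>:..." (address space 0); 0 if absent
};

ToyLayout parseLayout(const std::string &layout) {
  ToyLayout result;
  std::istringstream stream(layout);
  std::string spec;
  // Individual specifications are separated by '-'.
  while (std::getline(stream, spec, '-')) {
    if (spec == "E") {
      result.bigEndian = true;
    } else if (spec == "e") {
      result.bigEndian = false;
    } else if (spec.rfind("p:", 0) == 0 || spec.rfind("p0:", 0) == 0) {
      // Pointer spec for address space 0: the size is the first field
      // after the leading "p:" or "p0:".
      std::size_t start = spec.find(':') + 1;
      std::size_t end = spec.find(':', start);
      result.pointerBits = std::stoul(spec.substr(start, end - start));
    }
  }
  return result;
}
```

For instance, `parseLayout("E-p:32:32-i64:64")` reports a big-endian layout with 32-bit pointers. All other specifications, such as `i64:64`, are skipped by this sketch.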

Created Aug 25, 2019 // Last Updated Sep 28, 2020
