CHERI C Model

An implementation of the C abstract machine that can run legacy C code with strong memory-protection guarantees.

References:

  1. Beyond the PDP-11: Architectural support for a memory-safe C abstract machine. ASPLOS, 2015.

  2. ISO/IEC 9899:2011 Information technology – Programming languages – C.

  3. Is address space 1 reserved? LLVMdev, 2015.

  4. CheriABI: Enforcing Valid Pointer Provenance and Minimizing Pointer Privilege in the POSIX C Run-time Environment, ASPLOS, 2019.

  5. Into the depths of C: elaborating the de facto standards. ACM SIGPLAN Notices, 51(6), pp.1-15. 2016.

  6. Exploring C Semantics and Pointer Provenance, POPL, 2019.


Questions/Proposals

The #pragma

Pure-capability mode is enabled for zlib, but the pragma changes are still needed to stay compatible with the MIPS ABI.

  • What does the pragma do?
  • What will happen if we enable pure-capability mode for the kernel? Will we also need a pragma, or can we eliminate all the pragmas?

Refine Compilers instead of the CHERI ISA

  • Instead of refining the CHERI ISA to support the idioms, is it possible to change the compiler so that it converts those unsafe idioms into safe ones on the older CHERI ISA?

Refine the C abstract machine

  • How could we extend the C abstract machine to include more memory-safety-related semantics, such as malloc, the kernel’s context switching, signal handler dispatching, I/O memory, the MMU, etc.?
  • Can SVA be viewed as an extended C abstract machine, one that implements all extensions as a library rather than as C language semantics?

Compiler-assisted Temporal Safety in CHERI?

  1. Track all pointer -> integer -> pointer flows. Make the collector accurate first, then find a way to optimize it.

    • An integer might be tracked with one more bit, indicating whether or not it was converted from a pointer.
    • Allow pointer-to-integer casts, but once an integer has been cast to or from a pointer, it automatically becomes a capability and is never cast back again (in the architecture’s view, not in the user’s view).
  2. Use the SAFECode strategy for a quoted "safe".


Common Pointer Idioms

  • Around 2M lines of C code surveyed
  • Thousands of instances found
  • Breaking them is not acceptable

Summary of Difficult Idioms

CHERI := Fat Pointers ++

Original Fat Pointers:

  • Describe a pointer
  • Add metadata

Capabilities:

  • Unforgeable
  • Monotonic length and permissions
  • Grant rights
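
The three capability properties above can be illustrated with a small software model. This is only a sketch under stated assumptions: `cap_t`, `cap_root`, `cap_narrow`, `cap_and_perms`, `cap_store_byte` and the `CAP_PERM_*` bits are hypothetical names invented for this note, and real CHERI enforces these rules in hardware with its own encoding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical permission bits; real CHERI defines its own encoding. */
enum { CAP_PERM_LOAD = 1u << 0, CAP_PERM_STORE = 1u << 1 };

/* Software stand-in for a capability: bounds, permissions, validity tag. */
typedef struct {
    uint8_t *base;    /* lowest address the holder may touch */
    size_t   length;  /* number of accessible bytes starting at base */
    unsigned perms;   /* bitmask of CAP_PERM_* rights */
    bool     tag;     /* cleared when an operation would add rights */
} cap_t;

/* Create the root capability for a buffer: the only place rights originate. */
static cap_t cap_root(void *buf, size_t len, unsigned perms) {
    cap_t c = { (uint8_t *)buf, len, perms, true };
    return c;
}

/* Narrow the bounds. A request outside the current bounds yields an
 * untagged (unusable) capability, so the region can only shrink. */
static cap_t cap_narrow(cap_t c, size_t skip, size_t new_len) {
    cap_t out = c;
    if (!c.tag || skip > c.length || new_len > c.length - skip) {
        out.tag = false;
        return out;
    }
    out.base = c.base + skip;
    out.length = new_len;
    return out;
}

/* Intersect permissions: bits can be dropped but never added. */
static cap_t cap_and_perms(cap_t c, unsigned mask) {
    c.perms &= mask;
    return c;
}

/* Checked store: needs a valid tag, the store right and an in-bounds index. */
static bool cap_store_byte(cap_t c, size_t index, uint8_t value) {
    if (!c.tag || !(c.perms & CAP_PERM_STORE) || index >= c.length)
        return false;
    c.base[index] = value;
    return true;
}

/* Self-check of the monotonicity rules; returns 0 when they all hold. */
static int cap_demo(void) {
    uint8_t buf[16] = {0};
    cap_t root = cap_root(buf, sizeof buf, CAP_PERM_LOAD | CAP_PERM_STORE);
    cap_t half = cap_narrow(root, 8, 8);            /* upper 8 bytes only */
    cap_t ro   = cap_and_perms(half, CAP_PERM_LOAD); /* store right dropped */
    bool stored   = cap_store_byte(half, 0, 42);    /* in bounds, allowed */
    bool overflow = cap_store_byte(half, 8, 1);     /* past narrowed bounds */
    bool ro_store = cap_store_byte(ro, 0, 1);       /* no store permission */
    bool regrown  = cap_narrow(half, 0, 16).tag;    /* tries to grow back */
    bool ok = stored && !overflow && !ro_store && !regrown && buf[8] == 42;
    assert(ok);
    return ok ? 0 : 1;
}
```

Calling `cap_demo()` exercises each rule once: a narrowed capability cannot reach outside its new bounds, cannot regain dropped permissions, and cannot be widened again.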

Original C Abstract Machine

malloc() is outside of the abstract machine.

  • Integer arithmetic followed by casts to pointers occurs mostly in malloc().
  • The C specification indicates that each block of memory returned by malloc() is an object, and that using it after calling free() is undefined behavior.
  • The memory that has not been returned by malloc() is not yet part of the C abstract machine.
  • The compiler makes sufficient allowances to permit these functions to be implemented in C atop some more primitive functionality, mmap() or brk(), which deals with pages of memory.
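
To make the point about malloc() concrete, here is a minimal sketch of an allocator built on mmap(), assuming a POSIX system that supports anonymous mappings. `bump_alloc` and `ARENA_SIZE` are hypothetical names, and the allocator never frees; it exists only to show where integer arithmetic turns into a pointer.

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS on glibc in strict modes */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

#define ARENA_SIZE (1u << 16)   /* hypothetical arena size: 64 KiB */

static uintptr_t arena_next;    /* next free address, kept as an integer */
static uintptr_t arena_end;     /* one past the last usable address */

/* Hypothetical bump allocator: obtains pages from mmap() on first use, then
 * carves blocks out of them with integer arithmetic. Returns NULL on failure. */
static void *bump_alloc(size_t size) {
    if (arena_next == 0) {
        void *pages = mmap(NULL, ARENA_SIZE, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (pages == MAP_FAILED)
            return NULL;
        arena_next = (uintptr_t)pages;
        arena_end = arena_next + ARENA_SIZE;
    }
    /* Round up to a 16-byte multiple so every block stays aligned. */
    size_t rounded = (size + 15u) & ~(size_t)15u;
    if (rounded < size || rounded > arena_end - arena_next)
        return NULL;            /* arithmetic overflow or arena exhausted */
    uintptr_t addr = arena_next;
    arena_next += rounded;
    assert((addr & 15u) == 0);
    /* Integer -> pointer cast: before this point the memory was just pages
     * from the OS, not an object known to the C abstract machine. */
    return (void *)addr;
}
```

Two successive calls such as `bump_alloc(10)` return addresses 16 bytes apart, and a request larger than the remaining arena returns NULL.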

Address translation (MMUs) is outside of the C abstract machine

  • In CHERI terminology, memory always means virtual memory.

Unions to subvert the type system

If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type… This might be a trap representation.

This requirement is useful in low-level contexts: it is possible to subvert the type system and interpret memory in different formats.
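
A minimal sketch of this kind of reinterpretation follows, assuming `float` is a 32-bit IEEE 754 binary32 value (true on common platforms but not guaranteed by C). The names `float_bits`, `float_to_bits` and `float_sign_bit` are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* The reinterpretation only makes sense if both members have the same size. */
static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* One 32-bit storage location viewed either as a float or as an integer. */
union float_bits {
    float    f;
    uint32_t u;
};

/* Store through the float member, read back through the integer member. */
static uint32_t float_to_bits(float value) {
    union float_bits pun;
    pun.f = value;
    return pun.u;
}

/* Extract the sign bit (bit 31 under the binary32 assumption). */
static int float_sign_bit(float value) {
    return (int)(float_to_bits(value) >> 31);
}
```

Under that assumption, `float_to_bits(1.0f)` yields `0x3F800000`. A capability system has to keep this idiom working for plain data while preventing the same trick from forging a pointer out of integer bits.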

Code pointers as data

  • C is intended to be usable on microcontrollers with separate address spaces for code and data.
  • POSIX breaks this separation by introducing the void *dlsym(...) function, which is used to look up a symbol in a shared library.
  • Notion of shared libraries is beyond the scope of the C language specification.
  • Unfortunately, looking up function pointers is a common use for dlsym, and is not defined behaviour in C.

Const in C type system is not strict

  • memchr, as defined in the C specification, takes a const-qualified pointer as its first argument and returns a non-const pointer derived from it.
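
The following sketch shows how that signature lets a read-only view turn into a writable pointer without a single cast. It assumes the C11 declaration of memchr (reference 2); `find_writable` and `replace_first` are hypothetical helper names.

```c
#include <assert.h>
#include <string.h>

/* Returns a writable pointer into a buffer the caller only promised to read.
 * No cast is needed: memchr() returns void * even though it was handed a
 * const-qualified pointer. */
static char *find_writable(const char *text, size_t len, char c) {
    assert(text != NULL);
    return memchr(text, c, len);
}

/* Overwrites the first `from` byte with `to` through the laundered pointer.
 * Returns 1 if a byte was replaced, 0 if `from` was not found. */
static int replace_first(const char *text, size_t len, char from, char to) {
    char *hit = find_writable(text, len, from);
    if (hit == NULL)
        return 0;
    *hit = to;   /* defined only if the underlying object is really writable */
    return 1;
}
```

With `char buf[] = "a,b,c";`, calling `replace_first(buf, sizeof buf, ',', ';')` changes the buffer to `"a;b,c"` even though the function only ever received a `const char *`. If const were enforced by dropping the store permission, the pointer returned by memchr would have to lose that permission too, which is why this idiom matters for the model.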

Provenance of pointers in C is broken

  • Pointer -> integer -> pointer conversions, e.g. an XOR linked list: each node holds a single value that is the address of the previous node XORed with the address of the next node, allowing traversal in both directions.

  • Unused bits in a pointer are used to store information.
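
The XOR linked list can be sketched in a few lines. The names `xnode`, `xlink`, `xstep` and `xsum` are hypothetical, and the code assumes that a null pointer converts to the integer 0 and that `uintptr_t` round-trips pointers, as on conventional flat-address platforms. Only the XOR idiom is shown; storing flags in unused pointer bits follows the same pointer -> integer -> pointer pattern.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A node stores prev XOR next as one integer instead of two pointers. */
struct xnode {
    int       value;
    uintptr_t link;   /* (uintptr_t)prev ^ (uintptr_t)next */
};

/* Combine two neighbour addresses into one link word. */
static uintptr_t xlink(struct xnode *a, struct xnode *b) {
    return (uintptr_t)a ^ (uintptr_t)b;
}

/* Step from `cur` away from `came_from`. The result is rebuilt from integers,
 * so it carries no provenance that a compiler or capability could track. */
static struct xnode *xstep(struct xnode *came_from, struct xnode *cur) {
    assert(cur != NULL);
    return (struct xnode *)(cur->link ^ (uintptr_t)came_from);
}

/* Sum all values by walking from one end; NULL marks "off the list". */
static int xsum(struct xnode *start) {
    int total = 0;
    struct xnode *prev = NULL, *cur = start;
    while (cur != NULL) {
        total += cur->value;
        struct xnode *next = xstep(prev, cur);
        prev = cur;
        cur = next;
    }
    return total;
}
```

The same `xsum` works from either end of the list, which is the attraction of the idiom. On a strict capability machine the pointer produced by `xstep` would have no valid tag, since it was computed from plain integers.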

Temporal safety in C is difficult

  • Casting integers to pointers makes accurate garbage collection impossible, because any integer value may potentially be combined with others to form a valid address.
  • It is not possible to implement a copying or relocating garbage collector if object addresses can escape from the collector.

  • An efficient implementation of full temporal safety will cause unexpected behavior in much existing code.

CHERI Refined

Refine CHERI to meet the C abstract machine model.

Examples:

  • v1 -> v2: instead of preventing integers from being loaded into capability registers, allow it and propagate tags.

    • memcpy() does not need to be aware of the existence of pointers in the copied data.
    • unions too.
  • v2 -> v3: add an offset, in the style of ‘fat pointers’.

    • supports arbitrary pointer arithmetic and comparison: CPtrCmp

    • allow more permission fields

      • additional hw checks, such as GC, info-flow tracking, integrity in concurrency, etc.

      • const by removing the store permission. __input and __output to discard permissions.

  • CToPtr and CFromPtr

    • the bounds are lost once a capability becomes a traditional pointer.
    • must be used carefully
    • only in hybrid environment
  • __capability qualifier for hybrid compilation mode.

  • v3 supports storing data in the unused bits of a pointer (the pointer must be in bounds when dereferenced)
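
The effect of the v3 offset on pointer arithmetic can be shown with a toy model. This is a sketch, not the hardware definition: `cap_v2`, `cap_v3` and the helper functions are hypothetical, addresses are plain integers, and the v2 behaviour follows the description in these notes that a pointer addition decreases the range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* CHERIv2-style model: no offset, so "p + n" moves base and shrinks length. */
typedef struct { size_t base, length; } cap_v2;

/* CHERIv3-style model: bounds stay fixed, arithmetic only moves the offset. */
typedef struct { size_t base, length; ptrdiff_t offset; } cap_v3;

/* v2 addition permanently gives up the bytes below the new position. */
static cap_v2 v2_add(cap_v2 c, size_t n) {
    assert(n <= c.length);      /* cannot step past the end of the region */
    cap_v2 out = { c.base + n, c.length - n };
    return out;
}

/* v3 addition may leave the bounds; the check happens on access instead. */
static cap_v3 v3_add(cap_v3 c, ptrdiff_t n) {
    c.offset += n;
    return c;
}

/* Is `addr` inside the region the capability still grants access to? */
static bool v2_covers(cap_v2 c, size_t addr) {
    return addr >= c.base && addr - c.base < c.length;
}

static bool v3_covers(cap_v3 c, size_t addr) {
    return addr >= c.base && addr - c.base < c.length;
}

/* A v3 dereference is legal only while the offset is within bounds. */
static bool v3_in_bounds(cap_v3 c) {
    return c.offset >= 0 && (size_t)c.offset < c.length;
}
```

Starting from a 16-byte region at address 1000 and adding 8, the v2 model ends with base 1008 and length 8 and can never reach address 1000 again, so pointer subtraction has nowhere to go. The v3 model keeps the original bounds, so adding -8 afterwards returns to offset 0.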

More refinement ongoing:

  • Function Pointers as Capability

    • v3 can support (but does not yet) using a code capability for every function via CJALR (capability jump and link register), such that when a function is executing, it is impossible to jump out of it without an explicit call.
    • legacy compilers and linkers place constants close to the functions and depend on the program counter address to locate globals. Thus using function pointers as capabilities would break these applications.
  • A relocating generational garbage collector.

    • uses tagged memory to distinguish capabilities from other data.
    • already implemented, but it remains to be determined how much software it will break.

Evaluation

Idioms Support

Summary of idioms supported by different interpretations of C abstract machine
  • x86/PDP-11/MIPS baselines,
  • HardBound, Intel MPX,
  • CHERIv2: no offset, a pointer addition decreases the range.
  • CHERIv3: with offset, etc.
  • a translator from C code into a simple abstract machine interpreter, to quickly modify the abstract machine and run the test cases extracted from the idioms to see which fail.
    • relaxed: the integer can be modified but must still point to a valid object
    • strict: the integer is not allowed to be modified.

Benchmarking

100 MHz Stratix IV FPGA. The DDR DRAM is faster than the CPU, so cache misses are less costly.

  • Olden. pointer intensive.
  • Dhrystone. Less pointer intensive.
  • tcpdump
  • zlib

Code Changes

Lines of code changed to port from MIPS to CHERIv2 and CHERIv3
  • Annotations are added manually to understand their placement, but the compiler can represent pointers as capabilities internally, avoiding the need for manual annotations.

  • Semantic Changes:

    • tcpdump on CHERIv2: 1.6K lines changed to avoid pointer subtraction.
    • tcpdump on CHERIv3: 2 lines changed to mark read-only access to the packet being parsed rather than the whole packet buffer.
  • Compiler support:

    • a new ABI in which all pointers are implemented as capabilities;
    • references to on-stack objects are derived from a stack capability.
    • zlib version 1: header annotation for compatibility with the MIPS ABI, using a single pragma at the start and end of the library header.
    • zlib version 2: copying structures when they are passed across the library boundary.
Overhead of CHERI-zlib normalized against zlib compiled for a conventional MIPS ISA
Created Jul 1, 2019 // Last Updated Jul 13, 2021