[2019 Micro] CHERIvoke: Characterising Pointer Revocation using CHERI Capabilities for Temporal Memory Safety.

References:

  • CHERIvoke: Characterising Pointer Revocation using CHERI Capabilities for Temporal Memory Safety. pdf. MICRO, 2019.

Overview

A new allocator with a quarantine buffer that holds “to-be-revoked” segments.

  • Quarantine Buffer (delayed sweeping): a list of addresses of objects that have been freed but are not yet safe to reuse.
    • These addresses are kept in a cached list of freed objects;
    • addresses in this buffer cannot be reused (they are not really freed yet);
    • a sweep is performed only when the buffer is full; addresses become available for reuse after the sweep (this is the real free);
    • the sweep covers all memory that could contain references to the heap;
    • it invalidates any capability reference that points into a region held in the quarantine buffer;
    • the sweep uses a shadow map to store revocation metadata (1 bit for every 16-byte granule of the heap, i.e. 1/128 of the heap size);
  • Revocation Shadow Map: bit-map for quarantined objects (fast look-up):
    • bit-mapped tags for all heap memory;
    • 1 bit per 16 bytes of heap;
    • every allocation in the quarantine buffer is ‘painted’ in this map, indicating a to-be-revoked region in the heap;
  • To Sweep:
    • scan all memory for references;
    • for each reference, look it up in this map to decide whether to revoke the reference (capability/pointer);
    • use the base of the reference to detect whether it points into a revoked object;
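
The shadow-map lookup and sweep described above can be sketched as a small simulation. This is only an illustration under stated assumptions, not the paper's implementation: `Capability`, `ShadowMap`, and `sweep` are hypothetical names, memory is modelled as a Python list, and allocations are assumed to be 16-byte aligned.

```python
from dataclasses import dataclass

GRANULE = 16  # bytes of heap covered by one shadow bit (from the notes above)


@dataclass
class Capability:
    """Hypothetical stand-in for a tagged CHERI capability."""
    base: int          # lowest address the capability may access
    length: int
    tag: bool = True   # a cleared tag means the capability can no longer be used


class ShadowMap:
    """One bit per 16-byte heap granule; a set bit marks a quarantined granule."""

    def __init__(self, heap_base, heap_size):
        self.heap_base = heap_base
        self.heap_size = heap_size
        n_granules = heap_size // GRANULE
        self.bits = bytearray((n_granules + 7) // 8)  # about heap_size / 128 bytes

    def _index(self, addr):
        granule = (addr - self.heap_base) // GRANULE
        return granule // 8, granule % 8

    def paint(self, addr, size):
        """Mark every granule of a quarantined allocation (assumes aligned addr)."""
        for a in range(addr, addr + size, GRANULE):
            byte, bit = self._index(a)
            self.bits[byte] |= 1 << bit

    def is_revoked(self, addr):
        if not (self.heap_base <= addr < self.heap_base + self.heap_size):
            return False  # not a heap address, so nothing to revoke
        byte, bit = self._index(addr)
        return bool(self.bits[byte] & (1 << bit))


def sweep(memory, shadow):
    """Clear the tag of every capability whose base lies in a painted granule."""
    revoked = 0
    for word in memory:
        if isinstance(word, Capability) and word.tag and shadow.is_revoked(word.base):
            word.tag = False
            revoked += 1
    return revoked
```

Looking up the base rather than the current address means a capability derived from a quarantined object is still caught, provided its base stays inside that object.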

new allocator

dlmalloc_cherivoke to replace dlmalloc.

(It sets the bounds of the pointers it returns to match the requested allocation.)

Quarantine buffer:

  • maintains a quarantine buffer proportional to the heap size;
  • free() inserts allocations into the quarantine buffer;
  • when a certain limit is reached, dlmalloc_cherivoke logs a simulated sweep event and returns all chunks in the quarantine buffer to the internal free list.
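
A minimal sketch of this quarantine policy follows. The class name `QuarantineAllocator`, the `quarantine_fraction` parameter, and its default value are illustrative assumptions, not taken from dlmalloc_cherivoke.

```python
class QuarantineAllocator:
    """Toy model of free() with a quarantine buffer sized relative to the heap."""

    def __init__(self, heap_size, quarantine_fraction=0.25):
        # quarantine_fraction is an arbitrary illustrative value (assumption)
        self.limit = int(heap_size * quarantine_fraction)
        self.free_list = []        # (addr, size) chunks that may be reused
        self.quarantine = []       # freed chunks that may NOT be reused yet
        self.quarantined_bytes = 0
        self.sweeps = 0            # number of simulated sweep events logged

    def free(self, addr, size):
        """Defer the real free: park the chunk in quarantine."""
        self.quarantine.append((addr, size))
        self.quarantined_bytes += size
        if self.quarantined_bytes >= self.limit:
            self._sweep_and_release()

    def _sweep_and_release(self):
        """Log a simulated sweep, then return all chunks to the free list."""
        self.sweeps += 1
        self.free_list.extend(self.quarantine)
        self.quarantine.clear()
        self.quarantined_bytes = 0
```

Batching frees this way means the cost of one sweep is shared by every chunk released in that batch, at the price of holding freed memory for longer.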

Shadow map:

  • each mmap() call is accompanied by a smaller mapping at a fixed transform (shift) from the original allocation.
  • when a region is unmapped, its corresponding shadow map is also unmapped.
  • shadow-space operations are delayed until a simulated sweep is triggered.
    • before the sweep event, the quarantined chunks in the buffer are traversed and the shadow-map bits for each are set.
    • after a sweep event, these bits are cleared.
  • the shadow-map painting procedure is optimized: large, aligned, contiguous regions use byte, half-word, word, and double-word store instructions when possible, rather than setting individual bits.
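
The painting optimization can be illustrated as follows. This is a sketch under assumptions: `paint_range` is a hypothetical helper, the shadow map is a `bytearray`, and a slice assignment of whole bytes stands in for the wider store instructions mentioned above.

```python
def paint_range(bits, first, last):
    """Set shadow bits for granules [first, last) in the bytearray `bits`."""
    g = first
    # Leading granules that do not start on a byte boundary: set bit by bit.
    while g < last and g % 8 != 0:
        bits[g // 8] |= 1 << (g % 8)
        g += 1
    # Aligned middle: fill whole bytes at once (stand-in for wide stores).
    full_bytes = (last - g) // 8
    if full_bytes > 0:
        start = g // 8
        bits[start:start + full_bytes] = b"\xff" * full_bytes
        g += full_bytes * 8
    # Trailing granules that do not fill a whole byte: set bit by bit.
    while g < last:
        bits[g // 8] |= 1 << (g % 8)
        g += 1
```

Only the unaligned edges of a region need per-bit updates; the aligned middle is written in bulk.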

Sweep Optimization:

  • Sweep locations: anywhere that can contain references to the heap.

    • heap,
    • stack,
    • register files,
    • global segments (.data, .bss)
  • Goal: a highly optimized inner loop.

  • Optimization:

    • should fully utilize the DRAM bandwidth of the system;
    • could extend direct memory access (DMA) engines or digital signal processors (DSPs) in the system to perform this loop at bus speed and without CPU involvement.

sweep loop

simulation for x86-64

(Chapter 5.3 Sweeping Cost):

  • dump the core image periodically when the quarantine buffer is full;
  • preprocess the image to identify all virtual addresses that lie within regions of the core dump;
  • zero all non-pointer words. This way, a test against zero will simulate the ability to test the capability tag.
  • core dump also preserves the revocation shadow map, which is used during the sweep.
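
The preprocessing trick above can be sketched in a few lines. This is an illustration under assumptions, not the paper's tooling: `preprocess` and `sweep_image` are hypothetical names, words are taken to be 8-byte little-endian values, `regions` is a list of half-open `(low, high)` address ranges, and the revocation test is passed in as a callable in place of the shadow-map lookup.

```python
import struct

WORD = 8  # bytes per x86-64 pointer-sized word


def preprocess(image, regions):
    """Zero every word whose value is not an address inside a mapped region."""
    out = bytearray(len(image))  # starts out all zeros
    for off in range(0, len(image) - WORD + 1, WORD):
        (word,) = struct.unpack_from("<Q", image, off)
        if any(lo <= word < hi for lo, hi in regions):
            struct.pack_into("<Q", out, off, word)  # keep plausible pointers
    return bytes(out)


def sweep_image(image, is_revoked):
    """Sweep a preprocessed image; a non-zero word plays the role of a set tag."""
    buf = bytearray(image)
    revoked = 0
    for off in range(0, len(buf) - WORD + 1, WORD):
        (word,) = struct.unpack_from("<Q", buf, off)
        if word != 0 and is_revoked(word):
            struct.pack_into("<Q", buf, off, 0)  # revoke by zeroing the word
            revoked += 1
    return bytes(buf), revoked
```

After preprocessing, the inner loop skips most words with a single comparison against zero, mirroring how a tag check would skip non-capability data.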

Evaluation

CHERI FPGA platform (derived from [1]). 64-bit MIPS IV ISA. MIPS R4000. BERI in Bluespec SystemVerilog[^2014-isca-c4]. Altera Stratix IV FPGA. 100MHz, single core, 256KiB LLC, 6-stage in-order scalar pipeline, 1GiB DDR2, with branch predictor. FreeBSD 10.

CHERIvoke on a modern x86-64 machine. Intel Core i7-7820HK CPU, 2.9GHz, 4 cores, 8 threads, 8MiB LLC, 14-18 stage out-of-order superscalar pipeline, AVX2 support, 16GiB DDR4-2400, FreeBSD 12.0.

“To measure their impact on performance, we have designed experiments to evaluate CHERIvoke revocation on a modern x86-64 machine to establish performance expectations for a wide deployment of mature CHERI implementations.”

“Perform revocation sweeps on the CHERI FPGA implementation over application dumps taken from our x86 system, allowing us to measure data elimination for applications that are not yet able to execute natively on the CHERI-MIPS architecture.”

Reference 2

[^2014-isca-c4] Bluespec SystemVerilog Version 3.8 Reference Guide, Bluespec, Inc., Waltham, MA, November 2004.


  1. https://www.cl.cam.ac.uk/research/security/ctsrd/pdfs/201406-isca2014-cheri.pdf
  2. CHERIvoke: Characterising Pointer Revocation using CHERI Capabilities for Temporal Memory Safety. pdf. MICRO, 2019.
Created Oct 28, 2019 // Last Updated Nov 19, 2021
