Code-Pointer Integrity
References:
Goal
Guarantee the integrity of all code pointers in a program (e.g., function pointers and saved return addresses), and thereby prevent all control-flow hijack attacks, including return-oriented programming.
Challenges
- It is hard to make low-level languages (C/C++) safe while preserving their benefits, including performance and flexible programming patterns.
- Data Execution Prevention (DEP): bypassed by return-to-libc [^c37] attacks and ROP [^c44] [^c8], which allow Turing-complete computation.
- Address Space Layout Randomization (ASLR): defeated by pointer leaks, side-channel attacks [^c22], and just-in-time code reuse attacks [^c45].
- Stack cookies [^c14]: effective only against contiguous buffer overflows.
- CFI.
- The defenses above impose overhead and can be bypassed, as shown in the survey.
- Some require changes to how programmers write code.
In order to render control-flow hijack impossible, it is sufficient to guarantee the integrity of code pointers, i.e., those that are used to determine the targets of indirect control-flow transfers (indirect calls, indirect jumps, or returns).
Solution Overview
A partial memory safety solution aiming to protect code pointers only:
- Split process memory into a safe region and a regular region.
- Use static analysis to identify the set of memory objects that must be protected in order to guarantee memory safety for code pointers.
- This set consists of all memory objects that contain code pointers and all data pointers used to access code pointers indirectly.
- Store these objects in the safe region.
- Accesses to the safe region are covered by static analysis and runtime checks, which together ensure full memory safety for this region.
- The regular (unsafe) region behaves as before, with no added runtime overhead.
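
As a rough illustration of the split described above, the following C sketch keeps the authoritative value of a code pointer in a separate table that stands in for the safe region, keyed by the address of the pointer's slot in regular memory. The names `safe_store`, `safe_load`, `struct request`, and the fixed-size table are assumptions made for this example only. An actual CPI implementation relies on compiler instrumentation and an isolation mechanism, not on a plain array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*fn_t)(int);

/* Toy stand-in for the safe region: maps the regular-memory address of a
 * code-pointer slot to its protected value. Hypothetical layout. */
struct safe_slot { const void *home; fn_t value; };
static struct safe_slot safe_region[8];
static size_t safe_used;

/* Record (or update) the protected value for the slot at `home`.
 * Returns 0 on success, -1 if the toy table is full. */
int safe_store(const void *home, fn_t value) {
    for (size_t i = 0; i < safe_used; i++) {
        if (safe_region[i].home == home) {
            safe_region[i].value = value;
            return 0;
        }
    }
    if (safe_used == sizeof safe_region / sizeof safe_region[0]) return -1;
    safe_region[safe_used].home = home;
    safe_region[safe_used].value = value;
    safe_used++;
    return 0;
}

/* Fetch the protected value; NULL if the slot was never registered. */
fn_t safe_load(const void *home) {
    for (size_t i = 0; i < safe_used; i++)
        if (safe_region[i].home == home) return safe_region[i].value;
    return NULL;
}

/* An object in the regular region: a buffer next to a code pointer. */
struct request { char name[8]; fn_t handler; };

int increment(int x) { return x + 1; }

/* Simulates an attacker overwriting the regular-region copy of the pointer,
 * then performs the indirect call through the safe-region copy. */
int call_after_corruption(int arg) {
    struct request req;
    memset(&req, 0, sizeof req);
    if (safe_store(&req.handler, increment) != 0) return -1;
    memset(&req.handler, 0x41, sizeof req.handler); /* regular copy clobbered */
    fn_t target = safe_load(&req.handler);          /* safe copy unaffected */
    assert(target == increment);
    return target(arg);
}
```

Because the indirect call never reads the slot in regular memory, corrupting that slot does not change the call target. The regular region itself is untouched by the scheme, which is why it carries no extra cost.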
Design
Definitions:
- A pointer dereference is safe iff the memory it accesses lies within the target object on which the dereferenced pointer is based.
- Execution of a program is memory-safe iff all pointer dereferences in the execution are safe.
- A pointer is based on a target object X iff the pointer is obtained at runtime by one of the following (a stricter definition than C99’s "based on"):
- i) allocating X on the heap;
- ii) explicitly taking the address of X, if X is allocated statically (such as a local or global variable) or is a control-flow target (including return locations, whose addresses are implicitly taken and stored on the stack when calling a function);
- iii) taking the address of a sub-object y of X (e.g., a field in the X struct); or
- iv) computing a pointer expression (e.g., pointer arithmetic, array indexing, or simply copying a pointer) involving operands that are either themselves based on object X or are not pointers.
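
The definitions above can be made concrete with a small C sketch of a pointer that carries its based-on metadata. The `checked_ptr` type and its helper functions are hypothetical names chosen for illustration, and the position is stored as an index so that out-of-bounds values can be represented without forming an invalid C pointer. This is a minimal model of the idea, not the representation used by CPI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A pointer value plus the based-on metadata of its target object X. */
typedef struct {
    int *base;        /* start of the target object X */
    size_t count;     /* number of int elements in X */
    ptrdiff_t index;  /* current position relative to base */
} checked_ptr;

/* Rules (i)/(ii): a freshly obtained pointer to X is based on X. */
checked_ptr based_on(int *object, size_t count) {
    checked_ptr p = { object, count, 0 };
    return p;
}

/* Rule (iii): a pointer to a sub-object is based on that sub-object,
 * so its bounds narrow. An invalid request yields an empty object. */
checked_ptr sub_object(checked_ptr p, size_t first, size_t count) {
    checked_ptr q = { p.base, 0, 0 };
    if (first <= p.count && count <= p.count - first) {
        q.base = p.base + first;
        q.count = count;
    }
    return q;
}

/* Rule (iv): pointer arithmetic keeps the same target object. */
checked_ptr ptr_add(checked_ptr p, ptrdiff_t delta) {
    p.index += delta;
    return p;
}

/* A dereference is safe iff it stays inside the target object. */
bool deref_is_safe(checked_ptr p) {
    return p.index >= 0 && (size_t)p.index < p.count;
}

/* Load through the pointer only if the dereference is safe. */
bool checked_load(checked_ptr p, int *out) {
    if (!deref_is_safe(p)) return false;
    *out = p.base[p.index];
    return true;
}
```

Note that an out-of-bounds pointer value is not itself a violation in this model. Only dereferencing it is, which matches the definition of a safe dereference given above.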
Code-Pointer Integrity Property:
A program execution satisfies the code-pointer integrity property iff all its dereferences that either dereference or access sensitive pointers are safe.
- Sensitive pointers are code pointers and pointers that may later be used to access sensitive pointers. (Recursive definition)
- The integrity of the based-on metadata associated with sensitive pointers requires that pointers used to update sensitive pointers be sensitive as well. (Fig.1)
- two pointers: `*p = &q`
- The notion of a sensitive pointer is dynamic: a `void *` pointer can be sensitive when it points at a sensitive pointer at runtime, but not sensitive when it points to an integer.
Determining precisely the set of pointers that are sensitive can only be done at run time.
However, the CPI property can still be enforced using any over-approximation of this set, and such over-approximations can be obtained at compile time, using static analysis.
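
A short C sketch may help show why sensitivity is a runtime notion. The function `apply`, the `slot_kind` tag, and `twice` are hypothetical names used only for this example. The parameter `slot` has the same static type on both paths, yet it points at a code pointer in one call and at plain data in the other, so a compile-time analysis has to treat it conservatively as sensitive.

```c
#include <assert.h>

typedef int (*op_t)(int);

enum slot_kind { SLOT_INT, SLOT_CODE_PTR };

int twice(int x) { return 2 * x; }

/* `slot` is a universal pointer. Whether it is sensitive depends on what
 * it points to at run time, not on its static type. */
int apply(void *slot, enum slot_kind kind, int x) {
    if (kind == SLOT_CODE_PTR) {
        op_t *fp = slot;   /* points at a code pointer: sensitive here */
        return (*fp)(x);   /* indirect call through the pointed-to value */
    }
    int *ip = slot;        /* points at plain data: not sensitive here */
    return *ip + x;
}

/* Exercises both cases with the same function. */
int demo_dynamic_sensitivity(void) {
    op_t f = twice;
    int n = 7;
    return apply(&f, SLOT_CODE_PTR, 5) + apply(&n, SLOT_INT, 5);
}
```

Treating every `void *` as sensitive is the kind of over-approximation the text describes: it costs extra checks on the integer path but never misses the code-pointer path.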
CPI enforcement mechanism: A combination of static instrumentation and runtime support.
- Static analysis pass: identify all sensitive pointers in the program P and all instructions that operate on them.
- Type-based static analysis: a pointer is sensitive if its type is sensitive.
- Sensitive types:
- pointers to functions,
- pointers to sensitive types,
- pointers to composite types that contain one or more members of sensitive types,
- universal pointers (i.e., `void *` and opaque pointers to forward-declared structs or classes).
- A programmer can additionally mark other types as sensitive if desired, e.g., `struct ucred`, used in the FreeBSD kernel to store process UIDs and jail information.
- All pointers that a compiler or runtime creates implicitly (such as return addresses, C++ virtual table pointers, and setjmp buffers) are sensitive as well.
- Instructions that manipulate sensitive pointers, which the analysis must identify:
- pointer dereferences;
- pointer arithmetic;
- memory (de)allocation operations that call:
- standard library functions;
- C++ new/delete operators;
- manually annotated custom allocators.
- Notes on over-approximation of the sensitive pointer set:
- It may include universal pointers that never end up pointing to sensitive values at runtime, e.g., `char *` pointers, which the C/C++ standards allow to point to objects of any type.
- As a heuristic, `char *` pointers are assumed not to be universal when they
- are passed to the standard `libc` string manipulation functions, or
- are assigned to point to string constants.
- Neither the over-approximation nor the `char *` heuristic affects the security guarantees provided by CPI:
- over-approximation merely introduces extra overhead;
- heuristic errors may result in false violation reports (though the authors report never observing any in practice).
- Notes on performance optimization:
- For the `void *` arguments of `memset` and `memcpy`, CPI would by default instrument all accesses inside these functions, regardless of whether they operate on sensitive pointers. To optimize:
- Use static analysis to detect the real types of the arguments prior to being cast to `void *`.
- A subsequent instrumentation pass handles them separately, using type-specific versions of the corresponding memory manipulation functions.
- Note on soundness:
- The type-based static analysis is augmented with a data-flow analysis that handles most practical cases of unsafe pointer casts and casts between pointers and integers.
- Instrumentation pass: rewrite P to “protect” all sensitive pointers, i.e. store them in a separate, safe memory region, and associate, propagate, and check their based-on metadata.
- An instruction-level isolation mechanism prevents non-protected memory operations from accessing the safe region.
- For performance reasons, return addresses are stored on the stack separately from the rest of the code pointers using a safe stack mechanism.
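
The sensitive-type rules listed under the static analysis pass above can be sketched as a recursive predicate over a toy type representation. Everything here (`struct type`, `type_kind`, `is_sensitive`, `holds_sensitive`) is a hypothetical model written for illustration, not the data structures of the actual compiler pass. Two assumptions go beyond the notes: nested struct members are searched recursively, and `char *` is treated as non-universal, in the spirit of the heuristic described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum type_kind { T_INT, T_CHAR, T_VOID, T_OPAQUE, T_FUNCTION, T_POINTER, T_STRUCT };

/* Toy type descriptor (hypothetical). */
struct type {
    enum type_kind kind;
    struct type *pointee;    /* used when kind == T_POINTER */
    struct type **members;   /* used when kind == T_STRUCT */
    size_t member_count;
    bool visiting;           /* guards against cyclic struct types */
};

static bool holds_sensitive(struct type *s);

/* A type is sensitive if it is a pointer to a function, a universal
 * pointer, a pointer to a sensitive type, or a pointer to a composite
 * type with at least one sensitive member. */
bool is_sensitive(struct type *t) {
    if (t->kind != T_POINTER) return false;
    struct type *target = t->pointee;
    switch (target->kind) {
    case T_FUNCTION:                  /* pointer to function */
    case T_VOID:                      /* void * */
    case T_OPAQUE:                    /* pointer to forward-declared type */
        return true;
    case T_POINTER:                   /* pointer to a (possibly) sensitive type */
        return is_sensitive(target);
    case T_STRUCT:                    /* pointer to a composite type */
        return holds_sensitive(target);
    default:
        return false;
    }
}

/* True if the struct has a sensitive member, directly or in a nested
 * struct. A struct already on the current path contributes nothing new. */
static bool holds_sensitive(struct type *s) {
    if (s->visiting) return false;
    s->visiting = true;
    bool found = false;
    for (size_t i = 0; i < s->member_count && !found; i++) {
        struct type *m = s->members[i];
        found = (m->kind == T_STRUCT) ? holds_sensitive(m) : is_sensitive(m);
    }
    s->visiting = false;
    return found;
}
```

With this model, a pointer to a linked-list node that holds only an `int` and a `next` pointer is classified as not sensitive, while the same node with an added callback field becomes sensitive, along with every pointer that can reach it.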
Evaluation
Open problems
More