Arraychecks
Reference
Extension to JK and JKRL.
Summary of Jones-Kelly from 2006ICSE:
Jones-Kelly inserts the following checks (ignoring any later optimizaiton) on each arithmetic operation involving a pointer value:
- JK1. check the source pointer is not the invalid value (-2);
- JK2. find referent object for the source pointer value using the table;
- JK3. check that the result pointer value is within the bounds of this referent object plus the extra byte. If the result pointer exceeds the bounds, the result -2 is returned to mark the pointer value as valid.
- JK4. Finally, on any load or store, perform checks[JK1-JK3] but JK3 checks the source pointer itself.
Argument in ICSE 2006:
- if dereference -2 is not allowed by operating system, the JK4 check before loads/stores is strictly not necessary for bound checking; but useful for dangling pointers and cast errors.
- JK2 is most expensive.
- If JK2 did not found object, runtime check is skipped, and array bound violations may not be detected.
- One byte padded to allocated objects.
- Not all object are padded during analysis: when inserting padding between two adjacent parameters could cause the memory layout of parameters to differ in check and unchecked code. Therefore: no pad to any function call if the call may invoke an unchecked function & no pad to formal parameters in any function that may be called from unchecked code. AND indirect calls must be treated conservatively.
Summary of Ruwase-Lam Extension
To solve the illegal immediate problem that larger than 1 byte out of bounds, Ruwase and Lam extend the JK algorithm essentially by tracking the intended referent of pointers explicitly but only in the case where a pointer moves out of bounds of its intended referent.
For every out-of-bounds pointer, an extra OOB object is allocated to hold some metadata for the pointer. The pointer itself is modified to point to the OOB object. The OOB contains actual pointer value, and the address of intended object (saved when the pointer first goes out of bounds).
All OOB objects are stored in the hash table. The hash table is checked only before accessing the OOB object to ensure it is a valid OOB object address.
All further arithmetic on the OOB pointer is performed on the value in the OOB object.
If the pointer value comes back within bounds, the original pointer is restored to its current value and the OOB object is deallocated.
Argument in ICSE 2006:
Extra overhead for out-of-bound pointers
When object deallocated, having to scan OOB object hash table to deallocate any OOB objects that are corresponding to the freed object.
Key improvements to JKRL
Exploiting automatic pool allocation for much faster searches for referent objects;
- Instead of using one large splay tree for the entire program, we maintain one splay tree per pool;
- The lookup for each pointer access is much faster.
An extra level of indirection in the RL approach for OOB pointers that eliminates the need for run-time checks on most loads and stores.
Extension to Automatic Pool Allocation
Handing non-heap data: original pool allocation only create pools for heap allocated data; Previous memory safety work assigns pool descriptors for all global and stack objects, but did not change how they are allocated.
- Pool allocation is extended to handle globals and stack data, also have partions for globals and stack, avoiding large splay trees.
- modified to create “dummy” pool descriptors for nodes that included only global or stack objects: A logical handle to a compiler-chosen subset of memory objects.
- automatically ensures the objects are created in the appropriate function. (e.g, in main if the node includes any globals).
- Record each object in the splay tree for the correpsonding pool.
- in
main
for global objects
- in appropriate function for stack allocated variables (many local variables are promoted to registers and do not need to be stack-allocated or recorded).
OOB pointers
Ruwas-Lam exetension to handle OOB pointers requires expensive checks on all loads/stores in the program (before the optimization via elimination of redundant checks).
In this work, don’t have to check all load/stores.
A separate OOB hash-table per pool.
Instead of returning the address of the OOB object and recording it in the out-of-bound pointer variable, we return an illegal (reserved) address. (trap upon access).
A second hash table for each pool: mapping the returned value and the OOB object.
- When do pointer arithmetic on such OOB values, search for OOB object
On every pointer arithmetic $ p = q + c$:
- check whether $q$ is reserved address;
- if not reserved: do bound check; if out of bounds, create new OOB object, new reserved address, and their mappings in hash table, assign $p$ as reserved address;
- if reserved: get OOB; get original pointer value from OOB; do arithmetic; check new pointer value: if out of bounds, a new OOB; if in bounds, assign to $p$.
Weakness
- (From 2006 ICSE, ch 3.2.1 ) For pointers with no known pool descriptors: no bounds check.
- May not known either if the pointer q is derived from an integer-to-pointer cast or from unchecked code (e.g, as a result of a call the an external function).
- For int-to-ptr cast, which result in a U (Unknown) flag, the pointer may actually point to an object in some pool, but we don’t know which pool at compile time. We do not check pointer arithmetic on such pointers, which is weaker than Jones-Kelly as it allows bound violation goes undetected.