SAFECode

Questions/Todos

  • Runtime check of SAFECode.

  • Function Pointers in SAFECode?

    • LLM: Function pointers are checked against a static CFG at runtime (some could be checked at compile time).
  • Where is the pool meta-data stored and used?

  • Can we also store meta-data for all pointers in type-unknown regions?

  • What are affine transformations (used in Control-C)?

Overview

Memory safety for the C language.

Input:

  1. A program written in C.
  2. The result of a flow-insensitive, field-sensitive, unification-based pointer analysis on that program.
    • Includes both points-to information and type information for some subset of memory objects.
    • The analysis may use various forms of context-sensitivity.
  3. A call graph computed for the program.

Type inference in DSA (2005TR [1]): DSA attempts to compute type information for every “points-to set” in the program by inferring the intended type from the uses of pointers to objects in that points-to set, not from the type declarations or cast operations in the program.
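
To make "unification-based" in input 2 concrete, here is a minimal sketch of a Steensgaard-style points-to analysis built on union-find. It is not DSA: it ignores fields, types, and context sensitivity, and every name in it (new_node, unify, assign_addr, assign_copy, same_points_to_set) is hypothetical and chosen only for this illustration.

```c
#include <assert.h>

#define MAX_NODES 64
#define NONE (-1)

static int parent[MAX_NODES];   /* union-find parent links */
static int pointee[MAX_NODES];  /* node that a class points to, or NONE */
static int node_count = 0;

/* Create a fresh abstract location (a variable or a memory object). */
int new_node(void) {
    assert(node_count < MAX_NODES);
    parent[node_count] = node_count;
    pointee[node_count] = NONE;
    return node_count++;
}

/* Representative of n's equivalence class, with path halving. */
int find(int n) {
    while (parent[n] != n) {
        parent[n] = parent[parent[n]];
        n = parent[n];
    }
    return n;
}

/* Merge two classes and, recursively, the classes they point to. */
int unify(int a, int b) {
    a = find(a);
    b = find(b);
    if (a == b) return a;
    int pa = pointee[a], pb = pointee[b];
    parent[b] = a;
    if (pa == NONE) pointee[a] = pb;
    else if (pb != NONE) unify(pa, pb);
    return find(a);
}

/* Class that p points to, created on demand. */
static int pointee_of(int p) {
    p = find(p);
    if (pointee[p] == NONE) pointee[p] = new_node();
    return find(pointee[p]);
}

/* Constraint for "p = &x". */
void assign_addr(int p, int x) { unify(pointee_of(p), x); }

/* Constraint for "p = q": both must point to the same class. */
void assign_copy(int p, int q) {
    int pp = pointee_of(p);
    int pq = pointee_of(q);
    unify(pp, pq);
}

/* Nonzero if p and q share a points-to set. */
int same_points_to_set(int p, int q) {
    int pp = pointee[find(p)], pq = pointee[find(q)];
    return pp != NONE && pq != NONE && find(pp) == find(pq);
}
```

Because every assignment merges classes instead of adding subset edges, each pointer ends up with exactly one points-to set, and it is this partition that a pool-based scheme can map to regions.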

Insights

(From 2006DinakarPhD) Automatic pool allocation partitions the heap into regions based on a points-to graph. This leads us to the following new insight that is the key to the SAFECode work:

  • Insight 1: Check a pointer against its pool instead of against the entire memory range [2]:

    • Precondition (guaranteed by DSA): unaliasable memory objects are not allocated within the same region.

    • poolcheck($ph$, A, $o$): verifies that the address A is contained within the set of memory ranges assigned to pool $ph$ and has the correct alignment for the pool’s data type (or for the field at offset $o$ if $o\neq0$).

  • Insight 2: No run-time check is needed on initialized pointers read from TK (type-known) regions; run-time checks are needed for pointers derived by arithmetic (e.g., indexing) and for pointers from TU (type-unknown) regions [3].

    • An initialized pointer obtained from a TK region will always be valid [4]. It cannot have been corrupted in an unpredictable way, e.g., via arbitrary casts and subsequent stores (it would then be in a TU region).
  • Insight 3: Reusing memory within a TK region does not violate type safety [5].

  • Insight 4: Releasing a region is safe only when there are no reachable pointers into that region [6].
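
The poolcheck operation from Insight 1 can be sketched as follows. This is only an illustration under stated assumptions: the Pool layout, the fixed-size range table, the linear scan, and the helper names pool_init and pool_register are all hypothetical, and the real SAFECode run-time may store and search pool meta-data quite differently. An elem_size of 0 is used here to stand for a type-unknown pool, where only the range test applies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_RANGES 16

typedef struct {
    uintptr_t base;  /* first byte of a registered memory range */
    size_t    len;   /* length of the range in bytes */
} Range;

typedef struct {
    size_t elem_size;           /* size of the pool's data type; 0 = type-unknown */
    Range  ranges[MAX_RANGES];  /* memory ranges assigned to this pool */
    size_t count;
} Pool;

void pool_init(Pool *ph, size_t elem_size) {
    ph->elem_size = elem_size;
    ph->count = 0;
}

/* Record that [base, base + len) belongs to the pool. */
void pool_register(Pool *ph, const void *base, size_t len) {
    assert(ph->count < MAX_RANGES);
    ph->ranges[ph->count].base = (uintptr_t)base;
    ph->ranges[ph->count].len = len;
    ph->count++;
}

/*
 * Returns 1 if address A lies inside one of the pool's ranges and sits at
 * offset o within an element of the pool's data type, 0 otherwise.
 */
int poolcheck(const Pool *ph, const void *A, size_t o) {
    uintptr_t a = (uintptr_t)A;
    for (size_t i = 0; i < ph->count; i++) {
        const Range *r = &ph->ranges[i];
        if (a < r->base || a - r->base >= r->len)
            continue;                 /* not in this range */
        if (ph->elem_size == 0)
            return 1;                 /* type-unknown: range test only */
        return (a - r->base) % ph->elem_size == o;
    }
    return 0;
}
```

The point of the sketch is that the check only consults the ranges of one pool, so its cost depends on that pool's size and not on the size of the whole heap.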

Official Document

Published Papers:

Control-C in 2002CASES [7], 2003 LCTES [8], 2005Embed [9], 2005SAFECodeTR [1], 2005PLDIPool [10], 2006ICSE [11], 2006PLDI [12], 2006DinakarPhD [13], 2011Formal [14].

Implementation

The SAFECode analysis and transformation sources are organized as follows (from the manual [15]):

  1. lib/ArrayBoundChecks: This library contains several analysis passes for static array bounds checking.
  2. lib/InsertPoolChecks: This library contains the transform passes for inserting run-time checks and for inserting code to register memory objects within individual pools. It also contains the CompleteChecks pass which implements the Check Completion Phase.
  3. lib/OptimizeChecks: This library contains several passes for optimizing run-time checks.
  4. lib/RewriteOOB: This library contains passes for implementing Ruwase/Lam pointer rewriting. This code allows SAFECode to tolerate out-of-bounds pointers that are never dereferenced.
  5. lib/DebugInstrumentation: This library implements code that modifies run-time checks to contain additional debug information (if such debug information is present in the program). It is used in SAFECode’s debug tool mode.
  6. lib/DanglingPointers: This library contains a pass that modifies a program to perform dangling pointer detection.
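
The Ruwase/Lam rewriting mentioned for lib/RewriteOOB can be illustrated with a small sketch. The idea shown is that arithmetic producing an out-of-bounds address returns the address of a side record instead, so a later dereference can be refused while further arithmetic can still bring the pointer back in bounds. All names here (Object, OOBRecord, ptr_add, can_deref) are hypothetical, the referent object is passed explicitly where a real implementation would look it up, and a one-past-the-end address is treated as out of bounds for simplicity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_OOB 32

typedef struct { char *base; size_t len; } Object;

/* Side record standing in for an out-of-bounds pointer value. */
typedef struct {
    ptrdiff_t     offset;    /* intended offset from the referent's base */
    const Object *referent;  /* object the pointer was derived from */
} OOBRecord;

static OOBRecord oob_table[MAX_OOB];
static size_t oob_count = 0;

/* A pointer is "rewritten" if it points into the OOB record table. */
static int is_oob(const void *p) {
    uintptr_t a = (uintptr_t)p, lo = (uintptr_t)oob_table;
    return a >= lo && a < lo + sizeof oob_table;
}

/* Checked version of "p + delta" for a pointer derived from obj. */
char *ptr_add(const Object *obj, char *p, ptrdiff_t delta) {
    ptrdiff_t off;
    if (is_oob(p)) {
        const OOBRecord *rec = (const OOBRecord *)(const void *)p;
        assert(rec->referent == obj);
        off = rec->offset;            /* recover the intended offset */
    } else {
        off = p - obj->base;
    }
    off += delta;
    if (off >= 0 && (size_t)off < obj->len)
        return obj->base + off;       /* back in bounds: ordinary pointer */
    assert(oob_count < MAX_OOB);
    oob_table[oob_count].offset = off;
    oob_table[oob_count].referent = obj;
    return (char *)&oob_table[oob_count++];
}

/* A load or store through p would be allowed only if this returns 1. */
int can_deref(const void *p) { return !is_oob(p); }
```

With this scheme a program that computes buf + 12 on an 8-byte buffer and then subtracts 5 before use keeps working, while one that dereferences buf + 12 directly is caught.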


In detail

  • Control-C
    • Control-C (CASES 2002 [7]) is a subset of C with key restrictions designed to ensure that the memory safety of code can be verified entirely by static checking, under certain system assumptions.
    • Restrictions on C:
      • T1. Requires strong typing of all functions, variables, assignments, and expressions, using the same types as in C.
      • T2. Disallows casts to or from any pointer type. Casts between other types (e.g., integers, floating-point numbers, and characters) are allowed.

  • Memory Safety Without Runtime Checks or Garbage Collection
    • Reference: LCTES 2003 [8].
    • Challenge: dangling pointers. Proving statically that a general C program (for example) never dereferences a freed pointer (the “dangling pointer” problem) is undecidable.
    • Region-based memory management has been used to guarantee the safety of pointer-based accesses to region data without garbage collection, but with limitations: 1) manual effort is needed to convert a program to use regions; 2) many solutions disallow explicit deallocation.
    • Automatic region inference algorithms, such as those for ML or Cyclone, have been developed to address these limitations completely or partially.

  • 2005Embed
    • Reference: [9]. Restricted C + Compiler = safe-language benefits with no garbage collection and no runtime checks.
    • Definition of safe: a software entity (module, thread, or complete program) is safe if:
      • Not out of bounds: it never references a memory location outside the data area allocated by or for the entity.
      • No alien code execution: it never executes instructions outside the code area created by the compiler and linker within that space.

  • Steensgaard
    • Reference: B. Steensgaard. Points-to analysis in almost linear time. POPL, 1996.
    • Type system

  • Type System in SAFECode
    • Q & A
      • How is the encoding done?
      • What kind of information is encoded?
      • How does the type checking work on those encodings?
      • What kinds of safety properties can be checked?
    • Program Presentation: SAFECode supports full C, but a subset of C is used here to simplify the presentation (figure from 2006DinakarPhD [13], same as in 2005SAFECodeTR [1]). This language includes most sources of potential memory errors in the weakly typed C language, including:

  • Runtime Checks
    • Runtime checks in SAFECode:
      • All pointers in type-unknown (TU) pools are checked.
      • All casts from int to pointer are checked at run time to ensure that the result lies in the right pool.
      • Pointers in TU pools are all loaded as int.
      • All pointers derived from array indexing operations need a run-time check (2006 PLDI [12]), regardless of whether the pool is TK or TU. More about array bounds checking.
      • All function pointers need a run-time check before being used (2005-SAFECode-TR [1]: function pointer).

  • Passes
    • Stack Check
      • Reference: safecode/include/StackSafety.h. This file defines checks for stack safety.

        struct checkStackSafety : public ModulePass {
          public :
            ...
            virtual bool runOnModule(Module &M);
            virtual void getAnalysisUsage(AnalysisUsage &AU) const {
              AU.addRequired<DataLayout>();
              AU.addRequired<EQTDDataStructures>();
              AU.setPreservesAll();
            }
          private :
            //
            // Tracks the DSNodes that have already been analyzed by an invocation of
            // markReachableAllocas().
            //
            std::set<DSNode *> reachableAllocaNodes;
            bool markReachableAllocas(DSNode *DSN, bool start=false);
            bool markReachableAllocasInt(DSNode *DSN, bool start=false);
        };
        } }

      • safecode/lib/StackSafety/CheckStackPointer.
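
The function-pointer rule listed under Runtime Checks (every function pointer is checked before use, against the targets the call graph allows) can be sketched in plain C as below. This is an illustration only: funccheck here is a hypothetical helper with an assumed signature, the allowed-target array stands in for whatever the compiler derives from the call graph, and checked_call is an invented wrapper showing where the check would sit.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*unary_fn)(int);

/* Sample callees; only the first two are legal targets at the call site. */
int twice(int x)     { return 2 * x; }
int negate(int x)    { return -x; }
int unrelated(int x) { return x + 1000; }

/* Returns 1 if fp is one of the n targets permitted at this call site. */
int funccheck(const unary_fn *allowed, size_t n, unary_fn fp) {
    for (size_t i = 0; i < n; i++)
        if (allowed[i] == fp)
            return 1;
    return 0;
}

/*
 * Performs the indirect call only if the check passes. *ok reports the
 * outcome; a failed check returns 0 without calling through fp.
 */
int checked_call(const unary_fn *allowed, size_t n,
                 unary_fn fp, int arg, int *ok) {
    assert(ok != NULL);
    *ok = funccheck(allowed, n, fp);
    return *ok ? fp(arg) : 0;
}
```

When the compiler can prove that a call site has a single possible target, a check like this could be dropped at compile time, which matches the note in the Questions list that some function-pointer checks need not run at run time.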




  1. Enforcing Alias Analysis for Weakly Typed Languages, TR, 2005. ↩
  2. Original text from 2006DinakarPhD: if memory objects corresponding to each node are located in a region of the heap, we would check efficiently at run-time that a pointer is a valid member of the compile-time points-to set for that pointer, i.e., that alias analysis is not invalidated. ↩
  3. Original text from 2006DinakarPhD: Any initialized pointer read from an object in a TK region or from an allocation site, will hold a valid address for its target region; All other pointers, i.e., pointers derived from indexing operations, and pointers from TU regions (including function pointers), need run-time checks before being used. ↩
  4. Precondition: In the absence of dangling pointer errors and array indexing errors ↩
  5. From 2006 DinakarPhD: In a TK (type-homogeneous) region, if a memory block holding one or more objects were freed and then reallocated to another request in the same region with the same alignment, then dereferencing dangling pointers to the previous freed object cannot cause either a type violation or an aliasing violation. ↩
  6. From 2006 DinakarPhD: “We can safely release the memory of a region when there are no reachable pointers into that region. This gives us a way to release memory to the system. Since Automatic Pool Allocation already binds the life times of regions (using escape analysis), we can arrange for memory to be released at the end of a region’s life time.” ↩
  7. Ensuring Code Safety Without Runtime Checks for Real-Time Control Systems. CASES, 2002. ↩
  8. Memory Safety Without Runtime Checks or Garbage Collection. LCTES, 2003. ↩
  9. Memory Safety Without Garbage Collection for Embedded Applications, ACM Transactions on Embedded Computing Systems, 2005. ↩
  10. Automatic pool allocation: Improving performance by controlling data structure layout in the heap. PLDI, 2005. ↩
  11. Backwards-Compatible Array Bounds Checking for C with Very Low Overhead, ICSE, 2006. ↩
  12. SAFECode: Enforcing Alias Analysis for Weakly Typed Languages, PLDI, 2006. ↩
  13. SAFECode: A Platform for Developing Reliable Software in Unsafe Languages, Ph.D. Thesis, 2006. ↩
  14. Formalizing the SAFECode Type System, 2011. ↩
  15. SAFECode Software Architecture Manual ↩
Created Jul 4, 2019 // Last Updated Aug 31, 2020
