Type System in SAFECode


Q & A

  • How to do encoding?
  • What kind of information has been encoded?
  • How does the type checking work on those encodings?
  • What kinds of safety property can be checked?

Program Presentation

SAFECode support full C, but here a subset of C is used the simplify the presentation:

Figure 2006DinakarPhD1, same as in 2005SAFECodeTR2: c like language

This language includes most sources of potential memory errors in the weakly typed C language, including:

  • P1. dangling pointers to freed heap memory;
  • P2. array bound violation;
  • P3. access via uninitialized pointers, and
  • P4. arbitrary cast from an int type to another pointer type and subsequent use.

Same as in C: int, char, cast, malloc, free;

New from SAFECode: alloca and galloc : alloca to allocate memory on stack; galloc to allocate/initilize global variables. These two features make it unnecessary to apply the & operator to get the address of a stack variable or global object; & is only used for indexing into structures, arrays and for function pointers. $\rightarrow$ Lele: why is this important?

Pointer Analysis

Syntactic extensions

From 2006DinakarPhD1/2005SAFECodeTR2:

Intuitively, the pointer analysis representation can be thought of as a storage-shape graph, also referred to as a points-to graph. Pointers pointing to two different nodes in the graph are not aliased.

  • Each node represents a set of memory objects created by the program
  • Two distinct nodes represent disjoint sets of memory objects.

We assume there is one points-to graph per function, since this allows either context-sensitive or insensitive analyses.

We use a type system to encode the results of points-to analysis (i.e., the storage shape graph) as type attributes within the program, using a type system analogous to Steensgaard’s3.

  • each points-to graph node is encoded as a distinct type, as $\rho$ defined by association $(\rho, \tau)$ below;
  • each pointer in this type system has a node attribute, $\rho$, describing the node it points to.

For example, in Figure 3 below:

example-tr-f3

The type of y is int*r2, denoting that it points to objects of node (or type) r2 in the points-to graph. Since a pointer can point into objects at an offset, we use $\tau * (p, n)$ to denote the type of a pointer pointing to offset n (where n is a compile type constant), within a struct of type $\tau$ at node $\rho$.

The statement associate($\rho,\tau$) associates node $\rho$ of the graph with type $\tau$, denoting that the node $\rho$ contains objects of type $\tau$. These objects may be pointers, such as associate(r1, $\tau'$ * r2): this encodes a ‘points-to’ edge from node r1 to node r2. associate (r1, r2) also encode the edge between r1, r2. Redundant cases like this is kept to take pointer analysis results directly from Steensgaard’s like type system3.

As a unification-based analysis, there can be only one target node for each variable or field of pointer type.

Memory that is used in a type inconsistent manner, e.g., via unions or casts in C, is assigned type Unknown, which is intepreted as an array of chars.


  1. SAFECode: A Platform for Developing Reliable Software in Unsafe Languages, Ph.D. Thesis, 2006. ↩
  2. Enforcing Alias Analysis for Weakly Typed Languages, TR, 2005. ↩
  3. B. Steensgaard. Points-to analysis in almost linear time. POPL, 1996. ↩
Created Jul 30, 2019 // Last Updated Aug 31, 2020

If you could revise
the fundmental principles of
computer system design
to improve security...

... what would you change?