SAFECode support full C, but here a subset of C is used the simplify the presentation:
Figure 2006DinakarPhD1, same as in 2005SAFECodeTR2:
This language includes most sources of potential memory errors in the weakly typed C language, including:
Same as in C: int
, char
, cast
, malloc
, free
;
New from SAFECode: alloca
and galloc
: alloca
to allocate memory on stack; galloc
to allocate/initilize global variables. These two features make it unnecessary to apply the & operator to get the address of a stack variable or global object; & is only used for indexing into structures, arrays and for function pointers. $\rightarrow$ Lele: why is this important?
From 2006DinakarPhD1/2005SAFECodeTR2:
Intuitively, the pointer analysis representation can be thought of as a storage-shape graph, also referred to as a points-to graph. Pointers pointing to two different nodes in the graph are not aliased.
We assume there is one points-to graph per function, since this allows either context-sensitive or insensitive analyses.
We use a type system to encode the results of points-to analysis (i.e., the storage shape graph) as type attributes within the program, using a type system analogous to Steensgaard’s3.
For example, in Figure 3 below:
The type of y
is int*r2
, denoting that it points to objects of node (or type) r2
in the points-to graph. Since a pointer can point into objects at an offset, we use $\tau * (p, n)$ to denote the type of a pointer pointing to offset n
(where n
is a compile type constant), within a struct of type $\tau$ at node $\rho$.
The statement associate($\rho,\tau$)
associates node $\rho$ of the graph with type $\tau$, denoting that the node $\rho$ contains objects of type $\tau$. These objects may be pointers, such as associate(r1, $\tau'$ * r2)
: this encodes a ‘points-to’ edge from node r1
to node r2
. associate (r1, r2)
also encode the edge between r1, r2
. Redundant cases like this is kept to take pointer analysis results directly from Steensgaard’s like type system3.
As a unification-based analysis, there can be only one target node for each variable or field of pointer type.
Memory that is used in a type inconsistent manner, e.g., via unions or casts in C, is assigned type Unknown, which is intepreted as an array of chars.
If you could revise
the fundmental principles of
computer system design
to improve security...
... what would you change?