Control-C

Control-C (in 2002 CASES1) , a subset of C, but with key restrictions designed to ensure that memory safety of code can be verified entirely by static checking, under certain system assumptions.

Restrictions on C

  • T1. Requires strong typing of all functions, variables, assignments, and expressions, using the same types as in C.

  • T2. Disallows casts to or from any pointer type. Casts between other types (e.g., intergers, floating point numbers, and characters) are allowed.

  • T3. A union can only contain types that can be cast to each other; consequently a union cannot contain a pointer type.

  • T4. No uses of uninitialized local pointer variables within a procedure. A pointer variable must be assigned a value before it is used or address is taken.

For example:

int a, *p, **pp;

pp = &p;

print (**pp);

p = &a;

p is uninitialized and cannot be assigned to pp. Dereferencing p via **pp is potentially unsafe and this violation would be difficult to detect.

The language therefore disallows taking the address of an uninitialized pointer.

  • T5. If reserved address is used, then any individual data object should be no larger than the size of the reserved address range.

    • Reserved address is leveraged to avoid null pointer checks at runtime (replaced with HW checks).
  • T6. Pointer arithmetic is disallowed.

    • array traversals must be done using explicit array indexing in Control-C.
    • string traversals must use strlen followed by explicit array index operations; A safe string library operations can also be provided.

Restrictions on Array Operations

Limitation of symbolic integer analysis

One of the fundamental limits in static program analysis lies in the analysis of contraints on symbolic integer expressions. For ensuring safety, the compiler must prove (symbolically) that the index expressions in an array reference lie within the corresponding array bounds on all possible execution path. For each index expression, this can be formulated as an integer contraint system with equalities, inequalities, and logical operators used to represent the computation and control-flow statements of the program.

“Unfortunately, integer constrains with multiplication of symbolic variables is undecidable”.

A broad, decidable class of symbolic integer constraints is Presburger arithmetic, which allows addition, subtraction, multiplication by constants, and the logical connectives $\vee, \wedge, \neg, \exists $ and $\forall$. Omega library provides an efficient implementation that has been widely used for solving such integer programming problems in compiler research.

Exploiting static analysis based on Presburger arithmetic requires that our language only allows linear expressions with constant(known) coefficients for all computations that determine the set of values accessed by an array index expression.

Array operation rules

Definition of affine transformation: Let $F: R^n \rightarrow R$, then a transformation, $F$, is said to be affine if $F(\overrightarrow{\rm p}) = C\overrightarrow{\rm p} + \overrightarrow{\rm q}$, where $C$ is any linear transformation, and $\overrightarrow{\rm q}$ contains only constants or known symbolic variables independent of $\overrightarrow{\rm p}$.

In the following, we assume affine transformations with known constant integer coefficients ($C$).

Array operations in Control-C must obey the following rules. On all control flow paths,

  • A1. The index expression used in an array access must evaluate to a value within the bounds of the array.
  • A2. For all dynamically allocated arrays, the size of the array has to be a positive expression.
  • A3. If an array, A, is accessed inside a loop, then:

    1. the bounds of the loop have to be provably affine transformations of the size of $A$ an outer loop index variables or vice versa;** ???
    2. **the index expression in the array reference, has to be provably an affine transformation of the vector of loop index variables, or an affine transformation of the size of $A$; and
    3. if the index expression in the array reference depends on a symbolic variable $s$ which is independent of the loop index variable (i.e., appears in the constant term $\overrightarrow{\rm q}$ in the affine representation), then the memory locations accessed by that reference have to be provably independent of the value of $s$.
  • A4. If an array is accessed outside of a loop then

    1. the index expression of the array has to be provably an affine expression of the length of the array.

The compiler can statically check a program satisfies A1 only if the additional language rules A2 – A4 are obeyed.

Example:

... /*** caller function ***/
if (( k > 0) && (k < 5)) {
    B = (int *)RMalloc(k * 30);
    C = initialize(B, 20);
}
int * initialize(int *B, int n) {
    int *A,m;
    if (B[n] > 0) {
        A = (int *) RMalloc(5 * B[n] + 10);
        for (m=0; m < B[n]; ++m) A[m*4] = m ;
        return A;
    } else 
        return null;
}

The code above is a valid Control-C because:

  1. $ n = 20$ can be proven to be an affine function of the size of $B$ (A4);
  2. $ m * 4 $ is clearly an affine function of $m$ (A3(2)); and
  3. the bounds of the loop ($0$ and $B[n]$) can be proven to be affine functions of the size of $A$ (A3(a)).

Once the above three conditions are satisfied, the compiler proves A1 for all array accesses.

Note that for proving safety of $B[n]$, the compiler has to correctly include the constraint on $k$ ($ k < 5 $) and only then it can verify that $n(=20)$ is less than the size of $B$.


  1. Ensuring Code Safety Without Runtime Checks for Real-Time Control Systems. CASES, 2002. ↩
Created Jul 29, 2019 // Last Updated Aug 31, 2020

If you could revise
the fundmental principles of
computer system design
to improve security...

... what would you change?