Crix: Detecting Missing-Check Bugs via Semantic- and Context-Aware Criticalness and Constraints Inferences

Reference 1

Inter-procedural, semantic- and context-aware analysis.

Modeling and cross-checking of the semantics of conditional statements in the peer slices of critical variables infer their criticalness.

Use criticalness to detect missing-check bugs.

278 new missing-check bugs in Linux kernel that can cause security issues. 151 accepted by Linux maintainers.

Missing Check Example

/* Linux: net/smc/smc_ib.c */
static void smc_ib_remove_dev(struct ib_device *ibdev...)
{
    struct smc_ib_device *smcibdev;
    /* ib_get_client_data may fail and return NULL */
    smcibdev = ib_get_client_data(ibdev, &smc_ib_client);
    // ERROR1: NULL-pointer dereference
    list_del_init(&smcibdev->list);
    /* ERROR2: device cannot be removed or unregistered */
    smc_pnet_remove_by_ibdev(smcibdev);
    ib_unregister_event_handler(&smcibdev->event_handler);
    /* ERROR3: memory leak */
    kfree(smcibdev);
    /* No return value: caller cannot know the errors */
}
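
A minimal, self-contained sketch of what the same pattern looks like once the check is present. The type `demo_dev` and the helper `get_client_data` are hypothetical stand-ins for the kernel APIs, and this is not the actual upstream patch; it only illustrates testing the possibly-NULL return value before its first use and reporting the failure to the caller.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct smc_ib_device. */
struct demo_dev {
    int registered;
};

/* Hypothetical stand-in for ib_get_client_data(): NULL models a failure. */
static struct demo_dev *get_client_data(struct demo_dev *stored)
{
    return stored;
}

/* Returns 0 on success, -ENODEV when the lookup fails. */
static int remove_dev_checked(struct demo_dev *stored)
{
    struct demo_dev *dev = get_client_data(stored);

    if (!dev)            /* the check that is missing in the example above */
        return -ENODEV;  /* the caller learns about the failure */

    dev->registered = 0; /* safe: dev is known to be non-NULL here */
    free(dev);
    return 0;
}
```

Calling `remove_dev_checked(NULL)` returns `-ENODEV` instead of dereferencing a NULL pointer.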

From NVD: 59.5% of vulnerabilities stem from missing-check bugs; 52% of these (excluding DoS) cause severe security impacts such as permission bypass, memory corruption, and system crashes or hangs.

Existing work

Vanguard 2: detects only four specified critical operations such as arithmetical division and array indexing.

Chucky 3, Juxta 4, Kremenek et al. 5, Dillig et al. 6.

  • Manual specification of critical variables covers only a small set of them –> false negatives.
  • Most are not semantic- or context-aware, e.g. they treat any if/switch statement as a security check. Only by considering a variable’s semantics and context can we reduce both false negatives and false positives.

Challenges

  • Critical variables take diverse forms: a parameter, a global, or a return value. They may or may not be used in arithmetic operations.
  • Identifying security checks requires semantic understanding. A variable in a conditional statement is not necessarily security-critical (70% are not, see chapter 6); the statement can also be a normal selector where all branches are normal execution.
  • Missing-check bugs are context-dependent. E.g. no check is needed when the variable is only used in a debugging function.
  • The OS kernel is large. Checking every variable will not scale, and corner cases such as hand-written assembly make the analysis error-prone.

Crix

  • automatically identifies critical variables.
    • a two-layer type analysis to identify indirect-call targets. Function-type analysis in traditional CFI 7 8 9 + struct-type analysis 10.
    • data flow analysis. Inter-procedural, flow-, context-, field-sensitive.
    • identify security checked variables by automatic analysis.
  • peer slices that share similar semantics and contexts.
    • peer-slice construction to collect slices (code paths) of a critical variable that share similar semantics and contexts. For detection of missing checks.
  • model constraints.
    • Modeled constraints from the conditional statements and their semantics (e.g., the condition type).
    • Semantic aware.
    • Improves the precision of detection.
  • cross-checks the modeled constraints of the peer slices -> identifies deviations and reports them.
  • 804 cases found –> 278 manually confirmed, patched, and submitted to Linux –> 151 accepted (134 applied + 17 confirmed).
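
The cross-checking step can be sketched roughly as follows. Each peer slice is reduced to a single flag (whether the critical variable is checked), and unchecked slices are reported only when checking is the common behavior among the peers. The struct, the function name, and the percentage threshold are assumptions of this sketch, not details from the paper.

```c
#include <stdbool.h>
#include <stddef.h>

/* One peer slice, reduced to whether the critical variable is checked. */
struct peer_slice {
    int id;
    bool has_check;
};

/* Reports unchecked slices when at least `min_percent` of the peers are
 * checked. Writes their ids to out[] and returns how many were written.
 * The threshold is an assumption of this sketch. */
static size_t find_deviations(const struct peer_slice *s, size_t n,
                              unsigned min_percent, int *out, size_t out_cap)
{
    size_t checked = 0, found = 0;

    for (size_t i = 0; i < n; i++)
        if (s[i].has_check)
            checked++;

    /* Checking is not the common behavior: nothing to report. */
    if (n == 0 || checked * 100 < (size_t)min_percent * n)
        return 0;

    for (size_t i = 0; i < n && found < out_cap; i++)
        if (!s[i].has_check)
            out[found++] = s[i].id;
    return found;
}
```

With four peers of which three are checked, a 75% threshold reports the one unchecked slice, while an 80% threshold reports nothing.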

General Techniques

Two-layer type analysis for indirect-calls

Similar to Ge et al.10.

  • The majority of taken function addresses are first stored to a function-pointer field of a struct.
  • When these addresses are dereferenced in indirect calls, they must be loaded from the struct.
  • Function addresses that are never stored to that specific struct are not valid targets of the indirect calls that load function addresses from it.

** 12% of function addresses in the Linux kernel are not stored to a struct, e.g. a function-pointer variable passed as an argument to another function. These do not have a second-layer struct type and thus do not benefit from two-layer type matching. ==> falls back to one-layer.

  • Field-sensitive: a struct may have multiple fields that hold function pointers. When offset is undecidable because indices are non-constant, we roll back the analysis to be field-insensitive.
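
A rough sketch of the two-layer matching described above, assuming each address-taken function and each indirect call site has already been summarized into small records. All type IDs, struct names, and the `NO_STRUCT` marker are hypothetical; a real implementation would work on LLVM IR types.

```c
#include <stddef.h>

#define NO_STRUCT (-1)  /* address is not stored to / loaded from a struct */

/* One record per address-taken function (all IDs are hypothetical). */
struct func_addr {
    int func_id;    /* which function */
    int sig_id;     /* first layer: function signature type */
    int struct_id;  /* second layer: struct it is stored to, or NO_STRUCT */
    int field_idx;  /* field within that struct */
};

/* One record per indirect call site. */
struct icall {
    int sig_id;
    int struct_id;  /* struct the pointer is loaded from, or NO_STRUCT */
    int field_idx;
};

/* Writes matching func_ids to out[] and returns how many were found. */
static size_t resolve_targets(const struct func_addr *funcs, size_t n,
                              const struct icall *call,
                              int *out, size_t out_cap)
{
    size_t found = 0;

    for (size_t i = 0; i < n && found < out_cap; i++) {
        if (funcs[i].sig_id != call->sig_id)
            continue;  /* the first layer must always match */
        if (call->struct_id != NO_STRUCT &&
            (funcs[i].struct_id != call->struct_id ||
             funcs[i].field_idx != call->field_idx))
            continue;  /* second layer: same struct and same field */
        out[found++] = funcs[i].func_id;
    }
    return found;
}
```

When the call site has no struct layer, the second test is skipped and the match falls back to signature-only (one-layer) matching.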

Type-Escaping Analysis for False Negatives

Escape: a function address stored to a different struct, say structB, can be loaded from memory with structA.

In this case, the type analysis misses the function address: it only sees the address stored to structB, never to structA.

Such escaping cases exist when:

  • the struct holding the function addresses is cast to or from a different type;
  • the function-pointer field of struct is stored to with a value of a different type (e.g. unsigned long).

==> Crix is conservative: it finds all stores and casts. When an escaped type is found, it discards that type during analysis and uses only one-layer type analysis for the indirect call.

==> Two-layer type analysis introduces no extra false negatives over existing one-layer type analysis.
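
The conservative fallback can be sketched as a set of escaped struct types that is filled while scanning casts and stores. The function names, the fixed-size table, and the choice to mark both sides of a cast as escaped are assumptions of this sketch.

```c
#include <stdbool.h>

#define MAX_STRUCTS 64  /* hypothetical bound; IDs must be in [0, MAX_STRUCTS) */

static bool escaped[MAX_STRUCTS];  /* escaped[t]: struct type t has escaped */

/* Record a cast between two struct types; conservatively, both escape. */
static void note_cast(int from_struct, int to_struct)
{
    escaped[from_struct] = true;
    escaped[to_struct] = true;
}

/* Record a store to a function-pointer field. A value of a different type
 * (e.g. unsigned long) makes the holding struct escape. */
static void note_store(int struct_id, bool value_type_matches_field)
{
    if (!value_type_matches_field)
        escaped[struct_id] = true;
}

/* Two-layer matching is used only for structs that never escaped;
 * otherwise the analysis falls back to one-layer matching. */
static bool can_use_two_layer(int struct_id)
{
    return !escaped[struct_id];
}
```

Because an escaped struct only ever widens the target set back to the one-layer result, the fallback cannot drop a target that one-layer analysis would have found.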

Comparison to Ge et al. 10

  • elastic: can fall back to first-layer type analysis.
  • escaping analysis: conservative to ensure soundness.

Critical-variable inference

Identify security checks by their failure-handling code patterns.

  • Check failure pattern:
    • return error code, similar to LRSan11;
    • calling an error-handling function.
      • Linux has a limited number of basic error-handling functions.
      • They are critical functions, often implemented in assembly: BUG(), panic(), dump_stack(), pr_err(), dev_err();
      • severity level: KERN_ERR, KERN_CRIT, KERN_EMERG.
      • variable number of parameters.
      • Detection is straightforward for a static analysis tool.
      • The results were manually investigated and false positives filtered out: 531 error-handling functions found.
  • security check by if statement:
    • one branch handles a check failure, and
    • the other continues normal execution.

An if statement whose two branches both handle check failures is not a security check.
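
The rule reduces to an exclusive-or over the two branches. A minimal sketch, assuming each branch has already been classified (the enum and function names are hypothetical):

```c
#include <stdbool.h>

/* What a branch of an if statement does, simplified to two outcomes. */
enum branch_kind {
    BRANCH_NORMAL,          /* continues normal execution */
    BRANCH_HANDLES_FAILURE  /* returns an error code or calls an error handler */
};

/* An if statement is a security check only when exactly one of its
 * branches handles a check failure. */
static bool is_security_check(enum branch_kind taken,
                              enum branch_kind not_taken)
{
    return (taken == BRANCH_HANDLES_FAILURE) !=
           (not_taken == BRANCH_HANDLES_FAILURE);
}
```

A normal selector (both branches normal) and a statement whose branches both fail are rejected alike.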

LRSan reports 131K security checks; CRIX reports 308K security checks.

Once security checks are identified, the checked variables are regarded as critical variables.

Peer-slice

Source and Use

Def and use in LLVM?

Find Source (not exactly def):

Only for identified critical variables, do interprocedural backward data-flow analysis, to collect:

  • Constants (such as error codes);
  • Return values and parameters of certain functions: Input functions; assembly functions.
  • Global variables.
  • Others: when CRIX cannot find a predecessor instruction, the current values are marked as sources.

Find Use:

  • Pointer dereference.
  • Indexing in memory accesses.
  • Binary operations.
  • Function call parameter.
  • None: if none of the above applies, the variable is deemed to have no use (unlike LLVM def/use). Common for error codes.

Construct Peer Slice

Wiki: Program slicing

Naive slice from use to source or source to use:

  • path explosion 12.
  • different semantics and contexts can make slices different and unrelated.

Goal of the solution: for each source or use, construct its peer slices. They should be sufficient to enable cross-checking and should share similar semantics and contexts.

Observation: in a CFG, calls and returns often generate peer paths.

E.g. a dispatcher indirect call: pad->var->reg()

  • can target multiple semantically similar callee functions (e.g. adp5589_reg and adp5585_reg);
  • arguments are passed in by the same caller, so the callees share similar contexts.

E.g. a critical variable as parameter:

  • direct calls to a function: different callers call the same callee, so the callee’s arguments are used with similar semantics in similar contexts.
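
A small sketch of why such a dispatcher yields peer paths. Only the callee names come from the example above; the function bodies, the `chip_var` struct, and `map_reg` are invented for illustration and do not reflect the real driver code.

```c
#include <errno.h>

/* Hypothetical callbacks: bodies are invented for illustration only. */
static int adp5589_reg(int reg)
{
    if (reg < 0)        /* peer 1 checks the argument */
        return -EINVAL;
    return reg + 0x10;
}

static int adp5585_reg(int reg)
{
    return reg + 0x20;  /* peer 2 uses the same argument unchecked */
}

struct chip_var {
    int (*reg)(int);    /* function-pointer field used by the dispatcher */
};

/* Dispatcher: one indirect call site, several possible callees. */
static int map_reg(const struct chip_var *var, int reg)
{
    return var->reg(reg);
}
```

Both callees receive `reg` from the same call site, so their slices are peers; cross-checking them would flag the callee that lacks the check the other one performs.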

Forward data-flow analysis: for each critical variable source:

  • if an indirect call takes the c.v. as a parameter: collect all callees as a set of peer paths;
  • if c.v. is returned or written to memory pointed to by an argument: collect callers as a set of peer paths;
  • recursive until a use is found or propagation ends.

Backward data-flow analysis: for each use:

  • if c.v. comes from an argument of the current function: all callers are collected as peer paths;
  • recursive until a source is found.
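
One step of the backward analysis, collecting all callers of a function as peers, can be sketched over a flat list of call-graph edges. The edge representation and function name are assumptions of this sketch; the recursion over further callers is omitted.

```c
#include <stddef.h>

/* One call-graph edge: caller invokes callee (IDs are hypothetical). */
struct call_edge {
    int caller;
    int callee;
};

/* Collects every caller of `callee` into out[] and returns the count.
 * These callers form one set of peer paths for the callee's parameter.
 * Duplicate edges are not merged in this sketch. */
static size_t collect_peer_callers(const struct call_edge *edges, size_t n,
                                   int callee, int *out, size_t out_cap)
{
    size_t found = 0;

    for (size_t i = 0; i < n && found < out_cap; i++) {
        if (edges[i].callee == callee)
            out[found++] = edges[i].caller;
    }
    return found;
}
```

The forward direction is symmetric: swap the roles of `caller` and `callee` to collect the peer callees of an indirect call site.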

Slicing peer paths:

  • Slicing ends at a conditional statement or the end of the path. ==> each slice has at most one conditional statement.

Four classes of slices:

  • Source-Ret: c.v. is returned as return value to multiple peer callers.
  • Source-Param: c.v. is returned to peer callers through an output parameter (memory pointed to by an argument).
  • Source-Arg: c.v. is passed to peer callees through an indirect call.
  • Use-Param: c.v. is passed in from peer callers.

Cross-Checking Check Constraints


  1. Usenix Security, 2019. ↩
  2. L. Situ, L. Wang, Y. Liu, B. Mao, and X. Li. Vanguard: Detecting missing checks for prognosing potential vulnerabilities. In Proceedings of the Tenth Asia-Pacific Symposium on Internetware, page 5. ACM, 2018. ↩
  3. F. Yamaguchi, C. Wressnegger, H. Gascon, and K. Rieck. Chucky: Exposing missing checks in source code for vulnerability discovery. In Proceedings of the 2013 ACM SIGSAC conference on Computer & communications security, pages 499–510. ACM, 2013. ↩
  4. C. Min, S. Kashyap, B. Lee, C. Song, and T. Kim. Cross-checking semantic correctness: The case of finding file system bugs. In Proceedings of the 25th ACM Symposium on Operating Systems Principles (SOSP), Monterey, CA, Oct. 2015. ↩
  5. T. Kremenek, P. Twohey, G. Back, A. Ng, and D. Engler. From uncertainty to belief: Inferring the specification within. In Proceedings of the 7th Symposium on Operating Systems Design and Implementation, OSDI ’06, 2006. ↩
  6. I. Dillig, T. Dillig, and A. Aiken. Static error detection using semantic inconsistency inference. In Proceedings of the 2007 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), San Diego, CA, June 2007. ↩
  7. ↩
  8. ↩
  9. ↩
  10. ↩
  11. W. Wang, K. Lu, and P. Yew. Check It Again: Detecting Lacking-Recheck Bugs in OS Kernels. In Proceedings of the 25th ACM Conference on Computer and Communications Security (CCS), Toronto, ON, Canada, Oct. 2018. ↩
  12. J. Jaffar, V. Murali, J. A. Navas, and A. E. Santosa. Path-sensitive backward slicing. In International Static Analysis Symposium, pages 231–247. Springer, 2012. ↩

Created Nov 25, 2019 // Last Updated Nov 26, 2019
