Revgen

Paper: Enabling sophisticated analyses of x86 binaries with RevGen 1.

Document: Revgen 1

  • Disassemble the binary using IDA Pro;
  • Recover the control flow graph (CFG) using McSema;
  • Translate each basic block in the CFG into a chunk of LLVM bitcode by using QEMU’s translator;
  • Stitch together the translated basic blocks into LLVM functions.
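
The stitching step can be pictured with a toy model. The sketch below is not Revgen's code: the CFG dictionary, the block addresses, and the `stitch_function` helper are all made up for illustration. It only shows the idea of collecting every basic block reachable from a function entry so the blocks can be emitted together as one function.

```python
from collections import deque

# Hypothetical toy CFG: block address -> list of successor block addresses.
# In the real pipeline this would come from the disassembly and CFG recovery steps.
def stitch_function(cfg, entry):
    """Return the blocks reachable from `entry`, in breadth-first order."""
    seen = {entry}
    order = []
    queue = deque([entry])
    while queue:
        block = queue.popleft()
        order.append(block)
        for succ in cfg.get(block, []):
            if succ not in seen:
                seen.add(succ)
                queue.append(succ)
    return order

# A diamond-shaped function: entry, two branches, one join block.
cfg = {0x1000: [0x1010, 0x1020], 0x1010: [0x1030], 0x1020: [0x1030], 0x1030: []}
blocks = stitch_function(cfg, 0x1000)
print([hex(b) for b in blocks])
```

In this model each address stands for one translated chunk of bitcode; a real implementation would also have to rewrite the branches between chunks, which is omitted here.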

Note:

  • uses an old version of McSema (from 2016, not the latest McSema2);
  • binary is statically linked; calling conventions not parsed for dynamic calls;
  • only on x86?
  • no self-modifying code (although QEMU supports it by re-translating on the fly);
  • the translated binary is 15-30x bigger than the original. Overhead comes from:
    • no optimizations;
    • wraps each memory access in a function call;
    • embeds the original binary in the translated binary in order to translate data memory accesses at run time;
    • the above should be done statically during translation, BUT this requires more complex lifting, as in McSema; tasks include:
      • lifting local/global variables;
      • recovering library function calls;
      • supporting multi-threading, signals, exceptions, long jumps, and many more.
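
To make the memory-access overhead concrete, here is a minimal sketch of a wrapped access. It is a toy model, not Revgen's runtime: `IMAGE_BASE`, `EMBEDDED_IMAGE`, `mem_read32`, and `mem_write32` are hypothetical names, and the layout is assumed. The point is that a load at an original address turns into a call that maps the address into the embedded copy of the original binary at run time.

```python
import struct

# Assumed layout (made up for this sketch): the original binary's data is
# embedded as raw bytes and was loaded at IMAGE_BASE in the original program.
IMAGE_BASE = 0x08048000
EMBEDDED_IMAGE = bytearray(struct.pack("<II", 0xDEADBEEF, 42))

def _offset(addr):
    """Translate an original address to an offset in the embedded image."""
    offset = addr - IMAGE_BASE
    if not 0 <= offset <= len(EMBEDDED_IMAGE) - 4:
        raise ValueError(f"address {addr:#x} is outside the embedded image")
    return offset

def mem_read32(addr):
    """Stand-in for the helper called on every 32-bit load."""
    return struct.unpack_from("<I", EMBEDDED_IMAGE, _offset(addr))[0]

def mem_write32(addr, value):
    """Stand-in for the helper called on every 32-bit store."""
    struct.pack_into("<I", EMBEDDED_IMAGE, _offset(addr), value & 0xFFFFFFFF)

# A translated `mov eax, [0x08048004]` becomes a call instead of a direct load.
eax = mem_read32(IMAGE_BASE + 4)
print(eax)
```

Resolving these addresses statically during translation would remove both the call and the embedded image, which is exactly the extra lifting work listed above.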

  1. Enabling sophisticated analyses of x86 binaries with RevGen. DSN-W, 2011. ↩
Created Nov 11, 2019 // Last Updated Nov 11, 2019
