Revgen
Paper: Enabling sophisticated analyses of x86 binaries with RevGen.
Document: Revgen
- Disassemble the binary using IDA Pro;
- Recover the control flow graph (CFG) using McSema;
- Translate each basic block in the CFG into a chunk of LLVM bitcode by using QEMU’s translator;
- Stitch together the translated basic blocks into LLVM functions.
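
The last step can be pictured with a small sketch. It is only an illustration under stated assumptions, not RevGen's actual algorithm: the CFG is assumed to be a plain dict from block address to successor addresses (call edges excluded), the function entry points are assumed to be known, and the name `stitch_functions` and all addresses are hypothetical.

```python
from collections import deque


def stitch_functions(cfg, entries):
    """Group basic blocks into functions by reachability from each entry.

    cfg:     dict mapping a block address to a list of successor addresses
             (assumption: intra-procedural edges only, call edges excluded).
    entries: iterable of function entry addresses (assumed to be known).
    Returns a dict mapping each entry to the sorted addresses of its blocks.
    """
    functions = {}
    for entry in entries:
        seen = {entry}
        queue = deque([entry])
        # Breadth-first walk over the blocks reachable from this entry.
        while queue:
            block = queue.popleft()
            for succ in cfg.get(block, []):
                if succ not in seen:
                    seen.add(succ)
                    queue.append(succ)
        functions[entry] = sorted(seen)
    return functions


# Hypothetical CFG: a diamond-shaped function and a two-block function.
cfg = {
    0x1000: [0x1010, 0x1020],
    0x1010: [0x1030],
    0x1020: [0x1030],
    0x1030: [],
    0x2000: [0x2010],
    0x2010: [],
}
functions = stitch_functions(cfg, [0x1000, 0x2000])
```

In a real lifter each address would map to a chunk of LLVM bitcode, and the grouping would decide which chunks end up in the same LLVM function.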
Note:
- uses an old version of McSema (from 2016, not the latest McSema2);
- the binary is statically linked; calling conventions are not parsed for dynamic calls;
- only on x86?
- no self-modifying code (although QEMU supports it by re-translating on the fly);
- the translated binary is 15-30x bigger than the original. Overhead comes from:
  - no optimizations;
  - wrapping each memory access in a function call;
  - embedding the original binary in the translated binary, in order to translate data memory accesses at run time;
- the steps above should be done statically during translation, but this requires more complex lifting, as in McSema. Tasks include:
  - lifting local/global variables;
  - recovering library function calls;
  - supporting multi-threading, signals, exceptions, long jumps, and many more.
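
To see why wrapping memory accesses and embedding the original binary costs both size and speed, here is a toy model. It is an assumption-laden sketch, not RevGen's runtime: the class `GuestMemory`, its `load32`/`store32` helpers, and the base address are all hypothetical names chosen for illustration.

```python
import struct


class GuestMemory:
    """Toy guest memory backed by an embedded copy of the original binary."""

    def __init__(self, image, base):
        self.base = base              # guest address where the image is mapped
        self.data = bytearray(image)  # writable copy of the embedded image

    def _offset(self, addr, size):
        # Resolve a guest address to an offset in the embedded image at run time.
        off = addr - self.base
        if off < 0 or off + size > len(self.data):
            raise IndexError(f"guest address {addr:#x} outside embedded image")
        return off

    def load32(self, addr):
        # Every guest load becomes a call like this instead of a plain load.
        return struct.unpack_from("<I", self.data, self._offset(addr, 4))[0]

    def store32(self, addr, value):
        # Every guest store likewise goes through a helper call.
        struct.pack_into("<I", self.data, self._offset(addr, 4), value & 0xFFFFFFFF)


# Hypothetical 8-byte image mapped at a typical 32-bit ELF base address.
mem = GuestMemory(bytes(8), base=0x08048000)
mem.store32(0x08048004, 0xDEADBEEF)
value = mem.load32(0x08048004)
```

Resolving addresses statically, as the McSema-style lifting mentioned above aims to do, would replace these helper calls with direct accesses to lifted variables and remove the need to carry the whole image.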