Taint Analysis with DynamoRIO, Part 1
For my master’s thesis at Rensselaer Polytechnic Institute under Ana Milanova, I plan on applying taint analysis to harden ASLR by preventing address leaks during the execution of a program. Because of my familiarity with DynamoRIO, I plan on constructing a taint analysis system specifically for this platform.
This series of posts will document my progress towards that goal.
This particular article documents some notes on Dynamic Taint Analysis (DTA) which I’ve found useful, as well as a roadmap for my progress.
Introduction
Dynamic taint analysis models information flow within a single execution of a program, so it requires a concrete execution context to analyze. The technique is explained very thoroughly by Schwartz, Avgerinos, and Brumley [1] (required reading). In short, it lets us determine, at any point in the execution of a program, whether a runtime value is tainted, i.e. whether its value is directly or indirectly affected by a taint source. In most applications of taint analysis, the taint source is user input.
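To make the idea of "directly or indirectly affected" concrete, here is a minimal sketch of taint propagation. It is written in Python purely for illustration (a real DynamoRIO client would be C), and every name in it (Tainted, lift, source, is_tainted) is made up for this example. It only models addition, under the simple rule that a result inherits the labels of both operands.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Tainted:
    """A runtime value paired with the set of taint sources that influenced it."""
    value: int
    labels: frozenset = field(default_factory=frozenset)

    def __add__(self, other):
        other = lift(other)
        # Data-flow rule: the result depends on both operands,
        # so it inherits the union of their labels.
        return Tainted(self.value + other.value, self.labels | other.labels)


def lift(x):
    """Wrap a plain constant as an untainted value."""
    return x if isinstance(x, Tainted) else Tainted(x)


def source(value, label):
    """Introduce taint, e.g. for a byte read from user input."""
    return Tainted(value, frozenset({label}))


def is_tainted(x):
    """A value is tainted if at least one source label reached it."""
    return bool(lift(x).labels)


user_byte = source(0x41, "stdin")
offset = user_byte + 4        # directly affected by the source
index = offset + lift(16)     # indirectly affected, still tainted
constant = lift(8) + 8        # never touches the source

print(is_tainted(index), is_tainted(constant))  # True False
```

Tracking a set of labels rather than a single bit is a design choice of this sketch; a one-bit scheme is cheaper but cannot tell which source a value came from.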
DynamoRIO, like Intel Pin and Valgrind, functions much like a hypervisor for userspace programs. In other words, DynamoRIO executes the program’s code directly (with small modifications to the code stream), using a family of techniques known as binary rewriting or dynamic binary instrumentation, and exposes an API that lets clients modify the code stream at runtime.
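As a rough mental model of what "modifying the code stream" means, the toy below rewrites a block of made-up instructions so that a client hook runs before every memory access. This is not DynamoRIO’s API: the instruction format, instrument, and run are all hypothetical, and a small interpreter stands in for the real processor.

```python
def instrument(block, wants, hook):
    """Return a rewritten copy of `block` with a call to `hook` inserted
    before every instruction for which `wants(instr)` is true."""
    rewritten = []
    for instr in block:
        if wants(instr):
            rewritten.append(("call", hook))
        rewritten.append(instr)
    return rewritten


def run(block, state):
    """Toy interpreter standing in for the real processor."""
    for pos, (op, arg) in enumerate(block):
        if op == "call":
            arg(state, block[pos + 1])  # the hook sees the instruction it guards
        elif op == "load":
            state["acc"] = state["mem"][arg]
        elif op == "add":
            state["acc"] += arg
        elif op == "store":
            state["mem"][arg] = state["acc"]


trace = []


def log_access(state, instr):
    """Client hook: record every memory instruction before it executes."""
    trace.append(instr)


def is_mem(instr):
    return instr[0] in ("load", "store")


block = [("load", 0), ("add", 5), ("store", 1)]
state = {"acc": 0, "mem": [10, 0]}
run(instrument(block, is_mem, log_access), state)

print(state["mem"], trace)  # [10, 15] [('load', 0), ('store', 1)]
```

The original instructions still execute unchanged; the client only adds code around them, which is the property a taint tracker relies on.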
Existing Projects
Below are some projects that use taint analysis, or a simplified form of it, for a specific purpose, but that do not necessarily expose a composable API:
TaintCheck [2] and TaintTrace [3], two sides of the same coin, implement taint analysis in order to detect possible exploits. They are built on Valgrind and DynamoRIO, respectively, though the DynamoRIO implementation was never open sourced.
Bochspwn Reloaded [4], a talk by j00ru, documents the implementation of a simplified taint analysis engine (one which only targets rep movsb instructions) on top of the Bochs emulator, in order to detect kernel memory disclosures. It applies taint analysis in a manner similar to Dr. Memory, by tainting uninitialized stack and heap allocations.
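The idea of tainting uninitialized allocations can be sketched as follows. This is a simplified model of the approach described above, not the actual implementation of Bochspwn Reloaded or Dr. Memory; the ShadowedMemory class and its methods are hypothetical. Fresh allocations start tainted, explicit writes clear the taint, memcpy-style copies carry it along, and a sink check reports any bytes that are still tainted when a buffer leaves the trust boundary.

```python
class ShadowedMemory:
    """Flat byte memory with one taint flag per byte (True = uninitialized)."""

    def __init__(self, size):
        self.data = bytearray(size)
        self.shadow = [False] * size
        self.next_free = 0

    def alloc(self, size):
        """Bump allocator: fresh allocations start fully tainted."""
        if self.next_free + size > len(self.data):
            raise MemoryError("out of toy memory")
        base = self.next_free
        self.next_free += size
        self.shadow[base:base + size] = [True] * size
        return base

    def write(self, addr, payload):
        """An explicit write initializes the bytes, clearing their taint."""
        self.data[addr:addr + len(payload)] = payload
        self.shadow[addr:addr + len(payload)] = [False] * len(payload)

    def copy(self, dst, src, size):
        """memcpy-style copy (the rep movsb case): taint moves with the data."""
        self.data[dst:dst + size] = self.data[src:src + size]
        self.shadow[dst:dst + size] = self.shadow[src:src + size]

    def leaked_offsets(self, addr, size):
        """Sink check: which bytes of an outgoing buffer are still tainted?"""
        return [i for i in range(size) if self.shadow[addr + i]]


mem = ShadowedMemory(64)
struct = mem.alloc(8)                    # e.g. a struct with 4 bytes of padding
mem.write(struct, b"\x01\x02\x03\x04")   # only the first 4 bytes are initialized
out = mem.alloc(8)                       # buffer about to be handed to user mode
mem.copy(out, struct, 8)

print(mem.leaked_offsets(out, 8))  # [4, 5, 6, 7]
```

Propagating taint only on block copies keeps the engine small, at the cost of missing data that moves through registers one value at a time.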
Existing APIs
There exist many public implementations of taint analysis, some of which expose a dedicated API for use by plugins. These are documented here:
Triton implements not only taint analysis, but also a symbolic execution engine via a unified client-facing API. Triton also multiplexes these two features onto multiple implementations; one such implementation makes use of Intel Pin to execute the instructions on a real processor.
PANDA [5] exposes a plugin API for taint analysis and replayable execution traces via the QEMU engine; PANDA implements taint analysis on top of TCG, the internal QEMU IR, to be truly platform independent. The project page references many other papers which showcase PANDA.
Roadmap
Due to time constraints, I am restricting my research to the ARM architecture; the variety and complexity of x86’s ISA make it infeasible to implement taint analysis in a stable and complete way within two semesters.
However, this project should also result in something composable; because of DynamoRIO’s rich plugin system, it would be prudent to contribute back a library, drtaint, which would expose a taint analysis API to clients, similar to the drreg or drx extensions.
In light of these requirements, I propose the following roadmap towards this composable library:
- Construct shadow memory scheme for ARM
- Implement taint propagation fastpath for most common opcodes
- Implement taint propagation slowpath for all other opcodes
- Expose a taint source and sink API, as well as a knob to control taint propagation policies
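The pieces of this roadmap fit together roughly as in the toy model below. It is a sketch under heavy assumptions, not the planned drtaint design: the TaintEngine class, its method names, and the taint_addresses knob are all hypothetical, shadow memory is a sparse dictionary with one entry per address (ignoring access width), and the "slowpath" is simply a conservative fallback rule rather than a real out-of-line call.

```python
class TaintEngine:
    """Toy model of the roadmap above; every name here is hypothetical."""

    def __init__(self, taint_addresses=False):
        self.regs = {}   # shadow registers: name -> tainted?
        self.mem = {}    # sparse shadow memory: address -> tainted?
        self.taint_addresses = taint_addresses  # propagation policy knob
        self.slowpath_hits = 0
        # Fastpath: dedicated rules for the most common opcodes.
        self.fastpath = {
            "mov": self._copy,
            "add": self._union,
            "ldr": self._load,
            "str": self._store,
        }

    # Source and sink API.
    def taint_memory(self, addr, size):
        for a in range(addr, addr + size):
            self.mem[a] = True

    def reg_is_tainted(self, reg):
        return self.regs.get(reg, False)

    def mem_is_tainted(self, addr):
        return self.mem.get(addr, False)

    # Propagation, called once per executed instruction.
    def step(self, opcode, dst, srcs, addr=None):
        handler = self.fastpath.get(opcode, self._slowpath)
        handler(dst, srcs, addr)

    def _copy(self, dst, srcs, addr):
        self.regs[dst] = self.reg_is_tainted(srcs[0])

    def _union(self, dst, srcs, addr):
        self.regs[dst] = any(self.reg_is_tainted(r) for r in srcs)

    def _load(self, dst, srcs, addr):
        tainted = self.mem_is_tainted(addr)
        if self.taint_addresses:
            # Optional policy: a tainted base register taints the loaded value.
            tainted = tainted or self.reg_is_tainted(srcs[0])
        self.regs[dst] = tainted

    def _store(self, dst, srcs, addr):
        self.mem[addr] = self.reg_is_tainted(srcs[0])

    def _slowpath(self, dst, srcs, addr):
        # Conservative fallback for any opcode without a dedicated rule.
        self.slowpath_hits += 1
        self._union(dst, srcs, addr)


engine = TaintEngine()
engine.taint_memory(0x1000, 4)                  # source: e.g. a buffer filled by read()
engine.step("ldr", "r0", ["r4"], addr=0x1000)   # r0 <- tainted memory
engine.step("add", "r1", ["r0", "r2"])          # fastpath: union of sources
engine.step("eor", "r3", ["r1", "r5"])          # no dedicated rule: slowpath
engine.step("mov", "r6", ["r2"])                # copy of an untainted register
engine.step("str", None, ["r3"], addr=0x2000)   # taint flows back into memory

print(engine.reg_is_tainted("r3"), engine.reg_is_tainted("r6"),
      engine.mem_is_tainted(0x2000), engine.slowpath_hits)  # True False True 1
```

Whether a tainted address should taint the value loaded through it is a typical example of a policy that clients disagree on, which is why it appears here as a knob rather than a fixed rule.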
Once complete, tools such as TaintTrace and Triton could be implemented more easily, without having to reimplement the taint propagation step; this composable library will also drive my forthcoming ASLR hardening tool.
[1] All You Ever Wanted to Know About Dynamic Taint Analysis and Forward Symbolic Execution (but might have been afraid to ask)
[2] Dynamic Taint Analysis for Automatic Detection, Analysis, and Signature Generation of Exploits on Commodity Software
[3] TaintTrace: Efficient Flow Tracing with Dynamic Binary Rewriting
[4] Bochspwn Reloaded: Detecting Kernel Memory Disclosure with x86 Emulation and Taint Tracking
[5] Tappan Zee (North) Bridge: Mining Memory Accesses for Introspection