Dynamic Control-Flow Graph with DynamoRIO

Posted on November 4, 2016 with tags dynamorio


This post notes the usefulness of the Intel-based PIN tool for code coverage.

To recap from this post, we would like to construct a “control flow graph” of a particular program. However, this control flow graph is not statically computed via LLVM given source code, but rather is computed as we’re running this binary. As a result, we can only construct a control flow graph that represents the branches or edges taken at runtime, which is a subset of all possible branches or edges.

Instead of using PIN, we would like to use DynamoRIO, as it trades PIN’s relatively simplistic API for raw speed.

First Steps

It turns out the DynamoRIO contains multiple samples, a few of which construct control flow traces (cbr.c, cbrtrace.c) that can be dumped to a file in a format very similar to that of the original source post’s PIN tool (if you squint).

$ ./bin64/drrun -c ./api/bin/libcbrtrace.so -- ls
[ls output]
$ head api/bin/cbrtrace.ls.01264.0000.log -n1
0x00007fde53311c00       # basic block
  [ 0x00007fde53311c4f   # inst addr
  , 0x00007fde53311c55   # fall-through
  , 0x00007fde53311ce2 ] # target addr
 => 0x00007fde53311c55   # taken branch

However, these ready-made clients do come with restrictions. First, these samples only instrument conditional branches, and not indirect calls (i.e. call [rax]), nor do they instrument indirect jumps (i.e. jmp [rax]).

These three classes of instructions together can construct a much more complete CFG at runtime, which directly complement static analysis as it is computationally hard to determine statically the set of all targets an indirect call or jump might take.

Conditional Branches

We start by working with a modified version of cbr.c, with a few notable differences:

  1. cbr.c implements its own hash table, we use the C++11 std::unordered_map container to save space and implementation time.
  2. This sample is interested in whether a branch is taken or not. We care about the edges of our CFG. The problems are sufficiently similar, but our hashtable instead stores addresses, rather than whether taken.
  3. When a branch has been taken, we no longer need this complicated instrumentation. And, once both the taken and fallthrough branches have been taken, no instrumentation is needed at all! However, this forces DynamoRIO to fall back on its corner-case design, see DR_EMIT_STORE_TRANSLATIONS in the docs. We keep the instrumentation there for simplicity and stability.

The instrumentation necessarily will modify conditional branches; we will victimize the original cbr and replace it with our own, which jumps into our at_taken clean call. Otherwise, we will jump into the at_not_taken clean call, as such:

  ; (1) Original conditional branch, clobbered to jump
  ; to our label.
  cbr label

  ; (2) We fell through the original cbr, so we note that
  ; the path was not taken by calling the `at_not_taken`
  ; clean call.
  ; Afterwards we jump to the original fallthrough address.
  push fall
  push src
  call at_not_taken
  jmp fallthrough

  ; (3) We note that this was the taken branch and jump to
  ; the original target.
  push targ
  push src
  call at_token
  jmp target

  ; original fallthrough instrumentation

To modify the original condition branch, we construct a label and set the instructions target to this label:

// (1)
instr_t *label = INSTR_CREATE_label(drcontext);
if (instr_is_cti_short(instr))
    instr = instr_convert_short_meta_jmp_to_long(
        drcontext, bb, instr);
instr_set_target(instr, opnd_create_instr(label));

Note that we must perform the above conversion from a short jump to a long jump as necessary, since inserting the fallthrough clean call later might take up a lot of space, even if the clean call has been inlined. As a result the short jump might not be able to reach over to the rest of the instrumentation, resulting in a crash at best.

The rest is more intuitive, as it is standard DynamoRIO instrumentation manipulation:

// (2)
dr_insert_clean_call(drcontext, bb, NULL,
                     (void *)at_not_taken,
instr_t *jmp = INSTR_CREATE_jmp(drcontext,
instrlist_preinsert(bb, NULL, INSTR_XL8(jmp), fall);

// (3)
instrlist_meta_preinsert(bb, NULL, label);
dr_insert_clean_call(drcontext, bb, NULL,
                     (void *)at_taken,
jmp = INSTR_CREATE_jmp(drcontext,
instrlist_preinsert(bb, NULL, INSTR_XL8(jmp), targ));

At (2) we construct the clean call to at_not_taken(), and then jump to the original fallthrough address (computed via decode_next_pc()). At (3), we place the label we created before, and then call the at_taken() clean call and jump to the target, which can be computed via instr_get_branch_target_pc().

The two clean calls, at_taken() and at_not_taken() intuitively add the appropriate target address to their respective entries in the hash table:

static std::unordered_map<app_pc, std::unordered_set<app_pc>> cbr;

static void at_taken(app_pc src, app_pc targ)
{ cbr[src].insert(targ); }

static void at_not_taken(app_pc src, app_pc fall)
{ cbr[src].insert(fall); }

If we want to be really thorough, we should lock on both of these functions.

Indirect Calls

It turns out intercepting indirect calls is very simple, as all we need to do is insert a clean call in front of a Indirect call, for example a call $rax instruction, and note the value of rax at that point.

First, DynamoRIO gives us an instr_is_call_indirect(), which allows us to identify calls of all types, i.e. near and far, as well as the following call variants:

  1. call rax
  2. call [rax]
  3. call [mem]

We can ensure we have exhausted all possible cases by consulting a good reference1. It turns out that all three are likely to be seen in the wild, i.e. due to vtable implementations in C++. However, it turns out we can simply reuse the target operand given to us at the DynamoRIO IR API level, allowing us to reuse the code for either case.

So, we can simply construct this instrumentation as follows:

DR_ASSERT(instr_is_call_indirect(instr)); // we ensure this
opnd_t target_opnd = instr_get_target(instr);
DR_ASSERT(opnd_is_reg(target_opnd) ||
reg_id_t target = opnd_get_reg(instr_get_target(instr));
dr_insert_clean_call(drcontext, bb, NULL,
                     (void *)at_ind_call,

And now for the clean call, at_ind_call():

static void at_ind_call(app_pc src, app_pc targ)
{ cbr[src].insert(targ); }

Indirect Jumps

Indirect jumps and Indirect calls are functionally very similar; both jumps and calls are forms of control transfer instructions, and given an instr_t * which we’ve determined has some indirect target, we can recycle the code from above.

reg_id_t target = opnd_get_reg(instr_get_target(instr));

In fact, we can even share the same clean call as there is very little functional difference; The final code comes out to look something like

DR_ASSERT(instr_is_cti(instr)); // we ensure this
opnd_t target_opnd = instr_get_target(instr);
DR_ASSERT(opnd_is_reg(target_opnd) ||
dr_insert_clean_call(drcontext, bb, instr,
                     (void *)at_ind_cti,

Note that we have simply relaxed our preconditions for this code–previously we required a call instruction, but we have generalized to any control transfer instruction. This allows us to reuse the same code for both indirect jumps and for indirect calls. 2


Finally, the most straightforward part is to output our data structure in json. We can use any json library at our disposal, say json, a cool C++11 variant which supports STL-like semantics.

json j = { { "branches", json::array() } };
std::transform(std::begin(cbr), std::end(cbr),
    [] (auto i) -> json {
        return {
            { "address", i.first  },
            { "targets", i.second }
std::cout << std::setw(2) << j << std::endl;

Test Run

The fully functional code can be found here

  1. Reference for the x86 near and far call semantics here

  2. One final note: a return instruction is certainly a “cti” instruction with an indirect target, as it pops rsp off of the stack and jumps to it, so it will trigger our codepath and we will record that branch. It doesn’t hurt to have this information, though it might slow down an application to have this instrumentation at every return instruction of every function. We can test for this specific case with a call to instr_is_return().