Available at: https://github.com/Guardsquare/mocxx
C++ is a very powerful general-purpose programming language. However, this power comes at a cost. Big applications written in C++ become very complicated, very fast. Maintaining and testing such applications is a challenge in its own right. Many modern programming languages, Python for instance, already have excellent testing and mocking tools that allow you to switch type and object implementations at any time to limit the testing scope. Typically this is harder to do in a compiled language that was not designed with this functionality in mind. C++, thanks to its compilers, has very little dynamic data present at runtime, which makes it difficult to make any substitutions. Most existing solutions (for instance GoogleMock, CMock, FakeIt) require direct source code and/or build system modifications, which is admittedly the last thing you would want to do. But wait…
Here at Guardsquare we work days and (sometimes) nights on cutting-edge protection technologies to prevent reverse-engineers from reaching their nefarious goals. This requires us to analyze all the fancy tools, libraries and techniques used by that motley bunch obsessed with the color of their hats. One such tool is Frida, which according to its description on GitHub is a dynamic instrumentation toolkit for developers, reverse-engineers, and security researchers. You can think of it as a debugger of sorts: it attaches to a running process and provides you with a JavaScript interpreter and an API that let you manipulate process memory, thread state, control flow and so forth. This is a remarkable library.
Frida is based on portable, battle-tested dynamic code instrumentation technology and, if used correctly, makes it possible to create a new C++ mocking framework that stands on par with similar frameworks for dynamic languages in functionality and flexibility. So we wrote such a framework.
Mocxx is a versatile function mocking framework. It replaces a target function with the provided implementation, and integrates well with existing testing and mocking frameworks. It requires no macros and virtually no source code modification, and it allows you to replace any function, including system functions such as open. It is type safe and follows RAII. You can find Mocxx at https://github.com/Guardsquare/mocxx. The rest of the article dives deep into its capabilities, design and implementation. Keep reading.
To replace a function with Mocxx you are required to construct an instance of it and make a single call to Replace:
That’s it: simply pass in the replacement lambda and the target function to the Replace method. The lambda/target passing order is necessary to drive target function type resolution, because in C++ it is possible to have many overloads of the same function with different arguments. To save you some typing, the target type is derived from the provided lambda. If you want a different overload, simply change the type of the lambda:
If the type of the lambda cannot be matched against any of the overloads, then the replacement call will fail to compile.
The target function type resolution is done via a set of function declarations and type aliases, which effectively take the type of the lambda's call operator and strip the lambda type off it. Take a look:
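The idea can be sketched in a few lines. This is not Mocxx's own code: the names StripLambda, TargetOf, Resolve and Describe are made up for illustration, and the sketch only handles non-generic, non-mutable lambdas.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper names; Mocxx's own declarations may differ.
// Maps the type of a lambda's call operator to the matching free-function
// pointer type by dropping the lambda (class) part of the member pointer.
template<typename CallOperator>
struct StripLambda;

template<typename Lambda, typename Result, typename... Args>
struct StripLambda<Result (Lambda::*)(Args...) const>
{
  using type = Result (*)(Args...);
};

template<typename Lambda>
using TargetOf = typename StripLambda<decltype(&Lambda::operator())>::type;

// A small overload set standing in for a real target.
inline int Describe(int) { return 1; }
inline int Describe(const std::string&) { return 2; }

// `target` is a non-deduced parameter: Lambda is deduced from the first
// argument alone, and the resulting pointer type then selects the overload.
template<typename Lambda>
TargetOf<Lambda> Resolve(Lambda, TargetOf<Lambda> target)
{
  assert(target != nullptr);
  return target;
}

inline int CallIntOverload()
{
  return Resolve([](int) { return 0; }, Describe)(0);
}

inline int CallStringOverload()
{
  return Resolve([](const std::string&) { return 0; }, Describe)(std::string("x"));
}
```

Because the lambda comes first, its parameter list alone decides which Describe overload is picked. If no overload matches the lambda, the call does not compile.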
This is enough to resolve overload sets of free functions, but resolving member function types requires a bit more effort. As a convention, the very first parameter of the replacement lambda indicates the type the target member function belongs to. The constness of this parameter selects the const or non-const version of the member function.
The differences between member and free function types are substantial, and the fact that overload sets are resolved at the call site makes it almost impossible to handle such replacements with a single Replace API. Consequently, in order to replace a member function you have to make a call to ReplaceMember:
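The convention can be illustrated with a small trait. Again this is only a sketch under assumptions: MemberTarget, ResolveMember and the Name class are hypothetical, and we assume the first lambda parameter is a pointer to the class.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical trait: the first lambda parameter names the class, and its
// constness picks the const or non-const member function type.
template<typename CallOperator>
struct MemberTarget;

template<typename Lambda, typename Result, typename Class, typename... Args>
struct MemberTarget<Result (Lambda::*)(Class*, Args...) const>
{
  using type = Result (Class::*)(Args...);
};

template<typename Lambda, typename Result, typename Class, typename... Args>
struct MemberTarget<Result (Lambda::*)(const Class*, Args...) const>
{
  using type = Result (Class::*)(Args...) const;
};

template<typename Lambda>
using MemberTargetOf = typename MemberTarget<decltype(&Lambda::operator())>::type;

// Example class with a const and a non-const overload of the same member.
struct Name
{
  std::size_t Size() { return 1; }
  std::size_t Size() const { return 2; }
};

// The member pointer parameter is non-deduced, so the lambda type selects
// which overload of &Name::Size is taken.
template<typename Lambda>
MemberTargetOf<Lambda> ResolveMember(Lambda, MemberTargetOf<Lambda> target)
{
  assert(target != nullptr);
  return target;
}

inline std::size_t CallNonConst()
{
  auto target = ResolveMember([](Name*) -> std::size_t { return 0; }, &Name::Size);
  Name name;
  return (name.*target)();
}

inline std::size_t CallConst()
{
  auto target = ResolveMember([](const Name*) -> std::size_t { return 0; }, &Name::Size);
  const Name name{};
  return (name.*target)();
}
```

A lambda taking Name* resolves to the non-const Size, and one taking const Name* resolves to the const version.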
Now that we have a replacement lambda and a properly typed replacement target, how do we actually switch the target implementation? As already mentioned, we use a dynamic instrumentation toolkit called Frida, specifically its instrumentation component called Gum. This library has a wide range of capabilities: it can hook a function to replace its implementation; it can stealthily trace a program, rewriting it on the fly; it provides memory scanning and monitoring, symbol lookup, code generation, you name it. For Mocxx we required only two API sets: function hooking and symbol lookup. To replace a function with Gum we do the following:
The Gum API accepts as its context an object of type GumInterceptor; you can obtain it by calling gum_interceptor_obtain() and destroy it by calling g_object_unref(interceptorInstance). Gum implements a transaction-style API: the replacement request is enclosed in calls that begin and end a transaction. The request itself requires passing the interceptor instance, the target function as a void pointer, the replacement function, and any data that can be queried later. We have already seen how to obtain the interceptor instance, so let’s figure out how to deal with the rest of the arguments.
Since Gum is written in C, any function pointer it accepts is necessarily type-erased; Mocxx converts it to a void pointer via TargetToVoidPtr(). After this conversion such pointers cannot be treated as functions, because void pointers are inherently pointers to data. This is perfectly safe, because Mocxx never invokes the target function via this pointer.
A free function can be easily converted to a void pointer, assuming it is treated as data thereafter, or some additional logic is implemented to invoke it correctly. This is not necessarily true for member functions. In C++ such functions are not represented in the same way as free functions. A pointer to a member function might be a data structure rather than a single pointer. Consider a virtual member function. At compile time it is not possible to resolve it to a valid address; at runtime you can resolve it, but the result will change depending on the underlying object. So how would member function type erasure even work? The C++ standard is intentionally vague on that account. The next best thing that can act as an authority on member pointer representation is the compiler. There are two (open-source) mainstream compilers in the wild: Clang and GCC. All our development is done with Clang (with its libc++) on machines running x86_64 processors, so the following explanation should be viewed from this position. Other compilers (and I am looking at you, MSVC) and other architectures might employ a different kind of witchcraft.
Clang is based on the open-source compiler toolkit called LLVM. There, if you look carefully in the codegen package, you will find a file ItaniumCXXABI.cpp with the following comment:
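As you can see, a member function pointer is a structure with two pointer-sized fields, ptr and adj, so it is not even the size of a regular pointer. The (Intel) semantics of this structure are as follows:
A method pointer is virtual if (memptr.ptr & 1) != 0.
The this adjustment is memptr.adj.
The virtual offset is (memptr.ptr - 1).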
From this description we can conclude that the first field in this structure is always a legal pointer if the member function is not virtual. If the function is virtual, the ptr value (minus one) should be treated as an offset into the object vtable. Mocxx currently cannot deal with virtual member functions, but it works well with regular member functions, because of the conversion trick it uses:
Admittedly this is a grey area. This will work until Clang changes its C++ ABI, which happens quite rarely, at least in this part.
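To make the layout and the trick more tangible, here is a minimal sketch that decodes a member function pointer and extracts its first field. It assumes Clang or GCC on x86_64 (Itanium C++ ABI), as in the rest of this article. MemberPointerLayout, Decode, IsVirtual, MemberToVoidPtr and Widget are hypothetical names, not the Mocxx implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Layout described in the text, assumed for Clang/GCC on x86_64.
struct MemberPointerLayout
{
  std::ptrdiff_t ptr; // code address, or 1 + vtable offset when virtual
  std::ptrdiff_t adj; // adjustment applied to `this`
};

struct Widget
{
  int Plain() const { return 1; }
  virtual int Dynamic() const { return 2; }
  virtual ~Widget() = default;
};

// Copies the raw bytes of a member function pointer into the two fields.
template<typename Member>
MemberPointerLayout Decode(Member member)
{
  static_assert(sizeof(Member) == sizeof(MemberPointerLayout),
                "unexpected member function pointer representation");
  MemberPointerLayout layout;
  std::memcpy(&layout, &member, sizeof(layout));
  return layout;
}

template<typename Member>
bool IsVirtual(Member member)
{
  return (Decode(member).ptr & 1) != 0;
}

// For a non-virtual member function the first field is its code address,
// which can then be handed around as plain data.
template<typename Member>
void* MemberToVoidPtr(Member member)
{
  static_assert(sizeof(void*) == sizeof(std::ptrdiff_t), "unexpected pointer size");
  assert(!IsVirtual(member));
  const std::ptrdiff_t address = Decode(member).ptr;
  void* result = nullptr;
  std::memcpy(&result, &address, sizeof(result));
  return result;
}
```

On this ABI, &Widget::Plain decodes to an even address, while &Widget::Dynamic has the low bit of ptr set. On other ABIs the static assertions or these expectations may not hold.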
The second field in this structure, memptr.adj, is an adjustment value to be added to the this pointer available in member functions. Why would you need it? Consider these three types:
If you construct A or B directly, the this pointer will contain the address of the object of its respective type. A reference to any of the fields would be equal to this pointer plus the relative offset of that field; in this case the fields are number and flag. When class C inherits from A and B the reality changes slightly. For inheritance to work, C++ compilers must be able to reserve space for all involved types (A, B and C in this case) in the same memory object. Since memory is linear in nature, the allocation is linear as well, so the compiler simply reserves space sequentially. In our case, an object of type C consists of object A followed by object B followed by object C, with alignment padding in between if necessary.
This is where memptr.adj comes into play. It is simply an offset from the memory object start to a sub-object of some inherited type. In the case above, if you invoke get_number() on an object of type C the adjustment to the this pointer will be 0, but if you invoke get_flag() the adjustment will be 8 bytes, to slide past the number field of the sub-object A.
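The sub-object offsets can be observed directly. The sketch below uses a hypothetical reconstruction of the three types: the field types are our assumption, with number taken to be 8 bytes wide so that it matches the 8-byte adjustment mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical reconstruction of the three types discussed above.
struct A
{
  std::int64_t number = 42;
  std::int64_t get_number() const { return number; }
};

struct B
{
  bool flag = true;
  bool get_flag() const { return flag; }
};

struct C : A, B
{
};

// Byte distance from the start of a C object to one of its base sub-objects.
template<typename Base>
std::ptrdiff_t OffsetIn(const C& object)
{
  const auto* whole = reinterpret_cast<const char*>(&object);
  const auto* part = reinterpret_cast<const char*>(static_cast<const Base*>(&object));
  return part - whole;
}

inline std::ptrdiff_t FlagAdjustment()
{
  C object;
  // The call site shifts `this` to the B sub-object before calling get_flag.
  assert(object.get_flag());
  return OffsetIn<B>(object);
}
```

With this layout the A sub-object sits at offset 0 and the B sub-object at offset 8, which is the value the compiler adds to the this pointer before get_flag() runs.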
Mocxx does not require special treatment for regular member functions even if the call requires a this pointer adjustment, because such an adjustment is done at the call site, before the replacement is invoked.
With all the information above it is possible to freely convert a free function pointer and a member function pointer (albeit not a virtual one yet) to a void pointer. Now that we have the target, let’s talk about the replacement.
In the provided function replacement example you can see that we don’t pass the lambda directly but instead we pass something like this:
Gum requires the replacement to be passed as a void pointer, but a lambda is implemented as a struct with state, so passing only its call operator would not work. To solve this, a static proxy of the following form is used:
The proxy is a template class parameterized by the return and argument types of the replacement (and therefore those of the target). It extends the ReplacementProxyBase type, which serves the type-erasure purpose: it allows instances of this typed proxy to be stored in a single map.
The instantiation of the replacement proxy template class (not its objects) is unique per function signature, that is, per the ResultType and Args... parameters. This means that multiple functions with the same signature will share the same replacement proxy. This will become relevant in a moment.
When such a replacement proxy is constructed, it is passed the void pointer of the target function and the replacement lambda. The mapping between the target and the replacement is stored in a static std::unordered_map of that proxy template class instantiation.
The static Invoke method of the proxy template class instantiation is parameterized by the ResultType and Args... template parameters, and upon template class instantiation it can be used as a substitute for the target, because it accepts all the arguments the target can and returns the same result type as the target does. This static method is converted to a void pointer and passed on to Gum. The implementation of this method is rather simple:
First, the method requests the current invocation context. Its content is outside the scope of this article, but for all intents and purposes it is unique per target invocation. From the context we get the target pointer, preserved as data in the replacement request above. As the last action, the method looks up the static replacement map, keyed by target pointers, to find the lambda it needs to invoke, and returns its result. This is where the titbit about uniqueness per signature becomes important. The static Invoke method can be invoked for very different functions matching the same signature, and to invoke the correct lambda instance we look up this static replacement map at runtime.
A small but still important caveat about this proxy is that it stores the target pointer per its object instance. This is required for proxy destruction. Recall that the static replacement map is keyed by the target pointers. When a proxy is destroyed the mapping between target pointer and its replacement is removed from the static map.
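The whole proxy mechanism can be modelled without Gum. In the sketch below the invocation context is replaced by a hypothetical thread-local variable, g_currentTarget, that the caller sets before dispatching. Count, First, Second and InvokeFor are likewise made up for the demonstration, and the real class differs in its details.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_map>
#include <utility>

// Stand-in for Gum's invocation context: the caller records which target is
// being invoked before calling Invoke. This variable is an assumption.
static thread_local const void* g_currentTarget = nullptr;

struct ReplacementProxyBase
{
  virtual ~ReplacementProxyBase() = default;
};

template<typename ResultType, typename... Args>
class ReplacementProxy : public ReplacementProxyBase
{
public:
  using Replacement = std::function<ResultType(Args...)>;

  ReplacementProxy(const void* target, Replacement replacement)
    : mTarget(target)
  {
    Replacements()[target] = std::move(replacement);
  }

  // The stored target pointer lets the destructor drop its own mapping.
  ~ReplacementProxy() override { Replacements().erase(mTarget); }

  ReplacementProxy(const ReplacementProxy&) = delete;
  ReplacementProxy& operator=(const ReplacementProxy&) = delete;

  // One Invoke exists per signature; the target pointer selects the lambda.
  static ResultType Invoke(Args... args)
  {
    assert(g_currentTarget != nullptr);
    return Replacements().at(g_currentTarget)(std::forward<Args>(args)...);
  }

  static std::size_t Count() { return Replacements().size(); }

private:
  static std::unordered_map<const void*, Replacement>& Replacements()
  {
    static std::unordered_map<const void*, Replacement> map;
    return map;
  }

  const void* mTarget;
};

// Two targets that share the signature int(int).
inline int First(int value) { return value + 100; }
inline int Second(int value) { return value - 100; }

inline int InvokeFor(const void* target, int argument)
{
  g_currentTarget = target;
  return ReplacementProxy<int, int>::Invoke(argument);
}
```

Two proxies created for First and Second share the single ReplacementProxy<int, int>::Invoke, yet each call reaches its own lambda, and destroying a proxy removes its entry from the shared map.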
So far we have been discussing the basic replacement facilities inside Mocxx. The base API is already powerful enough to do 80% of the work. At times, though, you’d wish for a shortcut to just replace the target invocation result, to return a new instance of some type on every target invocation, or to limit the replacement to a single invocation. All this is relatively easy to implement, so let’s dive in.
Replacing a target invocation result without specifying all the necessary types can be achieved via the Result API:
With this mock in place any attempt to call fopen will result in nullptr. The implementation of this method is straightforward, but not without caveats:
In short, the Result method simply wraps the desired result value in a lambda and passes it to the Replace method. Two important questions might pop into your head at this point: how can the type of the target be resolved at the call site, and what does that call to details::Capture do?
The answer to the first question is irritatingly simple: there is no target type resolution at call site with this method, except by providing the exact type for the target function:
To answer the second question consider what happens when you pass a reference or a value to this method. It does use perfect forwarding into and out of the synthetic lambda. However, the way the C++ lambda declaration syntax is organised does not allow you to decide on the storage type, by value or by reference, from the value or its type. In other words, this decision is syntactic, not semantic. To solve this problem we need a wrapper type that can make this decision semantically. It just so happens that std::tuple is one such type. It is perfectly suited to storing both values and references, so we use it to “capture” the result value and pass it to the synthetic lambda.
But this is not everything that is required to capture the result value. Recall that ReplacementProxy contains a map from the target pointer to its replacement. This replacement is wrapped in std::function, which cannot accept a lambda without copying it, and this is problematic for values that can only be moved. The reason behind this is outside the scope of this article, but we still have to deal with it somehow. The current solution simply uses a std::shared_ptr wrapped around the std::tuple. Here it is:
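A minimal sketch of such a capture helper could look as follows. The Capture function here is our own guess at the shape of details::Capture, and ReadLater and ReadOwned exist only to exercise it.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <tuple>
#include <utility>

// Hypothetical capture helper. std::tuple<T> holds a reference when T is an
// lvalue reference type and a value otherwise; the std::shared_ptr keeps the
// enclosing lambda copyable, which std::function requires.
template<typename T>
std::shared_ptr<std::tuple<T>> Capture(T&& value)
{
  return std::make_shared<std::tuple<T>>(std::forward<T>(value));
}

// An lvalue is stored by reference, so later changes remain visible.
inline std::function<int()> ReadLater(int& source)
{
  auto storage = Capture(source); // std::tuple<int&>
  return [storage]() { return std::get<0>(*storage); };
}

// A move-only value is stored by value inside the shared tuple.
inline std::function<int()> ReadOwned(std::unique_ptr<int> owned)
{
  auto storage = Capture(std::move(owned)); // std::tuple<std::unique_ptr<int>>
  return [storage]() {
    assert(std::get<0>(*storage) != nullptr);
    return *std::get<0>(*storage);
  };
}
```

The forwarding reference decides the storage kind from the argument itself, and copies of the resulting std::function all share one captured value.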
A useful extension to the base API is the ability to replace targets only once. Here is how it looks:
The code above is pretty straightforward, except for one subtlety. The restoration of the target must occur after the result value is read, because upon restoration this enclosing synthetic lambda is destroyed and with it the capture storage.
Another useful extension is the ability to return a new value on every target invocation, without the need to specify every parameter:
No surprises here: a new synthetic lambda is created that simply wraps invocations of the passed generator.
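The wrapping itself can be sketched as an adapter that accepts the target's arguments and ignores them. FromGenerator, NextId and ThirdGeneratedValue are hypothetical names, and the target's type is passed explicitly, as the lambdaless API requires.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical helper: adapts a zero-argument generator to the signature of
// the target by accepting the target's arguments and ignoring them.
template<typename Generator, typename Result, typename... Args>
std::function<Result(Args...)> FromGenerator(Generator generator, Result (*)(Args...))
{
  return [generator = std::move(generator)](Args...) mutable -> Result {
    return generator();
  };
}

// Stand-in target; only its signature matters in this sketch.
inline int NextId(const char*) { return -1; }

inline int ThirdGeneratedValue()
{
  int counter = 0;
  auto replacement = FromGenerator([counter]() mutable { return ++counter; }, &NextId);
  const int first = replacement("a");
  const int second = replacement("b");
  assert(first == 1 && second == 2);
  (void)first;
  (void)second;
  return replacement("c");
}
```

Each invocation of the adapter calls the generator again, so a stateful generator yields a fresh value per call.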
Like all good things, this one does have some drawbacks. One major issue you might encounter while using Mocxx is that your functions are not being replaced; this is especially true for functions from system headers. This section goes over the most common problems and potential solutions.
C++ is a powerful language and it is usually packaged with a powerful compiler that tries to optimise every bit of your code. The function you are trying to replace might be inlined, or simply removed because it is no longer required by the program. This is especially evident in the very first example this article showcases. In order to successfully replace the std::filesystem API you would want to wrap its header inclusion with the following pragma (Clang only):
For such wraps we create a dedicated header, so that all call sites are left unoptimised.
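Such a dedicated header might look like the sketch below. The file name and include guard are hypothetical. We assume Clang's documented `#pragma clang optimize off` and `#pragma clang optimize on` pair, guarded so that other compilers simply skip it.

```cpp
// mocked_filesystem.hpp: hypothetical name for the dedicated wrapper header.
#ifndef MOCKED_FILESYSTEM_HPP
#define MOCKED_FILESYSTEM_HPP

// Functions defined between the two pragmas are compiled without
// optimisation, so they are not inlined away at their call sites.
#if defined(__clang__)
#pragma clang optimize off
#endif

#include <filesystem>

#if defined(__clang__)
#pragma clang optimize on
#endif

#endif // MOCKED_FILESYSTEM_HPP
```

Test sources would then include this header instead of including <filesystem> directly.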
The problem with overload sets in C++ is that they are not first-class citizens: you cannot bind an overload set to a name, or pass it through a function call. Overload sets are always resolved at the call site. At the moment there is no straightforward solution to this problem, and you have to provide the target function type when using the lambdaless API.
The major issue with template functions is the fact that they are not actual functions. Mocxx is a runtime tool, it can only replace a function that has an address. What this means is that you first need to instantiate the template function, and then pass its address to the tool.
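As a small illustration, with Twice as a made-up template, naming a concrete specialisation is what gives you something with an address:

```cpp
#include <cassert>

template<typename T>
T Twice(T value)
{
  return value + value;
}

using IntFunction = int (*)(int);
using DoubleFunction = double (*)(double);

// Naming a specialisation instantiates it, which gives it an address.
inline IntFunction IntInstance()
{
  IntFunction instance = &Twice<int>;
  assert(instance(2) == 4);
  return instance;
}

// A cast to the exact pointer type also selects and instantiates one.
inline DoubleFunction DoubleInstance()
{
  return static_cast<DoubleFunction>(&Twice);
}
```

Each instantiation is a separate function, so replacing the int version would leave the double version untouched.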
The nature of virtual methods (aka dynamic dispatch, or late method binding) in C++ makes it impossible to home in on the target member function using language means, which is what Mocxx requires. You can of course pass in the virtual member function pointer, but it contains no information about the actual function. So what can be done with it?
The C++ standard committee in its infinite wisdom decided not to specify the way virtual functions should be implemented, but most commonly used compilers use a technique colloquially known as vtables. Every class implementing a virtual method, or inheriting from another class with virtual methods, will have a vtable associated with it. This vtable contains pointers to all virtual methods of the current class and every class it inherits from. Every object of such a class will have a hidden pointer called the vpointer. When a virtual call needs to be made, compilers generate a load of the vtable address from the vpointer, add (memptr.ptr - 1) to it, as explained above, make another load and invoke the resulting method pointer. All of this is done at the call site, which Mocxx cannot control.
But we don’t actually need the call site, because it is enough to load the target method from the vtable using the offset given in the virtual method pointer and the class associated with this pointer. Luckily, the C++ type system allows us to do at least that. This of course depends heavily on the compiler vendor or even the compiler version, so we haven’t implemented this yet. You are welcome to try.
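For the curious, here is roughly what such a lookup could look like. This is an experiment rather than part of Mocxx. It assumes the Itanium C++ ABI on x86_64 and single inheritance (a zero this adjustment), and Shape, Triangle and ResolveVirtual are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

struct Shape
{
  virtual int Sides() const { return 0; }
  virtual ~Shape() = default;
};

struct Triangle : Shape
{
  int Sides() const override { return 3; }
};

// Assumes the Itanium C++ ABI on x86_64: the object starts with the vpointer,
// and a virtual member pointer stores 1 + byte offset into the vtable.
// The `adj` field is ignored, so only single inheritance is covered.
template<typename Class, typename Member>
void* ResolveVirtual(const Class& object, Member member)
{
  std::ptrdiff_t fields[2]; // {ptr, adj}
  static_assert(sizeof(Member) == sizeof(fields), "unexpected representation");
  std::memcpy(fields, &member, sizeof(fields));
  assert((fields[0] & 1) != 0); // only virtual member pointers are handled

  // Load the vpointer stored at the start of the object.
  const char* vtable = nullptr;
  std::memcpy(&vtable, static_cast<const void*>(&object), sizeof(vtable));

  // Load the function address from the vtable slot.
  void* function = nullptr;
  std::memcpy(&function, vtable + (fields[0] - 1), sizeof(function));
  return function;
}
```

Resolving &Shape::Sides against a Shape and against a Triangle yields two different addresses, which is exactly the per-object information the member pointer alone does not carry.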
This tool makes use of various STL facilities like std::string, std::variant, the memory allocation API and so forth. To save your sanity, suppress the urge to mock commonly used generic APIs.
A potential solution to this limitation would be the ability to invoke the original target from its replacement. This way you could, for example, check the this pointer or another critical argument and invoke the desired action only when actually required. In other cases you would default to the original function.
Mocxx has proved itself an invaluable tool for our internal testing. With its help we were able to isolate the code under test down to the functions that actually matter, detaching it from the execution environment and the feature context.
The work on this mocking tool is very much in progress. If you find Mocxx helpful, please consider contributing by testing and porting it to other compilers and systems.