Guardsquare Devblog

Technical notes from the Guardsquare engineering teams.

Using LLVM to Prevent Objective-C Swizzling Through Devirtualization

Posted at — May 27, 2020

Hooking is a powerful introspection technique that redirects a function invocation to an attacker-controlled implementation. By doing this, an attacker can achieve a multitude of goals.

A concrete example is an app containing premium features. Such an app must contain logic to verify whether a user is allowed to access those features. A malicious party could redirect that logic to a different implementation that always allows access.

Function hooking relies on abusing existing control flow redirections or in some cases adding new ones. On iOS this can take on the following forms:

  1. Abuse of a language’s dynamic dispatch mechanism. This is one of the most popular techniques. Method swizzling is an example of this for the Objective-C language. In theory, the same could be done for Swift’s witness tables or C++’s vtables.
  2. Repurposing of the dynamic linker/loader’s lazy binding facilities. This could be called ‘symbol table hooking’ and is done by modifying the __nl_symbol_ptr and __la_symbol_ptr sections of a Mach-O file. This hooking method only works for functions that are dynamically linked, for example, system library functions.
  3. Addition of control flow indirections to ‘swap out’ implementations of a function. Simply put, one would overwrite the existing instructions at the start of a function with a branch (e.g., a jmp instruction) to a new implementation. In other words, inline hooking at the assembly level.
  4. Abuse of signal handlers, which under certain conditions can be used for function hooking. By remapping memory pages and reacting to the resulting page faults, one can ‘inject’ custom behaviour into existing functions.

In this article, we’ll focus on abuse of the dynamic dispatch mechanism in Objective-C, where every method call goes through the Objective-C runtime. Visually, this looks something like this:

Image 1: Objective-C method call dispatch.

Because of this, it’s possible to change the destination of method calls at runtime. The language even offers this as a feature, generally referred to as method swizzling. The Objective-C runtime offers several APIs to this end, for example class_replaceMethod, method_setImplementation, and method_exchangeImplementations. The Objective-C metadata available in the binary is instrumental to this dynamic dispatch. Reverse engineers leverage this system to hook/swizzle Objective-C methods.

To prevent method swizzling, this post demonstrates an LLVM-based approach to devirtualizing Objective-C method calls, so that the runtime is circumvented for those “lowered” calls and swizzling has no effect.

Devirtualizing Objective-C calls

Objective-C Message Passing

To fully understand what devirtualizing means, we must first understand how methods in Objective-C are “virtual” to begin with. Let’s take the following code sample:

#import <Foundation/Foundation.h>
#include <objc/runtime.h>

@interface ClassA : NSObject
- (void) instanceMethodA:(int)param1;
@end

@implementation ClassA
- (void) instanceMethodA:(int)param1 {
  NSLog(@"Called instance method in Class A");
}
@end

@interface ClassB : NSObject
- (void) instanceMethodB:(int)param1;
@end

@implementation ClassB
- (void) instanceMethodB:(int)param1 {
  NSLog(@"Called instance method in Class B");
}
@end

int main() {
  ClassA *classA;
  classA = [[ClassA alloc] init];

  ClassB *classB;
  classB = [[ClassB alloc] init];

  [classA instanceMethodA: 1];
  [classB instanceMethodB: 1];
}

The output of this program would be:

main.o[19906:4393635] Called instance method in Class A
main.o[19906:4393635] Called instance method in Class B

When looking at the binary in a disassembler, we see that the function calls don’t happen directly but happen through a kind of helper function called objc_msgSend.

Image 2: Objective-C method call decompiled

So under the hood, the method call [classA instanceMethodA: 1]; is actually translated to the following call: objc_msgSend(classA, @selector(instanceMethodA:), 1);.

This means that for image 1 a more accurate representation would be:

Image 3: Objective-C method call dispatch with `objc_msgSend`.

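So what is this objc_msgSend? This is a regular C function that takes the receiver of the message, the selector of the method to invoke, and a variable list of arguments to pass to that method.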

For more details see: https://developer.apple.com/documentation/objectivec/1456712-objc_msgsend

In other words, the function is called indirectly by passing a message to the Objective-C runtime, which performs a lookup to invoke the correct function with the provided parameters.
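To make the lookup concrete, here is a minimal sketch of the idea in plain C. It is a model, not the real runtime: the Class, Method, Object and Imp types and the lookup_imp and msg_send functions are hypothetical names made up for this illustration, and selectors are plain strings. The real objc_msgSend adds caching, inheritance and message forwarding on top of this.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the runtime's types; NOT the definitions
   from <objc/runtime.h>. */
typedef struct Object Object;
typedef const char *(*Imp)(Object *self, const char *sel, int arg);

typedef struct {
    const char *selector;  /* method name, e.g. "instanceMethodA:" */
    Imp imp;               /* address of the function implementing it */
} Method;

typedef struct {
    const char *name;
    Method *methods;
    size_t method_count;
} Class;

struct Object {
    Class *isa;            /* every object points at its class */
};

/* Look up the implementation registered for a selector, or NULL. */
Imp lookup_imp(Class *cls, const char *sel) {
    assert(cls != NULL && sel != NULL);
    for (size_t i = 0; i < cls->method_count; i++) {
        if (strcmp(cls->methods[i].selector, sel) == 0) {
            return cls->methods[i].imp;
        }
    }
    return NULL;
}

/* Model of a message send: resolve the callee at call time, then call it.
   Returning a string for an unknown selector is a simplification. */
const char *msg_send(Object *receiver, const char *sel, int arg) {
    assert(receiver != NULL);
    Imp imp = lookup_imp(receiver->isa, sel);
    return imp ? imp(receiver, sel, arg) : "unrecognized selector";
}

/* Like a compiled Objective-C method, the implementation receives the
   receiver and the selector as its first two arguments. */
const char *method_a(Object *self, const char *sel, int arg) {
    (void)self; (void)sel; (void)arg;
    return "Called instance method in Class A";
}

Method class_a_methods[] = { { "instanceMethodA:", method_a } };
Class class_a = { "ClassA", class_a_methods, 1 };
```

With an Object whose isa points at class_a, msg_send(&obj, "instanceMethodA:", 1) reaches method_a only through the table. The caller never names method_a itself, and that is the indirection the rest of this post removes.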

Objective-C Method Swizzling

It’s because of this dispatching model that a feature called method swizzling is available in the Objective-C language. By offering an API that replaces the address in the runtime, it’s possible to dynamically change the implementation of a function at run time.

This is a powerful tool for a developer when implementing complex logic, but even more useful for a reverse engineer who wants to swap out the implementation of security-sensitive functions.

Also note that since almost every Objective-C function call is translated to an objc_msgSend call, by hooking objc_msgSend itself a reverse engineer can get an accurate trace of the program execution. This is another good reason to avoid this indirection.

To show how this works in practice, let’s extend the main function of the previous example with a call to method_setImplementation that replaces the implementation of instanceMethodA with that of instanceMethodB:

int main() {
  ClassA *classA;
  classA = [[ClassA alloc] init];

  ClassB *classB;
  classB = [[ClassB alloc] init];

  method_setImplementation(
      class_getInstanceMethod([ClassA class], @selector(instanceMethodA:)),
      method_getImplementation(class_getInstanceMethod([ClassB class], @selector(instanceMethodB:)))
      );

  [classA instanceMethodA: 1];
  [classB instanceMethodB: 1];
}

Now the output of this program is:

main.o[19906:4393635] Called instance method in Class B
main.o[19906:4393635] Called instance method in Class B

Note that the call to [classA instanceMethodA: 1]; has been redirected and doesn’t execute the original implementation anymore!
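The same redirection can be modelled in a few lines of plain C, which may help to see why both calls end up in the same place. This is only a sketch under simplifying assumptions: the methods table, get_method, set_implementation and send_message are hypothetical stand-ins for the runtime's method lists, class_getInstanceMethod, method_setImplementation and objc_msgSend, and classes are left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef const char *(*Imp)(void);

/* Hypothetical method entry: a selector name and a mutable function pointer. */
typedef struct {
    const char *selector;
    Imp imp;
} Method;

const char *impl_a(void) { return "Called instance method in Class A"; }
const char *impl_b(void) { return "Called instance method in Class B"; }

Method methods[] = {
    { "instanceMethodA:", impl_a },
    { "instanceMethodB:", impl_b },
};
#define METHOD_COUNT (sizeof methods / sizeof methods[0])

/* Find the table entry for a selector, or NULL if there is none. */
Method *get_method(const char *selector) {
    assert(selector != NULL);
    for (size_t i = 0; i < METHOD_COUNT; i++) {
        if (strcmp(methods[i].selector, selector) == 0) {
            return &methods[i];
        }
    }
    return NULL;
}

/* Model of method_setImplementation: overwrite the slot and return the
   previous implementation. */
Imp set_implementation(Method *m, Imp new_imp) {
    assert(m != NULL && new_imp != NULL);
    Imp old = m->imp;
    m->imp = new_imp;
    return old;
}

/* Model of a message send: the slot is read again on every call. */
const char *send_message(const char *selector) {
    Method *m = get_method(selector);
    return m ? m->imp() : "unrecognized selector";
}
```

After set_implementation(get_method("instanceMethodA:"), impl_b), both send_message("instanceMethodA:") and send_message("instanceMethodB:") return the Class B string, mirroring the output above. No call site was modified; only the table entry changed.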

Devirtualizing objc_msgSend with LLVM

We can circumvent the runtime by skipping the objc_msgSend call and directly calling the actual function. Let’s have a look at the LLVM bitcode that is generated for the code sample above. It can be compiled with the following command:

xcrun clang main.m -o main.o -framework Foundation -fembed-bitcode

Use a tool like ebcutil to extract the bitcode from the binary, then run llvm-dis on the extracted bitcode to obtain a human-readable .ll file.

The bitcode looks like this:

define internal void @"\01-[ClassA instanceMethodA:]"(%0*, i8*, i32) #0 {
  ; Function body omitted for brevity
}
define internal void @"\01-[ClassB instanceMethodB:]"(%1*, i8*, i32) #1 {
  ; Function body omitted for brevity
}

define i32 @main() #2 {

  ; Class initialization omitted for brevity

  ; swizzling
  %16 = load %struct._class_t*, %struct._class_t** @"OBJC_CLASSLIST_REFERENCES_$_.5", align 8
  %17 = bitcast %struct._class_t* %16 to i8*
  %18 = call i8* @objc_opt_class(i8* %17)
  %19 = load i8*, i8** @OBJC_SELECTOR_REFERENCES_.6, align 8, !invariant.load !9
  %20 = call %struct.objc_method* @class_getInstanceMethod(i8* %18, i8* %19)
  call void @method_exchangeImplementations(%struct.objc_method* %15, %struct.objc_method* %20)

  ; [classA instanceMethodA: 1];
  %21 = load %0*, %0** %1, align 8
  %22 = load i8*, i8** @OBJC_SELECTOR_REFERENCES_, align 8, !invariant.load !9
  %23 = bitcast %0* %21 to i8*
  call void bitcast (i8* (i8*, i8*, ...)* @objc_msgSend to void (i8*, i8*, i32)*)(i8* %23, i8* %22, i32 1)

  ; [classB instanceMethodB: 1];
  %24 = load %1*, %1** %2, align 8
  %25 = load i8*, i8** @OBJC_SELECTOR_REFERENCES_.6, align 8, !invariant.load !9
  %26 = bitcast %1* %24 to i8*
  call void bitcast (i8* (i8*, i8*, ...)* @objc_msgSend to void (i8*, i8*, i32)*)(i8* %26, i8* %25, i32 1)
  ret i32 0
}

declare i8* @objc_alloc_init(i8*)
declare i8* @objc_msgSend(i8*, i8*, ...) #3

In this bitcode snippet we can see two important things:

  1. The objc_msgSend calls that were discussed earlier.
  2. The function definitions of the instance methods: "\01-[ClassA instanceMethodA:]" and "\01-[ClassB instanceMethodB:]".

So, at the bitcode level, Objective-C methods look the same as regular C-style functions. There’s, for example, no immediately obvious difference with the main function in this bitcode file, nor would there be a visible difference with any other C function we might implement. This makes a lot of sense, because Objective-C is built on top of the C language and extends it with message passing.

We can replace the indirect objc_msgSend calls with direct function calls to "\01-[ClassA instanceMethodA:]" and "\01-[ClassB instanceMethodB:]" by replacing the following lines in the bitcode:

; [classA instanceMethodA: 1];
call void bitcast (i8* (i8*, i8*, ...)* @objc_msgSend to void (i8*, i8*, i32)*)(i8* %23, i8* %22, i32 1)

; [classB instanceMethodB: 1];
call void bitcast (i8* (i8*, i8*, ...)* @objc_msgSend to void (i8*, i8*, i32)*)(i8* %26, i8* %25, i32 1)

with:

call void @"\01-[ClassA instanceMethodA:]"(%0* %21, i8* %22, i32 1)

call void @"\01-[ClassB instanceMethodB:]"(%1* %24, i8* %25, i32 1)

This way the Objective-C runtime is removed from the equation and the call happens directly:

Image 4: Objective-C method call devirtualized.

You can compile the human-readable .ll file with clang:

xcrun clang main.ll -o main.o -framework Foundation

The output binary can be executed and the output is back to the original, as if the swizzling never even happened:

main.o[19906:4393635] Called instance method in Class A
main.o[19906:4393635] Called instance method in Class B

The method call is now devirtualized: the runtime is no longer involved in the call, so swizzling the method has no effect on it.
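The difference between the two kinds of call site can be sketched in plain C, using the premium-feature check from the introduction as the protected function. All names here (check_premium, attacker_check, premium_slot, call_dispatched, call_direct, swizzle) are hypothetical, and the single function pointer is an assumed stand-in for the runtime's method entry.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef const char *(*Imp)(void);

const char *check_premium(void)  { return "denied"; }   /* original implementation */
const char *attacker_check(void) { return "granted"; }  /* attacker's replacement  */

/* Hypothetical mutable slot, standing in for the runtime's method entry. */
Imp premium_slot = check_premium;

/* Dispatched call: the callee is read from the slot at call time,
   as it would be with objc_msgSend. */
const char *call_dispatched(void) {
    return premium_slot();
}

/* Devirtualized call: the callee's address is fixed when the code is built,
   so the slot is never consulted. */
const char *call_direct(void) {
    return check_premium();
}

/* Model of swizzling: rewrite the slot. */
void swizzle(Imp replacement) {
    assert(replacement != NULL);
    premium_slot = replacement;
}
```

After swizzle(attacker_check), call_dispatched() returns "granted" while call_direct() still returns "denied". Note that the direct call is only immune to this one technique; overwriting the first instructions of check_premium would still redirect it.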

This was a manual demonstration of this technique. To perform method call devirtualization in an automated fashion there are two difficult hurdles to face:

  1. The first hurdle is getting an overview of all the Objective-C classes and methods implemented in the binary. This requires extensive parsing of the Objective-C metadata to build a complete model of the app.
  2. The second hurdle is that, when converting an objc_msgSend call to a direct call, the callee must be predicted. In simple cases, like the bitcode above, it’s trivial to trace the call back to the original selector and class instance. But in real-world situations, this is a lot harder and there are many edge cases.
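As a rough illustration of the second hurdle, the sketch below resolves a call site to a function symbol only when the receiver's exact class is already known, and otherwise reports that the objc_msgSend call should be kept. It is an assumed, simplified design rather than a description of any real tool: method_symbol and resolve_callee are hypothetical helpers, the list of defined symbols is presumed to come from the metadata parsing of the first hurdle, and the leading \01 seen in the bitcode symbol names is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Build the symbol name of an instance method, e.g. "-[ClassA instanceMethodA:]".
   Returns 1 on success and 0 if the name does not fit in the buffer. */
int method_symbol(char *out, size_t out_size, const char *cls, const char *sel) {
    assert(out != NULL && cls != NULL && sel != NULL);
    int n = snprintf(out, out_size, "-[%s %s]", cls, sel);
    return n >= 0 && (size_t)n < out_size;
}

/* Return the defined symbol to call directly, or NULL to mean
   "keep the objc_msgSend call". exact_class is NULL when the analysis
   could not prove the receiver's class. */
const char *resolve_callee(const char *const *defined, size_t count,
                           const char *exact_class, const char *sel) {
    char wanted[256];
    assert(defined != NULL || count == 0);
    if (exact_class == NULL) {
        return NULL;  /* receiver class unknown: do not guess */
    }
    if (!method_symbol(wanted, sizeof wanted, exact_class, sel)) {
        return NULL;
    }
    for (size_t i = 0; i < count; i++) {
        if (strcmp(defined[i], wanted) == 0) {
            return defined[i];
        }
    }
    return NULL;      /* e.g. an inherited or externally defined method */
}
```

Falling back to NULL whenever the class is not proven is the conservative choice: a wrong guess would silently call the wrong method, whereas a kept objc_msgSend call merely stays swizzlable.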

Closing thoughts

We’ve shown how we can protect applications against one of the more popular hooking techniques by devirtualizing Objective-C method calls. The observant reader will notice that by converting objc_msgSend calls to direct calls, the call graph can now be retrieved more easily, making it easier to find new hooking targets. While this is true, it must be said that most decent decompilers can create a call graph for objc_msgSend calls anyway. It’s also important to note that the other hooking techniques mentioned in the introduction require additional protection.

iXGuard provides effective protection against iOS application hooking.