A Tale of Two Mallocs: On Android libc Allocators – Part 3 – exploitation

In the two previous posts of this series, we’ve discussed how the Android libc allocators work. In this last post of the series, we can try to determine what we need to do in order to exploit a heap memory corruption or use-after-free, in light of these allocators.

Exploiting these kinds of bugs is all about precise positioning of heap objects. We want to force certain objects to be allocated in specific locations in the heap, in order to form useful adjacencies for memory corruption, or reuse of a desired location for a use-after-free.

The operations performed in order to force the allocator to allocate the objects we want in the positions we want is known as heap shaping or heap ‘feng shui’. We essentially need to take advantage of our understanding of the inner workings of the allocator in order to allocate desired objects in the locations or with the adjacencies we desire.

One of the things which can really make a huge difference when implementing heap shaping for an exploit is having a way to visualize the heap: to see the various allocations in the context of the heap.

For this we need tools. These tools need not be especially complex. We don’t really need a fancy GUI to visualize regions or chunks. A simple tool which will allow us to inspect the heap state for a target process during exploit development will make an incredible difference.

As it happens, argp and vats presented a tool for visualizing the jemalloc heap at last year’s infiltrate. They later released the tool, which they call shadow, on github with support for both firefox’s standalone jemalloc and Android’s libc jemalloc. I highly recommend viewing their talk and using their tool. Shadow is a debugger plugin which allows you to view the various jemalloc structures in a textual table format. It really makes a big difference in understanding how the heap is behaving during exploit development.

Despite the fact that it is more than 30 years old, we were unable to find a similar visualization tool for dlmalloc at the time we were working on it. So we wrote one.

We’ve release shade, a dlmalloc visualization tool, on github at https://github.com/s1341/shade. The tool has an interface very similar to that of shadow.

You can request info about a given chunk using its address. This tells you the size of the chunk, its status and the segment to which it belongs.

You can get a quick picture of the chunks before and after your chunk using the dlaround command. This shows a table of chunks centered around your chunk, including their addresses, sizes and statuses.

Once you know a segment base address, you can use the dlsegment command to view all the chunks in that segment.

shade currently works only in gdb with Android’s libc dlmalloc in 32bit ARM processes. This is mostly because that has been what we’ve needed it for. Pull requests are, of course, more than welcome!

While preparing for this presentation, I found that the excellent ncc group released their own tool for visualizing dlmalloc heaps about 6 months ago. While I have not used it actively, it seems like a great tool.

Let’s get back to exploitation. For the purposes of this discussion, let’s assume that you have discovered a 0-day heap buffer overflow in application X. It’s time to exploit it.

Modern exploitation is hard. The Android system is one of the most heavily hardened platforms out there. Android devices implement a whole host of mitigations to make exploiting vulnerabilities  and gaining control of devices more difficult. Things like Address Space Layout Randomization, selinux and process sandboxing are designed to make end-to-end exploitation as painful as possible.

We can, however, overcome some of these mitigations by the generous application of persistence, creativity and luck. We usually end up breaking the exploitation down into a few steps, each of which uses a gadget of some sort.

Gadgets are specific exploitations of your vulnerability set to achieve one or more functionalities which can be chained together to create an end-to-end exploit. We usually need one or more gadgets to actually gain something useful. The gadgets might come from different ways of exploiting a single vulnerability, or from different vulnerabilities.

We usually need a relative read or write gadget to overcome ASLR, followed by arbitrary read/write or execute gadgets to gain code execution. Once we have code execution, we can try to escape sandboxing or to evade selinux.

How do we go about creating a gadget from a heap buffer overflow?

We do this with adjacency. What we want to do is to position the object containing the overflowable buffer just before another object, which we’ll call the gadget object]. We need to be able to trigger operations on that gadget object, which will, for example, use a pointer inside the object’s data structure to perform a read operation.

We can then cause the overflow, overwriting the pointer in the gadget object’s data structure with a pointer of our choice. We then trigger the read operation to read data at our desired pointer.

Using adjacency, we have created an arbitrary read gadget. We can use similar techniques to create write and execute gadgets. The key is to have a useful operation, which we can repurpose as a gadget by modifying object data through our overflow.

In order to achieve this adjacency, we need to shape the heap such that our exploitable object is allocated just before our gadget object.

Note that on dlmalloc the two objects can be of completely different sizes, as dlmalloc allows allocations of various sizes to be contiguous. For jemalloc, however, the overflow and gadget objects must be of the same size class, so that they will be allocated from the same bin, in the same run. This can greatly increase the complexity of exploitation on jemalloc, as it is necessary to find gadget objects which not only provide useful functions, but which fit in the right bin.

If you find an object which will work for jemalloc, it may work for dlmalloc too, but the reverse is not true.

So how do we go about performing the shaping necessary to get our objects allocated as desired? Remember that our interface to the allocator is very simple. We essentially only have the allocate and free operations. We need to use those to perform our shaping. What we need are some useful allocation primitives.

We need some way to cause the target application to perform allocations and frees. Usually we’ll look for some discrete, easily triggerable, functionality, such as processing of a particular kind of network packet, or usage of a particular kind of object, which performs allocations or frees as a side-effect of its normal operation.

It’s important that the primitive perform the allocation or free operation we want, at the time that we want it, so we can use it to gain control of the heap. A given exploitation might require more than one primitive.

The ideal primitive will allow us to allocate an arbitrary size, fill it with attacker controlled data and free it at a later stage with another application trigger.

Although the ideal primitive is the holy grail, we will often have to make do with lesser primitives. For example, a primitive which only allocates at a specific size, or one which allocates memory we cannot reliably free later. Some of these lesser primitives are more useful on jemalloc, some on dlmalloc.

The primitives you’ll find in your target app really depend on the specific details of the target app. They might be the result of sending certain types of packets to the target, performing operations in a scripting context such as javascript, or creating and freeing objects through higher level, more abstract, operations.

Finding good primitives is a bit of an art form. It is often necessary to reverse a lot of code before useful primitives can be found. So what should one look for?

You can start by looking for raw mallocs. These are probably the simplest primitives. A function which performs a raw malloc, either with a size you can control or of a fixed size can be a useful primitive. Sometimes the allocation will be for a multiple of a structure size.

Another good source of primitives are c++ classes which are allocated with the new operator. In most cases this new translates directly to a malloc for the size of the class. If you can easily trigger allocation and freeing of classes of interesting sizes, you have yourself an allocation primitive. These can also serve as excellent gadget objects.

Reallocs can also be a good source of primitives. They are often used with zero-length arrays in C structs in order to create variable length ‘contained’ arrays of sub objects. realloc is essentially the same as malloc.

If your target process uses the C++ standard template library, a good place to look for allocation primitives is in std::vectors. The growth of those vectors causes a ‘new’ behind the scenes which is essentially just a malloc. As the vector grows, it might fit into the size bin you are interested in. It may be necessary to add objects iteratively until the vector grows to the correct size.

std::string also uses malloc to allocate backing stores for the strings it contains. If you can allocate a std::string from a raw buffer, the size of which you control, you have an excellent primitive. Note however that std::strings are often passed by value, which causes their data to be copied into new allocations. This could be a source of ‘allocation noise’ which you might want to avoid.

In general, it’s necessary to think creatively about the accessible operations in your target in order to find the allocation primitives you need.

Let’s walk through an example exploitation with notes on some issues you might encounter on each of the allocators.

First, let’s define our asssumptions:

  • Let’s say we have discovered a buffer overflow in an object of size 0x4e0. We are able to overflow an arbitrary number of bytes with controlled data, starting from the overflowable buffer.
  • Let’s also say that we have discovered a useful gadget object, with size 0x450. This means this object will be in the same size class as the overflowable object.
  • Let’s also assume that we have discovered an allocation primitive which allows us to allocate at an arbitrary size
  • We’ve also got the ability to free any of the allocations we make with the allocation primitive
  • Finally, by performing an operation on the gadget object, we are able to determine if it has been overflowed or not.

We need to cause the overflowable object to be allocated immediately before the gadget object, using the allocation and free primitives to shape the heap.

The assumptions we’ve made are good assumptions. Without a gadget object in the same size class as the overflowable object, it is almost impossible to exploit a buffer overflow on jemalloc. Without an arbitrary size allocation primitive, it is very difficult to exploit on dlmalloc.

One technique for heap shaping that I’ve found particularly successful is called the placeholder technique.

The idea is simple. Using our arbitrary allocation primitive, we allocate a bunch of placeholder sets for our target objects, at the sizes of those objects. Each set of placeholders has a placeholder for the overflowable object and a placeholder for the gadget object. We hope that at least one set of placeholders is allocated such that the overflowable placeholder is directly before the gadget placeholder.

Next, we iterate over all of our placeholder sets, and for each set, we free the gadget placeholder and cause a gadget object to be allocated. Hopefully this gadget object will fall into the freed placeholder slot.

Then we once more iterate over each of our placeholder sets, this time freeing the overflowable placeholder, allocating our overflowable object and performing the overflow. We then activate the gadget for each instance of the gadget object, and use the result to determine if we have overflowed this instance or not. Hopefully at least one set of placeholders will have been turned into a working gadget.

This idea is the basis for the placeholder heap shaping technique. This technique can be used on both jemalloc and dlmalloc, but there are various things to look out for on each of the respective allocators.

One thing that can help increase the probability that one or more placeholder sets will have contiguous overflowable and gadget objects is to spray a large number of allocations of the relevant size class at the very beginning of the exploit. This will hopefully cause any best-fit chunks in the dlmalloc free lists or partially used runs in the relevant jemalloc bin to be used up, filling the holes. The placeholder allocations will then most probably be from new dlmalloc segments or new jemalloc runs, with successive allocations for the same size class falling after one another.

One problem you may encounter is that the allocation of the overflowable object, the gadget object or even the allocation primitives performs more than just the target allocation. In fact, it is very common for one or more of these to perform a bunch of smaller or bigger allocations around the actual target allocation we’re interested in.

These unwanted allocations shouldn’t make any difference when using jemalloc, as there we are concerned only with allocations in our bin.

On dlmalloc, these allocations might fall between the overflowable object and the gadget object, messing with our heap shaping.

Assuming the gadget object is the one which allocates the unwanted allocations, one way to deal with this issue is to do the following:

  • First, at the beginning of the exploit, we allocate a bunch of smaller blocks, let’s say 100 blocks of 0x100 bytes each
  • We then allocate all the placeholder sets
  • Then, for each placeholder set, we first free the gadget placeholder.
  • Then we free a few of the small filter allocations. These will be added to the free bins
  • Then we allocate the gadget itself, the smaller unwanted allocations will use the filter allocations we just freed, and the gadget itself will fall on the placeholder.

On dlmalloc, freeing a placeholder can cause it to be consolidated with an adjacent free chunk, resulting in it being placed in a larger bin’s free list. It then might not be used for the next allocation of the placeholder size. In other words, the placeholder will not be used to service the gadget or overflowable object allocation.

We can solve this quite easily using pinner allocations. The idea is that you simply allocate objects before your first placeholder and after your last placeholder. You never free these. As these pinners are always in use, and as we only ever free one placeholder in a set at a time, the placeholders will not be consolidated when freed.

Once you finally get your objects to be one after the other, there is an additional gotcha on dlmalloc. The metadata for the gadget chunk will be between your overflowable object and the gadget data. You need to overwrite this metadata with the sizes of the first and second objects respectively. Otherwise the allocator will fail when it comes time to free your chunks.

On jemalloc, the metadata is out-of-band, so this is not necessary.

Another thing to watch out for is that your primitive candidates may cause allocations or frees on threads other than the one which allocates your overflowable and gadget objects.  In fact, the overflowable and gadget objects might not be allocated on the same thread at all. So your placeholders might be allocated (and freed) by one thread, but the gadget or overflowable object will be allocated on another.

This shouldn’t present a problem for dlmalloc, but can be a significant pain on jemalloc.

The basic question is how can we move an allocation from one thread’s tcache to another thread.

One way to do this is to use flush events.

We free our desired placeholder region, r15. We then free up to 20 more regions on that same thread. This will fill up the tcache, resulting in a flush. The flush removes the oldest half of the tcache and marks those regions as free, making them available for other threads to allocate. We then allocate on our desired thread to get the region we want with the object content we need.

Note that the desired region might not be first in line on the thread which allocates the desired object. You might need to spray a bunch of allocations on that thread in order to catch the freed placeholder. Filling holes before allocating the placeholders can help prevent this issue.

Getting this right can be really complicated and is going to be very specific to your target, your vulnerability and the gadgets and primitives you find.

One thing to remember is that on jemalloc, the regions for your overflowable and gadget objects will be of the same size – the maximum size of this particular bin. This might be larger than the objects themselves, so when you overflow from your overflowable buffer, you need to pad that overflow with the difference between the overflowable object’s size and the bin size before you overflow any gadget object data.

On dlmalloc, the objects are directly after one another, except for the metadata between them.

On jemalloc it is possible that the threads that are involved in your exploitation are not assigned to the same arena. This can be problematic if, for example, you need to allocate contiguous regions in different threads.

Arena selection is supposed to use a round-robin approach, but in reality it will always allocate a new thread to the arena with the least threads assigned. If you can create and destroy threads somehow, before your overflow, you can do the following:

First add a whole bunch of threads to the system, say 30 threads. They will be spread evenly across the two arenas. Now destroy every second thread you created, causing one arena to have 15 threads and the other 0. Now allocate your exploitation threads. Up to 15 of them will be allocated in the same arena.

That’s about all I’ve got for you guys. I hope that I’ve given you a foundation in understanding how the Android libc allocators work, and that you’ve heard some tips regarding how to successfully exploit heap vulnerabilities on Android.