Parasite Validator

Challenge Description

A rival pro profit tree planting enterprise has published a reporting solution for business inquiries, thinking it might help them gain reputation to increase sales. Fortuately for us they thought it would be a good idea to outsource the development and i believe it isn’t secure at all. Get me the key.

First impression

The challenge starts with an EXE which opens a somehow modified Notepad requesting for a key. Three failed attempts and you’re out.

A bit of reversing of ParasiteValidator.exe led to the fact that in order to succeed, the processed input memory is compared with the string (which looks like a hash): “C81477CEFC17B7241647696EDD01341E156B6C3073ACF9BD4413168AE5F04A062D7F75C508”.
Some more dynamic debugging shows that the string you input is somehow changed, prior to the memcmp, but it’s not clear exactly where and how the string is changed.

The hidden dll:

One of the startup function reads a resource named “101” from the PE.

.text:0000000140001109       xor     ecx, ecx        ; lpModuleName
.text:000000014000110B       call    cs:GetModuleHandleA ; Indirect Call Near Procedure
.text:0000000140001111       lea     r8, Type        ; "D"
.text:0000000140001118       mov     edx, 101        ; lpName
.text:000000014000111D       mov     rcx, rax        ; hModule
.text:0000000140001120       call    cs:FindResourceA ; Indirect Call Near Procedure

Then this resource is loaded into memory and changed via a XOR function:

.text:0000000140006926                 mov     [rsp+128h+xor_buf], 13h
.text:000000014000692E                 mov     [rsp+128h+xor_buf+1], 37h
.text:0000000140006936                 mov     [rsp+128h+xor_buf+2], 0C0h
.text:000000014000693E                 mov     [rsp+128h+xor_buf+3], 0DEh
.text:0000000140006946                 mov     [rsp+128h+xor_buf+4], 0BAh
.text:000000014000694E                 mov     [rsp+128h+xor_buf+5], 0BEh
.text:0000000140006956                 mov     [rsp+128h+xor_buf+6], 4
.text:000000014000695E                 mov     [rsp+128h+xor_buf+7], 20h
.text:0000000140006966                 mov     [rsp+128h+xor_buf+8], 0FEh
.text:000000014000696E                 mov     [rsp+128h+xor_buf+9], 0EDh
.text:0000000140006976                 mov     [rsp+128h+xor_buf+0Ah], 0B0h
.text:000000014000697E                 mov     [rsp+128h+xor_buf+0Bh], 0Bh
.text:0000000140006986                 mov     [rsp+128h+xor_buf+0Ch], 0CAh
.text:000000014000698E                 mov     [rsp+128h+xor_buf+0Dh], 0FEh

The XORed memory is written to a file named ‘seed’, from which it is loaded as a dll and injected into the Notepad.exe process.

So we extracted the resource, converted the data using the XOR buffer, and saw that we have a valid dll file, which we loaded into our disassembler for reversing.

It’s evident that the 101 dll handles the manipulation of the user input. But how do the processes communicate, and what data manipulation is done?

IPC via Window Message, exceptions and context

Keypress Window message

The main process sends each key press via the SendMessageA(0x102,…) to the Notepad process.
During initialization the dll part of the Notepad process replaces the default window handler by it’s own function. This function intercepts 0x102 messages (key presses), records the characters in an internal memory buffer, and then passes on the message to Notepad for regular processing.

Exception and Context sharing

This is an interesting communication model. The two processes communicate by setting values in the registers of the Notepad process.

In order for the main process to process a message from 101, it calls GetThreadContext() and checks the value of the RCX register (the command register). Valid commands are 0xC1BDCEEE and 0xE8C7B756. It then proceeds to process the command, sets the result in RAX and/or RBX of the Context
, and calls SetThreadContext().

When Notepad wants to send a command to the main process, it sets it’s RCX register and then calls a function which dereferences address 0, thus causing a segmentation fault.
e.g.

mov     ecx, 0E8C7B756h ;
...

mov     rbx, 0
sub     rbx, [rbx]      ; Integer Subtraction
retn                    ; Return Near from Procedure

This presumably triggers an exception handler, which passes on the control over to the main process.

Parts of the decompiled code in the Notepad dll, in order to show a bit of the flow between the two processes:

    do
    {
      p_struc2 = get_p_global_struc2();
      remove_key_part_from_string(p_struc2, &input_string); // user has input the string
      send_context_to_other_thread(0xE8C7B756); // notify main process
    }
    while ( v3 != 'õf3N' );          // wait until confirmation received
    ...
    perform hash function on user input
    ...

    send_context_to_other_thread(0xC1BDCEEE); // notify main that hashing is complete

The corresponding decompiled part in the main process:

     if ( GetThreadContext(hThread, &Context) ) // get context to check command
      {
        command = Context.Rcx;                  // command is in RCX
        ...
        if ( command == 0xC1BDCEEE )    // hash has been performed on user input
        {
          context_r8 = Context.R8;      // R8 points to memory containing hashed input
          char_buf = 0;
          for ( i = 0; i < Context.Rdx; ++i ) // read all bytes of memory into one string
          {
            p_address = (i + context_r8);
            if ( !read_one_byte_from_memory(v2, p_address, &char_buf) )
              break;
            append_number_to_string(&user_str.external, char_buf); // char to hex
            ...
          }
          ...
          struc_7::check_is_valid(p_struc_7, &user_str);  // compare with expected result
        }

        else if ( command == 0xE8C7B756 )
        {
          ...
    }

The hashing function

Once we understood the flow and found the hashing functions, we managed to identify the constants, indices and digest size as those used by Keccak, and more specifically SHA3-256.

At this point, knowing that the input is hashed using SHA3-256 still doesn’t help us finding the flag as we can’t simply brute force an input in order to achieve the correct SHA3-256 hash.

Further investigation of the hashing function, showed that the implementation is not a straight forward SHA3. Instead the input string is hashed from the end to the beginning, starting with the last character only, then the last two, then last three etc. At each stage only the first character of the hash digest is appended to the output result string.

The decompiled code looks something like this:

    ...
    string_len = input_string.internal.len;
    offset_in_string = input_string.internal.len - 1;
    while ( offset_in_string < string_len )
    {
      memset(&hash_context, 0, sizeof(hash_context_t));
      ...
      sha3_512_hash_digest(
        &hash_context,
        (unsigned __int8 *)&p_input_buffer_data[offset_in_string],
        string_len - offset_in_string);
      first_byte_from_hash = *(_BYTE *)sha3_512_hash_finalize(&hash_context);
      string::append_char(&appended_hash_bytes_result, v9, v2, first_byte_from_hash);
      --offset_in_string;
      ...
    }

Seeing that this implementation only hashes one new character at a time, thereby weakening it, gave us an idea how to brute force it.

Brute forcing the hash

The first character hashed is the last of the input string, and the first character of the digest must equal the first char in our target hash string. So it can easily be brute forced, by enumerating over all 256 options, and comparing against our known result.
Once we have the last character of the input, we can proceed to brute force the one before last character, together with the last one, and expect to get the second output character as the first byte of the digest.
We repeat this process until receiving all the input bytes.
This algorithm is not promised to be collision free, but we hoped that it will indeed end in a reasonable time and give us results.

The python implementation:

import hashlib

target = bytes.fromhex("C81477CEFC17B7241647696EDD01341E156B6C3073ACF9BD4413168AE5F04A062D7F75C508")

def func(bytes_so_far):
    if len(bytes_so_far) == len(target):
        print(bytes_so_far)
        return

    for i in range(256):
        h = hashlib.sha3_512()
        h.update(bytes([i]) + bytes_so_far)
        x = h.digest()[0]
        if x == target[len(bytes_so_far)]:
            func(bytes([i]) + bytes_so_far)

func(b"")

Running the script gives the following output

b'/\xcd\\xafj\xfb\xec\x1e\xc3<\xcc\xe0\xdcN&\xd6\x04g@\x81qV\x14eW\x0f\x13\xce\x96\xc2\xb4H\xaf;\xb1&}'  <br>
b'8\xcd\\xafj\xfb\xec\x1e\xc3<\xcc\xe0\xdcN&\xd6\x04g@\x81qV\x14eW\x0f\x13\xce\x96\xc2\xb4H\xaf;\xb1&}'  <br>
b'M\x81\xc7Q\xbc\x1b\x15\x1ai\x12\xce\xedyW\xbd\xf1\xd5:\n\xaa\x88\xc6\x14eW\x0f\x13\xce\x96\xc2\xb4H\xaf;\xb1&}'  <br>
b'\x95\x81\xc7Q\xbc\x1b\x15\x1ai\x12\xce\xedyW\xbd\xf1\xd5:\n\xaa\x88\xc6\x14eW\x0f\x13\xce\x96\xc2\xb4H\xaf;\xb1&}'  <br>
b't\x90\x80\xf6\xbc\x1b\x15\x1ai\x12\xce\xedyW\xbd\xf1\xd5:\n\xaa\x88\xc6\x14eW\x0f\x13\xce\x96\xc2\xb4H\xaf;\xb1&}'  <br>
b'v\x00\xa9\xc5\xdeO\xeaSk\x12\xce\xedyW\xbd\xf1\xd5:\n\xaa\x88\xc6\x14eW\x0f\x13\xce\x96\xc2\xb4H\xaf;\xb1&}'  <br>
b'\x87\x00\xa9\xc5\xdeO\xeaSk\x12\xce\xedyW\xbd\xf1\xd5:\n\xaa\x88\xc6\x14eW\x0f\x13\xce\x96\xc2\xb4H\xaf;\xb1&}'  <br>
b'\xa0\x9b\x83v\xf1l_|\xbc\xb4\xce\xedyW\xbd\xf1\xd5:\n\xaa\x88\xc6\x14eW\x0f\x13\xce\x96\xc2\xb4H\xaf;\xb1&}'  <br>
b'\xb3\x9b\x83v\xf1l_|\xbc\xb4\xce\xedyW\xbd\xf1\xd5:\n\xaa\x88\xc6\x14eW\x0f\x13\xce\x96\xc2\xb4H\xaf;\xb1&}'  <br>
b'\x83\x9f\xd1v\xf1l_|\xbc\xb4\xce\xedyW\xbd\xf1\xd5:\n\xaa\x88\xc6\x14eW\x0f\x13\xce\x96\xc2\xb4H\xaf;\xb1&}'  <br>
b'\x04/^\xc0\x88A\x10\x9b\xbc\xb4\xce\xedyW\xbd\xf1\xd5:\n\xaa\x88\xc6\x14eW\x0f\x13\xce\x96\xc2\xb4H\xaf;\xb1&}'  <br>
b'R/^\xc0\x88A\x10\x9b\xbc\xb4\xce\xedyW\xbd\xf1\xd5:\n\xaa\x88\xc6\x14eW\x0f\x13\xce\x96\xc2\xb4H\xaf;\xb1&}'  <br>
b'\xeb/^\xc0\x88A\x10\x9b\xbc\xb4\xce\xedyW\xbd\xf1\xd5:\n\xaa\x88\xc6\x14eW\x0f\x13\xce\x96\xc2\xb4H\xaf;\xb1&}'  <br>
b'7=m;e:\xf2\xd8U\xff\x8c\x16\xc4\xb0G]\xcbC\x18\xb9\xber"\xf9\xc7\xab\xb7\x890O2_4w4y}'  <br>
b'\x91=m;e:\xf2\xd8U\xff\x8c\x16\xc4\xb0G]\xcbC\x18\xb9\xber"\xf9\xc7\xab\xb7\x890O2_4w4y}'  <br>
b'\xb4=m;e:\xf2\xd8U\xff\x8c\x16\xc4\xb0G]\xcbC\x18\xb9\xber"\xf9\xc7\xab\xb7\x890O2_4w4y}'  <br>
<strong>b'flag{4_tr33_a_d4y_k33ps_th3_cO2_4w4y}'</strong>  <br>
b'plag{4_tr33_a_d4y_k33ps_th3_cO2_4w4y}'  <br>
b'=\xc8\xabg{4_tr33_a_d4y_k33ps_th3_cO2_4w4y}'  <br>
b'M\xc3\xdc\x836A\xea\x9e;\xa43_a_d4y_k33ps_th3_cO2_4w4y}'  <br>
b'M2(H\xcb\x1cq\x8c\xde\xa43_a_d4y_k33ps_th3_cO2_4w4y}'  <br>
b'\x87i(H\xcb\x1cq\x8c\xde\xa43_a_d4y_k33ps_th3_cO2_4w4y}'  <br>
b'\x9f\xab(H\xcb\x1cq\x8c\xde\xa43_a_d4y_k33ps_th3_cO2_4w4y}'  <br>
b'\xdc\xab(H\xcb\x1cq\x8c\xde\xa43_a_d4y_k33ps_th3_cO2_4w4y}'  <br>
b'\xab\x9d\x0f\x84{\x12\xaf}\xceP\xb9 ~_d4y_k33ps_th3_cO2_4w4y}'  <br>
b"\xe8'\xe2w\xd3\x12\xaf}\xceP\xb9 ~_d4y_k33ps_th3_cO2_4w4y}"  

python3 solve.py 0.24s user 0.00s system 99% cpu 0.249 total

It’s quite easy to see which of the inputs is the requested flag…