I will describe my solution to the “post quantum” crypto challenge from the 2018 CCC CTF.
We begin the challenge by downloading a tar.gz file containing two python scripts: “challenge.py”, “generate.py” and a directory called data. As always, we are requested to find the flag.
Let’s start by looking at generate.py:
from challenge import CodeBasedEncryptionScheme from random import SystemRandom from os import urandom if __name__ == "__main__": cipher = CodeBasedEncryptionScheme.new() random = SystemRandom() for i in range(1024 + 512): pt = urandom(2) ct = cipher.encrypt(pt) with open("plaintext_{:03d}".format(i), "wb") as f: f.write(pt) with open("ciphertext_{:03d}".format(i), "wb") as f: f.write(ct) assert(pt == cipher.decrypt(ct)) with open("flag.txt", "rb") as f: flag = f.read().strip() if len(flag) % 2 != 0: flag += b"\0" cts = list() for i in range(len(flag) // 2): cts.append(cipher.encrypt(flag[i*2:i*2 + 2])) for i, ct in enumerate(cts): with open("flag_{:02d}".format(i), "wb") as f: f.write(ct)
As we can see, the script initializes some kind of encryption object called cipher, and uses it to encrypt 1024+512 strings of length 2 bytes. Every plaintext string and every encryption of such string is stored in a separate file. The script then reads a file called “flag.txt” and encrypts every 2 bytes of it into a separate file. All of these files (except flag.txt) are found in the data directory.
At this point, we realize that given enough ciphertext samples and their corresponding plaintext, we should be able to somehow compute the encryption key and decrypt the encrypted flag files.
In order to do that, we need to analyze the file challenge.py that implements the encryption and decryption.
#!/usr/bin/python3 from BitVector import BitVector from random import SystemRandom inverse_error_probability = 3 def matrix_vector_multiply(columns, vector): cols = len(columns) assert(cols == len(vector)) rows = len(columns[0]) result = BitVector(size=rows) for i, bit in zip(range(cols), vector): assert(len(columns[i]) == rows) if bit == 1: result = result ^ columns[i] return result def bitvector_to_bytes(bitvector): return bitvector.int_val().to_bytes(len(bitvector) // 8, 'big') def bitvector_from_bytes(bytes): return BitVector(size=len(bytes) * 8, intVal = int.from_bytes(bytes, 'big')) class CodeBasedEncryptionScheme(object): @classmethod def new(cls, bitlength=48): key = cls.keygen(bitlength) return cls(key) def __init__(self, key): self.key = key self.key_length = len(self.key) self.random = SystemRandom() @classmethod def keygen(cls, bitlength): key = SystemRandom().getrandbits(bitlength) key = BitVector(size=bitlength, intVal = key) return key def add_encoding(self, message): message = int.from_bytes(message, 'big') message = BitVector(size=self.key_length // 3, intVal=message) out = BitVector(size=self.key_length) for i, b in enumerate(message): out[i*3 + 0] = b out[i*3 + 1] = b out[i*3 + 2] = b return out def decode(self, message): out = BitVector(size=self.key_length // 3) for i in range(self.key_length // 3): if message[i * 3] == message[i * 3 + 1]: decoded_bit = message[i * 3] elif message[i * 3] == message[i * 3 + 2]: decoded_bit = message[i * 3] elif message[i * 3 + 1] == message [i * 3 + 2]: decoded_bit = message[i * 3 + 1] else: assert(False) out[i] = decoded_bit return bitvector_to_bytes(out) def encrypt(self, message): message = self.add_encoding(message) columns = [ BitVector( size=self.key_length, intVal=self.random.getrandbits(self.key_length) ) for _ in range(self.key_length) ] # compute the noiseless mask y = matrix_vector_multiply(columns, self.key) # mask the message y ^= message # add noise: make a third of all equations false for i in range(self.key_length // 3): noise_index = self.random.randrange(inverse_error_probability) y[i * 3 + noise_index] ^= 1 columns = [bitvector_to_bytes(c) for c in columns] columns = b"".join(columns) return columns + bitvector_to_bytes(y) def decrypt(self, ciphertext): y = ciphertext[-self.key_length // 8:] columns = ciphertext[:-self.key_length // 8] columns = [ bitvector_from_bytes(columns[i:i+self.key_length // 8]) for i in range(0, len(columns), self.key_length // 8) ] y = bitvector_from_bytes(y) y ^= matrix_vector_multiply(columns, self.key) result = self.decode(y) return result
Just by looking at the names of the functions and variables, we can guess that we are facing an encryption scheme that uses some linear algebra (much like in the first challenge “unofficial”).
The encryption algorithm begins by “encoding” the input message by tripling every bit in it. That is, the input 00101 becomes 000000111000111.
It then generates 48 random columns, each containing 48 random bits and uses it as a 48×48 matrix over GF(2) (https://en.wikipedia.org/wiki/GF(2)). Then it multiplies this matrix with the key, which is a 48 bit random vector, and uses the result to mask the input message. Finally, the result is added with some noise that flips one random bit in every 3 sequential (non-overlapping) bits. The output of the encryption is the random 48×48 matrix and the masked+noised message. Let’s describe the encryption algorithm in a more “linear algebra” way:
- y=Encode(Message) // this triples the bits. Assuming a 2 byte input, the result is a vector of length 48 over GF(2).
- M=RandomMatrix(FieldSize=2,Dimention=48)
- r=noise(48) // this generates the noise vector – in every 3 sequential bits, one equals 1 and the others are 0.
- return(M,M⋅key+y+r)
As for the decryption – given (M,c) it computes y’=c+M⋅key . The plaintext is then reconstructed from y’ by applying “majority” on every 3 sequential bits.
How can we break this encryption?
Let’s first assume no noise was added. Then every pair of plaintext and ciphertext is providing us with equations of the form:
Here the only unknown values are the Key bits, and so we treat them as variables in our system of equations. We have 48 equations per ciphertext+plaintext pair and 48 variables (NOTE: the equations consist of random scalars and thus are most likely linearly independent). Solving this is pretty much straightforward (using SAGE, MATLAB or any other tool supporting simple linear algebra operations).
However, we have the noise added to the equations, so what do we do? We treat them as more variables!
The equations are of the form
Where r_k is a different variable in every equation. Seemingly we encounter another problem now – every equation adds another variable, so for k equations we have k+48 variables.
We do however know some more restrictions on the new variables r_k. We know that for every k≡0 mod(3):
This means that for every k original equations, we get additional k/3 equations and k+48 variables. Hence we only need to make sure that k≥48⋅3 and we are fine. For this it suffice to use only 3 or 4 pairs of ciphertext and plaintext files. This approach indeed solves the challenge!