Alex's blog

posts feed about

PlaidCTF 2013 writeup for compression (Crypto 250)

22 Apr 2013

Summary: Partially-controlled-plaintext compression attack, similar to CRIME

Key: crime_sometimes_pays

This was a really neat problem that required a bit of creative thinking to pull off. Hopefully my enthusiasm for this problem comes through!

Note: If I have made any mistakes here regarding the techniques used in the CRIME attack or the details of Zlib (both of which I only have cursory knowledge of), please send me an email.

What We Have

#!/usr/bin/python
import os
import struct
import SocketServer
import zlib
from Crypto.Cipher import AES
from Crypto.Util import Counter

# Not the real keys!
ENCRYPT_KEY = '0000000000000000000000000000000000000000000000000000000000000000'.decode('hex')
# Determine this key.
# Character set: lowercase letters and underscore
PROBLEM_KEY = 'XXXXXXXXXXXXXXXXXXXX'

def encrypt(data, ctr):
    aes = AES.new(ENCRYPT_KEY, AES.MODE_CTR, counter=ctr)
    return aes.encrypt(zlib.compress(data))

class ProblemHandler(SocketServer.StreamRequestHandler):
    def handle(self):
        nonce = os.urandom(8)
        self.wfile.write(nonce)
        ctr = Counter.new(64, prefix=nonce)
        while True:
            data = self.rfile.read(4)
            if not data:
                break

            try:
                length = struct.unpack('I', data)[0]
                if length > (1<<20):
                    break
                data = self.rfile.read(length)
                data += PROBLEM_KEY
                ciphertext = encrypt(data, ctr)
                self.wfile.write(struct.pack('I', len(ciphertext)))
                self.wfile.write(ciphertext)
            except:
                break

class ReusableTCPServer(SocketServer.ForkingMixIn, SocketServer.TCPServer):
    allow_reuse_address = True

if __name__ == '__main__':
    HOST = '0.0.0.0'
    PORT = 4433
    SocketServer.TCPServer.allow_reuse_address = True
    server = ReusableTCPServer((HOST, PORT), ProblemHandler)
    server.serve_forever()

Goal

We want to recover the “PROBLEM_KEY”, which is a secret string of lowercase letters and underscores, as shown in the source code above.

Protocol

We send a 4-byte big-endian length, followed by that many bytes.

If the length is not too long, we get back:
nonce || ciphertext_len || aes-ctr(zlib(sent || PROBLEM_KEY))
where || means ‘concatenation’.

If you get 8 random bytes back, you did something wrong - this is the server sending the nonce before finding an error with your input.

Technique 1: Recovering the first byte

for l in a b c d e f g h i j k l m n o p q r s t u v w x y z "_"; do
  echo -ne "\x05\x00\x00\x00$l$l$l$l$l" | nc ip port -vv -q 3
done

In the CRIME exploits, there is usually some known text sent before and after the session key - this can be used to exacerbate/improve the infoleak of better compression when part of the session key is guessed, by prepending your guesses with the known text. We did not have any known text, so we needed to improvise.

With the above Bash script, we are sending strings aaaaa, bbbbb .. _____ and observing any changes in the compressed size (nc -vv gives that). The vast majority of the returned ciphertexts will be the same length. Any smaller ciphertexts represent better compression, possibly because we have guessed the first character.

Zlib’s compression format is a simple language - the verbs of which are either “when decompressing, emit these X bits exactly as they appear” or “go back to position X and emit N of those bits Y times”. This means that repeating patterns are compressed very efficiently in Zlib. The details of the CRIME attack are detailed very well in the original papers by Thai Duong and Juliano Rizzo, and on Adam Langley’s Imperial Violet blog.

Example

We send aaaaa, which concatenated with the PROBLEM_KEY, results in aaaaacrime_sometimes_pays This will compress to the following three Zlib terms:

  1. emit the following byte: 'a'
  2. go back to the first position, emit that one byte 5 times
  3. emit the following bytes: crime_sometimes_pays

This means that our 5 bytes of a’s have been compressed to, for example, 2 bytes — one for the single raw a and one for the instruction to repeat it 5 times.

However, once we send ccccc, which results in ccccccrime_sometimes_pays, this will compress to the following three terms:

  1. emit the following byte: 'c'</li>
  2. go back to the first position, emit that one byte 6 times</li>
  3. emit the following bytes: rime_sometimes_pays</li>

Notice that the bytes in the third instruction is a full byte shorter (missing the first c). The size of the term to repeat the first byte remains the same, whether it is repeating it 5 times or 6. This means that the compressed message will be one byte shorter - when we see this, we know that we’ve guessed the first byte correctly.

Note: Because Zlib can aggressively compress even small repeating patterns, you may get some false positives.

This single-byte change is easily visible as a consequence of using AES-CTR, which transforms a block cipher (here, AES) into a stream cipher. If another mode had been used (for example, AES-ECB or AES-CBC) we would have to determine the length of the PROBLEM_KEY and then do careful edge-of-block alignment.

At this point, we have successfully recovered the first byte of the PROBLEM_KEY.

Technique 2: Recovering the second byte

This is where we got stuck briefly. We saw no changes in the lengths when we sent the following: ca, cb, cc, etc. Our guess is that the compressor did not want to compress the crcr in: crcrime_sometimes_pays, because it would not result in a smaller compressed text.

After some thought, we ended up doing the following:

for l in a b c d e f g h i j k l m n o p q r s t u v w x y z "_"; do
  echo -ne "\x06\x00\x00\x00c$lc$lc$l" | nc ip port -vv
done

Examples: cacaca, cbcbcb, c_c_c_

This, concatenated with the PROBLEM_KEY, would yield: cacacacrime..., crcrcrcrime... This proved to be a big-enough pattern for the compressor to compress. Once again, when our guess byte was correct, the compressed text would be a single byte shorter.

We were able to recover the second byte, “r”, with this technique.

Technique 3: Recovering the remaining bytes

Repeating Technique 2 (cracracra .. crimacrimacrima) until we recovered approximately 5 bytes allowed us to switch to a more naive method - simply sending known || guess_byte, or: crimea, crimeb, etc.

Using this method, we were able to recover the remaining bytes, up to crime_some. At that point, we abandoned our pain-staking manual method and guessed crime_sometimes_pays, which happened to be the right key!

Conclusion

The CRIME attacks are very effective because there is known HTTP header text that may allow an attacker to improve their odds by creating long runs of repeating text. For example, in an HTTP header:

Set-Cookie: a98d73298a3ud98a37da938ja379a3d

If we could send Set-Cookie: a, Set-Cookie: b, etc., the compressor would greedily compress as much as possible, and we naively use Technique 3 to recover the entire (in this case) session key. Because we did not have this, we had to come up with a couple of “bootstrapping” techniques to get going - once we had the first portion of the problem key, we were able to trivially recover the remainder of the key using the naive “Technique 3”.