Summary: Partially-controlled-plaintext compression attack, similar to CRIME
Key: crime_sometimes_pays
This was a really neat problem that required a bit of creative thinking to pull off. Hopefully my enthusiasm for this problem comes through!
Note: If I have made any mistakes here regarding the techniques used in the CRIME attack or the details of Zlib (both of which I only have cursory knowledge of), please send me an email.
What We Have
- We have an IP address and a port for the service we are supposed to be attacking</li>
- We have the source code for part of a Python Web service:</li>
Goal
We want to recover the “PROBLEM_KEY”, which is a secret string of lowercase letters and underscores, as shown in the source code above.
Protocol
We send a 4-byte big-endian length, followed by that many bytes.
If the length is not too long, we get back:
nonce || ciphertext_len || aes-ctr(zlib(sent || PROBLEM_KEY))
where ||
means ‘concatenation’.
If you get 8 random bytes back, you did something wrong - this is the server sending the nonce before finding an error with your input.
Technique 1: Recovering the first byte
In the CRIME exploits, there is usually some known text sent before and after the session key - this can be used to exacerbate/improve the infoleak of better compression when part of the session key is guessed, by prepending your guesses with the known text. We did not have any known text, so we needed to improvise.
With the above Bash script, we are sending strings aaaaa
, bbbbb
.. _____
and observing any changes in the compressed size (nc -vv
gives that).
The vast majority of the returned ciphertexts will be the same length. Any smaller
ciphertexts represent better compression, possibly because we have guessed the first
character.
Zlib’s compression format is a simple language - the verbs of which are either “when decompressing, emit these X bits exactly as they appear” or “go back to position X and emit N of those bits Y times”. This means that repeating patterns are compressed very efficiently in Zlib. The details of the CRIME attack are detailed very well in the original papers by Thai Duong and Juliano Rizzo, and on Adam Langley’s Imperial Violet blog.
Example
We send aaaaa
, which concatenated with the PROBLEM_KEY
, results in aaaaacrime_sometimes_pays
This will compress to the following three Zlib terms:
- emit the following byte:
'a'
- go back to the first position, emit that one byte 5 times
- emit the following bytes:
crime_sometimes_pays
This means that our 5 bytes of a
’s have been compressed to, for example, 2
bytes — one for the single raw a
and one for the instruction to repeat it 5 times.
However, once we send ccccc
, which results in ccccccrime_sometimes_pays
, this will compress to the following three terms:
- emit the following byte:
'c'
</li> - go back to the first position, emit that one byte 6 times</li>
- emit the following bytes:
rime_sometimes_pays
</li>
Notice that the bytes in the third instruction is a full byte shorter (missing the first c
).
The size of the term to repeat the first byte remains the same, whether it is repeating it 5 times or 6.
This means that the compressed message will be one byte shorter - when we see this,
we know that we’ve guessed the first byte correctly.
Note: Because Zlib can aggressively compress even small repeating patterns, you may get some false positives.
This single-byte change is easily visible as a consequence of using AES-CTR, which transforms a block cipher (here, AES) into a stream cipher. If another mode had been used (for example, AES-ECB or AES-CBC) we would have to determine the length of the PROBLEM_KEY and then do careful edge-of-block alignment.
At this point, we have successfully recovered the first byte of the PROBLEM_KEY.
Technique 2: Recovering the second byte
This is where we got stuck briefly. We saw no changes in the lengths when we
sent the following: ca
, cb
, cc
, etc.
Our guess is that the compressor did not want to compress the crcr
in:
crcrime_sometimes_pays
, because it would not result in a smaller
compressed text.
After some thought, we ended up doing the following:
Examples: cacaca
, cbcbcb
, c_c_c_
This, concatenated with the PROBLEM_KEY, would yield: cacacacrime...
, crcrcrcrime...
This proved to be a big-enough pattern for the compressor to compress. Once
again, when our guess byte was correct, the compressed text would be a single
byte shorter.
We were able to recover the second byte, “r”, with this technique.
Technique 3: Recovering the remaining bytes
Repeating Technique 2 (cracracra
.. crimacrimacrima
) until we recovered
approximately 5 bytes allowed us to switch to a more naive method - simply
sending known || guess_byte
, or: crimea
, crimeb
, etc.
Using this method, we were able to recover the remaining bytes, up to
crime_some
. At that point, we abandoned our pain-staking
manual method and guessed crime_sometimes_pays
, which happened to be the right key!
Conclusion
The CRIME attacks are very effective because there is known HTTP header text that may allow an attacker to improve their odds by creating long runs of repeating text. For example, in an HTTP header:
If we could send Set-Cookie: a
, Set-Cookie: b
, etc., the compressor would
greedily compress as much as possible, and we naively use Technique 3 to recover the entire (in this case)
session key. Because we did not have this, we had to come up with a couple of “bootstrapping” techniques to get
going - once we had the first portion of the problem key, we were able to trivially recover the
remainder of the key using the naive “Technique 3”.