August 15, 2004

SHA0 is cracked

According to the below post, SHA-0 has been cracked. The researchers crunched their way through lots of data and lots of cycles and finally found some text that hashes to the same value. And people at Crypto 2004 in Santa Barbara are reporting the fall of many older message digests such as MD5 as well.

A brief explanation: SHA-0 is one of a big class of messaging digest algorithms that has a particular magical property: it gives you a big number that is one-to-one with a document. So take any document and hash it with a message digest, and you get this big number, reliably. Here's one for the dollar contract my company issues: SHA#e3b445c2a6d82df81ef46b54d386da23ce8f3775. It's big, but small enough to post or send around in email [0].

Notwithstanding last week's result, you can't find a document that also hashes to that number, so software and people can reliably say, oh, yes, that's the Systemics PSD dollar. In short, message digests make wonderful digital identifiers, and also digital signatures, and we use them bountifully in Ricardo [1].

So if SHA-0 has been cracked, it might be a big deal. Is our digital infrastructure at risk? Yes, it's a big deal, but no, there is little risk. Here's why.

In cryptographic terms, this is a big deal. When the NSA designed SHA back in the early 90s, it was designed to be strong. Then, as the standards process plodded along, the NSA came out with a correction. It seems as though a flaw had been discovered, but they didn't say what that flaw was.

So we ended up with SHA-0, the suspect one, and SHA-1, the replacement. Of course, cryptographers spent years analysing the tiny differences, and about 6 years ago it all became clear (don't ask me, ask them). And now, those flaws have been exploited by the crack and by the machines. So now we know it can be done to SHA-0.

Luckily, we all use the replacement, SHA-1, but this will also fall in time. Once again, it is lucky that there is a new generation coming online: SHA-256, SHA-512 and the like.

But as a practical matter, this is not a big issue. When we as financial cryptographers build systems based on long term persistence of hashes, we weave the hash and its document into a system. This is called entanglement, whereby the hash and the document are verified over time and usage [2]. We use the software to lay a trail, as it were, and if someone were to turn up with a bogus document but a matching hash, there would be all sorts of other trip wires to catch any simplistic usage.

Also, bear in mind that the two documents that hashed to the same value are pretty useless. It took Antoine Joux and his team 80,000 CPU hours to do it, even then. So in cryptography terms, this is a milestone in knowledge, not a risk: For practical purposes, any message digest still fits the bill, as long as it is designed into a comprehensive system that provides backup by entanglement [3].



Addendums:
Also see SHA0 crack paper. Especially, at Crypto, Wang, Feng, Lai, Yu announced a fast crack: Collisions for Hash Functions MD4, MD5, HAVAL-128 and RIPEMD. This reportedly also improves Joux, et al's result by 1000 on SHA-0, see Greg Rose's comment posted below.

Ed Felton reported a rumour that SHA-1 had already suffered the same fate, but this appeared to unfounded: so far, nothing but suggestions that SHA-1 looks shaky.

Also, Eric Rescorla gets more technical with the risks to systems, and agrees that this is big news but not big risks.

More links:
http://www.theregister.com/2004/08/19/hash_crypto/
http://www.certainkey.com/news/?12
http://eprint.iacr.org/2004/146
http://www.md5crk.com/sha0col/
http://www.tcs.hut.fi/~mjos/md5/
http://www.freedom-to-tinker.com/archives/000662.html
http://www.iacr.org/conferences/crypto2004/rump.html

http://www.computerworld.com/securitytopics/security/story/0,,95343,00.html?SKC=security-95343



[0] And it is in SHA-1 ...
[1] To see how message digests make a fine digital signature, see The Ricardian Contract which as an aside also carries a private-key signature as well.
[2] Maniatis and Baker, Secure History Preservation through Timeline Entanglement
[3] MD5 is the old favourite, which was first attacked in Dobertin's 1996 paper (and here) and now seems to be trashed in Wang, Feng, Lai, Yu paper Collisions for Hash Functions MD4, MD5, HAVAL-128 and RIPEMD.


-------- Original Message --------
Subject: Joux found a collision for SHA-0 !
Date: Fri, 13 Aug 2004 15:32:29 +0200
From: Pascal Junod
Organization: EPFL - LASEC
To: cryptography@metzdowd.com

Hi !

This has appeared on a french mailing-list related to crypto. The results of
Joux improve on those of Chen and Biham which will be presented next week at
CRYPTO'04.

Enjoy !

<quote>

Thursday 12th, August 2004

We are glad to announce that we found a collision for SHA-0.

First message (2048 bits represented in hex):
a766a602 b65cffe7 73bcf258 26b322b3 d01b1a97 2684ef53 3e3b4b7f 53fe3762
24c08e47 e959b2bc 3b519880 b9286568 247d110f 70f5c5e2 b4590ca3 f55f52fe
effd4c8f e68de835 329e603c c51e7f02 545410d1 671d108d f5a4000d cf20a439
4949d72c d14fbb03 45cf3a29 5dcda89f 998f8755 2c9a58b1 bdc38483 5e477185
f96e68be bb0025d2 d2b69edf 21724198 f688b41d eb9b4913 fbe696b5 457ab399
21e1d759 1f89de84 57e8613c 6c9e3b24 2879d4d8 783b2d9c a9935ea5 26a729c0
6edfc501 37e69330 be976012 cc5dfe1c 14c4c68b d1db3ecb 24438a59 a09b5db4
35563e0d 8bdf572f 77b53065 cef31f32 dc9dbaa0 4146261e 9994bd5c d0758e3d

Second message:
a766a602 b65cffe7 73bcf258 26b322b1 d01b1ad7 2684ef51 be3b4b7f d3fe3762
a4c08e45 e959b2fc 3b519880 39286528 a47d110d 70f5c5e0 34590ce3 755f52fc
6ffd4c8d 668de875 329e603e 451e7f02 d45410d1 e71d108d f5a4000d cf20a439
4949d72c d14fbb01 45cf3a69 5dcda89d 198f8755 ac9a58b1 3dc38481 5e4771c5
796e68fe bb0025d0 52b69edd a17241d8 7688b41f 6b9b4911 7be696f5 c57ab399
a1e1d719 9f89de86 57e8613c ec9e3b26 a879d498 783b2d9e 29935ea7 a6a72980
6edfc503 37e69330 3e976010 4c5dfe5c 14c4c689 51db3ecb a4438a59 209b5db4
35563e0d 8bdf572f 77b53065 cef31f30 dc9dbae0 4146261c 1994bd5c 50758e3d

Common hash value (can be found using for example "openssl sha file.bin"
after creating a binary file containing any of the messages)
c9f160777d4086fe8095fba58b7e20c228a4006b

This was done by using a generalization of the attack presented at Crypto'98
by Chabaud and Joux. This generalization takes advantage of the iterative
structure of SHA-0. We also used the "neutral bit" technique of Biham and
Chen (To be presented at Crypto'2004).

The computation was performed on TERA NOVA (a 256 Intel-Itanium2 system
developped by BULL SA, installed in the CEA DAM open laboratory
TERA TECH). It required approximatively 80 000 CPU hours.
The complexity of the attack was about 2^51.

We would like to thank CEA DAM, CAPS Entreprise and BULL SA for
their strong support to break this challenge.

Antoine Joux(*) (DCSSI Crypto Lab)
Patrick Carribault (Bull SA)
Christophe Lemuet, William Jalby
(Universit'e de Versailles/Saint-Quentin en Yvelines)

(*) The theoretical cryptanalysis was developped by this author.
The three others authors ported and optimized the attack on the TERA NOVA
supercomputer, using CAPS Entreprise tools.

$hexdump fic1.bin
0000000 66a7 02a6 5cb6 e7ff bc73 58f2 b326 b322
0000010 1bd0 971a 8426 53ef 3b3e 7f4b fe53 6237
0000020 c024 478e 59e9 bcb2 513b 8098 28b9 6865
0000030 7d24 0f11 f570 e2c5 59b4 a30c 5ff5 fe52
0000040 fdef 8f4c 8de6 35e8 9e32 3c60 1ec5 027f
0000050 5454 d110 1d67 8d10 a4f5 0d00 20cf 39a4
0000060 4949 2cd7 4fd1 03bb cf45 293a cd5d 9fa8
0000070 8f99 5587 9a2c b158 c3bd 8384 475e 8571
0000080 6ef9 be68 00bb d225 b6d2 df9e 7221 9841
0000090 88f6 1db4 9beb 1349 e6fb b596 7a45 99b3
00000a0 e121 59d7 891f 84de e857 3c61 9e6c 243b
00000b0 7928 d8d4 3b78 9c2d 93a9 a55e a726 c029
00000c0 df6e 01c5 e637 3093 97be 1260 5dcc 1cfe
00000d0 c414 8bc6 dbd1 cb3e 4324 598a 9ba0 b45d
00000e0 5635 0d3e df8b 2f57 b577 6530 f3ce 321f
00000f0 9ddc a0ba 4641 1e26 9499 5cbd 75d0 3d8e


$ hexdump fic2.bin
0000000 66a7 02a6 5cb6 e7ff bc73 58f2 b326 b122
0000010 1bd0 d71a 8426 51ef 3bbe 7f4b fed3 6237
0000020 c0a4 458e 59e9 fcb2 513b 8098 2839 2865
0000030 7da4 0d11 f570 e0c5 5934 e30c 5f75 fc52
0000040 fd6f 8d4c 8d66 75e8 9e32 3e60 1e45 027f
0000050 54d4 d110 1de7 8d10 a4f5 0d00 20cf 39a4
0000060 4949 2cd7 4fd1 01bb cf45 693a cd5d 9da8
0000070 8f19 5587 9aac b158 c33d 8184 475e c571
0000080 6e79 fe68 00bb d025 b652 dd9e 72a1 d841
0000090 8876 1fb4 9b6b 1149 e67b f596 7ac5 99b3
00000a0 e1a1 19d7 899f 86de e857 3c61 9eec 263b
00000b0 79a8 98d4 3b78 9e2d 9329 a75e a7a6 8029
00000c0 df6e 03c5 e637 3093 973e 1060 5d4c 5cfe
00000d0 c414 89c6 db51 cb3e 43a4 598a 9b20 b45d
00000e0 5635 0d3e df8b 2f57 b577 6530 f3ce 301f
00000f0 9ddc e0ba 4641 1c26 9419 5cbd 7550 3d8e

$ diff fic1.bin fic2.bin
Binary files fic1.bin and fic2.bin differ

$ openssl sha fic1.bin
SHA(fic1.bin)= c9f160777d4086fe8095fba58b7e20c228a4006b

$ openssl sha fic2.bin
SHA(fic2.bin)= c9f160777d4086fe8095fba58b7e20c228a4006b

</quote>

--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
* Pascal Junod http://crypto.junod.info *
* Security and Cryptography Laboratory (LASEC) *
* Swiss Federal Institute of Technology (EPFL), CH-1015 Lausanne *
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

---------------------------------------------------------------------
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography" to majordomo@metzdowd.com

Posted by iang at August 15, 2004 11:04 AM | TrackBack
Comments

FWIW, here's the little Perl program I wrote to convert the HEX back to binary so I could check the results. A real nerd could do this in one line.....


while (<>) {
@words = split(' ', $_);
foreach $w (@words) {
my $i;
for ($i = 0; $i < 4; $i++) {
$w =~ s/(..)//;
my $byte = $1;
print pack("H*", $byte);
}
}
}

I was originally flustered by the byte ordering coming out swapped in the above hex text, but that seems to be an artifact of their machine, not mine.

Posted by: Iang at August 15, 2004 11:25 AM

Greg Rose wrote, in summary of Crypto:

Eli Biham -- has collisions on 34 (out of 80) rounds of SHA-1, but can extend that to probably 46. Still nowhere near a break.

Antoine Joux -- his team announced the collision on SHA-0 earlier this week. There is concentration on the so-called "IF" function in the first 20 rounds... f(a,b,c) = (a & b) ^ (~a & c). That is, the bits of a choose whether to pass the bits from b, or c, to the result. The technique (and Eli's) depends on getting a "near collision" in the first block hashed, then using more near collisions to move the different bits around, finally using another near collision to converge after the fourth block hashed. This took 20 days on 160 Itanium processors. It was about 2^50 hash evaluations.

Xiaoyun Wang was almost unintelligible. But the attack works with "any initial values", which means that they can take any prefix, and produce collisions between two different suffixes. The can produce the first collision for a given initial value in less than an hour, and then can crank them out at about one every 5 minutes. It seems to be a straightforward differential cryptanalysis attack, so one wonders why no-one else came up with it. The attack on Haval takes about 64 tries. On MD4, about 4 tries. RIPE-MD, about 2 hours (but can improve it). SHA-0 about 2^40 (1000 times better than Joux).

Xuejia Lai clarified that the paper on E-print has been updated with correct initial values. They were initially byte-reversed, which they blamed on Bruce Schneier.

Greg.

Posted by: Greg Rose at August 18, 2004 05:14 AM

In the light of day and less inebriated, I'd like to clarify some of what I wrote last night, and maybe expand a bit. My original account wasn't what I'd like to think of as a record for posterity.

Greg.

At 13:11 2004-08-18 +1000, Greg Rose wrote:

> Xiaoyun Wang was almost unintelligible.


This was not meant as a criticism. It just meant you had to concentrate really hard. Her English is much better than my Chinese.

> But the attack works with "any initial values", which means that they can take any prefix, and produce collisions between two different suffixes. The can produce the first collision for a given initial value in less than an hour, and then can crank them out at about one every 5 minutes.


As mentioned previously, the idea is to produce a good "partial collision" with the first blocks input to the hash, and then from two similar starting points, find subsequent blocks that correct them back to the same output. So, for a given input chaining vector, it takes about an hour to get the partial collisions in the first input block. From there, they can compute subsequent "second blocks" to produce the collisions in a few minutes.

Note that they did two entirely new collisions for MD5 overnight, communicating back to China, when they found out about the endianness problems. No-one can now argue that they can't find collisions at will.

> It seems to be a straightforward differential cryptanalysis attack, so one wonders why no-one else came up with it.


With further hindsight, and Phil Hawkes' help, I understand now. The technique needs to alternate between single bit (xor) differences and subtractive differences, with careful conditioning of the bits affected to make sure the right carries do, or don't, propagate. This is explained in Phil's upcoming paper about SHA-2 cancellations, which was presented later in the rump session. That should be on e-print in the next couple of days. The Chinese team is also writing a much more detailed paper, but it will take longer.

There has been criticism about the Wang et. al paper that "it doesn't explain how they get the collisions". That isn't right. Note that from the incorrect paper to the corrected one, the "delta" values didn't change. Basically, if you throw random numbers in as inputs, in pairs with the specified deltas, you should eventually be able to create your own MD5 collisions for fun or profit. How they got this insight, we'll have to wait for... but the "method" is already there.

Greg.

Posted by: Greg Rose at August 18, 2004 12:06 PM

Jon Callas writes on one of the crypto lists:

I'm still at Crypto. SHA-1 is still safe. There have been a lot of unconfirmed reports about all sorts of things.

The bottom line is that SHA-1 is the most analyzed, still-safe hash function we have. That is also the bad news. There needs to be a lot more work on hash functions. However, none of the attacks we learned of this week apply to SHA-1.

Jon

Posted by: Jon Callas at August 19, 2004 02:21 PM