38

What's the probability of a collision (clash) for the MD5 algorithm? I believe it is extremely low.

Donald Duck
Adam Lee

2 Answers

44

You need to hash about 2^64 values to get a single collision among them, on average, if you don't try to deliberately create collisions. Hash collisions are very similar to the Birthday problem.

If you look at two arbitrary values, the collision probability is only 2^-128.
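To make the birthday-problem arithmetic concrete, here is a minimal sketch in Python. It assumes the hash behaves like a uniformly random 128-bit function (no deliberate attack) and uses the standard approximation p ≈ 1 - exp(-n(n-1) / 2^(bits+1)). The function name `collision_probability` is made up for illustration.

```python
import math

def collision_probability(n: int, bits: int = 128) -> float:
    """Approximate chance of at least one accidental collision among
    n values hashed by an ideal `bits`-bit hash (birthday approximation)."""
    space = 2.0 ** bits
    # Number of pairs is n(n-1)/2; each pair collides with probability 1/space.
    exponent = -(n * (n - 1)) / (2.0 * space)
    # -expm1(x) computes 1 - exp(x) without losing precision for tiny x.
    return -math.expm1(exponent)

if __name__ == "__main__":
    for n in (2, 10**6, 10**9, 2**64):
        print(f"n = {n}: p = {collision_probability(n):.3e}")
```

Under these assumptions, two values collide with probability 2^-128, a million values are still far below anything observable, and at n = 2^64 the probability reaches about 0.39. The number of pairs grows with n squared, which is why the probability scales quadratically with the number of hashed values.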

The problem with md5 is that it's relatively easy to craft two different texts that hash to the same value. But this requires a deliberate attack, and doesn't happen accidentally. And even with a deliberate attack it's currently not feasible to get a plain text matching a given hash.

In short, MD5 is safe for non-security purposes, but broken in many security applications.
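As an example of a non-security use, here is a minimal sketch that derives an identifier from content with Python's `hashlib`. The helper name `content_id` and the choice of a hex string as the ID are assumptions for illustration. It is only reasonable if the inputs come from a benign source, since MD5 collisions can be crafted deliberately.

```python
import hashlib

def content_id(data: bytes, algorithm: str = "md5") -> str:
    """Return a hex digest of `data` to use as a content-based ID.

    `algorithm` is any name accepted by hashlib.new, e.g. "md5" or "sha256".
    """
    return hashlib.new(algorithm, data).hexdigest()

if __name__ == "__main__":
    doc = b"example document body"
    print(content_id(doc))             # 32 hex characters (128 bits)
    print(content_id(doc, "sha256"))   # 64 hex characters (256 bits)
```

Keeping the algorithm as a parameter makes it easy to switch to SHA-256 later if the inputs can no longer be trusted.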

CodesInChaos
  • 1
    2^(n/2) as predicted by the birthday problem. – CodesInChaos Jan 13 '12 at 15:19
  • Given this information, is it suitable to create document IDs for a system containing millions of documents, based on the MD5 hash of each document's content? @CodesInChaos – SaidbakR Jun 07 '15 at 15:31
  • @sємsєм I'd rather use SHA256, but MD5 shouldn't be a problem as long as the documents are created by a benign party. – CodesInChaos Jun 07 '15 at 17:43
  • I prefer MD5 for performance reasons. I think MD5 is much quicker than SHA256, isn't it? @CodesInChaos – SaidbakR Jun 07 '15 at 20:38
  • 1
    @sємsєм It is faster, but even SHA-2 and SHA-3 can handle several hundred MB/s on a desktop CPU. If that's still not good enough, you can look at Skein or Blake2, which are almost as fast as MD5 while still being secure. | Alternatively if you can use a secret key, HMAC-MD5 is still relatively secure. – CodesInChaos Sep 18 '15 at 08:02
  • Great answer, thanks! – Boris Burkov Jan 06 '17 at 11:27
  • @Albert "that's 1 clash every X files" You can't really say it like that, because the probability scales quadratically with the number of files. – CodesInChaos Oct 18 '17 at 10:09
7

It generates a 128-bit value. The accidental clash rate should therefore be 2^-64 (because of the Birthday Paradox).

Jonathan Leffler
  • 3
    The collision probability becomes significant around 2^64 values, but the clash rate for two arbitrary values is only 2^-128. – CodesInChaos Jan 13 '12 at 15:22