In my Perl application, I need to compare two versions of a file and detect whether they have changed.

I'm trying to choose between MD5 and SHA. This is not about security; it is about the fastest way to compare files. I'm inclined towards MD5.

However, when I ran benchmarks, they suggested otherwise.

Any recommendations?

Here's a benchmark I ran with the largest file in my app.

Benchmark: timing 10000000 iterations of MD5, SHA...
   MD5: -0.199009 wallclock secs ( 0.07 usr +  0.01 sys =  0.08 CPU) @ 125000000.00/s (n=10000000)
        (warning: too few iterations for a reliable count)
   SHA: 0.494412 wallclock secs ( 0.06 usr +  0.00 sys =  0.06 CPU) @ 166666666.67/s (n=10000000)
        (warning: too few iterations for a reliable count)
       Rate  MD5  SHA
MD5 125000000/s   -- -25%
SHA 166666667/s  33%   --
  • `warning: too few iterations for a reliable count` is significant. It might help if you posted the benchmark code. It might be interesting to note that Git uses SHA1 as its means of detecting changes to files under its control. – DavidO Jan 02 '14 at 18:52
  • `my $results = timethese(10000000,{ 'SHA' => &hashsha, 'MD5' => &hashmd5, }); cmpthese($results); sub hashsha{ my $sha = new Digest::SHA( 256 ); $sha->addfile( $file, "b" ); return $sha->hexdigest(); ## do an eq of 2 file checksums } sub hasmd5{ if ( open( my $fh, "addfile( $fh ); return $md5->hexdigest(); } ## do an eq of 2 file checksums }` – user3154696 Jan 10 '14 at 13:01
  • Basically my method does an eq of 2 checksums from 2 files – user3154696 Jan 10 '14 at 13:01
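
One thing worth checking in the snippet posted above: in Perl, `&hashsha` without a leading backslash calls the sub once while the hash is being built, so `timethese` receives its return value rather than a code reference. That would be consistent with the implausible rates and the "too few iterations" warning, though it is only a guess from the snippet as posted. Below is a minimal sketch of the benchmark with code references, not the original code. The 1 MiB temp file and the names `hash_md5` and `hash_sha` are assumptions for illustration.

```perl
use strict;
use warnings;
use Benchmark qw(timethese cmpthese);
use Digest::MD5;
use Digest::SHA;
use File::Temp qw(tempfile);

# Hypothetical test input: a 1 MiB temp file standing in for the real file.
my ($tmp_fh, $file) = tempfile(UNLINK => 1);
binmode $tmp_fh;
print {$tmp_fh} 'x' x (1024 * 1024);
close $tmp_fh or die "close failed: $!";

sub hash_md5 {
    open my $fh, '<', $file or die "Cannot open $file: $!";
    binmode $fh;
    my $digest = Digest::MD5->new->addfile($fh)->hexdigest;
    close $fh;
    return $digest;
}

sub hash_sha {
    # 256 selects SHA-256, as in the posted snippet.
    my $sha = Digest::SHA->new(256);
    $sha->addfile($file, 'b');    # 'b' reads the file in binary mode
    return $sha->hexdigest;
}

# Pass code references (\&name). A bare &name would call the sub once,
# right here, and hand its return value to timethese instead.
# A negative count means "run each entry for at least that many CPU seconds".
my $results = timethese(-1, {
    MD5 => \&hash_md5,
    SHA => \&hash_sha,
});
cmpthese($results);
```

Because the file is read on every iteration, the numbers include I/O (most likely served from the OS cache after the first pass), so they show the relative cost of the whole operation rather than of the hash function alone.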

1 Answer

MD5 may be faster to compute than SHA1 because of its simpler structure. Then again, reading the data from disk will be slower than updating either an MD5 or a SHA1 checksum as the data streams in, so the choice will not really matter in practice.
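
Since the read dominates, the cheapest win is usually to avoid reading at all when possible. As a rough sketch (the helper names `file_md5` and `files_differ` are made up for illustration), one could compare sizes first and only hash when the sizes match:

```perl
use strict;
use warnings;
use Digest::MD5;

# Hypothetical helper: hex MD5 of a file's contents.
sub file_md5 {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    binmode $fh;
    my $digest = Digest::MD5->new->addfile($fh)->hexdigest;
    close $fh;
    return $digest;
}

# Hypothetical helper: returns true if the two files differ.
sub files_differ {
    my ($old, $new) = @_;

    my $old_size = (stat $old)[7];
    my $new_size = (stat $new)[7];
    die "Cannot stat $old or $new: $!"
        unless defined $old_size && defined $new_size;

    # Different sizes mean different contents; no read is needed.
    return 1 if $old_size != $new_size;

    # Same size: fall back to comparing checksums.
    return file_md5($old) ne file_md5($new);
}
```

Called as `files_differ($old_path, $new_path)`, this reads both files only when their sizes match. If the checksum of the previous version is stored somewhere, only the new file has to be read.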

Joni
  • Indeed. If it takes 5 full seconds to read the data from disk, does it _really_ matter whether the hashing algorithm is a few milliseconds slower than it could be? – Dave Sherohman Jan 02 '14 at 19:34
  • Being optimistic about disk reads, I'm a little inclined towards MD5, as it is recommended. But I was just wondering why SHA benchmarked better than MD5. – user3154696 Jan 10 '14 at 13:04