
My application is running as a Windows service, and I'm attaching VS2013 to its process to debug it. I'm computing a hash code of the content of image files to check them for differences, using the following method (in a static class):

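```csharp
using System.IO;
using System.Text;
using System.Windows.Forms; // ToolTipIcon

static class FileUtils
{
    public static int GetFileHash(string filePath)
    {
        // Logger is the application's own logging class.
        Logger.WriteLog(ToolTipIcon.Info, "Calculating hash code for {0}", filePath);

        // Reads the binary image file as UTF-16 text and hashes the resulting string.
        using (StreamReader sr = new StreamReader(filePath, Encoding.Unicode))
        {
            return sr.ReadToEnd().GetHashCode();
        }
    }
}
```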

This has been working fine in production. However, this method always returns 2074746262 for two different images. I've tried to reproduce this in a WinForms app with the same code and images, and I can't. Is there something about debugging a process in VS2013 that would cause this behavior? I've replaced one of the images with an entirely different image, but it still happens.

Asked by Phil (edited by Marc)
  • What does the code that calls this look like? Is it threaded at all? – Liam Apr 28 '15 at 14:10
  • Who said `string.GetHashCode` will be unique? It depends on the day the code is executing, etc. – Sriram Sakthivel Apr 28 '15 at 14:13
  • `GetHashCode` is not appropriate for creating a digest for a large binary file. Use e.g. MD5 or SHA1 hashing instead. Google for "cryptographic hash". – stakx - no longer contributing Apr 28 '15 at 14:13
  • Are you running on an old version of .Net? [There was a bug with strings that contain \0](http://stackoverflow.com/q/6813263/7586). – Kobi Apr 28 '15 at 14:14
  • How to [Calculate MD5 checksum for a file](http://stackoverflow.com/questions/10520048/calculate-md5-checksum-for-a-file) – Liam Apr 28 '15 at 14:15
  • Looks like the bug I mentioned is still there, so it could be an issue. I'm not sure converting binary data to Unicode is a good idea. – Kobi Apr 28 '15 at 14:25
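Kobi's last point can be shown directly. The following is a minimal sketch using a hypothetical `DecodingDemo` class (not part of the question's code). `Encoding.Unicode` decodes bytes as little-endian UTF-16 and, by default, replaces invalid sequences such as unpaired surrogates with U+FFFD. As a result, byte arrays that differ can decode to the same string, and therefore to the same `GetHashCode()` value.

```csharp
using System.Text;

// Hypothetical helper, for illustration only.
static class DecodingDemo
{
    // Decodes raw bytes as little-endian UTF-16, which is what
    // new StreamReader(path, Encoding.Unicode) does when the file has no
    // byte order mark for a different encoding. Invalid UTF-16 (for example
    // an unpaired surrogate) is replaced with U+FFFD, so information is lost.
    public static string DecodeAsUnicode(byte[] bytes)
    {
        return Encoding.Unicode.GetString(bytes);
    }
}
```

For example, the bytes `00 DC 41 00` and `01 DC 41 00` differ, but each decodes to U+FFFD followed by `A`.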

4 Answers


First of all, you should be aware that you are using GetHashCode incorrectly, for two reasons:

  1. Hash codes are not unique; they are merely very well distributed. There are a finite number of hash codes and an infinite number of binary strings, so it is impossible to generate a unique hash code per string.

  2. The details of the hash code algorithm are explicitly not documented, and they will change for reasons that seem irrelevant to you. In particular, this is not the first time I've seen it reported that string.GetHashCode() changes behavior when running under a debugger:

string.GetHashCode() returns different values in debug vs release, how do I avoid this?
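Point 1 is the pigeonhole principle. The following is a minimal sketch of it using a hypothetical `CollisionDemo` class. Keeping only the low 16 bits of `string.GetHashCode()` leaves 65,536 possible values, so among 65,537 distinct strings at least two must share a value. The full 32-bit hash code has the same property; it just takes more strings to hit a collision. Which pair collides depends on the runtime's hash algorithm. On .NET Core and later, string hashing is randomized per process, so the pair can differ from run to run, which also illustrates point 2.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, for illustration only.
static class CollisionDemo
{
    // Keeps only the low 16 bits of string.GetHashCode(), so there are just
    // 65,536 possible values. By the pigeonhole principle, 65,537 distinct
    // strings must contain at least two that share a truncated hash.
    public static (string First, string Second) FindTruncatedCollision()
    {
        var seen = new Dictionary<int, string>();
        for (int i = 0; i <= 65536; i++)          // 65,537 candidates
        {
            string s = "image-" + i;              // every candidate is distinct
            int h = s.GetHashCode() & 0xFFFF;     // low 16 bits only
            if (seen.TryGetValue(h, out string earlier))
            {
                return (earlier, s);              // two different strings, same truncated hash
            }
            seen[h] = s;
        }
        // Unreachable: 65,537 strings cannot fit into 65,536 values without a repeat.
        throw new InvalidOperationException("No collision found");
    }
}
```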


Having said that, it seems a bit unusual that three different binary strings would hash differently in the same run-time environment just depending on having a debugger attached. Beyond the general advice not to rely on GetHashCode the way you are, my next guess is that you're not hashing what you think you're hashing. I would dump the binary data itself to disk before hashing it and confirm that you really do have different binary strings.
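The check suggested above can also be done without dumping anything: compare the two files' raw bytes directly. The following is a minimal sketch using a hypothetical `FileCompare` class. It assumes the images are small enough to load fully into memory, and it never decodes the bytes as text.

```csharp
using System.IO;
using System.Linq;

// Hypothetical helper, for illustration only.
static class FileCompare
{
    // Compares the raw bytes of two files, with no text decoding involved.
    public static bool HaveSameContent(string pathA, string pathB)
    {
        byte[] a = File.ReadAllBytes(pathA);
        byte[] b = File.ReadAllBytes(pathB);
        // LINQ SequenceEqual: true only if both arrays have the same length
        // and the same byte at every index.
        return a.SequenceEqual(b);
    }
}
```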

Answered by Michael Edenfield (edited by Community)
  • I've done the dump and yes, they are different. The GetHashCode usage is legacy code. I guess I might update it to use SHA or MD5, as has been suggested. – Phil Apr 28 '15 at 14:16

The documentation explicitly calls this out: don't rely on String.GetHashCode being unique. Your assumption is wrong.

> If two string objects are equal, the GetHashCode method returns identical values. However, there is not a unique hash code value for each unique string value. Different strings can return the same hash code.
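The guarantee quoted above makes a hash code useful only as a one-way filter: different hash codes prove the strings differ, but equal hash codes still require a full comparison. The following is a minimal sketch using a hypothetical `HashContract.AreEqual` helper. Hash-based collections such as `Dictionary<TKey,TValue>` follow the same pattern, comparing hash codes first and then calling `Equals`.

```csharp
using System;

// Hypothetical helper, for illustration only.
static class HashContract
{
    // Equal strings always have equal hash codes, so a hash mismatch proves the
    // strings differ. A hash match proves nothing on its own.
    public static bool AreEqual(string a, string b)
    {
        if (a.GetHashCode() != b.GetHashCode())
        {
            return false; // definitely different
        }
        // Hashes match: the strings may still differ, so compare the content.
        return string.Equals(a, b, StringComparison.Ordinal);
    }
}
```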

Answered by Sriram Sakthivel

Instead of GetHashCode, which is definitely not going to be unique across all images, use MD5 or a similar cryptographic hash, as described here:

https://msdn.microsoft.com/en-us/library/s02tk69a%28v=vs.110%29.aspx
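The following is a minimal sketch of that approach using a hypothetical `FileDigest` class. It hashes the file's raw bytes with `MD5.Create()` and `HashAlgorithm.ComputeHash(Stream)` from `System.Security.Cryptography`, so no text decoding is involved. MD5 is adequate for spotting accidental changes, but it is not collision-resistant against deliberate tampering. If that matters, `SHA256.Create()` is a drop-in replacement.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper, for illustration only.
static class FileDigest
{
    // Computes the MD5 digest of a file's raw bytes and returns it as lowercase hex.
    public static string ComputeMd5(string filePath)
    {
        using (MD5 md5 = MD5.Create())
        using (FileStream stream = File.OpenRead(filePath))
        {
            byte[] digest = md5.ComputeHash(stream); // 16 bytes
            // "90-01-50-..." -> "900150..."
            return BitConverter.ToString(digest).Replace("-", "").ToLowerInvariant();
        }
    }
}
```

To compare two images, compare `FileDigest.ComputeMd5(pathA)` with `FileDigest.ComputeMd5(pathB)` instead of using `GetFileHash`. Unlike `string.GetHashCode()`, an MD5 digest is the same across processes, machines, and .NET versions, so it can also be stored and compared later.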

Answered by jamespconnor

Using GetHashCode to check for uniqueness will never work: there is no guarantee that every distinct object will produce a distinct hash code.

Answered by Jamiec