66

I am trying to create a quick hash code function for a complex number class (a + bi) in C#.

I have repeatedly seen the a.GetHashCode() ^ b.GetHashCode() method, but this gives the same hash code for (a, b) and (b, a).

Is there a standard algorithm for this, and are there any functions in the .NET Framework to help?

Masoud
JDunkerley
  • http://stackoverflow.com/questions/682438/hash-function-providing-unique-uint-from-an-integer-coordinate-pair/682617#682617 – Pete Kirkham May 22 '09 at 11:42

6 Answers

89

My normal way of creating a hashcode for an arbitrary set of hashable items:

int hash = 23;
hash = hash * 31 + item1Hash;
hash = hash * 31 + item2Hash;
hash = hash * 31 + item3Hash;
hash = hash * 31 + item4Hash;
hash = hash * 31 + item5Hash;
// etc

In your case item1Hash could just be a, and item2Hash could just be b.

The values of 23 and 31 are relatively unimportant, so long as they're primes (or at least coprime).

Obviously there will still be collisions, but you don't run into the normal nasty problems of:

hash(a, a) == hash(b, b)
hash(a, b) == hash(b, a)

If you know more about what the real values of a and b are likely to be you can probably do better, but this is a good initial implementation which is easy to remember and implement. Note that if there's any chance that you'll build the assembly with "check for arithmetic overflow/underflow" ticked, you should put it all in an unchecked block. (Overflow is fine for this algorithm.)
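As a minimal sketch of the pattern above applied to the question's case, assuming the two parts are plain ints: the type name ComplexInt and its members A and B are hypothetical, chosen only for illustration. The unchecked block lets the multiplication wrap around silently even when the assembly is built with overflow checking.

```csharp
using System;

// Hypothetical two-int value type used only to illustrate the pattern.
public readonly struct ComplexInt : IEquatable<ComplexInt>
{
    public int A { get; }
    public int B { get; }

    public ComplexInt(int a, int b)
    {
        A = a;
        B = b;
    }

    public bool Equals(ComplexInt other) => A == other.A && B == other.B;

    public override bool Equals(object obj) => obj is ComplexInt other && Equals(other);

    public override int GetHashCode()
    {
        // Overflow is expected and harmless here, so suppress overflow checks.
        unchecked
        {
            int hash = 23;
            hash = hash * 31 + A;
            hash = hash * 31 + B;
            return hash;
        }
    }
}
```

With these constants, (1, 2) hashes to 22136 and (2, 1) to 22166, so swapping the two parts changes the result.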

Jon Skeet
  • "so long as they're primes (or at least coprime)" - I'd have thought that the initial state can be anything (if multiples of 31 are bad, then what happens when you hit such a value by chance part way through the calculation?). Then the multiplier just needs to be odd (to avoid early values having no effect on the low bits of the result). And not 1, to avoid commutativity. Am I completely missing the point? – Steve Jessop May 21 '09 at 12:31
  • I wouldn't like to say I've followed the logic for why the numbers should be co-prime - but that's the advice I've always seen given for this pattern by people wiser than myself (such as Josh Bloch). – Jon Skeet May 21 '09 at 12:45
  • Can't blame you for doing what your boss's boss's boss tells you ;-) Here's him writing a hashCode method with initial state 0, although only as part of an example illustrating something completely different: http://209.85.229.132/search?q=cache:3H-Bb8E4sDEJ:developers.sun.com/learning/javaoneonline/2007/pdf/TS-2689.pdf+josh+bloch+hashcode+coprime&cd=3&hl=en&ct=clnk&gl=uk&client=firefox-a. Maybe I should just read Effective Java... – Steve Jessop May 22 '09 at 10:54
  • For the OP's specific case of 2 integer values, this is only an optimal solution if all values are equally probable. If the values are skewed toward, say, smaller numbers (e.g. autogenerated primary key values), this solution is more likely to produce collisions than the answer from @Noldorin. – Eric J. Aug 07 '12 at 21:13
  • Why not starting with `int hash = 713 + item1Hash;` instead since you will always be multiplying 23 with 31 ? – Winter Apr 12 '17 at 14:30
  • @Winter: For simplicity of reading - it keeps everything consistent. – Jon Skeet Apr 12 '17 at 15:13
  • I think this shouldn't be the best answer, because when hashing a vector of 3 ints (x, y, z) it collides very soon: (0, 1, 0) and (0, 0, 31) have the same hash, 685224. – Herrgott Aug 13 '18 at 16:11
  • @Herrgott: Note: "If you know more about what the real values of a and b are likely to be you can probably do better". Your example falls into that situation IMO. – Jon Skeet Aug 13 '18 at 17:16
  • I stumbled upon this while looking up how to do this very thing. I then saw one of those VS Quick Tips that was actually helpful and pointed me to the fact that System.HashCode.Combine now exists for .NET Core 2.1 and up. It is overloaded to take up to 8 generic parameters and combines them for you. – Scott Nov 24 '20 at 15:44
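
For readers on .NET Core 2.1 or later, the System.HashCode.Combine approach mentioned in the last comment might look like the sketch below. The type name ComplexCombined and its members are hypothetical. Note that HashCode seeds its values randomly per process, so the numbers are stable within one run but should not be persisted or compared across runs.

```csharp
using System;

// Hypothetical two-int value type; only GetHashCode differs from a hand-rolled version.
public readonly struct ComplexCombined : IEquatable<ComplexCombined>
{
    public int A { get; }
    public int B { get; }

    public ComplexCombined(int a, int b)
    {
        A = a;
        B = b;
    }

    public bool Equals(ComplexCombined other) => A == other.A && B == other.B;

    public override bool Equals(object obj) => obj is ComplexCombined other && Equals(other);

    // HashCode.Combine mixes the values in argument order, so the result depends on order.
    public override int GetHashCode() => HashCode.Combine(A, B);
}
```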
15

Here's a possible approach that takes into account order. (The second method is defined as an extension method.)

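public override int GetHashCode()
{
    // Rotate b's hash by 16 bits so that (a, b) and (b, a) hash differently.
    // GetHashCode() returns int, so cast to uint for the rotation and back again.
    unchecked
    {
        return a.GetHashCode() ^ (int)((uint)b.GetHashCode()).RotateLeft(16);
    }
}

// Extension method: must be declared inside a static class.
public static uint RotateLeft(this uint value, int count)
{
    return (value << count) | (value >> (32 - count));
}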

It would certainly be interesting to see how the Complex class of .NET 4.0 does it.
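
On .NET Core 3.0 and later, a hand-written rotate helper may be unnecessary, since System.Numerics.BitOperations.RotateLeft provides the same operation. A small sketch of the rotate-and-XOR idea using it follows; the class name RotateHash and the method name Combine are hypothetical, and the inputs are assumed to be ints.

```csharp
using System.Numerics;

// Hypothetical helper illustrating rotate-and-XOR with the built-in rotate.
public static class RotateHash
{
    public static int Combine(int a, int b)
    {
        // The casts between int and uint reinterpret the bits, so disable overflow checks.
        unchecked
        {
            return a ^ (int)BitOperations.RotateLeft((uint)b, 16);
        }
    }
}
```

For example, Combine(1, 2) gives 131073 while Combine(2, 1) gives 65538, so the order of the arguments matters.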

Noldorin
  • This is the best answer if the integer values are skewed, e.g. if they tend to be on the small side because they are autogenerated primary keys in a database. The calls to a.GetHashCode() and b.GetHashCode() are not necessary, as they will just return the values of a and b respectively (I believe this is a current implementation detail rather than documented behavior). – Eric J. Aug 07 '12 at 21:14
  • The calls to GetHashCode() certainly are needed if a and b are anything other than `int` (such as `uint`) because of the return type of GetHashCode() on the containing class. – Neo Nov 30 '12 at 02:27
  • Nice solution; I like the fact that this keeps everything ordered and predictable. This seems like a 'correct and complete' solution. One comment though: the extension method didn't work, in the sense that the compiler wasn't smart enough to coerce the `int` value of `GetHashCode()` to a `uint`. I didn't want to do the edits to your code, cuz it seemed like it would just add noise. – Paul d'Aoust Apr 28 '14 at 16:08
  • In case you are interested, here is the source code of [`Complex.GetHashCode()`](http://referencesource.microsoft.com/#System.Numerics/System/Numerics/Complex.cs,8d39d13a80a9cb18) – Default Apr 16 '17 at 19:05
11

One standard way is this:

hashcode = 23
hashcode = (hashcode * 37) + v1
hashcode = (hashcode * 37) + v2

23 and 37 are coprime, but you can use other numbers as well.

Lasse V. Karlsen
  • This algorithm is simple to implement, but beware that you can easily have collisions with it: (v1=2, v2=1) will collide with (v1=1, v2=38) for instance – Eino Gourdin Oct 01 '20 at 15:52
5

@JonSkeet gives a fair, general-purpose algorithm for computing a hash code from n hash codes, but it assumes you already know which members of an object need to be hashed and what to do about null members, and it omits an implementation for n arbitrary items. So we expand upon his answer:

  1. Only public, immutable properties and fields should contribute to an object's hash code. They should be public (or isomorphic to the public surface), since we should be able to count on two objects with the same visible surface having the same hash code (hinting at the relationship between object equality and hash code equality), and they should be immutable, since an object's hash code should never change during its lifetime (otherwise you might end up with an object in the wrong slot of a hash table!).
  2. null members should hash as a constant, such as 0.
  3. @JonSkeet's algorithm is a textbook example of applying the functional programming higher-order function usually called fold (Aggregate in C# LINQ), where 23 is our seed and <hash accumulator> * 31 + <current item hash> is our folding function:

In F#

let computeHashCode items =
    items
    |> Seq.map (fun item -> if item = null then 0 else item.GetHashCode())
    |> Seq.fold (fun hash itemHash -> hash * 31 + itemHash) 23

In C#

Func<IEnumerable<Object>, int> computeHashCode = items =>
    items
    .Select(item => item == null ? 0 : item.GetHashCode())
    .Aggregate(23, (hash, itemHash) => hash * 31 + itemHash);

Stephen Swensen
5

What about this:

(a.GetHashCode() + b).GetHashCode()

Gives you a different code for (a,b) and (b,a) plus it's not really that fancy.

Welbog
1

It all depends on what you're trying to achieve. If the hashes are meant for hash-based structures such as Dictionary, you have to balance collision rate against hashing speed: a perfect hash with no collisions at all takes more time to compute, while the fastest hashing algorithms tend to produce more collisions. Finding the right balance is the key here. You should also take into consideration how large your effective hash can be, and whether the hash needs to be reversible. Noldorin's approach gives you a perfect hash (read: no collisions) if the real and imaginary parts of your complex number are always positive. It will do even for negative numbers if you're OK with rare collisions. But I'm concerned about the range of values it can yield, which is quite big for my taste.
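
To make the "perfect hash" and "reversible" remarks concrete, here is a hedged sketch under a stronger assumption than the paragraph states: both parts are non-negative and each fits in 16 bits. Under that assumption, rotating one part by 16 bits and XOR-ing is the same as packing the two values side by side in a 32-bit int, which is collision-free and can be undone. The names PackedHash, Pack and Unpack are hypothetical.

```csharp
// Hypothetical helper: packs two 16-bit values into one int and back.
public static class PackedHash
{
    public static int Pack(ushort a, ushort b)
    {
        // b occupies the high 16 bits, a the low 16 bits; the cast may set the sign bit.
        return unchecked((int)(((uint)b << 16) | a));
    }

    public static (ushort A, ushort B) Unpack(int hash)
    {
        uint bits = unchecked((uint)hash);
        return ((ushort)(bits & 0xFFFF), (ushort)(bits >> 16));
    }
}
```

For example, Pack(3, 5) yields 327683, and Unpack(327683) returns (3, 5). Once either part needs more than 16 bits, distinct pairs can map to the same int and the hash is no longer perfect.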

If you're after perfect hashes (out of academic or research interest) that work even for negative numbers, you can see this solution (and an array of other solutions in the same thread). In my tests, it is faster and utilizes space better than any other I have seen.

Community
nawfal