18

Consider the following code:

struct Vec2 : IEquatable<Vec2>
{
    double X,Y;

    public bool Equals(Vec2 other)
    {
        return X.Equals(other.X) && Y.Equals(other.Y);
    }

    public override bool Equals(object obj)
    {
        if (obj is Vec2)
        {
            return Equals((Vec2)obj);
        }
        return false;
    }

    // this will return the same value when X, Y are swapped
    public override int GetHashCode()
    {
        return X.GetHashCode() ^ Y.GetHashCode();
    }

}

Beyond the conversation of comparing doubles for equality (this is just demo code), what I am concerned with is that there is a hash clash when X, Y values are swapped. For example:

Vec2 A = new Vec2() { X=1, Y=5 };
Vec2 B = new Vec2() { X=5, Y=1 };

bool test1 = A.Equals(B);  // returns false;
bool test2 = A.GetHashCode() == B.GetHashCode() // returns true !!!!!

which should wreck havoc in a dictionary collection. So the question is how to property form the GetHashCode() function for 2,3 or even 4 floating point values such that the results are not symmetric and the hashes don't clash.

Edit 1:

Point implements the inappropriate x ^ y solution, and PointF wraps ValueType.GetHashCode().

Rectangle has a very peculiar (((X ^ ((Y << 13) | (Y >> 19))) ^ ((Width << 26) | (Width >> 6))) ^ ((Height << 7) | (Height >> 25))) expression for the hash code, which seems to perform as expected.

Edit 2:

'System.Double' has a nice implementation as it does not consider each bit equally important

public override unsafe int GetHashCode() //from System.Double
{
    double num = this;
    if (num == 0.0)
    {
        return 0;
    }
    long num2 = *((long*) &num);
    return (((int) num2) ^ ((int) (num2 >> 32)));
}
John Alexiou
  • 23,931
  • 6
  • 67
  • 123
  • 4
    It is fine to aim to *minimise* collisions, but your code **must** expect them; they will always happen – Marc Gravell Mar 07 '11 at 15:26
  • Hashes *will* clash - `int` has a smaller range of possible values than `double`, and it's even smaller when compared to `double`x`double`. Also, you might want to consider a looser equality comparison - due to rounding, two float values might be very close to one another (what anyone doing a comparison "by eye" would consider equal), but still won't be exactly equal. – Damien_The_Unbeliever Mar 07 '11 at 15:34
  • 4
    It's not just that (A,B) collides with (B,A). Thanks to the fact that X^X --> 0 all (C,C) collides with all (D,D), and that's a much larger space of collisions. – Eric Lippert Mar 07 '11 at 16:08
  • 1
    possible duplicate of [Create a hashcode of two numbers](http://stackoverflow.com/questions/892618/create-a-hashcode-of-two-numbers) – nawfal Apr 14 '13 at 20:50

5 Answers5

21

Jon skeet has this covered:

What is the best algorithm for an overridden System.Object.GetHashCode?

   public override int GetHashCode()
   {
       unchecked // Overflow is fine, just wrap
       {
           int hash = 17;
           // Suitable nullity checks etc, of course :)
           hash = hash * 23 + X.GetHashCode();
           hash = hash * 23 + Y.GetHashCode();
           return hash;
       }
   }

Also, change your Equals(object) implementation to:

return Equals(obj as FVector2);

Note however that this could perceive a derived type to be equal. If you don't want that, you'd have to compare the runtime type other.GetType() with typeof(FVector2) (and don't forget nullity checks) Thanks for pointing out it's a struct, LukH

Resharper has nice code generation for equality and hash code, so if you have resharper you can let it do its thing

Community
  • 1
  • 1
Ohad Schneider
  • 33,142
  • 10
  • 150
  • 190
  • 2
    The type in question is a struct so the `Equals(object)` implementation can't be changed to use `as`. There's nothing wrong with the OP's current `Equals(object)` method. – LukeH Mar 07 '11 at 15:32
  • This may work well, but if you look into the calculation for `Double.GetHashCode()` you will notice that a more careful consideration of important bits is used, and I would like to follow suit. – John Alexiou Mar 07 '11 at 15:45
  • Thanks for the reply, as I have implemented a version of this for many structures since. – John Alexiou Apr 20 '13 at 18:31
  • Glad to be of assistance ! – Ohad Schneider Apr 20 '13 at 22:15
7

Hash collisions don't wreak havoc in a dictionary collection. They'll reduce the efficiency if you're unlucky enough to get them, but dictionaries have to cope with them.

Collisions should be rare if at all possible, but they're don't mean the implementation is incorrect. XORs are often bad for the reasons you've given (high collisions) - ohadsc has posted a sample I gave before for an alternative, which should be fine.

Note that it would be impossible to implement Vec2 with no collisions - there are only 232 possible return values from GetHashCode, but there are rather more possible X and Y values, even after you've removed NaN and infinite values...

Eric Lippert has a recent blog post on GetHashCode which you may find useful.

Jon Skeet
  • 1,261,211
  • 792
  • 8,724
  • 8,929
  • 1
    Mind you that when the fields are floating point numbers the bits are not dense, but follow certain patterns given reasonable numbers for the structure to hold. Hence the xor operation should suffice if it avoids symmetry in most significant digits. Does this make sense? – John Alexiou Mar 07 '11 at 15:37
  • 1
    @ja72: Possibly. It depends on the real world data. If they tend to actually be integers, it will still be nasty. – Jon Skeet Mar 07 '11 at 15:45
1

What are reasonable bounds for the coordinates?

Unless it can be all possible integer values you could simply:

const SOME_LARGE_NUMBER=100000; return SOME_LARGE_NUMBER * x + y;

KristoferA
  • 11,859
  • 1
  • 34
  • 60
  • This is really the best if you know the bounds. You can use it for `int`s too with `int.MaxValue` with `unchecked` clause – nawfal Apr 14 '13 at 20:48
0

The hash codes approach works for interger coordinates but is not recommended for floating point values. With floating point coordinates one can create a point-set/pool by using a sorted sequence structure.

A sorted sequence is a leaf version balanced binary tree.

Here the keys would be the point coordinates.

mszlazak
  • 61
  • 1
  • 6
  • 1
    Can you show some code explaining what you mentioned. I am having a hard time visualizing it. I understand that floating point precision would make `GetHashCode()` non deterministic, but I do not understand how you propose to get around this. – John Alexiou Jan 08 '15 at 14:36
  • Forget about GetHashCode(), you don't use hashing. Just used a balanced binary tree with an xy-comparator. – mszlazak Jan 08 '15 at 15:31
  • I often need to implement `IEquatable<>` and thus hashing is required to override. Can you please show some code so we can understand what you are talking about. – John Alexiou Jan 08 '15 at 15:43
  • If you have to use GetHashCode then i cannot help you. In my understanding, one has two choices with dictionaries. Either binary search trees as a navigation structure to a list or hashing to entries in a list. Which one chooses is based on the type of keys you are working with. As you said, floating point does not work well when using hashing. Hashing is a less general approach than the one that uses a comparator with a search tree. Search for data structures called "sorted sequences" and look up leaf or exogenous versions of balanced binary trees. – mszlazak Jan 08 '15 at 16:31
  • 1
    Btw you can convert to integers using a snap grid then get hashing without collisions by using one of these functions. http://dmauro.com/post/77011214305/a-hashing-function-for-x-y-z-coordinates – mszlazak Jan 08 '15 at 18:10
  • Here is a thread on which hash function is best for uniqueness and speed but it looks like they tested only strings. Maybe worth looking at. http://programmers.stackexchange.com/questions/49550/which-hashing-algorithm-is-best-for-uniqueness-and-speed. Ho – mszlazak Jan 14 '15 at 05:31
  • Since keys maybe any set of fixed length bit-strings, they maybe floating point numbers as well. With the minor modifications of complimenting the most significant bit for non-negative floats, complimenting all bits for negative floats, then these numbers can be simply dealt with since their order is preserved when their bit representation is interpreted lexicographically [IEEE 2008] – mszlazak Jan 14 '15 at 06:40
  • 1
    Mapping two integers to one, in a unique and deterministic way. http://stackoverflow.com/questions/919612/mapping-two-integers-to-one-in-a-unique-and-deterministic-way – mszlazak Jan 14 '15 at 07:39
0

If size of your hash code is lesser than size of your struct, then clashes are inevitable anyways.

arrowd
  • 30,130
  • 7
  • 72
  • 96
  • But clashing with (X=1, Y=5) and (X=5, Y=1) for example is unacceptable. A clash with (3.287e308, -7,.228e175) and its symmetric pair is far more acceptable. – John Alexiou Mar 07 '11 at 15:51