or "Why do the Sun/Oracle guys force us to override both equals() and hashCode() every time?"

Everyone knows that, if you override equals() or hashCode() of an object, you have to override the other one, too, because there is a contract between those two:

Note that it is generally necessary to override the hashCode method whenever this method [i.e. equals()] is overridden, so as to maintain the general contract for the hashCode method, which states that equal objects must have equal hash codes. -- API doc of Object.equals()

Why isn't equals() implemented like this in the Object class?

public boolean equals(Object obj) {
    return this.hashCode() == obj.hashCode();
}

If they did this, it would have saved the rest of the world from having to implement both methods. It would be enough to override only hashCode().

I guess the guys had some good reason not to do this. I just cannot see it - please clear this up for me.

Francois Bourgeois
  • 3
    Perhaps because of this: if two objects are equal, their hashcodes will be equal, but if two objects' hashcodes are equal, that is not enough information to guarantee that the objects themselves are equal because of this fact: hashes can collide. – Jrop Aug 26 '13 at 14:07
  • 2
    Different objects can still have the same `hashcode()`. There are only `2^32` `int`s. – Daniel Fischer Aug 26 '13 at 14:07
  • 3
    Common logic error: `x -> y` does not mean that `y -> x`. – asteri Aug 26 '13 at 14:10
  • 1
    Incidentally a much shorter version of your code would be `this.hashCode() == obj.hashCode()`. It would also avoid auto-boxing (and it would still be incorrect ;-)) – Joachim Sauer Aug 26 '13 at 14:14
  • @Joachim thx - i changed it – Francois Bourgeois Aug 26 '13 at 14:19

3 Answers

If a.equals(b) returns true then a.hashCode() == b.hashCode() must evaluate to true.

The opposite is not true! It's perfectly valid to have two objects where a.hashCode() == b.hashCode() is true, but a.equals(b) is false.

In fact, that's necessary. There are 2^32 possible return values for hashCode(). At any given moment a JVM can hold more than 2^32 objects (given there's enough memory, which is quite possible these days). Assuming that none of the objects are equal to each other (easy to do, just let them be "s1", "s2", ...), you're bound to have a hash code collision (see the pigeonhole principle).
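A collision does not require billions of objects. As a minimal sketch, the two strings below (the pair mentioned in the comments on this answer) are unequal yet share a hash code. The class name `HashCollisionDemo` and the helper `hashOnlyEquals` are made up for illustration; the helper mirrors the equals() proposed in the question.

```java
// Two unequal strings that share a hash code, so comparing hash codes
// alone would wrongly report them as equal.
class HashCollisionDemo {

    // Hypothetical helper: the equals() proposed in the question.
    static boolean hashOnlyEquals(Object a, Object b) {
        return a.hashCode() == b.hashCode();
    }

    public static void main(String[] args) {
        String a = "FB";
        String b = "Ea";
        // String.hashCode() is documented as s[0]*31^(n-1) + ... + s[n-1]:
        // 'F'*31 + 'B' = 70*31 + 66 = 2236 and 'E'*31 + 'a' = 69*31 + 97 = 2236.
        System.out.println(a.hashCode() == b.hashCode()); // true
        System.out.println(a.equals(b));                  // false
        System.out.println(hashOnlyEquals(a, b));         // true, which is wrong
    }
}
```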

In fact, this is the simplest possible hashCode implementation that's correct (but otherwise terribly bad) for every class*:

public int hashCode() {
  return 0;
}

It magically fulfills all the requirements of the general hashCode() contract.

* except for those classes that have a defined and documented hashCode algorithm that they must implement, the prime example being String.hashCode().
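To see that a constant hash code is legal (if slow), here is a small sketch with a made-up class `ConstantHashPoint`. Every instance lands in the same bucket, so a HashSet has to fall back on equals() to tell elements apart, but the results are still correct.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical value class whose hashCode() is constant.
class ConstantHashPoint {
    final int x;
    final int y;

    ConstantHashPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof ConstantHashPoint)) return false;
        ConstantHashPoint p = (ConstantHashPoint) o;
        return x == p.x && y == p.y;
    }

    // Equal objects trivially get equal hash codes, so the contract holds.
    @Override
    public int hashCode() {
        return 0;
    }

    // Counts distinct points by putting them into a HashSet.
    static int distinctCount(ConstantHashPoint... points) {
        Set<ConstantHashPoint> set = new HashSet<>(Arrays.asList(points));
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctCount(
                new ConstantHashPoint(1, 2),
                new ConstantHashPoint(1, 2),
                new ConstantHashPoint(3, 4))); // 2
    }
}
```

The price is performance: with all elements in one bucket, lookups no longer run in constant time.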

Joachim Sauer
  • Good point, but the documentation of hashCode() also tells us this: "However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hashtables." But this recommendation is not consistent with the standard implementation. If your class can have more than 2^32 different instances, you can still override equals(). – Francois Bourgeois Aug 26 '13 at 14:14
  • An example with strings is `"FB"` and `"Ea"`. – arshajii Aug 26 '13 at 14:14
  • 1
    @FrancoisBourgeois: yes, and that's why the implementation I showed above is a *bad idea*. And producing a [perfect hash](https://en.wikipedia.org/wiki/Perfect_hash_function) isn't always easy, even in those cases where it's possible. So *assuming* that it's always implemented, is a pretty risky move. – Joachim Sauer Aug 26 '13 at 14:16
Joachim is correct, but there's another reason: Efficiency.

Calculating a hash code can be expensive, and this effort would be unnecessarily incurred if equals() was invoked, but hashCode() never was.

This situation is common: only hash-based classes like Hashtable (or those that use one) invoke hashCode().
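One way to check this is to count hashCode() calls. The sketch below uses a made-up class `CountingKey` with a static call counter: an ArrayList lookup compares elements with equals() only, while a HashSet lookup has to compute the hash code.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical key class that counts how often hashCode() is invoked.
class CountingKey {
    static int hashCodeCalls = 0;

    final String name;

    CountingKey(String name) {
        this.name = name;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof CountingKey && name.equals(((CountingKey) o).name);
    }

    @Override
    public int hashCode() {
        hashCodeCalls++;
        return name.hashCode();
    }

    // ArrayList.contains() scans linearly and calls equals() only.
    static int callsDuringListLookup() {
        hashCodeCalls = 0;
        List<CountingKey> list = new ArrayList<>();
        list.add(new CountingKey("a"));
        list.contains(new CountingKey("a"));
        return hashCodeCalls;
    }

    // HashSet needs the hash code to pick a bucket.
    static int callsDuringSetLookup() {
        hashCodeCalls = 0;
        Set<CountingKey> set = new HashSet<>();
        set.add(new CountingKey("a"));
        set.contains(new CountingKey("a"));
        return hashCodeCalls;
    }

    public static void main(String[] args) {
        System.out.println(callsDuringListLookup()); // 0
        System.out.println(callsDuringSetLookup() > 0); // true
    }
}
```

If Object.equals() delegated to hashCode(), the list lookup would pay for a hash computation on every comparison.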

Bohemian
  • Aye! An `equals()` call is usually quicker (especially if the objects are unequal) than the corresponding `hashCode()` call. – Joachim Sauer Aug 26 '13 at 14:20
There is an infinite number of objects that have the same hashCode. This means you cannot compare the hashCode alone.

A simple example is Long.hashCode(): Every non-negative Long value that is a multiple of (1L << 32) + 1 has a hashCode of 0.
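This is easy to verify. Long.hashCode() is documented as `(int)(value ^ (value >>> 32))`, so any value whose upper and lower 32 bits are identical hashes to 0. The sketch below checks the first few non-negative multiples; the class name `LongHashDemo` and its helper are made up for illustration. Note the parentheses around the shift: in Java, `+` binds tighter than `<<`, so `1L << 32 + 1` would mean `1L << 33`.

```java
// Checks that non-negative multiples of 2^32 + 1 all hash to 0 as Longs.
class LongHashDemo {
    static final long STEP = (1L << 32) + 1;

    // Returns true if k * STEP has hash code 0 for every k in [0, count).
    static boolean allHashToZero(int count) {
        for (long k = 0; k < count; k++) {
            if (Long.hashCode(k * STEP) != 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(allHashToZero(1000)); // true
        // Same hash code, yet clearly different values:
        System.out.println(Long.valueOf(STEP).equals(Long.valueOf(2 * STEP))); // false
    }
}
```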

Bohemian
Peter Lawrey