As stated by others, your latter implementation of .equals() violates its contract. You simply cannot implement it that way. And if you stop to think about it, it makes sense: your implementation of .equals() is not meant to return true when two objects are actually equal, but when they are similar enough. But similar enough is not the same as equal, neither in Java nor anywhere else.
Check the .equals() javadocs and you'll see that any object that implements it must adhere to its contract:
The equals method implements an equivalence relation on non-null object references:
It is reflexive: for any non-null reference value x, x.equals(x) should return true.
It is symmetric: for any non-null reference values x and y, x.equals(y) should return true if and only if y.equals(x) returns true.
It is transitive: for any non-null reference values x, y, and z, if x.equals(y) returns true and y.equals(z) returns true, then x.equals(z) should return true.
It is consistent: for any non-null reference values x and y, multiple invocations of x.equals(y) consistently return true or consistently return false, provided no information used in equals comparisons on the objects is modified.
For any non-null reference value x, x.equals(null) should return false.
Your implementation of .equals() does not fulfill this contract:
- Depending on your implementation of double similarity(Car car1, Car car2), it might not be symmetric
- It's clearly not transitive (well explained in previous answers)
- It might not be consistent:
Consider an example slightly different than the one you gave in a comment:
'cobalt' would be equal to 'blue' while 'red' would be different to 'blue'
Suppose you used some external source to calculate the similarity, such as a dictionary. If one day 'cobalt' wasn't found as an entry, you might return a similarity near 0.0, so the cars wouldn't be equal. The following day you realize that 'cobalt' is a special kind of 'blue', so you add it to the dictionary, and this time, when you compare the same two cars, the similarity is very high (near 1.0), so they're equal. This would be an inconsistency. I don't know how your similarity function works, but if it depends on anything other than the data contained in the two objects you're comparing, you might be violating the consistency constraint of .equals() as well.
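To make the transitivity point from the list above concrete, here is a minimal sketch. Everything in it is made up for illustration: the Shade class, its integer value and the threshold of 10 are just stand-ins for your similarity() function, which I don't know.

```java
class TransitivityDemo {
    public static void main(String[] args) {
        Shade x = new Shade(100);
        Shade y = new Shade(108);
        Shade z = new Shade(116);
        System.out.println(x.equals(y)); // true  (values differ by 8)
        System.out.println(y.equals(z)); // true  (values differ by 8)
        System.out.println(x.equals(z)); // false (values differ by 16)
    }
}

// Hypothetical class: two shades are "equal" when they are close enough.
final class Shade {
    static final int THRESHOLD = 10; // assumed cut-off, not taken from the question

    final int value;

    Shade(int value) {
        this.value = value;
    }

    // Deliberately broken: tests "similar enough" instead of "equal".
    @Override
    public boolean equals(Object other) {
        if (!(other instanceof Shade)) {
            return false;
        }
        return Math.abs(this.value - ((Shade) other).value) <= THRESHOLD;
    }

    // A constant hash is the only one that agrees with the equals() above,
    // which already hints that something is wrong with it.
    @Override
    public int hashCode() {
        return 0;
    }
}
```

x equals y and y equals z, yet x does not equal z. Any fixed threshold has this problem, no matter which value you pick.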
Regarding using a TreeMap<Car, Whatever>, I don't see how it could be of any help. From the TreeMap javadocs:
...the Map interface is defined in terms of the equals operation, but a sorted map performs all key comparisons using its compareTo (or compare) method, so two keys that are deemed equal by this method are, from the standpoint of the sorted map, equal.
In other words, in a TreeMap<Car, Whatever> map, map.containsKey(car1) would return true iff car1.compareTo(car2) returned exactly 0 for some car2 that belongs to map. However, if the comparison didn't return 0, map.containsKey(car1) could return false, even though car1 and car2 were very similar in terms of your similarity function. This is because .compareTo() is meant to be used for ordering, not for similarity.
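Here is a small sketch of that behaviour. The nested Car class and the order-by-color comparator are assumptions of mine, chosen only to keep the example short; note that this Car does not override .equals() at all, and the TreeMap doesn't care.

```java
import java.util.Comparator;
import java.util.TreeMap;

class TreeMapLookupDemo {

    // Hypothetical minimal key type; only the color takes part in the ordering.
    static final class Car {
        final String color;
        final String plate;

        Car(String color, String plate) {
            this.color = color;
            this.plate = plate;
        }
    }

    // A sorted map that compares keys by color only.
    static TreeMap<Car, String> byColor() {
        return new TreeMap<>(Comparator.comparing((Car c) -> c.color));
    }

    public static void main(String[] args) {
        TreeMap<Car, String> map = byColor();
        map.put(new Car("blue", "plate1"), "first");

        // Same color, different plate: compare() returns 0, so the key is "found".
        System.out.println(map.containsKey(new Car("blue", "plate2")));   // true

        // A similar color, but compare() is not 0, so the key is not found.
        System.out.println(map.containsKey(new Car("cobalt", "plate3"))); // false
    }
}
```

So the sorted map gives you "same position in the ordering", which is still an all-or-nothing test and not a measure of similarity.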
So the key point here is that you can't use a Map alone to suit your use case, because it's just the wrong structure. Actually, you can't use any Java structure alone that relies on .hashCode() and .equals(), because a lookup only ever finds keys that are equal to yours, never keys that are merely similar.
Now, if you do want to find the car which is most similar to a given car by means of your similarity() function, I suggest you use Guava's HashBasedTable structure to build a table of similarity coefficients (or whatever other fancy name you like) between every pair of cars in your set.
This approach would need Car to implement .hashCode() and .equals() as usual (i.e. not checking just by color, and certainly without invoking your similarity() function). For instance, you could compare by a new Car attribute, such as the plate number.
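A Car along those lines could look like the sketch below. I don't know your actual class, so the field names, the getters and the choice of the plate as the only identity attribute are assumptions; the constructor order (color, brand, plate) just follows the examples in this answer.

```java
import java.util.Objects;

// Sketch of a Car whose identity is its plate number (an assumed attribute).
final class Car {
    private final String color;
    private final String brand;
    private final String plate;

    Car(String color, String brand, String plate) {
        this.color = color;
        this.brand = brand;
        this.plate = Objects.requireNonNull(plate, "plate");
    }

    String getColor() { return color; }
    String getBrand() { return brand; }
    String getPlate() { return plate; }

    // Real equality: same plate, regardless of color or brand.
    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof Car)) {
            return false;
        }
        return plate.equals(((Car) other).plate);
    }

    // Must be derived from the same attribute that equals() compares.
    @Override
    public int hashCode() {
        return plate.hashCode();
    }

    @Override
    public String toString() {
        return brand + " (" + color + ", " + plate + ")";
    }

    public static void main(String[] args) {
        Car a = new Car("red", "audi", "plate1");
        Car repainted = new Car("blue", "audi", "plate1");
        System.out.println(a.equals(repainted)); // true: same plate
    }
}
```

With this in place, similarity stays a separate question that you ask through your similarity() function, and equality stays boring and contract-compliant.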
The idea is to have a table which stores the similarity between each pair of cars, with its diagonal left empty, since we already know that a car is similar to itself (actually, it's equal to itself). For example, for the following cars:
    Car a = new Car("red", "audi", "plate1");
    Car b = new Car("red", "bmw", "plate2");
    Car c = new Car("light red", "audi", "plate3");
the table would look like this:
           a     b     c
    a    ----  0.60  0.95
    b    0.60  ----  0.45
    c    0.95  0.45  ----
For the similarity values, I'm assuming that cars of the same brand and same color family are more similar than cars of the same color but different brands, and that cars of different brands and different colors are even less similar.
You might have noticed that the table is symmetric. We could have stored only half the cells if space optimization were needed. However, according to the docs, HashBasedTable is optimized to be accessed by row key, so let's keep it simple and leave further optimizations as an exercise.
The algorithm to find the car which is most similar to a given car could be sketched as follows:
- Retrieve the given car's row
- Return the car which is most similar to the given car within the returned row, i.e. the car of the row with the highest similarity coefficient
Here's some code showing the general ideas:
    import com.google.common.collect.HashBasedTable;
    import com.google.common.collect.Maps;
    import com.google.common.collect.Table;

    import java.util.Map;

    public class SimilarityTest {

        Table<Car, Car, Double> table;

        void initialize(Car... cars) {
            // cars.length doubles as an implicit null check;
            // each row holds every car except the row's own car
            int cellsPerRow = Math.max(cars.length - 1, 0);
            this.table = HashBasedTable.create(cars.length, cellsPerRow);
            for (Car rowCar : cars) {
                for (Car columnCar : cars) {
                    if (!rowCar.equals(columnCar)) { // add only different cars
                        double similarity = this.similarity(rowCar, columnCar);
                        this.table.put(rowCar, columnCar, similarity);
                    }
                }
            }
        }

        double similarity(Car car1, Car car2) {
            // Place your similarity calculation here
            throw new UnsupportedOperationException("similarity() not implemented yet");
        }

        Car mostSimilar(Car car) {
            Map<Car, Double> row = this.table.row(car);
            // Start below any possible coefficient; note that Double.MIN_VALUE
            // is the smallest positive double, not the most negative one
            Map.Entry<Car, Double> mostSimilar =
                    Maps.immutableEntry(car, Double.NEGATIVE_INFINITY);
            for (Map.Entry<Car, Double> entry : row.entrySet()) {
                double mostSimilarCoefficient = mostSimilar.getValue();
                double currentCoefficient = entry.getValue();
                if (currentCoefficient > mostSimilarCoefficient) {
                    mostSimilar = entry;
                }
            }
            return mostSimilar.getKey(); // the given car itself if its row is empty
        }

        public static void main(String... args) {
            SimilarityTest test = new SimilarityTest();
            Car a = new Car("red", "audi", "plate1");
            Car b = new Car("red", "bmw", "plate2");
            Car c = new Car("light red", "audi", "plate3");
            test.initialize(a, b, c);
            // Expected results once similarity() yields the coefficients of the table above
            Car mostSimilarToA = test.mostSimilar(a);
            System.out.println(mostSimilarToA); // should be c
            Car mostSimilarToB = test.mostSimilar(b);
            System.out.println(mostSimilarToB); // should be a
            Car mostSimilarToC = test.mostSimilar(c);
            System.out.println(mostSimilarToC); // should be a
        }
    }
Regarding complexity: initializing the table takes O(n²), while searching for the most similar car takes O(n). I'm pretty sure this can be improved. For example, why put cars in the table that are known not to be similar to each other? We could store only the pairs whose similarity coefficient is higher than a given threshold. Or, instead of finding the car with the highest similarity coefficient, we could stop the search as soon as we find a car whose similarity coefficient is higher than another given threshold, etc.
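Just to give a rough idea of those two thresholds, here is a sketch that uses plain nested java.util maps instead of Guava's table, and String keys instead of Car, so that it runs on its own. The class name, the method names and both threshold values are assumptions of mine, and the coefficients are the ones from the example table.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

class ThresholdSearch {
    static final double MIN_STORED = 0.50;  // assumed: pairs below this are not stored
    static final double GOOD_ENOUGH = 0.90; // assumed: stop searching at this similarity

    // row key -> (column key -> similarity); stands in for Table<Car, Car, Double>
    private final Map<String, Map<String, Double>> table = new LinkedHashMap<>();

    // Stores the pair in both directions, but only if it is similar enough to matter.
    void put(String a, String b, double similarity) {
        if (similarity < MIN_STORED) {
            return;
        }
        table.computeIfAbsent(a, k -> new LinkedHashMap<>()).put(b, similarity);
        table.computeIfAbsent(b, k -> new LinkedHashMap<>()).put(a, similarity);
    }

    // Returns the first key that reaches GOOD_ENOUGH, otherwise the best stored one.
    Optional<String> similarEnough(String key) {
        Map<String, Double> row =
                table.getOrDefault(key, Collections.<String, Double>emptyMap());
        String best = null;
        double bestSimilarity = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> entry : row.entrySet()) {
            if (entry.getValue() >= GOOD_ENOUGH) {
                return Optional.of(entry.getKey()); // early exit
            }
            if (entry.getValue() > bestSimilarity) {
                best = entry.getKey();
                bestSimilarity = entry.getValue();
            }
        }
        return Optional.ofNullable(best);
    }

    public static void main(String[] args) {
        ThresholdSearch search = new ThresholdSearch();
        search.put("a", "b", 0.60);
        search.put("a", "c", 0.95);
        search.put("b", "c", 0.45); // below MIN_STORED, so it is dropped
        System.out.println(search.similarEnough("a")); // Optional[c]
        System.out.println(search.similarEnough("b")); // Optional[a]
        System.out.println(search.similarEnough("c")); // Optional[a]
    }
}
```

Keep in mind the trade-off: with the early exit you get a car that is similar enough, which is not necessarily the most similar one, and with the storage threshold some cars may end up with an empty row.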