I am storing a lot of objects with geographic positions as 2D points (x,y) at a granularity of meters. To represent the world I am using a grid divided into cells of 1 square km. Currently I am using HashMap<Position, Object> for this. Any other map or appropriate data structure is fine, but the solution works, so I am only interested in sorting out the details.
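For context, a minimal sketch of what such a key class could look like. The field names, the `cell()` helper and the choice of `Objects.hash` are assumptions made for illustration, not my actual class:

```java
import java.util.Objects;

// Hypothetical key type for HashMap<Position, Object>; coordinates are in meters.
final class Position {
    final int x;
    final int y;

    Position(int x, int y) {
        this.x = x;
        this.y = y;
    }

    // Assumed helper: maps a position in meters to the 1 km grid cell containing it.
    // Math.floorDiv rounds toward negative infinity, so -1 m lands in cell -1, not cell 0.
    Position cell() {
        return new Position(Math.floorDiv(x, 1000), Math.floorDiv(y, 1000));
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Position)) return false;
        Position p = (Position) o;
        return x == p.x && y == p.y;
    }

    @Override
    public int hashCode() {
        // Baseline hash; any of the candidate functions being compared could go here instead.
        return Objects.hash(x, y);
    }
}
```

Whatever ends up in `hashCode()`, it only has to agree with `equals`: two equal positions must always produce the same hash code.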
I have been reading a lot about making good hash functions, specifically for 2D points. So far, none of the solutions I have found have been really good (judged by how close to collision-free they are).
To test some ideas I wrote a very simple Java program that generates hash codes for all points in an arbitrarily chosen range, from (-1000,-1000) to (1000,1000), and stores them in a HashSet<Integer>. This is my result:
# java HashTest
4000000 number of unique positions
test1: 3936031 (63969 buckets, 1,60%) collisions using Objects.hash(x,y)
test2: 0 (4000000 buckets, 100,00%) collisions using (x << 16) + y
test3: 3998000 (2000 buckets, 0,05%) collisions using x
test4: 3924037 (75963 buckets, 1,90%) collisions using x*37 + y
test5: 3996001 (3999 buckets, 0,10%) collisions using x*37 + y*37
test6: 3924224 (75776 buckets, 1,89%) collisions using x*37 ^ y
test7: 3899671 (100329 buckets, 2,51%) collisions using x*37 ^ y*37
test8: 0 (4000000 buckets, 100,00%) collisions using PerfectlyHashThem
test9: 0 (4000000 buckets, 100,00%) collisions using x << 16 | (y & 0xFFFF)
Legend: number of collisions, buckets(collisions), perc(collisions)
Most of these hash functions perform really badly. In fact, the only good solution is the one that shifts x into the upper 16 bits of the integer. The limitation, I guess, is that the two most distant points must not be farther apart than the square root of Integer.MAX_VALUE, i.e. the area must be less than 46 340 square km.
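To check that guess, here is a minimal sketch of the shift-based packing from test9. The class and method names (`PackedHash`, `pack`, `fits`) are hypothetical. Each coordinate gets 16 bits of the 32-bit int, so the result is unique only as long as both coordinates fit in a signed 16-bit value; outside that range, bits are lost and collisions appear:

```java
// Hypothetical helper illustrating the packing used in test9.
final class PackedHash {
    // x goes into the upper 16 bits, the low 16 bits of y into the lower 16 bits.
    static int pack(int x, int y) {
        return (x << 16) | (y & 0xFFFF);
    }

    // True when both coordinates fit in a signed 16-bit value,
    // which is the range where pack() cannot collide.
    static boolean fits(int x, int y) {
        return x >= Short.MIN_VALUE && x <= Short.MAX_VALUE
            && y >= Short.MIN_VALUE && y <= Short.MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(pack(1, 2));                   // in range: unique
        System.out.println(pack(0, 65536) == pack(0, 0)); // y out of range: collides
        System.out.println(pack(65536, 0) == pack(0, 0)); // x out of range: collides
    }
}
```

The tested range of -1000 to 1000 is well inside the 16-bit limit, which would explain why test2 and test9 show no collisions.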
This is my test function (just copied for each new hash function):
    public void test1() {
        // Hash codes seen so far; a repeated hash code counts as a collision.
        HashSet<Integer> hashCodes = new HashSet<Integer>();
        int collisions = 0;
        for (int x = -MAX_VALUE; x < MAX_VALUE; ++x) {
            for (int y = -MAX_VALUE; y < MAX_VALUE; ++y) {
                final int hashCode = Objects.hash(x, y);
                // add() returns false if the hash code was already in the set
                if (!hashCodes.add(hashCode))
                    collisions++;
            }
        }
        // System.out rather than System.console(), which returns null when no console is attached
        System.out.format("test1: %1$s (%2$s buckets, %3$.2f%%) collisions using Objects.hash(x,y)\n", collisions, buckets(collisions), perc(collisions));
    }
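Instead of copying the method for each hash function, the same loop can take the hash function as a parameter. This is only a sketch: the class name `HashCollisionCounter`, the method `countCollisions` and the `max` parameter are made up for this example, and it reports the raw collision count only.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.IntBinaryOperator;

final class HashCollisionCounter {
    // Counts how many points in [-max, max) x [-max, max) produce a hash code
    // that an earlier point already produced.
    static int countCollisions(int max, IntBinaryOperator hash) {
        Set<Integer> seen = new HashSet<>();
        int collisions = 0;
        for (int x = -max; x < max; ++x) {
            for (int y = -max; y < max; ++y) {
                // Set.add returns false when the value was already present.
                if (!seen.add(hash.applyAsInt(x, y))) {
                    collisions++;
                }
            }
        }
        return collisions;
    }

    public static void main(String[] args) {
        int max = 100; // smaller than the 1000 used in the question, to keep the run short
        System.out.println("x:             " + countCollisions(max, (x, y) -> x));
        System.out.println("x*37 + y*37:   " + countCollisions(max, (x, y) -> x * 37 + y * 37));
        System.out.println("(x << 16) + y: " + countCollisions(max, (x, y) -> (x << 16) + y));
    }
}
```

Each candidate then becomes a one-line lambda, which makes it easier to add new ones without copy-paste mistakes.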
Am I thinking about this the wrong way? Should I fine-tune the primes to get better results?
Edits:
Added more hash functions (test8 and test9). test8 comes from the response by @nawfal in "Mapping two integers to one, in a unique and deterministic way" (converted from short to int).