I am storing a lot of objects with geographical positions as 2D points (x, y) with a granularity of meters. To represent the world I am using a grid divided into cells of 1 square km. Currently I am using HashMap<Position, Object> for this. Any other map or appropriate data structure is fine, but the solution works, so I am only interested in sorting out the details.

I have been reading a lot about making good hash functions, specifically for 2D points. So far, none of the solutions I have found has been really good, judged by how close to collision-free it is.

To test some ideas I wrote a very simple Java program that generates hash codes for all points in an arbitrarily chosen range, from (-1000, -1000) to (1000, 1000) (x1, y1 -> x2, y2), and stores them in a HashSet<Integer>. This is my result:

# java HashTest
4000000 number of unique positions
test1: 3936031 (63969 buckets, 1,60%) collisions using Objects.hash(x,y)
test2: 0 (4000000 buckets, 100,00%) collisions  using (x << 16) + y
test3: 3998000 (2000 buckets, 0,05%) collisions using x
test4: 3924037 (75963 buckets, 1,90%) collisions using x*37 + y
test5: 3996001 (3999 buckets, 0,10%) collisions using x*37 + y*37
test6: 3924224 (75776 buckets, 1,89%) collisions using x*37 ^ y
test7: 3899671 (100329 buckets, 2,51%) collisions using x*37 ^ y*37
test8: 0 (4000000 buckets, 100,00%) collisions using PerfectlyHashThem
test9: 0 (4000000 buckets, 100,00%) collisions using x << 16 | (y & 0xFFFF)

Legend: number of collisions , buckets(collisions), perc(collisions)

Most of these hash functions perform really badly. In fact, the only good solution is the one that shifts x into the upper 16 bits of the integer. The limitation, I guess, is that the two most distant points must not be further apart than the square root of Integer.MAX_VALUE, i.e. the area must be less than 46 340 square km.
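As a sketch only, this is how the shift-based hash could back a HashMap key. The GridPosition class and its field names are hypothetical, and the code assumes each coordinate fits in a signed 16-bit range (-32768 to 32767). Within that range x occupies the upper 16 bits and y the lower 16, so no two points share a hash code; outside it, bits are lost and distinct points can collide.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical key class; the name and fields are illustrative assumptions.
final class GridPosition {
    final int x;
    final int y;

    GridPosition(int x, int y) {
        this.x = x;
        this.y = y;
    }

    // Same formula as test9: x in the upper 16 bits, y in the lower 16 bits.
    // Unique only while both coordinates fit in a signed 16-bit range.
    @Override
    public int hashCode() {
        return (x << 16) | (y & 0xFFFF);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof GridPosition)) return false;
        GridPosition p = (GridPosition) o;
        return x == p.x && y == p.y;
    }
}

class GridPositionDemo {
    public static void main(String[] args) {
        Map<GridPosition, String> cells = new HashMap<>();
        cells.put(new GridPosition(-1000, 999), "cell A");
        System.out.println(cells.get(new GridPosition(-1000, 999))); // cell A

        // Outside the 16-bit range the packing wraps: 65536 << 16 overflows to 0.
        System.out.println(new GridPosition(65536, 5).hashCode()
                == new GridPosition(0, 5).hashCode()); // true
    }
}
```

Note that a collision-free hashCode does not by itself mean collision-free buckets, since HashMap reduces the hash code further to pick a bucket index.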

This is my test function (just copied for each new hash function):

  public void test1() {

    HashSet<Integer> hashCodes = new HashSet<Integer>();
    int collisions = 0;

    for (int x = -MAX_VALUE; x < MAX_VALUE; ++x) {
      for (int y = -MAX_VALUE; y < MAX_VALUE; ++y) {
        final int hashCode = Objects.hash(x,y);

        if (hashCodes.contains(hashCode))
          collisions++;

        hashCodes.add(hashCode);
      }
    }

    System.console().format("test1: %1$s (%2$s buckets, %3$.2f%%) collisions using Objects.hash(x,y)\n", collisions, buckets(collisions), perc(collisions));
  }
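Rather than copying the method for each hash function, the same loop could take the function as a parameter. The following is a minimal sketch under that assumption; the class name HashCollisionCounter and the method countCollisions are hypothetical, and the counting logic mirrors test1 above.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.IntBinaryOperator;

class HashCollisionCounter {
    // Counts how many points in [-max, max) x [-max, max) produce a hash code
    // that an earlier point already produced.
    static int countCollisions(int max, IntBinaryOperator hash) {
        Set<Integer> seen = new HashSet<>();
        int collisions = 0;
        for (int x = -max; x < max; ++x) {
            for (int y = -max; y < max; ++y) {
                // Set.add returns false when the value is already present.
                if (!seen.add(hash.applyAsInt(x, y))) {
                    collisions++;
                }
            }
        }
        return collisions;
    }

    public static void main(String[] args) {
        int max = 1000;
        System.out.println("x*37 + y: "
                + countCollisions(max, (x, y) -> x * 37 + y));
        System.out.println("(x << 16) | (y & 0xFFFF): "
                + countCollisions(max, (x, y) -> (x << 16) | (y & 0xFFFF)));
    }
}
```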

Am I thinking wrong here? Should I fine-tune the primes to get better results?

Edits:

Added more hash functions (test8 and test9). test8 comes from the response by @nawfal in Mapping two integers to one, in a unique and deterministic way (converted from short to int).

qstebom
  • I have tried with prime numbers 71 and 97 too now. Almost a double increase in number of buckets from 37. – qstebom Feb 04 '14 at 09:54
  • Try 31, please. I'd like to see result. – Xabster Feb 04 '14 at 09:54
  • Found this http://stackoverflow.com/questions/919612/mapping-two-integers-to-one-in-a-unique-and-deterministic-way. – qstebom Feb 04 '14 at 09:56
  • For your numbers it should be possible to create 0 collisions. There's 4 billion different coordinates in -1000 to 1000 times -1000 to 1000. I think you're doing something wrong somewhere. Please post code. – Xabster Feb 04 '14 at 09:58
  • 31 results in slightly "worse" than my example where I used 37. For instance: test4: 3936031 (63969 buckets, 1,60%) collisions using x*31 + y – qstebom Feb 04 '14 at 09:58
  • What would a better result be than no collisions at all? – Xabster Feb 04 '14 at 10:15
  • No collisions would of course be the best. I'm just surprised by the number of collisions using some hash functions that seemed to be regarded as appropriate (such as the XOR w/ prime). – qstebom Feb 04 '14 at 10:18
  • But you're putting 2000*2000 entries into about 76000 hash codes (1000 or -1000 x 37 plus a bit gives the largest and smallest hash codes). Use my example, it guarantees zero collisions. I'll edit it to allow for the ~20000 x 20000 grid. – Xabster Feb 04 '14 at 10:23
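The arithmetic behind that last comment can be checked directly. This is a small sketch, with the hypothetical helper rangeSize, that computes how many distinct values x*k + y can take for x and y in [-max, max). The results line up with the bucket counts reported for test4 (k = 37) and for the x*31 + y run mentioned in the comments.

```java
class LinearHashRange {
    // Size of the interval spanned by x*k + y for x, y in [-max, max).
    // This is an upper bound on the number of distinct hash codes; it is
    // reached when y alone covers at least k consecutive values (2*max >= k).
    static long rangeSize(int max, int k) {
        long smallest = -(long) max * k - max;
        long largest = (long) (max - 1) * k + (max - 1);
        return largest - smallest + 1;
    }

    public static void main(String[] args) {
        System.out.println(rangeSize(1000, 37)); // 75963
        System.out.println(rangeSize(1000, 31)); // 63969
    }
}
```

With 4000000 points and at most 75963 possible codes, the large collision counts follow from the narrow output range rather than from a poor choice of prime.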

2 Answers

public void test1() {

    int MAX_VALUE = 1000;

    HashSet<Integer> hashCodes = new HashSet<Integer>();
    int collisions = 0;

    for (int x = -MAX_VALUE; x < MAX_VALUE; ++x) {
        for (int y = -MAX_VALUE; y < MAX_VALUE; ++y) {
            final int hashCode = ((x+MAX_VALUE)<<16)|((y+MAX_VALUE)&0xFFFF);

            if (hashCodes.contains(hashCode))
                collisions++;

            hashCodes.add(hashCode);
        }
    }

    System.out.println("Collisions: " + collisions + " // Buckets: " +  hashCodes.size());
}

Prints: Collisions: 0 // Buckets: 4000000

Xabster
  • Thanks! Added it to the list of tests. It doesn't seem to differ much from test2, other than that it truncates y to a short, though? – qstebom Feb 04 '14 at 10:19
  • It's probably close to the same, hmm. But what do you want more than no collisions? – Xabster Feb 04 '14 at 10:28
  • I am more interested in knowing if the theory is correct. You can't see anything wrong with it, or how I am testing it? – qstebom Feb 04 '14 at 10:58
  • What theory? Also, you don't say how you calculate your perc(collisions) and buckets(collision) or what they are. I assume buckets(collision) is the 2000x2000 - collisions just given a bad name? – Xabster Feb 04 '14 at 11:49
  • Sorry for being vague. I will update my post with an answer that seems to be perfect for solving this. – qstebom Feb 06 '14 at 06:48

I found a similar question whose answer is to use a Cantor pairing function. See: Mapping two integers to one, in a unique and deterministic way.

The Cantor pairing function is defined for non-negative integers, but it can be used for negative integers as well by first mapping every integer to a non-negative one with a bijection.
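A minimal sketch of that idea, not taken from the linked answer: the class and method names are hypothetical, and the code assumes coordinates of modest magnitude (absolute value below 2^29) so that the intermediate product fits in a long. Each coordinate is first mapped to a non-negative value by interleaving negatives and positives, then the standard Cantor formula is applied.

```java
class CantorPairing {
    // Bijection from all integers to the non-negative integers:
    // 0, -1, 1, -2, 2, ... map to 0, 1, 2, 3, 4, ...
    static long toNonNegative(int n) {
        return n >= 0 ? 2L * n : -2L * n - 1;
    }

    // Cantor pairing of the two mapped values: (a + b)(a + b + 1) / 2 + b.
    // Assumes |x| and |y| are below 2^29 so the product does not overflow.
    static long pair(int x, int y) {
        long a = toNonNegative(x);
        long b = toNonNegative(y);
        return (a + b) * (a + b + 1) / 2 + b;
    }

    public static void main(String[] args) {
        System.out.println(pair(0, 0));         // 0
        System.out.println(pair(-1, 0));        // 1
        System.out.println(pair(0, -1));        // 2
        System.out.println(pair(-1000, -1000)); // 7996000
    }
}
```

For the -1000 to 1000 range tested in the question the largest result is 7996000, which still fits in an int; for larger grids the result has to be kept as a long or reduced further before it can serve as a hash code.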

qstebom