39

What are good hash functions (fast, good distribution, few collisions) for hashing 2D and 3D vectors composed of IEEE 32-bit floats? I assume general 3D vectors, but algorithms assuming normals (always in [-1,1]) are also welcome. I am also not afraid of bit manipulation, as IEEE floats are always IEEE floats.

Another, more general problem is hashing an N-dimensional float vector, where N is quite small (3-12) and constant but not known at compile time. At the moment I just reinterpret these floats as uints and XOR them together, which is probably not the best solution.
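
For reference, here is a minimal sketch of one reading of the XOR approach described above, written in Python for brevity. The names `float_bits` and `xor_hash` are illustrative, and the table size 65537 is only an example value. It shows two structural weaknesses of plain XOR: permuted vectors always collide, and equal components cancel.

```python
import struct

def float_bits(f):
    """Return the IEEE 754 single-precision bit pattern of f as an unsigned int."""
    return struct.unpack("<I", struct.pack("<f", f))[0]

def xor_hash(vec, table_size):
    """XOR the bit patterns of all components, then reduce modulo the table size."""
    h = 0
    for f in vec:
        h ^= float_bits(f)
    return h % table_size

# XOR is commutative, so any permutation of the components hashes identically.
same = xor_hash((1.0, 2.0, 3.0), 65537) == xor_hash((3.0, 1.0, 2.0), 65537)
print(same)

# Equal components cancel each other out, so (0.5, 0.5) hashes like the empty vector.
print(xor_hash((0.5, 0.5), 65537))
```

Whether these collisions matter depends on the data: for mesh vertices they may be rare, while for axis-symmetric data they can be frequent.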

Christian Rau
  • 2
    ...have you tested how well your hashes are being distributed using the plain XOR method? You might be surprised. – Matti Virkkunen May 08 '11 at 16:33
  • @Matti it seems the distribution at least for 3d vectors is not very bad (tested on Stanford bunny 35k verts against hash table of size 65537). I just thought somebody perhaps has a more specialized solution, as I searched the net some time ago and haven't found anything on the topic. – Christian Rau May 08 '11 at 17:31
  • 65537 sounds like one greater than the number you might want to be using (or is a typo) – Steven Lu Sep 13 '13 at 03:12
  • 1
    Related: [Good way to hash a float vector?](http://stackoverflow.com/questions/650175/good-way-to-hash-a-float-vector) – legends2k Mar 27 '14 at 10:13
  • @StevenLu: absolutely not. ++ a power of two is a good safe way to almost always get a prime number. Which is necessary to avoid modulo correlations, and as such, makes awesome hash table sizing. – v.oddou Nov 20 '14 at 02:35
  • Ah indeed. I forgot about that little rule :) – Steven Lu Nov 20 '14 at 02:55

3 Answers

46

There's a spatial hash function described in Optimized Spatial Hashing for Collision Detection of Deformable Objects. They use the hash function

hash(x,y,z) = ( x p1 xor y p2 xor z p3) mod n

where p1, p2, p3 are large prime numbers, in our case 73856093, 19349663, 83492791, respectively. The value n is the hash table size.

In the paper, x, y, and z are the discretized coordinates; you could probably also use the binary values of your floats.
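
A minimal Python sketch of both variants follows, assuming the formula is read as `(x*p1) xor (y*p2) xor (z*p3)`. The function names and the `cell_size` parameter are illustrative and not taken from the paper. Python integers do not wrap at 32 bits, so the results can differ from a fixed-width C implementation.

```python
import math
import struct

# Constants quoted above from the paper.
P1, P2, P3 = 73856093, 19349663, 83492791

def float_bits(f):
    """Return the IEEE 754 single-precision bit pattern of f as an unsigned int."""
    return struct.unpack("<I", struct.pack("<f", f))[0]

def spatial_hash_cells(x, y, z, cell_size, n):
    """Hash the discretized (grid cell) coordinates of a point."""
    ix, iy, iz = (math.floor(c / cell_size) for c in (x, y, z))
    return ((ix * P1) ^ (iy * P2) ^ (iz * P3)) % n

def spatial_hash_bits(x, y, z, n):
    """Hash the raw float bit patterns instead of discretized coordinates."""
    return ((float_bits(x) * P1) ^ (float_bits(y) * P2) ^ (float_bits(z) * P3)) % n

print(spatial_hash_cells(33.0, 4.0, 11.0, cell_size=5.0, n=5))
print(spatial_hash_bits(0.1, -0.2, 0.3, n=65537))
```

The cell variant maps nearby points to the same bucket, which is what collision detection needs. The bit-pattern variant treats every distinct float as a distinct key, which suits exact-match lookups such as vertex deduplication.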

celion
  • 18
    Note that 19349663 isn't prime (it's the product of 41 and 471943) – sehe Sep 05 '13 at 13:20
  • 6
    I found that using the prime numbers p1 and p3 for the two-dimensional case results in very good distributions. – axel22 Mar 06 '16 at 21:36
  • 2
    When they wrote `x p1 xor y p2 xor z p3`, did they mean `(x*p1) xor (y*p2) xor (z*p3)` or `x * (p1 xor y) * (p2 xor z) * p3`? – emlai Jun 25 '16 at 15:24
  • 3
    @tuple_cat I believe it's `(x*p1) xor (y*p2) xor (z*p3)` – celion Jun 26 '16 at 14:31
  • Very interesting! Is there any implementation around? I am trying to implement this with scipy/numpy. Thanks. – tuned Apr 23 '17 at 17:29
  • By "discretized coordinates", do you mean integers? I have coordinates in floating point meters. – kenyee Dec 05 '18 at 02:17
10

I have two suggestions.

  • If you don't quantize the coordinates, the hash won't be sensitive to closeness (locality).

  • Locality-sensitive hashing (LSH) has been mentioned for hashing higher-dimensional vectors. Why not use it for 2D and 3D vectors as well? A variant of LSH adapted to the Euclidean distance metric (which is what we need for 2D and 3D vectors) is called locality-sensitive hashing using p-stable distributions. A very readable tutorial is here.
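
As a rough illustration of the second suggestion, a single p-stable LSH function for Euclidean distance is commonly written as h(v) = floor((a·v + b) / w), where a has independent standard normal entries, b is uniform in [0, w], and w is a bucket width. The sketch below assumes that form. The name `make_lsh` and the parameter values are illustrative, not from any library.

```python
import math
import random

def make_lsh(dim, w, seed=0):
    """Build one p-stable LSH function h(v) = floor((a . v + b) / w)."""
    rng = random.Random(seed)
    # The Gaussian distribution is 2-stable, which matches Euclidean distance.
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    b = rng.uniform(0.0, w)  # random offset within one bucket width

    def h(v):
        projection = sum(ai * vi for ai, vi in zip(a, v))
        return math.floor((projection + b) / w)

    return h

h = make_lsh(dim=3, w=4.0)
u = (0.1, 0.2, 0.3)
v = (0.1001, 0.2, 0.3)
# Nearby points land in the same bucket or in adjacent buckets.
print(h(u), h(v))
```

In practice several such functions with different seeds are usually concatenated into one key, which lowers the chance that distant points share a bucket.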
koshy george
0

I wrote this in Python, based on the comments here:

import numpy as np

l = 5  # grid cell size
n = 5  # hash table size
p1, p2, p3 = 73856093, 19349663, 83492791

x1 = [33, 4, 11]
x2 = [31, 1, 14]
x3 = [10, 44, 19]

def spatial_hash(x):
    # Discretize each coordinate to its grid cell index.
    ix, iy, iz = np.floor(x[0]/l), np.floor(x[1]/l), np.floor(x[2]/l)
    return (int(ix*p1) ^ int(iy*p2) ^ int(iz*p3)) % n

print(spatial_hash(x1))
print(spatial_hash(x2))
print(spatial_hash(x3))

It gives

1
1
3

It seemed to work.

In C++

#include <cstddef>
#include <cstdint>
#include <iostream>
#include <random>
#include <unordered_map>
#include <vector>

#include <eigen3/Eigen/Dense>

using Eigen::Vector3d;

const std::uint32_t HASH_SIZE = 200;  // number of hash buckets
const float L = 0.2f;                 // grid cell size

// Spatial hash of a point whose coordinates lie in [-1, 1].
inline std::size_t hasha(const Vector3d &p) {
    // Shift by +2 so every coordinate is positive before truncating to a cell index.
    const std::uint32_t ix = static_cast<std::uint32_t>((p[0] + 2.f) / L);
    const std::uint32_t iy = static_cast<std::uint32_t>((p[1] + 2.f) / L);
    const std::uint32_t iz = static_cast<std::uint32_t>((p[2] + 2.f) / L);
    // Unsigned arithmetic wraps on overflow; signed overflow would be undefined behavior.
    return ((ix * 73856093u) ^ (iy * 19349663u) ^ (iz * 83492791u)) % HASH_SIZE;
}

int main() {
    std::default_random_engine generator;
    std::uniform_real_distribution<double> distribution(-1.0, 1.0);
    // Named "buckets" rather than "map" to avoid clashing with std::map.
    std::unordered_map<std::size_t, std::vector<Vector3d>> buckets;

    // Insert 300 random points and print the running size of each point's bucket.
    for (std::size_t i = 0; i < 300; i++) {
        const float x = distribution(generator);
        const float y = distribution(generator);
        const float z = distribution(generator);
        const Vector3d v(x, y, z);
        const std::size_t h = hasha(v);
        std::cout << h << " " << v[0] << " " << v[1] << " " << v[2] << std::endl;
        buckets[h].push_back(v);
        std::cout << "size " << buckets[h].size() << std::endl;
    }

    // For every bucket, print the pairwise distances between its points.
    for (const auto &[key, points] : buckets) {  // structured bindings need C++17
        std::cout << key << std::endl;
        for (std::size_t i = 0; i < points.size(); i++) {
            for (std::size_t j = 0; j < points.size(); j++) {
                if (i != j) {
                    std::cout << "   dist " << (points[i] - points[j]).norm() << std::endl;
                }
            }
        }
    }
}
BBSysDyn