Hashing 2D, 3D and nD vectors

Question

What are good hash functions (fast, well distributed, few collisions) for 2D and 3D vectors composed of IEEE 32-bit floats? I assume general 3D vectors, but algorithms that assume normals (components always in [-1,1]) are also welcome. I am not afraid of bit manipulation either, as IEEE floats are always IEEE floats.

Another, more general problem is hashing an N-dimensional float vector, where N is quite small (3-12) and constant but not known at compile time. At the moment I just reinterpret these floats as uints and XOR them together, which is probably not the best solution.
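
To make the baseline concrete, here is a minimal sketch of the plain XOR approach, assuming the floats are reinterpreted (not numerically converted) as 32-bit unsigned integers. The helper names `float_bits` and `xor_hash` are made up for illustration. The sketch also exposes two structural weaknesses of plain XOR: it ignores component order, and equal components cancel each other out.

```python
import struct
from functools import reduce


def float_bits(value):
    """Return the bit pattern of value stored as an IEEE 754 32-bit float."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def xor_hash(vec):
    """XOR the 32-bit patterns of all components (works for any length N)."""
    return reduce(lambda acc, v: acc ^ float_bits(v), vec, 0)


# Permuted vectors collide, because XOR is commutative.
print(xor_hash((1.0, 2.0, 3.0)) == xor_hash((3.0, 2.0, 1.0)))

# Two equal components cancel, so every vector (x, x) hashes to 0.
print(xor_hash((0.5, 0.5)))
```

Whether these collisions matter depends on the data; for mesh vertices they may be rare, for normals with repeated or swapped components they may not be.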

...have you tested how well your hashes are being distributed using the plain XOR method? You might be surprised. — Matti Virkkunen, May 08 '11 at 16:33
@Matti it seems the distribution at least for 3d vectors is not very bad (tested on Stanford bunny 35k verts against hash table of size 65537). I just thought somebody perhaps has a more specialized solution, as I searched the net some time ago and haven't found anything on the topic. — Christian Rau, May 08 '11 at 17:31
65537 sounds like one greater than the number you might want to be using (or is a typo) — Steven Lu, Sep 13 '13 at 03:12
Related: [Good way to hash a float vector?](http://stackoverflow.com/questions/650175/good-way-to-hash-a-float-vector) — legends2k, Mar 27 '14 at 10:13
@StevenLu: absolutely not. ++ a power of two is a good safe way to almost always get a prime number. Which is necessary to avoid modulo correlations, and as such, makes awesome hash table sizing. — v.oddou, Nov 20 '14 at 02:35
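
Whether a given table size is prime is easy to check directly. The sketch below uses plain trial division (the helper `is_prime` is hypothetical, not taken from any library) and lists the exponents k up to 16 for which 2^k + 1 is prime. 65537 = 2^16 + 1 is among them, but many values of this form, such as 2^5 + 1 = 33, are not.

```python
def is_prime(m):
    """Trial division; adequate for hash-table-sized integers."""
    if m < 2:
        return False
    if m % 2 == 0:
        return m == 2
    d = 3
    while d * d <= m:
        if m % d == 0:
            return False
        d += 2
    return True


# Exponents k in 1..16 for which 2**k + 1 is prime.
prime_exponents = [k for k in range(1, 17) if is_prime(2**k + 1)]
print(prime_exponents)
print(is_prime(65537), is_prime(33))
```

So a candidate size of this form is worth testing rather than assuming.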

score 46 · Accepted Answer · edited Jun 20 '20 at 09:12

There's a spatial hash function described in Optimized Spatial Hashing for Collision Detection of Deformable Objects. They use the hash function

hash(x,y,z) = ( x p1 xor y p2 xor z p3) mod n

where p1, p2, p3 are large prime numbers, in our case 73856093, 19349663, 83492791, respectively. The value n is the hash table size.

In the paper, x, y, and z are the discretized coordinates; you could probably also use the binary values of your floats.
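
As a rough sketch of the "binary values of your floats" variant, the code below feeds the raw 32-bit patterns into the formula, read as `(x*p1) xor (y*p2) xor (z*p3)`. The function names and the masking to 32 bits (to mimic unsigned integer wraparound in C-like languages) are assumptions for illustration, not something the paper prescribes.

```python
import struct

P1, P2, P3 = 73856093, 19349663, 83492791  # constants quoted from the paper
MASK32 = 0xFFFFFFFF  # assumption: emulate 32-bit unsigned wraparound


def float_bits(value):
    """Return the bit pattern of value stored as an IEEE 754 32-bit float."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def spatial_hash_bits(x, y, z, n):
    """Hash three floats by their bit patterns into a table of size n."""
    bx, by, bz = float_bits(x), float_bits(y), float_bits(z)
    h = ((bx * P1) & MASK32) ^ ((by * P2) & MASK32) ^ ((bz * P3) & MASK32)
    return h % n


print(spatial_hash_bits(0.25, -1.5, 3.0, 65537))

# Caveat: 0.0 and -0.0 compare equal but have different bit patterns,
# so they land in different buckets unless normalized first.
print(spatial_hash_bits(0.0, 0.0, 0.0, 65537) == spatial_hash_bits(-0.0, 0.0, 0.0, 65537))
```

Hashing bit patterns only groups exactly identical floats; it gives no locality for nearby points.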

answered May 08 '11 at 18:56

celion

Note that 19349663 isn't prime (it's the product of 41 and 471943) – sehe Sep 05 '13 at 13:20
I found that using the prime numbers p1 and p3 for the two-dimensional case results in very good distributions. – axel22 Mar 06 '16 at 21:36
When they wrote `x p1 xor y p2 xor z p3`, did they mean `(x*p1) xor (y*p2) xor (z*p3)` or `x * (p1 xor y) * (p2 xor z) * p3`? – emlai Jun 25 '16 at 15:24
@tuple_cat I believe it's `(x*p1) xor (y*p2) xor (z*p3)` – celion Jun 26 '16 at 14:31
Very interesting! Is there any implementation around? I am trying to implement this with scipy/numpy. Thanks. – tuned Apr 23 '17 at 17:29
By "discretized coordinates", do you mean integers? I have coordinates in floating point meters. – kenyee Dec 05 '18 at 02:17

score 10 · Answer 2 · answered May 05 '14 at 23:20

I have two suggestions.

Assume a grid cell of size l and quantize the x, y and z coordinates by computing ix = floor(x/l), iy = floor(y/l), and iz = floor(z/l), where ix, iy and iz are integers. Now use the hash function defined in Optimized Spatial Hashing for Collision Detection of Deformable Objects.

If you don't do the quantization, the hash won't be sensitive to closeness (locality).
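
A small sketch of the quantization step may help show what locality means here. The helper name `cell_index` and the sample points are made up for illustration. Points inside the same grid cell get identical integer coordinates (and therefore the same hash), while a nearby point just across a cell boundary does not.

```python
import math


def cell_index(point, cell_size):
    """Quantize a point to integer grid coordinates using floor(c / cell_size)."""
    return tuple(math.floor(c / cell_size) for c in point)


a = (0.11, 0.52, -0.31)
b = (0.19, 0.58, -0.39)  # close to a, inside the same cell
c = (0.21, 0.52, -0.31)  # close to a, but just across a cell boundary in x

print(cell_index(a, 0.2))
print(cell_index(b, 0.2))
print(cell_index(c, 0.2))
```

Note that `floor` rounds toward negative infinity, so negative coordinates are handled consistently; truncating toward zero would merge the cells on either side of 0.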

Locality Sensitive Hashing has been mentioned for hashing higher-dimensional vectors. Why not use it for 3D or 2D vectors as well? A variant of LSH adapted for the Euclidean distance metric (which is what we need for 2D and 3D vectors) is called Locality Sensitive Hashing using p-stable distributions. A very readable tutorial is here.
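
For orientation, a single hash function of the p-stable scheme for Euclidean distance is usually written as h(v) = floor((a·v + b) / w), with the entries of a drawn from a standard Gaussian (which is 2-stable) and b drawn uniformly from [0, w). The following is only a minimal sketch under those assumptions; the factory name `make_pstable_hash`, the seed handling, and the bucket width are illustrative choices, not taken from the tutorial.

```python
import math
import random


def make_pstable_hash(dim, w, seed=0):
    """Build one LSH function h(v) = floor((a . v + b) / w) for vectors of length dim."""
    rng = random.Random(seed)
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # Gaussian projection direction
    b = rng.uniform(0.0, w)                        # random offset in [0, w)

    def h(v):
        dot = sum(ai * vi for ai, vi in zip(a, v))
        return math.floor((dot + b) / w)

    return h


h = make_pstable_hash(dim=3, w=4.0, seed=0)
print(h((1.0, 2.0, 3.0)), h((1.1, 2.0, 2.9)), h((40.0, -25.0, 10.0)))
```

Points that are close tend to fall into the same bucket and distant points tend not to, but this holds only with some probability, so practical schemes typically combine several such functions.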

BBSysDyn · Answer 3 · 2020-11-09T13:31:50.350

I wrote this in Python based on the comments seen here:

import numpy as np

l = 5   # grid cell size
n = 5   # hash table size
p1, p2, p3 = 73856093, 19349663, 83492791

x1 = [33, 4, 11]
x2 = [31, 1, 14]
x3 = [10, 44, 19]

def spatial_hash(x):
    # Quantize each coordinate to an integer grid cell index.
    ix, iy, iz = (int(np.floor(c / l)) for c in x)
    # Multiply by the constants, XOR the products, reduce modulo the table size.
    return ((ix * p1) ^ (iy * p2) ^ (iz * p3)) % n

print(spatial_hash(x1))
print(spatial_hash(x2))
print(spatial_hash(x3))

It gives

1
1
3

It seemed to work.
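
To judge more than three points, one option is to count how many points land in each bucket. The sketch below is a hypothetical check, not part of the code above: the point range, the table size of 101 and the helper `bucket_loads` are arbitrary choices for illustration.

```python
import math
import random
from collections import Counter

P1, P2, P3 = 73856093, 19349663, 83492791


def spatial_hash(point, cell_size, n):
    """Quantize to grid cells, then combine the cell indices as in the paper."""
    ix, iy, iz = (math.floor(c / cell_size) for c in point)
    return ((ix * P1) ^ (iy * P2) ^ (iz * P3)) % n


def bucket_loads(points, cell_size, n):
    """Count how many points fall into each bucket."""
    return Counter(spatial_hash(p, cell_size, n) for p in points)


rng = random.Random(0)  # fixed seed so runs are repeatable
points = [tuple(rng.uniform(0.0, 100.0) for _ in range(3)) for _ in range(1000)]
loads = bucket_loads(points, 5.0, 101)
print("buckets used:", len(loads), "of 101, max load:", max(loads.values()))
```

A roughly even spread of loads suggests the hash is behaving reasonably for that data; a few very full buckets suggest it is not.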

In C++

#include <cstdint>
#include <iostream>
#include <random>
#include <unordered_map>
#include <vector>

#include <eigen3/Eigen/Dense>

using Eigen::Vector3d;

const std::uint32_t HASH_SIZE = 200;  // number of buckets
const double L = 0.2;                 // grid cell size
const double OFFSET = 2.0;            // shifts coordinates in [-1, 1] to positive values

// Buckets keyed by hash value; each bucket holds the points that landed in it.
std::unordered_map<std::uint32_t, std::vector<Vector3d>> buckets;

inline std::uint32_t hasha(const Vector3d& p) {
    // Quantize to grid cells. Unsigned arithmetic wraps on overflow
    // instead of invoking undefined behavior.
    const auto ix = static_cast<std::uint32_t>((p[0] + OFFSET) / L);
    const auto iy = static_cast<std::uint32_t>((p[1] + OFFSET) / L);
    const auto iz = static_cast<std::uint32_t>((p[2] + OFFSET) / L);
    return ((ix * 73856093u) ^ (iy * 19349663u) ^ (iz * 83492791u)) % HASH_SIZE;
}

int main() {
    std::default_random_engine generator;
    std::uniform_real_distribution<double> distribution(-1.0, 1.0);

    // Insert 300 random points and report the size of the bucket each one lands in.
    for (std::size_t i = 0; i < 300; i++) {
        const double x = distribution(generator);
        const double y = distribution(generator);
        const double z = distribution(generator);
        const Vector3d v(x, y, z);
        const std::uint32_t h = hasha(v);
        std::cout << h << " " << v[0] << " " << v[1] << " " << v[2] << std::endl;
        buckets[h].push_back(v);
        std::cout << "size " << buckets[h].size() << std::endl;
    }

    // For every bucket, print the pairwise distances between its points.
    for (const auto& [key, points] : buckets) {  // structured bindings need C++17
        std::cout << key << std::endl;
        for (std::size_t i = 0; i < points.size(); i++) {
            for (std::size_t j = 0; j < points.size(); j++) {
                if (i != j) {
                    std::cout << "   dist " << (points[i] - points[j]).norm() << std::endl;
                }
            }
        }
    }
}
