Calculate Distance between numpy arrays

Question

I have two arrays, the first np.array is the points from, and the second np.array are all the distances I need to calculate.

Example:

 import numpy as np
 from_array = np.array([(0,1), (1,1), ..., (x,y)])
 to_array = np.array([(5,1), (3,1), ..., (x,y)])

What I need to do is take the first entry of the from_array and calculate all the distances between from_array[0] to all points in to_array, then keep the maximum distance.

So I can brute for this:

 def get_distances(from_array, to_array):
     results = []
     distances = []
     for pt in from_array:
        for to in to_array:
            results.append(calc_dist(pt, to))
        distances.append(results)
     return distances

But this is slow, I am looking for an optimized way of doing the calculation since I could have thousands of points.

The end goal is to calculate the Hausdorff Distance.

fhd = np.mean(np.min(SomeDistanceArray,axis=0))
rhd = np.mean(np.min(SomeDistanceArray,axis=1))
print (max(fhd, rhd))

I want to use numpy for this task only. My distance can either be euclidean or square euclidean distance.

So what I am looking help for is an optimized method for calculating the euclidean distance methods for two np.arrays. It should be noted that array 1 might have more rows than array 2. Means that the length of the 2D array (x,y) could be comparing 10 row to 30 rows.

I feel like I just answered this [yesterday](http://stackoverflow.com/questions/40969013/theano-row-column-wise-subtraction/40969905#40969905) — Daniel F, Dec 06 '16 at 14:03
Scipy Version 0.19.0 (development version as of Jan2017) has an implementation of ['directed Hausdorff distance'](https://scipy.github.io/devdocs/generated/scipy.spatial.distance.directed_hausdorff.html#scipy-spatial-distance-directed-hausdorff) you may want to look into. It uses an efficient algorithm (see [paper](https://dx.doi.org/10.1109/TPAMI.2015.2408351)) with a close to O(m) runtime (rather than O(m*n)). — weiji14, Jan 10 '17 at 21:00

score 5 · Accepted Answer · edited May 23 '17 at 12:24

Here's a NumPy based approach with np.einsum -

subs = from_array[:,None] - to_array
sq_eucliean_dist = np.einsum('ijk,ijk->ij',subs,subs)
eucliean_dist = np.sqrt(sq_eucliean_dist)

Note: If you are later computing np.mean(np.min(SomeDistanceArray,axis=0)), you can skip the computation for eucliean_dist and directly use sq_eucliean_dist as SomeDistanceArray, because computing square-root would be pretty expensive.

What does np.einsum('ijk,ijk->ij',subs,subs) do? It performs elementwise multiplication between the same array subs, i.e. essentially squaring and then does sum-reduction along the last axis, thus losing it in that reduction process.

So, why not explicitly do the squaring and summing? Well, the benefit with np.einsum is that it does both the squaring and summing in one step, giving us noticeable performance efficiency.

Thus, finally if from_array were (N x 2) array and to_array were (M x 2) array, the output from np.einsum would be the squared euclidean distances as a 2D array of shape (N x M). More info on the string notation itself would involve a longer discussion, some of which could be found in this post and the official documentation link posted earlier.

can you explain the einsum function? I am not familiar with the notation ijk,ikj.>ij — code base 5000, Dec 06 '16 at 15:01

score 1 · Answer 2 · answered Dec 06 '16 at 14:07

1

Numpy only. So no scipy.spatial.distance.cdist

First, don't use tuples, use 2xN and 2xM arrays. Then broadcast.

np.linalg.norm(from_array[:,:,None]-to_array[:,None,:], axis=0)

If you have an ancient version of numpy without a vecorizable linalg.norm (i.e. you're using Abaqus) do this:

np.sum((from_array[:,:,None]-to_array[:,None,:])**2, axis=0).__pow__(0.5)

answered Dec 06 '16 at 14:07

Daniel F

11,893
1
21
50

yes no cdist :(. Does this work for uneven arrays? Say a 30x2 and a 50x2 array? – code base 5000 Dec 06 '16 at 14:14
Yes, just make sure the common dimension is first, so 2x30 and 2x50, so you can broadcast as above – Daniel F Dec 06 '16 at 14:21
So I need to rotate my array from being column X,Y to X,Y as rows: – code base 5000 Dec 06 '16 at 14:49
sorry, I mean, I need to rotate my array? – code base 5000 Dec 06 '16 at 14:56
Transpose can be done by just taking the `T` attribute, e.g. `x.T` is the transpose of `x`. – Mad Physicist Dec 06 '16 at 15:15
why doesn't cdist work? was this just a constraint of the question? – 3pitt Jan 08 '18 at 21:13
@DanielF cannot believe this is not the a accepted answer – DuttaA Sep 11 '18 at 14:12
@DuttaA, `np.einsum` doesn't create a big, memory-hungry intermediate matrix. If you asked me today I'd give Divakar's answer. If I could manage to write it faster than he does. – Daniel F Sep 11 '18 at 16:47

Calculate Distance between numpy arrays

2 Answers2