3

I have two arrays, the first np.array is the points from, and the second np.array are all the distances I need to calculate.

Example:

 import numpy as np
 from_array = np.array([(0,1), (1,1), ..., (x,y)])
 to_array = np.array([(5,1), (3,1), ..., (x,y)])

What I need to do is take the first entry of the from_array and calculate all the distances between from_array[0] to all points in to_array, then keep the maximum distance.

So I can brute for this:

 def get_distances(from_array, to_array):
     results = []
     distances = []
     for pt in from_array:
        for to in to_array:
            results.append(calc_dist(pt, to))
        distances.append(results)
     return distances

But this is slow, I am looking for an optimized way of doing the calculation since I could have thousands of points.

The end goal is to calculate the Hausdorff Distance.

fhd = np.mean(np.min(SomeDistanceArray,axis=0))
rhd = np.mean(np.min(SomeDistanceArray,axis=1))
print (max(fhd, rhd))

I want to use numpy for this task only. My distance can either be euclidean or square euclidean distance.

So what I am looking help for is an optimized method for calculating the euclidean distance methods for two np.arrays. It should be noted that array 1 might have more rows than array 2. Means that the length of the 2D array (x,y) could be comparing 10 row to 30 rows.

Divakar
  • 204,109
  • 15
  • 192
  • 292
code base 5000
  • 2,892
  • 8
  • 37
  • 62
  • Look into Scipy's cdist. – Divakar Dec 06 '16 at 13:53
  • 1
    I feel like I just answered this [yesterday](http://stackoverflow.com/questions/40969013/theano-row-column-wise-subtraction/40969905#40969905) – Daniel F Dec 06 '16 at 14:03
  • Scipy Version 0.19.0 (development version as of Jan2017) has an implementation of ['directed Hausdorff distance'](https://scipy.github.io/devdocs/generated/scipy.spatial.distance.directed_hausdorff.html#scipy-spatial-distance-directed-hausdorff) you may want to look into. It uses an efficient algorithm (see [paper](https://dx.doi.org/10.1109/TPAMI.2015.2408351)) with a close to O(m) runtime (rather than O(m*n)). – weiji14 Jan 10 '17 at 21:00

2 Answers2

5

Here's a NumPy based approach with np.einsum -

subs = from_array[:,None] - to_array
sq_eucliean_dist = np.einsum('ijk,ijk->ij',subs,subs)
eucliean_dist = np.sqrt(sq_eucliean_dist)

Note: If you are later computing np.mean(np.min(SomeDistanceArray,axis=0)), you can skip the computation for eucliean_dist and directly use sq_eucliean_dist as SomeDistanceArray, because computing square-root would be pretty expensive.


What does np.einsum('ijk,ijk->ij',subs,subs) do? It performs elementwise multiplication between the same array subs, i.e. essentially squaring and then does sum-reduction along the last axis, thus losing it in that reduction process.

So, why not explicitly do the squaring and summing? Well, the benefit with np.einsum is that it does both the squaring and summing in one step, giving us noticeable performance efficiency.

Thus, finally if from_array were (N x 2) array and to_array were (M x 2) array, the output from np.einsum would be the squared euclidean distances as a 2D array of shape (N x M). More info on the string notation itself would involve a longer discussion, some of which could be found in this post and the official documentation link posted earlier.

Community
  • 1
  • 1
Divakar
  • 204,109
  • 15
  • 192
  • 292
1

Numpy only. So no scipy.spatial.distance.cdist

First, don't use tuples, use 2xN and 2xM arrays. Then broadcast.

np.linalg.norm(from_array[:,:,None]-to_array[:,None,:], axis=0)

If you have an ancient version of numpy without a vecorizable linalg.norm (i.e. you're using Abaqus) do this:

np.sum((from_array[:,:,None]-to_array[:,None,:])**2, axis=0).__pow__(0.5)
Daniel F
  • 11,893
  • 1
  • 21
  • 50