How to pair (x,y) pairs using numpy

Question

I have 2, 2D NumPy arrays consisting of ~300,000 (x,y) pairs each. Given the i^th (x, y) pair in array A, I need to find the corresponding j^th (x,y) pair in array B such that ([x_i - x_j]² + [y_i - y_j]²)^½, the distance between the two (x,y) pairs, is minimized.

What I'm currently doing is argmin like so:

thickness = []
for i in range(len(A)):
    xi = A[i][0]
    yi = A[i][1]
    idx = (np.sqrt(np.power(B[:, 0] - xi, 2) + np.power(B[:, 1] - yi, 2))).argmin()
    thickness.append([(xi + B[idx][0]) / 2, (yi + B[idx][1]) / 2,
                      A[i][2] + B[idx][2]])

Is there a faster way to accomplish this?

score 3 · Accepted Answer · answered Nov 21 '19 at 16:12

We can use Cython-powered kd-tree for quick nearest-neighbor lookup to get nearest neighbour and hence achieve our desired output, like so -

from scipy.spatial import cKDTree

idx = cKDTree(B[:,:2]).query(A[:,:2], k=1)[1]
thickness = [(A[:,0] + B[idx,0]) / 2, (A[:,1] + B[idx,1]) / 2, A[:,2] + B[idx,2]]

Dev Khadka · Answer 2 · 2019-11-21T16:00:59.460

0

you can use numpy broadcast to find the indexes in B with minimum distance like below. With broadcast distance is calculated between each row of A and B

A = np.random.rand(100,3)
B = np.random.rand(90,3)

diff = A[:, np.newaxis, :2] - B[:,:2]
distance = np.sum(diff**2, axis=2)
indx = np.argmin(distance, axis=1)
result = (A+B[indx])/2
result

edited Nov 21 '19 at 16:00

answered Nov 21 '19 at 15:16

Dev Khadka

4,361
3
16
31

@AJ Wildridge what is shape of your A and B array?, it looks like its last index has 3 elements – Dev Khadka Nov 21 '19 at 15:21
A.shape is (294662, 3) and B.shape is (310351, 3), so there is not a 1-to-1 mapping guaranteed. This can also change but will usually be around (300000, 3) – AJ Wildridge Nov 21 '19 at 15:47
1

then the solution above should work, it will not need the 1st dimension to have 1 to one mapping – Dev Khadka Nov 21 '19 at 15:59
It works but is actually slower than mine. I timed mine for A.shape (100, 3) and B.shape(310351, 3) and mine takes 1.39 s ± 22.9 ms whereas this implementation takes 1.86 s ± 13.3 ms – AJ Wildridge Nov 21 '19 at 16:24

Rafael Santana · Answer 3 · 2019-11-25T18:21:59.287

The fastest solution will require you to sort array B by their distance to the origin. You will need to break up array B into 4 hastables for each quadrant. You will need to find the A[Xi-Nx] [Yi-Ny] the index which is closest in distance to to the origin to our point B(X,Y) for each quadrant.Most of the time the closest point in A to our point in B would reside in the same quadrant. For the case in which the closest point is a different quadrant you will need to iterate through point in order of distance to the current quadrant. if you only need to find the distance to one point you could sort array B based on that distance to Initial point in A from smallest to largest. This would mean that the minimun distance will always be at index 0. THis solution will run in N(LOG(N)) complexity which is basically the cost of sorting.

import numpy as np
def quadrant(point): 
    x= point[0]
    y = point [1]
    if (x > 0 and y > 0): 
        return 1

    elif (x < 0 and y > 0): 
        return 2

    elif (x < 0 and y < 0): 
        return 3

    elif (x > 0 and y < 0): 
       return 4



    else: 
       return 0


def calculateDistance(point1,point2):
    #calculates the distance sqr
    x1 = point1[0]  
    y1 = point1[1]
    x2 = point2[0]  
    y2 = point2[1]
    dist = ((x2 - x1)**2 + (y2 - y1)**2)  
    return dist  

A = np.random.rand(10,2)
B = np.random.rand(10,2)

#sort b By distance to the origin and put it in sub_Quadran array -- 1 array per quadran
origin = [0,0]     
q1= {}  
q2= {}  
q3= {}  
q4= {}    


def quadrant(point): 
    x= point[0]
    y = point [1]
    if (x > 0 and y > 0): 
       q1 [calculateDistance(origin,point)] = point


    elif (x < 0 and y > 0): 
        q2 [calculateDistance(origin,point)] = point

    elif (x < 0 and y < 0): 
      q3  [calculateDistance(origin,point)] = point

    elif (x > 0 and y < 0): 
       q4  [calculateDistance(origin,point)] = point



    else: 
       print ("Point is the origin")




def find_near(point):
#for now assume point is on first quadrant
  dist = calculateDistance(point,origin) # get  to origin distance sqr
  # =find the point with closses distance this can be more efficient 

  idx=  min(q1_keys, key=lambda x:abs(x-dist))
  return q1[idx]




for  point in B:
    quadrant(point)


q1_keys = list()
q2_keys = list()
q3_keys = list()
q4_keys = list()
for i in q1.keys():
    q1_keys.append(i)
#Sorting can be done in the step bellow but done here to speedpup implementation
q1_keys.sort()

for i in q2.keys():
    q2_keys.append(i)
#Sorting can be done in the step bellow but done here to speedpup implementation
q2_keys.sort()

for i in q3.keys():
    q3_keys.append(i)
#Sorting can be done in the step bellow but done here to speedpup implementation
q3_keys.sort()

for i in q4.keys():
    q4_keys.append(i)
#Sorting can be done in the step bellow but done here to speedpup implementation
q4_keys.sort()



b_1 = [0.01, 0.01]   

q1_keys.sort() #point sorted as closest to the origin
print (q1)   
print(q1_keys)   
print (find_near(b_1))

` Another solution is to break the array B into multiple sub arrays and find the distance of the input point in A in those B sub_arrays. See algorithm details in the link bellow: https://courses.cs.washington.edu/courses/cse421/11su/slides/05dc.pdf

What is N? Is that the index in B? I need to find the distance to ~300,000 points. Basically I need a mapping of (x,y) pairs in A to (x,y) pairs in B and it doesn't have to be one-to-one. The number of elements in A and B are not the same. If i sort on this new distance metric that means I need to resort ~300,000 times, surely that's not the fastest? — AJ Wildridge, Nov 21 '19 at 15:54
@AJWildridge I don't think you could do any better than N log N for this problem you will need to sort the values each time. An option would be to sort the points in B base in their distance to the origin but that would still require you to check multiple points. There are other solutions that use divide and conquer to break the the arrays into multiple sub_arrays and finding the nearest point in each sub array but that still takes NlogN. Algorithm details in the link Bellow: https://courses.cs.washington.edu/courses/cse421/11su/slides/05dc.pdf — Rafael Santana, Nov 25 '19 at 16:57
Thanks, I'll check it out. I ended up just using Divakar's solution as that did all the pairings in ~0.5s and required 3 lines of code — AJ Wildridge, Nov 25 '19 at 17:26
That's pretty fast. The implementation is hidden inside a library. I was trying to reinvent the wheel. I usually write C++ when speed is needed. In c++ you usually have to write your own implementation. — Rafael Santana, Nov 25 '19 at 18:24
Yeah it's written in C++ and accessed with a python wrapper pretty much so get the best of both worlds. Very cool! — AJ Wildridge, Nov 25 '19 at 18:35

How to pair (x,y) pairs using numpy

3 Answers3