
I have a two-dimensional array:

MyArray = array([[6588252.24, 1933573.3, 212.79, 0, 0],
                 [6588253.79, 1933602.89, 212.66, 0, 0],
                 etc...])

The first two elements of each row, MyArray[i][0] and MyArray[i][1], are the X and Y coordinates of a point.

For every element in the array, I would like to find the quickest way to return its single nearest neighbor in a radius of X units. We are assuming this is in 2D space.

Let's say for this example X = 6.

I have solved the problem by comparing every element to every other element, but this takes about 15 minutes for a list of 22k points. We hope to eventually run this on lists of about 30 million points.
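For reference, the all-pairs comparison described above might look roughly like the sketch below. The function name, the return convention (`inf` and `-1` when nothing is in range), and the inclusive radius check are assumptions made for illustration, not the original script. The work grows with the square of the number of points, which is why it does not scale.

```python
import numpy as np

def nearest_within_radius_bruteforce(xy, radius):
    """Hypothetical helper: for each point, return (distance, index) of its
    nearest *other* point within `radius`, or (inf, -1) if there is none."""
    results = []
    for i, p in enumerate(xy):
        # Euclidean distance from point i to every point in the array.
        d = np.hypot(xy[:, 0] - p[0], xy[:, 1] - p[1])
        d[i] = np.inf  # ignore the point itself
        j = int(np.argmin(d))
        if d[j] <= radius:
            results.append((float(d[j]), j))
        else:
            results.append((float("inf"), -1))
    return results

# Made-up sample points, only to show the call.
sample_xy = np.array([[0.0, 0.0], [3.0, 4.0], [100.0, 100.0]])
bruteforce_result = nearest_within_radius_bruteforce(sample_xy, radius=6.0)
```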

I have read about K-d trees and understand the basic concept, but have had trouble understanding how to script them.

Dlinet
  • What's a "Kt tree"? You mean "k-d tree"? For two-dimensional points you only need a [quadtree](http://en.wikipedia.org/wiki/Quadtree). There was an earlier question looking for quadtree implementations in Python: http://stackoverflow.com/questions/6060302/pure-python-quadtree-implementation – Mark Reed Oct 16 '12 at 21:30
  • Thank you! I meant a k-d tree. I will look up a quad tree. – Dlinet Oct 16 '12 at 21:33
  • 1
    There's a k-d tree implementation in the [`scipy.spatial`](http://docs.scipy.org/doc/scipy/reference/spatial.html) module – John Vinyard Oct 16 '12 at 21:49
  • Note the cKDTree, it's much faster. – seberg Oct 16 '12 at 22:26
  • I have looked both of those up, but can not figure out how to use them. A relevant code example would be much appreciated! – Dlinet Oct 16 '12 at 22:40
  • @Dlinet: Your solution won't give the closest result, but rather itself since the distance to itself is 0! You should instead use k=2 and take the second closest result. – jkflying Dec 14 '12 at 21:00

1 Answer


Thanks to John Vinyard for suggesting scipy. After some good research and testing, here is the solution to this question:

Prerequisites: Install NumPy and SciPy

  1. Import the NumPy and SciPy modules (import scipy.spatial so that cKDTree is available).

  2. Make a copy of the five-column array that includes just the X and Y values.

  3. Create an instance of a cKDTree as such:

    YourTreeName = scipy.spatial.cKDTree(YourArray, leafsize=100)
    # Play with the leafsize to get the fastest result for your dataset
    
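  4. Query the cKDTree for the nearest neighbor within 6 units. As jkflying points out in the comments on the question, each point's closest match is itself at distance 0, so ask for k=2 and take the second result:

    for item in YourArray:
        Distances, Indices = YourTreeName.query(item, k=2, distance_upper_bound=6)
        # Entry 0 is the query point itself; entry 1 is its nearest other point
        TheResult = (Distances[1], Indices[1])
    

    For each item in YourArray, TheResult is a tuple of the distance to the nearest other point and the index of that point in YourArray. If no other point lies within 6 units, the distance is inf and the index equals len(YourArray).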
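The steps above can also be run as one vectorized query rather than a Python loop, which tends to matter for large inputs. The following is a minimal sketch, not a tuned implementation. The third sample row and all variable names are made up for illustration, and it assumes the input has no duplicate points, so that each point's first match is itself.

```python
import numpy as np
from scipy.spatial import cKDTree

# Sample data shaped like the question's array: each row is [X, Y, Z, 0, 0].
# The third row is invented so that one pair falls inside the radius.
my_array = np.array([
    [6588252.24, 1933573.30, 212.79, 0, 0],
    [6588253.79, 1933602.89, 212.66, 0, 0],
    [6588254.00, 1933575.00, 212.70, 0, 0],
])
radius = 6.0

# Step 2: keep only the X and Y columns.
xy = my_array[:, :2]

# Step 3: build the tree (leafsize is a tuning knob, not a requirement).
tree = cKDTree(xy, leafsize=100)

# Step 4: query every point in one call. k=2 because each point's closest
# match is itself at distance 0; column 1 holds the nearest *other* point.
distances, indices = tree.query(xy, k=2, distance_upper_bound=radius)
nearest_dist = distances[:, 1]
nearest_idx = indices[:, 1]

# Points with no neighbor inside the radius come back with distance inf
# and an index equal to the number of points in the tree.
has_neighbor = np.isfinite(nearest_dist)
```

Filtering with `has_neighbor` before using `nearest_idx` avoids indexing past the end of the array for points that have no neighbor in range.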

River
Dlinet
  • How about just the nearest to one particular point, rather than a collection? – Steve Yeago Nov 11 '15 at 01:42
  • @SteveYeago [query_ball_point](http://docs.scipy.org/doc/scipy-0.15.1/reference/generated/scipy.spatial.cKDTree.query_ball_point.html#scipy.spatial.cKDTree.query_ball_point) seems to be available for this. – ldavid Jan 24 '16 at 22:08