0

I am trying to develop a poor man's NBody simulation and I'd like to speed up by interaction terms by only considering my nearest neighbors. It seems like the simplest way to do this is to bin all the point coordinates to a coarse grid and then loop over points that are contained in the nearest neighboring grid points.

Initially, I was going to do a quadTree for querying nearest neighbors, but the data is fairly dynamic from iteration to iteration and I expect to process simulations that have over a million coordinate points, so it seemed like a grid-based solution was a better place for me to start.

I currently have the following working code:

import numpy as np
cimport numpy as np
from cython.view cimport array as cvarray
cimport cython
from libc.math cimport sinh, cosh, sin, cos, acos, exp, sqrt, fabs, M_PI, floor
from collections import defaultdict

DTYPE = np.float64

ctypedef np.float64_t DTYPE_t
cdef DTYPE_t pi = 3.141592653589793


# Define Grid Structure
@cython.cdivision(True)
@cython.boundscheck(False) # turn off bounds-checking for entire function
@cython.wraparound(False)  # turn off negative index wrapping for entire function
def gridStructure(DTYPE_t[:,:,:] X, Py_ssize_t ref, int xSize, int ySize):
    Grid = defaultdict(int)
    cdef Py_ssize_t ii,jj
    cdef Py_ssize_t N = X.shape[1]
    cdef DTYPE_t[:] maxes = np.max(X[ref,:,:],axis=0)
    cdef DTYPE_t[:] mines = np.min(X[ref,:,:],axis=0)
    cdef Py_ssize_t pointX, pointY, index
    for ii in range(0,N):
        pointX = int(floor((X[ref,ii,0]-mines[0])/maxes[0]*xSize))
        pointY = int(floor((X[ref,ii,1]-mines[1])/maxes[1]*ySize))
        index = pointX + xSize*pointY
        if index in Grid.keys():
            Grid[index].append(ii)
        else:
            Grid[index] = [ii]
    return Grid

I'm using python's default dict structure to store the key (an integer grid index) and a set of integers that point to the coordinates in a memory view. It's safe to assume that the grid is sparse; the coordinates typically have a heterogeneous distribution. I think my structure is simple enough that I should use a C based structure instead of the dict so I can get away from the python wrappers.

Eventually, this function will not be directly called by the user, so I don't need any python interactions at this level.

Currently, the code can bin 1000000 points to a 256x256 grid in about a second, is it possible to do this operation faster for a potential real time poor man's simulation? Any advice for a proper data structure to speed up this function would be greatly appreciated. I basically want to return some data structure where I can call an integer based key and return a set of points associated with that key. Thanks!

1 Answers1

0

OK, I got a better solution. I found this post here: Efficient (and well explained) implementation of a Quadtree for 2D collision detection

Which talks about how to do the logic on a grid at the beginning of his second answer. Since the grid has a fixed size and the number of elements in my sim are fixed, I get 20-fold speed up if I just pre-allocate the memory. Here is the new grid function:

import numpy as np
cimport numpy as np

cimport cython
from libc.math cimport floor

DTYPE = np.float64
ctypedef np.float64_t DTYPE_t

@cython.cdivision(True)
@cython.boundscheck(False)
@cython.wraparound(False)
def gridStructure(DTYPE_t[:,:,:] X, np.int_t ref, np.int_t xSize, np.int_t[:] grid, np.int_t[:] eList):
    cdef Py_ssize_t N = X.shape[1]
    cdef np.int_t[:] lastEl = np.zeros(xSize*xSize, dtype=np.int)
    cdef DTYPE_t[:] maxes = np.max(X[ref,:,:],axis=0)
    cdef DTYPE_t[:] mines = np.min(X[ref,:,:],axis=0)
    cdef DTYPE_t widthX = max(maxes[0]-mines[0],maxes[1]-mines[1])
    cdef DTYPE_t ratio = xSize/widthX
    cdef np.int_t index,pointX,pointY
    cdef np.int_t ii,zz

    for ii in range(0,N):
        pointX = int(floor((X[ref,ii,0]-mines[0])*ratio*0.99))
        pointY = int(floor((X[ref,ii,1]-mines[1])*ratio*0.99))
        index = int(pointX + xSize*pointY)
        if grid[index] == -1:
            grid[index] = ii
            lastEl[index] = ii
        else:
            zz = lastEl[index]
            eList[zz] = ii
            lastEl[index] = ii

    return 0

1,000,000 points in a 256x256 grid in 50 msec. I can live with that for now.

The code assumes the elements for the grid and the element list (eList) are populated with -1s upon input.

EDIT: I deleted the original answer because the previous solution I posted was wrong. I had to add an extra memory buffer to store the last pointer location for each index.