
I'm looking for a well-optimized function that accepts an n x n distance matrix and returns an n x k matrix whose ith row holds the indices of the k nearest neighbors of the ith data point.

I find a gazillion different R packages that let you do KNN, but they all seem to include the distance computations along with the sorting algorithm within the same function. In particular, for most routines the main argument is the original data matrix, not a distance matrix. In my case, I'm using a nonstandard distance on mixed variable types, so I need to separate the sorting problem from the distance computations.

This is not exactly a daunting problem: I could obviously just use the order function inside a loop to get what I want (see my solution below), but this is far from optimal. For example, sort with partial = 1:k is much faster when k is small (less than 11), but unfortunately it returns only the sorted values rather than the desired indices.
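One way to get indices while still using a partial sort is sketched below; the helper names `k_smallest_idx` and `knn_from_dist` are hypothetical, and the sketch assumes no NAs and k <= n. `sort(x, partial = k)[k]` gives the k-th smallest value, `which()` recovers every index at or below it, and only those few candidates are fully ordered. Because `order()` is stable and `which()` returns increasing indices, ties are broken the same way as in `order(x)[1:k]`. Whether this actually beats plain `order()` depends on n and k, so it is worth benchmarking on your own data.

```r
# Hypothetical helper: indices of the k smallest entries of a numeric vector x
# (assumes no NAs and k <= length(x)).
k_smallest_idx <- function(x, k) {
  thr  <- sort(x, partial = k)[k]     # k-th smallest value via a partial sort
  cand <- which(x <= thr)             # every index at or below it (ties may add extras)
  cand[order(x[cand])][seq_len(k)]    # fully order only the candidates, keep k
}

# Hypothetical helper: n x k matrix whose ith row holds the indices of the
# k nearest neighbours of point i, given an n x n distance matrix d.
knn_from_dist <- function(d, k) {
  n <- nrow(d)
  idx <- vapply(seq_len(n), function(i) k_smallest_idx(d[i, ], k), integer(k))
  # vapply returns a k x n matrix (or a plain vector when k = 1); refill as n x k
  matrix(idx, nrow = n, ncol = k, byrow = TRUE)
}
```

Like the loop in the question, this counts each point as its own nearest neighbour when the diagonal is zero.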

zkurtz
  • `library(class)` has a `knn` function. Maybe check that out. It's a classification package. – Rich Scriven May 03 '14 at 21:25
  • yes, I was looking at `class::knn`. Like the others, it takes raw data and applies the Euclidean distance. I don't see a way to give it a distance matrix directly. – zkurtz May 03 '14 at 21:37
  • Take a look [here](http://davetang.org/muse/2013/08/15/distance-matrix-computation/). `straight_distance – Rich Scriven May 03 '14 at 21:41

2 Answers


Try the FastKNN package on CRAN (although it is not well documented). Its k.nearest.neighbors function accepts an arbitrary distance matrix. Below is an example that computes the matrix you need.

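```r
library(FastKNN)  # provides k.nearest.neighbors()

# arbitrary data
train <- matrix(sample(c("a", "b", "c"), 12, replace = TRUE), ncol = 2)  # n x 2
n <- nrow(train)

# random stand-in for a real distance matrix: symmetric, zero diagonal
distMatrix <- matrix(runif(n^2, 0, 1), ncol = n)  # n x n
distMatrix <- (distMatrix + t(distMatrix)) / 2
diag(distMatrix) <- 0

# matrix of neighbours
k <- 3
nn <- matrix(0, n, k)  # n x k
for (i in 1:n)
  nn[i, ] <- k.nearest.neighbors(i, distMatrix, k = k)
```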

Note: you can always search the CRAN package list for 'knn' (Ctrl+F) to find related functions: https://cran.r-project.org/web/packages/available_packages_by_name.html

hanna

For the record (I won't mark this as the answer), here is a quick-and-dirty solution. Suppose sd.dist is the special distance matrix and k.for.nn is the number of nearest neighbors.

```r
n = nrow(sd.dist)
knn.mat = matrix(0, ncol = k.for.nn, nrow = n)
knd.mat = knn.mat
for(i in 1:n){
  knn.mat[i,] = order(sd.dist[i,])[1:k.for.nn]
  knd.mat[i,] = sd.dist[i,knn.mat[i,]]
}
```

Now knn.mat is the matrix with the indices of the k nearest neighbors in each row, and for convenience knd.mat stores the corresponding distances.
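The distance matrix knd.mat can also be filled without a loop, using R's matrix indexing with a two-column (row, column) index matrix. This is a minimal sketch that reuses the answer's names sd.dist, k.for.nn and knn.mat; the small random sd.dist below is only a stand-in for the real mixed-type distance matrix.

```r
# Example inputs (stand-ins for the real sd.dist and k.for.nn)
set.seed(1)
sd.dist  <- as.matrix(dist(matrix(runif(20), ncol = 2)))  # 10 x 10
k.for.nn <- 3
n <- nrow(sd.dist)

# Row-wise neighbour indices, same logic as the loop above
knn.mat <- matrix(0L, nrow = n, ncol = k.for.nn)
for (i in seq_len(n)) knn.mat[i, ] <- order(sd.dist[i, ])[1:k.for.nn]

# Distances without a loop: each row of the cbind() matrix is a (row, column)
# pair; as.vector() reads knn.mat column by column, matching rep(..., times = k)
knd.mat <- matrix(sd.dist[cbind(rep(seq_len(n), times = k.for.nn), as.vector(knn.mat))],
                  nrow = n, ncol = k.for.nn)
```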

zkurtz
  • This treats each point as its own nearest neighbor, since the point itself is not excluded before calling `order`. – pedrostrusso Apr 20 '18 at 16:24
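If a point should not count as its own neighbor, one option (a sketch, not taken from either answer) is to set the diagonal to Inf before ordering, which pushes each point to the end of its own row. The helper name `knn_excluding_self` is hypothetical, and the sketch assumes k <= n - 1 and that no genuine distance is Inf. Simply taking `order(d[i, ])[2:(k + 1)]` also works with a zero diagonal, but if another point sits at distance zero from i and has a smaller index, that version drops the other point and keeps i itself.

```r
# Hypothetical helper: k nearest neighbours of each point, excluding the point itself.
knn_excluding_self <- function(d, k) {
  diag(d) <- Inf                      # local copy: a point can't be its own neighbour
  n <- nrow(d)
  idx <- vapply(seq_len(n), function(i) order(d[i, ])[seq_len(k)], integer(k))
  matrix(idx, nrow = n, ncol = k, byrow = TRUE)  # n x k, row i = neighbours of i
}
```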