The problem should be straightforward, but I'm lost anyway.
I have n samples and have already calculated a distance matrix (because I do not want to use Euclidean distance and couldn't find a way to specify another distance measure for, for example, the knn() function).
I then found (knn_1, knn_2) and used them to get the nearest neighbors from the distance matrix (as far as I can tell, it's just sorting each row).
Now, I do not know any clusters in the beginning, and I do not need to insert any new data points afterwards.
Basically, my question is: how do I initialize the clusters?
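To make the "ordering by rows" step concrete, here is a minimal sketch. It is written in Python purely for illustration (the same logic carries over to R), the helper name `knn_from_distance_matrix` is hypothetical, and the distance matrix is made up rather than taken from my data.

```python
def knn_from_distance_matrix(dist, k):
    """Return, for each row i, the indices of its k nearest neighbors."""
    n = len(dist)
    neighbors = []
    for i in range(n):
        # Order all other points by their distance to i (i itself is skipped).
        others = [j for j in range(n) if j != i]
        others.sort(key=lambda j: dist[i][j])
        neighbors.append(others[:k])
    return neighbors


# Made-up symmetric distance matrix for four samples (0-based indices).
dist = [
    [0.0, 1.0, 4.0, 5.0],
    [1.0, 0.0, 2.0, 6.0],
    [4.0, 2.0, 0.0, 3.0],
    [5.0, 6.0, 3.0, 0.0],
]
knn = knn_from_distance_matrix(dist, k=2)
print(knn)
```

Any distance measure works here, since only the precomputed matrix is used.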

An example to illustrate my problem: let's assume our nearest neighbors (k = 2, n = 4) are as follows:

i = 1: 2,3  
i = 2: 3,4  
i = 3: 1,3  
i = 4: 1,2  

How would you find the clusters?
Ideas I had: start by assigning i = 1 to cluster 1, and then assign its nearest neighbors (2, 3) to the same cluster. But by that logic, everything would end up in this one cluster, because the assignment just propagates.
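A small sketch of that propagation, run on the neighbor lists from the example above (Python for illustration only; the function name `propagate` is hypothetical):

```python
from collections import deque

# Neighbor lists from the example above (1-based labels, k = 2).
neighbors = {1: [2, 3], 2: [3, 4], 3: [1, 3], 4: [1, 2]}


def propagate(neighbors):
    """Assign cluster ids by following neighbor links outward from each unassigned point."""
    cluster_of = {}
    next_id = 1
    for start in neighbors:
        if start in cluster_of:
            continue  # already reached from an earlier starting point
        cluster_of[start] = next_id
        queue = deque([start])
        while queue:
            i = queue.popleft()
            for j in neighbors[i]:
                if j not in cluster_of:
                    cluster_of[j] = next_id
                    queue.append(j)
        next_id += 1
    return cluster_of


clusters = propagate(neighbors)
print(clusters)
```

Starting from i = 1, all four points end up with cluster id 1, which is exactly the collapse described above.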
So, next idea: start by assigning k elements to k clusters, i.e. assign i = 1 to cluster 1, i = 2 to cluster 2 and i = 3 to cluster 3. But what justification would I have for that? It would make sense for k-means clustering, but not for KNN...
Third idea: add each element to its own cluster and subsequently merge them. Sounds good, but I don't know how to do that...
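One possible way to sketch the merging idea is below. The merge rule is an assumption made only for this sketch: two clusters are merged only when a point in one and a point in the other list each other as neighbors (a "mutual neighbor" rule). It is again Python for illustration, and `merge_mutual_neighbors` is a hypothetical name, not a function from any package.

```python
# Neighbor lists from the example above (1-based labels, k = 2).
neighbors = {1: [2, 3], 2: [3, 4], 3: [1, 3], 4: [1, 2]}


def merge_mutual_neighbors(neighbors):
    """Start with one cluster per point, then merge clusters linked by mutual neighbors."""
    parent = {i: i for i in neighbors}  # every point starts as its own cluster

    def find(i):
        # Follow parent links to the cluster representative (with path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, nbrs in neighbors.items():
        for j in nbrs:
            # Assumed rule: merge only if i lists j AND j lists i.
            if i != j and i in neighbors[j]:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[max(ri, rj)] = min(ri, rj)

    groups = {}
    for i in neighbors:
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


clusters = merge_mutual_neighbors(neighbors)
print(clusters)
```

On the example this gives two clusters, {1, 3} and {2, 4}, instead of one. Whether a mutual-neighbor rule is justified for a given data set is a separate question; the sketch only shows that the merging step itself is mechanical once a rule is chosen.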

If you know of any R packages that do KNN clustering based on a distance matrix, that's exactly what I am looking for! I have looked into FastKNN, class, proxy and philentropy (the latter two to calculate distances) but haven't found anything so far.

Thanks so much!

fußballball
  • Not sure if they handle distance matrix but might want to look at class, kernlab and e1071 packages. – screechOwl Oct 04 '17 at 17:44
  • Possible duplicate of [Find K nearest neighbors, starting from a distance matrix](https://stackoverflow.com/questions/23449726/find-k-nearest-neighbors-starting-from-a-distance-matrix) – EDi Oct 04 '17 at 17:49
  • @EDi that's actually the post I linked as knn_1 in my question, so I had already seen it. It doesn't address my question though. But thanks anyway! – fußballball Oct 04 '17 at 19:02
  • @screechOwl I had already seen class (see my post), and while it has a knn algorithm, I can't pass it a distance matrix. I checked kernlab and e1071 (thanks for the suggestions) but they don't have k-nearest neighbor algorithms. Thanks both of you for the suggestions though! – fußballball Oct 04 '17 at 19:02
  • What is "kNN clustering"? As far as I can tell, it's a "typo" for either k-means (which does not use kNN) or for kNN classification (which is not clustering). Do you have any reference? – Has QUIT--Anony-Mousse Oct 04 '17 at 22:13
  • @Anony-Mousse As far as I can tell, kNN is just the general algorithm that finds the k nearest neighbors. After running kNN I get a matrix of the k nearest neighbors. All I want to do is find clusters based on that resulting n (number of observations) x k (number of NN) matrix. It's neither kNN classification (as I do not have an output variable) nor k-means clustering (as I do not use the k-means algorithm). – fußballball Oct 05 '17 at 13:35
  • Yes. But how would you do that? I don't know an algorithm that would cluster based on a kNN x N matrix. I only see people misname kmeans "kNN". – Has QUIT--Anony-Mousse Oct 05 '17 at 13:39

0 Answers