The problem should be straightforward, but I'm lost anyways...
I have n samples and have already calculated a distance matrix (because I do not want to use Euclidean distance and couldn't find a way to specify another distance measure for, e.g., the knn() function).
I then found (knn_1, knn_2) and used them to get the nearest neighbors from the distance matrix (as far as I can tell, this just sorts each row by distance).
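To make the row-ordering step concrete, here is a minimal sketch of what I understand it to do. It is written in Python purely to illustrate the logic, not as the actual implementation of any package; the function name `knn_from_distance_matrix` and the small distance matrix are made up for illustration, and indices are 0-based here.

```python
def knn_from_distance_matrix(dist, k):
    """For each row i, return the indices of its k nearest neighbours.

    `dist` is assumed to be a square matrix (list of lists) of pairwise
    distances. The point itself is excluded from its own neighbour list.
    """
    n = len(dist)
    neighbours = []
    for i in range(n):
        # All other points, ordered by their distance to point i.
        others = [j for j in range(n) if j != i]
        others.sort(key=lambda j: dist[i][j])
        neighbours.append(others[:k])
    return neighbours


# Hypothetical symmetric distance matrix for n = 4 samples.
dist = [
    [0.0, 1.0, 2.0, 5.0],
    [1.0, 0.0, 1.5, 4.0],
    [2.0, 1.5, 0.0, 3.0],
    [5.0, 4.0, 3.0, 0.0],
]

neighbours = knn_from_distance_matrix(dist, k=2)
print(neighbours)
```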
Now, I do not know any clusters in the beginning, and I do not need to insert any new data points afterwards.
Basically, my question is: how do I initialize the clusters?
An example to illustrate my problem: Let's assume our nearest neighbors (k=2, n = 4) are as follows:
i = 1: 2,3
i = 2: 3,4
i = 3: 1,3
i = 4: 1,2
How would you find the clusters?
Ideas I had: Start by assigning i = 1 to cluster 1, and then assign its nearest neighbors (2, 3) to the same cluster. But by that logic everything ends up in this one cluster, because the assignment just propagates from neighbor to neighbor.
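A minimal sketch of this propagation on the example above (Python, only to illustrate the logic). It assumes that a neighbor relation in either direction links two points, which amounts to finding connected components of the neighbor graph; the function name `propagate_clusters` is made up.

```python
from collections import deque


def propagate_clusters(neighbours):
    """Label points by spreading a cluster label along neighbour links.

    Assumption: a link i -> j is treated as undirected, so this finds the
    connected components of the nearest-neighbour graph.
    """
    # Build an undirected adjacency structure, ignoring self-links.
    adjacency = {i: set() for i in neighbours}
    for i, nbrs in neighbours.items():
        for j in nbrs:
            if j != i:
                adjacency[i].add(j)
                adjacency[j].add(i)

    labels = {}
    next_label = 0
    for start in sorted(adjacency):
        if start in labels:
            continue
        next_label += 1
        labels[start] = next_label
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for other in adjacency[current]:
                if other not in labels:
                    labels[other] = next_label
                    queue.append(other)
    return labels


# Nearest neighbours from the example (k = 2, n = 4), 1-based as in the post.
example = {1: [2, 3], 2: [3, 4], 3: [1, 3], 4: [1, 2]}
labels = propagate_clusters(example)
print(labels)
```

On the example, all four points receive the same label, which is exactly the problem: plain propagation collapses everything into one cluster.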
So, next idea: Start by assigning k elements to k clusters, i.e. assign i = 1 to cluster 1, i = 2 to cluster 2 and i = 3 to cluster 3. But what justification would I have for that? It would make sense for k-means clustering, but not for KNN...
Third idea: Put each element in its own cluster and subsequently merge the clusters. Sounds good, but I don't know how to do that...
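One way the merge idea could look, as a rough sketch (Python, only to illustrate the logic). The merge rule here is my own assumption, not an established KNN clustering method: two clusters are merged only if a point in one and a point in the other list each other as nearest neighbors. The function name `merge_mutual_neighbours` is made up.

```python
def merge_mutual_neighbours(neighbours):
    """Start with one cluster per point, then merge via a union-find.

    Assumed merge rule: i and j are joined only if they are *mutual*
    nearest neighbours (j is in i's list and i is in j's list).
    """
    parent = {i: i for i in neighbours}

    def find(i):
        # Follow parent links to the cluster representative,
        # shortening the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            # Keep the smaller index as representative.
            parent[max(root_i, root_j)] = min(root_i, root_j)

    for i, nbrs in neighbours.items():
        for j in nbrs:
            if j != i and i in neighbours[j]:
                union(i, j)

    clusters = {}
    for i in neighbours:
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Nearest neighbours from the example (k = 2, n = 4), 1-based as in the post.
example = {1: [2, 3], 2: [3, 4], 3: [1, 3], 4: [1, 2]}
clusters = merge_mutual_neighbours(example)
print(clusters)
```

With this rule the example no longer collapses into a single cluster, but whether mutual neighborhood is a sensible merge criterion for my data is exactly what I am unsure about.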
If you know of any R packages that do KNN clustering based on a distance matrix, that's exactly what I am looking for! I have looked into the FastKNN, class, proxy and philentropy packages (the latter two to calculate distances) but haven't found anything so far.
Thanks so much!