Questions tagged [knn]

In pattern recognition, k-nearest neighbors (k-NN) is a classification algorithm used to classify examples based on a set of already-classified examples. Algorithm: a case is classified by a majority vote of its neighbors, being assigned to the class most common among its k nearest neighbors as measured by a distance function. If k = 1, the case is simply assigned to the class of its single nearest neighbor.

The idea of the k-nearest neighbors (k-NN) algorithm is to use the known features of an example to determine its unknown classification.

First, a set of classified samples is supplied to the algorithm. When a new, unclassified sample is given, the algorithm finds the k nearest neighbors of the new sample among the classified samples of the training set and determines its classification from theirs.

The algorithm is sometimes called lazy classification because it does nothing during "learning" except store the samples; all the work is done at classification time.
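This laziness is visible in scikit-learn's KNeighborsClassifier, for example: fit() is cheap, while predict() performs the actual distance computations. A minimal sketch, assuming scikit-learn is installed and using its bundled iris data purely for illustration:

    from sklearn.datasets import load_iris
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_iris(return_X_y=True)

    clf = KNeighborsClassifier(n_neighbors=5)
    clf.fit(X, y)              # "learning": store the samples (and build a search index)
    print(clf.predict(X[:3]))  # the distance computations happen here, at query time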

Algorithm

The k-NN algorithm is among the simplest of all machine learning algorithms. A shortcoming of the k-NN algorithm is that it is sensitive to the local structure of the data.

The training examples are vectors in a multidimensional feature space, each with a class label. The training phase of the algorithm consists only of storing the feature vectors and class labels of the training samples.

In the classification phase, k is a user-defined constant, and an unlabeled vector (a query or test point) is classified by assigning the label which is most frequent among the k training samples nearest to that query point.
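A minimal from-scratch sketch of this classification phase (NumPy-based; the name knn_predict and the choice of Euclidean distance are illustrative, not prescribed):

    import numpy as np
    from collections import Counter

    def knn_predict(X_train, y_train, x_query, k):
        # distance from the query point to every stored training vector
        dists = np.linalg.norm(X_train - x_query, axis=1)
        # indices of the k closest training samples
        nearest = np.argsort(dists)[:k]
        # majority vote among their class labels
        return Counter(y_train[nearest].tolist()).most_common(1)[0][0]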

A commonly used distance metric for continuous variables is Euclidean distance. For discrete variables, such as for text classification, another metric can be used, such as the overlap metric (or Hamming distance).
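For concreteness, both metrics can be written in a few lines (NumPy arrays assumed; the function names are illustrative):

    import numpy as np

    def euclidean(a, b):
        # straight-line distance for continuous features
        return np.sqrt(np.sum((a - b) ** 2))

    def overlap(a, b):
        # Hamming / overlap metric: number of positions where discrete features differ
        return np.sum(a != b)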

Often, the classification accuracy of k-NN can be improved significantly if the distance metric is learned with a specialized algorithm such as Large Margin Nearest Neighbor or Neighbourhood Components Analysis.
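Scikit-learn, for instance, provides Neighbourhood Components Analysis as NeighborhoodComponentsAnalysis, which can be chained with a k-NN classifier. A sketch, with the data set, split, and n_neighbors chosen arbitrarily for illustration:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier, NeighborhoodComponentsAnalysis
    from sklearn.pipeline import Pipeline

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # learn a linear transformation of the feature space, then run k-NN in it
    pipe = Pipeline([
        ("nca", NeighborhoodComponentsAnalysis(random_state=0)),
        ("knn", KNeighborsClassifier(n_neighbors=3)),
    ])
    pipe.fit(X_train, y_train)
    print(pipe.score(X_test, y_test))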

1478 questions
13 votes, 3 answers
How to sample large database and implement K-means and K-nn in R?
I'm a new user to R, trying to move away from SAS. I'm asking this question here as I'm feeling a bit frustrated with all the packages and sources available for R, and I can't seem to get this working, mainly due to data size. I have the following: A…
asked by erichfw

12 votes, 4 answers
Efficient method for finding KNN of all nodes in a KD-Tree
I'm currently attempting to find the K nearest neighbors of all nodes of a balanced KD-tree (with K=2). My implementation is a variation of the code from the Wikipedia article, and it's decently fast at finding the KNN of any node in O(log N). The problem lies…
asked by St. John Johnson

12 votes, 3 answers
Error with knn function
I'm trying to run this line: knn(mydades.training[,-7], mydades.test[,-7], mydades.training[,7], k=5), but I always get this error: Error in knn(mydades.training[, -7], mydades.test[, -7], mydades.training[, : NA/NaN/Inf in foreign function call (arg…
asked by user2443456

11 votes, 3 answers
Using Nearest Neighbour Algorithm for image pattern recognition
So I want to be able to recognise patterns in images (such as the number 4). I have been reading about different algorithms, and I would really like to use the Nearest Neighbour algorithm; it looks simple and I do understand it based on this…
asked by user293895

11 votes, 2 answers
Setting feature weights for KNN
I am working with sklearn's implementation of KNN. While my input data has about 20 features, I believe some of the features are more important than others. Is there a way to set the feature weights for each feature when "training" the KNN…
asked by user2976570

11 votes, 2 answers
How to use opencv flann::Index?
I have some problems with opencv flann::Index. I'm creating an index: Mat samples = Mat::zeros(vfv_net_quie.size(), 24, CV_32F); for (int i = 0; i < vfv_net_quie.size(); i++) { for (int j = 0; j < 24; j++) { …
asked by USeTi

9 votes, 2 answers
Getting the decision boundary for KNN classifier using R
I am trying to fit a KNN model and obtain a decision boundary using the Auto data set in the ISLR package in R. I am having difficulty identifying the decision boundary for a 3-class problem. This is my code so far. I am not getting the decision…
asked by student_R123

9 votes, 3 answers
Faster kNN algorithm in Python
I want to code my own kNN algorithm from scratch; the reason is that I need to weight the features. The problem is that my program is still really slow despite removing for loops and using built-in numpy functionality. Can anyone suggest a way to…

9 votes, 1 answer
Increasing n_jobs has no effect on GridSearchCV
I have set up a simple experiment to check the importance of a multi-core CPU while running sklearn GridSearchCV with KNeighborsClassifier. The results I got are surprising to me, and I wonder if I misunderstood the benefits of multiple cores or maybe I…
asked by Kocur4d

9 votes, 3 answers
How to use both binary and continuous features in the k-Nearest-Neighbor algorithm?
My feature vector has both continuous (or widely ranging) and binary components. If I simply use Euclidean distance, the continuous components will have a much greater impact: representing symmetric vs. asymmetric as 0 and 1 and some less important…
asked by John Hall

9 votes, 2 answers
Grid Search parameter and cross-validated data set in KNN classifier in Scikit-learn
I'm trying to build my first KNN classifier using scikit-learn. I've been following the User Guide and other online examples, but there are a few things I am unsure about. For this post, let's use the following: X = data, Y = target. 1) In most…
asked by browser

9 votes, 3 answers
How do I traverse a KDTree to find k nearest neighbors?
This question concerns the implementation of KNN searching of KDTrees. Traversal of a KDTree to find a single best match (nearest neighbor) is straightforward, akin to a modified binary search. How is the traversal modified to exhaustively and…
asked by user2647513

9 votes, 3 answers
How to convert distance into probability?
Can anyone shine a light on my MATLAB program? I have data from two sensors, and I'm doing a kNN classification for each of them separately. In both cases the training set looks like a set of vectors of 42 rows total, like this: [44 12 53 29 35 30 49; …

9 votes, 2 answers
Is a k-d tree efficient for kNN search (k nearest neighbors search)?
I have to implement k-nearest-neighbors search for 10-dimensional data in a kd-tree. The problem is that my algorithm is very fast for k=1 but as much as 2000x slower for k>1 (k=2,5,10,20,100). Is this normal for kd-trees, or am I doing something…
asked by Andraz

9 votes, 3 answers
K-nearest neighbour C/C++ implementation
Where can I find a serial C/C++ implementation of the k-nearest neighbour algorithm? Do you know of any library that has this? I have found OpenCV, but the implementation is already parallel. I want to start from a serial implementation and…
asked by alexsardan