Questions tagged [knn]

In pattern recognition, k-nearest neighbors (k-NN) is a classification algorithm used to classify example based on a set of already classified examples. Algorithm: A case is classified by a majority vote of its neighbors, with the case being assigned to the class most common amongst its K nearest neighbors measured by a distance function. If K = 1, then the case is simply assigned to the class of its nearest neighbor.

The idea of the k-nearest neighbors (k-NN) algorithm is using the features of an example - which are known, to determine the classification of it - which is unknown.

First, some classified samples are supplied to the algorithm. When a new non-classified sample is given, the algorithm finds the k-nearest neighbors to the new sample, and determines what should its classification be, according to the classification of the classified samples, which were given as training set.

The algorithm is sometimes called lazy classification because during "learning" it does nothing - just stores the samples, and all the work is done during classification.

Algorithm

The k-NN algorithm is among the simplest of all machine learning algorithms. A shortcoming of the k-NN algorithm is that it is sensitive to the local structure of the data.

The training examples are vectors in a multidimensional feature space, each with a class label. The training phase of the algorithm consists only of storing the feature vectors and class labels of the training samples.

In the classification phase, k is a user-defined constant, and an unlabeled vector (a query or test point) is classified by assigning the label which is most frequent among the k training samples nearest to that query point.

A commonly used distance metric for continuous variables is Euclidean distance. For discrete variables, such as for text classification, another metric can be used, such as the overlap metric (or Hamming distance).

Often, the classification accuracy of k-NN can be improved significantly if the distance metric is learned with specialized algorithms such as Large Margin Nearest Neighbor or Neighbourhood components analysis.

Useful links

1478 questions
28
votes
3 answers

Sklearn kNN usage with a user defined metric

Currently I'm doing a project which may require using a kNN algorithm to find the top k nearest neighbors for a given point, say P. im using python, sklearn package to do the job, but our predefined metric is not one of those default metrics. so I…
user2926523
  • 413
  • 1
  • 4
  • 8
24
votes
11 answers

Getting TypeError: '(slice(None, None, None), 0)' is an invalid key

Trying to plot the decision Boundary of the k-NN Classifier but is unable to do so getting TypeError: '(slice(None, None, None), 0)' is an invalid key h = .01 # step size in the mesh # Create color maps cmap_light = ListedColormap(['#FFAAAA',…
Unknown
  • 391
  • 1
  • 2
  • 8
23
votes
4 answers

K Nearest-Neighbor Algorithm

Using the KNN-algorithm, say k=5. Now I try to classify an unknown object by getting its 5 nearest neighbours. What to do, if after determining the 4 nearest neighbors, the next 2 (or more) nearest objects have the same distance? Which object of…
Gwaihir
  • 231
  • 2
  • 3
22
votes
1 answer

KNN train() in cv2 with opencv 3.0

I'm trying to run k-nearest neighbours using cv2 (python 2.7) and opencv 3.0. I've replicated the same error message using code like http://docs.opencv.org/3.0-beta/doc/py_tutorials/py_ml/py_knn/py_knn_understanding/py_knn_understanding.html: import…
Brad Foley
  • 323
  • 1
  • 2
  • 5
21
votes
3 answers

Computing sparse pairwise distance matrix in R

I have a NxM matrix and I want to compute the NxN matrix of Euclidean distances between the M points. In my problem, N is about 100,000. As I plan to use this matrix for a k-nearest neighbor algorithm, I only need to keep the k smallest distances,…
Christopher DuBois
  • 38,442
  • 23
  • 68
  • 91
21
votes
4 answers

Missing value imputation in python using KNN

I have a dataset that looks like this 1908 January 5.0 -1.4 1908 February 7.3 1.9 1908 March 6.2 0.3 1908 April NaN 2.1 1908 May NaN 7.7 1908 June 17.7 8.7 1908 July NaN 11.0 1908 August 17.5 …
Clock Slave
  • 6,266
  • 9
  • 55
  • 94
20
votes
2 answers

Using cosine distance with scikit learn KNeighborsClassifier

Is it possible to use something like 1 - cosine similarity with scikit learn's KNeighborsClassifier? This answer says no, but on the documentation for KNeighborsClassifier, it says the metrics mentioned in DistanceMetrics are available. Distance…
Novice
  • 523
  • 1
  • 3
  • 12
19
votes
2 answers

AttributeError: 'Graph' object has no attribute 'node'

I have bellow python code to build knn graph but I have an error: AttributeError: 'Graph' object has no attribute 'node'. It seems that the nx.Graph() has no node attribute but I don't know what should I replace with that. import networkx as nx def…
nino
  • 371
  • 2
  • 3
  • 13
18
votes
1 answer

Finding K-nearest neighbors and its implementation

I am working on classifying simple data using KNN with Euclidean distance. I have seen an example on what I would like to do that is done with the MATLAB knnsearch function as shown below: load fisheriris x =…
Young_DataAnalyst
  • 263
  • 2
  • 4
  • 11
17
votes
3 answers

KNN classification with categorical data

I'm busy working on a project involving k-nearest neighbour regression. I have mixed numerical and categorical fields. The categorical values are ordinal (e.g. bank name, account type). Numerical types are, for e.g. salary and age. There are also…
Graham
  • 381
  • 1
  • 4
  • 12
16
votes
2 answers

OCR algorithm improvement

I'm creating an OCR based on Java. My objective is to extract text from a video file (post-processing). It has been a difficult search, trying to find free, open-source OCR that works purely on Java. I found Tess4J to be the only popular option,…
metsburg
  • 2,012
  • 1
  • 17
  • 32
15
votes
3 answers

Pre-processing before digit recognition with KNN classifier

Right now I'm trying to create digit recognition system using OpenCV. There are many articles and examples in WEB (and even on StackOverflow). I decided to use KNN classifier because this solution is the most popular in WEB. I found a database of…
ArtemStorozhuk
  • 8,542
  • 4
  • 30
  • 52
15
votes
2 answers

PCA and KNN algorithm

I am using KNN to classify handwritten digits. I also now have implemented PCA to reduce the dimensionality. From 256 I went to 200. But I only notice like, ~0.10% loss of information. I deleted 56 dimension. Shouldn't the loss be bigger? Only when…
Test Test
  • 2,611
  • 7
  • 41
  • 60
13
votes
4 answers

K nearest neighbour in python

I would like to calculate K-nearest neighbour in python. what library should i use?
sramij
  • 4,407
  • 5
  • 26
  • 51
13
votes
2 answers

Find K nearest neighbors, starting from a distance matrix

I'm looking for a well-optimized function that accepts an n X n distance matrix and returns an n X k matrix with the indices of the k nearest neighbors of the ith datapoint in the ith row. I find a gazillion different R packages that let you do…
zkurtz
  • 2,880
  • 5
  • 22
  • 57
1
2 3
98 99