
I'm currently working on a project that may require a kNN algorithm to find the top k nearest neighbors of a given point, say P. I'm using Python with the sklearn package, but our predefined metric is not one of the default metrics, so I have to use a user-defined metric, as described in the sklearn documentation, which can be found here and here.

It seems that the latest version of sklearn's kNN supports user-defined metrics, but I can't find how to use one:

import sklearn
from sklearn.neighbors import NearestNeighbors
import numpy as np
from sklearn.neighbors import DistanceMetric
from sklearn.neighbors.ball_tree import BallTree
BallTree.valid_metrics

Say I have defined a metric called mydist=max(x-y). I then use DistanceMetric.get_metric to turn it into a DistanceMetric object:

dt=DistanceMetric.get_metric('pyfunc',func=mydist)

According to the documentation, the call should look like this:

nbrs = NearestNeighbors(n_neighbors=4, algorithm='auto',metric='pyfunc').fit(A)
distances, indices = nbrs.kneighbors(A)

But where do I put dt? Thanks.

alko
user2926523
    The reason `nbrs = NearestNeighbors(n_neighbors=4, algorithm='auto',metric='pyfunc').fit(A) distances, indices = nbrs.kneighbors(A)` does not work, even when I put `func=mydist` in there, is that `algorithm=auto` does not accept user-defined metrics, and neither do `algorithm=kd_tree` or `algorithm=brute`. Only `algorithm=ball_tree` accepts them. – user2926523 Jan 10 '14 at 21:35
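
For reference, the metric mentioned in the question, mydist=max(x-y), could be written as a plain function of two 1-D arrays that returns a single number. The sketch below is only an illustration: it assumes the intended metric is the largest absolute coordinate difference (the Chebyshev distance), since max(x-y) without the absolute value can be negative and is not symmetric. The use of `np.abs` is that assumption, not something stated in the question.

```python
import numpy as np

def mydist(x, y):
    # Assumption: the intended metric is the largest absolute
    # coordinate difference, so the result is non-negative and symmetric.
    return float(np.max(np.abs(x - y)))

a = np.array([1.0, 5.0])
b = np.array([4.0, 1.0])
d_ab = mydist(a, b)  # coordinate differences are -3 and 4
d_ba = mydist(b, a)  # same value with the arguments swapped
```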

3 Answers


You pass the metric as the metric parameter, and any additional metric arguments as keyword parameters to the NN constructor:

>>> def mydist(x, y):
...     return np.sum((x-y)**2)
...
>>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])

>>> nbrs = NearestNeighbors(n_neighbors=4, algorithm='ball_tree',
...            metric='pyfunc', func=mydist)
>>> nbrs.fit(X)
NearestNeighbors(algorithm='ball_tree', leaf_size=30, metric='pyfunc',
         n_neighbors=4, radius=1.0)
>>> nbrs.kneighbors(X)
(array([[  0.,   1.,   5.,   8.],
       [  0.,   1.,   2.,  13.],
       [  0.,   2.,   5.,  25.],
       [  0.,   1.,   5.,   8.],
       [  0.,   1.,   2.,  13.],
       [  0.,   2.,   5.,  25.]]), array([[0, 1, 2, 3],
       [1, 0, 2, 3],
       [2, 1, 0, 3],
       [3, 4, 5, 0],
       [4, 3, 5, 0],
       [5, 4, 3, 0]]))
alko
    I am using scikit-learn 0.18.dev0 version and I get the following error - `_init_params() got an unexpected keyword argument 'func'` – Shishir Pandey Feb 20 '16 at 02:02
    @ShishirPandey You could check the following commit, https://github.com/scikit-learn/scikit-learn/commit/ad751a3b6996a4c209c1a243d396aa6930d4acc4; the NN signature has been changed. I guess you could just pass mydist directly as the "metric" argument – alko Feb 21 '16 at 05:33
    How can I define a custom metric for sparse vectors? Using this method I get: `ValueError: metric 'pyfunc' not valid for sparse input` – mpr Mar 30 '17 at 15:11
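
Following the comment above about the changed signature, a minimal sketch of passing the function directly as `metric` might look like this. It assumes a reasonably recent scikit-learn release in which `metric` accepts a callable; the data and `mydist` are the ones from the answer. Note that squared Euclidean distance does not satisfy the triangle inequality, so on larger data sets a ball tree is not guaranteed to return exact neighbors with it.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def mydist(x, y):
    # Squared Euclidean distance between two 1-D arrays
    return np.sum((x - y) ** 2)

X = np.array([[-1, -1], [-2, -1], [-3, -2],
              [1, 1], [2, 1], [3, 2]], dtype=float)

# The callable is passed as `metric`; no 'pyfunc' string or `func` keyword.
nbrs = NearestNeighbors(n_neighbors=4, algorithm='ball_tree', metric=mydist)
nbrs.fit(X)
distances, indices = nbrs.kneighbors(X)
```

Here `distances` and `indices` both have shape (6, 4), with each point's own index in the first column at distance 0.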

A small addition to the previous answer: how to use a user-defined metric that takes additional arguments.

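>>> from sklearn.neighbors import KNeighborsClassifier
>>> def mydist(x, y, **kwargs):
...     # entries of metric_params arrive directly as keyword arguments
...     return np.sum((x-y)**kwargs["power"])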
...
>>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
>>> Y = np.array([-1, -1, -2, 1, 1, 2])
>>> nbrs = KNeighborsClassifier(n_neighbors=4, algorithm='ball_tree',
...            metric=mydist, metric_params={"power": 2})
>>> nbrs.fit(X, Y)
KNeighborsClassifier(algorithm='ball_tree', leaf_size=30,
       metric=<function mydist at 0x7fd259c9cf50>, n_neighbors=4, p=2,
       weights='uniform')
>>> nbrs.kneighbors(X)
(array([[  0.,   1.,   5.,   8.],
       [  0.,   1.,   2.,  13.],
       [  0.,   2.,   5.,  25.],
       [  0.,   1.,   5.,   8.],
       [  0.,   1.,   2.,  13.],
       [  0.,   2.,   5.,  25.]]),
 array([[0, 1, 2, 3],
       [1, 0, 2, 3],
       [2, 1, 0, 3],
       [3, 4, 5, 0],
       [4, 3, 5, 0],
       [5, 4, 3, 0]]))
Mahmoud
    I actually think that in the function it needs to be kwargs["power"], not kwargs["metric_params"]["power"] . At least that's the behaviour I observe with sklearn '0.16.1' – benbo Oct 16 '15 at 20:03
    @benbo you are right: I fixed the code and added a little comment (I edited the post by Mahmoud). – payne Nov 07 '18 at 01:18
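
The behaviour described in these comments, where the entries of `metric_params` reach the function as plain keyword arguments, can be sketched as follows. This assumes a scikit-learn version that accepts a callable `metric` together with `metric_params`; the parameter name `power` is just the illustrative name used in the answer.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def mydist(x, y, power=2):
    # `power` is filled in from metric_params={"power": ...}
    return np.sum(np.abs(x - y) ** power)

X = np.array([[-1, -1], [-2, -1], [-3, -2],
              [1, 1], [2, 1], [3, 2]], dtype=float)

nbrs = NearestNeighbors(n_neighbors=4, algorithm='ball_tree',
                        metric=mydist, metric_params={"power": 2})
nbrs.fit(X)
distances, indices = nbrs.kneighbors(X)
```

Declaring `power` as a named parameter instead of reading it from `**kwargs` makes a misspelled key in `metric_params` fail with a `TypeError` rather than a `KeyError` inside the metric.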

When I tried to use a user-defined metric with KNeighborsRegressor(), it worked only with algorithm='brute'.

Otherwise fit() works but predict() fails, with the error 'returned NULL without setting an error' in JupyterLab, or 'SystemError: error return without exception set' in Google Colab.
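
A minimal sketch of that workaround, assuming a scikit-learn version that accepts a callable `metric`. The toy data and the `mydist` function here are illustrative and not taken from the answer.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

def mydist(x, y):
    # Manhattan distance between two 1-D arrays
    return np.sum(np.abs(x - y))

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 1.0, 2.0, 3.0])

# algorithm='brute' computes all pairwise distances with the callable
reg = KNeighborsRegressor(n_neighbors=2, algorithm='brute', metric=mydist)
reg.fit(X, y)

# With uniform weights the prediction is the mean target of the
# two nearest training points.
pred = reg.predict(np.array([[0.1], [2.9]]))
```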

Zvi