76

In order to find the index of the smallest value, I can use argmin:

import numpy as np
A = np.array([1, 7, 9, 2, 0.1, 17, 17, 1.5])
print A.argmin()     # 4 because A[4] = 0.1

But how can I find the indices of the k-smallest values?

I'm looking for something like:

print A.argmin(numberofvalues=3)   
# [4, 0, 7]  because A[4] <= A[0] <= A[7] <= all other A[i]

Note: in my use case A has between ~ 10 000 and 100 000 values, and I'm interested for only the indices of the k=10 smallest values. k will never be > 10.

strpeter
  • 1,772
  • 2
  • 21
  • 43
Basj
  • 29,668
  • 65
  • 241
  • 451
  • 2
    See [this question](http://stackoverflow.com/questions/6910641/how-to-get-indices-of-n-maximum-values-in-a-numpy-array), especially the second answer there, for the best solution to this (it's O(n) - full sorting the entire array is not absolutely necessary). – Alex Riley Dec 11 '15 at 15:18
  • Similar: https://stackoverflow.com/questions/33623184/ – Wok Mar 01 '21 at 14:18

4 Answers4

128

Use np.argpartition. It does not sort the entire array. It only guarantees that the kth element is in sorted position and all smaller elements will be moved before it. Thus the first k elements will be the k-smallest elements.

import numpy as np

A = np.array([1, 7, 9, 2, 0.1, 17, 17, 1.5])
k = 3

idx = np.argpartition(A, k)
print(idx)
# [4 0 7 3 1 2 6 5]

This returns the k-smallest values. Note that these may not be in sorted order.

print(A[idx[:k]])
# [ 0.1  1.   1.5]

To obtain the k-largest values use

idx = np.argpartition(A, -k)
# [4 0 7 3 1 2 6 5]

A[idx[-k:]]
# [  9.  17.  17.]

WARNING: Do not (re)use idx = np.argpartition(A, k); A[idx[-k:]] to obtain the k-largest. That won't always work. For example, these are NOT the 3 largest values in x:

x = np.array([100, 90, 80, 70, 60, 50, 40, 30, 20, 10, 0])
idx = np.argpartition(x, 3)
x[idx[-3:]]
array([ 70,  80, 100])

Here is a comparison against np.argsort, which also works but just sorts the entire array to get the result.

In [2]: x = np.random.randn(100000)

In [3]: %timeit idx0 = np.argsort(x)[:100]
100 loops, best of 3: 8.26 ms per loop

In [4]: %timeit idx1 = np.argpartition(x, 100)[:100]
1000 loops, best of 3: 721 µs per loop

In [5]: np.alltrue(np.sort(np.argsort(x)[:100]) == np.sort(np.argpartition(x, 100)[:100]))
Out[5]: True
unutbu
  • 711,858
  • 148
  • 1,594
  • 1,547
  • Any idea how this handles ties? It seems if you want random tie break the only possible way is to use lexsort which sorts the whole array. https://stackoverflow.com/a/20199459/1993389 The partition docs say introselect is not stable, but I'm not sure if this means ties are broken at random. – user27182 Aug 20 '19 at 19:05
  • @user27182: Per [the docs](https://docs.scipy.org/doc/numpy/reference/generated/numpy.argpartition.html), if `a` is an array with fields (i.e. a [structured array](https://docs.scipy.org/doc/numpy-1.15.0/user/basics.rec.html)), then you can either specify an `order` or let the unspecified fields be used to break ties. So if you pour `A` into the first field of a structured array, then pour random (tie breaking) numbers into a second field, then `np.argpartition` could be used to select the `k` smallest (or largest) with random tie breaks. – unutbu Aug 20 '19 at 20:02
  • 1
    Keep in mind that the first `k-1` elements are not guaranteed to be in order from smallest to largest. If that's something you need, you could use `np.argpartition`, slice the array using the first `k` indices, and then use `np.argsort` on the resulting array. – Leland Hepworth Dec 09 '20 at 22:29
20

You can use numpy.argsort with slicing

>>> import numpy as np
>>> A = np.array([1, 7, 9, 2, 0.1, 17, 17, 1.5])
>>> np.argsort(A)[:3]
array([4, 0, 7], dtype=int32)
Cory Kramer
  • 98,167
  • 13
  • 130
  • 181
  • 3
    Thanks! But wouldn't it be very slow to have to compute *all* the `argsort` and then keeping only the k (<=10) first values? – Basj Dec 11 '15 at 15:07
  • Hard to say without knowing the implementation of `argsort`. Specifically, if it is implemented as a generator, and depending on the actual sorting algorithm it could potentially be lazy, or it could sort the entire set first I'm not sure. – Cory Kramer Dec 11 '15 at 15:18
  • 4
    From the other comments, it does seem that `argsort` sorts the entire set, so I would prefer one of the other suggested solutions that use `argpartition` – Cory Kramer Dec 11 '15 at 15:21
  • 1
    This nice thing about this solution (compared to `argpartition`) is that we can guarantee that the k indices we are looking for are in ascending order. – protagonist Jan 31 '19 at 02:27
2

For n-dimentional arrays, this function works well. The indecies are returned in a callable form. If you want a list of the indices to be returned, then you need to transpose the array before you make a list.

To retrieve the k largest, simply pass in -k.

def get_indices_of_k_smallest(arr, k):
    idx = np.argpartition(arr.ravel(), k)
    return tuple(np.array(np.unravel_index(idx, arr.shape))[:, range(min(k, 0), max(k, 0))])
    # if you want it in a list of indices . . . 
    # return np.array(np.unravel_index(idx, arr.shape))[:, range(k)].transpose().tolist()

Example:

r = np.random.RandomState(1234)
arr = r.randint(1, 1000, 2 * 4 * 6).reshape(2, 4, 6)

indices = get_indices_of_k_smallest(arr, 4)
indices
# (array([1, 0, 0, 1], dtype=int64),
#  array([3, 2, 0, 1], dtype=int64),
#  array([3, 0, 3, 3], dtype=int64))

arr[indices]
# array([ 4, 31, 54, 77])

%%timeit
get_indices_of_k_smallest(arr, 4)
# 17.1 µs ± 651 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Jeremiah England
  • 451
  • 6
  • 12
0

numpy.partition(your_array, k) is an alternative. No slicing necessary as it gives the values sorted until the kth element.

  • 1
    This puts the element at index `k` (possibly unsorted) in sorted position. Since the sorted position is not necessary index `k` or `k-1`, we cannot guarantee that `your_array[:k]` contains the `k` smallest elements after `numpy.partition`. – protagonist Jan 31 '19 at 02:31
  • This is the best answer for values of the array (not indexes) in Oct. 2019. @protagonist I don't understand your comment. Please correct me if wrong, but evidence that this partition function works correctly is to run the following in a loop: y = np.arange(10) ; np.random.shuffle(y) ; y.partition(3) ; assert y[:3+1].max() < y[3+1:].min() ... did the partition function have different behavior in old numpy version or something? Also FYI: `k` is zero-indexed. – Alex Gaudio Oct 30 '19 at 16:31