
I have a list of tuples. Each tuple holds two things:

1. A distance
2. A topic name

I want to find the k smallest distances in this list. The list is very large, and k is small (k <= 5).

I think numpy.argpartition would have an edge over sorting the entire list. Another answer describes how to perform argpartition on a 1-D array.
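For reference, here is a minimal sketch of 1-D `np.argpartition` on a plain float array. The values in `d` and `k = 3` are illustrative. Passing `kth=k-1` is enough to guarantee that the first `k` indices belong to the `k` smallest values. Their relative order is unspecified, so they are sorted afterwards, which costs only O(k log k).

```python
import numpy as np

# Illustrative distances; any 1-D float array works the same way.
d = np.array([0.5, 0.3, 0.7, 0.1, 0.9])
k = 3

# kth=k-1 puts the k-th smallest value at position k-1, with everything
# before it no larger, so the first k indices cover the k smallest values.
part = np.argpartition(d, k - 1)[:k]

# The order within those k indices is not guaranteed; sort only the k
# candidates if they are needed in ascending order.
top_k = part[np.argsort(d[part])]

print(top_k)     # [3 1 0]
print(d[top_k])  # [0.1 0.3 0.5]
```

Partitioning uses introselect, which runs in linear time, compared with O(n log n) for a full sort. That is where the edge comes from when n is large and k is small.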

For example, let's say we have the following:

import numpy as np

L = [(0.5, 'Anime'), (0.3, 'Arduino'), (0.7, 'Chess'), (0.1, 'Coffee'), (0.9, 'Space')]
A = np.asarray(L)

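Now if k = 3, then after performing argpartition we might get the following indices array (argpartition only guarantees that the first k indices point to the k smallest distances, not their order among themselves):

idx = [3 1 0 2 4]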

And so print(A[idx[:k]]) would give:

[(0.1, 'Coffee'), (0.3, 'Arduino'), (0.5, 'Anime')]

The above simulates what I want to achieve. But given that each element of the list is a tuple, how can I perform argpartition based on the first element of each tuple (i.e. the distance)?
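Here is a minimal sketch of one way to do this. It assumes `L` fits in memory and has at least `k` elements, because argpartition raises an error if `kth` is out of range. The distances are copied into a float array, argpartition runs on that array, and the resulting indices pick tuples from the original list. The names `dist`, `idx` and `result` are illustrative.

```python
import numpy as np

L = [(0.5, 'Anime'), (0.3, 'Arduino'), (0.7, 'Chess'), (0.1, 'Coffee'), (0.9, 'Space')]
k = 3

# Pull the distances out into a float array so they compare numerically.
dist = np.fromiter((t[0] for t in L), dtype=float, count=len(L))

# Indices of the k smallest distances (unordered among themselves).
idx = np.argpartition(dist, k - 1)[:k]
# Optional: order those k indices by distance.
idx = idx[np.argsort(dist[idx])]

# Index back into the original list of tuples.
result = [L[i] for i in idx]
print(result)  # [(0.1, 'Coffee'), (0.3, 'Arduino'), (0.5, 'Anime')]
```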

David S
Robur_131
  • Does it work? What is the question? Would it make sense to sort the array (or the list, for that matter) so this can be performed efficiently? – RichieV Oct 10 '20 at 14:55
  • I had put the question before describing the scenario. I've now added it at the end. – Robur_131 Oct 10 '20 at 14:56
  • I don't know why it is being downvoted. I've clarified the question and added a scenario that I want to achieve. Please tell me if I've worded the question poorly. – Robur_131 Oct 10 '20 at 14:59
  • @RichieV, I have explicitly mentioned in the question that the list is large and k is small, so I think sorting wouldn't be the most efficient approach here. Correct me if I'm wrong. – Robur_131 Oct 10 '20 at 15:00
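On the sorting point raised in the comments: if NumPy is not a hard requirement, the standard library's `heapq.nsmallest` is a different technique from argpartition that also avoids a full sort. It scans the list once while keeping at most k items in a heap, roughly O(n log k), and returns them already in ascending order. Here is a sketch using the example data:

```python
import heapq
from operator import itemgetter

L = [(0.5, 'Anime'), (0.3, 'Arduino'), (0.7, 'Chess'), (0.1, 'Coffee'), (0.9, 'Space')]
k = 3

# Compare tuples by their first element (the distance) only.
smallest = heapq.nsmallest(k, L, key=itemgetter(0))
print(smallest)  # [(0.1, 'Coffee'), (0.3, 'Arduino'), (0.5, 'Anime')]
```

Unlike the NumPy route, this works directly on the list of tuples, with no conversion to an array first. Which approach is faster for a given list size is worth timing rather than assuming.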

1 Answer


It's not clear whether you tried anything, but the following works:

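import numpy as np

L = [(0.5, 'Anime'), (0.3, 'Arduino'), (0.7, 'Chess'), (0.1, 'Coffee'), (0.9, 'Space')]
# Mixing floats and strings gives a string array (dtype '<U32'),
# so the distances below are compared as text, not as numbers.
A = np.asarray(L)

# Partition each column along axis 0, keep the distance column's
# indices ([:,0]), reorder the rows by them and take the first 3.
print(A[np.argpartition(A, 3, axis=0)[:,0]][0:3])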

This gives:

array([['0.1', 'Coffee'],
       ['0.3', 'Arduino'],
       ['0.5', 'Anime']], dtype='<U32')

Is that what you were looking for? The second element (the topic name) plays no part in the selection, at least in this case.
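A caveat on the approach above: because `L` mixes floats and strings, `np.asarray(L)` produces a string array, so the distances are compared as text. That happens to match numeric order for the example, where every distance looks like `0.x`, but it does not hold in general. The sketch below uses hypothetical distances to show the difference. It also uses a structured array that keeps each distance as a float next to its topic. The field names `dist` and `topic` are assumptions.

```python
import numpy as np

# Hypothetical data where text comparison goes wrong: '10.0' < '9.0' as strings.
L = [(9.0, 'Anime'), (10.0, 'Arduino'), (0.7, 'Chess'), (0.1, 'Coffee'), (12.5, 'Space')]
k = 3

# String array: argpartition compares the distance column lexicographically.
A_str = np.asarray(L)
bad = A_str[np.argpartition(A_str[:, 0], k - 1)[:k]]

# Structured array: the 'dist' field stays float64, so comparison is numeric.
A = np.array(L, dtype=[('dist', 'f8'), ('topic', 'U32')])
idx = np.argpartition(A['dist'], k - 1)[:k]
good = A[idx[np.argsort(A['dist'][idx])]]

print(bad[:, 1])      # contains 'Arduino' (10.0) instead of 'Anime' (9.0)
print(good['topic'])  # ['Coffee' 'Chess' 'Anime']
```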

David S