I'm trying to get the 5 most frequently occurring elements of a list, together with their occurrence counts. My current solution works, but it has a time complexity of O(5*n*n).

Is there an optimal solution with a better time complexity?

Example Input:

[['16.37.123.153','119.222.456.130','38673','161','17','62','4646']
 ['16.37.456.153','119.222.123.112','56388','161','17','62','4646']..]

Example Output:

MostOccurrentElements = [['16.37.123.153','119.222.456.130','38673','161','17','62','4646']..]
Counter = [134, ..]

The first element of the MostOccurrentElements list corresponds to the first element of the Counter list, and so on.

QWERASDFYXCV

1 Answer

You can use the heapq.nlargest function to get the 5 most frequent items in O(n log(t)) time, where n is the number of items and t is the number of largest items to get. collections.Counter can count each distinct item value in O(n) time on average, so overall the following code finds the 5 most frequent items in O(n log(t)) time on average:

from collections import Counter
import heapq
from operator import itemgetter
l = [1,1,2,3,3,3,3,4,4,4,4,5,5,6,6,6,7]
print(heapq.nlargest(5, Counter(l).items(), key=itemgetter(1)))

This outputs:

[(3, 4), (4, 4), (6, 3), (1, 2), (5, 2)]

Edit: As @jpp points out in the comments, the same result can be obtained with the equivalent wrapper method most_common of Counter:

print(Counter(l).most_common(5))
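
As a quick check of that equivalence, here is a minimal sketch using the same list `l` as above. The names `via_heap` and `via_counter` are only illustrative, and the comparison relies on CPython's implementation, where most_common(n) calls heapq.nlargest internally:

```python
from collections import Counter
import heapq
from operator import itemgetter

l = [1, 1, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 6, 6, 6, 7]
counts = Counter(l)

# Explicit heap-based selection of the 5 highest counts
via_heap = heapq.nlargest(5, counts.items(), key=itemgetter(1))

# Same selection through the Counter convenience method
via_counter = counts.most_common(5)

print(via_heap == via_counter)
```

Note that calling most_common() without an argument sorts all distinct items instead, which costs more than picking only the top few.
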
blhsing
  • OK, can I also get the output like: Elements = [3, 4, 6, 1, 5] Counter = [4, 4, 3, 2, 2], with each of the elements 1,2,3,4,5,6,7 being a list of strings itself? – QWERASDFYXCV Oct 19 '18 at 17:27
  • You can do that with: `elements, counts = zip(*Counter(l).most_common(5))` – blhsing Oct 19 '18 at 17:33
  • OK, and does that also work if I want to get the most frequent elements of a nested list of strings? – QWERASDFYXCV Oct 19 '18 at 17:33
  • Not sure what you mean by a nested list of strings. Can you update your question with examples of this new requirement? – blhsing Oct 19 '18 at 17:36
  • OK, I updated the question with an example input/output. Thank you in advance for the help :) – QWERASDFYXCV Oct 19 '18 at 17:44
  • You should flatten your list of lists first. If your list of lists is stored in variable `l` then you can use `Counter(i for s in l for i in s).most_common(5)`. – blhsing Oct 19 '18 at 17:48
  • No, I don't want to flatten the list. For example, ['16.37.123.153','119.222.456.130','38673','161','17','62','4646'] is one element out of many. Or what do you mean? – QWERASDFYXCV Oct 19 '18 at 17:52
  • So e.g., ['16.37.123.153','119.222.456.130','38673','161','17','62','4646'] is one element and ['16.37.123.155','119.222.456.132','38456','123','17','62','4345'] another. – QWERASDFYXCV Oct 19 '18 at 18:01
  • In that case I think the existing code `Counter(l).most_common(5)` would already work, since you're simply treating each sublist as one element, right? – blhsing Oct 19 '18 at 18:04
  • I'm not at home at the moment, but I will test the solution with `elements, counts = zip(*Counter(l).most_common(5))` ASAP and will report back on whether it worked. Thanks so far :) – QWERASDFYXCV Oct 19 '18 at 18:10
  • Yes, I'm treating each sublist as one element, but the function has to compare the contents of the sublists to tell the elements apart. – QWERASDFYXCV Oct 19 '18 at 18:12
  • And does the `elements, counts = zip(*Counter(l).most_common(5))` solution have the same time complexity as the first solution with a heap queue? – QWERASDFYXCV Oct 19 '18 at 18:51
  • I get an error message when I use `elements, counts = zip(*Counter(l).most_common(5))`. Error message: `File "/usr/lib/python2.7/collections.py", line 477, in __init__ self.update(*args, **kwds) File "/usr/lib/python2.7/collections.py", line 567, in update self[elem] = self_get(elem, 0) + 1 TypeError: unhashable type: 'list'` – QWERASDFYXCV Oct 22 '18 at 21:54
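
The `TypeError: unhashable type: 'list'` in the last comment arises because Counter stores its items as dictionary keys, and lists cannot be hashed. One possible workaround, sketched here under the assumption that each sublist should count as a single element, is to convert every sublist to a tuple before counting. The `rows` data and the variable names below are hypothetical and only shaped like the question's example:

```python
from collections import Counter

# Hypothetical sample data shaped like the question's input
rows = [
    ['16.37.123.153', '119.222.456.130', '38673', '161', '17', '62', '4646'],
    ['16.37.456.153', '119.222.123.112', '56388', '161', '17', '62', '4646'],
    ['16.37.123.153', '119.222.456.130', '38673', '161', '17', '62', '4646'],
]

# Tuples are hashable, so each sublist becomes one countable element
top = Counter(tuple(row) for row in rows).most_common(5)

# Split the (element, count) pairs into two parallel lists,
# converting the tuples back to lists to match the requested output
most_occurrent_elements = [list(row) for row, _ in top]
counter = [count for _, count in top]

print(most_occurrent_elements)
print(counter)
```

Two sublists are treated as the same element only if all of their strings match in the same order, which appears to be what the question asks for. Building each tuple adds work proportional to the sublist length, so the overall cost also scales with the number of strings per sublist.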