3

I am trying to print the 10 most frequent words using the following code. However, it's not working. Any idea how to fix it?

def reducer_count_words(self, word, counts):
    # send all (num_occurrences, word) pairs to the same reducer.
    # num_occurrences is so we can easily use Python's max() function.
    yield None, (sum(counts), word)




# discard the key; it is just None
def reducer_find_max_10_words(self, _, word_count_pairs):
    # each item of word_count_pairs is (count, word),
    # so yielding one results in key=counts, value=word

    tmp = sorted(word_count_pairs)[0:10]
    yield tmp
Leftium
A.M.
  • @Veedrac: more similar to this question: http://stackoverflow.com/questions/3121979/how-to-sort-list-tuple-of-lists-tuples – Leftium May 28 '14 at 19:45
  • @Leftium I strongly disagree with your interpretation of the question. Also, how the hell did "its not working. Any idea on how to fix it?" get upvotes? – Veedrac May 28 '14 at 19:49
  • @Veedrac: my interpretation is based on the question title and the asker's responses to other answers. – Leftium May 28 '14 at 19:58
  • @Leftium I stick by my opinion, but I don't really care about a question of this quality anyway. – Veedrac May 28 '14 at 20:00

3 Answers

2

Use collections.Counter and its most_common method:

>>> from collections import Counter
>>> my_words = 'a a foo bar foo'
>>> Counter(my_words.split()).most_common()
[('a', 2), ('foo', 2), ('bar', 1)]
BeetDemGuise
1

Use collections.Counter.most_common()

Example:

most_common([n])
Return a list of the n most common elements and their counts from the most common to the least. If n is not specified, most_common() returns all elements in the counter. Elements with equal counts are ordered arbitrarily:

>>> from collections import Counter
>>> Counter('abracadabra').most_common(3)
[('a', 5), ('r', 2), ('b', 2)]
johntellsall
  • I am using this command in my code but seeing this error: unhashable type 'list'. If I want to use this format it seems like I cannot use `most.common()` – A.M. May 28 '14 at 18:29
  • Run `most_common()` on the list of words, not on the `(word, count)` tuples – johntellsall May 28 '14 at 18:32
0
tmp = sorted(word_count_pairs, key=lambda pair: pair[0], reverse=True)[0:10]

Explanation:

  • The key parameter of sorted() allows you to run a function on each element before comparison.
  • lambda pair: pair[0] is a function that extracts the count (the first item) from each pair in word_count_pairs.
  • reverse=True sorts in descending order instead of ascending order.
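
A minimal sketch of the line above in isolation, assuming word_count_pairs holds (count, word) tuples as in the question. The sample data is made up for illustration, and the slice is shortened to the top 2 so the result is easy to check:

```python
# Hypothetical sample data: (count, word) pairs like those yielded by reducer_count_words.
word_count_pairs = [(3, "foo"), (1, "bar"), (7, "baz"), (2, "qux")]

# Sort by count (the first tuple element), largest first, then keep the top 2.
top = sorted(word_count_pairs, key=lambda pair: pair[0], reverse=True)[0:2]
print(top)  # [(7, 'baz'), (3, 'foo')]
```

Without reverse=True the same slice would return the two least frequent words, which is what the original code in the question does.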

Aside: if you have many different words, sorting the entire list just to find the top ten is inefficient, and more efficient algorithms exist. The most_common() method mentioned in the other answers avoids the full sort when given n: in CPython it is implemented with heapq.nlargest.
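
One way to avoid the full sort is heapq.nlargest from the standard library, which keeps only the n largest items while scanning the input. The sketch below is only an illustration: top_n is a hypothetical helper name and the sample pairs are made up. It assumes the same (count, word) tuples as in the question.

```python
import heapq

def top_n(word_count_pairs, n=10):
    """Return the n pairs with the largest counts, largest first.

    top_n is a hypothetical helper, not part of any library. It accepts any
    iterable (including a generator) of (count, word) tuples.
    """
    return heapq.nlargest(n, word_count_pairs, key=lambda pair: pair[0])

# Made-up sample data for illustration.
pairs = [(3, "foo"), (1, "bar"), (7, "baz"), (2, "qux"), (5, "quux")]
print(top_n(pairs, n=3))  # [(7, 'baz'), (5, 'quux'), (3, 'foo')]
```

For a handful of words the difference is negligible; it only starts to matter when the number of distinct words is much larger than n.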

Leftium