I have a list of Twitter hashtags named li. I want to build a new list, top_10, containing the most frequent hashtags in it. So far I have:

li = ['COVID19', 'Covid19', 'covid19', 'coronavirus', 'Coronavirus',...]
tag_counter = dict()
for tag in li:
    if tag in tag_counter:
        tag_counter[tag] += 1
    else:
        tag_counter[tag] = 1

popular_tags = sorted(tag_counter, key = tag_counter.get, reverse = True)

top_10 = popular_tags[:10]

print('\nList of the top 10 popular hashtags are :\n',top_10)

Since hashtags are not case-sensitive, I want the counting in tag_counter to be case-insensitive, so that 'COVID19', 'Covid19' and 'covid19' are treated as the same tag.

raf

4 Answers

Use collections.Counter from the standard library, lowercasing the words before counting:

from collections import Counter

list_of_words = ['hello', 'hello', 'world']
lowercase_words = [w.lower() for w in list_of_words]

Counter(lowercase_words).most_common(1)

Returns:

[('hello', 2)]
Vasili Syrakis

Normalize the case first, with upper() or lower(), then count as before.

li = ['COVID19', 'Covid19', 'covid19', 'coronavirus', 'Coronavirus']
li = [x.upper() for x in li]  # or: li = [x.lower() for x in li]
tag_counter = dict()
for tag in li:
    if tag in tag_counter:
        tag_counter[tag] += 1
    else:
        tag_counter[tag] = 1

popular_tags = sorted(tag_counter, key = tag_counter.get, reverse = True)

top_10 = popular_tags[:10]

print('\nList of the top 10 popular hashtags are :\n',top_10)
Ricardo Rendich
  • Thank you so much! Can you please tell me what `key = tag_counter.get` is doing in the `sorted()` call? Also, how can I improve or shorten my code? – raf Aug 15 '20 at 09:44
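
On the question in the comment above: iterating over a dict yields its keys, so sorted(tag_counter, ...) sorts the tag names. The key argument tells sorted() which value to compare for each item, and tag_counter.get(tag) returns that tag's count, so the tags end up ordered by count, highest first because of reverse = True. A minimal sketch with made-up counts (the dictionary contents are illustrative only, not taken from the question's data):

```python
# Illustrative counts only; not the asker's real data.
tag_counter = {'COVID19': 3, 'CORONAVIRUS': 2, 'LOCKDOWN': 5}

# sorted() iterates over the dict's keys; key=tag_counter.get makes it
# compare the count stored for each key instead of the key itself.
popular_tags = sorted(tag_counter, key=tag_counter.get, reverse=True)

print(popular_tags)  # ['LOCKDOWN', 'COVID19', 'CORONAVIRUS']
```

Writing key=lambda tag: tag_counter[tag] would behave the same way here; passing the bound method tag_counter.get is just shorter.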

You can use Counter from the collections module:

from collections import Counter

li = ['COVID19', 'Covid19', 'covid19', 'coronavirus', 'Coronavirus']

print(Counter([i.lower() for i in li]).most_common(10))

Output:

[('covid19', 3), ('coronavirus', 2)]
bigbounty

Lowercase the list first, then count with Counter:

from collections import Counter

lst = ['Ab','aa','ab','Aa','Cct','aA']
lower_lst = [x.lower() for x in lst ]
counter = Counter(lower_lst)
print(counter.most_common(1))
balderman
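
Putting the answers above together, here is a minimal sketch that produces the top_10 list of tag names the question asks for, rather than (tag, count) pairs. The helper name top_hashtags is hypothetical, and using str.casefold() instead of lower() is an assumption on my part: it is the standard-library method intended for caseless matching and behaves like lower() for plain ASCII tags such as these.

```python
from collections import Counter


def top_hashtags(tags, n=10):
    """Return the n most frequent hashtags, ignoring case (hypothetical helper)."""
    # casefold() normalizes case for caseless comparison; for ASCII-only
    # tags it gives the same result as lower().
    counts = Counter(tag.casefold() for tag in tags)
    # most_common(n) returns (tag, count) pairs; keep only the tag names.
    return [tag for tag, _count in counts.most_common(n)]


li = ['COVID19', 'Covid19', 'covid19', 'coronavirus', 'Coronavirus']
top_10 = top_hashtags(li)
print(top_10)  # ['covid19', 'coronavirus']
```

If the counts are needed as well, counts.most_common(n) can be returned directly, as in the answers above.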