How can I count how many times a word appears in a list of strings?

For example:

['This is a sentence', 'This is another sentence']

The expected result for the word "sentence" is 2.

Martijn Pieters
user2578185

2 Answers

Use a collections.Counter() object and split each sentence on whitespace. You probably want to lowercase the words as well, and strip surrounding punctuation:

from collections import Counter

counts = Counter()

for sentence in sequence_of_sentences:
    counts.update(word.strip('.,?!"\'').lower() for word in sentence.split())

or perhaps use a regular expression that only matches word characters:

from collections import Counter
import re

counts = Counter()
words = re.compile(r'\w+')

for sentence in sequence_of_sentences:
    counts.update(words.findall(sentence.lower()))

Now counts is a dictionary (Counter is a dict subclass) that maps each word to its count; a word that never appeared simply returns 0.
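
The two tokenisation strategies above can disagree when punctuation sits inside a word. As a rough illustration only (the sample sentences and the helper names count_by_strip and count_by_regex are made up for this sketch, they are not part of the answer), an apostrophe inside a word survives str.strip() but splits the word under \w+:

```python
from collections import Counter
import re

WORD_RE = re.compile(r'\w+')

def count_by_strip(sentences):
    """Split on whitespace, then strip surrounding punctuation."""
    counts = Counter()
    for sentence in sentences:
        counts.update(word.strip('.,?!"\'').lower() for word in sentence.split())
    return counts

def count_by_regex(sentences):
    """Keep only runs of word characters (letters, digits, underscore)."""
    counts = Counter()
    for sentence in sentences:
        counts.update(WORD_RE.findall(sentence.lower()))
    return counts

# Hypothetical sample data chosen to contain an apostrophe.
sample = ["Don't stop.", "Please don't stop!"]
by_strip = count_by_strip(sample)
by_regex = count_by_regex(sample)

print(by_strip["don't"])               # 2: the apostrophe is kept
print(by_regex["don"], by_regex["t"])  # 2 2: the apostrophe splits the word
print(by_strip["stop"], by_regex["stop"])  # 2 2: both agree on plain words
```

Which behaviour is preferable depends on the text being counted; for input without contractions or hyphenated words the two approaches should give the same counts.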

Demo:

>>> sequence_of_sentences = ['This is a sentence', 'This is another sentence']
>>> from collections import Counter
>>> counts = Counter()
>>> for sentence in sequence_of_sentences:
...     counts.update(word.strip('.,?!"\'').lower() for word in sentence.split())
... 
>>> counts
Counter({'this': 2, 'is': 2, 'sentence': 2, 'a': 1, 'another': 1})
>>> counts['sentence']
2
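
If only a single word's count is needed, building the full Counter is optional. A minimal sketch, assuming the same normalisation as in the demo (count_word is a hypothetical helper name, not something from the standard library):

```python
def count_word(sentences, target):
    """Count case-insensitive, whole-word occurrences of target."""
    target = target.lower()
    return sum(
        1
        for sentence in sentences
        for word in sentence.split()
        if word.strip('.,?!"\'').lower() == target
    )

sequence_of_sentences = ['This is a sentence', 'This is another sentence']
print(count_word(sequence_of_sentences, 'sentence'))  # 2
print(count_word(sequence_of_sentences, 'is'))        # 2
```

Comparing whole words matters here: calling str.count('is') on each sentence would report 4 for this input, because "This" also contains the substring "is".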
Martijn Pieters

You could do what you want pretty easily with a little regex and a dictionary.

import re

counts = {}  # not named `dict`, which would shadow the built-in type
sentence_list = ['This is a sentence', 'This is a sentence']
for sentence in sentence_list:
    for word in re.split(r'\s+', sentence.strip()):  # split on runs of whitespace
        try:
            counts[word] += 1
        except KeyError:
            counts[word] = 1  # first time this word is seen
print(counts)
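
The try/except KeyError bookkeeping above can also be left to collections.defaultdict, which creates missing entries on first access. This is a sketch of that alternative rather than the answer's own method; the name word_counts is chosen for illustration, and str.split() stands in for the regex split:

```python
from collections import defaultdict

sentence_list = ['This is a sentence', 'This is a sentence']

word_counts = defaultdict(int)  # missing keys start at int() == 0
for sentence in sentence_list:
    for word in sentence.split():  # no arguments: split on any run of whitespace
        word_counts[word] += 1

print(word_counts['sentence'])  # 2
```

Note that this version, like the loop it mirrors, is case-sensitive and does not strip punctuation, so "This" and "this" would be counted separately.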
Ross