Something weird happens in this code:
fh = open('romeo.txt', 'r')
lst = list()
for line in fh:
line = line.split()
for word in line:
lst.append(word)
for word in lst:
numberofwords = lst.count(word)
if numberofwords > 1:
lst.remove(word)
lst.sort()
print len(lst)
print lst
romeo.txt is taken from http://www.pythonlearn.com/code/romeo.txt
Result:
27
['Arise', 'But', 'It', 'Juliet', 'Who', 'already', 'and', 'breaks', 'east', 'envious', 'fair', 'grief', 'is', 'kill', 'light', 'moon', 'pale', 'sick', 'soft', 'sun', 'the', 'the', 'through', 'what', 'window', 'with', 'yonder']
As you can see, there are two 'the'. Why is that? I can run this part of code again:
for word in lst:
numberofwords = lst.count(word)
if numberofwords > 1:
lst.remove(word)
After running this code a second time it deletes the remaining 'the', but why doesn't it work the first time?
Correct output:
26
['Arise', 'But', 'It', 'Juliet', 'Who', 'already', 'and', 'breaks', 'east', 'envious', 'fair', 'grief', 'is', 'kill', 'light', 'moon', 'pale', 'sick', 'soft', 'sun', 'the', 'through', 'what', 'window', 'with', 'yonder']