Remove punctuation in sentiment analysis in Python

Question

I wrote the following code. It works well, but problems arise when I pass in sentences containing commas, full stops, etc. From my research, strip() looks like a potential fix, but I can't see where to add it, and every attempt so far has produced error after error!

Thanks

sent_analysis = {"beer": 10, "wine": 13, "spirit": 11, "cider": 16, "shot": 16}

def sentiment_analysis(dic, text):
    # Split on whitespace only, so "beer," keeps its trailing comma
    split_text = text.split()
    result = 0.00
    for i in split_text:
        if i in dic:
            result += dic[i]
    return result


# print() with parentheses runs on both Python 2 and Python 3
print(sentiment_analysis(sent_analysis, "the beer, wine and cider were    great"))  # 29.0: "beer," is not matched
print(sentiment_analysis(sent_analysis, "the beer and the wine were great"))  # 23.0

@JustenIngels thanks. If you look at the last two statements, the value of beer is not captured in the first one because there is a comma after beer. Is there a way to remove all commas/punctuation to ensure this doesn't happen? — Phil, Apr 16 '16 at 14:05
Have a look at http://stackoverflow.com/questions/265960/best-way-to-strip-punctuation-from-a-string-in-python — jusx, Apr 16 '16 at 14:07
Having a bit of fun with Python: `split_text = map(lambda i: i.strip(".,"), text.split())`. Or with a regex: `split_text = re.split("[ ,.]+", text)` (which does not need a space after commas). But I think @Justen's `translate` suffices. — Jongware, Apr 16 '16 at 14:11
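
The two one-liners in the comment above could be adapted for Python 3 roughly as follows. This is only a sketch: the helper names tokens_by_strip, tokens_by_split and score are made up for illustration, and string.punctuation is used in place of the literal ".," so that more symbols are covered.

```python
import re
import string

sent_analysis = {"beer": 10, "wine": 13, "spirit": 11, "cider": 16, "shot": 16}

def tokens_by_strip(text):
    # Hypothetical helper: strip leading/trailing punctuation from each
    # whitespace-separated token (punctuation inside a word is kept).
    return [word.strip(string.punctuation) for word in text.split()]

def tokens_by_split(text):
    # Hypothetical helper: split on runs of spaces, commas and full stops,
    # then drop the empty strings re.split leaves at the edges.
    return [word for word in re.split(r"[ ,.]+", text) if word]

def score(dic, tokens):
    # Sum the dictionary values of every token that is a known word.
    return sum(dic[t] for t in tokens if t in dic)

text = "the beer, wine and cider were    great."
print(score(sent_analysis, tokens_by_strip(text)))
print(score(sent_analysis, tokens_by_split(text)))
```

Both calls should count beer, wine and cider. The regex variant also separates words joined by a comma with no space ("beer,wine"), which the strip variant does not.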

score 1 · Answer 1 · answered Apr 16 '16 at 14:26

Regular expressions can be used to remove every character that is neither a word character nor whitespace. In the code below, the character class [^\w\s] matches anything that is not (as indicated by the ^) a letter, a digit, an underscore, or whitespace, and re.sub replaces each match with an empty string. The return statement iterates through the split string, looks up each word that appears in the dictionary, collects the values in a list, and returns their sum.

Regex \s

Regex \w

import re
sent_analysis = {"beer": 10, "wine": 13, "spirit": 11, "cider": 16, "shot": 16}

def sentiment_analysis(dic, text):
    # Delete every character that is not a word character or whitespace
    s = re.sub(r'[^\w\s]', '', text)
    # Sum the scores of the remaining words that appear in the dictionary
    return sum([dic[x] for x in s.split() if x in dic])

print(sentiment_analysis(sent_analysis, "the beer,% wine &*and cider @were great"))

Output: 39

This handles most punctuation, as the assorted symbols in the example string show. The one exception is the underscore, which \w matches and which is therefore kept.
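
For comparison, the translate approach mentioned in the comments might look like this on Python 3. Treat it as a sketch under assumptions rather than the commenter's exact code: it relies on str.maketrans with string.punctuation, and the table name _PUNCT_TABLE is made up here.

```python
import string

sent_analysis = {"beer": 10, "wine": 13, "spirit": 11, "cider": 16, "shot": 16}

# Translation table that deletes every ASCII punctuation character
# (assumed name; built once so it can be reused across calls).
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)

def sentiment_analysis(dic, text):
    # Remove punctuation, then split on whitespace and sum known words.
    cleaned = text.translate(_PUNCT_TABLE)
    return sum(dic[word] for word in cleaned.split() if word in dic)

print(sentiment_analysis(sent_analysis, "the beer,% wine &*and cider @were great"))
```

Unlike the regex above, this also removes underscores, but it only covers ASCII punctuation. Both versions delete characters rather than replace them with spaces, so "beer,wine" written without a space becomes "beerwine" and is not counted.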
