
I'm trying to delete all punctuation from a text using regex. The problem is, punctuation regex doesn't seem to have any effect (neither \p{P} nor \p{Punct}).

import re

hello_world = 'Hello, world!'
hello_world = re.sub('\p{Punct}', '', hello_world)
print(hello_world)

Am I doing something wrong? The following produces the desired effect, but I still don't get why the code above doesn't work.

# import string

# ...

hello_world = re.sub('[{}]'.format(string.punctuation), '', hello_world)
shooqie
  • related: [Remove punctuation from Unicode formatted strings](http://stackoverflow.com/q/11066400/4279) – jfs Oct 31 '15 at 12:53
  • related: [Best way to strip punctuation from a string in Python](http://stackoverflow.com/q/265960/4279) – jfs Oct 31 '15 at 12:53
  • You may try `re.sub(r'[^a-zA-Z0-9\s]+', '', 'Hello, world!')` – SIslam Oct 31 '15 at 12:55
  • I could be wrong, but I don't think the syntax you are using works with the `re` module. Try this https://pypi.python.org/pypi/regex – elethan Oct 31 '15 at 12:55
  • Look at this http://stackoverflow.com/questions/1832893/python-regex-matching-unicode-properties/4316097#4316097 – Tomasz Jakub Rup Oct 31 '15 at 12:55

1 Answer


The standard library's re module does not support Unicode property classes (\p{...}), so neither \p{P} nor \p{Punct} is recognized as a character class. The third-party regex module does support these properties, and it is a drop-in replacement for the re module.
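
If installing a third-party package is not an option, a rough stdlib-only equivalent of \p{P} can be sketched with unicodedata, by dropping every character whose general category starts with "P". The helper name strip_punctuation is made up for this illustration; the commented-out lines show what the regex-module version would look like, assuming regex is installed.

```python
import unicodedata


def strip_punctuation(text):
    """Drop every character whose Unicode general category starts with 'P'."""
    return ''.join(
        ch for ch in text
        if not unicodedata.category(ch).startswith('P')
    )


hello_world = strip_punctuation('Hello, world!')
print(hello_world)  # Hello world

# With the third-party regex module installed (pip install regex),
# the same idea would be written as:
#   import regex
#   regex.sub(r'\p{P}', '', 'Hello, world!')
```

Note that this is not identical to string.punctuation: characters such as + and $ belong to the Unicode symbol categories (S*), not punctuation (P*), so this sketch leaves them in place.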

jfs
  • That explains it, my IDE was suggesting all kinds of `\p` properties, so I assumed it would work. Thank you! – shooqie Oct 31 '15 at 12:58
  • @shooqie: note that your pattern must be in a raw string: `regex.sub(r'\p{Punct}', '', hello_world)` or you must escape the slash: `regex.sub('\\p{Punct}', '', hello_world)` – Casimir et Hippolyte Oct 31 '15 at 13:07
  • @CasimiretHippolyte: `'\p' == '\\p'` therefore it is not "must"; it is "should". Also, on Python 2: `flags=re.UNICODE` and `u'\p{P}'` (Unicode pattern should be used). See [the question I've linked above](http://stackoverflow.com/questions/33451657/python-punctuation-regex-doesnt-seem-to-work/33451836#comment54690249_33451657) – jfs Oct 31 '15 at 13:23