I am trying to remove all non-alphanumeric characters except emojis, so I wrote the following code:
>>> import re
>>> re.sub(r"[^a-zA-Z0-9_#@\s\U00010000-\U0010ffff]", '', "THAT ASc776 ^ .? + #> _")
It works fine and returns:
'THAT ASc776 ? #> _'
But if I put an emoji in the text, I still get the same result:
>>> re.sub(r"[^a-zA-Z0-9_#@\s\U00010000-\U0010ffff]", '', "THAT ASc776 ^ .? + #> _")
'THAT ASc776 ? #> _'
I realize that emojis are Unicode characters, so I also tried the following:
>>> RE_EMOJI = re.compile('[^\U00010000-\U0010ffffa-zA-Z0-9_#@\s]', flags=re.UNICODE)
>>> RE_EMOJI.sub('','AHAT ASc776 ^ .? + #> _')
'AHAT ASc776 ? #> _'
But it still doesn't recognize the emoji. So what is the correct way to remove all non-alphanumeric characters, excluding emojis, from a text?
EDIT:
With Python 3.5 the code works and produces the expected output. However, I am using Python 2.7, where it does not work.
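For reference, here is a minimal sketch of what I would expect to behave the same on both versions. It rests on two assumptions: that the Python 2.7 pattern fails because a plain (byte) string does not interpret `\U` escapes, and that on a narrow 2.7 build characters above U+FFFF are stored as surrogate pairs. The names `KEEP`, `ASTRAL`, `RE_STRIP` and `strip_disallowed` are made up for illustration.

```python
import re
import sys

# Characters to keep besides emoji (same set as in the patterns above).
# u"" literals are unicode on Python 2 and plain str on Python 3.3+.
KEEP = u"a-zA-Z0-9_#@\\s"

if sys.maxunicode > 0xFFFF:
    # Python 3.3+ or a wide Python 2 build: one astral code point is one character.
    ASTRAL = u"\U00010000-\U0010ffff"
else:
    # Narrow Python 2 build (assumption): astral code points are surrogate pairs,
    # so keep every surrogate code unit instead of the astral range.
    ASTRAL = u"\ud800-\udfff"

RE_STRIP = re.compile(u"[^" + KEEP + ASTRAL + u"]", flags=re.UNICODE)


def strip_disallowed(text):
    """Remove every character that is neither in KEEP nor an astral code point."""
    return RE_STRIP.sub(u"", text)
```

With this, `strip_disallowed(u"a^b \U0001F600!")` returns `u"ab \U0001F600"`. On Python 2.7 the input has to be a `unicode` object, so byte strings would need decoding first. Like my original pattern, this keeps everything above U+FFFF rather than emoji specifically, and it would still strip emoji that live below U+10000.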