
What is the best way to match "clear 18+ from your history" in "clear 18+ from your history? blah blah blah" using Python?

I've tried this:

import re

keyword = "clear 18+ from your history"
prepped_string = "clear 18+ from your history? blah blah blah"

is_flagged = False
if re.search(r'\b' + keyword + r'\b', prepped_string):
    is_flagged = True

The code above only works when the keyword contains no special characters. If the keyword contains a special character such as the plus sign, it doesn't match. Thanks in advance.

Here's the full code:

def _get_user_blacklist_weights(self, prepped_string, user):
    """
    Returns a list of (word, weight, reason) for every user word that is found.
    """
    out = []
    if user.blacklist:
        matches = user.blacklist.search(prepped_string)
        for match in matches:
            is_flagged = False
            try:
                if re.search(r'\b' + keyword + r'\b', prepped_string):
                    is_flagged = True
            except Exception as e:
                # The condition below fixes both python 3.4 and 3.6 error message on repeating characters.
                if (str(e)).startswith(C.REPEAT_ERROR_MESSAGES):
                    is_flagged = True
                else:  # pragma: no cover
                    error_logging(e)

            if is_flagged:
                out.append((match, C.USER_BLACKLIST_MATCH_WEIGHT,
                            '%s or one of his/her accountability partners asked that "%s" be flagged.' % (user.person.first_name.title(), match)))
    return out
catherine
  • I'm 90% certain that Wiktor would have closed this question, so I am also voting to close it. An answer which just says to escape the `+` is effectively just fixing a typo IMHO. – Tim Biegeleisen Nov 04 '19 at 10:23
  • @TimBiegeleisen it's just a sample from the real code. None of the special characters work with the regex I wrote, so I'm posting here to ask for help. I've also done lots of research here but couldn't find much about my problem. – catherine Nov 04 '19 at 10:32
  • @TimBiegeleisen, it's not a typo, it needs a new function call. – Tim Nov 04 '19 at 18:11

2 Answers


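In a regular expression, + is a quantifier meaning "one or more of the preceding element", so 18+ matches 18 or 188 but never a literal plus sign. You can escape the +, or wrap it in a character set. For example: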

keyword = r'clear 18\+ from your history'

or:

keyword = 'clear 18[+] from your history'
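
To see the difference concretely, here is a small sketch comparing the raw keyword with the two escaped forms. The variable names are only illustrative and are not taken from the question's code.

```python
import re

text = "clear 18+ from your history? blah blah blah"

raw = "clear 18+ from your history"          # '+' acts as a quantifier on '8'
escaped = r"clear 18\+ from your history"    # the backslash makes '+' literal
char_set = "clear 18[+] from your history"   # '[+]' also matches a literal '+'

raw_hit = re.search(raw, text) is not None
escaped_hit = re.search(escaped, text) is not None
char_set_hit = re.search(char_set, text) is not None

# The raw pattern does match text that has repeated 8s and no plus sign,
# which shows that '8+' is read as "one or more 8s".
raw_hit_no_plus = re.search(raw, "clear 188 from your history") is not None

print(raw_hit, escaped_hit, char_set_hit, raw_hit_no_plus)
# False True True True
```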

You can use re.escape(..) (see the Python documentation) if you want to escape a string automatically. For example:

>>> import re
>>> print(re.escape('clear 18+ from your history'))
clear\ 18\+\ from\ your\ history
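
If the keywords come from user input, as the blacklist in the question appears to, each one can be escaped before a pattern is built from it. The following is a minimal sketch under that assumption: the keywords list is hypothetical, and joining the escaped keywords with | is just one possible way to search for several at once, not something the question's code is known to do.

```python
import re

# Hypothetical user-supplied keywords; any of them may contain regex metacharacters.
keywords = ["clear 18+ from your history", "c++", "what?"]

# Escape each keyword so it is matched literally, then join the results with '|'.
pattern = re.compile("|".join(re.escape(k) for k in keywords))

text = "how to clear 18+ from your history? also c++ tips"

# With no capturing groups, findall returns the matched substrings in order.
found = pattern.findall(text)
print(found)
# ['clear 18+ from your history', 'c++']
```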
Willem Van Onsem

Use re.escape

Example:

import re

keyword = "clear 18+ from your history"
prepped_string = "clear 18+ from your history? blah blah blah"

is_flagged = False
if re.search(r'\b' + re.escape(keyword) + r'\b', prepped_string):
    is_flagged = True
print(is_flagged)  # --> True
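
One caveat with \b, offered as a sketch rather than a correction: \b only matches between a word character and a non-word character, so a keyword that itself begins or ends with a symbol (for example "18+") will not match when it is followed by a space, even after escaping. Lookarounds are one possible alternative. The helper name contains_keyword below is hypothetical and not part of the question's code.

```python
import re

def contains_keyword(keyword, text):
    """Return True if keyword occurs in text and is not glued to a word character."""
    # (?<!\w) and (?!\w) require that no word character sits directly before
    # or after the match, so they also work when the keyword edge is a symbol.
    pattern = r"(?<!\w)" + re.escape(keyword) + r"(?!\w)"
    return re.search(pattern, text) is not None

text = "clear 18+ from your history? blah blah blah"

# \b fails here: there is no word boundary between '+' and the following space.
with_boundary = re.search(r"\b" + re.escape("18+") + r"\b", text) is not None
with_lookaround = contains_keyword("18+", text)

# The lookarounds still reject a keyword embedded inside a longer word.
inside_word = contains_keyword("clear", "unclear history")

print(with_boundary, with_lookaround, inside_word)
# False True False
```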
Rakesh