0

I want to create a Python function that takes a word as an argument and uses the regular expression package re to reject words that contain any character other than '0123456789:MF' and whitespace ('\s').

Structure:

import re

def function(word):
    pattern = re.compile('REGEXHERE')
    if pattern.match(word):
        return True
    else:
        return False

The problem is that I do not know the regular expression that does just that.

SOLUTION

Since I can't answer my own question yet, I am posting here the solution provided by @MartinBonner, which worked just fine:

import re

def function(word):
    return not re.compile('[^0-9:MF\\s]').search(word)
pmanresa93
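
The negated-class solution above can be sketched as follows, compiling the pattern once at module level instead of on every call. The names `_DISALLOWED` and `is_valid` are illustrative and do not come from the original post.

```python
import re

# Matches any single character that is NOT a digit 0-9, ':', 'M', 'F', or whitespace.
_DISALLOWED = re.compile(r'[^0-9:MF\s]')


def is_valid(word):
    """Return True if word contains only allowed characters."""
    # search() scans the string and stops at the first disallowed character.
    return _DISALLOWED.search(word) is None


print(is_valid('12:30 M'))  # True
print(is_valid('12:30 X'))  # False
print(is_valid(''))         # True: nothing disallowed was found
```

Note that this version accepts the empty string, since it only checks that no disallowed character is present.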

2 Answers

1

Use a character class. Your regex should be:

'[\\d:MF\\s]+'

To ensure the whole string is matched, you surround it with ^ and $:

'^[\\d:MF\\s]+$'
Mateen Ulhaq
  • That's an inefficient way to do it: rather than matching the whole string to make sure all its characters are accepted, you'd be better off matching up to the first character that isn't accepted – Aaron Jun 16 '17 at 09:27
  • @Aaron Doesn't this accomplish that? The match should fail in `O(n)` time. – Mateen Ulhaq Jun 16 '17 at 09:28
  • Hmmm so my justification was stupid, you're right that both will fail/succeed as quickly. I think my option is marginally better memory-wise because it only has to match a single character, while your option will have to maintain the matched-so-far string in memory – Aaron Jun 16 '17 at 09:33
  • I've been running some tests on regex101 ([with yours](https://regex101.com/r/z7UhSW/1) / [with mine](https://regex101.com/r/jPK06b/1)), and it looks like my option generally performs better with their PCRE engine. The tests also show that the two don't behave the same way on the empty string: your regex requires at least one allowed character, while mine only requires that there be no disallowed characters. – Aaron Jun 16 '17 at 09:42
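
The empty-string difference raised in the comments can be checked with a small sketch. The names `WHITELIST`, `BLACKLIST`, `accept_whitelist` and `accept_blacklist` are made up for this illustration; the patterns are the ones from the answer and from the comment's alternative.

```python
import re

WHITELIST = re.compile(r'^[\d:MF\s]+$')  # whole string, at least one allowed character
BLACKLIST = re.compile(r'[^\d:MF\s]')    # any single disallowed character


def accept_whitelist(word):
    # Accept only if the entire string consists of allowed characters.
    return WHITELIST.match(word) is not None


def accept_blacklist(word):
    # Accept unless some disallowed character is found anywhere.
    return BLACKLIST.search(word) is None


for sample in ['10:45 F', '10:45 f', '']:
    print(repr(sample), accept_whitelist(sample), accept_blacklist(sample))
```

For these samples the two functions disagree only on `''`: the whitelist version rejects it because `+` needs at least one character, while the blacklist version accepts it. Replacing `+` with `*` in the whitelist pattern would make both accept the empty string.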
0

If by \s you mean the regex whitespace character class, use this:

^[\d:MF\s]+$
Greaka
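
As a minimal sketch of how this pattern might be used from Python: a raw string avoids doubling the backslashes, and `re.fullmatch` (available since Python 3.4) makes the `^` and `$` anchors unnecessary. The names `PATTERN` and `is_valid` are illustrative only.

```python
import re

# Raw string: backslashes reach the regex engine unchanged, so '\d' needs no doubling.
PATTERN = re.compile(r'[\d:MF\s]+')


def is_valid(word):
    # fullmatch() succeeds only when the entire string matches the pattern.
    return PATTERN.fullmatch(word) is not None


print(is_valid('23:59 MF'))  # True
print(is_valid('23:59 AM'))  # False
```

One caveat: on Python 3 `str` patterns, `\d` also matches non-ASCII decimal digits. If only the ten ASCII digits should be accepted, `[0-9]` can be used in place of `\d`.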