Regular expression to reject any string containing characters other than these?

Question

I want to create a Python function that takes a word as an argument and uses the regular expression package `re` to reject words that contain any character other than '0123456789:MF' and '\s'.

Structure:

import re

def function(word):
    pattern = re.compile('REGEXHERE')  # placeholder for the pattern I am asking about
    if pattern.match(word):
        return True
    else:
        return False

The problem is that I do not know the regular expression that does just that.

SOLUTION

Since I can't answer my own question yet, I am posting here the solution provided by @MartinBonner, which worked just fine:

import re

def function(word):
    return not re.compile(r'[^0-9:MF\s]').search(word)

`if condition : return True else return False` is better written as `return condition`. — Aaron, Jun 16 '17 at 09:24
he meant `return pattern.match(word)`. You don't need the if statement — tushortz, Jun 16 '17 at 09:27
I would 1. Invert the characters being searched for. 2. Invert the test (return true if they are found), 3. Use `search`, not `match`. As a one-liner: `def function(word): return ! re.compile('[^0-9::MF\\s]').search(word)` — Martin Bonner supports Monica, Jun 16 '17 at 09:31
Thank you @MartinBonner, your solution worked perfectly! Just changed: `def function(word): return not re.compile('[^0-9::MF\\s]').search(word)` — pmanresa93, Jun 16 '17 at 09:40
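
For reference, a minimal sketch of the approach from the comments above: compile a negated character class once and use `search` to look for any disallowed character. The names `_DISALLOWED` and `is_valid` are illustrative and do not come from the original posts. `0-9` is used so that only ASCII digits are accepted.

```python
import re

# Matches any single character that is NOT a digit, ':', 'M', 'F', or whitespace.
# Compiled once at import time instead of on every call.
_DISALLOWED = re.compile(r'[^0-9:MF\s]')


def is_valid(word):
    """Return True if word contains no disallowed character."""
    # search() scans the whole string; match() would only test the start.
    return _DISALLOWED.search(word) is None


print(is_valid("12:30 M"))  # True
print(is_valid("12:30 X"))  # False
```

`search` is used here because `match` only checks at the beginning of the string, so a bad character later in the word would go unnoticed.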

Mateen Ulhaq · Answer 1 · 2017-06-16T09:26:25.483

1

Use a character class. Your regex should be:

'[\\d:MF\\s]+'

To ensure the whole string is matched, surround it with ^ and $:

'^[\\d:MF\\s]+$'
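
A hedged sketch of how this positive character class might be used in Python. Here `re.fullmatch` (available since Python 3.4) stands in for writing `^` and `$` explicitly, and the function name `only_allowed` is made up for illustration. One assumption worth flagging: on `str` patterns in Python 3, `\d` also matches non-ASCII Unicode digits, so the `re.ASCII` flag is passed to restrict it to `0-9`.

```python
import re

# One or more allowed characters. re.ASCII limits \d to 0-9 and \s to ASCII whitespace.
_ALLOWED = re.compile(r'[\d:MF\s]+', re.ASCII)


def only_allowed(word):
    """Return True if the entire word consists of allowed characters."""
    # fullmatch() succeeds only if the pattern covers the whole string.
    return _ALLOWED.fullmatch(word) is not None


print(only_allowed("07:45 F"))       # True
print(only_allowed("07:45 f"))       # False
print(only_allowed("\u0661\u0662"))  # False (Arabic-Indic digits rejected under re.ASCII)
```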

edited Jun 16 '17 at 09:26

answered Jun 16 '17 at 09:24

Mateen Ulhaq


That's an inefficient way to do it: rather than matching the whole string to make sure all its characters are accepted, you'd do better to match only up to the first character that isn't accepted – Aaron Jun 16 '17 at 09:27
@Aaron Doesn't this accomplish that? The match should fail in `O(n)` time. – Mateen Ulhaq Jun 16 '17 at 09:28
Hmmm so my justification was stupid, you're right that both will fail/succeed as quickly. I think my option is marginally better memory-wise because it only has to match a single character, while your option will have to maintain the matched-so-far string in memory – Aaron Jun 16 '17 at 09:33
I've been running some tests on regex101 ([with yours](https://regex101.com/r/z7UhSW/1) / [with mine](https://regex101.com/r/jPK06b/1)), and it looks like my option generally performs better with their PCRE engine. The tests also show that the two don't behave the same way with the empty string, since your regex requires at least one correct character while mine only requires that there be no incorrect characters. – Aaron Jun 16 '17 at 09:42
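
The empty-string difference mentioned in the last comment can be checked directly. A small sketch comparing the two patterns from this thread; the helper names `accepts_whole` and `accepts_negated` are hypothetical.

```python
import re

whole = re.compile(r'^[\d:MF\s]+$')   # needs at least one allowed character
negated = re.compile(r'[^0-9:MF\s]')  # finds a single disallowed character


def accepts_whole(word):
    # Accept when the anchored pattern covers the whole string.
    return whole.search(word) is not None


def accepts_negated(word):
    # Accept when no disallowed character is found anywhere.
    return negated.search(word) is None


for sample in ["", "12:00 M", "12:00 m"]:
    print(repr(sample), accepts_whole(sample), accepts_negated(sample))
# ''        False True
# '12:00 M' True  True
# '12:00 m' False False
```

Replacing `+` with `*` in the anchored pattern would make both versions accept the empty string.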

score 0 · Answer 2 · answered Jun 16 '17 at 09:26

If by \s you mean the regex whitespace class, use this:

^[\d:MF\s]+$

Greaka

