
I just learnt that [^ab] matches any single character other than a or b. So [^ab]* should match zero or more characters, none of which is a or b.

Yet Python somehow matches [^ab]* against the string 'a'.

import re

pattern = '[^ab]*'
str = 'a'
r = re.compile(pattern)
m = r.match(str)
if m is None:
    print('No match')
else:
    print('match')

This code snippet prints "match". Either I misunderstand how ^ works inside brackets, or I have made an error in the code.

user3828311

3 Answers


[...]* means zero or more occurrences of whatever the brackets match. In your case it matches zero characters, i.e. the empty string, and an empty match at the start of the string still counts as a match.

See the documentation for re.match:

If zero or more characters at the beginning of string match the regular expression pattern...

You can think of the string 'a' as starting with the empty string: '' + 'a'. That empty prefix is what matches your pattern.

If you want the entire string to match, try fullmatch (Python 3.4+) instead.
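A minimal sketch contrasting the two on a few inputs, assuming Python 3.4 or later (where fullmatch exists); PATTERN and compare are illustrative names, not part of the question:

```python
import re

PATTERN = re.compile(r'[^ab]*')

def compare(text):
    """Illustrative helper: return (match result, fullmatch result); None means no match."""
    m = PATTERN.match(text)      # only has to match a prefix, which may be empty
    f = PATTERN.fullmatch(text)  # has to match the whole string
    return (None if m is None else m.group(),
            None if f is None else f.group())

for text in ['a', 'xyz', 'xya', '']:
    print(repr(text), compare(text))
```

For 'a', match succeeds with an empty match while fullmatch returns None; for 'xya', match returns the prefix 'xy' and fullmatch again fails.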

gonutz
  • Okay, I understand that [^ab]* matches an empty string from str. What is being matched with 'a' from str? – user3828311 Oct 26 '17 at 22:04
  • Okay, I think I misunderstood the difference between match and fullmatch. So match will match any part of str, and fullmatch makes sure the entire str matches the pattern, right? – user3828311 Oct 26 '17 at 22:05
  • No, `match` will match the **start** of a string, see the quote from the docs in the answer. – gonutz Oct 27 '17 at 06:09
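To illustrate the point in that last comment, a small sketch comparing re.match with re.search, which scans forward instead of anchoring at the start; the input 'axy' is just an example:

```python
import re

text = 'axy'

# re.match only tries position 0; 'a' is excluded by [^ab], so '+' fails there.
print(re.match(r'[^ab]+', text))  # None

# re.search scans forward and reports the first position where the pattern matches.
m = re.search(r'[^ab]+', text)
print(m.group(), m.span())  # xy (1, 3)

# With '*', even re.search stops at position 0, because an empty match succeeds there.
print(re.search(r'[^ab]*', text).span())  # (0, 0)
```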

The "a" isn't what is being matched; it's the empty string ("") that is being matched.

As you know, * in a regex means the preceding element matches 0 or more times. Your regex matches [^ab] 0 times, which yields the empty string. Because of this, m is not None: it is a match object whose matched text is the empty string, which is different from no match at all.
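A short sketch confirming that an empty match is still a real match object. Match objects are always truthy, so a plain "if m:" would also take the match branch here:

```python
import re

m = re.match(r'[^ab]*', 'a')

print(m is not None)    # True: the match succeeded
print(bool(m))          # True: match objects are always truthy, even when empty
print(repr(m.group()))  # '': zero characters were consumed
print(m.span())         # (0, 0): the match starts and ends at index 0
```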

Running the same assignments in a REPL:

>>> import re
>>> pattern = '[^ab]*'
>>> str = 'a'
>>> r = re.compile(pattern)
>>> m = r.match(str)
>>> m.groups()
()
>>> m.group(0)
''
>>> m.group(1)
Traceback (most recent call last):
  File "python", line 1, in <module>
IndexError: no such group

You can see that m is not None, so the match succeeded. m.groups() is an empty tuple because the pattern has no capturing groups (no parentheses), which is also why m.group(1) raises IndexError. m.group(0), the entire match, is the empty string.
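For comparison, a sketch showing that groups() only reports parenthesized capturing groups. Wrapping the class in parentheses (an illustrative change, not from the question) produces a one-element tuple:

```python
import re

text = 'a'

# No parentheses in the pattern, so there are no capturing groups at all.
m = re.match(r'[^ab]*', text)
print(m.groups())          # ()
print(repr(m.group(0)))    # '' (group 0 is always the whole match)

# Parentheses create capturing group 1, which here captures the empty string.
m2 = re.match(r'([^ab]*)', text)
print(m2.groups())         # ('',)
print(repr(m2.group(1)))   # ''
```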

If you want 'a' not to match, use [^ab]+ instead: + is like *, but it requires at least one occurrence.
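A side-by-side sketch of * versus + with re.match; first_match is a hypothetical helper written only for this demo:

```python
import re

def first_match(pattern, text):
    """Hypothetical helper: text matched at the start of text, or None if match() fails."""
    m = re.match(pattern, text)
    return None if m is None else m.group()

for text in ['a', '', 'xya']:
    print(repr(text),
          '* ->', repr(first_match(r'[^ab]*', text)),
          '+ ->', repr(first_match(r'[^ab]+', text)))
```

With *, 'a' and '' both produce an empty match; with +, both return None. On 'xya' the two patterns agree on the prefix 'xy'.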

Qwerp-Derp

As the previous answers state, * matches zero or more occurrences of the preceding element, and the "zero" case is the tricky part.

You can do some testing to better understand how Python treats *. The match object has a groups() method, which returns a tuple of all the capturing groups in the pattern:

>>> r.match(str_).groups()
()

This may look odd at first, but the empty tuple only reflects that the pattern has no capturing groups. The findall method is more revealing; it returns a list of all non-overlapping matches:

>>> r.findall(str_)
['', '']

Two empty strings: one zero-length match at position 0 (before 'a') and another at position 1 (the end of the string). These are genuine empty matches, not the absence of a match. If you change the * to + (match one or more), the results are different:
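To see where those two empty matches sit, a sketch using re.finditer, which yields match objects (and therefore spans) rather than plain strings:

```python
import re

# findall returns only the matched text; finditer also exposes each match's position.
print(re.findall(r'[^ab]*', 'a'))  # ['', '']

spans = [m.span() for m in re.finditer(r'[^ab]*', 'a')]
print(spans)  # [(0, 0), (1, 1)]: before 'a' and at the end of the string
```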

>>> pattern = '[^ab]+'
>>> re.match(pattern, 'a').groups()

Traceback (most recent call last):
  File "<pyshell#135>", line 1, in <module>
    re.match(pattern, 'a').groups()
AttributeError: 'NoneType' object has no attribute 'groups'

Here re.match returns None, so calling .groups() on it raises AttributeError.

So, to sum it up: a pattern such as [...]* can match zero characters, so it always finds an empty match at the beginning of the searched string. A pattern such as [...]+ needs at least one matching character, so it fails on 'a'.

Chen A.