
I'm stuck with the following regular expression in Python 3.6.4:

import re
regex = r'\d{1,3}[-\s]?\d{3}[-\s]?\d{3}'
m = re.match(regex, '12377-456-789')

The output of the above code is:

<_sre.SRE_Match object; span=(0, 9), match='12377-456'>

Section 7.2 of the online Python documentation ("re: Regular expression operations") at:

https://docs.python.org/2/library/re.html#regular-expression-syntax

says the following:

{m} Specifies that exactly m copies of the previous RE should be matched; fewer matches cause the entire RE not to match. For example, a{6} will match exactly six 'a' characters, but not five.

{m,n} Causes the resulting RE to match from m to n repetitions of the preceding RE, attempting to match as many repetitions as possible.
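The two quoted rules can be checked in isolation. This is only a minimal sketch using toy patterns on runs of 'a'; the variable names are illustrative and not part of the original code:

```python
import re

# {m}: exactly m copies are required, so five 'a' characters are not enough.
five = re.match(r'a{6}', 'aaaaa')    # no match
six = re.match(r'a{6}', 'aaaaaa')    # matches all six

# {m,n}: anywhere from m to n copies, preferring as many as possible.
greedy = re.match(r'a{1,3}', 'aaaa')

print(five)
print(six.group())
print(greedy.group())
```

Note that {m,n} only prefers the longest repetition; any count from m to n is still acceptable to it.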

Since the hyphen or space [-\s]? after \d{1,3} is optional, I expected \d{1,3} to consume 123. That leaves only the two digits 77 before the next hyphen, not the exactly three digits required by \d{3}. So how did Python return a match?

According to the official description, the regex should not match the string but surprisingly it does!

So I'm wondering how Python produces the match shown above.

Thanks a lot.

Khurmi
    I'm pretty sure the first set of digits is matching `12`, then your second and third groups match `377` and `456`. – Blckknght Aug 04 '18 at 05:08

2 Answers


The leading \d{1,3} did not match three digits at the beginning. As you said, the rest of the pattern could not match if it had. Quantifiers such as {1,3} are greedy, so the engine first tried to match the entire RE with \d{1,3} consuming three digits (123). That attempt failed, so it backtracked and tried again with \d{1,3} consuming only two digits (12), which succeeds. You can see this clearly if you put the initial \d repetition in a group, and put the rest of the RE in another group:

import re
regex = r'(\d{1,3})([-\s]?\d{3}[-\s]?\d{3})'
print(re.match(regex, '12377-456-789').groups())

Output:

('12', '377-456')

https://regex101.com/r/PuUCu1/1
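The backtracking can also be imitated by hand. The following is only a rough sketch, not how the re module is implemented internally: it swaps \d{1,3} for a fixed count and tries 3, 2 and 1 leading digits in turn. The names rest and results are made up for this illustration.

```python
import re

text = '12377-456-789'
rest = r'[-\s]?\d{3}[-\s]?\d{3}'  # everything after the leading \d{1,3}

# Try each leading-digit count that \d{1,3} allows, longest first,
# which is the order in which a greedy quantifier gives characters back.
results = {}
for count in (3, 2, 1):
    m = re.match(r'(\d{%d})(%s)' % (count, rest), text)
    results[count] = m.groups() if m else None
    print(count, results[count])
```

Only the two-digit attempt succeeds, which is why the overall match starts with 12 rather than 123.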

CertainPerformance
  • Thank you so much for the answer and for the excellent reference to the website regex101, which can help debug and understand a regex. I forgot that \d{1,3} only sets an upper bound, so it matches up to three occurrences and also accepts fewer than three. Keep up the good work! – Khurmi Aug 04 '18 at 05:32
  • When an answer solves your problem, consider marking it as Accepted to indicate that the issue is resolved – CertainPerformance Aug 04 '18 at 05:59

The regex \d{1,3}[-\s]?\d{3}[-\s]?\d{3} matches 12377-456 out of 12377-456-789 as follows:

Step 1: 123
Step 2: 123 => ok
Step 3: 123 => backtrack
Step 4: 12  
Step 5: 12  => ok
Step 6: 12377
Step 7: 12377-
Step 8: 12377-456
Match found in 8 steps.
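The end state of the trace can be confirmed from Python. This is a small sketch; the variable names matched, leftover and full are illustrative only. re.match only anchors the start of the string, so the trailing -789 is simply left unconsumed, whereas re.fullmatch requires the whole string to be consumed.

```python
import re

regex = r'\d{1,3}[-\s]?\d{3}[-\s]?\d{3}'
text = '12377-456-789'

m = re.match(regex, text)
matched = m.group()         # the part consumed by the regex
leftover = text[m.end():]   # whatever follows the match

# fullmatch succeeds only if the pattern consumes the entire string.
full = re.fullmatch(regex, text)

print(matched, m.span())
print(leftover)
print(full)
```

If the intent is to reject strings like this one, re.fullmatch (or an explicit end anchor) is one way to do it.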
Andie2302