Take this example:

import re
re.search(r"\bsr\.?\b","sr. manager")

<_sre.SRE_Match object; span=(0, 2), match='sr'>

This result is not what I had expected.

The ? qualifier is greedy, so it should match as much text as possible (reference).

Reading the pattern, it should say: match a word boundary, followed by "sr", followed by 0 or 1 dots (but as many characters as possible), followed by another word boundary. So I expected the pattern to match "sr." and not just "sr". This is the workaround that I have found:

re.search(r"\bsr(\.|\b)","sr. manager")

<_sre.SRE_Match object; span=(0, 3), match='sr.'>

The non-greedy version, on the other hand, gives exactly what I would expect from a non-greedy match:

re.search(r"\bsr\.??\b","sr. manager")

<_sre.SRE_Match object; span=(0, 2), match='sr'>

Why is the greedy version not giving the answer I expect? What is wrong with my understanding of this type of qualifier?

Wiktor Stribiżew
robertspierre
  • Quantifiers, not qualifiers – ctwheels Nov 09 '17 at 16:03
  • `\b` is throwing it off. It doesn't match anything at the `.` location since `. ` (dot followed by space) doesn't include a word boundary character. You should instead use `\bsr\b\.?`. Optionally, you can add a `\B` at the end as such `\bsr\b\.?\B`. The latter will ensure what follows the `.` is not a word character. – ctwheels Nov 09 '17 at 16:06
  • @ctwheels the documentation calls them "repetition qualifiers" multiple times – robertspierre Nov 09 '17 at 16:07
  • Very interesting... That shouldn't be like that. It's actually a quantifier and not a qualifier. – ctwheels Nov 09 '17 at 16:09
  • I've sent a message to the python documentation team to have this addressed. That terminology is incorrect. – ctwheels Nov 09 '17 at 16:14
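
The explanation given in the comments can be checked directly. What follows is only a minimal sketch, assuming Python 3's standard `re` module; the variable names (`text`, `greedy`, `dot_required`, `fixed`) are illustrative and not from the original post. It shows that greediness only decides which option is tried first: `\.?` does take the dot initially, but the following `\b` cannot match between `.` and a space (both are non-word characters), so the engine backtracks and settles for the empty match.

```python
import re

text = "sr. manager"

# Greedy `\.?` tries the dot first, but the `\b` after it would have to sit
# between "." and " " (two non-word characters), where no boundary exists.
# The engine backtracks, lets `\.?` match nothing, and `\b` then succeeds
# between "r" and ".".
greedy = re.search(r"\bsr\.?\b", text)
print(greedy.group())   # sr

# Making the dot mandatory confirms it: there is no word boundary right
# after the dot, so the whole pattern fails to match.
dot_required = re.search(r"\bsr\.\b", text)
print(dot_required)     # None

# Pattern suggested in the comments: assert the boundary right after "sr",
# then optionally consume the dot.
fixed = re.search(r"\bsr\b\.?", text)
print(fixed.group())    # sr.
```

With the boundary placed before the optional dot, the same pattern still matches plain `sr` in input such as `"sr manager"`, since `\.?` is then free to match nothing.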
