I want to match:

first second

and

second first

so the regular expression:

re.match(r'(?:(?P<f>first) (?P<s>second)|(?P=s) (?P=f))', 'first second')

matches, but this one:

re.match(r'(?:(?P<f>first) (?P<s>second)|(?P=s) (?P=f))', 'second first')

does not match. Is this a bug with backreferences in an A|B alternation?

Matteo
  • When in doubt, do *not* blame the regular expression engine; it rarely is a bug in the engine. – Martijn Pieters Apr 10 '14 at 15:05
  • That's fair, of course, but if you try it, it doesn't work; the documentation says nothing about this, and the syntax looks right to me. So if you find that the problem is somewhere else, you're right. – Matteo Apr 10 '14 at 15:07
  • There is more information about backreferences in the "Groups" section of the [Stack Overflow Regular Expressions FAQ](http://stackoverflow.com/a/22944075/2736496). – aliteralmind Apr 10 '14 at 15:11
  • @aliteralmind: the FAQ doesn't (yet) have any *proper* references for backreferences; the best that's there now is a specific phone-number pattern that doesn't say much about how they work. – Martijn Pieters Apr 10 '14 at 15:16
  • Hm. Okay. Then perhaps I'll look around for better ones, and please let me know if you find (or create! :) any...or feel free to add them yourself. – aliteralmind Apr 10 '14 at 15:24

2 Answers

You've misunderstood how backreferences work. A backreference can only match if the group it refers to has already matched; it then matches the same text that group captured.

In your second example, the (?P<f>first) group didn't match anything, so the (?P=f) back reference cannot match anything either.
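
A minimal sketch of that behaviour, reusing the pattern from the question unchanged (the variable names `pattern` and `m` are only for illustration):

```python
import re

# Pattern from the question: the backreferences sit in the second branch.
pattern = re.compile(r'(?:(?P<f>first) (?P<s>second)|(?P=s) (?P=f))')

# First branch: both named groups capture, so the match succeeds.
m = pattern.match('first second')
print(m.group('f'), m.group('s'))

# Second branch: neither 's' nor 'f' has captured anything when the
# backreferences are tried, so the branch fails and match() returns None.
print(pattern.match('second first'))
```

The second call prints `None`: the engine is doing what it is told, because there is no captured text for `(?P=s)` to repeat.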

Backreferences are the wrong tool here; you'll have to repeat at least one of your groups literally:

r'(?:(?P<f>first )?(?P<s>second)(?(f)$| first))'

This uses a conditional pattern: if the f group matched before second, the string must end right after second; if it did not, first must follow second:

>>> import re
>>> pattern = re.compile(r'(?:(?P<f>first )?(?P<s>second)(?(f)$| first))')
>>> pattern.match('first second').group()
'first second'
>>> pattern.match('second first').group()
'second first'
Martijn Pieters
  • Thank you, I understand. Do you have any idea how to do that (first second|second first) without copying and pasting within the regex? – Matteo Apr 10 '14 at 15:09

How about:

(?=.*(?P<f>first))(?=.*(?P<s>second))

(?=...) is a positive lookahead: it asserts that the word first is present somewhere ahead in the string without making it part of the match (it's a zero-length assertion). The same goes for second.

This regex matches if both first and second occur somewhere in the string, in any order.
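
A rough sketch of how this behaves with `re.match` (the name `lookahead` and the sample strings are made up for illustration). Note that the match itself is empty, since lookaheads consume nothing, and that the pattern also accepts strings with other text around the two words, so it is looser than an exact "first second" / "second first" check:

```python
import re

# Two zero-width lookaheads: each one only checks that its word occurs
# somewhere after the current position; neither consumes any characters.
lookahead = re.compile(r'(?=.*(?P<f>first))(?=.*(?P<s>second))')

samples = ('first second', 'second first', 'first only', 'xx second yy first zz')
for text in samples:
    m = lookahead.match(text)
    if m is None:
        print(repr(text), '-> no match')
    else:
        # m.group() is the empty string; the words are only in the named groups.
        print(repr(text), '->', repr(m.group()), m.group('f'), m.group('s'))
```

Only 'first only' fails here; the other three strings all satisfy both assertions.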

Toto