
I don't understand why the regex `(?<=i:(?>\D*))\d` does not match the string `i:>1`.

The way I understand it:

  • at index 0: the lookbehind `i` won't match
  • at index 1: the lookbehind `i:` won't match
  • at index 2: the lookbehind `i:(?>\D*)` will match `i:`, but the `\d` after the lookbehind won't match `>`
  • at index 3: the lookbehind `i:(?>\D*)` will match `i:>`, and the `\d` after the lookbehind will match `1`, so the regex is satisfied
AXO
  • I know that it works if I replace the atomic group `(?>\D*)` with a `\D*`, but I want to know what is happening with the atomic group. This is a simplified version of a more complex regex that I had an issue with. – AXO Jan 03 '18 at 21:36
  • The problem here is that the lookbehind is implemented in such a way that the `(?>\D*)` is executed before `i:`, and `i:` is matched with `\D*`, and as it is inside an atomic group, no backtracking is possible. – Wiktor Stribiżew Jan 03 '18 at 21:39
  • @WiktorStribiżew "lookbehind is implemented in such a way that the `(?>\D*)` is executed before `i:`", thanks, but why have they done that? Is this documented somewhere? – AXO Jan 03 '18 at 21:49
  • The only link I can find is [one of Kobi's answers](https://stackoverflow.com/a/13425789/3832970). – Wiktor Stribiżew Jan 03 '18 at 21:50

1 Answer


See Regular Expressions Cookbook: Detailed Solutions in Eight Programming Languages:

> .NET allows you to use anything inside lookbehind, and it will actually apply the regular expression from right to left. Both the regular expression inside the lookbehind and the subject text are scanned from right to left.

The `(?<=i:(?>\D*))\d` pattern does not match the `1` in `i:>1` because the atomic group `(?>\D*)` prevents any backtracking into its pattern. Scanning right to left from the `1`, `\D*` consumes `>`, then `:`, then `i` (all three are non-digits), so the `i:` itself is swallowed by `\D*`. Since the group is atomic, it cannot give `:` and `i` back, and there is no way to re-match `i:`.

You can also see that `(?<=i:(?>[^:\d]*))\d` will match the `1` in `i:>1`: here, `[^:\d]*` matches any character except `:` and digits, so it stops at the `:` (consuming only `>`), and `i:` is still there to be matched.

Wiktor Stribiżew