
The optional quantifier, like any other quantifier, is supposed to be greedy by default, i.e. it should match as much as possible.

Let's apply the regex (AB)?.*B to the input AB. I expect (AB)? to greedily match the full string, leaving no characters for the remaining .*B to match.

The actual behavior differs: the regex matches the given input. Why?
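A minimal reproduction, assuming Python's `re` module (the question does not name a regex flavor). `re.fullmatch` is used so that the whole input has to be consumed by the pattern:

```python
import re

# fullmatch: the pattern must consume the entire input "AB".
m = re.fullmatch(r"(AB)?.*B", "AB")
print(m is not None)  # True: the pattern matches
print(m.group(1))     # None: the optional group ended up matching nothing
```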

Adamovskiy
  • The trailing `B` forces the regex to guarantee that it matches a `B` at the end, so in your example it's easier to read your regex from right to left. – MonkeyZeus Oct 02 '19 at 12:57
  • The whole pattern needs to match, and that includes the B at the end. When the optional AB matches, the engine continues and tries to match the B. That fails, so it backtracks. Then `.*` can match the rest, and it can backtrack to let the final `B` match. – The fourth bird Oct 02 '19 at 12:58
  • So the regex is not matched left-to-right, but in priority order? How can it be forced to match the optional group first (and fail to match this example)? – Adamovskiy Oct 02 '19 at 13:09
  • If you want `(AB)?` to be important, then use `(AB)?.*B?` so that the trailing `B` is less important, and now your regex will read from left to right. – MonkeyZeus Oct 02 '19 at 13:12
  • @AlexeyAdamovskiy This example has got nothing to do with "greediness" or "priority". If the `(AB)` matches, then the `.*B` part of the regex would not match. Therefore the whole thing would not match. – Tom Lord Oct 02 '19 at 13:12
  • @TomLord this is exactly what I expect here, but it matches nonetheless. – Adamovskiy Oct 02 '19 at 13:17
  • @MonkeyZeus `B?` makes `B` optional, so the given input matches again. That is not what I want. In general, I want to keep validating the input string with an existing regex, but with an optional prefix. – Adamovskiy Oct 02 '19 at 13:20
  • @AlexeyAdamovskiy It matches, because the `(AB)?` is optional. So, `(AB)?` matches against nothing, `.*` matches against `A` and `B` matches against `B`. In other words, `(AB)?.*B` does match `AB`, but the capture group is blank. – Tom Lord Oct 02 '19 at 13:26
  • @TomLord That would explain why it matches input like `XB`. `(AB)?` is optional, but it is greedy. Why should it skip the `AB` match? I believed a regex applies the leftmost operator first, and this operator fits well. – Adamovskiy Oct 02 '19 at 13:34
  • I really do not understand what your question is. As it is written you simply misunderstand quantifiers and the regex engine. I suggest playing around at https://regex101.com/ to better understand how regex works. If you wish to update your question with examples of things which should or should not match then I may be able to offer more advice but as it stands you simply do not understand what is being explained to you. – MonkeyZeus Oct 02 '19 at 13:35
  • Even though `(AB)?` is greedy, it is optional so that means it is not required to match. – MonkeyZeus Oct 02 '19 at 13:40
  • https://www.regular-expressions.info/refrepeat.html is a good reference – MonkeyZeus Oct 02 '19 at 13:40
  • @MonkeyZeus thanks, that helped. The behavior I had been expecting from a greedy quantifier is actually that of a "possessive quantifier", so my regex should be `(AB)?+.*B` – Adamovskiy Oct 02 '19 at 13:56
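The possessive version from the last comment can be checked with a short sketch, again assuming Python's `re`. Python accepts possessive quantifiers such as `?+` only from version 3.11; for older versions the sketch falls back to the common lookahead-plus-backreference emulation of an atomic group. That fallback is an assumption about your environment, not something from the discussion above.

```python
import re
import sys

subject = "AB"

if sys.version_info >= (3, 11):
    # Python 3.11+ understands possessive quantifiers directly.
    possessive = re.compile(r"(AB)?+.*B")
else:
    # Older Pythons: a lookahead is atomic, so once it has captured "AB"
    # the engine cannot backtrack into it and give "AB" up again.
    possessive = re.compile(r"(?=((?:AB)?))\1.*B")

print(possessive.fullmatch(subject))  # None: the prefix keeps "AB", nothing is left for B
print(possessive.fullmatch("ABXB"))   # a match: prefix "AB", then .*B matches "XB"
print(possessive.fullmatch("XB"))     # a match: the prefix is still optional
```

This keeps the prefix optional (the `XB` case) while preventing the engine from un-matching it once it has been found, which is the "optional prefix" validation described in the comments.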

2 Answers


That is because you are asking the regex to guarantee a trailing B.

Let's break it down:

  • (AB)? - try and give me a sequential AB
  • .* - try and give me zero or more of anything
  • B - give me B or give me death! B must be matched, so .* matches the A and (AB)? is okay with getting nothing
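To see this split directly, here is a sketch in Python's `re` (an assumed flavor) that wraps `.*` in an extra capture group purely for inspection; the extra group does not change what the pattern matches:

```python
import re

# Same pattern, but ".*" is wrapped in a second group so we can see what it consumed.
m = re.fullmatch(r"(AB)?(.*)B", "AB")
print(m.groups())  # (None, 'A'): (AB)? got nothing, .* took the A
```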

If your input is ABB, then AB would go to capture group #1.
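The ABB case can be checked the same way (Python's `re` assumed, with `.*` again wrapped in an extra inspection-only group). Here the optional group can keep AB because the trailing B still has the second B to match:

```python
import re

m = re.fullmatch(r"(AB)?(.*)B", "ABB")
print(m.groups())  # ('AB', ''): group #1 keeps AB, .* backs off to nothing
```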


Had you done:

(A|B)?.*B

then at least A would have gone to group #1, but B is still explicitly matched by the trailing B
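A quick check of that alternative pattern, assuming Python's `re`:

```python
import re

m = re.fullmatch(r"(A|B)?.*B", "AB")
print(m.group(1))  # 'A': the single-character group takes A, the trailing B takes B
```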

MonkeyZeus

Regex

(AB)?.*B

applied to string

AB

works like this:

  1. There MUST be a B at the end.
  2. The sequence AB can appear once or not at all.
  3. If the sequence AB is taken once, nothing is left for the final B to match. So the regex engine backtracks and decides that the sequence AB does not appear at all.
  4. .* will match A
  5. B will match B
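The five steps above can be replayed with a small sketch. `trace` is a hypothetical helper that hard-codes this one pattern and records, in order, the choices a backtracking engine tries under full-match semantics. It illustrates the backtracking order only; it is not how any real regex engine is implemented.

```python
import re


def trace(s):
    """Hypothetical helper: replays, in order, the choices a backtracking engine
    tries for the pattern (AB)?.*B when the whole string must match."""
    log = []
    # Greedy '?': first try to take "AB", then fall back to taking nothing.
    for take_ab in (True, False):
        if take_ab and not s.startswith("AB"):
            log.append("(AB)? cannot take 'AB'")
            continue
        pos = 2 if take_ab else 0
        log.append(f"(AB)? takes {s[:pos]!r}")
        # Greedy '.*': take everything left, then give back one char at a time.
        # Simplification: this '.' matches any char (a real '.' skips newlines).
        for end in range(len(s), pos - 1, -1):
            log.append(f"  .* takes {s[pos:end]!r}")
            if s[end:] == "B":  # the final B must match and end the string
                log.append("  B matches -> overall match")
                return True, log
            log.append("  B fails, backtrack")
    log.append("no match")
    return False, log


ok, steps = trace("AB")
print("\n".join(steps))

# Cross-check the simplified model against Python's engine on a few inputs.
for s in ["AB", "ABB", "XB", "A", "ABA", ""]:
    assert trace(s)[0] == (re.fullmatch(r"(AB)?.*B", s) is not None)
```

For the input AB the printed trace first has (AB)? take 'AB' and the final B fail, then has (AB)? take '' and .* give back the B, so that .* ends up with 'A' and B matches B, exactly as in steps 3 to 5.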
virolino