
Given the regex `(aba?)+` and the string `abab`:

Why does it only match `aba`?

Since the last `a` in the regex is optional, isn't `abab` a match as well?

Tested on https://regex101.com/.
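Here is a minimal reproduction. It assumes that Python's `re` module (one of the flavors regex101 offers) handles this pattern the same way as the flavor used in the test:

```python
import re

# Unanchored search: returns the first match found, scanning from the left.
m = re.search(r"(aba?)+", "abab")
print(m.group())  # aba
print(m.span())   # (0, 3), so the final "b" is not part of the match
```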

Allan
Saldeho
  • `aba` is matched first; then `b` remains to be consumed, but it does not match, so the result for `abab` is expected. Maybe you want to use `(ab(?=a|$))+` or `(ab(?![^a]))+` – Wiktor Stribiżew Dec 04 '17 at 08:48
  • Just [see this debugging page](https://regex101.com/r/n53vxV/1/debugger), you will see why. – Wiktor Stribiżew Dec 04 '17 at 09:00
  • So, all you need are anchors, `^` and `$`. There is absolutely no need for lazy `??`. Greediness is of no importance here; I already mentioned that in another comment. [`^(aba?)+$`](https://regex101.com/r/hObylO/1) will behave the same as `^(aba??)+$`. – Wiktor Stribiżew Dec 05 '17 at 07:44
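The two lookahead patterns from the first comment can be checked the same way. This sketch again assumes that Python's `re` agrees with the regex101 flavor for these constructs:

```python
import re

patterns = [
    r"(ab(?=a|$))+",   # repeat "ab" only if "a" or the end of input follows
    r"(ab(?![^a]))+",  # repeat "ab" unless a character other than "a" follows
]
results = {p: re.search(p, "abab").group() for p in patterns}
for p, matched in results.items():
    print(p, "->", matched)  # both print abab
```

Neither pattern consumes a trailing `a`, so on an input such as `aba` both of them match only `ab`.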

3 Answers


The reason `(aba?)+` only matches `aba` out of `abab` is greedy matching. In the first iteration, the optional `a` is tried before the group is attempted again, and it matches. The remaining string is therefore `b`, which cannot match `(aba?)` again.
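The step described above can be traced in Python. This is a sketch that assumes Python's `re` backtracks the same way as the regex101 flavor; `text`, `m` and `rest` are only illustrative names:

```python
import re

text = "abab"
m = re.search(r"(aba?)+", text)

# The greedy a? took the "a" at index 2, so the first iteration ended at index 3.
rest = text[m.end():]
print(repr(rest))                # 'b'

# A second iteration would have to start at that "b", which cannot work:
print(re.match(r"aba?", rest))   # None

# Only if a? had given up its "a" could a second iteration start at index 2.
# The engine never needs that path: one iteration already satisfies +,
# so "aba" is returned as soon as it is found.
print(re.match(r"aba?", text[2:]).group())  # ab
```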

If you want to turn off greedy matching for this optional `a`, use `a??`, or write your regex differently.

YSelf
  • When you use `a??` at the end of the pattern, it always matches an empty string; the `a` is not even tested. This has nothing to do with greediness anyway. – Wiktor Stribiżew Dec 04 '17 at 08:51
  • Yeah aba is not even matched anymore... – Allan Dec 04 '17 at 08:53
  • Unless you use beginning / ending string anchors `^(aba??)+$` @WiktorStribiżew – revo Dec 04 '17 at 08:54
  • @revo As I said, the `a??` *at the end of the pattern*, and in your (not OP) `^(aba??)+$` regex, `a??` is not at the end. This answer is not answering the question. DonCallisto's was correct, no idea why the answer got deleted. – Wiktor Stribiżew Dec 04 '17 at 08:55
  • I find this answer a straightforward explanation. That's totally it. – revo Dec 04 '17 at 08:57
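The comments above discuss how `a??` behaves on `aba`. This is easy to check, assuming Python's `re` treats the lazy `??` the same way as the regex101 flavor:

```python
import re

# Lazy a?? first tries the empty alternative, so on "aba" an unanchored
# search stops after "ab" (the trailing "a" cannot start a new "ab").
print(re.search(r"(aba??)+", "aba").group())    # ab

# With anchors, $ fails after "ab", so the engine backtracks and lets a??
# consume the final "a".
print(re.search(r"^(aba??)+$", "aba").group())  # aba
```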

Since `(aba?)+` is greedy, your pattern tries to match as much as possible. Because it first matches `aba`, the remaining `b` is left unmatched.

Try the non-greedy version (it will match the first and the second `ab`):

```shell
$ echo "abab" | grep -Po "(aba?)+"
aba
$ echo "abab" | grep -Po "(aba??)+"
abab
```
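For readers without a GNU grep built with PCRE support (`-P`), here is a rough Python equivalent of the two commands above. `grep_o` is a hypothetical helper, and the sketch assumes Python's `re` agrees with PCRE for these patterns:

```python
import re

def grep_o(pattern, line):
    """Rough stand-in for `grep -o`: every non-overlapping match, in order."""
    # finditer + group(0) gives the whole match; re.findall would instead
    # return the group's last capture when the pattern has a group.
    return [m.group(0) for m in re.finditer(pattern, line)]

print(grep_o(r"(aba?)+", "abab"))       # ['aba']
print(grep_o(r"(aba??)+", "abab"))      # ['abab']
print(re.findall(r"(aba??)+", "abab"))  # ['ab'], only the group's last iteration
```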
Serhat Cevikel

The correct regex for this is:

`^(aba??)+$`

and not `(aba??)+`, as discussed with @WiktorStribizew and YSelf.
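The following is a quick check of the anchored pattern. It also tests the claim in the comments that the greedy `^(aba?)+$` behaves the same way. The sketch is in Python, and `accepts` and the sample inputs are illustrative:

```python
import re

def accepts(pattern, s):
    """True if pattern (expected to be anchored with ^...$) matches s."""
    return re.search(pattern, s) is not None

inputs = ["abab", "aba", "ababa", "abb"]
for pattern in (r"^(aba??)+$", r"^(aba?)+$"):
    print(pattern, [accepts(pattern, s) for s in inputs])
# Both lines end with [True, True, True, False]
```

In Python, `re.fullmatch(r"(aba??)+", s)` expresses the same whole-string requirement without writing the anchors.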

Allan
  • Actually, the OP does not specify explicitly whether he wants to match the full input or just (any number of) parts of it. If the latter, using `^` and `$` would be incorrect. – Imre Pühvel Dec 05 '17 at 16:20
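The difference this comment points out can be shown on an input where the `a`/`b` run is embedded in other text. The input string below is made up for illustration, and the sketch assumes Python's `re` semantics:

```python
import re

text = "xx abab yy"

# Anchored: the whole input would have to consist of "ab"/"aba" pieces.
print(re.search(r"^(aba??)+$", text))        # None

# Unanchored: finds the run inside the larger string.
print(re.search(r"(aba??)+", text).group())  # abab
```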