
I was practicing Java regular expressions with the Oracle tutorial. To understand greedy, reluctant, and possessive quantifiers better, I created some examples. My question is how those quantifiers work when they are applied to capturing groups. I don't understand their behavior in that position; for example, the reluctant quantifier looks as if it doesn't work at all. Also, I searched a lot on the Internet and only saw expressions like (.*?). Is there a reason why people usually put the quantifier inside the group like that, rather than writing something like "(.foo)??"?

Here is the reluctant example:

Enter your regex: (.foo)??

Enter input string to search: xfooxxxxxxfoo

I found the text "" starting at index 0 and ending at index 0.

I found the text "" starting at index 1 and ending at index 1.

I found the text "" starting at index 2 and ending at index 2.

I found the text "" starting at index 3 and ending at index 3.

I found the text "" starting at index 4 and ending at index 4.

I found the text "" starting at index 5 and ending at index 5.

I found the text "" starting at index 6 and ending at index 6.

I found the text "" starting at index 7 and ending at index 7.

I found the text "" starting at index 8 and ending at index 8.

I found the text "" starting at index 9 and ending at index 9.

I found the text "" starting at index 10 and ending at index 10.

I found the text "" starting at index 11 and ending at index 11.

I found the text "" starting at index 12 and ending at index 12.

I found the text "" starting at index 13 and ending at index 13.

For the reluctant one, shouldn't it show "xfoo" for index 0 and 4? And here is the possessive one:

Enter your regex: (.foo)?+

Enter input string to search: afooxxxxxxfoo

I found the text "afoo" starting at index 0 and ending at index 4.

I found the text "" starting at index 4 and ending at index 4.

I found the text "" starting at index 5 and ending at index 5.

I found the text "" starting at index 6 and ending at index 6.

I found the text "" starting at index 7 and ending at index 7.

I found the text "" starting at index 8 and ending at index 8.

I found the text "xfoo" starting at index 9 and ending at index 13.

I found the text "" starting at index 13 and ending at index 13.

And for the possessive one, shouldn't it try the input only once? This one confuses me the most, because it seems to try every position.

Thanks in advance!

mwb
    Check the [SO regex reference](http://stackoverflow.com/a/22944075) under **quantifiers**, it will link you to [this question](http://stackoverflow.com/questions/5319840/greedy-vs-reluctant-vs-possessive-quantifiers) – HamZa May 05 '14 at 14:37

1 Answer


The regex engine checks (basically) every character of your string one by one, starting from the left, trying to make them fit in your pattern. It returns the first match it finds.

A reluctant quantifier applied to a subpattern means that the regex engine will give priority to (as in, try first) the following subpattern.

See what happens step by step with `.*?b` on `aabab`:

aabab # we try to make '.*?' match zero '.', skipping it directly to try and
^     # ... match b: that doesn't work (we're on an 'a'), so we reluctantly
      # ... backtrack and match one '.' with '.*?'
aabab # again, we by default try to skip the '.' and go straight for b:
 ^    # ... again, doesn't work. We reluctantly match two '.' with '.*?'
aabab # FINALLY there's a 'b'. We can skip the '.' and move forward:
  ^   # ... the 'b' in '.*?b' matches, regex is over, 'aab' is the overall match
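
The trace above can be reproduced in Java. This is only a minimal sketch: the class name `ReluctantTrace` and the helper `findAll` are made up for illustration, and the greedy pattern `.*b` is added purely for comparison.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReluctantTrace {
    // Hypothetical helper: collects every match reported by repeated find() calls.
    public static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Reluctant: '.*?' takes as few characters as possible before 'b'.
        System.out.println(findAll(".*?b", "aabab")); // [aab, ab]
        // Greedy: '.*' takes as many as possible, then backtracks to the last 'b'.
        System.out.println(findAll(".*b", "aabab"));  // [aabab]
    }
}
```

The first reluctant match is the `aab` from the trace; the second, `ab`, only appears because the loop keeps calling `find()` after the first match.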

In your pattern, there's no equivalent to the b. The (.foo) is optional, the engine gives priority to the following part of the pattern.

That following part is empty, and an empty pattern matches the empty string: an overall match is found at every position, and it is always an empty string.
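
To see this with the pattern from the question, here is a rough sketch (the class name `OptionalGroupDemo` and the helper `findAll` are hypothetical). It compares the reluctant `(.foo)??` with the greedy `(.foo)?`, then uses `Pattern.matches`, which requires the whole input to match, to show that the reluctant group is still used when skipping it cannot lead to a match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OptionalGroupDemo {
    // Hypothetical helper: collects every match reported by repeated find() calls.
    public static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "xfooxxxxxxfoo";

        // Reluctant: skipping the group is tried first and always succeeds,
        // so every position 0..13 yields an empty match.
        List<String> reluctant = findAll("(.foo)??", input);
        System.out.println(reluctant.size()); // 14

        // Greedy: the group is tried first, so "xfoo" is matched where possible.
        List<String> greedy = findAll("(.foo)?", input);
        System.out.println(greedy.size());    // 8 (two "xfoo" plus six empty matches)

        // When the whole input must match, skipping the group fails,
        // so the engine backtracks and lets the group match.
        System.out.println(Pattern.matches("(.foo)??", "xfoo")); // true
    }
}
```

This is probably why `(.*?)` followed by more pattern is the common form: a reluctant quantifier only has a visible effect when something after it forces the engine to extend the match.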


Regarding the possessive quantifiers, you're confused about what they do. They have no direct bearing on the number of matches: it's not clear what tool you use to apply your regex, but it looks for global matches, and that's why it doesn't stop at the first match.

See http://www.regular-expressions.info/possessive.html for more info on them.
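
As an illustration only, the sketch below assumes the tool calls `Matcher.find()` in a loop (the class name `PossessiveDemo` and the helper `countMatches` are invented here). It shows that a single `find()` call returns one match, that looping is what produces the long list, and that the real effect of a possessive quantifier is refusing to backtrack.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PossessiveDemo {
    // Hypothetical helper: counts how many matches a find() loop reports.
    public static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Hypothetical helper: true if the regex matches anywhere in the input.
    public static boolean foundAnywhere(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // One find() call gives exactly one match, possessive or not.
        Matcher m = Pattern.compile("(.foo)?+").matcher("afooxxxxxxfoo");
        if (m.find()) {
            System.out.println(m.group()); // afoo
        }

        // The find() loop is what yields the eight results shown in the question.
        System.out.println(countMatches("(.foo)?+", "afooxxxxxxfoo")); // 8

        // What possessive really changes: '.*+' keeps everything it consumed,
        // so nothing is left for "foo" and the match fails.
        System.out.println(foundAnywhere(".*foo", "xfoo"));  // true
        System.out.println(foundAnywhere(".*+foo", "xfoo")); // false
    }
}
```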

Also, as HamZa pointed out, https://stackoverflow.com/a/22944075 is becoming a great reference for regex-related questions.

Robin
  • For the regex `a*?b` and the input `aabab`, can I say that `b` is tried against the input first, fails, and only then is `a*` "reluctantly" used to match? I got empty matches for the regexes `a*?` and `a??` on the input `aabab`, so can I say that, because `*` and `?` allow zero occurrences, zero is tried first? For the greedy regex `a+`, the input `aabab` gives `aa`, so is one of the differences between greedy and reluctant that reluctant first tries whatever comes after the `?` character? `a+?` gives predictable output because `+` requires one or more, which seems consistent with my assumption. – mwb May 05 '14 at 15:27
  • I'd say `b` is applied to the current character, yes. "Zero is checked first" would, I guess, be a way of saying it too. `a+?` is equivalent to `aa*?`: once one `a` is matched we find the usual pattern again. – Robin May 05 '14 at 15:35
  • @mwb: You have to tweak it a little bit but you can kind of see what's happening step by step here: http://regex101.com/r/yQ5wO9 , in the debugger view (middle left). Or try to find out how java can output it itself: http://stackoverflow.com/questions/1137437/what-tools-are-there-for-debugging-stepping-through-a-regular-expression – Robin May 05 '14 at 15:43