61

From the Java 6 Pattern documentation:

Special constructs (non-capturing)

(?:X)   X, as a non-capturing group

(?>X)   X, as an independent, non-capturing group

What is the difference between (?:X) and (?>X)? What does "independent" mean in this context?

erickson
Peter Hart

4 Answers

48

It means that the grouping is atomic, and it throws away backtracking information for a matched group. So, this expression is possessive; it won't back off even if doing so is the only way for the regex as a whole to succeed. It's "independent" in the sense that it doesn't cooperate, via backtracking, with other elements of the regex to ensure a match.
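As a rough illustration of "won't back off", here is a small Java sketch. The class name, method names, and the pattern `\w+` followed by `\d` are made up for this example and do not come from the answer. Because `\w` also matches digits, the group always swallows the final digit, so the pattern can only succeed if the group gives a character back.

```java
import java.util.regex.Pattern;

class IndependentGroupDemo {
    // Plain non-capturing group: \w+ may give back characters so \d can match.
    static final Pattern PLAIN = Pattern.compile("(?:\\w+)\\d");
    // Independent (atomic) group: whatever \w+ matched is kept, no giving back.
    static final Pattern ATOMIC = Pattern.compile("(?>\\w+)\\d");

    static boolean plainMatches(String input) {
        return PLAIN.matcher(input).matches();
    }

    static boolean atomicMatches(String input) {
        return ATOMIC.matcher(input).matches();
    }

    public static void main(String[] args) {
        // \w+ first grabs "abc1"; the plain group backs off to "abc" so \d can take "1".
        System.out.println(plainMatches("abc1"));  // true
        // The atomic group keeps "abc1", leaving nothing for \d.
        System.out.println(atomicMatches("abc1")); // false
    }
}
```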

erickson
9

I think this tutorial explains exactly what an "independent, non-capturing group" (also called "atomic grouping") is:

The regular expression a(bc|b)c (capturing group) matches abcc and abc. The regex a(?>bc|b)c (atomic group) matches abcc but not abc.

When applied to abc, both regexes will match a to a, bc to bc, and then c will fail to match at the end of the string. Here their paths diverge. The regex with the capturing group has remembered a backtracking position for the alternation. The group will give up its match, b then matches b and c matches c. Match found!

The regex with the atomic group, however, exited from an atomic group after bc was matched. At that point, all backtracking positions for tokens inside the group are discarded. In this example, the alternation's option to try b at the second position in the string is discarded. As a result, when c fails, the regex engine has no alternatives left to try.
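The tutorial's two patterns can be tried in Java with a sketch like the following. The class and method names are invented for illustration. It uses `Matcher.find()` rather than `matches()`, since the tutorial is talking about finding a match inside the string, not about matching the whole string.

```java
import java.util.regex.Pattern;

class AtomicAlternationDemo {
    // Capturing group: the alternation remembers that "b" is still untried.
    static final Pattern CAPTURING = Pattern.compile("a(bc|b)c");
    // Atomic group: once "bc" has matched, the "b" alternative is discarded.
    static final Pattern ATOMIC = Pattern.compile("a(?>bc|b)c");

    static boolean capturingFinds(String input) {
        return CAPTURING.matcher(input).find();
    }

    static boolean atomicFinds(String input) {
        return ATOMIC.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(capturingFinds("abcc")); // true
        System.out.println(atomicFinds("abcc"));    // true
        System.out.println(capturingFinds("abc"));  // true  (backtracks from "bc" to "b")
        System.out.println(atomicFinds("abc"));     // false (no alternative left to try)
    }
}
```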

kajibu
6

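If you have foo(?>(co)*)co, that will never match: the atomic group consumes every co it can and never gives one back, so nothing is left for the trailing co. I'm sure there are practical examples of when this would be useful; try O'Reilly's book.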
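A quick way to check this in Java is sketched below. The class name, method names, and sample inputs are hypothetical. The second pattern swaps in an ordinary non-capturing group for comparison.

```java
import java.util.regex.Pattern;

class NeverMatchesDemo {
    // Atomic group around (co)*: it takes every "co" and keeps them all.
    static final Pattern ATOMIC = Pattern.compile("foo(?>(co)*)co");
    // Same shape with a plain group: (?:co)* can hand back the last "co".
    static final Pattern PLAIN = Pattern.compile("foo(?:co)*co");

    static boolean atomicFinds(String input) {
        return ATOMIC.matcher(input).find();
    }

    static boolean plainFinds(String input) {
        return PLAIN.matcher(input).find();
    }

    public static void main(String[] args) {
        for (String input : new String[] {"fooco", "foococo", "foocococo"}) {
            // Prints "<input>: atomic=false plain=true" for each sample.
            System.out.println(input + ": atomic=" + atomicFinds(input)
                    + " plain=" + plainFinds(input));
        }
    }
}
```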

Vlad
-2

(?>X?) equals (?:X)?+, (?>X*) equals (?:X)*+, (?>X+) equals (?:X)++.

Edit: The "syntax" above means this: (?>X?) equals (?:X)?+, (?>X*) equals (?:X)*+, (?>X+) equals (?:X)++.

Setting aside the non-capturing wrapper (it is only needed when X is more than a single token), the preceding equivalence becomes:

(?>X?) equals X?+, (?>X*) equals X*+, (?>X+) equals X++.
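The stated equivalence can be spot-checked in Java with a sketch like this one. The class name, helper method, and sample strings are invented for illustration, and agreement on a few inputs is not a proof. The last two lines show why the `(?:...)` wrapper matters when X is longer than one token.

```java
import java.util.regex.Pattern;

class PossessiveEquivalenceDemo {
    // Thin wrapper over Pattern.matches, which requires the whole input to match.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        String input = "aaab";
        // Greedy a* backs off one "a" so the literal "ab" can match.
        System.out.println(matches("a*ab", input));      // true
        // Atomic group and possessive quantifier both keep all three "a"s.
        System.out.println(matches("(?>a*)ab", input));  // false
        System.out.println(matches("a*+ab", input));     // false

        // With a multi-token X, the quantifier must apply to the whole group.
        System.out.println(matches("(?:ab)++", "abab")); // true
        // Without the group, ++ applies to "b" only: "a" followed by one or more "b".
        System.out.println(matches("ab++", "abab"));     // false
    }
}
```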

beibichunai
  • The word `independent` in [the Pattern JavaDocs](http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html) is important. They aren't exactly the same, because `(?>X)` doesn't do any backtracking when a partial match fails, so some things that match using one will not match using the other. The [article @erickson linked to was helpful for me.](http://www.regular-expressions.info/atomic.html) – xdhmoore Aug 31 '17 at 17:26
  • Sorry, I'm not into this currently, so maybe my answer is not accurate. But from your own reference: "Most of these also support possessive quantifiers, which are essentially a notational convenience for atomic grouping." This is what I was trying to express. In the latter case, the additional '+' character denotes a possessive quantifier. – beibichunai Dec 06 '17 at 21:18
  • `[?/*/+]` is the same as `[?+*/]` and is a [character class](https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html#classes) matching one of the 4 characters (`?`, `+`, `*`, `/`), and the `+` after the `]` is a [greedy quantifier](https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html#greedy) that makes the character class match *one or more times*. There is no [possessive quantifier](https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html#poss) anywhere in your regex. – Andreas May 15 '20 at 19:52
  • (?>X?) equals (?:X)?+, (?>X*) equals (?:X)*+, (?>X+) equals (?:X)++. – beibichunai May 28 '20 at 11:08