I understand that this pattern matches strings that start with three letters or three dashes and end with three digits. But I do not understand what ?: does in this example:

re.match(r"(?:(?:\w{3})|(?:\-{3}))\d\d\d$", v)

Could someone please explain when we need non-capturing groups? Thanks.

Boann
Amelia
  • Check this [answer](http://stackoverflow.com/questions/36524507/notation-in-regular-expression) for explanation. –  Apr 10 '16 at 03:54

3 Answers

You never absolutely need non-capturing groups, but they have a few advantages:

  • Capturing groups are numbered from left to right. You use those numbers to refer to the group in backreferences, and when extracting the text matched by the group. By marking some groups as non-capturing, they do not contribute to the numbering, which means the numbering for the groups you do care about will be simpler: 1,2,3... without any gaps; and you can later insert or remove non-capturing groups without the numbers changing for any of the capturing groups.

  • Not capturing a group makes it more efficient (depending on the particular regex API), since it does not need to store or return the string matched for that group.

  • Documentation: Marking which groups are capturing and non-capturing makes their individual purposes clearer.
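The numbering point can be sketched in Python (the language of the question). The patterns and the "ID:" prefix below are made up for illustration; they only show that adding a non-capturing group leaves the existing group numbers alone, while adding a capturing one shifts them.

```python
import re

# Hypothetical patterns, not taken from the question.
base = re.compile(r"(\w{3})-(\d{3})$")
# Optional prefix added as a non-capturing group: numbering is unchanged.
with_prefix = re.compile(r"(?:ID:)?(\w{3})-(\d{3})$")
# Same prefix added as a capturing group: every later group shifts by one.
shifted = re.compile(r"(ID:)?(\w{3})-(\d{3})$")

m1 = base.match("abc-123")
m2 = with_prefix.match("ID:abc-123")
m3 = shifted.match("ID:abc-123")

print(m1.group(1), m1.group(2))  # abc 123
print(m2.group(1), m2.group(2))  # abc 123
print(m3.group(1), m3.group(2))  # ID: abc
```

Code that reads group(1) and group(2) keeps working with the non-capturing prefix, but would need to be renumbered for the capturing one.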

In your specific example, the two inner groups are totally unnecessary, since they are not used for capturing, nor alternation, nor any other feature. It could be shortened to: (?:\w{3}|-{3})\d\d\d$
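As a quick check, here is a small Python sketch comparing the original pattern with the shortened one on a few sample strings (the samples are my own, chosen for illustration). Note that \w matches letters, digits and the underscore, so the first three characters do not have to be letters.

```python
import re

original = re.compile(r"(?:(?:\w{3})|(?:\-{3}))\d\d\d$")
shortened = re.compile(r"(?:\w{3}|-{3})\d\d\d$")

# Illustrative inputs, not from the question.
samples = ["abc123", "---123", "ab1234", "a_9000", "--123", "abcd123", "abc12x"]

for s in samples:
    # Both patterns should agree on every sample.
    assert bool(original.match(s)) == bool(shortened.match(s))
    print(s, bool(shortened.match(s)))
```

The outer group is still needed: without it, the | would split the whole pattern into "\w{3}" or "-{3}\d\d\d$".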

Boann

I've used non-capturing groups with preg_match() in PHP where the pattern needed an optional group but I didn't want it included in the results, e.g.:

Apr(?:il)? ([0-9]{1,2})

This matches the date in both "Apr 10" and "April 10" while capturing only the day, "10". If the "il" portion were captured I'd have no easy way of knowing which group to reference in the result set.
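The same pattern can be tried in Python as a rough sketch. The answer itself is about PHP's preg_match(); Python's re module is used here only to match the language of the question.

```python
import re

pattern = re.compile(r"Apr(?:il)? ([0-9]{1,2})")

days = []
for text in ["Apr 10", "April 10"]:
    m = pattern.search(text)
    # The optional "il" is matched but not captured,
    # so the day is the only group in both cases.
    days.append(m.groups())
    print(text, "->", m.group(1))
```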

billynoah

Non-capturing groups help you keep unwanted data out of your capturing groups.

For instance, suppose your strings look like this:

abc and bcd
def or cef

Here you want to capture the first and third columns, which are separated by "and" or "or", so you write the regex as follows:

(\w+)\s+(and|or)\s+(\w+) 

Here $1 contains the first column:

abc def

and $3 contains the third column:

bcd cef

The unnecessary data, "and" or "or", is stored in $2. Since you don't want to store it, use a non-capturing group instead:

(\w+)\s+(?:and|or)\s+(\w+) 

Here $1 contains

abc 
def

and $2 contains

bcd
cef
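A minimal Python sketch of the two patterns above, run against the two sample lines. Python exposes the groups through groups() rather than $1, $2, but the numbering works the same way.

```python
import re

lines = ["abc and bcd", "def or cef"]

capturing = re.compile(r"(\w+)\s+(and|or)\s+(\w+)")
non_capturing = re.compile(r"(\w+)\s+(?:and|or)\s+(\w+)")

# With the capturing middle group, the separator takes up group 2.
with_separator = [capturing.match(line).groups() for line in lines]
# With (?:...), only the two columns are captured.
columns_only = [non_capturing.match(line).groups() for line in lines]

print(with_separator)  # [('abc', 'and', 'bcd'), ('def', 'or', 'cef')]
print(columns_only)    # [('abc', 'bcd'), ('def', 'cef')]
```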

A capturing group nested inside a non-capturing group still captures its own data.

For example

(?:don't (want))

Now $1 contains "want".

A non-capturing group also lets you use | alternation inside a group without capturing the whole alternative. For example:

(?:don't(want)|some(what))

In the above example, $1 contains "want" when the first alternative matches, and $2 contains "what" when the second one does.
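Both nested-group patterns can be checked with a short Python sketch (the input strings are my own). In Python, a group that did not take part in the match is reported as None, so only one of the two inner groups is filled for any given match.

```python
import re

# A capturing group nested in a non-capturing one still gets number 1.
nested = re.search(r"(?:don't (want))", "don't want")
print(nested.groups())  # ('want',)

alternation = re.compile(r"(?:don't(want)|some(what))")

first = alternation.search("don'twant")
second = alternation.search("somewhat")

print(first.groups())   # ('want', None)
print(second.groups())  # (None, 'what')
```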

mkHun