131

I'm using rubular.com to build my regex, and their documentation describes the following:

(...)   Capture everything enclosed
(a|b)   a or b

How can I use an OR expression without capturing what's in it? For example, say I want to capture either "ac" or "bc". I can't use the regex

(a|b)(c)

right? That captures either "a" or "b" in one group and "c" in another, rather than both in the same group. I know I can filter through the captured results, but that seems like more work...

Am I missing something obvious? I'm using this in Java, if that is pertinent.

gblomqvist
goggin13

4 Answers

212

Depending on the regular expression implementation, you can use so-called non-capturing groups with the syntax (?:…):

((?:a|b)c)

Here (?:a|b) is a group, but you cannot reference its match. You can only reference the match of ((?:a|b)c), which is either ac or bc.
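
Since the question mentions Java, here is a minimal sketch of how this might look with java.util.regex. The class name NonCapturingDemo and the helpers firstGroup and groupCount are hypothetical names made up for illustration; only Pattern and Matcher come from the standard library.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NonCapturingDemo {
    // Hypothetical helper: text of group 1 for the first match, or null if nothing matches.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    // Hypothetical helper: number of capturing groups the pattern defines.
    static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        // The non-capturing group (?:a|b) does not get a group number,
        // so group 1 is the whole "ac" or "bc".
        System.out.println(firstGroup("((?:a|b)c)", "xx bc yy")); // bc
        System.out.println(groupCount("((?:a|b)c)"));             // 1

        // The pattern from the question defines two separate groups instead.
        System.out.println(groupCount("(a|b)(c)"));               // 2
    }
}
```

With (a|b)(c), group 1 would hold only "a" or "b" and group 2 only "c", which is the situation the question wants to avoid.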

Gumbo
  • that did it! Thanks for the super fast response. I will accept after the time limit (which I didn't know existed) expires. – goggin13 Jul 31 '10 at 15:49
  • I thought the idea was not to capture the `a` or `b` at all. In other words, to *match* `ac` or `bc`, but only *capture* the `c`: `(?:a|b)(c)` – Alan Moore Jul 31 '10 at 21:16
  • @AlanMoore Is it possible to capture one and not the other in the OR expression? I'm looking for the pattern `ac` or `ab`, but I want to output `ab` if the match is `ab`, and only `c` if the match is `ac`. – Moondra Aug 03 '17 at 21:12
27

If your implementation has it, then you can use non-capturing parentheses:

(?:a|b)

Marc Mutz - mmutz
  • @mmutz Thanks for the fast response! I wish I could accept both answers, that was just what I was looking for – goggin13 Jul 31 '10 at 15:50
4

If your OR alternatives are all single characters, you can just use the "character set" operator:

([ab]c)

It will only match ac or bc, and it's more readable.
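
A rough Java sketch of the character set version, assuming the standard java.util.regex classes. The class name CharClassDemo and the helper captures are hypothetical and exist only for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharClassDemo {
    // Hypothetical helper: collects group 1 of every match of ([ab]c) in the input.
    static List<String> captures(String input) {
        Matcher m = Pattern.compile("([ab]c)").matcher(input);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        // [ab] matches exactly one character, either 'a' or 'b'.
        System.out.println(captures("ac bc cc dc")); // [ac, bc]
    }
}
```

This only works while every alternative is a single character; for longer alternatives, a group is still needed.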

yrtimiD
3

Even Rubular doesn't make you use parentheses, and the precedence of | is low. For example, a|bc does not match ccc.

msw
  • what does the '!~' operator do? I like your expression, with fewer parens, regex is messy enough already – goggin13 Jul 31 '10 at 16:09
  • !~ is a perlism for "does not match", it was sloppy writing on my part; fixed, thanks. – msw Jul 31 '10 at 16:15
  • I don't get you. The low precedence of `|` is why you *do* have to use parens. `(?:a|b)c` matches `ac` or `bc` (the desired behavior), while `a|bc` matches `a` or `bc`. – Alan Moore Jul 31 '10 at 21:29
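
The precedence point in the last comment can be checked in Java. This is a minimal sketch: the class name PrecedenceDemo and the helper fullMatch are hypothetical, and Pattern.matches is used so that the whole input has to match the pattern.

```java
import java.util.regex.Pattern;

public class PrecedenceDemo {
    // Hypothetical helper: true only if the entire input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Without parentheses, | splits the whole pattern: "a" OR "bc".
        System.out.println(fullMatch("a|bc", "a"));      // true
        System.out.println(fullMatch("a|bc", "bc"));     // true
        System.out.println(fullMatch("a|bc", "ac"));     // false

        // With a non-capturing group, the alternation is limited to the first character.
        System.out.println(fullMatch("(?:a|b)c", "ac")); // true
        System.out.println(fullMatch("(?:a|b)c", "bc")); // true
        System.out.println(fullMatch("(?:a|b)c", "a"));  // false
    }
}
```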