I have a Perl regex. But I'm not sure what "?" means in this context.

m#(?:\w+)#

What does ? mean here?

Sinan Ünür
Nikita
  • Beginning with the most obvious: perlre (http://perldoc.perl.org/perlre.html). – musiKk Oct 08 '10 at 13:11
  • @msw and one of them is [this page right here](http://stackoverflow.com/questions/3890739/what-does-mean-in-this-perl-regex?rq=1). – rightfold Nov 14 '13 at 23:06
  • Possible duplicate of [Reference - What does this regex mean?](https://stackoverflow.com/questions/22937618/reference-what-does-this-regex-mean) – tripleee May 17 '19 at 03:42

5 Answers

In this case, the ? is actually being used in connection with the :. Put together, ?: at the beginning of a grouping means to group but not capture the text/pattern within the parentheses (as in, it will not be stored in any backreferences like \1 or $1, so you will not be able to access the grouped text directly).
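To make the difference concrete, here is a minimal sketch comparing a capturing group with the non-capturing group from the question. The helper name first_capture and the sample string are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns $1 after a successful match, or undef.
sub first_capture {
    my ($text, $re) = @_;
    return $text =~ $re ? $1 : undef;
}

# (\w+) captures, so $1 holds the first word.
my $with = first_capture("hello world", qr#(\w+)#);

# (?:\w+) matches the same text but leaves $1 undefined.
my $without = first_capture("hello world", qr#(?:\w+)#);

printf "capturing:     %s\n", defined($with)    ? "'$with'"    : 'undef';
printf "non-capturing: %s\n", defined($without) ? "'$without'" : 'undef';
```

Both patterns match the same text; only the first one leaves anything in $1.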

More specifically, a ? has three distinct meanings in regex:

  1. The ? quantifier signifies "zero or one repetitions" of an expression. One of the canonical examples I've seen is s?he which will match both she and he since the ? makes the s "optional"

  2. When a quantifier (+, *, ?, or the general {n,m}) is followed by a ? then the match is non-greedy (i.e. it will match the shortest string starting from that position that allows the match to proceed)

  3. A ? at the beginning of a parenthesized group signifies that you want to perform a special action. As in this case, : means to group but not capture. The exact list of actions available will vary somewhat from one regex engine to another, but here's a list (not necessarily all-inclusive) of some of them:

    A. Non-capturing group: (?:text)
    B. Lookaround: (?=a) for a positive lookahead, (?!a) for a negative lookahead, or (?<=a) and (?<!a) for lookbehinds (positive and negative, respectively).
    C. Conditional Matches: (?(condition)then|else).
    D. Atomic Grouping: a(?>bc|b)c (matches abcc but not abc; see the link)
    E. Inline enabling/disabling of regex matching modifiers: (?i) to enable a mode, (?-i) to disable it. You can also enable/disable more than one modifier at a time by simply concatenating them, such as (?im) (i is case insensitive and m is multiline).
    F. Named capture groups: (?P<name>pattern), which can later be referenced using (?P=name). The .NET regex engine uses the syntax (?<name>pattern) instead. Perl 5.10 and later accepts both forms, with \k<name> as the native backreference.
    G. Comments: (?#Comment text). I personally think this just adds clutter, but I guess it could serve some use...free-spacing mode might be a better option (the (?x) modifier).
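The three meanings above can be tried out in a short script. This is only a sketch with made-up sample strings and variable names; it assumes Perl 5.10 or later for the named capture and the %+ hash.

```perl
use strict;
use warnings;

# 1. Quantifier: "s?" makes the s optional.
my @optional = grep { /^s?he$/ } qw(she he the);

# 2. Non-greedy: "+?" stops at the first closing quote it can.
my $quoted   = q{"a" and "b"};
my ($greedy) = $quoted =~ /"(.+)"/;
my ($lazy)   = $quoted =~ /"(.+?)"/;

# 3. Special groups introduced by "(?".
# Lookahead: "foo" only when followed by "bar"; "bar" is not consumed.
my ($look) = "foobar" =~ /(foo(?=bar))/;

# Atomic group: once (?>bc|b) has matched "bc" it will not backtrack to "b".
my $atomic_abcc = "abcc" =~ /^a(?>bc|b)c$/ ? 1 : 0;
my $atomic_abc  = "abc"  =~ /^a(?>bc|b)c$/ ? 1 : 0;

# Inline modifier: (?i) turns on case-insensitive matching.
my $inline = "HELLO" =~ /(?i)hello/ ? 1 : 0;

# Named capture: the text is available through the %+ hash.
my $year;
if ("2010-10-08" =~ /(?<year>\d{4})/) {
    $year = $+{year};
}

print "optional: @optional\n";
print "greedy: $greedy | lazy: $lazy\n";
print "lookahead: $look\n";
print "atomic abcc: $atomic_abcc, abc: $atomic_abc\n";
print "inline: $inline, year: $year\n";
```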

So essentially, the meaning of the ? depends on context. If you wanted zero or one occurrence of a literal ( character, you'd have to write \(? so that the paren is escaped and the ? acts as a plain quantifier.
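As a quick check of the escaped form, this sketch (with made-up test strings) uses \(? to allow an optional literal opening paren:

```perl
use strict;
use warnings;

# "\(" is a literal paren; the following "?" makes it optional.
my $re = qr/^foo\(?bar$/;

my @hits = grep { $_ =~ $re } ("foobar", "foo(bar", "foo((bar");

print join(", ", @hits), "\n";
```

Only the strings with zero or one "(" between foo and bar should pass the filter.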

eldarerathis
  • For point #3, there's also `(?>...)`, which is an [atomic group](http://www.regular-expressions.info/atomic.html) in flavours that support it, and `(?i)` and `(?-i)` for inline enabling/disabling of [modifiers](http://www.regular-expressions.info/modifiers.html). – Daniel Vandersluis Oct 08 '10 at 14:51
  • @Daniel: Thanks. I think I'm going to clean up #3 and add a list with some links, so that then other people can continue to add to it as well. – eldarerathis Oct 08 '10 at 15:01
  • Just for clarity, `(?im)` enables two modes (case insensitive and multiline) ;) – Daniel Vandersluis Oct 08 '10 at 16:17
  • @Daniel Vandersluis: Right, edited to make that clearer in the answer. I could see how that was not evident in my original phrasing. I think this is a bit better :) – eldarerathis Oct 08 '10 at 16:26

$ perldoc perlreref:

(?:...) Groups subexpressions without capturing (cluster)

You can also use YAPE::Regex::Explain:

C:\Temp> perl -MYAPE::Regex::Explain -e "print YAPE::Regex::Explain->new(qr#(?:\w+)#)->explain"

The regular expression:

(?-imsx:(?:\w+))

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  (?:                      group, but do not capture:
----------------------------------------------------------------------
    \w+                      word characters (a-z, A-Z, 0-9, _) (1 or
                             more times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of grouping
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
Sinan Ünür

Those are non-capturing parentheses. They're used for grouping (just like normal parentheses) but the group won't be added to the capture array (i.e. it can't be referenced with a numbered backreference such as \1 or \2, and it doesn't take up a number).
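A small sketch of what that means in practice, using made-up sample strings: in list context a match returns only the capturing groups, and backreference numbering skips the non-capturing ones.

```perl
use strict;
use warnings;

# The middle group is non-capturing, so only two values come back.
my @parts = "2010-10-08" =~ /^(\d+)-(?:\d+)-(\d+)$/;

# \1 refers to the first capturing group, here (c); (?:a|b) is not counted.
my $acc = "acc" =~ /^(?:a|b)(c)\1$/ ? 1 : 0;
my $aca = "aca" =~ /^(?:a|b)(c)\1$/ ? 1 : 0;

print "parts: @parts\n";
print "acc: $acc, aca: $aca\n";
```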

See here: http://www.regular-expressions.info/refadv.html

Alin Purcaru

In short, the sequence (? starts a special regular expression feature. The characters that follow the (? specify which feature, in this case a non-capturing group. We cover this in both Intermediate Perl and Effective Perl Programming. The perlre documentation describes Perl regular expressions.

brian d foy

See the regex tutorial, perlretut, that is installed with every version of Perl (in particular, this section).

Dave Cross