
I have a regex that contains a character class followed by TWO "cardinality" characters - not sure what else they are called. If it matters, the regex engine it's running on is the built-in Java regex. The Java string literal is:

"[a-zA-Z]{2}[ -]?+\\d{6}"

Or in non-Java land:

[a-zA-Z]{2}[ -]?+\d{6}

So specifically, what does the [ -]?+ part mean? From testing, as far as I can tell, it's as if the + isn't even there (originally I thought that, due to some 'order of operations' I wasn't aware of, it might get applied to everything in front of it, as if there were parentheses there).

The following pass:

ab123456, ab-123456, ab 123456

The following fail:

aa--123456, aa  123456, aa - -123456, aa-aa-123456
Russ

2 Answers


?+ indicates a possessive quantifier. For more information, have a look at the SO answer "Greedy vs. Reluctant vs. Possessive Quantifiers". Here is the pattern broken down:

[a-zA-Z]    # Match a single character present in the list below
               # A character in the range between “a” and “z”
               # A character in the range between “A” and “Z”
   {2}         # Exactly 2 times
[ -]        # Match a single character present in the list below
               # The character “ ”
               # The character “-”
   ?+          # Between zero and one times, as many times as possible, 
               # without giving back (possessive)
\d          # Match a single digit 0..9
   {6}         # Exactly 6 times
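
To check this breakdown against the inputs listed in the question, here is a minimal sketch using java.util.regex. The class name PossessiveOptionalDemo and the helper isValid are made up for illustration; only the pattern and the sample strings come from the question.

```java
import java.util.regex.Pattern;

public class PossessiveOptionalDemo {
    // The pattern from the question, written as a Java string literal.
    static final Pattern CODE = Pattern.compile("[a-zA-Z]{2}[ -]?+\\d{6}");

    // Hypothetical helper: true only if the whole input matches the pattern.
    static boolean isValid(String input) {
        return CODE.matcher(input).matches();
    }

    public static void main(String[] args) {
        // Sample inputs taken from the question (first three pass, the rest fail).
        String[] samples = {
            "ab123456", "ab-123456", "ab 123456",
            "aa--123456", "aa  123456", "aa - -123456", "aa-aa-123456"
        };
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + isValid(s));
        }
    }
}
```

At most one space or hyphen is consumed between the two letters and the six digits, which is why every input with a doubled separator is rejected.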
Narendra Yadala

It's a possessive quantifier, essentially meaning that the ? will not give up its match for backtracking. It does not affect the number of repetitions that the ? will match - it will still be zero or one (greedy). The [ -] preceding it is simply a character class containing the space and hyphen characters.
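
The difference only shows up when the token after the quantifier could also match what the quantifier consumed. As a rough sketch, the two patterns below are hypothetical variants (not from the question) in which a literal hyphen follows the optional class, so both tokens compete for the same character. The class and method names are illustrative only.

```java
import java.util.regex.Pattern;

public class GreedyVsPossessive {
    // Hypothetical variants, not from the question: a literal '-' follows [ -]?,
    // so the optional class and the literal can both want the same hyphen.
    static final Pattern GREEDY = Pattern.compile("[ -]?-\\d{6}");
    static final Pattern POSSESSIVE = Pattern.compile("[ -]?+-\\d{6}");

    static boolean greedyMatches(String s) {
        return GREEDY.matcher(s).matches();
    }

    static boolean possessiveMatches(String s) {
        return POSSESSIVE.matcher(s).matches();
    }

    public static void main(String[] args) {
        // Greedy: [ -]? first takes the hyphen, the literal '-' then fails on '1',
        // so the engine backtracks, lets [ -]? match nothing, and succeeds.
        System.out.println(greedyMatches("-123456"));      // true
        // Possessive: [ -]?+ takes the hyphen and never gives it back.
        System.out.println(possessiveMatches("-123456"));  // false
        // With two hyphens there is nothing to give back, so both agree.
        System.out.println(greedyMatches("--123456"));     // true
        System.out.println(possessiveMatches("--123456")); // true
    }
}
```

In the question's pattern the next token is \d, which can never match a space or hyphen, so backtracking into [ -]? could not rescue a match anyway. That is consistent with the observation that the + appears to make no difference there.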

Other quantifier operators can also be made possessive by adding a + (i.e. *+ or ++ would be possessive versions of * and +, respectively).
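
A small sketch of *+ and ++ next to their greedy counterparts may help here. The patterns and inputs are invented for illustration, and the class name OtherPossessiveQuantifiers is hypothetical; Pattern.matches is the standard static helper in java.util.regex.

```java
import java.util.regex.Pattern;

public class OtherPossessiveQuantifiers {
    // Thin wrapper: true only if the entire input matches the regex.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Greedy .* grabs "abcx", then backtracks one character so 'x' can match.
        System.out.println(matches(".*x", "abcx"));     // true
        // Possessive .*+ grabs "abcx" and keeps it, leaving nothing for 'x'.
        System.out.println(matches(".*+x", "abcx"));    // false

        // Greedy \d+ gives back the last digit for the trailing \d.
        System.out.println(matches("\\d+\\d", "12"));   // true
        // Possessive \d++ keeps every digit, so the trailing \d has nothing left.
        System.out.println(matches("\\d++\\d", "12"));  // false
        // When the next token cannot overlap with \d, possessive behaves like greedy.
        System.out.println(matches("\\d++x", "12x"));   // true
    }
}
```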

eldarerathis
  • If I'm understanding this correctly, there wouldn't be much of a performance increase here because we're only matching against 1 character maximum anyway, right? – Russ Nov 04 '11 at 14:38
  • @Russ: In this case I'd say you're correct, you don't gain much (if anything, really). That expression is already pretty restrictive. – eldarerathis Nov 04 '11 at 14:43