I have the following regex:

.*(?:(?:(?<!a)cc|string).*number).*

I am trying to understand what the ? at the beginning of the string between brackets means. I know that a? means the preceding character 'a' can occur zero or one time. But what does ? mean when it appears at the beginning of a string?

akari

2 Answers

The answer requires a little history lesson. When Larry Wall wanted to add new features to regexes in Perl, he couldn't just change the meaning of existing metacharacters, or assign special meanings to characters that didn't have them. That would have broken a lot of regexes that had been working. Instead, he had to look for character sequences that would never appear in a regex.

Originally there was only one kind of group: what we now call capturing groups. The opening parenthesis was a metacharacter, so it would make no sense to follow it with a quantifier. You could match a literal open-paren zero or one time with \(?, or you could match (and capture) a literal question mark with (\?), but if you tried to use (? in a regex, it would throw an exception.
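The three cases above can be tried out in a few lines. This is only a sketch using Python's `re` module, which borrows the Perl-style syntax rather than being Perl itself; the sample strings and variable names are made up for illustration.

```python
import re

# \(? : an escaped (literal) open-paren, matched zero or one time.
optional_paren = re.compile(r"\(?abc")
with_paren = optional_paren.fullmatch("(abc") is not None
without_paren = optional_paren.fullmatch("abc") is not None

# (\?) : a capturing group around an escaped (literal) question mark.
captured_text = re.search(r"(\?)", "why?").group(1)

# A bare "(?" is not a complete construct, so compiling it fails.
try:
    re.compile("(?")
    bare_is_error = False
except re.error:
    bare_is_error = True

print(with_paren, without_paren, captured_text, bare_is_error)
```

In modern engines the bare `(?` is still an error; it only becomes valid once the following characters complete one of the special-group tokens.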

Larry changed the rule so (? could appear in a regex, but it must form the beginning of a special-group construct, which requires at least one more character. So, to answer your question, the string doesn't start with ?. The sequence (?: forms a single token, representing the beginning of a non-capturing group. We also have (?= and (?! for positive and negative lookaheads, (?<= and (?<! for lookbehinds, and so on.
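To see how these tokens behave differently, here is a small sketch in Python's `re` module (my choice of language; the question doesn't say which engine is in use). The sample text and variable names are invented for the example.

```python
import re

text = "price: 42 USD"

# ( ... ) capturing group: the digits are stored as group 1.
capturing = re.search(r"(\d+) USD", text)

# (?: ... ) non-capturing group: groups the alternation, stores nothing,
# so the only numbered group is the (\d+) that follows it.
non_capturing = re.search(r"(?:price|cost): (\d+)", text)

# (?= ... ) positive lookahead: " USD" must follow, but is not consumed.
lookahead = re.search(r"\d+(?= USD)", text)

# (?<= ... ) positive lookbehind: "price: " must precede, but is not consumed.
lookbehind = re.search(r"(?<=price: )\d+", text)

print(capturing.group(0), "|", capturing.group(1))
print(non_capturing.group(0), "|", non_capturing.groups())
print(lookahead.group(0), "|", lookbehind.group(0))
```

The lookarounds only assert something about the surrounding text, which is why the overall match in the last two cases is just the digits.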

Alan Moore

(?:...) is a non-capturing group. It only groups the enclosed pattern for matching; it doesn't capture anything.

(?<!...) is a negative lookbehind. In your regex, (?<!a)cc matches cc only when it is not immediately preceded by a.
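As a rough check of how the regex from the question behaves, here is a sketch in Python's `re` module (an assumption on my part, since the question doesn't name the engine). The helper `matches` and the test strings are made up for illustration.

```python
import re

# The regex from the question, unchanged.
pattern = re.compile(r".*(?:(?:(?<!a)cc|string).*number).*")

def matches(s: str) -> bool:
    """Return True if the whole string matches the pattern."""
    return pattern.fullmatch(s) is not None

# Both groups are non-capturing, so nothing is stored.
group_count = pattern.groups

results = {
    "xcc then number": matches("xcc then number"),   # cc not preceded by a
    "acc then number": matches("acc then number"),   # cc preceded by a
    "string 1 number": matches("string 1 number"),   # the 'string' alternative
    "xcc only": matches("xcc only"),                 # no 'number' afterwards
}
print(group_count, results)
```

Only the first and third strings match: the second is rejected by the lookbehind, and the last has no `number` after the `cc`.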

Avinash Raj