Why does the condition in the following code evaluate to false?

$_ = "aa11bb";  
if(/(.)\111/){  
    print "It matched!\n";  
}  

Does \11 or \111 have a special meaning that keeps Perl from "seeing" the \1?

Jim

2 Answers

Perl is interpreting \111 as an octal character escape, and that character does not occur in your string. A sequence of two or more digits is treated as a backreference only if at least that many capture groups have been opened before it. To avoid the ambiguity, use \g or \g{}. Quoting the docs (perlre - Capture Groups):

The \g and \k notations were introduced in Perl 5.10.0. Prior to that there were no named nor relative numbered capture groups. Absolute numbered groups were referred to using \1, \2, etc., and this notation is still accepted (and likely always will be). But it leads to some ambiguities if there are more than 9 capture groups, as \10 could mean either the tenth capture group, or the character whose ordinal in octal is 010 (a backspace in ASCII). Perl resolves this ambiguity by interpreting \10 as a backreference only if at least 10 left parentheses have opened before it. Likewise \11 is a backreference only if at least 11 left parentheses have opened before it. And so on. \1 through \9 are always interpreted as backreferences. There are several examples below that illustrate these perils. You can avoid the ambiguity by always using \g{} or \g if you mean capturing groups; and for octal constants always using \o{}, or for \077 and below, using 3 digits padded with leading zeros, since a leading zero implies an octal constant.
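
As a minimal sketch of the rule quoted above, the script below runs the question's pattern and the \g{1} form against the same string. The variable names are made up for illustration, and the \g{1} line assumes Perl 5.10 or later.

```perl
use strict;
use warnings;

my $str = "aa11bb";

# \111 is read as the octal escape for character 0111 (decimal 73, "I"),
# so this pattern means "any character followed by I".
my $octal_read = ($str =~ /(.)\111/) ? 1 : 0;

# The same pattern does match a string that contains an "I".
my $matches_I = ("xI" =~ /(.)\111/) ? 1 : 0;

# \g{1} is unambiguous (Perl 5.10+): backreference to group 1,
# followed by the literal characters "11".
my $braced = ($str =~ /(.)\g{1}11/) ? 1 : 0;

print "(.)\\111 on '$str': $octal_read\n";
print "(.)\\111 on 'xI': $matches_I\n";
print "(.)\\g{1}11 on '$str': $braced\n";
```

Only the last pattern should report a match on "aa11bb", since "aa" followed by "11" is what the question intended to find.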

sidyll
  • I'm reading the docs but it's not completely clear to me. If `\111` is interpreted as octal, then `\11` should work and it doesn't. And `\001` should also work, because it's the octal number of the first backreference, and it doesn't match either. What did I miss? – Birei Aug 05 '13 at 20:30
  • @Birei It's an octal *character* reference to character `0111` = character 73 = `"I"`. So your pattern matches any character followed by the letter I. – hobbs Aug 05 '13 at 20:32
  • @hobbs: Thank you. I've already understood that point, but what about the `\11`? – Birei Aug 05 '13 at 20:45
  • @Birei `\11` will look for character `011` octal = character 9 = tab (unless your pattern has at least eleven left-parentheses in it). Only `\1` through `\9` are automatically backreferences in regexes. – hobbs Aug 05 '13 at 21:25
  • @hobbs: I thought that the one-digit backreference had priority over the octal interpretation, unless there was a leading zero. All clear now, thank you. – Birei Aug 05 '13 at 21:36
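
The cases discussed in the comments above can be checked with a short script. This is only a sketch: the test strings and variable names are invented for illustration, and the eleven-group pattern is built as a string just to keep it short.

```perl
use strict;
use warnings;

# With fewer than eleven groups, \11 is octal 011, the tab character.
my $tab_match = ("a\tb" =~ /a\11b/) ? 1 : 0;

# A leading zero always means octal: \001 is character 1, not group 1.
my $octal_001 = ("aa" =~ /(.)\001/) ? 1 : 0;
my $ctrl_a    = ("a\x01" =~ /(.)\001/) ? 1 : 0;

# With eleven capture groups opened first, \11 becomes a backreference
# to the eleventh group.
my $eleven  = ("(.)" x 11) . "\\11";
my $backref = ("abcdefghijkk" =~ /\A$eleven\z/) ? 1 : 0;
my $not_tab = ("abcdefghijk\t" =~ /\A$eleven\z/) ? 1 : 0;

print "tab: $tab_match, \\001 on 'aa': $octal_001, \\001 on ctrl-A: $ctrl_a\n";
print "eleven groups, repeated k: $backref, eleven groups, tab: $not_tab\n";
```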
It's treating the \111 as a single item, because there's nothing separating the \1 from the 11. If you use the /x modifier, which lets you put whitespace in the pattern, you can remove the ambiguity:

if(/(.)\1 11/x) { ...
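
A small sketch of this approach, next to two other ways of ending the \1 escape that do not depend on \g. The non-capturing group and the character class are alternatives suggested here, not part of the original answer, and the variable names are illustrative.

```perl
use strict;
use warnings;

my $str = "aa11bb";

# Under /x the space is ignored as pattern text, but it still ends the
# \1 escape before the literal "11".
my $with_x = ($str =~ /(.)\1 11/x) ? 1 : 0;

# A non-capturing group around the backreference also ends the escape.
my $grouped = ($str =~ /(.)(?:\1)11/) ? 1 : 0;

# Putting the first literal digit in a character class keeps it out of
# the escape as well.
my $classed = ($str =~ /(.)\1[1]1/) ? 1 : 0;

print "/x: $with_x, (?:\\1): $grouped, [1]: $classed\n";
```

All three patterns should match "aa11" at the start of the string.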
AKHolland
  • or `if(/(.)\g{1}11/)` – hmatt1 Aug 05 '13 at 22:53
  • @Matt that will only work in Perl 5.10 or later. A lot of us are stuck on 5.8.8. – AKHolland Aug 06 '13 at 13:37
  • Yeah, you're right, good call. We have it here for reference now :) I'm pretty sure Perl 5.8.8 is the default on Red Hat 5, which is supported until 2017 (or something), so we may have many programmers stuck there for a while. – hmatt1 Aug 13 '13 at 00:42