Do recursive regexes understand named captures? There is a note in the docs for (??{ code }) that it's an independent subpattern with its own set of captures that are discarded when the subpattern is done, and there's a note in (?PARNO) that it's "similar to (??{ code })". Is (?PARNO) discarding its own named captures when it's done?
I'm writing about Perl's recursive regular expressions for Mastering Perl. perlre already has an example with balanced parens (I show it in Matching balanced parenthesis in Perl regex), so I thought I'd try balanced quote marks:
#!/usr/bin/perl
# quotes-nested.pl
use v5.10;
$_ =<<'HERE';
He said 'Amelia said "I am a camel"'
HERE
say "Matched!" if m/
    (
        ['"]
        (
            (?:
                [^'"]+
                |
                ( (?1) )
            )*
        )
        ['"]
    )
    /xg;
print "
1 => $1
2 => $2
3 => $3
4 => $4
5 => $5
";
This works and the two quotes show up in $1 and $3:
Matched!
1 => 'Amelia said "I am a camel"'
2 => Amelia said "I am a camel"
3 => "I am a camel"
4 =>
5 =>
That's fine. I understand that. However, I don't want to have to know the numbers. So I make the first capture group a named capture and look in %-, expecting to see the two substrings I previously saw in $1 and $3:
use v5.10;
$_ =<<'HERE';
He said 'Amelia said "I am a camel"'
HERE
say "Matched [$+{said}]!" if m/
    (?<said>
        ['"]
        (
            (?:
                [^'"]+
                |
                (?1)
            )*
        )
        ['"]
    )
    /xg;
use Data::Dumper;
print Dumper( \%- );
I only see the first:
Matched ['Amelia said "I am a camel"']!
$VAR1 = {
'said' => [
'\'Amelia said "I am a camel"\''
]
};
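A quick way to probe what the recursion does with captures, without involving names at all, is to capture something that closes before the recursion starts. This is only a sketch: the group that grabs the opening quote mark and the variables $whole and $open are my additions for illustration. If captures set inside (?1) leaked back to the caller, I'd expect the opening quote to come back as the inner double quote. perlre says capture groups get the value determined by the outermost recursion, so I expect the single quote instead.

```perl
use v5.10;

my $text = q{He said 'Amelia said "I am a camel"'};

# Group 1 is the whole quoted string; group 2 is just the opening
# quote mark, so it closes before any recursion happens.
# The inner level matches a double quote in group 2 as well.
my ( $whole, $open ) = $text =~ m/
    (
        (['"])
        (?:
            [^'"]+
            |
            (?1)
        )*
        ['"]
    )
    /x;

say "1 => $whole";
say "2 => $open";    # expected from the outermost level, per perlre
```

If that holds, it would explain why a named group inside the recursed group only ever reports the outermost match.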
I expected that (?1) would repeat everything in the first capture group, including the named capture to said. I can fix that a bit by naming a new capture:
use v5.10;
$_ =<<'HERE';
He said 'Amelia said "I am a camel"'
HERE
say "Matched [$+{said}]!" if m/
    (?<said>
        ['"]
        (
            (?:
                [^'"]+
                |
                (?<said> (?1) )
            )*
        )
        ['"]
    )
    /xg;
use Data::Dumper;
print Dumper( \%- );
Now I get what I expected:
Matched ['Amelia said "I am a camel"']!
$VAR1 = {
'said' => [
'\'Amelia said "I am a camel"\'',
'"I am a camel"'
]
};
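My reading of this result is that %- gets one slot per group in the pattern that carries the name, not one per trip through the recursion. Named groups are still numbered, so the two said groups here should simply be $1 and $3. The sketch below checks that assumption; @by_name and @by_number are names I made up for the comparison.

```perl
use v5.10;

my $text = q{He said 'Amelia said "I am a camel"'};

my @by_name;
my @by_number;

if ( $text =~ m/
    (?<said>                     # group 1, first group named said
        ['"]
        (                        # group 2
            (?:
                [^'"]+
                |
                (?<said> (?1) )  # group 3, second group named said
            )*
        )
        ['"]
    )
    /x ) {
    @by_name   = @{ $-{said} };  # one entry per group named said
    @by_number = ( $1, $3 );     # the same two groups, by number
}

say "named:    $_" for @by_name;
say "numbered: $_" for @by_number;
```

If the two lists agree, the second entry comes from the extra named group wrapped around (?1) at the top level, not from anything the recursion stored.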
I thought that I could fix this by moving the named capture up one level:
use v5.10;
$_ =<<'HERE';
He said 'Amelia said "I am a camel"'
HERE
say "Matched [$+{said}]!" if m/
    (
        (?<said>
            ['"]
            (
                (?:
                    [^'"]+
                    |
                    (?1)
                )*
            )
            ['"]
        )
    )
    /xg;
use Data::Dumper;
print Dumper( \%- );
But this doesn't catch the smaller substring in said either:
Matched ['Amelia said "I am a camel"']!
$VAR1 = {
'said' => [
'\'Amelia said "I am a camel"\''
]
};
I think I understand this, but I also know that there are people here who actually touch the C code that makes it happen. :)
And, as I write this, I think I should tie %- and override STORE to find out, but then I'd have to find out how to do that.
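In the meantime, a stopgap that doesn't depend on how the recursion treats captures is to peel off one level of quoting at a time. This is a rough sketch, not a fix for the regex itself: $quoted and @said are names I picked, and it only follows the first quoted string it finds at each level, so sibling quotes at the same depth would be missed.

```perl
use v5.10;

my $text = q{He said 'Amelia said "I am a camel"'};

# The recursive pattern from the first example, compiled once.
my $quoted = qr/
    (
        ['"]
        (?:
            [^'"]+
            |
            (?1)
        )*
        ['"]
    )
    /x;

my @said;
while ( $text =~ $quoted ) {
    push @said, $1;
    # Drop the outer quote marks and search inside what is left.
    $text = substr $1, 1, -1;
}

say $_ for @said;
```

Each pass shortens the text by at least the two quote marks, so the loop stops once no quoted string remains.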