I have the following Perl script to extract numbers from a log. It seems that the non-capturing group with ?: isn't working when I define the sub-pattern in a variable. It only works when I leave out the grouping in either the regex pattern or the sub-pattern in $number.

#!/usr/bin/perl
use strict;
use warnings;

my $number = '(:?-?(?:(?:\d+\.?\d*)|(?:\.\d+))(?:[Ee][+-]?\d+)?)';
#my $number = '-?(?:(?:\d+\.?\d*)|(?:\.\d+))(?:[Ee][+-]?\d+)?';

open(FILE,"file.dat") or die "Exiting with: $!\n";
while (my $line = <FILE>) {
    if ($line =~ m{x = ($number). y = ($number)}) {
        print "\$1= $1\n";
        print "\$2= $2\n";
        print "\$3= $3\n";
        print "\$4= $4\n";
    }
}
close(FILE);

The output for this code looks like:

$1= 12.15
$2= 12.15
$3= 3e-5
$4= 3e-5

for an input of:

asdf x = 12.15. y = 3e-5 yadda

Those doubled outputs aren't desired.

Is this because of the m{} style, in contrast to the regular m// patterns? The former is the only style I know for interpolating variables (sub-strings) into my regular expressions. I only noticed this with the capture groups, so are there possibly other differences for metacharacters?

EverythingRightPlace
  • `It seems that the non-capturing group with :? isn't working` That's not a non-capturing group, it's a regular capturing parenthesis. – TLP Sep 03 '13 at 09:13
  • You should clarify your question. What is this subpattern you can't reference? Your code works like you say, is there a problem? – TLP Sep 03 '13 at 09:25
  • Yes the code works like I said but I didn't want those doubled groups. Sorry for the unclear question. – EverythingRightPlace Sep 03 '13 at 10:00

2 Answers

The delimiters you use for the regular expression aren't causing any problems but the following is:

(:?-?(?:(?:\d+\.?\d*)|(?:\.\d+))(?:[Ee][+-]?\d+)?)
 ^^
Notice that this isn't a non-capturing group: `:?` is an optional colon inside an ordinary capturing group.

It's probably a typo, but it is causing the trouble.
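
The difference can be seen in a small sketch. The test strings `:42` and `42` and the variable names are made up for illustration; they are not from the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# (:?\d+) is a capturing group whose content may start with a literal colon
my ($typo) = ':42' =~ /^(:?\d+)$/;

# (?:\d+) only groups; the outer parentheses are what capture here
my ($fixed) = '42' =~ /^((?:\d+))$/;

# With (?:...) a leading colon is not part of the pattern, so this fails to match
my $colon_ok = ':42' =~ /^(?:\d+)$/ ? 1 : 0;

print "typo: $typo, fixed: $fixed, colon_ok: $colon_ok\n";
```

So `(:?` silently accepts (and captures) a colon, while `(?:` never adds a capture of its own.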

Edit: It looks like it is not a typo; I substituted the variable into the regex and got this:

x = ((:?-?(?:(?:\d+\.?\d*)|(?:\.\d+))(?:[Ee][+-]?\d+)?)). y = ((:?-?(?:(?:\d+\.?\d*)|(?:\.\d+))(?:[Ee][+-]?\d+)?))
    ^^           first and second group               ^^      ^^    third and fourth group                      ^^

As you can see, the first and second capturing groups capture exactly the same thing, and the same happens for the third and fourth capturing groups.
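
A minimal sketch of the fix, assuming the intent was a non-capturing group: with `(?:` at the start of `$number`, only the parentheses written in the match itself capture. The sample line is the one from the question, inlined here instead of being read from file.dat, and `@captures` is a name introduced for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same sub-pattern as in the question, with the leading (:? corrected to (?:
my $number = '(?:-?(?:(?:\d+\.?\d*)|(?:\.\d+))(?:[Ee][+-]?\d+)?)';

# Sample line from the question, inlined instead of read from file.dat
my $line = 'asdf x = 12.15. y = 3e-5 yadda';

# In list context the match returns the captured groups
my @captures = $line =~ m{x = ($number). y = ($number)};

# Only the two explicit groups in the match capture now
print "number of captures: ", scalar(@captures), "\n";
print "\$1= $captures[0]\n";
print "\$2= $captures[1]\n";
```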

Ibrahim Najjar
  • I was thinking typo too, but then it would be extremely odd to both mention it as `:?`, while using a ton of `?:` groups in the regex. – TLP Sep 03 '13 at 09:18
  • Awaiting clarification from the OP. I don't even really see what he is asking about. – TLP Sep 03 '13 at 09:23
  • Thx. Really stupid typo by me... I assumed there are differences through saving a sub-regex-string into a variable and using it in a regex. Seems this isn't the case. – EverythingRightPlace Sep 03 '13 at 10:02

You're going to kick yourself...

Your regexp reads out as:

capture {
 maybe-colon
 maybe-minus
 cluster {     (?:(?:\d+\.?\d*)|(?:\.\d+))
  cluster {    (?:\d+\.?\d*)
   1+ digits
   maybe-dot
   0+ digits
  }
  -or-
  cluster {    (?:\.\d+)
   dot
   1+ digits
  }
 }
 maybe cluster {
   E or e
   maybe + or -
   1+ digits
 }             (?:[Ee][+-]?\d+)?
}

... which is what you're looking for.

However, when you then do your actual regexp, you do:

$line =~ m{x = ($number). y = ($number)}

(The curly braces are a distraction: once the m or s is written explicitly, you can choose other delimiters, such as {} instead of //, without changing what the pattern means.)

This asks Perl to capture whatever the regexp in $number matches, and that regexp is itself a capture, hence $1 and $2 being the same thing.

Simply remove the capturing parentheses from either $number or the regexp line.
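
As a rough sketch of the second option, the capture can live in the sub-pattern while the match line has no parentheses of its own. Compiling the sub-pattern with `qr//`, the slightly simplified alternation, and the names `$x` and `$y` are assumptions made for illustration, not the asker's code; the match also uses `//` rather than `m{}` to show that the delimiters play no part.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# One capturing group inside the sub-pattern; everything else is non-capturing
my $number = qr/(-?(?:\d+\.?\d*|\.\d+)(?:[Ee][+-]?\d+)?)/;

# Sample line from the question
my $line = 'asdf x = 12.15. y = 3e-5 yadda';

# No extra parentheses here, so each use of $number contributes exactly one capture
my ($x, $y) = $line =~ /x = $number. y = $number/;

print "x = $x\n";
print "y = $y\n";
```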

CodeGorilla
  • Like I wrote in the question with `It's only working when I leave out the grouping in either the regex-pattern or the sub-pattern in $number.` The problem is that I didn't understand this behaviour because I only wanted to match every entry once. And the only issue was the typo of `:?`, for which I am kicking myself indeed... – EverythingRightPlace Sep 03 '13 at 13:22
  • Yes.... because `$number` contains a capture, then when you do '($number)' you do a **second** capture - meaning '$1' and '$2' - you then repeat the '($number)' later in the regexp, giving you '$3' & '$4'. – CodeGorilla Sep 03 '13 at 13:30