
I have the following code which successfully prints out all of the strings that match my regex into the console (perl myscript.pl sample_text.txt).

$filename=shift;
open text, $filename or die "error opening $filename\n";

while (my $line = <text>) {
    push @matches, $1 while $line
        =~ m{
          (( [ACGT]{6} )
          CTGA
          [ACGT]
          GAG
          ( [ACGT]{3,6} )
          [ACGT]{2,100}
          (??{ $3 =~ tr/ACGT/TGCA/r })
          ( CGAAA[ACGT] ))
        }xgi;
}

print "$_\n" for @matches;

(This is only a simplified version of my regex and my capture groups are much more complicated and do not have fixed length.)

My sample_text can be downloaded here.

I want the output to be as it is (one match per line), but I want the sub-strings that matched the first ( [ACGT]{6} ) and last ( CGAAA[ACGT] ) capture groups in my regex to have brackets around them when the entire match is printed in the console.

To give an example, when I run the entire script above on the sample file I have attached (sample_text), one of the matching results I get is:

TTTATGCTGATGAGAAAAAACATAAGAAAACGTATAATTTTTTCTAAAAAAGGAAAAAAGACCGAAATTTTAAGCTGTTTTTCGAAAA

Instead, I want to see output like this:

(TTTATG)CTGATGAGAAAAAACATAAGAAAACGTATAATTTTTTCTAAAAAAGGAAAAAAGACCGAAATTTTAAGCTGTTTTT(CGAAAA)

votresignu
  • Always add the [`$!` variable](https://perldoc.perl.org/perlvar.html#$!) to an error message to see the reason: `... or die "... : $!\n";` – zdim Jan 29 '18 at 23:00
  • Lexical filehandles are better: `open my $fh, ....; while (my $line = <$fh>) { }` – zdim Jan 29 '18 at 23:01
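
The two comments above could be combined roughly as follows. This is only a sketch: `read_lines` is a hypothetical helper name that does not appear in the question, and the three-argument form of `open` is an extra habit assumed here, not something the comments ask for.

```perl
use strict;
use warnings;

# Hypothetical helper: read every line of a file through a lexical filehandle.
sub read_lines {
    my ($filename) = @_;

    # Including $! in the message shows why the open failed
    # (missing file, permissions, and so on).
    open my $fh, '<', $filename
        or die "error opening $filename: $!\n";

    my @lines;
    while (my $line = <$fh>) {
        chomp $line;
        push @lines, $line;
    }
    close $fh;
    return @lines;
}

# Print each line of the file named on the command line, if one was given.
if (@ARGV) {
    print "$_\n" for read_lines(shift @ARGV);
}
```

The lexical `$fh` is closed automatically when it goes out of scope, so the explicit `close` is optional here.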

1 Answer

push @matches, "($1)$2($4)"
    while
        $line =~ m{
            ( [ACGT]{6} )
            ( CTGA
              [ACGT]
              GAG
              ( [ACGT]{3,6} )
              [ACGT]{2,100}
              (??{ $3 =~ tr/ACGT/TGCA/r })
            )
            ( CGAAA [ACGT] )
        }xgi;

With named captures:

#push @matches, sprintf "(%s)%s(%s)", @+{qw( pre main suf )}
push @matches, "($+{pre})$+{main}($+{suf})"
    while
        $line =~ m{
            (?<pre> [ACGT]{6} )
            (?<main> CTGA
              [ACGT]
              GAG
              ( [ACGT]{3,6} )
              [ACGT]{2,100}
              (??{ $^N =~ tr/ACGT/TGCA/r })
            )
            (?<suf> CGAAA [ACGT] )
        }xgi;
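
To try the named-capture version on its own, it can be wrapped in a small script such as the sketch below. The helper name `bracket_matches` and the variable `$sample` are made up for illustration; the sample line is the one quoted in the question. `tr///r` needs Perl 5.14 or later, and `$^N` refers to the most recently closed group, which here is the unnamed `( [ACGT]{3,6} )`.

```perl
use 5.014;      # enables strict and say; tr///r needs 5.14+
use warnings;

# Hypothetical helper: return every match in $line with the first and
# last parts of the match wrapped in parentheses.
sub bracket_matches {
    my ($line) = @_;
    my @matches;
    push @matches, "($+{pre})$+{main}($+{suf})"
        while $line =~ m{
            (?<pre> [ACGT]{6} )
            (?<main> CTGA
              [ACGT]
              GAG
              ( [ACGT]{3,6} )
              [ACGT]{2,100}
              (??{ $^N =~ tr/ACGT/TGCA/r })   # complement of the group above
            )
            (?<suf> CGAAA [ACGT] )
        }xgi;
    return @matches;
}

# Sample line copied from the question.
my $sample = 'TTTATGCTGATGAGAAAAAACATAAGAAAACGTATAATTTTTTCTAAAAAAGGAAAAAAGACCGAAATTTTAAGCTGTTTTTCGAAAA';

say for bracket_matches($sample);
```

For this sample line the whole string is one match, so the script should print it once with the leading `TTTATG` and the trailing `CGAAAA` in parentheses.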
ikegami