I have the following code, which successfully prints every string that matches my regex to the console (run as perl myscript.pl sample_text.txt):
$filename = shift;
open text, $filename or die "error opening $filename\n";
while (my $line = <text>) {
    push @matches, $1 while $line =~ m{
        (( [ACGT]{6} )
        CTGA
        [ACGT]
        GAG
        ( [ACGT]{3,6} )
        [ACGT]{2,100}
        (??{ $3 =~ tr/ACGT/TGCA/r })
        ( CGAAA[ACGT] ))
    }xgi;
}
print "$_\n" for @matches;
(This is only a simplified version of my regex; my actual capture groups are much more complicated and do not have a fixed length.)
My sample_text can be downloaded here.
I want the output to stay as it is (one match per line), but I want the sub-strings matched by the first capture group, ( [ACGT]{6} ), and the last capture group, ( CGAAA[ACGT] ), to have brackets around them when the entire match is printed to the console.
To give an example, when I run the script above on the attached sample file (sample_text), one of the matches I get is:
TTTATGCTGATGAGAAAAAACATAAGAAAACGTATAATTTTTTCTAAAAAAGGAAAAAAGACCGAAATTTTAAGCTGTTTTTCGAAAA
I instead want to see such an output:
(TTTATG)CTGATGAGAAAAAACATAAGAAAACGTATAATTTTTTCTAAAAAAGGAAAAAAGACCGAAATTTTAAGCTGTTTTT(CGAAAA)
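For reference, here is a minimal sketch of one way the desired output might be produced. It is not necessarily the best approach. It assumes the simplified regex above, in which the first and last groups of interest are capture groups 2 and 4, and it uses Perl's built-in @- and @+ offset arrays to slice out the text between them, so it does not rely on the groups having a fixed length. The helper name bracket_matches is hypothetical, and the 3-argument open with a lexical filehandle is my substitution for the bareword handle in the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: returns every match found in $line, with the text
# captured by group 2 and group 4 wrapped in round brackets.
sub bracket_matches {
    my ($line) = @_;
    my @out;
    while ($line =~ m{
        (( [ACGT]{6} )
        CTGA
        [ACGT]
        GAG
        ( [ACGT]{3,6} )
        [ACGT]{2,100}
        (??{ $3 =~ tr/ACGT/TGCA/r })
        ( CGAAA[ACGT] ))
    }xgi) {
        my ($head, $tail) = ($2, $4);
        # $+[2] is the offset just past group 2; $-[4] is where group 4 starts.
        my $middle = substr($line, $+[2], $-[4] - $+[2]);
        push @out, "($head)$middle($tail)";
    }
    return @out;
}

# Only read a file when one is given on the command line.
if (@ARGV) {
    my $filename = shift;
    open my $fh, '<', $filename or die "error opening $filename: $!\n";
    while (my $line = <$fh>) {
        print "$_\n" for bracket_matches($line);
    }
    close $fh;
}
```

If the real regex has more groups, the two group numbers (2 and 4 here) would need to be adjusted accordingly; named captures with %+ might be easier to maintain, but I have not tried that here.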