I'm searching for patterns in a String that start with ATG, end with TAG, TAA or TGA, and have a length that is a multiple of 3. ATG and TAG, TAA or TGA can only appear at the beginning and the end respectively. Which means:
From ATGTTGTGATGT extract ATGTTGTGA
From ATGATGTTGTGATGT extract ATGTTGTGA
Currently I'm using the regex (ATG)([ATG]{3})+?(TAG|TAA|TGA). For ATGATGTTGTGATGT this gets me the wrong result ATGATGTTGTGA.
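A minimal reproduction of the behaviour described above, assuming Python's re module (the question is not tied to a particular language, and the helper name first_match is only illustrative):

```python
import re

# The pattern currently in use: start codon, lazily repeated codons, stop codon.
CURRENT = re.compile(r"(ATG)([ATG]{3})+?(TAG|TAA|TGA)")

def first_match(seq):
    """Return the first substring matched by CURRENT, or None if nothing matches."""
    m = CURRENT.search(seq)
    return m.group(0) if m else None

print(first_match("ATGTTGTGATGT"))     # ATGTTGTGA (the wanted result)
print(first_match("ATGATGTTGTGATGT"))  # ATGATGTTGTGA (second ATG is swallowed as a codon)
```

The second ATG is accepted because [ATG]{3} matches any three of those letters, including ATG itself.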
I've tried:
(^ATG)(!?=.*ATG)([ATG]{3})+?(TAG|TAA|TGA)
(^ATG)(!?=(ATG)+)([ATG]{3})+?(TAG|TAA|TGA)
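As far as I can tell, neither attempt matches anything. The likely reason is that (!?= is read as an ordinary group (an optional "!" followed by a literal "=") rather than as a lookahead. A quick check, again assuming Python's re module, with ATTEMPTS and try_attempts as illustrative names:

```python
import re

# The two attempted patterns, copied verbatim.
ATTEMPTS = [
    r"(^ATG)(!?=.*ATG)([ATG]{3})+?(TAG|TAA|TGA)",
    r"(^ATG)(!?=(ATG)+)([ATG]{3})+?(TAG|TAA|TGA)",
]

def try_attempts(seq):
    """Return, for each attempted pattern, whether it matches anywhere in seq."""
    return [re.search(p, seq) is not None for p in ATTEMPTS]

# Both patterns require a literal "=" right after the leading ATG,
# which never occurs in these sequences.
print(try_attempts("ATGTTGTGATGT"))     # [False, False]
print(try_attempts("ATGATGTTGTGATGT"))  # [False, False]
```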
How do I tell the regex to allow ATG only once, at the beginning, and nowhere after that?