In JavaScript, I have a string abcdef
and cannot figure out this strange behavior:
(?=abc)def
doesn't match the string, but
abc(?=def)
does match the string.
Why?
In (?=abc)def, the (?=abc) lookahead is zero-width: it doesn't move the cursor forward in the input string after it succeeds. That construct simply says: look ahead at the next three characters to see if they are abc; if they are, then check whether those same characters are def. At that point the match fails. At every later starting position the lookahead itself fails, so the pattern never matches.
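To see the difference between the two patterns from the question, here is a minimal sketch you can paste into a browser console or Node. The variable names are just for illustration.

```javascript
const input = "abcdef";

// Lookahead first: (?=abc) succeeds at index 0 but consumes nothing,
// so "def" is then compared against "abc" at that same index and fails.
const lookaheadFirst = /(?=abc)def/.test(input);

// Lookahead last: "abc" is consumed, then (?=def) peeks at what follows.
const lookaheadLast = /abc(?=def)/.test(input);

console.log(lookaheadFirst); // false
console.log(lookaheadLast);  // true
```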
You need to understand how the regex engine works to complete your match. Consider your input string abcdef and your regex abc(?=def). The engine starts by matching the a, then moves the cursor in the input string over to the next character and attempts to match the b; because the cursor is on b, the match succeeds. Then the engine moves the cursor over and attempts to match the c, and because the cursor is on a c, the match succeeds and the cursor is again moved to the next character. Now the engine encounters the (?=def). At this point the engine just looks ahead, without moving the cursor, to see if the next three characters from where the cursor is in the input string are in fact def. They are, so the match completes successfully.
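The walkthrough above can be checked by looking at what the match actually contains and where the cursor ends up. This is only a sketch; it assumes the global flag is used purely so that lastIndex is recorded.

```javascript
// The matched text is only "abc"; "def" was inspected but not consumed.
const result = /abc(?=def)/.exec("abcdef");
console.log(result[0]);    // "abc"
console.log(result.index); // 0

// With the g flag, lastIndex records where the cursor stopped after the match.
const globalRe = /abc(?=def)/g;
globalRe.exec("abcdef");
const cursorAfterMatch = globalRe.lastIndex;
console.log(cursorAfterMatch); // 3, i.e. still in front of the "d"
```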
Now consider the input string xyz and a regex x(?=y)z. The regex engine puts the cursor on the first letter in the input string and checks to see if it is an x. It finds an x, so it moves the cursor to the next character in the input string. Now it looks ahead to see if the next character is a y, which it is, but the engine doesn't move the input text cursor forward, so the cursor stays on the y. Next the engine checks whether the cursor is on the letter z, but because the cursor in the input text is still on the letter y, the match fails.
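Here is the xyz case as a small sketch. The second pattern, x(?=y)yz, is not from the question; I added it only to show that the y still has to be consumed by something after the lookahead.

```javascript
// The lookahead sees "y" but leaves the cursor on it, so "z" cannot match there.
const stuckOnY = /x(?=y)z/.exec("xyz");
console.log(stuckOnY); // null

// Consuming the "y" explicitly after the lookahead lets the match continue.
const consumesY = /x(?=y)yz/.exec("xyz");
console.log(consumesY[0]); // "xyz"

// On its own, x(?=y) matches just the "x".
const onlyX = /x(?=y)/.exec("xyz");
console.log(onlyX[0]); // "x"
```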
You can read a lot more about both positive and negative lookaheads at http://www.regular-expressions.info/lookaround.html
(?=...) is a lookahead; in other words, it tests the string to its right. Note too that a lookahead is a zero-width assertion that doesn't consume characters. In your first example, (?=abc) means "must be followed by abc", but the pattern then requires def at that same position, so both conditions can never hold at once. This is the reason why the pattern fails.
In your second example, the engine finds def after abc, so the string is matched.
MDN definition of lookaheads in JavaScript:
x(?=y)
Matches 'x' only if 'x' is followed by 'y'. This is called a lookahead.
For example, /Jack(?=Sprat)/ matches 'Jack' only if it is followed by 'Sprat'. /Jack(?=Sprat|Frost)/ matches 'Jack' only if it is followed by 'Sprat' or 'Frost'. However, neither 'Sprat' nor 'Frost' is part of the match results.
So when (?=y) is preceded by another expression, the pattern matches only if that first expression is followed by the second one. In your case the lookahead is preceded by nothing (an empty string), so (?=abc) checks the first 3 characters abc without consuming them, and the engine then checks whether those same characters are def, which fails.
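The MDN examples quoted above can be tried directly. A rough sketch follows; the input strings, including "JackHorner", are ones I made up for illustration.

```javascript
// 'Jack' matches, but 'Sprat' is not part of the match result.
const sprat = "JackSprat".match(/Jack(?=Sprat)/);
console.log(sprat[0]); // "Jack"

// Either alternative inside the lookahead is accepted.
const frost = "JackFrost".match(/Jack(?=Sprat|Frost)/);
console.log(frost[0]); // "Jack"

// Anything else after 'Jack' makes the lookahead fail.
const horner = "JackHorner".match(/Jack(?=Sprat|Frost)/);
console.log(horner); // null
```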
Based off your response to my comment, I think what you want is a positive look-behind:
(?<=abc)def
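A quick sketch of how the look-behind behaves, assuming an engine that supports it (ES2018 or later; older engines reject the regex literal with a syntax error). The capture-group version is included for comparison because it gives the same replace result without look-behind.

```javascript
// The look-behind checks for "abc" to the left without including it in the match.
const lookbehindMatch = "abcdef".match(/(?<=abc)def/);
console.log(lookbehindMatch[0]);    // "def"
console.log(lookbehindMatch.index); // 3

// Removing "def" only when it follows "abc", two ways:
const viaLookbehind = "abcdef".replace(/(?<=abc)def/, "");
const viaCaptureGroup = "abcdef".replace(/(abc)def/, "$1");
console.log(viaLookbehind);   // "abc"
console.log(viaCaptureGroup); // "abc"
```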
Edit:
Since you're using JavaScript (sorry, I only read your question; I didn't look at the tags), where look-behind is only available in newer engines (ES2018 and later), why not just use a regular capture group and include the match in the replace-pattern?
"abcdef".replace(/(abc)def/, "$1")