7

In JavaScript, I have the string `abcdef` and cannot figure out this strange behavior:

  • `(?=abc)def` doesn't match the string
  • `abc(?=def)` does match the string

Why?

James Green
Zaffy
  • Maybe you want `(?:abc)def` instead, which is using a non-capturing group instead of a positive lookahead. – DaoWen Jun 10 '13 at 01:31
  • @DaoWen No, I have already tried that ... try `"abcdef".replace(/(?:abc)def/, "")` it replaces whole string – Zaffy Jun 10 '13 at 01:34

4 Answers

19

In `(?=abc)def`, the `(?=abc)` lookahead is zero-width: it doesn't move the cursor forward in the input string after it succeeds. The construct simply says: look ahead at the next three characters to see whether they are `abc`; if they are, check whether those same three characters are `def`. At that point the match fails.
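One way to see the zero-width behavior is to run the failing pattern next to a variant whose remainder really does start at the same position. This is only a minimal sketch; the variable names are illustrative and not part of the question.

```javascript
const input = "abcdef";

// (?=abc) succeeds at index 0 but consumes nothing, so "def" is then
// tested at index 0 as well, where the input has "abc".
const lookaheadThenDef = /(?=abc)def/.test(input);

// The same lookahead, followed by text that actually starts at index 0.
const lookaheadThenAbcdef = /(?=abc)abcdef/.exec(input);

console.log(lookaheadThenDef);          // false
console.log(lookaheadThenAbcdef[0]);    // "abcdef"
console.log(lookaheadThenAbcdef.index); // 0
```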

To see why your second pattern matches, you need to understand how the regex engine works. Consider your input string `abcdef` and your regex `abc(?=def)`. The engine starts by matching the `a`, then moves the cursor in the input string to the next character and attempts to match the `b`; because the cursor is on `b`, the match succeeds. The engine then moves the cursor again and attempts to match the `c`; because the cursor is on `c`, the match succeeds and the cursor moves to the next character. Now the engine encounters the `(?=def)`. At this point it just looks ahead, without moving the cursor, to see whether the next three characters from the cursor position are in fact `def`. They are, so the match completes successfully.
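The walk-through above can be checked by inspecting where the match ends. In this sketch the `g` flag is added only so that `lastIndex` is updated and the cursor position becomes visible; it is not part of the original pattern, and the `abcxyz` input is a made-up counterexample.

```javascript
const pattern = /abc(?=def)/g; // g flag only so that lastIndex is updated

const match = pattern.exec("abcdef");

console.log(match[0]);          // "abc" (the looked-ahead text is not part of the match)
console.log(match.index);       // 0
console.log(pattern.lastIndex); // 3, the cursor stops just before "def"

// With anything other than "def" after "abc", the lookahead fails.
const otherTail = /abc(?=def)/.test("abcxyz");
console.log(otherTail);         // false
```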

Now consider the input string `xyz` and the regex `x(?=y)z`. The regex engine puts the cursor on the first letter of the input string, checks whether it is an `x`, finds that it is, and moves the cursor to the next character. Now it looks ahead to see whether the next character is a `y`, which it is, but the engine doesn't move the cursor forward, so the cursor stays on the `y`. Next the engine checks whether the cursor is on the letter `z`, but because the cursor is still on the `y`, the match fails.
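The `xyz` case can be reproduced directly. The second pattern below is an added variant, not from the answer, that consumes the `y` explicitly after the lookahead has checked it, which is what lets the cursor reach `z`.

```javascript
const input = "xyz";

// After the lookahead the cursor is still on "y", so "z" cannot match.
const stuck = /x(?=y)z/.test(input);

// Consuming the "y" after checking it lets the engine reach "z".
const consumed = /x(?=y)yz/.exec(input);

// On its own, the lookahead leaves only "x" in the match.
const onlyX = /x(?=y)/.exec(input);

console.log(stuck);       // false
console.log(consumed[0]); // "xyz"
console.log(onlyX[0]);    // "x"
```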

You can read a lot more about both positive and negative lookaheads at http://www.regular-expressions.info/lookaround.html

Ro Yo Mi
  • Still haven't got it. If it must come "after" something, try adding `^` at the beginning; the result will be the same. – Zaffy Jun 10 '13 at 01:36
  • 1
    Updated the explanation to cover how the regex engine handles the successful match and to show what would happen if the lookahead were inserted into the middle of an example expression. The key is in keeping track of where the cursor is inside the input text. – Ro Yo Mi Jun 10 '13 at 03:17
  • 1
    awesome explanation! – user1993 Jun 28 '17 at 11:55
4

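`(?=...)` is a lookahead; in other words, it tests the string to its right. Note too that a lookahead is a zero-width assertion that doesn't consume any characters. In your first example, `(?=abc)` means "must be followed by abc", and the `def` that comes next in the pattern is tested at that same position. Wherever `abc` follows, `def` cannot match there too, and wherever `def` follows, the lookahead itself fails. This is why the pattern fails.

In your second example, it finds `def` after `abc`, so the string is matched.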

Casimir et Hippolyte
2

MDN definition of lookaheads in JavaScript:

x(?=y)
Matches 'x' only if 'x' is followed by 'y'. This is called a lookahead.

For example, /Jack(?=Sprat)/ matches 'Jack' only if it is followed by 'Sprat'. /Jack(?=Sprat|Frost)/ matches 'Jack' only if it is followed by 'Sprat' or 'Frost'. However, neither 'Sprat' nor 'Frost' is part of the match results.

So `(?=y)` is normally preceded by another expression, and it matches only if that first expression is followed by the second one. In your pattern the leading expression is empty: `(?=abc)` checks that the next three characters are `abc` without consuming them, and the engine then checks whether those same characters are `def`, which fails.
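The MDN example can be tried as follows. This is a small sketch; the `JackBlack` input and the variable names are added here for illustration and do not come from MDN.

```javascript
const jack = /Jack(?=Sprat|Frost)/;

// "JackBlack" is a made-up input that is followed by neither alternative.
const names = ["JackSprat", "JackFrost", "JackBlack"];

// Collect the matched text for each input, or null when there is no match.
const results = names.map((name) => {
  const m = jack.exec(name);
  return m ? m[0] : null;
});

console.log(results); // [ "Jack", "Jack", null ]
```

In both successful cases the match is just `Jack`; the text checked by the lookahead is never part of the result.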

Community
Ben McCormick
2

Based on your response to my comment, I think what you want is a positive look-behind:

(?<=abc)def
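JavaScript engines at the time of this thread did not support look-behind, but the syntax was later added to the language in ES2018. Assuming an engine that implements it (older engines reject the pattern as a syntax error), the pattern above could be used roughly like this:

```javascript
// Requires an engine with ES2018 look-behind support.
// (?<=abc) checks that "abc" precedes the cursor without making it part of the match.
const result = "abcdef".replace(/(?<=abc)def/, "");

console.log(result); // "abc"
```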

Edit:

Since you're using JavaScript (sorry, I only read your question and didn't look at the tags), why not just use a regular capture group and include the captured text in the replacement pattern?

"abcdef".replace(/(abc)def/, "$1") // => "abc"
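As a sketch of the same idea, the captured prefix can be put back either with `$1` in the replacement string or with a replacer function. The variable and parameter names are illustrative only.

```javascript
// The whole "abcdef" is matched and consumed, then "abc" is written back via $1.
const withGroup = "abcdef".replace(/(abc)def/, "$1");

// Equivalent form with a replacer function: the second argument is group 1.
const withFunction = "abcdef".replace(/(abc)def/, (whole, prefix) => prefix);

console.log(withGroup);    // "abc"
console.log(withFunction); // "abc"
```

Unlike a lookaround, the group consumes `abc`, so the text has to be reinserted by the replacement.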
DaoWen
  • Sure, this would be *classic* but in javascript there are no lookbehinds. – Zaffy Jun 10 '13 at 01:39
  • Because when replacing with `/g` it is skipping the replaced text. For example `"a1a2a".replace(/(a)(\d)(a)/g, "$1b$2b$3");` will be `ab1ba2a` not `ab1bab2ba` – Zaffy Jun 10 '13 at 01:56
  • @Zaffy - Only replace the look-*behind* with capturing groups, not look-*ahead*, e.g.: `"a1a2a".replace(/(a)(\d)(?=a)/g, "$1b$2b") // => "ab1bab2ba"` – DaoWen Jun 10 '13 at 02:15