
Consider:

var re = /(?<=foo)bar/gi;

It is an invalid regular expression in Plunker. Why?

Peter Mortensen
Jackie

2 Answers


2020 update: JavaScript implementations are beginning to natively support regular expression lookbehinds. The RegExp Lookbehind Assertions proposal, accepted into ECMA-262 as part of ECMAScript 2018, was implemented in V8's Irregexp in Chrome 62+ (released 2017-10-17), and that has been picked up via a shim layer for Irregexp in Firefox 78+ (ESR, released 2020-06-30). Other JS interpreters will follow.

See more detailed support listings here.
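Since support still varies by engine, one way to choose between the native syntax and the legacy workarounds is to test for lookbehind at run time. This is only a sketch: `supportsLookbehind`, `hasLookbehind`, and `pattern` are hypothetical names, not part of any library. The pattern is built with the `RegExp` constructor so that an engine without lookbehind throws a catchable error instead of failing to parse the whole script.

```javascript
// Hypothetical helper: true if this engine can compile lookbehind assertions.
function supportsLookbehind() {
  try {
    // Built from strings so older engines throw here at run time
    // rather than rejecting the script at parse time.
    new RegExp("(?<=a)b");
    new RegExp("(?<!a)b");
    return true;
  } catch (e) {
    return false;
  }
}

var hasLookbehind = supportsLookbehind();

// Native lookbehind where available, otherwise the capture-group fallback.
var pattern = hasLookbehind
  ? new RegExp("(?<=foo)bar", "gi")
  : /foo(bar)/gi; // legacy form: "bar" ends up in capture group 1

var m = pattern.exec("xfoobar");
var bar = m && (hasLookbehind ? m[0] : m[1]);
console.log(bar);
```

Note that the two branches put "bar" in different places (`m[0]` versus `m[1]`), so calling code has to know which form it received.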


Legacy workaround to implement lookbehinds

Older JavaScript engines lack support for regular expression lookbehinds like (?<=…) (positive) and (?<!…) (negative), but that doesn't mean you can't still implement this sort of logic in JavaScript.

Matching (not global)

Positive lookbehind match:

// from /(?<=foo)bar/i
var matcher = mystring.match( /foo(bar)/i );
if (matcher) {
  // do stuff with matcher[1] which is the part that matches "bar"
}

Fixed width negative lookbehind match:

// from /(?<!foo)bar/i
var matcher = mystring.match( /(?!foo)(?:^.{0,2}|.{3})(bar)/i );
if (matcher) {
  // do stuff with matcher[1] ("bar"), which does not follow "foo"
}

Negative lookbehinds can be done without the global flag, but only with a fixed width, and you have to calculate that width (which can get difficult with alternations). Using (?!foo).{3}(bar) would be simpler and roughly equivalent, but it requires three characters in front of "bar", so it won't match a line starting with "rebar" (and since . can't match newlines, it can't borrow characters from the previous line either). The alternation in the code above is what lets it match lines featuring "bar" before character four.
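To make the edge cases concrete, here is a small sketch that wraps the fixed-width pattern from above in a function and runs it against a few inputs. The name `barNotAfterFoo` is a hypothetical helper introduced only for illustration.

```javascript
// Hypothetical wrapper around the fixed-width workaround for /(?<!foo)bar/i.
// Returns the matched "bar" text, or null when every "bar" follows "foo".
function barNotAfterFoo(str) {
  var matcher = str.match(/(?!foo)(?:^.{0,2}|.{3})(bar)/i);
  return matcher ? matcher[1] : null;
}

console.log(barNotAfterFoo("foobar"));  // null: "bar" directly follows "foo"
console.log(barNotAfterFoo("xfoobar")); // null
console.log(barNotAfterFoo("rebar"));   // "bar": handled by the ^.{0,2} branch
console.log(barNotAfterFoo("bar"));     // "bar": zero characters before it
console.log(barNotAfterFoo("crowbar")); // "bar": handled by the .{3} branch
console.log(barNotAfterFoo("foo bar")); // "bar": preceded by "oo ", not "foo"
```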

If you need it with a variable width, use the below global solution and put a break at the end of the if stanza. (This limitation is quite common. .NET, vim, and JGsoft are the only regex engines that support variable width lookbehind. PCRE, PHP, and Perl are limited to fixed width. Python requires an alternate regex module to support this. That said, the logic to the workaround below should work for all languages that support regex.)

Matching (global)

When you need to loop over each match in a given string (the g modifier, global matching), you have to reassign the matcher variable in each loop iteration, and you must use RegExp.exec() with the RegExp created before the loop. String.match() interprets the global modifier differently (it returns all matches at once instead of stepping through them), so using it as the loop condition will create an infinite loop!

Global positive lookbehind:

var re = /foo(bar)/gi;  // from /(?<=foo)bar/gi
var matcher;            // declared up front so it is not an implicit global
while ( (matcher = re.exec(mystring)) !== null ) {
  // do stuff with matcher[1] which is the part that matches "bar"
}

"Stuff" may of course include populating an array for further use.
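As a sketch of what populating an array could look like, the function below collects each "bar" that follows "foo" along with its position in the string. `findBarAfterFoo` and the shape of the result objects are assumptions made for this example.

```javascript
// Hypothetical helper: every "bar" that follows "foo", emulating /(?<=foo)bar/gi.
function findBarAfterFoo(str) {
  var re = /foo(bar)/gi; // created once, outside the loop
  var results = [];
  var matcher;
  while ((matcher = re.exec(str)) !== null) {
    results.push({
      text: matcher[1],
      // matcher.index points at "foo"; skip past it to where "bar" starts.
      index: matcher.index + matcher[0].length - matcher[1].length
    });
  }
  return results;
}

console.log(findBarAfterFoo("foobar FooBar bar"));
// [ { text: "bar", index: 3 }, { text: "Bar", index: 10 } ]
```

The index arithmetic is needed because the workaround's match starts at "foo", whereas a real lookbehind match would start at "bar".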

Global negative lookbehind:

var re = /(foo)?bar/gi;  // from /(?<!foo)bar/gi
var matcher;             // declared up front so it is not an implicit global
while ( (matcher = re.exec(mystring)) !== null ) {
  if (!matcher[1]) {
    // do stuff with matcher[0] ("bar"), which does not follow "foo"
  }
}

Note that there are cases in which this will not fully represent the negative lookbehind. Consider /(?<!ba)ll/g matching against Fall ball bill balll llama. It will find only three of the desired four matches because when it parses balll, it consumes ball and then resumes one character too late, at l llama. This only occurs when a partial match at the end could interfere with a partial match at a different end (balll breaks (ba)?ll but foobarbar is fine with (foo)?bar). The only solution to this is to use the above fixed width method.
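The missed match can be reproduced with a short script. This sketch records the index of each "ll" the workaround accepts and, where the engine supports it, compares against a native lookbehind; `workaroundIndexes` and `nativeIndexes` are hypothetical names used only here.

```javascript
var sample = "Fall ball bill balll llama";

// Workaround for /(?<!ba)ll/g: optional capture, skip hits where it matched.
function workaroundIndexes(str) {
  var re = /(ba)?ll/g;
  var found = [];
  var matcher;
  while ((matcher = re.exec(str)) !== null) {
    if (!matcher[1]) {
      found.push(matcher.index); // no "ba" consumed, so this is where "ll" starts
    }
  }
  return found;
}

// Reference result from a native lookbehind, or null if the engine lacks it.
function nativeIndexes(str) {
  var re;
  try {
    re = new RegExp("(?<!ba)ll", "g");
  } catch (e) {
    return null;
  }
  var found = [];
  var matcher;
  while ((matcher = re.exec(str)) !== null) {
    found.push(matcher.index);
  }
  return found;
}

console.log(workaroundIndexes(sample)); // [2, 12, 21]: the "ll" at index 18 is missed
console.log(nativeIndexes(sample));     // [2, 12, 18, 21] where lookbehind is supported
```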

Replacing

Mimicking Lookbehind in JavaScript is a great article that describes how to do this.
It even has a follow-up that points to a collection of short functions that implement this in JS.

Implementing lookbehind in String.replace() is much easier since you can create an anonymous function as the replacement and handle the lookbehind logic in that function.

These work on the first match but can be made global by merely adding the g modifier.

Positive lookbehind replacement:

// assuming you wanted mystring.replace(/(?<=foo)bar/i, "baz"):
mystring = mystring.replace( /(foo)?bar/i,
  function ($0, $1) { return ($1 ? $1 + "baz" : $0) }
);

This takes the target string and replaces instances of bar with baz so long as they follow foo. If they do, $1 is matched and the ternary operator (?:) returns the matched text and the replacement text (but not the bar part). Otherwise, the ternary operator returns the original text.

Negative lookbehind replacement:

// assuming you wanted mystring.replace(/(?<!foo)bar/i, "baz"):
mystring = mystring.replace( /(foo)?bar/i,
  function ($0, $1) { return ($1 ? $0 : "baz") }
);

This is essentially the same, but since it's a negative lookbehind, it acts when $1 is missing (we don't need to say $1 + "baz" here because we know $1 is empty).

This has the same caveat as the other dynamic-width negative lookbehind workaround and is similarly fixed by using the fixed width method.
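Putting the two replacement callbacks side by side with the g modifier added, a minimal sketch might look like the following. The function names and the `replacement` parameter are hypothetical, chosen only for this illustration.

```javascript
// Emulates str.replace(/(?<=foo)bar/gi, replacement)
function replaceBarAfterFoo(str, replacement) {
  return str.replace(/(foo)?bar/gi, function ($0, $1) {
    // $1 is set only when "foo" came first: keep it and swap out "bar".
    return $1 ? $1 + replacement : $0;
  });
}

// Emulates str.replace(/(?<!foo)bar/gi, replacement)
function replaceBarNotAfterFoo(str, replacement) {
  return str.replace(/(foo)?bar/gi, function ($0, $1) {
    // $1 is undefined when "foo" did not come first: replace the whole match.
    return $1 ? $0 : replacement;
  });
}

console.log(replaceBarAfterFoo("foobar bar FOOBAR", "baz"));    // "foobaz bar FOObaz"
console.log(replaceBarNotAfterFoo("foobar bar FOOBAR", "baz")); // "foobar baz FOOBAR"
```

Because `$1` is re-emitted as matched, the original casing of "foo" is preserved in the output.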

Adam Katz
  • The original question had refinements listed elsewhere (comments to the question and another answer). I also answered that refined version [here](https://stackoverflow.com/revisions/35143111/12) (scroll down to "Your specific use case") in an older version of this answer but have since removed it to make the answer more succinct and more applicable to the actual question. – Adam Katz May 11 '16 at 21:35
  • About the note in **Global matching lookahead**: To avoid the *interference* problem, with `(? – Casimir et Hippolyte Jul 01 '19 at 14:23
  • About **Fixed width negative lookbehind match**: the pattern `(?!foo)(?:^.{0,2}|.{3})(bar)` is better written like that: `(?:^.{0,2}|(?!foo).{3})(bar)` (in addition, imagine that instead of *foo/bar* you have to deal with *foo/oar* in a string starting with `fooar`). – Casimir et Hippolyte Jul 01 '19 at 14:35
  • *"PCRE, PHP, and Perl are limited to fixed width"* : PHP uses the PCRE regex engine. Also note that alternations of fixed width subpatterns are possible: `(?<=ab|abc|abcd)` – Casimir et Hippolyte Oct 30 '19 at 21:29
  • @CasimiretHippolyte – Yes, hover over the PHP link and you'll see I've mentioned that it uses libpcre. You can indeed use alternations in fixed-width lookbehinds so long as the alternations all have the same width, but differing widths in alternations won't work in all engines. I have not vetted your other comments, but a forward lookahead in a global match may have issues with iterating. – Adam Katz Oct 30 '19 at 22:02
  • great post, super helpful, thank you – Geoffrey Hale Mar 18 '21 at 21:47

Here is a way to parse an HTML string using the DOM in JS and perform replacements only outside of tags:

var s = '<span class="css">55</span> 2 >= 1 2 > 1';
var doc = document.createDocumentFragment();
var wrapper = document.createElement('myelt');
wrapper.innerHTML = s;
doc.appendChild( wrapper );

function textNodesUnder(el) {
  var n, walk = document.createTreeWalker(el, NodeFilter.SHOW_TEXT, null, false);
  while ((n = walk.nextNode())) {
    // Only touch text nodes that are direct children of the wrapper element
    if (n.parentNode.nodeName.toLowerCase() === 'myelt')
      n.nodeValue = n.nodeValue.replace(/>=?/g, "EQUAL");
  }
  return el.firstChild.innerHTML;
}
var res = textNodesUnder(doc);
console.log(res);
alert(res);
Wiktor Stribiżew
  • thanks. Could you make the demo working in regex101? https://regex101.com/r/fH0nF3/1 – Jackie Feb 01 '16 at 23:55
  • Demo of the `>=?` regex? [Here it is](https://regex101.com/r/fH0nF3/2) – Wiktor Stribiżew Feb 01 '16 at 23:56
  • Sorry, I need more complicated test case. Please see [this](https://regex101.com/r/fH0nF3/3) – Jackie Feb 02 '16 at 00:00
  • It is common knowledge that HTML in JS should be processed with a DOM parser, and regex should be run against the text nodes only. See [*RegEx match open tags except XHTML self-contained tags*](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags). – Wiktor Stribiżew Feb 02 '16 at 00:08