152

Is there a way to achieve the equivalent of a negative lookbehind in JavaScript regular expressions? I need to match a string that does not start with a specific set of characters.

It seems I am unable to find a regex that does this without failing if the matched part is found at the beginning of the string. Negative lookbehinds seem to be the only answer, but JavaScript doesn't have them.

EDIT: This is the regex that I would like to work, but it doesn't:

(?<!([abcdefg]))m

So it would match the 'm' in 'jim' or 'm', but not 'jam'.

Andrew Ensley
  • Consider posting the regex as it would look with a negative lookbehind; that may make it easier to respond. – Daniel LeCheminant Mar 13 '09 at 03:53
  • Those who want to track the lookbehind etc. adoption please refer to [ECMAScript 2016+ compatibility table](http://kangax.github.io/compat-table/es2016plus/) – Wiktor Stribiżew Aug 29 '19 at 14:03
  • @WiktorStribiżew : Look-behinds were added in the 2018 spec. Chrome supports them, but [Firefox still hasn't implemented the spec](https://bugzilla.mozilla.org/show_bug.cgi?id=1225665). – Lonnie Best Dec 08 '19 at 00:32
  • Does this even need a look behind? What about `(?:[^abcdefg]|^)(m)`? As in `"mango".match(/(?:[^abcdefg]|^)(m)/)[1]` – slebetman May 22 '20 at 06:30

12 Answers

91

Since ES2018, lookbehind assertions are part of the ECMAScript language specification.

// positive lookbehind
(?<=...)
// negative lookbehind
(?<!...)
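With native lookbehind, the regex from the question works as written (the capturing group around [abcdefg] is not needed). A minimal sketch, assuming an ES2018-capable engine such as a recent Node.js or browser; notAfterAtoG is just an illustrative name:

```javascript
// Negative lookbehind: an "m" that is NOT immediately preceded by a-g.
// Requires an engine with ES2018 lookbehind support.
const notAfterAtoG = /(?<![abcdefg])m/;

for (const word of ['jim', 'm', 'jam']) {
  // The lookbehind is zero-width, so when it matches, the match is just "m".
  console.log(word, notAfterAtoG.test(word));
}
// jim true
// m true
// jam false
```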

Answer pre-2018

As JavaScript supports negative lookahead, one way to do it is:

  1. reverse the input string

  2. match with a reversed regex

  3. reverse and reformat the matches


const reverse = s => s.split('').reverse().join('');

const test = (stringToTests, reversedRegexp) => stringToTests
  .map(reverse)
  .forEach((s,i) => {
    const match = reversedRegexp.test(s);
    console.log(stringToTests[i], match, 'token:', match ? reverse(reversedRegexp.exec(s)[0]) : 'Ø');
  });

Example 1:

Following @andrew-ensley's question:

test(['jim', 'm', 'jam'], /m(?!([abcdefg]))/)

Outputs:

jim true token: m
m true token: m
jam false token: Ø

Example 2:

Following @neaumusic's comment (match max-height but not line-height, with height as the token):

test(['max-height', 'line-height'], /thgieh(?!(-enil))/)

Outputs:

max-height true token: height
line-height false token: Ø
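Step 3 (reverse and reformat the matches) can also recover where each token sits in the original string. A sketch, assuming a modern engine for String.prototype.matchAll; matchAllReversed is a hypothetical helper, the answer's reverse is redefined so the snippet runs on its own, and the reversed regex needs the g flag:

```javascript
// Same reverse helper as in the answer.
const reverse = s => s.split('').reverse().join('');

// reversedRegexp is written for the reversed string and must carry the g flag
// (matchAll throws a TypeError for a non-global RegExp).
function matchAllReversed(str, reversedRegexp) {
  const rev = reverse(str);
  const results = [];
  for (const m of rev.matchAll(reversedRegexp)) {
    // A match covering [i, i + len) in rev covers
    // [str.length - i - len, str.length - i) in str.
    results.push({
      token: reverse(m[0]),
      index: str.length - m.index - m[0].length,
    });
  }
  // Matches were found right-to-left in str; restore left-to-right order.
  return results.reverse();
}

console.log(matchAllReversed('jim m jam mom', /m(?![abcdefg])/g));
console.log(matchAllReversed('max-height line-height', /thgieh(?!-enil)/g));
```

For 'jim m jam mom' this should report indices 2, 4, 10 and 12, the same positions /(?<![abcdefg])m/g finds directly, and for the second call a single height token at index 4. Because the reversed string is scanned from the other end, overlapping or adjacent candidates can still pair up differently from a real lookbehind (see Wiktor Stribiżew's comment below).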
JBE
  • the problem with this approach is that it doesn't work when you have both lookahead and lookbehind – kboom Oct 09 '14 at 11:25
  • can you please show a working example, say I want to match `max-height` but not `line-height` and i only want the match to be `height` – neaumusic May 14 '15 at 22:27
  • It does not help if the task is to replace two consecutive identical symbols (and no more than 2) that are not preceded with some symbol. `''(?!\()` will replace the apostrophes in `''(''test'''''''test` from the other end, thus leaving `(''test'NNNtest` rather than `(''testNNN'test`. – Wiktor Stribiżew Apr 25 '16 at 11:20
75

Lookbehind assertions were accepted into the ECMAScript specification in 2018.

Positive lookbehind usage:

console.log(
  "$9.99  €8.47".match(/(?<=\$)\d+\.\d*/) // Matches "9.99"
);

Negative lookbehind usage:

console.log(
  "$9.99  €8.47".match(/(?<!\$)\d+\.\d*/) // Matches "8.47"
);
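Engine support arrived at different times (the comments on the question note that Firefox lagged behind Chrome), so if older engines matter, a runtime check is one option. A minimal sketch; supportsLookbehind is a hypothetical helper, not a built-in:

```javascript
// A regex *literal* containing (?<! ... ) is a SyntaxError at parse time in
// engines without lookbehind, which would break the whole script, so the
// pattern is compiled from a string inside try/catch instead.
function supportsLookbehind() {
  try {
    new RegExp('(?<!\\$)\\d');
    return true;
  } catch (e) {
    return false; // engines without lookbehind throw a SyntaxError here
  }
}

const price = supportsLookbehind()
  ? '$9.99  €8.47'.match(new RegExp('(?<!\\$)\\d+\\.\\d*'))[0]
  : null; // here you would fall back to one of the lookbehind-free workarounds

console.log(price);
```

In an engine with lookbehind this logs 8.47, as in the negative lookbehind example above.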


Okku
67

Suppose you want to find every "int" that is not preceded by "unsigned ":

With support for negative look-behind:

(?<!unsigned )int

Without support for negative look-behind:

((?!unsigned ).{9}|^.{0,8})int

The basic idea is to grab the n preceding characters and exclude the unwanted ones with a negative lookahead, while also matching the cases where fewer than n characters precede the target, i.e. near the start of the string (n being the length of the lookbehind).

So the regex in question:

(?<!([abcdefg]))m

would translate to:

((?!([abcdefg])).|^)m

You might need to use capturing groups to find the exact position of the part of the string that interests you, or if you want to replace only that part with something else.
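A small sketch of that capturing-group step with the question's sample words; the [M] marker and the variable names are arbitrary:

```javascript
// Group 1 is the character in front of "m" (or "" at the start of the string).
// The inner ([abcdefg]) sits inside a negative lookahead, so it never
// captures anything; only $1 is needed.
const re = /((?!([abcdefg])).|^)m/g;

// Put group 1 back so only the "m" itself is replaced.
console.log('jim m jam'.replace(re, '$1[M]')); // ji[M] [M] jam
console.log('m'.replace(re, '$1[M]'));         // [M]

// Position of each "m" = start of the whole match + length of group 1.
const positions = [...'jim m jam'.matchAll(re)].map(m => m.index + m[1].length);
console.log(positions); // [ 2, 4 ]
```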

nhahtdh
Kamil Szot
  • This should be the correct answer. See: `"So it would match the 'm' in 'jim' or 'm', but not 'jam'".replace(/(j(?!([abcdefg])).|^)m/g, "$1[MATCH]")` returns `"So it would match the 'm' in 'ji[MATCH]' or 'm', but not 'jam'"` It is pretty simple and it works! – Asrail Aug 19 '15 at 02:21
  • Brilliant! Use a negative look-ahead as a workaround for older JavaScript! – Peter Thoeny Aug 15 '20 at 03:42
  • can you help me to solve this also: /\B(? – Ehsan sarshar Apr 26 '21 at 15:52
41

Mijoja's strategy works for your specific case but not in general:

js>newString = "Fall ball bill balll llama".replace(/(ba)?ll/g,
   function($0,$1){ return $1?$0:"[match]";});
Fa[match] ball bi[match] balll [match]ama

Here's an example where the goal is to match a double-l but not if it is preceded by "ba". Note the word "balll" -- true lookbehind should have suppressed the first 2 l's but matched the 2nd pair. But by matching the first 2 l's and then ignoring that match as a false positive, the regexp engine proceeds from the end of that match, and ignores any characters within the false positive.
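For comparison, a sketch assuming an engine with ES2018 lookbehind, run on the same input; the variable names are illustrative. In "balll" the lookbehind version skips the first pair (preceded by "ba") but still matches the second pair, which the capture-and-ignore approach misses:

```javascript
const input = 'Fall ball bill balll llama';

// Real negative lookbehind: every "ll" not immediately preceded by "ba".
const viaLookbehind = input.replace(/(?<!ba)ll/g, '[match]');
console.log(viaLookbehind); // Fa[match] ball bi[match] bal[match] [match]ama

// The capture-and-ignore approach from above, for comparison.
const viaCallback = input.replace(/(ba)?ll/g, ($0, $1) => ($1 ? $0 : '[match]'));
console.log(viaCallback); // Fa[match] ball bi[match] balll [match]ama
```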

Jason S
  • Ah, you are correct. However, this is a lot closer than I was before. I can accept this until something better comes along (like javascript actually implementing lookbehinds). – Andrew Ensley Mar 13 '09 at 15:38
33

Use

newString = string.replace(/([abcdefg])?m/, function($0,$1){ return $1?$0:'m';});
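Since the comments object that this line leaves the string unchanged, here is a sketch of the same callback used to record where the qualifying m's are. indicesOfM is a hypothetical name; it relies on the replace callback receiving (match, p1, offset) and on an unmatched optional group being passed as undefined. Jason S's caveat about longer targets still applies:

```javascript
// Collect the index of every "m" not preceded by a-g. When group 1 captured
// a letter, the "m" is a false positive and is skipped.
function indicesOfM(str) {
  const indices = [];
  str.replace(/([abcdefg])?m/g, (match, before, offset) => {
    if (before === undefined) indices.push(offset); // bare "m" at offset
    return match; // leave the text unchanged; we only record positions
  });
  return indices;
}

console.log(indicesOfM('jim m jam mom')); // [ 2, 4, 10, 12 ]
```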
Mijoja
    This doesn't do anything: `newString` will always equal `string`. Why so many upvotes? – MikeM Jan 03 '13 at 19:26
  • @MikeM: because the point is simply to demonstrate a matching technique. – bug Mar 12 '13 at 02:59
  • @bug. A demonstration that doesn't do anything is a strange kind of demonstration. The answer comes across as if it was just copy and pasted without any understanding of how it works. Thus the lack of accompanying explanation and the failure to demonstrate that anything has been matched. – MikeM Mar 12 '13 at 15:41
  • @MikeM: the rule of SO is, if it answers the question *as written*, it's correct. OP didn't specify a use case – bug Mar 26 '13 at 01:34
  • The concept is correct, but yes it's not demo'd very well. Try running this in the JS console... `"Jim Jam Momm m".replace(/([abcdefg])?m/g, function($0, $1){ return $1 ? $0 : '[match]'; });`. It should return `Ji[match] Jam Mo[match][match] [match]`. But also note that as Jason mentioned below, it can fail on certain edge cases. – Simon East Sep 16 '14 at 04:24
  • I've written [a more comprehensive answer](https://stackoverflow.com/questions/35142364/regex-negative-lookbehind-not-valid-in-javascript/35143111#35143111 "regex Negative Lookbehind not valid in javascript") with full descriptions of how to match and replace with both positive and negative lookbehinds. Now that I've found this question, I've also flagged the question I answered as a duplicate of this one. – Adam Katz Feb 09 '16 at 01:29
11

You could define a non-capturing group by negating your character set:

(?:[^a-g])m

...which would match every m NOT preceded by any of those letters.
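A quick check of what this pattern actually matches, and of the (?:[^a-g]|^)(m) variant from Johny Skovdal's comment below; original and fixed are illustrative names:

```javascript
const original = /(?:[^a-g])m/;
console.log('jim'.match(original)[0]); // im   (the preceding "i" is part of the match)
console.log('m'.match(original));      // null (no character in front of "m")

// Variant from the comments: also allow the start of the string, and
// capture the "m" so it can be told apart from the consumed character.
const fixed = /(?:[^a-g]|^)(m)/;
for (const s of ['jim', 'm', 'jam']) {
  const r = s.match(fixed);
  // Group 1 is always the last character of the match, so its index is:
  console.log(s, r ? r.index + r[0].length - 1 : -1);
}
// jim 2
// m 0
// jam -1
```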

Alan Moore
Klemen Slavič
  • I think the match would actually also cover the preceding character. – Sam Sep 28 '13 at 06:37
  • ^ this is true. A character class represents...a character! All your non-capturing group is doing is not making that value available in a replace context. Your expression is not saying "every m NOT preceded by any of those letters" it is saying "every m *preceded by a character* that is NOT any of those letters" – theflowersoftime Mar 13 '14 at 22:13
  • For the answer to also solve the original problem (beginning of string), it must also include an option, so the resulting regex would be `(?:[^a-g]|^)m`. See https://regex101.com/r/jL1iW6/2 for running example. – Johny Skovdal Jan 18 '16 at 22:35
  • Using void logic does not always have the desired effect. – GoldBishop Aug 09 '17 at 13:49
2

This is how I achieved str.split(/(?<!^)@/) for Node.js 8 (which doesn't support lookbehind):

str.split('').reverse().join('').split(/@(?!$)/).map(s => s.split('').reverse().join('')).reverse()

Works? Yes (unicode untested). Unpleasant? Yes.
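The one-liner wrapped in a function, as a sketch; splitAtNotAtStart and reverseStr are hypothetical names. It reverses with Array.from, which iterates by code point, so an astral character such as an emoji stays intact in the reversed string (split('') would swap its surrogate pair there, which only matters if the reversed regex has to match that character):

```javascript
// Reverse by code point rather than by UTF-16 code unit.
const reverseStr = s => Array.from(s).reverse().join('');

// Emulates str.split(/(?<!^)@/): split on "@" except at the very start.
function splitAtNotAtStart(str) {
  return reverseStr(str)
    .split(/@(?!$)/) // in the reversed string, "start" became "end"
    .map(reverseStr) // restore each piece
    .reverse();      // restore the order of the pieces
}

console.log(splitAtNotAtStart('@a@b')); // [ '@a', 'b' ]
console.log('@a@b'.split(/(?<!^)@/));   // [ '@a', 'b' ]  (native, ES2018+)
```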

Fishrock123
1

Following the idea of Mijoja, and drawing on the problems exposed by Jason S, I had this idea. I checked it a bit but am not sure of myself, so verification by someone more expert in JS regex than me would be great :)

var re = /(?=(..|^.?)(ll))/g
         // matches empty string position
         // whenever this position is followed by
         // a string of length equal or inferior (in case of "^")
         // to "lookbehind" value
         // + actual value we would want to match

,   str = "Fall ball bill balll llama"

,   str_done = str
,   len_difference = 0
,   doer = function (where_in_str, to_replace)
    {
        str_done = str_done.slice(0, where_in_str + len_difference)
        +   "[match]"
        +   str_done.slice(where_in_str + len_difference + to_replace.length)

        len_difference = str_done.length - str.length
            /*  if str smaller:
                    len_difference will be positive
                else will be negative
            */

    }   /*  the actual function that would do whatever we want to do
            with the matches;
            this above is only an example from Jason's */



        /*  function input of .replace(),
            only there to test the value of $behind
            and if negative, call doer() with interesting parameters */
,   checker = function ($match, $behind, $after, $where, $str)
    {
        if ($behind !== "ba")
            doer
            (
                $where + $behind.length
            ,   $after
                /*  one will choose the interesting arguments
                    to give to the doer, it's only an example */
            )
        return $match // empty string anyhow, but well
    }
str.replace(re, checker)
console.log(str_done)

My output:

Fa[match] ball bi[match] bal[match] [match]ama

The principle is to call checker at every position in the string (between any two characters) that is the starting point of:

--- any substring with the length of what is not wanted (here 'ba', hence ..), assuming that length is known; otherwise it is probably harder to do

--- --- or a shorter one if the position is the beginning of the string: ^.?

and, following this,

--- what is actually sought (here 'll').

Each call to checker tests whether the value before ll is something other than what we want to exclude (!== 'ba'). If so, it calls another function, doer, which makes the changes to the string (in str_done) if that is the goal or, more generally, receives as input the data needed to process the results of scanning str manually.

Here we change the string, so we need to keep track of the difference in length in order to offset the positions given by replace, which are all calculated on str, which itself never changes.

Since primitive strings are immutable, we could have used the variable str to store the result of the whole operation, but I thought the example, already complicated by the replacements, would be clearer with a separate variable (str_done).

I guess the performance must be pretty poor: the engine tries the lookahead at every position of str and performs a pointless replacement of '' with '' wherever it succeeds, plus the manual replacement in doer, which means a lot of slicing. In this specific case the work could probably be grouped, by cutting the string only once into pieces around the places where [match] goes and .join()ing them with [match] itself.

The other thing is that I don't know how it would handle more complex cases, that is, complex values for the fake lookbehind; their length is perhaps the most problematic thing to determine.

And in checker, if there are several unwanted values for $behind, we will have to test it with yet another regex to know whether or not it is something we want to avoid. That regex is best created (cached) outside checker, so that the same regex object is not recreated on every call.

I hope I've been clear; if not, don't hesitate to ask and I'll try to explain better. :)
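A sketch of the same scan that applies the performance idea above: collect the positions first, then build the output once with slice() and join(). replaceLlNotAfterBa is a hypothetical name, and skipping pairs that overlap the previous replacement is my own assumption, added so the result behaves like a replace with the g flag:

```javascript
function replaceLlNotAfterBa(str) {
  // Zero-width match at every position followed by (2 chars, or fewer at
  // the start of the string) + "ll"; group 1 is the fake lookbehind.
  const re = /(?=(..|^.?)(ll))/g;
  const pieces = [];
  let last = 0; // end of the previously replaced "ll"
  for (const m of str.matchAll(re)) {
    const behind = m[1];
    const start = m.index + behind.length; // where "ll" begins
    // Skip "ball"-style false positives and pairs overlapping the last one.
    if (behind === 'ba' || start < last) continue;
    pieces.push(str.slice(last, start));
    last = start + m[2].length;
  }
  pieces.push(str.slice(last));
  return pieces.join('[match]');
}

console.log(replaceLlNotAfterBa('Fall ball bill balll llama'));
// Fa[match] ball bi[match] bal[match] [match]ama
```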

1

Using your case: if you want to replace m with something, e.g. convert it to an uppercase M, you can negate the character set inside a capturing group.

Match ([^a-g])m and replace with $1M:

"jim jam".replace(/([^a-g])m/g, "$1M")
// "jiM jam"

([^a-g]) matches any character not (^) in the range a-g and stores it in the first capturing group, so you can access it with $1.

So we find im in jim and replace it with iM, which results in jiM.
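Two limitations worth checking, as a sketch: an m at the very start of the string has no character for ([^a-g]) to match, which adding ^ as an alternative fixes ($1 is then the empty string); and because the preceding character is consumed, a run like mm still loses its second m:

```javascript
console.log('mom jim jam'.replace(/([^a-g])m/g, '$1M'));   // moM jiM jam
console.log('mom jim jam'.replace(/(^|[^a-g])m/g, '$1M')); // MoM jiM jam

// The first "m" is consumed by the first match, so it cannot serve as the
// prefix of the second one.
console.log('mm'.replace(/(^|[^a-g])m/g, '$1M')); // Mm
console.log('mm'.replace(/(?<![a-g])m/g, 'M'));   // MM (native lookbehind, ES2018+)
```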

Traxo
1

As mentioned before, JavaScript allows lookbehinds now. In older browsers you still need a workaround.

I'd bet my head there is no regex without lookbehind that delivers exactly the same result. All you can do is work with groups. Suppose you have a regex (?<!Before)Wanted, where Wanted is the regex you want to match and Before is the regex that rules out what must not precede the match. The best you can do is negate the regex Before and use the regex NotBefore(Wanted). The desired result is then the first group, $1.

In your case Before=[abcdefg], which is easy to negate: NotBefore=[^abcdefg]. So the regex would be [^abcdefg](m). If you need the position of Wanted, you must group NotBefore too, so that the desired result is the second group.

If matches of the Before pattern have a fixed length n, that is, if the pattern contains no repetition tokens, you can avoid negating the Before pattern and use the regular expression (?!Before).{n}(Wanted), but you still have to use the first group; alternatively, use the regular expression (?!Before)(.{n})(Wanted) and take the second group. In this example the Before pattern does have a fixed length, namely 1, so use the regex (?![abcdefg]).(m) or (?![abcdefg])(.)(m). If you are interested in all matches, add the g flag; see my code snippet:

function TestSORegEx() {
  var s = "Donald Trump doesn't like jam, but Homer Simpson does.";
  var reg = /(?![abcdefg])(.{1})(m)/gm;
  var out = "Matches and groups of the regex " + 
            "/(?![abcdefg])(.{1})(m)/gm in \ns = \"" + s + "\"";
  var match = reg.exec(s);
  while(match) {
    var start = match.index + match[1].length;
    out += "\nWhole match: " + match[0] + ", starts at: " + match.index
        +  ". Desired match: " + match[2] + ", starts at: " + start + ".";   
    match = reg.exec(s);
  }
  out += "\nResulting string after statement s.replace(reg, \"$1*$2*\")\n"
         + s.replace(reg, "$1*$2*");
  alert(out);
}
0

This effectively does it

"jim".match(/[^a-g]m/)
> ["im"]
"jam".match(/[^a-g]m/)
> null

Search and replace example

"jim jam".replace(/([^a-g])m/g, "$1M")
> "jiM jam"

Note that the negative look-behind string must be 1 character long for this to work.

Curtis Yallop
  • Not quite. In "jim", I don't want the "i"; just the "m". And `"m".match(/[^a-g]m/)` yields `null` as well. I want the "m" in that case too. – Andrew Ensley Apr 13 '16 at 14:14
-1

/(?![abcdefg])[^abcdefg]m/gi. Yes, this is a trick.

Techsin