
I found these constructs in the body of a regex, but I haven't got a clue what I can use them for. Does somebody have examples that would help me understand how they work?

(?!) - negative lookahead
(?=) - positive lookahead
(?<=) - positive lookbehind
(?<!) - negative lookbehind

(?>) - atomic group
grenierm5
Spidfire
  • Why doesn't the regex website have a simple table like this? Instead, they only have blocks of explanatory text. http://www.regular-expressions.info/lookaround.html – Whitecat Aug 22 '16 at 17:30
  • @Whitecat Try: https://regex101.com http://www.regexr.com – Andrew Mar 28 '17 at 14:18

3 Answers


Examples

Given the string foobarbarfoo:

bar(?=bar)     finds the 1st bar ("bar" which has "bar" after it)
bar(?!bar)     finds the 2nd bar ("bar" which does not have "bar" after it)
(?<=foo)bar    finds the 1st bar ("bar" which has "foo" before it)
(?<!foo)bar    finds the 2nd bar ("bar" which does not have "foo" before it)

You can also combine them:

(?<=foo)bar(?=bar)    finds the 1st bar ("bar" with "foo" before it and "bar" after it)
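The examples above can be checked with a short sketch in Python (the `re` module is an assumed flavor here; the answer itself is flavor-neutral). It records the span of each match in `foobarbarfoo`: span (3, 6) is the 1st bar and (6, 9) is the 2nd.

```python
import re

s = "foobarbarfoo"

# The five patterns from the examples above
patterns = [
    r"bar(?=bar)",          # bar followed by bar
    r"bar(?!bar)",          # bar not followed by bar
    r"(?<=foo)bar",         # bar preceded by foo
    r"(?<!foo)bar",         # bar not preceded by foo
    r"(?<=foo)bar(?=bar)",  # both conditions at once
]

# re.search returns the leftmost match; its span tells which "bar" was found
spans = {p: re.search(p, s).span() for p in patterns}

for p, span in spans.items():
    which = "1st" if span == (3, 6) else "2nd"
    print(f"{p:22} -> {which} bar at {span}")
```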

Definitions

Look ahead positive (?=)

Find expression A where expression B follows:

A(?=B)

Look ahead negative (?!)

Find expression A where expression B does not follow:

A(?!B)

Look behind positive (?<=)

Find expression A where expression B precedes:

(?<=B)A

Look behind negative (?<!)

Find expression A where expression B does not precede:

(?<!B)A
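The four definitions can be written as a minimal Python sketch; the builder functions below are hypothetical helpers for illustration, not part of any library. Note that only A ends up in the matched text: B is checked but not consumed.

```python
import re

# Hypothetical helpers: build each lookaround pattern from sub-expressions A and B
def pos_ahead(a: str, b: str) -> str:
    return f"{a}(?={b})"    # A where B follows

def neg_ahead(a: str, b: str) -> str:
    return f"{a}(?!{b})"    # A where B does not follow

def pos_behind(a: str, b: str) -> str:
    return f"(?<={b}){a}"   # A where B precedes

def neg_behind(a: str, b: str) -> str:
    return f"(?<!{b}){a}"   # A where B does not precede

ahead = re.search(pos_ahead("foo", "bar"), "foobar")
behind = re.search(pos_behind("foo", "bar"), "barfoo")
print(ahead.group(), ahead.span())    # foo (0, 3): "bar" is not part of the match
print(behind.group(), behind.span())  # foo (3, 6)
print(re.search(neg_ahead("foo", "bar"), "foobar"))   # None
print(re.search(neg_behind("foo", "bar"), "barfoo"))  # None
```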

Atomic groups (?>)

Once the first matching alternative inside an atomic group has matched, the engine exits the group and throws away the remaining alternatives (backtracking into the group is disabled).

  • (?>foo|foot)s applied to foots will match its 1st alternative foo, then fail as s does not immediately follow, and stop as backtracking is disabled

A non-atomic group will allow backtracking; if subsequent matching ahead fails, it will backtrack and use alternative patterns until a match for the entire expression is found or all possibilities are exhausted.

  • (foo|foot)s applied to foots will:

    1. match its 1st alternative foo, then fail as s does not immediately follow in foots, and backtrack to its 2nd alternative;
    2. match its 2nd alternative foot, then succeed as s immediately follows in foots, and stop.
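A minimal Python sketch of the foots example. Python's `re` only accepts the `(?>...)` syntax from Python 3.11 onward, so the sketch also uses the common emulation `(?=(...))\1`, which relies on lookarounds being atomic in Python's `re`: once the lookahead has matched, the engine never goes back to try another alternative inside it.

```python
import re
import sys

text = "foots"

# Ordinary (non-atomic) group: "foo" is tried first; when "s" does not follow,
# the engine backtracks into the group and tries "foot" instead.
normal = re.search(r"(foo|foot)s", text)
print(normal.group())  # foots

# Emulated atomic group: once the lookahead has captured "foo", the
# alternative "foot" is never revisited. \1 then consumes the captured
# text, and the check for "s" fails.
emulated = re.search(r"(?=(foo|foot))\1s", text)
print(emulated)  # None

# Python 3.11+ accepts the atomic-group syntax itself.
if sys.version_info >= (3, 11):
    native = re.search(r"(?>foo|foot)s", text)
    print(native)  # None
```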


SaidbakR
skyfoot
  • What do you mean by "finds the second bar" part? There is only one bar in the expression/string. Thanks – ziggy Feb 08 '14 at 11:22
  • @ziggy the string being tested is "foobarbarfoo". As you can see there are two foo and two bar in the string. – skyfoot Feb 12 '14 at 10:56
  • @ziggy try going to http://pythex.org/ and playing with it a little; you will understand it completely. – stanleyli Mar 30 '15 at 19:09
  • Place two bars side by side, like `barbar`, in the text on which these regexes will be tried. – Obi Wan - PallavJha May 31 '17 at 13:08
  • Can someone explain when one may need an atomic group? If I only need to match with the first alternative, why would I want to give multiple alternatives? – arviman Aug 09 '17 at 12:27
  • @skyfoot or anyone here: I can see that in "(?<=B)A" the lookbehind always comes before the actual match. Does it always have to come before? Can this also be written as "A(?<=B)"? As the names suggest, one looks "behind" and the other looks "ahead". Thanks if anyone can explain. – Chopnut Apr 21 '18 at 00:53
  • **Better explanation about atomic group** at [this answer](https://stackoverflow.com/a/14412277/287948). Can someone edit here to complete this didactic answer? – Peter Krauss Apr 27 '18 at 10:18
  • Just a note that this answer was essential when I ended up on a project that required serious regex chops. This is an excellent, concise explanation of look-arounds. – Tom Coughlin May 23 '19 at 20:49

Lookarounds are zero-width assertions. They check for a regex to the right (lookahead) or to the left (lookbehind) of the current position, succeed or fail depending on whether a match is found (a positive lookaround succeeds on a match, a negative one when there is no match), and discard the matched portion. They don't consume any characters: matching for the regex that follows them (if any) starts at the same cursor position.

Read regular-expressions.info for more details.
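A small Python illustration of "zero width" (Python's `re` is used only as an example flavor): a bare lookahead produces an empty match at each position where it succeeds, and a capturing group inside a lookahead can report overlapping matches, because nothing is consumed.

```python
import re

s = "foobarbarfoo"

# Each match of a bare lookahead is empty; only its position is meaningful
positions = [m.start() for m in re.finditer(r"(?=bar)", s)]
print(positions)  # [3, 6]

# A consuming pattern skips over what it has matched...
consumed = re.findall(r"\w\w", "abcd")
# ...while a lookahead with a capture group is retried at the very next position
overlapping = re.findall(r"(?=(\w\w))", "abcd")
print(consumed)     # ['ab', 'cd']
print(overlapping)  # ['ab', 'bc', 'cd']
```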

  • Positive lookahead:

Syntax:

(?=REGEX_1)REGEX_2

Match only if REGEX_1 matches; after matching REGEX_1, the match is discarded and searching for REGEX_2 starts at the same position.
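A quick Python check of this rule (Python is an assumed flavor here): REGEX_1 and REGEX_2 must both match starting at the same position, which is also why a pattern such as `(?=source)hello`, raised in the comments below this answer, can never match.

```python
import re

# The lookahead checks that "hello" starts here; \w+ then matches from the same spot
m = re.search(r"(?=hello)\w+", "say hello world")
print(m.group(), m.start())  # hello 4

# Both parts start at the same position, so the text there would have to be
# "source" and "hello" at once: impossible, hence no match
print(re.search(r"(?=source)hello", "source...hummhellosource"))  # None
```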

example:

(?=[a-z0-9]{4}$)[a-z]{1,2}[0-9]{2,3}

REGEX_1 is [a-z0-9]{4}$, which matches four lowercase alphanumeric characters followed by the end of the line.
REGEX_2 is [a-z]{1,2}[0-9]{2,3}, which matches one or two lowercase letters followed by two or three digits.

REGEX_1 makes sure that exactly four characters remain up to the end of the line (so the whole string is four characters long when matching starts at the beginning), but it doesn't consume any characters, so the search for REGEX_2 starts at the same location. REGEX_2 then makes sure that the string also follows the other rules. Without the look-ahead, the pattern would also match strings of length three or five.
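A Python sketch of this example. It uses `re.match`, which anchors the pattern at the start of the string; that is an assumption, since the answer doesn't say how the pattern is applied. With an unanchored `search`, a string like `xab12` would still match at offset 1.

```python
import re

pattern = re.compile(r"(?=[a-z0-9]{4}$)[a-z]{1,2}[0-9]{2,3}")
without_lookahead = re.compile(r"[a-z]{1,2}[0-9]{2,3}")

samples = ["ab12", "a123", "a12", "ab123", "abc1"]

# (matches with look-ahead, matches without look-ahead) for each sample
results = {
    s: (pattern.match(s) is not None, without_lookahead.match(s) is not None)
    for s in samples
}

for s, (with_la, without_la) in results.items():
    print(f"{s:6} with look-ahead: {with_la}, without: {without_la}")

# Unanchored search: the look-ahead succeeds at offset 1, where 4 chars remain
print(pattern.search("xab12").group())  # ab12
```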

  • Negative lookahead

Syntax:

(?!REGEX_1)REGEX_2

Match only if REGEX_1 does not match; after checking REGEX_1, the search for REGEX_2 starts at the same position.

example:

(?!.*\bFWORD\b)\w{10,30}$

The look-ahead part checks for FWORD in the string and fails if it finds it. If it doesn't find FWORD, the look-ahead succeeds and the following part verifies that the string's length is between 10 and 30 (assuming the pattern is also anchored at the start with ^) and that it contains only word characters a-zA-Z0-9_.
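A Python sketch of the negative look-ahead, with two assumptions that depart from the pattern above: it is anchored with `^` so the length check covers the whole string, and it uses `[\w ]` instead of `\w` so that FWORD can appear as a separate word at all. In a string of at least 10 characters made only of word characters, `\bFWORD\b` could never match, because the only word boundaries are at the two ends.

```python
import re

# Hypothetical variant: anchored with ^ and allowing spaces via [\w ]
pattern = re.compile(r"^(?!.*\bFWORD\b)[\w ]{10,30}$")

def allowed(s: str) -> bool:
    """True if s has 10-30 word chars/spaces and no standalone FWORD."""
    return pattern.match(s) is not None

print(allowed("a perfectly fine line"))   # True
print(allowed("this has FWORD in it"))    # False: the look-ahead finds FWORD
print(allowed("FWORDS is not the word"))  # True: \b rules out partial-word hits
print(allowed("too short"))               # False: only 9 characters
```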

Look-behind is similar to look-ahead, except that it looks behind the current cursor position. Some regex flavors, such as JavaScript at the time of writing, don't support look-behind assertions (ECMAScript 2018 has since added them). Many flavors that do support it (PHP, Python, etc.) require the look-behind portion to have a fixed length.
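A minimal Python sketch of both points (Python is one of the fixed-length flavors mentioned; the price string is made up for illustration): a fixed-width look-behind works, while a variable-width one makes `re.compile` raise `re.error`.

```python
import re

text = "price: $42, qty: 7"

# Fixed-width positive look-behind: digits directly preceded by a "$" sign
dollars = re.findall(r"(?<=\$)\d+", text)
print(dollars)  # ['42']

# A variable-width look-behind such as (?<=\$+) is rejected by Python's re
try:
    re.compile(r"(?<=\$+)\d+")
    variable_width_rejected = False
except re.error as exc:
    variable_width_rejected = True
    print("re.error:", exc)
```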

  • An atomic group basically discards/forgets the remaining tokens (alternatives) in the group once one of them matches. Check this page for examples of atomic groups.
mike
Amarghosh
  • Following your explanation, this does not seem to work in JavaScript: /(?=source)hello/.exec("source...hummhellosource") = null. Is your explanation correct? – Helin Wang Jun 01 '13 at 17:47
  • @HelinWang That explanation is correct. Your regex expects a string that is both source and hello at the same time! – Amarghosh Jun 04 '13 at 11:54
  • @jddxf Care to elaborate? – Amarghosh Oct 04 '16 at 05:19
  • @Amarghosh I agree with "They check for a regex (towards right or left of the current position - based on ahead or behind), succeeds or fails when a match is found (based on if it is positive or negative) and discards the matched portion.". So lookahead should check for a regex towards right of the current position and the syntax of positive lookahead should be x(?=y) – jddxf Oct 05 '16 at 11:28
  • @Amarghosh would `(?=REGEX_1)REGEX_2` only match if `REGEX_2` comes *after* `REGEX_1`? – aandis May 22 '18 at 11:50

Grokking lookaround rapidly.
How do you distinguish lookahead from lookbehind? Take a 2-minute tour with me:

(?=) - positive lookahead
(?<=) - positive lookbehind

Suppose

    A  B  C #in a line

Now we ask B: "Where are you?"
B has two ways to declare its location:

One: B has A ahead of it and C behind it.
Two: B is ahead (lookahead) of C and behind (lookbehind) A.

As we can see, the behind and ahead are opposite in the two solutions.
Regex terminology follows solution Two.
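Applied to the A B C line above (with the spaces dropped, an assumption for brevity), a minimal Python sketch: both patterns find the same B, one by looking ahead to C and the other by looking behind to A.

```python
import re

line = "ABC"

# B looking ahead: match B only if C comes after it
ahead = re.search(r"B(?=C)", line)
# B looking behind: match B only if A comes before it
behind = re.search(r"(?<=A)B", line)
print(ahead.span(), behind.span())  # (1, 2) (1, 2)

# Swapping the directions fails: A is not after B, and C is not before it
print(re.search(r"B(?=A)", line), re.search(r"(?<=C)B", line))  # None None
```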

AbstProcDo