
This is the (simplified) text that I am working on:

# header 1
Lorem ipsum

# random header
dolor si

# header 2
amet

I would like to catch this snippet A:

# header 1
Lorem ipsum

# random header
dolor si

If I use regex a: `# header 1(?:[^#]+|(?!# header 2)#)*`, I get snippet A as expected.

But if I use regex b: `# header 1(?:[^#]*|(?!# header 2)#)*`, I only get snippet B:

# header 1
Lorem ipsum

I would expect to get snippet A in both cases. What is happening in the `*` case that makes the match stop prematurely? The regex flavor is PHP (link to relevant regex101).


FYI: I know there are simpler ways to match this snippet; this pattern makes sense in the unsimplified version. I solved my actual problem (with something like `(?:[^#]|(?!# header 2)#)*+`). Now I am curious to understand why regexes a and b behave differently.

Robin
  • There are 2 reasons, and both are well known: 1) `*` matches **0** or more occurrences, so `[^#]*` can match an empty string before a non-matching char; 2) since the first branch in the alternation group always matches, the second is *never* even tried. – Wiktor Stribiżew Nov 10 '17 at 10:00
  • It's because `*` can satisfy the engine even if the very next character is a `#`, since `*` means ***zero*** or more. – revo Nov 10 '17 at 10:00
  • Ah, basic mistake... Sorry for the duplicate, thanks! – Robin Nov 10 '17 at 10:04
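
For anyone who wants to reproduce the difference described in the comments, here is a minimal sketch. It uses Python's `re` module as a stand-in for PHP's PCRE, which is an assumption: the two engines are not identical, but both are backtracking engines and should treat this pattern the same way. `TEXT`, `REGEX_A`, `REGEX_B` and the `first_match` helper are names made up for the illustration.

```python
import re

# Sample text from the question.
TEXT = (
    "# header 1\n"
    "Lorem ipsum\n"
    "\n"
    "# random header\n"
    "dolor si\n"
    "\n"
    "# header 2\n"
    "amet\n"
)

# Regex a: the first branch needs at least one non-# character,
# so at a "#" it fails and the second branch gets a chance.
REGEX_A = r"# header 1(?:[^#]+|(?!# header 2)#)*"

# Regex b: the first branch can match the empty string, so at a "#"
# it still "succeeds" without consuming anything.
REGEX_B = r"# header 1(?:[^#]*|(?!# header 2)#)*"


def first_match(pattern, text):
    """Return the text of the first match, or None if there is none."""
    found = re.search(pattern, text)
    return found.group(0) if found else None


match_a = first_match(REGEX_A, TEXT)
match_b = first_match(REGEX_B, TEXT)

print(repr(match_a))
print(repr(match_b))
```

If `re` agrees with PCRE here, `match_a` runs up to (but not including) `# header 2`, while `match_b` stops right before `# random header`. In regex b, once `[^#]*` has matched the empty string at that `#`, the group iteration has consumed nothing, so the engine ends the loop instead of trying the second branch, and the overall match is already successful at that point.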
