I'm trying to write a regular expression that matches the text between two strings, but only when that text contains a third string. I can't get the lazy quantifier to cooperate: the input contains multiple instances of these strings, so the regex matches something that isn't useful, i.e.:

Start...End...Start...End...Start...Middle...End

What I'm actually looking for (only one instance of Start and End in each match):

Start...Middle...End or Start...Center...End

I'm pretty sure I need to use lookahead/lookbehind, but while I do conceptually understand them, putting them into practice is really difficult. Here's where I'm at:

/<Start[\s\S]*?(Middle|Center)[\s\S]*?End>/gm
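A minimal reproduction of the overmatch, as a sketch only. It assumes JavaScript (suggested by the `/…/gm` literal) and drops the angle brackets so that the pattern fits the sample line above; `input`, `lazy` and `lazyMatches` are illustrative names.

```javascript
// Sample input from the question; the dots stand in for arbitrary text.
const input = "Start...End...Start...End...Start...Middle...End";

// The lazy pattern, minus the angle brackets (assumption: they are placeholders).
const lazy = /Start[\s\S]*?(Middle|Center)[\s\S]*?End/g;

// The engine tries the leftmost Start first. A lazy quantifier only keeps the
// match short from that fixed starting point, so it happily runs across the
// earlier End/Start pairs until it reaches Middle.
const lazyMatches = input.match(lazy);

console.log(lazyMatches.length);          // 1
console.log(lazyMatches[0] === input);    // true: the whole line is matched
```

So laziness alone cannot skip the first two Start...End pairs; the pattern also has to forbid Start and End before the middle string.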

Joe C
Cake4
  • Just a slight modification would allow your regex to work: `Start[\s\S]*?(Middle|Center)?[\s\S]*?End` - I made `(Middle|Center)?` optional. I would use `\bStart\b.*?(Middle|Center)?.*?\bEnd\b` instead though and turn on the `s` modifier (dot matches newline). Also note it uses `\b` so that you don't accidentally catch something else like `Endow` – ctwheels Nov 14 '17 at 22:09
  • You have added four tags ([tag:java], [tag:c#], [tag:python] and [tag:perl]) which have nothing to do with your question. We take tags seriously on this site, and we ask that you only tag what your question is about. I have since removed them. – Joe C Nov 14 '17 at 22:11
  • Sorry about that, won't happen again. @ctwheels Matching the string in the middle is mandatory, so unfortunately that doesn't cut it. – Cake4 Nov 14 '17 at 22:31
  • @Cake4 my apologies, I misunderstood your question. Wiktor got it though – ctwheels Nov 14 '17 at 22:32

1 Answer

Make use of the tempered greedy token:

Start(?:(?!Start|End)[\s\S])*?(Middle|Center)[\s\S]*?End
     ^^^^^^^^^^^^^^^^^^^^^^^^^ 

See the regex demo

Details

  • Start - a literal string
  • (?:(?!Start|End)[\s\S])*? - any char that does not begin a Start or End sequence, 0+ repetitions, as few as possible
  • (Middle|Center) - Group 1: Middle or Center
  • [\s\S]*? - any 0+ chars, as few as possible
  • End - a literal string
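The pattern above can be tried out as follows. This is a sketch assuming JavaScript; the two test strings and the names `tempered`, `findAll`, `single` and `multiple` are made up for illustration.

```javascript
// The tempered greedy token pattern from the answer, with the global flag
// so that every match in the input is returned.
const tempered = /Start(?:(?!Start|End)[\s\S])*?(Middle|Center)[\s\S]*?End/g;

// Hypothetical helper: collect the full text of every match.
function findAll(text) {
  return [...text.matchAll(tempered)].map((m) => m[0]);
}

// The line from the question: only the last Start...End pair contains Middle.
const single = findAll("Start...End...Start...End...Start...Middle...End");
console.log(single); // [ 'Start...Middle...End' ]

// Illustrative input with two qualifying pairs after one that does not qualify.
const multiple = findAll(
  "Start...End...Start...Middle...End...Start...Center...End"
);
console.log(multiple); // [ 'Start...Middle...End', 'Start...Center...End' ]
```

Note that only the part before (Middle|Center) is tempered. The trailing [\s\S]*? is a plain lazy match, so if a Start could appear between the middle string and its End, that part would probably need the same lookahead.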
Wiktor Stribiżew
  • Thank you very much! I looked for quite a while, but it looks like I wasn't able to formulate my question correctly. This explanation clears it up a bit better for me: https://stackoverflow.com/a/40999431/8941643 – Cake4 Nov 14 '17 at 22:25