I'm not an expert in regular expressions, and I'm having serious trouble matching a particular pattern.

The pattern is:

A sequence of one or more consecutive words, each delimited by a prefix and a suffix. Each word must contain at least one character between the prefix and the suffix.

For example, suppose the prefix is "AB" and the suffix is "YZ". With this input:

AB----YZAB====YZABYZ//AB++YZ,,,AB====YZAB---YZ

The matched groups should be:

AB----YZAB====YZ , AB++YZ , AB====YZAB---YZ

The group ABYZ should not be matched, because it is "empty" (there is nothing between the prefix and the suffix).

I tried with

(AB(.*?)YZ)+

But ABYZ is detected as part of the sequence, because the "*" may match nothing. If I force non-empty groups with

(AB(.+?)YZ)+

I still have no luck; it detects the groups

AB----YZAB====YZABYZ//AB++YZ and AB====YZAB---YZ
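
Both attempts can be reproduced with a short script. This is only a sketch in Python, which is an assumption because the question does not name a language; the helper full_matches is a hypothetical name introduced for illustration.

```python
import re

# Sample input from the question.
text = "AB----YZAB====YZABYZ//AB++YZ,,,AB====YZAB---YZ"

def full_matches(pattern, s):
    """Return the full text of every match (group 0), ignoring capture groups."""
    return [m.group(0) for m in re.finditer(pattern, s)]

# Attempt 1: ".*?" may match nothing, so the empty ABYZ joins the first sequence.
star_matches = full_matches(r"(AB(.*?)YZ)+", text)

# Attempt 2: ".+?" needs one character, but it is free to run across "YZ//AB++",
# so the first match extends through AB++YZ.
plus_matches = full_matches(r"(AB(.+?)YZ)+", text)

print(star_matches)
print(plus_matches)
```

In the second attempt, nothing stops the lazy ".+?" from consuming the "YZ" that follows the third "AB" and continuing until a later "YZ".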

I tried many other, more complex, regExps, with no luck.

Any help would be very appreciated!

1 Answer

You may use

(?:AB(?:(?!AB).)+?YZ)+

See the regex demo.
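
As a quick check, the pattern can be run against the sample input from the question. This is a minimal sketch in Python, chosen here as an assumption since the question does not name a language; the variable names are illustrative only.

```python
import re

# Sample input from the question.
text = "AB----YZAB====YZABYZ//AB++YZ,,,AB====YZAB---YZ"

# The pattern has no capturing groups, so findall returns whole matches.
pattern = re.compile(r"(?:AB(?:(?!AB).)+?YZ)+")
matches = pattern.findall(text)

print(matches)
# ['AB----YZAB====YZ', 'AB++YZ', 'AB====YZAB---YZ']
```

The empty ABYZ is skipped because the only way to reach a later YZ from it would be to cross the next AB, which the lookahead forbids.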

Details

  • (?:AB(?:(?!AB).)+?YZ)+ - one or more repetitions of
    • AB - an AB substring
    • (?:(?!AB).)+? (or (?:(?!AB|YZ).)+) - any char other than a line break char that does not start an AB char sequence, 1 or more repetitions, as few as possible (a so-called tempered greedy token)
    • YZ - a YZ substring.
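
The breakdown above can be generalized to arbitrary delimiters. The following is a sketch in Python (an assumed language); build_pattern is a hypothetical helper, not part of any library, and it uses re.escape so that delimiters containing regex metacharacters are treated literally.

```python
import re

def build_pattern(prefix: str, suffix: str) -> re.Pattern:
    """Build the tempered-greedy-token pattern for the given delimiters.

    Hypothetical helper for illustration; assumes both delimiters are non-empty.
    """
    p, s = re.escape(prefix), re.escape(suffix)
    # prefix, then 1+ chars (lazily) none of which starts another prefix, then suffix
    return re.compile(rf"(?:{p}(?:(?!{p}).)+?{s})+")

text = "AB----YZAB====YZABYZ//AB++YZ,,,AB====YZAB---YZ"
ab_matches = build_pattern("AB", "YZ").findall(text)

# The alternative from the list above: forbid both AB and YZ inside a word,
# which lets the quantifier be greedy.
alt_matches = re.findall(r"(?:AB(?:(?!AB|YZ).)+YZ)+", text)

# Delimiters that are regex metacharacters work because of re.escape.
paren_matches = build_pattern("(", ")").findall("(a)(b)()x(c)")

print(ab_matches)
print(alt_matches)
print(paren_matches)
```

Both variants give the same result on this input. They can differ when a word itself contains the suffix, so which one is appropriate depends on the data.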
Wiktor Stribiżew