I always assumed regex worked like this, but I never hit a case like it until now, and I'm not sure of the best way to tackle it.

String to consider:

apple
apple
apple
cat

I want to use something like apple.*?cat. However, this matches from the first apple through to cat, when I really want the match to run from the last apple to cat.
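A minimal sketch of the behaviour described above, assuming the pattern is run with re.DOTALL so that . can cross the newlines (the flag and the variable names are assumptions, not part of my real code):

```python
import re

text = "apple\napple\napple\ncat"

# The lazy quantifier only limits how far the match extends to the right.
# The engine still starts at the leftmost "apple" it can match from.
lazy = re.search(r"apple.*?cat", text, re.DOTALL)

print(repr(lazy.group(0)))  # 'apple\napple\napple\ncat'
```

So laziness alone does not skip the earlier apples; it only stops at the first cat.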

Please keep in mind this is just an example. I'm looking for a generalized way to do this (i.e., telling me to just match one newline between apple and cat won't work in my real case).

1 Answer

You can use a tempered greedy token regex, built on a negative lookahead, in Python:

reg = re.compile(r'apple(?:(?!apple).)*cat', re.DOTALL)

(?:(?!apple).)* matches zero or more characters, each of which is not the start of apple, thus making sure apple does not occur again inside the match. Note that the negative lookahead is asserted before every character that is consumed.
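As a rough check, here is the pattern applied to the sample string from the question. This is only a sketch; the variable names text and tempered are illustrative assumptions:

```python
import re

text = "apple\napple\napple\ncat"

# Tempered greedy token: before consuming each character, the lookahead
# (?!apple) checks that "apple" does not start at that position.
reg = re.compile(r"apple(?:(?!apple).)*cat", re.DOTALL)

# Attempts starting at the first two apples fail, because the token stops
# in front of the next "apple" and "cat" is not found there.
tempered = reg.search(text)

print(repr(tempered.group(0)))  # 'apple\ncat'
print(tempered.span())          # (12, 21)
```

The same shape should work for other delimiters: replace both occurrences of apple with the start marker and cat with the end marker, wrapping literal text in re.escape if it contains regex metacharacters.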

anubhava
  • I can't find `tempered greedy token` in Python's documentation. Is that phrase just made up? That web site is a lot of trivial junk and does nothing to develop a deep understanding of regex. I do see that one of his pages solicits, though: http://www.rexegg.com/regex-consultant.html –  Aug 24 '16 at 16:54
  • I don't think the Python regex documentation has any reference to it. The phrase `tempered greedy token` is not very well documented, but the pattern `(?:(?!apple).)*` is fairly generic and well known. – anubhava Aug 24 '16 at 16:56
  • Of course. Since I'd never heard of it, I had to look. Like I said, bunch of made up trivial junk. –  Aug 24 '16 at 16:57