
I need help finding a substring with a regex. Let me start with an example:

Given the following string:

test_str = "start: 1111 kill 22:22 start: 3333 end"

I would like to extract the substring from start to end that doesn't contain kill:

wanted_result = ['start: 3333 end']

Note: I need all matches of the form start ... end that don't have kill between start and end.

Several attempts failed. This is the latest one:

import re
pattern = re.compile(r'start:(.+?)(([^kill])end)', flags=re.DOTALL)
results = pattern.findall(test_str)

which returns a different result:

results = [(' 1111 kill 22:22 start: 3333', ' end', ' ')]
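A quick way to see what re.findall returns for this attempt (a sketch that reuses the test_str and pattern from the question): because the pattern has three capturing groups, findall returns one tuple of group contents per match instead of the whole match. Looking at group(0) through finditer shows that the lazy .+? simply ran across kill, since nothing in the pattern forbids it.

```python
import re

test_str = "start: 1111 kill 22:22 start: 3333 end"

# Three capturing groups, so findall() yields a 3-tuple per match,
# not the full matched text.
attempt = re.compile(r'start:(.+?)(([^kill])end)', flags=re.DOTALL)
print(attempt.findall(test_str))
# [(' 1111 kill 22:22 start: 3333', ' end', ' ')]

# group(0) is always the entire match, regardless of capturing groups.
for m in attempt.finditer(test_str):
    print(repr(m.group(0)))
# 'start: 1111 kill 22:22 start: 3333 end'
```

Using non-capturing groups (?:...) is what makes findall return the full match in the answers below.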
TigerhawkT3
Despair

2 Answers


You need a regex based on a negative lookahead:

pattern = re.compile(r'start:(?:(?!kill).)*?end', flags=re.DOTALL)

(?:(?!kill).)*? performs a check before matching each character: the character may be anything, as long as it is not the start of the substring kill. Because *? is lazy, the match stops at the first end that follows.
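To make that per-character check concrete, here is a plain-Python emulation of start:(?:(?!kill).)*?end. It is only an illustrative sketch: match_at and find_all are hypothetical helpers, not part of the re module, and they hard-code the literals from this question.

```python
import re

def match_at(text, pos, start="start:", forbidden="kill", end="end"):
    """Hypothetical helper: emulate start(?:(?!forbidden).)*?end anchored at pos.

    Returns the matched substring, or None if there is no match at pos.
    """
    if not text.startswith(start, pos):
        return None
    i = pos + len(start)
    while True:
        # Lazy *?: first try to finish by matching `end` right here.
        if text.startswith(end, i):
            return text[pos:i + len(end)]
        # (?!forbidden): refuse to consume a character that begins `forbidden`.
        if i >= len(text) or text.startswith(forbidden, i):
            return None
        # `.` with re.DOTALL: consume exactly one character, newline included.
        i += 1


def find_all(text):
    """Hypothetical helper: non-overlapping left-to-right scan, like re.findall."""
    results, pos = [], 0
    while pos <= len(text):
        m = match_at(text, pos)
        if m is None:
            pos += 1          # no match here, try the next position
        else:
            results.append(m)
            pos += len(m)     # resume after the match, as findall does
    return results


test_str = "start: 1111 kill 22:22 start: 3333 end"
print(find_all(test_str))  # ['start: 3333 end']
print(re.findall(r'start:(?:(?!kill).)*?end', test_str, flags=re.DOTALL))  # ['start: 3333 end']
```

The attempt at position 0 fails as soon as the scan reaches kill, so the search moves on and succeeds at the second start:.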

Example:

>>> import re
>>> test_str = "start: 1111 kill 22:22 start: 3333 end"
>>> pattern = re.compile(r'start:(?:(?!kill).)*?end', flags=re.DOTALL)
>>> pattern.findall(test_str)
['start: 3333 end']
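Since the question asks for all such matches, here is a sketch on a made-up input with several start ... end blocks (the text value is an assumption for illustration). findall returns every block that has no kill inside, and finditer additionally gives each match's position.

```python
import re

# Hypothetical input: three start ... end blocks, the middle one contains "kill".
text = "start: 1 end start: 2 kill end start: 3\n4 end"

pattern = re.compile(r'start:(?:(?!kill).)*?end', flags=re.DOTALL)
print(pattern.findall(text))
# ['start: 1 end', 'start: 3\n4 end']

# finditer yields match objects, so the offsets are available too.
for m in pattern.finditer(text):
    print(m.start(), m.end(), repr(m.group()))
```

The re.DOTALL flag matters here: without it, . does not match the newline, so the third block would not be found.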
Avinash Raj

As a hint, note that a negated character class excludes individual characters, not words: [^kill] matches any single character other than k, i or l. To exclude a whole word, you need a negative lookahead.

So instead of [^kill] you need (?!kill).
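A small sketch of the difference, using a made-up word list and assuming the goal is to keep words that do not start with kill. The character class only inspects one character, so it wrongly rejects lil; the zero-width lookahead tests the sequence kill as a whole.

```python
import re

# [^kill] matches ONE character that is not 'k', 'i' or 'l'
# (listing 'l' twice changes nothing).
print(re.findall(r'[^kill]', 'kill it'))  # [' ', 't']

words = ["kill", "skill", "lil", "milk", "dog"]

# Character class: rejects any word whose first letter is k, i or l.
by_class = [w for w in words if re.match(r'[^kill]', w)]
# Negative lookahead: zero-width, fails only where the text begins with "kill".
by_lookahead = [w for w in words if re.match(r'(?!kill)', w)]

print(by_class)      # ['skill', 'milk', 'dog']
print(by_lookahead)  # ['skill', 'lil', 'milk', 'dog']
```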

And see the question "Regular expression to match a line that doesn't contain a word".
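The same lookahead idea can be applied line by line. This is a sketch on a hypothetical multi-line string, not a quote from the linked question: with re.MULTILINE, ^ and $ match at line boundaries, and without re.DOTALL the . cannot cross a newline, so each match is one whole line that contains no kill.

```python
import re

# Hypothetical multi-line input.
text = "start: 1111 kill 22:22\nstart: 3333 end\nfoo kill bar\nplain line"

# Each match must run from the start of a line to its end without
# ever being positioned at the beginning of "kill".
pattern = re.compile(r'^(?:(?!kill).)*$', flags=re.MULTILINE)
print(pattern.findall(text))  # ['start: 3333 end', 'plain line']
```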

Community
kasravnd