
I have text in the following format:

|start| this is first para to remove |end|.
this is another text.
|start| this is another para to remove |end|. Again some free text

I want to remove all text between |start| and |end|, including the delimiters themselves and the period that follows |end|.

I have tried the following regex:

import re
regex = r'(?<=\|start\|).+(?=\|end\|)'
re.sub(regex, '', text)

It returns

"Again some free text"

But I expect it to return:

this is another text. Again some free text

Hima

2 Answers


Note that the start/end delimiters are inside lookaround constructs in your pattern. Lookarounds only check the surrounding context and do not consume any text, so the delimiters remain in the resulting string after re.sub. You should convert the lookbehind and lookahead into consuming patterns.

Also, you seem to want to remove the punctuation right after the right-hand delimiter (the . after |end|), so you need to add [^\w\s]* at the end of the regex.
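A minimal sketch contrasting the two forms on the sample text from the question: with lookarounds only the text between the delimiters is removed, so |start|, |end| and the trailing period survive; with a consuming pattern they are all part of the match. The variable names are illustrative only.

```python
import re

text = """|start| this is first para to remove |end|.
this is another text.
|start| this is another para to remove |end|. Again some free text"""

# Lookbehind/lookahead only assert context; |start| and |end| are not consumed
lookaround = re.sub(r'(?<=\|start\|).+(?=\|end\|)', '', text)
print(lookaround.splitlines())
# -> ['|start||end|.', 'this is another text.', '|start||end|. Again some free text']

# Consuming pattern: delimiters and trailing punctuation belong to the match
consuming = re.sub(r'(?s)\|start\|.*?\|end\|[^\w\s]*', '', text)
print(consuming.splitlines())
# -> ['', 'this is another text.', ' Again some free text']
```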

You may use

import re
text = """|start| this is first para to remove |end|.
this is another text.
|start| this is another para to remove |end|. Again some free text"""
print(re.sub(r'(?s)\|start\|.*?\|end\|[^\w\s]*', '', text).replace('\n', ''))
# => this is another text. Again some free text


Regex details

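  • (?s) - inline DOTALL modifier, so . also matches line breaks
  • \|start\| - the literal |start| text (| is escaped because it is the alternation operator)
  • .*? - any 0 or more chars, as few as possible (lazy)
  • \|end\| - the literal |end| text
  • [^\w\s]* - 0 or more chars other than word and whitespace chars (here, the trailing .)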
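The role of the DOTALL modifier and of the lazy quantifier can be checked by passing the equivalent re.DOTALL flag instead of the inline (?s) and comparing .*? with a greedy .*. A small sketch on the question's sample text; the names lazy and greedy are just for illustration:

```python
import re

text = """|start| this is first para to remove |end|.
this is another text.
|start| this is another para to remove |end|. Again some free text"""

# Lazy .*? stops at the nearest |end|, so each block is removed separately
lazy = re.sub(r'\|start\|.*?\|end\|[^\w\s]*', '', text, flags=re.DOTALL)

# Greedy .* (with DOTALL) runs to the LAST |end|, swallowing the text in between
greedy = re.sub(r'\|start\|.*\|end\|[^\w\s]*', '', text, flags=re.DOTALL)

print(repr(lazy))    # '\nthis is another text.\n Again some free text'
print(repr(greedy))  # ' Again some free text'
```

With the greedy version, "this is another text." is lost, which is why the lazy .*? matters once (?s) lets the match cross line breaks.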
Wiktor Stribiżew

Try this:

import re

your_string = """|start| this is first para to remove |end|.
this is another text.
|start| this is another para to remove |end|. Again some free text"""

# .+ is greedy and does not match newlines, so each |start| ... |end|. pair must be on one line
regex = r'(\|start\|).+(\|end\|\.)'

result = re.sub(regex, '', your_string).replace('\n', '')

print(result)

Output:

this is another text. Again some free text
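Because this pattern uses a greedy .+ without DOTALL, it relies on every |start| ... |end|. pair sitting alone on its own line, as in the sample. A minimal sketch with two hypothetical inputs (not from the question) shows where that assumption matters, alongside the lazy DOTALL pattern from the other answer:

```python
import re

regex = r'(\|start\|).+(\|end\|\.)'
lazy_dotall = r'(?s)\|start\|.*?\|end\|[^\w\s]*'

# Hypothetical input: two blocks on the same line with text to keep between them
same_line = "|start| a |end|. keep me |start| b |end|. tail"
greedy_same = re.sub(regex, '', same_line)        # ' tail' (' keep me ' is lost too)
lazy_same = re.sub(lazy_dotall, '', same_line)    # ' keep me  tail'

# Hypothetical input: a block whose delimiters are on different lines
multi_line = "|start| a\nb |end|. tail"
greedy_multi = re.sub(regex, '', multi_line)      # unchanged: . does not match '\n'
lazy_multi = re.sub(lazy_dotall, '', multi_line)  # ' tail'

print(repr(greedy_same), repr(lazy_same))
print(repr(greedy_multi), repr(lazy_multi))
```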
Rithin Chalumuri