I have raw HTML and am trying to remove every block of the form [%~ ... ~%] (for example [%~ as..abcd ~%]) from the output string, using Python's re library.

import re

teststring = """Check the direction . [%~ MACRO wdwDate(date) BLOCK;
                 SET tmpdate = date.clone();
                 END ~%] Determine if both directions."""
cleanM = re.compile(r'\[%~ .*? ~%\]')
scleantext = re.sub(cleanM, '', teststring)  # the [%~ ... ~%] block is still in the result

What is wrong with the code?

2 Answers

1

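Your pattern should be:

cleanM = re.compile(r'\[%~ .*? ~%\]', re.S)

By default, `.` matches any character except a newline. The `re.S` flag (also spelled `re.DOTALL`) makes it match newlines as well, so the pattern can span the multi-line block.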
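
To see the effect of the flag side by side, here is a minimal sketch that runs the same pattern with and without `re.S` on the string from the question. The variable names `text`, `pattern`, `without_flag` and `with_flag` are only illustrative.

```python
import re

# Sample input taken from the question: one template block spanning several lines.
text = """Check the direction . [%~ MACRO wdwDate(date) BLOCK;
                 SET tmpdate = date.clone();
                 END ~%] Determine if both directions."""

pattern = r'\[%~ .*? ~%\]'

# Without re.S, "." stops at the first newline, so the block never matches
# and the string comes back unchanged.
without_flag = re.sub(pattern, '', text)

# With re.S (alias re.DOTALL), "." also matches "\n" and the block is removed.
with_flag = re.compile(pattern, re.S).sub('', text)

print(without_flag == text)
print(with_flag)
```

Note that the spaces on both sides of the block are left in place, so the cleaned string has two consecutive spaces where the block used to be.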

mkHun
  • The caveat is that `re.sub` takes the flag only as a keyword argument (`flags=re.S`). Passed directly as the fourth positional argument, `re.S` is interpreted as `count`, which is why it appears not to work. – mrCarnivore Dec 01 '17 at 11:29
  • You can also exclude the markers from the match: `r'(?<=\[%~ ).*(?= \~%])'`. BTW: Always use raw strings (`r'...'`) on regular expressions. – Klaus D. Dec 01 '17 at 11:31
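
On passing `re.S` to `re.sub` directly: the signature is `re.sub(pattern, repl, string, count=0, flags=0)`, so a flag given as the fourth positional argument is read as `count`, not as a flag. A minimal sketch of two ways to avoid `re.compile`, using a made-up two-line sample string:

```python
import re

# Hypothetical sample: a single block that contains a newline.
text = "keep [%~ line one\nline two ~%] keep"
pattern = r'\[%~ .*? ~%\]'

# Keyword form: the flag reaches the regex engine, so the block is removed.
by_keyword = re.sub(pattern, '', text, flags=re.S)

# Inline form: (?s) at the start of the pattern switches DOTALL on
# without any extra argument.
by_inline = re.sub(r'(?s)' + pattern, '', text)

print(by_keyword)
print(by_inline)
```

Both calls give the same result; which one to use is mostly a matter of taste.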
0

You can use [\S\s]*? instead of .*? so that no flag is needed, and you can leave out compile:

import re
teststring = '''Check the direction . [%~ MACRO wdwDate(date) BLOCK;
                 SET tmpdate = date.clone();
                 END ~%] Determine if both directions.'''
# [\S\s] matches any character, newlines included; *? stops at the first closing ~%]
scleantext = re.sub(r'\[%~ [\S\s]*? ~%\]', '', teststring)

print(scleantext)
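
The difference between greedy `[\S\s]*` and lazy `[\S\s]*?` only shows up when the input contains more than one block. A small sketch, assuming a made-up string with two blocks:

```python
import re

# Hypothetical input with two separate blocks and ordinary text between them.
text = "a [%~ one\nblock ~%] b [%~ two ~%] c"

# Greedy: runs from the first "[%~ " to the LAST " ~%]", swallowing " b " as well.
greedy = re.sub(r'\[%~ [\S\s]* ~%\]', '', text)

# Lazy: stops at the first " ~%]" after each opener, so each block is removed separately.
lazy = re.sub(r'\[%~ [\S\s]*? ~%\]', '', text)

print(repr(greedy))
print(repr(lazy))
```

With a single block, as in the question, both forms give the same result.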
mrCarnivore