The issue is related to \n. The token .* does not match newlines, so the match stops as soon as the string contains one. The sequence \n is interpreted as a newline by the Python REPL, but as literal text on the Regex 101 website. Try to think of your string this way:
<speak><prosody volume=\"x-loud\">This text should match?<break time='500ms'/><mark name='punchline'/>
<say-as interpret-as='interjection'>boing</say-as><break time='1ms'/>!
</prosody></speak>
The string above will not be matched by your current regex code. Check it out here: https://regex101.com/r/rpUNK9/4
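As a minimal sketch of the behaviour described above (the short sample string and the variable names are made up for illustration, they are not from the question), this shows both halves of the problem: Python turns \n into a real newline in a normal string literal, and .* then refuses to cross it:

```python
import re

# In a normal Python string literal, \n is a single newline character.
interpreted = "a\nb"
# In a raw string, \n stays as two characters: a backslash and the letter n.
literal = r"a\nb"
print(len(interpreted), len(literal))

# Hypothetical two-line sample standing in for the SSML string.
sample = "<p>first line\nsecond line</p>"

# By default `.` matches any character except a newline,
# so `.*` cannot cross the line break and the search fails.
no_match = re.search(r"<p>(.*)</p>", sample)
print(no_match)
```

The raw-string case is roughly what Regex 101 sees when you paste \n into the test box, which is why the original pattern appeared to work there.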
To solve this, replace .* with something that can match newlines, such as [\s\S]*
The whole pattern will be:
(?<=<speak><prosody volume=\\\"x-loud\\\">)([\s\S]*)(?=<\/prosody>(?:<metadata>|<\/speak>))
Example: https://regex101.com/r/rpUNK9/5
Python code:
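import re
# Non-raw triple-quoted string, so every \n below becomes a real newline and \" a plain quote
test = """<speak><prosody volume=\"x-loud\">This text should match?<break time='500ms'/><mark name='punchline'/>\n\n<say-as interpret-as='interjection'>boing</say-as><break time='1ms'/>!\n</prosody></speak>"""
# Raw string, so \s and \S reach the re module unchanged; the quotes and slashes need no escaping here
rex = r'(?<=<speak><prosody volume="x-loud">)([\s\S]*)(?=</prosody>(?:<metadata>|</speak>))'
m = re.search(rex, test)
print(m.group(1))  # the text between the prosody tags, newlines included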
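If you would rather keep .* in the pattern, another option is the re.DOTALL flag, which makes the dot match newlines as well. This is a sketch of that alternative, not part of the pattern above; the names rex_dotall, m_dotall and inner are only illustrative:

```python
import re

# Same test string as above; \n becomes a real newline in this non-raw literal.
test = """<speak><prosody volume="x-loud">This text should match?<break time='500ms'/><mark name='punchline'/>\n\n<say-as interpret-as='interjection'>boing</say-as><break time='1ms'/>!\n</prosody></speak>"""

# Plain `.*` in the capture group; re.DOTALL lets `.` match newlines too.
rex_dotall = r'(?<=<speak><prosody volume="x-loud">)(.*)(?=</prosody>(?:<metadata>|</speak>))'
m_dotall = re.search(rex_dotall, test, flags=re.DOTALL)

# group(1) holds everything between the opening and closing prosody tags.
inner = m_dotall.group(1)
print(inner)
```

Keep in mind that re.DOTALL changes every dot in the pattern, while [\s\S] only affects the one place where you write it.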