This is pretty simple but I can't seem to figure this one out. What am I doing wrong here?

The online tester shows this works fine: https://regex101.com/r/rpUNK9/3

But I am getting nothing returned when I try it in the Python REPL:

test = """<speak><prosody volume=\"x-loud\">This text should match?<break time='500ms'/><mark name='punchline'/>\n\n<say-as interpret-as='interjection'>boing</say-as><break time='1ms'/>!\n</prosody></speak>"""
rex = '(?<=<speak><prosody volume=\\\"x-loud\\\">)(.*)(?=<\/prosody>(?:<metadata>|<\/speak>))'
m = re.search(rex,test)
Max

1 Answer

The issue is the \n. By default, the . token does not match a newline, so .* stops at the first line break. In the Python REPL, \n inside the string literal is interpreted as an actual newline, whereas the Regex 101 website treats it as ordinary text (a backslash followed by the letter n). Try to think of your string this way:

<speak><prosody volume=\"x-loud\">This text should match?<break time='500ms'/><mark name='punchline'/>

<say-as interpret-as='interjection'>boing</say-as><break time='1ms'/>!
</prosody></speak>

The string above will not be matched by your current regex code. Check it out here: https://regex101.com/r/rpUNK9/4
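
The difference can be reproduced in the REPL with a minimal sketch like the one below. The short sample strings and the variable names are made up for illustration and are not taken from the question.

```python
import re

# Illustrative strings (hypothetical, not from the question).
with_newline = "start\nend"     # \n becomes a real line break
literal_text = r"start\nend"    # raw string: a backslash followed by 'n'

pattern = r"start(.*)end"

# By default '.' does not match a line break, so this search finds nothing.
no_match = re.search(pattern, with_newline)

# Here there is no line break, only the two characters '\' and 'n'.
match = re.search(pattern, literal_text)

print(no_match)         # None
print(match.group(1))   # \n
```

The second case corresponds to what the online tester sees when \n is typed into its test-string box.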

To solve this, replace .* with something that can also match newlines, such as [\s\S]*

The whole code will be:

(?<=<speak><prosody volume=\\\"x-loud\\\">)([\s\S]*)(?=<\/prosody>(?:<metadata>|<\/speak>))

Example: https://regex101.com/r/rpUNK9/5

Python code:

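import re

test = """<speak><prosody volume=\"x-loud\">This text should match?<break time='500ms'/><mark name='punchline'/>\n\n<say-as interpret-as='interjection'>boing</say-as><break time='1ms'/>!\n</prosody></speak>"""
# Raw string: the backslashes reach the regex engine unchanged, which avoids invalid-escape warnings for \s, \S and \/.
rex = r'(?<=<speak><prosody volume=\"x-loud\">)([\s\S]*)(?=<\/prosody>(?:<metadata>|<\/speak>))'
m = re.search(rex, test)
print(m.group(1))  # the text between the opening and closing prosody tags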
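
As an alternative to the [\s\S] character class, Python's re module has the re.DOTALL flag, which makes . match newlines too. The following is a sketch of that variant, assuming the same test string as in the question; the pattern keeps .* and drops the optional escaping of / and " since neither is special in a regex.

```python
import re

test = """<speak><prosody volume="x-loud">This text should match?<break time='500ms'/><mark name='punchline'/>\n\n<say-as interpret-as='interjection'>boing</say-as><break time='1ms'/>!\n</prosody></speak>"""

# Raw string, so the pattern reaches the regex engine unchanged.
rex = r'(?<=<speak><prosody volume="x-loud">)(.*)(?=</prosody>(?:<metadata>|</speak>))'

# re.DOTALL lets '.' match line breaks as well as every other character.
m = re.search(rex, test, re.DOTALL)
if m:
    print(m.group(1))
```

Which form to use is mostly a matter of taste: the flag keeps the pattern shorter, while [\s\S] also works in tools where flags cannot be set.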
Ibrahim