Python Match Regex works in tester but not in console

Question

This is pretty simple but I can't seem to figure this one out. What am I doing wrong here?

The online tester shows this works fine: https://regex101.com/r/rpUNK9/3

But I am getting nothing returned when I try it in the Python REPL:

test = """<speak><prosody volume=\"x-loud\">This text should match?<break time='500ms'/><mark name='punchline'/>\n\n<say-as interpret-as='interjection'>boing</say-as><break time='1ms'/>!\n</prosody></speak>"""
rex = '(?<=<speak><prosody volume=\\\"x-loud\\\">)(.*)(?=<\/prosody>(?:<metadata>|<\/speak>))'
m = re.search(rex,test)

Sorry please elaborate. What is DOM? Edit: okay quick Google search tells me. Thanks! — Max, Feb 23 '19 at 01:04

Ibrahim · Accepted Answer · 2019-02-23T01:58:35.973

The issue is the \n. The token .* does not match newlines, so it cannot span the line breaks that sit between the opening tag and </prosody>. In the Python REPL, \n inside the string literal is interpreted as a real newline, whereas the Regex 101 website treats it as ordinary text. Try to think of your string this way:

<speak><prosody volume=\"x-loud\">This text should match?<break time='500ms'/><mark name='punchline'/>

<say-as interpret-as='interjection'>boing</say-as><break time='1ms'/>!
</prosody></speak>

The string above will not be matched by your current regex code. Check it out here: https://regex101.com/r/rpUNK9/4
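The difference between the two environments can be reproduced in a minimal sketch. The short sample strings below are made up for illustration and are not the original SSML; the raw string only approximates what the tester sees when Python source is pasted into it.

```python
import re

pattern = r"<a>(.*)</a>"

# Normal string literal: "\n" becomes a real newline (what the REPL sees).
with_newline = "<a>first\nsecond</a>"
# Raw string literal: a backslash followed by "n", two ordinary characters
# (roughly what the tester sees when the source text is pasted in).
with_literal = r"<a>first\nsecond</a>"

print(re.search(pattern, with_newline))           # None
print(re.search(pattern, with_literal).group(1))  # first\nsecond
```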

To solve this, replace .* with something that can match newlines, such as [\s\S]*.

The whole regex becomes:

(?<=<speak><prosody volume=\\\"x-loud\\\">)([\s\S]*)(?=<\/prosody>(?:<metadata>|<\/speak>))

Example: https://regex101.com/r/rpUNK9/5

Python code:

import re
test = """<speak><prosody volume=\"x-loud\">This text should match?<break time='500ms'/><mark name='punchline'/>\n\n<say-as interpret-as='interjection'>boing</say-as><break time='1ms'/>!\n</prosody></speak>"""
# Raw string, so \s and \S reach the regex engine unchanged and " and / need no escaping.
rex = r'(?<=<speak><prosody volume="x-loud">)([\s\S]*)(?=</prosody>(?:<metadata>|</speak>))'
m = re.search(rex, test)
print(m.group(1))  # text between the opening tag and </prosody>, newlines included
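As an alternative sketch, not part of the original answer: Python's re module has a re.DOTALL flag that makes . match newlines as well, so the question's (.*) can stay as it is. The shortened sample string is a made-up stand-in for the SSML in the question.

```python
import re

# Hypothetical, shortened stand-in for the SSML string in the question.
sample = '<speak><prosody volume="x-loud">line one\n\nline two\n</prosody></speak>'

# Same structure as the question's pattern, with the original (.*) kept.
rex = r'(?<=<speak><prosody volume="x-loud">)(.*)(?=</prosody>(?:<metadata>|</speak>))'

# re.DOTALL (alias re.S) lets "." match newline characters as well.
m = re.search(rex, sample, re.DOTALL)
print(repr(m.group(1)))
```

Whether to use [\s\S]* or re.DOTALL is mostly a matter of taste: the flag changes every . in the pattern, while [\s\S] changes only the one spot where it is written.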
