
I would like to match an entire line in a multi-line string (this code is part of a unit test that checks the correct output format).

Python 3.5.2 (default, Nov 12 2018, 13:43:14) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> re.match(r".*score = 0\.59.*", r"score = 0.65\nscore = 0.59\nscore = 1.0", re.MULTILINE)
<_sre.SRE_Match object; span=(0, 39), match='score = 0.65\\nscore = 0.59\\nscore = 1.0'>

This works fine: I can match anything within a multi-line string. However, I would like to make sure that I match an entire line. The documentation says that ^ and $ should match the beginning and end of a line when re.MULTILINE is used. However, this somehow does not work for me:

>>> re.match(r".*^score = 0\.59$.*", r"score = 0.65\nscore = 0.59\nscore = 1.0", re.MULTILINE)
>>> 

Here are a few more experiments I made:

>>> import os
>>> re.match(r".*^score = 0\.59$.*", "score = 0.65{}score = 0.59{}score = 1.0".format(os.linesep, os.linesep), re.MULTILINE)
>>>
>>> re.match(r".*^score = 0\.65$.*", "score = 0.65{}score = 0.59{}score = 1.0".format(os.linesep, os.linesep), re.MULTILINE)
<_sre.SRE_Match object; span=(0, 12), match='score = 0.65'>
>>> re.match(r".*^score = 0\.65$.*", r"score = 0.65\nscore = 0.59\nscore = 1.0", re.MULTILINE)
>>> 

I guess I'm missing something rather simple, but I couldn't figure it out.

k6ps

3 Answers


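The problem is that since you're using a raw string for your input, \n is seen as... well, a backslash followed by n, not a newline. The regex engine understands \n in the pattern, but in the input string it is just those two characters, so there are no line boundaries for ^ and $ to match.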
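A quick way to see the difference is to compare the two literals directly. In this small check (the variable names `raw` and `normal` are just for illustration), the raw literal contains a backslash followed by `n` instead of a newline, so `^` and `$` have no line boundary to anchor to:

```python
import re

raw = r"score = 0.65\nscore = 0.59\nscore = 1.0"    # backslash + 'n', no real newline
normal = "score = 0.65\nscore = 0.59\nscore = 1.0"  # real newline characters

print(len(raw), len(normal))        # the raw version is 2 characters longer
print("\n" in raw, "\n" in normal)  # only the non-raw string contains a newline

# With real newlines, ^ and $ (under MULTILINE) can anchor to each line
pattern = r"^score = 0\.59$"
print(re.search(pattern, raw, flags=re.MULTILINE))     # no match
print(re.search(pattern, normal, flags=re.MULTILINE))  # matches the second line
```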

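Also, even if it doesn't matter here, always pass flags as the flags= keyword argument: some regex functions have an extra positional parameter before flags (count in re.sub, maxsplit in re.split), so a flag passed positionally is silently taken as that number, which can lead to errors.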
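A minimal sketch of that pitfall with `re.sub`, using a made-up three-line string. `re.MULTILINE` has the integer value 8, so when it is passed positionally it lands in the `count` slot and the flag itself is never applied:

```python
import re
import warnings

text = "x\nx\nx"

with warnings.catch_warnings():
    # Recent Python versions may warn about passing count positionally;
    # silence that here so the demo only shows the behavioural difference.
    warnings.simplefilter("ignore", DeprecationWarning)
    # re.MULTILINE is taken as count=8, flags stay 0, so ^ only matches at position 0
    positional = re.sub(r"^x", "y", text, re.MULTILINE)

# Passed by keyword, MULTILINE is applied and ^ matches at the start of every line
keyword = re.sub(r"^x", "y", text, flags=re.MULTILINE)

print(repr(positional))
print(repr(keyword))
```

Only the keyword version replaces the `x` on every line.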

Like this:

>>> re.match(r".*^score = 0\.65$.*", "score = 0.65\nscore = 0.59\nscore = 1.0", flags=re.MULTILINE)
<_sre.SRE_Match object; span=(0, 12), match='score = 0.65'>

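And, as I noted in the comments, .* needs re.DOTALL to match newlines: without it, . matches any character except \n, so the leading .* cannot skip over the earlier lines.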

>>> re.match(r".*^score = \d+\.\d+$.*", "score = 0.65\nscore = 0.59\nscore = 1.0", re.MULTILINE|re.DOTALL)
<_sre.SRE_Match object; span=(0, 37), match='score = 0.65\nscore = 0.59\nscore = 1.0'>
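To isolate what `re.DOTALL` changes, here is a short comparison on the same sample input: without the flag, `.*` stops at the first newline; with it, `.*` runs to the end of the string, which is what lets the leading `.*` in the pattern skip earlier lines.

```python
import re

text = "score = 0.65\nscore = 0.59\nscore = 1.0"

# Without DOTALL, "." refuses to match "\n", so ".*" stops at the end of line 1
print(re.match(r".*", text).group())

# With DOTALL, "." also matches "\n", so ".*" consumes the whole string
print(repr(re.match(r".*", text, flags=re.DOTALL).group()))

# Combined with MULTILINE, the leading ".*" can now skip earlier lines
# before ^ anchors to the start of the target line
m = re.match(r".*^score = 0\.59$.*", text, flags=re.MULTILINE | re.DOTALL)
print(m.span())
```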

(as noted in "Python regex, matching pattern over multiple lines.. why isn't this working?" and "How do I match any character across multiple lines in a regular expression?", of which this could be a duplicate if it weren't for the raw-string bit)

(Sorry, my floating-point regex is probably a bit weak; you can find better ones around.)

Jean-François Fabre
  • Yes, that seems to work. I messed up with raw vs. non-raw strings, and didn't notice the DOTALL option. Thank you, also for the extra suggestion! – k6ps Nov 22 '18 at 09:25
  • @WiktorStribiżew yes there were 2 issues. Raw strings & the DOTALL thing. Thanks. I answered only because of the raw string stuff, which isn't covered by the duplicate – Jean-François Fabre Nov 22 '18 at 09:27
  • I normally don't do that, but here I really don't understand that downvote spree so ok. – Jean-François Fabre Jan 15 '19 at 10:07

You need to match against a non-raw string and use DOTALL mode:

import re
print(re.match(r".*^score = 0\.59$.*", "score = 0.65\nscore = 0.59\nscore = 1.0",
    re.MULTILINE|re.DOTALL))

<_sre.SRE_Match object; span=(0, 37), match='score = 0.65\nscore = 0.59\nscore = 1.0'>
Tim Biegeleisen
  • Yes, that seems to work. I messed up with raw vs. non-raw strings, and didn't notice the DOTALL option. Thank you! Sorry that i cannot check two answers as "solved my problem" – k6ps Nov 22 '18 at 09:24
  • @k6ps Wiktor duped your question, but he basically does that with every regex question, so don't feel bad about that. – Tim Biegeleisen Nov 22 '18 at 09:26
  • at some point, when you're very much into a subject, everything is a dupe. But on that one, the raw string bit isn't covered. – Jean-François Fabre Nov 22 '18 at 09:27
  • @Jean-FrançoisFabre Someone really needs to open up a broad discussion on the Meta site about this. I will tell you that the duplicate threshold on the SQL tag definitely seems to be substantially lower than regex. The question is duplicate with respect to _which_ level of engineer? – Tim Biegeleisen Nov 22 '18 at 09:29
  • I think I'm going to reopen it, quoting the links Wiktor left as dupes, because the duplicates only partially answer it (my first comment answered that already, then I found out the raw string bit) @TimBiegeleisen: reopened :) – Jean-François Fabre Nov 22 '18 at 09:33

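The real answer to your question is that you simply confused match and search: re.match only tries to match at the very start of the string, while re.search scans the string for a match anywhere: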

>>> import os, re
>>> print(re.match(r".*^score = 0\.59$.*", "score = 0.65\nscore = 0.59\nscore = 1.0", flags=re.MULTILINE))
None
>>> print(re.search(r".*^score = 0\.59$.*", "score = 0.65\nscore = 0.59\nscore = 1.0", flags=re.MULTILINE))
<_sre.SRE_Match object; span=(13, 25), match='score = 0.59'>
>>> 

That's why one of your non-raw examples worked (score = 0.65 is on the first line, where re.match starts) while the other did not.
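For the original unit-test goal (checking that one whole line is present), a sketch building on this answer is to drop the `.*` padding entirely and let `re.search` find the line, with `^` and `$` pinned to line boundaries by `re.MULTILINE`. The helper name `has_line` is hypothetical, chosen only for this example:

```python
import re

def has_line(text, line):
    """Return True if `line` appears as a complete line of `text` (hypothetical helper)."""
    # re.escape makes the dot in "0.59" literal; ^/$ anchor per line under MULTILINE
    pattern = "^" + re.escape(line) + "$"
    return re.search(pattern, text, flags=re.MULTILINE) is not None

output = "score = 0.65\nscore = 0.59\nscore = 1.0"
print(has_line(output, "score = 0.59"))  # whole second line
print(has_line(output, "score = 0.5"))   # only a prefix of a line
print(has_line(output, "core = 0.59"))   # only a suffix of a line
```

Because `re.search` scans the whole string, no `.*` or `re.DOTALL` is needed, and partial lines are rejected by the anchors.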

mportes