Regular expression search all substrings between two keywords with linebreaks

Question

Let's say I have the following Python string:

s = "\\[this\\] is \\[some\n text\\]."

I would like a regular expression that returns the substrings "this" and "some\n text". I've tried

re.search(r'\\[(.*)\\]',s)

but it does not work (it returns None).

score 1 · Answer 1 · answered Feb 14 '21 at 10:19

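Your regex is missing a backslash before each bracket: \\ matches the literal backslash, but the bracket needs its own escape (\[ and \]), otherwise [ opens a character class. Also use re.DOTALL so that the dot . matches the newline character: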

import re

s = "\\[this\\] is \\[some\n text\\]."
r = re.findall(r'\\\[(.*?)\\\]', s, flags=re.DOTALL)
print(r)  # ['this', 'some\n text']
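
To see what each of the two fixes contributes, here is a small sketch comparing the original attempt, the escaped pattern without the flag, and the full version. The variable names are only for illustration.

```python
import re

s = "\\[this\\] is \\[some\n text\\]."

# Original attempt: `\\` matches one literal backslash, but the unescaped `[`
# opens a character class containing ( . * ) and backslash, closed by `]`.
# No backslash in s is followed by one of those characters.
broken = re.search(r'\\[(.*)\\]', s)
print(broken)  # None

# Brackets escaped, but no DOTALL: the dot stops at the newline,
# so only the first pair is found.
without_dotall = re.findall(r'\\\[(.*?)\\\]', s)
print(without_dotall)  # ['this']

# Brackets escaped and DOTALL set: the lazy `.*?` can cross the newline.
with_dotall = re.findall(r'\\\[(.*?)\\\]', s, flags=re.DOTALL)
print(with_dotall)  # ['this', 'some\n text']
```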

azro

Fast! I had `x=re.findall(r'(?s)\[([^\]]*)\\\]',s)` with links to http://www.regular-expressions.info/modifiers.html and https://riptutorial.com/regex/example/32238/why-doesn-t-dot-----match-the-newline-character----n--- – John Feb 14 '21 at 10:29

score 0 · Answer 2 · answered Feb 14 '21 at 10:30

I will take the string you posted literally, but you can easily edit the regex to match another pattern.

I think that this can do the work:

'\\\\\[(.*?)\\\\\]'

Explained:

\ escapes a character, so with \\ you match one literal backslash. Since you have to find 2 backslashes, you need 2 more of them as escape characters (4 in total)
For the same reason as above, you need one more \ to escape the [ character
( sets your capturing group
. matches any character
* as many times as possible, but followed by a ? it means as few times as possible
) closes your capturing group
the other 5 \ followed by ] work as explained before (escaping the backslash/bracket sequence)
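
A quick way to check which string this pattern applies to is to run it against both the plain and the raw version of the literal. This is only a sketch: it assumes the pattern above is meant as the regex itself (so it is written as a raw string here), and the names plain, raw and pattern are just for illustration.

```python
import re

# Plain literal: each \\ is ONE backslash, and \n is a real newline.
plain = "\\[this\\] is \\[some\n text\\]."
# Raw literal: each \\ is TWO backslashes, and \n is a backslash plus 'n'.
raw = r"\\[this\\] is \\[some\n text\\]."

# Regex: two literal backslashes followed by a literal bracket.
pattern = r'\\\\\[(.*?)\\\\\]'

matches_plain = re.findall(pattern, plain, flags=re.DOTALL)
matches_raw = re.findall(pattern, raw, flags=re.DOTALL)

print(plain.count("\\"), raw.count("\\"))  # 4 9
print(matches_plain)  # []
print(matches_raw)    # ['this', 'some\\n text']
```

So the five-backslash version only finds something when the text really contains two backslashes before each bracket.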

Hope I helped ;)

subundhu

That would match `r"\\[this\\] is \\[some\n text\\]."` not `"\\[this\\] is \\[some\n text\\]."` ;) these are different strings, note the `r` prefix – azro Feb 14 '21 at 10:50
@azro yep, but the OP didn't ask that explicitly (I said that I used the literal string in fact). If you need that a match starts with `\\[` and ends with `\\]` you can use [word boundaries](https://www.regular-expressions.info/wordboundaries.html). The regex will look like this: `\B\\\\\[(.*?)\\\\\]\B` – subundhu Feb 14 '21 at 11:39

score 0 · Answer 3 · answered Feb 14 '21 at 11:15

You can use a negated character class ([^][]*) with a capture group, and match the \ right before the closing ] outside of the group.

import re
s = "\\[this\\] is \\[some\n text\\]."

print(re.findall(r"\[([^][]*)\\]", s))

Output

['this', 'some\n text']
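
Two things may be worth noting about this pattern: the negated class matches newlines on its own, so no re.DOTALL is needed, and it does not check for the backslash before the opening [. The sketch below compares it with a stricter variant that does; the stricter pattern and the test string t are assumptions added for illustration, not part of the original question.

```python
import re

s = "\\[this\\] is \\[some\n text\\]."

# [^][]* is any run of characters that are neither ] nor [.
# A negated class also matches newlines, so no DOTALL flag is needed.
loose = re.findall(r"\[([^][]*)\\]", s)
print(loose)  # ['this', 'some\n text']

# Hypothetical stricter variant: also require the backslash before the
# opening bracket.
strict = re.findall(r"\\\[([^][]*)\\]", s)
print(strict)  # ['this', 'some\n text']

# The two differ when an opening bracket has no backslash in front of it.
t = "[a\\] and \\[b\\]"
loose_t = re.findall(r"\[([^][]*)\\]", t)
strict_t = re.findall(r"\\\[([^][]*)\\]", t)
print(loose_t)   # ['a', 'b']
print(strict_t)  # ['b']
```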
