-2

Let's say I have the Python string '\\[this\\] is \\[some\n text\\].':

s = "\\[this\\] is \\[some\n text\\]."

I would like a regular expression that returns the substrings "this" and "some\n text". I've tried

re.search(r'\\[(.*)\\]',s)

but it does not work (it returns None).

azro
Jake B.

3 Answers

1

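Your regex is missing one backslash: to match a literal backslash followed by a literal [ you need \\\[ (\\ for the backslash, \[ for the bracket). Also pass re.DOTALL so that the dot . matches the newline character.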

import re

s = "\\[this\\] is \\[some\n text\\]."
r = re.findall(r'\\\[(.*?)\\\]', s, flags=re.DOTALL)
print(r)  # ['this', 'some\n text']
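
To see what each part of the fix contributes, here is a minimal sketch that varies the flag and the quantifier on the same input. The variable names are illustrative only.

```python
import re

s = "\\[this\\] is \\[some\n text\\]."

# Without re.DOTALL, "." stops at the newline, so the second pair is missed.
no_dotall = re.findall(r'\\\[(.*?)\\\]', s)
print(no_dotall)  # ['this']

# With re.DOTALL but a greedy ".*", one match runs from the first
# "\[" to the last "\]".
greedy = re.findall(r'\\\[(.*)\\\]', s, flags=re.DOTALL)
print(greedy)  # ['this\\] is \\[some\n text']

# Lazy ".*?" plus re.DOTALL yields each bracketed substring separately.
lazy = re.findall(r'\\\[(.*?)\\\]', s, flags=re.DOTALL)
print(lazy)  # ['this', 'some\n text']
```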
azro
  • Fast! I had `x=re.findall(r'(?s)\[([^\]]*)\\\]',s)` with links to http://www.regular-expressions.info/modifiers.html and https://riptutorial.com/regex/example/32238/why-doesn-t-dot-----match-the-newline-character----n--- – John Feb 14 '21 at 10:29
0

I am taking the string you posted literally (that is, with two backslash characters before each bracket), but you can easily edit the regex to match another pattern.

I think this does the job:

'\\\\\[(.*?)\\\\\]'

Explained:

  • \ escapes the character that follows it, so \\ matches one literal backslash. Since you have to match 2 backslashes, you need 2 more of them as escape characters (4 in total)
  • For the same reason, you need one more \ to escape the [ character
  • ( opens your capturing group
  • . matches any character (except a newline, unless re.DOTALL is set)
  • * repeats it as many times as possible, but followed by a ? it means as few times as possible
  • ) closes your capturing group
  • the other 5 \ followed by ] work as explained above (escaping the two backslashes and the bracket)
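
The escaping rules above can be checked with a short sketch. It assumes the pattern is written as a raw string (the `r` prefix is an assumption; the pattern is shown above without it), and the names `two_backslashes` and `one_backslash` are illustrative only.

```python
import re

# Assumption: raw string, so the regex engine sees the pattern verbatim.
pattern = r'\\\\\[(.*?)\\\\\]'

# Raw literal: two backslash characters before each bracket, and "\n" is
# a backslash followed by the letter n, not a newline.
two_backslashes = r"\\[this\\] is \\[some\n text\\]."

# Regular literal from the question: each "\\" collapses to one backslash,
# and "\n" is a real newline.
one_backslash = "\\[this\\] is \\[some\n text\\]."

matches_raw = re.findall(pattern, two_backslashes)
matches_plain = re.findall(pattern, one_backslash)

print(matches_raw)    # ['this', 'some\\n text']
print(matches_plain)  # []
```

So this pattern fits input that really contains two backslashes per bracket; for the single-backslash string in the question it finds nothing.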

Hope I helped ;)

subundhu
  • That would match `r"\\[this\\] is \\[some\n text\\]."`, not `"\\[this\\] is \\[some\n text\\]."` ;) These are different strings; note the `r` prefix – azro Feb 14 '21 at 10:50
  • @azro yep, but the OP didn't ask for that explicitly (as I said, I used the literal string). If you need a match to start with `\\[` and end with `\\]`, you can use [word boundaries](https://www.regular-expressions.info/wordboundaries.html). The regex will look like this: `\B\\\\\[(.*?)\\\\\]\B` – subundhu Feb 14 '21 at 11:39
0

You can use a negated character class inside a capture group, ([^][]*), and match the \ right before the closing ] outside of the group. Because a negated character class also matches newlines, re.DOTALL is not needed.

import re
s = "\\[this\\] is \\[some\n text\\]."

print(re.findall(r"\[([^][]*)\\]", s))

Output

['this', 'some\n text']
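
One thing worth noting is that this pattern does not check for a backslash before the opening [. The sketch below illustrates that with a made-up input (`mixed` is hypothetical, not from the question) and a stricter variant, on the assumption that only `\[ ... \]` pairs should count.

```python
import re

s = "\\[this\\] is \\[some\n text\\]."

# [^][]* matches any run of characters other than "[" and "]",
# newlines included, so no flag is required.
loose = re.findall(r"\[([^][]*)\\]", s)
print(loose)  # ['this', 'some\n text']

# Hypothetical input: the first pair lacks the leading backslash.
mixed = "[plain\\] and \\[escaped\\]"

# The original pattern accepts both pairs.
loose_mixed = re.findall(r"\[([^][]*)\\]", mixed)
print(loose_mixed)  # ['plain', 'escaped']

# Stricter variant (an assumption about intent): require "\" before "[".
strict_mixed = re.findall(r"\\\[([^][]*)\\]", mixed)
print(strict_mixed)  # ['escaped']
```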
The fourth bird