Why is my Python REGEX findall returning an unexpected list of matches?

Question

I am trying to detect the following pattern: an even number of backslashes (\), followed by $ and some text.

Both of these are valid: \\$hello and $goodbye.

I am trying to do this in Python:

import re

txt = r"\\$hello"
regex = r"(?<!\\)(\\\\)*(?!\\)\$[a-zA-Z_]\w*"

x = re.findall(regex, txt)

if x:
    print(x)
else:
    print("No match")

When I run this, I get the output ['\\\\'], even though regex101 (https://regex101.com/) reports a full match of \\$hello for the same pattern. How can I adjust this so that the entire portion is matched? Or, even better, just the part without the slashes?

Other things I've tried:

Removing the escaping backslashes from the regex: r"(?<!\)(\\\)*(?!\)\$[a-zA-Z_]\w*". This leads to the error re.error: missing ), unterminated subpattern at position 11
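The error above can be reproduced in isolation. In the shortened pattern every `)` is preceded by a backslash, so the regex engine reads each one as a literal parenthesis and no group is ever closed. A minimal sketch (the variable names `broken`, `compiled` and `working` are illustrative only):

```python
import re

# The pattern from the attempt above, with the escaping backslashes removed.
broken = r"(?<!\)(\\\)*(?!\)\$[a-zA-Z_]\w*"

try:
    re.compile(broken)
    compiled = True
except re.error as exc:
    # Each "\)" is an escaped, literal parenthesis, so no group gets closed.
    compiled = False
    print("re.error:", exc)

# The original pattern compiles: inside the regex, a literal backslash
# has to be written as \\ even when the Python string is a raw string.
working = re.compile(r"(?<!\\)(\\\\)*(?!\\)\$[a-zA-Z_]\w*")
print(working.search(r"\\$hello").group(0))
```

The raw-string prefix only stops Python from interpreting the backslashes; the regex engine still applies its own escaping rules on top of that.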

@jonathan.scholbach I am pretty sure that in raw strings, \ doesn't need to be escaped. My text input is correct. It can be written like this: `"\\\\$hello"` or `r"\\$hello"`. Both mean the same thing (two slashes followed by $hello). – shurup, Oct 19 '20 at 21:27

jonathan.scholbach · Answer 1 · 2020-10-19T21:35:30.337

I don't understand the idea behind your regex, so I cannot really say where you went wrong. But the following works with your verbal description of the matching pattern ("an even number of backslashes, followed by a dollar sign, followed by text") and retrieves the text after the $ sign:

import re


txt = r"\\$hello"
regex = r"(\\)*\$(.*)"
# findall returns one tuple per match; index 1 is the second group (the text after $)
match = re.findall(regex, txt)[0][1]  # 'hello'

If you want the dollar sign included in the matched string, move it inside the second group:

regex = r"(\\)*(\$.*)"
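One caveat worth checking: `(\\)*` accepts any number of backslashes, not only an even number, so this pattern also matches `\$hello`. The sketch below compares it with a stricter variant that pairs the backslashes and adds a lookbehind. The `strict` pattern is an assumption borrowed from the question's own approach, not part of this answer, and the variable names are illustrative.

```python
import re

loose = r"(\\)*\$(.*)"               # pattern from this answer
strict = r"(?<!\\)(?:\\\\)*\$(.*)"   # assumed variant: backslashes only in pairs

# The loose pattern matches after two backslashes and after one backslash.
even_loose = re.findall(loose, r"\\$hello")
odd_loose = re.findall(loose, r"\$hello")
print(even_loose, odd_loose)

# The strict pattern rejects the odd case and keeps the even and bare cases.
even_strict = re.findall(strict, r"\\$hello")
odd_strict = re.findall(strict, r"\$hello")
bare_strict = re.findall(strict, "$goodbye")
print(even_strict, odd_strict, bare_strict)
```

Note that `(.*)` captures everything up to the end of the line, so both patterns pick up trailing text as well.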

Pranav Hosangadi · Answer 2 · 2020-10-19T21:36:54.543

You're capturing the wrong thing. Make the (\\\\) a non-capturing group like so: (?:\\\\) and capture the part after the slashes like so: (\$[a-zA-Z_]\w*). Then your code gives x = ['$hello']

txt = r"\\$hello"
regex = r"(?<!\\)(?:\\\\)*(?!\\)(\$[a-zA-Z_]\w*)"

x = re.findall(regex, txt)
# x:  ['$hello']

If you want to capture the slashes and the rest, keep the original capturing group but add the second.

txt = r"\\$hello"
regex = r"(?<!\\)(\\\\)*(?!\\)(\$[a-zA-Z_]\w*)"

x = re.findall(regex, txt)
# x: [('\\\\', '$hello')]
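The two results above follow from how `re.findall` shapes its return value: with no capturing groups it returns the full matches, with one group it returns that group's text, and with several groups it returns one tuple per match. A small side-by-side sketch using the patterns from this thread (the variable names are illustrative):

```python
import re

txt = r"\\$hello"

# No capturing groups: findall returns the full text of each match.
no_groups = re.findall(r"(?<!\\)(?:\\\\)*(?!\\)\$[a-zA-Z_]\w*", txt)

# One capturing group: findall returns only what that group captured.
one_group = re.findall(r"(?<!\\)(\\\\)*(?!\\)\$[a-zA-Z_]\w*", txt)

# Two capturing groups: findall returns one tuple of groups per match.
two_groups = re.findall(r"(?<!\\)(\\\\)*(?!\\)(\$[a-zA-Z_]\w*)", txt)

print(no_groups)
print(one_group)
print(two_groups)
```

This also explains the difference from regex101, which displays the full match alongside the groups, whereas `findall` drops the full match as soon as the pattern contains a capturing group.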

edited Oct 19 '20 at 21:36

answered Oct 19 '20 at 21:35

This is a known issue and solution, see https://stackoverflow.com/questions/31915018/re-findall-behaves-weird – Wiktor Stribiżew Oct 19 '20 at 21:36
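A common way to sidestep the group behavior of `findall` is to iterate over match objects with `re.finditer`, where `group(0)` is always the full match regardless of how many capturing groups the pattern has. A minimal sketch, assuming a sample input with two occurrences (the input string and variable names are made up for illustration):

```python
import re

# Hypothetical input containing both valid forms from the question.
txt = r"\\$hello and $goodbye"
regex = r"(?<!\\)(\\\\)*(?!\\)(\$[a-zA-Z_]\w*)"

matches = list(re.finditer(regex, txt))

# group(0) is the whole match; group(2) is the part without the slashes.
full = [m.group(0) for m in matches]
names = [m.group(2) for m in matches]

print(full)
print(names)
```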
