
Given the string "foo-bar=369,337,234,123", I'm able to parse it into ['foo-bar', '369', '337', '234', '123'] with this regular expression:

re.findall(r'[a-zA-Z0-9\-_\+;]+', 'foo-bar=369,337,234,123')

Now, if I escape some of the commas in the string, e.g. "foo-bar=369\,337\,234,123", I would like it to be parsed differently, treating an escaped comma as part of the value: ['foo-bar', '369\,337\,234', '123']. I tried the regex below, but it doesn't work:

r'[a-zA-Z0-9\-_\+;(\\,)]+'

Basically, I am trying to add the character sequence \, to the set of characters to match.

BiBi

1 Answer


You may use

[a-zA-Z0-9_+;-]+(?:\\,[a-zA-Z0-9_+;-]+)*

See the regex demo

If you pass re.A (re.ASCII) to re.compile, \w matches only [a-zA-Z0-9_], so you may shorten it to

[\w+;-]+(?:\\,[\w+;-]+)*

Regex details

  • [\w+;-]+ - one or more word, +, ; or - chars
  • (?:\\,[\w+;-]+)* - zero or more occurrences of a \, substring followed by one or more word, +, ; or - chars.
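To see why the attempt from the question fails, here is a small sketch comparing both patterns on the escaped input. The variable names attempt and fixed are only illustrative. Inside a character class, (, ), \ and , are just four separate literal characters, so the class ends up accepting a bare comma as well:

```python
import re

s = r'foo-bar=369\,337\,234,123'

# Inside [...], "(", ")", "\" and "," are individual literals, not a sequence,
# so a plain comma is matched too and the value list is never split.
attempt = re.findall(r'[a-zA-Z0-9\-_\+;(\\,)]+', s)

# Outside the class, \\, is a two-character sequence that must occur as a whole,
# so only escaped commas are absorbed into a token.
fixed = re.findall(r'[a-zA-Z0-9_+;-]+(?:\\,[a-zA-Z0-9_+;-]+)*', s)

print(attempt)  # ['foo-bar', '369\\,337\\,234,123']
print(fixed)    # ['foo-bar', '369\\,337\\,234', '123']
```

A character class can only ever match one character at a time, which is why the multi-character \, sequence has to be handled in a group outside it.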

Python demo:

import re
strings = [r'foo-bar=369,337,234,123', r'foo-bar=369\,337\,234,123']
rx = re.compile(r"[\w+;-]+(?:\\,[\w+;-]+)*", re.A)
for s in strings:
    print(f"Parsing {s}")
    print(rx.findall(s))

Output:

Parsing foo-bar=369,337,234,123
['foo-bar', '369', '337', '234', '123']
Parsing foo-bar=369\,337\,234,123
['foo-bar', '369\\,337\\,234', '123']

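Note that the double backslashes in the printed list come from the repr of the strings: inside a string literal, \\ denotes a single literal backslash, so each matched item contains only one backslash before the comma.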
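A quick way to check this is to print a single item rather than the whole list. The last step, stripping the escaping backslashes, is an assumption about what you might want to do next and is not part of the question:

```python
import re

rx = re.compile(r"[\w+;-]+(?:\\,[\w+;-]+)*", re.A)
tokens = rx.findall(r'foo-bar=369\,337\,234,123')

value = tokens[1]
print(repr(value))  # '369\\,337\\,234'  (repr doubles each backslash)
print(value)        # 369\,337\,234      (the actual text)
print(len(value))   # 13

# Assumed follow-up step: turn each escaped comma back into a plain comma.
unescaped = value.replace('\\,', ',')
print(unescaped)    # 369,337,234
```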

Wiktor Stribiżew