I'm looking for a clean way to extract some data from a string using regex and the Python re module. Each line of the string is of the form key = value. There are only certain keys that I'm interested in, but for some strings these keys may be missing. I can think of a few ways to do this by iterating over the string line by line, or by using re.finditer(), but what I'd really like to do is use named groups and a single call to re.match(), ending up with a dictionary of groups from the .groupdict() method of the returned match object. I can do that using named groups when all the groups are present, but it seems that if I make the groups optional, they don't get matched even when present.

I'm probably missing something obvious, but is there a way to do this in a single regex, or do I need a multistep process?
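To make the symptom concrete, here is a minimal sketch of one way it can arise. The pattern below is an assumption about what "making a group optional" might look like, not necessarily the exact attempt, and the shortened sample text is hypothetical. The greedy .* ahead of the optional group swallows the count line, and the optional group is then satisfied by matching nothing.

```python
import re

# Shortened, hypothetical sample that still contains every key of interest.
sample = """
type = Route
status = 0
count = 5
destinations = default
"""

# Assumed attempt: the 'count' clause is simply wrapped in an optional group.
naive = re.compile(
    r"(?s).*type = (?P<type>\S*)"
    r".*(?:count = (?P<count>\S*))?"
    r".*destinations = (?P<destinations>\S*)"
)

result = naive.match(sample).groupdict()
print(result)
# {'type': 'Route', 'count': None, 'destinations': 'default'}
```

The match succeeds, but count comes back as None even though the line is present in the text.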
import re
# trying to extract 'type', 'count' and 'destinations'.
# string1 has all keys and a single re.match works
# string2 is missing 'count'... any suggestions?
string1 = """
Name: default
type = Route
status = 0
count = 5
enabled = False
start_time = 18:00:00
end_time = 00:00:00
destinations = default
started = False
"""
string2 = """
Name: default
type = Route
status = 0
enabled = False
start_time = 18:00:00
end_time = 00:00:00
destinations = default
started = False
"""
pattern = re.compile(r"(?s).*type = (?P<type>\S*).*count = (?P<count>\S*).*destinations = (?P<destinations>\S*)")
m1 = re.match(pattern, string1)
# m1.groupdict() == {'type': 'Route', 'count': '5', 'destinations': 'default'}
m2 = re.match(pattern, string2)
# m2 is None
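For comparison, the re.finditer() fallback mentioned above could look roughly like the sketch below. The names WANTED, KEY_VALUE and extract are hypothetical, and the sketch assumes each key starts at the beginning of its line, as in the sample strings.

```python
import re

# Hypothetical tuple of keys to keep; adjust as needed.
WANTED = ("type", "count", "destinations")

# One alternation of the wanted keys, anchored to line starts via re.MULTILINE.
KEY_VALUE = re.compile(
    r"^(?P<key>%s) = (?P<value>\S*)" % "|".join(map(re.escape, WANTED)),
    re.MULTILINE,
)

def extract(text):
    """Return a dict of the wanted keys found in text; absent keys map to None."""
    found = dict.fromkeys(WANTED)
    for m in KEY_VALUE.finditer(text):
        found[m.group("key")] = m.group("value")
    return found

sample_without_count = """
Name: default
type = Route
status = 0
destinations = default
"""

print(extract(sample_without_count))
# {'type': 'Route', 'count': None, 'destinations': 'default'}
```

This gives the same shape of result as .groupdict(), with None for any missing key, but it takes a loop rather than a single re.match() call.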