A named symbolic group requires a name. It takes the form (?P<name>...). In your example, you forgot to provide a name for your groups.
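For reference, here is a minimal sketch of a pattern that handles one name format with correctly written named groups. The sample string is only illustrative. Each group can be read by name with match['name'], or all at once with groupdict().

```python
import re

# Each named group is written (?P<name>...) and can be looked up by that name.
pattern = re.compile(r'(?P<last>\w+), (?P<first>\w+)')
match = pattern.match('Allen, James')

print(match['first'], match['last'])  # James Allen
print(match.groupdict())              # {'last': 'Allen', 'first': 'James'}
```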
Unfortunately, a group name cannot be reused within a single pattern, so the following raises an error:
re.compile(r'(?P<last>\w+), (?P<first>\w+)|(?P<first>\w+) (?P<last>\w+)')
# sre_constants.error: redefinition of group name 'first' ...
The error occurs because re does not allow the same group name to be defined twice, even in separate alternatives where only one of them can ever participate in a match. Instead, use unnamed (numbered) groups, then extract first and last from whichever alternative matched.
import re
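def get_name(name):
    # Groups 1-2 match "Last, First"; groups 3-4 match "First Last".
    match = re.match(r'(\w+), (\w+)|(\w+) (\w+)', name)
    if match is None:
        raise ValueError(f'unrecognized name: {name!r}')
    # Only one alternative matches, so the other alternative's groups are None.
    return {'first': match[2] or match[3], 'last': match[1] or match[4]}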
print(get_name('James Allen'))
print(get_name('Allen, James'))
Output
{'first': 'James', 'last': 'Allen'}
{'first': 'James', 'last': 'Allen'}
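If you would rather keep the named groups, one alternative (not part of the original answer) is to give each format its own compiled pattern. Because the names then live in separate patterns, "first" and "last" can be reused without a redefinition error. This is a sketch: NAME_PATTERNS is a hypothetical name, and it uses fullmatch, so unlike re.match above it rejects trailing text such as 'James Allen Jr'.

```python
import re

# Hypothetical helper: one pattern per name format, tried in order.
# Separate patterns may each define groups called "first" and "last".
NAME_PATTERNS = [
    re.compile(r'(?P<last>\w+), (?P<first>\w+)'),  # "Last, First"
    re.compile(r'(?P<first>\w+) (?P<last>\w+)'),   # "First Last"
]

def get_name(name):
    for pattern in NAME_PATTERNS:
        match = pattern.fullmatch(name)  # the whole string must match
        if match:
            return match.groupdict()
    raise ValueError(f'unrecognized name: {name!r}')

print(get_name('James Allen'))
print(get_name('Allen, James'))
```

The trade-off is that a name may be scanned once per pattern, which is negligible for a handful of formats but keeps each pattern readable on its own.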