
I want a regular expression that captures names like "James Allen" and "Allen, James", with named groups first and last. Here is what I have:

(?P<first>\w+), (?P<last>\w+)|(?P<last>\w+) (?P<first>\w+)

but it raises a subpattern naming error. How do I fix it so that it matches only one of the two alternatives? I want to keep the group names "first" and "last".

Olivier Melançon
  • 19,112
  • 3
  • 34
  • 61
  • Edit the post to add the code where you define and use the pattern, to make sure the problem is in the pattern itself and that there are no other errors in the code – Hemerson Tacon Sep 30 '18 at 03:24

1 Answer


A named symbolic group requires a name and takes the form (?P<name>...). Your pattern names its groups correctly; the problem is that each name is used twice.
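For reference, here is a small sketch of a named group working as intended when each name appears only once in the pattern. It uses only the standard re module; the variable names are illustrative.

```python
import re

# Each group name appears exactly once, so this compiles without error.
pattern = re.compile(r'(?P<first>\w+) (?P<last>\w+)')
match = pattern.match('James Allen')

print(match.group('first'))  # James
print(match.groupdict())     # {'first': 'James', 'last': 'Allen'}
```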

Unfortunately, a group name cannot be reused within a single pattern, so the following raises an error.

import re

re.compile(r'(?P<last>\w+), (?P<first>\w+)|(?P<first>\w+) (?P<last>\w+)')
# sre_constants.error: redefinition of group name 'first' ...

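The above error happens because re does not check whether the alternatives are mutually exclusive: it rejects any duplicate group name, even though only one alternative can match at a time. Thus you will have to capture the parts without names and then extract first and last yourself.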
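If you want to keep the group names first and last, one standard-library alternative (a sketch, not part of the original answer) is to split the alternation into two separate patterns, each defining every name once, and try them in turn. The names PATTERNS and parse_name are hypothetical.

```python
import re

# Hypothetical: one compiled pattern per name format, so each
# pattern defines 'first' and 'last' exactly once.
PATTERNS = [
    re.compile(r'(?P<last>\w+), (?P<first>\w+)'),  # "Allen, James"
    re.compile(r'(?P<first>\w+) (?P<last>\w+)'),   # "James Allen"
]

def parse_name(name):
    # Return the named groups of the first pattern that matches.
    for pattern in PATTERNS:
        match = pattern.match(name)
        if match:
            return match.groupdict()
    return None  # neither format matched

print(parse_name('James Allen'))   # {'first': 'James', 'last': 'Allen'}
print(parse_name('Allen, James'))  # {'first': 'James', 'last': 'Allen'}
```

The trade-off is that the input may be scanned once per pattern, which is negligible for short strings like names.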

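import re

def get_name(name):
    # Alternative 1: "Last, First" fills groups 1 and 2
    # Alternative 2: "First Last" fills groups 3 and 4
    match = re.match(r'(\w+), (\w+)|(\w+) (\w+)', name)
    if match is None:
        return None

    # Groups from the alternative that did not match are None
    return {'first': match[2] or match[3], 'last': match[1] or match[4]}

print(get_name('James Allen'))
print(get_name('Allen, James'))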

Output

{'first': 'James', 'last': 'Allen'}
{'first': 'James', 'last': 'Allen'}
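One caveat: re.match anchors only at the start of the string, so input with trailing text, such as 'James Allen Jr', still yields James and Allen. A sketch assuming you want to reject such input swaps in Pattern.fullmatch (available since Python 3.4); NAME_RE and get_name_strict are hypothetical names.

```python
import re

# Same unnamed pattern as above, compiled once.
NAME_RE = re.compile(r'(\w+), (\w+)|(\w+) (\w+)')

def get_name_strict(name):
    # fullmatch requires one alternative to consume the whole string.
    match = NAME_RE.fullmatch(name)
    if match is None:
        return None
    return {'first': match[2] or match[3], 'last': match[1] or match[4]}

print(get_name_strict('Allen, James'))    # {'first': 'James', 'last': 'Allen'}
print(get_name_strict('James Allen Jr'))  # None
```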