regular expression for different forms of people's name representation

Question

I'm writing a Python regular expression that tries to capture people's names.

They can be in the form first_name last_name or last_name, first_name.

This is my regular expression for that:

(?P<first>\w+) (?P<last>\w+)|(?P<last>\w+), (?P<first>\w+)

However, it's causing a sub-pattern naming error. Is there a way to fix it?

What about `name, surname = re.sub(r'^(\w+),\s+(\w+)$', r'\2 \1', s).split()`? See [demo](https://ideone.com/fMXl8H). — Wiktor Stribiżew, Sep 29 '18 at 22:09
Possible duplicate of [Named regular expression group "(?P<group_name>regexp)": what does "P" stand for?](https://stackoverflow.com/questions/10059673/named-regular-expression-group-pgroup-nameregexp-what-does-p-stand-for) — KC., Sep 30 '18 at 08:04
https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/ — Toto, Sep 30 '18 at 16:52

score 0 · Answer 1 · answered Sep 29 '18 at 22:35

Try something like this. Note that Python's re module requires capture group names to be unique within a pattern:

r"(?P<first1>\w+)[ ](?P<last1>\w+)|(?P<last2>\w+),[ ](?P<first2>\w+)"

https://regex101.com/r/FUYxTb/1

   (?P<first1> \w+ )             # (1)
   [ ] 
   (?P<last1> \w+ )              # (2)
|  
   (?P<last2> \w+ )              # (3)
   , [ ] 
   (?P<first2> \w+ )             # (4)
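
For reference, here is a minimal sketch of how the uniquely named groups could be used from Python. The helper name `split_name` and the constant `NAME_RE` are made up for illustration and are not part of the original answer. The sketch first confirms that the standard `re` module rejects the duplicate names, then compiles the renamed pattern in `re.VERBOSE` mode and merges the two alternatives with `or`.

```python
import re

# The original pattern reuses the names `first` and `last`, which the
# standard-library `re` module rejects at compile time.
try:
    re.compile(r"(?P<first>\w+) (?P<last>\w+)|(?P<last>\w+), (?P<first>\w+)")
    duplicate_names_rejected = False
except re.error:
    duplicate_names_rejected = True

# Same pattern with unique names, written with re.VERBOSE so it can be laid
# out like the expanded form above. `[ ]` keeps the literal space significant.
NAME_RE = re.compile(
    r"""
       (?P<first1> \w+ ) [ ] (?P<last1> \w+ )      # first_name last_name
    |  (?P<last2> \w+ ) , [ ] (?P<first2> \w+ )    # last_name, first_name
    """,
    re.VERBOSE,
)


def split_name(s):
    """Hypothetical helper: return (first, last), or None if nothing matches."""
    m = NAME_RE.search(s)
    if m is None:
        return None
    # Only one alternative takes part in a match; the other pair is None.
    first = m.group("first1") or m.group("first2")
    last = m.group("last1") or m.group("last2")
    return first, last


print(split_name("first_name last_name"))
print(split_name("last_name, first_name"))
```

Both calls should yield the first name followed by the last name, whichever layout the input uses.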

Then how would I assign `first_name = match.group('first')` when I have 'first1' and 'first2'? – Sep 29 '18 at 22:59
@jc1234567890 - Only one set of groups will match, and the other set is None. So, `first_name = match.group('first1') or match.group('first2')`. Same for last. I.e. https://regex101.com/r/2NSfaU/1 – Oct 16 '18 at 06:52

Wiktor Stribiżew · Accepted Answer · 2018-09-30T14:56:50.173

You can do exactly what you want only with the PyPI regex module, since it allows the same named capturing group to appear more than once in a single pattern:

import regex
sz = ["first_name last_name","last_name, first_name"]
for s in sz:
    print(regex.search(r'(?P<first>\w+) (?P<last>\w+)|(?P<last>\w+), (?P<first>\w+)', s).groupdict())
# => {'last': 'last_name', 'first': 'first_name'}
# => {'last': 'last_name', 'first': 'first_name'}

Otherwise, if your input always has one of these two forms, you can swap the first and last name, drop the comma, and then split the string:

import re
for s in ["first_name last_name", "last_name, first_name"]:
    name, surname = re.sub(r'^(\w+),\s+(\w+)$', r'\2 \1', s).split()
    print(name, surname)
# => first_name last_name
# => first_name last_name

Another alternative: use simple numbered capturing groups with a regular alternation, and then concatenate the corresponding captures:

import re
sz = ["first_name last_name","last_name, first_name"]
for s in sz:
    m = re.search(r'(\w+),\s+(\w+)|(\w+)\s+(\w+)', s)
    if m:
        surname = "{}{}".format(m.group(1) or '', m.group(4) or '')
        name = "{}{}".format(m.group(2) or '', m.group(3) or '') 
        print("{} {}".format(name, surname))
    else:
        print("No match")

Here, r'(\w+),\s+(\w+)|(\w+)\s+(\w+)' captures the last name in Group 1 or 4 and the first name in Group 2 or 3. Only one alternative can match, so the groups of the other alternative are always None; that is why or '' is required when concatenating. Joining each pair of groups gives you the name parts.
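
If you would rather keep `match.group('first')` and `match.group('last')` with the standard `re` module, one option is to compile a separate pattern per layout and try them in turn. This is only a sketch: the names `PATTERNS` and `parse_name` are hypothetical, and it uses `fullmatch`, which is stricter than the `search` calls above because the whole string has to fit one of the two layouts.

```python
import re

# Each layout gets its own compiled pattern, so both can reuse the names
# `first` and `last` without clashing.
PATTERNS = [
    re.compile(r"(?P<last>\w+),\s+(?P<first>\w+)"),   # last_name, first_name
    re.compile(r"(?P<first>\w+)\s+(?P<last>\w+)"),    # first_name last_name
]


def parse_name(s):
    """Hypothetical helper: try each layout in turn; return a dict or None."""
    for pattern in PATTERNS:
        m = pattern.fullmatch(s.strip())
        if m:
            return m.groupdict()
    return None


for s in ["first_name last_name", "last_name, first_name"]:
    parts = parse_name(s)
    print(parts["first"], parts["last"])
```

The trade-off is a small loop instead of a single pattern, in exchange for not having to merge numbered or renamed groups afterwards.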
