1

I'm writing a Python regular expression that tries to capture people's names.

They can be in the form first_name last_name or last_name, first_name.

This is my regular expression for that:

(?P<first>\w+) (?P<last>\w+)|(?P<last>\w+), (?P<first>\w+)

However, it's causing a sub-pattern naming error. Is there a way to fix it?

Wiktor Stribiżew
  • What about `name, surname = re.sub(r'^(\w+),\s+(\w+)$', r'\2 \1', s).split()`? See [demo](https://ideone.com/fMXl8H). – Wiktor Stribiżew Sep 29 '18 at 22:09
  • can't do that. only use regex –  Sep 29 '18 at 22:25
  • Possible duplicate of [Named regular expression group "(?P<group_name>regexp)": what does "P" stand for?](https://stackoverflow.com/questions/10059673/named-regular-expression-group-pgroup-nameregexp-what-does-p-stand-for) – KC. Sep 30 '18 at 08:04
  • https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/ – Toto Sep 30 '18 at 16:52
  • Glad [my answer](https://stackoverflow.com/a/52578917/3832970) worked for you. Please also consider upvoting if my answer proved helpful to you (see [How to upvote on Stack Overflow?](http://meta.stackexchange.com/questions/173399/how-to-upvote-on-stack-overflow)) as you are entitled to the upvoting privilege after reaching 15 rep points. Note you may upvote all the answers that turned out helpful. – Wiktor Stribiżew Sep 30 '18 at 21:04

2 Answers

0

Try something like this.
Note that Python's re module requires unique capture group names.

r"(?P<first1>\w+)[ ](?P<last1>\w+)|(?P<last2>\w+),[ ](?P<first2>\w+)"

https://regex101.com/r/FUYxTb/1

   (?P<first1> \w+ )             # (1)
   [ ] 
   (?P<last1> \w+ )              # (2)
|  
   (?P<last2> \w+ )              # (3)
   , [ ] 
   (?P<first2> \w+ )             # (4)
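
As a minimal sketch of how the uniquely named groups might be used with the standard `re` module: a group in the alternative that did not match is `None`, so `or` can pick whichever one matched. The helper name `parse_name` and the use of `fullmatch` are assumptions for illustration, not part of the answer above.

```python
import re

# Same pattern as above, with unique group names per alternative.
NAME_RE = re.compile(
    r"(?P<first1>\w+)[ ](?P<last1>\w+)|(?P<last2>\w+),[ ](?P<first2>\w+)"
)

def parse_name(text):
    """Hypothetical helper: return {'first': ..., 'last': ...} or None."""
    m = NAME_RE.fullmatch(text)
    if m is None:
        return None
    # Only one alternative matches; the other alternative's groups are None.
    first = m.group("first1") or m.group("first2")
    last = m.group("last1") or m.group("last2")
    return {"first": first, "last": last}

print(parse_name("first_name last_name"))
print(parse_name("last_name, first_name"))
```

Both calls should yield the same first/last assignment regardless of which form the input uses.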
  • then how would i assign first_name = match.group('first') when I have 'first1' and 'first2' –  Sep 29 '18 at 22:59
  • @jc1234567890 - Only one set of groups will match, and in Python the unmatched groups are `None`. So, `first_name = match.group('first1') or match.group('first2')`. Same for last. I.e. https://regex101.com/r/2NSfaU/1 –  Oct 16 '18 at 06:52
0

You can do this only with the PyPI regex module, since it allows the same named capturing group to be used more than once in a single pattern:

import regex
sz = ["first_name last_name","last_name, first_name"]
for s in sz:
    print(regex.search(r'(?P<first>\w+) (?P<last>\w+)|(?P<last>\w+), (?P<first>\w+)', s).groupdict())
# => {'last': 'last_name', 'first': 'first_name'}
# => {'last': 'last_name', 'first': 'first_name'}

See the Python demo.

Otherwise, if your input always has one of these two forms, you can swap the first and last name, drop the comma, and then just split the string:

import re
for s in ["first_name last_name", "last_name, first_name"]:
    name, surname = re.sub(r'^(\w+),\s+(\w+)$', r'\2 \1', s).split()
    print(name, surname)
# => first_name last_name
# => first_name last_name

See another Python demo.

Another alternative: use simple numbered capturing groups with a regular alternation, and then concatenate the corresponding captures:

import re
sz = ["first_name last_name","last_name, first_name"]
for s in sz:
    m = re.search(r'(\w+),\s+(\w+)|(\w+)\s+(\w+)', s)
    if m:
        surname = "{}{}".format(m.group(1) or '', m.group(4) or '')
        name = "{}{}".format(m.group(2) or '', m.group(3) or '')
        print("{} {}".format(name, surname))
    else:
        print("No match")

Here, r'(\w+),\s+(\w+)|(\w+)\s+(\w+)' captures the last name in Group 1 or 4 and the first name in Group 2 or 3. Joining each pair of groups gives you the name parts. Within each pair, the group from the alternative that did not match is always None, so or '' is required when concatenating.
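
A possible variation on the same numbered-group pattern, sketched here as an assumption rather than as part of the approach above: `Match.lastindex` holds the index of the last group that participated in the match, so it can tell which alternative matched (2 for "last, first", 4 for "first last"). The function name `split_name` is hypothetical.

```python
import re

# Groups 1-2 belong to the "last, first" branch, groups 3-4 to "first last".
PATTERN = re.compile(r"(\w+),\s+(\w+)|(\w+)\s+(\w+)")

def split_name(text):
    """Hypothetical helper: return (first, last), or None if nothing matches."""
    m = PATTERN.search(text)
    if m is None:
        return None
    if m.lastindex == 2:
        # "last_name, first_name" branch matched.
        last, first = m.group(1, 2)
    else:
        # "first_name last_name" branch matched (lastindex == 4).
        first, last = m.group(3, 4)
    return first, last

for s in ["first_name last_name", "last_name, first_name"]:
    print(split_name(s))
```

This avoids the string concatenation at the cost of relying on the group numbering, so the branch check has to be updated if the pattern is reordered.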

Wiktor Stribiżew