
I have a Python Pandas DataFrame like this:

Name  
Jim, Mr. Jones
Sara, Miss. Baker
Leila, Mrs. Jacob
Ramu, Master. Kuttan 

I would like to extract only the title from the Name column and copy it into a new column named Title. The output DataFrame should look like this:

Name                    Title
Jim, Mr. Jones          Mr
Sara, Miss. Baker       Miss
Leila, Mrs. Jacob       Mrs
Ramu, Master. Kuttan    Master

I tried to find a solution with regex but could not get the correct result.

raja

2 Answers

In [157]: df['Title'] = df.Name.str.extract(r',\s*([^\.]*)\s*\.', expand=False)

In [158]: df
Out[158]:
                   Name   Title
0        Jim, Mr. Jones      Mr
1     Sara, Miss. Baker    Miss
2     Leila, Mrs. Jacob     Mrs
3  Ramu, Master. Kuttan  Master

or

In [163]: df['Title'] = df.Name.str.split(r'\s*,\s*|\s*\.\s*').str[1]

In [164]: df
Out[164]:
                   Name   Title
0        Jim, Mr. Jones      Mr
1     Sara, Miss. Baker    Miss
2     Leila, Mrs. Jacob     Mrs
3  Ramu, Master. Kuttan  Master
MaxU

Have a look at str.extract.

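The regexp you are looking for is (?<=, )\w+(?=\.). In words: match a substring that is preceded by a comma and a space (not included in the match), consists of at least one word character, and is followed by a literal . (also not included). The dot must be escaped, because an unescaped . matches any character. Since str.extract requires a capture group, wrap the word part in parentheses when you pass the pattern to it: (?<=, )(\w+)(?=\.). In future, try an online regexp tester such as regex101; regexps become much easier to work out that way.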

This is assuming each entry in the Name column is formatted the same way.
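As a quick check of the lookaround pattern described above, here is a minimal sketch using only Python's built-in re module rather than pandas. The helper name extract_title and the TITLE_RE constant are made up for illustration; the sample names are the ones from the question.

```python
import re

# Sample names copied from the question.
names = [
    "Jim, Mr. Jones",
    "Sara, Miss. Baker",
    "Leila, Mrs. Jacob",
    "Ramu, Master. Kuttan",
]

# Lookbehind for ", ", then one or more word characters,
# then a lookahead for a literal dot (note the escaped \.).
TITLE_RE = re.compile(r"(?<=, )\w+(?=\.)")


def extract_title(name):
    """Return the title found in `name`, or None if the format differs.

    Hypothetical helper, not part of pandas or the original answer.
    """
    match = TITLE_RE.search(name)
    return match.group(0) if match else None


titles = [extract_title(n) for n in names]
print(titles)  # ['Mr', 'Miss', 'Mrs', 'Master']
```

Entries that do not follow the "First, Title. Last" layout yield None here, which makes the formatting assumption easy to spot before applying the pattern to a whole column.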

svdc