
What is the meaning of the regex ',\s*([^\.]*)\s*\.'?

I have a DataFrame identical to the one in "Extract sub-string between 2 special characters from one column of Pandas DataFrame" and want to extract the substring located between "," and ".". Based on the answer to that post, one way to do it is:

In [157]: df['Title'] = df.Name.str.extract(r',\s*([^\.]*)\s*\.', expand=False)

In [158]: df
Out[158]:
                   Name   Title
0        Jim, Mr. Jones      Mr
1     Sara, Miss. Baker    Miss
2     Leila, Mrs. Jacob     Mrs
3  Ramu, Master. Kuttan  Master
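As a rough stdlib-only sketch of the session above: assuming that `str.extract(..., expand=False)` with a single capture group returns group 1 of the first match in each string, `re.search(...).group(1)` gives the same titles. The helper `extract_title` is hypothetical, and it returns None where pandas would give NaN.

```python
import re

# Same pattern as in the question; the raw string keeps the backslashes intact.
PATTERN = re.compile(r',\s*([^\.]*)\s*\.')

# Values copied from the Name column shown above.
names = ["Jim, Mr. Jones", "Sara, Miss. Baker", "Leila, Mrs. Jacob", "Ramu, Master. Kuttan"]


def extract_title(name):
    """Return capture group 1 of the first match, or None if nothing matches."""
    match = PATTERN.search(name)
    return match.group(1) if match else None


titles = [extract_title(n) for n in names]
print(titles)  # ['Mr', 'Miss', 'Mrs', 'Master']
```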

Although the output is correct, what does ',\s*([^\.]*)\s*\.' mean? In particular, what do '*' and '\' mean?

Cuisilopez
  • @JustBaron. The first = symbol was part of the question mark, not the expression =) – Mad Physicist Sep 08 '18 at 14:39
  • Possible duplicate of [Reference - What does this regex mean?](https://stackoverflow.com/questions/22937618/reference-what-does-this-regex-mean) – Paolo Sep 08 '18 at 14:49

1 Answer


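It means the following. Match:

  • a , (comma)
  • followed by \s*: zero or more whitespace characters (tabs, spaces, etc.)
  • followed by ([^\.]*): zero or more characters that are not a . (dot), captured as a group; this captured part is what str.extract returns
  • followed by \s*: zero or more whitespace characters
  • followed by \.: a literal . (dot)

In short, * means "zero or more of the preceding item", and \ either escapes a special character (\. is a literal dot) or forms a character class (\s is any whitespace character).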
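The same pattern can be written with `re.VERBOSE`, which lets each piece from the list above carry its own comment. A minimal sketch; apart from the two names taken from the question, the sample strings are made up for illustration.

```python
import re

# The question's pattern, one piece per line; re.VERBOSE ignores the
# layout whitespace and treats "#" as the start of a comment.
VERBOSE_PATTERN = re.compile(
    r"""
    ,          # a literal comma
    \s*        # zero or more whitespace characters
    ([^\.]*)   # group 1: zero or more characters that are not a dot
    \s*        # zero or more whitespace characters
    \.         # a literal dot
    """,
    re.VERBOSE,
)
COMPACT_PATTERN = re.compile(r',\s*([^\.]*)\s*\.')

# Hypothetical samples: two names from the question plus two edge cases.
samples = ["Jim, Mr. Jones", "Leila, Mrs. Jacob", "Jim,Mr. Jones", "Jim, Mr . Jones"]

for text in samples:
    verbose_title = VERBOSE_PATTERN.search(text).group(1)
    compact_title = COMPACT_PATTERN.search(text).group(1)
    # Both spellings of the pattern capture exactly the same text.
    print(repr(text), '->', repr(verbose_title), repr(compact_title))
```

With these samples, 'Jim,Mr. Jones' still gives 'Mr', because \s* may match zero characters. 'Jim, Mr . Jones' gives 'Mr ' with a trailing space: the greedy [^\.]* has already consumed the space, so the second \s* matches nothing. A lazy group, ([^.]*?), would return 'Mr' in that case.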

You can find more about regex here.

UPDATE

As @UnbearableLightness mentioned, the \ is redundant inside a character set: there the . (dot) already stands for a literal dot, so [^\.] and [^.] are equivalent. A character set is anything defined between [].
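A small sketch to check the update: it compares the escaped and unescaped forms of the set on the question's names, then contrasts \. with a bare . outside a set, where the escape does matter.

```python
import re

samples = ["Jim, Mr. Jones", "Sara, Miss. Baker", "Leila, Mrs. Jacob", "Ramu, Master. Kuttan"]

# Inside [...] the dot is already literal, so escaping it changes nothing.
escaped_set = re.compile(r',\s*([^\.]*)\s*\.')
plain_set = re.compile(r',\s*([^.]*)\s*\.')
same = all(escaped_set.search(s).group(1) == plain_set.search(s).group(1) for s in samples)
print(same)  # True

# Outside a set the escape matters: \. is a literal dot, . is "any character".
print(bool(re.fullmatch(r'\.', 'x')))  # False
print(bool(re.fullmatch(r'.', 'x')))   # True
```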

Dani Mesejo