
My dataframe:

    df = pd.DataFrame({'module_num': ['Assignment Module 6 Due', 'Review of Module 2 Checklist', 'Welcome to Module 7 Chapter 25']})

I am new to regular expressions in Python, and I was hoping to get the following output using regex and pandas:

    pd.DataFrame({'module_num': ['Module 6', 'Module 2', 'Module 7']})

So I am trying to match the string "Module" and the number that comes after it. In every case there is a single space between "Module" and the number.

2 Answers


Use Series.str.extract:


    df.module_num.str.extract(r"(Module \d+)")

              0
    0  Module 6
    1  Module 2
    2  Module 7
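
To write the result back into the original column, as the question asks, something like the following should work. This is a minimal sketch: it assumes the frame is named df, and it passes expand=False so that extract returns a Series rather than a one-column DataFrame.

```python
import pandas as pd

df = pd.DataFrame({'module_num': ['Assignment Module 6 Due',
                                  'Review of Module 2 Checklist',
                                  'Welcome to Module 7 Chapter 25']})

# With a single capture group, expand=False returns a Series,
# so the result can be assigned straight back to the column.
# extract keeps only the first match in each string.
df['module_num'] = df['module_num'].str.extract(r'(Module \d+)', expand=False)

print(df)
```

Rows that do not contain the pattern would come back as NaN.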
sushanth

Pass a named group that matches "Module", a whitespace character, and a digit to .str.extract:

    df.module_num.str.extract(r'(?P<module_num>Module\s\d)')

      module_num
    0   Module 6
    1   Module 2
    2   Module 7

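How it works: (?P<name>group) captures whatever group matches and makes it available under name; str.extract uses that name as the column label. name must be a valid Python identifier (letters, digits, and underscores, not starting with a digit).

\s matches a single whitespace character.

\d matches a single digit. Use \d+ if a module number can have more than one digit.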
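
The same named group can be tried outside pandas with the standard re module. The sketch below uses a hypothetical title with a two-digit module number (not part of the original data) to show the difference between \d and \d+:

```python
import re

# Hypothetical title with a two-digit module number (not in the original data).
title = 'Assignment Module 12 Due'

# \d matches exactly one digit, so the capture stops after the first one.
one_digit = re.search(r'(?P<module_num>Module\s\d)', title)

# \d+ matches one or more digits, so the whole number is captured.
all_digits = re.search(r'(?P<module_num>Module\s\d+)', title)

# The captured text is looked up by the group's name.
print(one_digit.group('module_num'))   # Module 1
print(all_digits.group('module_num'))  # Module 12
```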

wwnde