import pandas as pd
dataframe = pd.DataFrame({'Date': ['This 1A1619 person BL171111 the A-1-24',
                                   'dont Z112 but NOT 1-22-2001',
                                   'mix: 1A25629Q88 or A13B ok'],
                          'IDs': ['A11', 'B22', 'C33'],
                          })

I have the dataframe above. I want to extract all the mixed letter-and-number strings (1A1619, BL171111, Z112, 1A25629Q88, A13B). To do so, I tried to combine code from "replace words and strings pandas" and "identify letter/number combinations using regex and storing in dictionary". My code looks like this:

pat = r'(?<!\S)(?:[a-zA-Z]+\d|\d+[a-zA-Z])[a-zA-Z0-9]*(?!\S)'
dataframe['new'] = dataframe['Date'].str.extract(pat, expand=True)

But doing so gives me an error:

ValueError: pattern contains no capture groups

I looked at "pandas ValueError: pattern contains no capture groups", but I am still unsure how to alter my code so that it extracts all the mixed letter-and-number strings (1A1619, BL171111, Z112, 1A25629Q88, A13B).

What can I do to my code

dataframe['new'] = dataframe['Date'].str.extract(pat, expand=True) 

to extract what I want?

  • See if wrapping all of it with `()` works: `pat = r'((? – acdcjunior Aug 29 '19 at 23:21
  • yes, that works! –  Aug 29 '19 at 23:24
  • such a simple fix...can you please explain why `()` creates the "capture group", and what a "capture group" is exactly? –  Aug 29 '19 at 23:28
  • In Pandas, `str.extract` *returns only the capturing group contents. The pattern used with extract requires at least 1 capturing group*. [*Capturing group*](https://www.regular-expressions.info/brackets.html) details. – Wiktor Stribiżew Aug 30 '19 at 00:12
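
To make the comments above concrete, here is a minimal sketch of the fix using the question's own data and pattern. It assumes that wrapping the whole pattern in one pair of parentheses is what the first comment suggests (that comment is cut off). Note that `str.extract` keeps only the first match per row, so the sketch also shows `str.findall`, which does not need a capture group and returns every match. The names `first_match` and `all_matches` are made up for illustration.

```python
import pandas as pd

dataframe = pd.DataFrame({
    'Date': ['This 1A1619 person BL171111 the A-1-24',
             'dont Z112 but NOT 1-22-2001',
             'mix: 1A25629Q88 or A13B ok'],
    'IDs': ['A11', 'B22', 'C33'],
})

# Original pattern: (?:...) is a NON-capturing group, so there is nothing
# for str.extract to return and it raises the ValueError.
pat = r'(?<!\S)(?:[a-zA-Z]+\d|\d+[a-zA-Z])[a-zA-Z0-9]*(?!\S)'

# Wrapping the pattern in (...) adds one capturing group.
# str.extract then returns the FIRST match in each row.
first_match = dataframe['Date'].str.extract('(' + pat + ')', expand=False)
print(first_match.tolist())
# ['1A1619', 'Z112', '1A25629Q88']

# To get EVERY match per row, str.findall works with the original pattern.
all_matches = dataframe['Date'].str.findall(pat)
print(all_matches.tolist())
# [['1A1619', 'BL171111'], ['Z112'], ['1A25629Q88', 'A13B']]

# Optionally store the matches as one comma-separated string per row.
dataframe['new'] = all_matches.str.join(', ')
```

If one row per match is preferred over a list per row, `str.extractall` with the parenthesized pattern is another option; it returns a DataFrame indexed by original row and match number.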
