import pandas as pd

df = pd.DataFrame({'Date': ['nothing ',
                            'This 1A1619 A124 person BL171111 the A-1-24 and ',
                            'dont Z112 but NOT 12-24-1981',
                            'nada here either',
                            'mix: 1A25629Q88 or A13B ok A1 the A16'],
                   'IDs': ['A11', 'B22', 'C33', 'D44', 'E55'],
                   })

This is a follow-up to, and a variation on, pulling mixed letters and numbers. Using this code

pat = r'((?<!\S)(?:[a-zA-Z]+\d|\d+[a-zA-Z])[a-zA-Z0-9]*(?!\S))'
df['Date'].str.extractall(pat)

gives me

        0
   match    
1   0   1A1619
    1   A124
    2   BL171111
2   0   Z112
4   0   1A25629Q88
    1   A13B
    2   A1
    3   A16
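
To see which rows drop out, the same pattern can be run over the raw strings with re.findall. This is only a minimal sketch using the standard library and the five strings from the frame above; the names rows and found are made up for illustration.

```python
import re

# Same pattern as above: a whitespace-delimited token that mixes letters and digits.
pat = r'((?<!\S)(?:[a-zA-Z]+\d|\d+[a-zA-Z])[a-zA-Z0-9]*(?!\S))'

# Hypothetical plain-list copy of the 'Date' column.
rows = ['nothing ',
        'This 1A1619 A124 person BL171111 the A-1-24 and ',
        'dont Z112 but NOT 12-24-1981',
        'nada here either',
        'mix: 1A25629Q88 or A13B ok A1 the A16']

# One list of matches per row; rows without a match give an empty list.
found = [re.findall(pat, text) for text in rows]

for label, matches in enumerate(found):
    print(label, matches)
```

Rows 0 and 3 come back as empty lists, which is why extractall has no entries for them.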

I am looking to add NaN where the regex doesn't match, so I would like something like this instead

        0
   match    
0   NaN
1   0   1A1619
    1   A124
    2   BL171111
2   0   Z112
3   NaN
4   0   1A25629Q88
    1   A13B
    2   A1
    3   A16

How would I alter my code to do so?

1 Answer

Given that s is the result of df['Date'].str.extractall(pat), we can build one NaN row for each label of df that is missing from s:

import numpy as np

i = df.index.difference(s.index.get_level_values(0))  # labels of df with no match
o = pd.DataFrame({0: np.nan}, index=[i, [0]*len(i)])  # one NaN row per label, match number 0
adjust = lambda s,o: pd.concat([s, o]).sort_index()

Then

>>> adjust(s,o)

                  0
  match            
0 0             NaN
1 0          1A1619
  1            A124
  2        BL171111
2 0            Z112
3 0             NaN
4 0      1A25629Q88
  1            A13B
  2              A1
  3             A16
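
For anyone who wants to run the whole thing in one go, here is a self-contained sketch that combines the frame and pattern from the question with the approach above. The names missing, filler and result are only illustrative. Passing names=s.index.names to the filler index is a small addition of this sketch, not part of the answer's code, so that both frames carry the same level names when they are concatenated.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Date': ['nothing ',
             'This 1A1619 A124 person BL171111 the A-1-24 and ',
             'dont Z112 but NOT 12-24-1981',
             'nada here either',
             'mix: 1A25629Q88 or A13B ok A1 the A16'],
    'IDs': ['A11', 'B22', 'C33', 'D44', 'E55'],
})
pat = r'((?<!\S)(?:[a-zA-Z]+\d|\d+[a-zA-Z])[a-zA-Z0-9]*(?!\S))'

# One row per match, indexed by (original row label, match number).
s = df['Date'].str.extractall(pat)

# Row labels of df that produced no match at all.
missing = df.index.difference(s.index.get_level_values(0))

# One NaN placeholder per missing label, stored as match number 0.
filler = pd.DataFrame(
    {0: np.nan},
    index=pd.MultiIndex.from_arrays([missing, [0] * len(missing)],
                                    names=s.index.names),
)

# Combine the real matches with the placeholders and restore row order.
result = pd.concat([s, filler]).sort_index()
print(result)
```

If the first index level is not needed afterwards, result.droplevel(1) would leave just the original row labels.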
rafaelc