2

I have a database, similar to this (this is just a column):

OPÇÃO IBOVESPA - 13/01/2021
OPÇÃO IBOVESPA - 16/12/2020
IDICFPBB    FPBB
OPD IDI/JPFT
Opção s/disp./Índice - IDIOPDFPD3
Opção s/disp./Índice - IDIOPDFPF5
Opção s/disp./Índice - IDIOPDJPF1
BBDC PN - 21/12/2020
BOVA CI - 21/12/2020

and I need a code that returns only the lines that contain date and form two columns: a column with the date and another with the first argument of the line, example:

OPÇÃO IBOVESPA | 13/01/2021
OPÇÃO IBOVESPA | 16/12/2020
BBDC PN | 21/12/2020
BOVA CI | 21/12/2020

I tried to use regex but I can't fix the code, can you help me?

1 Answers1

2

You can use .str.contains to first filter the rows which contains date, then split those rows around the delimiter - to get the desired result:

m = df['COL'].str.contains(r'\d{2}/\d{2}/\d{4}')
df.loc[m, 'COL'].str.split(r'\s-\s', expand=True)

Or you can use .str.extract with regex capturing groups to extract the rows which contains pattern where the first argument and date are seperated around the delimiter -:

df['COL'].str.extract(r'(.+)\s-\s(\d{2}/\d{2}/\d{4})').dropna(how='all')

Result:

                0           1
0  OPÇÃO IBOVESPA  13/01/2021
1  OPÇÃO IBOVESPA  16/12/2020
7         BBDC PN  21/12/2020
8         BOVA CI  21/12/2020
Shubham Sharma
  • 38,395
  • 6
  • 14
  • 40