If line contains date in DD/MM/YY print line

Question

I have a database, similar to this (this is just a column):

OPÇÃO IBOVESPA - 13/01/2021
OPÇÃO IBOVESPA - 16/12/2020
IDICFPBB    FPBB
OPD IDI/JPFT
Opção s/disp./Índice - IDIOPDFPD3
Opção s/disp./Índice - IDIOPDFPF5
Opção s/disp./Índice - IDIOPDJPF1
BBDC PN - 21/12/2020
BOVA CI - 21/12/2020

I need code that returns only the lines that contain a date and splits them into two columns: one with the date and another with the first argument of the line. For example:

OPÇÃO IBOVESPA | 13/01/2021
OPÇÃO IBOVESPA | 16/12/2020
BBDC PN | 21/12/2020
BOVA CI | 21/12/2020

I tried to use regex but I can't get the code to work. Can you help me?

None of those dates are 2-digit years. Your question is confusing. — MonkeyZeus, Jan 27 '21 at 15:05
Do you wish to capture only valid dates or capture "date-like" formatting of digits? What should happen when `54/22/2049` is encountered? — MonkeyZeus, Jan 27 '21 at 15:07
`(.*?) - (\b\d\d\/\d\d\/\d\d\d\d)$` would do it. You'll have your desired data in `\1` and `\2` capture groups respectively — MonkeyZeus, Jan 27 '21 at 15:13
Please add your regex! Commonly this would be some form of 1) find all lines which match regex and put them in some collection, 2) see if the regex match is a real _whatever_ (in your case a date) or skip/raise Exception 3) keep the filtered collection — ti7, Jan 27 '21 at 15:13
@IsabellaSchneider Is the first argument and date always separated by the delimiter `-` ? — Shubham Sharma, Jan 27 '21 at 15:32
@MonkeyZeus could you help me assemble that part of the code? I never used regex! — Isabella Schneider, Jan 27 '21 at 15:58
@ti7 in this case I need to 1) make a column with the date found 2) make another column without the date, only with the initial data — Isabella Schneider, Jan 27 '21 at 15:59
I'm not a Python programmer but you can visit https://regex101.com/, plug in your regex and data sample, and then click the "code generator" button to get started. — MonkeyZeus, Jan 27 '21 at 16:02
Opened the question because it is wrongly marked as a dupe. — Shubham Sharma, Mar 08 '21 at 13:12

score 2 · Accepted Answer · answered Jan 27 '21 at 16:16

You can use `.str.contains` to first filter the rows that contain a date, then split those rows around the delimiter `-` to get the desired result:

m = df['COL'].str.contains(r'\d{2}/\d{2}/\d{4}')
df.loc[m, 'COL'].str.split(r'\s-\s', expand=True)

Or you can use `.str.extract` with regex capturing groups to extract the rows that match a pattern in which the first argument and the date are separated by the delimiter `-`:

df['COL'].str.extract(r'(.+)\s-\s(\d{2}/\d{2}/\d{4})').dropna(how='all')

Result:

                0           1
0  OPÇÃO IBOVESPA  13/01/2021
1  OPÇÃO IBOVESPA  16/12/2020
7         BBDC PN  21/12/2020
8         BOVA CI  21/12/2020
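
For anyone who wants to run this end to end, here is a minimal sketch built on the sample data from the question. The column name `COL` and the group names `name` and `date` are assumptions made for illustration. It also parses the extracted dates with `pd.to_datetime`, so a date-like but impossible value such as `54/22/2049` would be dropped rather than kept.

```python
import pandas as pd

# Sample column from the question; the column name 'COL' is an assumption.
df = pd.DataFrame({'COL': [
    'OPÇÃO IBOVESPA - 13/01/2021',
    'OPÇÃO IBOVESPA - 16/12/2020',
    'IDICFPBB    FPBB',
    'OPD IDI/JPFT',
    'Opção s/disp./Índice - IDIOPDFPD3',
    'Opção s/disp./Índice - IDIOPDFPF5',
    'Opção s/disp./Índice - IDIOPDJPF1',
    'BBDC PN - 21/12/2020',
    'BOVA CI - 21/12/2020',
]})

# Named groups become the column names of the extracted frame.
pattern = r'(?P<name>.+)\s-\s(?P<date>\d{2}/\d{2}/\d{4})$'
out = df['COL'].str.extract(pattern).dropna(how='all')

# Optional validation: unparseable dates become NaT and are dropped.
out = out.assign(
    date=pd.to_datetime(out['date'], format='%d/%m/%Y', errors='coerce')
).dropna(subset=['date'])

print(out)
```

The original row labels (0, 1, 7, 8) are kept; call `out.reset_index(drop=True)` if a fresh 0-based index is preferred.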

Shubham Sharma


This assumes they have a dataframe! In reality, they may not be able to receive a copy in a decent form (such as a simple text report) – ti7 Jan 27 '21 at 16:46
@ti7 As the question is tagged with `pandas` hopefully the OP might already have a pandas dataframe ;) – Shubham Sharma Jan 27 '21 at 17:03
It worked! thanks! @ShubhamSharma – Isabella Schneider Jan 27 '21 at 18:26
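
If the data arrives as plain text lines rather than a DataFrame, the same idea can be sketched with only the standard library. This is an illustrative version, not part of the accepted answer: the helper name `split_dated_lines` is hypothetical, and the pattern follows the shape of the one suggested in the comments (text, then ` - `, then a `DD/MM/YYYY` date at the end of the line). `datetime.strptime` is used to skip date-like strings that are not real dates.

```python
import re
from datetime import datetime

# Text, then " - ", then DD/MM/YYYY at the end of the line.
LINE_RE = re.compile(r'^(.+?)\s-\s(\d{2}/\d{2}/\d{4})$')


def split_dated_lines(lines):
    """Return (name, date) pairs for lines ending in a valid DD/MM/YYYY date."""
    rows = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # no date-like suffix on this line
        name, date_text = match.groups()
        try:
            # strptime rejects impossible dates such as 54/22/2049
            datetime.strptime(date_text, '%d/%m/%Y')
        except ValueError:
            continue
        rows.append((name, date_text))
    return rows


sample = [
    'OPÇÃO IBOVESPA - 13/01/2021',
    'OPÇÃO IBOVESPA - 16/12/2020',
    'IDICFPBB    FPBB',
    'Opção s/disp./Índice - IDIOPDFPD3',
    'BBDC PN - 21/12/2020',
    'BOVA CI - 21/12/2020',
]

for name, date_text in split_dated_lines(sample):
    print(f'{name} | {date_text}')
```

The returned list of pairs can be written out as a two-column report or passed to `pd.DataFrame(rows, columns=[...])` later if a DataFrame is needed.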
