2

I have a Dataframe with the following date field:

463    14-05-2019
535    03-05-2019
570    11-05-2019
577    09-05-2019
628    08-08-2019
630    25-05-2019
Name: Date, dtype: object

I have to format it as DDMMAAAA. This is what I'm doing inside a loop (for idx, row in df.iterrows():):

  • I'm removing the \- char using regex:

    df.at[idx, 'Date'] = re.sub('\-', '', df.at[idx, 'Date'])

  • then using apply to enforce and an 8 digit string with leading zeros

    df['Date'] = df['Date'].apply(lambda x: '{0:0>8}'.format(x))

But even though the df['Date'] field has the 8 digits with the leading 0 on the df, whent exporting it to csv the leading zeros are being removed on the exported file like below.

df.to_csv(path_or_buf=report, header=True, index=False, sep=';')

field as in csv:

Dt_DDMMAAAA
30102019
12052019
7052019
26042019
3052019
22042019
25042019
2062019

I know I must be missing the point somewhere along the way here, but I just can't figure out what the issue (or if it's even an issue, rather then a misused method).

Community
  • 1
  • 1
  • 1
    I think it is very important to understand that your method is producing the correct output in the file, but that you may be reading the file back incorrectly. It would help this answer a lot if you add more detail about how you are checking/reading the CSV. – chthonicdaemon Jun 01 '19 at 03:36
  • I export it to an csv to send it to another system, through an external integration (i don't have access to it). And it is required that I send it as csv. – Lucas Vasconcellos Czepaniki Jun 01 '19 at 03:48
  • And thanks for pointing that out, I was validating by opening the csv in wps, but now I checked in a text editor and the leading zeros are there!! So the problem is not related to the code, but the way I was opening it.... -_- – Lucas Vasconcellos Czepaniki Jun 01 '19 at 03:50

3 Answers3

2

IMO the simplest method is to use the date_format argument when writing to CSV. This means you will need to convert the "Date" column to datetime beforehand using pd.to_datetime.

(df.assign(Date=pd.to_datetime(df['Date'], errors='coerce'))
   .to_csv(path_or_buf=report, date_format='%d%m%Y', index=False))

This prints,

Date
14052019
05032019
05112019
05092019
08082019
25052019

More information on arguments to to_csv can be found in Writing a pandas DataFrame to CSV file.

cs95
  • 274,032
  • 76
  • 480
  • 537
2

What i will do is using strftime + 'to_excel`, since In csv , if you open it with text , it will show the leading zero, since csv will not keeping any format when display, in that case , you can using excel

pd.to_datetime(df.Date,dayfirst=True).dt.strftime('%m%d%Y').to_excel('your.xls')
Out[722]: 
463    05142019
535    05032019
570    05112019
577    05092019
628    08082019
630    05252019
Name: Date, dtype: object
BENY
  • 258,262
  • 17
  • 121
  • 165
1

Firstly, your method is producing a file which contains leading zeros just as you expect. I reconstructed this minimal working example from your description and it works just fine:

import pandas
import re

df = pandas.DataFrame([["14-05-2019"],
                       ["03-05-2019"],
                       ["11-05-2019"],
                       ["09-05-2019"],
                       ["08-08-2019"],
                       ["25-05-2019"]], columns=['Date'])

for idx in df.index:
    df.at[idx, 'Date'] = re.sub('\-', '', df.at[idx, 'Date'])
df['Date'] = df['Date'].apply(lambda x: '{0:0>8}'.format(x))

df.to_csv(path_or_buf="report.csv", header=True, index=False, sep=';')

At this point report.csv contains this (with leading zeros just as you wanted).

Date
14052019
03052019
11052019
09052019
08082019
25052019

Now as to why you thought it wasn't working. If you are mainly in Pandas, you can stop it from guessing the type of the output by specifying a dtype in read_csv:

df_readback = pandas.read_csv('report.csv', dtype={'Date': str})


       Date
0  14052019
1  03052019
2  11052019
3  09052019
4  08082019
5  25052019

It might also be that you are reading this in Excel (I'm guessing this from the fact that you are using ; separators). Unfortunately there is no way to ensure that Excel reads this field correctly on double-click, but if this is your final target, you can see how to mangle your file for Excel to read correctly in this answer.

chthonicdaemon
  • 16,668
  • 1
  • 39
  • 59