2

I am newbie to Python. I have a CSV file with below data as an example. I wanted to skip rows between specific date range(2018-08-01 to 2018-08-28) and redirect output to a separate CSV file. Please note, a blank space in the header "LAST USE".

NUMBER,MAIL,COMMENT,COUNT,LAST USE,PERCENTAGE,TEXTN
343,user1@example.com,"My comment","21577",2018-08-06,80.436%,
222,user2@example.com,"My comment","31181",2018-07-20,11.858%,
103,user3@example.com,"My comment",540,2018-06-14,2.013%,
341,user4@example.com,"My comment",0,N/A,0.000%,

Any idea would be greatly appreciated.

jpp
  • 134,728
  • 29
  • 196
  • 240
Rio
  • 410
  • 4
  • 19

2 Answers2

2

With Pandas, this is straightforward:

import pandas as pd

# read file
df = pd.read_csv('file.csv')

# convert to datetime
df['LAST USE'] = pd.to_datetime(df['LAST USE'])

# calculate mask
mask = df['LAST USE'].between('2018-08-01', '2018-08-28')

# output masked dataframes
df[~mask].to_csv('out1.csv', index=False)
df[mask].to_csv('out2.csv', index=False)

You can also combine Boolean arrays to construct mask. For example:

m1 = df['LAST USE'] >= (pd.to_datetime('now') - pd.DateOffset(days=30))
m2 = df['LAST USE'] <= pd.to_datetime('now')
mask = m1 & m2
jpp
  • 134,728
  • 29
  • 196
  • 240
  • Absolutely perfect Sir. Only one query, how to make the dates dynamic? I tried - mask = df['LAST USE'].between(date.today() - timedelta(30), date.today()) but seems not ok. – Rio Aug 29 '18 at 15:20
  • @SamironMallick, There are *many* ways to go about *that* problem, have a look at some other answers, e.g. [here](https://stackoverflow.com/questions/703907/how-would-i-compute-exactly-30-days-into-the-past-with-python-down-to-the-minut) and [here](https://stackoverflow.com/questions/441147/how-to-subtract-a-day-from-a-date). If you are still stuck, ask [a new question](https://stackoverflow.com/questions/ask) showing precisely where you got stuck. – jpp Aug 29 '18 at 15:21
0

dict reader documentation: https://docs.python.org/3/library/csv.html#csv.DictReader

Strptime documentation: https://docs.python.org/3/library/datetime.html#strftime-strptime-behavior

Basically, we start by opening the CSV file as a set of python dictionaries - one for each row, then iterate over all the rows in the CSV.

For each row, we convert the date/time string to an actual date/time object that python can then compare with your date range. If the value is within the range, we will write the entire row to a separate CSV file.

import datetime, csv

#define all the fieldnames in the input CSV file (for use in creating / appending to output CSV file)
fieldnames = ['NUMBER','MAIL','COMMENT','COUNT','LAST USE','PERCENTAGE','TEXTN']

#open input CSV file as readonly
with open("input.csv", "r") as fin:
    #create a CSV dictionary reader object
    csv_dreader = csv.DictReader(fin)
    #iterate over all rows in CSV dict reader
    for row in csv_dreader:
        #check for invalid Date values
        if 'N/A' not in row['LAST USE']:
            #convert date string to a date object
            datetime_val = datetime.datetime.strptime(row['LAST USE'], '%Y-%m-%d')
            #check if date falls within requested range
            if datetime_val > datetime.datetime(2018, 8, 1) and datetime_val < datetime.datetime(2018, 8, 28):                
                #if it does, open output CSV file for appending
                with open("output.csv", "a") as fout:
                    #create a csv writer object using the fieldnames defined above
                    csv_writer = csv.DictWriter(fout, fieldnames=fieldnames)
                    #write the current row (from the input CSV) to the output CSV file
                    csv_writer.writerow(row)
vicente louvet
  • 264
  • 2
  • 9