I have a number of .csv files in a folder (1.csv, 2.csv, 3.csv, etc.) and I need to loop over them all. The output should be a corresponding NEW file for each existing one, but each should only contain 2 columns.

Here is a sample of the csv files:

004,444.444.444.444,448,11:16 PDT,11-24-15
004,444.444.444.444,107,09:55 PDT,11-25-15
004,444.444.444.444,235,09:45 PDT,11-26-15
004,444.444.444.444,241,11:00 PDT,11-27-15

And here is how I would like the output to look:

448,11-24-15
107,11-25-15
235,11-26-15
241,11-27-15

Here is my working attempt at achieving this with Python:

import csv
import os
import glob

path = '/csvs/'
for infile in glob.glob( os.path.join(path, '*csv') ):


    inputfile = open(infile, 'r') 
    output = os.rename(inputfile + ".out", 'w')

#Extracts the important columns from the .csv into a new file
with open(infile, 'r') as source:
    readr = csv.reader(source)
    with open(output,"w") as result:
        writr = csv.writer(result)
        for r in readr:
            writr.writerow((r[4], r[2]))

Using only the second half of this code, I am able to get the desired output by specifying the input files in the code. However, this Python script will be a small part of a much larger bash script that will be (hopefully) fully automated.

How can I adjust the input of this script to loop over each file and create a new one with just the 2 specified columns?

Please let me know if there is anything I need to clarify.

smoothjabz

2 Answers


You can use the pandas library. It offers several functions for dealing with CSV files. read_csv will read the CSV file for you and give you a DataFrame object. Visit this link for an example of how to write a CSV file from a pandas DataFrame. Moreover, there are a lot of tutorials available on the net.
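
For illustration, here is a minimal sketch of that pandas approach. It assumes the files have no header row (as in the question's sample) and that the wanted columns are at positions 2 and 4. The function name extract_with_pandas and the '.out' suffix are choices made for this example, not anything pandas prescribes.

```python
import glob
import os

import pandas as pd


def extract_with_pandas(folder):
    """Write <name>.csv.out next to each <name>.csv, keeping columns 2 and 4."""
    written = []
    for infile in sorted(glob.glob(os.path.join(folder, "*.csv"))):
        # header=None: the sample files have no header row.
        # usecols selects columns by position; dtype=str keeps values verbatim.
        frame = pd.read_csv(infile, header=None, usecols=[2, 4], dtype=str)
        output = infile + ".out"
        # Suppress the index and header so only the two data columns are written.
        frame.to_csv(output, header=False, index=False)
        written.append(output)
    return written
```

Calling extract_with_pandas('csvs') would then process every .csv file in a folder named csvs relative to the current directory.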

Mangu Singh Rajpurohit

inputfile is a file object you opened, but then you are doing -

os.rename(inputfile + ".out", 'w')

This does not work: you are trying to add a string to the opened file object using the + operator. Also, os.rename renames an existing file from one path to another; it does not open or create anything. I am not even sure why you need that line, or even the line inputfile = open(infile, 'r'). You are opening the file again in the with statement.
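
To make the failure concrete, here is a small sketch that reproduces it on a scratch file. The temporary directory and the file name 1.csv are stand-ins for this example only. It shows that the + raises a TypeError, and that building the name from the path string works.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical scratch file standing in for one of the CSVs.
    infile = os.path.join(tmp, "1.csv")
    with open(infile, "w") as f:
        f.write("004,444.444.444.444,448,11:16 PDT,11-24-15\n")

    inputfile = open(infile, "r")
    try:
        inputfile + ".out"  # file object + str is not defined
    except TypeError as exc:
        error = exc  # this is the error the question's script runs into
    finally:
        inputfile.close()

    # Concatenate with the path string instead of the file object.
    output = infile + ".out"
```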

Another issue -

You specify your path as path = '/csvs/'. It is highly unlikely that you have a 'csvs' directory under the root directory. You probably meant a directory relative to where the script runs, so you should use a relative path.

You can just do -

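import csv
import glob
import os

path = 'csvs/'  # relative to the directory the script is run from
for infile in glob.glob(os.path.join(path, '*.csv')):
    # Build the output name from the path string, e.g. csvs/1.csv.out
    output = infile + '.out'
    with open(infile, 'r') as source:
        readr = csv.reader(source)
        with open(output, 'w') as result:
            writr = csv.writer(result)
            for r in readr:
                # Column 2 then column 4, the order shown in the question's desired output
                writr.writerow((r[2], r[4]))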
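
Since the question mentions calling this from a larger bash script, here is one possible variant of the loop above, sketched under a few assumptions: Python 3, the folder passed as the first command-line argument, and rows with too few fields skipped rather than treated as errors. The function name extract_columns and its columns parameter are made up for this example.

```python
import csv
import glob
import os
import sys


def extract_columns(folder, columns=(2, 4)):
    """Write <name>.csv.out next to each <name>.csv, keeping only `columns`."""
    written = []
    for infile in sorted(glob.glob(os.path.join(folder, "*.csv"))):
        output = infile + ".out"
        # newline="" lets the csv module control line endings (Python 3).
        with open(infile, "r", newline="") as source, \
                open(output, "w", newline="") as result:
            writer = csv.writer(result)
            for row in csv.reader(source):
                if len(row) > max(columns):  # skip blank or short rows
                    writer.writerow([row[i] for i in columns])
        written.append(output)
    return written


if __name__ == "__main__":
    # Folder comes from the first argument, defaulting to the current directory.
    extract_columns(sys.argv[1] if len(sys.argv) > 1 else ".")
```

Saved under a name of your choosing, say extract.py, it could be run from bash as python extract.py csvs. Because the output names end in .out, they do not match *.csv, so running it twice does not reprocess its own output.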
Anand S Kumar