Skip the headers when editing a csv file using Python

Question

I am using the code below to edit a CSV file with Python. The functions it calls are defined earlier in the script.

Problem: I want the code below to start editing the CSV from the 2nd row and leave out the 1st row, which contains the headers. Right now it applies the functions to the 1st row as well, so my header row is getting changed.

in_file = open("tmob_notcleaned.csv", "rb")
reader = csv.reader(in_file)
out_file = open("tmob_cleaned.csv", "wb")
writer = csv.writer(out_file)
row = 1
for row in reader:
    row[13] = handle_color(row[10])[1].replace(" - ","").strip()
    row[10] = handle_color(row[10])[0].replace("-","").replace("(","").replace(")","").strip()
    row[14] = handle_gb(row[10])[1].replace("-","").replace(" ","").replace("GB","").strip()
    row[10] = handle_gb(row[10])[0].strip()
    row[9] = handle_oem(row[10])[1].replace("Blackberry","RIM").replace("TMobile","T-Mobile").strip()
    row[15] = handle_addon(row[10])[1].strip()
    row[10] = handle_addon(row[10])[0].replace(" by","").replace("FREE","").strip()
    writer.writerow(row)
in_file.close()    
out_file.close()

I tried to solve this problem by initializing the row variable to 1, but it didn't work.

Please help me in solving this issue.

possible duplicate of [When processing CSV data, how do I ignore the first line of data?](http://stackoverflow.com/questions/11349333/when-processing-csv-data-how-do-i-ignore-the-first-line-of-data) — Louis, Aug 19 '15 at 14:21

score 421 · Accepted Answer · edited Apr 20 '21 at 08:50

Your reader variable is an iterable; by looping over it you retrieve the rows.

To make it skip one item before your loop, simply call next(reader, None) and ignore the return value.

You can also simplify your code a little; use the opened files as context managers to have them closed automatically:

import csv

# "rb"/"wb" are the Python 2 modes for csv; on Python 3 use "r"/"w" with newline="".
with open("tmob_notcleaned.csv", "rb") as infile, open("tmob_cleaned.csv", "wb") as outfile:
    reader = csv.reader(infile)
    next(reader, None)  # skip the headers
    writer = csv.writer(outfile)
    for row in reader:
        # process each row
        writer.writerow(row)

# no need to close, the files are closed automatically when you get to this point.

If you want to write the header to the output file unprocessed, that's easy too; pass the output of next() to writer.writerow():

headers = next(reader, None)  # returns the headers or `None` if the input is empty
if headers:
    writer.writerow(headers)
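
On Python 3, where the csv module expects text-mode files opened with newline="", the two snippets above could be combined roughly as follows. This is only a minimal sketch: the function name copy_with_header and the process_row callback are hypothetical stand-ins for the row-editing logic in the question, not part of the csv module.

```python
import csv


def copy_with_header(src_path, dst_path, process_row):
    """Copy a CSV file, passing the header row through untouched.

    `process_row` is a hypothetical callback: it receives one row
    (a list of strings) and returns the edited row.
    """
    # Python 3: text mode with newline="" so the csv module handles line endings.
    with open(src_path, newline="") as infile, \
            open(dst_path, "w", newline="") as outfile:
        reader = csv.reader(infile)
        writer = csv.writer(outfile)

        headers = next(reader, None)  # None if the input file is empty
        if headers is not None:
            writer.writerow(headers)  # header is written unprocessed

        for row in reader:  # iteration resumes at the first data row
            writer.writerow(process_row(row))
```

Because next() has already consumed the header, the for loop only ever sees data rows, so process_row never touches the column names.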

edited Apr 20 '21 at 08:50

Flow

answered Jan 10 '13 at 12:07

Martijn Pieters

An alternative is also to use `for row in islice(reader, 1, None)` - although less explicit than `next` for most simple "skip one line" jobs, for skipping multiple header rows (or getting only certain chunks etc...) it's quite handy – Jon Clements Jan 10 '13 at 12:12
I'd consider using `try: writer.writerow(next(reader))... except StopIteration: # handle empty reader` – Jon Clements Jan 10 '13 at 13:07
@JonClements: Perhaps. This works well enough without having to teach about `try:` / `except:`. – Martijn Pieters Jan 10 '13 at 13:09
@JonClements: Advantage to explicit `next` iteration is that it's "free"; `islice` would wrap the `reader` forever adding (an admittedly very small amount of) overhead to each iteration. The [`consume` recipe from `itertools`](https://docs.python.org/3/library/itertools.html#itertools-recipes) can be used to skip many values quickly, without adding wrapping to subsequent usage, in the case where the `islice` would have a `start` but no `end`, so the overhead isn't gaining you anything. – ShadowRanger Jan 05 '16 at 14:37
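
The two approaches from these comments can be compared side by side. The sketch below assumes Python 3 and uses made-up in-memory data with two header lines; skip_rows is a hypothetical helper name that follows the idea of the `consume` recipe in the itertools documentation.

```python
import csv
import io
from itertools import islice


def skip_rows(iterator, n):
    """Advance `iterator` by n items in place (hypothetical helper).

    Requesting the empty slice that starts at position n makes islice
    discard the first n items, as in the itertools `consume` recipe.
    """
    next(islice(iterator, n, n), None)


# Made-up sample with two header lines, for illustration only.
sample = "report,2013\nname,color\nphone,black\ncase,red\n"

# Option 1: wrap the reader; every later row passes through the islice object.
reader = csv.reader(io.StringIO(sample))
rows_via_islice = list(islice(reader, 2, None))

# Option 2: consume the headers up front, then iterate the bare reader.
reader = csv.reader(io.StringIO(sample))
skip_rows(reader, 2)
rows_via_consume = list(reader)

print(rows_via_islice == rows_via_consume)
```

Both options yield the same data rows; the difference is only whether the loop iterates over a wrapper or over the reader itself.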

score 133 · Answer 2 · answered Mar 19 '15 at 23:37

Another way of solving this is to use the DictReader class, which "skips" the header row and uses it to allow named indexing.

Given "foo.csv" as follows:

FirstColumn,SecondColumn
asdf,1234
qwer,5678

Use DictReader like this:

import csv
with open('foo.csv') as f:
    reader = csv.DictReader(f, delimiter=',')
    for row in reader:
        print(row['FirstColumn'])  # Access by column header instead of column number
        print(row['SecondColumn'])

answered Mar 19 '15 at 23:37

Chad Zawistowski

I feel like this is the real answer, as the question seems to be an example of [XY problem](http://mywiki.wooledge.org/XyProblem). – MariusSiuram Sep 23 '16 at 08:28
DictReader is definitely the way to go – Javier Arias Aug 15 '17 at 11:43
It is important to note that this only works if you omit the field names parameter when constructing the DictReader. Per the documentation: `If the fieldnames parameter is omitted, the values in the first row of the file f will be used as the fieldnames.` See https://docs.python.org/2/library/csv.html – BuvinJ Mar 01 '18 at 14:14
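
Since the original question is about editing rows and writing them back, DictReader could be paired with csv.DictWriter so that the header is carried over automatically. This is a sketch assuming Python 3; edit_with_dictreader and the edit_row callback are hypothetical names, and edit_row is assumed to return a dict whose keys are all among the input's column names.

```python
import csv


def edit_with_dictreader(src_path, dst_path, edit_row):
    """Rewrite a CSV file row by row, keeping the header unchanged.

    `edit_row` is a hypothetical callback: it receives one row as a dict
    keyed by column name and returns the edited dict.
    """
    with open(src_path, newline="") as infile, \
            open(dst_path, "w", newline="") as outfile:
        # fieldnames is omitted, so the first row supplies the column names.
        reader = csv.DictReader(infile)
        if reader.fieldnames is None:
            return  # empty input: nothing to write

        writer = csv.DictWriter(outfile, fieldnames=reader.fieldnames)
        writer.writeheader()  # header is written as read, never processed
        for row in reader:
            writer.writerow(edit_row(row))
```

With the "foo.csv" example above, an edit_row that changes row['FirstColumn'] would leave the FirstColumn,SecondColumn line itself untouched.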

score 8 · Answer 3 · answered Jan 10 '13 at 12:06

Doing row=1 won't change anything, because you'll just overwrite that with the results of the loop.

You want to do next(reader) to skip one row.

answered Jan 10 '13 at 12:06

Katriel


I tried changing it to `for row in next(reader):` but it is giving me `IndexError: string index out of range` error – Jan 10 '13 at 12:09
Use it before the for loop: `next(reader); for row in reader:` .... – dlazesz May 07 '20 at 14:01
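
The IndexError in the comment above can be reproduced on a small scale. The sketch below assumes Python 3 and made-up in-memory data: next(reader) returns a single row (a list of strings), so looping over it yields individual cells, and indexing into a short string fails.

```python
import csv
import io

sample = "name,color\nphone,black\n"  # made-up data for illustration

reader = csv.reader(io.StringIO(sample))
header = next(reader)  # one row: a list of strings
for cell in header:  # this is what `for row in next(reader)` iterates over
    print(type(cell).__name__, repr(cell))  # each item is a str, not a row

# Indexing deep into a short string raises the error seen in the comment.
try:
    header[0][13]
except IndexError as exc:
    error_message = str(exc)
print(error_message)

# Calling next() once before the loop leaves only the data rows to iterate.
reader = csv.reader(io.StringIO(sample))
next(reader, None)
data_rows = list(reader)
print(data_rows)
```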

Darío López Padial · Answer 4 · 2020-11-04T11:30:24.147

0

Inspired by Martijn Pieters' response.

If you only need to remove the header from the CSV file and do not process any rows, you can work more efficiently by copying the rest of the file with plain Python file I/O instead of writing through the csv module:

with open("tmob_notcleaned.csv", "rb") as infile, open("tmob_cleaned.csv", "wb") as outfile:
    next(infile)  # skip the headers
    outfile.write(infile.read())

edited Nov 04 '20 at 11:30

answered Oct 30 '20 at 18:18

Darío López Padial


You seem to overlook the `# process each row` part in Martijn's answer, which stands for all the stuff the op wants to do with the rows, as well as the fact that the op wants a _csv-file as output_? Of course you can avoid using the `csv` module altogether. But what's the point, it's from the _standard_ library? – Timus Oct 30 '20 at 20:52
In my case, I only want to remove the header from the `csv` file, and I don't want to process anything. For this reason, I write using the standard library, because it is faster. I will edit my comment to be more clear. – Darío López Padial Nov 03 '20 at 09:56
I see. In that case you don't need the `csv` module at all: Just `next(infile)` without instantiating a `csv.reader` should do it (the output of `open` is also an iterator). – Timus Nov 03 '20 at 12:27
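
For large files, the same header-stripping idea could stream the remainder in chunks rather than reading it into memory at once. This is a sketch assuming Python 3 and a header that occupies exactly one physical line (no quoted field containing a newline); strip_first_line is a hypothetical helper name.

```python
import shutil


def strip_first_line(src_path, dst_path):
    """Copy src to dst without its first line (hypothetical helper)."""
    with open(src_path, "rb") as infile, open(dst_path, "wb") as outfile:
        next(infile, None)  # skip the header; no error on an empty file
        shutil.copyfileobj(infile, outfile)  # stream the rest in chunks
```

Opening both files in binary mode means the data rows are copied byte for byte, with no line-ending translation.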
