2

I have a CSV file that i need to rearrange and renecode. I'd like to run

line = line.decode('windows-1250').encode('utf-8')

on each line before it's parsed and split by the CSV reader. Or I'd like iterate over lines myself run the re-encoding and use just single line parsing form CSV library but with the same reader instance.

Is there a way to do that nicely?

Janusz Skonieczny
  • 13,637
  • 9
  • 48
  • 59

3 Answers3

2

Loop over lines on file can be done this way:

with open('path/to/my/file.csv', 'r') as f:
    for line in f:
        puts line # here You can convert encoding and save lines

But if You want to convert encoding of a whole file You can also call:

$ iconv -f Windows-1250 -t UTF8 < file.csv > file.csv

Edit: So where the problem is?

with open('path/to/my/file.csv', 'r') as f:
    for line in f:
        line = line.decode('windows-1250').encode('utf-8')
        elements = line.split(",")
Dawid
  • 3,926
  • 2
  • 24
  • 30
  • I do not want to read/write the file twice. The iconv solution is lame, I want it done in code no by some tool, I need to crate a tool that will prepare files in one process not instructions to do that. – Janusz Skonieczny Feb 25 '10 at 14:31
  • Again, no support for CSV parsing at the same time, line splitting just won't cut it. – Janusz Skonieczny Feb 25 '10 at 14:39
2

Thx, for the answers. The wrapping one gave me an idea:

def reencode(file):
    for line in file:
        yield line.decode('windows-1250').encode('utf-8')

csv_writer = csv.writer(open(outfilepath,'w'), delimiter=',',quotechar='"', quoting=csv.QUOTE_MINIMAL)
csv_reader = csv.reader(reencode(open(filepath)), delimiter=";",quotechar='"')
for c in csv_reader:
    l = # rearange columns here
    csv_writer.writerow(l)

That's exactly what i was going for re-encoding a line just before it's get parsed by the csv_reader.

Janusz Skonieczny
  • 13,637
  • 9
  • 48
  • 59
2

At the very bottom of the csv documentation is a set of classes (UnicodeReader and UnicodeWriter) that implements Unicode support for csv:

rfile = open('input.csv')
wfile = open('output.csv','w')
csv_reader = UnicodeReader(rfile,encoding='windows-1250')
csv_writer = UnicodeWriter(wfile,encoding='utf-8')
for c in csv_reader:
    # process Unicode lines
    csv_writer.writerow(c)
rfile.close()
wfile.close()
Mark Tolonen
  • 132,868
  • 21
  • 152
  • 208