Possible Duplicate:
Open a file in the proper encoding automatically

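My code:

import csv

def handle_uploaded_file(f):
    dataReader = csv.reader(f, delimiter=';', quotechar='"')

    # The loop belongs inside the function, where dataReader is defined.
    for row in dataReader:
        do_sth  # placeholder for the per-row processing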

The problem is that it works well only if the CSV is UTF-8 encoded. What should I change to handle the ISO-8859-2 or Windows-1250 encoding? (The best solution would be to detect the encoding automatically, but converting by hand is also acceptable.)

Tomasz Brzezina

3 Answers

The solution for now:

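import csv

def reencode(file):
    # Python 2: each line is a byte string, so decode it from windows-1250
    # and re-encode it as UTF-8, which the Python 2 csv module can handle.
    for line in file:
        yield line.decode('windows-1250').encode('utf-8')

csv_reader = csv.reader(reencode(open(filepath, 'rb')), delimiter=';', quotechar='"')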
Tomasz Brzezina
  • 2
    This is not the correct answer. From the csv documentation: "Since open() is used to open a CSV file for reading, the file will by default be decoded into unicode using the system default encoding (see locale.getpreferredencoding()). To decode a file using a different encoding, use the encoding argument of open." –  Dec 15 '17 at 17:15
  • 3
    I was able to open the file with `with open(filename, 'r', encoding='latin-1') as f:` and it fixed the encoding errors I was getting. A standard list of encodings can be found here: https://docs.python.org/3/library/codecs.html#standard-encodings – Max Candocia Jan 09 '18 at 16:10
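
What the two comments above describe can be sketched for Python 3 roughly as follows. This is a minimal illustration, not the original poster's code: the function name `read_csv_rows` and the default of windows-1250 are assumptions chosen to match the question, and `newline=""` follows the recommendation in the csv module documentation.

```python
import csv


def read_csv_rows(path, encoding="windows-1250"):
    """Return all rows of a semicolon-delimited CSV file as lists of str.

    `read_csv_rows` is a hypothetical helper name; the encoding default is
    an assumption taken from the question, not something detected.
    """
    # Text mode with an explicit encoding: open() does the decoding, so the
    # csv module only ever sees str objects.
    with open(path, "r", encoding=encoding, newline="") as f:
        reader = csv.reader(f, delimiter=";", quotechar='"')
        return list(reader)
```

With this approach no manual decode/encode step is needed; changing the `encoding` argument is enough to read ISO-8859-2 files instead.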

Have a look at the examples section of the csv module documentation. At the end, you'll find classes you can use for exactly that purpose, specifying the encoding.
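
For readers on Python 3, where those recoding classes are no longer part of the documentation, a loosely similar wrapper might look like the sketch below. The class name `EncodedCsvReader` is hypothetical, and the sketch assumes the uploaded file is a binary file object that `io.TextIOWrapper` can wrap.

```python
import csv
import io


class EncodedCsvReader:
    """Iterate over CSV rows read from a binary file object.

    Hypothetical helper, loosely in the spirit of the recoding reader
    classes from the Python 2 csv documentation.
    """

    def __init__(self, binary_file, encoding="utf-8", **csv_kwargs):
        # Decode bytes to str on the fly; newline="" leaves line endings
        # untouched so the csv module can handle them itself.
        text = io.TextIOWrapper(binary_file, encoding=encoding, newline="")
        self._reader = csv.reader(text, **csv_kwargs)

    def __iter__(self):
        return self

    def __next__(self):
        # Delegates to csv.reader, which raises StopIteration at the end.
        return next(self._reader)
```

It would be used like `for row in EncodedCsvReader(f, encoding="iso-8859-2", delimiter=";", quotechar='"'): ...`, with the encoding still supplied by the caller.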

zigg

Pass a file object opened with codecs.open. Encodings can't be detected automatically, or at least not very reliably. To guess the encoding, you can use chardet.
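
As an illustration of why guessing is only partly reliable, here is a standard-library-only sketch that tries candidate encodings in order. It is an assumption-laden fallback, not chardet and not a real detector: the names `guess_encoding` and `read_rows` and the candidate list are hypothetical. UTF-8 goes first because invalid UTF-8 usually fails to decode, whereas single-byte encodings such as windows-1250 accept almost any input, so this approach cannot tell windows-1250 from ISO-8859-2.

```python
import csv
import io

# Assumed candidates, strictest first. Order matters: a single-byte
# encoding placed before UTF-8 would "succeed" on nearly everything.
CANDIDATE_ENCODINGS = ("utf-8", "windows-1250")


def guess_encoding(raw, candidates=CANDIDATE_ENCODINGS):
    """Return the first candidate that decodes `raw` without error, else None."""
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None


def read_rows(raw, delimiter=";", quotechar='"'):
    """Decode the bytes `raw` with the guessed encoding and parse them as CSV."""
    encoding = guess_encoding(raw)
    if encoding is None:
        raise ValueError("none of the candidate encodings could decode the data")
    text = io.StringIO(raw.decode(encoding), newline="")
    return list(csv.reader(text, delimiter=delimiter, quotechar=quotechar))
```

A statistical detector such as chardet can replace `guess_encoding` when the set of possible encodings is not known in advance, but its result is still a guess.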

dav1d