I'm having encoding problems with some files. We receive files from another company and have to read them (the files are in CSV format).
Strangely, the files appear to be encoded in UTF-16. I can read them, but only by opening them with the codecs module and specifying the encoding, like this:
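import codecs
import csv

ENCODING = 'utf-16'
# test_file is the path to one of the received CSV files
with codecs.open(test_file, encoding=ENCODING) as csv_file:
    # Autodetect dialect from the first 1024 characters
    dialect = csv.Sniffer().sniff(csv_file.read(1024))
    # Rewind so the reader starts at the beginning of the file
    csv_file.seek(0)
    input_file = csv.reader(csv_file, dialect=dialect)
    for line in input_file:
        do_funny_things()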
But, just as I can detect the dialect in a more agnostic way, I think it would be great to have a way of automatically opening files with their proper encoding, at least for text files. Other programs, like vim, manage to do that.
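To make it concrete, here is a rough sketch of the kind of thing I have in mind. It only looks at the byte order mark (BOM) at the start of the file, using the BOM constants from the codecs module. The helper name `detect_bom_encoding` and the `default` fallback are my own assumptions, and a file without a BOM would simply get the default, so this is far from the general detection I'm asking about.

```python
import codecs

# Longer BOMs are checked first: the UTF-32 LE BOM begins with the
# same two bytes as the UTF-16 LE BOM.
_BOMS = (
    (codecs.BOM_UTF32_LE, 'utf-32'),
    (codecs.BOM_UTF32_BE, 'utf-32'),
    (codecs.BOM_UTF8, 'utf-8-sig'),
    (codecs.BOM_UTF16_LE, 'utf-16'),
    (codecs.BOM_UTF16_BE, 'utf-16'),
)


def detect_bom_encoding(path, default='utf-8'):
    """Guess a codec name from the file's BOM (hypothetical helper).

    Returns `default` (an assumed fallback) when no BOM is present.
    """
    with open(path, 'rb') as raw:
        head = raw.read(4)  # the longest BOM is 4 bytes
    for bom, encoding in _BOMS:
        if head.startswith(bom):
            return encoding
    return default
```

The result could then be passed to `codecs.open(test_file, encoding=detect_bom_encoding(test_file))` instead of the hard-coded `ENCODING`.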
Does anyone know a way of doing that in Python 2.6?
PS: I hope this will be solved in Python 3, since all strings are Unicode there...