
I am trying to import data from text files in a particular file path, but I am getting the error UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa5 in position 18: invalid start byte

My question: is there any way I can apply "utf-8" encoding to all the text files (about 20 others) that I will eventually have to open, so that I can prevent the above error?

Code:

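import pandas as pd

filelist = [r'D:/file1', r'D:/file2']
# Read each file as one column, named after its path minus the last 4 characters,
# and place the columns side by side; read_csv is where the decode error is raised.
print(len(pd.concat([pd.read_csv(item, names=[item[:-4]]) for item in filelist], axis=1)))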

I'm also open to any suggestions if I am doing something wrong.

Thank you in advance.

RustyShackleford
  • Does this answer your question? [UnicodeDecodeError: ('utf-8' codec) while reading a csv file](https://stackoverflow.com/questions/33819557/unicodedecodeerror-utf-8-codec-while-reading-a-csv-file) – knanne May 12 '20 at 08:22
  • @knanne Yes, it does, as does the answer selected below. Thank you so much for reaching out and providing the link! – RustyShackleford May 18 '20 at 02:28

1 Answer


I'm not aware of a solution that automatically converts a file's encoding to UTF-8 in Python.

Alternatively, you can find out what the encoding is and read the file with that encoding, then write it back out in UTF-8.
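To see why the error happens and why reading with the right encoding fixes it, here is a small stdlib-only illustration. It assumes, purely for the example, that the text was saved as cp1252 (Windows Western European), where the byte 0xa5 is the yen sign ¥; your files' real encoding may be different, which is why detecting it matters.

```python
# Assumption for illustration only: the text was saved with the cp1252 encoding.
raw = "Price: ¥100".encode("cp1252")  # ¥ becomes the single byte 0xa5

try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    # 0xa5 (0b10100101) is a UTF-8 continuation byte, so it cannot start a character
    print(exc)

# Decoding with the encoding the bytes were actually written in succeeds
decoded = raw.decode("cp1252")
print(decoded)  # Price: ¥100
```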

This solution worked well for my files (credit: maxnoe):

import chardet
import pandas as pd

with open('filename.csv', 'rb') as f:
    result = chardet.detect(f.read())  # or readline if the file is large

pd.read_csv('filename.csv', encoding=result['encoding'])
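To cover all the files in the question's filelist, where each file might use a different encoding, the same detection can run once per file inside the question's concat. This is a sketch assuming chardet and pandas are installed; read_with_detected_encoding, read_all, the 100_000-byte sample size and the latin-1 fallback are my own illustrative choices, not part of either library.

```python
import chardet
import pandas as pd


def read_with_detected_encoding(path, sample_size=100_000, **read_csv_kwargs):
    """Hypothetical helper: read one text file with pandas, letting chardet guess its encoding."""
    with open(path, "rb") as f:
        # Inspect only the first sample_size bytes: fast on big files,
        # but a less reliable guess than scanning the whole file.
        result = chardet.detect(f.read(sample_size))
    # chardet reports encoding None when it cannot decide. Falling back to
    # latin-1 is an assumption: it maps every byte to a character, so the
    # read will not fail, but some characters may come out wrong.
    encoding = result["encoding"] or "latin-1"
    return pd.read_csv(path, encoding=encoding, **read_csv_kwargs)


def read_all(filelist):
    """Hypothetical helper: the question's concat, with a detected encoding per file."""
    frames = [read_with_detected_encoding(item, names=[item[:-4]]) for item in filelist]
    return pd.concat(frames, axis=1)
```

With the question's placeholder paths, usage would be print(len(read_all([r'D:/file1', r'D:/file2']))). Detecting per file, rather than forcing one encoding on all 20 files, means a folder with mixed encodings still loads.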

Don't forget to pip install chardet.

If you then write the file with pd.to_csv(), pandas encodes it in UTF-8 by default.
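If you only want UTF-8 copies of the raw files, without having pandas parse and re-serialize them, a plain re-encode with the standard library also works. A minimal sketch, assuming you already know (or have detected with chardet) each file's source encoding; convert_to_utf8 is a hypothetical helper name:

```python
from pathlib import Path


def convert_to_utf8(src, dst, source_encoding):
    """Hypothetical helper: copy the text file src to dst, re-encoded as UTF-8.

    source_encoding must be the file's actual encoding, e.g. the value of
    chardet.detect(...)["encoding"]; a wrong guess may raise or garble characters.
    """
    # Decode the bytes using the original encoding...
    text = Path(src).read_text(encoding=source_encoding)
    # ...then encode the same text as UTF-8. read_text/write_text use universal
    # newlines, so line endings are written in the platform's default style.
    Path(dst).write_text(text, encoding="utf-8")
```

After converting, the copies can be read with a plain pd.read_csv(path), since UTF-8 is the default encoding pandas uses when reading.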

knanne