I'm looking to read in multiple CSV files from the same directory and store them in separate pandas DataFrames. The CSVs don't have the same column headings. The code successfully lists all of the CSV files in the directory, but it raises an error when I run the rest. Here is my code currently:

import pandas as pd
import os
import glob

path = "/file/path/"
all_files = glob.glob(os.path.join(path, "*.csv"))

for file in all_files:
    file_name = os.path.splitext(os.path.basename(file))[0]
    dfn = pd.read_csv(file)
    dfn.index.name = file_name

I get the error message "UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa3 in position 137: invalid start byte".

willd9

1 Answer

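In the 'latin1' (ISO-8859-1) character table, the byte 0xa3 is the British pound symbol £, which is non-ASCII. In UTF-8, however, £ is encoded as the two bytes 0xc2 0xa3, so a lone 0xa3 is a continuation byte and cannot start a character, hence the "invalid start byte" error. This suggests the file is not UTF-8 encoded, and passing 'latin1' to the encoding parameter should do the trick.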

So this line:

dfn = pd.read_csv(file)

Becomes:

dfn = pd.read_csv(file, encoding='latin1')
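Since the goal in the question is one DataFrame per file, and the original loop overwrites dfn on every iteration, the fix can be combined with a dictionary keyed by file name. This is only a minimal sketch: the helper name load_csvs and the dictionary name frames are made up for illustration, and it assumes every CSV in the directory shares the same encoding.

```python
import glob
import os

import pandas as pd


def load_csvs(path, encoding="latin1"):
    """Read every *.csv file under `path` into a dict of DataFrames.

    Keys are the file names without the .csv extension.
    `load_csvs` is a hypothetical helper, not part of pandas.
    """
    frames = {}
    for file in sorted(glob.glob(os.path.join(path, "*.csv"))):
        file_name = os.path.splitext(os.path.basename(file))[0]
        # Assumption: all files use the same encoding.
        frames[file_name] = pd.read_csv(file, encoding=encoding)
    return frames
```

Each DataFrame can then be looked up by name, e.g. load_csvs("/file/path/")["some_file"], where "some_file" stands for one of your own file names.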

Further debugging:

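If the data looks wrong after reading with 'latin1' (for example, unexpected symbols where you expect a £), the files are probably encoded using a different code page. Note that 'latin1' maps every possible byte to a character, so it will not raise a decode error even when it is the wrong choice. To help determine the encoding, this SO question might be of help.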

Or, open the CSV in a text editor and look at the character at position 137 (a byte offset, as reported in the error), then find the code page which lists this character as 0xa3. Here is a link to Python's standard encodings.
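The same inspection can be done from Python by reading the file in binary mode and decoding the bytes around the reported position with a few candidate encodings. The sketch below uses only the standard library; the function names show_bytes_around and try_encodings, and the list of candidate encodings, are illustrative assumptions rather than anything the error message prescribes.

```python
def show_bytes_around(path, position, width=10):
    """Return the raw bytes surrounding `position` (a byte offset)."""
    with open(path, "rb") as f:
        data = f.read()
    start = max(0, position - width)
    return data[start:position + width + 1]


def try_encodings(raw, candidates=("utf-8", "cp1252", "latin1")):
    """Map each candidate encoding to the decoded text, or None on failure.

    The candidate list is an assumption; extend it with any code page
    from Python's standard encodings that seems plausible for your data.
    """
    results = {}
    for enc in candidates:
        try:
            results[enc] = raw.decode(enc)
        except UnicodeDecodeError:
            results[enc] = None
    return results
```

For the error in the question, try_encodings(show_bytes_around(file, 137)) would show which candidates decode the neighbourhood of byte 137 and what text they produce. A successful decode is not proof of the right encoding, so pick the candidate whose output reads sensibly.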

S3DEV