Reading in multiple CSVs to Pandas Dataframe

Question

I'm looking to read in multiple CSV files from the same directory and store them into separate pandas dfs. The CSVs don't have the same column headings. The code successfully lists all of the csv files in the directory but it errors when I run the rest. Here is my code currently:

import pandas as pd
import os
import glob

path = "/file/path/"
all_files = glob.glob(os.path.join(path, "*.csv"))

for file in all_files:
    file_name = os.path.splitext(os.path.basename(file))[0]
    dfn = pd.read_csv(file)
    dfn.index.name = file_name

I get the error message "UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa3 in position 137: invalid start byte".

You probably have a different separator than the default one, which is a comma. — Erfan, Sep 30 '20 at 14:12
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html — Paul H, Sep 30 '20 at 14:18
I've checked a number of them and they appear to be comma delimited — willd9, Sep 30 '20 at 14:39
Is the encoding of your csv files already utf-8? If not: https://stackoverflow.com/questions/18171739 — Bill Huang, Sep 30 '20 at 14:48

S3DEV · Accepted Answer · 2020-09-30T15:14:28.830

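In the 'latin1' (ISO-8859-1) character table, byte 0xa3 is the British pound symbol £, which is non-ASCII. UTF-8 encodes the same character as the two bytes 0xc2 0xa3, so a lone 0xa3 is not a valid start byte, which is exactly what the error reports. The files are therefore most likely not UTF-8 encoded, and passing 'latin1' to the encoding parameter should do the trick.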
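A quick way to see why the byte fails under UTF-8 but decodes under 'latin1' is to try it in the interpreter. This is only an illustrative check using the built-in bytes and str methods; the variable names are made up for the example.

```python
# A lone 0xA3 byte, as reported in the error message.
raw = b"\xa3"

# Latin-1 maps every byte to one character; 0xA3 is the pound sign.
latin1_text = raw.decode("latin1")

# UTF-8 needs two bytes for the same character.
utf8_bytes = "£".encode("utf-8")

# Decoding the lone byte as UTF-8 fails with the reason seen in the question.
try:
    raw.decode("utf-8")
    utf8_error = None
except UnicodeDecodeError as exc:
    utf8_error = exc.reason

print(repr(latin1_text), utf8_bytes, utf8_error)
```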

So this line:

dfn = pd.read_csv(file)

Becomes:

dfn = pd.read_csv(file, encoding='latin1')
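Since the question asks for separate DataFrames, one possible way to put the fix together is to collect them in a dict keyed by file name, rather than reassigning dfn on every pass of the loop. This is a sketch, not the asker's final code: the helper name read_csvs is hypothetical, and it assumes every file in the directory shares the same encoding.

```python
import glob
import os

import pandas as pd


def read_csvs(path, encoding="latin1"):
    """Return {file name without extension: DataFrame} for each CSV in path.

    Assumes all files use the same encoding (hypothetical helper).
    """
    frames = {}
    for file in sorted(glob.glob(os.path.join(path, "*.csv"))):
        file_name = os.path.splitext(os.path.basename(file))[0]
        frames[file_name] = pd.read_csv(file, encoding=encoding)
    return frames
```

Each DataFrame is then available by name, for example read_csvs("/file/path/")["some_file"], where "some_file" stands for whichever CSV name exists in the directory.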

Further debugging:

In the event your file doesn't actually contain utf-8 encoded data, and using 'latin1' does not work, this suggests the files are encoded using a different code page. To help determine the encoding, this SO question might be of help.
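As a rough first check, you can try decoding the raw bytes with a short list of candidate encodings and see which ones raise no error. The helper name and the candidate list below are assumptions for illustration. Note that 'latin1' accepts every possible byte, so a success there only means "no error", not "correct characters"; that is why it is tried last.

```python
def first_matching_encoding(path, candidates=("utf-8", "cp1252", "latin1")):
    """Return the first candidate that decodes the whole file without error.

    Hypothetical helper: a clean decode does not prove the encoding is right.
    """
    with open(path, "rb") as handle:
        raw = handle.read()
    for encoding in candidates:
        try:
            raw.decode(encoding)
        except UnicodeDecodeError:
            continue
        return encoding
    return None
```

The returned name can be passed straight to the encoding parameter of pd.read_csv, but it is still worth eyeballing a few non-ASCII values afterwards.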

Or, open the CSV in a text editor and look at the character in position 137 (as mentioned in the error), then find the code page which lists this character as 0xa3. Here is a link to Python's standard encodings.
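The same lookup can be done from Python instead of a text editor: read the byte at the reported position and show how a few code pages render it. This is a sketch with a hypothetical helper name and an arbitrary choice of codecs. It also assumes the position in the error is an offset from the start of the file, which may not hold if pandas was decoding a later chunk.

```python
def byte_in_codecs(path, position, codecs=("latin1", "cp1252", "cp437")):
    """Return the byte at `position` and how each codec decodes it.

    Hypothetical helper; a codec that cannot decode the byte maps to None.
    """
    with open(path, "rb") as handle:
        handle.seek(position)
        byte = handle.read(1)
    decoded = {}
    for codec in codecs:
        try:
            decoded[codec] = byte.decode(codec)
        except UnicodeDecodeError:
            decoded[codec] = None
    return byte, decoded
```

For the file in the question this would be called as byte_in_codecs(file, 137); the codec whose character makes sense in context (for example £ in a price column) is the likely encoding.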
