
I have looked at several ways to read CSV files whose values contain commas, but I have never seen one that does it successfully with pandas alone.

For example, the CSV file has the columns "A", "B", "C", "D", "E" and "F", and only the values in column "C" contain commas.

The values in column C are strings.

I have tried this:

pd.read_csv('my.csv',quotechar="'")

but it returns

CParserError: Error tokenizing data. C error: Expected 6 fields in line 1553, saw 7

Update:

Some values in column C start with a comma, like ",hello", while others have commas in the middle, like "hello,hello,hello".
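A minimal reproduction, assuming column C is written to the file without any quotes (the rows below are made up to mirror the update, not taken from the real file): the leading comma in ",hello" adds one field to the line, which is the kind of mismatch the "Expected 6 fields ..., saw 7" error reports.

```python
import io

import pandas as pd

# Made-up rows mirroring the update; C is written WITHOUT quotes.
raw = (
    "A,B,C,D,E,F\n"
    "1,2,x,3,4,5\n"
    "6,7,,hello,8,9,10\n"  # C = ",hello", so this line splits into 7 fields
)

try:
    pd.read_csv(io.StringIO(raw))
    error = None
except pd.errors.ParserError as exc:
    # The C tokenizer refuses a line with more fields than the header.
    error = str(exc)

print(error)
```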

How can I set the quotechar parameter to solve this problem?

Jack
  • Does column C have quotes around its values? – ayhan May 03 '16 at 07:57
  • Better yet, can you post a few lines of your CSV here (say lines 1550 to 1555)? – ayhan May 03 '16 at 08:07
  • Without an example it is unclear, but this might be relevant: http://stackoverflow.com/questions/24079304/numpy-genfromtxt-pandas-read-csv-ignore-commas-within-quote-marks – atomh33ls May 03 '16 at 08:32
  • Sorry, I have checked the data again and updated the question with the details. – Jack May 04 '16 at 00:37
  • Maybe you need `quotechar = '"'` (a double quote between apostrophes) instead. If neither combination works, the data in the file may be improperly formatted (at line 1553?). – ptrj May 04 '16 at 04:50
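Following up on the quotechar suggestions: if the values in C are wrapped in double quotes in the file, read_csv needs no extra option, because quotechar already defaults to '"' and commas inside a quoted field are kept as text. A minimal sketch with made-up rows (that the real file quotes C this way is an assumption, not something the question confirms):

```python
import io

import pandas as pd

# Made-up rows in which every C value is wrapped in double quotes.
raw = (
    'A,B,C,D,E,F\n'
    '1,2,",hello",3,4,5\n'
    '6,7,"hello,hello,hello",8,9,10\n'
)

# quotechar='"' is the default; it is spelled out only for clarity.
df = pd.read_csv(io.StringIO(raw), quotechar='"')
print(df["C"].tolist())  # [',hello', 'hello,hello,hello']
```

If the file uses single quotes instead, quotechar="'" is the matching setting; if C has no quotes at all, no quotechar value can help.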

1 Answer


I ran into this kind of problem while using pandas to parse a CSV file containing SQL queries, which put commas inside some columns.

To solve it, we switched the column separator to something other than a comma and set the 'sep' parameter of pandas.read_csv accordingly, like this:

df = pd.read_csv(path, sep=';')
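A minimal sketch of that setup with made-up data standing in for the file at path: once ';' separates the columns, commas inside C are ordinary characters and no quoting is required.

```python
import io

import pandas as pd

# Made-up file content that uses ';' between columns.
raw = "A;B;C;D;E;F\n1;2;,hello;3;4;5\n6;7;hello,hello,hello;8;9;10\n"

df = pd.read_csv(io.StringIO(raw), sep=";")
print(df["C"].tolist())  # [',hello', 'hello,hello,hello']
```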

Personally, since I'm lazy, in your place I would just change (or ask someone to change) the delimiter of the input CSV from a comma to something else, such as a semicolon.
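If the file happens to be produced by pandas too, the producing side of that change is the sep parameter of DataFrame.to_csv. A minimal round-trip sketch with a made-up frame; note that to_csv's default minimal quoting only quotes fields that contain the delimiter, a quote character or a line break, so with ';' as the separator the commas in C are written unquoted.

```python
import io

import pandas as pd

# Made-up frame mirroring the question: only column C contains commas.
df = pd.DataFrame({
    "A": [1, 6], "B": [2, 7],
    "C": [",hello", "hello,hello,hello"],
    "D": [3, 8], "E": [4, 9], "F": [5, 10],
})

# Write with ';' between columns.
buf = io.StringIO()
df.to_csv(buf, sep=";", index=False)
text = buf.getvalue()

# Read it back with the same separator; raises if anything changed.
back = pd.read_csv(io.StringIO(text), sep=";")
pd.testing.assert_frame_equal(back, df)
```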

But if you can't, here's something I found while looking for a solution:

Pandas Read CSV with string delimiters via regex

As you can see in that code, a regex was used to let the user parse a CSV file whose delimiters pandas could not identify on its own, by specifying in the regex which values to extract and how.
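The linked code is not reproduced here. As a separate, hedged sketch of the same idea (a regex that states which part of each line belongs to which column), the helper below assumes that only C can contain commas: A and B are the first two comma-free fields, D, E and F are the last three, and everything in between is C. It matches each line with Python's re module and builds the DataFrame itself rather than passing a regex sep to read_csv; read_six_columns and LINE_RE are hypothetical names.

```python
import io
import re

import pandas as pd

# Assumption: only column C may contain commas; A, B, D, E, F never do.
# The comma-free groups pin down the first two and last three fields,
# so the greedy (.*) for C absorbs every extra comma on the line.
LINE_RE = re.compile(r"^([^,]*),([^,]*),(.*),([^,]*),([^,]*),([^,]*)$")


def read_six_columns(handle):
    """Parse an unquoted A..F file in which only column C contains commas."""
    header = handle.readline().rstrip("\r\n").split(",")
    rows = []
    for lineno, line in enumerate(handle, start=2):
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        match = LINE_RE.match(line)
        if match is None:
            raise ValueError(f"line {lineno} has fewer than 6 fields: {line!r}")
        rows.append(match.groups())
    return pd.DataFrame(rows, columns=header)


# Usage with made-up rows matching the update in the question.
raw = "A,B,C,D,E,F\n1,2,,hello,3,4,5\n6,7,hello,hello,hello,8,9,10\n"
df = read_six_columns(io.StringIO(raw))
print(df["C"].tolist())  # [',hello', 'hello,hello,hello']
```

All columns come back as strings; convert numeric ones afterwards, for example with pd.to_numeric(df["A"]). This only works because a single column may contain commas; with two such columns the split would be ambiguous.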

I'm no expert in regex, but it might fit your needs.

Kaël