1

I have SP500 data downloaded from the Fed, a very simple .csv file with two fields; date and price. When I do a pd.read_csv() to load into a dataframe I get two errors:

TypeError: Cannot cast array from dtype('O') to dtype('float64') according to the rule 'safe'

ValueError: could not convert string to float: '.'

It seems to be telling me that the decimal point '.' in the price field is throwing the error.

I have many files that handle the decimal point when converting a string object to a float.

here is the data being used:

observation_date,price
2008-04-04,1340.40
2008-04-07,1372.54
2008-04-08,1365.54
2008-04-09,1354.49
2008-04-10,1360.55
2008-04-11,1332.83
2008-04-14,1328.32
2008-04-15,1334.43
2008-04-16,1364.71

I have also used:

SP500 = pd.read_csv(csv_file)

to load the file give both fields as object dtype then using

SP500.price = SP500.price.astype(float).fillna(0.0)

this throws the same two errors

Community
  • 1
  • 1
  • 1
    I can't reproduce, it reads in fine for me. – user3483203 Apr 07 '18 at 00:15
  • When I key in the data to a .csv file using notepad it works fine. but the downloaded csv file from the Federal Reserve throws the error. The data example I provided is after first loading the original Fed. data allowing pandas to used dtype Object for both fields, then do a db.to_csv(file_csv, decimal='.', index=False, sep=',') to wrter the dataframe to a new csv file, giving the data format shown in the post. However, this new csv file still throws the two errors. Could there be some type of hidden characters from the original data being writing out by the pd.to_csv() command. – Christopher Pfeifer Apr 07 '18 at 00:28
  • here is the link to download the SP500 csv file https://fred.stlouisfed.org/series/SP500 use the download button then choose csv file. I have tried 4 other feb files, they all give the same error when trying to load price data as float fields. the decimal point in the fields throws the error. – Christopher Pfeifer Apr 07 '18 at 00:44

1 Answers1

1

Try specifying that '.' is a NaN value

SP500 = pd.read_csv('SP500.csv', na_values=['.'])
piRSquared
  • 240,659
  • 38
  • 359
  • 510
  • That WORKS! RangeIndex: 2609 entries, 0 to 2608 Data columns (total 2 columns): observation_date 2609 non-null object price 2518 non-null float64 dtypes: float64(1), object(1) memory usage: 40.8+ KB – Christopher Pfeifer Apr 07 '18 at 01:09