The numeric strings in my DataFrames sometimes contain commas, which represent either the decimal separator or the thousands separator; other values have no comma at all. The DataFrame below is an example of the range of price formats I receive via an API; the format varies depending on the currency. These are prices, so there are always exactly 2 decimal places. I need to convert the string prices to floats so I can sum them, split them into other DataFrames, or use them for plotting graphs. I have written a loop that does the replacement, but is there a quicker way to do this without the loop?
My DataFrame and working loop are as follows:
import pandas as pd

data = {'amount': ['7,99', '6,99', '9.99', '-6,99', '1,000.00']}
df = pd.DataFrame(data)

fees = []
sales = []
for item in df['amount']:
    item = item.replace(' ', '')
    if item[-7:-6] == ',':
        # comma is a thousands separator, e.g. '1,000.00'
        item = item.replace(',', '')
    elif item[-3:-2] == ',':
        # comma is the decimal separator, e.g. '7,99'
        item = item.replace(',', '.')
    value = float(item)
    # zero and negative amounts are fees, positive amounts are sales
    if value <= 0:
        fees.append(value)
    else:
        sales.append(value)
I have attempted to do this without the loop but can't seem to work out where I have gone wrong.
df["amount"] = np.where((df['amount'][-7:-6] == ','),
                        df["amount"][-7:-6].str.replace(',', '').replace(' ',''),
                        df["amount"])
df["amount"] = np.where((df['amount'][-3:-2] == ','),
                        df["amount"][-3:-2].str.replace(',', '').replace(' ',''),
                        df["amount"])
Any help would be much appreciated. Thank you in advance
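
For reference, here is a minimal sketch of one possible loop-free approach, not a definitive solution. Two things in the np.where attempt look like likely culprits: df['amount'][-7:-6] slices rows of the Series rather than characters of each string (character slicing goes through the .str accessor, e.g. df['amount'].str[-7:-6]), and the second call replaces the decimal comma with '' instead of '.'. The sketch below sidesteps the positional checks with a regular expression. It assumes every price has exactly two decimals and that only the formats shown in the sample occur; a value such as '1.000,00' (dot as thousands separator) is not handled.

```python
import pandas as pd

df = pd.DataFrame({'amount': ['7,99', '6,99', '9.99', '-6,99', '1,000.00']})

# Strip any spaces first (plain substring replacement, no regex).
s = df['amount'].str.replace(' ', '', regex=False)

# Assumption: always two decimals, so a comma followed by exactly two
# digits at the end of the string is a decimal separator.
s = s.str.replace(r',(\d{2})$', r'.\1', regex=True)

# Any comma still present is treated as a thousands separator and dropped.
s = s.str.replace(',', '', regex=False)

df['amount'] = s.astype(float)

# Boolean masks replace the if/else inside the loop.
fees = df.loc[df['amount'] <= 0, 'amount']
sales = df.loc[df['amount'] > 0, 'amount']
```

Here fees and sales are Series rather than lists; calling .tolist() on them gives the same shape of output as the loop.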