33

I have a CSV file with data reading that I want to read into Python. I get lists that contain strings like "2,5". Now doing float("2,5") does not work, because it has the wrong decimal mark.

How do I read this into Python as 2.5?

pault
  • 32,557
  • 9
  • 66
  • 110
Till B
  • 1,138
  • 2
  • 12
  • 18

7 Answers7

64

You may do it the locale-aware way:

import locale

# Set to users preferred locale:
locale.setlocale(locale.LC_ALL, '')
# Or a specific locale:
locale.setlocale(locale.LC_NUMERIC, "en_DK.UTF-8")

print locale.atof("3,14")

Read this section before using this method.

Lauritz V. Thaulow
  • 41,893
  • 11
  • 64
  • 87
  • 1
    One should also keep in mind that this won't be thread safe: http://stackoverflow.com/questions/2186327/thread-safe-locale-techniques – Haroldo_OK Mar 14 '17 at 12:22
26

Pandas supports this out of the box:

df = pd.read_csv(r'data.csv', decimal=',')

See http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html

maggie
  • 2,795
  • 2
  • 21
  • 27
  • Is there some ready command to replace comma decimal points with dot decimal points in pandas (without brute-force replacing with some regex)? – hhh Oct 02 '17 at 14:16
  • Didn't find one. But what's wrong with replacing it via apply or applymap? – maggie Oct 02 '17 at 16:41
26

float("2,5".replace(',', '.')) will do in most cases

If valueis a large number and .has been used for thousands, you can:

Replace all commas for points: value.replace(",", ".")

Remove all but the last point: value.replace(".", "", value.count(".") -1)

guival
  • 639
  • 17
  • 22
eumiro
  • 179,099
  • 29
  • 277
  • 252
  • 28
    Just curious, is it the true-way solution? Looks awful for me. – Andrey Agibalov Aug 18 '11 at 11:11
  • @loki2302 - Standard decimal delimiter is a point. If you use anything else (i.e. comma or something more "exotic" like space, a letter or anything else), then you have to replace it to make `float()` understand what you mean. – eumiro Aug 18 '11 at 11:13
  • 1
    The decimal delimiter is a [localization](http://en.wikipedia.org/wiki/Decimal_point#Hindu-Arabic_numeral_system) issue, so I guess @loki2302 wonders if there is a way to inform python to _adhere_ to the localized way of representing a floating-point number when parsing. – Lauritz V. Thaulow Aug 18 '11 at 11:18
  • 9
    It's not a worldwide *standard*. For instance, in Russia commas are standard and points are not widely used. So, I believe, that correct solution for this problem is to somehow be aware of this document's author locale. Another question is - *since we're talking about CSV, how did they manage to save floats with commas? :-)*. 3,14 is, I believe, 2 integers and not a float. – Andrey Agibalov Aug 18 '11 at 11:19
  • 8
    @loki2302 Standard csv format for countries with comma as decimal delimiter [is to use `;` as field delimiter](http://en.wikipedia.org/wiki/Comma-separated_values) in csv files. – Lauritz V. Thaulow Aug 18 '11 at 11:22
  • 33
    Standard delimiter is a comma. Everyone knows that, except Americans and English :D – Kheldar Aug 18 '11 at 11:28
  • 2
    It was a danish csv-file, and here the comma is standard decimal delimiter. And indeed, the field delimiter was ";". – Till B Aug 18 '11 at 11:32
  • @loki2302 Well, you can have CSV-fields with commas even if the delimiter is also a comma. You just need string quoting. Then there's a ton of ways of string quoting in CSV (how to escape the quote? double quotes? backslash?). – Tomasz Gandor Mar 29 '15 at 09:30
  • 1
    The problem with this approach is, that in germany we sometimes do something like "1,000,024,00" - i know it's not good. But i often have to handle this kind of data. If you don't want to use locale, you can just do it this way: `value = "1,000,024,00" value = value.replace(",", ".") value.replace(".", "", (value.count(".")-1))` – herbertp. Jan 06 '16 at 08:40
  • @Kheldar ... and Australians, and Bangladeshis, and Canadians, and Dominicans, and Egyptians, [and...](https://en.wikipedia.org/wiki/Decimal_mark#Hindu%E2%80%93Arabic_numeral_system) – Jean-François Corbett Dec 13 '17 at 12:51
  • 2
    @Jean-FrançoisCorbett which are all British Commonwealth system users. Point stands. – Kheldar Dec 14 '17 at 14:17
  • @Kheldar I guess... in a circular logic kind of way. – Jean-François Corbett Dec 14 '17 at 14:51
  • 4
    @Jean-FrançoisCorbett I suppose I haven't been clear enough. My point was simply that delimiters depend on the language you use, mainly how numbers have been represented historically. English-based languages and Latin-based languages use incompatible systems, which is why the UK and former colonies are featured in your list. Regards! – Kheldar Dec 17 '17 at 12:23
  • @Kheldar That is accurate, though I find that point difficult to parse from your original comment. – Jean-François Corbett Dec 17 '17 at 15:08
7

using a regex will be more reliable

import re

decmark_reg = re.compile('(?<=\d),(?=\d)')

ss = 'abc , 2,5 def ,5,88 or (2,5, 8,12, 8945,3 )'

print ss
print decmark_reg.sub('.',ss)

result

abc , 2,5 def ,5,88 or (2,5, 8,12, 8945,3 )
abc , 2.5 def ,5.88 or (2.5, 8.12, 8945.3 )

If you want to treat more complex cases (numbers with no digit before the decimal mark for exemple) the regex I crafted to detect all types of numbers in the following thread may be of interest for you:

stackoverflow.com/questions/5917082/regular-expression-to-match-numbers-with-or-without-commas-and-decimals-in-text/5929469

Community
  • 1
  • 1
eyquem
  • 24,028
  • 6
  • 35
  • 41
4

First you must ensure what locale was used to provide the number. Failing to do this random problems surely will occur.

import locale

loc = locale.getlocale()  # get and save current locale
# use locale that provided the number;
# example if German locale was used:
locale.setlocale(locale.LC_ALL, 'de_DE')
pythonnumber = locale.atof(value)
locale.setlocale(locale.LC_ALL, loc)  # restore saved locale
ilias iliadis
  • 513
  • 5
  • 13
3

Try replacing all the decimal commas with decimal dots:

floatAsStr = "2,5"
floatAsStr = floatAsStr.replace(",", ".");
myFloat = float(floatAsStr)

The function replace, of course, work on any substring as python does now differentiate between char and string.

penelope
  • 7,605
  • 5
  • 40
  • 82
0

if dots are used as thousand separators, to swap commas and dots you could use a third symbol as temporary placeholder like so:

value.replace('.', '#').replace(',', '.').replace('#', ',')

but seeing as you want to convert to float from string, you could just remove any dots and then replace any commas with dots

float(value.replace('.', '').replace(',', '.'))

IMO this is the most readable solution

teebagz
  • 496
  • 1
  • 3
  • 23