This question is very similar to this post, but not exactly the same.
I have some data in a .csv file. The data is given to four decimal places (#.####).
Calculating the mean in Excel or SAS gives a result with five decimal places (#.#####), but using numpy:
import numpy as np
data = np.recfromcsv(path2file, delimiter=';', names=['measurements'], dtype=np.float64)
rawD = data['measurements']
print np.average(rawD)
gives a number like this:
#.#####999999999994
Clearly something is wrong.
Using
from math import fsum
print fsum(rawD.ravel())/rawD.size
gives
#.#####
Is there anything in np.average that I have set wrong?
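For comparison, here is a minimal sketch of how a plain running sum can pick up rounding error that fsum avoids. The ten 0.1 values are made up for illustration (they are not my data), and whether np.average accumulates in exactly this way is an assumption on my part:

```python
from math import fsum

# Hypothetical stand-in data: 0.1 has no exact binary representation
values = [0.1] * 10

# Plain left-to-right accumulation: every addition rounds to the nearest double
naive_total = 0.0
for v in values:
    naive_total += v

# fsum keeps track of the lost low-order bits and rounds only once at the end
exact_total = fsum(values)

print(repr(naive_total))  # 0.9999999999999999
print(repr(exact_total))  # 1.0
```

The explicit loop is used instead of the built-in sum() because newer Python versions may compensate for this error inside sum() itself.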
BONUS info:
I'm only working with 200 data points in the array
UPDATE
I thought I should make my case clearer.
I have numbers like 4.2730 in my csv (giving 4-decimal precision, even though the 4th decimal is always zero; that is not part of the question, so don't mind it).
Calculating the average/mean with numpy gives me this:
4.2516499999999994
which, when formatted to four decimals, prints as
>>> print "%.4f" % np.average(rawD)
4.2516
Doing the same thing in Excel or SAS gives me this:
4.2517
I actually believe this is the true average, because Excel finds the mean to be 4.25165. This code also illustrates it:
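answer = 0
for number in rawD:
    # scale each value to an integer so the sum is done in integer arithmetic
    answer += int(number*1000)
# 200 data points, so sum/2 equals mean * 100000
print answer/2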
425165
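To see what is actually stored, the decimal module can show the exact binary value behind a float. A small sketch, using the literal 4.25165 typed by hand rather than a value read from my file:

```python
from decimal import Decimal

x = 4.25165  # literal typed by hand for illustration

# Decimal(float) converts the stored binary value exactly, with no rounding
stored = Decimal(x)
wanted = Decimal("4.25165")  # the decimal value I actually mean

print(stored)            # full expansion of the nearest double
print(stored == wanted)  # False: 4.25165 is not exactly representable in binary
print(repr(x))           # shortest string that round-trips to the same double
```

So the float closest to 4.25165 is never exactly 4.25165, and a computed mean may land on a neighbouring float that is slightly further away.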
So how do I tell np.average() to calculate this value?
I'm a bit surprised that numpy did this to me. I thought I only needed to worry about this when dealing with numbers that have around 16 significant digits. I didn't expect rounding at the 4th decimal place to be affected.
I know I could use
fsum(rawD.ravel())/rawD.size
but I also have other things (like std) that I want to calculate with the same precision.
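A sketch of how mean and std could both be built on fsum, so that they share the same summation. The helper name mean_std is made up, and it assumes the population standard deviation (ddof=0), which is what np.std uses by default:

```python
from math import fsum, sqrt

def mean_std(values):
    """Return (mean, population std), using fsum for every summation."""
    n = len(values)
    mean = fsum(values) / n
    # Squared deviations are summed with fsum so the accumulation adds no extra error
    variance = fsum((v - mean) ** 2 for v in values) / n
    return mean, sqrt(variance)

# Made-up sample, not my measurements
m, s = mean_std([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(m, s)
```

This only removes the error from the summation; the individual values are still binary floats.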
UPDATE 2
I thought I could make a temporary workaround with
>>> print "%.4f" % np.float64("%.5f" % np.mean(rawD))
4.2416
which did not solve the problem. Then I tried
>>> print "%.4f" % float("4.24165")
4.2416
AHA! There is a bug in the formatter: Issue 5118
To be honest, I don't care if Python stores 4.24165 as 4.241649999... It's still a round-off error - NO MATTER WHAT.
If the interpreter can figure out how to display the number
>>> print float("4.24165")
4.24165
then the formatter should be able to as well, and should use that number when rounding.
It still doesn't change the fact that I have a round-off problem (now with both the formatter and numpy).
In case you need some numbers to help me out then I have made this modified .csv file:
(I'm aware that this file does not have the number of digits I explained earlier and that the average gives ..9988 at the end instead of ..9994 - it's modified)
I guess my question boils down to this: how do I get a string output like the one Excel gives me if I use =average(), and have it round correctly if I choose to show only 4 digits?
I know that this might seem strange to some, but I have my reasons for wanting to reproduce the behavior of Excel.
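One possible workaround, as a sketch only. It assumes that Excel effectively keeps 15 significant digits and rounds halves away from zero; I have not verified that this matches Excel in every case. The helper name excel_like_round is made up:

```python
from decimal import Decimal, ROUND_HALF_UP

def excel_like_round(value, places=4):
    """Round a float half-up after trimming it to 15 significant digits."""
    # '%.15g' drops the binary noise beyond 15 significant digits (assumed Excel-like)
    trimmed = Decimal("%.15g" % value)
    # Quantize in decimal arithmetic so a trailing 5 rounds away from zero
    step = Decimal(10) ** -places  # Decimal('0.0001') for places=4
    return str(trimmed.quantize(step, rounding=ROUND_HALF_UP))

print(excel_like_round(4.2516499999999994))  # 4.2517
```

The rounding is done on the decimal string rather than on the binary float, which is why the trailing 5 is seen as an exact half.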
Any help would be appreciated, thank you.