
I usually work with huge simulations. Sometimes I need to compute the center of mass of a set of particles. I noticed that in many situations the mean value returned by numpy.mean() is wrong. I can figure out that it is due to the accumulator running out of precision. To avoid the problem I can split the summation over all particles into small subsets, but that is inconvenient. Does anybody have an idea of how to solve this problem in an elegant way?
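For example, a rough sketch of the chunked workaround I use today (the helper name chunked_mean and the chunk size of 4096 are just illustrative):

import numpy as np

def chunked_mean(a, chunk=4096):
    # Average small chunks first, so each float32 partial sum stays close to
    # the magnitude of the individual elements, then average the chunk means.
    # Assumes a.size is a multiple of `chunk`.
    flat = a.ravel()
    partial = [flat[i:i + chunk].mean() for i in range(0, flat.size, chunk)]
    return np.mean(partial)

data = np.random.rand(1 << 20).astype(np.float32)   # stand-in for the particle data
print(chunked_mean(data), data.mean(dtype=np.float64))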

Just to pique your curiosity, the following example produces something similar to what I observe in my simulations:

import numpy as np
a = np.ones((1024,1024), dtype=np.float32)*30504.00005

If you check the max and min values, you get:

a.max() 
30504.0
a.min() 
30504.0

However, the mean value is:

a.mean()
30687.236328125

You can see that something is wrong here. This does not happen when using dtype=np.float64, so it would be nice to solve the problem for single precision as well.

Alejandro

4 Answers


This isn't a NumPy problem; it's a floating-point issue. The same occurs in C:

#include <stdio.h>

int main(void) {
    float acc = 0;
    for (int i = 0; i < 1024*1024; i++) {
        acc += 30504.00005f;
    }
    acc /= (1024*1024);
    printf("%f\n", acc);  // prints 30687.304688
    return 0;
}


The problem is that floating-point has limited precision; as the accumulator value grows relative to the elements being added to it, the relative precision drops.
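A tiny float32 illustration of that loss (once a value reaches 2**24, an added 1.0 can be lost entirely to rounding):

import numpy as np

big = np.float32(16777216.0)     # 2**24
print(big + np.float32(1.0))     # prints 16777216.0: the added 1.0 is lost to rounding

The running sum in the question suffers the same kind of loss, just at a larger magnitude and more gradually.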

One solution is to limit the relative growth by constructing an adder tree. Here's an example in C (my Python isn't good enough...):

#include <stdio.h>

/* Pairwise ("adder tree") summation: fold the second half of the array onto
   the first half and recurse, so both operands of every addition stay at a
   similar magnitude.  Note: modifies the array; n must be a power of two. */
float sum(float *p, int n) {
    if (n == 1) return *p;
    for (int i = 0; i < n/2; i++) {
        p[i] += p[i+n/2];
    }
    return sum(p, n/2);
}

static float x[1024*1024];   /* file scope: 4 MB is too large for the stack */

int main(void) {
    for (int i = 0; i < 1024*1024; i++) {
        x[i] = 30504.00005f;
    }

    float acc = sum(x, 1024*1024);
    acc /= (1024*1024);
    printf("%f\n", acc);   // prints 30504.000000
    return 0;
}

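Since the question is about NumPy, a rough Python sketch of the same pairwise idea might look like this (pairwise_sum is just an illustrative name; it assumes the number of elements is a power of two):

import numpy as np

def pairwise_sum(a):
    # Repeatedly fold the second half of the buffer onto the first half,
    # so both operands of every addition have a similar magnitude.
    x = a.ravel().copy()          # work on a copy; the folding is destructive
    n = x.size                    # assumed to be a power of two
    while n > 1:
        n //= 2
        x[:n] += x[n:2*n]
    return x[0]

a = np.ones((1024, 1024), dtype=np.float32) * 30504.00005
print(pairwise_sum(a) / a.size)   # 30504.0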

Oliver Charlesworth

You can call np.mean with a dtype keyword argument that specifies the type of the accumulator (which defaults to the same type as the array for floating-point arrays).

So calling a.mean(dtype=np.float64) will solve your toy example, and perhaps your issue with larger arrays.
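For the toy array from the question, the comparison looks roughly like this (the first output is the value reported in the question; recent NumPy releases may already do better):

import numpy as np

a = np.ones((1024, 1024), dtype=np.float32) * 30504.00005
print(a.mean())                   # 30687.236328125 (float32 accumulator)
print(a.mean(dtype=np.float64))   # 30504.0 (float64 accumulator)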

Jaime
  • Yes, it was stated in the question: np.float64 solves the problem, as you say. But it is possible to solve the problem when computing the mean by hand, without changing the dtype. If you take small subsets of the data and compute partial summations, you get a better result even with single precision. – Alejandro Jul 04 '13 at 06:34
  • The right thing to do would be to use [Welford's method](http://stackoverflow.com/questions/895929/how-do-i-determine-the-standard-deviation-stddev-of-a-set-of-values/897463#897463), or a similar variant, but nothing like that is implemented in numpy. The next best thing to making your arrays `np.float64` is to tell `np.mean` to use an `np.float64` accumulator via the `dtype` keyword. – Jaime Jul 04 '13 at 15:02
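A minimal sketch of the running-mean idea mentioned in the comment above (running_mean is just an illustrative name; a pure-Python loop like this is slow, but the accumulator stays at the magnitude of the data):

import numpy as np

def running_mean(values):
    # Welford-style update: the accumulator is itself a mean, so it never
    # grows large relative to the incoming elements.
    mean = np.float32(0.0)
    for k, x in enumerate(values, start=1):
        mean += (np.float32(x) - mean) / np.float32(k)
    return mean

a = np.ones((1024, 1024), dtype=np.float32) * 30504.00005
print(running_mean(a.ravel()))    # 30504.0 (but far slower than a.mean())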

You can partially remedy this by using the built-in math.fsum, which tracks the partial sums (the docs contain a link to an AS recipe prototype):

>>> from math import fsum
>>> fsum(a.ravel())/(1024*1024)
30504.0

As far as I'm aware, numpy does not have an analog.

ev-br
  • +1 for the accuracy, but on my machine more than 100 times slower than `a.mean()` or `a.mean(axis=-1).mean()`. – Stefano M Jul 04 '13 at 11:34
  • Sure it is, it's pure Python. And even if this sort of thing gets into numpy, there's still quite a lot of work compared to just summing things up. But the question of course is whether doing this is going to create a bottleneck in your real code --- you mentioned 'sometimes' in the original post :-). – ev-br Jul 04 '13 at 13:18
  • `math.fsum` is implemented in C; the AS recipe is just a reference. The pure-Python AS code is probably thousands of times slower... Since the OP speaks of `huge` problems I thought that speed was an issue, but apparently I'm alone in that. There is nothing wrong with trading accuracy for speed and a small memory footprint... – Stefano M Jul 04 '13 at 15:24
  • Sure, you're right. I meant to say that `fsum` incurs the Python overhead. – ev-br Jul 04 '13 at 17:58

Quick and dirty answer

assert a.ndim == 2
a.mean(axis=-1).mean()

This gives the expected result for the 1024*1024 matrix, but of course this will not be true for larger arrays...

If computing the mean is not a bottleneck in your code, I would implement an ad-hoc algorithm in Python myself; the details, however, depend on your data structure.

If computing the mean is a bottleneck, then some specialized (parallel) reduction algorithm could solve the problem.

Edit

This approach may seem silly, but it will surely mitigate the problem, and it is almost as efficient as .mean() itself.

In [65]: a = np.ones((1024,1024), dtype=np.float32)*30504.00005

In [66]: a.mean()
Out[66]: 30687.236328125

In [67]: a.mean(axis=-1).mean()
Out[67]: 30504.0

In [68]: %timeit a.mean()
1000 loops, best of 3: 894 us per loop

In [69]: %timeit a.mean(axis=-1).mean()
1000 loops, best of 3: 906 us per loop

Giving a more sensible answer requires more information on the data structures, their sizes, and the target architecture.

Stefano M