IEEE floating point operations are deterministic, but see How can floating point calculations be made deterministic? for one way that an overall floating point computation can be non-deterministic:
> ... parallel computations are non-deterministic in terms of the order in which floating-point computations are performed, which can result in non-bit-exact results across runs.
Two-part question:
- How else can an overall floating point computation be non-deterministic, yielding results that are not exactly equal?
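To see why reordering alone breaks bit-exactness, here is a minimal sketch: IEEE floating-point addition is not associative, so a parallel reduction that groups the same operands differently can round differently.

```python
# Floating-point addition is not associative: the same three operands,
# grouped differently, round to different results.
a, b, c = 1e17, -1e17, 1.0

left = (a + b) + c    # the large terms cancel exactly first, then 1.0 is added
right = a + (b + c)   # 1.0 is absorbed into -1e17 (ulp is 16 here) and lost

print(left, right)    # 1.0 vs 0.0
assert left != right
```

This is exactly the mechanism behind the quoted answer: a multi-threaded `dot()` partitions the sum differently from run to run, so the rounding differs even though every individual operation is deterministic.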
Consider a single-threaded Python program that calls NumPy, CVXOPT, and SciPy subroutines such as `scipy.optimize.fsolve()`, which in turn call native libraries like MINPACK and GLPK and optimized linear algebra subroutines like BLAS, ATLAS, and MKL. "If your numpy/scipy is compiled using one of these, then dot() will be computed in parallel (if this is faster) without you doing anything."

Do these native libraries ever parallelize in a way that introduces non-deterministic results?
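One hedged mitigation sketch: many BLAS backends (OpenBLAS, Intel MKL, OpenMP-based builds) honor thread-count environment variables, and pinning them to 1 removes the parallel-reduction reordering entirely. The exact variable names depend on which backend your NumPy was built against, and they generally must be set before NumPy is first imported.

```python
# Pin the BLAS thread pools to one thread BEFORE importing numpy, so any
# dot()/matmul reduction runs in a fixed serial order. Which variable
# actually applies depends on the BLAS build behind your NumPy.
import os
os.environ["OMP_NUM_THREADS"] = "1"       # OpenMP-based backends
os.environ["OPENBLAS_NUM_THREADS"] = "1"  # OpenBLAS
os.environ["MKL_NUM_THREADS"] = "1"       # Intel MKL

import numpy as np

a = np.random.RandomState(0).rand(200, 200)
b = np.dot(a, a.T)   # single-threaded: same operation order every run
```

This trades speed for run-to-run bit-exactness, which matches the refactoring-test goal below.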
Assumptions:
- The same software, with the same inputs, on the same hardware. The output of multiple runs should be equal.
- If that works, it's highly desirable to test that the output after doing a code refactoring is equal. (Yes, some changes in order of operations can make some of the output not-equal.)
- All random numbers in the program are pseudo-random numbers used in a consistent way from the same seeds across all runs.
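The seeding assumption above can be sketched like this (the `SEED` constant is hypothetical; the point is that every PRNG the program touches is seeded from one place):

```python
# Seed every PRNG the program uses so repeated runs draw identical sequences.
import random
import numpy as np

SEED = 12345  # hypothetical project-wide constant

random.seed(SEED)                   # Python's stdlib PRNG
rng = np.random.RandomState(SEED)   # a local NumPy generator, not global state
run1 = rng.rand(3)

rng = np.random.RandomState(SEED)   # re-seed, as a fresh run would
run2 = rng.rand(3)

assert (run1 == run2).all()         # same seed -> bit-identical draws
```

Using a local `RandomState` instead of the global `numpy.random` functions also keeps library code from silently advancing your stream.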
No uninitialized values. Python is generally safe in that way, but `numpy.empty()` returns a new array without initializing its entries. (And it's not clear that it's much faster in practice. So beware!)

@PaulPanzer's test shows that `numpy.empty()` does return an uninitialized array, and that it can easily and quickly recycle a recently freed array:

```python
import numpy as np
np.arange(100); np.empty(100, int); np.empty(100, int)
np.arange(100, 200.0); np.empty(100, float); np.empty(100, float)
```
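When reproducibility matters more than the allocation speedup, a simple sketch of the alternative is to use an initializing constructor, so every run starts from the same bytes:

```python
# Initializing constructors give deterministic contents, unlike numpy.empty().
import numpy as np

a = np.zeros(100)          # always all zeros
b = np.full(100, np.nan)   # or fill with a sentinel to catch unwritten slots

assert a.sum() == 0.0
assert np.isnan(b).all()
```

The NaN-fill variant has the side benefit that any slot the program forgets to write poisons downstream arithmetic instead of silently varying between runs.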
It's tricky to get useful timing measurements for these routines! In a `timeit` loop, `numpy.empty()` can just keep reallocating the same one or two memory nodes, so the time is independent of the array size. To prevent recycling:

```python
from timeit import timeit
timeit('l.append(numpy.empty(100000))', 'import numpy; l = []')
timeit('l.append(numpy.zeros(100000))', 'import numpy; l = []')
```
but reducing that array size to `numpy.zeros(10000)` takes 15x as long; reducing it to `numpy.zeros(1000)` takes 1.3x as long (on my MBP). Puzzling.
See also: Hash values are salted in Python 3, and each dict preserves insertion order only since Python 3.7. That could vary the order of operations from run to run. [I'm wrangling with this problem in Python 2.7.15.]
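One sketch of sidestepping hash-dependent ordering (the dict contents here are made up for illustration): iterate dicts and sets in sorted key order, so the sequence of floating-point operations no longer depends on hash seeding or the Python version.

```python
# Iterating in sorted key order fixes the accumulation order across runs,
# regardless of hash salting or dict implementation details.
weights = {'beta': 0.3, 'alpha': 0.5, 'gamma': 0.2}  # hypothetical data

total = 0.0
for key in sorted(weights):   # always: alpha, beta, gamma
    total += weights[key]

print(total)
```

Since floating-point addition order affects rounding (as in the non-associativity example above), pinning the iteration order is what makes the accumulated result bit-identical run to run.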