As part of a statistical programming package, I need to add log-transformed values together with the LogSumExp Function. This is significantly less efficient than adding unlogged values together.
Furthermore, I need to add values together using the numpy.ufunc.reduecat functionality.
There are various options I've considered, with code below:
- (for comparison in non-log-space) use numpy.add.reduceat
- Numpy's ufunc for adding logged values together: np.logaddexp.reduceat
- Handwritten reduceat function with the following logsumexp functions:
- scipy's implemention of logsumexp
- logsumexp function in Python (with numba)
- Streaming logsumexp function in Python (with numba)
def logsumexp_reduceat(arr, indices, logsum_exp_func):
res = list()
i_start = indices[0]
for cur_index, i in enumerate(indices[1:]):
res.append(logsum_exp_func(arr[i_start:i]))
i_start = i
res.append(logsum_exp_func(arr[i:]))
return res
@numba.jit(nopython=True)
def logsumexp(X):
r = 0.0
for x in X:
r += np.exp(x)
return np.log(r)
@numba.jit(nopython=True)
def logsumexp_stream(X):
alpha = -np.Inf
r = 0.0
for x in X:
if x != -np.Inf:
if x <= alpha:
r += np.exp(x - alpha)
else:
r *= np.exp(alpha - x)
r += 1.0
alpha = x
return np.log(r) + alpha
arr = np.random.uniform(0,0.1, 10000)
log_arr = np.log(arr)
indices = sorted(np.random.randint(0, 10000, 100))
# approach 1
%timeit np.add.reduceat(arr, indices)
12.7 µs ± 503 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# approach 2
%timeit np.logaddexp.reduceat(log_arr, indices)
462 µs ± 17.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# approach 3, scipy function
%timeit logsum_exp_reduceat(arr, indices, scipy.special.logsumexp)
3.69 ms ± 273 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# approach 3 handwritten logsumexp
%timeit logsumexp_reduceat(log_arr, indices, logsumexp)
139 µs ± 7.1 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# approach 3 streaming logsumexp
%timeit logsumexp_reduceat(log_arr, indices, logsumexp_stream)
164 µs ± 10.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
The timeit results show that handwritten logsumexp functions with numba are the fastest options, but are still 10x slower than numpy.add.reduceat.
A few questions:
- Are there any other approaches (or tweaks to the options I've presented) which are faster? For instance, is there a way to use a lookup table to compute the logsumexp function?
- Why is Sebastian Nowozin's "streaming logsumexp" function not faster than the naive approach?