I'd suggest not using the optimize=True
flag here because, counter-intuitively, it is slower for this case (the path-optimization step adds overhead that isn't paid back at this problem size). Also, I'd recommend explicitly promoting the 2D array to 3D, performing batch matrix multiplication, and then squeezing the singleton dimension of the resultant array if you need a 2D array as the final result. Please find the code below:
# sample arrays
In [25]: v1 = np.random.random_sample((3000, 3))
In [26]: v2 = np.random.random_sample((3, 2, 3000))
# Divakar's approach
In [27]: %timeit np.einsum('ij,jki->ik',v1,v2, optimize=True)
80.7 µs ± 792 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# needed for future use
In [28]: res_optimized = np.einsum('ij,jki->ik',v1,v2, optimize=True)
# promoting to 3D array and swapping axes
In [29]: v1 = v1[:, np.newaxis, :]
In [30]: v2 = np.moveaxis(v2, 2, 0)
# perform batch matrix multiplication
In [31]: %timeit np.einsum("bij, bjk -> bik", v1, v2)
47.9 µs ± 496 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# for sanity checking
In [32]: res = np.einsum("bij, bjk -> bik", v1, v2)
In [33]: res.shape, res_optimized.shape
Out[33]: ((3000, 1, 2), (3000, 2))
# squeeze the singleton dimension and perform sanity check with Divakar's approach
In [34]: np.allclose(res.squeeze(), res_optimized)
Out[34]: True
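Since the reshaped einsum call is just an ordinary batch matrix multiplication, the same result can also be obtained with the @ operator (np.matmul), which broadcasts over the leading batch dimension. Here is a minimal self-contained sketch of the whole approach, using the same sample shapes as above:

```python
import numpy as np

# same sample shapes as above
v1 = np.random.random_sample((3000, 3))
v2 = np.random.random_sample((3, 2, 3000))

# reference result via einsum
res_einsum = np.einsum('ij,jki->ik', v1, v2)

# promote to 3D and move the batch axis to the front
v1_3d = v1[:, np.newaxis, :]            # (3000, 1, 3)
v2_3d = np.moveaxis(v2, 2, 0)           # (3000, 3, 2)

# batch matrix multiplication via the @ operator, then squeeze
res_matmul = (v1_3d @ v2_3d).squeeze()  # (3000, 2)

print(np.allclose(res_einsum, res_matmul))  # True
```

Whether @ or the explicit "bij, bjk -> bik" einsum is faster can vary with array sizes and NumPy version, so it is worth timing both on your own data.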
So, as we can see from the above timings, we gain an approx. 2x speedup by not using the optimize=True
flag. Also, explicitly reshaping the arrays to 3D gives a bit more insight into what's going on when we use numpy.einsum()
.
Note: timings were performed using NumPy version 1.16.1.
P.S. Read more at Understanding NumPy's einsum