As others have already pointed out, NumPy's broadcasting is your friend here.
Note that because of these broadcasting rules, the transpose operation is actually used much less frequently in NumPy than in other matrix-oriented tech stacks (read: MATLAB/Octave).
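As a quick illustration (a hypothetical example, not from the question): broadcasting lets a column of shape (n, 1) interact with an (n, m) matrix directly, where MATLAB-style code would often build an explicit tiled or transposed copy first.

```python
import numpy as np

# Hypothetical example: row-wise normalization via broadcasting.
x = np.array([[1., 2., 3., 4.],
              [5., 6., 7., 8.]])
row_sums = x.sum(axis=1, keepdims=True)  # shape (2, 1)
normalized = x / row_sums                # (2, 4) / (2, 1) broadcasts across columns
```

No repmat, no transpose: `keepdims=True` preserves the trailing axis so the division lines up on its own.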
The key is to get a correctly shaped array. The preferred method is slicing with an extra np.newaxis/None value, but you could also use ndarray.reshape():
import numpy as np
x = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12]])
y = np.array([3, 6 ,9]).reshape(-1, 1) # same as: y = np.array([3, 6 ,9])[:, None]
y - x
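For completeness, a short sketch verifying that the slicing and reshape variants produce the same shape and the same broadcasted result:

```python
import numpy as np

x = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
y_reshape = np.array([3, 6, 9]).reshape(-1, 1)
y_slice = np.array([3, 6, 9])[:, None]

# both have shape (3, 1), so both broadcast identically against x's shape (3, 4)
assert y_reshape.shape == y_slice.shape == (3, 1)
assert np.array_equal(y_reshape - x, y_slice - x)
```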
Most importantly, a correctly shaped array allows you to use numexpr, which can be more efficient than NumPy for large arrays (and it could be a good fit for your algorithm, if this operation is the bottleneck):
import numpy as np
import numexpr as ne
x = np.random.randint(1, 100, (3, 4))
y = np.random.randint(1, 100, (3, 1))
%timeit y - x
# The slowest run took 43.14 times longer than the fastest. This could mean that an intermediate result is being cached.
# 1000000 loops, best of 3: 879 ns per loop
%timeit ne.evaluate('y - x')
# The slowest run took 20.86 times longer than the fastest. This could mean that an intermediate result is being cached.
# 100000 loops, best of 3: 10.8 µs per loop
# not so exciting for small arrays, but for somewhat larger numbers...
x = np.random.randint(1, 100, (3000, 4000))
y = np.random.randint(1, 100, (3000, 1))
%timeit y - x
# 10 loops, best of 3: 33.1 ms per loop
%timeit ne.evaluate('y - x')
# 100 loops, best of 3: 10.7 ms per loop
# which is roughly a factor 3 faster on my machine
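If adding the numexpr dependency is not an option, a pure-NumPy way to curb temporary allocations for large arrays is the ufunc out= parameter (a sketch; the sizes here are just illustrative):

```python
import numpy as np

x = np.random.randint(1, 100, (300, 400))
y = np.random.randint(1, 100, (300, 1))

# preallocate the output once and reuse it across iterations;
# np.subtract broadcasts (300, 1) against (300, 400) exactly like `y - x`
out = np.empty_like(x)
np.subtract(y, x, out=out)
```

This matters mostly when the subtraction sits inside a loop, where reusing out avoids allocating a fresh result array on every pass.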
In this case, it does not matter much how you get to a correctly shaped array -- either slicing or reshape works -- but slicing seems to be roughly twice as fast.
To put some numbers to it:
import numpy as np
# creating the array takes roughly the same time regardless of shape, as long as the total size is the same
%timeit y = np.zeros((3000000))
# 838 µs ± 10.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit y = np.zeros((3000000, 1))
# 825 µs ± 12.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit y = np.zeros((3000, 1000))
# 827 µs ± 14.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# ...and reshaping / slicing is independent of the array size
x = np.zeros(3000000)
%timeit x[:, None]
# 147 ns ± 4.02 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
%timeit x.reshape(-1, 1)
# 214 ns ± 9.55 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
x = np.zeros(300)
%timeit x[:, None]
# 146 ns ± 0.659 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
%timeit x.reshape(-1, 1)
# 212 ns ± 1.56 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
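Both variants are so fast because neither copies the data: on a contiguous array, both x[:, None] and x.reshape(-1, 1) return views, which can be checked with np.shares_memory (my addition, not part of the original timings):

```python
import numpy as np

x = np.zeros(300)
assert np.shares_memory(x, x[:, None])        # slicing with None is a view
assert np.shares_memory(x, x.reshape(-1, 1))  # reshape on contiguous data is a view

# mutating through the view changes the original array
v = x[:, None]
v[0, 0] = 7.0
assert x[0] == 7.0
```

The flip side is that writes through either view affect the original array, which can matter if you keep the reshaped handle around.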
Needless to say, these %timeit benchmarks should be taken with a grain of salt.