Why is there such a large speed difference between the following L2 norm calculations:
a = np.arange(1200.0).reshape((-1,3))
%timeit [np.sqrt((a*a).sum(axis=1))]
100000 loops, best of 3: 12 µs per loop
%timeit [np.sqrt(np.dot(x,x)) for x in a]
1000 loops, best of 3: 814 µs per loop
%timeit [np.linalg.norm(x) for x in a]
100 loops, best of 3: 2 ms per loop
All three produce identical results as far as I can see.
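For completeness, the identical-results claim can be checked directly; this is a minimal sketch comparing the three variants with np.allclose:

```python
import numpy as np

a = np.arange(1200.0).reshape((-1, 3))

# The three variants from the timings above
v1 = np.sqrt((a * a).sum(axis=1))
v2 = np.array([np.sqrt(np.dot(x, x)) for x in a])
v3 = np.array([np.linalg.norm(x) for x in a])

# All three agree to floating-point precision
print(np.allclose(v1, v2) and np.allclose(v2, v3))  # → True
```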
Here's the relevant part of the source code for the numpy.linalg.norm function:
x = asarray(x)

# Check the default case first and handle it immediately.
if ord is None and axis is None:
    x = x.ravel(order='K')
    if isComplexType(x.dtype.type):
        sqnorm = dot(x.real, x.real) + dot(x.imag, x.imag)
    else:
        sqnorm = dot(x, x)
    return sqrt(sqnorm)
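Note that the two slow variants invoke this machinery (asarray, ravel, dtype check, a tiny length-3 dot) once per row, i.e. 400 Python-level calls, whereas the fast variant makes a single vectorized pass over the whole array. As a sketch of how to get the same per-row norms in one call, np.linalg.norm also accepts an axis argument (available since NumPy 1.8):

```python
import numpy as np

a = np.arange(1200.0).reshape((-1, 3))

# One vectorized call instead of 400 Python-level calls:
# axis=1 computes the L2 norm of each row at once.
row_norms = np.linalg.norm(a, axis=1)

print(np.allclose(row_norms, np.sqrt((a * a).sum(axis=1))))  # → True
```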
EDIT: Someone suggested that one version could be parallelized, but I checked and that's not the case: all three versions consume 12.5% of CPU (as is usually the case with single-threaded Python code on my 4 physical / 8 virtual core Xeon CPU).