52

I can't believe what I just measured:

python3 -m timeit -s "from math import sqrt" "sqrt(2)"
5000000 loops, best of 5: 42.8 nsec per loop

python3 -m timeit "2 ** 0.5"
50000000 loops, best of 5: 4.93 nsec per loop

This goes against all intuition... it should be exactly the opposite!

Python 3.8.3 on macOS Catalina

AleksanderTG

Peter Leikauf

  • The first code has the overhead of doing an `import`, and the second doesn't. – John Gordon Jun 17 '20 at 14:29
  • @JohnGordon The import is in the setup statement, which should not be timed. – mkrieger1 Jun 17 '20 at 14:29
  • Repeating what @John Gordon just said, "[In these micro-benchmarks, math.sqrt will be slower, because of the slight time it takes to lookup the sqrt in the math namespace](https://stackoverflow.com/a/327024/8286364)" – Adam Boinet Jun 17 '20 at 14:31
  • @AdamBoinet `sqrt` has already been looked up in the setup. – chepner Jun 17 '20 at 14:32

3 Answers

80

Python 3 is precomputing the value of 2 ** 0.5 at compile time, since both operands are known at that time. The value of sqrt, however, is not known at compile time, so the computation necessarily occurs at run time.

You aren't timing how long it takes to compute 2 ** 0.5, but just the time it takes to load a constant.
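
One way to convince yourself that only a constant load is being timed (a sketch, not from the original answer; absolute numbers vary by machine and Python version) is to benchmark a bare float literal alongside the two original statements, using the timeit module instead of the command line:

import timeit

# Both of these reduce to loading a single pre-computed constant,
# so they should come out roughly the same on a given machine.
print(timeit.timeit("2 ** 0.5"))
print(timeit.timeit("1.4142135623730951"))

# This one still performs a name lookup and a function call on every pass.
print(timeit.timeit("sqrt(2)", setup="from math import sqrt"))

You would expect the first two numbers to be close to each other and the third to be noticeably larger, mirroring the gap seen in the question.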

A fairer comparison would be

$ python3 -m timeit -s "from math import sqrt" "sqrt(2)"
5000000 loops, best of 5: 50.7 nsec per loop
$ python3 -m timeit -s "x = 2" "x**0.5"
5000000 loops, best of 5: 56.7 nsec per loop

I'm not sure if there is a way to show unoptimized byte code. Python starts by parsing source code into an abstract syntax tree (AST):

>>> import ast
>>> ast.dump(ast.parse("2**0.5"))
'Module(body=[Expr(value=BinOp(left=Num(n=2), op=Pow(), right=Num(n=0.5)))])'

Update: In recent CPython versions this particular optimization is applied directly to the abstract syntax tree, so the byte code is generated from something like

Module(body=Num(n=1.4142135623730951))

The ast module doesn't appear to apply the optimization.
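
One indirect way to watch this happen (a sketch, not from the original answer; the exact contents of co_consts and the disassembly offsets depend on the CPython version) is to compare what ast.parse reports with what compile() actually produces:

import ast
import dis

# The parser alone still shows the BinOp node for the power expression...
print(ast.dump(ast.parse("2 ** 0.5", mode="eval")))

# ...but by the time compile() has produced a code object, the folded value
# is already stored among its constants, before dis is ever involved.
code = compile("2 ** 0.5", "<string>", "eval")
print(code.co_consts)   # expected to contain 1.4142135623730951
dis.dis(code)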

The compiler takes the AST and generates unoptimized byte code; in this case, I believe it would look something like this (based on the output of dis.dis("2**x") and dis.dis("x**0.5")):

LOAD_CONST       0  (2)
LOAD_CONST       1  (0.5)
BINARY_POWER
RETURN_VALUE
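
Those reference disassemblies are easy to reproduce (a sketch, not from the original answer; opcode names and offsets differ between Python versions):

import dis

# With a name on either side, that operand's value isn't known at compile
# time, so the power operation has to survive into the byte code.
dis.dis("2 ** x")
dis.dis("x ** 0.5")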

The raw byte code is then subject to modification by the peephole optimizer, which can reduce these four instructions to two, as shown by the dis module.

Either way, the byte code the compiler ultimately produces from the AST already contains the folded constant:

>>> dis.dis("2**0.5")
  1           0 LOAD_CONST               0 (1.4142135623730951)
              2 RETURN_VALUE

[While the following paragraph was originally written with the idea of optimizing byte code in mind, the reasoning applies to optimizing the AST as well.]

Since nothing at run time affects how the two LOAD_CONST instructions and the following BINARY_POWER instruction are evaluated (for example, there are no name lookups), the peephole optimizer can take this sequence, perform the computation of 2**0.5 itself, and replace the first three instructions with a single LOAD_CONST that loads the result immediately.
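
As a small illustration of what that replacement amounts to (a sketch using a throwaway helper function, approx_sqrt2, which is not part of the original answer):

import dis

def approx_sqrt2():
    # Both operands are literals, so the whole expression is folded into a
    # constant before the byte code for this function is finalized.
    return 2 ** 0.5

dis.dis(approx_sqrt2)
print(approx_sqrt2.__code__.co_consts)   # expected to contain 1.4142135623730951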

chepner

  • Yep! Makes a lot of sense! Thanks! – Peter Leikauf Jun 17 '20 at 14:34
  • This optimization actually happens at the AST level in recent Python versions (see [`Python/ast_opt.c`](https://github.com/python/cpython/blob/v3.8.3/Python/ast_opt.c)), so "unoptimized" bytecode isn't actually generated any more. – user2357112 supports Monica Jun 23 '20 at 22:40
  • Ah, OK, I'd seen references to doing optimizations at the AST level, but thought it was a future enhancement since the `ast` module doesn't (yet) take advantage of it. – chepner Jun 24 '20 at 12:13
25

To complement chepner's answer, here's proof from the disassembler:

Python 3.5.3 (default, Sep 27 2018, 17:25:39) 
[GCC 6.3.0 20170516] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import dis
>>> dis.dis('2 ** 0.5')
  1           0 LOAD_CONST               2 (1.4142135623730951)
              3 RETURN_VALUE

vs.

>>> dis.dis('sqrt(2)')
  1           0 LOAD_NAME                0 (sqrt)
              3 LOAD_CONST               0 (2)
              6 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
              9 RETURN_VALUE
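
As a related sketch (not part of this answer; the exact opcodes vary by version), accessing sqrt through the module adds an attribute lookup on top of the name lookup, which is the extra cost mentioned in the comments under the question:

import dis

# One more lookup per call than the plain sqrt(2) case disassembled above.
dis.dis("math.sqrt(2)")
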
mkrieger1
2
>>> import dis
>>> dis.dis('44442.3123 ** 0.5')
          0 LOAD_CONST               0 (210.81345379268373)
          2 RETURN_VALUE

I do not believe that 44442.3123 ** 0.5 is precomputed at compile time; we had better check the AST of the code:

>>> import ast
>>> import math
>>> code = ast.parse("2**2")
>>> ast.dump(code)
'Module(body=[Expr(value=BinOp(left=Num(n=2), op=Pow(), right=Num(n=2)))])'
>>> code = ast.parse("math.sqrt(3)")
>>> ast.dump(code)
"Module(body=[Expr(value=Call(func=Attribute(value=Name(id='math', ctx=Load()), attr='sqrt', ctx=Load()), args=[Num(n=3)], keywords=[]))])"
akostrikov

  • `ast.parse` simply doesn't perform the peephole optimization. (I believe the optimizer uses the byte code generated from the parse tree as its input.) – chepner Jun 23 '20 at 21:54
  • @chepner You are right. I'd been wondering why it compiles to a LOAD_CONST, and your updated answer clearly explains that. Thanks for the update. – akostrikov Jun 24 '20 at 16:38