Why is Python 3 is considerably slower than Python 2?

Question

I've been trying to understand why Python 3 is actually taking much time compared with Python 2 in certain situations, below are few cases I've verified from python 3.4 to python 2.7.

Note: I've gone through some of the questions like Why is there no xrange function in Python3? and loop in python3 much slower than python2 and Same code slower in Python3 as compared to Python2, but I feel that I didn't get the actual reason behind this issue.

I've tried this piece of code to show how it is making difference:

MAX_NUM = 3*10**7

# This is to make compatible with py3.4.
try:
    xrange
except:
    xrange = range


def foo():
    i = MAX_NUM
    while i> 0:
        i -= 1

def foo_for():
    for i in xrange(MAX_NUM):
        pass

When I've tried running this programme with py3.4 and py2.7 I've got below results.

Note: These stats came through a 64 bit machine with 2.6Ghz processor and calculated the time using time.time() in single loop.

Output : Python 3.4
-----------------
2.6392083168029785
0.9724123477935791

Output: Python 2.7
------------------
1.5131521225
0.475143909454

I really don't think that there has been changes applied to while or xrange from 2.7 to 3.4, I know range has been started acting as to xrange in py3.4 but as documentation says

range() now behaves like xrange() used to behave, except it works with values of arbitrary size. The latter no longer exists.

this means change from xrange to range is very much equal to a name change but working with arbitrary values.

I've verified disassembled byte code as well.

Below is the disassembled byte code for function foo():

Python 3.4:
--------------- 

 13           0 LOAD_GLOBAL              0 (MAX_NUM)
              3 STORE_FAST               0 (i)

 14           6 SETUP_LOOP              26 (to 35)
        >>    9 LOAD_FAST                0 (i)
             12 LOAD_CONST               1 (0)
             15 COMPARE_OP               4 (>)
             18 POP_JUMP_IF_FALSE       34

 15          21 LOAD_FAST                0 (i)
             24 LOAD_CONST               2 (1)
             27 INPLACE_SUBTRACT
             28 STORE_FAST               0 (i)
             31 JUMP_ABSOLUTE            9
        >>   34 POP_BLOCK
        >>   35 LOAD_CONST               0 (None)
             38 RETURN_VALUE

python 2.7
-------------

 13           0 LOAD_GLOBAL              0 (MAX_NUM)
              3 STORE_FAST               0 (i)

 14           6 SETUP_LOOP              26 (to 35)
        >>    9 LOAD_FAST                0 (i)
             12 LOAD_CONST               1 (0)
             15 COMPARE_OP               4 (>)
             18 POP_JUMP_IF_FALSE       34

 15          21 LOAD_FAST                0 (i)
             24 LOAD_CONST               2 (1)
             27 INPLACE_SUBTRACT    
             28 STORE_FAST               0 (i)
             31 JUMP_ABSOLUTE            9
        >>   34 POP_BLOCK           
        >>   35 LOAD_CONST               0 (None)
             38 RETURN_VALUE

And below is the disassembled byte code for function foo_for():

Python: 3.4

 19           0 SETUP_LOOP              20 (to 23)
              3 LOAD_GLOBAL              0 (xrange)
              6 LOAD_GLOBAL              1 (MAX_NUM)
              9 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             12 GET_ITER
        >>   13 FOR_ITER                 6 (to 22)
             16 STORE_FAST               0 (i)

 20          19 JUMP_ABSOLUTE           13
        >>   22 POP_BLOCK
        >>   23 LOAD_CONST               0 (None)
             26 RETURN_VALUE


Python: 2.7
-------------

 19           0 SETUP_LOOP              20 (to 23)
              3 LOAD_GLOBAL              0 (xrange)
              6 LOAD_GLOBAL              1 (MAX_NUM)
              9 CALL_FUNCTION            1
             12 GET_ITER            
        >>   13 FOR_ITER                 6 (to 22)
             16 STORE_FAST               0 (i)

 20          19 JUMP_ABSOLUTE           13
        >>   22 POP_BLOCK           
        >>   23 LOAD_CONST               0 (None)
             26 RETURN_VALUE

If we compare both the byte codes they've produced the same disassembled byte code.

Now I'm wondering what change from 2.7 to 3.4 is really causing this huge change in execution time in the given piece of code.

Running this just once with the *whole interpreter setup* won't tell you anything. Use `timeit.timeit()` to run time trials instead. — Martijn Pieters, Jul 21 '15 at 20:15

Martijn Pieters · Accepted Answer · 2015-07-21T20:37:07.543

39

The difference is in the implementation of the int type. Python 3.x uses the arbitrary-sized integer type (long in 2.x) exclusively, while in Python 2.x for values up to sys.maxint a simpler int type is used that uses a simple C long under the hood.

Once you limit your loops to long integers, Python 3.x is faster:

>>> from timeit import timeit
>>> MAX_NUM = 3*10**3
>>> def bar():
...     i = MAX_NUM + sys.maxsize
...     while i > sys.maxsize:
...         i -= 1
...

Python 2:

>>> timeit(bar, number=10000)
5.704327821731567

Python 3:

>>> timeit(bar, number=10000)
3.7299320790334605

I used sys.maxsize as sys.maxint was dropped from Python 3, but the integer value is basically the same.

The speed difference in Python 2 is thus limited to the first (2 ** 63) - 1 integers on 64-bit, (2 ** 31) - 1 integers on 32 bit systems.

Since you cannot use the long type with xrange() on Python 2, I did not include a comparison for that function.

edited Jul 21 '15 at 20:37

answered Jul 21 '15 at 20:24

Martijn Pieters

889,049
245
3,507
2,997

11

Is there any reason you would not want to optimize for integers below 2**63? They seem to be the most frequently used... – thebjorn Jul 21 '15 at 20:30
@thebjorn: the simplification of using one `int` type was more important. Besides, if you are doing `for` loops over that large a range you are probably doing something wrong *anyway*. – Martijn Pieters Jul 21 '15 at 20:32
8

But doesn't this choice make all integer calculations on e.g. array indexes etc. slower too? It seems that other languages (Smalltalk, Lisp, Haskell, Java) go to some lengths in order to optimize the boxing/unboxing of integers, are those optimizations superfluous in a language like Python? – thebjorn Jul 21 '15 at 20:45
@thebjorn: sequences are explicitly limited to `sys.maxsize` anyway and for up to 2**30 (one digit) the code is highly optimised to just return the first digit from the Python `int` object. – Martijn Pieters Jul 21 '15 at 20:55
2

@MartijnPieters It still doesn't give the same performance, as the OP's tests show. – quant_dev Jan 05 '16 at 15:39
1

@quant_dev: Of course those tests don't give the same performance. I never said they could be made to give the same performance. When you are using *large integers* past `sys.maxint` you get the same performance *in Python 2* because the same codepaths are involved then. – Martijn Pieters Jan 05 '16 at 15:42

Why is Python 3 is considerably slower than Python 2?

1 Answers1

Linked