Python nested for loop faster than single for loop

Question

Why is the nested for loop faster than the single for loop?

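from time import time

start = time()

k = 0
m = 0

# Nested version: 1000 * 1000 * 100 = 100000000 increments
for i in range(1000):
    for j in range(1000):
        for l in range(100):
            m += 1

# Single-loop version: same number of increments (swap the comments to time it)
#for i in range(100000000):
#    k += 1

# Python 2 print statement; prints the elapsed time in whole seconds
print int(time() - start)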

For the single for loop I get a time of 14 seconds, and for the nested for loop 10 seconds.

For context, read [this](https://stackoverflow.com/questions/30081275/why-is-1000000000000000-in-range1000000000000001-so-fast-in-python-3/30081318#30081318). — timgeb, Nov 12 '18 at 15:41

score 1 · Answer 1 · answered Nov 12 '18 at 15:40

It is because you are using Python 2. There, range builds a full list of numbers and has to allocate it. With the nested loops, the lists hold 1000 + 1000 + 100 = 2100 elements at any one time, while the single loop needs one list of 100000000 elements, which is far bigger.

In Python 2 it is better to use xrange(), which produces the numbers lazily, one at a time, instead of building and allocating a list of them.
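The difference between a lazy range and a fully built list can be sketched as follows. This is only an illustration, assuming Python 3, where `range` is lazy like Python 2's `xrange` and `list(range(n))` stands in for what Python 2's `range(n)` returned. The size `n` is scaled down from the question's 100000000, and the variable names are made up for this example.

```python
import sys

n = 1000000  # scaled down from the question's 100000000

lazy = range(n)         # no list is built; numbers are produced on demand
eager = list(range(n))  # stand-in for Python 2's range(n): a real list

lazy_size = sys.getsizeof(lazy)
# getsizeof counts the list object only, not the int objects it refers to,
# so the real footprint of the list is even larger than this number.
eager_size = sys.getsizeof(eager)

print(lazy_size, eager_size)
```

The lazy object stays the same small size no matter how large `n` is, while the list grows with `n`.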

Additionally, for further information you can read this related question, which covers Python 3.

score 1 · Answer 2 · answered Nov 12 '18 at 15:40

In Python 2, range creates a list containing all of the numbers. Try swapping range with xrange and you should see them take comparable time, or the single loop may even run a bit faster.
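One way to try the swap so that the same file runs on both Python 2 and Python 3 is sketched below. The names `lazy_range`, `nested_count` and `single_count` are hypothetical helpers introduced for this example, and the loop bounds are scaled down from the question so that it finishes quickly.

```python
try:
    lazy_range = xrange  # Python 2: lazy counterpart of range
except NameError:
    lazy_range = range   # Python 3: range is already lazy

def nested_count(a, b, c):
    """Count iterations of three nested loops, as in the question."""
    m = 0
    for i in lazy_range(a):
        for j in lazy_range(b):
            for l in lazy_range(c):
                m += 1
    return m

def single_count(n):
    """Count iterations of one flat loop."""
    k = 0
    for i in lazy_range(n):
        k += 1
    return k

# Scaled down from 1000 * 1000 * 100 in the question.
print(nested_count(100, 100, 10), single_count(100 * 100 * 10))
```

Both functions perform the same number of increments, so any timing difference between them comes from loop overhead rather than from list building.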

Answered by doodhwala.

score 1 · Answer 3 · answered Nov 12 '18 at 15:44

During the nested loops Python has to allocate 1000 + 1000 + 100 = 2100 values for the counters, whereas in the single loop it has to allocate 100000000 (100M). This is what takes the extra time.

I have tested this in Python 3.6 and the behaviour is similar; I would say it is very likely a memory allocation issue.

score 1 · Accepted Answer · answered Nov 12 '18 at 16:28

The relevant context is explained in this topic.

In short, range(100000000) builds a huge list in Python 2, whereas with the nested loops you only build lists with a total of 1000 + 1000 + 100 = 2100 elements. In Python 3, range is smarter and lazy like xrange in Python 2.
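The memory side of this can be observed with a rough sketch like the one below. It assumes Python 3 and uses `list(range(...))` to emulate Python 2's list-building `range`. The helper `peak_bytes` and the scaled-down loop bounds are assumptions made for this illustration, not part of the original measurements.

```python
import tracemalloc

def peak_bytes(func):
    """Return the peak number of bytes allocated while func() runs."""
    tracemalloc.start()
    try:
        func()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak

def nested():
    # Small lists are rebuilt many times, but only a few are alive at once.
    k = 0
    for i in list(range(100)):
        for j in list(range(100)):
            for l in list(range(10)):
                k += 1
    return k

def single():
    # One big list must exist in full before the loop can start.
    k = 0
    for i in list(range(100000)):
        k += 1
    return k

peak_nested = peak_bytes(nested)
peak_single = peak_bytes(single)
print(peak_nested, peak_single)
```

Note that the nested version still creates many short-lived lists; it is the peak amount of memory held at one time that stays small.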

Here are some timings for the following code. Absolute runtime depends on the system, but comparing the values with each other is valuable.

import timeit

runs = 100

code = '''k = 0
for i in range(1000):
    for j in range(1000):
        for l in range(100):
            k += 1'''

print(timeit.timeit(stmt=code, number=runs))

code = '''k = 0
for i in range(100000000):
    k += 1'''

print(timeit.timeit(stmt=code, number=runs))

Outputs (total seconds for the 100 runs of each snippet):

CPython 2.7 - range

264.650791883
372.886064053

Interpretation: building huge lists takes time.

CPython 2.7 - range exchanged with xrange

231.975350142
221.832423925

Interpretation: almost equal, as expected. (Nested for loops should have slightly larger overhead than a single for loop.)

CPython 3.6 - range

365.20924194483086
437.26447860104963

Interpretation: Interesting! I did not expect this. Anyone?
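To repeat this kind of comparison quickly on another machine, a scaled-down harness along the following lines may help. It is a sketch under stated assumptions: it targets Python 3, uses a placeholder name `R` for the range implementation, and emulates Python 2's list-building `range` with `list(range(n))`. The loop sizes and repeat counts are arbitrary choices for this example, so the numbers are not comparable with the ones above.

```python
import timeit

NESTED = '''k = 0
for i in R(100):
    for j in R(100):
        for l in R(10):
            k += 1'''

SINGLE = '''k = 0
for i in R(100000):
    k += 1'''

# R is bound in the setup code, so the same statements run with either variant.
SETUPS = {
    "lazy range": "R = range",
    "eager list": "R = lambda n: list(range(n))",
}

results = {}
for label, setup in SETUPS.items():
    for name, stmt in (("nested", NESTED), ("single", SINGLE)):
        # Take the minimum of several repeats to reduce timing noise.
        best = min(timeit.repeat(stmt=stmt, setup=setup, repeat=3, number=5))
        results[(label, name)] = best
        print(label, name, round(best, 4))
```

Taking the minimum over repeats, rather than a single run, makes the comparison less sensitive to other activity on the system.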
