Python 2D list performance, without numpy

Question

I am trying to create a 2D list in Python. I found two possibilities.

def cArray(size):
    c = [[0. for i in range(size)] for j in range(size)]
    return c

def npArray(size):
    np = numpy.zeros((size,size))
    return np

Now both of these functions give the correct answer. The problem here is with the performance. I ran both of these using timeit, and here are my results:

list size is 5000
number of times run is 5

cArray average time: 3.73241295815
npArray average time: 0.210782241821

So obviously, I would like to avoid the first solution, especially since this will be running for sizes up to 100k. However, I also do not want to use too many dependencies. Is there a way for me to efficiently create a 2D array, without numpy? It doesn't need to be exactly up to speed, as long as it's not 17 times slower.

If this shocks you, try measuring memory consumption too. It may not be 17x larger but it definitely will be awful. — , Aug 22 '12 at 18:47
ctypes could help, but surely not as much as numpy see http://stackoverflow.com/questions/4145775/how-do-i-convert-a-python-list-into-a-c-array-by-using-ctypes this question — unddoch, Aug 22 '12 at 18:50
.. sizes up to 100k? Then I hope your array either (1) isn't square despite your example being square, or (2) is very sparse, because that's a lot of cells to keep track of otherwise. — DSM, Aug 22 '12 at 18:59
I really think that this question cannot be truly answered without knowing what is done to the data. There is also import array which is limited to 1D, but a list of arrays is possible. To throw it in [[0]*size for _ in xrange(size)], or array('d', [0]) instaed of [0] (which is type aware and thus takes much less space), eliminates one loop. — seberg, Aug 22 '12 at 19:39
@Sebastian: this is for an implementation of a [sequence alignment algorithm](http://www.ks.uiuc.edu/Training/SumSchool/materials/sources/tutorials/07-bioinformatics/seqlab-html/node6.html) — Phil, Aug 22 '12 at 20:16
@flep, anyways, has been said... numpy is probably the only way to go (since it has many fast vectorized functions), unless for this kind of thing you are not doomed to go to a lower language then python anyways. — seberg, Aug 22 '12 at 20:31

score 5 · Accepted Answer · answered Aug 22 '12 at 18:57

So obviously, I would like to avoid the first solution, especially since this will be running for sizes up to 100k. However, I also do not want to use too many dependencies.

You must choose which of these is more important to you. Numpy has better performance precisely because it doesn't use the builtin Python types and uses its own types that are optimized for numerical work. If your data are going to be numeric and you're going to have 100k rows/columns, you will see a gigantic performance increase with numpy. If you want to avoid the numpy dependency, you will have to live with reduced performance. (Obviously you can always write your own Python libraries or C extensions to optimize for your particular use case, but these will then be dependencies like any other.)

Personally I would recommend you just use numpy. It is so widely used that anyone who is considering using a library that deals with 100k multidimensional arrays probably already has numpy installed.

Roland Smith · Answer 2 · 2012-08-22T19:49:08.663

I tried a couple of alternatives;

Edit: The original eArray was faulty, it created references to the same list...

Edit2: Added array.array as suggested by Sebastian.

import time
import numpy as np
import array

t1 = 0

def starttimer():
    global t1
    t1 = time.clock()

def stoptimer(s):
    t2 = time.clock()
    print 'elapsed time for "{}": {:.3f} seconds'.format(s, t2-t1)

def cArray(size):
    c = [[0. for i in range(size)] for j in range(size)]
    return c

def dArray(size):
    d = [[0. for i in xrange(size)] for j in xrange(size)]
    return d

def eArray2(size):
    return [[0.]*size for j in xrange(size)]

def fArray(size):
    return np.zeros((size,size))

def gArray(size):
    return [array.array('d', [0])*size for j in xrange(size)]

sz = 5000

starttimer()
cArray(sz)
stoptimer('cArray')

starttimer()
dArray(sz)
stoptimer('dArray')

starttimer()
fArray(sz)
stoptimer('fArray')

starttimer()
gArray(sz)
stoptimer('gArray')

The results (cpython 2.7.3 on FreeBSD amd64, if anyone cares):

> python tmp/test.py
elapsed time for "cArray": 2.312 seconds
elapsed time for "dArray": 1.945 seconds
elapsed time for "eArray2": 0.680 seconds
elapsed time for "fArray": 0.180 seconds
elapsed time for "gArray": 0.695 seconds
> python tmp/test.py
elapsed time for "cArray": 2.312 seconds
elapsed time for "dArray": 1.914 seconds
elapsed time for "eArray2": 0.680 seconds
elapsed time for "fArray": 0.180 seconds
elapsed time for "gArray": 0.695 seconds
> python tmp/test.py
elapsed time for "cArray": 2.328 seconds
elapsed time for "dArray": 1.906 seconds
elapsed time for "eArray2": 0.680 seconds
elapsed time for "fArray": 0.180 seconds
elapsed time for "gArray": 0.703 seconds

This timing code won't work: `t1 = time.clock()` sets a local variable. That's why `d` and `e` have the same time when `e` should be much faster. If you fix this (e.g. `global t1` in `starttimer`) you get more plausible results. It's usually easier to use the `timeit` module. As an aside, `eArray` isn't really a fair comparison because it doesn't actually make 5000^2 cells. — DSM, Aug 22 '12 at 19:07
Also keep in mind that they don't do the same thing. For example, try changing an element of your `eArray` and look at the result. — Joe Kington, Aug 22 '12 at 19:18
@JoeKington I've changed eArray. I think I realized where it went wrong. :-/ — Roland Smith, Aug 22 '12 at 19:27
since you are at it, maybe add import array, array.array('d', [0])*size instead of [0]*size for eArray. — seberg, Aug 22 '12 at 19:41

score 2 · Answer 3 · answered Aug 22 '12 at 18:51

2

If you want to go really low level, you can use ctypes:

a = (ctypes.c_int * 5000 * 5000)()

But NumPy is available on the majority of platforms where Python runs.

answered Aug 22 '12 at 18:51

ecatmur

137,771
23
263
343

score 1 · Answer 4 · answered Nov 11 '18 at 10:37

My use-case is that at some online judge platform, the library dependency is limited, but there is need for 2D array when doing dynamical programming (also hard to vectorize). My python codes there often get Time Limit Exceeded.

There are things to reminder:

Although python list is a array of pointers, the naive objects are very quick.
Using compact structure like numpy maybe fast when creating object, but accessing elements will cost extra overhead unlike naive python object,
unless some JIT things like Numba are used.

Test code with a toy DP example:

import timeit

size = 1000

range_init_0 = "size={}".format(size)
range_init_1 = "a = [[0 for i in range(size)] for j in range(size)]"

multi_init_0 = "size={}".format(size)
multi_init_1 = "a = [[0]*size for _ in range(size)]"

numpy_init_0 = "from numpy import zeros; size={}".format(size)
numpy_init_1 = "a = zeros((size, size), dtype=int)"

array_init_0 = "from array import array; size={}".format(size)
array_init_1 = "a = [array('d', [0])*size for j in range(size)]"

ctyps_init_0 = "from ctypes import c_int; size={}".format(size)
ctypes_init_1 = "a = (c_int * size * size)()"

dp = '''
MOD = int(1e9+7)
for i in range(size):
    a[i][0] = 1
for j in range(size):
    a[0][i] = 1
for i in range(1, size):
    for j in range(1, size):
        a[i][j] = (a[i][j] + a[i-1][j] + a[i][j-1]) % MOD
'''

def test(name, init_0, init_1, dp, n=10):
    t = timeit.timeit(init_1, setup=init_0, number=n)
    print("{} initial time:\t{:.8f}".format(name, t))
    t = timeit.timeit(dp, setup=init_0 + '\n' + init_1, number=n)
    print("{} calculate time:\t{:.8f}".format(name, t))

test("range", range_init_0, range_init_1, dp)
test("multi", multi_init_0, multi_init_1, dp)
test("numpy", numpy_init_0, numpy_init_1, dp)
test("array", array_init_0, array_init_1, dp)
test("ctypes", ctyps_init_0, ctypes_init_1, dp)

print('------')
numba_init_0 = '''
import numpy as np
size = {}
a = np.zeros((size, size), dtype=np.int32)
'''.format(size)
numba_init_1 = '''
import numba
def dp1(a):
    size = len(a)
    MOD = int(1e9+7)
    for i in range(size):
        a[i][0] = 1
    for j in range(size):
        a[0][i] = 1
    for i in range(1, size):
        for j in range(1, size):
            a[i][j] = (a[i][j] + a[i-1][j] + a[i][j-1]) % MOD
dp_jit = numba.jit('void(i4[:,:])')(dp1)
'''
dp = "dp_jit(a)"

test("numba", numba_init_0, numba_init_1, dp)

Results:

range initial time:     0.56781153
range calculate time:   5.08359793
multi initial time:     0.03682878
multi calculate time:   5.14657282
numpy initial time:     0.00883761
numpy calculate time:   12.15619322
array initial time:     0.02656035
array calculate time:   5.27542352
ctypes initial time:    0.00523795
ctypes calculate time:  7.88469346
------
numba initial time:     2.98394509
numba calculate time:   0.05321887

(Numba initialization time here doesn't include numpy initialization)

As we can see, both numpy and ctypes are slower than native lists when computing.

Numba JIT costs some time, but the computation time is significantly shorter.

(And don't use Python for 2D dynamical programming at Online Judgement Platform!)

Python 2D list performance, without numpy

4 Answers4