My use case: on some online judge platforms the available libraries are limited, but dynamic programming often needs a 2D array (and such DP is hard to vectorize). My Python solutions there often get Time Limit Exceeded.
Some things to keep in mind:
- Although a Python list is an array of pointers to objects, reading and writing elements of plain Python lists is very fast.
- Compact structures such as NumPy arrays may be faster to create, but every element access from Python code costs extra overhead (the raw stored value has to be converted into a new Python object), unlike a plain list. This changes only when a JIT compiler such as Numba is used.
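A minimal sketch of that conversion overhead, using only the standard library (`array` and `ctypes`; NumPy behaves the same way, returning a fresh numpy scalar on each read). A list hands back the object it already stores, while compact containers store raw machine values and build a new Python int on every access. This relies on CPython behavior: the value `10**6` is used because CPython caches small ints (-5 to 256), which would hide the effect. The helper name `returns_same_object` is hypothetical.

```python
from array import array
from ctypes import c_int64

BIG = 10**6  # outside CPython's small-int cache, so new objects are distinguishable

lst = [BIG]
arr = array('q', [BIG])   # 'q' = signed 64-bit C integer, stored unboxed
cts = (c_int64 * 1)(BIG)  # ctypes array holding one 64-bit C integer

def returns_same_object(container):
    """True if two reads of container[0] yield the identical Python object."""
    return container[0] is container[0]

results = {
    "list": returns_same_object(lst),
    "array": returns_same_object(arr),
    "ctypes": returns_same_object(cts),
}
for name, same in results.items():
    print("{}: same object on each access = {}".format(name, same))
```

Allocating a new int object per read is a large part of why `array`, `ctypes` and NumPy do not beat plain lists in an element-by-element Python loop.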
Benchmark code with a toy DP example (each reported time is the total over n = 10 runs):
```python
import timeit

size = 1000

# Each method is described by two strings:
#   *_init_0: imports and the size constant (setup, not timed)
#   *_init_1: the statement that allocates the size x size table (timed)
range_init_0 = "size={}".format(size)
range_init_1 = "a = [[0 for i in range(size)] for j in range(size)]"
multi_init_0 = "size={}".format(size)
multi_init_1 = "a = [[0]*size for _ in range(size)]"
numpy_init_0 = "from numpy import zeros; size={}".format(size)
numpy_init_1 = "a = zeros((size, size), dtype=int)"
array_init_0 = "from array import array; size={}".format(size)
array_init_1 = "a = [array('d', [0])*size for j in range(size)]"  # 'd' = C double
ctypes_init_0 = "from ctypes import c_int; size={}".format(size)
ctypes_init_1 = "a = (c_int * size * size)()"

# Toy DP: first row and first column are 1, every other cell adds
# its upper and left neighbours (mod 1e9+7).
dp = '''
MOD = int(1e9+7)
for i in range(size):
    a[i][0] = 1
for j in range(size):
    a[0][j] = 1
for i in range(1, size):
    for j in range(1, size):
        a[i][j] = (a[i][j] + a[i-1][j] + a[i][j-1]) % MOD
'''

def test(name, init_0, init_1, dp, n=10):
    # Time n allocations of the table, then n runs of the DP on one table.
    t = timeit.timeit(init_1, setup=init_0, number=n)
    print("{} initial time:\t{:.8f}".format(name, t))
    t = timeit.timeit(dp, setup=init_0 + '\n' + init_1, number=n)
    print("{} calculate time:\t{:.8f}".format(name, t))

test("range", range_init_0, range_init_1, dp)
test("multi", multi_init_0, multi_init_1, dp)
test("numpy", numpy_init_0, numpy_init_1, dp)
test("array", array_init_0, array_init_1, dp)
test("ctypes", ctypes_init_0, ctypes_init_1, dp)
print('------')

numba_init_0 = '''
import numpy as np
size = {}
a = np.zeros((size, size), dtype=np.int32)
'''.format(size)
numba_init_1 = '''
import numba
def dp1(a):
    size = len(a)
    MOD = int(1e9+7)
    for i in range(size):
        a[i][0] = 1
    for j in range(size):
        a[0][j] = 1
    for i in range(1, size):
        for j in range(1, size):
            a[i][j] = (a[i][j] + a[i-1][j] + a[i][j-1]) % MOD
# Passing a signature makes Numba compile eagerly, so compilation is
# counted in the "initial" time rather than in the first dp_jit(a) call.
dp_jit = numba.jit('void(i4[:,:])')(dp1)
'''
dp = "dp_jit(a)"
test("numba", numba_init_0, numba_init_1, dp)
```
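With the first row and the first column both set to 1, the toy DP is Pascal's triangle laid out on a grid, so every cell equals the binomial coefficient C(i+j, i) reduced modulo 1e9+7. The sketch below runs the same recurrence on plain lists for a small size and checks it against `math.comb` (Python 3.8+); the function name `toy_dp` is hypothetical and not part of the benchmark.

```python
from math import comb

MOD = 10**9 + 7

def toy_dp(size):
    """Same recurrence as the benchmark, on a list-of-lists table."""
    a = [[0] * size for _ in range(size)]
    for i in range(size):
        a[i][0] = 1          # first column
    for j in range(size):
        a[0][j] = 1          # first row
    for i in range(1, size):
        for j in range(1, size):
            a[i][j] = (a[i][j] + a[i-1][j] + a[i][j-1]) % MOD
    return a

table = toy_dp(60)
# Every cell is a binomial coefficient reduced modulo MOD
# (size 60 is large enough for the modulus to matter).
ok = all(table[i][j] == comb(i + j, i) % MOD
         for i in range(60) for j in range(60))
print(ok)  # True
```

This is only a correctness check of what the benchmark computes; it says nothing about speed.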
Results (seconds, total over 10 runs, size = 1000):

| Method | Initialization time | DP computation time |
|---|---|---|
| range | 0.56781153 | 5.08359793 |
| multi | 0.03682878 | 5.14657282 |
| numpy | 0.00883761 | 12.15619322 |
| array | 0.02656035 | 5.27542352 |
| ctypes | 0.00523795 | 7.88469346 |
| numba | 2.98394509 | 0.05321887 |
(The Numba initialization time covers importing Numba and eagerly compiling `dp1`, repeated n = 10 times; it does not include allocating the NumPy array.)
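As we can see, both NumPy (about 2.4× slower) and ctypes (about 1.5× slower) are slower than plain lists for the DP computation, while `array.array` is roughly on par with lists.
Numba's JIT compilation takes some time, but the computation itself is roughly two orders of magnitude faster.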
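To put the Numba numbers in per-run terms, here is a back-of-the-envelope calculation from the measured totals, dividing by n = 10. It treats the whole Numba initialization time as compile cost, which slightly overstates it because the first of the ten repetitions also pays for `import numba`; the per-run figures are averages, not separate measurements.

```python
n = 10  # number of repetitions used by test()

# Totals in seconds, copied from the results above
list_dp = 5.08359793       # "range" calculate time
numba_init = 2.98394509    # import + eager compilation, repeated n times
numba_dp = 0.05321887      # numba calculate time

per_run_list = list_dp / n
per_run_compile = numba_init / n
per_run_numba = numba_dp / n

speedup = list_dp / numba_dp
# Cost of one compilation plus one JIT run vs. one pure-list run
numba_one_shot = per_run_compile + per_run_numba

print("list DP per run:         {:.4f} s".format(per_run_list))
print("numba compile per run:   {:.4f} s".format(per_run_compile))
print("numba DP per run:        {:.4f} s".format(per_run_numba))
print("DP speedup:              {:.1f}x".format(speedup))
print("numba compile + 1 run:   {:.4f} s".format(numba_one_shot))
```

At size = 1000, even one compilation plus one run (about 0.30 s) comes out cheaper than a single list-based run (about 0.51 s) on the machine used for these measurements, so the JIT cost is recovered within one DP of this size.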
(And don't use Python for 2D dynamic programming on online judge platforms!)