
I would like to use a memmap-allocated NumPy array that can be processed in parallel with joblib, i.e. shared between different processes. But I also want the big array to be stored entirely in RAM, avoiding the disk reads and writes that memmap incurs. I have enough RAM to hold the whole array, but allocating it with np.zeros() instead of memmap complicates parallelization, since that memory is local to a single process. How do I achieve my goal?

Example:

import os
import numpy as np
x_memmap = os.path.join(folder, 'x_memmap')
x_shared = np.memmap(x_memmap, dtype=np.float32, shape=(100000, 8, 8, 32), mode='w+')

Later:

from joblib import Parallel, delayed
n = -(-N // number_of_cores)  # ceiling division so the slices cover all N = x_shared.shape[0] rows
slices = [slice(i * n, min(N, (i + 1) * n)) for i in range(number_of_cores)]
Parallel(n_jobs=number_of_cores)(delayed(my_job)(x_shared[sl, :]) for sl in slices)

If I instead allocate x_shared with np.zeros, as shown below, I can't use this parallelization.

x_shared = np.zeros(dtype=np.float32, shape=(100000, 8, 8, 32))
danny
  • _using np.zeros() instead of memmap complicates parallelization._ What do you mean, can you elaborate? Please clarify your question. See [ask], [help/on-topic]. – AMC May 11 '20 at 19:26
  • np.zeros() allocates memory local to one process. I want to have a shared memory between different processes -- memmap can do that but it writes the array to file which I want to avoid for performance reasons. Also, I can't use threads because of the global lock. – danny May 11 '20 at 19:31
  • 1
    The [`multiprocessing.shared_memory` docs](https://docs.python.org/3/library/multiprocessing.shared_memory.html) have an example of making a NumPy array backed by shared memory. – user2357112 supports Monica May 11 '20 at 19:51
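
A minimal sketch of what the last comment points to, assuming Python 3.8+. The worker function, the two-process split and the reduced shape are placeholders for illustration, not taken from the question:

from multiprocessing import Process, shared_memory
import numpy as np

# Placeholder shape/dtype; the question uses (100000, 8, 8, 32) float32.
shape, dtype = (1000, 8, 8, 32), np.float32

def worker(shm_name, sl):
    # Attach to the existing block by name and view it as an ndarray (no copy, no disk).
    shm = shared_memory.SharedMemory(name=shm_name)
    x = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
    x[sl] += 1.0                      # writes are visible to every attached process
    shm.close()

if __name__ == '__main__':
    nbytes = int(np.prod(shape)) * np.dtype(dtype).itemsize
    shm = shared_memory.SharedMemory(create=True, size=nbytes)
    x_shared = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
    x_shared[:] = 0.0                 # initialise entirely in RAM

    n = shape[0] // 2
    procs = [Process(target=worker, args=(shm.name, slice(i * n, (i + 1) * n)))
             for i in range(2)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()

    print(x_shared.sum())             # 1000 * 8 * 8 * 32 elements, each incremented once
    shm.close()
    shm.unlink()                      # release the block when done

Each worker attaches to the block by name instead of receiving a pickled copy, so the array lives in RAM once and never touches disk; a joblib worker could attach the same way at the top of my_job.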

0 Answers