For this question, I refer to the example in the Python docs discussing the use of the SharedMemory class with NumPy arrays, "accessing the same numpy.ndarray from two distinct Python shells". The major change I'd like to make is to manipulate an array of class objects rather than integer values, as I demonstrate below.
import numpy as np
from multiprocessing import shared_memory
# a simplistic class example
class A:
    def __init__(self, x):
        self.x = x
# numpy array of class objects
a = np.array([A(1), A(2), A(3)])
# create a shared memory instance
shm = shared_memory.SharedMemory(create=True, size=a.nbytes, name='psm_test0')
# numpy array backed by shared memory
b = np.ndarray(a.shape, dtype=a.dtype, buffer=shm.buf)
# copy the original data into shared memory
b[:] = a[:]
print(b)
# array([<__main__.A object at 0x7fac56cd1190>,
#        <__main__.A object at 0x7fac56cd1970>,
#        <__main__.A object at 0x7fac56cd19a0>], dtype=object)
Now, in a different shell, we attach to the shared memory space and try to manipulate the contents of the array.
import numpy as np
from multiprocessing import shared_memory
# attach to the existing shared space
existing_shm = shared_memory.SharedMemory(name='psm_test0')
c = np.ndarray((3,), dtype=object, buffer=existing_shm.buf)
Even before we are able to manipulate c, merely printing it results in a segmentation fault. Of course, I cannot expect to observe behaviour that has not been written into the module, so my question is: what can I do to work with a shared array of objects?
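As far as I understand it (my own assumption, not something stated in the docs), a dtype=object array stores only pointers to Python objects living in the creating process's heap, so the bytes copied into shared memory are meaningless in any other process, which would explain the segfault:

```python
import numpy as np

# a simplistic class example
class A:
    def __init__(self, x):
        self.x = x

a = np.array([A(1), A(2), A(3)])

# each element of an object array is just a pointer to a PyObject;
# on a 64-bit build of CPython the itemsize is the pointer size
print(a.dtype)     # object
print(a.nbytes)    # 3 * a.itemsize: three pointers, no object data at all
```

So a.nbytes only accounts for the pointers, never for the objects themselves, and the second shell dereferences pointers into memory it doesn't own.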
I'm currently pickling the list, but the protected reads/writes add a fair bit of overhead. I've also tried using a Namespace, which was quite slow because indexed writes are not allowed. Another idea would be to share a ctypes Structure in a ShareableList, but I wouldn't know where to start with that.
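For reference, my pickle-based approach looks roughly like the sketch below (the helper names write_objects/read_objects and the fixed 4096-byte buffer are my own, purely for illustration); the lock held around every whole-list read/write is where the overhead comes from:

```python
import pickle
from multiprocessing import shared_memory, Lock

class A:
    def __init__(self, x):
        self.x = x

def write_objects(shm, objs, lock):
    # serialize the whole list and copy the bytes into shared memory
    payload = pickle.dumps(objs)
    with lock:
        shm.buf[:len(payload)] = payload

def read_objects(shm, lock):
    # copy the buffer out under the lock, then deserialize;
    # pickle.loads ignores the unused trailing bytes
    with lock:
        data = bytes(shm.buf)
    return pickle.loads(data)

lock = Lock()  # protects concurrent access to the whole blob
shm = shared_memory.SharedMemory(create=True, size=4096)

write_objects(shm, [A(1), A(2), A(3)], lock)
out = read_objects(shm, lock)
print([o.x for o in out])  # [1, 2, 3]

shm.close()
shm.unlink()
```

Note that every update re-pickles and copies the entire list, even when only a single element changed, which is why this doesn't scale for my use case.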
In addition, there is a design aspect: it appears there is an open bug in shared_memory that may affect my implementation, in which several processes work on different elements of the array.
Is there a more scalable way of sharing a large list of objects between several processes, so that at any given time each running process interacts with a distinct object/element in the list?
UPDATE: At this point, I will also accept partial answers that talk about whether this can be achieved with Python at all.