I am deserializing large numpy arrays (500 MB in this example), and the timings vary by orders of magnitude between approaches. Below are the three approaches I've timed.
I'm receiving the data from the multiprocessing.shared_memory package, so the data comes to me as a memoryview object. But in these simple examples, I just pre-create a byte array to run the test.
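For context, here is a minimal sketch of how a memoryview from multiprocessing.shared_memory can be wrapped in an array without copying. The helper name view_shared_array and the tiny 16-byte array are illustrative assumptions, not the real setup; the shape and dtype still have to be known on the receiving side.

```python
import numpy as np
from multiprocessing import shared_memory


def view_shared_array(shm, shape, dtype):
    """Wrap the shared block's memoryview (shm.buf) in an ndarray without copying."""
    return np.ndarray(shape=shape, dtype=dtype, buffer=shm.buf)


source = np.arange(16, dtype=np.uint8)  # stand-in for the real 500 MB array
shm = shared_memory.SharedMemory(create=True, size=source.nbytes)
try:
    writer = view_shared_array(shm, source.shape, source.dtype)
    writer[:] = source  # producer side: a single copy into shared memory
    reader = view_shared_array(shm, source.shape, source.dtype)
    roundtrip_ok = bool(np.array_equal(reader, source))
    # Drop the array views before closing, otherwise close() raises BufferError.
    del writer, reader
finally:
    shm.close()
    shm.unlink()
```

The block may be rounded up to a page size on some platforms, which is why the view is built from an explicit shape rather than from the buffer length.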
I wonder if there are any mistakes in these approaches, or if there are other techniques I didn't try. Deserialization in Python is a real pickle of a problem if you want to move data fast and not hold the GIL just for the IO. An explanation of why these approaches differ so much would also be a good answer.
""" Deserialization speed test """
import numpy as np
import pickle
import time
import io
sz = 524288000
sample = np.random.randint(0, 256, size=sz, dtype=np.uint8) # 500 MB data (upper bound is exclusive)
serialized_sample = pickle.dumps(sample)
serialized_bytes = sample.tobytes()
serialized_bytesio = io.BytesIO()
np.save(serialized_bytesio, sample, allow_pickle=False)
serialized_bytesio.seek(0)
result = None
print('Deserialize using pickle...')
t0 = time.time()
result = pickle.loads(serialized_sample)
print('Time: {:.10f} sec'.format(time.time() - t0))
print('Deserialize from bytes...')
t0 = time.time()
result = np.ndarray(shape=sz, dtype=np.uint8, buffer=serialized_bytes)
print('Time: {:.10f} sec'.format(time.time() - t0))
print('Deserialize using numpy load from BytesIO...')
t0 = time.time()
result = np.load(serialized_bytesio, allow_pickle=False)
print('Time: {:.10f} sec'.format(time.time() - t0))
Results:
Deserialize using pickle...
Time: 0.2509949207 sec
Deserialize from bytes...
Time: 0.0204288960 sec
Deserialize using numpy load from BytesIO...
Time: 28.9850852489 sec
The second option is the fastest, but notably less elegant because I need to explicitly serialize the shape and dtype information.
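One way to make the second option less awkward might be to prefix the raw bytes with a small header carrying the dtype and shape. This is a rough sketch under assumptions: pack_array and unpack_array are hypothetical helpers, the length-prefixed JSON header is a made-up layout for illustration, and it only handles simple (non-structured) dtypes.

```python
import json
import math
import struct

import numpy as np

PREFIX = struct.Struct("<I")  # 4-byte little-endian header length
ALIGN = 64                    # payload starts at a multiple of this offset


def pack_array(arr):
    """Return length prefix + JSON header (dtype, shape) + raw array bytes."""
    arr = np.ascontiguousarray(arr)
    header = json.dumps({"dtype": arr.dtype.str, "shape": list(arr.shape)}).encode("ascii")
    # Pad with spaces so the payload offset is a multiple of ALIGN.
    header += b" " * (-(PREFIX.size + len(header)) % ALIGN)
    return PREFIX.pack(len(header)) + header + arr.tobytes()


def unpack_array(buf):
    """Rebuild an ndarray that views the payload in buf (bytes or memoryview)."""
    view = memoryview(buf)
    (header_len,) = PREFIX.unpack_from(view, 0)
    start = PREFIX.size + header_len
    meta = json.loads(bytes(view[PREFIX.size:start]))  # only the header is copied
    shape = tuple(meta["shape"])
    flat = np.frombuffer(view, dtype=np.dtype(meta["dtype"]),
                         count=math.prod(shape), offset=start)
    return flat.reshape(shape)
```

Usage would look like restored = unpack_array(pack_array(sample)). Passing a memoryview instead of bytes should work the same way, and only the small header is copied on the reading side; the packing side still copies the data, which this sketch does not try to avoid.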