35

You can convert a NumPy array to bytes using the .tobytes() method.

How do I decode these bytes back into a NumPy array? I tried this for an array i of shape (28, 28):

>>> k = i.tobytes()
>>> np.frombuffer(k) == i
False

I also tried with dtype uint8.

Gautham Santhosh

2 Answers

42

A couple of issues with what you're doing:

  1. np.frombuffer always interprets the input as a 1-dimensional array (it's the first line of the documentation), so you have to reshape the result to (28, 28).

  2. The default dtype is float (float64), so if you didn't serialize floats you have to specify the dtype explicitly. A priori no one can tell what a stream of bytes means; you have to say what it represents.

  3. If you want to make sure the arrays are equal, you have to use np.array_equal. Using == will do an elementwise operation, and return a numpy array of bools (this presumably isn't what you want).

How do I decode it back from this bytes array to a numpy array?

Example:

In [3]: i = np.arange(28*28).reshape(28, 28)

In [4]: k = i.tobytes()

In [5]: y = np.frombuffer(k, dtype=i.dtype)

In [6]: y.shape
Out[6]: (784,)

In [7]: np.array_equal(y.reshape(28, 28), i)
Out[7]: True
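
To see why the dtype and the comparison function matter, here is a minimal sketch that assumes the original array held uint8 values (the question doesn't say which dtype it used, so `img` and its contents are made up for illustration):

```python
import numpy as np

# Hypothetical 28x28 array of uint8 values (an assumption, not the asker's data).
img = (np.arange(28 * 28) % 256).astype(np.uint8).reshape(28, 28)
raw = img.tobytes()  # 784 raw bytes, no shape or dtype attached

# Default dtype is float64: every 8 bytes become one float, so the
# result has 784 / 8 = 98 meaningless elements.
as_default = np.frombuffer(raw)

# With the right dtype we get the 784 original values back, still 1-D.
as_uint8 = np.frombuffer(raw, dtype=np.uint8)
restored = as_uint8.reshape(img.shape)

# == compares element by element and returns a boolean array;
# np.array_equal collapses the comparison to a single bool.
elementwise = restored == img
same = np.array_equal(restored, img)
```

Here `as_default.shape` is `(98,)`, `as_uint8.shape` is `(784,)`, `elementwise` is a (28, 28) boolean array and `same` is `True`.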

HTH.

Matt Messersmith
8

While you could use tobytes(), it isn't ideal because it doesn't store the shape (or dtype) of the NumPy array.

In cases where you have to send it to another process where you have no information about the shape, you will have to send the shape information explicitly.
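
As a rough sketch of what sending that information explicitly could look like, the snippet below prepends a small JSON header with the dtype and shape to the raw bytes. The names `pack_array` and `unpack_array` and the header layout are hypothetical, chosen only for illustration, and the sketch assumes plain numeric dtypes (not object or structured arrays):

```python
import json
import struct

import numpy as np


def pack_array(arr):
    """Serialize arr as: 4-byte header length + JSON header + raw bytes."""
    header = json.dumps(
        {"dtype": arr.dtype.str, "shape": list(arr.shape)}
    ).encode("utf-8")
    # tobytes() emits the data in C order, matching the reshape on unpack.
    return struct.pack("<I", len(header)) + header + arr.tobytes()


def unpack_array(payload):
    """Rebuild the array from a payload produced by pack_array."""
    (header_len,) = struct.unpack_from("<I", payload, 0)
    header = json.loads(payload[4:4 + header_len].decode("utf-8"))
    data = np.frombuffer(
        payload, dtype=np.dtype(header["dtype"]), offset=4 + header_len
    )
    # The result is a read-only view of payload; call .copy() to modify it.
    return data.reshape(header["shape"])


x = np.arange(28 * 28).reshape(28, 28)
y = unpack_array(pack_array(x))
```

After this round trip, `y` has the same shape and dtype as `x` without the receiver knowing either in advance.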

A more elegant solution is to save the array to a BytesIO buffer with np.save and recover it with np.load. This way you don't need to store the shape separately, and you can easily recover the NumPy array from the bytes value.

Example:

>>> import numpy as np
>>> from io import BytesIO

>>> x = np.arange(28*28).reshape(28, 28)
>>> x.shape
(28, 28)

# save into a BytesIO buffer
>>> np_bytes = BytesIO()
>>> np.save(np_bytes, x, allow_pickle=True)

# get bytes value
>>> np_bytes = np_bytes.getvalue()
>>> type(np_bytes)
<class 'bytes'>

# load from bytes into numpy array
>>> load_bytes = BytesIO(np_bytes)
>>> loaded_np = np.load(load_bytes, allow_pickle=True)

# shape is preserved
>>> loaded_np.shape
(28, 28)

# both arrays are equal without sending shape
>>> np.array_equal(x,loaded_np)
True
David Parks
    Unfortunately, this approach is very very slow for large numpy arrays. See related SO post: https://stackoverflow.com/questions/62352670/deserialization-of-large-numpy-arrays-using-pickle-is-order-of-magnitude-slower?noredirect=1#comment110277408_62352670 – David Parks Jun 12 '20 at 22:33