15

My Python code receives a byte array containing the contents of an HDF5 file.

I'd like to load this byte array into an in-memory h5py file object without first writing it to disk. This page says that I can open a memory-mapped file, but it would be a new, empty file. I want to go from byte array to in-memory HDF5 file, use it, and discard it, without writing to disk at any point.

Is it possible to do this with h5py? (Or with the HDF5 C library, if that is the only way?)

mahonya
  • I'm trying to do the same thing. Could you show some code with the solution that worked? Thanks! – konus May 07 '14 at 20:54
  • I found the solution and posted it here: https://stackoverflow.com/questions/11588630/pass-hdf5-file-to-h5py-as-binary-blob-string/45900556#45900556 – SCGH Aug 28 '17 at 01:22
  • Is this still unresolved? [This](https://stackoverflow.com/a/45900556/6357916) answer explains how to read an h5 file from a bytearray in memory. But how can I get such a bytearray from an h5 file on the file system? I want to load the h5 file on a machine other than the one that has it on its file system, so I was thinking of reading it as a byte stream, sending the stream to the target machine, and loading the h5 file from that bytearray there. Is that possible? I just asked a [question](https://stackoverflow.com/questions/53040259/can-i-read-h5-file-on-one-machine-as-bytearray-stream-that-bytearray-to-other-m) about this. – anir Oct 29 '18 at 06:59

3 Answers

4

You could try to use Binary I/O to create a File object and read it via h5py:

import io
import h5py
f = io.BytesIO(YOUR_H5PY_STREAM)  # the raw bytes of the HDF5 file
h = h5py.File(f, 'r')
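
As a rough, self-contained sketch of the same idea, the round trip below first builds some HDF5 bytes in memory and then opens them read-only through io.BytesIO. The helper name open_hdf5_from_bytes and the dataset name "values" are made up for illustration, and the sketch assumes an h5py version that accepts file-like objects (2.9 or later, according to the h5py docs).

```python
import io

import h5py
import numpy as np


def open_hdf5_from_bytes(raw):
    """Open HDF5 content held in a bytes-like object, read-only.

    Hypothetical helper: it only wraps the bytes in io.BytesIO and
    hands the buffer to h5py.File.
    """
    return h5py.File(io.BytesIO(raw), "r")


# Build sample bytes in memory so the sketch does not need an input file.
buffer = io.BytesIO()
with h5py.File(buffer, "w") as f:
    f.create_dataset("values", data=np.arange(5))
raw_bytes = buffer.getvalue()

# Open the bytes again; the data never passes through a file on disk.
with open_hdf5_from_bytes(raw_bytes) as h:
    values = h["values"][:]

print(values.tolist())
```

In real use, raw_bytes would be whatever your code received, for example the body of a network response.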
Ümit
1

You can pass an io.BytesIO or tempfile object to h5py.File, as shown in the official docs (this requires h5py 2.9 or later): http://docs.h5py.org/en/stable/high/file.html#python-file-like-objects.

The first argument to File may be a Python file-like object, such as an io.BytesIO or tempfile.TemporaryFile instance. This is a convenient way to create temporary HDF5 files, e.g. for testing or to send over the network.

tempfile.TemporaryFile

>>> import tempfile, h5py
>>> tf = tempfile.TemporaryFile()
>>> f = h5py.File(tf, 'w')

or io.BytesIO

"""Create an HDF5 file in memory and retrieve the raw bytes

This could be used, for instance, in a server producing small HDF5
files on demand.
"""
import io
import h5py

bio = io.BytesIO()
with h5py.File(bio, 'w') as f:
    f['dataset'] = range(10)

data = bio.getvalue() # data is a regular Python bytes object.
print("Total size:", len(data))
print("First bytes:", data[:10])
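
If h5py.File raises a TypeError complaining that it expected str, bytes or os.PathLike, a likely cause is an h5py release older than 2.9, which is the version the docs give for file-like object support. The sketch below guards on the installed version before using io.BytesIO; the helper supports_file_like is hypothetical and only parses the major and minor parts of the version string.

```python
import io

import h5py

# Version in which the h5py docs say file-like objects became supported.
MIN_VERSION = (2, 9)


def supports_file_like(version_string):
    """Return True if the given h5py version string is at least 2.9."""
    major, minor = (int(part) for part in version_string.split(".")[:2])
    return (major, minor) >= MIN_VERSION


if supports_file_like(h5py.__version__):
    bio = io.BytesIO()
    with h5py.File(bio, "w") as f:
        f["dataset"] = list(range(10))
    data = bio.getvalue()  # raw bytes of the in-memory HDF5 file
    print("Total size:", len(data))
else:
    data = None
    print("h5py", h5py.__version__, "is too old for file-like objects")
```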
Shawn Wang
  • Both examples fail in Python 3.7. The first: `TypeError: expected str, bytes or os.PathLike object, not _io.BufferedRandom`. The second example fails at `----> 5 with h5py.File(bio) as f` : `TypeError: expected str, bytes or os.PathLike object, not _io.BytesIO` – Dima Lituiev Dec 12 '20 at 02:30
0

The following example uses PyTables (the tables package) instead of h5py; it can also read and manipulate HDF5 files.

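import urllib.request
import tables

url = 'https://s3.amazonaws.com/<your bucket>/data.hdf5'
response = urllib.request.urlopen(url)
# The file name is only a label here: the CORE driver serves the file from
# the in-memory image, and backing_store=0 means nothing is written to disk.
h5file = tables.open_file("data-sample.h5", driver="H5FD_CORE",
                          driver_core_image=response.read(),
                          driver_core_backing_store=0)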
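
To try the same approach without a network download, here is a minimal sketch that creates an HDF5 image in memory with PyTables, retrieves its bytes with get_file_image(), and reopens them through the CORE driver. The file names and the array name "values" are placeholders chosen for illustration; nothing with those names is created on disk.

```python
import numpy as np
import tables

# Create a small HDF5 file entirely in memory and grab its raw bytes.
writer = tables.open_file("in-memory-source.h5", "w", driver="H5FD_CORE",
                          driver_core_backing_store=0)
writer.create_array(writer.root, "values", np.arange(5))
image = writer.get_file_image()
writer.close()

# Reopen the bytes as a read-only in-memory file.
reader = tables.open_file("in-memory-copy.h5", driver="H5FD_CORE",
                          driver_core_image=image,
                          driver_core_backing_store=0)
values = reader.root.values.read()
reader.close()

print(values.tolist())
```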
David Wihl