How to dereference HDF5 references in Python?

Question

Sometimes I get the following arrays from my HDF5 file:

val1 = {ndarray} [<HDF5 object reference> <HDF5 object reference> <HDF5 object reference>]

If I try to dereference it with the HDF5 file object:

f[val1[0]]

I get an error

Argument 'ref' has incorrect type (expected h5py.h5r.Reference, got numpy.object_)

score 4 · Answer 1 · answered Oct 13 '17 at 16:24

I've come across this question while trying to answer what turned out to be basically the same question in another form. A dataset containing references to other objects is a bit of an awkward situation in HDF5, but you can actually read them in a pretty straightforward way. The idea is to get the name of the referenced object, and then just read that object directly from the file.

Given a single HDF5 reference, ref, and a file, file, you can return the name of the referenced dataset by doing:

>>> name = h5py.h5r.get_name(ref, file.id)

Then just read the actual dataset itself, as usual:

>>> data = file[name][()]  # ndarray with the data in it.

So to read all the referenced datasets, just map this process across the whole dataset of references.
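As a rough sketch of that mapping step, something like the following should work. The helper name read_referenced, the dataset names a, b and refs, and the in-memory demo file are all made up for illustration. It also assumes a reasonably recent h5py (one that provides h5py.ref_dtype) and that every reference points at a dataset rather than a group.

```python
import h5py
import numpy as np


def read_referenced(h5file, refs):
    """Return the data behind each HDF5 object reference in `refs`.

    Hypothetical helper: assumes every reference points at a dataset.
    """
    results = []
    for ref in refs:
        # Look up the path of the referenced object inside the file.
        name = h5py.h5r.get_name(ref, h5file.id)
        # Read the whole referenced dataset into memory.
        results.append(h5file[name][()])
    return results


# Throwaway in-memory file so the sketch runs without writing to disk.
with h5py.File("demo.h5", "w", driver="core", backing_store=False) as f:
    a = f.create_dataset("a", data=np.arange(3))
    b = f.create_dataset("b", data=np.array([10.0, 20.0]))

    # A dataset whose elements are references to the two datasets above.
    ref_ds = f.create_dataset("refs", (2,), dtype=h5py.ref_dtype)
    ref_ds[0] = a.ref
    ref_ds[1] = b.ref

    # Read the references as an ndarray, then resolve each one.
    arrays = read_referenced(f, f["refs"][()])
```

Here arrays ends up as a plain list with one ndarray per reference, in the same order as the references in the dataset.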
