I came across this question while trying to answer what turned out to be essentially the same question in another form. A dataset containing references to other objects is a slightly awkward case in HDF5, but reading it is fairly straightforward. The idea is to get the name of each referenced object, and then read that object directly from the file.
Given a single HDF5 reference, ref, and a file, file, you can return the name of the referenced dataset by doing:
>>> name = h5py.h5r.get_name(ref, file.id)
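Then just read the actual dataset itself, as usual (indexing with [()] replaces the older .value attribute, which was deprecated and then removed in h5py 3.0):
>>> data = file[name][()]  # ndarray with the data in it.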
So to read all the referenced datasets, just map this process across the whole dataset of references.
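Here is a minimal sketch of that mapping step, in case it helps. The helper name read_referenced, the file name refs_demo.h5 and the dataset names a, b and refs are all made up for illustration, and I'm assuming a reasonably recent h5py (h5py.ref_dtype needs 2.10 or later). The decode step is only there because get_name may hand back bytes rather than str.

```python
import os
import tempfile

import h5py
import numpy as np


def read_referenced(h5file, refs_name):
    """Return {name: ndarray} for every object reference stored in h5file[refs_name].

    Hypothetical helper: it just repeats the get_name / read steps for each reference.
    """
    out = {}
    for ref in h5file[refs_name][()]:  # array of h5py Reference objects
        name = h5py.h5r.get_name(ref, h5file.id)
        if isinstance(name, bytes):  # get_name may return bytes
            name = name.decode("utf-8")
        out[name] = h5file[name][()]  # read the referenced dataset
    return out


# Build a small throwaway file holding two datasets and a dataset of references.
path = os.path.join(tempfile.mkdtemp(), "refs_demo.h5")
with h5py.File(path, "w") as f:
    a = f.create_dataset("a", data=np.arange(3))
    b = f.create_dataset("b", data=np.ones((2, 2)))
    refs = f.create_dataset("refs", shape=(2,), dtype=h5py.ref_dtype)
    refs[0] = a.ref
    refs[1] = b.ref

# Read everything back through the references.
with h5py.File(path, "r") as f:
    result = read_referenced(f, "refs")

for name, data in result.items():
    print(name, data.shape)
```

If you don't need the names, h5py can also dereference directly: file[ref] gives you the referenced object, so file[ref][()] reads the data in one step.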