I get ValueError: No dataset in HDF5 file. when calling pandas.read_hdf:

In [1]: import pandas as pda

In [2]: store = pda.read_hdf('X.h5')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-1-72e9d80a2c5b> in <module>()
----> 1 store = pda.read_hdf('X.h5')

/usr/local/miniconda3/envs/tensorFlow-GPU/lib/python3.6/site-packages/pandas/io/pytables.py in read_hdf(path_or_buf, key, mode, **kwargs)
    356             groups = store.groups()
    357             if len(groups) == 0:
--> 358                 raise ValueError('No dataset in HDF5 file.')
    359             candidate_only_group = groups[0]
    360

ValueError: No dataset in HDF5 file.

h5dump shows:

$ h5dump -n X.h5
HDF5 "X.h5" {
FILE_CONTENTS {
 group      /
 dataset    /DS
 }
}

And if I use h5py, I can see the data:

In [3]: import h5py
/usr/local/miniconda3/envs/tensorFlow-GPU/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters

In [4]: f = h5py.File('X.h5','r')

In [5]: f.keys()
Out[5]: KeysView(<HDF5 file "X.h5" (mode r)>)

In [6]: list( f.keys() )
Out[6]: ['DS']

In [7]: f['DS']
Out[7]: <HDF5 dataset "DS": shape (10, 20), type "<f8">

In [8]: f['DS'][:]
Out[8]:
array([[1., 0., 1., 1., 0., 0., 1., 1., 1., 1., 0., 1., 0., 0., 0., 1.,
        0., 1., 0., 0.],
       [0., 0., 0., 1., 0., 1., 1., 0., 1., 0., 1., 1., 1., 1., 0., 0.,
        1., 1., 0., 0.],
       [0., 1., 1., 1., 1., 0., 1., 1., 1., 0., 1., 0., 1., 1., 0., 0.,
        1., 1., 0., 0.],
       [0., 0., 1., 1., 1., 1., 1., 1., 1., 0., 0., 1., 1., 1., 1., 1.,
        1., 0., 1., 0.],
       [0., 1., 1., 1., 0., 1., 1., 1., 1., 1., 1., 0., 1., 1., 0., 1.,
        0., 1., 0., 0.],
       [0., 0., 1., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1., 0., 1., 1.,
        1., 1., 0., 0.],
       [0., 0., 0., 0., 1., 1., 1., 1., 1., 1., 0., 0., 1., 1., 0., 1.,
        1., 1., 0., 1.],
       [0., 0., 1., 1., 1., 0., 1., 0., 0., 0., 0., 0., 0., 1., 0., 0.,
        0., 1., 0., 1.],
       [0., 0., 0., 1., 1., 1., 1., 1., 0., 0., 0., 0., 1., 1., 1., 0.,
        1., 0., 0., 0.],
       [0., 1., 0., 1., 1., 1., 1., 1., 1., 0., 0., 1., 1., 1., 0., 1.,
        1., 0., 0., 0.]])
SebMa
  • Does `read_hdf` provide any parameters for reading a file that wasn't written by `pandas/pytables`? The `h5py` read shows that 'DS' is not embedded in any group; it is about as plain an `h5` file as possible (the dump confirms that). I'm a bit surprised, since the MATLAB h5 examples that I've seen have the data several layers down, with added type and shape information. – hpaulj Jul 02 '18 at 16:11
  • @hpaulj Hi, [these MATLAB examples](https://fr.mathworks.com/help/matlab/ref/h5write.html) using `h5write` clearly show that they store their datasets in the root group (`/`) – SebMa Jul 02 '18 at 16:22
  • I've only looked at `h5` files written by `save` - the newer h5 version of the original `.mat`. (and more specifically the Octave equivalent). – hpaulj Jul 02 '18 at 16:30
  • The `read_hdf` docs say it will `Retrieve pandas object stored in file`. So it's expecting a file created with `df.to_hdf()`, where `df` is a dataframe or other pandas object. – hpaulj Jul 02 '18 at 16:40
  • @hpaulj OK, thank you :) So, to sum things up: if the HDF5 file was generated using `pandas`, then I can use `pandas` to read it; otherwise I need to use `h5py`. Is that correct? – SebMa Jul 02 '18 at 17:09
  • I don't know if `pandas` has an alternative. It seems to prefer `pytables` and `HDFStore`. `h5py` appears to be a better match with `h5write` (though I only glanced at its docs). – hpaulj Jul 02 '18 at 17:39
  • @SebMa I'm new to `HDF5` as well but, yes, I think you need `h5py` to read it, but then you can convert it to a pandas `DataFrame` as `df_f = pd.DataFrame(f['DS'][:])`. – Gabriele Pompa Aug 24 '18 at 15:01
  • Does this answer your question? [Pandas can't read hdf5 file created with h5py](https://stackoverflow.com/questions/33641246/pandas-cant-read-hdf5-file-created-with-h5py) – user4157124 May 08 '20 at 21:22
  • @user4157124 I cannot tell as it's been almost two years and because I don't have a Matlab license, nor the environment, nor the H5 data file anymore. – SebMa May 08 '20 at 22:04
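
The route suggested in the comments (read the dataset with `h5py`, then wrap the resulting array in a `DataFrame`) can be sketched roughly as follows. This is only a minimal sketch, not a confirmed answer: the helper name `read_plain_dataset` and the stand-in file `X_demo.h5` are hypothetical, and the stand-in is filled with random 0/1 values because the original `X.h5` is no longer available. It assumes the dataset is 2-D and sits directly in the root group under the name `DS`, as in the `h5dump` output.

```python
import os
import tempfile

import h5py
import numpy as np
import pandas as pd


def read_plain_dataset(path, name):
    """Load one 2-D dataset from a plain HDF5 file into a DataFrame.

    Hypothetical helper: it bypasses pandas.read_hdf, which expects
    files written by pandas itself (e.g. via DataFrame.to_hdf).
    """
    with h5py.File(path, "r") as f:
        data = f[name][:]  # copy the dataset into a NumPy array
    return pd.DataFrame(data)


# Stand-in for the original X.h5 (assumption: a single float64
# dataset "DS" of shape (10, 20) in the root group).
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "X_demo.h5")
values = np.random.randint(0, 2, size=(10, 20)).astype("float64")
with h5py.File(demo_path, "w") as f:
    f.create_dataset("DS", data=values)

df = read_plain_dataset(demo_path, "DS")
print(df.shape)  # (10, 20)
```

The resulting `DataFrame` gets default integer row and column labels, since a bare HDF5 dataset like this one carries no index or column names.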

0 Answers