hdf5 file to pandas dataframe

Question

I downloaded a dataset which is stored in .h5 files. I need to keep only certain columns and to be able to manipulate the data in it.

To do this, I tried to load it in a pandas dataframe. I've tried to use:

pd.read_hdf(path)

But I get: No dataset in HDF5 file.

I've found answers on SO (read HDF5 file to pandas DataFrame with conditions), but I don't need conditions. The answer there also places requirements on how the file was written, and since I'm not the creator of the file I can't do anything about that.

I've also tried using h5py:

df = h5py.File(path)

But the result is not easy to manipulate, and I can't seem to get the columns out of it (only the names of the columns, using df.keys()). Any idea how to do this?

score 6 · Answer 1 · answered Oct 03 '19 at 02:12

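The easiest way to read them into pandas is to open the file with h5py, convert the dataset you want into an np.array, and then build a DataFrame from it. It would look something like this, where 'variable_1' is the name of a dataset in the file: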

df = pd.DataFrame(np.array(h5py.File(path)['variable_1']))
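To keep only certain columns, the same idea can be applied to several datasets at once. Below is a minimal sketch, assuming each "column" is stored as a one-dimensional dataset at the top level of the file. The helper name datasets_to_frame, the demo file, and the dataset names are made up for illustration; your file's layout may differ.

```python
import os
import tempfile

import h5py
import numpy as np
import pandas as pd


def datasets_to_frame(path, names):
    """Load the named 1-D datasets from an HDF5 file into one DataFrame.

    Hypothetical helper: assumes every name is a top-level dataset and
    that all of them have the same length.
    """
    with h5py.File(path, "r") as f:
        # dataset[()] reads the whole dataset into memory as a NumPy array
        return pd.DataFrame({name: f[name][()] for name in names})


# Build a small demo file so the sketch is runnable (made-up layout).
demo_path = os.path.join(tempfile.mkdtemp(), "demo.h5")
with h5py.File(demo_path, "w") as f:
    f.create_dataset("variable_1", data=np.arange(5))
    f.create_dataset("variable_2", data=np.linspace(0.0, 1.0, 5))
    f.create_dataset("unwanted", data=np.zeros(5))

# Keep only the two columns of interest.
df = datasets_to_frame(demo_path, ["variable_1", "variable_2"])
```

Opening the file in a with block and in read-only mode ("r") closes it once the data has been copied into memory, so the DataFrame does not depend on an open file handle.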

Ivan Mitevski

score 5 · Answer 2 · edited May 23 '17 at 11:52

pandas' HDF support needs the HDF5 file to be formatted in a very specific way: it expects the PyTables-based layout that pandas itself writes. You can see https://stackoverflow.com/a/33644128/4128030 for more info.
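If you didn't write the file yourself, it can help to look at what is actually in it before deciding how to load it. Here is a rough sketch that walks the file with h5py and lists every dataset with its shape and dtype. The function name list_datasets and the demo file contents are assumptions for illustration only.

```python
import os
import tempfile

import h5py
import numpy as np


def list_datasets(path):
    """Return {dataset_name: (shape, dtype_string)} for every dataset in the file."""
    found = {}

    def visit(name, obj):
        # visititems calls this for every group and dataset; keep datasets only
        if isinstance(obj, h5py.Dataset):
            found[name] = (obj.shape, str(obj.dtype))

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return found


# Demo file with one top-level dataset and one nested in a group (made up).
demo_path = os.path.join(tempfile.mkdtemp(), "layout_demo.h5")
with h5py.File(demo_path, "w") as f:
    f.create_dataset("ids", data=np.arange(3))
    f.create_group("measurements").create_dataset("values", data=np.zeros((4, 2)))

layout = list_datasets(demo_path)
for name, (shape, dtype) in layout.items():
    print(name, shape, dtype)
```

Datasets nested inside groups show up with their full path (for example measurements/values), which is the key to use when indexing the h5py.File object.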

answered Jan 11 '17 at 18:33

drj

Yes. More about this [here](https://stackoverflow.com/a/30787168/4653485) as well. – Jérôme Oct 18 '17 at 10:15
