29

I have a dictionary where the keys are datetime objects and the values are tuples of integers:

>>> d.items()[0]
(datetime.datetime(2012, 4, 5, 23, 30), (14, 1014, 6, 3, 0))

I want to store it in an HDF5 dataset, but if I try to just dump the dictionary, h5py raises an error:

TypeError: Object dtype dtype('object') has no native HDF5 equivalent

What would be "the best" way to transform this dictionary so that I can store it in an HDF5 dataset?

Specifically, I don't want to just dump the dictionary into a numpy array, as that would complicate data retrieval based on datetime queries.

theta

5 Answers

16

I found two ways to do this:

I) Transform the datetime object into a string and use it as the dataset name

import h5py
import numpy as np

h = h5py.File('myfile.hdf5', 'w')
for k, v in d.items():
    h.create_dataset(k.strftime('%Y-%m-%dT%H:%M:%SZ'), data=np.array(v, dtype=np.int64))  # int8 would overflow values like 1014

where data can be accessed by querying the key strings (dataset names). For example:

for ds in h.keys():
    if '2012-04' in ds:
        print(h[ds][()])  # .value was removed in h5py 3.x; [()] reads the whole dataset
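
For completeness, a sketch of rebuilding the original dictionary from the stored datasets, assuming the same '%Y-%m-%dT%H:%M:%SZ' name format used above:

from datetime import datetime

restored = {}
for name in h.keys():
    key = datetime.strptime(name, '%Y-%m-%dT%H:%M:%SZ')  # parse the dataset name back into a datetime
    restored[key] = tuple(h[name][()])                    # read the values back as a tuple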

II) Transform the datetime object into nested subgroups

h = h5py.File('myfile.hdf5', 'w')
for k, v in d.items():
    h.create_dataset(k.strftime('%Y/%m/%d/%H:%M'), data=np.array(v, dtype=np.int64))

Notice the forward slashes in the strftime string, which create the corresponding subgroups in the HDF5 file. Data can then be accessed directly, e.g. h['2012']['04']['05']['23:30'][()] (again, .value was removed in h5py 3.x), by iterating with the built-in h5py iterators, or with custom functions passed to visititems(), as in the sketch below.
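
As a rough illustration of the visititems() route (a sketch assuming the group layout created above; the callback name is made up):

def print_april_2012(name, obj):
    # 'name' is the full path relative to the file root, e.g. '2012/04/05/23:30'
    if isinstance(obj, h5py.Dataset) and name.startswith('2012/04'):
        print(name, obj[()])

h.visititems(print_april_2012)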

For simplicity, I chose the first option.

theta
  • You could just convert the dictionary to a string and then use the ast library to decode the dictionary. This solution, in general, should work in many cases. – Ameet Deshpande Jan 04 '18 at 18:56
12

This relates to the more general problem of storing any kind of dictionary in HDF5. First, convert the dictionary to a string. To recover the dictionary later, parse that string with the ast library (import ast). The following code gives an example.

>>> import ast
>>> d = {1:"a",2:"b"}
>>> s = str(d)
>>> s
"{1: 'a', 2: 'b'}"
>>> ast.literal_eval(s)
{1: 'a', 2: 'b'}
>>> type(ast.literal_eval(s))
<type 'dict'>
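
To tie this back to HDF5, one way (a sketch; the file and attribute names here are made up) is to keep the stringified dictionary as an attribute and rebuild it on read:

import ast
import h5py

d = {1: "a", 2: "b"}

with h5py.File('dict_as_string.h5', 'w') as f:
    f.attrs['my_dict'] = str(d)  # store the whole dict as a single string attribute

with h5py.File('dict_as_string.h5', 'r') as f:
    recovered = ast.literal_eval(f.attrs['my_dict'])  # back to a real dict

Note that ast.literal_eval only handles Python literals, so datetime keys like the ones in the question would have to be converted to strings (or timestamps) first.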
Ameet Deshpande
6

I would serialize the object into JSON or YAML and store the resulting string as an attribute in the appropriate object (HDF5 group or dataset).
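
For example, a minimal sketch of the JSON route, assuming the datetime keys are first converted to ISO strings (JSON has no datetime type) and using a made-up group name:

import json
import h5py

serializable = {k.isoformat(): v for k, v in d.items()}  # datetime keys -> ISO strings

with h5py.File('myfile.hdf5', 'a') as h:
    grp = h.require_group('measurements')  # hypothetical group
    grp.attrs['as_json'] = json.dumps(serializable)

with h5py.File('myfile.hdf5', 'r') as h:
    restored = json.loads(h['measurements'].attrs['as_json'])  # tuples come back as lists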

I'm not sure why you're using the datetime as a dataset name, however, unless you absolutely need to look up your dataset directly by datetime.

p.s. For what it's worth, PyTables is a lot easier to use than the low-level h5py.

Klimaat
Jason S
5

Nowadays we have deepdish (www.deepdish.io):

import deepdish as dd
dd.io.save(filename, {'dict1': dict1, 'dict2': dict2}, compression=('blosc', 9))
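
Loading it back should be a one-liner (check the deepdish docs for your version):

data = dd.io.load(filename)  # returns the saved content as a nested dictionary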
wordsforthewise
3

The previous answers aimed at storing a Python dictionary as an HDF5 dataset. The following code instead stores a Python dictionary as HDF5 attributes (metadata), which is often the more natural approach:

import h5py
import numpy as np

#Writing data
d1 = np.random.random(size = (1000,20))  #sample data
hf = h5py.File('test_data.h5', 'w')
dset1=hf.create_dataset('dataset_1', data=d1)
#set some metadata directly
hf.attrs['metadata1']=5

#sample dictionary object
sample_dict={"metadata2":1, "metadata3":2, "metadata4":"blah_blah"}

#Store this dictionary object as hdf5 metadata
for k in sample_dict.keys():
    hf.attrs[k]=sample_dict[k]

hf.close()

#Reading data
hf1 = h5py.File('test_data.h5', 'r')
for name in hf1:
    print(name)

print(hf1.attrs.keys())
hf1.close()

This gives the output:

dataset_1
<KeysViewHDF5 ['metadata1', 'metadata2', 'metadata3', 'metadata4']>

It shows that metadata1, which was assigned directly as an attribute, and metadata2, metadata3 and metadata4, which came from the dictionary object, are all stored as attributes.
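
To get the dictionary back out of the attributes on read, something along these lines should work (a sketch reusing the file from above):

with h5py.File('test_data.h5', 'r') as hf1:
    recovered = {k: hf1.attrs[k] for k in hf1.attrs.keys()}
print(recovered)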

Vaibhav Dixit