
I am trying to load the MATLAB .mat file of the Street View House Numbers (SVHN) dataset (http://ufldl.stanford.edu/housenumbers/) in Python with the following code:

import h5py
labels_file = './sv/train/digitStruct.mat'
f = h5py.File(labels_file, 'r')  # open read-only
struct = list(f.values())  # values() is a view in Python 3; list() makes it indexable
names = list(struct[1].values())
print(names[1][1])

This prints [<HDF5 object reference>], but I need the actual string.

Andrea Sindico

1 Answer


To get an idea of the data layout, you could run

h5dump ./sv/train/digitStruct.mat

but you can also walk the file from Python with the visit and visititems methods that h5py groups (including the File object) provide.
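The visititems route can be sketched as follows. This is a minimal illustration, not code from the linked post: describe_layout is a hypothetical helper name, and it only assumes an already opened h5py.File. It collects one line per group or dataset so you can see where the object references live.

```python
import h5py


def describe_layout(f):
    """Return one line per group/dataset below the root of an open h5py file."""
    lines = []

    def visitor(name, obj):
        # visititems calls this for every object below the root; returning
        # None (the implicit return value) keeps the traversal going.
        if isinstance(obj, h5py.Dataset):
            lines.append(f"{name}: dataset shape={obj.shape} dtype={obj.dtype}")
        else:
            lines.append(f"{name}: group")

    f.visititems(visitor)
    return lines


# Usage (path taken from the question):
# with h5py.File('./sv/train/digitStruct.mat', 'r') as f:
#     print('\n'.join(describe_layout(f)))
```

Datasets whose dtype prints as object (`|O`) are the ones holding HDF5 object references that you have to dereference with f[ref].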

A good reference is the following SO post, which seems to have recently addressed a very similar (if not the same) problem:
h5py, access data in Datasets in SVHN
For example, the snippet

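import h5py

def get_name(index, hdf5_data):
    # /digitStruct/name holds one object reference per image; the referenced
    # dataset stores the file name as MATLAB character codes, one per row
    name = hdf5_data['/digitStruct/name']
    ref = name[index][0]
    print(''.join(chr(v[0]) for v in hdf5_data[ref][()]))

labels_file = 'train/digitStruct.mat'
with h5py.File(labels_file, 'r') as f:
    for j in range(f['/digitStruct/name'].shape[0]):  # 33402 images in the training set
        get_name(j, f)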

prints the name of each file. Here is part of the output I get:

7459.png
7460.png
7461.png
7462.png
7463.png
7464.png
7465.png

You can generalize from here.
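As one way to generalize, here is a sketch that applies the same reference-following idea to the bounding boxes in /digitStruct/bbox. It assumes the commonly described SVHN layout: each bbox reference points to a group with height, label, left, top and width datasets; when an image contains a single digit the number is stored directly, and when it contains several, each row is another object reference to a 1x1 dataset. get_bbox and BBOX_KEYS are hypothetical names, not part of h5py or the dataset's tooling.

```python
import h5py

# Field names assumed from the SVHN digitStruct description.
BBOX_KEYS = ("height", "label", "left", "top", "width")


def get_bbox(index, f):
    """Return {key: [value per digit]} for image `index` (hypothetical helper)."""
    # Dereference the bbox entry for this image to get its group.
    bbox_group = f[f["/digitStruct/bbox"][index][0]]
    result = {}
    for key in BBOX_KEYS:
        attr = bbox_group[key]
        if attr.shape[0] == 1:
            # Single digit: the number itself is stored in the dataset.
            result[key] = [float(attr[0][0])]
        else:
            # Several digits: each row is a reference to a 1x1 dataset.
            result[key] = [float(f[attr[i][0]][0][0]) for i in range(attr.shape[0])]
    return result
```

For example, get_bbox(0, f)['label'] would give the digit labels of the first image, assuming the layout above holds for your copy of the file.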

fedepad
  • Sorry, the question was not rendered properly because of the [ characters. I have fixed it; please check it out. Thanks a lot. – Andrea Sindico Jan 11 '17 at 06:08
  • When I tried this, it was really slow (about 3 minutes to get 50 names on a good i7). Do you have any idea why? – hl037_ Aug 12 '18 at 17:30
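On the speed question: I have not benchmarked this, but one likely contributor is that get_name indexes /digitStruct/name once per image, so every name costs an extra HDF5 read before the dereference. A minimal sketch, assuming the same file layout, that reads the whole column of references in a single call and returns the decoded names as a list (get_all_names is a hypothetical helper name):

```python
import h5py


def get_all_names(f):
    """Decode every file name under /digitStruct/name (hypothetical helper)."""
    # One read for all references instead of one read per image.
    refs = f["/digitStruct/name"][:, 0]
    names = []
    for ref in refs:
        # Each referenced dataset holds MATLAB character codes, shape (len, 1).
        codes = f[ref][()]
        names.append("".join(chr(c) for c in codes.ravel()))
    return names


# Usage:
# with h5py.File('train/digitStruct.mat', 'r') as f:
#     names = get_all_names(f)
```

Each name still needs its own dereference, so this reduces rather than removes the per-image overhead.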