
Is there any way to remove a dataset from an HDF5 file, preferably using h5py? Or alternatively, is it possible to overwrite a dataset while keeping the other datasets intact?

To my understanding, h5py can read/write HDF5 files in five modes:

f = h5py.File("filename.hdf5", mode)

where mode can be r for read, r+ for read/write, a for read/write (creating the file if it doesn't exist), w for write (overwriting any existing file), and w-, which is the same as w but fails if the file already exists. I have tried all of them, but none seem to work.
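For concreteness, here is a minimal sketch of how I am opening the file (the file name is just a placeholder):

import h5py

# 'a' opens read/write and creates the file if it doesn't exist
with h5py.File("filename.hdf5", "a") as f:
    print(list(f.keys()))  # list the top-level groups/datasets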

Any suggestions are much appreciated.

hsnee

4 Answers


Yes, this can be done.

with h5py.File(input,  "a") as f:
    del f[datasetname]

You will need to have the file open in a writable mode, for example append (as above) or write.

As noted by @seppo-enarvi in the comments, the purpose of the previously recommended f.__delitem__(datasetname) function is to implement the del operator, so that one can delete a dataset using del f[datasetname].
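For illustration, a minimal self-contained sketch (the file and dataset names are invented for this example):

import h5py

# create an example file with two datasets
with h5py.File("example.h5", "w") as f:
    f.create_dataset("keep_me", data=[1, 2, 3])
    f.create_dataset("remove_me", data=[4, 5, 6])

# reopen in append mode and delete one of them
with h5py.File("example.h5", "a") as f:
    del f["remove_me"]
    print(list(f.keys()))  # -> ['keep_me']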

ignorance
  • The purpose of the `__delitem__` function is to implement the `del` operator, so that one can delete a dataset using `del f[datasetname]`. – Seppo Enarvi Dec 23 '15 at 23:39
  • @SeppoEnarvi so are you saying that the syntax should be `with h5py.File(input, "a") as f: del f[datasetname]` instead of what is written above? – DeeWBee Jul 11 '16 at 14:00
  • I would use `del f[datasetname]`, as it is the standard way to delete an object in Python, and that's also what the documentation advises. But they both probably work. – Seppo Enarvi Jul 12 '16 at 12:29
  • The file size remains the same after deleting a few datasets. I tried deleting half the data of a 6 GB file and its size stayed exactly the same; can this be solved? – Pratheeswaran Jan 24 '19 at 14:10
  • @Pratheeswaran, you will likely need to repack the file with one of the HDF Group's command-line utilities, or you can copy the contents to a new file and then replace the existing file with it (see the sketch after this list). I believe that because of the tree structure used by HDF5, it is not trivial to recover space. – EnemyBagJones Jan 25 '19 at 18:36
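To illustrate the repacking mentioned in the last comment, a minimal sketch using the HDF Group's h5repack command-line utility (the file names are placeholders):

h5repack bigfile.h5 bigfile_repacked.h5
mv bigfile_repacked.h5 bigfile.h5

h5repack rewrites the file from scratch, which reclaims the space freed by deleted datasets.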

I tried this out, and the only way I could actually reduce the size of the file was to copy everything to a new file, leaving out the dataset I was not interested in:

import h5py

# copy everything except the unwanted dataset into a new file
with h5py.File('WFA.h5', 'r') as fs, h5py.File('WFA_red.h5', 'w') as fd:
    for a in fs.attrs:          # copy the file-level attributes
        fd.attrs[a] = fs.attrs[a]
    for d in fs:                # copy every top-level object
        if 'SFS_TRANSITION' not in d:
            fs.copy(d, fd)
Felix
  • I've suggested an edit based on some assumptions (SFS_TRANSITION is something specific to your work?) to make your answer more general; please roll back if I've misunderstood something. – llama Feb 17 '21 at 22:31

I do not understand what your question has to do with the file open modes. For read/write access, r+ is the way to go.

To my knowledge, removing a dataset is not easy, if possible at all; in particular, no matter what you do, the file size will not shrink.

But overwriting content is no problem:

f['mydataset'][:] = 0  # overwrite every element in place; shape and dtype stay the same
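If the goal is to replace a dataset entirely (for example with data of a different shape), one approach, sketched here with placeholder names, is to delete it and create it again:

import h5py
import numpy as np

new_data = np.arange(10)  # placeholder replacement data

with h5py.File('filename.hdf5', 'r+') as f:
    del f['mydataset']                            # remove the old dataset
    f.create_dataset('mydataset', data=new_data)  # recreate it with new contents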
agomcas

I wanted to make you aware of a tool one of my colleagues developed and published as open source. It's called h5nav, and you can install it with pip (https://pypi.org/project/h5nav/).

pip install h5nav

h5nav toto.h5
ls
rm the_group_you_want_to_delete
exit

Note that you'll still have to use h5repack to reduce the size of your file.

Best, Jérôme

Jerome