73

I am trying to read a MATLAB file with the following code:

import scipy.io
mat = scipy.io.loadmat('test.mat')

and it gives me the following error

raise NotImplementedError('Please use HDF reader for matlab v7.3 files')
NotImplementedError: Please use HDF reader for matlab v7.3 files

Has anyone had the same problem, and could you please share some sample code?

thanks

Cris Luengo
Shan

9 Answers

54

Try using the h5py module:

import h5py
with h5py.File('test.mat', 'r') as f:
    f.keys()
Shai
  • 5
    yeah, but there is an array of structs, and I have no idea how to read it, – Shan Jun 27 '13 at 12:12
  • 1
    `f.keys()` should give you the names of the variables stored in `'test.mat'`. Can you access `f['s'][0].keys()`? Assuming `s` is the name of the struct array you stored, this should give you a list of the fields of `s`. – Shai Jun 27 '13 at 12:14
  • 2
    no i cant access it, more specifically, I am trying to read the mat file given in the following website, http://ufldl.stanford.edu/housenumbers/, in the file train.tar.gz, there is a mat file named digitStruct.mat – Shan Jun 27 '13 at 12:31
  • 34
    This answer does not really provide sufficient background to actually use the mat file in this way. The files can be opened, sure, but with `scipy.io.loadmat` the file is represented in transparent data structures (namely, dictionaries and numpy arrays). The answer would be significantly improved if it also indicated how to actually access the HDF data structures. – aestrivex May 21 '15 at 17:27
  • 1
    This piece of code will give you a dictionary. By extracting the data associated with the keys, which are variable names, we get array-like data structures, for example ``. Rows or columns can be accessed directly from this data structure, or we can easily convert it to a NumPy array with `np.array(data_structure)`. – lenhhoxung Dec 29 '16 at 14:57
  • I've created a library to conveniently load `.mat`: https://github.com/skjerns/mat7.3 , it loads into Python native types. Install with `pip install mat73` – skjerns Dec 17 '19 at 10:28
  • Years later and this answer still has the most votes despite being completely useless by not actually telling you how to use the result of `f.keys()` (spoiler: you get a `KeysView()`, which is not helpful at all). You could *at least* have said to cast the obscure object to a list so you can see what the keys actually are. This all just goes to show how pointlessly obscure h5py really is. Why it's still the standard HDF5 library is beyond me. – ThatNewGuy Apr 01 '21 at 18:19
31
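import h5py
import numpy as np

filepath = '/path/to/data.mat'
arrays = {}
# Open read-only; the context manager closes the file when done
with h5py.File(filepath, 'r') as f:
    for k, v in f.items():
        arrays[k] = np.array(v)  # copy each top-level variable into memory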

You should end up with your data in the arrays dict, unless you have MATLAB structures, I suspect. Hope it helps!
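
One thing to watch for with this approach: MATLAB stores arrays in column-major order, so arrays read through h5py come back with their axes reversed relative to MATLAB. A minimal sketch of restoring the MATLAB axis order with NumPy (the helper name `matlab_axis_order` is only illustrative, and the zero array is a small stand-in for data read from a file):

```python
import numpy as np


def matlab_axis_order(arr):
    """Reverse the axis order of an array read from a v7.3 file."""
    # np.transpose with no axes argument reverses all axes
    return np.transpose(np.asarray(arr))


# Stand-in for an array that MATLAB saved with size 6x5x4x3
as_read = np.zeros((3, 4, 5, 6))
restored = matlab_axis_order(as_read)
print(restored.shape)
```

If only the last two axes need swapping (rows and columns of each matrix), `swapaxes(-1, -2)` is the narrower alternative.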

norok2
  • what problem do you observe? have you checked that MATLAB (or octave) can open the file? – norok2 Mar 01 '18 at 15:55
  • Yeah I can open it with them! – Euler_Salter Mar 01 '18 at 16:13
  • 1
    Perhaps it is saved with the old MATLAB format, in that case you should probably use `scipy.io.loadmat()` https://docs.scipy.org/doc/scipy/reference/generated/scipy.io.loadmat.html#scipy.io.loadmat This solution is for MATLAB format v.7.3 and above. – norok2 Jun 28 '18 at 10:39
  • It works, just that the original matrix was of size 100x256x256x3 but the result was of size 3x256x256x100. I had to use 'swapaxes' eventually. – Ruchir Jul 25 '19 at 03:14
25

I've created a small library to load MATLAB 7.3 files:

pip install mat73

To load a .mat 7.3 into Python as a dictionary:

import mat73
data_dict = mat73.loadmat('data.mat')

simple as that!

skjerns
  • 4
    ****Best answer right here. Thank you so much. These kind of works removes so much of unnecessary clutter out of work. – your_boy_gorja Jun 07 '20 at 05:32
  • 1
    You are a hero sir! – Aleksejs Fomins Oct 05 '20 at 15:08
  • @IntegrateThis feel free to open an issue at https://github.com/skjerns/mat7.3/issues if you think this is a bug. Under the hood mat73 relies on `h5py` and your HDD speed. – skjerns Nov 13 '20 at 07:57
  • Why is this not part of the standard libraries? – ThatNewGuy Apr 01 '21 at 18:25
  • @ThatNewGuy you mean `scipy`? Because it introduces an dependency on `h5py`, which is not part of the standard lib / scipy-stack – skjerns Apr 03 '21 at 11:09
  • @skjerns apologies, that was lazy diction on my part. I just mean generally accepted libraries that come included with your common package managers (like Anaconda). For security reasons, my work doesn't allow any third party packages not already included with Anaconda, so there's no clean way to transfer data between matlab and python. – ThatNewGuy Apr 03 '21 at 14:29
  • `pip` is in fact the most commonly used package manager. If your work doesn't allow installation of any packages besides the default ones in Anaconda, my condolences. You can try to install it in your user folder with `pip install --user mat73`, or alternatively just download the .py file to your project and import it, that should absolutely work. There should be no way your company can prevent you from doing that. Else discuss this with your supervisor. – skjerns Apr 04 '21 at 08:38
  • @skjerns yeah, if it's a single file from the internet, which is basically a text file, I'm sure it would be fine. But any repository pulls are out of the question. Thanks for the suggestion! – ThatNewGuy Apr 04 '21 at 17:07
16

Per Magu_'s answer on a related thread, check out the package hdf5storage which has convenience functions to read v7.3 matlab mat files; it is as simple as

import hdf5storage
mat = hdf5storage.loadmat('test.mat')
Maxim
10

I had a look at this issue: https://github.com/h5py/h5py/issues/726. If you saved your mat file with -v7.3 option, you should generate the list of keys with (under Python 3.x):

import h5py
with h5py.File('test.mat', 'r') as file:
    print(list(file.keys()))

In order to access the variable a for instance, you have to use the same trick:

with h5py.File('test.mat', 'r') as file:
    a = list(file['a'])
Léonard
6

According to the SciPy cookbook (http://wiki.scipy.org/Cookbook/Reading_mat_files):

Beginning at release 7.3 of Matlab, mat files are actually saved using the HDF5 format by default (except if you use the -vX flag at save time, see help save in Matlab). These files can be read in Python using, for instance, the PyTables or h5py package. Reading Matlab structures in mat files does not seem supported at this point.

Perhaps you could use Octave to re-save using the -vX flag.
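
If you are not sure which format a given file uses, the descriptive text at the start of the file normally says so. A rough sketch, assuming the header begins with `MATLAB <version> MAT-file` as it does for the v5-family and v7.3 files I have seen (the function name `mat_file_version` is hypothetical, not part of SciPy or h5py):

```python
def mat_file_version(path):
    """Return the version string from a MAT-file header, or None."""
    with open(path, 'rb') as fh:
        header = fh.read(116)  # the descriptive text field of the header
    prefix = b'MATLAB '
    if not header.startswith(prefix):
        return None  # no recognisable header text (assumption: not v5/v7.3)
    # The version is the first space-delimited token after the prefix
    token = header[len(prefix):].split(b' ', 1)[0]
    return token.decode('ascii', errors='replace')
```

A result of '7.3' points to the HDF5-based format that needs h5py or similar; '5.0' is, as far as I know, what the older formats readable by `scipy.io.loadmat` report.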

lee
  • As far as I can tell octave doesn't support v7.3 files either. So really you'd need to resave using a recent enough matlab version. – Michael Anderson Sep 13 '17 at 02:27
4

Despite hours of searching I've not found how to access Matlab v7.3 structures either. Hopefully this partial answer will help someone, and I'd be very happy to see extra pointers.

So, starting with the following (I think the [0][0] arises from Matlab giving everything two dimensions):

f = h5py.File('filename', 'r')
f['varname'][0][0]

gives: < HDF5 object reference >

Pass this reference to f again:

f[f['varname'][0][0]]

which gives an array. Convert this to a NumPy array and extract the value (or, recursively, another < HDF5 object reference >):

np.array(f[f['varname'][0][0]])[0][0]

If accessing the disk is slow, maybe loading to memory would help.
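
The manual steps above (index, look the reference up in the file, convert) could be wrapped in a small recursive helper. This is only a sketch under my own assumptions: the name `deref` is made up, groups are turned into plain dicts, and null references become `None`. It has not been tested against every kind of MATLAB variable.

```python
import h5py
import numpy as np


def deref(f, value):
    """Follow HDF5 object references until plain data is reached."""
    if isinstance(value, h5py.Reference):
        # A null reference is falsy and cannot be looked up in the file
        return deref(f, f[value]) if value else None
    if isinstance(value, h5py.Group):
        # Groups (e.g. MATLAB structs) become dicts keyed by member name
        return {name: deref(f, child) for name, child in value.items()}
    if isinstance(value, h5py.Dataset):
        return deref(f, value[()])  # read the dataset into memory
    if isinstance(value, np.ndarray) and value.dtype.kind == 'O':
        # Object arrays may hold references; resolve each element
        return [deref(f, item) for item in value.flat]
    return value  # numeric arrays, scalars, strings
```

With an open file `f`, `deref(f, f['varname'])` would then return nested lists, dicts and NumPy arrays instead of references. Everything is read into memory, so this is only practical for files that fit in RAM.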


Further edit: after much futile searching my final workaround (I really hope someone else has a better solution!) was calling Matlab from python which is pretty easy and fast:

import matlab.engine

eng = matlab.engine.start_matlab()  # first fire up a Matlab instance
# eng = matlab.engine.connect_matlab()  # or connect to an existing one
eng.sqrt(4.0)
x = 4.0
eng.workspace['y'] = x
a = eng.eval('sqrt(y)')
print(a)
x = eng.eval('parameterised_function_in_Matlab(1, 1)', nargout=1)
a = eng.eval('Structured_variable{1}{2}.object_name')  # (nested cell, cell, object)
eng.quit()  # shut the engine down when finished
Stephen Morrell
2

This function reads Matlab-produced HDF5 .mat files, and returns a structure of nested dicts of Numpy arrays. Matlab writes matrices in Fortran order, so this also transposes matrices and higher-dimensional arrays into conventional Numpy order arr[..., page, row, col].

import h5py

def read_matlab(filename):
    def conv(path=''):
        p = path or '/'
        paths[p] = ret = {}
        for k, v in f[p].items():
            if type(v).__name__ == 'Group':
                ret[k] = conv(f'{path}/{k}')  # Nested struct
                continue
            v = v[()]  # It's a Numpy array now
            if v.dtype == 'object':
                # HDF5ObjectReferences are converted into a list of actual pointers
                ret[k] = [r and paths.get(f[r].name, f[r].name) for r in v.flat]
            else:
                # Matrices and other numeric arrays
                ret[k] = v if v.ndim < 2 else v.swapaxes(-1, -2)
        return ret

    paths = {}
    with h5py.File(filename, 'r') as f:
        return conv()
L. Kärkkäinen
0

If you are only reading in basic arrays and structs, see vikrantt's answer on a similar post. However, if you are working with a Matlab table, then IMHO the best solution is to avoid the save option altogether.

I've created a simple helper function to convert a Matlab table to a standard hdf5 file, and another helper function in Python to extract the data into a Pandas DataFrame.

Matlab Helper Function

function table_to_hdf5(T, path, group)
%TABLE_TO_HDF5 Save a Matlab table in an hdf5 file format
%
%    TABLE_TO_HDF5(T) Saves the table T to the HDF5 file inputname.h5 at the root ('/')
%    group, where inputname is the name of the input argument for T
%
%    TABLE_TO_HDF5(T, path) Saves the table T to the HDF5 file specified by path at the
%    root ('/') group.
%
%    TABLE_TO_HDF5(T, path, group) Saves the table T to the HDF5 file specified by path
%    at the group specified by group.
%
%%%

if nargin < 2
    path = [inputname(1),'.h5'];  % default file name to input argument
end
if nargin < 3
    group = '';  % We will prepend '/' later, so this is effectively root
end

for field = T.Properties.VariableNames
    % Prepare to write
    field = field{:};
    dataset_name = [group '/' field];
    data = T.(field);
    if ischar(data) || isstring(data)
        warning('String columns not supported. Skipping...')
        continue
    end
    % Write the data
    h5create(path, dataset_name, size(data))
    h5write(path, dataset_name, data)
end

end

Python Helper Function

import pandas as pd
import h5py


def h5_to_df(path, group='/'):
    """
    Load an hdf5 file into a pandas DataFrame
    """
    df = pd.DataFrame()
    with h5py.File(path, 'r') as f:
        data = f[group]
        for k,v in data.items():
            if v.shape[0] > 1:  # Multiple column field
                for i in range(v.shape[0]):
                    k_new = f'{k}_{i}'
                    df[k_new] = v[i]
            else:
                df[k] = v[0]
    return df

Important Notes

  • This will only work on numerical data. If you know how to add string data, please comment.
  • This will create the file if it does not already exist.
  • This will crash if the data already exists in the file. You'll want to include logic to handle those cases as you deem appropriate.
ThatNewGuy