6

Is it possible to load MATLAB tables in Python using scipy.io.loadmat?

What I'm doing:

In Matlab:

tab = table((1:500)')
save('tab.mat', 'tab')

In Python:

import scipy.io
mat = scipy.io.loadmat('m:/tab.mat')

But I cannot access the table `tab` in Python using `mat['tab']`.

Ivan
    I am able to load a matlab array, so it's not a problem with versions. I just cannot load a matlab table – Ivan Sep 16 '14 at 20:52
  • Here is the error message: >>> mat['tab'] Traceback (most recent call last): File "C:\Anaconda\lib\site-packages\IPython\core\interactiveshell.py", line 2883, in run_code exec(code_obj, self.user_global_ns, self.user_ns) File "", line 1, in mat['tab'] KeyError: 'tab' – Ivan Sep 16 '14 at 20:53
  • what kind of python variable is `mat` - is there any data at all (and just not the field assigned)? Or does the `loadmat` fail for the table format all together? – Schorsch Sep 17 '14 at 21:10
  • what do you get for this command in python: `scipy.io.whosmat('m:/tab.mat')`? (Which is an idea I got from [here](https://github.com/scipy/scipy/issues/2452)) – Schorsch Sep 17 '14 at 21:12
  • Does the approach from this answer to ['*Read .mat files in Python*'](http://stackoverflow.com/questions/874461/read-mat-files-in-python/19340117#19340117) work with the `table`? – Schorsch Sep 17 '14 at 21:19
  • @Ivan Did you find solution for this? – GKS Feb 01 '17 at 14:54

5 Answers

5

The answer to your question is no. Many MATLAB objects can be loaded in Python, but tables, among others, cannot. See Handle Data Returned from MATLAB to Python in the MATLAB documentation.
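
A quick way to see this for yourself (following the whosmat suggestion in the comments above) is to list what scipy actually finds in the file. The exact output depends on your scipy version, but the table does not show up as a normal variable, which is why mat['tab'] raises a KeyError:

import scipy.io

# list the variables scipy can see in the MAT-file
print(scipy.io.whosmat('m:/tab.mat'))

# load everything and inspect the keys; 'tab' is not among them
mat = scipy.io.loadmat('m:/tab.mat')
print(sorted(mat.keys()))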

Verena Haunschmid
1

I've looked into this for a project I'm working on, and as a workaround, you could try the following.

In MATLAB, first convert the @table object into a struct, and retrieve the column names using:

table_struct = struct(table_object);
table_columns = table_struct.varDim.labels;
save table_as_struct table_struct table_columns;

Then you can try the following code in Python:

import numpy
import pandas as pd
import scipy.io

# function to load table variable from MAT-file
def loadtablefrommat(matfilename, tablevarname, columnnamesvarname):
    """
    read a struct-ified table variable (and column names) from a MAT-file
    and return pandas.DataFrame object.
    """

    # load file
    mat = scipy.io.loadmat(matfilename)

    # get table (struct) variable
    tvar = mat.get(tablevarname)
    data_desc = mat.get(columnnamesvarname)
    types = tvar.dtype
    fieldnames = types.names

    # extract data (from table struct)
    data = None
    for idx in range(len(fieldnames)):
        if fieldnames[idx] == 'data':
            data = tvar[0][0][idx]
            break

    # get number of columns and rows
    numcols = data.shape[1]
    numrows = data[0, 0].shape[0]

    # and get column headers as a list (array)
    data_cols = []
    for idx in range(numcols):
        data_cols.append(data_desc[0, idx][0])

    # create dict out of original table
    table_dict = {}
    for colidx in range(numcols):
        rowvals = []
        for rowidx in range(numrows):
            rowval = data[0, colidx][rowidx][0]
            if isinstance(rowval, numpy.ndarray) and rowval.size > 0:
                rowvals.append(rowval[0])
            else:
                rowvals.append(rowval)
        table_dict[data_cols[colidx]] = rowvals
    return pd.DataFrame(table_dict)
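
For example, with the MAT-file produced by the MATLAB snippet above (saved as table_as_struct.mat and containing the variables table_struct and table_columns), the call would look roughly like this:

df = loadtablefrommat('table_as_struct.mat', 'table_struct', 'table_columns')
print(df.head())
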
Jochen
1

Based on Jochen's answer, I propose a different variant that works well for me. I wrote a MATLAB script to prepare the MAT-file automatically (see my GitLab repository with examples). It does the following:

In MATLAB, for the table class:

This does the same as Jochen's example, but binds the data together, so it is easier to load multiple variables. The names "table" and "columns" are mandatory for the next part.

YourVariableName = struct('table', struct(TableYouWantToLoad), 'columns', {struct(TableYouWantToLoad).varDim.labels})
save('YourFileName', 'YourVariableName')

In MATLAB, for the dataset class:

Alternatively, if you have to handle the old dataset type:

YourVariableName = struct('table', struct(DatasetYouWantToLoad), 'columns', {get(DatasetYouWantToLoad,'VarNames')})
save('YourFileName', 'YourVariableName')

In Python:

import scipy.io as sio
mdata = sio.loadmat('YourFileName')
mtable = load_table_from_struct(mdata['YourVariableName'])

with

import pandas as pd

def load_table_from_struct(table_structure) -> pd.DataFrame:

    # get prepared data structure
    data = table_structure[0, 0]['table']['data']
    # get prepared column names
    data_cols = [name[0] for name in table_structure[0, 0]['columns'][0]]

    # create dict out of original table
    table_dict = {}
    for colidx in range(len(data_cols)):
        table_dict[data_cols[colidx]] = [val[0] for val in data[0, 0][0, colidx]]

    return pd.DataFrame(table_dict)

It is independent of the file loading step, and is basically a minimized version of Jochen's code, so please give him kudos for his post.

CodePrinz
1

The loadmat function doesn't load MATLAB tables. Instead, a small workaround can be used: the tables can be saved as .csv files, which can then be read using pandas.

In MATLAB

writetable(table_name, file_name)

In Python

import pandas as pd

df = pd.read_csv(file_name)

The resulting DataFrame df will have the contents of table_name.

sotmot
0

As others have mentioned, this is currently not possible, because MATLAB has not documented this part of its file format. People are trying to reverse-engineer it, but this is a work in progress.

A workaround is to write the table to CSV format and load that using Python. The entries in the table can be variable-length arrays, and these will be split across numbered columns. I have written a short function to load both scalars and arrays from this CSV file.

To write the table to CSV in matlab:

writetable(table_name, filename)

To read the CSV file in Python:

import pandas

def load_matlab_csv(filename):
    """Read CSV written by MATLAB writetable into DataFrames

    Each entry in the table can be a scalar or a variable length array.
    If it is a variable length array, then Matlab generates a set of
    columns, long enough to hold the longest array. These columns have
    the variable name with an index appended.

    This function infers which entries are scalars and which are arrays.
    Arrays are grouped together and sorted by their index.

    Returns: scalar_df, array_df
        scalar_df : DataFrame of scalar values from the table
        array_df : DataFrame with MultiIndex on columns
            The first level is the array name
            The second level is the index within that array
    """
    # Read the CSV file
    tdf = pandas.read_table(filename, sep=',')
    cols = list(tdf.columns)

    # Figure out which columns correspond to scalars and which to arrays
    scalar_cols = [] # scalar column names
    arr_cols = [] # array column names, without index
    arrname2idxs = {} # dict of array column name to list of integer indices
    arrname2colnames = {} # dict of array column name to list of full names

    # Iterate over columns
    for col in cols:
        # If the name contains "_" and ends in digits, it's probably
        # from an array
        if col[-1] in '0123456789' and '_' in col:
            # Array col
            # Infer the array name and index
            colsplit = col.split('_')
            arr_idx = int(colsplit[-1])
            arr_name = '_'.join(colsplit[:-1])

            # Store
            if arr_name in arrname2idxs:
                arrname2idxs[arr_name].append(arr_idx)
                arrname2colnames[arr_name].append(col)
            else:
                arrname2idxs[arr_name] = [arr_idx]
                arrname2colnames[arr_name] = [col]
                arr_cols.append(arr_name)

        else:
            # Scalar col
            scalar_cols.append(col)

    # Extract all scalar columns
    scalar_df = tdf[scalar_cols]

    # Extract each set of array columns into its own dataframe
    array_df_d = {}
    for arrname in arr_cols:
        adf = tdf[arrname2colnames[arrname]].copy()
        adf.columns = arrname2idxs[arrname]
        array_df_d[arrname] = adf

    # Concatenate array dataframes
    array_df = pandas.concat(array_df_d, axis=1)

    return scalar_df, array_df

scalar_df, array_df = load_matlab_csv(filename)
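
As a rough illustration of the column layout the function expects, here is a toy example with made-up data (not taken from the question); a file-like object works in place of a filename, since pandas accepts either:

import io

# toy CSV mimicking writetable output: one scalar column ("value") and
# one variable-length array column ("arr") split into arr_1..arr_3
sample = io.StringIO(
    "value,arr_1,arr_2,arr_3\n"
    "10,1,2,3\n"
    "20,4,5,\n"
)

scalar_df, array_df = load_matlab_csv(sample)
print(scalar_df)   # single column: value
print(array_df)    # MultiIndex columns: (arr, 1), (arr, 2), (arr, 3)
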
cxrodgers