22

Is there a standard way to convert MATLAB .mat (MATLAB-formatted data) files to a pandas DataFrame?

I am aware that a workaround is possible using scipy.io, but I am wondering whether there is a more straightforward way to do it.

Ramon Martinez
  • 9
    @MarkMikofski I do not think this is a duplicate of "Read .mat files in Python", which does not touch how to process the extracted data so that it can be put in a Pandas dataframe. – Post169 Aug 02 '18 at 18:50

2 Answers

31

I found two ways: scipy or mat4py.

  1. mat4py

Load data from MAT-file

The function loadmat loads all variables stored in the MAT-file into a simple Python data structure, using only Python’s dict and list objects. Numeric and cell arrays are converted to row-ordered nested lists. Arrays are squeezed to eliminate arrays with only one element. The resulting data structure is composed of simple types that are compatible with the JSON format.

Example: Load a MAT-file into a Python data structure:

from mat4py import loadmat

data = loadmat('datafile.mat')

From:

https://pypi.python.org/pypi/mat4py/0.1.0
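Because mat4py returns plain dicts and lists, the result can be handed to pandas fairly directly. The following is only a minimal sketch: the helper `nested_lists_to_frame` and the sample dictionary are hypothetical, and the assumption that column vectors arrive as lists of one-element lists (for example `[[0.0], [1.0]]`) should be checked against your own file.

```python
import pandas as pd


def nested_lists_to_frame(data, columns):
    """Build a DataFrame from a dict of (possibly nested) lists.

    Hypothetical helper, not part of mat4py or pandas.
    """
    def flatten(values):
        # Unwrap one-element inner lists: [[1.0], [2.0]] -> [1.0, 2.0]
        return [v[0] if isinstance(v, list) and len(v) == 1 else v
                for v in values]

    return pd.DataFrame({name: flatten(data[name]) for name in columns})


# Hypothetical stand-in for the dict returned by mat4py's loadmat('datafile.mat')
data = {'time': [[0.0], [1.0], [2.0]], 'value': [10.0, 11.5, 13.0]}

df = nested_lists_to_frame(data, ['time', 'value'])
print(df)
```

Only variables of equal length can share a DataFrame this way; scalars and metadata fields would need to be handled separately.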

  2. SciPy:

Example:

import numpy as np
from scipy.io import loadmat  # this is the SciPy module that loads mat-files
from datetime import datetime
import pandas as pd

mat = loadmat('measured_data.mat')  # load mat-file
mdata = mat['measuredData']  # variable in mat file
mdtype = mdata.dtype  # dtypes of structures are "unsized objects"
# * SciPy reads in structures as structured NumPy arrays of dtype object
# * The size of the array is the size of the structure array, not the number
#   of elements in any particular field. The shape defaults to 2-dimensional.
# * For convenience make a dictionary of the data using the names from dtypes
# * Since the structure has only one element, but is 2-D, index it at [0, 0]
ndata = {n: mdata[n][0, 0] for n in mdtype.names}
# Reconstruct the columns of the data table from just the time series
# Use the number of intervals to test if a field is a column or metadata
columns = [n for n, v in ndata.items() if v.size == ndata['numIntervals']]
# now make a data frame, setting the time stamps as the index
df = pd.DataFrame(np.concatenate([ndata[c] for c in columns], axis=1),
                  index=[datetime(*ts) for ts in ndata['timestamps']],
                  columns=columns)

From:

http://poquitopicante.blogspot.fr/2014/05/loading-matlab-mat-file-into-pandas.html
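To try the struct-unpacking approach without the original data file, a small round trip can be sketched as follows. The field names `voltage` and `current` and the temporary file are made up for illustration; only `measuredData` and `numIntervals` follow the example above. It relies on `scipy.io.savemat` writing a nested dict as a MATLAB struct.

```python
import os
import tempfile

import numpy as np
import pandas as pd
from scipy.io import loadmat, savemat

# Write a small struct to a temporary MAT-file (illustrative data only)
path = os.path.join(tempfile.mkdtemp(), 'example.mat')
savemat(path, {'measuredData': {
    'voltage': np.array([[1.0], [2.0], [3.0]]),
    'current': np.array([[0.1], [0.2], [0.3]]),
    'numIntervals': 3,
}})

mat = loadmat(path)
mdata = mat['measuredData']  # 1x1 structured array

# Map each field name to its array; the struct has a single element at [0, 0]
ndata = {n: mdata[n][0, 0] for n in mdata.dtype.names}

# Fields whose size matches the number of intervals are treated as columns
n_rows = int(ndata['numIntervals'].item())
columns = [n for n, v in ndata.items() if v.size == n_rows]

# Flatten each Nx1 array into a 1-D column
df = pd.DataFrame({c: ndata[c].ravel() for c in columns})
print(df)
```

Building the frame from a dict of flattened arrays avoids the `np.concatenate` step, at the cost of assuming every column field is a vector.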

  3. Finally, you can follow the PyHogs example, which still uses scipy:

Reading complex .mat files.

This notebook shows an example of reading a Matlab .mat file, converting the data into a usable dictionary with loops, and making a simple plot of the data.

http://pyhogs.github.io/reading-mat-files.html

Destrif
  • Excellent solution. This must be the selected answer. –  Nov 29 '16 at 04:55
  • 2
    The `scipy.io` and `mat4py` modules cannot read Matlab v7.3+ HDF5 datafiles. – SebMa Jul 02 '18 at 17:25
  • 2
    For Python3, use ndata.items() instead of ndata.iteritems() – MaVe Mar 08 '19 at 16:22
  • Known limitations for mat4py: arrays with more than 2 dimensions [important], arrays with complex numbers [important], sparse arrays [important], function arrays, object classes, anonymous function classes. https://pypi.org/project/mat4py/ – Suhas C Jan 09 '20 at 06:49
10

Ways to do this:
As you mentioned, using scipy:

import scipy.io as sio
test = sio.loadmat('test.mat')
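The dictionary returned by `loadmat` also contains bookkeeping keys such as `__header__`, `__version__` and `__globals__`, so they need to be filtered out before building a DataFrame. A minimal sketch, assuming every remaining variable is a vector of the same length (the variables `x` and `y` and the temporary file are made up for illustration):

```python
import os
import tempfile

import numpy as np
import pandas as pd
import scipy.io as sio

# Create a small MAT-file to read back (illustrative data only)
path = os.path.join(tempfile.mkdtemp(), 'test.mat')
sio.savemat(path, {'x': np.arange(4.0), 'y': np.arange(4.0) ** 2})

test = sio.loadmat(path)

# Drop the '__header__', '__version__' and '__globals__' entries and
# flatten each 2-D vector so it can serve as a DataFrame column
variables = {name: value.ravel()
             for name, value in test.items()
             if not name.startswith('__')}

df = pd.DataFrame(variables)
print(df)
```

Variables with different lengths, or true 2-D matrices, would need their own handling before they fit into one frame.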

Using the matlab engine:

import matlab.engine
eng = matlab.engine.start_matlab()
content = eng.load("example.mat", nargout=1)
SerialDev
  • Matlab says "you cannot run the MATLAB engine on a machine that only has the MATLAB Runtime" https://uk.mathworks.com/help/matlab/matlab-engine-for-python.html?w.mathworks.com – Suhas C Jan 09 '20 at 06:20