I have a very simple Sarimax model using statsmodels:

mdl = sm.tsa.statespace.SARIMAX(ts_monthly, exog=ts_exog, order=(3,1,0)).fit()

where ts_monthly and ts_exog are pandas series indexed by date:

df
date          vl_1    vl_2 
2016-01-01     10     12
2016-02-01     14      1
2016-03-01     98     33

ts_monthly = df.vl_1
ts_exog    = df.vl_2

The model fit works, but when I try to run a get_prediction, I get the following error:

ts = pd.Series([12,3,2], index=pd.date_range('2016-04-01', '2016-07-01', freq='M'))

mdl.get_prediction('2016-03-01', '2016-07-01', exog=ts, dynamic=False)

    ---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-135-c89e9e005a31> in <module>()
      6 print(mdl.summary())
      7 _ = mdl.plot_diagnostics()
----> 8 pred = mdl.get_prediction(start=start_date, end=end_date, exog=ts_exog, dynamic=False)
      9 pred_ci = pred.conf_int()
     10 

C:\Users\myuer\bin\anaconda3\lib\site-packages\statsmodels\tsa\statespace\sarimax.py in get_prediction(self, start, end, dynamic, exog, **kwargs)
   1901                                      ' appropriate shape. Required %s, got %s.'
   1902                                      % (str(required_exog_shape),
-> 1903                                         str(exog.shape)))
   1904                 exog = np.c_[self.model.data.orig_exog.T, exog.T].T
   1905 

ValueError: Provided exogenous values are not of the appropriate shape. Required (3, 1), got (3,).

Any ideas of what kind of shape the prediction exogenous series must be?

Ivan

1 Answer

This answer may be helpful.

Your prediction series is a pandas Series, which is backed by a one-dimensional numpy ndarray. The shape (3,) indicates that it has only one axis (i.e. you access its values with a single index, as in ts.values[0]). A Series itself cannot hold a second axis, so reshape the underlying array instead: ts.values.reshape(3, 1) creates a second axis (whose index will always be 0), so that values are accessed in the manner arr[0, 0].
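As a minimal sketch of the shape difference described above, the snippet below builds a three-element Series and shows two ways to get a (3, 1) object from it. The month-start index and the column name "vl_2" are assumptions chosen to mirror the question's data; the sketch uses only pandas and numpy and does not call statsmodels.

```python
import numpy as np
import pandas as pd

# Hypothetical future exogenous values; the month-start index is an
# assumption made to match the question's training data.
ts = pd.Series([12, 3, 2],
               index=pd.date_range("2016-04-01", periods=3, freq="MS"))

# A Series has a single axis.
print(ts.shape)  # (3,)

# Option 1: reshape the underlying ndarray into one column.
# -1 lets numpy infer the number of rows.
exog_2d = ts.values.reshape(-1, 1)
print(exog_2d.shape)  # (3, 1)

# Option 2: keep the date index by converting to a one-column DataFrame.
exog_df = ts.to_frame(name="vl_2")
print(exog_df.shape)  # (3, 1)
```

Either object has the two-dimensional shape the error message asks for and could be tried as the exog argument. The number of rows still has to match the required shape reported in the error.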

I haven't dug into this behavior enough to understand its rationale, but I've seen it also in dealing with dependency/related modules such as scipy and scikit-learn.

cmaher
  • Thanks. How is this related to the Sarimax structure in statsmodels? Why does it expect (n,1) for a pandas Series? – Ivan Jul 07 '17 at 20:02
  • It is not unusual to enforce two well-defined dimensions for numpy arrays that describe datasets with one row, even though the information in one of the dimensions is redundant. This ensures that you do indeed have a number of samples with one value each, instead of a single sample with a number of values. – Neuneck Jan 24 '18 at 10:45