I have a very simple SARIMAX model using statsmodels:

mdl = sm.tsa.statespace.SARIMAX(ts_monthly, exog=ts_exog, order=(3,1,0)).fit()

where ts_monthly and ts_exog are pandas Series indexed by date, taken from this DataFrame:

date          vl_1    vl_2 
2016-01-01     10     12
2016-02-01     14      1
2016-03-01     98     33

ts_monthly = df.vl_1
ts_exog    = df.vl_2
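For reference, here is a sketch that rebuilds the small table above as a DataFrame, assuming the dates are parsed into a DatetimeIndex. Selecting one column with `df.vl_2` gives a 1-D Series of shape (3,). Selecting it with a list, `df[["vl_2"]]`, keeps a one-column DataFrame of shape (3, 1). `ts_exog_2d` is a hypothetical name used only for illustration.

```python
import pandas as pd

# Rebuild the example table from the question, with dates as a DatetimeIndex
df = pd.DataFrame(
    {"vl_1": [10, 14, 98], "vl_2": [12, 1, 33]},
    index=pd.to_datetime(["2016-01-01", "2016-02-01", "2016-03-01"]),
)
df.index.name = "date"

ts_monthly = df.vl_1       # 1-D Series, shape (3,)
ts_exog = df.vl_2          # 1-D Series, shape (3,)
ts_exog_2d = df[["vl_2"]]  # hypothetical: one-column DataFrame, shape (3, 1)

print(ts_exog.shape, ts_exog_2d.shape)  # (3,) (3, 1)
```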

The model fits fine, but when I call get_prediction I get the following error:

ts = pd.Series([12,3,2], index=pd.date_range('2016-04-01', '2016-07-01', freq='M'))

mdl.get_prediction('2016-03-01', '2016-07-01', exog=ts, dynamic=False)

ValueError                                Traceback (most recent call last)
<ipython-input-135-c89e9e005a31> in <module>()
      6 print(mdl.summary())
      7 _ = mdl.plot_diagnostics()
----> 8 pred = mdl.get_prediction(start=start_date, end=end_date, exog=ts_exog, dynamic=False)
      9 pred_ci = pred.conf_int()

C:\Users\myuer\bin\anaconda3\lib\site-packages\statsmodels\tsa\statespace\sarimax.py in get_prediction(self, start, end, dynamic, exog, **kwargs)
   1901                                      ' appropriate shape. Required %s, got %s.'
   1902                                      % (str(required_exog_shape),
-> 1903                                         str(exog.shape)))
   1904                 exog = np.c_[self.model.data.orig_exog.T, exog.T].T

ValueError: Provided exogenous values are not of the appropriate shape. Required (3, 1), got (3,).

Any idea what shape the exogenous series for the prediction must have?

This answer may be helpful.

Your prediction series is a pandas Series, which is backed by a one-dimensional array. Its shape (3,) indicates that it has only one axis, i.e. you access its values with a single index, as in ts.iloc[0]. If you reshape the data with ts.to_numpy().reshape(3, 1) (older pandas versions also allowed ts.reshape(3, 1) directly), you create a second axis whose index is always 0, so values are accessed as arr[0, 0].

I haven't looked into this behavior closely enough to understand its rationale, but I have seen the same (n, 1) requirement in related libraries such as SciPy and scikit-learn.

  • Thanks. How does this relate to the SARIMAX structure in statsmodels? Why does it expect (n, 1) for a pandas Series? – Ivan Jul 07 '17 at 20:02
  • It is not unusual to enforce two well-defined dimensions for NumPy arrays that describe datasets with a single column, even though one dimension carries redundant information. This ensures that you really have a number of samples with one value each, rather than a single sample with a number of values. – Neuneck Jan 24 '18 at 10:45
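To make the last comment concrete, consider a 1-D array of three values. It can be promoted to 2-D in two ways, and only the column form matches the (n, 1) layout of one exogenous variable observed over n periods. A small NumPy sketch:

```python
import numpy as np

values = np.array([12, 3, 2])      # shape (3,): one axis, orientation is ambiguous

as_row = np.atleast_2d(values)     # shape (1, 3): read as 1 sample with 3 variables
as_column = values.reshape(-1, 1)  # shape (3, 1): read as 3 samples of 1 variable

print(values.shape, as_row.shape, as_column.shape)  # (3,) (1, 3) (3, 1)
print(as_column[0, 0])  # 12
```

`reshape(-1, 1)` lets NumPy infer the row count, so the same line works for any number of periods.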