Differencing in the ARIMA formula for the Microsoft Time Series algorithm

Recently a customer discovered an issue with the ARIMA formula used in the Microsoft Time Series algorithm for data mining. She was trying to back-test the results by computing the formula and couldn't get the results to match. This can be maddening, especially when the formulas are so complex, but it also exposed a problem with the documentation. We don't really describe differencing and don't provide enough of the actual formulas used. We hope to gradually rectify that for all model types, but in the meantime here is a proposed addition to the time series technical reference that explains differencing. Suggestions and corrections are welcome! (The introduction has also been updated slightly to better explain how the two algorithms fit together.)



Many thanks to Yimin Wu for patiently explaining the implementation; any errors in describing it are mine.



{updated intro}



Implementation of the Microsoft Time Series Algorithm



Microsoft Research developed the original ARTXP algorithm that was used in SQL Server 2005, basing the implementation on the Microsoft Decision Trees algorithm. Therefore, the ARTXP algorithm can be described as an autoregressive tree model for representing periodic time series data. This algorithm relates a variable number of past items to each current item that is being predicted. The name ARTXP derives from the fact that the autoregressive tree method (an ART algorithm) is applied to multiple unknown prior states. For a detailed explanation of the ARTXP algorithm, see Autoregressive Tree Models for Time-Series Analysis (http://go.microsoft.com/fwlink/?LinkId=45966).



The ARIMA algorithm was added to the Microsoft Time Series algorithm in SQL Server 2008 to improve long-term prediction. It is an implementation of the process for computing autoregressive integrated moving averages that was described by Box and Jenkins. The ARIMA methodology makes it possible to determine dependencies in observations taken sequentially in time, and can incorporate random shocks as part of the model. The ARIMA method also supports multiplicative seasonality. Readers who want to learn more about the ARIMA algorithm are encouraged to read the seminal work by Box and Jenkins; this section is intended to provide specific details about how the ARIMA methodology has been implemented in the Microsoft Time Series algorithm.

By default, the Microsoft Time Series algorithm uses both methods, ARTXP and ARIMA, and blends the results to improve prediction accuracy. If you want to use only a specific method, you can set the algorithm parameters to use only ARTXP or only ARIMA, or to control how the results of the algorithms are combined. Note that the ARTXP algorithm supports cross-prediction, but the ARIMA algorithm does not. Therefore, cross-prediction is available only when you use a blend of algorithms, or when you configure the model to use only ARTXP.



{new section}

Understanding ARIMA Difference Order



This section introduces some terminology needed to understand the ARIMA model, and discusses the specific implementation of differencing in the Microsoft Time Series algorithm. For a full explanation of these terms and concepts, we recommend a review of Box and Jenkins.

  • A term is a component of a mathematical equation. For example, a term in a polynomial equation might include a combination of variables and constants.
  • The ARIMA formula that is included in the Microsoft Time Series algorithm uses both autoregressive and moving average terms.
  • Time series models can be stationary or nonstationary. Stationary models are those that revert to a mean, though they might have cycles, whereas nonstationary models do not have a focus of equilibrium and are subject to greater variance or change introduced by shocks, or external variables.
  • The goal of differencing is to make a time series stabilize and become stationary.
  • The order of difference represents the number of times that the difference between values is taken for a time series.

The Microsoft Time Series algorithm works by taking values in a data series and attempting to fit the data to a pattern. If the data series is not already stationary, the algorithm applies an order of difference. Each increase in the order of difference tends to make the time series more stationary.

For example, if you have the time series (z_1, z_2, …, z_n) and perform calculations using one order of difference, you obtain a new series (y_1, y_2, …, y_(n-1)), where y_i = z_(i+1) - z_i. When the difference order is 2, the algorithm generates another series (x_1, x_2, …, x_(n-2)), where x_i = y_(i+1) - y_i, based on the y series that was derived from the first order equation. The correct amount of differencing depends on the data. A single order of differencing is most common in models that show a constant trend; a second order of differencing can indicate a trend that varies with time.
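The differencing step described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the code used inside the Microsoft Time Series algorithm; the function name `difference` and the sample values are made up for the example.

```python
def difference(series, order=1):
    """Apply `order` rounds of first differencing: y[i] = z[i+1] - z[i]."""
    if order < 0:
        raise ValueError("order must be non-negative")
    result = list(series)
    for _ in range(order):
        # Each pass shortens the series by one element.
        result = [b - a for a, b in zip(result, result[1:])]
    return result


# Made-up series whose trend itself grows over time.
z = [3, 5, 8, 12, 17, 23]
y = difference(z, 1)  # first-order differences: still trending upward
x = difference(z, 2)  # second-order differences: constant
print(y)
print(x)
```

In this made-up series the first difference still trends upward, while the second difference is constant, which is the kind of case where a difference order of 2 is useful.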

By default, the order of difference used in the Microsoft Time Series algorithm is -1, meaning that the algorithm will automatically detect the best value for the difference order. Typically, that best value is 1 (when differencing is required), but under certain circumstances the algorithm will increase that value to a maximum of 2.

The Microsoft Time Series algorithm determines the optimal ARIMA difference order by using the autoregression values. The algorithm examines the AR values and sets a hidden parameter, ARIMA_AR_ORDER, representing the order of the AR terms. This parameter has a range of values from -1 to 8. At the default value of -1, the algorithm will automatically select the appropriate difference order.

Whenever the value of ARIMA_AR_ORDER is greater than 1, the algorithm multiplies the time series by a polynomial term. If one term of the polynomial formula resolves to a root of 1 or close to 1, the algorithm attempts to preserve the stability of the model by removing the term and increasing the difference order by 1. If the difference order is already at the maximum, the term is removed and the difference order does not change.

For example, if the value of AR = 2, the resulting AR polynomial term might look like this:

1 - 1.4B + 0.45B^2 = (1 - 0.9B)(1 - 0.5B)

Note the term (1 - 0.9B), whose coefficient of 0.9 is close to 1 (strictly speaking, the root of this factor in B is 1/0.9, or about 1.11). The algorithm eliminates this term from the polynomial formula. In this example the difference order is assumed to be already at the maximum value of 2, so the algorithm cannot increase it by one.
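To make the factoring step concrete, here is a minimal Python sketch that factors a second-order AR polynomial and removes a factor whose coefficient is close to 1. It is not the product's implementation. In particular, the threshold `NEAR_UNIT_THRESHOLD` is a hypothetical value chosen for illustration, because the actual threshold is not documented, and the function names are invented for this example.

```python
import cmath

NEAR_UNIT_THRESHOLD = 0.85  # hypothetical cutoff; the real threshold is not documented
MAX_DIFFERENCE_ORDER = 2    # maximum difference order described in the text


def inverse_roots_ar2(c1, c2):
    """Return r1, r2 such that 1 + c1*B + c2*B**2 == (1 - r1*B)(1 - r2*B).

    r1 and r2 are the solutions of x**2 + c1*x + c2 = 0.
    """
    disc = cmath.sqrt(c1 * c1 - 4 * c2)
    return (-c1 + disc) / 2, (-c1 - disc) / 2


def absorb_near_unit_root(c1, c2, diff_order):
    """Drop one factor (1 - r*B) with r near 1 and raise the difference
    order by 1, unless it is already at the maximum."""
    roots = list(inverse_roots_ar2(c1, c2))
    for i, r in enumerate(roots):
        if abs(r - 1) <= 1 - NEAR_UNIT_THRESHOLD:
            remaining = roots[:i] + roots[i + 1:]
            return remaining, min(diff_order + 1, MAX_DIFFERENCE_ORDER)
    return roots, diff_order


# The polynomial from the example: 1 - 1.4B + 0.45B^2
remaining, d = absorb_near_unit_root(-1.4, 0.45, diff_order=2)
print(remaining, d)  # the factor with r close to 0.5 remains; order stays at 2
```

Calling the same function with `diff_order=1` removes the same factor but raises the difference order to 2.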

It is important to note that the only way that you can force a change in difference order is to use the unsupported parameter, ARIMA_DIFFERENCE_ORDER. This hidden parameter controls how many times the algorithm performs differencing on the time series, and can be set by typing a custom algorithm parameter. However, we do not recommend that you change this value unless you are prepared to experiment and are familiar with the calculations involved. Also note that there is currently no mechanism, including hidden parameters, to let you control the threshold at which the increase in difference order is triggered.

Finally, note that the formula described above is the simplified case, with no seasonality hints. If seasonality hints are provided, then a separate AR polynomial term is added to the left of the equation for each seasonality hint, and the same strategy is applied to eliminate terms that might destabilize the differenced series.
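As a rough illustration of how a seasonal AR polynomial combines multiplicatively with the non-seasonal one, the sketch below multiplies two polynomials in B represented as coefficient lists. The coefficients and the period of 4 are made up, and the helper name `multiply_polynomials` is hypothetical; this is not taken from the algorithm's code.

```python
def multiply_polynomials(p, q):
    """Multiply two polynomials in B given as coefficient lists [c0, c1, c2, ...]."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


nonseasonal = [1.0, -0.5]                # 1 - 0.5B
seasonal = [1.0, 0.0, 0.0, 0.0, -0.8]    # 1 - 0.8B^4 (made-up quarterly term)

combined = multiply_polynomials(seasonal, nonseasonal)
print(combined)  # coefficients of 1 - 0.5B - 0.8B^4 + 0.4B^5
```

Each factor of the combined polynomial can then be inspected separately for a coefficient close to 1, in the same way as in the non-seasonal case.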

Comments
  • I have some real technical issues with these statements.....I'm not sure if you can change your approach, but here are some thoughts....

    Identification of differencing orders is being done on the premise that 1) there are no pulses, level shifts, seasonal pulses and/or local time trends AND that one model is adequate for the entire time range (constancy of parameters assumption) AND that the variance of the error process is homogeneous (constancy of variance assumption) And that the underlying ARIMA process is white noise. For example a series that has a level shift will exhibit an ACF that suggests non-stationarity BUT the remedy is not to difference.

    You say "The Microsoft Time Series algorithm works by taking values in a data series and attempting to fit the data to a pattern. If the data series is are not already stationary, the algorithm applies an order of difference. Each increase in the order of difference tends to make the time series more stationary."

    I think this is not robust enough. Over-differencing can induce non-stationarity; see the Slutzky effect en.wikipedia.org/.../Slutsky_equation

    In general, Over-differencing yields non-invertible MA structure often suggesting equation simplification is in order.

    Tom Reilly

    www.autobox.com
