This repository is intended to be a compilation of different techniques and models that you can apply when forecasting a univariate time series, using both Python libraries and R packages. The section D Theory is intended to be a compilation of the theory behind forecasting, covering classical statistical models (Simple Exponential Smoothing, Holt, Holt-Winters and ARIMA), ensemble trees (AdaBoost, Gradient Boosting, Random Forest, XGBoost) and neural networks (LSTM, CNN). Additionally, you can perform outlier detection, interpolation and structural change tests based on R packages such as **tsoutliers** and **strucchange**. There is the option to run the tasks in parallel; see section F Parallel Computation. Finally, to set up a virtual environment with R and Python, check out the setup instructions under the `setup` folder.
NOTE. To visualize the TeX code, activate the MathJax plugin for GitHub.
A. Components of a Time Series
A time series has a few basic components, which are helpful to understand in order to identify a forecasting method capable of capturing the patterns in the data.
- **Trend.** Long-term upward or downward changes in the level of the series (e.g. a steep upward slope or a plateauing downward slope).
- **Seasonality.** The effect of the season (measured by the calendar) on the time series.
- **Noise.** The random variation in the series. Two related concepts are:
  - **White Noise.** The variables are independent and identically distributed with mean zero. This means that all variables have the same variance ($\sigma^2$) and each value has zero correlation with every other value in the series; in other words, the series shows no autocorrelation. See [6] for more details.
  - **Random Walk.** Another time series model, where the current observation equals the previous observation plus a random step up or down. Check out [7].
- **Cycles.** Rises and falls that are not of a fixed frequency. It is important not to confuse this concept with seasonality: when the frequency is unchanging and associated with some calendar date, there is seasonality; when the fluctuations are not of a fixed frequency, they are cyclic.
B. Methods to decompose a Time Series
A time series $y_{t}$ can be decomposed as a sum ($y_{t} = S_{t} + T_{t} + R_{t}$, additive) or as a multiplication ($y_{t} = S_{t} \times T_{t} \times R_{t}$, multiplicative), where $S_{t}$ is the seasonal component, $T_{t}$ the trend-cycle component and $R_{t}$ the remainder component at time $t$.
B.1 Classical Decomposition
One strong assumption of this method is that the seasonal component is constant from year to year. The procedure for the additive decomposition is:

1. Compute the trend-cycle component $\hat{T}_{t}$ with a moving average (MA) process. If the period $m$ is an odd number use an $m$-MA, otherwise a $2\times m$-MA.
2. Detrend the series: $y_{t} - \hat{T}_{t}$.
3. Estimate the seasonal component $\hat{S}_{t}$ for each season by averaging the detrended values for that season.
4. Calculate the remainder component $\hat{R}_{t} = y_{t} - \hat{T}_{t} - \hat{S}_{t}$.
For the multiplicative decomposition the steps are similar, but now use $y_{t} / \hat{T}_{t}$ in step 2 and $\hat{R}_{t} = y_{t} / (\hat{T}_{t}\hat{S}_{t})$ to estimate the remainder. As said before, the strong assumption of this method (the seasonal component repeats every year) is not reasonable for longer time series, because the behavior of the data can change; consider, for example, the increasing consumption of mobile devices.
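Below is a minimal sketch of the additive procedure in Python using `seasonal_decompose` from **statsmodels**; the synthetic monthly series, `period=12` and the model choice are illustrative assumptions, not part of the repository.

```python
# Classical decomposition with statsmodels' seasonal_decompose.
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

# Illustrative monthly series: linear trend + yearly seasonality + noise.
rng = np.random.default_rng(0)
t = np.arange(120)
y = pd.Series(
    10 + 0.05 * t + 2 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 0.5, 120),
    index=pd.date_range("2010-01-01", periods=120, freq="MS"),
)

# model="additive" follows steps 1-4 above; model="multiplicative"
# uses the division-based variants instead.
result = seasonal_decompose(y, model="additive", period=12)
print(result.trend.dropna().head())   # T_t, the moving-average estimate
print(result.seasonal.head(12))       # S_t, one full seasonal cycle
print(result.resid.dropna().head())   # R_t = y_t - T_t - S_t
```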
B.2 STL Decomposition
STL stands for Seasonal and Trend decomposition using Loess. This method has two main advantages: it can handle any type of seasonality, and it can be made robust to outliers, so that unusual observations do not affect the estimates of the trend-cycle and seasonal components (the remainder component is, of course, affected).
To choose between a multiplicative or additive decomposition, one should assess whether the magnitude of the seasonal fluctuations varies with the level of the series: if it does, a multiplicative decomposition is more appropriate (with STL, this can be obtained by first log-transforming the data).
One can implement it in R with the `stl` function of the **stats** library, or in Python with the `STL` class of the **statsmodels** package.
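A hedged STL sketch with the `STL` class of **statsmodels**; the synthetic series, `period=12` and `robust=True` are assumed choices.

```python
# STL decomposition with statsmodels.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import STL

# Same illustrative monthly series as in the classical example above.
rng = np.random.default_rng(0)
t = np.arange(120)
y = pd.Series(
    10 + 0.05 * t + 2 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 0.5, 120),
    index=pd.date_range("2010-01-01", periods=120, freq="MS"),
)

res = STL(y, period=12, robust=True).fit()  # robust=True downweights outliers
res.plot()  # panels: observed, trend, seasonal, residual
plt.show()
```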
C. Autocorrelation
Autocorrelation measures the linear relationship between lagged values of a time series. The autocorrelation coefficient at lag $k$ is given by the formula

$$r_{k} = \frac{\sum_{t=k+1}^{T} (y_{t} - \bar{y})(y_{t-k} - \bar{y})}{\sum_{t=1}^{T} (y_{t} - \bar{y})^{2}},$$

in other words, the ratio between the autocovariance at lag $k$ and the variance of the series. These coefficients are plotted to show the autocorrelation function or ACF.
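The formula can be verified numerically; here is a small sketch computing $r_{k}$ directly and comparing it with the `acf` function of **statsmodels** (the random-walk input is an illustrative choice).

```python
# Lag-k autocorrelation from the formula above vs. statsmodels' acf.
import numpy as np
from statsmodels.tsa.stattools import acf

rng = np.random.default_rng(1)
y = rng.normal(size=200).cumsum()  # random walk: strongly autocorrelated

def r(y, k):
    ybar = y.mean()
    num = ((y[k:] - ybar) * (y[:-k] - ybar)).sum()  # autocovariance part
    den = ((y - ybar) ** 2).sum()                   # variance part
    return num / den

print(r(y, 1))             # manual r_1
print(acf(y, nlags=1)[1])  # statsmodels r_1 -- the two values match
```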
D. Interpreting the ACF and PACF plots
The ACF plot allows us to identify trend, seasonality or a mixture of both in a time series.

When the data has a trend, the autocorrelations for the first lags are large and positive (the nearer two observations are in time, the more similar their values), and they slowly decrease as the lag increases. By contrast, when the data is seasonal we will see larger values at certain lags, that is, the autocorrelation is larger at multiples of the seasonal frequency than at other lags. Finally, when data is both trended and seasonal, we are likely to see a combination of these effects; see the plots in [9] for an example of this behavior.
The partial autocorrelation measures the relationship between an observation $y_{t}$ and its lag $y_{t-k}$ after removing the effects of the intermediate lags $y_{t-1}, y_{t-2}, \dots, y_{t-k+1}$; these coefficients form the PACF plot.
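As an illustration of these patterns, a sketch drawing both plots with **statsmodels** on a simulated $AR(2)$ process (the coefficients 0.6 and 0.3 are arbitrary assumptions).

```python
# ACF and PACF plots for a simulated AR(2) process.
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

rng = np.random.default_rng(2)
y = np.zeros(300)
for t in range(2, 300):  # y_t = 0.6*y_{t-1} + 0.3*y_{t-2} + e_t
    y[t] = 0.6 * y[t - 1] + 0.3 * y[t - 2] + rng.normal()

fig, axes = plt.subplots(2, 1)
plot_acf(y, lags=24, ax=axes[0])   # decays gradually for an AR process
plot_pacf(y, lags=24, ax=axes[1])  # cuts off after lag 2 here
plt.tight_layout()
plt.show()
```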
D.1 Choosing an ARIMA model based on the ACF and PACF
The ACF and PACF are useful to determine the orders $p$ and $q$ of an ARIMA model. The data may follow an $ARIMA(p, d, 0)$ model if the ACF and PACF plots of the differenced data show the following patterns:

- The $ACF$ is exponentially decaying or sinusoidal.
- There is a significant spike at lag $p$ in the $PACF$, but none beyond lag $p$.
and an $ARIMA(0, d, q)$ model if:

- The $PACF$ is exponentially decaying or sinusoidal.
- There is a significant spike at lag $q$ in the $ACF$, but none beyond lag $q$.
Note that when both $p$ and $q$ are positive, these plots are no longer helpful to identify the orders.
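A sketch of acting on these rules: assuming the PACF of the simulated $AR(2)$ series above spikes at lags 1-2 and the ACF decays, the rules suggest an $ARIMA(2, d, 0)$; the data and order below are illustrative.

```python
# Fitting the ARIMA(p, d, 0) suggested by the ACF/PACF patterns.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(2)
y = np.zeros(300)
for t in range(2, 300):  # same AR(2) as above (already stationary, so d = 0)
    y[t] = 0.6 * y[t - 1] + 0.3 * y[t - 2] + rng.normal()

fit = ARIMA(y, order=(2, 0, 0)).fit()
print(fit.summary())  # AR coefficients should come out near 0.6 and 0.3
```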
E. Statistical tests for autocorrelation
We would like to test whether the first $h$ autocorrelations of a series (typically the residuals of a fitted model) differ significantly from those of a white noise process. The Portmanteau tests do this, setting the null hypothesis as follows:

$$H_{0}: \text{the autocorrelations up to lag } h \text{ are all zero, i.e. the series is white noise.}$$

One such test is the Ljung-Box test, with statistic

$$Q^{*} = T(T+2) \sum_{k=1}^{h} (T-k)^{-1} r_{k}^{2},$$

where $T$ is the number of observations and $r_{k}$ the autocorrelation at lag $k$; large values of $Q^{*}$ suggest that the autocorrelations do not come from a white noise series. An alternative is the Breusch-Godfrey test. Both of them are implemented in the function `checkresiduals()` of the **forecast** R package.
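In Python, a rough counterpart is `acorr_ljungbox` from **statsmodels** (this is not `checkresiduals()` itself; the stand-in residuals and `lags=[10]` are assumptions of this sketch).

```python
# Ljung-Box test on (stand-in) model residuals.
import numpy as np
from statsmodels.stats.diagnostic import acorr_ljungbox

rng = np.random.default_rng(3)
resid = rng.normal(size=200)  # stand-in for a fitted model's residuals

# Small p-values reject H0 (white noise); for pure noise we expect large ones.
print(acorr_ljungbox(resid, lags=[10]))
```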
F. Why is it important to difference a Time Series?
A time series is stationary if its properties do not depend on the time at which it is observed. But what are the properties of a time series? Basically we can consider two: its mean and its variance.

Now, how can one control the mean and the variance of a time series? The variance can be stabilized with a transformation such as the logarithm. Regarding the mean, think about what happens when there is an upward or downward trend: the value at a future time depends on the time at which it is observed, so the mean is not constant. Differencing, that is, computing the change between consecutive observations

$$y'_{t} = y_{t} - y_{t-1},$$

removes the changes in the level of the series and therefore stabilizes the mean.
The R functions `ndiffs` and `nsdiffs` of the **forecast** package let you determine the number of first differences and seasonal differences to apply, respectively. `ndiffs` works by running a sequence of KPSS tests until the series is stationary.
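In Python, the **pmdarima** package offers counterparts to these R functions; a sketch under the assumption that pmdarima is installed, using a random walk as illustrative data.

```python
# Choosing the number of first differences with pmdarima's ndiffs.
import numpy as np
import pandas as pd
from pmdarima.arima import ndiffs

rng = np.random.default_rng(4)
y = pd.Series(rng.normal(size=300).cumsum())  # random walk -> needs d = 1

d = ndiffs(y, test="kpss")  # runs a sequence of KPSS tests, as described above
print(d)                    # typically 1 for a random walk

y_diff = y.copy()
for _ in range(d):          # apply d first differences
    y_diff = y_diff.diff().dropna()
```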
G. Unit Root tests
When dealing with time series, a common task is to determine the form of the trend in the data. Unit root tests are used to decide how to deal with trending data: they determine whether the data should be first differenced or regressed on deterministic functions of time to render it stationary. For example, the ADF test allows us to determine the order of integration $k$ of a process $I(k)$ by testing the null hypothesis $H_{0}$ of non-stationarity: if we fail to reject the null, we first difference the series and run the ADF test again, repeating until the data is stationary. To this end it is crucial to specify the null and alternative hypotheses appropriately, to characterize the trend properties of the data.
G.1 Specifying the null and alternative hypotheses
The trend properties of the data under the alternative hypothesis determine the form of the test regression to perform. There are two common cases:
**Constant Trend.** This formulation is suitable for non-trending time series. Consider the test regression

$$y_{t} = c + \phi y_{t-1} + \epsilon_{t},$$

and the corresponding hypotheses are

$$H_{0}: \phi = 1 \ (y_{t} \sim I(1) \text{ without drift}) \quad \text{vs.} \quad H_{1}: |\phi| < 1 \ (y_{t} \sim I(0) \text{ with mean } c/(1-\phi)),$$

where under the null the series behaves as a random walk.
**Constant and Time Trend.**

In this case, we incorporate a deterministic time trend parameter into the test regression to capture the deterministic trend under the alternative:

$$y_{t} = c + \delta t + \phi y_{t-1} + \epsilon_{t}.$$

The corresponding hypotheses are

$$H_{0}: \phi = 1 \ (y_{t} \sim I(1) \text{ with drift}) \quad \text{vs.} \quad H_{1}: |\phi| < 1 \ (y_{t} \sim I(0) \text{ around a deterministic trend}).$$

Thus, this formulation is appropriate for trending time series.
G.2 Augmented Dickey-Fuller Test
The prior formulations consider a simple $AR(1)$ process, which cannot account for serial correlation at higher lags. The ADF test augments the test regression with $p$ lagged differences of $y_{t}$:

$$\Delta y_{t} = c + \delta t + \pi y_{t-1} + \sum_{j=1}^{p} \psi_{j} \Delta y_{t-j} + \epsilon_{t},$$

where $\epsilon_{t}$ is assumed to be homoskedastic white noise, and tests $H_{0}: \pi = 0$ (unit root) against $H_{1}: \pi < 0$.
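A sketch of the ADF test with `adfuller` from **statsmodels**; `regression="c"` corresponds to the constant-only case of G.1 (use `"ct"` for constant plus trend), and the simulated $I(1)$ series is illustrative.

```python
# Augmented Dickey-Fuller test with statsmodels.
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(5)
y = rng.normal(size=300).cumsum()  # I(1) series: H0 should NOT be rejected

stat, pvalue, usedlag, nobs, crit, icbest = adfuller(y, regression="c")
print(f"ADF statistic = {stat:.3f}, p-value = {pvalue:.3f}")
# A large p-value: cannot reject the unit-root null -> difference the series.
```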
G.3 Phillips-Perron test
The Phillips-Perron test differs from the ADF test by ignoring any serial correlation in the test regression and by allowing the errors to be heteroskedastic. The test regression is

$$\Delta y_{t} = c + \delta t + \pi y_{t-1} + u_{t},$$

and, compared to the ADF test, the error term $u_{t}$ may be serially correlated and heteroskedastic; the test statistics are corrected non-parametrically to account for these effects.
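**statsmodels** does not ship a Phillips-Perron test; the sketch below relies on the third-party **arch** package, which is an assumed dependency.

```python
# Phillips-Perron test via the arch package.
import numpy as np
from arch.unitroot import PhillipsPerron

rng = np.random.default_rng(6)
y = rng.normal(size=300).cumsum()  # illustrative I(1) series

pp = PhillipsPerron(y, trend="c")  # trend="ct" adds the time trend
print(pp.summary())                # H0: unit root, as in the ADF test
```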
H. Stationarity tests
We've seen that unit root tests take as null hypothesis that the time series $y_{t}$ is $I(1)$. Stationarity tests reverse the roles: the null hypothesis is that $y_{t}$ is $I(0)$.

First, we consider the model

$$y_{t} = c + \delta t + \mu_{t} + u_{t}, \qquad \mu_{t} = \mu_{t-1} + \varepsilon_{t}, \quad \varepsilon_{t} \sim iid(0, \sigma_{\varepsilon}^{2}),$$

where $u_{t}$ is a stationary error term and $\mu_{t}$ is a pure random walk, and the KPSS test statistic

$$KPSS = \frac{T^{-2} \sum_{t=1}^{T} \hat{S}_{t}^{2}}{\hat{\lambda}^{2}}, \qquad \hat{S}_{t} = \sum_{j=1}^{t} \hat{u}_{j},$$

for testing

$$H_{0}: \sigma_{\varepsilon}^{2} = 0 \ (y_{t} \sim I(0)) \quad \text{against} \quad H_{1}: \sigma_{\varepsilon}^{2} > 0 \ (y_{t} \sim I(1)),$$

where $\hat{u}_{t}$ are the residuals of regressing $y_{t}$ on the deterministic components and $\hat{\lambda}^{2}$ is a consistent estimate of the long-run variance of $u_{t}$.
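A sketch of the KPSS test with **statsmodels**; note the null is now stationarity, the reverse of the ADF example above (`regression="c"` and the noise series are illustrative).

```python
# KPSS stationarity test with statsmodels.
import numpy as np
from statsmodels.tsa.stattools import kpss

rng = np.random.default_rng(7)
y = rng.normal(size=300)  # stationary noise: H0 should NOT be rejected

stat, pvalue, nlags, crit = kpss(y, regression="c", nlags="auto")
print(f"KPSS statistic = {stat:.3f}, p-value = {pvalue:.3f}")
# A small p-value rejects stationarity -> consider differencing.
```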
I. Model Diagnosis
The residual is the difference between the observed value and the fitted value; in mathematical terms,

$$e_{t} = y_{t} - \hat{y}_{t}.$$
To check whether a model has captured the information in the data adequately, one should check that the residuals satisfy the following properties:
- **The residuals are uncorrelated.** When correlation is present, there is information left in the residuals which should be used in computing the forecasts.
- **The residuals have mean zero.** When the mean of the residuals is different from zero, the forecasts are biased.
- **The residuals have constant variance.**
- **The residuals are normally distributed.**
Thus, if the residuals of a model do not satisfy these properties, the model can be improved. For example, to fix the bias problem one can simply add the mean of the residuals to all forecast points.
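A sketch of checking these properties for an illustrative fit (the simulated data and the $ARIMA(1, 1, 0)$ order are assumptions; the variance and normality checks here are visual only).

```python
# Residual diagnostics for a fitted ARIMA model.
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.stats.diagnostic import acorr_ljungbox

rng = np.random.default_rng(8)
y = rng.normal(size=300).cumsum()              # illustrative I(1) series
resid = ARIMA(y, order=(1, 1, 0)).fit().resid  # residuals e_t = y_t - yhat_t

# Skip the first residual, distorted by the differencing initialization.
print("residual mean:", resid[1:].mean())    # ~0 -> unbiased forecasts
print(acorr_ljungbox(resid[1:], lags=[10]))  # H0: residuals uncorrelated

fig, axes = plt.subplots(1, 2)
axes[0].plot(resid[1:])           # eyeball constant variance over time
axes[1].hist(resid[1:], bins=30)  # eyeball approximate normality
plt.show()
```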
J. Model Comparison
The Diebold-Mariano test allows us to compare the forecast accuracy of two models. Consider the $h$-step-ahead forecast errors of the two models,

$$\epsilon_{t+h|t}^{1} = y_{t+h} - \hat{y}^{1}_{t+h|t}, \qquad \epsilon_{t+h|t}^{2} = y_{t+h} - \hat{y}^{2}_{t+h|t},$$

and a loss function $L(\cdot)$ (for example the squared error), with loss differential $d_{t} = L(\epsilon_{t+h|t}^{1}) - L(\epsilon_{t+h|t}^{2})$. To determine which model predicts better than the other, we test the null hypothesis of equal predictive accuracy

$$H_{0}: E[d_{t}] = 0$$

against the alternative

$$H_{1}: E[d_{t}] \neq 0,$$

where the test statistic is asymptotically standard normal under the null. The test is implemented in the `dm.test` function of the **forecast** package in R.
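Neither statsmodels nor scipy ships a ready-made DM test, so the sketch below hand-rolls the statistic for $h = 1$ with squared-error loss; unlike R's `dm.test`, it omits the HAC variance correction needed for $h > 1$.

```python
# Minimal Diebold-Mariano test (h = 1, squared-error loss).
import numpy as np
from scipy import stats

def dm_test(e1, e2):
    d = e1**2 - e2**2                           # loss differential d_t
    n = len(d)
    dm = d.mean() / np.sqrt(d.var(ddof=1) / n)  # t-type statistic
    pvalue = 2 * (1 - stats.norm.cdf(abs(dm)))  # H1: E[d_t] != 0
    return dm, pvalue

rng = np.random.default_rng(9)
e1 = rng.normal(0, 1.0, 200)  # forecast errors of model 1
e2 = rng.normal(0, 1.3, 200)  # model 2: noisier forecast errors
print(dm_test(e1, e2))        # negative DM statistic favours model 1
```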
To set up a virtual environment with Jupyter Notebook and the required R packages and Python libraries, see the `README.md` file under the `setup` folder.
[1] Scikit-Learn Ensemble. URL: https://scikit-learn.org/stable/modules/ensemble.html
[2] XGBoost. URL: https://xgboost.readthedocs.io/en/latest/index.html
[3] Friedman, Jerome; Hastie, Trevor; Tibshirani, Robert. The Elements of Statistical Learning. Springer, 2008.
[4] Hansen, Bruce. Advanced Time Series and Forecasting, Lecture 5: Structural Breaks. The University of Wisconsin, 2012.
[5] Rousseeuw, Peter; Leroy, Annick. Robust Regression and Outlier Detection. John Wiley & Sons, 1987.
[6] Stats StackExchange. URL: https://stats.stackexchange.com/questions/289349/why-do-we-study-the-noise-sequence-in-time-series-analysis
[7] QuantStart. URL: https://www.quantstart.com/articles/White-Noise-and-Random-Walks-in-Time-Series-Analysis/
[8] Mane, Priyanka. Python Multiprocessing: Pool vs Process – Comparative Analysis. URL: https://www.ellicium.com/python-multiprocessing-pool-process/
[9] Stats StackExchange. URL: https://stats.stackexchange.com/questions/263366/interpreting-seasonality-in-acf-and-pacf-plots