

[Figure: Johnson & Johnson Forecasts — quarterly earnings (dollars) plotted against time, 1960–1980, with forecasts]

Figure 15.10 Multiplicative exponential smoothing forecast with trend and seasonal components. The forecasts are a dashed line, and the 80% and 95% confidence intervals are provided in light and dark gray, respectively.

Because no model is specified, the software performs a search over a wide array of models to find one that optimizes the fit criterion (the log-likelihood, by default). The selected model has multiplicative trend, seasonal, and error components. The plot, along with forecasts for the next eight quarters (the default in this case), is given in figure 15.10. The flty parameter sets the line type for the forecast line (dashed in this case).
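The exact call that produced figure 15.10 isn't reproduced on this page, but a minimal sketch consistent with the text would look like the following (the axis labels and h=8 are assumptions based on the figure and the default horizon described above):

library(forecast)

# Let ets() search for the best exponential smoothing model for the
# quarterly Johnson & Johnson earnings series (no model is specified)
fit <- ets(JohnsonJohnson)

# Forecast the next eight quarters and plot the result; flty=2 draws the
# forecast line dashed, as in figure 15.10
plot(forecast(fit, h=8), main="Johnson & Johnson Forecasts",
     ylab="Quarterly Earnings (Dollars)", xlab="Time", flty=2)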

As stated earlier, exponential time-series modeling is popular because it can give good short-term forecasts in many situations. A second approach that is also popular is the Box-Jenkins methodology, commonly referred to as ARIMA models. These are described in the next section.

15.4 ARIMA forecasting models

In the autoregressive integrated moving average (ARIMA) approach to forecasting, predicted values are a linear function of recent actual values and recent errors of prediction (residuals). ARIMA is a complex approach to forecasting. In this section, we’ll limit discussion to ARIMA models for non-seasonal time series.

Before describing ARIMA models, a number of terms need to be defined, including lags, autocorrelation, partial autocorrelation, differencing, and stationarity. Each is considered in the next section.

15.4.1 Prerequisite concepts

When you lag a time series, you shift it back by a given number of observations. Consider the first few observations from the Nile time series, displayed in table 15.5. Lag 0 is the unshifted time series. Lag 1 is the time series shifted one position to the left. Lag 2 shifts the time series two positions to the left, and so on. Time series can be lagged using the function lag(ts, k), where ts is the time series and k is the number of lags.
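A quick way to reproduce the layout of table 15.5 (shown next) is to bind the series to its lagged copies. This is a minimal sketch, not code from the book:

# cbind() on ts objects takes the union of the time bases and pads with NA,
# so each row lines up a year with the series at lags 0, 1, and 2
# (the transpose of table 15.5's layout)
nile.lags <- cbind(Lag0 = Nile, Lag1 = lag(Nile, 1), Lag2 = lag(Nile, 2))
window(nile.lags, start = 1869, end = 1875)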

Table 15.5 The Nile time series at various lags

Lag   1869   1870   1871   1872   1873   1874   1875
 0                  1120   1160    963   1210   1160
 1           1120   1160    963   1210   1160   1160
 2    1120   1160    963   1210   1160   1160    813

Autocorrelation measures the way observations in a time series relate to each other. ACk is the correlation between a set of observations (Yt) and observations k periods earlier (Yt-k). So AC1 is the correlation between the Lag 1 and Lag 0 time series, AC2 is the correlation between the Lag 2 and Lag 0 time series, and so on. Plotting these correlations (AC1, AC2, …, ACk) produces an autocorrelation function (ACF) plot. The ACF plot is used to select appropriate parameters for the ARIMA model and to assess the fit of the final model.

An ACF plot can be produced with the acf() function in the stats package or the Acf() function in the forecast package. Here, the Acf() function is used because it produces a plot that is somewhat easier to read. The format is Acf(ts), where ts is the original time series. The ACF plot for the Nile time series, with k=1 to 18, is provided a little later, in the top half of figure 15.12.

A partial autocorrelation is the correlation between Yt and Yt-k with the effects of all Y values between the two (Yt-1, Yt-2, …, Yt-k+1) removed. Partial autocorrelations can also be plotted for multiple values of k. The PACF plot can be generated with either the pacf() function in the stats package or the Pacf() function in the forecast package. Again, the Pacf() function is preferred due to its formatting. The function call is Pacf(ts), where ts is the time series to be assessed. The PACF plot is also used to determine the most appropriate parameters for the ARIMA model. The results for the Nile time series are given in the bottom half of figure 15.12.
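For example, both plots for a series can be produced and stacked on a single page with a few lines like these (a sketch; the book's own calls for the differenced Nile series appear a little later):

library(forecast)

# Stack the ACF and PACF plots on one page
oldpar <- par(mfrow = c(2, 1))
Acf(Nile)
Pacf(Nile)
par(oldpar)      # restore the previous plotting layout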

ARIMA models are designed to fit stationary time series (or time series that can be made stationary). In a stationary time series, the statistical properties of the series don’t change over time. For example, the mean and variance of Yt are constant. Additionally, the autocorrelations for any lag k don’t change with time.

It may be necessary to transform the values of a time series in order to achieve constant variance before fitting an ARIMA model. The log transformation is often useful here, as you saw in section 15.1.3. Other transformations, such as the Box-Cox transformation described in section 8.5.2, may also be helpful.
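As an illustration of that transformation step (a sketch using the AirPassengers series rather than the Nile example), the log and Box-Cox transformations are available directly in base R and the forecast package:

library(forecast)

# Log transformation to stabilize a variance that grows with the level
lAirPassengers <- log(AirPassengers)

# Alternatively, estimate a Box-Cox parameter and apply the transformation
lambda <- BoxCox.lambda(AirPassengers)
tAirPassengers <- BoxCox(AirPassengers, lambda)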

Because stationary time series are assumed to have constant means, they can't have a trend component. Many non-stationary time series can be made stationary through differencing. In differencing, each value of a time series Yt is replaced with Yt – Yt-1 (the difference between the current value and the preceding one). Differencing a time series once removes a linear trend. Differencing it a second time removes a quadratic trend. A third time removes a cubic trend. It's rarely necessary to difference more than twice.

You can difference a time series with the diff() function. The format is diff(ts, differences=d), where d indicates the number of times the time series ts is differenced. The default is d=1. The ndiffs() function in the forecast package can be used to help determine the best value of d. The format is ndiffs(ts).
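A small sketch (not from the book) shows how differencing strips out polynomial trends:

library(forecast)

# A toy series with a quadratic trend plus noise
set.seed(1234)
y <- ts(0.5 * (1:60)^2 + rnorm(60, sd = 40))

ndiffs(y)                       # suggests how many differences are needed
d1 <- diff(y)                   # one difference removes a linear trend
d2 <- diff(y, differences = 2)  # two differences remove the quadratic trend
plot(d2)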

Stationarity is often evaluated with a visual inspection of a time-series plot. If the variance isn’t constant, the data are transformed. If there are trends, the data are differenced. You can also use a statistical procedure called the Augmented Dickey-Fuller (ADF) test to evaluate the assumption of stationarity. In R, the function adf.test() in the tseries package performs the test. The format is adf.test(ts), where ts is the time series to be evaluated. A significant result suggests stationarity.

To summarize, ACF and PACF plots are used to determine the parameters of ARIMA models. Stationarity is an important assumption, and transformations and differencing are used to help achieve stationarity. With these concepts in hand, we can now turn to fitting models with an autoregressive (AR) component, a moving average (MA) component, or both components (ARMA). Finally, we'll examine ARIMA models that include ARMA components and differencing to achieve stationarity (integration).

15.4.2 ARMA and ARIMA models

In an autoregressive model of order p, each value in a time series is predicted from a linear combination of the previous p values

AR(p): $Y_t = \mu + \beta_1 Y_{t-1} + \beta_2 Y_{t-2} + \cdots + \beta_p Y_{t-p} + \varepsilon_t$

where Yt is a given value of the series, μ is the mean of the series, the βs are the weights, and εt is the irregular component. In a moving average model of order q, each value in the time series is predicted from a linear combination of q previous errors. In this case

MA(q): $Y_t = \mu - \theta_1 \varepsilon_{t-1} - \theta_2 \varepsilon_{t-2} - \cdots - \theta_q \varepsilon_{t-q} + \varepsilon_t$

where the εs are the errors of prediction and the θs are the weights. (It's important to note that the moving averages described here aren't the simple moving averages described in section 15.1.2.)

Combining the two approaches yields an ARMA(p, q) model of the form

$Y_t = \mu + \beta_1 Y_{t-1} + \beta_2 Y_{t-2} + \cdots + \beta_p Y_{t-p} - \theta_1 \varepsilon_{t-1} - \theta_2 \varepsilon_{t-2} - \cdots - \theta_q \varepsilon_{t-q} + \varepsilon_t$

that predicts each value of the time series from the past p values and q residuals.
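To make the ARMA idea concrete, here is a small sketch (not from the book) that simulates an ARMA(1,1) series with arima.sim() from the stats package and then recovers its parameters with arima():

set.seed(42)

# Simulate 500 observations from an ARMA(1,1) process:
# Y_t = 0.6*Y_{t-1} - 0.4*e_{t-1} + e_t  (in R's sign convention, ma = -0.4)
y <- arima.sim(model = list(ar = 0.6, ma = -0.4), n = 500)

# Fit an ARMA(1,1), i.e. ARIMA(1,0,1); the estimated ar1 and ma1
# coefficients should land close to the values used in the simulation
fit <- arima(y, order = c(1, 0, 1))
fit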

An ARIMA(p, d, q) model is a model in which the time series has been differenced d times, and the resulting values are predicted from the previous p actual values and q previous errors. The predictions are "un-differenced" or integrated to achieve the final prediction.

The steps in ARIMA modeling are as follows:

1. Ensure that the time series is stationary.
2. Identify a reasonable model or models (possible values of p and q).
3. Fit the model.
4. Evaluate the model's fit, including statistical assumptions and predictive accuracy.
5. Make forecasts.

Let’s apply each step in turn to fit an ARIMA model to the Nile time series.

ENSURING THAT THE TIME SERIES IS STATIONARY

First you plot the time series and assess its stationarity (see listing 15.7 and the top half of figure 15.11). The variance appears to be stable across the years observed, so there’s no need for a transformation. There may be a trend, which is supported by the results of the ndiffs() function.

 

[Figure: two stacked time-series plots — Nile (top) and diff(Nile) (bottom), plotted against Time, 1880–1960]

Figure 15.11 Time series displaying the annual flow of the river Nile at Aswan from 1871 to 1970 (top), along with the time series differenced once (bottom). The differencing removes the decreasing trend evident in the original plot.


Listing 15.7 Transforming the time series and assessing stationarity

> library(forecast)
> library(tseries)
> plot(Nile)
> ndiffs(Nile)
[1] 1
> dNile <- diff(Nile)
> plot(dNile)
> adf.test(dNile)

        Augmented Dickey-Fuller Test

data:  dNile
Dickey-Fuller = -6.5924, Lag order = 4, p-value = 0.01
alternative hypothesis: stationary

The series is differenced once (lag=1 is the default) and saved as dNile. The differenced time series is plotted in the bottom half of figure 15.11 and certainly looks more stationary. Applying the ADF test to the differenced series suggests that it's now stationary, so you can proceed to the next step.

IDENTIFYING ONE OR MORE REASONABLE MODELS

Possible models are selected based on the ACF and PACF plots:

Acf(dNile)

Pacf(dNile)

The resulting plots are given in figure 15.12.

Figure 15.12 Autocorrelation and partial autocorrelation plots for the differenced Nile time series

[Figure: ACF (top) and Partial ACF (bottom) plots of the differenced series, for lags 1–18]


The goal is to identify the parameters p, d, and q. You already know that d=1 from the previous section. You get p and q by comparing the ACF and PACF plots with the guidelines given in table 15.6.

Table 15.6 Guidelines for selecting an ARIMA model

Model            ACF                  PACF
ARIMA(p, d, 0)   Trails off to zero   Zero after lag p
ARIMA(0, d, q)   Zero after lag q     Trails off to zero
ARIMA(p, d, q)   Trails off to zero   Trails off to zero

The results in table 15.6 are theoretical, and the actual ACF and PACF may not match this exactly. But they can be used to give a rough guide of reasonable models to try. For the Nile time series in figure 15.12, there appears to be one large autocorrelation at lag 1, and the partial autocorrelations trail off to zero as the lags get bigger. This suggests trying an ARIMA(0, 1, 1) model.
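As a cross-check on this manual identification, the forecast package can also search for a model automatically. A one-line sketch (the automated search optimizes an information criterion and may settle on a different order than the one read off the plots):

library(forecast)

# Search over candidate (p, d, q) values for the Nile series
auto.arima(Nile)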

FITTING THE MODEL(S)

The ARIMA model is fit with the arima() function. The format is arima(ts, order=c(p, d, q)). The result of fitting an ARIMA(0, 1, 1) model to the Nile time series is given in the following listing.

Listing 15.8 Fitting an ARIMA model

> library(forecast)
> fit <- arima(Nile, order=c(0,1,1))
> fit

Series: Nile
ARIMA(0,1,1)

Coefficients:
          ma1
      -0.7329
s.e.   0.1143

sigma^2 estimated as 20600:  log likelihood=-632.55
AIC=1269.09   AICc=1269.22   BIC=1274.28

> accuracy(fit)
                 ME  RMSE   MAE    MPE  MAPE   MASE
Training set -11.94 142.8 112.2 -3.575 12.94 0.8089

Note that you apply the model to the original time series. By specifying d=1, it calculates first differences for you. The coefficient for the moving-average term (-0.73) is provided along with the AIC. If you fit other models, the AIC can help you choose which one is most reasonable. Smaller AIC values suggest better models. The accuracy measures can help you determine whether the model fits with sufficient accuracy. Here the mean absolute percentage error is 13% of the river level.
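For instance, a competing order can be fit and compared in a couple of extra lines. This is a sketch, not part of the book's listings; the alternative order is purely illustrative, and fit refers to the model from listing 15.8:

library(forecast)

# Fit an alternative candidate and compare information criteria;
# smaller AIC values indicate a better trade-off of fit and complexity
fit2 <- arima(Nile, order = c(1, 1, 1))
AIC(fit, fit2)

# Training-set accuracy measures for the alternative model
accuracy(fit2)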

EVALUATING MODEL FIT

If the model is appropriate, the residuals should be normally distributed with mean zero, and the autocorrelations should be zero for every possible lag. In other words, the residuals should be normally and independently distributed (no relationship between them). The assumptions can be evaluated with the following code.

Listing 15.9 Evaluating the model fit

> qqnorm(fit$residuals)
> qqline(fit$residuals)
> Box.test(fit$residuals, type="Ljung-Box")

        Box-Ljung test

data:  fit$residuals
X-squared = 1.3711, df = 1, p-value = 0.2416

The qqnorm() and qqline() functions produce the plot in figure 15.13. Normally distributed data should fall along the line. In this case, the results look good.

The Box.test() function provides a test that the autocorrelations are all zero. The results aren’t significant, suggesting that the autocorrelations don’t differ from zero. This ARIMA model appears to fit the data well.
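In recent versions of the forecast package, much of this checking can be bundled into a single call; a sketch, assuming a version that provides checkresiduals():

library(forecast)

# Plots the residuals, their ACF, and a histogram, and prints a Ljung-Box test
checkresiduals(fit)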

MAKING FORECASTS

If the model hadn’t met the assumptions of normal residuals and zero autocorrelations, it would have been necessary to alter the model, add parameters, or try a different approach. Once a final model has been chosen, it can be used to make predictions of future values. In the next listing, the forecast() function from the forecast package is used to predict three years ahead.

[Figure: normal Q-Q plot of the model residuals — Sample Quantiles versus Theoretical Quantiles]

Figure 15.13 Normal Q-Q plot for determining the normality of the time-series residuals
