- brief contents
- contents
- preface
- acknowledgments
- about this book
- What’s new in the second edition
- Who should read this book
- Roadmap
- Advice for data miners
- Code examples
- Code conventions
- Author Online
- About the author
- about the cover illustration
- 1 Introduction to R
- 1.2 Obtaining and installing R
- 1.3 Working with R
- 1.3.1 Getting started
- 1.3.2 Getting help
- 1.3.3 The workspace
- 1.3.4 Input and output
- 1.4 Packages
- 1.4.1 What are packages?
- 1.4.2 Installing a package
- 1.4.3 Loading a package
- 1.4.4 Learning about a package
- 1.5 Batch processing
- 1.6 Using output as input: reusing results
- 1.7 Working with large datasets
- 1.8 Working through an example
- 1.9 Summary
- 2 Creating a dataset
- 2.1 Understanding datasets
- 2.2 Data structures
- 2.2.1 Vectors
- 2.2.2 Matrices
- 2.2.3 Arrays
- 2.2.4 Data frames
- 2.2.5 Factors
- 2.2.6 Lists
- 2.3 Data input
- 2.3.1 Entering data from the keyboard
- 2.3.2 Importing data from a delimited text file
- 2.3.3 Importing data from Excel
- 2.3.4 Importing data from XML
- 2.3.5 Importing data from the web
- 2.3.6 Importing data from SPSS
- 2.3.7 Importing data from SAS
- 2.3.8 Importing data from Stata
- 2.3.9 Importing data from NetCDF
- 2.3.10 Importing data from HDF5
- 2.3.11 Accessing database management systems (DBMSs)
- 2.3.12 Importing data via Stat/Transfer
- 2.4 Annotating datasets
- 2.4.1 Variable labels
- 2.4.2 Value labels
- 2.5 Useful functions for working with data objects
- 2.6 Summary
- 3 Getting started with graphs
- 3.1 Working with graphs
- 3.2 A simple example
- 3.3 Graphical parameters
- 3.3.1 Symbols and lines
- 3.3.2 Colors
- 3.3.3 Text characteristics
- 3.3.4 Graph and margin dimensions
- 3.4 Adding text, customized axes, and legends
- 3.4.1 Titles
- 3.4.2 Axes
- 3.4.3 Reference lines
- 3.4.4 Legend
- 3.4.5 Text annotations
- 3.4.6 Math annotations
- 3.5 Combining graphs
- 3.5.1 Creating a figure arrangement with fine control
- 3.6 Summary
- 4 Basic data management
- 4.1 A working example
- 4.2 Creating new variables
- 4.3 Recoding variables
- 4.4 Renaming variables
- 4.5 Missing values
- 4.5.1 Recoding values to missing
- 4.5.2 Excluding missing values from analyses
- 4.6 Date values
- 4.6.1 Converting dates to character variables
- 4.6.2 Going further
- 4.7 Type conversions
- 4.8 Sorting data
- 4.9 Merging datasets
- 4.9.1 Adding columns to a data frame
- 4.9.2 Adding rows to a data frame
- 4.10 Subsetting datasets
- 4.10.1 Selecting (keeping) variables
- 4.10.2 Excluding (dropping) variables
- 4.10.3 Selecting observations
- 4.10.4 The subset() function
- 4.10.5 Random samples
- 4.11 Using SQL statements to manipulate data frames
- 4.12 Summary
- 5 Advanced data management
- 5.2 Numerical and character functions
- 5.2.1 Mathematical functions
- 5.2.2 Statistical functions
- 5.2.3 Probability functions
- 5.2.4 Character functions
- 5.2.5 Other useful functions
- 5.2.6 Applying functions to matrices and data frames
- 5.3 A solution for the data-management challenge
- 5.4 Control flow
- 5.4.1 Repetition and looping
- 5.4.2 Conditional execution
- 5.5 User-written functions
- 5.6 Aggregation and reshaping
- 5.6.1 Transpose
- 5.6.2 Aggregating data
- 5.6.3 The reshape2 package
- 5.7 Summary
- 6 Basic graphs
- 6.1 Bar plots
- 6.1.1 Simple bar plots
- 6.1.2 Stacked and grouped bar plots
- 6.1.3 Mean bar plots
- 6.1.4 Tweaking bar plots
- 6.1.5 Spinograms
- 6.2 Pie charts
- 6.3 Histograms
- 6.4 Kernel density plots
- 6.5 Box plots
- 6.5.1 Using parallel box plots to compare groups
- 6.5.2 Violin plots
- 6.6 Dot plots
- 6.7 Summary
- 7 Basic statistics
- 7.1 Descriptive statistics
- 7.1.1 A menagerie of methods
- 7.1.2 Even more methods
- 7.1.3 Descriptive statistics by group
- 7.1.4 Additional methods by group
- 7.1.5 Visualizing results
- 7.2 Frequency and contingency tables
- 7.2.1 Generating frequency tables
- 7.2.2 Tests of independence
- 7.2.3 Measures of association
- 7.2.4 Visualizing results
- 7.3 Correlations
- 7.3.1 Types of correlations
- 7.3.2 Testing correlations for significance
- 7.3.3 Visualizing correlations
- 7.4 T-tests
- 7.4.3 When there are more than two groups
- 7.5 Nonparametric tests of group differences
- 7.5.1 Comparing two groups
- 7.5.2 Comparing more than two groups
- 7.6 Visualizing group differences
- 7.7 Summary
- 8 Regression
- 8.1 The many faces of regression
- 8.1.1 Scenarios for using OLS regression
- 8.1.2 What you need to know
- 8.2 OLS regression
- 8.2.1 Fitting regression models with lm()
- 8.2.2 Simple linear regression
- 8.2.3 Polynomial regression
- 8.2.4 Multiple linear regression
- 8.2.5 Multiple linear regression with interactions
- 8.3 Regression diagnostics
- 8.3.1 A typical approach
- 8.3.2 An enhanced approach
- 8.3.3 Global validation of linear model assumption
- 8.3.4 Multicollinearity
- 8.4 Unusual observations
- 8.4.1 Outliers
- 8.4.3 Influential observations
- 8.5 Corrective measures
- 8.5.1 Deleting observations
- 8.5.2 Transforming variables
- 8.5.3 Adding or deleting variables
- 8.5.4 Trying a different approach
- 8.6 Selecting the “best” regression model
- 8.6.1 Comparing models
- 8.6.2 Variable selection
- 8.7 Taking the analysis further
- 8.7.1 Cross-validation
- 8.7.2 Relative importance
- 8.8 Summary
- 9 Analysis of variance
- 9.1 A crash course on terminology
- 9.2 Fitting ANOVA models
- 9.2.1 The aov() function
- 9.2.2 The order of formula terms
- 9.3.1 Multiple comparisons
- 9.3.2 Assessing test assumptions
- 9.4 One-way ANCOVA
- 9.4.1 Assessing test assumptions
- 9.4.2 Visualizing the results
- 9.6 Repeated measures ANOVA
- 9.7 Multivariate analysis of variance (MANOVA)
- 9.7.1 Assessing test assumptions
- 9.7.2 Robust MANOVA
- 9.8 ANOVA as regression
- 9.9 Summary
- 10 Power analysis
- 10.1 A quick review of hypothesis testing
- 10.2 Implementing power analysis with the pwr package
- 10.2.1 t-tests
- 10.2.2 ANOVA
- 10.2.3 Correlations
- 10.2.4 Linear models
- 10.2.5 Tests of proportions
- 10.2.7 Choosing an appropriate effect size in novel situations
- 10.3 Creating power analysis plots
- 10.4 Other packages
- 10.5 Summary
- 11 Intermediate graphs
- 11.1 Scatter plots
- 11.1.3 3D scatter plots
- 11.1.4 Spinning 3D scatter plots
- 11.1.5 Bubble plots
- 11.2 Line charts
- 11.3 Corrgrams
- 11.4 Mosaic plots
- 11.5 Summary
- 12 Resampling statistics and bootstrapping
- 12.1 Permutation tests
- 12.2 Permutation tests with the coin package
- 12.2.2 Independence in contingency tables
- 12.2.3 Independence between numeric variables
- 12.2.5 Going further
- 12.3 Permutation tests with the lmPerm package
- 12.3.1 Simple and polynomial regression
- 12.3.2 Multiple regression
- 12.4 Additional comments on permutation tests
- 12.5 Bootstrapping
- 12.6 Bootstrapping with the boot package
- 12.6.1 Bootstrapping a single statistic
- 12.6.2 Bootstrapping several statistics
- 12.7 Summary
- 13 Generalized linear models
- 13.1 Generalized linear models and the glm() function
- 13.1.1 The glm() function
- 13.1.2 Supporting functions
- 13.1.3 Model fit and regression diagnostics
- 13.2 Logistic regression
- 13.2.1 Interpreting the model parameters
- 13.2.2 Assessing the impact of predictors on the probability of an outcome
- 13.2.3 Overdispersion
- 13.2.4 Extensions
- 13.3 Poisson regression
- 13.3.1 Interpreting the model parameters
- 13.3.2 Overdispersion
- 13.3.3 Extensions
- 13.4 Summary
- 14 Principal components and factor analysis
- 14.1 Principal components and factor analysis in R
- 14.2 Principal components
- 14.2.1 Selecting the number of components to extract
- 14.2.2 Extracting principal components
- 14.2.3 Rotating principal components
- 14.2.4 Obtaining principal components scores
- 14.3 Exploratory factor analysis
- 14.3.1 Deciding how many common factors to extract
- 14.3.2 Extracting common factors
- 14.3.3 Rotating factors
- 14.3.4 Factor scores
- 14.4 Other latent variable models
- 14.5 Summary
- 15 Time series
- 15.1 Creating a time-series object in R
- 15.2 Smoothing and seasonal decomposition
- 15.2.1 Smoothing with simple moving averages
- 15.2.2 Seasonal decomposition
- 15.3 Exponential forecasting models
- 15.3.1 Simple exponential smoothing
- 15.3.3 The ets() function and automated forecasting
- 15.4 ARIMA forecasting models
- 15.4.1 Prerequisite concepts
- 15.4.2 ARMA and ARIMA models
- 15.4.3 Automated ARIMA forecasting
- 15.5 Going further
- 15.6 Summary
- 16 Cluster analysis
- 16.1 Common steps in cluster analysis
- 16.2 Calculating distances
- 16.3 Hierarchical cluster analysis
- 16.4 Partitioning cluster analysis
- 16.4.2 Partitioning around medoids
- 16.5 Avoiding nonexistent clusters
- 16.6 Summary
- 17 Classification
- 17.1 Preparing the data
- 17.2 Logistic regression
- 17.3 Decision trees
- 17.3.1 Classical decision trees
- 17.3.2 Conditional inference trees
- 17.4 Random forests
- 17.5 Support vector machines
- 17.5.1 Tuning an SVM
- 17.6 Choosing a best predictive solution
- 17.7 Using the rattle package for data mining
- 17.8 Summary
- 18 Advanced methods for missing data
- 18.1 Steps in dealing with missing data
- 18.2 Identifying missing values
- 18.3 Exploring missing-values patterns
- 18.3.1 Tabulating missing values
- 18.3.2 Exploring missing data visually
- 18.3.3 Using correlations to explore missing values
- 18.4 Understanding the sources and impact of missing data
- 18.5 Rational approaches for dealing with incomplete data
- 18.6 Complete-case analysis (listwise deletion)
- 18.7 Multiple imputation
- 18.8 Other approaches to missing data
- 18.8.1 Pairwise deletion
- 18.8.2 Simple (nonstochastic) imputation
- 18.9 Summary
- 19 Advanced graphics with ggplot2
- 19.1 The four graphics systems in R
- 19.2 An introduction to the ggplot2 package
- 19.3 Specifying the plot type with geoms
- 19.4 Grouping
- 19.5 Faceting
- 19.6 Adding smoothed lines
- 19.7 Modifying the appearance of ggplot2 graphs
- 19.7.1 Axes
- 19.7.2 Legends
- 19.7.3 Scales
- 19.7.4 Themes
- 19.7.5 Multiple graphs per page
- 19.8 Saving graphs
- 19.9 Summary
- 20 Advanced programming
- 20.1 A review of the language
- 20.1.1 Data types
- 20.1.2 Control structures
- 20.1.3 Creating functions
- 20.2 Working with environments
- 20.3 Object-oriented programming
- 20.3.1 Generic functions
- 20.3.2 Limitations of the S3 model
- 20.4 Writing efficient code
- 20.5 Debugging
- 20.5.1 Common sources of errors
- 20.5.2 Debugging tools
- 20.5.3 Session options that support debugging
- 20.6 Going further
- 20.7 Summary
- 21 Creating a package
- 21.1 Nonparametric analysis and the npar package
- 21.1.1 Comparing groups with the npar package
- 21.2 Developing the package
- 21.2.1 Computing the statistics
- 21.2.2 Printing the results
- 21.2.3 Summarizing the results
- 21.2.4 Plotting the results
- 21.2.5 Adding sample data to the package
- 21.3 Creating the package documentation
- 21.4 Building the package
- 21.5 Going further
- 21.6 Summary
- 22 Creating dynamic reports
- 22.1 A template approach to reports
- 22.2 Creating dynamic reports with R and Markdown
- 22.3 Creating dynamic reports with R and LaTeX
- 22.4 Creating dynamic reports with R and Open Document
- 22.5 Creating dynamic reports with R and Microsoft Word
- 22.6 Summary
- afterword Into the rabbit hole
- appendix A Graphical user interfaces
- appendix B Customizing the startup environment
- appendix C Exporting data from R
- Delimited text file
- Excel spreadsheet
- Statistical applications
- appendix D Matrix algebra in R
- appendix E Packages used in this book
- appendix F Working with large datasets
- F.1 Efficient programming
- F.2 Storing data outside of RAM
- F.3 Analytic packages for out-of-memory data
- F.4 Comprehensive solutions for working with enormous datasets
- appendix G Updating an R installation
- G.1 Automated installation (Windows only)
- G.2 Manual installation (Windows and Mac OS X)
- G.3 Updating an R installation (Linux)
- references
- index
- Symbols
- Numerics
- 23.1 The lattice package
- 23.2 Conditioning variables
- 23.3 Panel functions
- 23.4 Grouping variables
- 23.5 Graphic parameters
- 23.6 Customizing plot strips
- 23.7 Page arrangement
- 23.8 Going further
[Figure 15.10 appears here: a forecast plot titled “Johnson & Johnson Forecasts,” with Quarterly Earnings (Dollars) on the y-axis (0–25) and Time on the x-axis (1960–1980).]
Figure 15.10 Multiplicative exponential smoothing forecast with trend and seasonal components. The forecasts are a dashed line, and the 80% and 95% confidence intervals are provided in light and dark gray, respectively.
Because no model is specified, the software performs a search over a wide array of models to find one that minimizes the fit criterion (log-likelihood by default). The selected model is one that has multiplicative trend, seasonal, and error components. The plot, along with forecasts for the next eight quarters (the default in this case), is given in figure 15.10. The flty parameter sets the line type for the forecast line (dashed in this case).
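The listing that produces this figure precedes this excerpt; as a rough sketch (not the book’s exact code), a fit and plot of this kind could be produced as follows, where the flty=2 setting for the dashed forecast line is an assumption:

library(forecast)
fitJJ <- ets(JohnsonJohnson)       # search over candidate exponential models
plot(forecast(fitJJ, 8), flty=2)   # forecast 8 quarters ahead; dashed forecast line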
As stated earlier, exponential time-series modeling is popular because it can give good short-term forecasts in many situations. A second approach that is also popular is the Box-Jenkins methodology, commonly referred to as ARIMA models. These are described in the next section.
15.4 ARIMA forecasting models
In the autoregressive integrated moving average (ARIMA) approach to forecasting, predicted values are a linear function of recent actual values and recent errors of prediction (residuals). ARIMA is a complex approach to forecasting. In this section, we’ll limit discussion to ARIMA models for non-seasonal time series.
Before describing ARIMA models, a number of terms need to be defined, including lags, autocorrelation, partial autocorrelation, differencing, and stationarity. Each is considered in the next section.
15.4.1 Prerequisite concepts
When you lag a time series, you shift it back by a given number of observations. Consider the first few observations from the Nile time series, displayed in table 15.5. Lag 0 is the unshifted time series. Lag 1 is the time series shifted one position to the left. Lag 2 shifts the time series two positions to the left, and so on. Time series can be lagged using the function lag(ts, k), where ts is the time series and k is the number of lags.
Table 15.5 The Nile time series at various lags

| Lag | 1869 | 1870 | 1871 | 1872 | 1873 | 1874 | 1875 | … |
|-----|------|------|------|------|------|------|------|---|
| 0   |      |      | 1120 | 1160 | 963  | 1210 | 1160 | … |
| 1   |      | 1120 | 1160 | 963  | 1210 | 1160 | 1160 | … |
| 2   | 1120 | 1160 | 963  | 1210 | 1160 | 1160 | 813  | … |
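You can reproduce this alignment in R; the following sketch (not a book listing) uses lag() and cbind() to line up the series at lags 0 through 2:

lagged <- cbind(Lag0=Nile, Lag1=lag(Nile, 1), Lag2=lag(Nile, 2))  # align the three lagged series by time
head(lagged)                                                      # compare the rows with table 15.5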
Autocorrelation measures the way observations in a time series relate to each other. ACk is the correlation between a set of observations (Yt) and observations k periods earlier (Yt-k). So AC1 is the correlation between the Lag 1 and Lag 0 time series, AC2 is the correlation between the Lag 2 and Lag 0 time series, and so on. Plotting these correlations (AC1, AC2, …, ACk) produces an autocorrelation function (ACF) plot. The ACF plot is used to select appropriate parameters for the ARIMA model and to assess the fit of the final model.
An ACF plot can be produced with the acf() function in the stats package or the Acf() function in the forecast package. Here, the Acf() function is used because it produces a plot that is somewhat easier to read. The format is Acf(ts), where ts is the original time series. The ACF plot for the Nile time series, with k=1 to 18, is provided a little later, in the top half of figure 15.12.
A partial autocorrelation is the correlation between Yt and Yt-k with the effects of all Y values between the two (Yt-1, Yt-2, …, Yt-k+1) removed. Partial autocorrelations can also be plotted for multiple values of k. The PACF plot can be generated with either the pacf() function in the stats package or the Pacf() function in the forecast package. Again, the Pacf() function is preferred due to its formatting. The function call is Pacf(ts), where ts is the time series to be assessed. The PACF plot is also used to determine the most appropriate parameters for the ARIMA model. The results for the Nile time series are given in the bottom half of figure 15.12.
ARIMA models are designed to fit stationary time series (or time series that can be made stationary). In a stationary time series, the statistical properties of the series don’t change over time. For example, the mean and variance of Yt are constant. Additionally, the autocorrelations for any lag k don’t change with time.
It may be necessary to transform the values of a time series to achieve constant variance before fitting an ARIMA model. The log transformation is often useful here, as you saw in section 15.1.3. Other transformations, such as the Box-Cox transformation described in section 8.5.2, may also be helpful.
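As an aside (not from the book’s text), the forecast package provides BoxCox.lambda() and BoxCox() for exactly this purpose; the AirPassengers series is used here only as a familiar example of nonconstant variance:

library(forecast)
lambda <- BoxCox.lambda(AirPassengers)   # estimate a suitable transformation parameter
plot(BoxCox(AirPassengers, lambda))      # plot the variance-stabilized series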
Because stationary time series are assumed to have constant means, they can’t have a trend component. Many non-stationary time series can be made stationary through differencing. In differencing, each value of a time series Yt is replaced with Yt − Yt−1. Differencing a time series once removes a linear trend. Differencing it a second time removes a quadratic trend. A third time removes a cubic trend. It’s rarely necessary to difference more than twice.
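A quick check on the first few Nile values makes this concrete (the outputs are shown as comments):

head(Nile)         # 1120 1160  963 1210 1160 1160
head(diff(Nile))   # 40 -197 247 -50 0 -347  (e.g., 1160 - 1120 = 40)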
You can difference a time series with the diff() function. The format is diff(ts, differences=d), where d indicates the number of times the time series ts is differenced. The default is d=1. The ndiffs() function in the forecast package can be used to help determine the best value of d. The format is ndiffs(ts).
Stationarity is often evaluated with a visual inspection of a time-series plot. If the variance isn’t constant, the data are transformed. If there are trends, the data are differenced. You can also use a statistical procedure called the Augmented Dickey-Fuller (ADF) test to evaluate the assumption of stationarity. In R, the function adf.test() in the tseries package performs the test. The format is adf.test(ts), where ts is the time series to be evaluated. A significant result suggests stationarity.
To summarize, ACF and PACF plots are used to determine the parameters of ARIMA models. Stationarity is an important assumption, and transformations and differencing are used to help achieve stationarity. With these concepts in hand, we can now turn to fitting models with an autoregressive (AR) component, a moving averages (MA) component, or both components (ARMA). Finally, we’ll examine ARIMA models that include ARMA components and differencing to achieve stationarity (Integration).
15.4.2 ARMA and ARIMA models
In an autoregressive model of order p, each value in a time series is predicted from a linear combination of the previous p values
$$\text{AR}(p): \quad Y_t = \mu + \beta_1 Y_{t-1} + \beta_2 Y_{t-2} + \cdots + \beta_p Y_{t-p} + \varepsilon_t$$
where Yt is a given value of the series, μ is the mean of the series, the βs are the weights, and εt is the irregular component. In a moving average model of order q, each value in the time series is predicted from a linear combination of q previous errors. In this case

$$\text{MA}(q): \quad Y_t = \mu - \theta_1 \varepsilon_{t-1} - \theta_2 \varepsilon_{t-2} - \cdots - \theta_q \varepsilon_{t-q} + \varepsilon_t$$
where the εs are the errors of prediction and the θs are the weights. (It’s important to note that the moving averages described here aren’t the simple moving averages described in section 15.1.2.)
Combining the two approaches yields an ARMA(p, q) model of the form
$$Y_t = \mu + \beta_1 Y_{t-1} + \beta_2 Y_{t-2} + \cdots + \beta_p Y_{t-p} - \theta_1 \varepsilon_{t-1} - \theta_2 \varepsilon_{t-2} - \cdots - \theta_q \varepsilon_{t-q} + \varepsilon_t$$
that predicts each value of the time series from the past p values and q residuals.
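To build intuition for what such processes look like, you can simulate one with the base arima.sim() function; this sketch (not from the book) draws 200 observations from a hypothetical ARMA(1, 1) process with illustrative coefficients:

set.seed(1234)                                      # make the simulated draw reproducible
y <- arima.sim(model=list(ar=0.8, ma=-0.4), n=200)  # simulate an ARMA(1,1) process
plot(y)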
An ARIMA(p, d, q) model is a model in which the time series has been differenced d times, and the resulting values are predicted from the previous p actual values and q previous errors. The predictions are “un-differenced” or integrated to achieve the final prediction.
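One way to see the integration step (a sketch, not a book listing): fitting an ARIMA(0, 1, 1) model to a series should yield essentially the same moving-average coefficient as fitting an MA(1) model without a mean to the manually differenced series.

arima(Nile, order=c(0, 1, 1))                            # arima() differences internally, then fits MA(1)
arima(diff(Nile), order=c(0, 0, 1), include.mean=FALSE)  # difference by hand, then fit MA(1)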
The steps in ARIMA modeling are as follows:

1. Ensure that the time series is stationary.
2. Identify a reasonable model or models (possible values of p and q).
3. Fit the model.
4. Evaluate the model’s fit, including statistical assumptions and predictive accuracy.
5. Make forecasts.
Let’s apply each step in turn to fit an ARIMA model to the Nile time series.
ENSURING THAT THE TIME SERIES IS STATIONARY
First you plot the time series and assess its stationarity (see listing 15.7 and the top half of figure 15.11). The variance appears to be stable across the years observed, so there’s no need for a transformation. There may be a trend, which is supported by the results of the ndiffs() function.
Figure 15.11 Time series displaying the annual flow of the river Nile at Aswan from 1871 to 1970 (top), along with the time series differenced once (bottom). The differencing removes the decreasing trend evident in the original plot.
Listing 15.7 Transforming the time series and assessing stationarity

> library(forecast)
> library(tseries)
> plot(Nile)
> ndiffs(Nile)
[1] 1
> dNile <- diff(Nile)
> plot(dNile)
> adf.test(dNile)

	Augmented Dickey-Fuller Test

data:  dNile
Dickey-Fuller = -6.5924, Lag order = 4, p-value = 0.01
alternative hypothesis: stationary
The series is differenced once (lag=1 is the default) and saved as dNile. The differenced time series is plotted in the bottom half of figure 15.11 and certainly looks more stationary. Applying the ADF test to the differenced series suggests that it’s now stationary, so you can proceed to the next step.
IDENTIFYING ONE OR MORE REASONABLE MODELS
Possible models are selected based on the ACF and PACF plots:
Acf(dNile)
Pacf(dNile)
The resulting plots are given in figure 15.12.
Figure 15.12 Autocorrelation and partial autocorrelation plots for the differenced Nile time series
The goal is to identify the parameters p, d, and q. You already know that d=1 from the previous section. You get p and q by comparing the ACF and PACF plots with the guidelines given in table 15.6.
Table 15.6 Guidelines for selecting an ARIMA model

| Model | ACF | PACF |
|-------|-----|------|
| ARIMA(p, d, 0) | Trails off to zero | Zero after lag p |
| ARIMA(0, d, q) | Zero after lag q | Trails off to zero |
| ARIMA(p, d, q) | Trails off to zero | Trails off to zero |
The results in table 15.6 are theoretical, and the actual ACF and PACF may not match them exactly. But they can be used as a rough guide to reasonable models to try. For the Nile time series in figure 15.12, there appears to be one large autocorrelation at lag 1, and the partial autocorrelations trail off to zero as the lags get bigger. This suggests trying an ARIMA(0, 1, 1) model.
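As a cross-check on a model identified by hand, the auto.arima() function in the forecast package (covered in section 15.4.3) searches for a model automatically; whether its choice agrees with ARIMA(0, 1, 1) here is left for you to verify:

library(forecast)
auto.arima(Nile)   # automated search over values of p, d, and q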
FITTING THE MODEL(S)
The ARIMA model is fit with the arima() function. The format is arima(ts, order=c(p, d, q)). The result of fitting an ARIMA(0, 1, 1) model to the Nile time series is given in the following listing.
Listing 15.8 Fitting an ARIMA model

> library(forecast)
> fit <- arima(Nile, order=c(0,1,1))
> fit
Series: Nile
ARIMA(0,1,1)

Coefficients:
          ma1
      -0.7329
s.e.   0.1143

sigma^2 estimated as 20600:  log likelihood=-632.55
AIC=1269.09   AICc=1269.22   BIC=1274.28

> accuracy(fit)
                 ME  RMSE   MAE    MPE  MAPE   MASE
Training set -11.94 142.8 112.2 -3.575 12.94 0.8089
Note that you apply the model to the original time series; by specifying d=1, the arima() function calculates the first differences for you. The moving-average coefficient (-0.73) is provided along with the AIC. If you fit other models, the AIC can help you choose which one is most reasonable: smaller AIC values suggest better models. The accuracy measures can help you determine whether the model fits with sufficient accuracy. Here the mean absolute percent error is 13% of the river level.
EVALUATING MODEL FIT
If the model is appropriate, the residuals should be normally distributed with mean zero, and the autocorrelations should be zero for every possible lag. In other words, the residuals should be normally and independently distributed (no relationship between them). The assumptions can be evaluated with the following code.
Listing 15.9 Evaluating the model fit

> qqnorm(fit$residuals)
> qqline(fit$residuals)
> Box.test(fit$residuals, type="Ljung-Box")

	Box-Ljung test

data:  fit$residuals
X-squared = 1.3711, df = 1, p-value = 0.2416
The qqnorm() and qqline() functions produce the plot in figure 15.13. Normally distributed data should fall along the line. In this case, the results look good.
The Box.test() function provides a test that the autocorrelations are all zero. The results aren’t significant, suggesting that the autocorrelations don’t differ from zero. This ARIMA model appears to fit the data well.
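A further quick check (not part of the book’s listing) is to plot the autocorrelations of the residuals directly; if the model is adequate, no significant spikes should appear:

Acf(fit$residuals)   # the residual ACF should stay within the significance bounds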
MAKING FORECASTS
If the model hadn’t met the assumptions of normal residuals and zero autocorrelations, it would have been necessary to alter the model, add parameters, or try a different approach. Once a final model has been chosen, it can be used to make predictions of future values. In the next listing, the forecast() function from the forecast package is used to predict three years ahead.
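That listing falls outside this excerpt; a minimal sketch of such a call, assuming the fit object from listing 15.8 (the axis labels are illustrative), might look like this:

library(forecast)
fc <- forecast(fit, h=3)                    # point forecasts and intervals, 3 years ahead
fc
plot(fc, xlab="Year", ylab="Annual Flow")   # plot the forecasts with their intervals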
Figure 15.13 Normal Q-Q plot for determining the normality of the time-series residuals