Part 3: Advanced Topics
This looks like a linear trend model with intercept y_0. But the error, v_{t+h}, while having mean zero, has variance σ²(t + h). Therefore, if we use the linear trend y_0 + δ(t + h) to forecast y_{t+h} at time t, the forecast error variance is σ²(t + h), as compared with σ²h when we use δh + y_t. The ratio of the forecast variances is (t + h)/h, which can be big for large t. The bottom line is that we should not use a linear trend to forecast a random walk with drift. (Computer Exercise 18.17 asks you to compare forecasts from a cubic trend line and those from the simple random walk model for the general fertility rate in the United States.)
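The variance comparison can be illustrated with a small Monte Carlo sketch. The code below is in Python with numpy; the drift, horizon, and sample size are illustrative choices of ours, not values from the text. It simulates a random walk with drift and compares the error variance of the trend-line forecast with that of the random-walk forecast δh + y_t.

```python
import numpy as np

# simulate a random walk with drift and compare two forecasts of y_{t+h}
rng = np.random.default_rng(0)
sigma, delta, t, h, reps = 1.0, 0.5, 200, 10, 2000

err_trend, err_rw = [], []
for _ in range(reps):
    e = rng.normal(0.0, sigma, t + h)
    y = np.cumsum(delta + e)                       # y_s = delta*s + e_1 + ... + e_s (y_0 = 0)
    err_trend.append(y[-1] - delta * (t + h))      # trend-line forecast: y_0 + delta*(t + h)
    err_rw.append(y[-1] - (delta * h + y[t - 1]))  # random-walk forecast: delta*h + y_t

ratio = np.var(err_trend) / np.var(err_rw)
```

With t = 200 and h = 10, the simulated variance ratio comes out near (t + h)/h = 21, confirming how much noisier the trend-line forecast is for large t.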
Deterministic trends can also produce poor forecasts if the trend parameters are estimated using old data and the process has a subsequent shift in the trend line. Sometimes, exogenous shocks—such as the oil crises of the 1970s—can change the trajectory of trending variables. If an old trend line is used to forecast far into the future, the forecasts can be way off. This problem can be mitigated by using the most recent data available to obtain the trend line parameters.
Nothing prevents us from combining trends with other models for forecasting. For example, we can add a linear trend to an AR(1) model, which can work well for forecasting series with linear trends but which are also stable AR processes around the trend.
It is also straightforward to forecast processes with deterministic seasonality (monthly or quarterly series). For example, the file BARIUM.RAW contains the monthly production of gasoline in the United States from 1978 through 1988. This series has no obvious trend, but it does have a strong seasonal pattern. (Gasoline production is higher in the summer months and in December.) In the simplest model, we would regress gas (measured in gallons) on eleven month dummies, say for February through December. Then, the forecast for any future month is simply the intercept plus the coefficient on the appropriate month dummy. (For January, the forecast is just the intercept in the regression.) We can also add lags of variables and time trends to allow for general series with seasonality.
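As a sketch of the seasonal-dummy approach (in Python with numpy; the series here is simulated, not the BARIUM.RAW gasoline data), we regress the series on an intercept and eleven month dummies; the forecast for any month is the intercept plus that month's dummy coefficient, and the intercept alone for the base month, January:

```python
import numpy as np

# simulated monthly series with a pure seasonal pattern plus noise
rng = np.random.default_rng(1)
n_years = 10
month = np.tile(np.arange(12), n_years)      # 0 = January, ..., 11 = December
pattern = np.array([0, 0, 1, 2, 4, 6, 7, 7, 5, 3, 1, 2], dtype=float)
y = pattern[month] + rng.normal(0.0, 0.5, 12 * n_years)

# regress y on an intercept and 11 dummies for February-December (January is base)
X = np.column_stack([np.ones(len(y))] +
                    [(month == m).astype(float) for m in range(1, 12)])
beta = np.linalg.lstsq(X, y, rcond=None)[0]

forecast_jan = beta[0]             # base month: intercept only
forecast_jul = beta[0] + beta[6]   # intercept plus the July dummy coefficient
```

Because the regressors are an intercept plus a full set of month indicators, each month's forecast is simply the sample mean of that month's observations.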
Forecasting processes with unit roots also deserves special attention. Earlier, we obtained the expected value of a random walk conditional on information through time n. To forecast a random walk, with possible drift α, h periods into the future at time n, we use f̂_{n,h} = α̂h + y_n, where α̂ is the sample average of the Δy_t up through t = n. (If there is no drift, we set α̂ = 0.) This approach imposes the unit root. An alternative would be to estimate an AR(1) model for {y_t} and to use the forecast formula (18.55). This approach does not impose a unit root, but if one is present, ρ̂ converges in probability to one as n gets large. Nevertheless, ρ̂ can be substantially different from one, especially if the sample size is not very large. The matter of which approach produces better out-of-sample forecasts is an empirical issue. If, in the AR(1) model, ρ is less than one, even slightly, the AR(1) model will tend to produce better long-run forecasts.
Generally, there are two approaches to producing forecasts for I(1) processes. The first is to impose a unit root. For a one-step-ahead forecast, we obtain a model to forecast the change in y, Δy_{t+1}, given information up through time t. Then, because y_{t+1} = Δy_{t+1} + y_t, E(y_{t+1}|I_t) = E(Δy_{t+1}|I_t) + y_t. Therefore, our forecast of y_{n+1} at time n is just

f̂_n = ĝ_n + y_n,

where ĝ_n is the forecast of Δy_{n+1} at time n. Typically, an AR model (which is necessarily stable) is used for Δy_t, or a vector autoregression.
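A minimal numpy sketch of this first approach (the data are simulated and the variable names are ours): fit an AR(1) to the changes, forecast the next change, and add it back to the last level.

```python
import numpy as np

# simulate an I(1) series whose change follows a stable AR(1)
rng = np.random.default_rng(2)
n = 300
dy = np.zeros(n)
for t in range(1, n):
    dy[t] = 0.5 + 0.4 * dy[t - 1] + rng.normal()
y = np.cumsum(dy)

# OLS of Delta y_t on (1, Delta y_{t-1})
X = np.column_stack([np.ones(n - 1), dy[:-1]])
alpha_hat, rho_hat = np.linalg.lstsq(X, dy[1:], rcond=None)[0]

g_hat = alpha_hat + rho_hat * dy[-1]   # forecast of Delta y_{n+1}
f_hat = g_hat + y[-1]                  # forecast of y_{n+1}: add back the level
```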
This can be extended to multi-step-ahead forecasts by writing y_{n+h} as

y_{n+h} = (y_{n+h} - y_{n+h-1}) + (y_{n+h-1} - y_{n+h-2}) + ... + (y_{n+1} - y_n) + y_n,

or

y_{n+h} = Δy_{n+h} + Δy_{n+h-1} + ... + Δy_{n+1} + y_n.

Therefore, the forecast of y_{n+h} at time n is

f̂_{n,h} = ĝ_{n,h} + ĝ_{n,h-1} + ... + ĝ_{n,1} + y_n,     (18.63)

where ĝ_{n,j} is the forecast of Δy_{n+j} at time n. For example, we might model Δy_t as a stable AR(1), obtain the multi-step-ahead forecasts from (18.55) (but with α̂ and ρ̂ obtained from Δy_t on Δy_{t-1}, and y_n replaced with Δy_n), and then plug these into (18.63).

The second approach to forecasting I(1) variables is to use a general AR or VAR model for {y_t}. This does not impose the unit root. For example, if we use an AR(2) model,
y_t = α + ρ1 y_{t-1} + ρ2 y_{t-2} + u_t,     (18.64)
then a unit root is present when ρ1 + ρ2 = 1. If we plug in ρ1 = 1 - ρ2 and rearrange, we obtain Δy_t = α - ρ2Δy_{t-1} + u_t, which is a stable AR(1) model in the difference that takes us back to the first approach described earlier. Nothing prevents us from estimating (18.64) directly by OLS. One nice thing about this regression is that we can use the usual t statistic on ρ̂2 to determine whether y_{t-2} is significant. (This assumes that the homoskedasticity assumption holds; if not, we can use the heteroskedasticity-robust form.) We will not show this formally, but, intuitively, it follows by rewriting the equation as y_t = α + γy_{t-1} - ρ2Δy_{t-1} + u_t, where γ = ρ1 + ρ2. Even if γ = 1, ρ2 is minus the coefficient on a stationary, weakly dependent process {Δy_{t-1}}. Because the regression results will be identical to (18.64), we can use (18.64) directly.
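Since this rewriting of (18.64) is purely algebraic, the two regressions must give identical results; the following numpy sketch (simulated stable AR(2), with our own names and seed) confirms the mapping γ = ρ1 + ρ2 and the coefficient -ρ2 on the lagged difference.

```python
import numpy as np

# simulate a stable AR(2): y_t = 1 + 1.2 y_{t-1} - .3 y_{t-2} + u_t
rng = np.random.default_rng(4)
n = 200
y = np.zeros(n)
for t in range(2, n):
    y[t] = 1.0 + 1.2 * y[t - 1] - 0.3 * y[t - 2] + rng.normal()

Y = y[2:]
# regression (18.64): y_t on (1, y_{t-1}, y_{t-2})
X1 = np.column_stack([np.ones(n - 2), y[1:-1], y[:-2]])
b1 = np.linalg.lstsq(X1, Y, rcond=None)[0]      # (alpha, rho1, rho2)

# rewritten form: y_t on (1, y_{t-1}, Delta y_{t-1})
X2 = np.column_stack([np.ones(n - 2), y[1:-1], y[1:-1] - y[:-2]])
b2 = np.linalg.lstsq(X2, Y, rcond=None)[0]      # (alpha, gamma, -rho2)

gamma = b2[1]        # equals rho1_hat + rho2_hat
minus_rho2 = b2[2]   # equals -rho2_hat
```

Because the two regressor sets span the same column space, the fitted values are identical; only the parameterization changes, which is why the usual t statistic on ρ̂2 is available.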
As an example, let us estimate an AR(2) model for the general fertility rate in FERTIL3.RAW, using the observations up through 1979. (In Computer Exercise 18.17, you are asked to use this model for forecasting, which is why we save some observations at the end of the sample.)
gf̂r_t = 3.22 + 1.272 gfr_{t-1} - .311 gfr_{t-2}
        (2.92)  (.120)           (.121)                    (18.65)

n = 65, R² = .949, R̄² = .947.
The t statistic on the second lag is about -2.57, which is statistically different from zero at about the 1% level. (The first lag also has a very significant t statistic, which has an approximate t distribution by the same reasoning used for ρ̂2.) The R-squared, adjusted or not, is not especially informative as a goodness-of-fit measure because gfr apparently contains a unit root, and it makes little sense to ask how much of the variance in gfr we are explaining.
The coefficients on the two lags in (18.65) add up to .961, which is close to and not statistically different from one (as can be verified by applying the augmented Dickey-Fuller test to the equation Δgfr_t = α + θgfr_{t-1} + γΔgfr_{t-1} + u_t). Even though we have not imposed the unit root restriction, we can still use (18.65) for forecasting, as we discussed earlier.
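To see how an estimated AR(2) such as (18.65) is used for forecasting, we can iterate the one-step rule, feeding each forecast back in as a lag. The coefficients below are those reported in (18.65), but the two starting values are hypothetical placeholders rather than the actual FERTIL3.RAW observations.

```python
# coefficients from (18.65); starting lags are hypothetical, not real data
a, r1, r2 = 3.22, 1.272, -0.311
y_lag1, y_lag2 = 65.0, 66.0        # stand-ins for gfr in the last two sample years

forecasts = []
for _ in range(5):                 # five annual steps ahead
    f = a + r1 * y_lag1 + r2 * y_lag2
    forecasts.append(f)
    y_lag1, y_lag2 = f, y_lag1     # each forecast becomes next period's lag
```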
Before ending this section, we point out one potential improvement in forecasting in the context of vector autoregressive models with I(1) variables. Suppose {y_t} and {z_t} are each I(1) processes. One approach for obtaining forecasts of y is to estimate a bivariate autoregression in the variables Δy_t and Δz_t and then to use (18.63) to generate one- or multi-step-ahead forecasts; this is essentially the first approach we described earlier. However, if y_t and z_t are cointegrated, we have more stationary, stable variables in the information set that can be used in forecasting Δy: namely, lags of y_t - βz_t, where β is the cointegrating parameter. A simple error correction model is
Δy_t = α0 + α1Δy_{t-1} + γ1Δz_{t-1} + δ1(y_{t-1} - βz_{t-1}) + e_t,     (18.66)

E(e_t|I_{t-1}) = 0.
To forecast y_{n+1}, we use observations up through n to estimate the cointegrating parameter, β, and then estimate the parameters of the error correction model by OLS, as described in Section 18.4. Forecasting Δy_{n+1} is easy: we just plug Δy_n, Δz_n, and y_n - β̂z_n into the estimated equation. Having obtained the forecast of Δy_{n+1}, we add it to y_n.
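Here is a numpy sketch of this two-step procedure on simulated cointegrated series; the data generating process, the seed, and all names are our own illustrative choices, not from the text.

```python
import numpy as np

# simulate cointegrated series: z is a random walk, y = 2 z + I(0) noise
rng = np.random.default_rng(5)
n = 500
z = np.cumsum(rng.normal(size=n))
y = 2.0 * z + rng.normal(size=n)

# step 1: static regression of y on z estimates the cointegrating parameter
beta_hat = np.linalg.lstsq(np.column_stack([np.ones(n), z]), y, rcond=None)[0][1]

# step 2: OLS on the error correction model (18.66)
dy, dz = np.diff(y), np.diff(z)
ec = y - beta_hat * z                        # error correction term y_t - beta_hat z_t
X = np.column_stack([np.ones(n - 2), dy[:-1], dz[:-1], ec[1:-1]])
b = np.linalg.lstsq(X, dy[1:], rcond=None)[0]

# step 3: forecast Delta y_{n+1} and add it to y_n
g_hat = b[0] + b[1] * dy[-1] + b[2] * dz[-1] + b[3] * ec[-1]
f_hat = y[-1] + g_hat
```

In this simulation the estimated coefficient on the error correction term is negative, pulling y back toward β̂z, as the error correction mechanism requires.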
By rearranging the error correction model, we can write
y_t = α0 + π1y_{t-1} + π2y_{t-2} + η1z_{t-1} + η2z_{t-2} + u_t,     (18.67)
where π1 = 1 + α1 + δ1, π2 = -α1, and so on, which is the first equation in a VAR model for y_t and z_t. Notice that this depends on five parameters, just as many as in the error correction model. The point is that, for the purposes of forecasting, the VAR model in the levels and the error correction model are essentially the same. This is not the case in more general error correction models. For example, suppose that α1 = γ1 = 0 in (18.66), but we have a second error correction term, δ2(y_{t-2} - βz_{t-2}). Then, the error correction model involves only four parameters, whereas (18.67)—which has the same order of lags for y and z—contains five parameters. Thus, error correction models can economize on parameters; that is, they are generally more parsimonious than VARs in levels.
If y_t and z_t are I(1) but not cointegrated, the appropriate model is (18.66) without the error correction term. This can be used to forecast Δy_{n+1}, and we can add the result to y_n to forecast y_{n+1}.
SUMMARY
The time series topics covered in this chapter are used routinely in empirical macroeconomics, empirical finance, and a variety of other applied fields. We began by showing how infinite distributed lag models can be interpreted and estimated. These can provide flexible lag distributions with fewer parameters than a similar finite distributed lag model. The geometric distributed lag model and, more generally, rational distributed lag models are the most popular. They can be estimated using standard econometric procedures on simple dynamic equations.
Testing for a unit root has become very common in time series econometrics. If a series has a unit root, then, in many cases, the usual large sample normal approximations are no longer valid. In addition, a unit root process has the property that an innovation has a long-lasting effect, which is of interest in its own right. While there are many tests for unit roots, the Dickey-Fuller t test—and its extension, the augmented Dickey-Fuller test—is probably the most popular and easiest to implement. We can allow for a linear trend when testing for unit roots by adding a trend to the Dickey-Fuller regression.
When an I(1) series, y_t, is regressed on another I(1) series, x_t, there is serious concern about spurious regression, even if the series do not contain obvious trends. This has been studied thoroughly in the case of a random walk: even if the two random walks are independent, the usual t test for significance of the slope coefficient, based on the usual critical values, will reject much more than the nominal size of the test. In addition, the R² tends to a random variable, rather than to zero (as would be the case if we regressed the difference in y_t on the difference in x_t).

In one important case, a regression involving I(1) variables is not spurious, and that is when the series are cointegrated. This means that a linear function of the two I(1) variables is I(0). If y_t and x_t are I(1) but y_t - βx_t is I(0), y_t and x_t cannot drift arbitrarily far apart. There are simple tests of the null of no cointegration against the alternative of cointegration, one of which is based on applying a Dickey-Fuller unit root test to the residuals from a static regression. There are also simple estimators of the cointegrating parameter that yield t statistics with approximate standard normal distributions (and asymptotically valid confidence intervals). We covered the leads and lags estimator in Section 18.4.

Cointegration between y_t and x_t implies that error correction terms may appear in a model relating Δy_t to Δx_t; the error correction terms are lags of y_t - βx_t, where β is the cointegrating parameter. A simple two-step estimation procedure is available for estimating error correction models. First, β is estimated using a static regression (or the leads and lags regression). Then, OLS is used to estimate a simple dynamic model in first differences that includes the error correction terms.
Section 18.5 contained an introduction to forecasting, with emphasis on regression-based forecasting methods. Static models or, more generally, models that contain explanatory variables dated contemporaneously with the dependent variable, are limited because then the explanatory variables need to be forecasted. If we plug in hypothesized values of unknown future explanatory variables, we obtain a conditional forecast. Unconditional forecasts are obtained by modeling y_t as a function only of past information we have observed at the time the forecast is needed. Dynamic regression models, including autoregressions and vector autoregressions, are used routinely. In addition to obtaining one-step-ahead point forecasts, we also discussed the construction of forecast intervals, which are very similar to prediction intervals.
Various criteria are used for choosing among forecasting methods. The most common performance measures are the root mean squared error and the mean absolute error. Both estimate the size of the average forecast error. It is most informative to compute these measures using out-of-sample forecasts.
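Both measures are simple to compute from a vector of out-of-sample forecast errors; a short Python sketch (the error values are hypothetical):

```python
import numpy as np

def rmse(errors):
    """Root mean squared error of out-of-sample forecast errors."""
    e = np.asarray(errors, dtype=float)
    return float(np.sqrt(np.mean(e ** 2)))

def mae(errors):
    """Mean absolute error of out-of-sample forecast errors."""
    e = np.asarray(errors, dtype=float)
    return float(np.mean(np.abs(e)))

errs = [1.0, -2.0, 0.5, -0.5]   # hypothetical one-step-ahead forecast errors
# rmse(errs) is sqrt(5.5/4) and mae(errs) is 1.0
```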
Multi-step-ahead forecasts present new challenges and are subject to large forecast error variances. Nevertheless, for models such as autoregressions and vector autoregressions, multi-step-ahead forecasts can be computed, and approximate forecast intervals can be obtained.
Forecasting trending and I(1) series requires special care. Processes with deterministic trends can be forecasted by including time trends in regression models, possibly with lags of variables. A potential drawback is that deterministic trends can produce poor long-horizon forecasts: once it is estimated, a linear trend continues to increase or decrease indefinitely. The typical approach to forecasting an I(1) process is to forecast the difference in the process and to add the level of the variable to that forecasted difference. Alternatively, vector autoregressive models can be used in the levels of the series. If the series are cointegrated, error correction models can be used instead.
KEY TERMS
Augmented Dickey-Fuller Test
Cointegration
Conditional Forecast
Dickey-Fuller Distribution
Dickey-Fuller (DF) Test
Engle-Granger Two-Step Procedure
Error Correction Model
Exponential Smoothing
Forecast Error
Forecast Interval
Geometric (or Koyck) Distributed Lag
Granger Causality
In-Sample Criteria
Infinite Distributed Lag (IDL) Model
Information Set
Leads and Lags Estimator
Loss Function
Martingale
Martingale Difference Sequence
Mean Absolute Error (MAE)
Multiple-Step-Ahead Forecast
One-Step-Ahead Forecast
Out-of-Sample Criteria
Point Forecast
Rational Distributed Lag (RDL) Model
Root Mean Squared Error (RMSE)
Spurious Regression Problem
Unconditional Forecast
Unit Roots
Vector Autoregressive (VAR) Model
PROBLEMS
18.1 Consider equation (18.15) with k = 2. Using the IV approach to estimating the γ_h and ρ, what would you use as instruments for y_{t-1}?
18.2 An interesting economic model that leads to an econometric model with a lagged dependent variable relates y_t to the expected value of x_t, say x*_t, where the expectation is based on all observed information at time t - 1:

y_t = β0 + β1x*_t + u_t.     (18.68)
A natural assumption on {u_t} is that E(u_t|I_{t-1}) = 0, where I_{t-1} denotes all information on y and x observed at time t - 1; this means that E(y_t|I_{t-1}) = β0 + β1x*_t. To complete this model, we need an assumption about how the expectation x*_t is formed. We saw a simple example of adaptive expectations in Section 11.2, where x*_t = x_{t-1}. A more complicated adaptive expectations scheme is
x*_t = x*_{t-1} + λ(x_{t-1} - x*_{t-1}),     (18.69)

where 0 < λ < 1. This equation implies that the change in expectations reacts to whether last period's realized value was above or below its expectation. The assumption 0 < λ < 1 implies that the change in expectations is a fraction of last period's error.
(i) Show that the two equations imply that

y_t = λβ0 + (1 - λ)y_{t-1} + λβ1x_{t-1} + u_t - (1 - λ)u_{t-1}.

[Hint: Lag equation (18.68) one period, multiply it by (1 - λ), and subtract this from (18.68). Then, use (18.69).]
(ii) Under E(u_t|I_{t-1}) = 0, {u_t} is serially uncorrelated. What does this imply about the new errors, v_t = u_t - (1 - λ)u_{t-1}?
(iii) If we write the equation from part (i) as

y_t = α0 + α1y_{t-1} + α2x_{t-1} + v_t,

how would you consistently estimate the αj?

(iv) Given consistent estimators of the αj, how would you consistently estimate λ and β1?
18.3 Suppose that {y_t} and {z_t} are I(1) series, but y_t - βz_t is I(0) for some β ≠ 0. Show that for any δ ≠ β, y_t - δz_t must be I(1).
18.4 Consider the error correction model in equation (18.37). Show that if you add another lag of the error correction term, y_{t-2} - βx_{t-2}, the equation suffers from perfect collinearity. [Hint: Show that y_{t-2} - βx_{t-2} is a perfect linear function of y_{t-1} - βx_{t-1}, Δx_{t-1}, and Δy_{t-1}.]
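The identity behind the hint is easy to verify numerically. This numpy sketch uses arbitrary simulated series and an arbitrary value of β (called b below) to check the exact linear dependence.

```python
import numpy as np

# y_{t-2} - b x_{t-2} = (y_{t-1} - b x_{t-1}) - Delta y_{t-1} + b Delta x_{t-1}
rng = np.random.default_rng(6)
y, x, b = rng.normal(size=50), rng.normal(size=50), 1.7

lhs = y[:-2] - b * x[:-2]
rhs = ((y[1:-1] - b * x[1:-1])
       - (y[1:-1] - y[:-2])          # Delta y_{t-1}
       + b * (x[1:-1] - x[:-2]))     # b * Delta x_{t-1}
ok = np.allclose(lhs, rhs)           # True: an exact algebraic identity
```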
18.5 Suppose the process {(x_t, y_t): t = 0, 1, 2, ...} satisfies the equations

y_t = βx_t + u_t

and

Δx_t = γΔx_{t-1} + v_t,

where E(u_t|I_{t-1}) = E(v_t|I_{t-1}) = 0, I_{t-1} contains information on x and y dated at time t - 1 and earlier, β ≠ 0, and |γ| < 1 [so that x_t, and therefore y_t, is I(1)]. Show that these two equations imply an error correction model of the form

Δy_t = γ1Δx_{t-1} + δ(y_{t-1} - βx_{t-1}) + e_t,

where γ1 = βγ, δ = -1, and e_t = u_t + βv_t. (Hint: First, subtract y_{t-1} from both sides of the first equation. Then, add and subtract βx_{t-1} from the right-hand side and rearrange. Finally, use the second equation to get the error correction model that contains Δx_{t-1}.)
18.6 Using the monthly data in VOLAT.RAW, the following model was estimated:
p̂cip_t = 1.54 + .344 pcip_{t-1} + .074 pcip_{t-2} + .073 pcip_{t-3} + .031 pcsp_{t-1}
         (.56)  (.042)            (.045)            (.042)            (.013)

n = 554, R² = .174, R̄² = .168,
where pcip is the percentage change in monthly industrial production, at an annualized rate, and pcsp is the percentage change in the Standard & Poor's 500 Index, also at an annualized rate.
(i) If the past three months of pcip are zero and pcsp_{t-1} = 0, what is the predicted growth in industrial production for this month? Is it statistically different from zero?

(ii) If the past three months of pcip are zero but pcsp_{t-1} = 10, what is the predicted growth in industrial production?

(iii) What do you conclude about the effects of the stock market on real economic activity?
18.7 Let gM_t be the annual growth in the money supply and let unem_t be the unemployment rate. Assuming that unem_t follows a stable AR(1) process, explain in detail how you would test whether gM Granger causes unem.
18.8 Suppose that y_t follows the model

y_t = δ + β1z_{t-1} + u_t
u_t = ρu_{t-1} + e_t
E(e_t|I_{t-1}) = 0,

where I_{t-1} contains y and z dated at time t - 1 and earlier.
(i) Show that E(y_{t+1}|I_t) = (1 - ρ)δ + ρy_t + β1z_t - ρβ1z_{t-1}. [Hint: Write u_{t-1} = y_{t-1} - δ - β1z_{t-2} and plug this into the second equation; then, plug the result into the first equation and take the conditional expectation.]
(ii) Suppose that you use n observations to estimate δ, β1, and ρ. Write the equation for forecasting y_{n+1}.
(iii) Explain why the model with one lag of z and AR(1) serial correlation is a special case of the model

y_t = α0 + ρy_{t-1} + γ1z_{t-1} + γ2z_{t-2} + e_t.

(iv) What does part (iii) suggest about using models with AR(1) serial correlation for forecasting?
18.9 Let {y_t} be an I(1) sequence. Suppose that ĝ_n is the one-step-ahead forecast of Δy_{n+1} and let f̂_n = ĝ_n + y_n be the one-step-ahead forecast of y_{n+1}. Explain why the forecast errors for forecasting y_{n+1} and Δy_{n+1} are identical.
COMPUTER EXERCISES
18.10 Use the data in WAGEPRC.RAW for this exercise. Problem 11.5 gives estimates of a finite distributed lag model of gprice on gwage, where 12 lags of gwage are used.
(i) Estimate a simple geometric DL model of gprice on gwage. In particular, estimate equation (18.11) by OLS. What are the estimated impact propensity and LRP? Sketch the estimated lag distribution.

(ii) Compare the estimated IP and LRP to those obtained in Problem 11.5. How do the estimated lag distributions compare?
(iii) Now, estimate the rational distributed lag model from (18.16). Sketch the lag distribution, and compare the estimated IP and LRP to those obtained in part (ii).
18.11 Use the data in HSEINV.RAW for this exercise.

(i) Test for a unit root in log(invpc), including a linear time trend and two lags of Δlog(invpc). Use a 5% significance level.

(ii) Use the approach from part (i) to test for a unit root in log(price).

(iii) Given the outcomes in parts (i) and (ii), does it make sense to test for cointegration between log(invpc) and log(price)?
18.12 Use the data in VOLAT.RAW for this exercise.

(i) Estimate an AR(3) model for pcip. Now, add a fourth lag and verify that it is very insignificant.

(ii) To the AR(3) model from part (i), add three lags of pcsp to test whether pcsp Granger causes pcip. Carefully state your conclusion.

(iii) To the model in part (ii), add three lags of the change in i3, the three-month T-bill rate. Does pcsp Granger cause pcip conditional on past Δi3?
18.13 In testing for cointegration between gfr and pe in Example 18.5, add t² to equation (18.32) to obtain the OLS residuals. Include one lag in the augmented DF test. The 5% critical value for the test is -4.15.
18.14 Use INTQRT.RAW for this exercise.

(i) Estimate the equation

hy6_t = α + βhy3_{t-1} + γ0Δhy3_{t-1} + γ1Δhy3_t + γ2Δhy3_{t-2} + e_t

and report the results in equation form. Test H0: β = 1 against a two-sided alternative. Assume that the lead and lag are sufficient so that {hy3_{t-1}} is strictly exogenous in this equation, and do not worry about serial correlation.

(ii) To the error correction model in (18.39), add Δhy3_{t-2} and (hy6_{t-2} - hy3_{t-3}). Are these terms jointly significant? What do you conclude about the appropriate error correction model?
18.15 Use the data in PHILLIPS.RAW, adding the 1997 values for unem and inf: 4.9 and 2.3, respectively.

(i) Estimate the models in (18.48) and (18.49) using the data up through 1997. Do the parameter estimates change much compared with (18.48) and (18.49)?

(ii) Use the new equations to forecast unem_{1998}; round to two places after the decimal. Use the Economic Report of the President (1999 or later) to obtain unem_{1998}. Which equation produces a better forecast?

(iii) As we discussed in the text, the forecast for unem_{1998} using (18.49) is 4.90. Compare this with the forecast obtained using the data through 1997. Does using the extra year of data to obtain the parameter estimates produce a better forecast?
(iv) Use the model estimated in (18.48) to obtain a two-step-ahead forecast of unem. That is, forecast unem_{1998} using equation (18.55) with α̂ = 1.572, ρ̂ = .732, and h = 2. Is this better or worse than the one-step-ahead forecast obtained by plugging unem_{1997} = 4.9 into (18.48)?
18.16 Use the data in BARIUM.RAW for this exercise.

(i) Estimate the linear trend model chnimp_t = α + βt + u_t, using the first 119 observations (this excludes the last twelve months of observations for 1988). What is the standard error of the regression?

(ii) Now, estimate an AR(1) model for chnimp, again using all data but the last twelve months. Compare the standard error of the regression with that from part (i). Which model provides a better in-sample fit?

(iii) Use the models from parts (i) and (ii) to compute the one-step-ahead forecast errors for the twelve months in 1988. (You should obtain twelve forecast errors for each method.) Compute and compare the RMSEs and the MAEs for the two methods. Which forecasting method works better out-of-sample for one-step-ahead forecasts?

(iv) Add monthly dummy variables to the regression from part (i). Are these jointly significant? (Do not worry about the slight serial correlation in the errors from this regression when doing the joint test.)
18.17 Use the data in FERTIL3.RAW for this exercise.

(i) Graph gfr against time. Does it contain a clear upward or downward trend over the entire sample period?

(ii) Using the data up through 1979, estimate a cubic time trend model for gfr (that is, regress gfr on t, t², and t³, along with an intercept). Comment on the R-squared of the regression.

(iii) Using the model in part (ii), compute the mean absolute error of the one-step-ahead forecast errors for the years 1980 through 1984.

(iv) Using the data through 1979, regress Δgfr_t on a constant only. Is the constant statistically different from zero? Does it make sense to assume that any drift term is zero, if we assume that gfr_t follows a random walk?

(v) Now, forecast gfr for 1980 through 1984, using a random walk model: the forecast of gfr_{n+1} is simply gfr_n. Find the MAE. How does it compare with the MAE from part (iii)? Which method of forecasting do you prefer?

(vi) Now, estimate an AR(2) model for gfr, again using the data only through 1979. Is the second lag significant?

(vii) Obtain the MAE for 1980 through 1984, using the AR(2) model. Does this more general model work better out-of-sample than the random walk model?
18.18 Use CONSUMP.RAW for this exercise.

(i) Let y_t be real per capita disposable income. Use the data up through 1989 to estimate the model

y_t = β0 + β1t + β2y_{t-1} + u_t

and report the results in the usual form.
(ii) Use the estimated equation from part (i) to forecast y in 1990. What is the forecast error?

(iii) Compute the mean absolute error of the one-step-ahead forecasts for the 1990s, using the parameters estimated in part (i).

(iv) Now, compute the MAE over the same period, but drop y_{t-1} from the equation. Is it better to include y_{t-1} in the model or not?
18.19 Use the data in INTQRT.RAW for this exercise.

(i) Using the data from all but the last four years (16 quarters), estimate an AR(1) model for Δr6_t. (We use the difference because it appears that r6_t has a unit root.) Find the RMSE of the one-step-ahead forecasts for Δr6, using the last 16 quarters.

(ii) Now, add the error correction term spr_{t-1} = r6_{t-1} - r3_{t-1} to the equation from part (i). (This assumes that the cointegrating parameter is one.) Compute the RMSE for the last 16 quarters. Does the error correction term help with out-of-sample forecasting in this case?

(iii) Now, estimate the cointegrating parameter, rather than setting it to one. Use the last 16 quarters again to produce the out-of-sample RMSE. How does this compare with the forecasts from parts (i) and (ii)?

(iv) Would your conclusions change if you wanted to predict r6 rather than Δr6? Explain.