
Part 3

Advanced Topics

… the percent correctly predicted when y = 0 is 80, and the percent correctly predicted when y = 1 is 40. Find the overall percent correctly predicted.

17.2 Let grad be a dummy variable for whether a student-athlete at a large university graduates in five years. Let hsGPA and SAT be high school grade point average and SAT score. Let study be the number of hours spent per week in organized study hall. Suppose that, using data on 420 student-athletes, the following logit model is obtained:

P̂(grad = 1|hsGPA, SAT, study) = Λ(−1.17 + .24 hsGPA + .00058 SAT + .073 study),

where Λ(z) = exp(z)/[1 + exp(z)] is the logit function. Holding hsGPA fixed at 3.0 and SAT fixed at 1,200, compute the estimated difference in the graduation probability for someone who spent 10 hours per week in study hall and someone who spent five hours per week.
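As a check on the arithmetic in 17.2, the two probabilities can be computed directly; this is a quick sketch in which only the coefficients above come from the problem:

```python
import math

def logit(z):
    """Logistic cdf: Lambda(z) = exp(z) / (1 + exp(z))."""
    return math.exp(z) / (1.0 + math.exp(z))

def index(hsGPA, SAT, study):
    """Estimated index from the fitted logit model in 17.2."""
    return -1.17 + 0.24 * hsGPA + 0.00058 * SAT + 0.073 * study

# Hold hsGPA = 3.0 and SAT = 1,200; compare 10 vs. 5 study-hall hours
p10 = logit(index(3.0, 1200, 10))
p5 = logit(index(3.0, 1200, 5))
print(round(p10, 3), round(p5, 3), round(p10 - p5, 3))  # difference is about .078
```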

17.3 (Requires calculus) (i) Suppose in the Tobit model that x1 = log(z1), and this is the only place z1 appears in x. Show that

∂E(y|y > 0, x)/∂z1 = (β1/z1){1 − λ(xβ/σ)[xβ/σ + λ(xβ/σ)]},   (17.49)

where β1 is the coefficient on log(z1).

 

(ii) If x1 = z1 and x2 = z1², show that

∂E(y|y > 0, x)/∂z1 = (β1 + 2β2 z1){1 − λ(xβ/σ)[xβ/σ + λ(xβ/σ)]},

where β1 is the coefficient on z1, and β2 is the coefficient on z1².

17.4 Let mvpi be the marginal value product for worker i, which is the price of a firm’s good multiplied by the marginal product of the worker. Assume that

log(mvp_i) = β0 + β1 x_{i1} + … + β_k x_{ik} + u_i

wage_i = max(mvp_i, minwage_i),

where the explanatory variables include education, experience, and so on, and minwagei is the minimum wage relevant for person i. Write log(wagei) in terms of log(mvpi) and log(minwagei).

17.5 (Requires calculus) Let patents be the number of patents applied for by a firm during a given year. Assume that the conditional expectation of patents given sales and RD is

E(patents|sales, RD) = exp[β0 + β1 log(sales) + β2 RD + β3 RD²],

where sales is annual firm sales, and RD is total spending on research and development over the past 10 years.

(i) How would you estimate the βj? Justify your answer by discussing the nature of patents.


Chapter 17

Limited Dependent Variable Models and Sample Selection Corrections

(ii) How do you interpret β1?

(iii) Find the partial effect of RD on E(patents|sales, RD).

17.6 Consider a family saving function for the population of all families in the United States:

sav = β0 + β1 inc + β2 hhsize + β3 educ + β4 age + u,

where hhsize is household size, educ is years of education of the household head, and age is age of the household head. Assume that E(u|inc, hhsize, educ, age) = 0.

(i) Suppose that the sample includes only families whose head is over 25 years old. If we use OLS on such a sample, do we get unbiased estimators of the βj? Explain.

(ii) Now suppose our sample includes only married couples without children. Can we estimate all of the parameters in the saving equation? Which ones can we estimate?

(iii) Suppose we exclude from our sample families that save more than $25,000 per year. Does OLS produce consistent estimators of the βj?

17.7 Suppose you are hired by a university to study the factors that determine whether students admitted to the university actually come to the university. You are given a large random sample of students who were admitted the previous year. You have information on whether each student chose to attend, high school performance, family income, financial aid offered, race, and geographic variables. Someone says to you, “Any analysis of that data will lead to biased results because it is not a random sample of all college applicants, but only those who apply to this university.” What do you think of this criticism?

COMPUTER EXERCISES

17.8 Use the data in PNTSPRD.RAW for this exercise.

(i) The variable favwin is a binary variable equal to one if the team favored by the Las Vegas point spread wins. A linear probability model to estimate the probability that the favored team wins is

P(favwin = 1|spread) = β0 + β1 spread.

Explain why, if the spread incorporates all relevant information, we expect β0 = .5.

(ii) Estimate the model from part (i) by OLS. Test H0: β0 = .5 against a two-sided alternative. Use both the usual and heteroskedasticity-robust standard errors.

(iii) Is spread statistically significant? What is the estimated probability that the favored team wins when spread = 10?

(iv) Now, estimate a probit model for P(favwin = 1|spread). Interpret and test the null hypothesis that the intercept is zero. [Hint: Remember that Φ(0) = .5.]

(v) Use the probit model to estimate the probability that the favored team wins when spread = 10. Compare this with the LPM estimate from part (iii).


(vi) Add the variables favhome, fav25, and und25 to the probit model and test joint significance of these variables using the likelihood ratio test. (How many df are in the chi-square distribution?) Interpret this result, focusing on the question of whether the spread incorporates all observable information prior to a game.

17.9 Use the data in LOANAPP.RAW for this exercise; see also Problem 7.16.

(i) Estimate a probit model of approve on white. Find the estimated probability of loan approval for both whites and nonwhites. How do these compare with the linear probability estimates?

(ii) Now, add the variables hrat, obrat, loanprc, unem, male, married, dep, sch, cosign, chist, pubrec, mortlat1, mortlat2, and vr to the probit model. Is there statistically significant evidence of discrimination against nonwhites?

(iii) Estimate the model from part (ii) by logit. Compare the coefficient on white to the probit estimate.

(iv) How would you compare the size of the discrimination effect between probit and logit?

17.10 Use the data in FRINGE.RAW for this exercise.

(i) For what percentage of the workers in the sample is pension equal to zero? What is the range of pension for workers with nonzero pension benefits? Why is a Tobit model appropriate for modeling pension?

(ii) Estimate a Tobit model explaining pension in terms of exper, age, tenure, educ, depends, married, white, and male. Do whites and males have statistically significant higher expected pension benefits?

(iii) Use the results from part (ii) to estimate the difference in expected pension benefits for a white male and a nonwhite female, both of whom are 35 years old, single with no dependents, have 16 years of education, and 10 years of experience.

(iv) Add union to the Tobit model and comment on its significance.

(v) Apply the Tobit model from part (iv) but with peratio, the pension-earnings ratio, as the dependent variable. (Notice that this is a fraction between zero and one, but, while it often takes on the value zero, it never gets close to being unity. Thus, a Tobit model is fine as an approximation.) Does gender or race have an effect on the pension-earnings ratio?

17.11 In Example 9.1, we added the quadratic terms pcnv², ptime86², and inc86² to a linear model for narr86.

(i) Use the data in CRIME1.RAW to add these same terms to the Poisson regression in Example 17.3.

(ii) Compute the estimate of σ² given by σ̂² = (n − k − 1)⁻¹ Σᵢ₌₁ⁿ ûᵢ²/ŷᵢ. Is there evidence of overdispersion? How should the Poisson MLE standard errors be adjusted?

(iii) Use the results from parts (i) and (ii) and Table 17.3 to compute the quasi-likelihood ratio statistic for joint significance of the three quadratic terms. What do you conclude?
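The overdispersion adjustment in part (ii) can be sketched in code. Everything below is illustrative: the data are a synthetic stand-in for CRIME1.RAW, and the Poisson MLE is computed by a bare-bones iteratively reweighted least squares loop rather than a packaged routine:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the CRIME1 data (the real exercise uses narr86)
n = 1000
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
beta_true = np.array([0.2, 0.5, -0.3])
y = rng.poisson(np.exp(X @ beta_true))

# Poisson MLE via iteratively reweighted least squares (Fisher scoring)
beta = np.zeros(X.shape[1])
for _ in range(50):
    mu = np.exp(X @ beta)
    # Weight is mu: Var(y|x) = mu under the Poisson assumption
    step = np.linalg.solve(X.T @ (mu[:, None] * X), X.T @ (y - mu))
    beta = beta + step
    if np.max(np.abs(step)) < 1e-10:
        break

# Overdispersion estimate: sigma2_hat = (n - k - 1)^(-1) * sum(u_i^2 / y_hat_i)
mu = np.exp(X @ beta)
k = X.shape[1] - 1  # number of slope parameters
sigma2_hat = np.sum((y - mu) ** 2 / mu) / (n - k - 1)

# Poisson MLE standard errors, multiplied by sigma_hat when sigma2_hat != 1
avar = np.linalg.inv(X.T @ (mu[:, None] * X))
se_adjusted = np.sqrt(sigma2_hat) * np.sqrt(np.diag(avar))
print(sigma2_hat, se_adjusted)
```

Because the simulated counts really are Poisson, σ̂² should come out near one; with genuine overdispersion it exceeds one and the standard errors are inflated accordingly.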


17.12 Refer to Table 13.1 in Chapter 13. There, we used the data in FERTIL1.RAW to estimate a linear model for kids, the number of children ever born to a woman.

(i) Estimate a Poisson regression model for kids, using the same variables in Table 13.1. Interpret the coefficient on y82.

(ii) What is the estimated percentage difference in fertility between a black woman and a nonblack woman, holding other factors fixed?

(iii) Obtain σ̂. Is there evidence of over- or underdispersion?

(iv) Compute the fitted values kids-hat_i from the Poisson regression and the R-squared as the squared correlation between kids_i and kids-hat_i. Compare this with the R-squared for the linear regression model.

17.13 Use the data in RECID.RAW to estimate the model from Example 17.4 by OLS, using only the 552 uncensored durations. Comment generally on how these estimates compare with those in Table 17.4.

17.14 Use the MROZ.RAW data for this exercise.

(i) Using the 428 women who were in the work force, estimate the return to education by OLS including exper, exper², nwifeinc, age, kidslt6, and kidsge6 as explanatory variables. Report your estimate on educ and its standard error.

(ii) Now estimate the return to education by Heckit, where all exogenous variables show up in the second-stage regression. In other words, the regression is log(wage) on educ, exper, exper², nwifeinc, age, kidslt6, kidsge6, and λ̂. Compare the estimated return to education and its standard error to that from part (i).

(iii) Using only the 428 observations for working women, regress λ̂ on educ, exper, exper², nwifeinc, age, kidslt6, and kidsge6. How big is the R-squared? How does this help explain your findings from part (ii)? (Hint: Think multicollinearity.)
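The λ̂ appearing in 17.14 is the inverse Mills ratio, λ(z) = φ(z)/Φ(z), evaluated at the first-stage probit index for each woman. A minimal sketch of the function itself (the probit stage is omitted here):

```python
import math

def norm_pdf(z):
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    """Standard normal cdf Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def inverse_mills(z):
    """lambda(z) = phi(z) / Phi(z), evaluated at the probit index."""
    return norm_pdf(z) / norm_cdf(z)

# lambda is positive and decreasing in the index
print(round(inverse_mills(0.0), 4))  # phi(0)/Phi(0) = 0.7979
print(round(inverse_mills(2.0), 4))
```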

APPENDIX 17A

Asymptotic Standard Errors in Limited Dependent Variable Models

Derivations of the asymptotic standard errors for the models and methods introduced in this chapter are well beyond the scope of this text. Not only do the derivations require matrix algebra, but they also require advanced asymptotic theory of nonlinear estimation. The background needed for a careful analysis of these methods and several derivations are given in Wooldridge (1999).

It is instructive to see the formulas for obtaining the asymptotic standard errors for at least some of the methods. Given the binary response model P(y = 1|x) = G(xβ), where G(·) is the logit or probit function, and β is the k × 1 vector of parameters, the asymptotic variance matrix of β̂ is estimated as

Avar(β̂) = { Σᵢ₌₁ⁿ [g(xᵢβ̂)]² xᵢ′xᵢ / (G(xᵢβ̂)[1 − G(xᵢβ̂)]) }⁻¹,   (17.50)

which is a k × k matrix. (See Appendix D for a summary of matrix algebra.) Without the terms involving g(·) and G(·), this formula looks a lot like the estimated variance matrix for the OLS estimator, minus the term σ̂². The expression in (17.50) accounts for the nonlinear nature of the response probability—that is, the nonlinear nature of G(·)—as well as the particular form of heteroskedasticity in a binary response model: Var(y|x) = G(xβ)[1 − G(xβ)].

The square roots of the diagonal elements of (17.50) are the asymptotic standard errors of the β̂j, and they are routinely reported by econometrics software that supports logit and probit analysis. Once we have these, (asymptotic) t statistics and confidence intervals are obtained in the usual ways.

The matrix in (17.50) is also the basis for Wald tests of multiple restrictions on β [see Wooldridge (1999, Chapter 15)].
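As an illustration, the following sketch evaluates the variance formula for the logit case on made-up data; the coefficient values are hypothetical, not estimates from any data set. For logit, the density is g(z) = G(z)[1 − G(z)], so the ratio [g(xβ̂)]²/{G(xβ̂)[1 − G(xβ̂)]} collapses to G(xβ̂)[1 − G(xβ̂)]:

```python
import numpy as np

def G(z):
    """Logistic cdf."""
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_hat = np.array([0.3, 0.8])  # hypothetical logit coefficient estimates

# Estimated asymptotic variance: { sum_i g(x_i b)^2 x_i' x_i / (G[1-G]) }^{-1}
p = G(X @ beta_hat)
g = p * (1.0 - p)             # logit density: g(z) = G(z)[1 - G(z)]
w = g ** 2 / (p * (1.0 - p))  # simplifies to p*(1-p) for logit
avar = np.linalg.inv(X.T @ (w[:, None] * X))
se = np.sqrt(np.diag(avar))
print(se)
```

The simplification of the weight to G(1 − G) is why, for logit, this estimator coincides with the inverse of the negative Hessian of the log likelihood.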

The asymptotic variance matrix for Tobit is more complicated but has a similar structure. Note that we can obtain a standard error for σ̂ as well. The asymptotic variance for Poisson regression, allowing for σ² ≠ 1 in (17.32), has a form much like (17.50):

Avar(β̂) = σ̂² [ Σᵢ₌₁ⁿ exp(xᵢβ̂) xᵢ′xᵢ ]⁻¹.

The square roots of the diagonal elements of this matrix are the asymptotic standard errors. If the Poisson assumption holds, we can drop σ̂² from the formula (because σ² = 1).

Asymptotic standard errors for censored regression, truncated regression, and the Heckit sample selection correction are more complicated, although they share features with the previous formulas. See Wooldridge (1999) for details.


Chapter Eighteen

Advanced Time Series Topics

In this chapter, we cover some more advanced topics in time series econometrics. In Chapters 10, 11, and 12, we emphasized in several places that using time series data in regression analysis requires some care due to the trending, persistent nature of many economic time series. In addition to studying topics such as infinite distributed lag models and forecasting, we also discuss some recent advances in analyzing time series processes with unit roots.

In Section 18.1, we describe infinite distributed lag models, which allow a change in an explanatory variable to affect all future values of the dependent variable. Conceptually, these models are straightforward extensions of the finite distributed lag models in Chapter 10; but estimating these models poses some interesting challenges.

In Section 18.2, we show how to formally test for unit roots in a time series process. Recall from Chapter 11 that we excluded unit root processes to apply the usual asymptotic theory. Because the presence of a unit root implies that a shock today has a long-lasting impact, determining whether a process has a unit root is of interest in its own right.

We cover the notion of spurious regression between two time series processes, each of which has a unit root, in Section 18.3. The main result is that even if two unit root series are independent, it is quite likely that the regression of one on the other will yield a statistically significant t statistic. This emphasizes the potentially serious consequences of using standard inference when the dependent and independent variables are integrated processes.

The issue of cointegration applies when two series are I(1), but a linear combination of them is I(0); in this case, the regression of one on the other is not spurious, but instead tells us something about the long-run relationship between them. Cointegration between two series also implies a particular kind of model, called an error correction model, for the short-term dynamics. We cover these models in Section 18.4.

In Section 18.5, we provide an overview of forecasting and bring together all of the tools in this and previous chapters to show how regression methods can be used to forecast future outcomes of a time series. The forecasting literature is vast, so we focus only on the most common regression-based methods. We also touch on the related topic of Granger causality.


18.1 INFINITE DISTRIBUTED LAG MODELS

Let {(y_t, z_t): t = …, −2, −1, 0, 1, 2, …} be a bivariate time series process (which is only partially observed). An infinite distributed lag (IDL) model relating y_t to current and all past values of z is

y_t = α + δ0 z_t + δ1 z_{t−1} + δ2 z_{t−2} + … + u_t,   (18.1)

where the sum on lagged z extends back to the indefinite past. This model is only an approximation to reality, as no economic process started infinitely far into the past. Compared with a finite distributed lag model, an IDL model does not require that we truncate the lag at a particular value.

In order for model (18.1) to make sense, the lag coefficients, δj, must tend to zero as j → ∞. This is not to say that δ2 is smaller in magnitude than δ1; it only means that the impact of z_{t−j} on y_t must eventually become small as j gets large. In most applications, this makes economic sense as well: the distant past of z should be less important for explaining y than the recent past of z.

Even if we decide that (18.1) is a useful model, we clearly cannot estimate it without some restrictions. For one, we only observe a finite history of data. Equation (18.1) involves an infinite number of parameters, δ0, δ1, δ2, …, which cannot be estimated without restrictions. Later, we place restrictions on the δj that allow us to estimate (18.1).

As with finite distributed lag models, the impact propensity in (18.1) is simply δ0 (see Chapter 10). Generally, the δ_h have the same interpretation as in an FDL. Suppose that z_s = 0 for all s < 0, that z_0 = 1, and that z_s = 0 for all s > 0; in other words, at time t = 0, z increases temporarily by one unit and then reverts to its initial level of zero. For any h ≥ 0, we have y_h = α + δ_h + u_h, and so

E(y_h) = α + δ_h,   (18.2)

where we use the standard assumption that u_h has zero mean. It follows that δ_h is the change in E(y_h), given a one-unit, temporary change in z at time zero. We just said that δ_h must be tending to zero as h gets large for the IDL to make sense. This means that a temporary change in z has no long-run effect on expected y: E(y_h) = α + δ_h → α as h → ∞.

We assumed that the process z starts at z_s = 0 and that the one-unit increase occurred at t = 0. These were only for the purpose of illustration. More generally, if z temporarily increases by one unit (from any initial level) at time t, then δ_h measures the change in the expected value of y after h periods. The lag distribution, which is δ_h plotted as a function of h, shows the expected path that future y follow given the one-unit, temporary increase in z.

The long-run propensity in model (18.1) is the sum of all of the lag coefficients:

LRP = δ0 + δ1 + δ2 + δ3 + …,   (18.3)

where we assume that the infinite sum is well-defined. Because the δj must converge to zero, the LRP can often be well-approximated by a finite sum of the form δ0 + δ1 + … + δp for sufficiently large p.

QUESTION 18.1
Suppose that z_s = 0 for s < 0 and that z_0 = 1, z_1 = 1, and z_s = 0 for s > 1. Find E(y_{−1}), E(y_0), and E(y_h) for h ≥ 1. What happens as h → ∞?

To interpret the LRP, suppose that the process z_t is steady at z_s = 0 for s < 0. At t = 0, the process permanently increases by one unit. For example, if z_t is the percentage change in the money supply and y_t is the inflation rate, then we are interested in the effects of a permanent increase of one percentage point in money supply growth. Then, substituting z_s = 0 for s < 0 and z_t = 1 for t ≥ 0, we have

y_h = α + δ0 + δ1 + … + δ_h + u_h,

where h ≥ 0 is any horizon. Because u_t has a zero mean for all t, we have

E(y_h) = α + δ0 + δ1 + … + δ_h.   (18.4)

[It is useful to compare (18.4) and (18.2).] As the horizon increases, that is, as h → ∞, the right-hand side of (18.4) tends, by definition, to α plus the long-run propensity. Thus, the LRP measures the long-run change in the expected value of y given a one-unit, permanent increase in z.

The previous derivation of the LRP, and the interpretation of δj, used the fact that the errors have a zero mean; as usual, this is not much of an assumption, provided an intercept is included in the model. A closer examination of our reasoning shows that we assumed that the change in z during any time period had no effect on the expected value of u_t. This is the infinite distributed lag version of the strict exogeneity assumption that we introduced in Chapter 10 (in particular, Assumption TS.2). Formally,

E(u_t|…, z_{t−2}, z_{t−1}, z_t, z_{t+1}, …) = 0,   (18.5)

so that the expected value of u_t does not depend on the z in any time period. While (18.5) is natural for some applications, it rules out other important possibilities. In effect, (18.5) does not allow feedback from y_t to future z, because z_{t+h} must be uncorrelated with u_t for h > 0. In the inflation/money supply growth example, where y_t is inflation and z_t is money supply growth, (18.5) rules out future changes in money supply growth that are tied to changes in today’s inflation rate. Given that money supply policy often attempts to keep interest rates and inflation at certain levels, this might be unrealistic.

One approach to estimating the δj, which we cover in the next subsection, requires a strict exogeneity assumption in order to produce consistent estimators of the δj. A weaker assumption is

E(u_t|z_t, z_{t−1}, …) = 0.   (18.6)

Under (18.6), the error is uncorrelated with current and past z, but it may be correlated with future z; this allows z_t to be a variable that follows policy rules that depend on past y. Sometimes, (18.6) is sufficient to estimate the δj; we explain this in the next subsection.


One thing to remember is that neither (18.5) nor (18.6) says anything about the serial correlation properties of {ut}. (This is just as in finite distributed lag models.) If anything, we might expect the {ut} to be serially correlated because (18.1) is not generally dynamically complete in the sense discussed in Section 11.4. We will study the serial correlation problem later.

How do we interpret the lag coefficients and the LRP if (18.6) holds but (18.5) does not? The answer is: the same way as before. We can still do the previous thought (or counterfactual) experiment, even though the data we observe are generated by some feedback between yt and future z. For example, we can certainly ask about the long-run effect of a permanent increase in money supply growth on inflation, even though the data on money supply growth cannot be characterized as strictly exogenous.

The Geometric (or Koyck) Distributed Lag

Because there are generally an infinite number of δj, we cannot consistently estimate them without some restrictions. The simplest version of (18.1), which still makes the model depend on an infinite number of lags, is the geometric (or Koyck) distributed lag. In this model, the δj depend on only two parameters:

δj = γρ^j, |ρ| < 1, j = 0, 1, 2, ….   (18.7)

The parameters γ and ρ may be positive or negative, but ρ must be less than one in absolute value. This ensures that δj → 0 as j → ∞. In fact, this convergence happens at a very fast rate. (For example, with ρ = .5 and j = 10, ρ^j = 1/1024 < .001.)

The impact propensity in the GDL is simply δ0 = γ, and so the sign of the IP is determined by the sign of γ. If γ > 0, say, and ρ > 0, then all lag coefficients are positive. If ρ < 0, the lag coefficients alternate in sign (δj is negative for odd j). The long-run propensity is more difficult to obtain, but we can use a standard result on the sum of a geometric series: for |ρ| < 1, 1 + ρ + ρ² + … + ρ^j + … = 1/(1 − ρ), and so

LRP = γ/(1 − ρ).

The LRP has the same sign as γ.
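With hypothetical values γ = 2 and ρ = .5, the geometric lag coefficients and the finite-sum approximation to the LRP can be checked numerically:

```python
import numpy as np

gamma, rho = 2.0, 0.5          # hypothetical GDL parameters
j = np.arange(0, 25)
delta = gamma * rho ** j       # delta_j = gamma * rho^j

impact_propensity = delta[0]   # equals gamma
lrp_truncated = delta.sum()    # finite approximation with p = 24 lags
lrp_exact = gamma / (1.0 - rho)  # gamma / (1 - rho)
print(impact_propensity, lrp_truncated, lrp_exact)
```

Because ρ^j shrinks geometrically, 25 terms already match γ/(1 − ρ) = 4 to about seven decimal places.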

If we plug (18.7) into (18.1), we still have a model that depends on the z back to the indefinite past. Nevertheless, a simple subtraction yields an estimable model. Write the IDL at times t and t − 1 as:

y_t = α + γz_t + γρz_{t−1} + γρ²z_{t−2} + … + u_t   (18.8)

and

y_{t−1} = α + γz_{t−1} + γρz_{t−2} + γρ²z_{t−3} + … + u_{t−1}.   (18.9)

If we multiply the second equation by ρ and subtract it from the first, all but a few of the terms cancel:

y_t − ρy_{t−1} = (1 − ρ)α + γz_t + u_t − ρu_{t−1},


which we can write as

y_t = α0 + γz_t + ρy_{t−1} + u_t − ρu_{t−1},   (18.10)

where α0 = (1 − ρ)α. This equation looks like a standard model with a lagged dependent variable, where z_t appears contemporaneously. Because γ is the coefficient on z_t, and ρ is the coefficient on y_{t−1}, it appears that we can estimate these parameters. [If, for some reason, we are interested in α, we can always obtain α̂ = α̂0/(1 − ρ̂) after estimating ρ and α0.]

The simplicity of (18.10) is somewhat misleading. The error term in this equation, u_t − ρu_{t−1}, is generally correlated with y_{t−1}. From (18.9), it is pretty clear that u_{t−1} and y_{t−1} are correlated. Therefore, if we write (18.10) as

y_t = α0 + γz_t + ρy_{t−1} + v_t,   (18.11)

where v_t = u_t − ρu_{t−1}, then we generally have correlation between v_t and y_{t−1}. Without further assumptions, OLS estimation of (18.11) produces inconsistent estimates of γ and ρ.

One case where v_t must be correlated with y_{t−1} occurs when u_t is independent of z_t and all past values of z and y. Then, (18.8) is dynamically complete, and u_t is uncorrelated with y_{t−1}. From (18.9), the covariance between v_t and y_{t−1} is −ρVar(u_{t−1}) = −ρσ_u², which is zero only if ρ = 0. We can easily see that v_t is serially correlated: because {u_t} is serially uncorrelated,

E(v_t v_{t−1}) = E(u_t u_{t−1}) − ρE(u_{t−1}²) − ρE(u_t u_{t−2}) + ρ²E(u_{t−1}u_{t−2}) = −ρσ_u².

For j > 1, E(v_t v_{t−j}) = 0. Thus, {v_t} is a moving average process of order one (see Section 11.1). This gives an example of a model—which is derived from the original model of interest—that has a lagged dependent variable and a particular kind of serial correlation.

If we make the strict exogeneity assumption (18.5), then z_t is uncorrelated with u_t and u_{t−1}, and therefore with v_t. Thus, if we can find a suitable instrumental variable for y_{t−1}, then we can estimate (18.11) by IV. What is a good IV candidate for y_{t−1}? By assumption, u_t and u_{t−1} are both uncorrelated with z_{t−1}, and so v_t is uncorrelated with z_{t−1}. If γ ≠ 0, z_{t−1} and y_{t−1} are correlated, even after partialling out z_t. Therefore, we can use instruments (z_t, z_{t−1}) to estimate (18.11). Generally, the standard errors need to be adjusted for serial correlation in the {v_t}, as we discussed in Section 15.7.
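A small simulation, with made-up parameter values, illustrates both the inconsistency of OLS applied to (18.11) and the just-identified IV estimator that uses (z_t, z_{t−1}) as instruments:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20000
alpha0, gamma, rho = 0.25, 1.0, 0.5  # hypothetical true values

z = rng.normal(size=n)  # strictly exogenous regressor
u = rng.normal(size=n)  # serially uncorrelated IDL errors
y = np.zeros(n)
for t in range(1, n):
    v_t = u[t] - rho * u[t - 1]  # MA(1) error from (18.11)
    y[t] = alpha0 + gamma * z[t] + rho * y[t - 1] + v_t

# Drop the first two periods so current values and lags line up
Y = y[2:]
X = np.column_stack([np.ones(n - 2), z[2:], y[1:-1]])  # [1, z_t, y_{t-1}]
W = np.column_stack([np.ones(n - 2), z[2:], z[1:-1]])  # instruments (1, z_t, z_{t-1})

b_ols = np.linalg.solve(X.T @ X, X.T @ Y)  # inconsistent for rho
b_iv = np.linalg.solve(W.T @ X, W.T @ Y)   # just-identified IV
print("OLS rho:", round(b_ols[2], 3), " IV rho:", round(b_iv[2], 3))
```

With ρ = .5, the OLS estimate of ρ is pulled well below .5 by the negative correlation between v_t and y_{t−1}, while the IV estimate lands close to the truth.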

An alternative to IV estimation exploits the fact that {ut} may contain a specific kind of serial correlation. In particular, in addition to (18.6), suppose that {ut} follows the AR(1) model

u_t = ρu_{t−1} + e_t   (18.12)

E(e_t|z_t, y_{t−1}, z_{t−1}, …) = 0.   (18.13)

It is important to notice that the ρ appearing in (18.12) is the same parameter multiplying y_{t−1} in (18.11). If (18.12) and (18.13) hold, we can write

y_t = α0 + γz_t + ρy_{t−1} + e_t,   (18.14)

