CHAPTER 6. ASYMPTOTIC THEORY FOR LEAST SQUARES
The product $x_i e_i$ is iid (since the observations are iid) and mean zero (since $\mathrm{E}(x_i e_i) = 0$). Define the $k \times k$ covariance matrix
\[
\Omega = \mathrm{E}\left(x_i x_i' e_i^2\right). \qquad (6.8)
\]
We require the elements of $\Omega$ to be finite, or equivalently that $\mathrm{E}\left\|x_i e_i\right\|^2 < \infty$. Using $\left\|x_i e_i\right\|^2 = \left\|x_i\right\|^2 e_i^2$ and the Cauchy-Schwarz Inequality (B.20),
\[
\mathrm{E}\left\|x_i e_i\right\|^2 = \mathrm{E}\left(\left\|x_i\right\|^2 e_i^2\right) \le \left(\mathrm{E}\left\|x_i\right\|^4\right)^{1/2} \left(\mathrm{E} e_i^4\right)^{1/2} \qquad (6.9)
\]
which is finite if $x_i$ and $e_i$ have finite fourth moments. As $e_i$ is a linear combination of $y_i$ and $x_i$, it is sufficient that the observables have finite fourth moments (Theorem 3.16.1.6).
Assumption 6.4.1 In addition to Assumption 3.16.1, $\mathrm{E}\left(y_i^4\right) < \infty$ and $\mathrm{E}\left\|x_i\right\|^4 < \infty$.
Under Assumption 6.4.1 the CLT (Theorem 2.8.1) can be applied.
Theorem 6.4.1 Under Assumption 1.5.1 and Assumption 6.4.1, as $n \to \infty$,
\[
\frac{1}{\sqrt{n}} \sum_{i=1}^{n} x_i e_i \xrightarrow{d} \mathrm{N}\left(0, \Omega\right) \qquad (6.10)
\]
where $\Omega = \mathrm{E}\left(x_i x_i' e_i^2\right)$.
Putting together (6.1), (6.7), and (6.10),
\[
\sqrt{n}\left(\widehat{\beta} - \beta\right) \xrightarrow{d} Q_{xx}^{-1} \, \mathrm{N}\left(0, \Omega\right) = \mathrm{N}\left(0, Q_{xx}^{-1} \Omega Q_{xx}^{-1}\right)
\]
as $n \to \infty$, where the final equality follows from the property that linear combinations of normal vectors are also normal (Theorem B.9.1).
We have derived the asymptotic normal approximation to the distribution of the least-squares estimator.
Theorem 6.4.2 Asymptotic Normality of Least-Squares Estimator
Under Assumption 1.5.1 and Assumption 6.4.1, as $n \to \infty$,
\[
\sqrt{n}\left(\widehat{\beta} - \beta\right) \xrightarrow{d} \mathrm{N}\left(0, V_{\beta}\right)
\]
where
\[
V_{\beta} = Q_{xx}^{-1} \Omega Q_{xx}^{-1}, \qquad (6.11)
\]
$Q_{xx} = \mathrm{E}\left(x_i x_i'\right)$, and $\Omega = \mathrm{E}\left(x_i x_i' e_i^2\right)$.
In the stochastic order notation, Theorem 6.4.2 implies that
\[
\widehat{\beta} = \beta + O_p(n^{-1/2}) \qquad (6.12)
\]
and
\[
\widehat{\beta} - \beta = O_p(n^{-1/2}),
\]
which is stronger than (6.6).
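Theorem 6.4.2 can be checked numerically. The sketch below is illustrative and not from the text: it uses a design of my own choosing in which $Q_{xx} = I$, so the sandwich $Q_{xx}^{-1}\Omega Q_{xx}^{-1}$ reduces to $\Omega = \mathrm{diag}(3.5, 1.5)$ (computed from the stated moments of the design), and compares this with the Monte Carlo covariance of $\sqrt{n}(\widehat{\beta} - \beta)$.

```python
import numpy as np

def sandwich_demo(n=500, reps=2000, seed=0):
    """Compare the Monte Carlo covariance of sqrt(n)*(bhat - beta)
    with the asymptotic sandwich V = Qxx^{-1} Omega Qxx^{-1}."""
    rng = np.random.default_rng(seed)
    beta = np.array([1.0, 2.0])
    draws = np.empty((reps, 2))
    for r in range(reps):
        x1 = rng.normal(size=n)
        X = np.column_stack([x1, np.ones(n)])
        # heteroskedastic error: conditional variance 0.5 + x1^2
        e = rng.normal(size=n) * np.sqrt(0.5 + x1**2)
        y = X @ beta + e
        bhat = np.linalg.lstsq(X, y, rcond=None)[0]
        draws[r] = np.sqrt(n) * (bhat - beta)
    V_mc = np.cov(draws.T)
    # population moments for x1 ~ N(0,1): Qxx = I, and
    # Omega = diag(E[x1^2(0.5+x1^2)], E[0.5+x1^2]) = diag(3.5, 1.5)
    V_true = np.diag([3.5, 1.5])
    return V_mc, V_true
```

With a few thousand replications the Monte Carlo covariance should be close to `V_true`, entry by entry.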
The matrix $V_{\beta} = \mathrm{avar}(\widehat{\beta})$ is the variance of the asymptotic distribution of $\sqrt{n}\left(\widehat{\beta} - \beta\right)$. Consequently, $V_{\beta}$ is often referred to as the asymptotic covariance matrix of $\widehat{\beta}$. The expression $V_{\beta} = Q_{xx}^{-1} \Omega Q_{xx}^{-1}$ is called a sandwich form. It might be worth noticing that there is a difference between the variance of the asymptotic distribution given in (6.11) and the finite-sample conditional variance in the CEF model as given in (5.11):
\[
V_{\widehat{\beta}} = \frac{1}{n}\left(\frac{1}{n} X'X\right)^{-1} \left(\frac{1}{n} X'DX\right) \left(\frac{1}{n} X'X\right)^{-1}.
\]
While $V_{\widehat{\beta}}$ and $V_{\beta}$ are different, the two are close if $n$ is large. Indeed, as $n \to \infty$,
\[
n V_{\widehat{\beta}} \xrightarrow{p} V_{\beta}.
\]
There is a special case where $\Omega$ and $V_{\beta}$ simplify. We say that $e_i$ is a Homoskedastic Projection Error when
\[
\mathrm{cov}\left(x_i x_i', e_i^2\right) = 0. \qquad (6.13)
\]
Condition (6.13) holds in the homoskedastic linear regression model, but is somewhat broader. Under (6.13) the asymptotic variance formulas simplify as
\[
\Omega = \mathrm{E}\left(x_i x_i'\right) \mathrm{E}\left(e_i^2\right) = Q_{xx} \sigma^2 \qquad (6.14)
\]
\[
V_{\beta} = Q_{xx}^{-1} \Omega Q_{xx}^{-1} = Q_{xx}^{-1} \sigma^2 \equiv V_{\beta}^{0}. \qquad (6.15)
\]
In (6.15) we define $V_{\beta}^{0} = Q_{xx}^{-1} \sigma^2$ whether (6.13) is true or false. When (6.13) is true then $V_{\beta} = V_{\beta}^{0}$; otherwise $V_{\beta} \ne V_{\beta}^{0}$. We call $V_{\beta}^{0}$ the homoskedastic asymptotic covariance matrix.
Theorem 6.4.2 states that the sampling distribution of the least-squares estimator, after rescaling, is approximately normal when the sample size $n$ is sufficiently large. This holds true for all joint distributions of $(y_i, x_i)$ which satisfy the conditions of Assumption 6.4.1, and is therefore broadly applicable. Consequently, asymptotic normality is routinely used to approximate the finite sample distribution of $\sqrt{n}\left(\widehat{\beta} - \beta\right)$.
A difficulty is that for any fixed $n$ the sampling distribution of $\widehat{\beta}$ can be arbitrarily far from the normal distribution. In Figure 6.1 we have already seen a simple example where the least-squares estimate is quite asymmetric and non-normal even for reasonably large sample sizes. The normal approximation improves as $n$ increases, but how large should $n$ be in order for the approximation to be useful? Unfortunately, there is no simple answer to this reasonable question. The trouble is that no matter how large is the sample size, the normal approximation is arbitrarily poor for some data distribution satisfying the assumptions. We illustrate this problem using a simulation.
Let $y_i = \beta_1 x_i + \beta_2 + e_i$ where $x_i$ is $\mathrm{N}(0, 1)$, and $e_i$ is independent of $x_i$ with the Double Pareto density $f(e) = \frac{\alpha}{2}|e|^{-\alpha - 1}$, $|e| \ge 1$. If $\alpha > 2$ the error $e_i$ has zero mean and variance $\alpha/(\alpha - 2)$. As $\alpha$ approaches 2, however, its variance diverges to infinity. In this context the normalized least-squares slope estimator
\[
\sqrt{n \, \frac{\alpha - 2}{\alpha}} \left(\widehat{\beta}_1 - \beta_1\right)
\]
has the $\mathrm{N}(0, 1)$ asymptotic distribution for any $\alpha > 2$. In Figure 6.3 we display the finite sample densities of the normalized estimator $\sqrt{n \, \frac{\alpha - 2}{\alpha}}\left(\widehat{\beta}_1 - \beta_1\right)$, setting $n = 100$ and varying the parameter $\alpha$. For $\alpha = 3.0$ the density is very close to the $\mathrm{N}(0, 1)$ density. As $\alpha$ diminishes the density changes significantly, concentrating most of the probability mass around zero.

Figure 6.3: Density of Normalized OLS estimator with Double Pareto Error
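The Figure 6.3 experiment is straightforward to replicate. In the sketch below (my own construction, not the author's code), Double Pareto errors are drawn by inverse CDF: $|e|$ has CDF $1 - |e|^{-\alpha}$ on $[1, \infty)$, so $|e| = U^{-1/\alpha}$ for uniform $U$, with a random sign; the coefficient values $\beta_1 = 1$, $\beta_2 = 2$ are illustrative assumptions.

```python
import numpy as np

def double_pareto(alpha, size, rng):
    """Draw from f(e) = (alpha/2)|e|^(-alpha-1), |e| >= 1:
    |e| = U^(-1/alpha) with an independent random sign."""
    u = rng.uniform(size=size)
    sign = rng.choice([-1.0, 1.0], size=size)
    return sign * u ** (-1.0 / alpha)

def normalized_slope(alpha, n=100, reps=2000, seed=0):
    """Replications of sqrt(n*(alpha-2)/alpha)*(bhat1 - beta1), which
    should be approximately N(0,1) for alpha well above 2."""
    rng = np.random.default_rng(seed)
    out = np.empty(reps)
    for r in range(reps):
        x = rng.normal(size=n)
        e = double_pareto(alpha, n, rng)
        y = 1.0 * x + 2.0 + e          # beta1 = 1, beta2 = 2 (illustrative)
        X = np.column_stack([x, np.ones(n)])
        bhat = np.linalg.lstsq(X, y, rcond=None)[0]
        out[r] = np.sqrt(n * (alpha - 2) / alpha) * (bhat[0] - 1.0)
    return out
```

A histogram of `normalized_slope(3.0)` should look roughly standard normal, while draws for $\alpha$ near 2 concentrate near zero with occasional extreme values.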
Another example is shown in Figure 6.4. Here the model is $y_i = \beta + e_i$ where
\[
e_i = \frac{u_i^k - \mathrm{E}\left(u_i^k\right)}{\left(\mathrm{E}\left(u_i^{2k}\right) - \left(\mathrm{E}\left(u_i^k\right)\right)^2\right)^{1/2}} \qquad (6.16)
\]
and $u_i \sim \mathrm{N}(0, 1)$. We show the sampling distribution of $\sqrt{n}\left(\widehat{\beta} - \beta\right)$ setting $n = 100$, for $k = 1$, 4, 6 and 8. As $k$ increases, the sampling distribution becomes highly skewed and non-normal. The lesson from Figures 6.3 and 6.4 is that the $\mathrm{N}(0, 1)$ asymptotic approximation is never guaranteed to be accurate.
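The error process (6.16) is easy to generate, since the moments of a standard normal are $\mathrm{E}(u^m) = (m-1)!!$ for even $m$ and zero for odd $m$. The sketch below (my own helper, using those exact moments) constructs $e_i$ for several $k$; by design every $e_i$ has mean zero and unit variance, but for large $k$ the distribution is extremely right-skewed.

```python
import numpy as np

def skewed_error(u, k):
    """Construct e_i from (6.16): a standardized k-th power of a
    standard normal, using exact N(0,1) moments."""
    def normal_moment(m):
        # E(u^m) for u ~ N(0,1): 0 if m odd, (m-1)!! if m even
        if m % 2 == 1:
            return 0.0
        out = 1.0
        for j in range(m - 1, 0, -2):
            out *= j
        return out
    mk = normal_moment(k)
    m2k = normal_moment(2 * k)
    return (u**k - mk) / np.sqrt(m2k - mk**2)

rng = np.random.default_rng(0)
u = rng.normal(size=200_000)
for k in (1, 4, 8):
    e = skewed_error(u, k)
    print(k, round(float(e.mean()), 3), round(float((e**3).mean()), 2))
```

For $k = 1$ this reduces to $e_i = u_i$ (symmetric); for $k = 4$ and $k = 8$ the third moment is large and positive, which is what drives the skewness visible in Figure 6.4.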
6.5 Joint Distribution
Theorem 6.4.2 gives the joint asymptotic distribution of the coefficient estimates. We can use the result to study the covariance between the coefficient estimates. For example, suppose $k = 2$ and write the estimates as $(\widehat{\beta}_1, \widehat{\beta}_2)$. For simplicity suppose that the regressors are mean zero. Then we can write
\[
Q_{xx} = \begin{bmatrix} \sigma_1^2 & \rho \sigma_1 \sigma_2 \\ \rho \sigma_1 \sigma_2 & \sigma_2^2 \end{bmatrix}
\]
where $\sigma_1^2$ and $\sigma_2^2$ are the variances of $x_{1i}$ and $x_{2i}$, and $\rho$ is their correlation. If the error is homoskedastic, then the asymptotic variance matrix for $(\widehat{\beta}_1, \widehat{\beta}_2)$ is $V_{\beta} = Q_{xx}^{-1} \sigma^2$. By the formula for inversion of a $2 \times 2$ matrix,
\[
Q_{xx}^{-1} = \frac{1}{\sigma_1^2 \sigma_2^2 \left(1 - \rho^2\right)} \begin{bmatrix} \sigma_2^2 & -\rho \sigma_1 \sigma_2 \\ -\rho \sigma_1 \sigma_2 & \sigma_1^2 \end{bmatrix}.
\]
Thus if $x_{1i}$ and $x_{2i}$ are positively correlated ($\rho > 0$) then $\widehat{\beta}_1$ and $\widehat{\beta}_2$ are negatively correlated (and vice-versa).
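A quick numerical check of this sign result (my own, not part of the text): with unit-variance regressors and $\rho = 0.5$, the homoskedastic asymptotic variance $\sigma^2 Q_{xx}^{-1}$ has a negative off-diagonal, and the implied correlation of the estimates works out to exactly $-\rho$.

```python
import numpy as np

# Q_xx for unit-variance regressors with correlation rho = 0.5,
# and the homoskedastic asymptotic variance V = sigma^2 * Q_xx^{-1}
rho, sigma2 = 0.5, 1.0
Q = np.array([[1.0, rho], [rho, 1.0]])
V = sigma2 * np.linalg.inv(Q)
corr = V[0, 1] / np.sqrt(V[0, 0] * V[1, 1])
print(V)
print(corr)   # equals -rho = -0.5 in the unit-variance case
```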
Figure 6.4: Density of Normalized OLS estimator with error process (6.16)
For illustration, Figure 6.5 displays the probability contours of the joint asymptotic distribution of $\widehat{\beta}_1 - \beta_1$ and $\widehat{\beta}_2 - \beta_2$ when $\sigma_1^2 = \sigma_2^2 = \sigma^2 = 1$ and $\rho = 0.5$. The coefficient estimates are negatively correlated since the regressors are positively correlated. This means that if $\widehat{\beta}_1$ is unusually negative, it is likely that $\widehat{\beta}_2$ is unusually positive, or conversely. It is also unlikely that we will observe both $\widehat{\beta}_1$ and $\widehat{\beta}_2$ unusually large and of the same sign.
This finding that the correlation of the regressors is of opposite sign of the correlation of the coefficient estimates is sensitive to the assumption of homoskedasticity. If the errors are heteroskedastic then this relationship is not guaranteed.
This can be seen through a simple constructed example. Suppose that $x_{1i}$ and $x_{2i}$ only take the values $\{-1, +1\}$, symmetrically, with $\Pr(x_{1i} = x_{2i} = 1) = \Pr(x_{1i} = x_{2i} = -1) = 3/8$, and $\Pr(x_{1i} = 1, x_{2i} = -1) = \Pr(x_{1i} = -1, x_{2i} = 1) = 1/8$. You can check that the regressors are mean zero, unit variance and correlation 0.5, which is identical with the setting displayed in Figure 6.5 when the error is homoskedastic.
Now suppose that the error is heteroskedastic. Specifically, suppose that $\mathrm{E}\left(e_i^2 \mid x_{1i} = x_{2i}\right) = \frac{5}{4}$ and $\mathrm{E}\left(e_i^2 \mid x_{1i} \ne x_{2i}\right) = \frac{1}{4}$. You can check that $\mathrm{E}\left(e_i^2\right) = 1$, $\mathrm{E}\left(x_{1i}^2 e_i^2\right) = \mathrm{E}\left(x_{2i}^2 e_i^2\right) = 1$ and $\mathrm{E}\left(x_{1i} x_{2i} e_i^2\right) = \frac{7}{8}$. Therefore
\[
V_{\beta} = Q_{xx}^{-1} \Omega Q_{xx}^{-1}
= \frac{16}{9}
\begin{bmatrix} 1 & -\frac{1}{2} \\ -\frac{1}{2} & 1 \end{bmatrix}
\begin{bmatrix} 1 & \frac{7}{8} \\ \frac{7}{8} & 1 \end{bmatrix}
\begin{bmatrix} 1 & -\frac{1}{2} \\ -\frac{1}{2} & 1 \end{bmatrix}
= \frac{2}{3}
\begin{bmatrix} 1 & \frac{1}{4} \\ \frac{1}{4} & 1 \end{bmatrix}.
\]
Thus the coefficient estimates $\widehat{\beta}_1$ and $\widehat{\beta}_2$ are positively correlated (their correlation is $1/4$.) The joint probability contours of their asymptotic distribution is displayed in Figure 6.6. We can see how the two estimates are positively associated.
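The matrix algebra in this example can be verified by enumerating the four states of $(x_{1i}, x_{2i})$ directly. The check below (my own code, using the same probabilities and conditional variances as above) reproduces $\Omega$, $V_{\beta}$, and the correlation of $1/4$.

```python
import numpy as np

# the four states: (x1, x2, probability, E[e^2 | x])
states = [
    ( 1,  1, 3/8, 5/4),
    (-1, -1, 3/8, 5/4),
    ( 1, -1, 1/8, 1/4),
    (-1,  1, 1/8, 1/4),
]
Q = np.zeros((2, 2))
Omega = np.zeros((2, 2))
for x1, x2, p, s2 in states:
    x = np.array([[float(x1)], [float(x2)]])
    Q += p * (x @ x.T)          # Qxx = E(x x')
    Omega += p * s2 * (x @ x.T) # Omega = E(x x' e^2)
Qinv = np.linalg.inv(Q)
V = Qinv @ Omega @ Qinv
corr = V[0, 1] / np.sqrt(V[0, 0] * V[1, 1])
print(Omega)   # [[1, 7/8], [7/8, 1]]
print(V)       # (2/3) * [[1, 1/4], [1/4, 1]]
print(corr)    # 0.25
```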
Figure 6.5: Contours of Joint Distribution of $(\widehat{\beta}_1, \widehat{\beta}_2)$, homoskedastic case
What we found through this example is that in the presence of heteroskedasticity there is no simple relationship between the correlation of the regressors and the correlation of the parameter estimates.
We can extend the above analysis to study the covariance between coefficient sub-vectors. For example, partitioning $x_i' = \left(x_{1i}', x_{2i}'\right)$ and $\beta' = \left(\beta_1', \beta_2'\right)$, we can write the general model as
\[
y_i = x_{1i}' \beta_1 + x_{2i}' \beta_2 + e_i
\]
and the coefficient estimates as $\widehat{\beta}' = \left(\widehat{\beta}_1', \widehat{\beta}_2'\right)$. Make the partitions
\[
Q_{xx} = \begin{bmatrix} Q_{11} & Q_{12} \\ Q_{21} & Q_{22} \end{bmatrix}, \qquad
\Omega = \begin{bmatrix} \Omega_{11} & \Omega_{12} \\ \Omega_{21} & \Omega_{22} \end{bmatrix}. \qquad (6.17)
\]
From (3.37)
\[
Q_{xx}^{-1} = \begin{bmatrix} Q_{11 \cdot 2}^{-1} & -Q_{11 \cdot 2}^{-1} Q_{12} Q_{22}^{-1} \\ -Q_{22 \cdot 1}^{-1} Q_{21} Q_{11}^{-1} & Q_{22 \cdot 1}^{-1} \end{bmatrix}
\]
where $Q_{11 \cdot 2} = Q_{11} - Q_{12} Q_{22}^{-1} Q_{21}$ and $Q_{22 \cdot 1} = Q_{22} - Q_{21} Q_{11}^{-1} Q_{12}$. Thus when the error is homoskedastic,
\[
\mathrm{cov}\left(\widehat{\beta}_1, \widehat{\beta}_2\right) = -\sigma^2 Q_{11 \cdot 2}^{-1} Q_{12} Q_{22}^{-1} \qquad (6.18)
\]
which is a matrix generalization of the two-regressor case. In the general case, you can show that (Exercise 6.5)
\[
V_{\beta} = \begin{bmatrix} V_{11} & V_{12} \\ V_{21} & V_{22} \end{bmatrix}
\]
where
\[
V_{11} = Q_{11 \cdot 2}^{-1}\left(\Omega_{11} - Q_{12} Q_{22}^{-1} \Omega_{21} - \Omega_{12} Q_{22}^{-1} Q_{21} + Q_{12} Q_{22}^{-1} \Omega_{22} Q_{22}^{-1} Q_{21}\right) Q_{11 \cdot 2}^{-1} \qquad (6.19)
\]
\[
V_{21} = Q_{22 \cdot 1}^{-1}\left(\Omega_{21} - Q_{21} Q_{11}^{-1} \Omega_{11} - \Omega_{22} Q_{22}^{-1} Q_{21} + Q_{21} Q_{11}^{-1} \Omega_{12} Q_{22}^{-1} Q_{21}\right) Q_{11 \cdot 2}^{-1} \qquad (6.20)
\]
\[
V_{22} = Q_{22 \cdot 1}^{-1}\left(\Omega_{22} - Q_{21} Q_{11}^{-1} \Omega_{12} - \Omega_{21} Q_{11}^{-1} Q_{12} + Q_{21} Q_{11}^{-1} \Omega_{11} Q_{11}^{-1} Q_{12}\right) Q_{22 \cdot 1}^{-1} \qquad (6.21)
\]
Unfortunately, these expressions are not easily interpretable.
Figure 6.6: Contours of Joint Distribution of $\widehat{\beta}_1$ and $\widehat{\beta}_2$, heteroskedastic case
6.6 Uniformly Consistent Residuals*
We have described the least-squares residuals $\widehat{e}_i$ as estimates of the errors $e_i$. Are $\widehat{e}_i$ consistent for $e_i$? Notice that we can write the residual as
\[
\widehat{e}_i = y_i - x_i' \widehat{\beta}
= e_i + x_i' \beta - x_i' \widehat{\beta}
= e_i - x_i' \left(\widehat{\beta} - \beta\right). \qquad (6.22)
\]
Since $\widehat{\beta} - \beta \xrightarrow{p} 0$ it seems reasonable to guess that $\widehat{e}_i$ will be close to $e_i$ if $n$ is large.
We can bound the difference in (6.22) using the Schwarz inequality (A.7) to find
\[
\left|\widehat{e}_i - e_i\right| = \left|x_i' \left(\widehat{\beta} - \beta\right)\right| \le \left\|x_i\right\| \left\|\widehat{\beta} - \beta\right\|. \qquad (6.23)
\]
To bound (6.23) we can use $\widehat{\beta} - \beta = O_p(n^{-1/2})$ from Theorem 6.4.2, but we also need to bound the random variable $\left\|x_i\right\|$. The key is Theorem 2.12.1, which shows that $\mathrm{E}\left\|x_i\right\|^4 < \infty$ implies $x_i = o_p\left(n^{1/4}\right)$ uniformly in $i$, or
\[
n^{-1/4} \max_{1 \le i \le n} \left\|x_i\right\| \xrightarrow{p} 0.
\]
Applied to (6.23) we obtain
\[
\max_{1 \le i \le n} \left|\widehat{e}_i - e_i\right| \le \max_{1 \le i \le n} \left\|x_i\right\| \left\|\widehat{\beta} - \beta\right\| = o_p\left(n^{1/4}\right) O_p(n^{-1/2}) = o_p(n^{-1/4}).
\]
We have shown the following.
Theorem 6.6.1 Under Assumptions 1.5.1 and 6.4.1, uniformly in $1 \le i \le n$,
\[
\widehat{e}_i = e_i + o_p(n^{-1/4}). \qquad (6.24)
\]
What about the squared residuals $\widehat{e}_i^2$? Squaring the two sides of (6.24) we obtain
\[
\widehat{e}_i^2 = \left(e_i + o_p(n^{-1/4})\right)^2
= e_i^2 + 2 e_i \, o_p(n^{-1/4}) + o_p(n^{-1/2})
= e_i^2 + o_p(1) \qquad (6.25)
\]
uniformly in $1 \le i \le n$, since $e_i = o_p\left(n^{1/4}\right)$ when $\mathrm{E}|e_i|^4 < \infty$ by Theorem 2.12.1.
Theorem 6.6.2 Under Assumptions 1.5.1 and 6.4.1, uniformly in $1 \le i \le n$,
\[
\widehat{e}_i^2 = e_i^2 + o_p(1).
\]
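Theorem 6.6.1 can be illustrated in simulation: the worst-case gap $\max_i |\widehat{e}_i - e_i|$ shrinks as $n$ grows. The sketch below assumes a simple normal design of my own choosing.

```python
import numpy as np

def max_residual_gap(n, seed=0):
    """Simulate a linear model and return max_i |ehat_i - e_i|."""
    rng = np.random.default_rng(seed)
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    beta = np.array([1.0, -2.0])
    e = rng.normal(size=n)
    y = X @ beta + e
    bhat = np.linalg.lstsq(X, y, rcond=None)[0]
    ehat = y - X @ bhat
    # by (6.23) the gap is at most max||x_i|| * ||bhat - beta||
    return float(np.max(np.abs(ehat - e)))

for n in (100, 1000, 10000):
    print(n, max_residual_gap(n))
```

The printed gaps should decline with $n$, consistent with the $o_p(n^{-1/4})$ rate (in fact faster here, since the design has light tails).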
6.7 Asymptotic Leverage*
Recall the definition of leverage from (4.21)
\[
h_{ii} = x_i' \left(X'X\right)^{-1} x_i.
\]
These are the diagonal elements of the projection matrix $P$ and appear in the formula for leave-one-out prediction errors and several covariance matrix estimators. We can show that under iid sampling the leverage values are uniformly asymptotically small.
Let $\lambda_{\min}(A)$ and $\lambda_{\max}(A)$ denote the smallest and largest eigenvalues of a symmetric square matrix $A$, and note that $\lambda_{\max}\left(A^{-1}\right) = \left(\lambda_{\min}(A)\right)^{-1}$.
Since $\frac{1}{n} X'X \xrightarrow{p} Q_{xx} > 0$ then by the CMT, $\lambda_{\min}\left(\frac{1}{n} X'X\right) \xrightarrow{p} \lambda_{\min}\left(Q_{xx}\right) > 0$. (The latter is positive since $Q_{xx}$ is positive definite and thus all its eigenvalues are positive.) Then by the Trace Inequality (A.10)
\[
h_{ii} = x_i' \left(X'X\right)^{-1} x_i
= \mathrm{tr}\left(\left(\frac{1}{n} X'X\right)^{-1} \frac{1}{n} x_i x_i'\right)
\le \lambda_{\max}\left(\left(\frac{1}{n} X'X\right)^{-1}\right) \mathrm{tr}\left(\frac{1}{n} x_i x_i'\right)
= \left(\lambda_{\min}\left(\frac{1}{n} X'X\right)\right)^{-1} \frac{1}{n} \left\|x_i\right\|^2
\le \left(\lambda_{\min}\left(Q_{xx}\right) + o_p(1)\right)^{-1} \frac{1}{n} \max_{1 \le i \le n} \left\|x_i\right\|^2. \qquad (6.26)
\]
Theorem 2.12.1 shows that $\mathrm{E}\left\|x_i\right\|^2 < \infty$ implies
\[
n^{-1/2} \max_{1 \le i \le n} \left\|x_i\right\| \xrightarrow{p} 0
\]
and thus
\[
n^{-1} \max_{1 \le i \le n} \left\|x_i\right\|^2 \xrightarrow{p} 0.
\]
It follows that (6.26) is $o_p(1)$, uniformly in $i$.
Theorem 6.7.1 Under Assumption 1.5.1 and $\mathrm{E}\left\|x_i\right\|^2 < \infty$, uniformly in $1 \le i \le n$, $h_{ii} = o_p(1)$.
Theorem 6.7.1 implies that under random sampling with finite variances and large samples, no individual observation should have a large leverage value. Consequently individual observations should not be influential, unless one of these conditions is violated.
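The conclusion of Theorem 6.7.1 is easy to see numerically: for an iid design, the largest leverage value shrinks as $n$ grows. A sketch (my own design and function names, with iid normal regressors):

```python
import numpy as np

def max_leverage(n, k=3, seed=0):
    """Maximum diagonal element of P = X (X'X)^{-1} X'
    for an n x k matrix of iid N(0,1) regressors."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, k))
    XtX_inv = np.linalg.inv(X.T @ X)
    # h_ii = x_i' (X'X)^{-1} x_i, computed row by row
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)
    return float(h.max())

for n in (50, 500, 5000):
    print(n, max_leverage(n))
```

Since the diagonal of $P$ sums to $k$, the average leverage is $k/n$; the printed maxima should fall toward zero roughly in proportion.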
6.8 Consistent Covariance Matrix Estimation
In Sections 5.7 and 5.8 we introduced estimators of the finite-sample covariance matrix of the least-squares estimator in the regression model. In this section we show that these estimators are consistent for the asymptotic covariance matrix.
First, consider the covariance matrix estimate constructed under the assumption of homoskedasticity:
\[
\widehat{V}_{\beta}^{0} = \left(\frac{1}{n} X'X\right)^{-1} s^2 = \widehat{Q}_{xx}^{-1} s^2.
\]
Since $\widehat{Q}_{xx} \xrightarrow{p} Q_{xx}$ (Theorem 6.2.1), $s^2 \xrightarrow{p} \sigma^2$ (Theorem 6.3.1), and $Q_{xx}$ is invertible (Assumption 3.16.1), it follows that
\[
\widehat{V}_{\beta}^{0} = \widehat{Q}_{xx}^{-1} s^2 \xrightarrow{p} Q_{xx}^{-1} \sigma^2 = V_{\beta}^{0}
\]
so that $\widehat{V}_{\beta}^{0}$ is consistent for $V_{\beta}^{0}$, the homoskedastic covariance matrix.
Theorem 6.8.1 Under Assumption 1.5.1 and Assumption 3.16.1, $\widehat{V}_{\beta}^{0} \xrightarrow{p} V_{\beta}^{0}$ as $n \to \infty$.
Now consider the heteroskedasticity-robust covariance matrix estimators $\widehat{V}_{\beta}$, $\widetilde{V}_{\beta}$, and $\overline{V}_{\beta}$. Writing
\[
\widehat{\Omega} = \frac{1}{n} \sum_{i=1}^{n} x_i x_i' \widehat{e}_i^2, \qquad (6.27)
\]
\[
\widetilde{\Omega} = \frac{1}{n} \sum_{i=1}^{n} \left(1 - h_{ii}\right)^{-2} x_i x_i' \widehat{e}_i^2
\]
and
\[
\overline{\Omega} = \frac{1}{n} \sum_{i=1}^{n} \left(1 - h_{ii}\right)^{-1} x_i x_i' \widehat{e}_i^2
\]
as moment estimators for $\Omega = \mathrm{E}\left(x_i x_i' e_i^2\right)$, then the covariance matrix estimators are
\[
\widehat{V}_{\beta} = \widehat{Q}_{xx}^{-1} \widehat{\Omega} \widehat{Q}_{xx}^{-1},
\]
\[
\widetilde{V}_{\beta} = \widehat{Q}_{xx}^{-1} \widetilde{\Omega} \widehat{Q}_{xx}^{-1},
\]
and
\[
\overline{V}_{\beta} = \widehat{Q}_{xx}^{-1} \overline{\Omega} \widehat{Q}_{xx}^{-1}.
\]
We can show that $\widehat{\Omega}$, $\widetilde{\Omega}$, and $\overline{\Omega}$ are consistent for $\Omega$. Combined with the consistency of $\widehat{Q}_{xx}$ for $Q_{xx}$ and the invertibility of $Q_{xx}$ we find that $\widehat{V}_{\beta}$, $\widetilde{V}_{\beta}$, and $\overline{V}_{\beta}$ converge in probability to $Q_{xx}^{-1} \Omega Q_{xx}^{-1} = V_{\beta}$. The complete proof is given in Section 6.18.
Theorem 6.8.2 Under Assumption 1.5.1 and Assumption 6.4.1, as $n \to \infty$, $\widehat{\Omega} \xrightarrow{p} \Omega$, $\widetilde{\Omega} \xrightarrow{p} \Omega$, $\overline{\Omega} \xrightarrow{p} \Omega$, $\widehat{V}_{\beta} \xrightarrow{p} V_{\beta}$, $\widetilde{V}_{\beta} \xrightarrow{p} V_{\beta}$, and $\overline{V}_{\beta} \xrightarrow{p} V_{\beta}$.
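The three moment estimators and their sandwich forms take only a few lines of code. The sketch below is my own implementation of the formulas above, applied to an illustrative heteroskedastic design (the same one used earlier, for which $Q_{xx} = I$ and the population $V_{\beta} = \mathrm{diag}(3.5, 1.5)$); since leverage is uniformly small here, all three estimators nearly coincide.

```python
import numpy as np

def robust_covariances(X, y):
    """Qxx_hat and the three Omega estimators of Section 6.8
    (unadjusted, (1-h)^-2 weighted, (1-h)^-1 weighted), returned
    as the corresponding sandwich estimates of V_beta."""
    n = X.shape[0]
    bhat = np.linalg.lstsq(X, y, rcond=None)[0]
    ehat = y - X @ bhat
    Qinv = np.linalg.inv(X.T @ X / n)
    h = np.einsum("ij,jk,ik->i", X, np.linalg.inv(X.T @ X), X)
    def omega(w):
        # (1/n) sum_i w_i * ehat_i^2 * x_i x_i'
        return (X * (w * ehat**2)[:, None]).T @ X / n
    return {
        "hat":   Qinv @ omega(np.ones(n)) @ Qinv,
        "tilde": Qinv @ omega((1.0 - h) ** -2) @ Qinv,
        "bar":   Qinv @ omega((1.0 - h) ** -1) @ Qinv,
    }

rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
X = np.column_stack([x, np.ones(n)])
e = rng.normal(size=n) * np.sqrt(0.5 + x**2)   # heteroskedastic errors
y = X @ np.array([1.0, 2.0]) + e
V = robust_covariances(X, y)
print(V["hat"])   # near diag(3.5, 1.5) for this design
```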
6.9 Functions of Parameters
Sometimes we are interested in a lower-dimensional function of the parameter vector $\beta = (\beta_1, \ldots, \beta_k)$. For example, we may be interested in a single coefficient $\beta_j$ or a ratio $\beta_j / \beta_l$. In these cases we can write the parameter of interest as a function of $\beta$. Let $h : \mathbb{R}^k \to \mathbb{R}^q$ denote this function and let
\[
\theta = h(\beta)
\]
denote the parameter of interest. The estimate of $\theta$ is
\[
\widehat{\theta} = h(\widehat{\beta}).
\]
By the continuous mapping theorem (Theorem 2.9.1) and the fact $\widehat{\beta} \xrightarrow{p} \beta$ we can deduce that $\widehat{\theta}$ is consistent for $\theta$.
Theorem 6.9.1 Under Assumption 1.5.1 and Assumption 3.16.1, if $h(\beta)$ is continuous at the true value of $\beta$, then as $n \to \infty$, $\widehat{\theta} \xrightarrow{p} \theta$.
Furthermore, by the Delta Method (Theorem 2.10.3) we know that $\widehat{\theta}$ is asymptotically normal.
Theorem 6.9.2 Asymptotic Distribution of Functions of Parameters
Under Assumption 1.5.1 and Assumption 6.4.1, if $h(\beta)$ is continuously differentiable at the true value of $\beta$, then as $n \to \infty$,
\[
\sqrt{n}\left(\widehat{\theta} - \theta\right) \xrightarrow{d} \mathrm{N}\left(0, V_{\theta}\right) \qquad (6.28)
\]
where
\[
V_{\theta} = H_{\beta}' V_{\beta} H_{\beta} \qquad (6.29)
\]
and
\[
H_{\beta} = \frac{\partial}{\partial \beta} h(\beta)'.
\]
In many cases, the function $h(\beta)$ is linear:
\[
h(\beta) = R' \beta
\]
for some $k \times q$ matrix $R$. In this case, $H_{\beta} = R$. In particular, if $R$ is a “selector matrix”
\[
R = \begin{bmatrix} I \\ 0 \end{bmatrix} \qquad (6.30)
\]
so that $\theta = R' \beta = \beta_1$ for $\beta = \left(\beta_1', \beta_2'\right)'$, then
\[
V_{\theta} = \begin{bmatrix} I & 0 \end{bmatrix} V_{\beta} \begin{bmatrix} I \\ 0 \end{bmatrix} = V_{11},
\]
where $V_{11}$ is given in (6.19). Under homoskedasticity the covariance matrix (6.19) simplifies to
\[
V_{11}^{0} = Q_{11 \cdot 2}^{-1} \sigma^2.
\]
We have shown that for the case (6.30) of a subset of coefficients, (6.28) is
\[
\sqrt{n}\left(\widehat{\beta}_1 - \beta_1\right) \xrightarrow{d} \mathrm{N}\left(0, V_{11}\right)
\]
with $V_{11}$ given in (6.19).
6.10 Asymptotic Standard Errors
How do we estimate the covariance matrix $V_{\theta}$ for $\widehat{\theta}$? From (6.29) we see we need estimates of $H_{\beta}$ and $V_{\beta}$. We already have an estimate of the latter, $\widehat{V}_{\beta}$ (or $\widetilde{V}_{\beta}$ or $\overline{V}_{\beta}$). To estimate $H_{\beta}$ we use
\[
\widehat{H}_{\beta} = \frac{\partial}{\partial \beta} h(\widehat{\beta})'.
\]
Putting the parts together we obtain
\[
\widehat{V}_{\theta} = \widehat{H}_{\beta}' \widehat{V}_{\beta} \widehat{H}_{\beta}
\]
as the covariance matrix estimator for $\widehat{\theta}$. As the primary justification for $\widehat{V}_{\theta}$ is the asymptotic approximation (6.28), $\widehat{V}_{\theta}$ is often called an asymptotic covariance matrix estimator.
In particular, when $h(\beta)$ is linear $h(\beta) = R' \beta$ then
\[
\widehat{V}_{\theta} = R' \widehat{V}_{\beta} R.
\]
When $R$ takes the form of a selector matrix as in (6.30) then
\[
\widehat{V}_{\theta} = \widehat{V}_{11} = \left[\widehat{V}_{\beta}\right]_{11},
\]
the upper-left block of the covariance matrix estimate $\widehat{V}_{\beta}$.
When $q = 1$ (so $h(\beta)$ is real-valued), the standard error for $\widehat{\theta}$ is the square root of $n^{-1} \widehat{V}_{\theta}$; that is,
\[
s(\widehat{\theta}) = n^{-1/2} \sqrt{\widehat{V}_{\theta}} = n^{-1/2} \sqrt{\widehat{H}_{\beta}' \widehat{V}_{\beta} \widehat{H}_{\beta}}.
\]
This is known as an asymptotic standard error for $\widehat{\theta}$.
The estimator $\widehat{V}_{\theta}$ is consistent for $V_{\theta}$ under the conditions of Theorem 6.9.2 since $\widehat{V}_{\beta} \xrightarrow{p} V_{\beta}$ by Theorem 6.8.2, and
\[
\widehat{H}_{\beta} = \frac{\partial}{\partial \beta} h(\widehat{\beta})' \xrightarrow{p} \frac{\partial}{\partial \beta} h(\beta)' = H_{\beta}
\]
since $\widehat{\beta} \xrightarrow{p} \beta$ and the function $\frac{\partial}{\partial \beta} h(\beta)'$ is continuous.
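For a concrete instance of these formulas, take the ratio $\theta = \beta_j / \beta_l$, whose gradient has entries $1/\beta_l$ and $-\beta_j/\beta_l^2$. The helper below is my own sketch (not from the text) of the delta-method standard error $s(\widehat{\theta}) = n^{-1/2}\sqrt{\widehat{H}_{\beta}'\widehat{V}_{\beta}\widehat{H}_{\beta}}$, taking as inputs the coefficient estimate and any of the covariance matrix estimates from Section 6.8.

```python
import numpy as np

def ratio_standard_error(bhat, V_hat, n, j=0, l=1):
    """Delta-method standard error for theta = beta_j / beta_l,
    given an estimate V_hat of the asymptotic covariance matrix
    of sqrt(n)*(bhat - beta)."""
    H = np.zeros(len(bhat))
    H[j] = 1.0 / bhat[l]             # d(theta)/d(beta_j)
    H[l] = -bhat[j] / bhat[l] ** 2   # d(theta)/d(beta_l)
    V_theta = H @ V_hat @ H          # H' V H (scalar since q = 1)
    return float(np.sqrt(V_theta / n))

# example with assumed inputs: bhat = (2, 4), V_hat = diag(1, 4), n = 100
se = ratio_standard_error(np.array([2.0, 4.0]), np.diag([1.0, 4.0]), 100)
print(se)
```

Here $H = (1/4, -1/8)'$, so $\widehat{V}_{\theta} = (1/16)(1) + (1/64)(4) = 1/8$ and $s(\widehat{\theta}) = \sqrt{0.125/100}$.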