Econometrics, 2011
CHAPTER 4. THE ALGEBRA OF LEAST SQUARES
Alternatively, equation (4.4) writes the projection coefficient $\beta$ as an explicit function of the population moments $\mathbf{Q}_{xy}$ and $\mathbf{Q}_{xx}$. Their moment estimators are the sample moments
$$\widehat{\mathbf{Q}}_{xy} = \frac{1}{n}\sum_{i=1}^{n} x_i y_i$$
$$\widehat{\mathbf{Q}}_{xx} = \frac{1}{n}\sum_{i=1}^{n} x_i x_i'.$$
The moment estimator of $\beta$ replaces the population moments in (4.4) with the sample moments:
$$\widehat{\beta} = \widehat{\mathbf{Q}}_{xx}^{-1}\widehat{\mathbf{Q}}_{xy}
= \left(\frac{1}{n}\sum_{i=1}^{n} x_i x_i'\right)^{-1}\left(\frac{1}{n}\sum_{i=1}^{n} x_i y_i\right)
= \left(\sum_{i=1}^{n} x_i x_i'\right)^{-1}\left(\sum_{i=1}^{n} x_i y_i\right)$$
which is identical with (4.7).
Least Squares Estimation
Definition 4.3.1 The least-squares estimator $\widehat{\beta}$ is
$$\widehat{\beta} = \operatorname*{argmin}_{\beta \in \mathbb{R}^k} S_n(\beta)$$
where
$$S_n(\beta) = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - x_i'\beta\right)^2$$
and has the solution
$$\widehat{\beta} = \left(\sum_{i=1}^{n} x_i x_i'\right)^{-1}\left(\sum_{i=1}^{n} x_i y_i\right).$$
Adrien-Marie Legendre
The method of least-squares was first published in 1805 by the French mathematician Adrien-Marie Legendre (1752-1833). Legendre proposed least-squares as a solution to the algebraic problem of solving a system of equations when the number of equations exceeded the number of unknowns. This was a vexing and common problem in astronomical measurement. As viewed by Legendre, (4.1) is a set of n equations with k unknowns. As the equations cannot be solved exactly, Legendre's goal was to select $\beta$ to make the set of errors as small as possible. He proposed the sum of squared error criterion and derived the algebraic solution presented above. As he noted, the first-order conditions (4.6) are a system of k equations with k unknowns, which can be solved by "ordinary" methods. Hence the method became known as Ordinary Least Squares, and to this day we still use the abbreviation OLS to refer to Legendre's estimation method.
4.4 Illustration
We illustrate the least-squares estimator in practice with the data set used to generate the estimates from Chapter 3. This is the March 2009 Current Population Survey, which has extensive information on the U.S. population. This data set is described in more detail in Section ? For this illustration, we use the sub-sample of non-white married non-military female wage earners with 12 years potential work experience. This sub-sample has 61 observations. Let $y_i$ be log wages and $x_i$ be an intercept and years of education. Then
$$\frac{1}{n}\sum_{i=1}^{n} x_i y_i = \begin{pmatrix} 3.025 \\ 47.447 \end{pmatrix}$$
and
$$\frac{1}{n}\sum_{i=1}^{n} x_i x_i' = \begin{pmatrix} 1 & 15.426 \\ 15.426 & 243 \end{pmatrix}.$$
Thus
$$\widehat{\beta} = \begin{pmatrix} 1 & 15.426 \\ 15.426 & 243 \end{pmatrix}^{-1} \begin{pmatrix} 3.025 \\ 47.447 \end{pmatrix} = \begin{pmatrix} 0.626 \\ 0.156 \end{pmatrix}. \quad (4.8)$$
We often write the estimated equation using the format
$$\widehat{\log(Wage)} = 0.626 + 0.156\,education. \quad (4.9)$$
An interpretation of the estimated equation is that each year of education is associated with a 16% increase in mean wages.
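The computation in (4.8) is easy to reproduce. The sketch below (a minimal NumPy illustration, not part of the original text) uses the rounded sample moments reported above, so it recovers (4.8) only up to the rounding of those moments; it also solves the linear system directly rather than forming an explicit inverse.

```python
import numpy as np

# Rounded sample moments reported in the text (n = 61 observations),
# with x_i = (1, education_i)'.
Qxy = np.array([3.025, 47.447])          # (1/n) sum x_i y_i
Qxx = np.array([[1.0,    15.426],
                [15.426, 243.0]])        # (1/n) sum x_i x_i'

# beta-hat = Qxx^{-1} Qxy, computed via a linear solve.
beta_hat = np.linalg.solve(Qxx, Qxy)
print(beta_hat)  # approximately (0.626, 0.156), as in (4.8)
```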
Equation (4.9) is called a bivariate regression as there are only two variables. A multivariate regression has two or more regressors, and allows a more detailed investigation. Let's redo the example, but now including all levels of experience. This expanded sample includes 2454 observations. Including as regressors years of experience and its square, $experience^2/100$ (we divide by 100 to simplify reporting), we obtain the estimates
$$\widehat{\log(Wage)} = 1.06 + 0.116\,education + 0.010\,experience - 0.014\,experience^2/100. \quad (4.10)$$
These estimates suggest a 12% increase in mean wages per year of education, holding experience constant.
4.5 Least Squares Residuals
As a by-product of estimation, we define the fitted or predicted value
$$\hat{y}_i = x_i'\widehat{\beta}$$
and the residual
$$\hat{e}_i = y_i - \hat{y}_i = y_i - x_i'\widehat{\beta}. \quad (4.11)$$
Note that $y_i = \hat{y}_i + \hat{e}_i$ and
$$y_i = x_i'\widehat{\beta} + \hat{e}_i. \quad (4.12)$$
We make a distinction between the error ei and the residual e^i: The error ei is unobservable while the residual e^i is a by-product of estimation. These two variables are frequently mislabeled, which can cause confusion.
Equation (4.6) implies that
$$\sum_{i=1}^{n} x_i \hat{e}_i = 0. \quad (4.13)$$
To see this by a direct calculation, using (4.11) and (4.7),
$$\begin{aligned}
\sum_{i=1}^{n} x_i \hat{e}_i &= \sum_{i=1}^{n} x_i\left(y_i - x_i'\widehat{\beta}\right) \\
&= \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i x_i' \widehat{\beta} \\
&= \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i x_i' \left(\sum_{i=1}^{n} x_i x_i'\right)^{-1} \sum_{i=1}^{n} x_i y_i \\
&= \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i y_i \\
&= 0.
\end{aligned}$$
When $x_i$ contains a constant, an implication of (4.13) is
$$\frac{1}{n}\sum_{i=1}^{n}\hat{e}_i = 0.$$
Thus the residuals have a sample mean of zero and the sample correlation between the regressors and the residual is zero. These are algebraic results, and hold true for all linear regression estimates.
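These algebraic facts are easy to verify numerically. The following is a minimal sketch on simulated data (the design and coefficient values are arbitrary, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
# Intercept plus two regressors; with a constant in x_i, (4.13) implies
# the residuals also have exactly zero sample mean.
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.5, -0.25]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e_hat = y - X @ beta_hat

# sum_i x_i e_hat_i = 0, up to floating-point error:
print(np.max(np.abs(X.T @ e_hat)))
print(e_hat.mean())  # zero sample mean, since X contains a constant
```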
Given the residuals, we can construct an estimator for $\sigma^2 = \mathbb{E}e_i^2$:
$$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}\hat{e}_i^2. \quad (4.14)$$
4.6 Model in Matrix Notation
For many purposes, including computation, it is convenient to write the model and statistics in matrix notation. The linear equation (3.24) is a system of n equations, one for each observation. We can stack these n equations together as
$$\begin{aligned}
y_1 &= x_1'\beta + e_1 \\
y_2 &= x_2'\beta + e_2 \\
&\;\;\vdots \\
y_n &= x_n'\beta + e_n.
\end{aligned}$$
Now define
$$y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \qquad X = \begin{pmatrix} x_1' \\ x_2' \\ \vdots \\ x_n' \end{pmatrix}, \qquad e = \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{pmatrix}.$$
Observe that $y$ and $e$ are $n \times 1$ vectors, and $X$ is an $n \times k$ matrix. Then the system of n equations can be compactly written in the single equation
$$y = X\beta + e. \quad (4.15)$$
Sample sums can also be written in matrix notation. For example
$$\sum_{i=1}^{n} x_i x_i' = X'X$$
$$\sum_{i=1}^{n} x_i y_i = X'y.$$
Therefore
$$\widehat{\beta} = \left(X'X\right)^{-1}\left(X'y\right). \quad (4.16)$$
The matrix version of (4.12) and estimated version of (4.15) is
$$y = X\widehat{\beta} + \hat{e}, \quad (4.17)$$
or equivalently the residual vector is
$$\hat{e} = y - X\widehat{\beta}.$$
Using the residual vector, we can write (4.13) as
$$X'\hat{e} = 0 \quad (4.18)$$
and the error variance estimator (4.14) as
$$\hat{\sigma}^2 = n^{-1}\hat{e}'\hat{e}. \quad (4.19)$$
Using matrix notation we have simple expressions for most estimators. This is particularly convenient for computer programming, as most languages allow matrix notation and manipulation.
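As a sketch of how compact these expressions are in code (simulated data; any matrix language would look much the same):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ rng.normal(size=k) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)  # (4.16), via a linear solve
e_hat = y - X @ beta_hat                       # residual vector
sigma2_hat = (e_hat @ e_hat) / n               # (4.19)

# The matrix version of (4.13): X'e_hat = 0.
print(np.max(np.abs(X.T @ e_hat)))
```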
Important Matrix Expressions
$$\begin{aligned}
y &= X\beta + e \\
\widehat{\beta} &= \left(X'X\right)^{-1}\left(X'y\right) \\
\hat{e} &= y - X\widehat{\beta} \\
X'\hat{e} &= 0 \\
\hat{\sigma}^2 &= n^{-1}\hat{e}'\hat{e}.
\end{aligned}$$
Early Use of Matrices
The earliest known treatment of the use of matrix methods to solve simultaneous systems is found in Chapter 8 of the Chinese text The Nine Chapters on the Mathematical Art, written by several generations of scholars from the 10th to 2nd century BCE.
4.7 Projection Matrix
Define the matrix
$$P = X\left(X'X\right)^{-1}X'.$$
Observe that
$$PX = X\left(X'X\right)^{-1}X'X = X.$$
This is a property of a projection matrix. More generally, for any matrix $Z$ which can be written as $Z = X\Gamma$ for some matrix $\Gamma$ (we say that $Z$ lies in the range space of $X$), then
$$PZ = PX\Gamma = X\left(X'X\right)^{-1}X'X\Gamma = X\Gamma = Z.$$
As an important example, if we partition the matrix $X$ into two matrices $X_1$ and $X_2$ so that
$$X = [X_1 \; X_2],$$
then $PX_1 = X_1$.
The matrix $P$ is symmetric and idempotent¹. To see that it is symmetric,
$$\begin{aligned}
P' &= \left(X\left(X'X\right)^{-1}X'\right)' \\
&= \left(X'\right)'\left(\left(X'X\right)^{-1}\right)'\left(X\right)' \\
&= X\left(\left(X'X\right)'\right)^{-1}X' \\
&= X\left(\left(X\right)'\left(X'\right)'\right)^{-1}X' \\
&= P.
\end{aligned}$$
To establish that it is idempotent, the fact that $PX = X$ implies that
$$\begin{aligned}
PP &= PX\left(X'X\right)^{-1}X' \\
&= X\left(X'X\right)^{-1}X' \\
&= P.
\end{aligned}$$
The matrix $P$ has the property that it creates the fitted values in a least-squares regression:
$$Py = X\left(X'X\right)^{-1}X'y = X\widehat{\beta} = \hat{y}.$$
Because of this property, $P$ is also known as the "hat matrix".
Another useful property is that the trace of $P$ equals the number of columns of $X$:
$$\operatorname{tr} P = k. \quad (4.20)$$
Indeed,
$$\begin{aligned}
\operatorname{tr} P &= \operatorname{tr}\left(X\left(X'X\right)^{-1}X'\right) \\
&= \operatorname{tr}\left(\left(X'X\right)^{-1}X'X\right) \\
&= \operatorname{tr}\left(I_k\right) \\
&= k.
\end{aligned}$$
¹A matrix $P$ is symmetric if $P' = P$. A matrix $P$ is idempotent if $PP = P$. See Appendix A.8.
(See Appendix A.4 for the definition and properties of the trace operator.)
The $i$'th diagonal element of $P = X\left(X'X\right)^{-1}X'$ is
$$h_{ii} = x_i'\left(X'X\right)^{-1}x_i, \quad (4.21)$$
which is called the leverage of the $i$'th observation. The $h_{ii}$ take values in $[0, 1]$ and sum to $k$:
$$\sum_{i=1}^{n} h_{ii} = k \quad (4.22)$$
(see Exercise 4.8).
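These properties can be checked directly. The sketch below forms $P$ explicitly, which is fine for a small illustration but avoided in practice since $P$ is $n \times n$:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 50, 4
X = rng.normal(size=(n, k))

# P = X (X'X)^{-1} X', the projection ("hat") matrix.
P = X @ np.linalg.solve(X.T @ X, X.T)

print(np.allclose(P, P.T))         # symmetric
print(np.allclose(P @ P, P))       # idempotent
print(np.allclose(P @ X, X))       # P X = X
print(np.isclose(np.trace(P), k))  # tr P = k, as in (4.20)

h = np.diag(P)                     # leverage values h_ii of (4.21)
print(h.min() >= 0, h.max() <= 1)  # h_ii lies in [0, 1]
print(np.isclose(h.sum(), k))      # sum of leverages is k, as in (4.22)
```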
|
|
|
4.8 Orthogonal Projection
Define
$$M = I_n - P = I_n - X\left(X'X\right)^{-1}X'$$
where $I_n$ is the $n \times n$ identity matrix. Note that
$$MX = \left(I_n - P\right)X = X - PX = X - X = 0.$$
Thus $M$ and $X$ are orthogonal. We call $M$ an orthogonal projection matrix or an annihilator matrix due to the property that for any matrix $Z$ in the range space of $X$,
$$MZ = Z - PZ = 0.$$
For example, $MX_1 = 0$ for any subcomponent $X_1$ of $X$, and $MP = 0$.
The orthogonal projection matrix $M$ shares many properties with $P$, including that $M$ is symmetric ($M' = M$) and idempotent ($MM = M$). Similarly to (4.20) we can calculate
$$\operatorname{tr} M = n - k. \quad (4.23)$$
While $P$ creates fitted values, $M$ creates least-squares residuals:
$$My = y - Py = y - X\widehat{\beta} = \hat{e}. \quad (4.24)$$
Another way of writing (4.24) is
$$y = Py + My = \hat{y} + \hat{e}.$$
This decomposition is orthogonal, that is,
$$\hat{y}'\hat{e} = \left(Py\right)'\left(My\right) = y'PMy = 0.$$
We can also use (4.24) to write an alternative expression for the residual vector. Substituting $y = X\beta + e$ into $\hat{e} = My$ and using $MX = 0$ we find
$$\hat{e} = M\left(X\beta + e\right) = Me, \quad (4.25)$$
which is free of dependence on the regression coefficient $\beta$.
Another useful application of (4.24) is to the error variance estimator (4.19):
$$\begin{aligned}
\hat{\sigma}^2 &= n^{-1}\hat{e}'\hat{e} \\
&= n^{-1}y'MMy \\
&= n^{-1}y'My,
\end{aligned}$$
the final equality since $MM = M$. Similarly, using (4.25) we find
$$\hat{\sigma}^2 = n^{-1}e'Me.$$
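Continuing the numerical sketch (again forming $M$ explicitly only for illustration), the annihilator matrix behaves exactly as derived:

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 60, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ rng.normal(size=k) + rng.normal(size=n)

P = X @ np.linalg.solve(X.T @ X, X.T)
M = np.eye(n) - P                        # orthogonal projection / annihilator

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e_hat = y - X @ beta_hat

print(np.max(np.abs(M @ X)))             # M X = 0
print(np.max(np.abs(M @ y - e_hat)))     # M y = e_hat, as in (4.24)
print(np.isclose(np.trace(M), n - k))    # tr M = n - k, as in (4.23)
print(np.isclose(y @ M @ y / n, e_hat @ e_hat / n))  # sigma-hat^2 = n^{-1} y'My
```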
4.9 Regression Components
Partition
$$X = [X_1 \; X_2]$$
and
$$\beta = \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix}.$$
Then the regression model can be rewritten as
$$y = X_1\beta_1 + X_2\beta_2 + e. \quad (4.26)$$
The OLS estimator of $\beta = \left(\beta_1', \beta_2'\right)'$ is obtained by regression of $y$ on $X = [X_1 \; X_2]$ and can be written as
$$y = X\widehat{\beta} + \hat{e} = X_1\widehat{\beta}_1 + X_2\widehat{\beta}_2 + \hat{e}. \quad (4.27)$$
We are interested in algebraic expressions for $\widehat{\beta}_1$ and $\widehat{\beta}_2$.
The algebra for the estimator is identical to that for the population coefficients as presented in Section 3.19.
Partition $\widehat{\mathbf{Q}}_{xx}$ and $\widehat{\mathbf{Q}}_{xy}$ as
$$\widehat{\mathbf{Q}}_{xx} = \begin{bmatrix} \widehat{\mathbf{Q}}_{11} & \widehat{\mathbf{Q}}_{12} \\ \widehat{\mathbf{Q}}_{21} & \widehat{\mathbf{Q}}_{22} \end{bmatrix} = \begin{bmatrix} \dfrac{1}{n}X_1'X_1 & \dfrac{1}{n}X_1'X_2 \\ \dfrac{1}{n}X_2'X_1 & \dfrac{1}{n}X_2'X_2 \end{bmatrix}$$
and similarly
$$\widehat{\mathbf{Q}}_{xy} = \begin{bmatrix} \widehat{\mathbf{Q}}_{1y} \\ \widehat{\mathbf{Q}}_{2y} \end{bmatrix} = \begin{bmatrix} \dfrac{1}{n}X_1'y \\ \dfrac{1}{n}X_2'y \end{bmatrix}.$$
By the partitioned matrix inversion formula (A.4),
$$\widehat{\mathbf{Q}}_{xx}^{-1} = \begin{bmatrix} \widehat{\mathbf{Q}}_{11} & \widehat{\mathbf{Q}}_{12} \\ \widehat{\mathbf{Q}}_{21} & \widehat{\mathbf{Q}}_{22} \end{bmatrix}^{-1} \overset{\text{def}}{=} \begin{bmatrix} \widehat{\mathbf{Q}}^{11} & \widehat{\mathbf{Q}}^{12} \\ \widehat{\mathbf{Q}}^{21} & \widehat{\mathbf{Q}}^{22} \end{bmatrix} = \begin{bmatrix} \widehat{\mathbf{Q}}_{11\cdot 2}^{-1} & -\widehat{\mathbf{Q}}_{11\cdot 2}^{-1}\widehat{\mathbf{Q}}_{12}\widehat{\mathbf{Q}}_{22}^{-1} \\ -\widehat{\mathbf{Q}}_{22\cdot 1}^{-1}\widehat{\mathbf{Q}}_{21}\widehat{\mathbf{Q}}_{11}^{-1} & \widehat{\mathbf{Q}}_{22\cdot 1}^{-1} \end{bmatrix}$$
where $\widehat{\mathbf{Q}}_{11\cdot 2} = \widehat{\mathbf{Q}}_{11} - \widehat{\mathbf{Q}}_{12}\widehat{\mathbf{Q}}_{22}^{-1}\widehat{\mathbf{Q}}_{21}$ and $\widehat{\mathbf{Q}}_{22\cdot 1} = \widehat{\mathbf{Q}}_{22} - \widehat{\mathbf{Q}}_{21}\widehat{\mathbf{Q}}_{11}^{-1}\widehat{\mathbf{Q}}_{12}$.
Thus
$$\widehat{\beta} = \begin{pmatrix} \widehat{\beta}_1 \\ \widehat{\beta}_2 \end{pmatrix} = \begin{bmatrix} \widehat{\mathbf{Q}}_{11\cdot 2}^{-1} & -\widehat{\mathbf{Q}}_{11\cdot 2}^{-1}\widehat{\mathbf{Q}}_{12}\widehat{\mathbf{Q}}_{22}^{-1} \\ -\widehat{\mathbf{Q}}_{22\cdot 1}^{-1}\widehat{\mathbf{Q}}_{21}\widehat{\mathbf{Q}}_{11}^{-1} & \widehat{\mathbf{Q}}_{22\cdot 1}^{-1} \end{bmatrix} \begin{bmatrix} \widehat{\mathbf{Q}}_{1y} \\ \widehat{\mathbf{Q}}_{2y} \end{bmatrix} = \begin{pmatrix} \widehat{\mathbf{Q}}_{11\cdot 2}^{-1}\widehat{\mathbf{Q}}_{1y\cdot 2} \\ \widehat{\mathbf{Q}}_{22\cdot 1}^{-1}\widehat{\mathbf{Q}}_{2y\cdot 1} \end{pmatrix}. \quad (4.28)$$
Now
$$\widehat{\mathbf{Q}}_{11\cdot 2} = \widehat{\mathbf{Q}}_{11} - \widehat{\mathbf{Q}}_{12}\widehat{\mathbf{Q}}_{22}^{-1}\widehat{\mathbf{Q}}_{21} = \frac{1}{n}X_1'X_1 - \frac{1}{n}X_1'X_2\left(\frac{1}{n}X_2'X_2\right)^{-1}\frac{1}{n}X_2'X_1 = \frac{1}{n}X_1'M_2X_1$$
where
$$M_2 = I_n - X_2\left(X_2'X_2\right)^{-1}X_2'$$
is the orthogonal projection matrix for $X_2$. Similarly, $\widehat{\mathbf{Q}}_{22\cdot 1} = \frac{1}{n}X_2'M_1X_2$, where
$$M_1 = I_n - X_1\left(X_1'X_1\right)^{-1}X_1'$$
is the orthogonal projection matrix for $X_1$. Also
$$\widehat{\mathbf{Q}}_{1y\cdot 2} = \widehat{\mathbf{Q}}_{1y} - \widehat{\mathbf{Q}}_{12}\widehat{\mathbf{Q}}_{22}^{-1}\widehat{\mathbf{Q}}_{2y} = \frac{1}{n}X_1'y - \frac{1}{n}X_1'X_2\left(\frac{1}{n}X_2'X_2\right)^{-1}\frac{1}{n}X_2'y = \frac{1}{n}X_1'M_2y$$
and $\widehat{\mathbf{Q}}_{2y\cdot 1} = \frac{1}{n}X_2'M_1y$. Therefore
$$\widehat{\beta}_1 = \left(X_1'M_2X_1\right)^{-1}\left(X_1'M_2y\right) \quad (4.29)$$
and
$$\widehat{\beta}_2 = \left(X_2'M_1X_2\right)^{-1}\left(X_2'M_1y\right). \quad (4.30)$$
These are algebraic expressions for the sub-coefficient estimates from (4.27).
4.10 Residual Regression
As first recognized by Ragnar Frisch, expressions (4.29) and (4.30) can be used to show that the least-squares estimators $\widehat{\beta}_1$ and $\widehat{\beta}_2$ can be found by a two-step regression procedure.
Take (4.30). Since $M_1$ is idempotent, $M_1 = M_1M_1$ and thus
$$\begin{aligned}
\widehat{\beta}_2 &= \left(X_2'M_1X_2\right)^{-1}\left(X_2'M_1y\right) \\
&= \left(X_2'M_1M_1X_2\right)^{-1}\left(X_2'M_1M_1y\right) \\
&= \left(\widetilde{X}_2'\widetilde{X}_2\right)^{-1}\left(\widetilde{X}_2'\tilde{e}_1\right)
\end{aligned}$$
where
$$\widetilde{X}_2 = M_1X_2$$
and
$$\tilde{e}_1 = M_1y.$$
Thus the coefficient estimate $\widehat{\beta}_2$ is algebraically equal to the least-squares regression of $\tilde{e}_1$ on $\widetilde{X}_2$. Notice that these two are $y$ and $X_2$, respectively, premultiplied by $M_1$. But we know that multiplication by $M_1$ is equivalent to creating least-squares residuals. Therefore $\tilde{e}_1$ is simply the least-squares residual from a regression of $y$ on $X_1$, and the columns of $\widetilde{X}_2$ are the least-squares residuals from the regressions of the columns of $X_2$ on $X_1$.
We have proven the following theorem.
Theorem 4.10.1 (Frisch-Waugh-Lovell)
In the model (4.26), the OLS estimator of $\beta_2$ and the OLS residuals $\hat{e}$ may be equivalently computed by either the OLS regression (4.27) or via the following algorithm:
1. Regress $y$ on $X_1$; obtain residuals $\tilde{e}_1$;
2. Regress $X_2$ on $X_1$; obtain residuals $\widetilde{X}_2$;
3. Regress $\tilde{e}_1$ on $\widetilde{X}_2$; obtain OLS estimates $\widehat{\beta}_2$ and residuals $\hat{e}$.
In some contexts, the FWL theorem can be used to speed computation, but in most cases there is little computational advantage to using the two-step algorithm. Rather, the primary use is theoretical.
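A numerical sketch of the theorem on simulated data (block dimensions chosen arbitrarily): the two-step estimate of $\widehat{\beta}_2$ coincides with the one from the full regression, as do the residuals.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 120
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])  # first block (k1 = 2)
X2 = rng.normal(size=(n, 2))                            # second block (k2 = 2)
X = np.column_stack([X1, X2])
y = X @ rng.normal(size=4) + rng.normal(size=n)

def ols(A, b):
    """OLS coefficients from a regression of b on the columns of A."""
    return np.linalg.solve(A.T @ A, A.T @ b)

beta_hat = ols(X, y)                      # full regression (4.27)

# FWL algorithm:
e1_tilde = y - X1 @ ols(X1, y)            # step 1: residuals of y on X1
X2_tilde = X2 - X1 @ ols(X1, X2)          # step 2: residuals of X2 on X1
beta2_hat = ols(X2_tilde, e1_tilde)       # step 3
e_hat_fwl = e1_tilde - X2_tilde @ beta2_hat

print(np.max(np.abs(beta2_hat - beta_hat[2:])))        # ~0
print(np.max(np.abs(e_hat_fwl - (y - X @ beta_hat))))  # ~0
```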
A common application of the FWL theorem, which you may have seen in an introductory econometrics course, is the demeaning formula for regression. Partition $X = [X_1 \; X_2]$ where $X_1$ is the matrix of observed regressors and $X_2 = \iota$ is a vector of ones. In this case,
$$M_2 = I_n - \iota\left(\iota'\iota\right)^{-1}\iota'.$$
Observe that
$$\widetilde{X}_1 = M_2X_1 = X_1 - \iota\left(\iota'\iota\right)^{-1}\iota'X_1 = X_1 - \bar{x}_1$$
and
$$\tilde{y} = M_2y = y - \iota\left(\iota'\iota\right)^{-1}\iota'y = y - \bar{y},$$
which are "demeaned". The FWL theorem says that $\widehat{\beta}_1$ is the OLS estimate from a regression of $y_i - \bar{y}$ on $x_{1i} - \bar{x}_1$:
$$\widehat{\beta}_1 = \left(\sum_{i=1}^{n}\left(x_{1i} - \bar{x}_1\right)\left(x_{1i} - \bar{x}_1\right)'\right)^{-1}\left(\sum_{i=1}^{n}\left(x_{1i} - \bar{x}_1\right)\left(y_i - \bar{y}\right)\right).$$
Thus the OLS estimator for the slope coefficients is a regression with demeaned data.
Ragnar Frisch
Ragnar Frisch (1895-1973) was co-winner with Jan Tinbergen of the first Nobel Memorial Prize in Economic Sciences in 1969 for their work in developing and applying dynamic models for the analysis of economic problems. Frisch made a number of foundational contributions to modern economics beyond the Frisch-Waugh-Lovell Theorem, including formalizing consumer theory, production theory, and business cycle theory.
4.11 Prediction Errors
The least-squares residuals $\hat{e}_i$ are not true prediction errors, as they are constructed based on the full sample including $y_i$. A proper prediction for $y_i$ should be based on estimates constructed using only the other observations. We can do this by defining the leave-one-out OLS estimator of $\beta$ as that obtained from the sample of $n-1$ observations excluding the $i$'th observation:
$$\widehat{\beta}_{(-i)} = \left(\frac{1}{n-1}\sum_{j\neq i} x_j x_j'\right)^{-1}\left(\frac{1}{n-1}\sum_{j\neq i} x_j y_j\right) = \left(X_{(-i)}'X_{(-i)}\right)^{-1}X_{(-i)}'y_{(-i)}. \quad (4.31)$$
Here, $X_{(-i)}$ and $y_{(-i)}$ are the data matrices omitting the $i$'th row. The leave-one-out predicted value for $y_i$ is
$$\tilde{y}_i = x_i'\widehat{\beta}_{(-i)},$$
and the leave-one-out residual or prediction error is
$$\tilde{e}_i = y_i - \tilde{y}_i.$$
A convenient alternative expression for $\widehat{\beta}_{(-i)}$ (derived below) is
$$\widehat{\beta}_{(-i)} = \widehat{\beta} - \left(1 - h_{ii}\right)^{-1}\left(X'X\right)^{-1}x_i\hat{e}_i \quad (4.32)$$
where $h_{ii}$ are the leverage values as defined in (4.21).
Using (4.32) we can simplify the expression for the prediction error:
$$\begin{aligned}
\tilde{e}_i &= y_i - x_i'\widehat{\beta}_{(-i)} \\
&= y_i - x_i'\widehat{\beta} + \left(1 - h_{ii}\right)^{-1}x_i'\left(X'X\right)^{-1}x_i\hat{e}_i \\
&= \hat{e}_i + \left(1 - h_{ii}\right)^{-1}h_{ii}\hat{e}_i \\
&= \left(1 - h_{ii}\right)^{-1}\hat{e}_i. 
\end{aligned} \quad (4.33)$$
A convenient feature of this expression is that it shows that computation of e~i is based on a simple linear operation, and does not really require n separate estimations.
One use of the prediction errors is to estimate the out-of-sample mean squared error:
$$\tilde{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}\tilde{e}_i^2 = \frac{1}{n}\sum_{i=1}^{n}\left(1 - h_{ii}\right)^{-2}\hat{e}_i^2.$$
This is also known as the mean squared prediction error. Its square root $\tilde{\sigma} = \sqrt{\tilde{\sigma}^2}$ is the prediction standard error.
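The shortcut $\tilde{e}_i = (1 - h_{ii})^{-1}\hat{e}_i$ can be verified against a brute-force leave-one-out loop; the loop below is exactly the n separate estimations that the formula makes unnecessary (simulated data, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(6)
n, k = 40, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ rng.normal(size=k) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
e_hat = y - X @ beta_hat
h = np.sum((X @ XtX_inv) * X, axis=1)       # leverage values h_ii of (4.21)

# Brute force: re-estimate beta without observation i, then predict y_i.
e_tilde_loo = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    b_i = np.linalg.solve(X[keep].T @ X[keep], X[keep].T @ y[keep])
    e_tilde_loo[i] = y[i] - X[i] @ b_i

# One-line formula (4.33):
e_tilde = e_hat / (1 - h)
print(np.max(np.abs(e_tilde - e_tilde_loo)))   # ~0

sigma2_tilde = np.mean(e_tilde ** 2)           # out-of-sample MSE estimator
```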
Proof of Equation (4.32). The Sherman–Morrison formula (A.3) from Appendix A.5 states that for nonsingular $A$ and vector $b$,
$$\left(A - bb'\right)^{-1} = A^{-1} + \left(1 - b'A^{-1}b\right)^{-1}A^{-1}bb'A^{-1}.$$
This implies
$$\left(X'X - x_ix_i'\right)^{-1} = \left(X'X\right)^{-1} + \left(1 - h_{ii}\right)^{-1}\left(X'X\right)^{-1}x_ix_i'\left(X'X\right)^{-1}$$