
CHAPTER 4. THE ALGEBRA OF LEAST SQUARES


Alternatively, equation (4.4) writes the projection coefficient $\beta$ as an explicit function of the population moments $\mathbf{Q}_{xy}$ and $\mathbf{Q}_{xx}$. Their moment estimators are the sample moments

 

 

$$\widehat{\mathbf{Q}}_{xy} = \frac{1}{n}\sum_{i=1}^n x_i y_i$$
$$\widehat{\mathbf{Q}}_{xx} = \frac{1}{n}\sum_{i=1}^n x_i x_i'.$$

 

The moment estimator of $\beta$ replaces the population moments in (4.4) with the sample moments:

$$\widehat{\beta} = \widehat{\mathbf{Q}}_{xx}^{-1}\widehat{\mathbf{Q}}_{xy} = \left(\frac{1}{n}\sum_{i=1}^n x_i x_i'\right)^{-1}\left(\frac{1}{n}\sum_{i=1}^n x_i y_i\right) = \left(\sum_{i=1}^n x_i x_i'\right)^{-1}\left(\sum_{i=1}^n x_i y_i\right),$$

 

 

which is identical with (4.7).

Least Squares Estimation

Definition 4.3.1 The least-squares estimator $\widehat{\beta}$ is
$$\widehat{\beta} = \operatorname*{argmin}_{\beta \in \mathbb{R}^k} S_n(\beta)$$
where
$$S_n(\beta) = \frac{1}{n}\sum_{i=1}^n \left(y_i - x_i'\beta\right)^2$$
and has the solution
$$\widehat{\beta} = \left(\sum_{i=1}^n x_i x_i'\right)^{-1}\left(\sum_{i=1}^n x_i y_i\right).$$

Adrien-Marie Legendre

The method of least-squares was first published in 1805 by the French mathematician Adrien-Marie Legendre (1752-1833). Legendre proposed least-squares as a solution to the algebraic problem of solving a system of equations when the number of equations exceeded the number of unknowns. This was a vexing and common problem in astronomical measurement. As viewed by Legendre, (4.1) is a set of n equations with k unknowns. As the equations cannot be solved exactly, Legendre's goal was to select $\beta$ to make the set of errors as small as possible. He proposed the sum of squared error criterion, and derived the algebraic solution presented above. As he noted, the first-order conditions (4.6) are a system of k equations with k unknowns, which can be solved by "ordinary" methods. Hence the method became known as Ordinary Least Squares and to this day we still use the abbreviation OLS to refer to Legendre's estimation method.
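The estimator of Definition 4.3.1 is straightforward to compute. The following is a minimal numerical sketch (not part of the text), using NumPy and simulated data with illustrative names: it forms the sample moments $\widehat{\mathbf{Q}}_{xx}$ and $\widehat{\mathbf{Q}}_{xy}$, computes $\widehat{\beta} = \widehat{\mathbf{Q}}_{xx}^{-1}\widehat{\mathbf{Q}}_{xy}$, and checks that $\widehat{\beta}$ attains a lower value of the criterion $S_n(\beta)$ than a perturbed coefficient vector.

    # A minimal sketch of the moment estimator and the argmin characterization,
    # on simulated data (names are illustrative, not from the text).
    import numpy as np

    rng = np.random.default_rng(0)
    n, k = 100, 3
    x = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])  # regressors with intercept
    beta_true = np.array([1.0, 2.0, -0.5])
    y = x @ beta_true + rng.normal(size=n)                          # simulated outcomes

    Qxx = x.T @ x / n                       # (1/n) sum x_i x_i'
    Qxy = x.T @ y / n                       # (1/n) sum x_i y_i
    beta_hat = np.linalg.solve(Qxx, Qxy)    # moment estimator Q_xx^{-1} Q_xy

    def S_n(b):                             # least-squares criterion (1/n) sum (y_i - x_i'b)^2
        return np.mean((y - x @ b) ** 2)

    print(beta_hat)
    print(S_n(beta_hat) <= S_n(beta_hat + 0.1))   # beta_hat minimizes S_n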


4.4 Illustration

We illustrate the least-squares estimator in practice with the data set used to generate the estimates from Chapter 3. This is the March 2009 Current Population Survey, which has extensive information on the U.S. population. This data set is described in more detail in Section ?. For this illustration, we use the sub-sample of non-white married non-military female wage earners with 12 years potential work experience. This sub-sample has 61 observations. Let $y_i$ be log wages and $x_i$ be an intercept and years of education. Then

 

 

 

 

$$\frac{1}{n}\sum_{i=1}^n x_i y_i = \begin{pmatrix} 3.025 \\ 47.447 \end{pmatrix}$$

 

 

and

$$\frac{1}{n}\sum_{i=1}^n x_i x_i' = \begin{pmatrix} 1 & 15.426 \\ 15.426 & 243 \end{pmatrix}.$$

 

Thus

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

$$\widehat{\beta} = \begin{pmatrix} 1 & 15.426 \\ 15.426 & 243 \end{pmatrix}^{-1}\begin{pmatrix} 3.025 \\ 47.447 \end{pmatrix} = \begin{pmatrix} 0.626 \\ 0.156 \end{pmatrix}. \qquad (4.8)$$

 

 

 

 

 

 

 

We often write the estimated equation using the format
$$\widehat{\log(Wage)} = 0.626 + 0.156\ \text{education}. \qquad (4.9)$$

An interpretation of the estimated equation is that each year of education is associated with a 16% increase in mean wages.
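As a quick check of the arithmetic in (4.8), one can invert the $2\times 2$ moment matrix reported above. This is only a sketch, not code from the text; small discrepancies in the last digit reflect the rounding of the printed moments.

    # Verify (4.8) from the reported sample moments (rounded to the printed precision).
    import numpy as np

    Qxx = np.array([[1.0, 15.426],
                    [15.426, 243.0]])   # (1/n) sum x_i x_i'
    Qxy = np.array([3.025, 47.447])     # (1/n) sum x_i y_i

    beta_hat = np.linalg.solve(Qxx, Qxy)
    print(beta_hat)   # approximately [0.63, 0.16]: intercept and return to education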

Equation (4.9) is called a bivariate regression as there are only two variables. A multivariate regression has two or more regressors, and allows a more detailed investigation. Let's redo the example, but now including all levels of experience. This expanded sample includes 2454 observations. Including as regressors years of experience and its square, $\text{experience}^2/100$ (we divide by 100 to simplify reporting), we obtain the estimates

$$\widehat{\log(Wage)} = 1.06 + 0.116\ \text{education} + 0.010\ \text{experience} - 0.014\ \text{experience}^2/100. \qquad (4.10)$$

These estimates suggest a 12% increase in mean wages per year of education, holding experience constant.

4.5 Least Squares Residuals

As a by-product of estimation, we define the fitted or predicted value
$$\hat{y}_i = x_i'\widehat{\beta}$$
and the residual
$$\hat{e}_i = y_i - \hat{y}_i = y_i - x_i'\widehat{\beta}. \qquad (4.11)$$
Note that $y_i = \hat{y}_i + \hat{e}_i$ and
$$y_i = x_i'\widehat{\beta} + \hat{e}_i. \qquad (4.12)$$

We make a distinction between the error $e_i$ and the residual $\hat{e}_i$. The error $e_i$ is unobservable while the residual $\hat{e}_i$ is a by-product of estimation. These two variables are frequently mislabeled, which can cause confusion.


Equation (4.6) implies that

 

$$\sum_{i=1}^n x_i \hat{e}_i = 0. \qquad (4.13)$$

 

 

 

 

 

 

To see this by a direct calculation, using (4.11) and (4.7),

 

 

 

 

$$\begin{aligned}
\sum_{i=1}^n x_i \hat{e}_i &= \sum_{i=1}^n x_i\left(y_i - x_i'\widehat{\beta}\right) \\
&= \sum_{i=1}^n x_i y_i - \sum_{i=1}^n x_i x_i'\widehat{\beta} \\
&= \sum_{i=1}^n x_i y_i - \sum_{i=1}^n x_i x_i'\left(\sum_{i=1}^n x_i x_i'\right)^{-1}\left(\sum_{i=1}^n x_i y_i\right) \\
&= \sum_{i=1}^n x_i y_i - \sum_{i=1}^n x_i y_i \\
&= 0.
\end{aligned}$$

 

 

 

 

 

 

 

When $x_i$ contains a constant, an implication of (4.13) is

$$\frac{1}{n}\sum_{i=1}^n \hat{e}_i = 0.$$

Thus the residuals have a sample mean of zero and the sample correlation between the regressors and the residual is zero. These are algebraic results, and hold true for all linear regression estimates.

Given the residuals, we can construct an estimator for $\sigma^2 = \mathbb{E}e_i^2$:

 

$$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^n \hat{e}_i^2. \qquad (4.14)$$
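The residual identities (4.13)-(4.14) are easy to verify numerically. Below is a minimal sketch on simulated data (names are illustrative, not from the text): the regressors are orthogonal to the residuals, the residuals have mean zero because a constant is included, and $\hat{\sigma}^2$ is the average squared residual.

    # Verify sum_i x_i e_hat_i = 0, mean-zero residuals, and sigma_hat^2 on simulated data.
    import numpy as np

    rng = np.random.default_rng(1)
    n = 200
    x = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
    y = x @ np.array([0.5, 1.0, -1.0]) + rng.normal(size=n)

    beta_hat = np.linalg.solve(x.T @ x, x.T @ y)
    e_hat = y - x @ beta_hat

    print(np.allclose(x.T @ e_hat, 0.0))   # (4.13): regressors orthogonal to residuals
    print(np.isclose(e_hat.mean(), 0.0))   # residuals have sample mean zero
    sigma2_hat = np.mean(e_hat ** 2)       # (4.14)
    print(sigma2_hat)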

 

 

 

4.6 Model in Matrix Notation

For many purposes, including computation, it is convenient to write the model and statistics in matrix notation. The linear equation (3.24) is a system of n equations, one for each observation. We can stack these n equations together as

 

 

 

 

$$\begin{aligned}
y_1 &= x_1'\beta + e_1 \\
y_2 &= x_2'\beta + e_2 \\
&\;\;\vdots \\
y_n &= x_n'\beta + e_n.
\end{aligned}$$

 

 

 

 

Now define

$$y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \qquad X = \begin{pmatrix} x_1' \\ x_2' \\ \vdots \\ x_n' \end{pmatrix}, \qquad e = \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{pmatrix}.$$

 

Observe that $y$ and $e$ are $n \times 1$ vectors, and $X$ is an $n \times k$ matrix. Then the system of $n$ equations can be compactly written in the single equation

$$y = X\beta + e. \qquad (4.15)$$


Sample sums can also be written in matrix notation. For example

$$\sum_{i=1}^n x_i x_i' = X'X$$
$$\sum_{i=1}^n x_i y_i = X'y.$$

Therefore
$$\widehat{\beta} = \left(X'X\right)^{-1}\left(X'y\right). \qquad (4.16)$$
The matrix version of (4.12) and estimated version of (4.15) is
$$y = X\widehat{\beta} + \hat{e},$$
or equivalently the residual vector is
$$\hat{e} = y - X\widehat{\beta}.$$
Using the residual vector, we can write (4.13) as
$$X'\hat{e} = 0 \qquad (4.18)$$
and the error variance estimator (4.14) as
$$\hat{\sigma}^2 = n^{-1}\hat{e}'\hat{e}. \qquad (4.19)$$

Using matrix notation we have simple expressions for most estimators. This is particularly convenient for computer programming, as most languages allow matrix notation and manipulation.

Important Matrix Expressions

$$\begin{aligned}
y &= X\beta + e \\
\widehat{\beta} &= \left(X'X\right)^{-1}X'y \\
\hat{e} &= y - X\widehat{\beta} \\
X'\hat{e} &= 0 \\
\hat{\sigma}^2 &= n^{-1}\hat{e}'\hat{e}.
\end{aligned}$$
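The boxed matrix expressions translate almost line by line into code. The following is a compact sketch, assuming $y$ and $X$ are stored as NumPy arrays (simulated here, with illustrative names). Solving the normal equations with a linear solver is numerically preferable to forming $(X'X)^{-1}$ explicitly.

    # Matrix-notation OLS on simulated data.
    import numpy as np

    rng = np.random.default_rng(2)
    n, k = 150, 4
    X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
    y = X @ rng.normal(size=k) + rng.normal(size=n)

    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # (X'X)^{-1} X'y, without forming the inverse
    e_hat = y - X @ beta_hat                       # residual vector
    sigma2_hat = e_hat @ e_hat / n                 # n^{-1} e_hat' e_hat

    print(np.allclose(X.T @ e_hat, 0.0))           # X' e_hat = 0
    print(sigma2_hat)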

 

Early Use of Matrices

The earliest known treatment of the use of matrix methods to solve simultaneous systems is found in Chapter 8 of the Chinese text The Nine Chapters on the Mathematical Art, written by several generations of scholars from the 10th to 2nd century BCE.


4.7 Projection Matrix

Define the matrix

$$P = X\left(X'X\right)^{-1}X'.$$

Observe that

 

$$PX = X\left(X'X\right)^{-1}X'X = X.$$

This is a property of a projection matrix. More generally, for any matrix $Z$ which can be written as $Z = X\Gamma$ for some matrix $\Gamma$ (we say that $Z$ lies in the range space of $X$), then

$$PZ = PX\Gamma = X\left(X'X\right)^{-1}X'X\Gamma = X\Gamma = Z.$$

As an important example, if we partition the matrix X into two matrices X1 and X2 so that

$$X = [X_1 \quad X_2],$$
then $PX_1 = X_1$.

The matrix $P$ is symmetric and idempotent.¹ To see that it is symmetric,

$$\begin{aligned}
P' &= \left(X\left(X'X\right)^{-1}X'\right)' \\
&= \left(X'\right)'\left(\left(X'X\right)^{-1}\right)'\left(X\right)' \\
&= X\left(\left(X'X\right)'\right)^{-1}X' \\
&= X\left(\left(X\right)'\left(X'\right)'\right)^{-1}X' \\
&= P.
\end{aligned}$$

To establish that it is idempotent, the fact that P X = X implies that

$$\begin{aligned}
PP &= PX\left(X'X\right)^{-1}X' \\
&= X\left(X'X\right)^{-1}X' \\
&= P.
\end{aligned}$$

The matrix $P$ has the property that it creates the fitted values in a least-squares regression:

$$Py = X\left(X'X\right)^{-1}X'y = X\widehat{\beta} = \hat{y}.$$
Because of this property, $P$ is also known as the "hat matrix".

Another useful property is that the trace of P equals the number of columns of X

$$\operatorname{tr} P = k. \qquad (4.20)$$

Indeed,

 

$$\begin{aligned}
\operatorname{tr} P &= \operatorname{tr}\left(X\left(X'X\right)^{-1}X'\right) \\
&= \operatorname{tr}\left(\left(X'X\right)^{-1}X'X\right) \\
&= \operatorname{tr}\left(I_k\right) \\
&= k.
\end{aligned}$$

¹ A matrix $P$ is symmetric if $P' = P$. A matrix $P$ is idempotent if $PP = P$. See Appendix A.8.


(See Appendix A.4 for definition and properties of the trace operator.)

 

The $i$'th diagonal element of $P = X\left(X'X\right)^{-1}X'$ is
$$h_{ii} = x_i'\left(X'X\right)^{-1}x_i \qquad (4.21)$$
which is called the leverage of the $i$'th observation. The $h_{ii}$ take values in $[0,1]$ and sum to $k$:
$$\sum_{i=1}^n h_{ii} = k \qquad (4.22)$$

 

 

 

(See Exercise 4.8).
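The properties of $P$, including (4.20)-(4.22), can be checked numerically on a small simulated design. This is a sketch for illustration only (names are not from the text); forming $P$ explicitly is an $n \times n$ computation and is rarely done in practice.

    # Check symmetry, idempotency, P y = y_hat, tr(P) = k, and the leverage values.
    import numpy as np

    rng = np.random.default_rng(3)
    n, k = 60, 3
    X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
    y = rng.normal(size=n)

    XtX_inv = np.linalg.inv(X.T @ X)
    P = X @ XtX_inv @ X.T                      # P = X (X'X)^{-1} X'
    beta_hat = XtX_inv @ X.T @ y

    print(np.allclose(P, P.T))                 # symmetric
    print(np.allclose(P @ P, P))               # idempotent
    print(np.allclose(P @ y, X @ beta_hat))    # P y = y_hat
    print(np.isclose(np.trace(P), k))          # (4.20)

    h = np.diag(P)                             # leverage values h_ii, as in (4.21)
    print(h.min() >= 0, h.max() <= 1, np.isclose(h.sum(), k))   # (4.22)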

 

 

 

4.8 Orthogonal Projection

Define

$$M = I_n - P = I_n - X\left(X'X\right)^{-1}X'$$

where $I_n$ is the $n \times n$ identity matrix. Note that

$$MX = \left(I_n - P\right)X = X - PX = X - X = 0.$$

Thus $M$ and $X$ are orthogonal. We call $M$ an orthogonal projection matrix or an annihilator matrix due to the property that for any matrix $Z$ in the range space of $X$,

$$MZ = Z - PZ = 0.$$

For example, $MX_1 = 0$ for any subcomponent $X_1$ of $X$, and $MP = 0$.

The orthogonal projection matrix $M$ has many properties in common with $P$, including that $M$ is symmetric ($M' = M$) and idempotent ($MM = M$). Similarly to (4.20) we can calculate

 

$$\operatorname{tr} M = n - k. \qquad (4.23)$$

While $P$ creates fitted values, $M$ creates least-squares residuals:

$$My = y - Py = y - X\widehat{\beta} = \hat{e}. \qquad (4.24)$$

 

Another way of writing (4.24) is

$$y = Py + My = \hat{y} + \hat{e}.$$

This decomposition is orthogonal, that is

$$\hat{y}'\hat{e} = \left(Py\right)'\left(My\right) = y'PMy = 0.$$

We can also use (4.24) to write an alternative expression for the residual vector. Substituting $y = X\beta + e$ into $\hat{e} = My$ and using $MX = 0$ we find

 

$$\hat{e} = M\left(X\beta + e\right) = Me, \qquad (4.25)$$

which is free of dependence on the regression coefficient $\beta$.

 

Another useful application of (4.24) is to the error variance estimator (4.19)

 

$$\begin{aligned}
\hat{\sigma}^2 &= n^{-1}\hat{e}'\hat{e} \\
&= n^{-1}y'MMy \\
&= n^{-1}y'My,
\end{aligned}$$

the final equality since $MM = M$. Similarly using (4.25) we find

$$\hat{\sigma}^2 = n^{-1}e'Me.$$
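A matching numerical sketch for the annihilator matrix, again on simulated data with illustrative names: it checks $MX = 0$, (4.23), (4.24), and the orthogonality of the decomposition $y = \hat{y} + \hat{e}$.

    # Verify the annihilator matrix properties on simulated data.
    import numpy as np

    rng = np.random.default_rng(4)
    n, k = 60, 3
    X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
    y = rng.normal(size=n)

    P = X @ np.linalg.inv(X.T @ X) @ X.T
    M = np.eye(n) - P                          # M = I_n - P

    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    e_hat = y - X @ beta_hat
    y_hat = X @ beta_hat

    print(np.allclose(M @ X, 0.0))             # M annihilates X
    print(np.allclose(M @ y, e_hat))           # (4.24): M y = e_hat
    print(np.isclose(y_hat @ e_hat, 0.0))      # orthogonal decomposition
    print(np.isclose(np.trace(M), n - k))      # (4.23)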


4.9 Regression Components

Partition

 

 

 

 

$$X = [X_1 \quad X_2]$$
and
$$\beta = \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix}.$$

 

 

Then the regression model can be rewritten as

 

 

 

 

$$y = X_1\beta_1 + X_2\beta_2 + e. \qquad (4.26)$$

The OLS estimator of $\beta = \left(\beta_1', \beta_2'\right)'$ is obtained by regression of $y$ on $X = [X_1 \;\; X_2]$ and can be written as

 

$$y = X\widehat{\beta} + \hat{e} = X_1\widehat{\beta}_1 + X_2\widehat{\beta}_2 + \hat{e}. \qquad (4.27)$$

 

We are interested in algebraic expressions for $\widehat{\beta}_1$ and $\widehat{\beta}_2$. The algebra for the estimator is identical to that for the population coefficients as presented in Section 3.19.

 

 

 

 

 

 

 

Partition $\widehat{\mathbf{Q}}_{xx}$ and $\widehat{\mathbf{Q}}_{xy}$ as
$$\widehat{\mathbf{Q}}_{xx} = \begin{bmatrix} \widehat{\mathbf{Q}}_{11} & \widehat{\mathbf{Q}}_{12} \\ \widehat{\mathbf{Q}}_{21} & \widehat{\mathbf{Q}}_{22} \end{bmatrix} = \begin{bmatrix} \frac{1}{n}X_1'X_1 & \frac{1}{n}X_1'X_2 \\ \frac{1}{n}X_2'X_1 & \frac{1}{n}X_2'X_2 \end{bmatrix}$$
and similarly
$$\widehat{\mathbf{Q}}_{xy} = \begin{bmatrix} \widehat{\mathbf{Q}}_{1y} \\ \widehat{\mathbf{Q}}_{2y} \end{bmatrix} = \begin{bmatrix} \frac{1}{n}X_1'y \\ \frac{1}{n}X_2'y \end{bmatrix}.$$
By the partitioned matrix inversion formula (A.4),
$$\widehat{\mathbf{Q}}_{xx}^{-1} = \begin{bmatrix} \widehat{\mathbf{Q}}_{11} & \widehat{\mathbf{Q}}_{12} \\ \widehat{\mathbf{Q}}_{21} & \widehat{\mathbf{Q}}_{22} \end{bmatrix}^{-1} \overset{\text{def}}{=} \begin{bmatrix} \widehat{\mathbf{Q}}^{11} & \widehat{\mathbf{Q}}^{12} \\ \widehat{\mathbf{Q}}^{21} & \widehat{\mathbf{Q}}^{22} \end{bmatrix} = \begin{bmatrix} \widehat{\mathbf{Q}}_{11\cdot 2}^{-1} & -\widehat{\mathbf{Q}}_{11\cdot 2}^{-1}\widehat{\mathbf{Q}}_{12}\widehat{\mathbf{Q}}_{22}^{-1} \\ -\widehat{\mathbf{Q}}_{22\cdot 1}^{-1}\widehat{\mathbf{Q}}_{21}\widehat{\mathbf{Q}}_{11}^{-1} & \widehat{\mathbf{Q}}_{22\cdot 1}^{-1} \end{bmatrix}$$
where $\widehat{\mathbf{Q}}_{11\cdot 2} = \widehat{\mathbf{Q}}_{11} - \widehat{\mathbf{Q}}_{12}\widehat{\mathbf{Q}}_{22}^{-1}\widehat{\mathbf{Q}}_{21}$ and $\widehat{\mathbf{Q}}_{22\cdot 1} = \widehat{\mathbf{Q}}_{22} - \widehat{\mathbf{Q}}_{21}\widehat{\mathbf{Q}}_{11}^{-1}\widehat{\mathbf{Q}}_{12}$. Thus
$$\widehat{\beta} = \begin{pmatrix} \widehat{\beta}_1 \\ \widehat{\beta}_2 \end{pmatrix} = \begin{bmatrix} \widehat{\mathbf{Q}}_{11\cdot 2}^{-1} & -\widehat{\mathbf{Q}}_{11\cdot 2}^{-1}\widehat{\mathbf{Q}}_{12}\widehat{\mathbf{Q}}_{22}^{-1} \\ -\widehat{\mathbf{Q}}_{22\cdot 1}^{-1}\widehat{\mathbf{Q}}_{21}\widehat{\mathbf{Q}}_{11}^{-1} & \widehat{\mathbf{Q}}_{22\cdot 1}^{-1} \end{bmatrix}\begin{bmatrix} \widehat{\mathbf{Q}}_{1y} \\ \widehat{\mathbf{Q}}_{2y} \end{bmatrix} = \begin{pmatrix} \widehat{\mathbf{Q}}_{11\cdot 2}^{-1}\widehat{\mathbf{Q}}_{1y\cdot 2} \\ \widehat{\mathbf{Q}}_{22\cdot 1}^{-1}\widehat{\mathbf{Q}}_{2y\cdot 1} \end{pmatrix}. \qquad (4.28)$$
Now
$$\widehat{\mathbf{Q}}_{11\cdot 2} = \widehat{\mathbf{Q}}_{11} - \widehat{\mathbf{Q}}_{12}\widehat{\mathbf{Q}}_{22}^{-1}\widehat{\mathbf{Q}}_{21} = \frac{1}{n}X_1'X_1 - \frac{1}{n}X_1'X_2\left(\frac{1}{n}X_2'X_2\right)^{-1}\frac{1}{n}X_2'X_1 = \frac{1}{n}X_1'M_2X_1$$
where
$$M_2 = I_n - X_2\left(X_2'X_2\right)^{-1}X_2'$$
is the orthogonal projection matrix for $X_2$. Similarly $\widehat{\mathbf{Q}}_{22\cdot 1} = \frac{1}{n}X_2'M_1X_2$ where
$$M_1 = I_n - X_1\left(X_1'X_1\right)^{-1}X_1'$$
is the orthogonal projection matrix for $X_1$. Also $\widehat{\mathbf{Q}}_{1y\cdot 2} = \widehat{\mathbf{Q}}_{1y} - \widehat{\mathbf{Q}}_{12}\widehat{\mathbf{Q}}_{22}^{-1}\widehat{\mathbf{Q}}_{2y} = \frac{1}{n}X_1'M_2y$ and $\widehat{\mathbf{Q}}_{2y\cdot 1} = \widehat{\mathbf{Q}}_{2y} - \widehat{\mathbf{Q}}_{21}\widehat{\mathbf{Q}}_{11}^{-1}\widehat{\mathbf{Q}}_{1y} = \frac{1}{n}X_2'M_1y$.

Therefore
$$\widehat{\beta}_1 = \left(X_1'M_2X_1\right)^{-1}\left(X_1'M_2y\right) \qquad (4.29)$$
and
$$\widehat{\beta}_2 = \left(X_2'M_1X_2\right)^{-1}\left(X_2'M_1y\right). \qquad (4.30)$$
These are algebraic expressions for the sub-coefficient estimates from (4.27).
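Expressions (4.29)-(4.30) can be checked numerically. The sketch below (simulated data, illustrative names, not from the text) confirms that the sub-vectors of the full OLS estimate coincide with the partialled-out formulas.

    # Check (4.29)-(4.30) against the full regression on simulated data.
    import numpy as np

    rng = np.random.default_rng(5)
    n, k1, k2 = 120, 2, 3
    X1 = np.column_stack([np.ones(n), rng.normal(size=(n, k1 - 1))])
    X2 = rng.normal(size=(n, k2))
    X = np.hstack([X1, X2])
    y = X @ rng.normal(size=k1 + k2) + rng.normal(size=n)

    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)      # full regression

    M1 = np.eye(n) - X1 @ np.linalg.inv(X1.T @ X1) @ X1.T
    M2 = np.eye(n) - X2 @ np.linalg.inv(X2.T @ X2) @ X2.T

    b1 = np.linalg.solve(X1.T @ M2 @ X1, X1.T @ M2 @ y)   # (4.29)
    b2 = np.linalg.solve(X2.T @ M1 @ X2, X2.T @ M1 @ y)   # (4.30)

    print(np.allclose(beta_hat[:k1], b1))
    print(np.allclose(beta_hat[k1:], b2))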

4.10 Residual Regression

As first recognized by Ragnar Frisch, expressions (4.29) and (4.30) can be used to show that the least-squares estimators $\widehat{\beta}_1$ and $\widehat{\beta}_2$ can be found by a two-step regression procedure.

 

Take (4.30). Since $M_1$ is idempotent, $M_1 = M_1M_1$ and thus
$$\begin{aligned}
\widehat{\beta}_2 &= \left(X_2'M_1X_2\right)^{-1}\left(X_2'M_1y\right) \\
&= \left(X_2'M_1M_1X_2\right)^{-1}\left(X_2'M_1M_1y\right) \\
&= \left(\widetilde{X}_2'\widetilde{X}_2\right)^{-1}\left(\widetilde{X}_2'\tilde{e}_1\right)
\end{aligned}$$
where
$$\widetilde{X}_2 = M_1X_2$$
and
$$\tilde{e}_1 = M_1y.$$

Thus the coefficient estimate $\widehat{\beta}_2$ is algebraically equal to the least-squares regression of $\tilde{e}_1$ on $\widetilde{X}_2$. Notice that these two are $y$ and $X_2$, respectively, premultiplied by $M_1$. But we know that multiplication by $M_1$ is equivalent to creating least-squares residuals. Therefore $\tilde{e}_1$ is simply the least-squares residual from a regression of $y$ on $X_1$, and the columns of $\widetilde{X}_2$ are the least-squares residuals from the regressions of the columns of $X_2$ on $X_1$.

We have proven the following theorem.


Theorem 4.10.1 Frisch-Waugh-Lovell

In the model (4.26), the OLS estimator of $\beta_2$ and the OLS residuals $\hat{e}$ may be equivalently computed by either the OLS regression (4.27) or via the following algorithm:

1. Regress $y$ on $X_1$; obtain residuals $\tilde{e}_1$;

2. Regress $X_2$ on $X_1$; obtain residuals $\widetilde{X}_2$;

3. Regress $\tilde{e}_1$ on $\widetilde{X}_2$; obtain OLS estimates $\widehat{\beta}_2$ and residuals $\hat{e}$.

 

In some contexts, the FWL theorem can be used to speed computation, but in most cases there is little computational advantage to using the two-step algorithm. Rather, the primary use is theoretical.
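For concreteness, here is a sketch of the three steps of Theorem 4.10.1 on simulated data (illustrative names, not from the text), confirming that they reproduce $\widehat{\beta}_2$ and the residuals from the full regression (4.27).

    # Frisch-Waugh-Lovell: two-step (residual) regression equals the full regression.
    import numpy as np

    rng = np.random.default_rng(6)
    n = 150
    X1 = np.column_stack([np.ones(n), rng.normal(size=n)])     # includes the intercept
    X2 = rng.normal(size=(n, 2))
    X = np.hstack([X1, X2])
    y = X @ rng.normal(size=4) + rng.normal(size=n)

    def ols(A, b):
        coef = np.linalg.solve(A.T @ A, A.T @ b)
        return coef, b - A @ coef

    beta_hat, e_hat = ols(X, y)            # full regression (4.27)

    _, e1_tilde = ols(X1, y)               # step 1: residuals from y on X1
    _, X2_tilde = ols(X1, X2)              # step 2: residuals from X2 on X1 (column by column)
    b2, e_fwl = ols(X2_tilde, e1_tilde)    # step 3: regress e1_tilde on X2_tilde

    print(np.allclose(b2, beta_hat[2:]))   # same beta_hat_2
    print(np.allclose(e_fwl, e_hat))       # same residuals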

A common application of the FWL theorem, which you may have seen in an introductory econometrics course, is the demeaning formula for regression. Partition $X = [X_1 \;\; X_2]$ where $X_1$ is the matrix of observed regressors and $X_2 = \mathbf{1}$ is an $n \times 1$ vector of ones. In this case,

$$M_2 = I_n - \mathbf{1}\left(\mathbf{1}'\mathbf{1}\right)^{-1}\mathbf{1}'.$$
Observe that
$$\widetilde{X}_1 = M_2X_1 = X_1 - \mathbf{1}\left(\mathbf{1}'\mathbf{1}\right)^{-1}\mathbf{1}'X_1 = X_1 - \overline{X}_1$$
and
$$\tilde{y} = M_2y = y - \mathbf{1}\left(\mathbf{1}'\mathbf{1}\right)^{-1}\mathbf{1}'y = y - \overline{y},$$
which are "demeaned". The FWL theorem says that $\widehat{\beta}_1$ is the OLS estimate from a regression of $y_i - \overline{y}$ on $x_{1i} - \overline{x}_1$:
$$\widehat{\beta}_1 = \left(\sum_{i=1}^n \left(x_{1i} - \overline{x}_1\right)\left(x_{1i} - \overline{x}_1\right)'\right)^{-1}\left(\sum_{i=1}^n \left(x_{1i} - \overline{x}_1\right)\left(y_i - \overline{y}\right)\right).$$
Thus the OLS estimator for the slope coefficients is a regression with demeaned data.
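A short numerical confirmation of the demeaning result (a sketch on simulated data, with illustrative names): regressing demeaned $y$ on demeaned regressors reproduces the slope coefficients of the regression that includes an intercept.

    # Demeaning formula as a special case of FWL.
    import numpy as np

    rng = np.random.default_rng(7)
    n = 100
    X1 = rng.normal(size=(n, 2))                    # observed regressors
    X = np.column_stack([X1, np.ones(n)])           # append the vector of ones as X2
    y = X @ np.array([1.5, -0.7, 0.3]) + rng.normal(size=n)

    full = np.linalg.solve(X.T @ X, X.T @ y)        # slopes are full[:2]

    X1d = X1 - X1.mean(axis=0)                      # demeaned regressors
    yd = y - y.mean()                               # demeaned outcome
    slopes = np.linalg.solve(X1d.T @ X1d, X1d.T @ yd)

    print(np.allclose(slopes, full[:2]))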

Ragnar Frisch

Ragnar Frisch (1895-1973) was co-winner with Jan Tinbergen of the first Nobel Memorial Prize in Economic Sciences in 1969 for their work in developing and applying dynamic models for the analysis of economic problems. Frisch made a number of foundational contributions to modern economics beyond the Frisch-Waugh-Lovell Theorem, including formalizing consumer theory, production theory, and business cycle theory.


4.11 Prediction Errors

The least-squares residuals $\hat{e}_i$ are not true prediction errors, as they are constructed based on the full sample including $y_i$. A proper prediction for $y_i$ should be based on estimates constructed only using the other observations. We can do this by defining the leave-one-out OLS estimator of $\beta$ as that obtained from the sample of $n-1$ observations excluding the $i$'th observation:

$$\widehat{\beta}_{(-i)} = \left(\frac{1}{n-1}\sum_{j\neq i} x_jx_j'\right)^{-1}\left(\frac{1}{n-1}\sum_{j\neq i} x_jy_j\right) = \left(X_{(-i)}'X_{(-i)}\right)^{-1}X_{(-i)}'y_{(-i)}. \qquad (4.31)$$
Here, $X_{(-i)}$ and $y_{(-i)}$ are the data matrices omitting the $i$'th row. The leave-one-out predicted value for $y_i$ is
$$\tilde{y}_i = x_i'\widehat{\beta}_{(-i)},$$
and the leave-one-out residual or prediction error is
$$\tilde{e}_i = y_i - \tilde{y}_i.$$

A convenient alternative expression for $\widehat{\beta}_{(-i)}$ (derived below) is
$$\widehat{\beta}_{(-i)} = \widehat{\beta} - \left(1 - h_{ii}\right)^{-1}\left(X'X\right)^{-1}x_i\hat{e}_i \qquad (4.32)$$
where $h_{ii}$ are the leverage values as defined in (4.21).

 

 

 

 

 

 

Using (4.32) we can simplify the expression for the prediction error:

$$\begin{aligned}
\tilde{e}_i &= y_i - x_i'\widehat{\beta}_{(-i)} \\
&= y_i - x_i'\widehat{\beta} + \left(1 - h_{ii}\right)^{-1}x_i'\left(X'X\right)^{-1}x_i\hat{e}_i \\
&= \hat{e}_i + \left(1 - h_{ii}\right)^{-1}h_{ii}\hat{e}_i \\
&= \left(1 - h_{ii}\right)^{-1}\hat{e}_i. \qquad (4.33)
\end{aligned}$$

A convenient feature of this expression is that it shows that computation of $\tilde{e}_i$ is based on a simple linear operation, and does not really require $n$ separate estimations.

One use of the prediction errors is to estimate the out-of-sample mean squared error

$$\tilde{\sigma}^2 = \frac{1}{n}\sum_{i=1}^n \tilde{e}_i^2 = \frac{1}{n}\sum_{i=1}^n \left(1 - h_{ii}\right)^{-2}\hat{e}_i^2.$$

 

 

 

This is also known as the mean squared prediction error. Its square root $\tilde{\sigma} = \sqrt{\tilde{\sigma}^2}$ is the prediction standard error.
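Formula (4.33) makes the prediction errors cheap to compute. The following sketch (simulated data, illustrative names, not from the text) checks it against brute-force leave-one-out regressions and then forms $\tilde{\sigma}^2$; the explicit loop is only for verification.

    # Leave-one-out prediction errors via (4.33) versus n separate regressions.
    import numpy as np

    rng = np.random.default_rng(8)
    n, k = 80, 3
    X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
    y = X @ rng.normal(size=k) + rng.normal(size=n)

    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    e_hat = y - X @ beta_hat
    h = np.einsum('ij,jk,ik->i', X, XtX_inv, X)     # leverage values h_ii

    e_tilde = e_hat / (1 - h)                       # (4.33)

    # brute force: drop observation i, re-estimate, and predict y_i
    loo = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        b_i = np.linalg.solve(X[keep].T @ X[keep], X[keep].T @ y[keep])
        loo[i] = y[i] - X[i] @ b_i

    print(np.allclose(e_tilde, loo))
    sigma2_tilde = np.mean(e_tilde ** 2)            # mean squared prediction error
    print(sigma2_tilde)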

Proof of Equation (4.32). The Sherman–Morrison formula (A.3) from Appendix A.5 states that for nonsingular A and vector b

$$\left(A - bb'\right)^{-1} = A^{-1} + \left(1 - b'A^{-1}b\right)^{-1}A^{-1}bb'A^{-1}.$$

This implies

$$\left(X'X - x_ix_i'\right)^{-1} = \left(X'X\right)^{-1} + \left(1 - h_{ii}\right)^{-1}\left(X'X\right)^{-1}x_ix_i'\left(X'X\right)^{-1}$$
