
CHAPTER 17. PANEL DATA

and

$$d_i = \begin{pmatrix} d_{i1} \\ \vdots \\ d_{in} \end{pmatrix},$$

an $n \times 1$ dummy vector with a "1" in the $i$'th place. Let

 

 

 

$$u = \begin{pmatrix} u_1 \\ \vdots \\ u_n \end{pmatrix}.$$

Then note that

$$u_i = d_i' u,$$

and

$$y_{it} = x_{it}'\beta + d_i' u + e_{it}. \tag{17.2}$$

Observe that

$$E(e_{it} \mid x_{it}, d_i) = 0,$$

so (17.2) is a valid regression, with $d_i$ as a regressor along with $x_{it}$.

 

OLS on (17.2) yields the estimators $(\hat{\beta}, \hat{u})$. Conventional inference applies.

 

Observe that:

• This is generally consistent.
• If $x_{it}$ contains an intercept, it will be collinear with $d_i$, so the intercept is typically omitted from $x_{it}$.
• Any regressor in $x_{it}$ which is constant over time for all individuals (e.g., their gender) will be collinear with $d_i$, so will have to be omitted.
• There are $n + k$ regression parameters, which is quite large as typically $n$ is very large.

Computationally, you do not want to actually implement conventional OLS estimation, as the parameter space is too large. OLS estimation of $\beta$ instead proceeds by the FWL theorem. Stacking the observations together:

 

 

$$y = X\beta + Du + e,$$

 

then by the FWL theorem,

 

 

 

 

 

 

 

 

 

 

 

$$\hat{\beta} = \left(X'(I - P_D)X\right)^{-1}\left(X'(I - P_D)y\right) = \left(\tilde{X}'\tilde{X}\right)^{-1}\left(\tilde{X}'\tilde{y}\right),$$

where

$$\tilde{y} = y - D(D'D)^{-1}D'y$$
$$\tilde{X} = X - D(D'D)^{-1}D'X.$$

 

 

 

 

 

 

 

 

 

 

 

 

Since the regression of $y_{it}$ on $d_i$ is a regression onto individual-specific dummies, the predicted value from these regressions is the individual-specific mean $\bar{y}_i$, and the residual is the demeaned value

$$\tilde{y}_{it} = y_{it} - \bar{y}_i.$$

The fixed effects estimator $\hat{\beta}$ is OLS of $\tilde{y}_{it}$ on $\tilde{x}_{it}$, the dependent variable and regressors in deviation-from-mean form.
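To make the computation concrete, here is a minimal numpy sketch of the within (demeaning) transformation and the resulting fixed effects estimate. The function name, the stacked-array layout with an `ids` vector, and the simulated data are illustrative assumptions, not part of the text.

```python
import numpy as np

def fixed_effects(y, X, ids):
    """Within (fixed effects) estimator: demean y and X by individual,
    then run OLS on the demeaned data (the FWL result above)."""
    y_t, X_t = y.astype(float).copy(), X.astype(float).copy()
    for i in np.unique(ids):
        m = ids == i
        y_t[m] -= y_t[m].mean()           # y_it - ybar_i
        X_t[m] -= X_t[m].mean(axis=0)     # x_it - xbar_i
    beta, *_ = np.linalg.lstsq(X_t, y_t, rcond=None)
    return beta

# tiny simulated panel with one regressor correlated with the individual effect
rng = np.random.default_rng(0)
n, T = 100, 5
ids = np.repeat(np.arange(n), T)
u = np.repeat(rng.normal(size=n), T)              # individual effects u_i
x = rng.normal(size=(n * T, 1)) + 0.5 * u[:, None]
y = x @ np.array([2.0]) + u + rng.normal(size=n * T)
print(fixed_effects(y, x, ids))                   # close to the true value 2.0
```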


Another derivation of the estimator is to take the equation

$$y_{it} = x_{it}'\beta + u_i + e_{it},$$

and then take individual-specific means by taking the average for the $i$'th individual:

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

$$\frac{1}{T_i}\sum_{t=1}^{T_i} y_{it} = \frac{1}{T_i}\sum_{t=1}^{T_i} x_{it}'\beta + u_i + \frac{1}{T_i}\sum_{t=1}^{T_i} e_{it}$$

or

$$\bar{y}_i = \bar{x}_i'\beta + u_i + \bar{e}_i.$$

Subtracting, we find

$$\tilde{y}_{it} = \tilde{x}_{it}'\beta + \tilde{e}_{it},$$

which is free of the individual effect $u_i$.

17.3 Dynamic Panel Regression

A dynamic panel regression has a lagged dependent variable

$$y_{it} = \alpha y_{it-1} + x_{it}'\beta + u_i + e_{it}. \tag{17.3}$$

This is a model suitable for studying dynamic behavior of individual agents.

Unfortunately, the fixed effects estimator is inconsistent, at least if $T$ is held finite as $n \to \infty$. This is because the sample mean of $y_{it-1}$ is correlated with that of $e_{it}$.

The standard approach to estimate a dynamic panel is to combine first-differencing with IV or GMM. Taking first-differences of (17.3) eliminates the individual-specific effect:

$$\Delta y_{it} = \alpha \Delta y_{it-1} + \Delta x_{it}'\beta + \Delta e_{it}. \tag{17.4}$$

However, if $e_{it}$ is iid, then $\Delta e_{it}$ will be correlated with $\Delta y_{it-1}$:

$$E(\Delta y_{it-1} \Delta e_{it}) = E\left((y_{it-1} - y_{it-2})(e_{it} - e_{it-1})\right) = -E(y_{it-1}e_{it-1}) = -\sigma_e^2.$$

So OLS on (17.4) will be inconsistent.

But if there are valid instruments, then IV or GMM can be used to estimate the equation. Typically, we use lags of the dependent variable, two periods back, as $y_{t-2}$ is uncorrelated with $\Delta e_{it}$. Thus values of $y_{it-k}$, $k \geq 2$, are valid instruments.

Hence a valid estimator of $\alpha$ and $\beta$ is to estimate (17.4) by IV using $y_{t-2}$ as an instrument for $\Delta y_{t-1}$ (which is just identified). Alternatively, GMM using $y_{t-2}$ and $y_{t-3}$ as instruments (which is overidentified, but loses a time-series observation).

A more sophisticated GMM estimator recognizes that for time periods later in the sample, there are more instruments available, so the instrument list should be different for each equation. This is conveniently organized by the GMM principle, as this enables the moments from the different time periods to be stacked together to create a list of all the moment conditions. A simple application of GMM yields the parameter estimates and standard errors.
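As an illustration of the just-identified IV estimator described above (a balanced panel with no $x_{it}$), here is a small numpy sketch; the function name and the simulated data are illustrative assumptions, and a serious application would use the stacked-instrument GMM just described.

```python
import numpy as np

def first_diff_iv(y):
    """IV estimate of alpha in Dy_it = alpha*Dy_it-1 + De_it,
    using y_it-2 as the single instrument (just identified, no x_it)."""
    dy = y[:, 1:] - y[:, :-1]       # first differences; column j is Dy at time j+1
    dep = dy[:, 1:].ravel()         # Dy_t    for t = 2,...,T-1
    lag = dy[:, :-1].ravel()        # Dy_t-1
    z = y[:, :-2].ravel()           # y_t-2, uncorrelated with De_t
    return (z @ dep) / (z @ lag)    # (z'x)^{-1}(z'y) with a single instrument

# simulate y_it = alpha*y_it-1 + u_i + e_it on a balanced n x T panel
rng = np.random.default_rng(1)
n, T, alpha = 500, 8, 0.5
u = rng.normal(size=n)
y = np.zeros((n, T))
for t in range(1, T):
    y[:, t] = alpha * y[:, t - 1] + u + rng.normal(size=n)
print(first_diff_iv(y))             # roughly 0.5 (IV is noisy in short panels)
```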

Chapter 18

Nonparametrics

18.1 Kernel Density Estimation

Let $X$ be a random variable with continuous distribution $F(x)$ and density $f(x) = \frac{d}{dx}F(x)$. The goal is to estimate $f(x)$ from a random sample $(X_1, \ldots, X_n)$. While $F(x)$ can be estimated by the EDF $\hat{F}(x) = n^{-1}\sum_{i=1}^{n} 1(X_i \leq x)$, we cannot define $\frac{d}{dx}\hat{F}(x)$ since $\hat{F}(x)$ is a step function. The standard nonparametric method to estimate $f(x)$ is based on smoothing using a kernel.

While we are typically interested in estimating the entire function $f(x)$, we can simply focus on the problem where $x$ is a specific fixed number, and then see how the method generalizes to estimating the entire function.

Definition 18.1.1 $K(u)$ is a second-order kernel function if it is a symmetric zero-mean density function.

Three common choices for kernels include the Normal

$$K(u) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{u^2}{2}\right),$$

the Epanechnikov

$$K(u) = \begin{cases} \frac{3}{4}\left(1 - u^2\right), & |u| \leq 1 \\ 0, & |u| > 1 \end{cases}$$

and the Biweight or Quartic

$$K(u) = \begin{cases} \frac{15}{16}\left(1 - u^2\right)^2, & |u| \leq 1 \\ 0, & |u| > 1 \end{cases}$$

In practice, the choice between these three rarely makes a meaningful difference in the estimates.

The kernel functions are used to smooth the data. The amount of smoothing is controlled by the bandwidth $h > 0$. Let

$$K_h(u) = \frac{1}{h} K\left(\frac{u}{h}\right)$$

be the kernel $K$ rescaled by the bandwidth $h$. The kernel density estimator of $f(x)$ is

$$\hat{f}(x) = \frac{1}{n}\sum_{i=1}^{n} K_h(X_i - x).$$

This estimator is the average of a set of weights. If a large number of the observations $X_i$ are near $x$, then the weights are relatively large and $\hat{f}(x)$ is larger. Conversely, if only a few $X_i$ are near $x$, then the weights are small and $\hat{f}(x)$ is small. The bandwidth $h$ controls the meaning of "near".
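A minimal numpy sketch of this estimator with the Normal kernel may help fix ideas; the grid, sample size, and bandwidth below are arbitrary illustration choices, not recommendations from the text.

```python
import numpy as np

def gaussian_kernel(u):
    """Normal (Gaussian) second-order kernel."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)

def kde(x_grid, data, h):
    """Kernel density estimate f_hat(x) = (1/n) * sum_i K_h(X_i - x)."""
    n = data.shape[0]
    u = (data[None, :] - x_grid[:, None]) / h      # (grid, n) rescaled arguments
    return gaussian_kernel(u).sum(axis=1) / (n * h)

rng = np.random.default_rng(0)
X = rng.normal(size=200)                # sample from N(0,1)
grid = np.linspace(-3, 3, 61)
f_hat = kde(grid, X, h=0.4)
print(f_hat[30])                        # estimate of f(0); true value is about 0.399
```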

Interestingly, $\hat{f}(x)$ is a valid density. That is, $\hat{f}(x) \geq 0$ for all $x$, and

 

 

$$\int_{-\infty}^{\infty} \hat{f}(x)\,dx = \int_{-\infty}^{\infty} \frac{1}{n}\sum_{i=1}^{n} K_h(X_i - x)\,dx = \frac{1}{n}\sum_{i=1}^{n} \int_{-\infty}^{\infty} K_h(X_i - x)\,dx = \frac{1}{n}\sum_{i=1}^{n} \int_{-\infty}^{\infty} K(u)\,du = 1$$

where the second-to-last equality makes the change-of-variables $u = (X_i - x)/h$.

 

We can also calculate the moments of the density $\hat{f}(x)$. The mean is

$$
\begin{aligned}
\int_{-\infty}^{\infty} x \hat{f}(x)\,dx
&= \frac{1}{n}\sum_{i=1}^{n} \int_{-\infty}^{\infty} x K_h(X_i - x)\,dx \\
&= \frac{1}{n}\sum_{i=1}^{n} \int_{-\infty}^{\infty} (X_i + uh) K(u)\,du \\
&= \frac{1}{n}\sum_{i=1}^{n} X_i \int_{-\infty}^{\infty} K(u)\,du + \frac{1}{n}\sum_{i=1}^{n} h \int_{-\infty}^{\infty} u K(u)\,du \\
&= \frac{1}{n}\sum_{i=1}^{n} X_i,
\end{aligned}
$$

the sample mean of the $X_i$, where the second-to-last equality used the change-of-variables $u = (X_i - x)/h$ which has Jacobian $h$.

The second moment of the estimated density is

$$
\begin{aligned}
\int_{-\infty}^{\infty} x^2 \hat{f}(x)\,dx
&= \frac{1}{n}\sum_{i=1}^{n} \int_{-\infty}^{\infty} x^2 K_h(X_i - x)\,dx \\
&= \frac{1}{n}\sum_{i=1}^{n} \int_{-\infty}^{\infty} (X_i + uh)^2 K(u)\,du \\
&= \frac{1}{n}\sum_{i=1}^{n} X_i^2 + \frac{2}{n}\sum_{i=1}^{n} X_i h \int_{-\infty}^{\infty} u K(u)\,du + \frac{1}{n}\sum_{i=1}^{n} h^2 \int_{-\infty}^{\infty} u^2 K(u)\,du \\
&= \frac{1}{n}\sum_{i=1}^{n} X_i^2 + h^2 \sigma_K^2
\end{aligned}
$$

where

$$\sigma_K^2 = \int_{-\infty}^{\infty} u^2 K(u)\,du$$

 

 

 

 

 

is the variance of the kernel. It follows that the variance of the density $\hat{f}(x)$ is

 

 

 

$$\int_{-\infty}^{\infty} x^2 \hat{f}(x)\,dx - \left(\int_{-\infty}^{\infty} x \hat{f}(x)\,dx\right)^2 = \frac{1}{n}\sum_{i=1}^{n} X_i^2 + h^2 \sigma_K^2 - \left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)^2 = \hat{\sigma}^2 + h^2 \sigma_K^2.$$

Thus the variance of the estimated density is inflated by the factor $h^2 \sigma_K^2$ relative to the sample moment.


18.2 Asymptotic MSE for Kernel Estimates

For fixed $x$ and bandwidth $h$ observe that

$$EK_h(X - x) = \int_{-\infty}^{\infty} K_h(z - x) f(z)\,dz = \int_{-\infty}^{\infty} K_h(uh) f(x + hu) h\,du = \int_{-\infty}^{\infty} K(u) f(x + hu)\,du.$$

The second equality uses the change-of-variables $u = (z - x)/h$. The last expression shows that the expected value is an average of $f(z)$ locally about $x$.

This integral (typically) is not analytically solvable, so we approximate it using a second-order Taylor expansion of $f(x + hu)$ in the argument $hu$ about $hu = 0$, which is valid as $h \to 0$. Thus

 

 

$$f(x + hu) \simeq f(x) + f'(x) hu + \frac{1}{2} f''(x) h^2 u^2$$

 

and therefore

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

$$
\begin{aligned}
EK_h(X - x) &\simeq \int_{-\infty}^{\infty} K(u) \left[ f(x) + f'(x) hu + \frac{1}{2} f''(x) h^2 u^2 \right] du \\
&= f(x) \int_{-\infty}^{\infty} K(u)\,du + f'(x) h \int_{-\infty}^{\infty} K(u) u\,du + \frac{1}{2} f''(x) h^2 \int_{-\infty}^{\infty} K(u) u^2\,du \\
&= f(x) + \frac{1}{2} f''(x) h^2 \sigma_K^2.
\end{aligned}
$$

The bias of $\hat{f}(x)$ is then

$$\mathrm{Bias}(x) = E\hat{f}(x) - f(x) = \frac{1}{n}\sum_{i=1}^{n} EK_h(X_i - x) - f(x) = \frac{1}{2} f''(x) h^2 \sigma_K^2.$$

We see that the bias of $\hat{f}(x)$ at $x$ depends on the second derivative $f''(x)$. The sharper the derivative, the greater the bias. Intuitively, the estimator $\hat{f}(x)$ smooths data local to $X_i = x$, so it is estimating a smoothed version of $f(x)$. The bias results from this smoothing, and is larger the greater the curvature in $f(x)$.

We now examine the variance of $\hat{f}(x)$. Since it is an average of iid random variables, using first-order Taylor approximations and the fact that $n^{-1}$ is of smaller order than $(nh)^{-1}$,

$$
\begin{aligned}
\mathrm{var}(x) &= \frac{1}{n} \mathrm{var}\left(K_h(X_i - x)\right) \\
&= \frac{1}{n} EK_h(X_i - x)^2 - \frac{1}{n} \left(EK_h(X_i - x)\right)^2 \\
&\simeq \frac{1}{n h^2} \int_{-\infty}^{\infty} K\left(\frac{z - x}{h}\right)^2 f(z)\,dz - \frac{1}{n} f(x)^2 \\
&= \frac{1}{n h} \int_{-\infty}^{\infty} K(u)^2 f(x + hu)\,du - \frac{1}{n} f(x)^2 \\
&\simeq \frac{f(x)}{n h} \int_{-\infty}^{\infty} K(u)^2\,du \\
&= \frac{f(x) R(K)}{n h},
\end{aligned}
$$

where $R(K) = \int_{-\infty}^{\infty} K(u)^2\,du$ is called the roughness of $K$.

Together, the asymptotic mean-squared error (AMSE) for fixed $x$ is the sum of the approximate squared bias and approximate variance:

$$AMSE_h(x) = \frac{1}{4} f''(x)^2 h^4 \sigma_K^4 + \frac{f(x) R(K)}{n h}.$$


A global measure of precision is the asymptotic mean integrated squared error (AMISE)

$$AMISE_h = \int AMSE_h(x)\,dx = \frac{h^4 \sigma_K^4 R(f'')}{4} + \frac{R(K)}{n h}, \tag{18.1}$$

where $R(f'') = \int (f''(x))^2\,dx$ is the roughness of $f''$. Notice that the first term (the squared bias) is increasing in $h$ and the second term (the variance) is decreasing in $nh$. Thus for the AMISE to decline with $n$, we need $h \to 0$ but $nh \to \infty$. That is, $h$ must tend to zero, but at a slower rate than $n^{-1}$.

Equation (18.1) is an asymptotic approximation to the MSE. We define the asymptotically optimal bandwidth $h_0$ as the value which minimizes this approximate MSE. That is,

$$h_0 = \underset{h}{\mathrm{argmin}}\; AMISE_h.$$

It can be found by solving the first-order condition

 

$$\frac{d}{dh} AMISE_h = h^3 \sigma_K^4 R(f'') - \frac{R(K)}{n h^2} = 0,$$

yielding

$$h_0 = \left(\frac{R(K)}{\sigma_K^4 R(f'')}\right)^{1/5} n^{-1/5}. \tag{18.2}$$

This solution takes the form $h_0 = c n^{-1/5}$ where $c$ is a function of $K$ and $f$, but not of $n$. We thus say that the optimal bandwidth is of order $O(n^{-1/5})$. Note that this $h$ declines to zero, but at a very slow rate.

In practice, how should the bandwidth be selected? This is a difficult problem, and there is a large and continuing literature on the subject. The asymptotically optimal choice given in (18.2) depends on $R(K)$, $\sigma_K^2$, and $R(f'')$. The first two are determined by the kernel function. Their values for the three functions introduced in the previous section are given here.

Kernel          $\sigma_K^2 = \int_{-\infty}^{\infty} u^2 K(u)\,du$     $R(K) = \int_{-\infty}^{\infty} K(u)^2\,du$
Gaussian        $1$          $1/(2\sqrt{\pi})$
Epanechnikov    $1/5$        $3/5$
Biweight        $1/7$        $5/7$

An obvious difficulty is that $R(f'')$ is unknown. A classic simple solution proposed by Silverman (1986) has come to be known as the reference bandwidth or Silverman's Rule-of-Thumb. It uses formula (18.2) but replaces $R(f'')$ with $\hat{\sigma}^{-5} R(\phi'')$, where $\phi$ is the $N(0,1)$ density and $\hat{\sigma}^2$ is an estimate of $\sigma^2 = \mathrm{var}(X)$. This choice for $h$ gives an optimal rule when $f(x)$ is normal, and gives a nearly optimal rule when $f(x)$ is close to normal. The downside is that if the density is very far from normal, the rule-of-thumb $h$ can be quite inefficient. We can calculate that $R(\phi'') = 3/(8\sqrt{\pi})$. Together with the above table, we find the reference rules for the three kernel functions introduced earlier.

Gaussian Kernel: $h_{rule} = 1.06\, \hat{\sigma}\, n^{-1/5}$
Epanechnikov Kernel: $h_{rule} = 2.34\, \hat{\sigma}\, n^{-1/5}$
Biweight (Quartic) Kernel: $h_{rule} = 2.78\, \hat{\sigma}\, n^{-1/5}$
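A small numpy sketch applying these reference rules; the function and variable names are illustrative assumptions, and here $\hat{\sigma}$ is simply the sample standard deviation.

```python
import numpy as np

def rule_of_thumb_bandwidth(X, kernel="gaussian"):
    """Reference bandwidth h_rule = c * sigma_hat * n^(-1/5),
    with the constants listed above for each kernel."""
    c = {"gaussian": 1.06, "epanechnikov": 2.34, "biweight": 2.78}[kernel]
    return c * X.std(ddof=1) * len(X) ** (-1 / 5)

rng = np.random.default_rng(0)
X = rng.normal(size=500)
print(rule_of_thumb_bandwidth(X))                   # about 1.06 * 500^(-1/5) ~ 0.3
print(rule_of_thumb_bandwidth(X, "epanechnikov"))   # about 2.34 * 500^(-1/5) ~ 0.7
```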

Unless you delve more deeply into kernel estimation methods, the rule-of-thumb bandwidth is a good practical bandwidth choice, perhaps adjusted by visual inspection of the resulting estimate $\hat{f}(x)$. There are other approaches, but implementation can be delicate. I now discuss some of these choices. The plug-in approach is to estimate $R(f'')$ in a first step, and then plug this estimate into the formula (18.2). This is more treacherous than may first appear, as the optimal $h$ for estimation of the roughness $R(f'')$ is quite different than the optimal $h$ for estimation of $f(x)$. However, there


are modern versions of this estimator which work well, in particular the iterative method of Sheather and Jones (1991). Another popular choice for selection of $h$ is cross-validation. This works by constructing an estimate of the MISE using leave-one-out estimators. There are some desirable properties of cross-validation bandwidths, but they are also known to converge very slowly to the optimal values. They are also quite ill-behaved when the data has some discretization (as is common in economics), in which case the cross-validation rule can sometimes select very small bandwidths, leading to dramatically undersmoothed estimates. Fortunately there are remedies; one is known as smoothed cross-validation, which is a close cousin of the bootstrap.

Appendix A

Matrix Algebra

A.1 Notation

A scalar a is a single number.

A vector $a$ is a $k \times 1$ list of numbers, typically arranged in a column. We write this as

$$a = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_k \end{pmatrix}.$$

Equivalently, a vector $a$ is an element of Euclidean $k$ space, written as $a \in \mathbb{R}^k$. If $k = 1$ then $a$ is a scalar.

A matrix $A$ is a $k \times r$ rectangular array of numbers, written as

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1r} \\ a_{21} & a_{22} & \cdots & a_{2r} \\ \vdots & \vdots & & \vdots \\ a_{k1} & a_{k2} & \cdots & a_{kr} \end{bmatrix}.$$

By convention $a_{ij}$ refers to the element in the $i$'th row and $j$'th column of $A$. If $r = 1$ then $A$ is a column vector. If $k = 1$ then $A$ is a row vector. If $r = k = 1$, then $A$ is a scalar.

A standard convention (which we will follow in this text whenever possible) is to denote scalars by lower-case italics $(a)$, vectors by lower-case bold italics $(\boldsymbol{a})$, and matrices by upper-case bold italics $(\boldsymbol{A})$. Sometimes a matrix $A$ is denoted by the symbol $(a_{ij})$.

A matrix can be written as a set of column vectors or as a set of row vectors. That is,

 

 

 

 

 

 

 

$$A = \begin{bmatrix} a_1 & a_2 & \cdots & a_r \end{bmatrix} = \begin{bmatrix} \alpha_1' \\ \alpha_2' \\ \vdots \\ \alpha_k' \end{bmatrix}$$

where

$$a_i = \begin{bmatrix} a_{1i} \\ a_{2i} \\ \vdots \\ a_{ki} \end{bmatrix}$$

are column vectors and

$$\alpha_j' = \begin{bmatrix} a_{j1} & a_{j2} & \cdots & a_{jr} \end{bmatrix}$$

are row vectors.

The transpose of a matrix, denoted $A'$, is obtained by flipping the matrix on its diagonal.

Thus

$$A' = \begin{bmatrix} a_{11} & a_{21} & \cdots & a_{k1} \\ a_{12} & a_{22} & \cdots & a_{k2} \\ \vdots & \vdots & & \vdots \\ a_{1r} & a_{2r} & \cdots & a_{kr} \end{bmatrix}.$$

Alternatively, letting $B = A'$, then $b_{ij} = a_{ji}$. Note that if $A$ is $k \times r$, then $A'$ is $r \times k$. If $a$ is a $k \times 1$ vector, then $a'$ is a $1 \times k$ row vector. An alternative notation for the transpose of $A$ is $A^{\top}$. A matrix is square if $k = r$. A square matrix is symmetric if $A = A'$, which requires $a_{ij} = a_{ji}$.

A square matrix is diagonal if the off-diagonal elements are all zero, so that $a_{ij} = 0$ if $i \neq j$. A square matrix is upper (lower) diagonal if all elements below (above) the diagonal equal zero.

An important diagonal matrix is the identity matrix, which has ones on the diagonal. The $k \times k$ identity matrix is denoted as

$$I_k = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix}.$$

A partitioned matrix takes the form

$$A = \begin{bmatrix} A_{11} & A_{12} & \cdots & A_{1r} \\ A_{21} & A_{22} & \cdots & A_{2r} \\ \vdots & \vdots & & \vdots \\ A_{k1} & A_{k2} & \cdots & A_{kr} \end{bmatrix}$$

where the $A_{ij}$ denote matrices, vectors and/or scalars.

A.2 Matrix Addition

If the matrices $A = (a_{ij})$ and $B = (b_{ij})$ are of the same order, we define the sum

$$A + B = (a_{ij} + b_{ij}).$$

Matrix addition follows the commutative and associative laws:

$$A + B = B + A$$
$$A + (B + C) = (A + B) + C.$$

A.3 Matrix Multiplication

If $A$ is $k \times r$ and $c$ is real, we define their product as

$$Ac = cA = (a_{ij} c).$$

If $a$ and $b$ are both $k \times 1$, then their inner product is

$$a'b = a_1 b_1 + a_2 b_2 + \cdots + a_k b_k = \sum_{j=1}^{k} a_j b_j.$$

Note that $a'b = b'a$. We say that two vectors $a$ and $b$ are orthogonal if $a'b = 0$.


If $A$ is $k \times r$ and $B$ is $r \times s$, so that the number of columns of $A$ equals the number of rows of $B$, we say that $A$ and $B$ are conformable. In this event the matrix product $AB$ is defined. Writing $A$ as a set of row vectors and $B$ as a set of column vectors (each of length $r$), the matrix product is defined as

$$AB = \begin{bmatrix} a_1' \\ a_2' \\ \vdots \\ a_k' \end{bmatrix} \begin{bmatrix} b_1 & b_2 & \cdots & b_s \end{bmatrix} = \begin{bmatrix} a_1' b_1 & a_1' b_2 & \cdots & a_1' b_s \\ a_2' b_1 & a_2' b_2 & \cdots & a_2' b_s \\ \vdots & \vdots & & \vdots \\ a_k' b_1 & a_k' b_2 & \cdots & a_k' b_s \end{bmatrix}.$$

Matrix multiplication is not commutative: in general $AB \neq BA$. However, it is associative and distributive:

$$A(BC) = (AB)C$$
$$A(B + C) = AB + AC.$$
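A quick numeric check of these three properties with numpy; the particular matrices are arbitrary illustration choices.

```python
import numpy as np

A = np.array([[1., 2.], [3., 4.]])
B = np.array([[0., 1.], [1., 0.]])
C = np.array([[2., 0.], [0., 3.]])

print(np.array_equal(A @ B, B @ A))               # False: AB != BA in general
print(np.allclose(A @ (B @ C), (A @ B) @ C))      # True: associative
print(np.allclose(A @ (B + C), A @ B + A @ C))    # True: distributive
```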

An alternative way to write the matrix product is to use matrix partitions. For example,

 

 

$$AB = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix} = \begin{bmatrix} A_{11}B_{11} + A_{12}B_{21} & A_{11}B_{12} + A_{12}B_{22} \\ A_{21}B_{11} + A_{22}B_{21} & A_{21}B_{12} + A_{22}B_{22} \end{bmatrix}.$$

As another example,

 

 

 

 

 

$$AB = \begin{bmatrix} A_1 & A_2 & \cdots & A_r \end{bmatrix} \begin{bmatrix} B_1 \\ B_2 \\ \vdots \\ B_r \end{bmatrix} = A_1 B_1 + A_2 B_2 + \cdots + A_r B_r = \sum_{j=1}^{r} A_j B_j.$$

An important property of the identity matrix is that if $A$ is $k \times r$, then $AI_r = A$ and $I_k A = A$. The $k \times r$ matrix $A$, $r \leq k$, is called orthogonal if $A'A = I_r$.

A.4 Trace

The trace of a $k \times k$ square matrix $A$ is the sum of its diagonal elements

$$\mathrm{tr}(A) = \sum_{i=1}^{k} a_{ii}.$$

Some straightforward properties for square matrices A and B and real c are

$$\mathrm{tr}(cA) = c\, \mathrm{tr}(A)$$
$$\mathrm{tr}(A') = \mathrm{tr}(A)$$
$$\mathrm{tr}(A + B) = \mathrm{tr}(A) + \mathrm{tr}(B)$$
$$\mathrm{tr}(I_k) = k.$$
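These properties are easy to verify numerically; a small numpy check with arbitrary matrices:

```python
import numpy as np

A = np.array([[1., 2.], [3., 4.]])
B = np.array([[0., 5.], [6., 7.]])
c = 3.0

print(np.trace(c * A), c * np.trace(A))             # tr(cA) = c*tr(A)
print(np.trace(A.T), np.trace(A))                   # tr(A') = tr(A)
print(np.trace(A + B), np.trace(A) + np.trace(B))   # tr(A+B) = tr(A) + tr(B)
print(np.trace(np.eye(4)))                          # tr(I_k) = k -> 4.0
```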
