
APPENDIX B. PROBABILITY

Using the law of iterated expectations, the ratio $t = Z/\sqrt{Q/r}$ (where $Z \sim \mathrm{N}(0,1)$ is independent of $Q \sim \chi^2_r$) has distribution function
$$
F(x) = \Pr\left(\frac{Z}{\sqrt{Q/r}} \le x\right)
     = E\left[\Pr\left(Z \le x\sqrt{\frac{Q}{r}} \,\Big|\, Q\right)\right]
     = E\,\Phi\!\left(x\sqrt{\frac{Q}{r}}\right).
$$
Thus its density is
$$
f(x) = E\left[\frac{d}{dx}\,\Phi\!\left(x\sqrt{\frac{Q}{r}}\right)\right]
     = E\left[\phi\!\left(x\sqrt{\frac{Q}{r}}\right)\sqrt{\frac{Q}{r}}\right]
     = \int_0^{\infty}\frac{1}{\sqrt{2\pi}}\exp\!\left(-\frac{qx^2}{2r}\right)\sqrt{\frac{q}{r}}
       \left(\frac{1}{\Gamma\!\left(\frac{r}{2}\right)2^{r/2}}\,q^{r/2-1}\exp\!\left(-\frac{q}{2}\right)\right)dq
     = \frac{\Gamma\!\left(\frac{r+1}{2}\right)}{\sqrt{r\pi}\,\Gamma\!\left(\frac{r}{2}\right)}
       \left(1+\frac{x^2}{r}\right)^{-\frac{r+1}{2}}
$$

which is that of the student t with r degrees of freedom.
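As a quick numerical sanity check (not part of the text), the construction $t = Z/\sqrt{Q/r}$ can be simulated and compared against the student t distribution; the degrees of freedom, sample size, and seed below are arbitrary choices.

```python
import numpy as np
from scipy import stats

# Monte Carlo check of the construction t = Z / sqrt(Q/r):
# simulated draws should match the student t distribution derived above.
rng = np.random.default_rng(0)
r = 5                                  # degrees of freedom (arbitrary choice)
n = 200_000
Z = rng.standard_normal(n)             # Z ~ N(0, 1)
Q = rng.chisquare(r, n)                # Q ~ chi^2_r, independent of Z
t = Z / np.sqrt(Q / r)

# Compare the empirical distribution with scipy's student t cdf
grid = np.linspace(-4, 4, 9)
empirical = np.array([(t <= x).mean() for x in grid])
theoretical = stats.t.cdf(grid, df=r)
print(np.max(np.abs(empirical - theoretical)))   # should be small (~1e-3)
```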

 

 

 

B.10 Inequalities

 

 

 

 

Jensen's Inequality (finite form). If $g(\cdot): \mathbb{R} \to \mathbb{R}$ is convex, then for any non-negative weights $a_j$ such that $\sum_{j=1}^m a_j = 1$, and any real numbers $x_j$,
$$
g\left(\sum_{j=1}^m a_j x_j\right) \le \sum_{j=1}^m a_j g(x_j). \tag{B.12}
$$
In particular, setting $a_j = 1/m$, then
$$
g\left(\frac{1}{m}\sum_{j=1}^m x_j\right) \le \frac{1}{m}\sum_{j=1}^m g(x_j). \tag{B.13}
$$
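A tiny numerical illustration of (B.12) may be helpful; the convex function, weights, and points below are my own choices, not from the text.

```python
import numpy as np

# Numerical illustration of Jensen's inequality (B.12) for the convex
# function g(u) = exp(u) with arbitrary weights and points.
g = np.exp
a = np.array([0.2, 0.5, 0.3])          # non-negative weights summing to 1
x = np.array([-1.0, 0.5, 2.0])

lhs = g(np.dot(a, x))                  # g(sum_j a_j x_j)
rhs = np.dot(a, g(x))                  # sum_j a_j g(x_j)
assert lhs <= rhs                      # convexity puts g of the average below the average of g
print(lhs, rhs)
```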

 

 

 

Loève's $c_r$ Inequality. For $r > 0$,
$$
\left|\sum_{j=1}^m a_j\right|^r \le c_r \sum_{j=1}^m |a_j|^r \tag{B.14}
$$
where $c_r = 1$ when $r \le 1$ and $c_r = m^{r-1}$ when $r \ge 1$.

Jensen's Inequality (probabilistic form). If $g(\cdot): \mathbb{R}^m \to \mathbb{R}$ is convex, then for any random vector $x$ for which $E\|x\| < \infty$ and $E|g(x)| < \infty$,
$$
g(E(x)) \le E(g(x)). \tag{B.15}
$$

Conditional Jensen's Inequality. If $g(\cdot): \mathbb{R}^m \to \mathbb{R}$ is convex, then for any random vectors $(y, x)$ for which $E\|y\| < \infty$ and $E\|g(y)\| < \infty$,
$$
g(E(y \mid x)) \le E(g(y) \mid x). \tag{B.16}
$$

Conditional Expectation Inequality. For any $r \ge 1$ such that $E|y|^r < \infty$, then
$$
E\,|E(y \mid x)|^r \le E|y|^r < \infty. \tag{B.17}
$$

Expectation Inequality. For any random matrix $Y$ for which $E\|Y\| < \infty$,
$$
\|E(Y)\| \le E\|Y\|. \tag{B.18}
$$

Hölder's Inequality. If $p > 1$ and $q > 1$ and $\frac{1}{p} + \frac{1}{q} = 1$, then for any random $m \times n$ matrices $X$ and $Y$,
$$
E\|X'Y\| \le \left(E\|X\|^p\right)^{1/p}\left(E\|Y\|^q\right)^{1/q}. \tag{B.19}
$$

 

Cauchy-Schwarz Inequality. For any random $m \times n$ matrices $X$ and $Y$,
$$
E\|X'Y\| \le \left(E\|X\|^2\right)^{1/2}\left(E\|Y\|^2\right)^{1/2}. \tag{B.20}
$$

 

 

 

 

 

 

Matrix Cauchy-Schwarz Inequality (Tripathi, 1999). For any random $x \in \mathbb{R}^m$ and $y \in \mathbb{R}^{\ell}$,
$$
E(yx')\left(E(xx')\right)^{-}E(xy') \le E(yy'). \tag{B.21}
$$

Minkowski's Inequality. For any random $m \times n$ matrices $X$ and $Y$,
$$
\left(E\|X+Y\|^p\right)^{1/p} \le \left(E\|X\|^p\right)^{1/p} + \left(E\|Y\|^p\right)^{1/p}. \tag{B.22}
$$

Liapunov's Inequality. For any random $m \times n$ matrix $X$ and $1 \le r \le p$,
$$
\left(E\|X\|^r\right)^{1/r} \le \left(E\|X\|^p\right)^{1/p}. \tag{B.23}
$$

Markov's Inequality (standard form). For any random vector $x$ and non-negative function $g(x) \ge 0$,
$$
\Pr(g(x) > \alpha) \le \alpha^{-1}E\,g(x). \tag{B.24}
$$

Markov's Inequality (strong form). For any random vector $x$ and non-negative function $g(x) \ge 0$,
$$
\Pr(g(x) > \alpha) \le \alpha^{-1}E\left(g(x)\,1\left(g(x) > \alpha\right)\right). \tag{B.25}
$$

Chebyshev's Inequality. For any random variable $x$,
$$
\Pr(|x - Ex| > \alpha) \le \frac{\operatorname{var}(x)}{\alpha^2}. \tag{B.26}
$$
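A quick Monte Carlo sanity check of Chebyshev's bound (B.26); the distribution and the value of $\alpha$ below are my own choices.

```python
import numpy as np

# Monte Carlo check of Chebyshev's inequality (B.26) for an exponential variable.
rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=100_000)   # Ex = 2, var(x) = 4
alpha = 3.0

empirical = np.mean(np.abs(x - x.mean()) > alpha)
bound = x.var() / alpha**2                      # Chebyshev upper bound
print(empirical, bound)                         # empirical tail probability <= bound
assert empirical <= bound
```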


 

 

 

 

 

 

 

 

 

Proof of Jensen's Inequality (B.12). By the definition of convexity, for any $\lambda \in [0, 1]$,
$$
g(\lambda x_1 + (1-\lambda)x_2) \le \lambda g(x_1) + (1-\lambda)g(x_2). \tag{B.27}
$$

 

This implies
$$
g\left(\sum_{j=1}^m a_j x_j\right)
= g\left(a_1 x_1 + (1-a_1)\sum_{j=2}^m \frac{a_j}{1-a_1}\,x_j\right)
\le a_1 g(x_1) + (1-a_1)\,g\left(\sum_{j=2}^m b_j x_j\right)
$$
where $b_j = a_j/(1-a_1)$ and $\sum_{j=2}^m b_j = 1$. By another application of (B.27) this is bounded by
$$
a_1 g(x_1) + (1-a_1)\left(b_2 g(x_2) + (1-b_2)\,g\left(\sum_{j=3}^m c_j x_j\right)\right)
= a_1 g(x_1) + a_2 g(x_2) + (1-a_1)(1-b_2)\,g\left(\sum_{j=3}^m c_j x_j\right)
$$
where $c_j = b_j/(1-b_2)$. By repeated application of (B.27) we obtain (B.12).

Proof of Loève's $c_r$ Inequality. For $r \ge 1$ this is simply a rewriting of the finite form Jensen's inequality (B.13) with $g(u) = u^r$. For $r < 1$, define $b_j = |a_j| / \sum_{j=1}^m |a_j|$. The facts that $0 \le b_j \le 1$ and $r < 1$ imply $b_j \le b_j^r$ and thus
$$
1 = \sum_{j=1}^m b_j \le \sum_{j=1}^m b_j^r
$$
which implies
$$
\left(\sum_{j=1}^m |a_j|\right)^r \le \sum_{j=1}^m |a_j|^r.
$$
The proof is completed by observing that
$$
\left|\sum_{j=1}^m a_j\right|^r \le \left(\sum_{j=1}^m |a_j|\right)^r.
$$

 

 

Proof of Jensen's Inequality (B.15). Since $g(u)$ is convex, at any point $u$ there is a nonempty set of subderivatives (linear surfaces touching $g(u)$ at $u$ but lying below $g(u)$ for all $u$). Let $a + b'u$ be a subderivative of $g(u)$ at $u = Ex$. Then for all $u$, $g(u) \ge a + b'u$, yet $g(Ex) = a + b'Ex$. Applying expectations, $Eg(x) \ge a + b'Ex = g(Ex)$, as stated.

Proof of Conditional Jensen's Inequality. The same as the proof of (B.15), but using conditional expectations. The conditional expectations exist since $E\|y\| < \infty$ and $E\|g(y)\| < \infty$.

Proof of Conditional Expectation Inequality. As the function $|u|^r$ is convex for $r \ge 1$, the Conditional Jensen's inequality implies
$$
|E(y \mid x)|^r \le E(|y|^r \mid x).
$$
Taking unconditional expectations and applying the law of iterated expectations, we obtain
$$
E\,|E(y \mid x)|^r \le E\left(E(|y|^r \mid x)\right) = E|y|^r < \infty
$$


as required.

Proof of Expectation Inequality. By the Triangle inequality, for $\lambda \in [0, 1]$,
$$
\|\lambda U_1 + (1-\lambda)U_2\| \le \lambda\|U_1\| + (1-\lambda)\|U_2\|
$$
which shows that the matrix norm $g(U) = \|U\|$ is convex. Applying Jensen's Inequality (B.15) we find (B.18).

Proof of Hölder's Inequality. Since $\frac{1}{p} + \frac{1}{q} = 1$, an application of Jensen's Inequality (B.12) shows that for any real $a$ and $b$,
$$
\exp\left(\frac{1}{p}a + \frac{1}{q}b\right) \le \frac{1}{p}\exp(a) + \frac{1}{q}\exp(b).
$$
Setting $u = \exp(a)$ and $v = \exp(b)$ this implies
$$
u^{1/p}v^{1/q} \le \frac{u}{p} + \frac{v}{q}
$$
and this inequality holds for any $u > 0$ and $v > 0$.

Set $u = \|X\|^p / E\|X\|^p$ and $v = \|Y\|^q / E\|Y\|^q$. Note that $Eu = Ev = 1$. By the matrix Schwarz Inequality (A.8), $\|X'Y\| \le \|X\|\,\|Y\|$. Thus
$$
\frac{E\|X'Y\|}{\left(E\|X\|^p\right)^{1/p}\left(E\|Y\|^q\right)^{1/q}}
\le \frac{E\left(\|X\|\,\|Y\|\right)}{\left(E\|X\|^p\right)^{1/p}\left(E\|Y\|^q\right)^{1/q}}
= E\left(u^{1/p}v^{1/q}\right)
\le E\left(\frac{u}{p} + \frac{v}{q}\right)
= \frac{1}{p} + \frac{1}{q} = 1,
$$
which is (B.19).

Proof of Cauchy-Schwarz Inequality. Special case of Hölder's with $p = q = 2$.

Proof of Matrix Cauchy-Schwarz Inequality. Define $e = y - (Eyx')(Exx')^{-}x$. Note that $Eee' \ge 0$ is positive semi-definite. We can calculate that
$$
Eee' = Eyy' - (Eyx')(Exx')^{-}Exy'.
$$
Since the left-hand-side is positive semi-definite, so is the right-hand-side, which means $Eyy' \ge (Eyx')(Exx')^{-}Exy'$ as stated.

 

 

Proof of Liapunov's Inequality. The function $g(u) = u^{p/r}$ is convex for $u > 0$ since $p \ge r$. Set $u = \|X\|^r$. By Jensen's inequality, $g(Eu) \le Eg(u)$, or
$$
\left(E\|X\|^r\right)^{p/r} \le E\left(\|X\|^r\right)^{p/r} = E\|X\|^p.
$$
Raising both sides to the power $1/p$ yields $\left(E\|X\|^r\right)^{1/r} \le \left(E\|X\|^p\right)^{1/p}$ as claimed.

 


Proof of Minkowski's Inequality. Note that by rewriting, using the triangle inequality (A.9), and then applying Hölder's Inequality to the two expectations,
$$
E\|X+Y\|^p = E\left(\|X+Y\|\,\|X+Y\|^{p-1}\right)
$$
$$
\le E\left(\|X\|\,\|X+Y\|^{p-1}\right) + E\left(\|Y\|\,\|X+Y\|^{p-1}\right)
$$
$$
\le \left(E\|X\|^p\right)^{1/p}\left(E\|X+Y\|^{q(p-1)}\right)^{1/q}
+ \left(E\|Y\|^p\right)^{1/p}\left(E\|X+Y\|^{q(p-1)}\right)^{1/q}
$$
$$
= \left(\left(E\|X\|^p\right)^{1/p} + \left(E\|Y\|^p\right)^{1/p}\right)E\left(\|X+Y\|^p\right)^{(p-1)/p}
$$
where the second inequality picks $q$ to satisfy $1/p + 1/q = 1$, and the final equality uses this fact to make the substitution $q = p/(p-1)$ and then collects terms. Dividing both sides by $E\left(\|X+Y\|^p\right)^{(p-1)/p}$, we obtain (B.22).

Proof of Markov's Inequality. Let $F$ denote the distribution function of $x$. Then
$$
\Pr(g(x) > \alpha)
= \int_{\{g(u) > \alpha\}} dF(u)
\le \int_{\{g(u) > \alpha\}} \frac{g(u)}{\alpha}\,dF(u)
= \alpha^{-1}\int 1\left(g(u) > \alpha\right)g(u)\,dF(u)
= \alpha^{-1}E\left(g(x)\,1\left(g(x) > \alpha\right)\right),
$$
the inequality using the region of integration $\{g(u) > \alpha\}$. This establishes the strong form (B.25). Since $1(g(x) > \alpha) \le 1$, the final expression is less than $\alpha^{-1}E(g(x))$, establishing the standard form (B.24).

Proof of Chebyshev's Inequality. Define $y = (x - Ex)^2$ and note that $Ey = \operatorname{var}(x)$. The events $\{|x - Ex| > \alpha\}$ and $\{y > \alpha^2\}$ are equal, so by an application of Markov's inequality we find
$$
\Pr(|x - Ex| > \alpha) = \Pr(y > \alpha^2) \le \alpha^{-2}E(y) = \alpha^{-2}\operatorname{var}(x)
$$
as stated.

B.11 Maximum Likelihood

In this section we provide a brief review of the asymptotic theory of maximum likelihood estimation.

When the density of $y_i$ is $f(y \mid \theta)$ where $F$ is a known distribution function and $\theta \in \Theta$ is an unknown $m \times 1$ vector, we say that the distribution is parametric and that $\theta$ is the parameter of the distribution $F$. The space $\Theta$ is the set of permissible values for $\theta$. In this setting the method of maximum likelihood is an appropriate technique for estimation and inference on $\theta$. We let $\theta$ denote a generic value of the parameter and let $\theta_0$ denote its true value.

The joint density of a random sample $(y_1, \ldots, y_n)$ is
$$
f_n(y_1, \ldots, y_n \mid \theta) = \prod_{i=1}^n f(y_i \mid \theta).
$$
The likelihood of the sample is this joint density evaluated at the observed sample values, viewed as a function of $\theta$. The log-likelihood function is its natural logarithm
$$
\log L(\theta) = \sum_{i=1}^n \log f(y_i \mid \theta).
$$
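As a concrete illustration (my construction, not from the text), the log-likelihood of an i.i.d. sample can be coded directly from this definition; the exponential model and function name below are hypothetical choices.

```python
import numpy as np

def log_likelihood(theta, y):
    """Log-likelihood of an i.i.d. exponential(theta) sample, where
    f(y | theta) = theta * exp(-theta * y) for y >= 0, theta > 0.
    The model is an illustrative choice, not one used in the text."""
    theta = float(theta)
    if theta <= 0:
        return -np.inf                      # outside the parameter space
    # log L(theta) = sum_i log f(y_i | theta) = n*log(theta) - theta*sum(y_i)
    return len(y) * np.log(theta) - theta * np.sum(y)

y = np.random.default_rng(2).exponential(scale=1/0.5, size=500)  # true theta0 = 0.5
print(log_likelihood(0.5, y), log_likelihood(1.0, y))
```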


The likelihood score is the derivative of the log-likelihood, evaluated at the true parameter value,
$$
S_i = \frac{\partial}{\partial\theta}\log f(y_i \mid \theta_0).
$$
We also define the Hessian
$$
H = -E\left[\frac{\partial^2}{\partial\theta\,\partial\theta'}\log f(y_i \mid \theta_0)\right] \tag{B.28}
$$
and the outer product matrix
$$
\Omega = E\left(S_i S_i'\right). \tag{B.29}
$$

We now present three important features of the likelihood.

 

 

 

 

 

 

 

 

 

 

Theorem B.11.1
$$
\frac{\partial}{\partial\theta}\,E\log f(y \mid \theta)\Big|_{\theta=\theta_0} = 0 \tag{B.30}
$$
$$
E\,S_i = 0 \tag{B.31}
$$
and
$$
H = \Omega \equiv I. \tag{B.32}
$$

 

 

 

 

 

 

 

 

 

 

 

 

The matrix I is called the information, and the equality (B.32) is called the information matrix equality.
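As an illustration of the information matrix equality (my construction, continuing the hypothetical exponential model $f(y \mid \theta) = \theta e^{-\theta y}$, which is not a model used in the text), both $H$ and $\Omega$ equal $1/\theta^2$ and this can be checked by simulation:

```python
import numpy as np

# Numerical check of the information matrix equality H = E(S_i S_i') for the
# hypothetical exponential model: S_i = 1/theta - y_i, d^2 log f/d theta^2 = -1/theta^2.
rng = np.random.default_rng(5)
theta0 = 0.5
y = rng.exponential(scale=1/theta0, size=500_000)

score = 1.0 / theta0 - y
print(score.mean())                         # ~0: E S_i = 0, equation (B.31)
H = 1.0 / theta0**2                         # -E[d^2 log f], exact for this model
Omega = np.mean(score**2)                   # E[S_i^2], Monte Carlo estimate
print(H, Omega)                             # approximately equal: H = Omega = I
```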

The maximum likelihood estimator (MLE) $\hat\theta$ is the parameter value which maximizes the likelihood (equivalently, which maximizes the log-likelihood). We can write this as
$$
\hat\theta = \underset{\theta \in \Theta}{\operatorname{argmax}}\ \log L(\theta). \tag{B.33}
$$
In some simple cases, we can find an explicit expression for $\hat\theta$ as a function of the data, but these cases are rare. More typically, the MLE $\hat\theta$ must be found by numerical methods.
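A minimal sketch of such a numerical search (my construction, continuing the hypothetical exponential model above; the optimizer and starting value are arbitrary choices):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
y = rng.exponential(scale=2.0, size=1_000)     # true theta0 = 0.5 for f(y|theta)=theta*exp(-theta*y)

def neg_log_likelihood(theta, y):
    # minimize the negative log-likelihood = maximize the log-likelihood
    t = theta[0]
    if t <= 0:
        return np.inf
    return -(len(y) * np.log(t) - t * np.sum(y))

result = minimize(neg_log_likelihood, x0=[1.0], args=(y,), method="Nelder-Mead")
print(result.x[0], 1.0 / y.mean())             # numerical MLE vs closed-form MLE 1/ybar
```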

To understand why the MLE $\hat\theta$ is a natural estimator for the parameter $\theta$, observe that the standardized log-likelihood is a sample average and an estimator of $E\log f(y_i \mid \theta)$:
$$
\frac{1}{n}\log L(\theta) = \frac{1}{n}\sum_{i=1}^n \log f(y_i \mid \theta) \xrightarrow{p} E\log f(y_i \mid \theta).
$$
As the MLE $\hat\theta$ maximizes the left-hand-side, we can see that it is an estimator of the maximizer of the right-hand-side. The first-order condition for the latter problem is
$$
0 = \frac{\partial}{\partial\theta}\,E\log f(y_i \mid \theta)
$$
which holds at $\theta = \theta_0$ by (B.30). This suggests that $\hat\theta$ is an estimator of $\theta_0$. In fact, under conventional regularity conditions, $\hat\theta$ is consistent, $\hat\theta \xrightarrow{p} \theta_0$ as $n \to \infty$. Furthermore, we can derive its asymptotic distribution.

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Theorem B.11.2 Under regularity conditions, $\sqrt{n}\left(\hat\theta - \theta_0\right) \xrightarrow{d} \mathrm{N}\left(0, I^{-1}\right)$.

 


We omit the regularity conditions for Theorem B.11.2, but the result holds quite broadly for models which are smooth functions of the parameters. Theorem B.11.2 gives the general form for the asymptotic distribution of the MLE. A famous result shows that the asymptotic variance is the smallest possible.

Theorem B.11.3 Cramer-Rao Lower Bound. If $\tilde\theta$ is an unbiased regular estimator of $\theta$, then $\operatorname{var}(\tilde\theta) \ge (nI)^{-1}$.

The Cramer-Rao Theorem shows that the finite sample variance of an unbiased estimator is bounded below by $(nI)^{-1}$. This means that the asymptotic variance of the standardized estimator $\sqrt{n}\left(\tilde\theta - \theta_0\right)$ is bounded below by $I^{-1}$. In other words, the best possible asymptotic variance among all (regular) estimators is $I^{-1}$. An estimator is called asymptotically efficient if its asymptotic variance equals this lower bound. Theorem B.11.2 shows that the MLE has this asymptotic variance, and is thus asymptotically efficient.
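To make the $I^{-1}$ asymptotic variance concrete, a small sketch (my construction, continuing the hypothetical exponential model where $I(\theta) = 1/\theta^2$) compares the theoretical asymptotic standard error $\sqrt{I^{-1}/n}$ with the spread of the MLE across simulated samples:

```python
import numpy as np

# Monte Carlo check that the MLE's sampling spread matches sqrt(I^{-1}/n).
# Exponential model f(y|theta) = theta*exp(-theta*y): MLE is 1/ybar, I(theta) = 1/theta^2.
rng = np.random.default_rng(4)
theta0, n, reps = 0.5, 400, 2_000
mles = np.array([1.0 / rng.exponential(scale=1/theta0, size=n).mean() for _ in range(reps)])

asymptotic_se = np.sqrt(theta0**2 / n)      # sqrt(I^{-1}/n) with I = 1/theta0^2
print(mles.std(), asymptotic_se)            # should be close
```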

Theorem B.11.4 The MLE is asymptotically efficient in the sense that its asymptotic variance equals the Cramer-Rao Lower Bound.

Theorem B.11.4 gives a strong endorsement for the MLE in parametric models.

Finally, consider functions of parameters. If $\psi = g(\theta)$ then the MLE of $\psi$ is $\hat\psi = g(\hat\theta)$. This is because maximization (e.g. (B.33)) is unaffected by parameterization and transformation. Applying the Delta Method to Theorem B.11.2 we conclude that
$$
\sqrt{n}\left(\hat\psi - \psi\right) \simeq G'\sqrt{n}\left(\hat\theta - \theta\right) \xrightarrow{d} \mathrm{N}\left(0, G'I^{-1}G\right) \tag{B.34}
$$
where $G = \frac{\partial}{\partial\theta}g(\theta_0)$. By Theorem B.11.4, $\hat\psi$ is an asymptotically efficient estimator for $\psi$. The asymptotic variance $G'I^{-1}G$ is the Cramer-Rao lower bound for estimation of $\psi$.

Theorem B.11.5 The Cramer-Rao lower bound for $\psi = g(\theta)$ is $G'I^{-1}G$, and the MLE $\hat\psi = g(\hat\theta)$ is asymptotically efficient.
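As a small numerical illustration (my construction, not from the text), the delta-method variance $G'I^{-1}G$ can be computed directly once $I$ and $G$ are available; here for the hypothetical exponential model, where $I(\theta) = 1/\theta^2$ and the transformation $g(\theta) = 1/\theta$ is the mean of $y$:

```python
import numpy as np

# Delta-method variance for the hypothetical exponential model f(y|theta)=theta*exp(-theta*y):
# information I(theta) = 1/theta^2, and psi = g(theta) = 1/theta (the mean), so G = -1/theta^2.
theta0 = 0.5
I = 1.0 / theta0**2                 # Fisher information for one observation
G = -1.0 / theta0**2                # derivative of g(theta) = 1/theta at theta0
avar_psi = G * (1.0 / I) * G        # G' I^{-1} G, scalar case
print(avar_psi, 1.0 / theta0**2)    # equals var(y) = 1/theta^2, as expected for psi_hat = ybar
```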

Proof of Theorem B.11.1. To see (B.30),
$$
\frac{\partial}{\partial\theta}\,E\log f(y \mid \theta)\Big|_{\theta=\theta_0}
= \frac{\partial}{\partial\theta}\int \log f(y \mid \theta)\,f(y \mid \theta_0)\,dy\,\Big|_{\theta=\theta_0}
$$
$$
= \int \frac{\partial}{\partial\theta}f(y \mid \theta)\,\frac{f(y \mid \theta_0)}{f(y \mid \theta)}\,dy\,\Big|_{\theta=\theta_0}
$$
$$
= \frac{\partial}{\partial\theta}\int f(y \mid \theta)\,dy\,\Big|_{\theta=\theta_0}
= \frac{\partial}{\partial\theta}\,1\,\Big|_{\theta=\theta_0} = 0.
$$


Equation (B.31) follows by exchanging integration and differentiation:
$$
E\left[\frac{\partial}{\partial\theta}\log f(y \mid \theta_0)\right]
= \frac{\partial}{\partial\theta}\,E\left[\log f(y \mid \theta_0)\right] = 0.
$$
Similarly, we can show that
$$
E\left(\frac{\frac{\partial^2}{\partial\theta\,\partial\theta'}f(y \mid \theta_0)}{f(y \mid \theta_0)}\right) = 0.
$$

By direct computation,
$$
\frac{\partial^2}{\partial\theta\,\partial\theta'}\log f(y \mid \theta_0)
= \frac{\frac{\partial^2}{\partial\theta\,\partial\theta'}f(y \mid \theta_0)}{f(y \mid \theta_0)}
- \frac{\frac{\partial}{\partial\theta}f(y \mid \theta_0)\,\frac{\partial}{\partial\theta'}f(y \mid \theta_0)}{f(y \mid \theta_0)^2}
$$
$$
= \frac{\frac{\partial^2}{\partial\theta\,\partial\theta'}f(y \mid \theta_0)}{f(y \mid \theta_0)}
- \frac{\partial}{\partial\theta}\log f(y \mid \theta_0)\,\frac{\partial}{\partial\theta'}\log f(y \mid \theta_0).
$$
Taking expectations yields (B.32).

Proof of Theorem B.11.2. Taking the first-order condition for maximization of $\log L(\theta)$, and making a first-order Taylor series expansion,
$$
0 = \frac{\partial}{\partial\theta}\log L(\theta)\Big|_{\theta=\hat\theta}
= \sum_{i=1}^n \frac{\partial}{\partial\theta}\log f\left(y_i \mid \hat\theta\right)
\simeq \sum_{i=1}^n \frac{\partial}{\partial\theta}\log f(y_i \mid \theta_0)
+ \sum_{i=1}^n \frac{\partial^2}{\partial\theta\,\partial\theta'}\log f(y_i \mid \theta_n)\left(\hat\theta - \theta_0\right),
$$
where $\theta_n$ lies on a line segment joining $\hat\theta$ and $\theta_0$. (Technically, the specific value of $\theta_n$ varies by row in this expansion.) Rewriting this equation, we find
$$
\left(\hat\theta - \theta_0\right)
= \left(-\sum_{i=1}^n \frac{\partial^2}{\partial\theta\,\partial\theta'}\log f(y_i \mid \theta_n)\right)^{-1}
\left(\sum_{i=1}^n S_i\right)
$$
where $S_i$ are the likelihood scores. Since the score $S_i$ is mean-zero (B.31) with covariance matrix $\Omega$ (equation B.29), an application of the CLT yields
$$
\frac{1}{\sqrt{n}}\sum_{i=1}^n S_i \xrightarrow{d} \mathrm{N}(0, \Omega).
$$
The analysis of the sample Hessian is somewhat more complicated due to the presence of $\theta_n$. Let $H(\theta) = -E\left[\frac{\partial^2}{\partial\theta\,\partial\theta'}\log f(y_i, \theta)\right]$. If it is continuous in $\theta$, then since $\theta_n \xrightarrow{p} \theta_0$ it follows that $H(\theta_n) \xrightarrow{p} H$ and so
$$
-\frac{1}{n}\sum_{i=1}^n \frac{\partial^2}{\partial\theta\,\partial\theta'}\log f(y_i, \theta_n)
= \frac{1}{n}\sum_{i=1}^n \left(-\frac{\partial^2}{\partial\theta\,\partial\theta'}\log f(y_i, \theta_n) - H(\theta_n)\right) + H(\theta_n)
\xrightarrow{p} H
$$
by an application of a uniform WLLN. (By uniform, we mean that the WLLN holds uniformly over the parameter value. This requires the second derivative to be a smooth function of the parameter.)


Together,
$$
\sqrt{n}\left(\hat\theta - \theta_0\right) \xrightarrow{d} H^{-1}\,\mathrm{N}(0, \Omega)
= \mathrm{N}\left(0, H^{-1}\Omega H^{-1}\right) = \mathrm{N}\left(0, I^{-1}\right),
$$
the final equality using Theorem B.11.1.

 

 

 

 

Proof of Theorem B.11.3. Let $Y = (y_1, \ldots, y_n)$ be the sample, and set
$$
S = \frac{\partial}{\partial\theta}\log f_n(Y, \theta_0) = \sum_{i=1}^n S_i
$$
which by Theorem B.11.1 has mean zero and variance $nI$. Write the estimator $\tilde\theta = \tilde\theta(Y)$ as a function of the data. Since $\tilde\theta$ is unbiased for any $\theta$,
$$
\theta = E\tilde\theta = \int \tilde\theta(Y)\,f(Y, \theta)\,dY.
$$
Differentiating with respect to $\theta$ and evaluating at $\theta_0$ yields
$$
I_m = \int \tilde\theta(Y)\,\frac{\partial}{\partial\theta'}f(Y, \theta)\,dY
= \int \tilde\theta(Y)\,\frac{\partial}{\partial\theta'}\log f(Y, \theta)\,f(Y, \theta_0)\,dY
= E\left(\tilde\theta S'\right)
= E\left(\left(\tilde\theta - \theta_0\right)S'\right),
$$
where $I_m$ denotes the $m \times m$ identity matrix, and the final equality holds since $E(S) = 0$.

By the matrix Cauchy-Schwarz inequality (B.21), $E\left(\left(\tilde\theta - \theta_0\right)S'\right) = I_m$, and $\operatorname{var}(S) = E(SS') = nI$,
$$
\operatorname{var}\left(\tilde\theta\right)
= E\left(\left(\tilde\theta - \theta_0\right)\left(\tilde\theta - \theta_0\right)'\right)
\ge E\left(\left(\tilde\theta - \theta_0\right)S'\right)\left(E\left(SS'\right)\right)^{-}E\left(S\left(\tilde\theta - \theta_0\right)'\right)
= \left(E\left(SS'\right)\right)^{-1}
= (nI)^{-1}
$$
as stated.

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Appendix C

Numerical Optimization

Many econometric estimators are defined by an optimization problem of the form
$$
\hat\theta = \underset{\theta \in \Theta}{\operatorname{argmin}}\ Q(\theta) \tag{C.1}
$$
where the parameter is $\theta \in \Theta \subset \mathbb{R}^m$ and the criterion function is $Q(\theta): \Theta \to \mathbb{R}$. For example NLLS, GLS, MLE and GMM estimators take this form. In most cases, $Q(\theta)$ can be computed for given $\theta$, but $\hat\theta$ is not available in closed form. In this case, numerical methods are required to obtain $\hat\theta$.

C.1 Grid Search

Many optimization problems are either one dimensional (m = 1) or involve one-dimensional optimization as a sub-problem (for example, a line search). In this context grid search may be employed.

Grid Search. Let $\Theta = [a, b]$ be an interval. Pick some $\varepsilon > 0$ and set $G = (b - a)/\varepsilon$ to be the number of gridpoints. Construct an equally spaced grid on the region $[a, b]$ with $G$ gridpoints, which is $\{\theta(j) = a + j(b - a)/G : j = 0, \ldots, G\}$. At each point evaluate the criterion function and find the gridpoint which yields the smallest value of the criterion, which is $\theta(\hat\jmath)$ where $\hat\jmath = \operatorname{argmin}_{0 \le j \le G} Q(\theta(j))$. This value $\theta(\hat\jmath)$ is the gridpoint estimate of $\hat\theta$. If the grid is sufficiently fine to capture small oscillations in $Q(\theta)$, the approximation error is bounded by $\varepsilon$, that is, $\left|\theta(\hat\jmath) - \hat\theta\right| \le \varepsilon$. Plots of $Q(\theta(j))$ against $\theta(j)$ can help diagnose errors in grid selection. This method is quite robust but potentially costly.
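A minimal sketch of the one-step grid search, assuming a scalar criterion $Q$ (the example criterion and interval below are my own choices):

```python
import numpy as np

def grid_search(Q, a, b, eps):
    """One-step grid search over [a, b] with G = (b - a)/eps gridpoints,
    following the description above; Q is a scalar criterion function."""
    G = int(np.ceil((b - a) / eps))
    theta = a + np.arange(G + 1) * (b - a) / G        # theta(j), j = 0, ..., G
    values = np.array([Q(t) for t in theta])          # evaluate criterion at each gridpoint
    j_hat = int(np.argmin(values))
    return theta[j_hat]                               # gridpoint estimate of theta_hat

# Illustrative criterion (my choice): roughly minimized near theta = 1.5
Q = lambda t: (t - 1.5) ** 2 + 0.1 * np.cos(10 * t)
print(grid_search(Q, a=0.0, b=3.0, eps=0.001))
```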

 

 

 

 

Two-Step Grid Search. The grid search method can be refined by a two-step execution. For an error bound of $\varepsilon$ pick $G$ so that $G^2 = (b - a)/\varepsilon$. For the first step define an equally spaced grid on the region $[a, b]$ with $G$ gridpoints, which is $\{\theta(j) = a + j(b - a)/G : j = 0, \ldots, G\}$. At each point evaluate the criterion function and let $\hat\jmath = \operatorname{argmin}_{0 \le j \le G} Q(\theta(j))$. For the second step define an equally spaced grid on $[\theta(\hat\jmath - 1), \theta(\hat\jmath + 1)]$ with $G$ gridpoints, which is $\{\theta'(k) = \theta(\hat\jmath - 1) + 2k(b - a)/G^2 : k = 0, \ldots, G\}$. Let $\hat k = \operatorname{argmin}_{0 \le k \le G} Q(\theta'(k))$. The estimate of $\hat\theta$ is $\theta'(\hat k)$. The advantage of the two-step method over a one-step grid search is that the number of function evaluations has been reduced from $(b - a)/\varepsilon$ to $2\sqrt{(b - a)/\varepsilon}$, which can be substantial. The disadvantage is that if the function $Q(\theta)$ is irregular, the first-step grid may not bracket $\hat\theta$, which thus would be missed.
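A sketch of the two-step refinement under the same assumptions, reusing the conventions of the hypothetical grid_search helper above:

```python
import numpy as np

def two_step_grid_search(Q, a, b, eps):
    """Two-step grid search: a coarse pass with G = sqrt((b - a)/eps) points,
    then a fine pass on the bracketing interval around the coarse minimizer."""
    G = int(np.ceil(np.sqrt((b - a) / eps)))
    theta = a + np.arange(G + 1) * (b - a) / G
    j_hat = int(np.argmin([Q(t) for t in theta]))
    # Second step: refine on [theta(j_hat - 1), theta(j_hat + 1)], clamped at the endpoints
    lo = theta[max(j_hat - 1, 0)]
    hi = theta[min(j_hat + 1, G)]
    theta2 = lo + np.arange(G + 1) * (hi - lo) / G
    k_hat = int(np.argmin([Q(t) for t in theta2]))
    return theta2[k_hat]

Q = lambda t: (t - 1.5) ** 2 + 0.1 * np.cos(10 * t)
print(two_step_grid_search(Q, a=0.0, b=3.0, eps=0.001))
```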

C.2 Gradient Methods

Gradient Methods are iterative methods which produce a sequence $\{\theta_i : i = 1, 2, \ldots\}$ which are designed to converge to $\hat\theta$. All require the choice of a starting value $\theta_1$, and all require the
