
APPENDIX A. MATRIX ALGEBRA

Also, for $k \times r$ $A$ and $r \times k$ $B$ we have
$$\mathrm{tr}(AB) = \mathrm{tr}(BA). \tag{A.1}$$

Indeed,
$$\mathrm{tr}(AB) = \mathrm{tr}\begin{bmatrix} a_1'b_1 & a_1'b_2 & \cdots & a_1'b_k \\ a_2'b_1 & a_2'b_2 & \cdots & a_2'b_k \\ \vdots & \vdots & & \vdots \\ a_k'b_1 & a_k'b_2 & \cdots & a_k'b_k \end{bmatrix} = \sum_{i=1}^k a_i'b_i = \sum_{i=1}^k b_i'a_i = \mathrm{tr}(BA).$$
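As a quick numerical check of (A.1), not part of the original text, the following sketch (assuming Python with NumPy is available) verifies the identity for random conformable matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
k, r = 4, 3                      # illustrative dimensions
A = rng.standard_normal((k, r))  # k x r
B = rng.standard_normal((r, k))  # r x k

# tr(AB) = tr(BA) even though AB is k x k while BA is r x r
assert np.isclose(np.trace(A @ B), np.trace(B @ A))
```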

A.5 Rank and Inverse

The rank of the $k \times r$ matrix ($r \le k$)
$$A = \begin{bmatrix} a_1 & a_2 & \cdots & a_r \end{bmatrix}$$
is the number of linearly independent columns $a_j$, and is written as $\mathrm{rank}(A)$. We say that $A$ has full rank if $\mathrm{rank}(A) = r$.

A square $k \times k$ matrix $A$ is said to be nonsingular if it has full rank, i.e. $\mathrm{rank}(A) = k$. This means that there is no $k \times 1$ vector $c \neq 0$ such that $Ac = 0$.

If a square $k \times k$ matrix $A$ is nonsingular then there exists a unique $k \times k$ matrix $A^{-1}$, called the inverse of $A$, which satisfies
$$AA^{-1} = A^{-1}A = I_k.$$

For non-singular $A$ and $C$, some important properties include

$$AA^{-1} = A^{-1}A = I_k$$
$$\left(A^{-1}\right)' = \left(A'\right)^{-1}$$
$$(AC)^{-1} = C^{-1}A^{-1}$$
$$(A + C)^{-1} = A^{-1}\left(A^{-1} + C^{-1}\right)^{-1}C^{-1}$$
$$A^{-1} - (A + C)^{-1} = A^{-1}\left(A^{-1} + C^{-1}\right)^{-1}A^{-1}$$

Also, if $A$ is an orthogonal matrix, then $A^{-1} = A'$.

Another useful result for non-singular $A$ is known as the Woodbury matrix identity
$$(A + BCD)^{-1} = A^{-1} - A^{-1}BC\left(C + CDA^{-1}BC\right)^{-1}CDA^{-1}. \tag{A.2}$$
In particular, for $C = -1$, $B = b$ and $D = b'$ for vector $b$ we find what is known as the Sherman–Morrison formula
$$\left(A - bb'\right)^{-1} = A^{-1} + \left(1 - b'A^{-1}b\right)^{-1}A^{-1}bb'A^{-1}. \tag{A.3}$$
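A hedged numerical sketch of the Sherman–Morrison formula (A.3) follows; the positive definite construction of $A$ and the small scaling of $b$ are illustrative choices to keep $A - bb'$ safely nonsingular, not requirements from the text:

```python
import numpy as np

rng = np.random.default_rng(1)
k = 5
M = rng.standard_normal((k, k))
A = M @ M.T + k * np.eye(k)            # positive definite, hence nonsingular
b = 0.1 * rng.standard_normal((k, 1))  # small b keeps 1 - b'A^{-1}b away from zero

Ainv = np.linalg.inv(A)
lhs = np.linalg.inv(A - b @ b.T)
rhs = Ainv + (Ainv @ b @ b.T @ Ainv) / (1.0 - (b.T @ Ainv @ b).item())
assert np.allclose(lhs, rhs)
```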


The following fact about inverting partitioned matrices is quite useful.

$$\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}^{-1} = \begin{bmatrix} A_{11\cdot2}^{-1} & -A_{11\cdot2}^{-1}A_{12}A_{22}^{-1} \\ -A_{22\cdot1}^{-1}A_{21}A_{11}^{-1} & A_{22\cdot1}^{-1} \end{bmatrix} \tag{A.4}$$

where $A_{11\cdot2} = A_{11} - A_{12}A_{22}^{-1}A_{21}$ and $A_{22\cdot1} = A_{22} - A_{21}A_{11}^{-1}A_{12}$. There are alternative algebraic representations for the components. For example, using the Woodbury matrix identity you can show the following alternative expressions

$$A^{11} = A_{11}^{-1} + A_{11}^{-1}A_{12}A_{22\cdot1}^{-1}A_{21}A_{11}^{-1}$$
$$A^{22} = A_{22}^{-1} + A_{22}^{-1}A_{21}A_{11\cdot2}^{-1}A_{12}A_{22}^{-1}$$
$$A^{12} = -A_{11}^{-1}A_{12}A_{22\cdot1}^{-1}$$
$$A^{21} = -A_{22}^{-1}A_{21}A_{11\cdot2}^{-1}$$

where $A^{ij}$ denotes the $(i,j)$th block of the inverse in (A.4).
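The partitioned inverse (A.4) can likewise be checked numerically; a minimal sketch, assuming the diagonal shift keeps all relevant blocks nonsingular:

```python
import numpy as np

rng = np.random.default_rng(2)
p, q = 3, 2
A = rng.standard_normal((p + q, p + q)) + (p + q) * np.eye(p + q)
A11, A12 = A[:p, :p], A[:p, p:]
A21, A22 = A[p:, :p], A[p:, p:]

A11_2 = A11 - A12 @ np.linalg.inv(A22) @ A21   # A_{11.2}
A22_1 = A22 - A21 @ np.linalg.inv(A11) @ A12   # A_{22.1}

Ainv = np.linalg.inv(A)
assert np.allclose(Ainv[:p, :p], np.linalg.inv(A11_2))
assert np.allclose(Ainv[:p, p:], -np.linalg.inv(A11_2) @ A12 @ np.linalg.inv(A22))
assert np.allclose(Ainv[p:, :p], -np.linalg.inv(A22_1) @ A21 @ np.linalg.inv(A11))
assert np.allclose(Ainv[p:, p:], np.linalg.inv(A22_1))
```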

Even if a matrix $A$ does not possess an inverse, we can still define the Moore–Penrose generalized inverse $A^-$ as the matrix which satisfies

$$AA^-A = A$$
$$A^-AA^- = A^-$$
$$AA^- \text{ is symmetric}$$
$$A^-A \text{ is symmetric}$$

For any matrix $A$, the Moore–Penrose generalized inverse $A^-$ exists and is unique.

For example, if
$$A = \begin{bmatrix} A_{11} & 0 \\ 0 & 0 \end{bmatrix}$$
then
$$A^- = \begin{bmatrix} A_{11}^{-1} & 0 \\ 0 & 0 \end{bmatrix}.$$
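The four defining conditions can be verified directly with NumPy's pinv, which computes the Moore–Penrose generalized inverse; a sketch on a deliberately rank-deficient matrix:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 5))  # rank 2: no ordinary inverse
G = np.linalg.pinv(A)  # Moore-Penrose generalized inverse A^-

assert np.allclose(A @ G @ A, A)        # A A^- A = A
assert np.allclose(G @ A @ G, G)        # A^- A A^- = A^-
assert np.allclose(A @ G, (A @ G).T)    # A A^- symmetric
assert np.allclose(G @ A, (G @ A).T)    # A^- A symmetric
```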

 

 

A.6 Determinant

The determinant is a measure of the volume of a square matrix.

While the determinant is widely used, its precise definition is rarely needed. However, we present the definition here for completeness. Let $A = (a_{ij})$ be a general $k \times k$ matrix. Let $\pi = (j_1, ..., j_k)$ denote a permutation of $(1, ..., k)$. There are $k!$ such permutations. There is a unique count of the number of inversions of the indices of such permutations (relative to the natural order $(1, ..., k)$), and let $\varepsilon_\pi = +1$ if this count is even and $\varepsilon_\pi = -1$ if the count is odd. Then the determinant of $A$ is defined as
$$\det A = \sum_\pi \varepsilon_\pi\, a_{1j_1}a_{2j_2}\cdots a_{kj_k}.$$

For example, if $A$ is $2 \times 2$, then the two permutations of $(1, 2)$ are $(1, 2)$ and $(2, 1)$, for which $\varepsilon_{(1,2)} = 1$ and $\varepsilon_{(2,1)} = -1$. Thus
$$\det A = \varepsilon_{(1,2)}a_{11}a_{22} + \varepsilon_{(2,1)}a_{21}a_{12} = a_{11}a_{22} - a_{12}a_{21}.$$
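The permutation definition can be implemented directly (it is $O(k! \cdot k^2)$, so only sensible for tiny matrices); a sketch comparing it against np.linalg.det:

```python
import numpy as np
from itertools import permutations

def det_by_permutations(A):
    """Determinant computed straight from the definition: a sum over all k! permutations."""
    k = A.shape[0]
    total = 0.0
    for perm in permutations(range(k)):
        # epsilon = (-1)^(inversion count relative to the natural order)
        inversions = sum(perm[i] > perm[j] for i in range(k) for j in range(i + 1, k))
        sign = -1.0 if inversions % 2 else 1.0
        prod = 1.0
        for i in range(k):
            prod *= A[i, perm[i]]
        total += sign * prod
    return total

A = np.random.default_rng(4).standard_normal((4, 4))
assert np.isclose(det_by_permutations(A), np.linalg.det(A))
```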

Some properties include

$$\det(A) = \det\left(A'\right)$$
$$\det(cA) = c^k \det A$$
$$\det(AB) = (\det A)(\det B)$$
$$\det\left(A^{-1}\right) = (\det A)^{-1}$$
$$\det\begin{bmatrix} A & B \\ C & D \end{bmatrix} = (\det D)\,\det\left(A - BD^{-1}C\right) \quad \text{if } \det D \neq 0$$
$$\det A \neq 0 \text{ if and only if } A \text{ is nonsingular}$$

If $A$ is triangular (upper or lower), then $\det A = \prod_{i=1}^k a_{ii}$

If $A$ is orthogonal, then $\det A = \pm 1$

A.7 Eigenvalues

The characteristic equation of a square matrix $A$ is
$$\det(A - \lambda I_k) = 0.$$
The left side is a polynomial of degree $k$ in $\lambda$ so it has exactly $k$ roots, which are not necessarily distinct and may be real or complex. They are called the latent roots or characteristic roots or eigenvalues of $A$. If $\lambda_i$ is an eigenvalue of $A$, then $A - \lambda_i I_k$ is singular so there exists a non-zero vector $h_i$ such that
$$(A - \lambda_i I_k)\,h_i = 0.$$

The vector $h_i$ is called a latent vector or characteristic vector or eigenvector of $A$ corresponding to $\lambda_i$.

We now state some useful properties. Let $\lambda_i$ and $h_i$, $i = 1, ..., k$ denote the $k$ eigenvalues and eigenvectors of a square matrix $A$. Let $\Lambda$ be a diagonal matrix with the characteristic roots in the diagonal, and let $H = [h_1 \cdots h_k]$.

$$\det(A) = \prod_{i=1}^k \lambda_i$$
$$\mathrm{tr}(A) = \sum_{i=1}^k \lambda_i$$

$A$ is non-singular if and only if all its characteristic roots are non-zero.

If $A$ has distinct characteristic roots, there exists a nonsingular matrix $P$ such that $A = P^{-1}\Lambda P$ and $PAP^{-1} = \Lambda$.

If $A$ is symmetric, then $A = H\Lambda H'$ and $H'AH = \Lambda$, and the characteristic roots are all real. $A = H\Lambda H'$ is called the spectral decomposition of a matrix.

The characteristic roots of $A^{-1}$ are $\lambda_1^{-1}, \lambda_2^{-1}, ..., \lambda_k^{-1}$.

The matrix $H$ has the orthonormal properties $H'H = I$ and $HH' = I$.

$H^{-1} = H'$ and $(H')^{-1} = H$
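Several of these properties can be seen at once with np.linalg.eigh, which returns the spectral decomposition of a symmetric matrix; a small illustrative sketch:

```python
import numpy as np

rng = np.random.default_rng(5)
M = rng.standard_normal((4, 4))
A = M + M.T                        # symmetric, so the roots are real
lam, H = np.linalg.eigh(A)         # eigenvalues and orthonormal eigenvectors

assert np.allclose(H @ np.diag(lam) @ H.T, A)      # spectral decomposition A = H Lambda H'
assert np.allclose(H.T @ H, np.eye(4))             # H'H = I
assert np.isclose(np.prod(lam), np.linalg.det(A))  # det(A) = product of the roots
assert np.isclose(np.sum(lam), np.trace(A))        # tr(A) = sum of the roots
```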

A.8 Positive Definiteness

We say that a $k \times k$ symmetric square matrix $A$ is positive semi-definite if for all $c \neq 0$, $c'Ac \ge 0$. This is written as $A \ge 0$. We say that $A$ is positive definite if for all $c \neq 0$, $c'Ac > 0$. This is written as $A > 0$.

Some properties include:


If $A = G'G$ for some matrix $G$, then $A$ is positive semi-definite. (For any $c \neq 0$, $c'Ac = \alpha'\alpha \ge 0$ where $\alpha = Gc$.) If $G$ has full rank, then $A$ is positive definite.

If $A$ is positive definite, then $A$ is non-singular and $A^{-1}$ exists. Furthermore, $A^{-1} > 0$.

$A > 0$ if and only if it is symmetric and all its characteristic roots are positive.

By the spectral decomposition, $A = H\Lambda H'$ where $H'H = I$ and $\Lambda$ is diagonal with non-negative diagonal elements. All diagonal elements of $\Lambda$ are strictly positive if (and only if) $A > 0$.

If $A > 0$ then $A^{-1} = H\Lambda^{-1}H'$.

If $A \ge 0$ and $\mathrm{rank}(A) = r < k$ then $A^- = H\Lambda^- H'$ where $A^-$ is the Moore–Penrose generalized inverse, and $\Lambda^- = \mathrm{diag}\left(\lambda_1^{-1}, \lambda_2^{-1}, ..., \lambda_r^{-1}, 0, ..., 0\right)$.

If $A > 0$ we can find a matrix $B$ such that $A = BB'$. We call $B$ a matrix square root of $A$. The matrix $B$ need not be unique. One way to construct $B$ is to use the spectral decomposition $A = H\Lambda H'$ where $\Lambda$ is diagonal, and then set $B = H\Lambda^{1/2}$.
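A sketch of this spectral construction of a matrix square root (the shift by the identity is only to guarantee positive definiteness in the example):

```python
import numpy as np

rng = np.random.default_rng(6)
G = rng.standard_normal((4, 4))
A = G.T @ G + np.eye(4)            # positive definite

lam, H = np.linalg.eigh(A)
B = H @ np.diag(np.sqrt(lam))      # B = H Lambda^{1/2}
assert np.allclose(B @ B.T, A)     # BB' = A
```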

A square matrix $A$ is idempotent if $AA = A$. If $A$ is idempotent and symmetric then all its characteristic roots equal either zero or one and $A$ is thus positive semi-definite. To see this, note that we can write $A = H\Lambda H'$ where $H$ is orthogonal and $\Lambda$ contains the (real) characteristic roots. Then
$$A = AA = H\Lambda H'H\Lambda H' = H\Lambda^2 H'.$$
By the uniqueness of the characteristic roots, we deduce that $\Lambda^2 = \Lambda$ and $\lambda_i^2 = \lambda_i$ for $i = 1, ..., k$. Hence they must equal either 0 or 1. It follows that the spectral decomposition of idempotent $A$ takes the form
$$A = H\begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix}H' \tag{A.5}$$
with $H'H = I_k$, where $r = \mathrm{rank}(A)$. Additionally, $\mathrm{tr}(A) = \mathrm{rank}(A)$.
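A familiar econometric example of a symmetric idempotent matrix is the least-squares projection matrix; a sketch (the 10 x 3 regressor matrix is illustrative) confirming the zero-one roots and tr(A) = rank(A):

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.standard_normal((10, 3))
P = X @ np.linalg.inv(X.T @ X) @ X.T   # projection matrix: symmetric and idempotent

assert np.allclose(P @ P, P)
lam = np.linalg.eigvalsh(P)
assert np.allclose(np.sort(lam), np.r_[np.zeros(7), np.ones(3)])  # roots are 0 or 1
assert np.isclose(np.trace(P), np.linalg.matrix_rank(P))          # tr(A) = rank(A)
```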

A.9 Matrix Calculus

Let $x = (x_1, ..., x_k)$ be $k \times 1$ and $g(x) = g(x_1, ..., x_k) : \mathbb{R}^k \to \mathbb{R}$. The vector derivative is
$$\frac{\partial}{\partial x}\, g(x) = \begin{pmatrix} \dfrac{\partial}{\partial x_1} g(x) \\ \vdots \\ \dfrac{\partial}{\partial x_k} g(x) \end{pmatrix}$$
and
$$\frac{\partial}{\partial x'}\, g(x) = \begin{pmatrix} \dfrac{\partial}{\partial x_1} g(x) & \cdots & \dfrac{\partial}{\partial x_k} g(x) \end{pmatrix}.$$

Some properties are now summarized.

$$\frac{\partial}{\partial x}\left(a'x\right) = \frac{\partial}{\partial x}\left(x'a\right) = a$$
$$\frac{\partial}{\partial x'}\left(Ax\right) = A$$
$$\frac{\partial}{\partial x}\left(x'Ax\right) = \left(A + A'\right)x$$
$$\frac{\partial^2}{\partial x\,\partial x'}\left(x'Ax\right) = A + A'$$
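The quadratic-form derivative can be checked against central finite differences (exact for quadratics up to rounding error); a sketch:

```python
import numpy as np

rng = np.random.default_rng(8)
k = 4
A = rng.standard_normal((k, k))
x = rng.standard_normal(k)

g = lambda v: v @ A @ v            # g(x) = x'Ax
eps = 1e-6
num_grad = np.array([(g(x + eps * e) - g(x - eps * e)) / (2 * eps)
                     for e in np.eye(k)])
assert np.allclose(num_grad, (A + A.T) @ x, atol=1e-6)  # matches (A + A')x
```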


A.10 Kronecker Products and the Vec Operator

Let $A = [a_1\; a_2\; \cdots\; a_n]$ be $m \times n$. The vec of $A$, denoted by $\mathrm{vec}(A)$, is the $mn \times 1$ vector
$$\mathrm{vec}(A) = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix}.$$

Let $A = (a_{ij})$ be an $m \times n$ matrix and let $B$ be any matrix. The Kronecker product of $A$ and $B$, denoted $A \otimes B$, is the matrix
$$A \otimes B = \begin{bmatrix} a_{11}B & a_{12}B & \cdots & a_{1n}B \\ a_{21}B & a_{22}B & \cdots & a_{2n}B \\ \vdots & \vdots & & \vdots \\ a_{m1}B & a_{m2}B & \cdots & a_{mn}B \end{bmatrix}.$$

Some important properties are now summarized. These results hold for matrices for which all matrix multiplications are conformable.

$$(A + B) \otimes C = A \otimes C + B \otimes C$$
$$(A \otimes B)(C \otimes D) = AC \otimes BD$$
$$A \otimes (B \otimes C) = (A \otimes B) \otimes C$$
$$(A \otimes B)' = A' \otimes B'$$
$$\mathrm{tr}(A \otimes B) = \mathrm{tr}(A)\,\mathrm{tr}(B)$$

If $A$ is $m \times m$ and $B$ is $n \times n$, $\det(A \otimes B) = (\det A)^n(\det B)^m$

$$(A \otimes B)^{-1} = A^{-1} \otimes B^{-1}$$

If $A > 0$ and $B > 0$ then $A \otimes B > 0$

$$\mathrm{vec}(ABC) = \left(C' \otimes A\right)\mathrm{vec}(B)$$
$$\mathrm{tr}(ABCD) = \mathrm{vec}\left(D'\right)'\left(C' \otimes A\right)\mathrm{vec}(B)$$
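NumPy's kron makes these identities easy to test; a sketch checking the vec identity plus the trace and determinant rules (a column-major reshape implements the column-stacking vec):

```python
import numpy as np

rng = np.random.default_rng(9)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 4))
C = rng.standard_normal((4, 2))
vec = lambda M: M.reshape(-1, order='F')   # stack the columns

assert np.allclose(vec(A @ B @ C), np.kron(C.T, A) @ vec(B))  # vec(ABC) = (C' kron A) vec(B)

S = rng.standard_normal((3, 3))            # m x m
T = rng.standard_normal((4, 4))            # n x n
assert np.isclose(np.trace(np.kron(S, T)), np.trace(S) * np.trace(T))
assert np.isclose(np.linalg.det(np.kron(S, T)),
                  np.linalg.det(S) ** 4 * np.linalg.det(T) ** 3)  # (det S)^n (det T)^m
```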

A.11 Vector and Matrix Norms and Inequalities

The Euclidean norm of an $m \times 1$ vector $a$ is
$$\|a\| = \left(a'a\right)^{1/2} = \left(\sum_{i=1}^m a_i^2\right)^{1/2}.$$

The Euclidean norm of an $m \times n$ matrix $A$ is
$$\|A\| = \|\mathrm{vec}(A)\| = \left(\mathrm{tr}\left(A'A\right)\right)^{1/2} = \left(\sum_{i=1}^m \sum_{j=1}^n a_{ij}^2\right)^{1/2}.$$


A useful calculation is for any $m \times 1$ vectors $a$ and $b$, using (A.1),
$$\left\|ab'\right\| = \left(\mathrm{tr}\left(ba'ab'\right)\right)^{1/2} = \left(b'b\,a'a\right)^{1/2} = \|a\|\,\|b\|$$
and in particular
$$\left\|aa'\right\| = \|a\|^2. \tag{A.6}$$

Some useful inequalities are now given.

Schwarz Inequality: For any $m \times 1$ vectors $a$ and $b$,
$$\left|a'b\right| \le \|a\|\,\|b\|. \tag{A.7}$$

Schwarz Matrix Inequality: For any $m \times n$ matrices $A$ and $B$,
$$\left\|A'B\right\| \le \|A\|\,\|B\|. \tag{A.8}$$

Triangle Inequality: For any $m \times n$ matrices $A$ and $B$,
$$\|A + B\| \le \|A\| + \|B\|. \tag{A.9}$$

Trace Inequality: For any $m \times m$ matrices $A$ and $B$ such that $A$ is symmetric and $B \ge 0$,
$$\mathrm{tr}(AB) \le \lambda_{\max}(A)\,\mathrm{tr}(B) \tag{A.10}$$
where $\lambda_{\max}(A)$ is the largest eigenvalue of $A$.
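Before turning to the proofs, here is a numerical sketch of (A.8)-(A.10); np.linalg.norm on a matrix returns exactly this Euclidean (Frobenius) norm, and the tiny tolerances only guard against floating-point ties:

```python
import numpy as np

rng = np.random.default_rng(10)
A = rng.standard_normal((5, 3))
B = rng.standard_normal((5, 3))
norm = np.linalg.norm                  # Euclidean (Frobenius) norm

assert norm(A.T @ B) <= norm(A) * norm(B) + 1e-12     # Schwarz matrix inequality (A.8)
assert norm(A + B) <= norm(A) + norm(B) + 1e-12       # triangle inequality (A.9)

S = A @ A.T                            # symmetric (here also positive semi-definite)
P = B @ B.T                            # positive semi-definite
lam_max = np.linalg.eigvalsh(S).max()
assert np.trace(S @ P) <= lam_max * np.trace(P) + 1e-9  # trace inequality (A.10)
```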

Proof of Schwarz Inequality: First, suppose that $\|b\| = 0$. Then $b = 0$ and both $\left|a'b\right| = 0$ and $\|a\|\,\|b\| = 0$ so the inequality is true. Second, suppose that $\|b\| > 0$ and define $c = a - b\left(b'b\right)^{-1}b'a$. Since $c$ is a vector, $c'c \ge 0$. Thus
$$0 \le c'c = a'a - \frac{\left(a'b\right)^2}{b'b}.$$
Rearranging, this implies that
$$\left(a'b\right)^2 \le \left(a'a\right)\left(b'b\right).$$
Taking the square root of each side yields the result.

Proof of Schwarz Matrix Inequality: Partition $A = [a_1, ..., a_n]$ and $B = [b_1, ..., b_n]$. Then by partitioned matrix multiplication, the definition of the matrix Euclidean norm and the Schwarz inequality
$$\left\|A'B\right\| = \left\|\begin{bmatrix} a_1'b_1 & a_1'b_2 & \cdots \\ a_2'b_1 & a_2'b_2 & \cdots \\ \vdots & \vdots & \ddots \end{bmatrix}\right\| \le \left\|\begin{bmatrix} \|a_1\|\,\|b_1\| & \|a_1\|\,\|b_2\| & \cdots \\ \|a_2\|\,\|b_1\| & \|a_2\|\,\|b_2\| & \cdots \\ \vdots & \vdots & \ddots \end{bmatrix}\right\|$$
$$= \left(\sum_{i=1}^n \sum_{j=1}^n \|a_i\|^2\,\|b_j\|^2\right)^{1/2} = \left(\sum_{i=1}^n \|a_i\|^2\right)^{1/2}\left(\sum_{i=1}^n \|b_i\|^2\right)^{1/2}$$
$$= \left(\sum_{i=1}^n \sum_{j=1}^m a_{ji}^2\right)^{1/2}\left(\sum_{i=1}^n \sum_{j=1}^m b_{ji}^2\right)^{1/2} = \|A\|\,\|B\|$$


Proof of Triangle Inequality: Let $a = \mathrm{vec}(A)$ and $b = \mathrm{vec}(B)$. Then by the definition of the matrix norm and the Schwarz Inequality
$$\|A + B\|^2 = \|a + b\|^2 = a'a + 2a'b + b'b \le a'a + 2\left|a'b\right| + b'b \le \|a\|^2 + 2\,\|a\|\,\|b\| + \|b\|^2 = \left(\|a\| + \|b\|\right)^2 = \left(\|A\| + \|B\|\right)^2$$

Proof of Trace Inequality: By the spectral decomposition for symmetric matrices, $A = H\Lambda H'$ where $\Lambda$ has the eigenvalues $\lambda_j$ of $A$ on the diagonal and $H$ is orthonormal. Define $C = H'BH$ which has non-negative diagonal elements $C_{jj}$ since $B$ is positive semi-definite. Then
$$\mathrm{tr}(AB) = \mathrm{tr}(\Lambda C) = \sum_{j=1}^m \lambda_j C_{jj} \le \lambda_{\max}(A) \sum_{j=1}^m C_{jj} = \lambda_{\max}(A)\,\mathrm{tr}(C)$$

where the inequality uses the fact that $C_{jj} \ge 0$. But note that
$$\mathrm{tr}(C) = \mathrm{tr}\left(H'BH\right) = \mathrm{tr}\left(HH'B\right) = \mathrm{tr}(B)$$
since $H$ is orthonormal. Thus $\mathrm{tr}(AB) \le \lambda_{\max}(A)\,\mathrm{tr}(B)$ as stated.

 

Appendix B

Probability

B.1 Foundations

The set $S$ of all possible outcomes of an experiment is called the sample space for the experiment. Take the simple example of tossing a coin. There are two outcomes, heads and tails, so we can write $S = \{H, T\}$. If two coins are tossed in sequence, we can write the four outcomes as $S = \{HH, HT, TH, TT\}$.

An event $A$ is any collection of possible outcomes of an experiment. An event is a subset of $S$, including $S$ itself and the null set $\emptyset$. Continuing the two coin example, one event is $A = \{HH, HT\}$, the event that the first coin is heads. We say that $A$ and $B$ are disjoint or mutually exclusive if $A \cap B = \emptyset$. For example, the sets $\{HH, HT\}$ and $\{TH\}$ are disjoint. Furthermore, if the sets $A_1, A_2, ...$ are pairwise disjoint and $\cup_{i=1}^\infty A_i = S$, then the collection $A_1, A_2, ...$ is called a partition of $S$.

 

 

 

The following are elementary set operations:

Union: $A \cup B = \{x : x \in A \text{ or } x \in B\}$.

Intersection: $A \cap B = \{x : x \in A \text{ and } x \in B\}$.

Complement: $A^c = \{x : x \notin A\}$.

 

 

 

The following are useful properties of set operations.

Commutativity: $A \cup B = B \cup A$; $A \cap B = B \cap A$.

Associativity: $A \cup (B \cup C) = (A \cup B) \cup C$; $A \cap (B \cap C) = (A \cap B) \cap C$.

Distributive Laws: $A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$; $A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$.

DeMorgan's Laws: $(A \cup B)^c = A^c \cap B^c$; $(A \cap B)^c = A^c \cup B^c$.

A probability function assigns probabilities (numbers between 0 and 1) to events $A$ in $S$. This is straightforward when $S$ is countable; when $S$ is uncountable we must be somewhat more careful. A collection $\mathcal{B}$ of sets is called a sigma algebra (or Borel field) if $\emptyset \in \mathcal{B}$, $A \in \mathcal{B}$ implies $A^c \in \mathcal{B}$, and $A_1, A_2, ... \in \mathcal{B}$ implies $\cup_{i=1}^\infty A_i \in \mathcal{B}$. A simple example is $\{\emptyset, S\}$ which is known as the trivial sigma algebra. For any sample space $S$, let $\mathcal{B}$ be the smallest sigma algebra which contains all of the open sets in $S$. When $S$ is countable, $\mathcal{B}$ is simply the collection of all subsets of $S$, including $\emptyset$ and $S$. When $S$ is the real line, then $\mathcal{B}$ is the collection of all open and closed intervals. We call $\mathcal{B}$ the sigma algebra associated with $S$. We only define probabilities for events contained in $\mathcal{B}$.

We now can give the axiomatic definition of probability. Given $S$ and $\mathcal{B}$, a probability function $\Pr$ satisfies $\Pr(S) = 1$, $\Pr(A) \ge 0$ for all $A \in \mathcal{B}$, and if $A_1, A_2, ... \in \mathcal{B}$ are pairwise disjoint, then
$$\Pr\left(\cup_{i=1}^\infty A_i\right) = \sum_{i=1}^\infty \Pr(A_i).$$

 

Some important properties of the probability function include the following

$$\Pr(\emptyset) = 0$$
$$\Pr(A) \le 1$$
$$\Pr\left(A^c\right) = 1 - \Pr(A)$$


$$\Pr\left(B \cap A^c\right) = \Pr(B) - \Pr(A \cap B)$$
$$\Pr(A \cup B) = \Pr(A) + \Pr(B) - \Pr(A \cap B)$$

If $A \subset B$ then $\Pr(A) \le \Pr(B)$

Bonferroni's Inequality: $\Pr(A \cap B) \ge \Pr(A) + \Pr(B) - 1$

Boole's Inequality: $\Pr(A \cup B) \le \Pr(A) + \Pr(B)$

For some elementary probability models, it is useful to have simple rules to count the number of objects in a set. These counting rules are facilitated by using the binomial coefficients which are defined for nonnegative integers $n$ and $r$, $n \ge r$, as
$$\binom{n}{r} = \frac{n!}{r!\,(n-r)!}.$$

When counting the number of objects in a set, there are two important distinctions. Counting may be with replacement or without replacement. Counting may be ordered or unordered. For example, consider a lottery where you pick six numbers from the set 1, 2, ..., 49. This selection is without replacement if you are not allowed to select the same number twice, and is with replacement if this is allowed. Counting is ordered or not depending on whether the sequential order of the numbers is relevant to winning the lottery. Depending on these two distinctions, we have four expressions for the number of objects (possible arrangements) of size $r$ from $n$ objects, shown in the table below.

 

            Without Replacement    With Replacement
Ordered     $n!/(n-r)!$            $n^r$
Unordered   $\binom{n}{r}$         $\binom{n+r-1}{r}$

 

In the lottery example, if counting is unordered and without replacement, the number of potential combinations is $\binom{49}{6} = 13{,}983{,}816$.
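The four counting formulas (and the lottery figure) can be reproduced with the Python standard library; a sketch:

```python
from math import comb, factorial

n, r = 49, 6
ordered_without = factorial(n) // factorial(n - r)   # n!/(n-r)!
ordered_with = n ** r                                # n^r
unordered_without = comb(n, r)                       # C(n, r)
unordered_with = comb(n + r - 1, r)                  # C(n+r-1, r)

print(unordered_without)   # 13983816, the lottery count in the text
```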

If $\Pr(B) > 0$ the conditional probability of the event $A$ given the event $B$ is
$$\Pr(A \mid B) = \frac{\Pr(A \cap B)}{\Pr(B)}.$$

For any $B$, the conditional probability function is a valid probability function where $S$ has been replaced by $B$. Rearranging the definition, we can write
$$\Pr(A \cap B) = \Pr(A \mid B)\Pr(B)$$
which is often quite useful. We can say that the occurrence of $B$ has no information about the likelihood of event $A$ when $\Pr(A \mid B) = \Pr(A)$, in which case we find
$$\Pr(A \cap B) = \Pr(A)\Pr(B) \tag{B.1}$$

We say that the events $A$ and $B$ are statistically independent when (B.1) holds. Furthermore, we say that the collection of events $A_1, ..., A_k$ are mutually independent when for any subset $\{A_i : i \in I\}$,
$$\Pr\left(\bigcap_{i \in I} A_i\right) = \prod_{i \in I} \Pr(A_i).$$

Theorem 1 (Bayes' Rule). For any set $B$ and any partition $A_1, A_2, ...$ of the sample space, then for each $i = 1, 2, ...$
$$\Pr(A_i \mid B) = \frac{\Pr(B \mid A_i)\Pr(A_i)}{\sum_{j=1}^\infty \Pr(B \mid A_j)\Pr(A_j)}.$$
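A small sketch of Bayes' Rule with a two-event partition; the base rate, sensitivity, and specificity below are purely hypothetical numbers chosen for illustration:

```python
# Partition: A_1 = condition present, A_2 = condition absent; B = positive test
pr_A = [0.01, 0.99]            # hypothetical Pr(A_1), Pr(A_2)
pr_B_given_A = [0.95, 0.10]    # hypothetical Pr(B | A_1), Pr(B | A_2)

denom = sum(pb * pa for pb, pa in zip(pr_B_given_A, pr_A))
posterior = pr_B_given_A[0] * pr_A[0] / denom   # Pr(A_1 | B) by Bayes' Rule
print(round(posterior, 4))                      # ~= 0.0876
```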

 

 


B.2 Random Variables

A random variable $X$ is a function from a sample space $S$ into the real line. This induces a new sample space, the real line, and a new probability function on the real line. Typically, we denote random variables by uppercase letters such as $X$, and use lower case letters such as $x$ for potential values and realized values. (This is in contrast to the notation adopted for most of the textbook.) For a random variable $X$ we define its cumulative distribution function (CDF) as

$$F(x) = \Pr(X \le x). \tag{B.2}$$

Sometimes we write this as $F_X(x)$ to denote that it is the CDF of $X$. A function $F(x)$ is a CDF if and only if the following three properties hold:

1. $\lim_{x \to -\infty} F(x) = 0$ and $\lim_{x \to \infty} F(x) = 1$
2. $F(x)$ is nondecreasing in $x$
3. $F(x)$ is right-continuous

We say that the random variable $X$ is discrete if $F(x)$ is a step function. In the latter case, the range of $X$ consists of a countable set of real numbers $\tau_1, ..., \tau_r$. The probability function for $X$ takes the form
$$\Pr(X = \tau_j) = \pi_j, \quad j = 1, ..., r \tag{B.3}$$
where $0 \le \pi_j \le 1$ and $\sum_{j=1}^r \pi_j = 1$.

We say that the random variable $X$ is continuous if $F(x)$ is continuous in $x$. In this case $\Pr(X = \tau) = 0$ for all $\tau \in \mathbb{R}$ so the representation (B.3) is unavailable. Instead, we represent the relative probabilities by the probability density function (PDF)
$$f(x) = \frac{d}{dx}F(x)$$

so that
$$F(x) = \int_{-\infty}^x f(u)\,du$$
and
$$\Pr(a \le X \le b) = \int_a^b f(u)\,du.$$

These expressions only make sense if $F(x)$ is differentiable. While there are examples of continuous random variables which do not possess a PDF, these cases are unusual and are typically ignored.

A function $f(x)$ is a PDF if and only if $f(x) \ge 0$ for all $x \in \mathbb{R}$ and $\int_{-\infty}^\infty f(x)\,dx = 1$.

B.3 Expectation

For any measurable real function $g$, we define the mean or expectation $\mathrm{E}g(X)$ as follows. If $X$ is discrete,
$$\mathrm{E}g(X) = \sum_{j=1}^r g(\tau_j)\,\pi_j,$$
and if $X$ is continuous
$$\mathrm{E}g(X) = \int_{-\infty}^\infty g(x)f(x)\,dx. \tag{B.4}$$
The latter is well defined and finite if
$$\int_{-\infty}^\infty |g(x)|\,f(x)\,dx < \infty.$$
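A sketch of both cases: a discrete expectation computed as the weighted sum in the text, and a continuous one approximated by a Riemann sum (the two distributions are illustrative):

```python
import numpy as np

# Discrete: Eg(X) = sum_j g(tau_j) pi_j
tau = np.array([0.0, 1.0, 2.0])
pi = np.array([0.5, 0.3, 0.2])
g = lambda x: x ** 2
E_discrete = np.sum(g(tau) * pi)     # = 0.3 + 0.8 = 1.1

# Continuous: X ~ Uniform[0, 1], so Eg(X) = integral of x^2 over [0, 1] = 1/3
x = np.linspace(0.0, 1.0, 100001)
f = np.ones_like(x)                  # uniform density on [0, 1]
E_continuous = np.sum(g(x) * f) * (x[1] - x[0])   # simple Riemann approximation
print(E_discrete, E_continuous)      # 1.1 and ~0.3333
```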
