APPENDIX A. MATRIX ALGEBRA
Also, for $k \times r$ $A$ and $r \times k$ $B$ we have
\[
\operatorname{tr}(AB) = \operatorname{tr}(BA). \tag{A.1}
\]
Indeed, writing $a_i'$ for the $i$th row of $A$ and $b_j$ for the $j$th column of $B$,
\[
\operatorname{tr}(AB) = \operatorname{tr}\begin{bmatrix}
a_1' b_1 & a_1' b_2 & \cdots & a_1' b_k \\
a_2' b_1 & a_2' b_2 & \cdots & a_2' b_k \\
\vdots & \vdots & \ddots & \vdots \\
a_k' b_1 & a_k' b_2 & \cdots & a_k' b_k
\end{bmatrix}
= \sum_{i=1}^{k} a_i' b_i
= \sum_{i=1}^{k} b_i' a_i
= \operatorname{tr}(BA).
\]
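The identity $\operatorname{tr}(AB) = \operatorname{tr}(BA)$ is easy to confirm numerically. The following sketch (not part of the original text; the dimensions and random seed are arbitrary choices) uses numpy:

```python
import numpy as np

# Numerical check of tr(AB) = tr(BA) for non-square conformable matrices.
# Note AB is 3x3 while BA is 5x5, yet the traces agree.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5))   # k x r with k=3, r=5
B = rng.standard_normal((5, 3))   # r x k

lhs = np.trace(A @ B)
rhs = np.trace(B @ A)
assert np.isclose(lhs, rhs)
```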
A.5 Rank and Inverse

The rank of the $k \times r$ matrix ($r \le k$)
\[
A = \begin{bmatrix} a_1 & a_2 & \cdots & a_r \end{bmatrix}
\]
is the number of linearly independent columns $a_j$, and is written as $\operatorname{rank}(A)$. We say that $A$ has full rank if $\operatorname{rank}(A) = r$.

A square $k \times k$ matrix $A$ is said to be nonsingular if it has full rank, i.e. $\operatorname{rank}(A) = k$. This means that there is no $k \times 1$ vector $c \ne 0$ such that $Ac = 0$.

If a square $k \times k$ matrix $A$ is nonsingular then there exists a unique $k \times k$ matrix $A^{-1}$, called the inverse of $A$, which satisfies
\[
A A^{-1} = A^{-1} A = I_k.
\]
For nonsingular $A$ and $C$, some important properties include
\begin{align*}
A A^{-1} &= A^{-1} A = I_k \\
(A^{-1})' &= (A')^{-1} \\
(AC)^{-1} &= C^{-1} A^{-1} \\
(A + C)^{-1} &= A^{-1} \left( A^{-1} + C^{-1} \right)^{-1} C^{-1} \\
A^{-1} - (A + C)^{-1} &= A^{-1} \left( A^{-1} + C^{-1} \right)^{-1} A^{-1}.
\end{align*}
Also, if $A$ is an orthogonal matrix, then $A^{-1} = A'$.
Another useful result for nonsingular $A$ is known as the Woodbury matrix identity
\[
(A + BCD)^{-1} = A^{-1} - A^{-1} B C \left( C + C D A^{-1} B C \right)^{-1} C D A^{-1}. \tag{A.2}
\]
In particular, for $C = -1$, $B = b$ and $D = b'$ for vector $b$ we find what is known as the Sherman--Morrison formula
\[
\left( A - b b' \right)^{-1} = A^{-1} + \left( 1 - b' A^{-1} b \right)^{-1} A^{-1} b b' A^{-1}. \tag{A.3}
\]
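The Sherman--Morrison formula (A.3) can be checked against a direct inverse. This is an illustrative sketch (not from the text; the matrix and vector are arbitrary, with $A$ made well-conditioned so both sides exist):

```python
import numpy as np

# Check (A - bb')^{-1} = A^{-1} + (1 - b'A^{-1}b)^{-1} A^{-1} b b' A^{-1}.
rng = np.random.default_rng(1)
k = 4
A = rng.standard_normal((k, k)) + 10 * np.eye(k)   # well-conditioned, nonsingular
b = rng.standard_normal((k, 1))

Ainv = np.linalg.inv(A)
direct = np.linalg.inv(A - b @ b.T)                # brute-force inverse
formula = Ainv + (Ainv @ b @ b.T @ Ainv) / (1.0 - (b.T @ Ainv @ b).item())
assert np.allclose(direct, formula)
```

The formula replaces a full $k \times k$ inversion with a rank-one update, which is why it appears so often in recursive estimation.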
The following fact about inverting partitioned matrices is quite useful.
\[
\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}^{-1}
= \begin{bmatrix}
A_{11 \cdot 2}^{-1} & -A_{11 \cdot 2}^{-1} A_{12} A_{22}^{-1} \\
-A_{22 \cdot 1}^{-1} A_{21} A_{11}^{-1} & A_{22 \cdot 1}^{-1}
\end{bmatrix} \tag{A.4}
\]
where $A_{11 \cdot 2} = A_{11} - A_{12} A_{22}^{-1} A_{21}$ and $A_{22 \cdot 1} = A_{22} - A_{21} A_{11}^{-1} A_{12}$. There are alternative algebraic representations for the components. For example, writing $A^{ij}$ for the $(i,j)$ block of the inverse, using the Woodbury matrix identity you can show the following alternative expressions
\begin{align*}
A^{11} &= A_{11}^{-1} + A_{11}^{-1} A_{12} A_{22 \cdot 1}^{-1} A_{21} A_{11}^{-1} \\
A^{22} &= A_{22}^{-1} + A_{22}^{-1} A_{21} A_{11 \cdot 2}^{-1} A_{12} A_{22}^{-1} \\
A^{12} &= -A_{11}^{-1} A_{12} A_{22 \cdot 1}^{-1} \\
A^{21} &= -A_{22}^{-1} A_{21} A_{11 \cdot 2}^{-1}.
\end{align*}
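Formula (A.4) can be verified numerically. A minimal sketch (not part of the text; the matrix and block sizes are arbitrary choices, built so every required block is nonsingular):

```python
import numpy as np

# Build the partitioned inverse of (A.4) block by block and compare it with
# a direct inverse. Blocks are 2x2 (A11) and 3x3 (A22).
rng = np.random.default_rng(2)
G = rng.standard_normal((5, 5))
M = G @ G.T + 5 * np.eye(5)            # symmetric, safely nonsingular

A11, A12 = M[:2, :2], M[:2, 2:]
A21, A22 = M[2:, :2], M[2:, 2:]
A11_2 = A11 - A12 @ np.linalg.inv(A22) @ A21   # A_{11.2}
A22_1 = A22 - A21 @ np.linalg.inv(A11) @ A12   # A_{22.1}

top = np.hstack([np.linalg.inv(A11_2),
                 -np.linalg.inv(A11_2) @ A12 @ np.linalg.inv(A22)])
bot = np.hstack([-np.linalg.inv(A22_1) @ A21 @ np.linalg.inv(A11),
                 np.linalg.inv(A22_1)])
assert np.allclose(np.vstack([top, bot]), np.linalg.inv(M))
```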
Even if a matrix $A$ does not possess an inverse, we can still define the Moore--Penrose generalized inverse $A^-$ as the matrix which satisfies
\begin{align*}
& A A^- A = A \\
& A^- A A^- = A^- \\
& A A^- \text{ is symmetric} \\
& A^- A \text{ is symmetric.}
\end{align*}
For any matrix $A$, the Moore--Penrose generalized inverse $A^-$ exists and is unique.

For example, if
\[
A = \begin{bmatrix} A_{11} & 0 \\ 0 & 0 \end{bmatrix}
\]
with $A_{11}$ nonsingular, then
\[
A^- = \begin{bmatrix} A_{11}^{-1} & 0 \\ 0 & 0 \end{bmatrix}.
\]
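The four defining conditions can be confirmed with numpy's pseudoinverse. A quick sketch (not from the text; the rank-deficient matrix is an arbitrary example):

```python
import numpy as np

# Check the four Moore-Penrose conditions for a rank-deficient matrix.
A = np.array([[1.0, 2.0], [2.0, 4.0], [0.0, 0.0]])   # rank 1, no inverse
Ap = np.linalg.pinv(A)                                # Moore-Penrose inverse

assert np.allclose(A @ Ap @ A, A)          # A A^- A = A
assert np.allclose(Ap @ A @ Ap, Ap)        # A^- A A^- = A^-
assert np.allclose(A @ Ap, (A @ Ap).T)     # A A^- symmetric
assert np.allclose(Ap @ A, (Ap @ A).T)     # A^- A symmetric
```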
A.6 Determinant

The determinant is a measure of the volume of a square matrix.

While the determinant is widely used, its precise definition is rarely needed. However, we present the definition here for completeness. Let $A = (a_{ij})$ be a general $k \times k$ matrix. Let $\pi = (j_1, \ldots, j_k)$ denote a permutation of $(1, \ldots, k)$. There are $k!$ such permutations. There is a unique count of the number of inversions of the indices of such permutations (relative to the natural order $(1, \ldots, k)$), and let $\varepsilon_\pi = +1$ if this count is even and $\varepsilon_\pi = -1$ if the count is odd. Then the determinant of $A$ is defined as
\[
\det A = \sum_\pi \varepsilon_\pi \, a_{1 j_1} a_{2 j_2} \cdots a_{k j_k}.
\]
For example, if $A$ is $2 \times 2$, then the two permutations of $(1, 2)$ are $(1, 2)$ and $(2, 1)$, for which $\varepsilon_{(1,2)} = 1$ and $\varepsilon_{(2,1)} = -1$. Thus
\begin{align*}
\det A &= \varepsilon_{(1,2)} a_{11} a_{22} + \varepsilon_{(2,1)} a_{21} a_{12} \\
&= a_{11} a_{22} - a_{12} a_{21}.
\end{align*}
Some properties include

- $\det(A) = \det(A')$
- $\det(cA) = c^k \det A$
- $\det(AB) = (\det A)(\det B)$
- $\det\left( A^{-1} \right) = (\det A)^{-1}$
- $\det \begin{bmatrix} A & B \\ C & D \end{bmatrix} = (\det D) \det\left( A - B D^{-1} C \right)$ if $\det D \ne 0$
- $\det A \ne 0$ if and only if $A$ is nonsingular
- If $A$ is triangular (upper or lower), then $\det A = \prod_{i=1}^k a_{ii}$
- If $A$ is orthogonal, then $\det A = \pm 1$
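Several of these determinant properties are easy to spot-check numerically. A sketch (not part of the text; matrix size, seed, and scalar are arbitrary):

```python
import numpy as np

# Spot-check det properties on random 3x3 matrices.
rng = np.random.default_rng(3)
k = 3
A = rng.standard_normal((k, k))
B = rng.standard_normal((k, k))
c = 2.5

assert np.isclose(np.linalg.det(A), np.linalg.det(A.T))               # det A = det A'
assert np.isclose(np.linalg.det(c * A), c**k * np.linalg.det(A))      # det(cA) = c^k det A
assert np.isclose(np.linalg.det(A @ B),
                  np.linalg.det(A) * np.linalg.det(B))                # det(AB) = det A det B
assert np.isclose(np.linalg.det(np.linalg.inv(A)),
                  1.0 / np.linalg.det(A))                             # det(A^{-1}) = 1/det A
```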
A.7 Eigenvalues

The characteristic equation of a square matrix $A$ is
\[
\det(A - \lambda I_k) = 0.
\]
The left side is a polynomial of degree $k$ in $\lambda$ so it has exactly $k$ roots, which are not necessarily distinct and may be real or complex. They are called the latent roots or characteristic roots or eigenvalues of $A$. If $\lambda_i$ is an eigenvalue of $A$, then $A - \lambda_i I_k$ is singular so there exists a non-zero vector $h_i$ such that
\[
(A - \lambda_i I_k) h_i = 0.
\]
The vector $h_i$ is called a latent vector or characteristic vector or eigenvector of $A$ corresponding to $\lambda_i$.

We now state some useful properties. Let $\lambda_i$ and $h_i$, $i = 1, \ldots, k$, denote the $k$ eigenvalues and eigenvectors of a square matrix $A$. Let $\Lambda$ be a diagonal matrix with the characteristic roots in the diagonal, and let $H = [h_1 \; \cdots \; h_k]$.
- $\det(A) = \prod_{i=1}^k \lambda_i$
- $\operatorname{tr}(A) = \sum_{i=1}^k \lambda_i$
- $A$ is nonsingular if and only if all its characteristic roots are non-zero.
- If $A$ has distinct characteristic roots, there exists a nonsingular matrix $P$ such that $A = P^{-1} \Lambda P$ and $P A P^{-1} = \Lambda$.
- If $A$ is symmetric, then $A = H \Lambda H'$ and $H' A H = \Lambda$, and the characteristic roots are all real. $A = H \Lambda H'$ is called the spectral decomposition of a matrix.
- The characteristic roots of $A^{-1}$ are $\lambda_1^{-1}, \lambda_2^{-1}, \ldots, \lambda_k^{-1}$.
- The matrix $H$ has the orthonormal properties $H' H = I$ and $H H' = I$.
- $H^{-1} = H'$ and $(H')^{-1} = H$.
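The spectral decomposition and the trace/determinant identities can be verified together. A sketch (not from the text; the symmetric test matrix is arbitrary):

```python
import numpy as np

# Spectral decomposition A = H Lambda H' of a symmetric matrix, plus
# det(A) = prod(lambda_i) and tr(A) = sum(lambda_i).
rng = np.random.default_rng(4)
G = rng.standard_normal((4, 4))
A = G + G.T                        # symmetric, so eigenvalues are real

lam, H = np.linalg.eigh(A)         # eigenvalues and orthonormal eigenvectors
assert np.allclose(H @ np.diag(lam) @ H.T, A)     # A = H Lambda H'
assert np.allclose(H.T @ H, np.eye(4))            # H'H = I
assert np.isclose(np.prod(lam), np.linalg.det(A))
assert np.isclose(np.sum(lam), np.trace(A))
```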
A.8 Positive Definiteness

We say that a $k \times k$ symmetric square matrix $A$ is positive semi-definite if for all $c \ne 0$, $c' A c \ge 0$. This is written as $A \ge 0$. We say that $A$ is positive definite if for all $c \ne 0$, $c' A c > 0$. This is written as $A > 0$.

Some properties include:
- If $A = G' G$ for some matrix $G$, then $A$ is positive semi-definite. (For any $c \ne 0$, $c' A c = \alpha' \alpha \ge 0$ where $\alpha = G c$.) If $G$ has full rank, then $A$ is positive definite.
- If $A$ is positive definite, then $A$ is nonsingular and $A^{-1}$ exists. Furthermore, $A^{-1} > 0$.
- $A > 0$ if and only if it is symmetric and all its characteristic roots are positive.
- By the spectral decomposition, $A = H \Lambda H'$ where $H' H = I$ and $\Lambda$ is diagonal with non-negative diagonal elements. All diagonal elements of $\Lambda$ are strictly positive if (and only if) $A > 0$.
- If $A > 0$ then $A^{-1} = H \Lambda^{-1} H'$.
- If $A \ge 0$ and $\operatorname{rank}(A) = r < k$ then $A^- = H \Lambda^- H'$ where $A^-$ is the Moore--Penrose generalized inverse and $\Lambda^- = \operatorname{diag}\left( \lambda_1^{-1}, \lambda_2^{-1}, \ldots, \lambda_r^{-1}, 0, \ldots, 0 \right)$.
- If $A > 0$ we can find a matrix $B$ such that $A = B B'$. We call $B$ a matrix square root of $A$. The matrix $B$ need not be unique. One way to construct $B$ is to use the spectral decomposition $A = H \Lambda H'$ where $\Lambda$ is diagonal, and then set $B = H \Lambda^{1/2}$.
A square matrix $A$ is idempotent if $A A = A$. If $A$ is idempotent and symmetric then all its characteristic roots equal either zero or one and $A$ is thus positive semi-definite. To see this, note that we can write $A = H \Lambda H'$ where $H$ is orthogonal and $\Lambda$ contains the $r$ (real) characteristic roots. Then
\[
A = A A = H \Lambda H' H \Lambda H' = H \Lambda^2 H'.
\]
By the uniqueness of the characteristic roots, we deduce that $\Lambda^2 = \Lambda$ and $\lambda_i^2 = \lambda_i$ for $i = 1, \ldots, r$. Hence they must equal either 0 or 1. It follows that the spectral decomposition of idempotent $A$ takes the form
\[
A = H \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix} H' \tag{A.5}
\]
with $H' H = I_k$. Additionally, $\operatorname{tr}(A) = \operatorname{rank}(A)$.
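A familiar idempotent symmetric example is the least-squares projection matrix. The following sketch (not from the text; the data matrix is an arbitrary full-rank example) confirms the zero-one eigenvalues and $\operatorname{tr}(A) = \operatorname{rank}(A)$:

```python
import numpy as np

# The projection P = X(X'X)^{-1}X' is symmetric and idempotent; its
# eigenvalues are 0 or 1 and its trace equals its rank.
rng = np.random.default_rng(5)
X = rng.standard_normal((6, 2))                    # full column rank
P = X @ np.linalg.inv(X.T @ X) @ X.T

assert np.allclose(P @ P, P)                       # idempotent
lam = np.linalg.eigvalsh(P)                        # ascending eigenvalues
assert np.allclose(lam, [0, 0, 0, 0, 1, 1], atol=1e-8)
assert np.isclose(np.trace(P), np.linalg.matrix_rank(P))
```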
A.9 Matrix Calculus

Let $x = (x_1, \ldots, x_k)$ be $k \times 1$ and $g(x) = g(x_1, \ldots, x_k) : \mathbb{R}^k \to \mathbb{R}$. The vector derivative is
\[
\frac{\partial}{\partial x} g(x) = \begin{pmatrix} \dfrac{\partial}{\partial x_1} g(x) \\ \vdots \\ \dfrac{\partial}{\partial x_k} g(x) \end{pmatrix}
\]
and
\[
\frac{\partial}{\partial x'} g(x) = \begin{pmatrix} \dfrac{\partial}{\partial x_1} g(x) & \cdots & \dfrac{\partial}{\partial x_k} g(x) \end{pmatrix}.
\]
Some properties are now summarized.

- $\dfrac{\partial}{\partial x} (a' x) = \dfrac{\partial}{\partial x} (x' a) = a$
- $\dfrac{\partial}{\partial x'} (A x) = A$
- $\dfrac{\partial}{\partial x} (x' A x) = (A + A') x$
- $\dfrac{\partial^2}{\partial x \, \partial x'} (x' A x) = A + A'$
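The quadratic-form derivative rule can be checked against finite differences. A minimal sketch (not from the text; $A$, $x$, and the step size are arbitrary):

```python
import numpy as np

# Finite-difference check of d(x'Ax)/dx = (A + A')x.
rng = np.random.default_rng(6)
k = 3
A = rng.standard_normal((k, k))
x = rng.standard_normal(k)

g = lambda v: v @ A @ v                    # the quadratic form x'Ax
eps = 1e-6
numeric = np.array([(g(x + eps * e) - g(x - eps * e)) / (2 * eps)
                    for e in np.eye(k)])   # central differences
assert np.allclose(numeric, (A + A.T) @ x, atol=1e-5)
```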
A.10 Kronecker Products and the Vec Operator

Let $A = [a_1 \; a_2 \; \cdots \; a_n]$ be $m \times n$. The vec of $A$, denoted by $\operatorname{vec}(A)$, is the $mn \times 1$ vector
\[
\operatorname{vec}(A) = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix}.
\]
Let $A = (a_{ij})$ be an $m \times n$ matrix and let $B$ be any matrix. The Kronecker product of $A$ and $B$, denoted $A \otimes B$, is the matrix
\[
A \otimes B = \begin{bmatrix}
a_{11} B & a_{12} B & \cdots & a_{1n} B \\
a_{21} B & a_{22} B & \cdots & a_{2n} B \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} B & a_{m2} B & \cdots & a_{mn} B
\end{bmatrix}.
\]
Some important properties are now summarized. These results hold for matrices for which all matrix multiplications are conformable.

- $(A + B) \otimes C = A \otimes C + B \otimes C$
- $(A \otimes B)(C \otimes D) = AC \otimes BD$
- $A \otimes (B \otimes C) = (A \otimes B) \otimes C$
- $(A \otimes B)' = A' \otimes B'$
- $\operatorname{tr}(A \otimes B) = \operatorname{tr}(A) \operatorname{tr}(B)$
- If $A$ is $m \times m$ and $B$ is $n \times n$, $\det(A \otimes B) = (\det A)^n (\det B)^m$
- $(A \otimes B)^{-1} = A^{-1} \otimes B^{-1}$
- If $A > 0$ and $B > 0$ then $A \otimes B > 0$
- $\operatorname{vec}(ABC) = (C' \otimes A) \operatorname{vec}(B)$
- $\operatorname{tr}(ABCD) = \operatorname{vec}(D')' (C' \otimes A) \operatorname{vec}(B)$
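The workhorse identity $\operatorname{vec}(ABC) = (C' \otimes A)\operatorname{vec}(B)$ can be checked with numpy, taking care to stack columns as in the definition of vec. A sketch (not from the text; shapes are arbitrary):

```python
import numpy as np

# vec(ABC) = (C' kron A) vec(B), with vec stacking columns (Fortran order),
# plus the transpose rule (A kron B)' = A' kron B'.
rng = np.random.default_rng(7)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 4))
C = rng.standard_normal((4, 2))

vec = lambda M: M.reshape(-1, order="F")      # stack the columns of M
assert np.allclose(vec(A @ B @ C), np.kron(C.T, A) @ vec(B))
assert np.allclose(np.kron(A, B).T, np.kron(A.T, B.T))
```

numpy's default row-major `reshape` would stack rows instead, which is why `order="F"` matters here.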
A.11 Vector and Matrix Norms and Inequalities

The Euclidean norm of an $m \times 1$ vector $a$ is
\[
\| a \| = \left( a' a \right)^{1/2} = \left( \sum_{i=1}^m a_i^2 \right)^{1/2}.
\]
The Euclidean norm of an $m \times n$ matrix $A$ is
\[
\| A \| = \| \operatorname{vec}(A) \| = \left( \operatorname{tr}\left( A' A \right) \right)^{1/2} = \left( \sum_{i=1}^m \sum_{j=1}^n a_{ij}^2 \right)^{1/2}.
\]
Proof of Triangle Inequality: Let $a = \operatorname{vec}(A)$ and $b = \operatorname{vec}(B)$. Then by the definition of the matrix norm and the Schwarz Inequality
\begin{align*}
\| A + B \|^2 &= \| a + b \|^2 \\
&= a' a + 2 a' b + b' b \\
&\le a' a + 2 \left| a' b \right| + b' b \\
&\le \| a \|^2 + 2 \| a \| \| b \| + \| b \|^2 \\
&= \left( \| a \| + \| b \| \right)^2 = \left( \| A \| + \| B \| \right)^2.
\end{align*}
Proof of Trace Inequality. By the spectral decomposition for symmetric matrices, $A = H \Lambda H'$ where $\Lambda$ has the eigenvalues $\lambda_j$ of $A$ on the diagonal and $H$ is orthonormal. Define $C = H' B H$, which has non-negative diagonal elements $C_{jj}$ since $B$ is positive semi-definite. Then
\[
\operatorname{tr}(AB) = \operatorname{tr}(\Lambda C) = \sum_{j=1}^m \lambda_j C_{jj} \le \lambda_{\max}(A) \sum_{j=1}^m C_{jj} = \lambda_{\max}(A) \operatorname{tr}(C)
\]
where the inequality uses the fact that $C_{jj} \ge 0$. But note that
\[
\operatorname{tr}(C) = \operatorname{tr}\left( H' B H \right) = \operatorname{tr}\left( H H' B \right) = \operatorname{tr}(B)
\]
since $H$ is orthonormal. Thus $\operatorname{tr}(AB) \le \lambda_{\max}(A) \operatorname{tr}(B)$ as stated.
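The trace inequality is easy to check numerically. A sketch (not from the text; the symmetric $A$ and positive semi-definite $B$ are arbitrary test matrices):

```python
import numpy as np

# Check tr(AB) <= lambda_max(A) tr(B) for symmetric A and psd B.
rng = np.random.default_rng(8)
G1 = rng.standard_normal((4, 4))
G2 = rng.standard_normal((4, 4))
A = G1 + G1.T                  # symmetric (eigenvalues may be negative)
B = G2 @ G2.T                  # positive semi-definite

lam_max = np.max(np.linalg.eigvalsh(A))
assert np.trace(A @ B) <= lam_max * np.trace(B) + 1e-10
```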
APPENDIX B. PROBABILITY
- $\Pr(B \cap A^c) = \Pr(B) - \Pr(A \cap B)$
- $\Pr(A \cup B) = \Pr(A) + \Pr(B) - \Pr(A \cap B)$
- If $A \subset B$ then $\Pr(A) \le \Pr(B)$
- Bonferroni's Inequality: $\Pr(A \cap B) \ge \Pr(A) + \Pr(B) - 1$
- Boole's Inequality: $\Pr(A \cup B) \le \Pr(A) + \Pr(B)$
For some elementary probability models, it is useful to have simple rules to count the number of objects in a set. These counting rules are facilitated by using the binomial coefficients, which are defined for nonnegative integers $n$ and $r$, $n \ge r$, as
\[
\binom{n}{r} = \frac{n!}{r! \, (n - r)!}.
\]
When counting the number of objects in a set, there are two important distinctions. Counting may be with replacement or without replacement. Counting may be ordered or unordered. For example, consider a lottery where you pick six numbers from the set 1, 2, ..., 49. This selection is without replacement if you are not allowed to select the same number twice, and is with replacement if this is allowed. Counting is ordered or not depending on whether the sequential order of the numbers is relevant to winning the lottery. Depending on these two distinctions, we have four expressions for the number of objects (possible arrangements) of size r from n objects.
\[
\begin{array}{l|cc}
 & \text{Without Replacement} & \text{With Replacement} \\
\hline
\text{Ordered} & \dfrac{n!}{(n-r)!} & n^r \\
\text{Unordered} & \dbinom{n}{r} & \dbinom{n+r-1}{r}
\end{array}
\]
In the lottery example, if counting is unordered and without replacement, the number of potential combinations is $\binom{49}{6} = 13{,}983{,}816$.
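The four counting formulas can be confirmed by brute-force enumeration for small $n$ and $r$. A sketch (not from the text; $n = 5$, $r = 3$ are arbitrary small values chosen so enumeration is cheap):

```python
from itertools import permutations, combinations, product, combinations_with_replacement
from math import comb, factorial

# Enumerate arrangements of size r from n objects and compare the counts
# with the four formulas in the table above.
n, r = 5, 3
items = range(n)

assert len(list(permutations(items, r))) == factorial(n) // factorial(n - r)  # ordered, w/o repl.
assert len(list(product(items, repeat=r))) == n**r                            # ordered, with repl.
assert len(list(combinations(items, r))) == comb(n, r)                        # unordered, w/o repl.
assert len(list(combinations_with_replacement(items, r))) == comb(n + r - 1, r)
assert comb(49, 6) == 13_983_816                                              # the lottery count
```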
If $\Pr(B) > 0$ the conditional probability of the event $A$ given the event $B$ is
\[
\Pr(A \mid B) = \frac{\Pr(A \cap B)}{\Pr(B)}.
\]
For any $B$, the conditional probability function is a valid probability function where $S$ has been replaced by $B$. Rearranging the definition, we can write
\[
\Pr(A \cap B) = \Pr(A \mid B) \Pr(B)
\]
which is often quite useful. We can say that the occurrence of $B$ has no information about the likelihood of event $A$ when $\Pr(A \mid B) = \Pr(A)$, in which case we find
\[
\Pr(A \cap B) = \Pr(A) \Pr(B). \tag{B.1}
\]
We say that the events $A$ and $B$ are statistically independent when (B.1) holds. Furthermore, we say that the collection of events $A_1, \ldots, A_k$ are mutually independent when for any subset $\{A_i : i \in I\}$,
\[
\Pr\left( \bigcap_{i \in I} A_i \right) = \prod_{i \in I} \Pr(A_i).
\]
Theorem 1 (Bayes' Rule). For any set $B$ and any partition $A_1, A_2, \ldots$ of the sample space, then for each $i = 1, 2, \ldots$
\[
\Pr(A_i \mid B) = \frac{\Pr(B \mid A_i) \Pr(A_i)}{\sum_{j=1}^{\infty} \Pr(B \mid A_j) \Pr(A_j)}.
\]
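Bayes' Rule is easy to apply with a finite partition. A small worked sketch (not from the text; the prevalence and test-accuracy numbers are hypothetical, chosen only for illustration) uses exact fractions so the arithmetic is transparent:

```python
from fractions import Fraction

# Two-event partition {D, Dc} (e.g. disease / no disease) and an observed
# event B (a positive test). All probabilities below are hypothetical.
prior = {"D": Fraction(1, 100), "Dc": Fraction(99, 100)}    # Pr(A_i)
lik = {"D": Fraction(95, 100), "Dc": Fraction(10, 100)}     # Pr(B | A_i)

denom = sum(lik[a] * prior[a] for a in prior)               # sum_j Pr(B|A_j) Pr(A_j)
posterior = lik["D"] * prior["D"] / denom                   # Pr(D | B) by Bayes' Rule
assert posterior == Fraction(95, 1085)                      # = 19/217, about 0.0876
```

Note how a strong prior (1% prevalence) keeps the posterior below 10% even with a fairly accurate test.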