
matrix identities

sam roweis

(revised June 1999)

note that a, b, c and A, B, C do not depend on X, Y, x, y or z

0.1 basic formulae

A(B + C) = AB + AC    (1a)

(A + B)^T = A^T + B^T    (1b)

(AB)^T = B^T A^T    (1c)

(AB)^{-1} = B^{-1} A^{-1}   (if the individual inverses exist)    (1d)

(A^{-1})^T = (A^T)^{-1}    (1e)
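
these are easy to spot-check numerically. a minimal numpy sketch (not part of the original note; sizes and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
A, B, C = (rng.standard_normal((4, 4)) for _ in range(3))
inv = np.linalg.inv

assert np.allclose(A @ (B + C), A @ B + A @ C)   # (1a)
assert np.allclose((A + B).T, A.T + B.T)         # (1b)
assert np.allclose((A @ B).T, B.T @ A.T)         # (1c)
# (1d)/(1e) need the individual inverses to exist; random gaussian
# matrices are invertible with probability 1
assert np.allclose(inv(A @ B), inv(B) @ inv(A))  # (1d)
assert np.allclose(inv(A).T, inv(A.T))           # (1e)
```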

0.2 trace, determinant and rank

 

 

 

 

|AB| = |A| |B|    (2a)

|A^{-1}| = 1 / |A|    (2b)

|A| = ∏ evals    (2c)

Tr[A] = ∑ evals    (2d)

Tr[ABC…] = Tr[BC…A] = Tr[C…AB] = …   (if the cyclic products are well defined)    (2e)

rank[A] = rank[A^T A] = rank[A A^T]    (2f)

condition number = γ = √(biggest eval / smallest eval)   (the evals here are those of A^T A)    (2g)
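
a similar numerical spot check for this section (arbitrary sizes and seed); the condition-number line uses the eigenvalues of A^T A, matching (2g):

```python
import numpy as np

rng = np.random.default_rng(1)
A, B, C = (rng.standard_normal((5, 5)) for _ in range(3))
det = np.linalg.det
evals = np.linalg.eigvals(A)                 # complex in general

assert np.isclose(det(A @ B), det(A) * det(B))               # (2a)
assert np.isclose(det(np.linalg.inv(A)), 1 / det(A))         # (2b)
assert np.isclose(np.prod(evals).real, det(A))               # (2c)
assert np.isclose(np.sum(evals).real, np.trace(A))           # (2d)
assert np.isclose(np.trace(A @ B @ C), np.trace(B @ C @ A))  # (2e)
X = rng.standard_normal((5, 3))
r = np.linalg.matrix_rank
assert r(X) == r(X.T @ X) == r(X @ X.T)                      # (2f)
w = np.linalg.eigvalsh(A.T @ A)              # eigenvalues of A^T A
assert np.isclose(np.linalg.cond(A), np.sqrt(w.max() / w.min()))  # (2g)
```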

 

 

 

 

derivatives of scalar forms with respect to scalars, vectors, or matrices are indexed in the obvious way. similarly, the indexing for derivatives of vectors and matrices with respect to scalars is straightforward.


0.3 derivatives of traces

 

 

 

 

 

 

 

∂Tr[X]/∂X = I    (3a)

∂Tr[XA]/∂X = ∂Tr[AX]/∂X = A^T    (3b)

∂Tr[X^T A]/∂X = ∂Tr[A X^T]/∂X = A    (3c)

∂Tr[X^T A X]/∂X = (A + A^T) X    (3d)

∂Tr[X^{-1} A]/∂X = -(X^{-1} A X^{-1})^T    (3e)
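
trace derivatives can be verified by central finite differences; a sketch for (3b) and (3e), with X kept near the identity so X^{-1} is well behaved (step size and tolerances are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))
X = np.eye(4) + 0.1 * rng.standard_normal((4, 4))

def numgrad(f, X, eps=1e-6):
    # central-difference gradient of a scalar function of a matrix
    G = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        dX = np.zeros_like(X)
        dX[idx] = eps
        G[idx] = (f(X + dX) - f(X - dX)) / (2 * eps)
    return G

assert np.allclose(numgrad(lambda X: np.trace(X @ A), X), A.T)  # (3b)
Xi = np.linalg.inv(X)
assert np.allclose(numgrad(lambda X: np.trace(np.linalg.inv(X) @ A), X),
                   -(Xi @ A @ Xi).T, atol=1e-5)                 # (3e)
```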

0.4 derivatives of determinants

 

 

 

∂|AXB|/∂X = |AXB| (X^{-1})^T = |AXB| (X^T)^{-1}    (4a)

∂ ln|X| / ∂X = (X^{-1})^T = (X^T)^{-1}    (4b)

∂ ln|X(z)| / ∂z = Tr[ X^{-1} ∂X/∂z ]    (4c)

∂|X^T A X| / ∂X = |X^T A X| ( A X (X^T A X)^{-1} + A^T X (X^T A^T X)^{-1} )    (4d)
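
the same finite-difference trick checks (4b) and (4c); a minimal sketch (X near the identity keeps det(X) positive):

```python
import numpy as np

rng = np.random.default_rng(3)
X = np.eye(4) + 0.1 * rng.standard_normal((4, 4))
D = rng.standard_normal((4, 4))       # plays the role of dX/dz
eps = 1e-6
logdet = lambda M: np.log(np.linalg.det(M))

# (4b): gradient of ln|X|, entry by entry
G = np.zeros_like(X)
for idx in np.ndindex(*X.shape):
    dX = np.zeros_like(X)
    dX[idx] = eps
    G[idx] = (logdet(X + dX) - logdet(X - dX)) / (2 * eps)
assert np.allclose(G, np.linalg.inv(X).T, atol=1e-6)

# (4c): the scalar z enters through X(z) = X + z D
g = (logdet(X + eps * D) - logdet(X - eps * D)) / (2 * eps)
assert np.isclose(g, np.trace(np.linalg.inv(X) @ D))
```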

 

 

 

0.5 derivatives of scalar forms

 

 

 

∂(a^T x)/∂x = ∂(x^T a)/∂x = a    (5a)

∂(x^T A x)/∂x = (A + A^T) x    (5b)

∂(a^T X b)/∂X = a b^T    (5c)

∂(a^T X^T b)/∂X = b a^T    (5d)

∂(a^T X a)/∂X = ∂(a^T X^T a)/∂X = a a^T    (5e)

∂(a^T X^T C X b)/∂X = C^T X a b^T + C X b a^T    (5f)

∂( (Xa + b)^T C (Xa + b) )/∂X = (C + C^T)(Xa + b) a^T    (5g)
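
and likewise for the scalar-form gradients; a sketch covering (5b), (5c) and (5f) (shapes and seed arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5
a, b, x = (rng.standard_normal(n) for _ in range(3))
A, C, X = (rng.standard_normal((n, n)) for _ in range(3))

def numgrad(f, M, eps=1e-6):
    # central-difference gradient of a scalar function of an array
    G = np.zeros_like(M)
    for idx in np.ndindex(*M.shape):
        dM = np.zeros_like(M)
        dM[idx] = eps
        G[idx] = (f(M + dM) - f(M - dM)) / (2 * eps)
    return G

assert np.allclose(numgrad(lambda x: x @ A @ x, x),
                   (A + A.T) @ x, atol=1e-6)                     # (5b)
assert np.allclose(numgrad(lambda X: a @ X @ b, X),
                   np.outer(a, b), atol=1e-6)                    # (5c)
assert np.allclose(numgrad(lambda X: a @ X.T @ C @ X @ b, X),
                   C.T @ X @ np.outer(a, b) + C @ X @ np.outer(b, a),
                   atol=1e-5)                                    # (5f)
```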


the derivative of one vector y with respect to another vector x is a matrix whose (i, j)th element is ∂y(j)/∂x(i). such a derivative should be written as ∂y^T/∂x, in which case it is the Jacobian matrix of y wrt x. its determinant represents the ratio of the hypervolume dy to that of dx, so that ∫ f(y) dy = ∫ f(y(x)) |∂y^T/∂x| dx. however, the sloppy forms ∂y/∂x, ∂y^T/∂x^T and ∂y/∂x^T are often used for this Jacobian matrix.
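
a tiny illustration of this convention with an arbitrary made-up map y(x) = (x_1^2, x_1 + 3 x_2); row i of J below holds ∂y(j)/∂x(i), so J is ∂y^T/∂x:

```python
import numpy as np

y = lambda x: np.array([x[0] ** 2, x[0] + 3 * x[1]])
x0 = np.array([1.0, 2.0])
eps = 1e-6

J = np.zeros((2, 2))          # J[i, j] = dy(j)/dx(i)
for i in range(2):
    dx = np.zeros(2)
    dx[i] = eps
    J[i] = (y(x0 + dx) - y(x0 - dx)) / (2 * eps)

# analytic Jacobian dy^T/dx at x0
assert np.allclose(J, [[2 * x0[0], 1.0], [0.0, 3.0]], atol=1e-6)
```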

0.6 derivatives of vector/matrix forms

 

 

 

∂(X^{-1})/∂z = -X^{-1} (∂X/∂z) X^{-1}    (6a)

∂(Ax)/∂z = A ∂x/∂z    (6b)

∂(XY)/∂z = X ∂Y/∂z + (∂X/∂z) Y    (6c)

∂(AXB)/∂z = A (∂X/∂z) B    (6d)

∂(x^T A)/∂x = A    (6e)

∂(x^T)/∂x = I    (6f)

∂(x^T A x x^T)/∂x = (A + A^T) x x^T + x^T A x I    (6g)
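
(6a) and (6c) can be checked with X and Y depending linearly on a scalar z; a sketch (arbitrary shapes and seed):

```python
import numpy as np

rng = np.random.default_rng(5)
X0 = np.eye(4) + 0.1 * rng.standard_normal((4, 4))  # well conditioned at z = 0
D, Y0, E = (rng.standard_normal((4, 4)) for _ in range(3))
X = lambda z: X0 + z * D                            # so dX/dz = D
Y = lambda z: Y0 + z * E                            # so dY/dz = E
inv = np.linalg.inv
eps = 1e-6

dXinv = (inv(X(eps)) - inv(X(-eps))) / (2 * eps)
assert np.allclose(dXinv, -inv(X0) @ D @ inv(X0), atol=1e-5)  # (6a)

dXY = (X(eps) @ Y(eps) - X(-eps) @ Y(-eps)) / (2 * eps)
assert np.allclose(dXY, X0 @ E + D @ Y0, atol=1e-6)           # (6c)
```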

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

0.7 constrained maximization

the maximum over x of the quadratic form:

μ^T x - (1/2) x^T A^{-1} x    (7a)

subject to the J conditions c_j(x) = 0 is given by:

x = Aμ + ACλ,   λ = -(C^T A C)^{-1} C^T A μ    (7b)

where the jth column of C is ∂c_j(x)/∂x
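
a numerical sketch of (7b) for the simplest case of linear constraints c_j(x) = c_j^T x, so that C is constant; A is taken symmetric positive definite to make the form concave (sizes and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(6)
n, J = 6, 2
G = rng.standard_normal((n, n))
A = G @ G.T + n * np.eye(n)      # symmetric positive definite
mu = rng.standard_normal(n)
C = rng.standard_normal((n, J))  # column j is dc_j(x)/dx

lam = -np.linalg.solve(C.T @ A @ C, C.T @ A @ mu)
x_star = A @ (mu + C @ lam)      # x* = A mu + A C lambda, eq (7b)

assert np.allclose(C.T @ x_star, 0)   # all J constraints hold
# stationarity of the Lagrangian: mu - A^{-1} x* + C lambda = 0
assert np.allclose(mu - np.linalg.solve(A, x_star) + C @ lam, 0)
```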

0.8 symmetric matrices

have real eigenvalues, though perhaps not distinct, and can always be diagonalized to the form:

A = C Λ C^T    (8)


where the columns of C are the (orthonormal) eigenvectors (i.e. C C^T = I) and the diagonal of Λ has the eigenvalues
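
numpy's eigh computes exactly this factorization for symmetric input; a quick check:

```python
import numpy as np

rng = np.random.default_rng(7)
S = rng.standard_normal((5, 5))
A = (S + S.T) / 2                               # a random symmetric matrix

lam, C = np.linalg.eigh(A)                      # lam is real for symmetric input
assert np.allclose(C @ C.T, np.eye(5))          # orthonormal eigenvectors: C C^T = I
assert np.allclose(C @ np.diag(lam) @ C.T, A)   # A = C Lambda C^T, eq (8)
```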

0.9 block matrices

for conformably partitioned block matrices, addition and multiplication are performed by adding and multiplying blocks in exactly the same way as the scalar elements of regular matrices.

however, determinants and inverses of block matrices are very tricky; for 2 blocks by 2 blocks the results are:

| A11  A12 |
| A21  A22 | = |A22| |F11| = |A11| |F22|    (9a)

[ A11  A12 ]^{-1}   [ F11^{-1}                 -A11^{-1} A12 F22^{-1} ]
[ A21  A22 ]      = [ -F22^{-1} A21 A11^{-1}    F22^{-1}              ]    (9b)

                    [ A11^{-1} + A11^{-1} A12 F22^{-1} A21 A11^{-1}    -F11^{-1} A12 A22^{-1} ]
                  = [ -A22^{-1} A21 F11^{-1}    A22^{-1} + A22^{-1} A21 F11^{-1} A12 A22^{-1} ]    (9c)

where

F11 = A11 - A12 A22^{-1} A21
F22 = A22 - A21 A11^{-1} A12
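
a direct numerical check of (9a) and (9b) (block sizes and seed are arbitrary; the 2I shift merely keeps the diagonal blocks comfortably invertible):

```python
import numpy as np

rng = np.random.default_rng(8)
n1, n2 = 3, 2
M = rng.standard_normal((n1 + n2, n1 + n2)) + 2 * np.eye(n1 + n2)
A11, A12 = M[:n1, :n1], M[:n1, n1:]
A21, A22 = M[n1:, :n1], M[n1:, n1:]
inv, det = np.linalg.inv, np.linalg.det

F11 = A11 - A12 @ inv(A22) @ A21
F22 = A22 - A21 @ inv(A11) @ A12

assert np.isclose(det(M), det(A22) * det(F11))        # (9a)
assert np.isclose(det(M), det(A11) * det(F22))        # (9a)
Minv = np.block([[inv(F11), -inv(A11) @ A12 @ inv(F22)],
                 [-inv(F22) @ A21 @ inv(A11), inv(F22)]])
assert np.allclose(Minv, inv(M))                      # (9b)
```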

 

for block diagonal matrices things are much easier:

 

| A11  0  |
| 0   A22 | = |A11| |A22|    (9d)

[ A11  0  ]^{-1}   [ A11^{-1}  0        ]
[ 0   A22 ]      = [ 0         A22^{-1} ]    (9e)

0.10 matrix inversion lemma (sherman-morrison-woodbury)

using the above results for block matrices we can make some substitutions and get the following important results:

(A + XBX^T)^{-1} = A^{-1} - A^{-1} X (B^{-1} + X^T A^{-1} X)^{-1} X^T A^{-1}    (10)

|A + XBX^T| = |B| |A| |B^{-1} + X^T A^{-1} X|    (11)

where A and B are square and invertible matrices but need not be of the same dimension. this lemma often allows a really hard inverse to be converted into an easy inverse. the most typical example of this is when A is large but diagonal, and X has many rows but few columns.
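
a sketch of exactly this typical case: A diagonal (so A^{-1} is trivial) and X tall and skinny, so the only nontrivial inverse on the right-hand side is k x k (sizes and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(9)
n, k = 8, 2                                  # X has many rows, few columns
A = np.diag(rng.uniform(1.0, 2.0, size=n))   # large but diagonal
Bh = rng.standard_normal((k, k))
B = Bh @ Bh.T + np.eye(k)                    # small, square, invertible
X = rng.standard_normal((n, k))
inv, det = np.linalg.inv, np.linalg.det

Ai = np.diag(1 / np.diag(A))                 # the "easy" inverse
lhs = inv(A + X @ B @ X.T)                   # the "hard" n x n inverse
rhs = Ai - Ai @ X @ inv(inv(B) + X.T @ Ai @ X) @ X.T @ Ai
assert np.allclose(lhs, rhs)                                     # (10)
assert np.isclose(det(A + X @ B @ X.T),
                  det(B) * det(A) * det(inv(B) + X.T @ Ai @ X))  # (11)
```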

