

18 Matrix decompositions and latent semantic indexing

On page 123 we introduced the notion of a term-document matrix: an M × N matrix C, each of whose rows represents a term and each of whose columns represents a document in the collection. Even for a collection of modest size, the term-document matrix C is likely to have several tens of thousands of rows and columns. In Section 18.1.1 we first develop a class of operations from linear algebra, known as matrix decomposition. In Section 18.2 we use a special form of matrix decomposition to construct a low-rank approximation to the term-document matrix. In Section 18.3 we examine the application of such low-rank approximations to indexing and retrieving documents, a technique referred to as latent semantic indexing. While latent semantic indexing has not been established as a significant force in scoring and ranking for information retrieval, it remains an intriguing approach to clustering in a number of domains, including collections of text documents (Section 16.6, page 372). Understanding its full potential remains an area of active research.

Readers who do not require a refresher on linear algebra may skip Section 18.1, although Example 18.1 is especially recommended as it highlights a property of eigenvalues that we exploit later in the chapter.

18.1 Linear algebra review

We briefly review some necessary background in linear algebra. Let C be an M × N matrix with real-valued entries; for a term-document matrix, all entries are in fact non-negative. The rank of a matrix is the number of linearly independent rows (or columns) in it; thus, rank(C) ≤ min{M, N}. A square r × r matrix all of whose off-diagonal entries are zero is called a diagonal matrix; its rank is equal to the number of non-zero diagonal entries. If all r diagonal entries of such a diagonal matrix are 1, it is called the identity matrix of dimension r and represented by I_r.
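These definitions are easy to check numerically; the following is a minimal NumPy sketch of ours (not part of the text) illustrating rank, diagonal matrices, and the identity matrix:

```python
import numpy as np

# A 3 x 2 matrix whose second column is twice the first:
# only one linearly independent column, so rank(C) = 1 <= min{M, N} = 2.
C = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])
print(np.linalg.matrix_rank(C))  # 1

# A diagonal matrix: its rank equals the number of non-zero diagonal entries.
D = np.diag([5.0, 0.0, 2.0])
print(np.linalg.matrix_rank(D))  # 2

# The identity matrix I_3 of dimension 3.
print(np.eye(3))
```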

For a square M × M matrix C and a vector $\vec{x}$ that is not all zeros, the values of λ satisfying

(18.1)    $$C\vec{x} = \lambda\vec{x}$$

are called the eigenvalues of C. The M-vector $\vec{x}$ satisfying Equation (18.1) for an eigenvalue λ is the corresponding right eigenvector. The eigenvector corresponding to the eigenvalue of largest magnitude is called the principal eigenvector. In a similar fashion, the left eigenvectors of C are the M-vectors $\vec{y}$ such that

(18.2)    $$\vec{y}^T C = \lambda\vec{y}^T.$$

The number of non-zero eigenvalues of C is at most rank(C).

The eigenvalues of a matrix are found by solving the characteristic equation, which is obtained by rewriting Equation (18.1) in the form $(C - \lambda I_M)\vec{x} = 0$. The eigenvalues of C are then the solutions of $|C - \lambda I_M| = 0$, where |S| denotes the determinant of a square matrix S. The equation $|C - \lambda I_M| = 0$ is an Mth-order polynomial equation in λ and can have at most M roots, which are the eigenvalues of C. These eigenvalues can in general be complex, even if all entries of C are real.
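As an illustration (a NumPy sketch of ours, not from the text), the eigenvalues of a small matrix can be obtained either from the characteristic polynomial or directly:

```python
import numpy as np

C = np.array([[3.0, 1.0],
              [0.0, 2.0]])

# Coefficients of the characteristic polynomial det(lambda*I - C):
# here lambda^2 - 5*lambda + 6, an Mth-order polynomial with M = 2.
coeffs = np.poly(C)
print(np.roots(coeffs))  # [3. 2.] -- the (at most M) roots are the eigenvalues

# The same eigenvalues, with their right eigenvectors, computed directly.
# Note that np.linalg.eig may return complex values even for a real matrix.
eigvals, eigvecs = np.linalg.eig(C)
print(eigvals)  # [3. 2.]
```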

We now examine some further properties of eigenvalues and eigenvectors, to set up the central idea of singular value decompositions in Section 18.2 below. First, we look at the relationship between matrix-vector multiplication and eigenvalues.

Example 18.1: Consider the matrix

$$S = \begin{pmatrix} 30 & 0 & 0 \\ 0 & 20 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$

Clearly the matrix has rank 3, and has 3 non-zero eigenvalues $\lambda_1 = 30$, $\lambda_2 = 20$ and $\lambda_3 = 1$, with the three corresponding eigenvectors

$$\vec{x}_1 = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \quad \vec{x}_2 = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} \quad \text{and} \quad \vec{x}_3 = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}.$$

For each of the eigenvectors, multiplication by S acts as if we were multiplying the eigenvector by a multiple of the identity matrix; the multiple is different for each eigenvector. Now, consider an arbitrary vector, such as $\vec{v} = \begin{pmatrix} 2 \\ 4 \\ 6 \end{pmatrix}$. We can always express $\vec{v}$ as a linear combination of the three eigenvectors of S; in the current example we have

$$\vec{v} = \begin{pmatrix} 2 \\ 4 \\ 6 \end{pmatrix} = 2\vec{x}_1 + 4\vec{x}_2 + 6\vec{x}_3.$$
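In general, such expansion coefficients can be found mechanically by collecting the eigenvectors as the columns of a matrix and solving a linear system; a small NumPy sketch of ours (not from the text):

```python
import numpy as np

# Columns are the eigenvectors x_1, x_2, x_3 from Example 18.1.
X = np.eye(3)
v = np.array([2.0, 4.0, 6.0])

# Solve X @ c = v for the coefficients of the linear combination.
c = np.linalg.solve(X, v)
print(c)  # [2. 4. 6.], i.e. v = 2*x_1 + 4*x_2 + 6*x_3
```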


Suppose we multiply $\vec{v}$ by S:

$$\begin{aligned}
S\vec{v} &= S(2\vec{x}_1 + 4\vec{x}_2 + 6\vec{x}_3) \\
&= 2S\vec{x}_1 + 4S\vec{x}_2 + 6S\vec{x}_3 \\
&= 2\lambda_1\vec{x}_1 + 4\lambda_2\vec{x}_2 + 6\lambda_3\vec{x}_3 \\
&= 60\vec{x}_1 + 80\vec{x}_2 + 6\vec{x}_3. \qquad (18.3)
\end{aligned}$$

Example 18.1 shows that even though $\vec{v}$ is an arbitrary vector, the effect of multiplication by S is determined by the eigenvalues and eigenvectors of S. Furthermore, it is intuitively apparent from Equation (18.3) that the product $S\vec{v}$ is relatively unaffected by terms arising from the small eigenvalues of S; in our example, since $\lambda_3 = 1$, the contribution of the third term on the right hand side of Equation (18.3) is small. In fact, if we were to completely ignore the contribution in Equation (18.3) from the third eigenvector corresponding to $\lambda_3 = 1$, then the product $S\vec{v}$ would be computed to be $\begin{pmatrix} 60 \\ 80 \\ 0 \end{pmatrix}$ rather than the correct product, which is $\begin{pmatrix} 60 \\ 80 \\ 6 \end{pmatrix}$; these two vectors are relatively close to each other by any of various metrics one could apply (such as the length of their vector difference).
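This closeness is easy to verify numerically; a quick NumPy check of ours (not from the text), using S and $\vec{v}$ from Example 18.1:

```python
import numpy as np

S = np.diag([30.0, 20.0, 1.0])
v = np.array([2.0, 4.0, 6.0])

exact = S @ v                             # [60. 80.  6.]
truncated = np.array([60.0, 80.0, 0.0])   # term for lambda_3 = 1 dropped

# Length of the vector difference, as suggested in the text.
diff = np.linalg.norm(exact - truncated)
print(diff)                               # 6.0
print(diff / np.linalg.norm(exact))       # ~0.06: a relative error of about 6%
```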

This suggests that the effect of small eigenvalues (and their eigenvectors) on a matrix-vector product is small. We will carry forward this intuition when studying matrix decompositions and low-rank approximations in Section 18.2. Before doing so, we examine the eigenvectors and eigenvalues of special forms of matrices that will be of particular interest to us.

For a symmetric matrix S, the eigenvectors corresponding to distinct eigenvalues are orthogonal. Further, if S is both real and symmetric, the eigenvalues are all real.

Example 18.2: Consider the real, symmetric matrix

(18.4)    $$S = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}.$$

From the characteristic equation $|S - \lambda I| = 0$, we have the quadratic $(2 - \lambda)^2 - 1 = 0$, whose solutions yield the eigenvalues 3 and 1. The corresponding eigenvectors $\begin{pmatrix} 1 \\ 1 \end{pmatrix}$ and $\begin{pmatrix} 1 \\ -1 \end{pmatrix}$ are orthogonal.
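Both properties can be checked numerically. For real symmetric matrices, NumPy provides np.linalg.eigh, which returns real eigenvalues and orthonormal eigenvectors; here is a brief sketch of ours (not from the text) on the matrix of Example 18.2:

```python
import numpy as np

S = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eigh is specialized for symmetric (Hermitian) matrices.
eigvals, eigvecs = np.linalg.eigh(S)
print(eigvals)  # [1. 3.] -- all real, in ascending order

# Eigenvectors for the two distinct eigenvalues are orthogonal:
print(eigvecs[:, 0] @ eigvecs[:, 1])  # 0.0 (up to floating-point error)
```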


18.1.1 Matrix decompositions

In this section we examine ways in which a square matrix can be factored into the product of matrices derived from its eigenvectors; we refer to this process as matrix decomposition. Matrix decompositions similar to the ones in this section will form the basis of our principal text-analysis technique in Section 18.3, where we will look at decompositions of non-square term-document matrices. The square decompositions in this section are simpler and can be treated with sufficient mathematical rigor to help the reader understand how such decompositions work. The detailed mathematical derivation of the more complex decompositions in Section 18.2 is beyond the scope of this book.

We begin by giving two theorems on the decomposition of a square matrix into the product of three matrices of a special form. The first of these, Theorem 18.1, gives the basic factorization of a square real-valued matrix into three factors. The second, Theorem 18.2, applies to square symmetric matrices and is the basis of the singular value decomposition described in Theorem 18.3.

Theorem 18.1. (Matrix diagonalization theorem) Let S be a square real-valued M × M matrix with M linearly independent eigenvectors. Then there exists an eigen decomposition

(18.5)    $$S = U \Lambda U^{-1},$$

where the columns of U are the eigenvectors of S and Λ is a diagonal matrix whose diagonal entries are the eigenvalues of S in decreasing order

$$\Lambda = \begin{pmatrix} \lambda_1 & & & \\ & \lambda_2 & & \\ & & \ddots & \\ & & & \lambda_M \end{pmatrix}, \quad \lambda_i \ge \lambda_{i+1}.$$

If the eigenvalues are distinct, then this decomposition is unique.

To understand how Theorem 18.1 works, we note that U has the eigenvectors of S as columns

(18.6)    $$U = (\vec{u}_1 \ \vec{u}_2 \ \cdots \ \vec{u}_M).$$

Then we have

(18.7)
$$\begin{aligned}
SU &= S(\vec{u}_1 \ \vec{u}_2 \ \cdots \ \vec{u}_M) \\
&= (\lambda_1\vec{u}_1 \ \lambda_2\vec{u}_2 \ \cdots \ \lambda_M\vec{u}_M) \\
&= (\vec{u}_1 \ \vec{u}_2 \ \cdots \ \vec{u}_M) \begin{pmatrix} \lambda_1 & & & \\ & \lambda_2 & & \\ & & \ddots & \\ & & & \lambda_M \end{pmatrix}.
\end{aligned}$$
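Equation (18.7) says $SU = U\Lambda$; since the columns of U are linearly independent, U is invertible and $S = U \Lambda U^{-1}$ follows. A short NumPy verification of ours (not from the text):

```python
import numpy as np

S = np.array([[3.0, 1.0],
              [0.0, 2.0]])

# Columns of U are the eigenvectors of S; Lambda holds the eigenvalues.
eigvals, U = np.linalg.eig(S)
Lam = np.diag(eigvals)

print(np.allclose(S @ U, U @ Lam))                 # True: SU = U Lambda
print(np.allclose(S, U @ Lam @ np.linalg.inv(U)))  # True: S = U Lambda U^{-1}
```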
