Chapter 8
Krylov subspaces
8.1 Introduction
In the power method or in the inverse vector iteration we computed, up to normalization,
sequences of the form
x, Ax, A^2x, . . .
The information available at the k-th step of the iteration is the single vector x^{(k)} = A^k x/‖A^k x‖. One may ask whether discarding all the previous iterates x^{(0)}, . . . , x^{(k−1)} does not waste valuable information. This question is not trivial to answer. On the one hand, keeping all iterates entails a considerable increase in memory requirements; on the other hand, exploiting all the information gathered up to a certain iteration step can give much better approximations to the desired solution. As an example, let us consider the symmetric matrix
T = \frac{51^2}{\pi^2}
\begin{bmatrix}
 2 & -1 &        &    \\
-1 &  2 & \ddots &    \\
   & \ddots & \ddots & -1 \\
   &        & -1     &  2
\end{bmatrix} \in R^{50\times 50},
the lowest eigenvalue of which is around 1. Let us choose x = [1, . . . , 1] and compute the first three iterates of inverse vector iteration, x, T^{−1}x, and T^{−2}x. We denote their Rayleigh quotients by ρ^{(1)}, ρ^{(2)}, and ρ^{(3)}, respectively.
k    ρ^{(k)}        ϑ_1^{(k)}      ϑ_2^{(k)}      ϑ_3^{(k)}
1    10.541456      10.541456
2     1.012822       1.009851      62.238885
3     0.999822       0.999693       9.910156     147.211990

Table 8.1: Ritz values ϑ_j^{(k)} vs. Rayleigh quotients ρ^{(k)} of inverse vector iterates.
The Ritz values ϑ_j^{(k)}, 1 ≤ j ≤ k, obtained with the Rayleigh–Ritz procedure applied to 𝒦^k(x) = span{x, T^{−1}x, . . . , T^{1−k}x}, k = 1, 2, 3, are given in Table 8.1. The three smallest eigenvalues of T are 0.999684, 3.994943, and 8.974416. The approximation errors are thus ρ^{(3)} − λ_1 ≈ 0.000′14 and ϑ_1^{(3)} − λ_1 ≈ 0.000′009, the latter being about 15 times smaller.
These results show immediately that the cost of three matrix–vector multiplications can be exploited much better than in (inverse) vector iteration. We will consider in this
section a kind of space that is very often used in the iterative solution of linear systems as well as of eigenvalue problems.
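Before doing so, here is a small NumPy sketch (ours, not part of the notes) that reproduces the numbers of Table 8.1; forming the inverse of T explicitly is acceptable for this 50 × 50 example.

```python
import numpy as np

n = 50
# The example matrix; its smallest eigenvalue is approximately 1.
T = (51**2 / np.pi**2) * (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1))
x = np.ones(n)
Tinv = np.linalg.inv(T)  # fine for this small dense example

for k in (1, 2, 3):
    # k-th inverse vector iterate T^{-(k-1)} x and its Rayleigh quotient rho^(k)
    xk = np.linalg.matrix_power(Tinv, k - 1) @ x
    rho = (xk @ T @ xk) / (xk @ xk)

    # Rayleigh-Ritz with the Krylov space span{x, T^{-1}x, ..., T^{1-k}x}
    K = np.column_stack([np.linalg.matrix_power(Tinv, j) @ x for j in range(k)])
    Q, _ = np.linalg.qr(K)
    theta = np.sort(np.linalg.eigvalsh(Q.T @ T @ Q))
    print(f"k={k}  rho={rho:.6f}  theta={theta}")
```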
8.2 Definition and basic properties
Definition 8.1 The matrix

(8.1) K^m(x) = K^m(x, A) := [x, Ax, . . . , A^{m−1}x] ∈ F^{n×m}

generated by the vector x ∈ F^n is called a Krylov matrix. Its columns span the Krylov (sub)space

(8.2) 𝒦^m(x) = 𝒦^m(x, A) := span{x, Ax, A^2x, . . . , A^{m−1}x} = R(K^m(x)) ⊂ F^n.
The Arnoldi and Lanczos algorithms are methods to compute an orthonormal basis of the Krylov space. Let
[x, Ax, . . . , A^{k−1}x] = Q^{(k)}R^{(k)}

be the QR factorization of the Krylov matrix K^k(x). The Ritz values and Ritz vectors of A in this space are obtained by means of the k × k eigenvalue problem

(8.3) Q^{(k)*} A Q^{(k)} y = ϑ^{(k)} y.
If (ϑ_j^{(k)}, y_j) is an eigenpair of (8.3), then (ϑ_j^{(k)}, Q^{(k)}y_j) is a Ritz pair of A in 𝒦^k(x). The following properties of Krylov spaces are easy to verify [1, p. 238]:
1. Scaling. 𝒦^m(x, A) = 𝒦^m(αx, βA), α, β ≠ 0.
2. Translation. 𝒦^m(x, A − σI) = 𝒦^m(x, A).
3. Change of basis. If U is unitary, then U𝒦^m(U^*x, U^*AU) = 𝒦^m(x, A). In fact,

   K^m(x, A) = [x, Ax, . . . , A^{m−1}x]
             = U[U^*x, (U^*AU)U^*x, . . . , (U^*AU)^{m−1}U^*x]
             = UK^m(U^*x, U^*AU).
Notice that the scaling and translation invariance hold only for the Krylov subspace, not for the Krylov matrices.
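These invariance properties are easy to check numerically. The following sketch (our own; the helper names krylov and projector are hypothetical) compares the orthogonal projectors onto the Krylov subspaces, which agree even though the Krylov matrices themselves differ.

```python
import numpy as np

def krylov(A, x, m):
    """Krylov matrix K^m(x, A) = [x, Ax, ..., A^{m-1} x]."""
    cols = [x]
    for _ in range(m - 1):
        cols.append(A @ cols[-1])
    return np.column_stack(cols)

def projector(K):
    """Orthogonal projector onto the column span, i.e., onto the Krylov subspace."""
    Q, _ = np.linalg.qr(K)
    return Q @ Q.T

rng = np.random.default_rng(0)
n, m = 8, 4
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)

P = projector(krylov(A, x, m))
P_scaled = projector(krylov(-3.0 * A, 2.5 * x, m))        # scaling invariance
P_shifted = projector(krylov(A - 0.7 * np.eye(n), x, m))  # translation invariance
print(np.allclose(P, P_scaled), np.allclose(P, P_shifted))  # True True
```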
What is the dimension of 𝒦^m(x)? It is evident that for an n × n matrix A the columns of the Krylov matrix K^{n+1}(x) are linearly dependent. (A subspace of F^n cannot have dimension larger than n.) On the other hand, if u is an eigenvector corresponding to the eigenvalue λ, then Au = λu and, consequently, 𝒦^2(u) = span{u, Au} = span{u} = 𝒦^1(u). So there is a smallest m, 1 ≤ m ≤ n, depending on x, such that
𝒦^1(x) ≠ 𝒦^2(x) ≠ · · · ≠ 𝒦^m(x) = 𝒦^{m+1}(x) = · · ·

For this number m,

(8.4) K^{m+1}(x) = [x, Ax, . . . , A^m x] ∈ F^{n×(m+1)}
has linearly dependent columns, i.e., there is a nonzero vector a ∈ F^{m+1} such that
(8.5) K^{m+1}(x)a = p(A)x = 0,    p(λ) = a_0 + a_1λ + · · · + a_mλ^m.
The polynomial p(λ) is called the minimal polynomial of A relative to x. By construction, its highest-order coefficient a_m ≠ 0.
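In exact arithmetic the degree m and the coefficients a_0, . . . , a_m can be found by growing the Krylov matrix until its columns become linearly dependent and taking a null vector. The following sketch (ours; the routine name, the SVD-based rank test, and the tolerance are our choices) illustrates this.

```python
import numpy as np

def minimal_polynomial(A, x, tol=1e-10):
    """Coefficients a_0, ..., a_m of a polynomial p with p(A)x = 0, obtained
    from the first rank-deficient Krylov matrix K^{m+1}(x) = [x, Ax, ..., A^m x]."""
    cols = [x]
    for _ in range(len(x)):
        cols.append(A @ cols[-1])
        K = np.column_stack(cols)                  # current Krylov matrix
        s = np.linalg.svd(K, compute_uv=False)
        if s[-1] < tol * s[0]:                     # columns became dependent
            _, _, Vt = np.linalg.svd(K)
            return Vt[-1]                          # null vector: K a ~ 0
    raise RuntimeError("no linear dependence detected")

# Example: x is a combination of 3 eigenvectors, so the degree should be 3.
rng = np.random.default_rng(0)
V, _ = np.linalg.qr(rng.standard_normal((6, 6)))
A = V @ np.diag([1.0, 2.0, 3.0, 4.0, 5.0, 6.0]) @ V.T
x = V[:, :3].sum(axis=1)

a = minimal_polynomial(A, x)
px = sum(ai * np.linalg.matrix_power(A, i) @ x for i, ai in enumerate(a))
print(len(a) - 1, np.linalg.norm(px))              # degree 3, residual ~ 0
```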
If A is diagonalizable, then the degree of the minimal polynomial relative to x has a simple geometric meaning (which does not mean that it is easily determined). Let
x = \sum_{i=1}^{m} u_i = [u_1, \ldots, u_m]\begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix},

where the u_i are eigenvectors of A, Au_i = λ_i u_i, and λ_i ≠ λ_j for i ≠ j. Notice that we have arranged the eigenvectors such that the coefficients in the above sum are all unity.
Now we have

A^k x = \sum_{i=1}^{m} λ_i^k u_i = [u_1, \ldots, u_m]\begin{bmatrix} λ_1^k \\ \vdots \\ λ_m^k \end{bmatrix},
and, consequently,
K^j(x) = [x, Ax, \ldots, A^{j−1}x] = \underbrace{[u_1, \ldots, u_m]}_{∈\, C^{n×m}}\;
\underbrace{\begin{bmatrix}
1 & λ_1 & λ_1^2 & \cdots & λ_1^{j−1} \\
1 & λ_2 & λ_2^2 & \cdots & λ_2^{j−1} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & λ_m & λ_m^2 & \cdots & λ_m^{j−1}
\end{bmatrix}}_{∈\, C^{m×j}}.
Since matrices of the form

\begin{bmatrix}
1 & λ_1 & \cdots & λ_1^{s−1} \\
1 & λ_2 & \cdots & λ_2^{s−1} \\
\vdots & \vdots & & \vdots \\
1 & λ_s & \cdots & λ_s^{s−1}
\end{bmatrix} ∈ F^{s×s},    λ_i ≠ λ_j for i ≠ j,
so-called Vandermonde matrices, are nonsingular if the λ_i are mutually distinct (their determinant equals \prod_{i<j}(λ_j − λ_i)), the Krylov matrices K^j(x) have full rank for j ≤ m. Thus, for diagonalizable matrices A we have

dim 𝒦^j(x, A) = min{j, m},

where m is the number of eigenvectors needed to represent x. The subspace 𝒦^m(x) is the smallest invariant subspace of A that contains x.
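A small numerical illustration (ours, with an arbitrarily chosen A) of this dimension formula: if x is composed of exactly m eigenvectors of a diagonalizable A, then the rank of the Krylov matrix K^j(x) equals min{j, m}.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 10, 4
# Diagonalizable A with distinct eigenvalues 1, ..., n; columns of V are eigenvectors.
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = V @ np.diag(np.arange(1.0, n + 1.0)) @ V.T

# x needs exactly m eigenvectors for its representation.
x = V[:, :m].sum(axis=1)

for j in range(1, 8):
    K = np.column_stack([np.linalg.matrix_power(A, i) @ x for i in range(j)])
    print(j, np.linalg.matrix_rank(K))   # rank equals min(j, m) = min(j, 4)
```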
8.3 Polynomial representation of Krylov subspaces
In this section we assume A to be Hermitian. Let s ∈ 𝒦^j(x). Then
(8.6) s = \sum_{i=0}^{j−1} c_i A^i x = π(A)x,    π(ξ) = \sum_{i=0}^{j−1} c_i ξ^i.
Let P_j be the space of polynomials of degree ≤ j. Then (8.6) becomes

(8.7) 𝒦^j(x) = {π(A)x | π ∈ P_{j−1}}.
Let m be the smallest index for which 𝒦^m(x) = 𝒦^{m+1}(x). Then, for j ≤ m, the mapping

P_{j−1} ∋ \sum_i c_i ξ^i \;\longmapsto\; \sum_i c_i A^i x ∈ 𝒦^j(x)

is bijective, while it is only surjective for j > m.
Let Q ∈ F^{n×j} be a matrix with orthonormal columns that span 𝒦^j(x), and let A′ = Q^*AQ. The spectral decomposition

A′X′ = X′Θ,    X′^*X′ = I,    Θ = diag(ϑ_1, . . . , ϑ_j),

of A′ provides the Ritz values of A in 𝒦^j(x). The columns y_i of Y = QX′ are the Ritz vectors.
By construction the Ritz vectors are mutually orthogonal. Furthermore,

(8.8) Ay_i − ϑ_i y_i ⊥ 𝒦^j(x),

because

Q^*(AQx′_i − Qx′_iϑ_i) = Q^*AQx′_i − x′_iϑ_i = A′x′_i − x′_iϑ_i = 0,

where x′_i denotes the i-th column of X′.
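A quick numerical check of (8.8) (again our own sketch): form an orthonormal basis Q of 𝒦^j(x), compute the Ritz pairs, and verify that the residuals A y_i − ϑ_i y_i are orthogonal to the Krylov space.

```python
import numpy as np

rng = np.random.default_rng(2)
n, j = 12, 5
A = rng.standard_normal((n, n))
A = (A + A.T) / 2                          # Hermitian, as assumed in this section
x = rng.standard_normal(n)

K = np.column_stack([np.linalg.matrix_power(A, i) @ x for i in range(j)])
Q, _ = np.linalg.qr(K)                     # orthonormal basis of the Krylov space
theta, Xp = np.linalg.eigh(Q.T @ A @ Q)    # Ritz values and primitive Ritz vectors
Y = Q @ Xp                                 # Ritz vectors

R = A @ Y - Y @ np.diag(theta)             # residuals A y_i - theta_i y_i
print(np.max(np.abs(Q.T @ R)))             # essentially zero: residuals lie in the orthogonal complement
```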
It is easy to give a representation of the vectors in 𝒦^j(x) that are orthogonal to y_i.
Lemma 8.2 Let (ϑ_i, y_i), 1 ≤ i ≤ j, be Ritz values and Ritz vectors of A in 𝒦^j(x), j ≤ m. Let ω ∈ P_{j−1}. Then

(8.9) ω(A)x ⊥ y_k ⟺ ω(ϑ_k) = 0.
Proof. “⇐=”: Let first ω ∈ P_j with ω(ξ) = (ξ − ϑ_k)π(ξ), π ∈ P_{j−1}. Then

(8.10) y_k^* ω(A)x = y_k^*(A − ϑ_k I)π(A)x = (Ay_k − ϑ_k y_k)^* π(A)x = 0,

where we used that A^* = A and, in the last step, that π(A)x ∈ 𝒦^j(x) together with (8.8). Hence the condition ω(ϑ_k) = 0 in (8.9) is sufficient. “=⇒”: Let now
𝒦^j(x) ⊃ S_k := {τ(A)x | τ ∈ P_{j−1}, τ(ϑ_k) = 0}
        = (A − ϑ_k I){ψ(A)x | ψ ∈ P_{j−2}}        (writing τ(ξ) = (ξ − ϑ_k)ψ(ξ), ψ ∈ P_{j−2})
        = (A − ϑ_k I)𝒦^{j−1}(x).

S_k has dimension j − 1 and, according to (8.10), y_k is orthogonal to S_k. Since the subspace of 𝒦^j(x) that is orthogonal to y_k also has dimension j − 1, it must coincide with S_k. Its elements therefore have the form claimed by the Lemma.
Next we define the polynomials

µ(ξ) := \prod_{i=1}^{j}(ξ − ϑ_i) ∈ P_j,    π_k(ξ) := \frac{µ(ξ)}{ξ − ϑ_k} = \prod_{i=1,\, i≠k}^{j}(ξ − ϑ_i) ∈ P_{j−1}.
Then the Ritz vector y_k can be represented in the form

(8.11) y_k = \frac{π_k(A)x}{‖π_k(A)x‖},
as π_k(ϑ_i) = 0 for all i ≠ k. According to Lemma 8.2, π_k(A)x is therefore perpendicular to all y_i with i ≠ k, and hence a multiple of y_k. Further,
(8.12) β_j := ‖µ(A)x‖ = min{‖ω(A)x‖ : ω ∈ P_j monic}.
(A polynomial in P_j is monic if its highest coefficient a_j = 1.) By the first part of the proof of Lemma 8.2, µ(A)x ∈ 𝒦^{j+1}(x) is orthogonal to 𝒦^j(x). As each monic ω ∈ P_j can be written in the form
ω(ξ) = µ(ξ) + ψ(ξ),    ψ ∈ P_{j−1},
we have
‖ω(A)x‖^2 = ‖µ(A)x‖^2 + ‖ψ(A)x‖^2,
as ψ(A)x ∈ 𝒦^j(x) is orthogonal to µ(A)x. Because of property (8.12), µ is called the minimal polynomial of x of degree j. (In (8.5) we constructed the minimal polynomial of degree m, in which case β_m = 0.)
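Both (8.11) and (8.12) are easy to verify numerically. In the following sketch (ours; poly_from_roots is a hypothetical helper) π_k(A)x turns out to be parallel to y_k, and adding any ψ(A)x ∈ 𝒦^j(x) to µ(A)x can only increase the norm.

```python
import numpy as np

rng = np.random.default_rng(3)
n, j = 10, 4
A = rng.standard_normal((n, n))
A = (A + A.T) / 2                          # Hermitian
x = rng.standard_normal(n)

K = np.column_stack([np.linalg.matrix_power(A, i) @ x for i in range(j)])
Q, _ = np.linalg.qr(K)
theta, Xp = np.linalg.eigh(Q.T @ A @ Q)    # Ritz values theta_1 < ... < theta_j
Y = Q @ Xp                                 # Ritz vectors

def poly_from_roots(roots, v):
    """Apply the monic polynomial prod_i (A - r_i I) to the vector v."""
    for r in roots:
        v = A @ v - r * v
    return v

# (8.11): pi_k(A)x is a multiple of y_k (here k = 1 in 0-based indexing)
k = 1
pk = poly_from_roots(np.delete(theta, k), x)
print(abs(Y[:, k] @ pk) / np.linalg.norm(pk))   # ~ 1, i.e., parallel to y_k

# (8.12): mu(A)x has minimal norm among omega(A)x with omega in P_j monic
mu_x = poly_from_roots(theta, x)
psi_x = K @ rng.standard_normal(j)              # some psi(A)x with psi in P_{j-1}
print(np.linalg.norm(mu_x) <= np.linalg.norm(mu_x + psi_x))  # True
```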
Let u_1, . . . , u_m be the eigenvectors of A corresponding to λ_1 < · · · < λ_m that span 𝒦^m(x). We collect the first i of them in the matrix U_i := [u_1, . . . , u_i]. Let ‖x‖ = 1 and let

ϕ := ∠(x, u_i)    and    ψ := ∠(x, U_iU_i^*x)    (≤ ϕ).

(Remember that U_iU_i^*x is the orthogonal projection of x onto R(U_i).) Let

g := \frac{U_iU_i^*x}{‖U_iU_i^*x‖}    and    h := \frac{(I − U_iU_i^*)x}{‖(I − U_iU_i^*)x‖}.

Then we have

‖U_iU_i^*x‖ = cos ψ,    ‖(I − U_iU_i^*)x‖ = sin ψ.
The following lemma will be used to estimate the difference ϑ_i^{(j)} − λ_i between the desired eigenvalue and its approximation from the Krylov subspace.
Lemma 8.3 ([1, p. 241]) For each π ∈ P_{j−1} and each i ≤ j ≤ m, the Rayleigh quotient
ρ(π(A)x; A − λ_i I) = \frac{(π(A)x)^*(A − λ_i I)(π(A)x)}{‖π(A)x‖^2} = ρ(π(A)x; A) − λ_i

satisfies the inequality

(8.13) ρ(π(A)x; A − λ_i I) ≤ (λ_m − λ_i)\left(\frac{\sin ψ}{\cos ϕ}\,\frac{‖π(A)h‖}{|π(λ_i)|}\right)^{2}.
Proof. With the definitions of g and h from above we have
x = U_iU_i^*x + (I − U_iU_i^*)x = cos ψ g + sin ψ h,

which is an orthogonal decomposition. As R(U_i) is invariant under A,

s := π(A)x = cos ψ π(A)g + sin ψ π(A)h
is an orthogonal decomposition of s. Thus,

(8.14) ρ(π(A)x; A − λ_i I) = \frac{\cos^2 ψ \; g^*(A − λ_i I)π^2(A)g + \sin^2 ψ \; h^*(A − λ_i I)π^2(A)h}{‖π(A)x‖^2}.
Since λ_1 < λ_2 < · · · < λ_m, we have