Lecture Notes on Solving Large Scale Eigenvalue Problems.pdf


Algorithm 3.6 shows the implicit symmetric tridiagonal QR algorithm. The shifts are chosen according to Wilkinson. An issue not treated in this algorithm is deflation, which is of great practical importance. Let us consider the following 6 × 6 situation:

 

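\[
T = \begin{pmatrix}
a_1 & b_2 &     &     &     &     \\
b_2 & a_2 & b_3 &     &     &     \\
    & b_3 & a_3 & 0   &     &     \\
    &     & 0   & a_4 & b_5 &     \\
    &     &     & b_5 & a_5 & b_6 \\
    &     &     &     & b_6 & a_6
\end{pmatrix}.
\]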

The shift for the next step is determined from elements a5, a6, and b6. According to (3.11) the first plane rotation is determined from the shift and the elements a1 and b1. The implicit shift algorithm then chases the bulge down the diagonal. In this particular situation, the procedure already finishes in row/column 4 because b4 = 0. Thus the shift, which approximates an eigenvalue of the second block (rows 4 to 6), is applied to the wrong block, namely the first one (rows 1 to 3). Clearly, this shift does not improve convergence.
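To make the dependence on a5, a6, and b6 concrete, here is a minimal sketch of the Wilkinson shift. Equation (3.11) and Algorithm 3.6 are not reproduced in this section, so the formula below is the standard textbook form of the shift (the eigenvalue of the trailing 2 × 2 block that is closer to the last diagonal element), and the function name and argument names are assumptions chosen for illustration.

```python
import math

def wilkinson_shift(a_prev, a_last, b_last):
    """Eigenvalue of [[a_prev, b_last], [b_last, a_last]] closest to a_last.

    Illustrative helper (not taken from the notes): a_prev, a_last, b_last
    play the roles of a5, a6, b6 in the 6 x 6 example.
    """
    d = 0.5 * (a_prev - a_last)
    if d == 0.0 and b_last == 0.0:
        # The trailing block is already diagonal with equal entries.
        return a_last
    sign = 1.0 if d >= 0.0 else -1.0
    # Written in the cancellation-free form a_last - b^2 / (d + sign*sqrt(d^2 + b^2)).
    return a_last - b_last**2 / (d + sign * math.hypot(d, b_last))

# The shift only sees the trailing 2 x 2 block. If b4 = 0, it carries no
# information about the leading block (rows 1 to 3).
mu = wilkinson_shift(5.0, 6.0, 0.5)
```

Because only the last three entries enter the formula, the shift cannot be expected to approximate an eigenvalue of a block that has decoupled from the trailing one.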

If the QR algorithm is applied in its direct form, the first block is still not treated properly, i.e. it is processed with a (probably) wrong shift, but at least the second block is diagonalized rapidly.

Deflation is done as indicated in Algorithm 3.6:

if $|b_k| < \varepsilon(|a_{k-1}| + |a_k|)$ then deflate.

Deflation is particularly simple in the symmetric case since it just means that a tridiagonal eigenvalue problem decouples into two (or more) smaller tridiagonal eigenvalue problems. Notice, however, that the eigenvectors are still n elements long.
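The deflation test and the resulting decoupling can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are hypothetical, ε is taken to be the double precision machine epsilon, and the arrays are 0-based, so `b[k]` couples rows `k` and `k + 1` (it corresponds to $b_{k+2}$ in the numbering of the 6 × 6 example).

```python
def deflation_points(a, b, eps=2.0**-52):
    """Return the 0-based indices k at which b[k] is negligible.

    a holds the n diagonal entries and b the n - 1 off-diagonal entries.
    The test is the criterion |b_k| < eps * (|a_{k-1}| + |a_k|) from the text,
    shifted to 0-based indexing.
    """
    return [k for k in range(len(b))
            if abs(b[k]) < eps * (abs(a[k]) + abs(a[k + 1]))]

def split_blocks(a, b, eps=2.0**-52):
    """Split the tridiagonal matrix (a, b) into decoupled blocks.

    Each block is returned as a pair (diagonal, off_diagonal) and can be
    treated as an independent, smaller tridiagonal eigenvalue problem.
    """
    blocks, start = [], 0
    for k in deflation_points(a, b, eps):
        blocks.append((a[start:k + 1], b[start:k]))
        start = k + 1
    blocks.append((a[start:], b[start:]))
    return blocks

# The 6 x 6 situation from the text: the off-diagonal called b4 there is zero.
a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
b = [0.5, 0.5, 0.0, 0.5, 0.5]
blocks = split_blocks(a, b)   # two 3 x 3 blocks
```

The eigenvalues of the full matrix are the union of the eigenvalues of the blocks. The eigenvectors of a block still have to be padded with zeros to length n, which is the point of the closing remark above.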

3.7 Research

Still today the QR algorithm computes the Schur form of a matrix and is by far the most popular approach for solving dense nonsymmetric eigenvalue problems. Multishift and aggressive early deflation techniques have led to significantly more efficient sequential implementations of the QR algorithm during the last decade. For a brief survey and a discussion of the parallelization of the QR algorithm, see [6].

3.8 Summary

The QR algorithm is a very powerful algorithm to stably compute the eigenvalues and (if needed) the corresponding eigenvectors or Schur vectors. All steps of the algorithm cost $O(n^3)$ floating point operations, see Table 3.1. The one exception is the case where only the eigenvalues of a symmetric tridiagonal matrix are desired. The linear algebra software package LAPACK [1] contains subroutines for all possible ways the QR algorithm may be employed.

We finish by repeating that the QR algorithm is a method for dense matrix problems. The reduction of a sparse matrix to tridiagonal or Hessenberg form produces fill-in, thus destroying the sparsity structure which one almost always tries to preserve.


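| | nonsymmetric case, without Schur vectors | nonsymmetric case, with Schur vectors | symmetric case, without eigenvectors | symmetric case, with eigenvectors |
|---|---|---|---|---|
| transformation to Hessenberg/tridiagonal form | $\frac{10}{3} n^3$ | $\frac{14}{3} n^3$ | $\frac{4}{3} n^3$ | $\frac{8}{3} n^3$ |
| real double step Hessenberg/tridiagonal QR algorithm (2 steps per eigenvalue assumed) | $\frac{20}{3} n^3$ | $\frac{50}{3} n^3$ | $24 n^2$ | $6 n^3$ |
| total | $10 n^3$ | $25 n^3$ | $\frac{4}{3} n^3$ | $9 n^3$ |

Table 3.1: Complexity in flops to compute eigenvalues and eigenvectors/Schur vectors of a real n × n matrix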

Bibliography

[1] E. ANDERSON, Z. BAI, C. BISCHOF, J. DEMMEL, J. DONGARRA, J. DU CROZ, A. GREENBAUM, S. HAMMARLING, A. MCKENNEY, S. OSTROUCHOV, AND D. SORENSEN, LAPACK Users’ Guide – Release 2.0, SIAM, Philadelphia, PA, 1994. (Software and guide are available from Netlib at URL http://www.netlib.org/lapack/).

[2] P. ARBENZ AND G. H. GOLUB, Matrix shapes invariant under the symmetric QR algorithm, Numer. Linear Algebra Appl., 2 (1995), pp. 87–93.

[3] J. W. DEMMEL, Applied Numerical Linear Algebra, SIAM, Philadelphia, PA, 1997.

[4] J. G. F. FRANCIS, The QR transformation – Parts 1 and 2, Computer J., 4 (1961–1962), pp. 265–271 and 332–345.

[5] G. H. GOLUB AND C. F. VAN LOAN, Matrix Computations, The Johns Hopkins University Press, Baltimore, MD, 2nd ed., 1989.

[6] B. KÅGSTRÖM, D. KRESSNER, AND M. SHAO, On aggressive early deflation in parallel variants of the QR algorithm, in Applied Parallel and Scientific Computing (PARA 2010), K. Jónasson, ed., Heidelberg, 2012, Springer, pp. 1–10. (Lecture Notes in Computer Science, 7133).

[7] B. N. PARLETT, The QR algorithm, Computing Sci. Eng., 2 (2000), pp. 38–42.

[8] H. RUTISHAUSER, Solution of eigenvalue problems with the LR-transformation, NBS Appl. Math. Series, 49 (1958), pp. 47–81.

[9] J. H. WILKINSON, The Algebraic Eigenvalue Problem, Clarendon Press, Oxford, 1965.

Chapter 4

Cuppen’s Divide and Conquer Algorithm

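In this chapter we deal with an algorithm that is designed for the efficient solution of the symmetric tridiagonal eigenvalue problem

\[
T x = \lambda x, \qquad
T = \begin{pmatrix}
a_1 & b_1    &         &         \\
b_1 & a_2    & \ddots  &         \\
    & \ddots & \ddots  & b_{n-1} \\
    &        & b_{n-1} & a_n
\end{pmatrix}.
\tag{4.1}
\]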

We noticed from Table 3.1 that the reduction of a full symmetric matrix to a similar tridiagonal matrix requires about $\frac{8}{3} n^3$ floating point operations (flops), while the tridiagonal QR algorithm needs an estimated $6n^3$ flops to converge. Because of the importance of this subproblem, considerable effort has been put into finding algorithms that solve the tridiagonal eigenvalue problem faster than the QR algorithm. In the mid-1980s Dongarra and Sorensen [4] promoted an algorithm originally proposed by Cuppen [2]. This algorithm was based on a divide and conquer strategy. However, it took ten more years until a stable variant was found by Gu and Eisenstat [5, 6]. Today, a stable implementation of this latter algorithm is available in LAPACK [1].

4.1 The divide and conquer idea

Divide and conquer is an old military strategy for defeating an enemy, going back at least to Caesar. In computer science, divide and conquer (D&C) is an important algorithm design paradigm. It works by recursively breaking down a problem into two or more subproblems of the same (or related) type, until these become simple enough to be solved directly. The solutions to the subproblems are then combined to give a solution to the original problem. Translated to our problem, the strategy becomes:

1. Partition the tridiagonal eigenvalue problem into two (or more) smaller tridiagonal eigenvalue problems.

2. Solve the two smaller problems.

3. Combine the solutions of the smaller problems to get the desired solution of the overall problem.

Evidently, this strategy can be applied recursively.


4.2 Partitioning the tridiagonal matrix

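Partitioning the irreducible tridiagonal matrix is done in the following way. We write

\[
\begin{aligned}
T &= \left(\begin{array}{cccc|cccc}
a_1 & b_1    &         &         &         &         &         &         \\
b_1 & a_2    & \ddots  &         &         &         &         &         \\
    & \ddots & \ddots  & b_{m-1} &         &         &         &         \\
    &        & b_{m-1} & a_m     & b_m     &         &         &         \\ \hline
    &        &         & b_m     & a_{m+1} & b_{m+1} &         &         \\
    &        &         &         & b_{m+1} & a_{m+2} & \ddots  &         \\
    &        &         &         &         & \ddots  & \ddots  & b_{n-1} \\
    &        &         &         &         &         & b_{n-1} & a_n
\end{array}\right) \\[1ex]
&= \left(\begin{array}{cccc|cccc}
a_1 & b_1    &         &               &                   &         &         &         \\
b_1 & a_2    & \ddots  &               &                   &         &         &         \\
    & \ddots & \ddots  & b_{m-1}       &                   &         &         &         \\
    &        & b_{m-1} & a_m \mp b_m   &                   &         &         &         \\ \hline
    &        &         &               & a_{m+1} \mp b_m   & b_{m+1} &         &         \\
    &        &         &               & b_{m+1}           & a_{m+2} & \ddots  &         \\
    &        &         &               &                   & \ddots  & \ddots  & b_{n-1} \\
    &        &         &               &                   &         & b_{n-1} & a_n
\end{array}\right)
+ \left(\begin{array}{ccc|ccc}
 &  &         &         &  &  \\
 &  &         &         &  &  \\
 &  & \pm b_m & b_m     &  &  \\ \hline
 &  & b_m     & \pm b_m &  &  \\
 &  &         &         &  &  \\
 &  &         &         &  &
\end{array}\right) \\[1ex]
&= \begin{pmatrix} T_1 & \\ & T_2 \end{pmatrix} + \rho u u^T
\quad \text{with} \quad
u = \begin{pmatrix} \pm e_m \\ e_1 \end{pmatrix}
\quad \text{and} \quad \rho = \pm b_m,
\end{aligned}
\tag{4.2}
\]

where $e_m$ is the last unit vector of length $m \approx n/2$ and $e_1$ is the first unit vector of length $n - m$. Notice that the most straightforward way to partition the problem without modifying the diagonal elements leads to a rank-two modification. With the approach of (4.2) we have the original T as a sum of two smaller tridiagonal systems plus a rank-one modification.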
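The splitting in (4.2) is easy to check numerically. The following is a small sketch using NumPy. The helper names `tridiag`, `block_diag2`, and `rank_one_split` are hypothetical, arrays are 0-based (so $b_m$ is `b[m - 1]`), and the argument `sign` selects which of the two sign choices in (4.2) is used.

```python
import numpy as np

def tridiag(a, b):
    """Symmetric tridiagonal matrix with diagonal a and off-diagonal b."""
    return np.diag(a) + np.diag(b, 1) + np.diag(b, -1)

def block_diag2(A, B):
    """Block diagonal matrix diag(A, B)."""
    n1, n2 = A.shape[0], B.shape[0]
    M = np.zeros((n1 + n2, n1 + n2))
    M[:n1, :n1] = A
    M[n1:, n1:] = B
    return M

def rank_one_split(a, b, m, sign=1.0):
    """Return T1, T2, rho, u such that T = diag(T1, T2) + rho * u u^T."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    rho = sign * b[m - 1]          # rho = +b_m or -b_m
    a1 = a[:m].copy()
    a1[-1] -= rho                  # last diagonal entry of T1: a_m -/+ b_m
    a2 = a[m:].copy()
    a2[0] -= rho                   # first diagonal entry of T2: a_{m+1} -/+ b_m
    u = np.zeros(a.size)
    u[m - 1] = sign                # +/- e_m in the upper part
    u[m] = 1.0                     # e_1 in the lower part
    return tridiag(a1, b[:m - 1]), tridiag(a2, b[m:]), rho, u

a = [4.0, 3.0, 5.0, 2.0, 6.0, 1.0, 7.0]
b = [1.0, 0.5, 2.0, 0.3, 1.5, 0.8]
T1, T2, rho, u = rank_one_split(a, b, m=3)
residual = tridiag(a, b) - (block_diag2(T1, T2) + rho * np.outer(u, u))
```

For either sign the residual vanishes up to rounding, and the correction $T - \mathrm{diag}(T_1, T_2)$ has rank one as long as $b_m \neq 0$.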

4.3 Solving the small systems

We solve the half-sized eigenvalue problems,

\[
T_i = Q_i \Lambda_i Q_i^T, \qquad Q_i^T Q_i = I, \qquad i = 1, 2.
\tag{4.3}
\]

These two spectral decompositions can be computed by any algorithm, in particular also by this divide and conquer algorithm, in which case the $T_i$ would be split further. It is clear that this partitioning can generate a large number of small problems that can potentially be solved in parallel. For a parallel algorithm, however, the further phases of the algorithm must be parallelizable as well.

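Plugging (4.3) into (4.2) gives

\[
\begin{pmatrix} Q_1^T & \\ & Q_2^T \end{pmatrix}
\left[ \begin{pmatrix} T_1 & \\ & T_2 \end{pmatrix} + \rho u u^T \right]
\begin{pmatrix} Q_1 & \\ & Q_2 \end{pmatrix}
= \begin{pmatrix} \Lambda_1 & \\ & \Lambda_2 \end{pmatrix} + \rho v v^T
\tag{4.4}
\]

with

\[
v = \begin{pmatrix} Q_1^T & \\ & Q_2^T \end{pmatrix} u
= \begin{pmatrix} \pm Q_1^T e_m \\ Q_2^T e_1 \end{pmatrix}
= \begin{pmatrix} \pm \text{ last row of } Q_1 \\ \text{first row of } Q_2 \end{pmatrix}.
\tag{4.5}
\]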
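The three steps (partition, solve, combine) together with (4.4) and (4.5) can be sketched as a recursive routine. This is only an illustration under explicit assumptions: the names `tridiag` and `dc_eigh` are hypothetical, the sign in (4.2) is fixed to plus, and the eigenvalue problem for the diagonal-plus-rank-one matrix on the right-hand side of (4.4) is simply handed to the dense solver `np.linalg.eigh`. That last step is a stand-in, since the efficient treatment of this problem is not covered in this section, so the sketch shows the structure of the recursion and not its cost.

```python
import numpy as np

def tridiag(a, b):
    """Symmetric tridiagonal matrix with diagonal a and off-diagonal b."""
    return np.diag(a) + np.diag(b, 1) + np.diag(b, -1)

def dc_eigh(a, b, n_min=2):
    """Return (lam, Q) with T = Q diag(lam) Q^T for T = tridiag(a, b)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n = a.size
    if n <= n_min:
        # Small problems are solved directly.
        return np.linalg.eigh(tridiag(a, b))

    # 1. Partition as in (4.2), with rho = +b_m.
    m = n // 2
    rho = b[m - 1]
    a1 = a[:m].copy()
    a1[-1] -= rho
    a2 = a[m:].copy()
    a2[0] -= rho

    # 2. Solve the two smaller problems (4.3) recursively.
    lam1, Q1 = dc_eigh(a1, b[:m - 1], n_min)
    lam2, Q2 = dc_eigh(a2, b[m:], n_min)

    # 3. Combine: v consists of the last row of Q1 and the first row of Q2 (4.5).
    v = np.concatenate([Q1[-1, :], Q2[0, :]])
    D = np.diag(np.concatenate([lam1, lam2]))
    # Stand-in for a dedicated solver of the diagonal-plus-rank-one problem.
    lam, W = np.linalg.eigh(D + rho * np.outer(v, v))

    # Back-transform with diag(Q1, Q2) to get eigenvectors of T.
    Q = np.zeros((n, n))
    Q[:m, :m] = Q1
    Q[m:, m:] = Q2
    return lam, Q @ W

rng = np.random.default_rng(0)
a = rng.standard_normal(9)
b = rng.standard_normal(8)
lam, Q = dc_eigh(a, b)
```

The returned `Q` is orthogonal and `Q @ np.diag(lam) @ Q.T` reproduces `tridiag(a, b)` up to rounding, which confirms that (4.4) is a similarity transformation of the original problem.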

 

 
