
q = 0;
while(v2(r) ≤ v2(y)) {
    q = q − 2^{v2(r)−v2(x)};  r = r − 2^{v2(r)−v2(y)} y;
}
q = q cmod 2^{v2(y)−v2(x)+1};  r = x + q y/2^{v2(y)−v2(x)};  return (q, r);
}
5. [Half-binary gcd function (recursive)]
hbingcd(k, x, y) { // Matrix returned; G, u, v, k1, k2, k3, q, r are local.
    G = [[1, 0], [0, 1]];                  // 2 × 2 identity matrix.
    if(v2(y) > k) return G;
    k1 = ⌊k/2⌋;
    k3 = 2^{2k1+1};
    u = x mod k3;
    v = y mod k3;
    G = hbingcd(k1, u, v);                 // Recurse.
    (u, v)^T = G (x, y)^T;
    k2 = k − v2(v);  if(k2 < 0) return G;
    (q, r) = divbin(u, v);
    k3 = 2^{v2(v)−v2(u)};
    G = [[0, k3], [k3, q]] G;
    k3 = 2^{2k2+1};
    u = v·2^{−v2(v)} mod k3;
    v = r·2^{−v2(v)} mod k3;
    G = hbingcd(k2, u, v) G;  return G;
}
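For concreteness, here is one possible Python rendering of the binary-divide function divbin above. It is a sketch under our reading of the layout: r is initialized to x on entry, v2(x) < v2(y) is assumed, v2 denotes the 2-adic valuation (with v2(0) = ∞ by convention), and cmod denotes the centered residue in (−m/2, m/2]; the helper names are ours.

# A sketch (ours) of the binary-divide function divbin above, assuming
# v2(x) < v2(y); v2 is the 2-adic valuation, cmod the centered residue.

def v2(n):
    """2-adic valuation: the largest e with 2^e dividing n (n != 0)."""
    return (n & -n).bit_length() - 1

def cmod(x, m):
    """Centered residue of x modulo m, lying in (-m/2, m/2]."""
    r = x % m
    return r - m if r > m // 2 else r

def divbin(x, y):
    """Return (q, r) with r = x + q*y/2^(v2(y)-v2(x)) and v2(r) > v2(y)."""
    q, r = 0, x
    while r != 0 and v2(r) <= v2(y):   # v2(0) = infinity, so r = 0 exits
        q -= 1 << (v2(r) - v2(x))
        r -= y >> (v2(y) - v2(r))      # this term equals 2^(v2(r)-v2(y)) * y
    q = cmod(q, 1 << (v2(y) - v2(x) + 1))
    r = x + q * (y >> (v2(y) - v2(x)))
    return (q, r)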
See the last part of Exercise 9.20 for some implicit advice on what operations in Algorithm 9.4.7 are relevant to good performance. In addition, note that to achieve the remarkably low complexity of either of these recursive gcds, the implementor should make sure to have an efficient large-integer multiply. Whether the multiply occurs in a matrix multiplication, or anywhere else, the use of breakover techniques should be in force. That is, for small operands one uses grammar-school multiply, then for larger operands one may employ a Karatsuba or Toom–Cook approach, but use one of the optimal, FFT-based options for very large operands. In other words, the multiplication complexity M(N) appearing in the complexity formula atop the present section needs to be taken seriously upon implementation. These various fast multiplication algorithms are discussed later in the chapter (Sections 9.5.1 and 9.5.2).

It is natural to ask whether there exist extended forms of such recursive-gcd algorithms, along the lines, say, of Algorithm 2.1.4 or Algorithm 9.4.3, to effect asymptotically fast modular inversion. The answer is yes, as explained in [Stehlé and Zimmermann 2004] and [Cesari 1998].
9.5 Large-integer multiplication
When numbers have, say, hundreds or thousands (even millions) of decimal digits, there are modern methods for multiplication. In practice, one finds that the classical “grammar-school” methods just cannot effect multiplication in certain desired ranges. This is because, of course, the bit complexity of grammar-school multiply of two size-N numbers is O(ln² N). It turns out that by virtue of modern transform and convolution techniques, this complexity can be brought down to
O(ln N (ln ln N )(ln ln ln N )),
as we discuss in more detail later in this section.
The art of large-integer arithmetic has, especially in modern times, sustained many revisions. Just as with the fast Fourier transform (FFT) engineering literature itself, there seems to be no end to the publication of new approaches, new optimizations, and new applications for computational number theory. The forest is sufficiently thick that we have endeavored in this section to render an overview rather than an encyclopedic account of this rich and exotic field. An interesting account of multiplication methods from a theoretical point of view is [Bernstein 1997], and modern implementations are discussed, with historical references, in [Crandall 1994b, 1996a].
9.5.1 Karatsuba and Toom–Cook methods
The classical multiplication methods can be applied on parts of integers to speed up large-integer multiplication, as observed by Karatsuba. His recursive scheme assumes that numbers be represented in split form
x = x0 + x1W,
with x0, x1 ∈ [0, W − 1], which is equivalent to base-W representation, except that here the base will be about half the size of x itself. Note that x is therefore a “size-W²” integer. For two integers x, y of this approximate size, the Karatsuba relation is
xy = (t + u)/2 − v + ((t − u)/2) W + v W²,        (9.17)

where

t = (x0 + x1)(y0 + y1),
u = (x0 − x1)(y0 − y1),
v = x1 y1,
and we obtain xy, which is originally a size-W² multiply, for the price of only three size-W multiplies (and some final carry adjustments, to achieve base-W representation of the final result). This is in principle an advantage, because if grammar-school multiply is invoked throughout, a size-W² multiply should be four, not three, times as expensive as a size-W one. It can be shown that if one applies the Karatsuba relation to t, u, v themselves, and so on recursively, the asymptotic complexity for a size-N multiply is
O((ln N)^{ln 3/ln 2})
bit operations, a theoretical improvement over grammar-school methods. We say “theoretical improvement” because computer implementations will harbor so-called overhead, and the time to arrange memory and recombine subproducts and so on might rule out the Karatsuba method as a viable alternative. Still, it is often the case in practice that the Karatsuba approach does, in fact, outperform the grammar-school approach over a machine- and implementation-dependent range of operands.
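By way of illustration, the following Python fragment (ours) realizes relation (9.17) recursively; the cutoff value is an arbitrary stand-in for a machine-tuned breakover point below which the native (grammar-school) multiply is used.

# A sketch of recursive Karatsuba multiplication via relation (9.17).
CUTOFF = 64    # arbitrary breakover: operand bit size for native multiply

def karatsuba(x, y):
    if min(x.bit_length(), y.bit_length()) <= CUTOFF:
        return x * y                 # breakover to the native multiply
    b = max(x.bit_length(), y.bit_length()) // 2
    W = 1 << b                       # split base, so x is "size-W^2"
    x0, x1 = x % W, x >> b           # x = x0 + x1*W
    y0, y1 = y % W, y >> b           # y = y0 + y1*W
    t = karatsuba(x0 + x1, y0 + y1)
    u = karatsuba(x0 - x1, y0 - y1)
    v = karatsuba(x1, y1)            # three half-size multiplies in all
    return (t + u)//2 - v + ((t - u)//2)*W + v*W*W

Note that the divisions by 2 are exact, since t + u and t − u are always even.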
But a related method, the Toom–Cook method, reaches the theoretical boundary of O(ln^{1+ε} N) bit operations for the multiplicative part of size-N multiplication—that is, ignoring all the additions inherent in the method. However, there are several reasons why the method is not the final word in the art of large-integer multiply. First, for large N the number of additions is considerable. Second, the complexity estimate presupposes that multiplications by constants (such as 1/2, which is a binary shift, and so on) are inexpensive. Certainly multiplications by small constants are so, but the Toom–Cook coefficients grow radically as N increases. Still, the method is of theoretical interest and does have its practical applications, such as fast multiplication on machines whose fundamental word multiply is especially sluggish with respect to addition. The Toom–Cook method hinges on the idea that given two polynomials
x(t) = x0 + x1 t + · · · + x_{D−1} t^{D−1},        (9.18)
y(t) = y0 + y1 t + · · · + y_{D−1} t^{D−1},        (9.19)
the polynomial product z(t) = x(t)y(t) is completely determined by its values at 2D − 1 separate t values, for example by the sequence of evaluations (z(j)), j ∈ [1 − D, D − 1]:
Algorithm 9.5.1 (Symbolic Toom–Cook multiplication). Given D, this algorithm generates the (symbolic) Toom–Cook scheme for multiplication of (D-digit)-by-(D-digit) integers.
1. [Initialize]
Form two symbolic polynomials x(t), y(t) each of degree (D − 1), as in equation (9.18);
2. [Evaluation]
Evaluate symbolically z(j) = x(j)y(j) for each j ∈ [1 − D, D − 1], so that each z(j) is cast in terms of the original coefficients of the x and y polynomials;
3. [Reconstruction]
Solve symbolically for the coefficients z_j in the following linear system of (2D − 1) equations:

z(t) = Σ_{k=0}^{2D−2} z_k t^k,   t ∈ [1 − D, D − 1];

4. [Report scheme]
Report a list of the (2D − 1) relations, each relation casting z_j in terms of the original x, y coefficients;
The output of this algorithm will be a set of formulae that give the coefficients of the polynomial product z(t) = x(t)y(t) in terms of the coefficients of the original polynomials. But this is precisely what is meant by integer multiplication, if each polynomial corresponds to a D-digit representation in a fixed base B.
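Algorithm 9.5.1 is easily driven by a computer-algebra system. The following sketch (ours; it relies on the third-party sympy package, and the symbols T_j, which stand for the atomic evaluation products x(j)y(j), are our own naming) performs the [Evaluation] and [Reconstruction] steps and, for D = 3, reproduces the relations that appear in Algorithm 9.5.2 below:

# A sketch of Algorithm 9.5.1 using the third-party sympy package.
# T_j stands for the atomic product z(j) = x(j)y(j); solving the linear
# system z(j) = T_j yields the reconstruction relations symbolically.
import sympy as sp

def toom_cook_scheme(D):
    t = sp.Symbol('t')
    xs = sp.symbols(f'x0:{D}')
    x = sum(xs[j] * t**j for j in range(D))          # as in (9.18)
    # [Evaluation]: the multiplicands x(j) (and likewise for y(j)).
    evals = [sp.expand(x.subs(t, j)) for j in range(1 - D, D)]
    # [Reconstruction]: solve z(j) = T_j for the coefficients z_k.
    T = sp.symbols(f'T0:{2*D-1}')
    zs = sp.symbols(f'z0:{2*D-1}')
    z = sum(zs[k] * t**k for k in range(2*D - 1))
    recon = sp.solve([sp.Eq(z.subs(t, j), T[j + D - 1])
                      for j in range(1 - D, D)], zs)
    return evals, recon

# [Report scheme] for D = 3 reproduces the relations of Algorithm 9.5.2.
evals, recon = toom_cook_scheme(3)
print(evals)                      # [x0 - 2*x1 + 4*x2, x0 - x1 + x2, x0, ...]
print(recon[sp.Symbol('z1')])     # T0/12 - 2*T1/3 + 2*T3/3 - T4/12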
To underscore the Toom–Cook idea, we note that all of the Toom–Cook multiplies occur in the [Evaluation] step of Algorithm 9.5.1. We give next a specific multiplication algorithm that requires five such multiplies. The previous, symbolic, algorithm was used to generate the actual relations of this next algorithm:
Algorithm 9.5.2 (Explicit D = 3 Toom–Cook integer multiplication).
For integers x, y given in base B as
x = x0 + x1 B + x2 B²,   y = y0 + y1 B + y2 B²,
this algorithm returns the base-B digits of the product z = xy, using the theoretical minimum of 2D − 1 = 5 multiplications for acyclic convolution of length-3 sequences.
1. [Initialize]
r0 = x0 − 2x1 + 4x2;  r1 = x0 − x1 + x2;  r2 = x0;
r3 = x0 + x1 + x2;    r4 = x0 + 2x1 + 4x2;
s0 = y0 − 2y1 + 4y2;  s1 = y0 − y1 + y2;  s2 = y0;
s3 = y0 + y1 + y2;    s4 = y0 + 2y1 + 4y2;
2. [Toom–Cook multiplies]
for(0 ≤ j < 5) tj = rj sj;
3. [Reconstruction]
476 Chapter 9 FAST ALGORITHMS FOR LARGE-INTEGER ARITHMETIC
z0 = t2;
z1 = t0/12 − 2t1/3 + 2t3/3 − t4/12;
z2 = −t0/24 + 2t1/3 − 5t2/4 + 2t3/3 − t4/24;
z3 = −t0/12 + t1/6 − t3/6 + t4/12;
z4 = t0/24 − t1/6 + t2/4 − t3/6 + t4/24;
4. [Adjust carry]
carry = 0;
for(0 ≤ n < 5) {
    v = zn + carry;  zn = v mod B;  carry = ⌊v/B⌋;
}
return (z0, z1, z2, z3, z4, carry);
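Algorithm 9.5.2 transcribes directly into code. In the following Python rendition (ours), the fractions of the [Reconstruction] step are folded over common denominators so that the divisions become exact integer divisions:

# Python transcription of Algorithm 9.5.2 (explicit D = 3 Toom-Cook).
# The // divisions below are exact: the reconstruction fractions have
# been combined over the denominators 12 and 24.

def toom3(xdigits, ydigits, B):
    """Multiply x = x0 + x1*B + x2*B^2 by y likewise; both given as
    3-digit lists [d0, d1, d2] in base B. Returns 6 base-B digits."""
    x0, x1, x2 = xdigits
    y0, y1, y2 = ydigits
    # [Initialize]: evaluations at t = -2, -1, 0, 1, 2.
    r = [x0 - 2*x1 + 4*x2, x0 - x1 + x2, x0, x0 + x1 + x2, x0 + 2*x1 + 4*x2]
    s = [y0 - 2*y1 + 4*y2, y0 - y1 + y2, y0, y0 + y1 + y2, y0 + 2*y1 + 4*y2]
    # [Toom-Cook multiplies]: the theoretical minimum of 2D - 1 = 5.
    t0, t1, t2, t3, t4 = (rj * sj for rj, sj in zip(r, s))
    # [Reconstruction]
    z = [t2,
         (t0 - 8*t1 + 8*t3 - t4) // 12,
         (-t0 + 16*t1 - 30*t2 + 16*t3 - t4) // 24,
         (-t0 + 2*t1 - 2*t3 + t4) // 12,
         (t0 - 4*t1 + 6*t2 - 4*t3 + t4) // 24]
    # [Adjust carry]
    carry = 0
    for n in range(5):
        v = z[n] + carry
        z[n] = v % B
        carry = v // B
    return z + [carry]

# Example: 987654 * 123456 in base B = 100.
digits = toom3([54, 76, 98], [56, 34, 12], 100)
assert sum(d * 100**k for k, d in enumerate(digits)) == 987654 * 123456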
Now, as opposed to the Karatsuba method, in which a size-B² multiply is brought down to that of three size-B ones for, let us say, a “gain” of 4/3, Algorithm 9.5.2 does a size-B³ multiply in the form of five size-B ones, for a gain of 9/5. When either algorithm is used in a recursive fashion (for example, the Step [Toom–Cook multiplies] is done by calling the same, or another, Toom–Cook algorithm recursively), the complexity of multiplication of two size-N integers comes down to
O((ln N)^{ln(2D−1)/ln D}),
small multiplies (meaning of a fixed size independent of N), which complexity can, with sufficiently high Toom–Cook degree d = D − 1, be brought down below any given complexity estimate of O(ln^{1+ε} N) small multiplies. However, it is to be noted forcefully that this complexity ignores the addition count, as well as the constant-coefficient multiplies (see Exercises 9.37, 9.78 and Section 9.5.8).
The Toom–Cook method can be recognized as a scheme for acyclic convolution, which, together with other types of convolutions, we address later in this chapter. For more details on Karatsuba and Toom–Cook methods, the reader may consult [Knuth 1981], [Crandall 1996a], [Bernstein 1997].
9.5.2 Fourier transform algorithms
Having discussed multiplication methods that enjoy complexities as low as O(ln^{1+ε} N) small fixed multiplications (but perhaps unfortunate addition counts), we shall focus our attention on a class of multiplication schemes that enjoy low counts of all operation types. These schemes are based on the notion of the discrete Fourier transform (DFT), a topic that we now cover in enough detail to render the subsequent multiply algorithms accessible.
At this juncture we can think of a “signal” simply as a sequence of elements, in order to forge a connection between transform theory and the field of signal processing. Throughout the remainder of this chapter, signals
might be sequences of polynomial coefficients, or sequences in general, and will be denoted by x = (xn), n ∈ [0, D − 1], for some “signal length” D.
The first essential notion is that multiplication is a kind of convolution. We shall make that connection quite precise later, observing for the moment that the DFT is a natural transform to employ in convolution problems. For the DFT has the unique property of converting convolution to a less expensive dyadic product. We start with a definition:
Definition 9.5.3 (The discrete Fourier transform (DFT)). Let x be a signal of length D consisting of elements belonging to some algebraic domain in which D^{−1} exists, and let g be a primitive D-th root of unity in that domain; that is, g^k = 1 if and only if k ≡ 0 (mod D). Then the discrete Fourier transform of x is that signal X = DFT(x) whose elements are
X_k = Σ_{j=0}^{D−1} x_j g^{−jk},        (9.20)

with the inverse DFT^{−1}(X) = x given by

x_j = (1/D) Σ_{k=0}^{D−1} X_k g^{jk}.        (9.21)
That the transform DFT^{−1} is well-defined as the correct inverse is left as an exercise. There are several important manifestations of the DFT:
Complex-field DFT: x, X ∈ C^D, g a primitive D-th root of 1 such as e^{2πi/D};
Finite-field DFT: x, X ∈ F_{p^k}^D, g a primitive D-th root of 1 in the same field;
Integer-ring DFT: x, X ∈ Z_N^D, g a primitive D-th root of 1 in the ring, D^{−1}, g^{−1} exist.
It should be pointed out that the above are common examples, yet there are many more possible scenarios. As just one extra example, one may define a DFT over quadratic fields (see Exercise 9.50).
In the first instance of complex fields, the practical implementations involve floating-point arithmetic to handle complex numbers (though when the signal has only real elements, significant optimizations apply, as we shall see). In the second, finite-field, cases one uses field arithmetic with all terms reduced (mod p). The third instance, the ring-based DFT, is sometimes applied simultaneously for N = 2^n − 1 and N′ = 2^n + 1, in which cases the assignments g = 2 and D = n, D′ = 2n, respectively, can be made when n is coprime to both N, N′.
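As a small worked instance of the integer-ring case (the parameter choices are ours): with n = 5 we have N = 2^5 − 1 = 31, g = 2, D = 5, and D^{−1} = 25 (mod 31); Definition 9.5.3 can then be checked directly in Python:

# A toy integer-ring DFT in Z_31 with g = 2, D = 5 (2^5 = 32 = 1 mod 31).
N, g, D = 31, 2, 5
Dinv = pow(D, -1, N)                 # 25, since 5 * 25 = 125 = 1 (mod 31)

def dft(x):                          # equation (9.20)
    return [sum(xj * pow(g, -j*k, N) for j, xj in enumerate(x)) % N
            for k in range(D)]

def idft(X):                         # equation (9.21)
    return [Dinv * sum(Xk * pow(g, j*k, N) for k, Xk in enumerate(X)) % N
            for j in range(D)]

x = [3, 1, 4, 1, 5]
assert idft(dft(x)) == x             # the round trip recovers the signal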
It should be said that there exists a veritable menagerie of alternative transforms, many of them depending on basis functions other than the complex exponential basis functions of the traditional DFT; and often, such alternatives admit of fast algorithms, or assume real signals, and so on. Though such transforms lie beyond the scope of the present book, we observe
that some of them are also suited for the goal of convolution, so we name a few: The Walsh–Hadamard transform, for which one needs no multiplication, only addition; the discrete cosine transform (DCT), which is a real-signal, real-multiplication analogue to the DFT; various wavelet transforms, which sometimes admit of very fast (O(N) rather than O(N ln N)) algorithms; real-valued FFT, which uses either cos or sin in real-only summands; the real-signal Hartley transform, and so on. Various of these options are discussed in [Crandall 1994b, 1996a].
Just to clear the air, we hereby make explicit the almost trivial difference between the DFT and the celebrated fast Fourier transform (FFT). The FFT is an operation belonging to the general class of divide-and-conquer algorithms, and which calculates the DFT of Definition 9.5.3. The FFT will typically appear in our algorithm layouts in the form X = FFT(x), where it is understood that the DFT is being calculated. Similarly, an operation FFT^{−1}(x) returns the inverse DFT. We make the distinction explicit because “FFT” is in some sense a misnomer: The DFT is a certain sum—an algebraic quantity—yet the FFT is an algorithm. Here is a heuristic analogy to the distinction: In this book, the equivalence classes x (mod N) are theoretical entities, whereas the operation of reducing x modulo p we have chosen to write a little differently, as x mod p. By the same token, within an algorithm the notation X = FFT(x) means that we are performing an FFT operation on the signal x; and this operation gives, of course, the result DFT(x). (Yet another reason to make the almost trivial distinction is that we have known students who incorrectly infer that an FFT is some kind of “approximation” to the DFT, when in fact, the FFT is sometimes more accurate than a literal DFT summation, in the sense of roundoff error, mainly because of reduced operation count for the FFT.)
The basic FFT algorithm notion has been traced all the way back to some observations of Gauss, yet some authors ascribe the birth of the modern theory to the Danielson–Lanczos identity, applicable when the signal length D is even:
DFT(x) = Σ_{j=0}^{D−1} x_j g^{−jk} = Σ_{j=0}^{D/2−1} x_{2j} (g²)^{−jk} + g^{−k} Σ_{j=0}^{D/2−1} x_{2j+1} (g²)^{−jk}.        (9.22)

A beautiful identity indeed: A DFT sum for signal length D is split into two sums, each of length D/2. In this way the Danielson–Lanczos identity ignites a recursive method for calculating the transform. Note the so-called twiddle factors g^{−k}, which figure naturally into the following recursive form of FFT. In this and subsequent algorithm layouts we denote by len(x) the length of a signal x. In addition, when we perform element concatenations of the form (a_j)_{j∈J} we mean the result to be a natural, left-to-right, element concatenation as the increasing index j runs through a given set J. Similarly, U ∪ V is a signal having the elements of V appended to the right of the elements of U.
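A minimal recursive sketch of such an FFT (ours, over the complex field, with len(x) a power of two) reads:

# Recursive FFT driven by the Danielson-Lanczos identity (9.22):
# transform the even- and odd-indexed halves, then recombine with
# the twiddle factors g^(-k), here g = e^(2*pi*i/D).
import cmath

def fft(x):
    D = len(x)                       # assumed a power of 2
    if D == 1:
        return x[:]
    E = fft(x[0::2])                 # length-(D/2) DFT of even terms
    O = fft(x[1::2])                 # length-(D/2) DFT of odd terms
    X = [0] * D
    for k in range(D // 2):
        a = cmath.exp(-2j * cmath.pi * k / D) * O[k]    # g^(-k) * O_k
        X[k] = E[k] + a
        X[k + D // 2] = E[k] - a     # since g^(-(k + D/2)) = -g^(-k)
    return X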
The decimation-in-time FFT is said to be of the Cooley–Tukey form, the phrase meaning that, as in the Danielson–Lanczos splitting identity (9.22), we cut up (decimate) the time domain—the index on the original signal. The Gentleman–Sande FFT falls into the “decimation in frequency” class, for which a similar game is played on the k index of the transform elements Xk.
Algorithm 9.5.5 (FFT, in-place, in-order loop forms with bit-scramble).
Given a (D = 2^d)-element signal x, the functions herein perform an FFT via nested loops. The two essential FFTs are laid out as decimation-in-time (Cooley–Tukey) and decimation-in-frequency (Gentleman–Sande) forms. Note that these forms can be applied symbolically, or in number-theoretical transform mode, by identifying properly the root of unity and the ring or field operations.
1. [Cooley–Tukey, decimation-in-time FFT]
FFT(x) {
    scramble(x);
    n = len(x);
    for(m = 1; m < n; m = 2m) {             // m ascends over 2-powers.
        for(0 ≤ j < m) {
            a = g^{−jn/(2m)};
            for(i = j; i < n; i = i + 2m)
                (x_i, x_{i+m}) = (x_i + a x_{i+m}, x_i − a x_{i+m});
        }
    }
    return x;
}
2. [Gentleman–Sande, decimation-in-frequency FFT]
FFT(x) {
    n = len(x);
    for(m = n/2; m ≥ 1; m = m/2) {          // m descends over 2-powers.
        for(0 ≤ j < m) {
            a = g^{−jn/(2m)};
            for(i = j; i < n; i = i + 2m)
                (x_i, x_{i+m}) = (x_i + x_{i+m}, a(x_i − x_{i+m}));
        }
    }
    scramble(x);
    return x;
}
3. [In-place scramble procedure]
scramble(x) {                               // In-place, reverse-binary element scrambling.
    n = len(x);
    j = 0;
    for(0 ≤ i < n − 1) {
        if(i < j) (x_i, x_j) = (x_j, x_i);  // Swap elements.
        k = ⌊n/2⌋;
        while(k ≤ j) {
            j = j − k;
            k = ⌊k/2⌋;
        }
        j = j + k;
    }
    return;
}
It is to be noted that when one performs a convolution in the manner we shall exhibit later, the scrambling procedures are not needed, provided that one performs required FFTs in a specific order.
The correct order is the Gentleman–Sande form first (with its final scrambling procedure omitted), followed by the Cooley–Tukey form (with its initial scrambling omitted). This works out because, of course, scrambling is an operation of order two.
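To see this ordering in action, here is a small Python check (ours, over the complex field): the Gentleman–Sande loops are run without the final scramble, then inverted by the Cooley–Tukey loops—with conjugated twiddle factors and a final division by the length—without the initial scramble; the original signal returns even though the intermediate spectrum sits in bit-reversed order.

import cmath

def fft_dif(x):                      # Gentleman-Sande, final scramble omitted
    n = len(x)
    m = n // 2
    while m >= 1:
        for j in range(m):
            a = cmath.exp(-2j * cmath.pi * j / (2 * m))   # g^(-j n/(2m))
            for i in range(j, n, 2 * m):
                x[i], x[i + m] = x[i] + x[i + m], a * (x[i] - x[i + m])
        m //= 2
    return x

def ifft_dit(x):                     # Cooley-Tukey, initial scramble omitted;
    n = len(x)                       # conjugated twiddles and 1/n scaling
    m = 1
    while m < n:
        for j in range(m):
            a = cmath.exp(2j * cmath.pi * j / (2 * m))
            for i in range(j, n, 2 * m):
                t = a * x[i + m]
                x[i], x[i + m] = x[i] + t, x[i] - t
        m *= 2
    return [v / n for v in x]

x = [complex(v) for v in (1, 2, 3, 4, 5, 6, 7, 8)]
y = ifft_dit(fft_dif(x[:]))          # spectrum stays bit-scrambled in between
assert all(abs(u - v) < 1e-9 for u, v in zip(x, y))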
Happily, in cases where scrambling is not desired, or when contiguous memory access is important (e.g., on vector computers), there is the Stockham FFT, which avoids bit-scrambling and also has an innermost loop that runs essentially consecutively through data memory. The cost of all this is that one must use an extra copy of the data. The typical implementations of the Stockham FFT are elegant [Van Loan 1992], but there is a particular variant that has proved quite useful on modern vector machinery. This special variant is called the “ping-pong” FFT, because one goes back and forth between the original data and a separate copy. The following algorithm display is based on a suggested design of [Papadopoulos 1999]:
Algorithm 9.5.6 (FFT, “ping-pong” variant, in-order, no bit-scramble).
Given a (D = 2^d)-element signal x, a Stockham FFT is performed, but with the original x and external data copy y used in alternating fashion. We interpret X, Y below as pointers to the (complex) signals x, y, respectively, but operating under the usual rules of pointer arithmetic; e.g., X[0] is the first complex datum of x initially, but if 4 is added to pointer X, then X[0] = x_4, and so on. If exponent d is even, pointer X has the FFT result, else pointer Y has it.
1. [Initialize]
J = 1;
X = x; Y = y;                        // Assign memory pointers.
2. [Outer loop]
for(d ≥ i > 0) {
    m = 0;
    while(m < D/2) {
        a = e^{−2πim/D};
        for(J ≥ j > 0) {
            Y[0] = X[0] + X[D/2];
            Y[J] = a(X[0] − X[D/2]);
            X = X + 1;
            Y = Y + 1;