
Prime Numbers


492 Chapter 9 FAST ALGORITHMS FOR LARGE-INTEGER ARITHMETIC

As a brief digression, we should note here that the original Goldbach conjecture is true if a different signal of infinite length, namely

G = (1, 1, 1, 0, 1, 1, 0, 1, 1, 0, . . .),

where the 1's occur at indices (p − 3)/2 for the odd primes p = 3, 5, 7, 11, 13, . . ., has the property that the acyclic convolution $G \times_A G$ has no zero elements. In this case the n-th element of the acyclic convolution is precisely the number of Goldbach representations of 2n + 6.
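The digression above can be tried on a truncated signal. The following is a minimal sketch, not part of the original text: the function names are ours, primality is tested by trial division, and only the first len(G) convolution elements are trusted because the infinite signal has been cut off. Representations are counted as ordered pairs of odd primes.

```python
def odd_prime_signal(length):
    """G[i] = 1 when 2*i + 3 is prime, else 0 (indices (p - 3)/2 of odd primes p)."""
    def is_prime(m):
        if m < 2:
            return False
        d = 2
        while d * d <= m:
            if m % d == 0:
                return False
            d += 1
        return True
    return [1 if is_prime(2 * i + 3) else 0 for i in range(length)]


def acyclic_self_convolution(g):
    """Acyclic (linear) convolution of g with itself, length 2*len(g) - 1."""
    out = [0] * (2 * len(g) - 1)
    for i, gi in enumerate(g):
        if gi:
            for j, gj in enumerate(g):
                out[i + j] += gi * gj
    return out


G = odd_prime_signal(50)
# Element n counts ordered pairs of odd primes summing to 2n + 6;
# entries beyond index len(G) - 1 are incomplete for a truncated signal.
reps = acyclic_self_convolution(G)[:len(G)]
print(reps[:6])
```

For instance, the element at n = 2 counts 10 = 3 + 7 = 5 + 5 = 7 + 3.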

Back to Theorem 9.5.13: It is advantageous to study the length-N DFT Y of the aforementioned signal y. This DFT turns out to be a famous sum:

$$Y_k(N) = c_N(k) = \sum_{\gcd(j,N)=1} e^{\pm 2\pi i jk/N}, \tag{9.26}$$

where j is understood to run over those elements in the interval [0, N − 1] that are coprime to N, so the sign choice in the exponent doesn't matter, while $c_N(k)$ is the standard notation for the Ramanujan sum, a sum already known to enjoy intriguing multiplicative properties [Hardy and Wright 1979]. In fact, the appearance of the Ramanujan sum in Section 1.4.4 suggests that it makes sense for $c_N$ also to have some application in discrete convolution studies. We leave the proof of Theorem 9.5.13 to the reader (see Exercise 9.40), but wish to make several salient points. First, the sum in relation (9.26) can itself be thought of as a result of "sieving" out finite sums corresponding to the divisors of N. This gives rise to interesting series algebra. Second, it is remarkable that the cyclic length-N convolution of y with itself can be given a closed form. The result is

$$(y \times y)_n = \varphi_2(N, n) = \prod_{p \mid N} (p - \theta(n, p)), \tag{9.27}$$

where θ(n, p) is 1 if p|n, else 2. Thus, for 0 ≤ n < N , this product expression is the exact number of representations of either n or n+N as a+b with both a, b coprime to N . As discussed in the exercises, to complete this line of reasoning one must invoke negacyclic convolution ideas (or some other means such as sieving) to show that the representations of n + N are, for an appropriate range n, less than those of n itself. These observations will, after some final arguments, prove Theorem 9.5.13.
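The closed form can be compared with a brute-force cyclic convolution. The sketch below is ours rather than the book's: it takes y to be the indicator signal of residues coprime to N, and it assumes N is squarefree (here N = 30 and N = 210), which is the case a product over the primes p dividing N covers directly. All function names are hypothetical.

```python
from math import gcd


def coprime_signal(N):
    """y[j] = 1 if gcd(j, N) == 1, else 0, for j in [0, N - 1]."""
    return [1 if gcd(j, N) == 1 else 0 for j in range(N)]


def cyclic_self_convolution(y):
    """Direct O(N^2) cyclic convolution of y with itself."""
    N = len(y)
    return [sum(y[j] * y[(n - j) % N] for j in range(N)) for n in range(N)]


def prime_divisors(N):
    """Distinct prime divisors of N by trial division."""
    ps, d = [], 2
    while d * d <= N:
        if N % d == 0:
            ps.append(d)
            while N % d == 0:
                N //= d
        d += 1
    if N > 1:
        ps.append(N)
    return ps


def phi2(N, n):
    """Product over primes p | N of (p - theta(n, p)), theta = 1 if p | n else 2."""
    result = 1
    for p in prime_divisors(N):
        result *= p - (1 if n % p == 0 else 2)
    return result


N = 30  # squarefree by assumption
conv = cyclic_self_convolution(coprime_signal(N))
closed = [phi2(N, n) for n in range(N)]
print(conv == closed)
```

Odd n give a zero factor (2 − 2) from the prime 2, matching the fact that two residues coprime to an even N are both odd.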

Now to yet another application of convolution. In 1847 E. Kummer discovered that if p > 2 is a regular prime, then Fermat's last theorem, that

$$x^p + y^p = z^p$$

has no Diophantine solution with $xyz \neq 0$, holds. (We note in passing that FLT is now a genuine theorem of A. Wiles, but the techniques here predated that work and still have application to such remaining open problems as the Vandiver conjecture.) Furthermore, p is regular if it does not divide any of the numerators of the even-index Bernoulli numbers

$$B_2, B_4, \ldots, B_{p-3}.$$

9.5 Large-integer multiplication


There is an elegant relation due to Shokrollahi in 1994 (see [Buhler et al. 2000]) that gives a congruence for precisely these Bernoulli numbers:

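Theorem 9.5.14. Let g be a primitive root of the odd prime p, and set

$$c_j = \left\lfloor \frac{(g^{-1} \bmod p)(g^{j} \bmod p)}{p} \right\rfloor g^{-j}$$

for $j \in [0, p-2]$. Then for $k \in [1, (p-3)/2]$ we have

$$\sum_{j=0}^{p-2} c_j\, g^{2kj} \equiv \left(1 - g^{2k}\right) \frac{B_{2k}}{2kg} \pmod p. \tag{9.28}$$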

We see that Shokrollahi's relation involves a length-(p − 1) DFT, with the operant field being $F_p$. One could proceed with an FFT algorithm, except that there are two problems with that approach. First, the best lengths for standard FFTs are powers of two; and second, one cannot use floating-point arithmetic, especially when the prime p is large, unless the precision is extreme (and somehow guaranteed). But we have the option of performing a DFT itself via convolution (see Algorithm 9.6.6), so the Shokrollahi procedure for determining regular primes, and indeed for finding precise irregularity indices of any prime, can be effected via power-of-two length convolutions. As we shall see later, there are "symbolic FFT" means to do this, notably in Nussbaumer convolution, which avoids floating-point arithmetic and so is suitable for pure-integer convolution. These approaches (the Shokrollahi identity and Nussbaumer convolution) have been used together to determine all regular primes p < 12000000 [Buhler et al. 2000].
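For small primes the congruence of Theorem 9.5.14 can be checked directly, with no FFT at all. The sketch below is ours and makes these assumptions: c_j is read as floor((g^{-1} mod p)(g^j mod p)/p) multiplied by g^{-j} mod p, Bernoulli numbers come from the standard recurrence with exact fractions, and the primitive root is found by brute force. The helper names are hypothetical, and Python 3.8 or later is assumed for modular inverses via pow.

```python
from fractions import Fraction
from math import comb


def bernoulli_numbers(m):
    """Exact [B_0, ..., B_m] from sum_{k<=n} C(n+1, k) B_k = 0 (so B_1 = -1/2)."""
    B = [Fraction(1)] + [Fraction(0)] * m
    for n in range(1, m + 1):
        B[n] = -sum(comb(n + 1, k) * B[k] for k in range(n)) / (n + 1)
    return B


def primitive_root(p):
    """Smallest primitive root of an odd prime p (brute force, small p only)."""
    for g in range(2, p):
        v, order = g, 1
        while v != 1:
            v = v * g % p
            order += 1
        if order == p - 1:
            return g


def shokrollahi_sides(p, k, g, B):
    """Left and right sides of relation (9.28), both reduced mod p."""
    g_inv = pow(g, -1, p)
    lhs = 0
    for j in range(p - 1):
        c_j = ((g_inv * pow(g, j, p)) // p) * pow(g, -j, p)
        lhs = (lhs + c_j * pow(g, 2 * k * j, p)) % p
    b = B[2 * k]
    rhs = (1 - pow(g, 2 * k, p)) * b.numerator * pow(b.denominator * 2 * k * g, -1, p) % p
    return lhs, rhs


def is_regular(p):
    """p is regular if no B_2, ..., B_{p-3} has numerator divisible by p."""
    g, B = primitive_root(p), bernoulli_numbers(p - 3)
    return all(shokrollahi_sides(p, k, g, B)[0] != 0 for k in range(1, (p - 3) // 2 + 1))


print([p for p in (5, 7, 11, 13, 17, 19, 23, 29, 31, 37) if not is_regular(p)])
```

Since 1 − g^{2k} is nonzero mod p for k in the stated range, the left-hand sum vanishes exactly when p divides the numerator of B_{2k}; the loop in is_regular uses only that sum, which is the point of the relation.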

9.5.4 Discrete weighted transform (DWT) methods

One variant of DFT-based convolution that has proved important for modern primality and factorization studies (especially when the relevant integers are large, say in the region of $2^{1000000}$ and beyond) is the discrete weighted transform (DWT). This transform is defined as follows:

Definition 9.5.15 (Discrete weighted transform (DWT)). Let x be a signal of length D, and let a be a signal (called the weight signal) of the same length, with the property that every $a_j$ is invertible. Then the discrete weighted transform $X = DWT(x, a)$ is the signal of elements

$$X_k = \sum_{j=0}^{D-1} (a * x)_j\, g^{-jk}, \tag{9.29}$$

with the inverse $DWT^{-1}(X, a) = x$ given by

$$x_j = \frac{1}{D a_j} \sum_{k=0}^{D-1} X_k\, g^{jk}. \tag{9.30}$$


Furthermore, the weighted cyclic convolution of two signals is the signal $z = x \times_a y$ having

$$z_n = \frac{1}{a_n} \sum_{j+k \equiv n \ (\mathrm{mod}\ D)} (a * x)_j (a * y)_k. \tag{9.31}$$

It is clear that the DWT is simply the DFT of the dyadic product signal $a * x$ consisting of elements $a_j x_j$. The considerable advantage of the DWT is that particular weight signals give rise to useful alternative convolutions. In some cases, the DWT eliminates the need for the zero padding of the standard FFT multiplication Algorithm 9.5.12. We first state an important result:

Theorem 9.5.16 (Weighted convolution theorem). Let signals x, y and weight signal a have the same length D. Then the weighted cyclic convolution of x, y satisfies

$$x \times_a y = DWT^{-1}(DWT(x, a) * DWT(y, a),\ a),$$

that is to say,

$$(x \times_a y)_n = \frac{1}{D a_n} \sum_{k=0}^{D-1} (X * Y)_k\, g^{kn}.$$

Thus FFT algorithms may be applied now to weighted convolution. In particular, one may compute not just the cyclic, but also the negacyclic, convolution in this manner, because the specific choice of weight signal

$$a = (A^j), \quad j \in [0, D-1]$$

yields, when A is a primitive 2D-th root of unity in the field, the identity

$$x \times_{-} y = x \times_a y, \tag{9.32}$$

which means that the weighted cyclic in this case is the negacyclic. Note that when the D-th root g has a square root in the field, as is the case with the complex field arithmetic, we can simply assign $A^2 = g$ to effect the negacyclic. Another interesting instance of generator A, namely when A is a primitive 4D-th root of unity, gives the so-called right-angle convolution [Crandall 1996a].
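The negacyclic identity is easy to observe numerically over the complex field. The following is a sketch under our own assumptions: a naive O(D^2) DFT stands in for an FFT, the weight generator is A = exp(πi/D) so that A^2 = g = exp(2πi/D), and the test signals are arbitrary small integers. None of the function names come from the text.

```python
import cmath


def dft(v, sign):
    """Naive DFT: out[s] = sum_r v[r] * exp(sign * 2*pi*i*r*s/D)."""
    D = len(v)
    return [sum(v[r] * cmath.exp(sign * 2 * cmath.pi * 1j * r * s / D) for r in range(D))
            for s in range(D)]


def weighted_convolution(x, y, a):
    """Weighted cyclic convolution via DWT, dyadic product, and inverse DWT."""
    D = len(x)
    X = dft([a[j] * x[j] for j in range(D)], -1)   # DWT(x, a)
    Y = dft([a[j] * y[j] for j in range(D)], -1)   # DWT(y, a)
    Z = [X[k] * Y[k] for k in range(D)]            # dyadic product
    z = dft(Z, +1)
    return [z[n] / (D * a[n]) for n in range(D)]   # inverse DWT


def negacyclic_direct(x, y):
    """Direct negacyclic convolution: wrapped-around terms enter with a minus sign."""
    D = len(x)
    out = [0] * D
    for j in range(D):
        for k in range(D):
            if j + k < D:
                out[j + k] += x[j] * y[k]
            else:
                out[j + k - D] -= x[j] * y[k]
    return out


x = [3, 1, 4, 1, 5, 9, 2, 6]
y = [2, 7, 1, 8, 2, 8, 1, 8]
D = len(x)
A = cmath.exp(1j * cmath.pi / D)       # primitive 2D-th root of unity, A**D = -1
a = [A ** j for j in range(D)]
via_dwt = [round(v.real) for v in weighted_convolution(x, y, a)]
print(via_dwt == negacyclic_direct(x, y))
```

The minus sign on wrapped terms arises because a_j a_k / a_n equals A^D = −1 whenever j + k = n + D.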

These observations lead in turn to an important algorithm that has been used to advantage in modern factorization studies. By using the DWT, the method obviates zero padding entirely. Consider the problem of multiplication of two numbers, modulo a Fermat number $F_n = 2^{2^n} + 1$. This operation can happen, of course, a great number of times in attempts to factor an $F_n$. There are at least three ways to attempt $(xy) \bmod F_n$ via convolution of length-D signals where D and a power-of-two base B are chosen such that $F_n = B^D + 1$:

(1) Zero-pad each of x, y up to length 2D, perform cyclic convolution, do carry adjust as necessary, take the result (mod $F_n$).


(2) Perform length-D weighted convolution, with weight generator A a primitive (2D)-th root of unity, do carry adjust as necessary.

(3) Create length-(D/2) "fold-over" signals, as $x' = L(x) + iH(x)$ and similarly for a $y'$, employ a weighted convolution with generator A a primitive (4D)-th root of unity, do carry adjust.

Method (1) could, of course, involve Algorithm 9.5.12, with perhaps a fast Fermat-mod of Section 9.2.3; but one could instead use a pure-integer Nussbaumer convolution discussed later. Method (2) is the negacyclic approach, in which the weighted convolution can be seen to be multiplication (mod $F_n$); that is, the mod operation is "free" (see Exercises). Method (3) is the right-angle convolution approach, which also gives the mod operation for free (see Exercises). Note that neither method (2) nor method (3) involves zero-padding, and that method (3) actually halves the signal lengths (at the expense of complex arithmetic). We focus on method (3), to state the following algorithm, which, as with Algorithm 9.5.12, is often implemented in a floating-point paradigm:

Algorithm 9.5.17 (DWT multiplication modulo Fermat numbers). For a given Fermat number $F_n = 2^{2^n} + 1$, and positive integers $x, y \not\equiv -1 \pmod{F_n}$, this algorithm returns $(xy) \bmod F_n$. We choose B, D such that $F_n = B^D + 1$, with the inputs x, y interpreted as length-D signals of base-B digits. We assume that there exists a primitive 4D-th root of unity, A, in the field.

1. [Initialize]
   E = D/2; // Halve the signal length and "fold-over" the signals.
   x = L(x) + iH(x); // Length-E signals.
   y = L(y) + iH(y);
   a = (1, A, A^2, . . . , A^{E−1}); // Weight signal.
2. [Apply transforms]
   X = DWT(x, a); // Via an efficient length-E FFT algorithm.
   Y = DWT(y, a);
3. [Dyadic product]
   Z = X ∗ Y;
4. [Inverse transform]
   z = DWT^{−1}(Z, a);
5. [Unfold signal]
   z = Re(z) ∪ Im(z); // Now z will have length D.
6. [Round digits]
   z = round(z); // Elementwise rounding to nearest integer.
7. [Adjust carry in base B]
   carry = 0;
   for(0 ≤ n < D) {
      v = z_n + carry;
      z_n = v mod B;
      carry = ⌊v/B⌋;
   }
8. [Final modular adjustment]
   Include possible carry > 0 as a high digit of z;
   z = z mod F_n; // Via another 'carry' loop or via special-form mod methods.
   return z;

Note that in the steps [Adjust carry in base B] and [Final modular adjustment] the logic depends on the digits of the reconstructed integer z being positive. We say this because there are efficient variants using balanced-digit representation, in which variants care must be taken to interpret negative digits (and negative carry) correctly.
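The steps of Algorithm 9.5.17 can be sketched in a few lines of Python. Everything below is an illustration under stated assumptions, not the implementation used in the cited work: a naive DFT replaces the FFT, L(x) and H(x) are taken to be the low and high halves of the digit signal, and for the folded length E = D/2 the weight generator is chosen with A^E = i (a primitive 4E-th root of unity), which is our reading of the property the fold-over relies on. Python's floor division makes the carry loop valid for negative digits, so the final reduction is done on the reconstructed integer.

```python
import cmath


def dft(v, sign):
    """Naive DFT: out[s] = sum_r v[r] * exp(sign * 2*pi*i*r*s/len(v))."""
    m = len(v)
    return [sum(v[r] * cmath.exp(sign * 2 * cmath.pi * 1j * r * s / m) for r in range(m))
            for s in range(m)]


def fermat_mul(x, y, n, D):
    """(x*y) mod F_n for 0 <= x, y < 2**(2**n), using D digits with B**D = 2**(2**n)."""
    bits = (1 << n) // D                      # B = 2**bits
    B = 1 << bits
    E = D // 2                                # folded signal length

    def fold(v):
        digits = [(v >> (bits * j)) & (B - 1) for j in range(D)]
        return [complex(digits[j], digits[j + E]) for j in range(E)]   # L + i*H

    A = cmath.exp(2 * cmath.pi * 1j / (4 * E))    # assumption: A**E == i
    a = [A ** j for j in range(E)]
    X = dft([a[j] * u for j, u in enumerate(fold(x))], -1)
    Y = dft([a[j] * u for j, u in enumerate(fold(y))], -1)
    Z = [X[k] * Y[k] for k in range(E)]
    w = dft(Z, +1)
    z = [w[j] / (E * a[j]) for j in range(E)]     # inverse DWT
    digits = [round(v.real) for v in z] + [round(v.imag) for v in z]   # unfold, round
    carry = 0
    for i in range(D):                            # carry adjust in base B
        v = digits[i] + carry
        digits[i] = v % B
        carry = v // B                            # floor division, may be negative
    value = sum(d << (bits * i) for i, d in enumerate(digits)) + (carry << (bits * D))
    return value % ((1 << (1 << n)) + 1)


F5 = 2 ** 32 + 1
print(fermat_mul(0xDEADBEEF, 0x12345678, 5, 8) == (0xDEADBEEF * 0x12345678) % F5)
```

With such short digits (4 bits here) rounding error is negligible; a production code would use far larger D together with an FFT, and must then watch the error budget.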

This algorithm was used in the discoveries of new factors of $F_{13}$, $F_{15}$, $F_{16}$, and $F_{18}$ [Brent et al. 2000] (see the Fermat factor tabulation in Section 1.3.2), and also to establish the composite character of $F_{22}$, $F_{24}$, and of various cofactors for other $F_n$ [Crandall et al. 1995], [Crandall et al. 1999]. In more recent times, [Woltman 2000] has implemented the algorithm to forge highly efficient factoring software for Fermat numbers (see remarks following Algorithm 7.4.4).

Another DWT variant has been used in the discovery of eight Mersenne primes $2^{1398269} - 1$, $2^{2976221} - 1$, $2^{3021377} - 1$, $2^{6972593} - 1$, $2^{13466917} - 1$, $2^{20996011} - 1$, $2^{24036583} - 1$, $2^{25964951} - 1$ (see Table 1.2), the last of which being the largest known explicit prime as of the present writing. For these discoveries, a network of volunteer users ran extensive Lucas–Lehmer tests that involve vast numbers of squarings modulo $p = 2^q - 1$. The algorithm variant in question has been called the irrational-base discrete weighted transform (IBDWT) [Crandall and Fagin 1994], [Crandall 1996a] for the reason that a special digit representation reminiscent of irrational-base expansion is used, which representation amounts to a discrete rendition of an attempt to expand in an irrational base. Let $p = 2^q - 1$ and observe first that if an integer x be represented in base B = 2 as

$$x = \sum_{j=0}^{q-1} x_j 2^j,$$

equivalently, x is the length-q signal $(x_j)$; and similarly for an integer y, then the cyclic convolution $x \times y$ has, without carry, the digits of $(xy) \bmod p$. Thus, in principle, the standard FFT multiply could be effected in this way, modulo Mersenne primes, without zero-padding. However, there are two problems with this approach. First, the arithmetic is merely bitwise, not exploiting typical machine advantages of word arithmetic. Second, one would have to invoke a length-q FFT. This can certainly be done (see Exercises), but power-of-two lengths are usually more efficient, definitely more prevalent.

It turns out that both of the obstacles to a not-zero-padded Mersenne multiply-mod can be overcome, if only we could somehow represent integers x in the irrational base $B = 2^{q/D}$, with 1 < D < q being some power of two.


This is because the representation

$$x = \sum_{j=0}^{D-1} x_j 2^{qj/D},$$

and similarly for y (where the digits in base B are generally irrational also), leads to the equivalence, without carry, of $(xy) \bmod p$ and $x \times y$. But now the signal lengths are powers of two, and the digits, although not integers, are some convenient word size. It turns out to be possible to mimic this irrational base expansion, by using a certain variable-base representation according to the following theorem:

Theorem 9.5.18 (Crandall). For $p = 2^q - 1$ (p not necessarily prime) and integers 0 ≤ x, y < p, choose signal length 1 < D < q. Interpret x as the signal $(x_0, \ldots, x_{D-1})$ from the variable-base representation

$$x = \sum_{j=0}^{D-1} x_j 2^{\lceil qj/D \rceil} = \sum_{j=0}^{D-1} x_j 2^{\sum_{i=1}^{j} d_i},$$

where

$$d_i = \lceil qi/D \rceil - \lceil q(i-1)/D \rceil,$$

and each digit $x_j$ is in the interval $[0, 2^{d_{j+1}} - 1]$, and all of this similarly for y. Define a length-D weight signal a by

$$a_j = 2^{\lceil qj/D \rceil - qj/D}.$$

Then the weighted cyclic convolution $x \times_a y$ is a signal of integers, equivalent without carry to the variable base representation of $(xy) \bmod p$.

This theorem is proved and discussed in [Crandall and Fagin 1994], [Crandall 1996a], the only nontrivial part being the proof that the elements of the weighted convolution $x \times_a y$ are actually integers. The theorem leads immediately to

Algorithm 9.5.19 (IBDWT multiplication modulo Mersenne numbers). For a given Mersenne number $p = 2^q - 1$ (need not be prime), and positive integers x, y, this algorithm returns, via floating-point FFT, the variable-base representation of $(xy) \bmod p$. Herein we adopt the nomenclature of Theorem 9.5.18, and assume a signal length $D = 2^k$ such that $2^{\lceil q/D \rceil}$ is an acceptable word size (small enough that we avoid unacceptable numerical error).

1. [Initialize base representations]
   Create the signal x as the collection of variable-base digits (x_j), as in Theorem 9.5.18, and do the same for y;
   Create the weight signal a, also as in Theorem 9.5.18;
2. [Apply transforms]
   X = DWT(x, a); // Perform via floating-point length-D FFT algorithm.
   Y = DWT(y, a);
3. [Dyadic product]
   Z = X ∗ Y;
4. [Inverse transform]
   z = DWT^{−1}(Z, a);
5. [Round digits]
   z = round(z); // Elementwise rounding to nearest integer.
6. [Adjust carry in variable base]
   carry = 0;
   for(0 ≤ n < len(z)) {
      B = 2^{d_{n+1}}; // Size of place-n digits.
      v = z_n + carry;
      z_n = v mod B;
      carry = ⌊v/B⌋;
   }
7. [Final modular adjustment]
   Include possible carry > 0 as a high digit of z;
   z = z mod p; // Via carry loop or special-form mod.
   return z;
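A compact Python rendition of the IBDWT multiply may help fix ideas. It is a sketch under our own assumptions: the exponents in the variable-base representation are taken to be ceil(qj/D), a naive DFT in double precision stands in for the FFT, inputs satisfy 0 ≤ x, y < 2^q, and the final reduction is done on the reconstructed Python integer rather than by a special-form mod. The names ibdwt_mul, pos, and width are hypothetical.

```python
import cmath


def dft(v, sign):
    """Naive DFT: out[s] = sum_r v[r] * exp(sign * 2*pi*i*r*s/len(v))."""
    m = len(v)
    return [sum(v[r] * cmath.exp(sign * 2 * cmath.pi * 1j * r * s / m) for r in range(m))
            for s in range(m)]


def ibdwt_mul(x, y, q, D):
    """(x*y) mod (2**q - 1) via a length-D weighted cyclic convolution."""
    p = (1 << q) - 1
    pos = [-(-q * j // D) for j in range(D + 1)]        # ceil(q*j/D); pos[D] == q
    width = [pos[j + 1] - pos[j] for j in range(D)]     # d_{j+1}: bits held by digit j
    a = [2.0 ** (pos[j] - q * j / D) for j in range(D)] # weight signal

    def digits(v):
        return [(v >> pos[j]) & ((1 << width[j]) - 1) for j in range(D)]

    X = dft([a[j] * u for j, u in enumerate(digits(x))], -1)
    Y = dft([a[j] * u for j, u in enumerate(digits(y))], -1)
    Z = [X[k] * Y[k] for k in range(D)]
    w = dft(Z, +1)
    z = [round((w[n] / (D * a[n])).real) for n in range(D)]   # inverse DWT, then round
    carry = 0
    for n in range(D):                                  # carry adjust in variable base
        v = z[n] + carry
        z[n] = v % (1 << width[n])
        carry = v >> width[n]
    value = sum(z[n] << pos[n] for n in range(D)) + (carry << q)
    return value % p


q, D = 89, 16
p = 2 ** q - 1
x, y = pow(3, 50, p), pow(7, 30, p)
print(ibdwt_mul(x, y, q, D) == (x * y) % p)
```

Here q/D is under 6 bits per digit, so double precision has ample headroom; choosing D as small as accuracy allows is the trade-off discussed in the text.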

 

As this scheme is somewhat intricate, an example is appropriate. Consider multiplication modulo the Mersenne number $p = 2^{521} - 1$. We take q = 521 and choose signal length D = 16. Then the signal d of Theorem 9.5.18 can be seen to be

d = (33, 33, 32, 33, 32, 33, 32, 33, 33, 32, 33, 32, 33, 32, 33, 32),

and the weight signal will be

$$a = \left(1, 2^{7/16}, 2^{7/8}, 2^{5/16}, 2^{3/4}, 2^{3/16}, 2^{5/8}, 2^{1/16}, 2^{1/2}, 2^{15/16}, 2^{3/8}, 2^{13/16}, 2^{1/4}, 2^{11/16}, 2^{1/8}, 2^{9/16}\right).$$

In a typical floating-point FFT implementation, this a signal is, of course, given inexact elements. But in Theorem 9.5.18 the weighted convolution (as calculated approximately, just prior to the [Round digits] step of Algorithm 9.5.19) consists of exact integers. Thus, the game to be played is to choose signal length D to be as small as possible (the smaller, the faster the FFTs that do the DWT), while not allowing the rounding errors to give incorrect elements of z. Rigorous theorems on rounding error are hard to come by, although there are some observations—some rigorous and some not so—in [Crandall and Fagin 1994] and references therein. More modern treatments include the very useful book [Higham 1996] and the paper [Percival 2003] on generalized IBDWT; see Exercise 9.48.
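The digit sizes and weight exponents of the q = 521, D = 16 example can be regenerated exactly with rational arithmetic. This small sketch assumes the ceiling-based definitions d_i = ceil(qi/D) − ceil(q(i − 1)/D) and a_j = 2^(ceil(qj/D) − qj/D); the variable names are ours.

```python
from fractions import Fraction

q, D = 521, 16
pos = [-(-q * j // D) for j in range(D + 1)]            # ceil(q*j/D) via integer arithmetic
d = [pos[i] - pos[i - 1] for i in range(1, D + 1)]      # digit sizes d_1, ..., d_D
exponents = [Fraction(pos[j]) - Fraction(q * j, D) for j in range(D)]   # a_j = 2**exponents[j]

print(d)
print([str(e) for e in exponents])
```

Note that 32- and 33-bit digits are shown only to illustrate the bookkeeping: products of such digits exceed the 53-bit mantissa of IEEE double precision, so an actual floating-point run for q = 521 would call for a larger D.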

9.5.5 Number-theoretical transform methods

The DFT of Definition 9.5.3 can be defined over rings and fields other than the traditional complex field. Here we give some examples of transforms over finite


rings and fields. The primary observation is that over a ring or field, the DFT defining relations (9.20) and (9.21) need no modification whatever, as long as we understand the requisite operations to occur (legally) in the algebraic domain at hand. In particular, a number-theoretical DFT of length D supports cyclic convolution of length D, via the celebrated convolution Theorem 9.5.11, whenever both $D^{-1}$ and g, a primitive D-th root of unity, exist in the algebraic domain. With these constraints in mind, number-theoretical transforms have attained a solid niche, in regard to fast algorithms in the field of digital signal processing. Not just raw convolution, but other interesting applications of such transforms can be found in the literature. A typical example is the use of number-theoretical transforms for classical algebraic operations [Yagle 1995], while yet more applications are summarized in [Madisetti and Williams 1997].

Our first example will be the case that the relevant domain is $F_p$. For a prime p and some divisor $d \mid p - 1$ let the field be $F_p$ and consider the relevant transform to be

$$X_k = \sum_{j=0}^{(p-1)/d-1} x_j h^{-jk} \bmod p, \tag{9.33}$$

where h is an element of multiplicative order (p − 1)/d in $F_p$. Note that the mod operation can in principle be taken either after individual summands, or for the whole sum, or in some combination of these, so that for convenience we simply append the symbols "mod p" to indicate that a transform element $X_k$ is to be reduced to lie in the interval [0, p − 1]. Now the inverse transform is

$$x_j = -d \sum_{k=0}^{(p-1)/d-1} X_k h^{jk} \bmod p, \tag{9.34}$$

whose prefactor is just $((p-1)/d)^{-1} \bmod p \equiv -d$. These transforms can be used to provide increased precision for convolutions. The idea is to establish each convolution element (mod $p_r$) for some convenient set of primes $\{p_r\}$, whence the exact convolution can be reconstructed using the Chinese remainder theorem.
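The prefactor claim is a one-line congruence. Writing the transform length as n = (p − 1)/d (the symbol n is introduced only for this note), we have:

```latex
n d = p - 1 \equiv -1 \pmod{p}
\quad\Longrightarrow\quad
n^{-1} \equiv -d \pmod{p}.
% Example: p = 17, d = 2, n = 8, and 8 \cdot (-2) = -16 \equiv 1 \pmod{17}.
```

So no modular inversion is needed to normalize the inverse transform; a multiplication by −d suffices.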

Algorithm 9.5.20 (Integer convolution on a CRT prime set). Given two signals x, y each of length $N = 2^m$ having integer elements bounded by $0 \le x_j, y_j < M$, this algorithm returns the cyclic convolution x × y via the CRT with distinct prime moduli $p_1, p_2, \ldots, p_q$.

1. [Initialize]
   Find a set of primes of the form p_r = a_r N + 1 for r = 1, . . . , q such that ∏ p_r > N M^2;
   for(1 ≤ r ≤ q) {
      Find a primitive root g_r of p_r;
      h_r = g_r^{a_r} mod p_r; // h_r is an N-th root of 1.
   }
2. [Loop over primes]
   for(1 ≤ r ≤ q) {
      h = h_r; p = p_r; d = a_r; // Preparing for DFTs.
      X^(r) = DFT(x); // Via relation (9.33).
      Y^(r) = DFT(y);
3. [Dyadic product]
      Z^(r) = X^(r) ∗ Y^(r);
4. [Inverse transforms]
      z^(r) = DFT^{−1}(Z^(r)); // Via relation (9.34).
   }
5. [Reconstruct elements]
   From the now known relations z_j ≡ z_j^(r) (mod p_r) find each (unambiguous) element z_j in [0, N M^2) via CRT reconstruction, using such as Algorithm 2.1.7 or 9.5.26;
   return z;

 

What this algorithm does is allow us to invoke length-$2^m$ FFTs for the DFT and its inverse, except that only integer arithmetic is to be used in the usual FFT butterflies (and of course the butterflies are continually reduced (mod $p_r$) during the FFT calculations). This scheme has been used to good effect in [Montgomery 1992a] in various factorization implementations. Note that if the forward DFT (9.33) is performed with a decimation-in-frequency (DIF) algorithm, and the reverse DFT (9.34) with a DIT algorithm, there is no need to invoke the scramble function of Algorithm 9.5.5 in either of the FFT functions shown there.
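Algorithm 9.5.20 can be sketched as follows. Several choices here are ours and not taken from the text: transforms are naive O(N^2) sums rather than FFT butterflies, primes are found by trial division, an element of order N is obtained by raising small candidates to the power (p − 1)/N and testing h^(N/2) ≡ −1 (instead of first finding a primitive root), and the CRT is done incrementally. Python 3.8 or later is assumed for pow with exponent −1.

```python
def is_prime(m):
    """Trial-division primality test (adequate for small illustrative moduli)."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True


def root_of_order(N, p):
    """Element of exact order N in F_p, for N >= 2 a power of two dividing p - 1."""
    for c in range(2, p):
        h = pow(c, (p - 1) // N, p)
        if pow(h, N // 2, p) == p - 1:     # h**(N/2) == -1 forces order exactly N
            return h


def ntt(v, root, p):
    """Naive transform: out[k] = sum_j v[j] * root**(j*k) mod p."""
    N = len(v)
    return [sum(v[j] * pow(root, (j * k) % N, p) for j in range(N)) % p for k in range(N)]


def crt_convolution(x, y, M):
    """Cyclic convolution of x, y (length N = 2**m, entries in [0, M)) via CRT primes."""
    N = len(x)
    primes, prod, a = [], 1, 1
    while prod <= N * M * M:               # need the product of primes to exceed N*M^2
        if is_prime(a * N + 1):
            primes.append(a * N + 1)
            prod *= a * N + 1
        a += 1
    z, modulus = [0] * N, 1
    for p in primes:
        d = (p - 1) // N
        h = root_of_order(N, p)
        h_inv = pow(h, -1, p)
        X, Y = ntt(x, h_inv, p), ntt(y, h_inv, p)           # relation (9.33)
        Z = [X[k] * Y[k] % p for k in range(N)]
        zr = [(-d) * v % p for v in ntt(Z, h, p)]           # relation (9.34), prefactor -d
        inv = pow(modulus, -1, p)                           # incremental CRT step
        z = [zj + modulus * ((rj - zj) * inv % p) for zj, rj in zip(z, zr)]
        modulus *= p
    return z


def cyclic_direct(x, y):
    """Direct O(N^2) cyclic convolution, for comparison."""
    N = len(x)
    return [sum(x[j] * y[(n - j) % N] for j in range(N)) for n in range(N)]


x = [999999, 123456, 7, 0, 31415, 271828, 999998, 1]
y = [5, 999999, 424242, 13, 0, 65537, 100000, 999999]
print(crt_convolution(x, y, 10 ** 6) == cyclic_direct(x, y))
```

Each prime contributes only a residue of the true convolution element; the residues become the exact integer once the product of the moduli exceeds N M^2.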

A second example of useful number-theoretical transforms has been called the discrete Galois transform (DGT) [Crandall 1996a], with relevant field $F_{p^2}$ for $p = 2^q - 1$ a Mersenne prime. The delightful fact about such fields is that the multiplicative group order is

$$|F_{p^2}^*| = p^2 - 1 = 2^{q+1}(2^{q-1} - 1),$$

so that in practice, one can find primitive roots of unity of orders $N = 2^k$ as long as k ≤ q + 1. We can thus define discrete transforms of such lengths, as

$$X_k = \sum_{j=0}^{N-1} x_j h^{-jk} \bmod p, \tag{9.35}$$

 

where now all arithmetic is presumed, due to the known structure of $F_{p^2}$ for primes p ≡ 3 (mod 4), to involve complex (Gaussian) integers (mod p) with

$$N = 2^k,$$

$$x_j = \mathrm{Re}(x_j) + i\,\mathrm{Im}(x_j),$$

$$h = \mathrm{Re}(h) + i\,\mathrm{Im}(h),$$

the latter being an element of multiplicative order N in $F_{p^2}$, with the transform element $X_k$ itself being a Gaussian integer (mod p). Happily, there


is a way to find immediately an element of suitable order, thanks to the following result of [Creutzburg and Tasche 1989]:

Theorem 9.5.21 (Creutzburg and Tasche). Let $p = 2^q - 1$ be a Mersenne prime with q odd. Then

$$g = 2^{2^{q-2}} + i(-3)^{2^{q-2}}$$

is an element of order $2^{q+1}$ in $F_{p^2}^*$.
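Theorem 9.5.21 is easy to confirm for small Mersenne primes. In the sketch below (ours, with hypothetical function names) a Gaussian integer mod p is a pair (re, im), and the order test uses the fact that an element g has order exactly 2^(q+1) precisely when g raised to 2^q equals −1.

```python
def gmul(u, v, p):
    """Product of Gaussian integers u = (re, im) and v = (re, im), reduced mod p."""
    return ((u[0] * v[0] - u[1] * v[1]) % p, (u[0] * v[1] + u[1] * v[0]) % p)


def creutzburg_tasche_element(q):
    """g = 2**(2**(q-2)) + i * (-3)**(2**(q-2)) in F_{p^2}, p = 2**q - 1."""
    p = (1 << q) - 1
    e = 1 << (q - 2)
    return (pow(2, e, p), pow(-3, e, p)), p


def has_order_two_to_q_plus_one(q):
    """True when g**(2**q) == -1, which forces the order of g to be 2**(q+1)."""
    g, p = creutzburg_tasche_element(q)
    v = g
    for _ in range(q):              # q successive squarings give g**(2**q)
        v = gmul(v, v, p)
    return v == (p - 1, 0)


print([q for q in (3, 5, 7, 13, 17, 19, 31) if has_order_two_to_q_plus_one(q)])
```

For q = 3 the element is g = 4 + 2i modulo 7, and successive squarings give 5 + 2i, −i, and −1.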

These observations lead to the following integer convolution algorithm, in which we indicate the enhancements that can be invoked to reduce the complex arithmetic. In particular, we exploit the fact that integer signals are real, so the imaginary components of their elements vanish in the field, and thus the transform lengths are halved:

Algorithm 9.5.22 (Convolution via DGT (Crandall)). Given two signals x, y each of length $N = 2^k \ge 2$ and whose elements are integers in the interval [0, M], this algorithm returns the integer convolution x × y. The method used is convolution via "discrete Galois transform" (DGT).

1. [Initialize]
   Choose a Mersenne prime p = 2^q − 1 such that p > N M^2 and q > k;
   Use Theorem 9.5.21 to find an element g of order 2^{q+1};
   h = g^{2^{q+2−k}}; // h is now an element of order N/2.
2. [Fold signals to halve their lengths]
   x = (x_{2j} + i x_{2j+1}), j = 0, . . . , N/2 − 1;
   y = (y_{2j} + i y_{2j+1}), j = 0, . . . , N/2 − 1;
3. [Length-N/2 transforms]
   X = DFT(x); // Via, say, split-radix FFT (mod p), root h.
   Y = DFT(y);
4. [Special dyadic product]
   // X*_{−k} denotes the complex conjugate of X_{−k}, index taken mod N/2.
   for(0 ≤ k < N/2) {
      Z_k = (X_k + X*_{−k})(Y_k + Y*_{−k}) + 2(X_k Y_k − X*_{−k} Y*_{−k}) − h^{−k}(X_k − X*_{−k})(Y_k − Y*_{−k});
   }
5. [Inverse length-N/2 transform]
   z = (1/4) DFT^{−1}(Z); // Via split-radix FFT (mod p) with root h.
6. [Unfold signal to double its length]
   z = ((Re(z_j), Im(z_j))), j = 0, . . . , N/2 − 1;
   return z;

To implement this algorithm, one needs only a complex (integer only!) FFT (mod p), complex multiplication (mod p), and a binary ladder for powering in the field. The split-radix FFT indicated in the algorithm, though it is normally used in reference to standard floating-point FFTs, can nevertheless be used because "i" is defined [Crandall 1997b].
