
Prime Numbers
Chapter 8 THE UBIQUITY OF PRIME NUMBERS
we have toured. But there could be a variant that is easier to implement. For example, it is not unreasonable to presume that the very first working QTM DL/factoring solvers might make use of one of the currently less-popular methods, in favor of simplicity. Observe that rho methods involve very little beyond modular squaring and adding. (As with many factoring algorithm candidates for QTM implementation, the eventual gcd operations could just be classical.) What is more, at the very heart of rho methods lives the phenomenon of periodicity, and as we have seen, QTMs are periodicity detectors par excellence.
Chapter 9
FAST ALGORITHMS FOR LARGE-INTEGER ARITHMETIC
In this chapter we explore the galaxy of “fast” algorithms that admit of applications in prime number and factorization computations. In modern times, it is of paramount importance to be able to manipulate multiple-precision integers, meaning integers that in practice, on prevailing machinery, have to be broken up into pieces, with machine operations to involve those pieces, with a view to eventual reassembly of desired results. Although multiple-precision addition and subtraction of integers is quite common in numerical studies, we assume that notions of these very simple fundamental operations are understood, and start with multiplication, which is perhaps the simplest arithmetic algorithm whose classical form admits of genuine enhancements.
9.1 Tour of “grammar-school” methods
9.1.1 Multiplication
One of the most common technical aspects of our culture is the classical, or shall we say “grammar-school,” method of long multiplication. Though we shall eventually concentrate on fast, modern methods of remarkable efficiency, the grammar-school multiply remains important, especially when the relevant integers are not too large, and itself allows some speed enhancements. In the typical manifestation of the algorithm, one simply writes out, one below the other, the two integers to be multiplied, then constructs a parallelogram of digitwise products. Actually, the parallelogram is a rhombus, and to complete the multiply we need only add up the columns of the rhombus, with carry. If each of x, y to be multiplied has D digits in some given base B (also called the “radix”), then the total number of operations required to calculate xy is O(D^2), because that is how many entries appear in the rhombus. Here, an “operation” is either a multiply or an add of two numbers each of size B. We shall refer to such a fundamental, digitwise, multiply as a “size-B multiply.”
A formal exposition of grammar-school multiply is simple but illuminating, especially in view of later enhancements. We start with two definitions:
Definition 9.1.1. The base-B representation of a nonnegative integer x is the shortest sequence of integer digits (x_i) such that each digit satisfies 0 ≤ x_i < B, and

x = Σ_{i=0}^{D−1} x_i B^i.
Definition 9.1.2. The balanced base-B representation of a nonnegative integer x is the shortest sequence of integer digits (x_i) such that each digit satisfies −⌊B/2⌋ ≤ x_i ≤ ⌊(B − 1)/2⌋, and

x = Σ_{i=0}^{D−1} x_i B^i.
Say we wish to calculate a product z = xy for x, y both nonnegative. Upon contemplation of the grammar-school rhombus, it becomes evident that given x, y in base-B representation, say, we end up summing columns to construct integers
w_n = Σ_{i+j=n} x_i y_j,    (9.1)
where i, j run through all indices in the respective digit lists for x, y. Now the sequence (w_n) is not generally yet the base-B representation of the product z. What we need to do, of course, is to perform the w_n additions with carry. The carry operation is best understood the way we understood it in grammar school: A column sum w_n affects not only the final digit z_n, but sometimes higher-order digits beyond this. Thus, for example, if w_0 is equal to B + 5, then z_0 will be 5, but a 1 must be added into z_1; that is, a carry occurs.
These notions of carry are, of course, elementary, but we have stated them because such considerations figure strongly into modern enhancements to this basic multiply. In actual experience, the carry considerations can be more delicate and, for the programmer, more troublesome than any other part of the algorithm.
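To make the column sums of (9.1) and the subsequent carry propagation concrete, here is a short illustrative Python sketch (the function name and the little-endian digit-list convention are ours, not the text's):

```python
def grammar_school_multiply(x_digits, y_digits, B):
    """Multiply two base-B digit lists (little-endian), per equation (9.1):
    first form the column sums w_n, then propagate carries."""
    D1, D2 = len(x_digits), len(y_digits)
    # Column sums w_n = sum of x_i * y_j over i + j = n (the rhombus columns).
    w = [0] * (D1 + D2 - 1)
    for i, xi in enumerate(x_digits):
        for j, yj in enumerate(y_digits):
            w[i + j] += xi * yj
    # Carry propagation: each w_n leaves a digit z_n and pushes the rest up.
    z, carry = [], 0
    for wn in w:
        total = wn + carry
        z.append(total % B)
        carry = total // B
    while carry:                      # spill any remaining carry into new digits
        z.append(carry % B)
        carry //= B
    return z                          # base-B digits of x*y, little-endian
```

For example, with base-10 digits of 123 and 45 (little-endian), the routine produces the digits of 5535.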
9.1.2 Squaring
From the computational perspective, the connection between multiplication and squaring is interesting. We expect the operation xx to involve generally more redundancy than an arbitrary product xy, so that squaring should be easier than general multiplication. Indeed, this intuition turns out to be correct. Say that x has D digits in base-B representation, and note that (9.1) can be rewritten for the case of squaring as
w_n = Σ_{i=0}^{n} x_i x_{n−i},    (9.2)
where n ∈ [0, D − 1]. But this sum for w_n generally has reflection symmetry, and we can write
w_n = 2 Σ_{i=0}^{⌊n/2⌋} x_i x_{n−i} − δ_n,    (9.3)
where δ_n is 0 for n odd, else x_{n/2}^2 for n even. It is clear that each column component w_n involves about half the size-B multiplies required for the general multiplication algorithm. Of course, final carry operations must be performed on the w_n, to get the final digits z_n of the product z = x^2, but in most practical instances, this squaring is indeed roughly twice as fast as a multiple-precision multiply. There exist in the literature some very readable expositions of the squaring algorithm and related algorithms. See, for example, [Menezes et al. 1997].
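The halved work promised by (9.3) can be rendered in an illustrative Python sketch (hypothetical names; digits are little-endian base-B lists):

```python
def square_digits(x_digits, B):
    """Square a base-B digit list using the symmetry of (9.3):
    w_n = 2 * sum_{i <= n/2} x_i x_{n-i} - delta_n, roughly halving the
    number of digit multiplies versus a general product."""
    D = len(x_digits)
    w = [0] * (2 * D - 1)
    for n in range(2 * D - 1):
        s = 0
        # only i <= n/2 is visited; the lower bound keeps n-i within range
        for i in range(max(0, n - D + 1), n // 2 + 1):
            s += x_digits[i] * x_digits[n - i]
        s *= 2
        if n % 2 == 0:                 # delta_n: the middle term was doubled
            s -= x_digits[n // 2] ** 2
        w[n] = s
    # carry propagation, exactly as in the general multiply
    z, carry = [], 0
    for wn in w:
        total = wn + carry
        z.append(total % B)
        carry = total // B
    while carry:
        z.append(carry % B)
        carry //= B
    return z
```

For instance, squaring the base-10 digits of 123 yields the digits of 15129.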
There is an elegant, if simple, argument showing that general multiplication has no more than twice the complexity of squaring. One invokes the identity
4xy = (x + y)^2 − (x − y)^2,    (9.4)
which indicates that a multiplication can be effected by two squarings and a divide by four, this final divide presumed trivial (as, say, a right-shift by two bits). This observation is not just academic, for in certain practical scenarios this algebraic rule may be exploited (see Exercise 9.6).
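For instance, a minimal Python sketch of this two-squares trick (assuming ordinary nonnegative machine integers) is:

```python
def multiply_via_squares(x, y):
    """Multiply nonnegative integers via identity (9.4):
    4xy = (x + y)^2 - (x - y)^2, so xy is the difference of two squares
    right-shifted by two bits (an exact division by 4)."""
    return ((x + y) ** 2 - (x - y) ** 2) >> 2
```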
9.1.3 Div and mod
Div and mod operations are omnipresent in prime-number and factorization studies. These operations often occur in combination with multiplication; in fact, this symbiosis is exploited in some of the algorithms we shall describe. It is quite common that one spends computational effort on operations such as x^y (mod p), for primes p, or in factorization studies x^y (mod N), where N is to be factored.
It is a primary observation that the mod operation can hinge on the div operation. We shall use, as before, the notation x mod N to denote the operation that results in the least nonnegative residue of x (mod N), while the greatest integer in x/N, denoted by ⌊x/N⌋, is the div result. (In some computer languages these operations are written “x%N ” and “x div N ,” respectively, while in others the integer divide “x/N ” means just div, while in yet others the div is “Floor[x/N ],” and so on.) For integers x and positive integers N , a basic relation in our present notation is
x mod N = x − N⌊x/N⌋.    (9.5)
Note that this relation is equivalent to the quotient–remainder decomposition x = qN + r, with q, r being respectively the div and mod results under consideration. So the div operation begets the mod, and we can proceed with algorithm descriptions for div.
Analogous to “grammar-school” multiplication is, of course, the elementary method of long division. It is fruitful to contemplate even this simple long division algorithm, with a view to enhancements. In the normal execution of long division in a given base B, the divisor N is first justified to the left, with respect to the dividend x. That is to say, a power B^b of the base is found such that m = B^b N ≤ x < B^{b+1} N. Then one finds ⌊x/m⌋, which quotient is guaranteed to be in the interval [1, B − 1]. The quotient here is, of course, the leading base-B digit of the final div result. One then replaces x with x − m⌊x/m⌋, and divides m by B, that is, shifts m down by one digit, and so on recursively. This sketch shows us right off that for certain bases B, things are relatively simple. In fact, if one adopts binary representations (B = 2), then a complete div algorithm can be effected such that there are no multiplies at all. The method can actually be of practical interest, especially on machinery that has addition, subtraction, bit-shifting (left-shift means multiply-by-2, right-shift means divide-by-2), but little else in the way of operations. Explicitly, we proceed as follows:
Algorithm 9.1.3 (Classical binary divide). Given positive integers x ≥ N , this algorithm performs the div operation, returning x/N . (See Exercise 9.7 for the matter of also returning the value x mod N .)
1. [Initialize]
Find the unique integer b such that 2^b N ≤ x < 2^{b+1} N;
//This can be done by successive left-shifts of the binary representation of N , or better, by comparing the bit lengths of x, N and possibly doing an extra shift.
m = 2^b N; c = 0;
2. [Loop over b bits]
for(0 ≤ j ≤ b) {
c = 2c;
a = x − m; if(a ≥ 0) {
c = c + 1;
x = a;
}
m = m/2;
}
return c;
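A direct Python transcription of Algorithm 9.1.3 might read as follows (illustrative; Python's built-in integers stand in for multiple-precision arithmetic, and the initialization follows the shift-based suggestion in the pseudocode's comment):

```python
def binary_divide(x, N):
    """Algorithm 9.1.3 in Python: compute floor(x/N) for x >= N > 0
    using only shifts, adds, subtracts, and comparisons (no multiplies)."""
    assert N > 0 and x >= N
    # [Initialize] find b with 2^b * N <= x < 2^(b+1) * N
    # by left-shifting N until one more shift would exceed x.
    b, m = 0, N
    while (m << 1) <= x:
        m <<= 1
        b += 1
    c = 0
    # [Loop over b bits]
    for _ in range(b + 1):
        c <<= 1                 # c = 2c
        a = x - m
        if a >= 0:              # this quotient bit is 1
            c += 1
            x = a
        m >>= 1                 # m = m/2
    return c
```

For example, binary_divide(100, 7) returns 14, in agreement with ⌊100/7⌋.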
A similar binary approach can be used to effect the common “mul-mod” operation (xy) mod N , where we have adapted the treatment in [Arazi 1994]:
Algorithm 9.1.4 (Binary mul-mod). We are given positive integers x, y with 0 ≤ x, y < N. This algorithm returns the composite operation (xy) mod N. We assume the base-2 representation of Definition 9.1.1 for x, so that the binary bits of x are (x_0, . . . , x_{D−1}), with x_{D−1} > 0 being the high bit.
1. [Initialize]
s = 0;
2. [Loop over D bits]
for(D − 1 ≥ j ≥ 0) {
    s = 2s;
    if(s ≥ N) s = s − N;
    if(x_j == 1) s = s + y;
    if(s ≥ N) s = s − N;
}
return s;
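Algorithm 9.1.4 admits a similarly direct Python transcription (illustrative; the bits of x are scanned high to low as in the pseudocode):

```python
def binary_mulmod(x, y, N):
    """Algorithm 9.1.4 in Python: compute (x*y) mod N by scanning the binary
    bits of x from high to low, with at most one subtraction of N after each
    doubling and each addition (so s always stays below N)."""
    assert 0 <= x < N and 0 <= y < N
    s = 0
    for bit in bin(x)[2:]:       # bits x_{D-1} ... x_0, high bit first
        s <<= 1                  # s = 2s
        if s >= N:
            s -= N
        if bit == '1':
            s += y
        if s >= N:
            s -= N
    return s
```

Note that after each doubling s < 2N, and after each addition of y again s < 2N, so a single conditional subtraction suffices at each step.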
The binary divide and mul-mod algorithms, though illuminating, suffer from a basic practical shortcoming: One is not taking due advantage of multiple-bit arithmetic as is commonly available on any reasonably powerful computer. One would like to perform multiple-bit operations within machine registers, rather than just operating one bit at a time. For this reason, larger bases than B = 2 are usually used, and many modern div implementations invoke “Algorithm D” (see [Knuth 1981, p. 257]), which is a finely tuned version of the classical long division. That algorithm is a good example of one that has more pseudocode complexity than does our binary Algorithm 9.1.3, yet amounts to a great deal of optimization in actual programs.
9.2 Enhancements to modular arithmetic
The classical div and mod algorithms discussed in Section 9.1.3 all involve some sort of explicit divide operation. For the binary algorithms given, this division is trivial; that is, if 0 ≤ a < 2b, then ⌊a/b⌋ is of course either 0 or 1. In the case of Knuth’s Algorithm D for higher bases than B = 2, one is compelled to estimate small div results. But there exist more modern algorithms for which no explicit division of any kind is required. The advantage of these methods to the computationalist is twofold. First, complete number-theoretical programs can be written without relatively complicated long division; and second, the optimization of all the arithmetic can be focused onto just one aspect, namely multiplication.
9.2.1 Montgomery method
An observation in [Montgomery 1985] has turned out to be important in the computational field, especially in situations where modular powers (x^y) mod N are to be calculated with optimal speed (and, as we see later, the operands are not too overwhelmingly large). Observe, first of all, that “naive” multiply-mod takes one multiply and one divide (not counting subtractions), and so the spirit of the Montgomery method—as with other methods discussed in this chapter—is to lower or, if we are lucky, remove the difficulty of the divide step.
The Montgomery method, which is a generalization of an old method of Hensel for computing inverses of 2-adic numbers, stems from the following theorem, leading to efficient means for the computation of quantities (xR^{-1}) mod N, for certain conveniently chosen R:
Theorem 9.2.1 (Montgomery). Let N, R be coprime positive integers, and define N′ = (−N^{-1}) mod R. Then for any integer x, the number

y = x + N((xN′) mod R)

is divisible by R, with
y/R ≡ xR^{-1} (mod N).    (9.6)
Furthermore, if 0 ≤ x < RN, the difference y/R − ((xR^{-1}) mod N) is either 0 or N.
As we shall see, Theorem 9.2.1 will be most useful when there are several or many multiplications modulo N to be performed, such as in a powering ladder, in which case the computation of the auxiliary number N′ is only a one-time charge for the entire calculation. When N is odd and R is a power of 2, which is often the case in applications, the “mod R” operation is trivial, as is the division of y by R. In addition, there is an alternative way to compute N′ using Newton’s method; see Exercise 9.12. It may help in the case N odd and R a power of 2 to cast the basic Montgomery operation in the language of bit operations. Let R = 2^s, let & denote the bitwise “and” operation, and let >> c denote “right-shift by c bits.” Then the left-hand side of equation (9.6) can be cast as
y/R = (x + N((xN′) & (R − 1))) >> s,    (9.7)
in which the two required multiplies are explicit.
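To see (9.7) in action, here is an illustrative Python sketch of the reduction, together with the one-time setup it presupposes (the function name and the example modulus are ours, not the text's):

```python
def montgomery_reduce(x, N, Nprime, s):
    """Compute (x * R^{-1}) mod N per Theorem 9.2.1 and equation (9.7),
    where R = 2^s, N is odd, and Nprime = (-N^{-1}) mod R is precomputed.
    Requires 0 <= x < R*N; by the theorem, at most one final subtraction
    of N is needed."""
    mask = (1 << s) - 1                    # R - 1, for the bitwise "and"
    y = (x + N * ((x * Nprime) & mask)) >> s
    return y if y < N else y - N

# One-time setup for a given odd modulus (example values).
N = 101                                    # an odd modulus
s = 8                                      # R = 2^8 = 256 > N
R = 1 << s
Nprime = (-pow(N, -1, R)) % R              # (-N^{-1}) mod R; pow(.,-1,.) needs Python 3.8+
```

Only two multiplies appear, exactly as in (9.7); the mod R and divide-by-R steps are the masked “and” and the right-shift.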
So now, for 0 ≤ x < RN, we have a way to calculate (xR^{-1}) mod N with a small number (two) of multiplies. This is not quite the mod result x mod N of course, but the Montgomery method applies well to the calculation of powers (x^y) mod N. The reason is that multiplication by R^{-1} or R on the residue system {x : 0 ≤ x < N} results in a complete residue system (mod N). Thus, powering arithmetic can be performed in a different residue system, with one initial multiply-mod operation and successive calls to a Montgomery multiplication, to yield results (mod N). To make these ideas precise, we adopt the following definition:
Definition 9.2.2. For gcd(R, N) = 1 and 0 ≤ x < N, the (R, N)-residue of x is

x̄ = (xR) mod N.
Definition 9.2.3. The Montgomery product of two integers a, b is

M(a, b) = (abR^{-1}) mod N.
Then the required facts can be collected in the following theorem:
Theorem 9.2.4 (Montgomery rules). Let R, N be as in Definition 9.2.2, and 0 ≤ a, b < N. Then a mod N = M(ā, 1), and M(ā, b̄) is the (R, N)-residue of (ab) mod N.
This theorem gives rise to the Montgomery powering technique. For example, one corollary of the theorem is that

M(M(M(x̄, x̄), x̄), 1) = x^3 mod N.    (9.8)
To render the notion of general Montgomery powering explicit, we next give the relevant algorithms.
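As a preview, the whole powering scheme can be sketched compactly in Python (an illustration assembled from Theorems 9.2.1 and 9.2.4 under the assumptions N odd and R = 2^s > N; this is not a transcription of the text's pseudocode, and the names are hypothetical):

```python
def montgomery_pow(x, e, N, s):
    """Sketch of Montgomery powering: compute (x^e) mod N for odd N and
    R = 2^s > N, doing all ladder arithmetic in the (R, N)-residue system
    of Definition 9.2.2 and exiting via M(., 1) as in Theorem 9.2.4."""
    R = 1 << s
    mask = R - 1
    Nprime = (-pow(N, -1, R)) % R          # one-time charge per modulus

    def M(a, b):                           # Montgomery product (a*b*R^{-1}) mod N
        t = a * b                          # t < N^2 < R*N, so Theorem 9.2.1 applies
        y = (t + N * ((t * Nprime) & mask)) >> s
        return y if y < N else y - N

    xbar = (x * R) % N                     # the one initial multiply-mod
    acc = R % N                            # Montgomery image of 1
    for bit in bin(e)[2:]:                 # left-to-right binary ladder
        acc = M(acc, acc)                  # square
        if bit == '1':
            acc = M(acc, xbar)             # multiply
    return M(acc, 1)                       # back to the ordinary residue system
```

With e = 3, the calls reproduce exactly the pattern of equation (9.8).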

a convenient value R = B^k may be used, whence the z value in Algorithm 9.2.5 can be obtained by looping k times and doing arithmetic (mod B) that is particularly convenient for the machine. Explicit word-oriented loops that achieve the optimal asymptotic complexity are laid out nicely in [Menezes et al. 1997].
9.2.2 Newton methods
We have seen in Section 9.1 that the div operation may be effected via additions, subtractions, and bit-shifts, although, as we have also seen, the algorithm can be bested by moving away from the binary paradigm into the domain of general base representations. Then we saw that the technique of Montgomery mod gives us an asymptotically efficient means for powering with respect to a fixed modulus. It is interesting, perhaps at first surprising, that general div and mod may be effected via multiplications alone; that is, even the small div operations attendant to optimized div methods are obviated, as are the special precomputations of the Montgomery method.
One approach to such a general div and mod scheme is to realize that the classical Newton method for solving equations may be applied to the problem of reciprocation. Let us start with reciprocation in the domain of real numbers. If one is to solve f (x) = 0, one proceeds with an (adroit) initial guess for x, call this guess x0, and iterates
x_{n+1} = x_n − f(x_n)/f′(x_n),    (9.9)
for n = 0, 1, 2, . . ., whence—if the initial guess x_0 is good enough—the sequence (x_n) converges to the desired solution. So to reciprocate a real number a > 0, one is trying to solve 1/x − a = 0, so that an appropriate iteration would be
x_{n+1} = 2x_n − a x_n^2.    (9.10)
Assuming that this Newton iteration for reciprocals is successful (see Exercise 9.13), we see that the real number 1/a can be obtained to arbitrary accuracy with multiplies alone. To calculate a general real division b/a, one simply multiplies b by the reciprocal 1/a, so that general division in real numbers can be done in this way via multiplies alone.
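An illustrative Python sketch of iteration (9.10) follows (the choice of initial guess from the floating-point exponent, giving 0 < a·x_0 < 2, is our assumption for guaranteeing convergence; see Exercise 9.13 for the convergence analysis):

```python
import math

def newton_reciprocal(a, iterations=6):
    """Approximate 1/a for real a > 0 with iteration (9.10),
    x_{n+1} = 2*x_n - a*x_n^2, using multiplies and subtracts only.
    Writing a = m * 2^e with 1/2 <= m < 1, the guess x_0 = 2^(-e)
    satisfies 0 < a*x_0 < 2, so the error 1 - a*x_n squares each step
    (quadratic convergence); six steps exceed double precision."""
    m, e = math.frexp(a)          # a = m * 2**e, with 0.5 <= m < 1
    x = math.ldexp(1.0, -e)       # x_0 = 2**(-e)
    for _ in range(iterations):
        x = 2 * x - a * x * x
    return x
```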
But can the Newton method be applied to the problem of integer div? Indeed it can, provided that we proceed with care in the definition of a generalized reciprocal for integer division. We first introduce a function B(N ), defined for nonnegative integers N as the number of bits in the binary representation of N , except that B(0) = 0. Thus, B(1) = 1, B(2) = B(3) = 2, and so on. Next we establish a generalized reciprocal; instead of reciprocals 1/a for real a, we consider a generalized reciprocal of integer N as the integer part of an appropriate large power of 2 divided by N .
Definition 9.2.7. The generalized reciprocal R(N) is defined for positive integers N as ⌊4^{B(N−1)}/N⌋.
The reason for the particular power in the definition is to allow our eventual general div algorithm to function. Next, we give a method for rapid computation of R(N ), based on multiplies, adds, and subtracts alone:
Algorithm 9.2.8 (Generalized reciprocation). This algorithm returns R(N ) for positive integer N .
1. [Initialize]
b = B(N − 1); r = 2^b; s = r;
2. [Perform discrete Newton iteration]
r = 2r − ⌊N⌊r^2/2^b⌋/2^b⌋;
if(r ≤ s) goto [Adjust result];
s = r;
goto [Perform discrete Newton iteration];
3. [Adjust result]
y = 4^b − Nr;
while(y < 0) {
    r = r − 1;
    y = y + N;
}
return r;
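An illustrative Python transcription of Algorithm 9.2.8 (with Python's int.bit_length standing in for the function B):

```python
def generalized_reciprocal(N):
    """Algorithm 9.2.8 in Python: return R(N) = floor(4^b / N) with
    b = B(N - 1), using multiplies, adds, subtracts, and shifts only.
    Python's int.bit_length matches the text's B, since (0).bit_length() == 0."""
    b = (N - 1).bit_length()
    r = 1 << b                    # r = 2^b
    s = r
    while True:
        # discrete Newton step: r = 2r - floor(N * floor(r^2 / 2^b) / 2^b)
        r = 2 * r - ((N * ((r * r) >> b)) >> b)
        if r <= s:                # no further increase: leave the loop
            break
        s = r
    # [Adjust result] decrement r until N*r <= 4^b
    y = (1 << (2 * b)) - N * r
    while y < 0:
        r -= 1
        y += N
    return r
```

For example, with N = 10 one has b = B(9) = 4 and the routine returns ⌊256/10⌋ = 25.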
Note that Algorithm 9.2.8 involves a possible “repair” of the final return value, in the form of the while(y < 0) loop. This is a key to making the algorithm precise, as we see in the proof of the following theorem:
Theorem 9.2.9 (Generalized reciprocal iteration). The reciprocation Algorithm 9.2.8 works; that is, the returned value is R(N ).
Proof. We have
2^{b−1} < N ≤ 2^b.
Let c = 4^b/N, so that R(N) = ⌊c⌋. Let
f(r) = 2r − ⌊N⌊r^2/2^b⌋/2^b⌋,
and let g(r) = 2r − Nr^2/4^b = 2r − r^2/c. Since deleting the floor functions in the definition of f(r) gives us g(r), and since N/2^b ≤ 1, we have
g(r) ≤ f(r) < g(r) + 2
for every r.
Since g(r) = c − (c − r)^2/c, we have

c − (c − r)^2/c ≤ f(r) < c − (c − r)^2/c + 2.

We conclude that f(r) < c + 2 for all r. Further, if r < c, then

f(r) ≥ g(r) = 2r − r^2/c > r.