
Prime Numbers
Chapter 8 THE UBIQUITY OF PRIME NUMBERS
we have toured. But there could be a variant that is easier to implement. For example, it is not unreasonable to presume that the very first working QTM DL/factoring solvers might make use of one of the currently less-popular methods, in favor of simplicity. Observe that rho methods involve very little beyond modular squaring and adding. (As with many factoring algorithm candidates for QTM implementation, the eventual gcd operations could just be classical.) What is more, at the very heart of rho methods lives the phenomenon of periodicity, and as we have seen, QTMs are periodicity detectors par excellence.
Chapter 9
FAST ALGORITHMS FOR LARGE-INTEGER ARITHMETIC
In this chapter we explore the galaxy of “fast” algorithms that admit of applications in prime number and factorization computations. In modern times, it is of paramount importance to be able to manipulate multiple-precision integers, meaning integers that in practice, on prevailing machinery, have to be broken up into pieces, with machine operations to involve those pieces, with a view to eventual reassembly of desired results. Although multiple-precision addition and subtraction of integers is quite common in numerical studies, we assume that notions of these very simple fundamental operations are understood, and start with multiplication, which is perhaps the simplest arithmetic algorithm whose classical form admits of genuine enhancements.
9.1 Tour of “grammar-school” methods
9.1.1 Multiplication
One of the most common technical aspects of our culture is the classical, or shall we say “grammar-school,” method of long multiplication. Though we shall eventually concentrate on fast, modern methods of remarkable efficiency, the grammar-school multiply remains important, especially when the relevant integers are not too large, and itself allows some speed enhancements. In the typical manifestation of the algorithm, one simply writes out, one below the other, the two integers to be multiplied, then constructs a parallelogram of digitwise products. Actually, the parallelogram is a rhombus, and to complete the multiply we need only add up the columns of the rhombus, with carry. If each of x, y to be multiplied has D digits in some given base B (also called the “radix”), then the total number of operations required to calculate xy is O(D^2), because that is how many entries appear in the rhombus. Here, an “operation” is either a multiply or an add of two numbers each of size B. We shall refer to such a fundamental, digitwise, multiply as a “size-B multiply.”
A formal exposition of grammar-school multiply is simple but illuminating, especially in view of later enhancements. We start with two definitions:
Definition 9.1.1. The base-B representation of a nonnegative integer x is the shortest sequence of integer digits (x_i) such that each digit satisfies 0 ≤ x_i < B, and

x = Σ_{i=0}^{D−1} x_i B^i.
Definition 9.1.2. The balanced base-B representation of a nonnegative integer x is the shortest sequence of integer digits (x_i) such that each digit satisfies −⌊B/2⌋ ≤ x_i ≤ ⌊(B − 1)/2⌋, and

x = Σ_{i=0}^{D−1} x_i B^i.
Say we wish to calculate a product z = xy for x, y both nonnegative. Upon contemplation of the grammar-school rhombus, it becomes evident that given x, y in base-B representation, say, we end up summing columns to construct integers
w_n = Σ_{i+j=n} x_i y_j,    (9.1)
where i, j run through all indices in the respective digit lists for x, y. Now the sequence (w_n) is not generally yet the base-B representation of the product z. What we need to do, of course, is to perform the w_n additions with carry. The carry operation is best understood the way we understood it in grammar school: A column sum w_n affects not only the final digit z_n, but sometimes higher-order digits beyond this. Thus, for example, if w_0 is equal to B + 5, then z_0 will be 5, but a 1 must be added into z_1; that is, a carry occurs.
These notions of carry are, of course, elementary, but we have stated them because such considerations figure strongly into modern enhancements to this basic multiply. In actual experience, the carry considerations can be more delicate and, for the programmer, more troublesome than any other part of the algorithm.
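To make the column sums of (9.1) and the subsequent carry propagation concrete, here is a short illustrative Python sketch (the function name and the little-endian digit-list convention are ours, not the text's):

```python
def grammar_school_multiply(x_digits, y_digits, B):
    """Multiply two base-B digit lists (little-endian), per equation (9.1):
    first form the column sums w_n, then propagate carries."""
    D1, D2 = len(x_digits), len(y_digits)
    # Column sums w_n = sum of x_i * y_j over i + j = n (the rhombus columns).
    w = [0] * (D1 + D2 - 1)
    for i, xi in enumerate(x_digits):
        for j, yj in enumerate(y_digits):
            w[i + j] += xi * yj
    # Carry propagation: each w_n leaves a digit z_n and pushes the rest up.
    z, carry = [], 0
    for wn in w:
        total = wn + carry
        z.append(total % B)
        carry = total // B
    while carry:                      # spill any remaining carry into new digits
        z.append(carry % B)
        carry //= B
    return z                          # base-B digits of x*y, little-endian
```

For example, with base-10 digits of 123 and 45 (little-endian), the routine produces the digits of 5535.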
9.1.2 Squaring
From the computational perspective, the connection between multiplication and squaring is interesting. We expect the operation xx to involve generally more redundancy than an arbitrary product xy, so that squaring should be easier than general multiplication. Indeed, this intuition turns out to be correct. Say that x has D digits in base-B representation, and note that (9.1) can be rewritten for the case of squaring as
w_n = Σ_{i=0}^{n} x_i x_{n−i},    (9.2)
where n ∈ [0, D − 1]. But this sum for w_n generally has reflection symmetry, and we can write
w_n = 2 Σ_{i=0}^{⌊n/2⌋} x_i x_{n−i} − δ_n,    (9.3)
where δ_n is 0 for n odd, else x_{n/2}^2 for n even. It is clear that each column component w_n involves about half the size-B multiplies required for the general multiplication algorithm. Of course, final carry operations must be performed on the w_n, to get the final digits z_n of the product z = x^2, but in most practical instances, this squaring is indeed roughly twice as fast as a multiple-precision multiply. There exist in the literature some very readable expositions of the squaring algorithm and related algorithms. See, for example, [Menezes et al. 1997].
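The halved work promised by (9.3) can be rendered in an illustrative Python sketch (hypothetical names; digits are little-endian base-B lists):

```python
def square_digits(x_digits, B):
    """Square a base-B digit list using the symmetry of (9.3):
    w_n = 2 * sum_{i <= n/2} x_i x_{n-i} - delta_n, roughly halving the
    number of digit multiplies versus a general product."""
    D = len(x_digits)
    w = [0] * (2 * D - 1)
    for n in range(2 * D - 1):
        s = 0
        # only i <= n/2 is visited; the lower bound keeps n-i within range
        for i in range(max(0, n - D + 1), n // 2 + 1):
            s += x_digits[i] * x_digits[n - i]
        s *= 2
        if n % 2 == 0:                 # delta_n: the middle term was doubled
            s -= x_digits[n // 2] ** 2
        w[n] = s
    # carry propagation, exactly as in the general multiply
    z, carry = [], 0
    for wn in w:
        total = wn + carry
        z.append(total % B)
        carry = total // B
    while carry:
        z.append(carry % B)
        carry //= B
    return z
```

For instance, squaring the base-10 digits of 123 yields the digits of 15129.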
There is an elegant, if simple, argument showing that general multiplication has no more than twice the complexity of squaring. One invokes the identity
4xy = (x + y)^2 − (x − y)^2,    (9.4)
which indicates that a multiplication can be effected by two squarings and a divide by four, this final divide presumed trivial (as, say, a right-shift by two bits). This observation is not just academic, for in certain practical scenarios this algebraic rule may be exploited (see Exercise 9.6).
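For instance, a minimal Python sketch of this two-squares trick (assuming ordinary nonnegative machine integers) is:

```python
def multiply_via_squares(x, y):
    """Multiply nonnegative integers via identity (9.4):
    4xy = (x + y)^2 - (x - y)^2, so xy is the difference of two squares
    right-shifted by two bits (an exact division by 4)."""
    return ((x + y) ** 2 - (x - y) ** 2) >> 2
```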
9.1.3 Div and mod
Div and mod operations are omnipresent in prime-number and factorization studies. These operations often occur in combination with multiplication; in fact, this symbiosis is exploited in some of the algorithms we shall describe. It is quite common that one spends computational effort on operations such as x^y (mod p), for primes p, or in factorization studies x^y (mod N), where N is to be factored.
It is a primary observation that the mod operation can hinge on the div operation. We shall use, as before, the notation x mod N to denote the operation that results in the least nonnegative residue of x (mod N), while the greatest integer in x/N, denoted by ⌊x/N⌋, is the div result. (In some computer languages these operations are written “x%N ” and “x div N ,” respectively, while in others the integer divide “x/N ” means just div, while in yet others the div is “Floor[x/N ],” and so on.) For integers x and positive integers N , a basic relation in our present notation is
x mod N = x − N⌊x/N⌋.    (9.5)
Note that this relation is equivalent to the quotient–remainder decomposition x = qN + r, with q, r being respectively the div and mod results under consideration. So the div operation begets the mod, and we can proceed with algorithm descriptions for div.
Analogous to “grammar-school” multiplication is, of course, the elementary method of long division. It is fruitful to contemplate even this simple long division algorithm, with a view to enhancements. In the normal execution of long division in a given base B, the divisor N is first justified to the left, with respect to the dividend x. That is to say, a power B^b of the base is found such that m = B^b N ≤ x < B^{b+1} N. Then one finds ⌊x/m⌋, which quotient is guaranteed to be in the interval [1, B − 1]. The quotient here is, of course, the leading base-B digit of the final div result. One then replaces x with x − m⌊x/m⌋, and divides m by B, that is, shifts m down by one digit, and so on recursively. This sketch shows us right off that for certain bases B, things are relatively simple. In fact, if one adopts binary representations (B = 2), then a complete div algorithm can be effected such that there are no multiplies at all. The method can actually be of practical interest, especially on machinery that has addition, subtraction, bit-shifting (left-shift means multiply-by-2, right-shift means divide-by-2), but little else in the way of operations. Explicitly, we proceed as follows:
Algorithm 9.1.3 (Classical binary divide). Given positive integers x ≥ N , this algorithm performs the div operation, returning x/N . (See Exercise 9.7 for the matter of also returning the value x mod N .)
1. [Initialize]
Find the unique integer b such that 2^b N ≤ x < 2^{b+1} N;
//This can be done by successive left-shifts of the binary representation of N , or better, by comparing the bit lengths of x, N and possibly doing an extra shift.
m = 2^b N; c = 0;
2. [Loop over b bits]
for(0 ≤ j ≤ b) {
c = 2c;
a = x − m; if(a ≥ 0) {
c = c + 1;
x = a;
}
m = m/2;
}
return c;
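A direct Python transcription of Algorithm 9.1.3 might read as follows (illustrative; Python's built-in integers stand in for multiple-precision arithmetic, and the initialization follows the shift-based suggestion in the pseudocode's comment):

```python
def binary_divide(x, N):
    """Algorithm 9.1.3 in Python: compute floor(x/N) for x >= N > 0
    using only shifts, adds, subtracts, and comparisons (no multiplies)."""
    assert N > 0 and x >= N
    # [Initialize] find b with 2^b * N <= x < 2^(b+1) * N
    # by left-shifting N until one more shift would exceed x.
    b, m = 0, N
    while (m << 1) <= x:
        m <<= 1
        b += 1
    c = 0
    # [Loop over b bits]
    for _ in range(b + 1):
        c <<= 1                 # c = 2c
        a = x - m
        if a >= 0:              # this quotient bit is 1
            c += 1
            x = a
        m >>= 1                 # m = m/2
    return c
```

For example, binary_divide(100, 7) returns 14, in agreement with ⌊100/7⌋.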
A similar binary approach can be used to effect the common “mul-mod” operation (xy) mod N , where we have adapted the treatment in [Arazi 1994]:
Algorithm 9.1.4 (Binary mul-mod). We are given positive integers x, y with 0 ≤ x, y < N. This algorithm returns the composite operation (xy) mod N. We assume the base-2 representation of Definition 9.1.1 for x, so that the binary bits of x are (x_0, . . . , x_{D−1}), with x_{D−1} > 0 being the high bit.
1. [Initialize]
s = 0;
2. [Loop over D bits]
for(D − 1 ≥ j ≥ 0) {
    s = 2s;
    if(s ≥ N) s = s − N;
    if(x_j == 1) s = s + y;
    if(s ≥ N) s = s − N;
}
return s;
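Algorithm 9.1.4 admits a similarly direct Python transcription (illustrative; the bits of x are scanned high to low as in the pseudocode):

```python
def binary_mulmod(x, y, N):
    """Algorithm 9.1.4 in Python: compute (x*y) mod N by scanning the binary
    bits of x from high to low, with at most one subtraction of N after each
    doubling and each addition (so s always stays below N)."""
    assert 0 <= x < N and 0 <= y < N
    s = 0
    for bit in bin(x)[2:]:       # bits x_{D-1} ... x_0, high bit first
        s <<= 1                  # s = 2s
        if s >= N:
            s -= N
        if bit == '1':
            s += y
        if s >= N:
            s -= N
    return s
```

Note that after each doubling s < 2N, and after each addition of y again s < 2N, so a single conditional subtraction suffices at each step.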
The binary divide and mul-mod algorithms, though illuminating, suffer from a basic practical shortcoming: One is not taking due advantage of multiple-bit arithmetic as is commonly available on any reasonably powerful computer. One would like to perform multiple-bit operations within machine registers, rather than just operating one bit at a time. For this reason, larger bases than B = 2 are usually used, and many modern div implementations invoke “Algorithm D” (see [Knuth 1981, p. 257]), which is a finely tuned version of the classical long division. That algorithm is a good example of one that has more pseudocode complexity than does our binary Algorithm 9.1.3, yet amounts to a great deal of optimization in actual programs.
9.2 Enhancements to modular arithmetic
The classical div and mod algorithms discussed in Section 9.1.3 all involve some sort of explicit divide operation. For the binary algorithms given, this division is trivial; that is, if 0 ≤ a < 2b, then ⌊a/b⌋ is of course either 0 or 1. In the case of Knuth’s Algorithm D for higher bases than B = 2, one is compelled to estimate small div results. But there exist more modern algorithms for which no explicit division of any kind is required. The advantage of these methods to the computationalist is twofold. First, complete number-theoretical programs can be written without relatively complicated long division; and second, the optimization of all the arithmetic can be focused onto just one aspect, namely multiplication.
9.2.1 Montgomery method
An observation in [Montgomery 1985] has turned out to be important in the computational field, especially in situations where modular powers (x^y) mod N are to be calculated with optimal speed (and, as we see later, the operands are not too overwhelmingly large). Observe, first of all, that “naive” multiply-mod takes one multiply and one divide (not counting subtractions), and so the spirit of the Montgomery method—as with other methods discussed in this chapter—is to lower or, if we are lucky, remove the difficulty of the divide step.
The Montgomery method, which is a generalization of an old method of Hensel for computing inverses of 2-adic numbers, stems from the following theorem, leading to efficient means for the computation of quantities (xR^{-1}) mod N, for certain conveniently chosen R:
Theorem 9.2.1 (Montgomery). Let N, R be coprime positive integers, and define N′ = (−N^{-1}) mod R. Then for any integer x, the number

y = x + N((xN′) mod R)

is divisible by R, with
y/R ≡ xR^{-1} (mod N).    (9.6)
Furthermore, if 0 ≤ x < RN, the difference y/R − ((xR^{-1}) mod N) is either 0 or N.
As we shall see, Theorem 9.2.1 will be most useful when there are several or many multiplications modulo N to be performed, such as in a powering ladder, in which case the computation of the auxiliary number N′ is only a one-time charge for the entire calculation. When N is odd and R is a power of 2, which is often the case in applications, the “mod R” operation is trivial, as is the division of y by R. In addition, there is an alternative way to compute N′ using Newton’s method; see Exercise 9.12. It may help in the case N odd and R a power of 2 to cast the basic Montgomery operation in the language of bit operations. Let R = 2^s, let & denote the bitwise “and” operation, and let >> c denote “right-shift by c bits.” Then the left-hand side of equation (9.6) can be cast as
y/R = (x + N((xN′) & (R − 1))) >> s,    (9.7)
in which the two required multiplies are explicit.
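To see (9.7) in action, here is an illustrative Python sketch of the reduction, together with the one-time setup it presupposes (the function name and the example modulus are ours, not the text's):

```python
def montgomery_reduce(x, N, Nprime, s):
    """Compute (x * R^{-1}) mod N per Theorem 9.2.1 and equation (9.7),
    where R = 2^s, N is odd, and Nprime = (-N^{-1}) mod R is precomputed.
    Requires 0 <= x < R*N; by the theorem, at most one final subtraction
    of N is needed."""
    mask = (1 << s) - 1                    # R - 1, for the bitwise "and"
    y = (x + N * ((x * Nprime) & mask)) >> s
    return y if y < N else y - N

# One-time setup for a given odd modulus (example values).
N = 101                                    # an odd modulus
s = 8                                      # R = 2^8 = 256 > N
R = 1 << s
Nprime = (-pow(N, -1, R)) % R              # (-N^{-1}) mod R; pow(.,-1,.) needs Python 3.8+
```

Only two multiplies appear, exactly as in (9.7); the mod R and divide-by-R steps are the masked “and” and the right-shift.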
So now, for 0 ≤ x < RN, we have a way to calculate (xR^{-1}) mod N with a small number (two) of multiplies. This is not quite the mod result x mod N of course, but the Montgomery method applies well to the calculation of powers (x^y) mod N. The reason is that multiplication by R^{-1} or R on the residue system {x : 0 ≤ x < N} results in a complete residue system (mod N). Thus, powering arithmetic can be performed in a different residue system, with one initial multiply-mod operation and successive calls to a Montgomery multiplication, to yield results (mod N). To make these ideas precise, we adopt the following definition:
Definition 9.2.2. For gcd(R, N) = 1 and 0 ≤ x < N, the (R, N)-residue of x is

x̄ = (xR) mod N.
Definition 9.2.3. The Montgomery product of two integers a, b is

M(a, b) = (abR^{-1}) mod N.
Then the required facts can be collected in the following theorem:
Theorem 9.2.4 (Montgomery rules). Let R, N be as in Definition 9.2.2, and 0 ≤ a, b < N. Then a mod N = M(ā, 1), and M(ā, b̄) is the (R, N)-residue of (ab) mod N.
This theorem gives rise to the Montgomery powering technique. For example, one corollary of the theorem is that

M(M(M(x̄, x̄), x̄), 1) = x^3 mod N.    (9.8)
To render the notion of general Montgomery powering explicit, we next give the relevant algorithms.
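As a preview, the whole powering scheme can be sketched compactly in Python (an illustration assembled from Theorems 9.2.1 and 9.2.4 under the assumptions N odd and R = 2^s > N; this is not a transcription of the text's pseudocode, and the names are hypothetical):

```python
def montgomery_pow(x, e, N, s):
    """Sketch of Montgomery powering: compute (x^e) mod N for odd N and
    R = 2^s > N, doing all ladder arithmetic in the (R, N)-residue system
    of Definition 9.2.2 and exiting via M(., 1) as in Theorem 9.2.4."""
    R = 1 << s
    mask = R - 1
    Nprime = (-pow(N, -1, R)) % R          # one-time charge per modulus

    def M(a, b):                           # Montgomery product (a*b*R^{-1}) mod N
        t = a * b                          # t < N^2 < R*N, so Theorem 9.2.1 applies
        y = (t + N * ((t * Nprime) & mask)) >> s
        return y if y < N else y - N

    xbar = (x * R) % N                     # the one initial multiply-mod
    acc = R % N                            # Montgomery image of 1
    for bit in bin(e)[2:]:                 # left-to-right binary ladder
        acc = M(acc, acc)                  # square
        if bit == '1':
            acc = M(acc, xbar)             # multiply
    return M(acc, 1)                       # back to the ordinary residue system
```

With e = 3, the calls reproduce exactly the pattern of equation (9.8).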

a convenient value R = B^k may be used, whence the z value in Algorithm 9.2.5 can be obtained by looping k times and doing arithmetic (mod B) that is particularly convenient for the machine. Explicit word-oriented loops that achieve the optimal asymptotic complexity are laid out nicely in [Menezes et al. 1997].
9.2.2 Newton methods
We have seen in Section 9.1 that the div operation may be effected via additions, subtractions, and bit-shifts, although, as we have also seen, the algorithm can be bested by moving away from the binary paradigm into the domain of general base representations. Then we saw that the technique of Montgomery mod gives us an asymptotically efficient means for powering with respect to a fixed modulus. It is interesting, perhaps at first surprising, that general div and mod may be effected via multiplications alone; that is, even the small div operations attendant to optimized div methods are obviated, as are the special precomputations of the Montgomery method.
One approach to such a general div and mod scheme is to realize that the classical Newton method for solving equations may be applied to the problem of reciprocation. Let us start with reciprocation in the domain of real numbers. If one is to solve f (x) = 0, one proceeds with an (adroit) initial guess for x, call this guess x0, and iterates
x_{n+1} = x_n − f(x_n)/f′(x_n),    (9.9)
for n = 0, 1, 2, . . ., whence—if the initial guess x_0 is good enough—the sequence (x_n) converges to the desired solution. So to reciprocate a real number a > 0, one is trying to solve 1/x − a = 0, so that an appropriate iteration would be
x_{n+1} = 2x_n − a x_n^2.    (9.10)
Assuming that this Newton iteration for reciprocals is successful (see Exercise 9.13), we see that the real number 1/a can be obtained to arbitrary accuracy with multiplies alone. To calculate a general real division b/a, one simply multiplies b by the reciprocal 1/a, so that general division in real numbers can be done in this way via multiplies alone.
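An illustrative Python sketch of iteration (9.10) follows (the choice of initial guess from the floating-point exponent, giving 0 < a·x_0 < 2, is our assumption for guaranteeing convergence; see Exercise 9.13 for the convergence analysis):

```python
import math

def newton_reciprocal(a, iterations=6):
    """Approximate 1/a for real a > 0 with iteration (9.10),
    x_{n+1} = 2*x_n - a*x_n^2, using multiplies and subtracts only.
    Writing a = m * 2^e with 1/2 <= m < 1, the guess x_0 = 2^(-e)
    satisfies 0 < a*x_0 < 2, so the error 1 - a*x_n squares each step
    (quadratic convergence); six steps exceed double precision."""
    m, e = math.frexp(a)          # a = m * 2**e, with 0.5 <= m < 1
    x = math.ldexp(1.0, -e)       # x_0 = 2**(-e)
    for _ in range(iterations):
        x = 2 * x - a * x * x
    return x
```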
But can the Newton method be applied to the problem of integer div? Indeed it can, provided that we proceed with care in the definition of a generalized reciprocal for integer division. We first introduce a function B(N ), defined for nonnegative integers N as the number of bits in the binary representation of N , except that B(0) = 0. Thus, B(1) = 1, B(2) = B(3) = 2, and so on. Next we establish a generalized reciprocal; instead of reciprocals 1/a for real a, we consider a generalized reciprocal of integer N as the integer part of an appropriate large power of 2 divided by N .
Definition 9.2.7. The generalized reciprocal R(N) is defined for positive integers N as ⌊4^{B(N−1)}/N⌋.
The reason for the particular power in the definition is to allow our eventual general div algorithm to function. Next, we give a method for rapid computation of R(N ), based on multiplies, adds, and subtracts alone:
Algorithm 9.2.8 (Generalized reciprocation). This algorithm returns R(N ) for positive integer N .
1. [Initialize]
b = B(N − 1); r = 2^b; s = r;
2. [Perform discrete Newton iteration]
r = 2r − ⌊N⌊r^2/2^b⌋/2^b⌋;
if(r ≤ s) goto [Adjust result];
s = r;
goto [Perform discrete Newton iteration];
3. [Adjust result]
y = 4^b − Nr;
while(y < 0) {
    r = r − 1;
    y = y + N;
}
return r;
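An illustrative Python transcription of Algorithm 9.2.8 (with Python's int.bit_length standing in for the function B):

```python
def generalized_reciprocal(N):
    """Algorithm 9.2.8 in Python: return R(N) = floor(4^b / N) with
    b = B(N - 1), using multiplies, adds, subtracts, and shifts only.
    Python's int.bit_length matches the text's B, since (0).bit_length() == 0."""
    b = (N - 1).bit_length()
    r = 1 << b                    # r = 2^b
    s = r
    while True:
        # discrete Newton step: r = 2r - floor(N * floor(r^2 / 2^b) / 2^b)
        r = 2 * r - ((N * ((r * r) >> b)) >> b)
        if r <= s:                # no further increase: leave the loop
            break
        s = r
    # [Adjust result] decrement r until N*r <= 4^b
    y = (1 << (2 * b)) - N * r
    while y < 0:
        r -= 1
        y += N
    return r
```

For example, with N = 10 one has b = B(9) = 4 and the routine returns ⌊256/10⌋ = 25.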
Note that Algorithm 9.2.8 involves a possible “repair” of the final return value, in the form of the while(y < 0) loop. This is a key to making the algorithm precise, as we see in the proof of the following theorem:
Theorem 9.2.9 (Generalized reciprocal iteration). The reciprocation Algorithm 9.2.8 works; that is, the returned value is R(N ).
Proof. We have
2^{b−1} < N ≤ 2^b.
Let c = 4^b/N, so that R(N) = ⌊c⌋. Let
f(r) = 2r − ⌊N⌊r^2/2^b⌋/2^b⌋,
and let g(r) = 2r − Nr^2/4^b = 2r − r^2/c. Since deleting the floor functions in the definition of f(r) gives us g(r), and since N/2^b ≤ 1, we have
g(r) ≤ f(r) < g(r) + 2
for every r.
Since g(r) = c − (c − r)^2/c, we have

c − (c − r)^2/c ≤ f(r) < c − (c − r)^2/c + 2.

We conclude that f(r) < c + 2 for all r. Further, if r < c, then

f(r) ≥ g(r) = 2r − r^2/c > r.