
Prime Numbers
512 Chapter 9 FAST ALGORITHMS FOR LARGE-INTEGER ARITHMETIC
multiplication loops, one need not handle terms of degree higher than indicated. In convolution-theory language, we are therefore doing “half-cyclic” convolutions, so when transform methods are used, there is also gain to be realized because of the truncation.
As is typical of Newton methods, the dynamical precision degree n essentially doubles on each pass of the Newton loop. Let us give an example of the workings of the algorithm. Take
x(t) = 1 + t + t^2 + 4t^3
and call the algorithm to output R[x, 8]. Then the values of g(t) at the end of each pass of the Newton loop come out as
1 − t,
1 − t − 3t^3,
1 − t − 3t^3 + 7t^4 − 4t^5 + 9t^6 − 33t^7,
1 − t − 3t^3 + 7t^4 − 4t^5 + 9t^6 − 33t^7 + 40t^8,
and indeed, this last output of g(t) multiplied by the original x(t) is 1 + 43t^9 − 92t^10 + 160t^11, showing that the last output g(t) is correct through O(t^8).
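The Newton loop above is easy to experiment with. The following Python sketch (the function names are ours, not from the text) reproduces the g(t) iterates for x(t) = 1 + t + t^2 + 4t^3, using plain schoolbook truncated multiplication rather than transform methods:

```python
def poly_mul_trunc(a, b, n):
    # Multiply polynomials (coefficient lists, low degree first),
    # discarding all terms of degree >= n ("half-cyclic" truncation).
    out = [0] * min(n, len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        if ai == 0:
            continue
        for j, bj in enumerate(b):
            if i + j >= n:
                break
            out[i + j] += ai * bj
    return out

def reciprocal(x, N):
    # Newton iteration g <- g*(2 - x*g), doubling the working precision
    # each pass.  Returns g with x(t)*g(t) = 1 + O(t^(N+1)); assumes x[0] == 1.
    g = [1]
    prec = 1
    while prec < N + 1:
        prec = min(2 * prec, N + 1)
        xg = poly_mul_trunc(x, g, prec)
        t = [-c for c in xg]
        t[0] += 2                      # form 2 - x*g
        g = poly_mul_trunc(g, t, prec)
    return g
```

Calling `reciprocal([1, 1, 1, 4], 8)` runs four passes of the loop, and the successive values of `g` match the four iterates displayed above.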
Polynomial remaindering (polynomial mod operation) can be performed in much the same way as some of our mod algorithms for integers used a “reciprocal.” However, it is not always possible to divide one polynomial by another and get a unique and legitimate remainder: This can depend on the ring of coefficients for the polynomials. On the other hand, if the divisor polynomial has its high coefficient invertible in the ring, then there is no problem with divide and remainder; see the discussion in Section 2.2.1. For simplicity, we shall restrict to the case that the divisor polynomial is monic, that is, the high coefficient is 1, since generalizing is straightforward. Assume that x(t), y(t) are polynomials and that y(t) is monic. Then there are unique polynomials q(t), r(t) such that
x(t) = q(t)y(t) + r(t), and r = 0 or deg(r) < deg(y). We shall write
r(t) = x(t) mod y(t),
and view q(t) as the quotient and r(t) as the remainder. Incidentally, for some polynomial operations one demands that coefficients lie in a field, for example in the evaluation of polynomial gcd’s, but many polynomial operations do not require field coefficients. Before exhibiting a fast polynomial remaindering algorithm, we establish some nomenclature:
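For reference, the classical remaindering just described is straightforward when the divisor is monic; the fast scheme of this section replaces the quadratic-time elimination loop below with reciprocal-based multiplication. A minimal sketch (our own helper; coefficient lists are low degree first):

```python
def poly_mod(x, y):
    # Remainder of x(t) by a monic y(t); coefficients listed low degree first.
    r = list(x)
    dy = len(y) - 1                       # degree of the divisor
    for k in range(len(r) - 1, dy - 1, -1):
        c = r[k]                          # leading coefficient to eliminate
        if c:
            for i in range(dy + 1):       # subtract c * t^(k-dy) * y(t)
                r[k - dy + i] -= c * y[i]
    return r[:dy]
```

For example, `poly_mod([5, 2, 0, 1], [-1, 0, 1])` reduces t^3 + 2t + 5 modulo t^2 − 1, giving the remainder 3t + 5.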
Definition 9.6.3 (Polynomial operations). Let x(t) = Σ_{j=0}^{D−1} x_j t^j be a polynomial. We define the reversal of x by degree d as the polynomial

rev(x, d) = Σ_{j=0}^{d} x_{d−j} t^j.
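In code, the reversal is just an index flip with zero-fill; a one-line Python rendering (coefficient lists, low degree first; the name `rev` mirrors the definition):

```python
def rev(x, d):
    # rev(x, d): coefficient j of the result is x_(d-j), taken as 0
    # whenever d-j falls outside the coefficient list.
    return [x[d - j] if 0 <= d - j < len(x) else 0 for j in range(d + 1)]
```

Thus `rev([1, 2, 3], 2)` simply reverses the coefficients, while a larger d, as in `rev([1, 2, 3], 4)`, shifts in zeros at the low end.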
Whatever method is used for polynomial gcd, the fast polynomial remaindering scheme of this section can be applied as desired for the internal polynomial mod operations.
9.6.3 Polynomial evaluation
We next discuss polynomial evaluation techniques. The essential problem is
to evaluate a polynomial x(t) = Σ_{j=0}^{D−1} x_j t^j at, say, each of n field values t_0, . . . , t_{n−1}. It turns out that the entire sequence (x(t_0), x(t_1), . . . , x(t_{n−1})) can be evaluated in

O(n ln^2 min{n, D})
field operations. We shall split the problem into three basic cases:
(1) The arguments t_0, . . . , t_{n−1} lie in arithmetic progression.
(2) The arguments t_0, . . . , t_{n−1} lie in geometric progression.
(3) The arguments t_0, . . . , t_{n−1} are arbitrary.
Of course, case (3) covers the other two, but in (1), (2) it can happen that special enhancements apply.
Algorithm 9.6.5 (Evaluation of polynomial on arithmetic progression). Let x(t) = Σ_{j=0}^{D−1} x_j t^j. This algorithm returns the n evaluations x(a), x(a + d), x(a + 2d), . . . , x(a + (n − 1)d). (The method attains its best efficiency when n is much greater than D.)
1. [Evaluate at first D points]
   for(0 ≤ j < D) e_j = x(a + jd);
2. [Create difference tableau]
   for(1 ≤ q < D) {
      for(D − 1 ≥ k ≥ q) e_k = e_k − e_{k−1};
   }
3. [Operate over tableau]
   E_0 = e_0;
   for(1 ≤ q < n) {
      E_q = E_{q−1} + e_1;
      for(1 ≤ k < D − 1) e_k = e_k + e_{k+1};
   }
   return (E_q), q ∈ [0, n − 1];
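A direct Python transcription of Algorithm 9.6.5 (with a simple Horner evaluation standing in for Step 1, and a guard for the degenerate constant case, both our own additions):

```python
def eval_arith_progression(coeffs, a, d, n):
    # Algorithm 9.6.5: evaluate x(t) at a, a+d, ..., a+(n-1)d using a
    # difference tableau; after setup, each new point costs D-1 additions.
    D = len(coeffs)
    if D == 1:
        return [coeffs[0]] * n
    horner = lambda t: sum(c * t ** j for j, c in enumerate(coeffs))
    e = [horner(a + j * d) for j in range(D)]        # [Evaluate at first D points]
    for q in range(1, D):                            # [Create difference tableau]
        for k in range(D - 1, q - 1, -1):
            e[k] -= e[k - 1]
    E = [e[0]]                                       # [Operate over tableau]
    for q in range(1, n):
        E.append(E[-1] + e[1])
        for k in range(1, D - 1):
            e[k] += e[k + 1]
    return E
```

For instance, `eval_arith_progression([0, 0, 1], 0, 1, 5)` produces the first five squares 0, 1, 4, 9, 16 with only additions after the tableau is built.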
A variant of this algorithm has been used in searches for Wilson primes (see Exercise 9.73, where computational complexity issues are also discussed).
Next, assume that evaluation points lie in geometric progression, say t_k = T^k for some constant T, so we need to evaluate every sum Σ_j x_j T^{kj} for k ∈ [0, D − 1]. There is a so-called Bluestein trick, by which one transforms such sums according to
Σ_j x_j T^{kj} = T^{−k^2/2} Σ_j (x_j T^{−j^2/2}) T^{(k+j)^2/2},
and thus calculates the left-hand sum via the convolution implicit in the right-hand sum. However, in certain settings it is somewhat more convenient to avoid halving the squares in the exponents, relying instead on properties of the triangular numbers ∆_n = n(n + 1)/2. Two relevant algebraic properties of these numbers are
∆_{α+β} = ∆_α + ∆_β + αβ,    ∆_α = ∆_{−α−1}.
A variant of the Bluestein trick can accordingly be derived as

Σ_j x_j T^{kj} = T^{∆_{−k}} Σ_j (x_j T^{∆_j}) T^{−∆_{−(k−j)}}.
Now the implicit convolution can be performed using only integral powers of the constant T. Moreover, we can employ an efficient, cyclic convolution by carefully embedding the x signal in a longer, zero-padded signal and reindexing, as in the following algorithm.
Algorithm 9.6.6 (Evaluation of polynomial on geometric progression). Let x(t) = Σ_{j=0}^{D−1} x_j t^j, and let T have an inverse in the arithmetic domain. This algorithm returns the sequence of values (x(T^k)), k ∈ [0, D − 1].

1. [Initialize]
   Choose N = 2^n such that N ≥ 2D;
   for(0 ≤ j < D) x_j = x_j T^{∆_j};        // Weight the signal x.
   Zero-pad x = (x_j) to have length N;
   y = (T^{−∆_{N/2−j−1}}), j ∈ [0, N − 1];  // Create symmetrical signal y.
2. [Length-N cyclic convolution]
   z = x × y;
3. [Final assembly of evaluation results]
   return (x(T^k)) = (T^{∆_{k−1}} z_{N/2+k−1}), k ∈ [0, D − 1];
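Here is a small Python rendering of Algorithm 9.6.6, using exact `Fraction` arithmetic and a naive O(N^2) cyclic convolution in place of an FFT (the helper names are ours):

```python
from fractions import Fraction

def eval_geometric(coeffs, T):
    # Algorithm 9.6.6: return [x(T^0), x(T^1), ..., x(T^(D-1))] for
    # x(t) = sum_j coeffs[j] t^j, via one length-N cyclic convolution.
    D = len(coeffs)
    T = Fraction(T)
    tri = lambda m: m * (m + 1) // 2           # triangular numbers; exact for m < 0 too
    N = 1
    while N < 2 * D:                           # [Initialize] N = 2^n with N >= 2D
        N *= 2
    x = [coeffs[j] * T ** tri(j) for j in range(D)] + [0] * (N - D)  # weight, zero-pad
    y = [T ** (-tri(N // 2 - j - 1)) for j in range(N)]              # symmetrical signal
    # [Length-N cyclic convolution] z_k = sum_j x_j * y_((k-j) mod N)
    z = [sum(x[j] * y[(k - j) % N] for j in range(N)) for k in range(N)]
    # [Final assembly of evaluation results]
    return [T ** tri(k - 1) * z[N // 2 + k - 1] for k in range(D)]
```

For x(t) = 1 + 2t + 3t^2 and T = 2 this returns the exact values x(1), x(2), x(4) = 6, 17, 57; in a production setting the convolution would of course be done by a transform method.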
We see that a single convolution serves to evaluate all of the values x(T^k) at once. It is clear that the complexity of the entire evaluation is O(D ln D) field operations. One important observation is that an actual DFT is just such an evaluation over a geometric progression; namely, the DFT of (x_j) is the sequence (x(g^{−k})), where g is the appropriate root of unity for the transform. So Algorithm 9.6.6 is telling us that evaluations over geometric progressions are, except perhaps for the minor penalty of zero-padding and so on, essentially of FFT complexity given only that g is invertible. It is likewise clear that any FFT can be embedded in a convolution of power-of-two length,
Note that in the calculations of w(t), z(t) the intent is that the product must be expanded, to render w, z as signals of coefficients. The operations to expand these products must be taken into account in any proper complexity estimate for this evaluation algorithm (see Exercise 9.75). Along such lines, note that an especially efficient way to implement Algorithm 9.6.7 is to preconstruct a polynomial remainder tree; that is, to exploit the fact that the polynomials in Step [Assemble half-polynomials] have been calculated from their own respective halves, and so on.
To lend support to the reader who desires to try this general evaluation Algorithm 9.6.7, let us give an example of its workings. Consider the task of calculating the number 64! not by the usual, sequential multiplication of successive integers but by evaluating the polynomial
x(t) = t(1 + t)(2 + t)(3 + t)(4 + t)(5 + t)(6 + t)(7 + t)
     = 5040t + 13068t^2 + 13132t^3 + 6769t^4 + 1960t^5 + 322t^6 + 28t^7 + t^8
at the 8 points
T = (1, 9, 17, 25, 33, 41, 49, 57)
and then taking the product of the eight evaluations to get the factorial. Since the algorithm is fully recursive, tracing is nontrivial. However, if we assign b = 2, say, in Step [Set breakover] and print out the half-polynomials w, z and polynomial-mod results a, b right after these entities are established, then our output should look as follows. On the first pass of eval we obtain
w(t) = 3825 − 4628t + 854t^2 − 52t^3,
z(t) = 3778929 − 350100t + 11990t^2 − 180t^3 + t^4,
a(t) = x(t) mod w(t)
     = −14821569000 + 17447650500t − 2735641440t^2 + 109600260t^3,
b(t) = x(t) mod z(t)
     = −791762564494440 + 63916714435140t − 1735304951520t^2 + 16010208900t^3,
and for each of a, b there will be further recursive passes of eval. If we keep tracing in this way, the subsequent passes reveal
w(t) = 9 − 10t + t^2,  z(t) = 425 − 42t + t^2,
a(t) = −64819440 + 64859760t,
b(t) = −808538598000 + 49305458160t,
and, continuing in recursive order,
w(t) = 1353 − 74t + t^2,  z(t) = 2793 − 106t + t^2,
a(t) = −46869100573680 + 1514239317360t,
b(t) = −685006261415280 + 15148583316720t.
There are no more recursive levels (for our example choice b = 2) because the eval function will break over to some classical method such as an easy instance of Horner’s rule and evaluate these last a(t), b(t) values directly, each one at four t = t_i values. The final returned entity from eval turns out to be the sequence
(x(t0), . . . , x(t7)) = (40320, 518918400, 29654190720, 424097856000,
3100796899200, 15214711438080, 57274321104000, 178462987637760).
Indeed, the product of these eight values is exactly 64!, as expected. One should note that in such a “product” operation—where evaluations are eventually all multiplied together—the last phase of the eval function need not return a union of two signals, but may instead return the product eval(a, u) · eval(b, v). If that is the designer’s choice, then the step [Check breakover threshold . . .] must also return the product of the indicated x(t_i).
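The example above can be checked end to end with a few lines of Python (a direct Horner evaluation stands in for the recursive eval; the helper names are ours):

```python
from math import factorial, prod

# Coefficients of x(t) = t(1+t)(2+t)...(7+t), listed low degree first.
coeffs = [0, 5040, 13068, 13132, 6769, 1960, 322, 28, 1]

def horner(c, t):
    # Evaluate the polynomial with coefficient list c (low degree first) at t.
    acc = 0
    for a in reversed(c):
        acc = acc * t + a
    return acc

points = [1, 9, 17, 25, 33, 41, 49, 57]
vals = [horner(coeffs, t) for t in points]
assert vals[0] == 40320             # x(1) = 8!
assert prod(vals) == factorial(64)  # the eight evaluations multiply to 64!
```

Each evaluation x(8m + 1) is the product (8m + 1)(8m + 2) · · · (8m + 8), so the eight values telescope to 64! exactly as the text describes.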
Incidentally, polynomial coefficients do not necessarily grow large as the above example seems to suggest. For one thing, when working on, say, a factoring problem, one will typically be reducing all coefficients modulo some N at every level. And there is a clean way to handle the problem of evaluating x(t) of degree D at some smaller number of points, say at t_0, . . . , t_{n−1} with n < D. One can simply calculate a new polynomial s as the remainder
s(t) = x(t) mod Π_{j=0}^{n−1} (t − t_j),
whence evaluation of s (whose degree is now about n) at the n given points t_i will suffice.
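This reduction is simple to demonstrate; a self-contained sketch (our own helpers, coefficient lists low degree first) builds the monic product Π(t − t_j), takes the remainder, and confirms agreement at the chosen points:

```python
def horner(c, t):
    # Evaluate polynomial with coefficient list c (low degree first) at t.
    acc = 0
    for a in reversed(c):
        acc = acc * t + a
    return acc

def reduce_to_points(x, points):
    # Build y(t) = prod_j (t - t_j), monic of degree n, then return
    # s(t) = x(t) mod y(t): a polynomial of degree < n that agrees with
    # x at every chosen point.
    y = [1]
    for t0 in points:                          # multiply y by (t - t0)
        y = [0] + y
        for i in range(len(y) - 1):
            y[i] -= t0 * y[i + 1]
    r = list(x)
    dy = len(y) - 1
    for k in range(len(r) - 1, dy - 1, -1):    # classical remaindering (y is monic)
        c = r[k]
        if c:
            for i in range(dy + 1):
                r[k - dy + i] -= c * y[i]
    return r[:dy]
```

For a degree-4 polynomial and three points, the remainder has degree at most 2 yet evaluates identically at those points.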
9.7 Exercises
9.1. Show that both the base-B and balanced base-B representations are unique. That is, for any nonnegative integer x, there is one and only one collection of digits corresponding to each definition.
9.2. Although this chapter has started with multiplication, it is worthwhile to look at least once at simple addition and subtraction, especially in view of signed arithmetic.
(1) Assuming a base-B representation for each of two nonnegative integers x, y, give an explicit algorithm for calculating the sum x + y, digit by digit, so that this sum ends up also in base-B representation.
(2) Invoke the notion of signed-integer arithmetic, by arguing that to get general sums and differences of integers of any signs, all one needs is the summation algorithm of (1), and one other algorithm, namely, to calculate the difference x − y when x ≥ y ≥ 0. (That is, every add/subtract problem can be put into one of two forms, with an overall sign decision on the result.)
(3) Write out complete algorithms for addition and subtraction of integers in base B, with signs arbitrary.
9.3. Assume that each of two nonnegative integers x, y is given in balanced base-B representation. Give an explicit algorithm for calculating the sum x + y, digit by digit, but always staying entirely within the balanced base-B representation for the sum. Then write out such a self-consistent multiply algorithm for balanced representations.
9.4. It is known to children that multiplication can be effected via addition alone, as in 3 · 5 = 5 + 5 + 5. This simple notion can actually have practical import in some scenarios (for some machines, especially older ones, where word multiply is especially costly), as seen in the following tasks, where we study how to use storage tricks to reduce the amount of calculation during a large-integer multiply. Consider the multiplication of D-digit, base-(B = 2^b) integers of size 2^n, so that n ≈ bD. For the tasks below, define a “word” operation (word multiply or word add) as one involving two size-B operands (each having b bits).
(1) Argue first that standard grammar-school multiply, whereby one constructs via word multiplies a parallelogram and then adds up the columns via word adds, requires O(D^2) word multiplies and O(D^2) word adds.
(2) Noting that there can be at most B possible rows of the parallelogram, argue that all possible rows can be precomputed in such a way that the full multiply requires O(BD) word multiplies and O(D^2) word adds.
(3) Now argue that the precomputation of all possible rows of the parallelogram can be done with successive additions and no multiplies of any kind, so that the overall multiply can be done in O(D^2 + BD) word adds.
(4) Argue that the grammar-school paradigm of task (1) above can be done with O(n) bits of temporary memory. What, then, are the respective memory requirements for tasks (2), (3)?
If one desires to create an example program, here is a possible task: Express large integers in base B = 256 = 2^8 and implement task (2) above, using a 256-integer precomputed lookup table of possible rows to create the usual parallelogram. Such a scheme may well be slower than other large-integer methods, but as we have intimated, a machine with especially slow word multiply can benefit from these ideas.
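As a sanity model for tasks (2) and (3), the following Python sketch (our own names; a model, not machine-tuned code) precomputes all B rows by repeated addition and then assembles the product with word adds only:

```python
B = 256

def add_digits(a, b):
    # Word-level addition of little-endian base-B digit lists.
    out, carry = [], 0
    for i in range(max(len(a), len(b))):
        s = (a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0) + carry
        out.append(s % B)
        carry = s // B
    if carry:
        out.append(carry)
    return out

def mul_by_table(x, y):
    # Multiply digit lists x, y.  The rows v*x for v = 0..B-1 are built by
    # successive additions (no word multiplies), then the usual parallelogram
    # is summed row by row, shifted per digit of y.
    rows = [[0]]
    for v in range(1, B):
        rows.append(add_digits(rows[-1], x))
    acc = [0]
    for k, d in enumerate(y):
        acc = add_digits(acc, [0] * k + rows[d])   # shift by k digits, then add
    return acc

def to_digits(v):
    # Conversion helper (not part of the multiply itself).
    ds = []
    while v:
        ds.append(v % B)
        v //= B
    return ds or [0]

def to_int(ds):
    return sum(d * B ** i for i, d in enumerate(ds))
```

The multiply path touches only `add_digits`, illustrating the O(D^2 + BD) word-add bound of task (3).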
9.5. Write out an explicit algorithm (or an actual program) that uses the w_n relation (9.3) to effect multiple-precision squaring in about half a multiple-precision multiply time. Note that you do not need to subtract out the term δ_n explicitly, if you elect instead to modify slightly the i sum. The basic point is that the grammar-school rhombus is effectively cut (about) in half. This exercise is not as trivial as it may sound: there are precision considerations attendant on the possibility of huge column sums.
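A plain-Python model of the halved rhombus (our own sketch; it sidesteps the word-size precision concerns the exercise warns about, since Python integers are unbounded):

```python
def square_digits(x, B=256):
    # Square a little-endian base-B digit list using only the upper half of
    # the multiplication rhombus: cross terms x_i*x_j with i < j count twice,
    # diagonal terms x_i^2 once.
    D = len(x)
    cols = [0] * (2 * D)
    for i in range(D):
        cols[2 * i] += x[i] * x[i]
        for j in range(i + 1, D):
            cols[i + j] += 2 * x[i] * x[j]
    out, carry = [], 0          # propagate the (possibly huge) column sums
    for c in cols:
        c += carry
        out.append(c % B)
        carry = c // B
    while carry:
        out.append(carry % B)
        carry //= B
    return out
```

Roughly half the inner-loop products of a general multiply are performed; the column sums can exceed a word, which is precisely the precision issue the exercise raises.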

where “do” simply means one repeats what is in the braces for some appropriate total iteration count. Note that the duplication of the y iteration
is intentional! Show that this scheme formally generates the binomial series of √(1 + a) via the variable x. How many correct terms obtain after k iterations of the do loop?
Next, calculate some real-valued square roots in this way, noting the important restriction that |a| cannot be too large, lest divergence occur (the formal correctness of the resulting series in powers of a does not, of course, automatically guarantee convergence).
Then, consider this question: Can one use these ideas to create an algorithm for extracting integer square roots? This could be a replacement for Algorithm 9.2.11; the latter, we note, does involve explicit division. On this question it may be helpful to consider, for given n to be square-rooted, something such as √(n/4^q) = 2^{−q}√n or a similar construct, to keep convergence under control.
Incidentally, it is of interest that the standard, real-domain, Newton iteration for the inverse square root automatically has division-free form, yet we appear to be compelled to invoke such as the above coupled-variable expedient for a positive fractional power.
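For reference, that division-free iteration is y ← y(3 − ay^2)/2 (the halving is by a fixed constant, not a general divide); a quick floating-point sketch, with names of our own choosing:

```python
def inv_sqrt(a, y0, iters=6):
    # Newton iteration for 1/sqrt(a): y <- y * (3 - a*y*y) / 2.
    # Quadratic convergence once y0 is reasonably close to the root.
    y = y0
    for _ in range(iters):
        y = y * (3 - a * y * y) / 2
    return y
```

For example, `inv_sqrt(2.0, 0.7)` agrees with 1/√2 to full double precision after a handful of passes.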
9.15. The Cullen numbers are C_n = n·2^n + 1. Write a Montgomery powering program specifically tailored to find composite Cullen numbers, via relations such as 2^{C_n − 1} ≡ 1 (mod C_n). For example, within the powering algorithm for modulus N = C_245 you would be taking, say, R = 2^253 so that R > N. You could observe, for example, that C_141 is a base-2 pseudoprime in this way (it is actually a prime). A much larger example of a Cullen prime is Wilfrid Keller’s C_18496. For more on Cullen numbers see Exercise 1.83.
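The Fermat check itself (without the Montgomery machinery the exercise asks for) is a one-liner via Python’s built-in modular `pow`, and it confirms the C_141 observation:

```python
def cullen(n):
    # Cullen number C_n = n * 2^n + 1.
    return n * (1 << n) + 1

def is_base2_psp(N):
    # Fermat test to base 2: a necessary (but not sufficient) condition
    # for primality of odd N.
    return pow(2, N - 1, N) == 1
```

Here `is_base2_psp(cullen(141))` returns True, consistent with C_141 being prime; composite Cullen numbers would be expected to fail this test, which is the exercise’s point.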
9.16. Say that we wish to evaluate 1/3 using the Newton reciprocation of the text (among real numbers, so that the result will be 0.3333 . . .). For initial guess x_0 = 1/2, prove that for positive n the n-th iterate x_n is in fact

x_n = (2^{2^n} − 1)/(3 · 2^{2^n}),
in this way revealing the quadratic-convergence property of a successful Newton loop. The fact that a closed-form expression can even be given for the Newton iterates is interesting in itself. Such closed forms are rare—can you find any others?
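The claimed closed form is easy to confirm with exact rational arithmetic; a short sketch of the iteration x ← x(2 − 3x), the standard Newton reciprocal step for 1/3:

```python
from fractions import Fraction

# Newton reciprocal iteration for 1/3, started at x_0 = 1/2; each pass
# is compared against the closed form (2^(2^n) - 1) / (3 * 2^(2^n)).
x = Fraction(1, 2)
for n in range(1, 6):
    x = x * (2 - 3 * x)
    assert x == Fraction(2 ** 2 ** n - 1, 3 * 2 ** 2 ** n)
```

The number of correct bits doubles each pass, which is exactly the quadratic convergence the closed form exposes.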
9.17. Work out the asymptotic complexity of Algorithm 9.2.8, in terms of a size-N multiply, and assuming all the shifting enhancements discussed in the text. Then give the asymptotic complexity of the composite operation (xy) mod N , for 0 ≤ x, y < N , in the case that the generalized reciprocal is not yet known. What is the complexity for (xy) mod N if the reciprocal is known? (This should be asymptotically the same as the composite Montgomery operation (xy) mod N if one ignores the precomputations attendant to the latter.) Incidentally, in actual programs that invoke the Newton–Barrett ideas,