

1998], it is shown that most of the bits of x_n can be kept, and the result is still cryptographically secure. There is thus much less computation per bit.

There are many other generators in current use, such as shift-register, chaotic, and cellular-automata (CA) generators. Some generators have been cryptographically “broken,” notably the simpler congruential ones, even if the linear congruence is replaced with higher polynomial forms [Lagarias 1990]. One dilemma that besets researchers in this field is that the generators that may well be quite “secure,” such as the discrete exponential variety that in turn depends on the DL problem for its security, are sluggish. Incidentally, there are various standard randomness tests, especially as regards random generation of binary bits, which can often be invoked to demolish—or alternatively to bestow some measure of confidence upon—a given generator [Menezes et al. 1997].

On the issue of security, an interesting idea due to V. Miller is to use a linear-congruential generator, but with elliptic addition. Given an elliptic curve E over a finite field, one might choose an integer a and a point B ∈ E and iterate

P_{n+1} = [a]P_n + B,    (8.1)

where the addition is elliptic addition and now the seed will be some initial point P_0 ∈ E. One might then use the x-coordinate of P_n as a random field element. This scheme is not as clearly breakable as is the ordinary linear-congruential scheme. It is of interest that certain multipliers a, such as powers of two, would be relatively efficient because of the implied simplicity of the elliptic multiplication ladder. Then, too, one could perhaps use the reduced operations inherent in Algorithm 7.2.8; in other words, use only x-coordinates and live with the ambiguity in [a]P ± B, never actually adding points per se, but having to take square roots.
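To make the idea concrete, here is a minimal C sketch of iteration (8.1) on a toy curve. The small prime p = 1000003, coefficient A = 2, multiplier a = 8, and starting point P_0 = (3, 5) are hypothetical demonstration values, not from the text; the point B is manufactured as a multiple of P_0 so that it is guaranteed to lie on the same curve. A serious implementation would use a cryptographically sized curve and, per the remark above, an x-only ladder.

    #include <stdio.h>
    #include <stdint.h>

    /* Sketch of Miller's elliptic linear-congruential generator
       P_{n+1} = [a]P_n + B on y^2 = x^3 + A*x + C over F_p.
       All parameters are toy demonstration values. */

    typedef struct { int64_t x, y; int inf; } pt; /* inf != 0: point at infinity */

    static const int64_t p = 1000003; /* hypothetical small prime */
    static const int64_t A = 2;       /* coefficient; C is fixed implicitly by P_0 */

    static int64_t mmod(int64_t v) { v %= p; return v < 0 ? v + p : v; }

    static int64_t minv(int64_t v) { /* inverse mod p via Fermat's little theorem */
        int64_t r = 1, b = mmod(v), e = p - 2;
        while (e) { if (e & 1) r = r * b % p; b = b * b % p; e >>= 1; }
        return r;
    }

    static pt padd(pt P, pt Q) { /* affine elliptic addition */
        if (P.inf) return Q;
        if (Q.inf) return P;
        int64_t lam;
        if (P.x == Q.x) {
            if (mmod(P.y + Q.y) == 0) return (pt){0, 0, 1}; /* P + (-P) = O */
            lam = mmod(3 * P.x % p * P.x % p + A) * minv(2 * P.y) % p; /* tangent */
        } else {
            lam = mmod(Q.y - P.y) * minv(mmod(Q.x - P.x)) % p; /* chord */
        }
        int64_t x3 = mmod(lam * lam % p - P.x - Q.x);
        int64_t y3 = mmod(lam * mmod(P.x - x3) % p - P.y);
        return (pt){x3, y3, 0};
    }

    static pt pmul(int64_t k, pt P) { /* binary multiplication ladder [k]P */
        pt R = {0, 0, 1};
        for (int i = 62; i >= 0; i--) {
            R = padd(R, R);
            if ((k >> i) & 1) R = padd(R, P);
        }
        return R;
    }

    int main(void) {
        pt P = {3, 5, 0};     /* seed P_0; fixes C = y^2 - x^3 - A*x (mod p) */
        pt B = pmul(1234, P); /* a multiple of P_0, hence on the same curve */
        int64_t a = 8;        /* power-of-two multiplier: cheap ladder */
        for (int n = 1; n <= 5; n++) {
            P = padd(pmul(a, P), B); /* P_{n+1} = [a]P_n + B */
            printf("x-coordinate %d: %lld\n", n, (long long)P.x);
        }
        return 0;
    }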

Incidentally, a different approach to the use of elliptic curves for random generators appears in [Gong et al. 1999], where the older ideas of shift registers and codewords are generalized to curves over F_{2^m} (see Exercise 8.29).

Along the same lines, let us discuss for a moment the problem of random bit generation. Surely, one can contemplate using some bit—such as the lowest bit—of a “good” random-number generator. But one wonders, for example, whether the calculation of Legendre symbols appropriate to point-finding on elliptic curves,

((x^3 + ax + b) / p) = ±1,

with x running over consecutive integers in an interval and with the (rare) zero value thrown out, say, constitutes a statistically acceptable random walk of ±1 values. And one wonders further whether the input of x into a Legendre-symbol machine, but from a linear-congruential or other generator, provides extra randomness in any statistical sense.
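For small-scale experimentation, such a ±1 stream can be generated by evaluating the Legendre symbol through Euler's criterion, f^{(p−1)/2} ≡ ±1 (mod p). In the C sketch below, the prime p and coefficients a, b are arbitrary demonstration values, not parameters from the text.

    #include <stdio.h>
    #include <stdint.h>

    /* A ±1 stream from Legendre symbols ((x^3 + ax + b)/p), x = 1, 2, 3, ... */

    static uint64_t powmod(uint64_t b, uint64_t e, uint64_t m) {
        uint64_t r = 1;
        b %= m;
        while (e) { if (e & 1) r = r * b % m; b = b * b % m; e >>= 1; }
        return r;
    }

    int main(void) {
        uint64_t p = 1000003, a = 7, b = 11; /* hypothetical demo parameters */
        for (uint64_t x = 1; x <= 40; x++) {
            uint64_t f = (powmod(x, 3, p) + a * x % p + b) % p;
            if (f == 0) continue;                   /* throw out the rare zero value */
            uint64_t s = powmod(f, (p - 1) / 2, p); /* Euler: s is 1 or p-1 */
            putchar(s == 1 ? '+' : '-');
        }
        putchar('\n');
        return 0;
    }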

Such attempts at random bit streams should be compared statistically to the simple exclusive-or bit generators. An example given in [Press et al. 1996] is based on the primitive polynomial (mod 2)

x^18 + x^5 + x^2 + x + 1.

(A polynomial over a finite field F is primitive if it is irreducible and if a root is a cyclic generator for the multiplicative group of the finite field generated by the root.) If one has a “current” bit x_1, and labels the previous 17 bits x_2, x_3, . . . , x_18, then the shifting logic appropriate to the given polynomial is to form a new bit x_0 according to the logic

x_0 = x_18,  x_5 = x_5 ⊕ x_0,  x_2 = x_2 ⊕ x_0,  x_1 = x_1 ⊕ x_0,

where “⊕” is the exclusive-or operator (equivalent to addition in the even-characteristic field). Then all of the indices are shifted so that the new x_1—the new current bit—is the x_0 from the above operations. An explicit algorithm is the following:

Algorithm 8.2.7 (Simple and fast random-bit generator). This algorithm provides seeding and random functions for a random-bit generator based on the polynomial x^18 + x^5 + x^2 + x + 1 over F_2.

1. [Procedure seed]
    seed() {
        h = 2^17;    // 100000000000000000 binary.
        m = 2^0 + 2^1 + 2^4;    // Mask is 10011 binary.
        Choose starting integer seed x in [1, 2^18 − 1];
        return;
    }
2. [Function random returning 0 or 1]
    random() {
        if((x & h) ≠ 0) {    // The bitwise “and” of x, h is compared to 0.
            x = ((x ⊕ m) << 1) | 1;    // “Exclusive-or” (⊕) and “or” (|) taken.
            return 1;
        }
        x = x << 1;
        return 0;
    }
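A direct C rendering of Algorithm 8.2.7 might look as follows (a sketch; the explicit 18-bit mask keeps the register bounded, a step the pseudocode leaves implicit):

    #include <stdio.h>
    #include <stdint.h>

    /* 18-bit shift-register generator from x^18 + x^5 + x^2 + x + 1 over F_2. */

    static uint32_t x;                       /* the 18-bit register */
    static const uint32_t h = 1u << 17;      /* 100000000000000000 binary */
    static const uint32_t m = 1u + 2u + 16u; /* mask 10011 binary */

    void seed18(uint32_t s) { x = s & 0x3FFFFu; if (x == 0) x = 1; }

    int random18(void) {                     /* returns 0 or 1 */
        int bit;
        if (x & h) {                         /* oldest bit x18 is 1: feed back taps */
            x = ((x ^ m) << 1) | 1;
            bit = 1;
        } else {
            x <<= 1;
            bit = 0;
        }
        x &= 0x3FFFFu;                       /* discard the bit shifted out */
        return bit;
    }

    int main(void) {
        seed18(12345);
        for (int i = 0; i < 72; i++) putchar('0' + random18());
        putchar('\n');
        return 0;
    }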

The reference [Press et al. 1996] has a listing of other polynomials (mod 2) for selected degrees up through 100.

In any comprehensive study of random number generation, one witnesses the conceptual feedback involving prime numbers. Not only do many proposed random-number generators involve primes per se, but many of the algorithms—such as some of the ones appearing in this book—have recourse to suitable random numbers. But if one lifts the requirement of statistically testable randomness as it is usually invoked, there is quite another way to use random sequences. It is to these alternatives—falling under the rubric of quasi-Monte Carlo (qMC)—that we next turn.

8.3 Quasi-Monte Carlo (qMC) methods

Who would have guessed, back in the times of Gauss, Euler, and Legendre, say, that primes would attain some practical value in the financial-market analysis of the latter twentieth century? We refer here not to cryptographic uses—which certainly do emerge whenever money is involved—but to quasi-Monte Carlo science, which, loosely speaking, is a specific form of Monte Carlo (i.e., statistically motivated) analysis. Monte Carlo calculations pervade the fields of applied science.

The essential idea behind Monte Carlo calculation is to sample some large continuous (or even discrete, if need be) space—in doing a multidimensional integral, say—with random samples. Then one hopes that the “average” result is close to the true result one would obtain with the uncountable samples theoretically at hand. It is intriguing that number theory—in particular prime-number study—can be brought to bear on the science of quasi-Monte Carlo (qMC). The techniques of qMC differ from traditional Monte Carlo in that one does not seek expressly random sequences of samples. Instead, one attempts to provide quasirandom sequences that do not, in fact, obey the strict statistical rules of randomness, but instead have certain uniformity features attendant on the problem at hand.

Although it is perhaps overly simplistic, a clear way to envision the difference between random and qMC is this: Random points when dropped can be expected to exhibit “clumps” and “gaps,” whereas qMC points generally avoid each other to minimize clumping and tend to occupy previous gaps. For these reasons qMC points can be—depending on the spatial dimension and precise posing of the problem—superior for certain tasks such as numerical integration, min–max problems, and statistical estimation in general.

8.3.1 Discrepancy theory

Say that one wants to know the value of an integral over some D-dimensional domain R, namely

I = ∫···∫_R f(x) d^D x,

but there is no reasonable hope of a closed-form, analytic evaluation. One might proceed in Monte Carlo fashion, by dropping a total of N “random” vectors x = (x_1, . . . , x_D) into the integration domain, then literally adding up the corresponding integrand values to get an average, and then multiplying by the measure of R to get an approximation, say Î, for the exact integral I. On the general variance principles of statistics, we can expect the error to

behave no better than

|Î − I| = O(1/√N),

where, of course, the implied big-O constant depends on the dimension D, the integrand f, and the domain R. It is interesting that the power law N^{−1/2}, though, is independent of D. By contrast, a so-called “grid” method, in which we split the domain R into grid points, can be expected to behave no better than

|Î − I| = O(1/N^{1/D}),

which growth can be quite unsatisfactory, especially for large D. In fact, a grid scheme—with few exceptions—makes practical sense only for 1- or perhaps 2-dimensional numerical integration, unless there is some special consideration like a well-behaved integrand, extra reasons to use a grid, and so on. It is easy to see why Monte Carlo methods using random point sets have been used for decades on numerical integration problems in D ≥ 3 dimensions.

But there is a remarkable way to improve upon direct Monte Carlo, and in fact obtain errors such as

|Î − I| = O(ln^D N / N),

or sometimes with ln^{D−1} powers appearing instead, depending on the implementation (we discuss this technicality in a moment). The idea is to use low-discrepancy sequences, a class of quasi-Monte Carlo (qMC) sequences (some authors define a low-discrepancy sequence as one for which the behavior of |Î − I| is bounded as above; see Exercise 8.32). We stress again, an important observation is that qMC sequences are not random in the classical sense. In fact, the points belonging to qMC sequences tend to avoid each other (see Exercise 8.12).

We start our tour of qMC methods with a definition of discrepancy, where it is understood that vectors drawn out of regions R consist of real-valued components.

Definition 8.3.1. Let P be a set of at least N points in the (unit D-cube) region R = [0, 1]^D. The discrepancy of P with respect to a family F of Lebesgue-measurable subregions of R is defined (neither D_N nor D*_N is to be confused with the dimension D) by

D_N(F; P) = sup_{φ∈F} | χ(φ; P)/N − µ(φ) |,

where χ(φ; P ) is the number of points of P lying in φ, and µ denotes Lebesgue measure. Furthermore, the extreme discrepancy of P is defined by

D_N(P) = D_N(G; P),

where G is the family of subregions of the form ∏_{i=1}^{D} [u_i, v_i). In addition, the star discrepancy of P is defined by

D*_N(P) = D_N(H; P),

where H is the family of subregions of the form ∏_{i=1}^{D} [0, v_i). Finally, if S ⊂ R is a countably infinite sequence S = (x_1, x_2, . . .), we define the various discrepancies D_N(S) always in terms of the first N points of S.

The definition is somewhat notation-heavy, but a little thought reveals what is being sought: an assessment of “how fairly” a set P samples a region. One might have thought on the face of it that a simple equispaced grid of points would have optimal discrepancy, but in more than one dimension such intuition is misleading, as we shall see. One way to gain insight into the meaning of discrepancy is to contemplate the theorem: A countably infinite set S is equidistributed in R = [0, 1]^D if and only if the star discrepancy (alternatively, the extreme discrepancy) vanishes as N → ∞. It is also the case that the star and extreme discrepancies are not that different; in fact, it can be shown that for any P of the above definition we have

D*_N(P) ≤ D_N(P) ≤ 2^D D*_N(P).

Such results can be found in [Niederreiter 1992], [Tezuka 1995].
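In one dimension the star discrepancy admits a standard closed form over the sorted points x_1 ≤ · · · ≤ x_N, namely D*_N = max_i max( x_i − (i−1)/N, i/N − x_i ), and a short C routine makes the definition tangible (the midpoint set used in main() anticipates the example point set P discussed further below):

    #include <stdio.h>
    #include <stdlib.h>

    /* One-dimensional star discrepancy via the closed form for sorted points. */

    static int cmp(const void *a, const void *b) {
        double d = *(const double *)a - *(const double *)b;
        return (d > 0) - (d < 0);
    }

    double star_discrepancy_1d(double *x, int N) {
        qsort(x, N, sizeof(double), cmp);
        double d = 0.0;
        for (int i = 1; i <= N; i++) {
            double lo = x[i - 1] - (double)(i - 1) / N; /* excess left of x_i */
            double hi = (double)i / N - x[i - 1];       /* deficit right of x_i */
            if (lo > d) d = lo;
            if (hi > d) d = hi;
        }
        return d;
    }

    int main(void) {
        int N = 8;
        double x[8];
        for (int i = 0; i < N; i++) x[i] = (2.0 * i + 1.0) / (2.0 * N); /* midpoints */
        printf("D*_N = %g, expected 1/(2N) = %g\n",
               star_discrepancy_1d(x, N), 1.0 / (2 * N));
        return 0;
    }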

The importance of discrepancy—in particular the star discrepancy D*_N—is immediately apparent on the basis of the following central result, which may be taken to be the centerpiece of qMC integration theory. We shall refer here to the Hardy–Krause bounded variation, which is an estimate H(f) on the excursions of a function f. We shall not need the precise definition for H (see [Niederreiter 1992]), since the computational aspect of qMC depends mainly on the rest of the overall variation term:

Theorem 8.3.2 (Koksma–Hlawka). If a function f has bounded variation H(f) on R = [0, 1]^D, and S is as in Definition 8.3.1, then

| (1/N) Σ_{x∈S} f(x) − ∫_{r∈R} f(r) d^D r | ≤ H(f) D*_N(S).

What is more, this inequality is optimal in the following sense: For any N-point S ⊂ R and any ε > 0, there exists a function f with H(f) = 1 such that the left-hand side of the inequality is bounded below by D*_N(S) − ε.

This beautiful result connects multidimensional integration errors directly to the star discrepancy D*_N. The quest for accurate qMC sequences will now hinge on the concept of discrepancy. Incidentally, one of the many fascinating theoretical results beyond Theorem 8.3.2 is the assessment by Wozniakowski of “average” case error bounds on the unit cube. As discussed in [Wozniakowski 1991], the statistical ensemble average—in an appropriately rigorous sense—of the integration error is closely related to discrepancy, verifying once and for all that discrepancy is of profound practical importance. Moreover, there are some surprising new results that go some distance, as we shall see, to explain why actual qMC experiments sometimes fare much better—provide far more accuracy—than the discrepancy bounds imply.

A qMC sequence S should generally be one of low D*_N, and it is in the construction of such S that number theory becomes involved. The first thing we need to observe is that there is a subtle distinction between a point-set discrepancy and the discrepancy of a sequence. Take D = 1 dimension for example, in which case the point set

P = { 1/(2N), 3/(2N), . . . , 1 − 1/(2N) }

has D*_N(P) = 1/(2N). On the other hand, there exists no countably infinite sequence S that enjoys the property D*_N(S) = O(1/N). In fact, it was shown by [Schmidt 1972] that if S is countably infinite, then for infinitely many N,

D*_N(S) ≥ c (ln N)/N,

where c is an absolute constant (i.e., independent of N and S). Actually, the constant can be taken to be c = 3/50 [Niederreiter 1992], but the main point is that the requirement of an infinite qMC sequence, from which a researcher may draw arbitrarily large numbers of contiguous samples, gives rise to special considerations of error. The point set P above with its discrepancy 1/(2N ) is allowed because, of course, the members of the sequence themselves depend on N .

8.3.2 Specific qMC sequences

We are now prepared to construct some low-star-discrepancy sequences. A primary goal will be to define a practical low-discrepancy sequence for any given prime p, by counting in a certain clever fashion through base-p representations of integers. We shall start with a somewhat more general description for arbitrary base-B representations. For more than one dimension, a set of pairwise coprime bases will be used.

Definition 8.3.3. For an integer base B ≥ 2, the van der Corput sequence for base B is the sequence

S_B = (ρ_B(n)),  n = 0, 1, 2, . . . ,

where ρ_B is the radical-inverse function, defined on nonnegative integers n, with presumed base-B representation n = Σ_i n_i B^i, by

ρ_B(n) = Σ_i n_i B^{−i−1}.


These sequences are easy to envision and likewise easy to generate in practice; in fact, their generation is easier than one might suspect. Say we desire the van der Corput sequence for base B = 2. Then we simply count from n = 0, in binary

n = 0, 1, 10, 11, 100, . . . ,

and form the reversals of the bits to obtain (also in binary)

S = (0.0, 0.10, 0.01, 0.11, 0.001, . . .).

To put it symbolically, if we are counting and happen to be at integer index

n = n_k n_{k−1} . . . n_1 n_0,

then the term ρ_B(n) ∈ S is given by reversing the digits thus:

ρ_B(n) = 0.n_0 n_1 . . . n_k.
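In code, this digit reversal is a short loop. The following C sketch implements ρ_B(n) directly from Definition 8.3.3 and prints the opening terms of the base-2 van der Corput sequence displayed above (0, 1/2, 1/4, 3/4, 1/8, . . .):

    #include <stdio.h>

    /* Radical-inverse function rho_B(n): reverse the base-B digits of n
       across the radix point. */

    double radical_inverse(unsigned long n, unsigned B) {
        double r = 0.0, q = 1.0 / B;
        while (n) {
            r += (double)(n % B) * q; /* digit n_i contributes n_i * B^(-i-1) */
            n /= B;
            q /= B;
        }
        return r;
    }

    int main(void) {
        for (unsigned long n = 0; n < 8; n++)
            printf("rho_2(%lu) = %g\n", n, radical_inverse(n, 2));
        return 0;
    }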

It is known that every van der Corput sequence has

D*_N(S_B) = O(ln N / N),

where the implied big-O constant depends only on B. It turns out that B = 3 has the smallest such constant, but the main point affecting implementations is that the constant generally increases for larger bases B [Faure 1981].

For D > 1 dimensions, it is possible to generate qMC sequences based on the van der Corput forms, in the following manner:

Definition 8.3.4. Let B̄ = {B_1, B_2, . . . , B_D} be a set of pairwise-coprime bases, each B_i > 1. We define the Halton sequence for bases B̄ by

S_B̄ = (x_n),  n = 0, 1, 2, . . . ,

where

x_n = (ρ_{B_1}(n), . . . , ρ_{B_D}(n)).

In other words, a Halton sequence involves a specific base for each vector coordinate, and the respective bases are to be pairwise coprime. Thus for example, a qMC sequence of points in the (D = 3)-dimensional unit cube can be generated by choosing prime bases {B_1, B_2, B_3} = {2, 3, 5} and counting n = 0, 1, 2, . . . in those bases simultaneously, to obtain

x_0 = (0, 0, 0),  x_1 = (1/2, 1/3, 1/5),  x_2 = (1/4, 2/3, 2/5),  x_3 = (3/4, 1/9, 3/5),

and so on. The manner in which these points deposit themselves in the unit 3-cube is interesting. We can see once again the basic, qualitative aspect of successful qMC sequences: The points tend to drop into regions where “they have not yet been.” Contrast this to direct Monte Carlo methods, whereby—due to unbiased randomness—points will not only sometimes “clump” together, but sometimes leave “gaps” as the points accumulate in the domain of interest.
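These opening Halton points are immediate from the radical_inverse sketch given earlier; a few lines of C reproduce x_0 through x_3 exactly as displayed above:

    #include <stdio.h>

    double radical_inverse(unsigned long n, unsigned B); /* as sketched earlier */

    int main(void) {
        unsigned bases[3] = {2, 3, 5}; /* pairwise-coprime (prime) bases */
        for (unsigned long n = 0; n <= 3; n++) {
            printf("x_%lu = (", n);
            for (int i = 0; i < 3; i++)
                printf("%g%s", radical_inverse(n, bases[i]), i < 2 ? ", " : ")\n");
        }
        return 0;
    }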

The Halton sequences are just one family of qMC sequences, as we discuss in the next section. For the moment, we exhibit a typical theorem that reveals information about how discrepancy grows as a function of the dimension:

Theorem 8.3.5 (Halton discrepancy). Denote by S_B̄ a Halton sequence for bases B̄. Then the star discrepancy of the sequence satisfies

D*_N(S_B̄) < D/N + (1/N) ∏_{i=1}^{D} ( ((B_i − 1)/(2 ln B_i)) ln N + (B_i + 1)/2 ).

A rather intricate proof can be found in [Niederreiter 1992]. We observe that the theorem provides an explicit upper bound for the implied big-O constant in

D*_N(S_B̄) = O(ln^D N / N),

an error behavior foreshadowed in the introductory remarks of this section. What is more, we can see the (unfortunate) effect of larger bases supposedly contributing more to the discrepancy (we say supposedly because this is just an upper bound); indeed, this effect for larger bases is seen in practice. We note that there is a so-called N-point Hammersley point set, for which the leading component of x_n is n/N, while the rest of x_n is a (D − 1)-dimensional Halton vector. This set is now N-dependent, so that it cannot be turned into an infinite sequence. However, the Hammersley set’s discrepancy takes the slightly superior form

D*_N = O(ln^{D−1} N / N),

showing how N-dependent sets can offer a slight complexity reduction.

8.3.3 Primes on Wall Street?

Testing a good qMC sequence, say by estimating the volume of the unit D-ball, is an interesting exercise. The Halton qMC sequence gives good results for moderate dimensions, say for D up to about 10. One advantage of the Halton sequence is that it is easy to jump ahead, so as to have several or many computers simultaneously sampling from disjoint segments of the sequence. The following algorithm shows how one can jump in at the n-th term, and how to continue sequentially from there. To make the procedure especially efficient, the digits of the index in the various bases under consideration are constantly updated as we proceed from one index to the next.


Algorithm 8.3.6 (Fast qMC sequence generation). This algorithm generates D-dimensional Halton-sequence vectors. Let p_1, . . . , p_D denote the first D primes. For starting index n, a seed() procedure creates x_n, whose components are for clarity denoted by x_n[1], . . . , x_n[D]. Then a random() function may be used to generate subsequent vectors x_{n+1}, x_{n+2}, . . ., where we assume an upper bound of N for all indices. For high efficiency, global digits (d_{i,j}) are initially seeded to represent the starting index n, then upon subsequent calls to a random() function, are incremented in “odometer” fashion for subsequent indices exceeding n.

1. [Procedure seed]
    seed(n) {    // n is the desired starting index.
        for(1 ≤ i ≤ D) {
            K_i = ⌈ln(N + 1)/ln p_i⌉;    // A precision parameter.
            q_{i,0} = 1;
            k = n;
            x[i] = 0;    // x is the vector x_n.
            for(1 ≤ j ≤ K_i) {
                q_{i,j} = q_{i,j−1}/p_i;    // q_{i,j} = p_i^{−j}.
                d_{i,j} = k mod p_i;    // The d_{i,j} start as base-p_i digits of n.
                k = (k − d_{i,j})/p_i;
                x[i] = x[i] + d_{i,j} q_{i,j};
            }
        }
        return;    // x_n now available as (x[1], . . . , x[D]).
    }

2. [Function random]
    random() {
        for(1 ≤ i ≤ D) {
            for(1 ≤ j ≤ K_i) {    // Increment the “odometer.”
                d_{i,j} = d_{i,j} + 1;
                x[i] = x[i] + q_{i,j};
                if(d_{i,j} < p_i) break;    // Exit loop when all carries complete.
                d_{i,j} = 0;
                x[i] = x[i] − q_{i,j−1};
            }
        }
        return (x[1], . . . , x[D]);    // The new x.
    }

It is plain upon inspection that this algorithm functions as an “odometer,” with ratcheting of base-p_i digits consistent with Definition 8.3.4. Note the parameters K_i, where K_i is the maximum possible number of digits, in base p_i, for an integer index j. This K_i must be set in terms of some N that is at least the value of any j that would ever be reached. This caution, or an equivalent one, is necessary to limit the precision of the reverse-radix base expansions.


Algorithm 8.3.6 is usually used in floating-point mode, i.e., with stored floating-point inverse powers q_{i,j} but integer digits d_{i,j}. However, there is nothing wrong in principle with an exact generator in which actual integer powers are kept for the q_{i,j}. In fact, the integer mode can be used for testing of the algorithm, in the following interesting way. Take, for example, N = 1000, so vectors x_0, . . . , x_999 are allowed, and choose D = 2 dimensions so that the primes 2, 3 are involved. Then call seed(701), which sets the variable x to be the vector

x_701 = (757/1024, 719/729).

Now, calling random() exactly 9 times produces

x_710 = (397/1024, 674/729),

and sure enough, we can test the integrity of the algorithm by going back and calling seed(710) to verify that starting over thus with seed value 701 + 9 gives precisely the x_710 shown.
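This exact check is easy to reproduce. The C sketch below works in integer mode, computing the numerator of ρ_p(n) over the denominator p^K; with N = 1000 one may take K_1 = 10 and K_2 = 7, so that 757/1024 appears directly and 719/729 appears as 2157/2187:

    #include <stdio.h>

    /* Numerator of rho_p(n) with denominator p^K (exact integer mode). */

    unsigned long num_radical_inverse(unsigned long n, unsigned long p, int K) {
        unsigned long scale = 1, r = 0;
        for (int j = 0; j < K; j++) scale *= p; /* scale = p^K */
        for (int j = 0; j < K; j++) {
            scale /= p;                         /* p^(K-1-j) */
            r += (n % p) * scale;               /* reversed digit placement */
            n /= p;
        }
        return r;
    }

    int main(void) {
        /* Expect 757/1024 and 2157/2187 (= 719/729) for index 701,
           then 397/1024 and 2022/2187 (= 674/729) for index 710. */
        printf("x_701 = (%lu/1024, %lu/2187)\n",
               num_radical_inverse(701, 2, 10), num_radical_inverse(701, 3, 7));
        printf("x_710 = (%lu/1024, %lu/2187)\n",
               num_radical_inverse(710, 2, 10), num_radical_inverse(710, 3, 7));
        return 0;
    }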

It is of interest that Algorithm 8.3.6 really is fast, at least in this sense: In practice, it tends to be faster even than calling a system’s built-in random-number function. And this advantage has meaning even outside the numerical-integration paradigm. When one really wants an equidistributed, random number in [0, 1), say, a system’s random function should certainly be considered, especially if the natural tendency for random samples to clump and separate is supposed to remain intact. But for many statistical studies, one simply wants some kind of irregular “coverage” of [0, 1), one might say a “fair” coverage that does not bias any particular subinterval, in which case such a fast qMC algorithm should be considered.

Now we may get a multidimensional integral by calling, in a very simple way, the procedures of Algorithm 8.3.6:

Algorithm 8.3.7 (qMC multidimensional integration). Given a dimension D, and an integrable function f : R → R, where R = [0, 1]^D, this algorithm estimates the multidimensional integral

I = ∫_{x∈R} f(x) d^D x,

via the generation of N_0 qMC vectors, starting with the n-th of a sequence (x_0, x_1, . . . , x_n, . . . , x_{n+N_0−1}, . . .). It is assumed that Algorithm 8.3.6 is initialized with an index bound N ≥ n + N_0.

1. [Initialize via Algorithm 8.3.6]
    seed(n);    // Start the qMC process, to set a global x = x_n.
    I = 0;
2. [Perform qMC integration]
    // Function random() updates the global qMC vector (Algorithm 8.3.6).
    for(0 ≤ j < N_0) I = I + f(random());
    return I/N_0;    // An estimate for the integral.
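As a closing illustration in the spirit of Algorithms 8.3.6 and 8.3.7, the following self-contained C sketch estimates the volume of the unit 3-ball (the test mentioned at the start of Section 8.3.3) by averaging an indicator function over Halton points for bases {2, 3, 5}. It is a simple floating-point rendition using the radical-inverse function, not the book's exact odometer code:

    #include <stdio.h>
    #include <math.h>

    double radical_inverse(unsigned long n, unsigned B) {
        double r = 0.0, q = 1.0 / B;
        while (n) { r += (double)(n % B) * q; n /= B; q /= B; }
        return r;
    }

    int main(void) {
        const unsigned D = 3, primes[3] = {2, 3, 5};
        const unsigned long N0 = 100000;
        unsigned long hits = 0;
        for (unsigned long n = 0; n < N0; n++) {
            double s = 0.0;
            for (unsigned i = 0; i < D; i++) {
                double c = radical_inverse(n, primes[i]); /* point in [0,1)^3 */
                s += c * c;
            }
            if (s <= 1.0) hits++; /* inside one octant of the unit ball */
        }
        /* The cube [0,1)^3 meets one octant; scale by 2^D for the full ball. */
        printf("qMC estimate: %f, exact 4*pi/3 = %f\n",
               8.0 * (double)hits / (double)N0, 4.0 * acos(-1.0) / 3.0);
        return 0;
    }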
