
A.2 Mathematical Concepts


concave: a function f is concave on an interval [a, b] if

∀x, y ∈ [a, b] : ∀t ∈ [0, 1] : f(tx + (1 − t)y) ≥ t·f(x) + (1 − t)·f(y).

convex: a function f is convex on an interval [a, b] if

∀x, y ∈ [a, b] : ∀t ∈ [0, 1] : f(tx + (1 − t)y) ≤ t·f(x) + (1 − t)·f(y).

equivalence relation: a transitive, reflexive, symmetric relation.

field: a set of elements that support addition, subtraction, multiplication, and division by nonzero elements. Addition and multiplication are associative and commutative, and have neutral elements analogous to zero and one for the real numbers. The prime examples are R, the real numbers; Q, the rational numbers; and Zp, the integers modulo a prime p.

iff: abbreviation for “if and only if”.

lexicographic order: the canonical way of extending a total order on a set of elements to tuples, strings, or sequences over that set. We have ⟨a1, a2, . . . , ak⟩ < ⟨b1, b2, . . . , bk⟩ if and only if a1 < b1, or a1 = b1 and ⟨a2, . . . , ak⟩ < ⟨b2, . . . , bk⟩.

linear order: a reflexive, transitive, weakly antisymmetric relation.

median: an element with rank ⌈n/2⌉ among n elements.

multiplicative inverse: if an object x is multiplied by a multiplicative inverse x⁻¹ of x, we obtain x · x⁻¹ = 1 – the neutral element of multiplication. In particular, in a field, every element except zero (the neutral element of addition) has a unique multiplicative inverse.

prime number: an integer n, n ≥ 2, is a prime iff there are no integers a, b > 1 such that n = a · b.

rank: a one-to-one mapping r : S → 1..n is a ranking function for the elements of a set S = {e1, . . . , en} if r(x) < r(y) whenever x < y.

reflexive: a relation R ⊆ A × A is reflexive if ∀a ∈ A : (a, a) ∈ R.

relation: a set of pairs R. Often, we write relations as operators; for example, if ∼ is a relation, a ∼ b means (a, b) ∈ ∼.

symmetric relation: a relation ∼ is symmetric if for all a and b, a ∼ b implies b ∼ a.

total order: a reflexive, transitive, antisymmetric relation.

transitive: a relation ∼ is transitive if for all a, b, c, a ∼ b and b ∼ c imply a ∼ c.

weakly antisymmetric: a relation ≤ is weakly antisymmetric if for all a, b, a ≤ b or b ≤ a. If a ≤ b and b ≤ a, we write a ≡ b. Otherwise, we write a < b or b < a.


A.3 Basic Probability Theory

Probability theory rests on the concept of a sample space S. For example, to describe the rolls of two dice, we would use the 36-element sample space {1, . . . , 6} × {1, . . . , 6}, i.e., the elements of the sample space are the pairs (x, y) with 1 ≤ x, y ≤ 6 and x, y ∈ ℕ. Generally, a sample space is any set. In this book, all sample spaces are finite. In a random experiment, any element s ∈ S is chosen with some elementary probability p_s, where ∑_{s∈S} p_s = 1. A sample space together with a probability distribution is called a probability space. In this book, we use uniform probabilities almost exclusively; in this case p_s = p = 1/|S|. Subsets E of the sample space are called events. The probability of an event E ⊆ S is the sum of the probabilities of its elements, i.e., prob(E) = |E|/|S| in the uniform case. So the probability of the event {(x, y) : x + y = 7} = {(1, 6), (2, 5), . . . , (6, 1)} is equal to 6/36 = 1/6, and the probability of the event {(x, y) : x + y ≥ 8} is equal to 15/36 = 5/12.
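These probabilities are easy to confirm by brute-force enumeration. The following small Python sketch (an illustration only, using nothing beyond the standard library; the helper names are ours) builds the 36-element sample space and evaluates the two events above.

```python
from fractions import Fraction
from itertools import product

# The 36-element sample space for two dice: all pairs (x, y) with 1 <= x, y <= 6.
space = list(product(range(1, 7), repeat=2))

def prob(event):
    # Probability of an event (a subset of the sample space) under the uniform distribution.
    return Fraction(len(event), len(space))

sum_is_7 = [(x, y) for (x, y) in space if x + y == 7]
sum_at_least_8 = [(x, y) for (x, y) in space if x + y >= 8]

print(prob(sum_is_7))        # 1/6
print(prob(sum_at_least_8))  # 5/12
```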

A random variable is a mapping from the sample space to the real numbers. Random variables are usually denoted by capital letters to distinguish them from plain values. For example, the random variable X could give the number shown by the first die, the random variable Y could give the number shown by the second die, and the random variable S could give the sum of the two numbers. Formally, if (x, y) ∈ S, then X((x, y)) = x, Y((x, y)) = y, and S((x, y)) = x + y = X((x, y)) + Y((x, y)).

We can define new random variables as expressions involving other random variables and ordinary values. For example, if V and W are random variables, then

(V + W)(s) = V(s) + W(s), (V · W)(s) = V(s) · W(s), and (V + 3)(s) = V(s) + 3.

Events are often specified by predicates involving random variables. For example, X ≤ 2 denotes the event {(1, y), (2, y) : 1 ≤ y ≤ 6}, and hence prob(X ≤ 2) = 1/3. Similarly, prob(X + Y = 11) = prob({(5, 6), (6, 5)}) = 1/18.

Indicator random variables are random variables that take only the values zero and one. Indicator variables are an extremely useful tool for the probabilistic analysis of algorithms because they allow us to encode the behavior of complex algorithms into simple mathematical objects. We frequently use the letters I and J for indicator variables.

The expected value of a random variable Z : S → ℝ is

$$E[Z] = \sum_{s \in S} p_s \cdot Z(s) = \sum_{z \in \mathbb{R}} z \cdot \mathrm{prob}(Z = z), \qquad (A.1)$$

i.e., every sample s contributes the value of Z at s times its probability. Alternatively, we can group all s with Z(s) = z into the event Z = z and then sum over the z ∈ ℝ.

In our example, E[X] = (1 + 2 + 3 + 4 + 5 + 6)/6 = 21/6 = 3.5, i.e., the expected value of the first die is 3.5. Of course, the expected value of the second die is also 3.5. For an indicator random variable I, we have

E[I] = 0 · prob(I = 0) + 1 · prob(I = 1) = prob(I = 1) .
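The defining sum in (A.1) can be evaluated directly for the dice example. The sketch below is illustrative only; the indicator chosen ("first die shows a six") is just an example, not taken from the text.

```python
from fractions import Fraction
from itertools import product

space = list(product(range(1, 7), repeat=2))  # uniform two-dice sample space
p = Fraction(1, len(space))                   # elementary probability p_s = 1/|S|

def expectation(Z):
    # E[Z] = sum over all sample points s of p_s * Z(s), as in (A.1).
    return sum(p * Z(s) for s in space)

X = lambda s: s[0]                    # number shown by the first die
I = lambda s: 1 if s[0] == 6 else 0   # an example indicator variable: first die shows a six

print(expectation(X))  # 7/2, i.e., 3.5
print(expectation(I))  # 1/6, which equals prob(I = 1)
```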

Often, we are interested in the expectation of a random variable that is defined in terms of other random variables. This is easy for sums owing to the linearity of expectations of random variables: for any two random variables V and W,

$$E[V + W] = E[V] + E[W]. \qquad (A.2)$$

This equation is easy to prove and extremely useful. Let us prove it. It amounts essentially to an application of the distributive law of arithmetic. We have

$$E[V + W] = \sum_{s \in S} p_s \cdot (V(s) + W(s)) = \sum_{s \in S} p_s \cdot V(s) + \sum_{s \in S} p_s \cdot W(s) = E[V] + E[W].$$

As our first application, let us compute the expected sum of two dice. We have

E[S] = E[X + Y ] = E[X] + E[Y ] = 3.5 + 3.5 = 7 .

Observe that we obtain the result with almost no computation. Without knowing about the linearity of expectations, we would have to go through a tedious calculation:

$$E[S] = 2 \cdot \tfrac{1}{36} + 3 \cdot \tfrac{2}{36} + 4 \cdot \tfrac{3}{36} + 5 \cdot \tfrac{4}{36} + 6 \cdot \tfrac{5}{36} + 7 \cdot \tfrac{6}{36} + 8 \cdot \tfrac{5}{36} + 9 \cdot \tfrac{4}{36} + \ldots + 12 \cdot \tfrac{1}{36}$$
$$= \frac{2 \cdot 1 + 3 \cdot 2 + 4 \cdot 3 + 5 \cdot 4 + 6 \cdot 5 + 7 \cdot 6 + 8 \cdot 5 + \ldots + 12 \cdot 1}{36} = 7.$$
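Both routes to E[S] are easy to verify mechanically. The sketch below (illustrative only) computes E[X] + E[Y] and also the direct sum over the distribution of S, and both give 7.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

space = list(product(range(1, 7), repeat=2))
p = Fraction(1, len(space))

def expectation(Z):
    return sum(p * Z(s) for s in space)

# The short route: linearity of expectations.
E_X = expectation(lambda s: s[0])
E_Y = expectation(lambda s: s[1])

# The tedious route: group sample points by their sum and compute the sum over z of z * prob(S = z).
counts = Counter(x + y for (x, y) in space)
E_S = sum(z * Fraction(c, len(space)) for z, c in counts.items())

print(E_X + E_Y, E_S)  # 7 and 7
```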

Exercise A.1. What is the expected sum of three dice?

We shall now give another example with a more complex sample space. We consider the experiment of throwing n balls into m bins. The balls are thrown at random and distinct balls do not influence each other. Formally, our sample space is the set of all functions f from 1..n to 1..m. This sample space has size m^n, and f(i), 1 ≤ i ≤ n, indicates the bin into which ball i is thrown. All elements of the sample space are equally likely. How many balls should we expect in bin 1? We use I to denote the number of balls in bin 1. To determine E[I], we introduce indicator variables I_i, 1 ≤ i ≤ n. The variable I_i is 1 if ball i is thrown into bin 1, and is 0 otherwise. Formally, I_i(f) = 1 iff f(i) = 1. Then I = ∑_i I_i. We have

$$E[I] = E\Bigl[\sum_i I_i\Bigr] = \sum_i E[I_i] = \sum_i \mathrm{prob}(I_i = 1),$$

where the first equality is the linearity of expectations and the second equality follows from the I_i's being indicator variables. It remains to determine the probability that I_i = 1. Since the balls are thrown at random, ball i ends up in any bin with the same probability.¹ Thus prob(I_i = 1) = 1/m, and hence

$$E[I] = \sum_i \mathrm{prob}(I_i = 1) = \sum_i \frac{1}{m} = \frac{n}{m}.$$

¹ Formally, there are exactly m^{n−1} functions f with f(i) = 1.
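For small n and m, the value E[I] = n/m can be confirmed by enumerating all m^n functions. The sketch below is illustrative only; n = 3 and m = 4 are arbitrary example values.

```python
from fractions import Fraction
from itertools import product

n, m = 3, 4  # example: 3 balls, 4 bins

# Sample space: all functions f from 1..n to 1..m, encoded as tuples (f(1), ..., f(n)).
space = list(product(range(1, m + 1), repeat=n))
p = Fraction(1, len(space))  # each of the m**n functions is equally likely

# I(f) = number of balls that land in bin 1.
expected_balls_in_bin_1 = sum(p * sum(1 for b in f if b == 1) for f in space)

print(expected_balls_in_bin_1, Fraction(n, m))  # both are 3/4
```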


Products of random variables behave differently. In general, we have E[X · Y] ≠ E[X] · E[Y]. There is one important exception: if X and Y are independent, equality holds. Random variables X1, . . . , Xk are independent if and only if

$$\forall x_1, \ldots, x_k : \ \mathrm{prob}(X_1 = x_1 \wedge \cdots \wedge X_k = x_k) = \prod_{1 \le i \le k} \mathrm{prob}(X_i = x_i). \qquad (A.3)$$

As an example, when we roll two dice, the value of the first die and the value of the second die are independent random variables. However, the value of the first die and the sum of the two dice are not independent random variables.

Exercise A.2. Let I and J be independent indicator variables and let X = (I + J) mod 2, i.e., X is one iff I and J are different. Show that I and X are independent, but that I, J, and X are dependent.

Assume now that X and Y are independent. Then

$$\begin{aligned}
E[X] \cdot E[Y] &= \Bigl(\sum_x x \cdot \mathrm{prob}(X = x)\Bigr) \cdot \Bigl(\sum_y y \cdot \mathrm{prob}(Y = y)\Bigr) \\
&= \sum_{x,y} x \cdot y \cdot \mathrm{prob}(X = x) \cdot \mathrm{prob}(Y = y) \\
&= \sum_{x,y} x \cdot y \cdot \mathrm{prob}(X = x \wedge Y = y) \\
&= \sum_z \ \sum_{x,y \text{ with } z = x \cdot y} z \cdot \mathrm{prob}(X = x \wedge Y = y) \\
&= \sum_z z \cdot \sum_{x,y \text{ with } z = x \cdot y} \mathrm{prob}(X = x \wedge Y = y) \\
&= \sum_z z \cdot \mathrm{prob}(X \cdot Y = z) \\
&= E[X \cdot Y].
\end{aligned}$$
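The two-dice experiment again provides a concrete check. The sketch below (illustrative only) confirms the product rule for the independent pair X, Y and shows that it fails for the dependent pair X and S = X + Y.

```python
from fractions import Fraction
from itertools import product

space = list(product(range(1, 7), repeat=2))
p = Fraction(1, len(space))

def expectation(Z):
    return sum(p * Z(s) for s in space)

X = lambda s: s[0]
Y = lambda s: s[1]
S = lambda s: s[0] + s[1]

# Independent random variables: E[X * Y] equals E[X] * E[Y].
print(expectation(lambda s: X(s) * Y(s)), expectation(X) * expectation(Y))  # 49/4 and 49/4

# Dependent random variables: X and S violate the product rule.
print(expectation(lambda s: X(s) * S(s)), expectation(X) * expectation(S))  # 329/12 versus 49/2
```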

 

 

 

How likely is it that a random variable will deviate substantially from its expected value? Markov’s inequality gives a useful bound. Let X be a nonnegative random

variable and let c be any constant. Then

$$\mathrm{prob}(X \ge c \cdot E[X]) \le \frac{1}{c}. \qquad (A.4)$$

The proof is simple. We have

$$\begin{aligned}
E[X] &= \sum_{z \in \mathbb{R}} z \cdot \mathrm{prob}(X = z) \\
&\ge \sum_{z \ge c \cdot E[X]} z \cdot \mathrm{prob}(X = z) \\
&\ge c \cdot E[X] \cdot \mathrm{prob}(X \ge c \cdot E[X]),
\end{aligned}$$


where the first inequality follows from the fact that we sum over a subset of the possible values and X is nonnegative, and the second inequality follows from the fact that the sum in the second line ranges only over z with z ≥ c · E[X].
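As a small numerical illustration (the choice of random variable and of c = 3/2 is ours, not from the text), the sketch below evaluates both sides of Markov's inequality for the sum of two dice.

```python
from fractions import Fraction
from itertools import product

space = list(product(range(1, 7), repeat=2))
values = [x + y for (x, y) in space]    # the nonnegative random variable X = sum of two dice
E = Fraction(sum(values), len(values))  # E[X] = 7

c = Fraction(3, 2)
lhs = Fraction(sum(1 for v in values if v >= c * E), len(values))  # prob(X >= c * E[X])

print(lhs, "<=", 1 / c)  # 1/12 <= 2/3
```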

Much tighter bounds are possible for some special cases of random variables. The following situation arises several times in this book. We have a sum X = X1 +

··· + Xn of n independent indicator random variables X1,. . . , Xn and want to bound the probability that X deviates substantially from its expected value. In this situation,

the following variant of the Chernoff bound is useful. For any ε > 0, we have

 

$$\mathrm{prob}(X < (1 - \varepsilon)E[X]) \le e^{-\varepsilon^2 E[X]/2}, \qquad (A.5)$$
$$\mathrm{prob}(X > (1 + \varepsilon)E[X]) \le \left(\frac{e^{\varepsilon}}{(1 + \varepsilon)^{(1 + \varepsilon)}}\right)^{E[X]}. \qquad (A.6)$$

A bound of the form above is called a tail bound because it estimates the “tail” of the probability distribution, i.e., the part for which X deviates considerably from its expected value.

Let us see an example. If we throw n coins and let Xi be the indicator variable

for the i-th coin coming up heads, X = X1 + ··· + Xn is the total number of heads. Clearly, E[X] = n/2. The bound above tells us that prob(X ≤ (1 − ε)n/2) ≤ e^{−ε²n/4}. In particular, for ε = 0.1, we have prob(X ≤ 0.9 · n/2) ≤ e^{−0.01·n/4}. So, for n = 10 000, the expected number of heads is 5 000 and the probability that the sum is less than 4 500 is smaller than e^{−25}, a very small number.
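The sketch below (illustrative only) evaluates the right-hand side of (A.5) for this coin example and compares it with the exact binomial tail probability, which is smaller still.

```python
import math

n, eps = 10_000, 0.1
mean = n / 2  # E[X] for n fair coins

# Chernoff bound (A.5): prob(X < (1 - eps) * E[X]) <= exp(-eps**2 * E[X] / 2)
bound = math.exp(-(eps ** 2) * mean / 2)
print(bound)  # exp(-25), about 1.4e-11

# Exact tail probability prob(X < 4500) for X ~ Binomial(n, 1/2), using exact integers (may take a moment).
tail = sum(math.comb(n, k) for k in range(int((1 - eps) * mean))) / 2 ** n
print(tail)  # on the order of 1e-23, far below the bound
```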

Exercise A.3. Estimate the probability that X in the above example is larger than 5 050.

If the indicator random variables are independent and identically distributed with prob(Xi = 1) = p, X is binomially distributed, i.e.,

$$\mathrm{prob}(X = k) = \binom{n}{k} p^k (1 - p)^{n-k}. \qquad (A.7)$$
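Equation (A.7) translates directly into code; the sketch below (illustrative only; n = 4 and p = 1/2 are example values) reproduces the familiar probabilities 1/16, 1/4, 3/8, 1/4, 1/16.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(n, k, p):
    # prob(X = k) for X = X_1 + ... + X_n with independent indicators, prob(X_i = 1) = p; see (A.7).
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

print([binomial_pmf(4, k, Fraction(1, 2)) for k in range(5)])
# [Fraction(1, 16), Fraction(1, 4), Fraction(3, 8), Fraction(1, 4), Fraction(1, 16)]
```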

 

Exercise A.4 (balls and bins continued). Let, as above, I denote the number of balls in bin 1. Show that

$$\mathrm{prob}(I = k) = \binom{n}{k} \left(\frac{1}{m}\right)^k \left(1 - \frac{1}{m}\right)^{n-k},$$

and then attempt to compute E[I] as ∑_k prob(I = k) · k.

A.4 Useful Formulae

We shall first list some useful formulae and then prove some of them.


A simple approximation to the factorial:

$$\left(\frac{n}{e}\right)^n \le n! \le n^n. \qquad (A.8)$$

Stirling's approximation to the factorial:

$$n! = \left(1 + O\!\left(\frac{1}{n}\right)\right) \sqrt{2\pi n}\,\left(\frac{n}{e}\right)^n. \qquad (A.9)$$

An approximation to the binomial coefficients:

$$\binom{n}{k} \le \left(\frac{n \cdot e}{k}\right)^k. \qquad (A.10)$$

The sum of the first n integers:

$$\sum_{i=1}^{n} i = \frac{n(n+1)}{2}. \qquad (A.11)$$

The harmonic numbers:

$$\ln n \le H_n = \sum_{i=1}^{n} \frac{1}{i} \le \ln n + 1. \qquad (A.12)$$

The geometric series:

$$\sum_{i=0}^{n-1} q^i = \frac{1 - q^n}{1 - q} \ \text{ for } q \ne 1 \qquad\text{and}\qquad \sum_{i \ge 0} q^i = \frac{1}{1 - q} \ \text{ for } 0 \le q < 1. \qquad (A.13)$$

$$\sum_{i \ge 0} 2^{-i} = 2 \qquad\text{and}\qquad \sum_{i \ge 1} i \cdot 2^{-i} = \sum_{i \ge 0} i \cdot 2^{-i} = 2. \qquad (A.14)$$

Jensen's inequality:

$$\sum_{i=1}^{n} f(x_i) \le n \cdot f\!\left(\frac{\sum_{i=1}^{n} x_i}{n}\right) \qquad (A.15)$$

for any concave function f. Similarly, for any convex function f,

$$\sum_{i=1}^{n} f(x_i) \ge n \cdot f\!\left(\frac{\sum_{i=1}^{n} x_i}{n}\right). \qquad (A.16)$$

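The following small Python sketch (a plausibility check only, not a proof; the values n = 20, k = 5, q = 1/2 are arbitrary) spot-checks several of these formulae numerically.

```python
import math

n, k, q = 20, 5, 0.5

# (A.8): (n/e)^n <= n! <= n^n
assert (n / math.e) ** n <= math.factorial(n) <= n ** n

# (A.10): binomial coefficient bound
assert math.comb(n, k) <= (n * math.e / k) ** k

# (A.11): sum of the first n integers
assert sum(range(1, n + 1)) == n * (n + 1) // 2

# (A.12): ln n <= H_n <= ln n + 1
H_n = sum(1 / i for i in range(1, n + 1))
assert math.log(n) <= H_n <= math.log(n) + 1

# (A.13): finite geometric series
assert abs(sum(q ** i for i in range(n)) - (1 - q ** n) / (1 - q)) < 1e-12

# (A.15): Jensen's inequality for the concave function ln
xs = [1.0, 2.0, 7.0, 10.0]
assert sum(math.log(x) for x in xs) <= len(xs) * math.log(sum(xs) / len(xs))

print("all checks passed")
```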

A.4.1 Proofs

For (A.8), we first observe that n! = n(n − 1)···1 ≤ n^n. Also, for all i ≥ 2, ln i ≥ ∫_{i−1}^{i} ln x dx, and therefore

$$\ln n! = \sum_{2 \le i \le n} \ln i \ \ge\ \int_{1}^{n} \ln x \, dx \ =\ \bigl[x(\ln x - 1)\bigr]_{x=1}^{x=n} \ \ge\ n(\ln n - 1).$$

Thus

$$n! \ge e^{n(\ln n - 1)} = \bigl(e^{\ln n - 1}\bigr)^n = \left(\frac{n}{e}\right)^n.$$

Equation (A.10) follows almost immediately from (A.8). We have

$$\binom{n}{k} = \frac{n(n-1)\cdots(n-k+1)}{k!} \le \frac{n^k}{(k/e)^k} = \left(\frac{n \cdot e}{k}\right)^k.$$

Equation (A.11) can be computed by a simple trick:

$$\begin{aligned}
1 + 2 + \ldots + n &= \tfrac{1}{2}\bigl((1 + 2 + \ldots + (n-1) + n) + (n + (n-1) + \ldots + 2 + 1)\bigr) \\
&= \tfrac{1}{2}\bigl((1 + n) + (2 + (n-1)) + \ldots + ((n-1) + 2) + (n + 1)\bigr) \\
&= \frac{n(n+1)}{2}.
\end{aligned}$$

The sums of higher powers are estimated easily; exact summation formulae are also available. For example, ∫_{i−1}^{i} x² dx ≤ i² ≤ ∫_{i}^{i+1} x² dx, and hence

$$\sum_{1 \le i \le n} i^2 \le \int_{1}^{n+1} x^2 \, dx = \left[\frac{x^3}{3}\right]_{x=1}^{x=n+1} = \frac{(n+1)^3 - 1}{3}$$

and

$$\sum_{1 \le i \le n} i^2 \ge \int_{0}^{n} x^2 \, dx = \left[\frac{x^3}{3}\right]_{x=0}^{x=n} = \frac{n^3}{3}.$$
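These integral bounds are also easy to check numerically, for example with the short sketch below (n = 100 is an arbitrary choice).

```python
n = 100
s = sum(i * i for i in range(1, n + 1))           # exact sum of the first n squares
print(n ** 3 / 3 <= s <= ((n + 1) ** 3 - 1) / 3)  # True: both integral bounds hold
```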

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

For (A.12), we also use estimation by integral. We have ∫_{i}^{i+1} (1/x) dx ≤ 1/i ≤ ∫_{i−1}^{i} (1/x) dx, and hence

$$\ln n = \int_{1}^{n} \frac{1}{x} \, dx \ \le\ \sum_{1 \le i \le n} \frac{1}{i} \ \le\ 1 + \int_{1}^{n} \frac{1}{x} \, dx = 1 + \ln n.$$

Equation (A.13) follows from

$$(1 - q) \cdot \sum_{0 \le i \le n-1} q^i = \sum_{0 \le i \le n-1} q^i - \sum_{1 \le i \le n} q^i = 1 - q^n.$$


Letting n pass to infinity yields ∑_{i≥0} q^i = 1/(1 − q) for 0 ≤ q < 1. For q = 1/2, we obtain ∑_{i≥0} 2^{−i} = 2. Also,

$$\begin{aligned}
\sum_{i \ge 1} i \cdot 2^{-i} &= \sum_{i \ge 1} 2^{-i} + \sum_{i \ge 2} 2^{-i} + \sum_{i \ge 3} 2^{-i} + \ldots \\
&= \Bigl(1 + \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \ldots\Bigr) \cdot \sum_{i \ge 1} 2^{-i} \\
&= 2 \cdot 1 = 2.
\end{aligned}$$

For the first equality, observe that the term 2^{−i} occurs in exactly the first i sums of the right-hand side.

Equation (A.15) can be shown by induction on n. For n = 1, there is nothing

to show. So assume n ≥ 2. Let x = ∑_{1≤i≤n} x_i / n and x̄ = ∑_{1≤i≤n−1} x_i / (n − 1). Then x = ((n − 1)x̄ + x_n)/n, and hence

$$\begin{aligned}
\sum_{1 \le i \le n} f(x_i) &= f(x_n) + \sum_{1 \le i \le n-1} f(x_i) \\
&\le f(x_n) + (n - 1) \cdot f(\bar{x}) \\
&= n \cdot \Bigl(\frac{1}{n} \cdot f(x_n) + \frac{n-1}{n} \cdot f(\bar{x})\Bigr) \\
&\le n \cdot f\Bigl(\frac{1}{n} \cdot x_n + \frac{n-1}{n} \cdot \bar{x}\Bigr) \\
&= n \cdot f(x),
\end{aligned}$$

where the first inequality uses the induction hypothesis and the second inequality uses the definition of concavity with x = x_n, y = x̄, and t = 1/n. The extension to convex functions is immediate, since convexity of f implies concavity of −f.
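A quick numerical illustration of (A.15) and (A.16) is given below; the data and the choices f = ln (concave) and f(x) = x² (convex) are example values, not taken from the text.

```python
import math

xs = [0.5, 1.0, 3.0, 4.5, 9.0]
n = len(xs)
mean = sum(xs) / n

# (A.15) with the concave function f = ln
assert sum(math.log(x) for x in xs) <= n * math.log(mean)

# (A.16) with the convex function f(x) = x**2
assert sum(x * x for x in xs) >= n * mean * mean

print("Jensen checks passed")
```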
