

CHAPTER 5. TREES

Base case.

inorder (reflect BinLeaf)
  = inorder BinLeaf            { reflect.1 }
  = []                         { inorder.1 }
  = reverse []                 { reverse.1 }
  = reverse (inorder BinLeaf)  { inorder.1 }

Induction case. Let t1, t2 :: BinTree a be arbitrary trees, and let x :: a be an arbitrary data value. Assume the inductive hypotheses:

inorder (reflect t1) = reverse (inorder t1)
inorder (reflect t2) = reverse (inorder t2)

Then

inorder (reflect (BinNode x t1 t2))
  = inorder (BinNode x (reflect t2) (reflect t1))        { reflect.2 }
  = inorder (reflect t2) ++ [x] ++ inorder (reflect t1)  { inorder.2 }
  = reverse (inorder t2) ++ [x] ++ reverse (inorder t1)  { hypotheses }
  = reverse ([x] ++ inorder t2) ++ reverse (inorder t1)  { reverse lemma 1 }
  = reverse (inorder t1 ++ [x] ++ inorder t2)            { reverse lemma 2 }
  = reverse (inorder (BinNode x t1 t2))                  { inorder.2 }

With the base case and induction cases proved, the theorem holds by tree induction.
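The definitions this proof relies on can be collected into a small self-contained sketch and the theorem checked on a concrete case (the example tree t below is ours, not from the text):

```haskell
data BinTree a = BinLeaf | BinNode a (BinTree a) (BinTree a)

inorder :: BinTree a -> [a]
inorder BinLeaf = []                                             -- inorder.1
inorder (BinNode x t1 t2) = inorder t1 ++ [x] ++ inorder t2      -- inorder.2

reflect :: BinTree a -> BinTree a
reflect BinLeaf = BinLeaf                                        -- reflect.1
reflect (BinNode x t1 t2) = BinNode x (reflect t2) (reflect t1)  -- reflect.2

-- a small example tree:     2
--                          / \
--                         1   3
t :: BinTree Int
t = BinNode 2 (BinNode 1 BinLeaf BinLeaf) (BinNode 3 BinLeaf BinLeaf)
```

Evaluating inorder (reflect t) and reverse (inorder t) both yield [3,2,1], as the theorem predicts.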

5.4.3 The Height of a Balanced Tree

If a binary tree is balanced then its shape is determined, and the number of nodes is determined by the height. The following theorem states this relationship precisely.

Theorem 31. Let h = height t. If balanced t, then size t = 2^h - 1.

Proof. The proposition we want to prove is balanced t → size t = 2^h - 1. The proof is an induction over the tree structure. For the base case, we need to prove that the theorem holds for a leaf.

balanced BinLeaf = True
h = height BinLeaf = 0
size BinLeaf = 0
2^h - 1 = 2^0 - 1 = 0


For the inductive case, let t = BinNode x l r, and let hl = height l and hr = height r. Assume P(l) and P(r); the aim is to prove P(t). There are two cases to consider. If t is not balanced, then the implication balanced t → P(t) is vacuously true. If t is balanced, however, then the implication is true if and only if P(t) is true. Therefore we need to prove P(t) given the following three assumptions: (1) P(l) (inductive hypothesis), (2) P(r) (inductive hypothesis), and (3) balanced t (premise of the implication to be proved).

h = height (BinNode x l r)
  = 1 + max (height l) (height r)    { height.2 }
  = 1 + height l                     { assumption: balanced t, so hl = hr }
  = 1 + hl                           { def hl }

size t
  = size (BinNode x l r)             { def t }
  = 1 + size l + size r              { size.2 }
  = 1 + (2^hl - 1) + (2^hr - 1)      { hypotheses }
  = 2^hl + 2^hr - 1                  { arithmetic }
  = 2^hl + 2^hl - 1                  { hl = hr }
  = 2^(hl + 1) - 1                   { algebra }
  = 2^h - 1                          { def h }

The base and inductive cases have been proved, so the theorem holds by the principle of tree induction.
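Theorem 31 can be spot-checked with a short standalone sketch. The helper full, which builds a fully balanced tree of a given height, is a hypothetical name of ours:

```haskell
data BinTree a = BinLeaf | BinNode a (BinTree a) (BinTree a)

size, height :: BinTree a -> Int
size BinLeaf = 0
size (BinNode _ l r) = 1 + size l + size r
height BinLeaf = 0
height (BinNode _ l r) = 1 + max (height l) (height r)

-- a tree is balanced when every node has subtrees of equal height
balanced :: BinTree a -> Bool
balanced BinLeaf = True
balanced (BinNode _ l r) =
  balanced l && balanced r && height l == height r

-- full h: a fully balanced tree of height h (hypothetical helper)
full :: Int -> BinTree Int
full 0 = BinLeaf
full h = BinNode h (full (h - 1)) (full (h - 1))
```

For example, full 3 is balanced, has height 3, and has size 7 = 2^3 - 1.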

5.4.4 Length of a Flattened Tree

The following theorem says that if you flatten a tree, then the length of the resulting list is the same as the number of nodes in the tree.

Theorem 32. Let t :: BinTree a be any finite binary tree. Then length (inorder t) = size t.

Proof. The proof is a tree induction over t.

Base case.

length (inorder BinLeaf)
  = length []       { inorder.1 }
  = 0               { length.1 }
  = size BinLeaf    { size.1 }

Induction case. Assume the induction hypotheses:

length (inorder t1) = size t1
length (inorder t2) = size t2

Then


length (inorder (BinNode x t1 t2))
  = length (inorder t1 ++ [x] ++ inorder t2)                  { inorder.2 }
  = length (inorder t1) + length [x] + length (inorder t2)    { length over (++) }
  = size t1 + 1 + size t2                                     { hypotheses }
  = size (BinNode x t1 t2)                                    { size.2 }

Therefore the theorem holds by tree induction.
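Theorem 32 can likewise be checked on a concrete tree (the example tree t is ours):

```haskell
data BinTree a = BinLeaf | BinNode a (BinTree a) (BinTree a)

inorder :: BinTree a -> [a]
inorder BinLeaf = []
inorder (BinNode x t1 t2) = inorder t1 ++ [x] ++ inorder t2

size :: BinTree a -> Int
size BinLeaf = 0
size (BinNode _ l r) = 1 + size l + size r

t :: BinTree Int
t = BinNode 2 (BinNode 1 BinLeaf BinLeaf) (BinNode 3 BinLeaf BinLeaf)
```

Here length (inorder t) and size t are both 3.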

5.5 Improving Execution Time

A function definition consists of a type, along with a set of equations. These equations serve two purposes:

They give a mathematical specification of required properties. Since they are mathematical equations (not assignment statements), they can be used for formal proofs using equational reasoning.

They serve as an executable computer program. The computer executes the program by a sequence of substitutions and simplifications. In effect, the Haskell compiler translates a set of equations into a machine language program that simplifies expressions using automated equational reasoning.

In general, inductive equations specifying properties of a function can be successfully interpreted to compute the value of the function, given a particular argument, provided the equations have three essential characteristics. The equations must be (1) consistent with properties of the function being defined, (2) cover all relevant cases, and (3) supply simpler arguments to invocations of the function that appear on the right-hand side than were supplied on the left-hand side. What we mean by “simpler” is not easy to say in fully general terms. However, it amounts to making sure the amount of computation involved in each invocation of the function on the right-hand side of the equation is significantly smaller than the amount of computation required by the invocation on the left-hand side of the equation.

Consider, for example, the equations that define inorder:

inorder :: BinTree a -> [a]
inorder BinLeaf = []
inorder (BinNode x t1 t2) = inorder t1 ++ [x] ++ inorder t2

The trees supplied as arguments in the invocations of inorder on the right-hand side of equation {inorder.2} have heights that are at least one smaller than the height of the tree supplied as an argument on the left-hand side. Because the height of a tree must be at least zero, and the height is reduced by at least one in each level of recursion, it eventually gets down to zero. Therefore the application inorder t will terminate, provided that the tree is finite.


It makes sense to ask questions about the resources required for the specified computation. For example, we might ask how much time it would take to compute inorder t for some tree t.

It is a delightful fact that we can derive equations to answer this question about time directly from the equations specifying properties of the function; that is, the equations that our computational interpretation uses to carry out the work. Let time e denote the number of steps in the computation represented by the formula e. Our aim is to learn something about time (inorder t) for an arbitrary finite tree t. We will just estimate the time by counting the number of basic operations, but we will not seek an exact and precise analysis.

Assume that time (inorder BinLeaf) is zero. This is a harmless simplifying assumption; if the argument to inorder is a leaf, the machine will take a small amount of time to notice that fact and return. In practice, using a modern Haskell compiler, only a few machine instructions are needed. Since the time is so small, we ignore it.

Executing the second equation of inorder requires two recursive applications and a concatenation (++). Some time is also needed to set up the equation (noticing that the argument is a node, doing the recursive calls, and returning); this requires time 1 (where the exact measurement unit is unspecified; it depends on how efficient the compiler is, how fast your computer is, etc.).

A minor optimisation to the function is to replace [x] ++ inorder t2 by x : inorder t2. This is just a single equational reasoning step, and the compiler might even do this for us automatically. The time for the (:) operation is small (just a few machine instructions) and we will include that time in the one unit for the equation.
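The optimised version described above can be sketched as a standalone definition (the example tree t is ours, for checking only):

```haskell
data BinTree a = BinLeaf | BinNode a (BinTree a) (BinTree a)

inorder :: BinTree a -> [a]
inorder BinLeaf = []
-- [x] ++ inorder t2 has been replaced by x : inorder t2
inorder (BinNode x t1 t2) = inorder t1 ++ (x : inorder t2)

t :: BinTree Int
t = BinNode 2 (BinNode 1 BinLeaf BinLeaf) (BinNode 3 BinLeaf BinLeaf)
```

The result is unchanged; only the constant cost of building the singleton list [x] is saved.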

Before continuing, we need to know how many steps are involved in concatenation operations. This can be worked out from the equations in the definition of concatenation, using an analysis method like the one we are now using to work out the timing for inorder. For present purposes, we are going to skip

that analysis and just state the result, which is this:

 

time (xs ++ ys) = length xs    { time (++) }

That is, the time to perform a concatenation is proportional to the length of the first argument, and completely independent of the second. To understand why, consider that the equations defining (++) never look inside ys, but they perform a linear traversal over xs.

Armed with this information, we can continue with the time analysis of inorder t, for an arbitrary tree t.

time (inorder BinLeaf) = 0

time (inorder (BinNode x t1 t2))
  = 1 + time (inorder t1 ++ [x] ++ inorder t2)                       { inorder.2 }
  = 1 + time (inorder t1) + time (inorder t2) + length (inorder t1)  { time (++) }
  = 1 + time (inorder t1) + time (inorder t2) + size t1              { Theorem 32 }

This result is not a simple formula that gives the execution time directly. Instead, it is a set of recursive equations. Such equations are commonly obtained in the performance analysis of algorithms, and they are known as recurrence equations. A recurrence equation is simply an inductive equation in which the values being equated are numbers.

There are many mathematical tricks that make it possible to solve various kinds of recurrence equations. To solve recurrence equations is to reduce them to equations that do not involve a recurrence; that is, equations in which none of the terms on the right-hand side refer to the function being defined on the left-hand side. Books on the analysis of algorithms often follow a systematic procedure: first, the algorithm is studied in order to derive a set of recurrence equations that describe its performance, and then a variety of mathematical techniques are used to solve the recurrence equation.

The study of solution methods for recurrence equations is a big topic, and we are not going to delve into it here. Instead, we will glean information from recurrence equations in ways that depend on circumstances. That is, we will rely on ad hoc analysis to derive information from recurrence equations.

In this case, it is the (size t1) term on the right-hand side of the recurrence equation that we want to focus on. Suppose that the tree is badly unbalanced in the sense that its left subtree contains all the nodes and its right subtree is empty. Furthermore, suppose this badly unbalanced condition persists all the way down to the leaves. That is, all the right subtrees are empty. In this case, the recurrence equations for time, height, and size specialize to the following form.

time (inorder (BinNode x t1 BinLeaf))
  = 1 + time (inorder t1) + time (inorder BinLeaf) + size t1
  = 1 + time (inorder t1) + 0 + size t1
  = 1 + time (inorder t1) + size t1

height (BinNode x t1 BinLeaf)
  = 1 + max (height t1) (height BinLeaf)    { height.2 }
  = 1 + max (height t1) 0                   { height.1 }
  = 1 + height t1                           { heights are non-negative }

size (BinNode x t1 BinLeaf)
  = 1 + size t1 + size BinLeaf    { size.2 }
  = 1 + size t1                   { size.1 }

The equations for the height and size are identical, so we can conclude that, in the case where all the right-hand subtrees are empty, the height of a tree is the same as the number of nodes in the tree. Therefore, the number of recurrence steps for time needed to reach the empty tree case is just the number of nodes in the tree being flattened. At each deeper level, the size term


is one less than it was at the previous level. So, working down to the empty tree level amounts to adding all the integers starting with n = size t1 and ending with zero, plus the number of levels (since the number 1 is added at each level). Thus the result is

n + (n - 1) + ... + 0.

As we proved already in Chapter 4, the sum 0 + 1 + ... + n = n(n+1)/2. From these observations, we deduce the following formula for the time required to flatten a tree in which all the right-hand subtrees are empty:

time (inorder (BinNode x t1 BinLeaf)) = n(n+1)/2 + n, where n = size t1

This result is not good. It means that the number of steps needed to flatten a tree is proportional to the square of the number of nodes in the tree. That is too much time. One might hope that the number of computational steps needed to flatten a tree would be proportional to the number of nodes in the tree. Of course, this is a very special case, because all of the right-hand subtrees are empty. But, the formulas suggest that whenever the tree tends to have most of its nodes in its left subtrees instead of its right subtrees, flattening is going to take a long time. In the next section, we will consider another set of inductive equations for the flattening function that lead to a flattening time proportional to the number of nodes in the tree.
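The recurrence can also be executed directly. In the sketch below, timeInorder and leftSpine are hypothetical names of ours: the first implements the recurrence derived above, and the second builds the degenerate all-left tree under discussion.

```haskell
data BinTree a = BinLeaf | BinNode a (BinTree a) (BinTree a)

size :: BinTree a -> Int
size BinLeaf = 0
size (BinNode _ l r) = 1 + size l + size r

-- timeInorder implements the recurrence derived above:
--   time (inorder BinLeaf)           = 0
--   time (inorder (BinNode x t1 t2)) = 1 + time t1 + time t2 + size t1
timeInorder :: BinTree a -> Int
timeInorder BinLeaf = 0
timeInorder (BinNode _ t1 t2) =
  1 + timeInorder t1 + timeInorder t2 + size t1

-- leftSpine n: a degenerate tree with n nodes, all right subtrees empty
leftSpine :: Int -> BinTree Int
leftSpine 0 = BinLeaf
leftSpine n = BinNode n (leftSpine (n - 1)) BinLeaf
```

For a spine of 10 nodes the count is 55, and for 100 nodes it is 5050: growth proportional to the square of the number of nodes, as the analysis predicts.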

5.6 Flattening Trees in Linear Time

The reason that the inorder function is so slow is that it recopies lists repeatedly as it concatenates the partial results together. The trick for reducing the time required to flatten a tree is to accumulate the result-list in a collection of partial computations that permit pasting the results together directly, avoiding the expensive concatenations.

Without knowing the exact definition of the improved function, we can still write some equations that express properties it should have. The unknown function—call it g—will be similar to inorder, except it will take an extra argument ks of data values to be concatenated to the end of its result. That is, g will not simply return its result; it will return the concatenation of its result to a further list provided by some other source. This extra list is called the continuation.

g :: BinTree a -> [a] -> [a]
g BinLeaf ks = ks
g (BinNode x t1 t2) ks = g t1 (x : g t2 ks)

These equations surely do not express the first properties of a flattening function that a person would think of. However, they do express properties


that a person would expect a flattening function to have, and they avoid use of concatenation. Moreover, they have the three characteristics required to turn inductive equations into full specifications for a function (consistency, coverage of cases, and reduced computation on right-hand sides).

The question is, how long would it take a computing system interpreting these equations to produce a flattened version of a tree? We can use the equations defining g as a starting point to derive its execution time, just as we did for inorder in the preceding section.

Our hope is that the number of computation steps required to flatten a tree is on the order of the number of nodes in the tree. It turns out we can verify this conjecture using the principle of induction for trees. To be precise, we are trying to prove the validity of the following equation:

time (g t ks) = size t

Now that we have guessed the equations for g, we need to verify that it actually works correctly. We conjecture the following theorem.

Theorem 33. Let t :: BinTree a be an arbitrary finite tree. Then g t ks = inorder t ++ ks.

Proof. Base case.

g BinLeaf ks
  = ks                       { g.1 }
  = [] ++ ks                 { (++).1 }
  = inorder BinLeaf ++ ks    { inorder.1 }

Induction case. Assume that

g t1 ks1 = inorder t1 ++ ks1
g t2 ks2 = inorder t2 ++ ks2

Then

g (BinNode x t1 t2) ks
  = g t1 (x : g t2 ks)                          { g.2 }
  = g t1 (x : (inorder t2 ++ ks))               { hypothesis 2 }
  = g t1 ([x] ++ inorder t2 ++ ks)              { (++).2 }
  = inorder t1 ++ ([x] ++ inorder t2 ++ ks)     { hypothesis 1 }
  = (inorder t1 ++ [x] ++ inorder t2) ++ ks     { (++) associative }
  = inorder (BinNode x t1 t2) ++ ks             { inorder.2 }

By tree induction, the theorem holds.
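Theorem 33 is easy to check on a small example (the tree t and the continuation [9] below are ours):

```haskell
data BinTree a = BinLeaf | BinNode a (BinTree a) (BinTree a)

inorder :: BinTree a -> [a]
inorder BinLeaf = []
inorder (BinNode x t1 t2) = inorder t1 ++ [x] ++ inorder t2

-- the continuation-based flattening function from the text
g :: BinTree a -> [a] -> [a]
g BinLeaf ks = ks
g (BinNode x t1 t2) ks = g t1 (x : g t2 ks)

t :: BinTree Int
t = BinNode 2 (BinNode 1 BinLeaf BinLeaf) (BinNode 3 BinLeaf BinLeaf)
```

Here g t [9] and inorder t ++ [9] both evaluate to [1,2,3,9].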


To summarise the situation: we haven’t done a conventional optimisation of an inefficient program. Instead, we have conjectured that there should exist an efficient program, and we have guessed the form it should have. Finally, we proved that the more efficient program does indeed compute the correct answer.

Functions like g, which use a continuation, are common in practical applications. However, g is not exactly equivalent to inorder: its type is different! If this is a concern, we can always define a new function that hides the continuation:

inorderEfficient :: BinTree a -> [a]
inorderEfficient t = g t []

Now that the new function has been shown to be correct, we should also try to verify our guess that it is efficient. Our aim now is to prove the following.

Theorem 34. Let t :: BinTree a be an arbitrary finite tree. Then time (g t ks) = size t.

Proof. Induction over the tree.

Base case.

time (g BinLeaf ks)
  = 0               { a reasonable assumption }
  = size BinLeaf    { size.1 }

Inductive case. Assume

time (g t1 ks1) = size t1
time (g t2 ks2) = size t2

According to the assumption, time (g t1 ks1) = time (g t1 []); that is, the time depends only on the tree argument, not on the continuation. (This point is crucial, and is the fundamental reason why this algorithm is efficient.) Then

time (g (BinNode x t1 t2) ks)
  = time (g t1 (x : g t2 ks))              { g.2 }
  = time (g t1 []) + 1 + time (g t2 [])    { time is independent of ks }
  = size t1 + 1 + size t2                  { hypotheses }
  = size (BinNode x t1 t2)                 { size.2 }

So the theorem holds by the principle of tree induction.

In summary, we have verified that g is mathematically equivalent to inorder: it computes the same result, given an empty continuation. Furthermore, it requires time proportional to the number of nodes in the tree. Thus g can be used as a faster replacement for inorder.
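Theorem 34 can also be checked computationally. Under the step-counting model used above (one unit per BinNode, zero for BinLeaf), a hypothetical timeG implements the recurrence, and its value coincides with size even on the degenerate all-left tree that defeated inorder:

```haskell
data BinTree a = BinLeaf | BinNode a (BinTree a) (BinTree a)

size :: BinTree a -> Int
size BinLeaf = 0
size (BinNode _ l r) = 1 + size l + size r

-- timeG implements the recurrence for g:
--   time (g BinLeaf ks)           = 0
--   time (g (BinNode x t1 t2) ks) = 1 + time (g t1 []) + time (g t2 [])
timeG :: BinTree a -> Int
timeG BinLeaf = 0
timeG (BinNode _ t1 t2) = 1 + timeG t1 + timeG t2

-- leftSpine n: a degenerate tree with n nodes, all right subtrees empty
leftSpine :: Int -> BinTree Int
leftSpine 0 = BinLeaf
leftSpine n = BinNode n (leftSpine (n - 1)) BinLeaf
```

A spine of 100 nodes takes 100 steps under this model, compared with 5050 for the concatenation-based inorder.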


Programmers who use equations to specify software, rather than imperative procedures, must pay considerable attention to the form of the properties they specify for the functions they want their software to compute. Different properties of the equations lead to different computations, some of which are more efficient than others. We have been mostly concerned with correctness properties of functions, rather than resource utilization, but the same framework for reasoning can be used to analyze both correctness and resources. Serious software designers have to pay attention to both aspects of the programs they write.

Exercise 12. Write appendTree, a function that takes a binary tree and a list, and appends the contents of the tree (traversed from left to right) to the front of the list. For example,

appendTree (BinNode 2 (BinNode 1 BinLeaf BinLeaf) (BinNode 3 BinLeaf BinLeaf)) [4,5]

evaluates to [1,2,3,4,5]. Try to find an efficient solution that minimises recopying.

Chapter 6

Propositional Logic

Logic provides a powerful tool for reasoning correctly about mathematics, algorithms, and computers. It is used extensively throughout computer science, and you need to understand its basic concepts in order to study many of the more advanced subjects in computing. Here are just a few examples, spanning the entire range of computing applications, from practical commercial software to esoteric theory:

In software engineering, it is good practice to specify what a system should do before starting to code it. Logic is frequently used for software specifications.

In safety-critical applications, it is essential to establish that a program is correct. Conventional debugging isn’t enough—what we want is a proof of correctness. Formal logic is the foundation of program correctness proofs.

In information retrieval, including Web search engines, logical propositions are used to specify the properties that should (or should not) be present in a piece of information in order for it to be considered relevant.

In artificial intelligence, formal logic is sometimes used to simulate intelligent thought processes. People don’t do their ordinary reasoning using mathematical logic, but logic is a convenient tool for implementing certain forms of reasoning.

In digital circuit design and computer architecture, logic is the language used to describe the signal values that are produced by components. A common problem is that a first-draft circuit design written by an engineer is too slow, so it has to be transformed into an equivalent circuit that is more efficient. This process is often quite tricky, and logic provides the framework for doing it.
