
Discrete math with computers_3


12.5. SEARCH TREES AND OCCURRENCE OF KEYS


encountered, and the search cannot continue. The maximum number of stages in this search, for all possible search paths in the tree, is known as the height of the tree. AVL trees maintain a balance among the heights of subtrees, at all levels, as nodes are inserted and deleted. By maintaining balance, the AVL tree preserves a high ratio between the number of folders it contains and the height of the tree.

In fact, the height of an AVL tree is approximately the base-2 logarithm of the number of folders it contains (in the worst case, about 1.44 times that logarithm). That means every search will terminate within a number of steps proportional to log₂ n, where n is the number of folders stored in the tree.

If you want to get a feeling for just how ingenious the AVL solution is, try to find a way to insert and delete folders into a tree that maintains order (left subtrees have folders with smaller identifying numbers, right subtrees larger) and balance (left subtrees have about the same number of nodes as right subtrees, at all levels). To match the effectiveness of AVL trees, your method will have to be able to insert or delete a folder in about log(n) steps, where n is the number of folders stored in the tree. After a few hours' work, you’ll see why we call this method the “AVL miracle”.

12.5 Search Trees and Occurrence of Keys

It’s a long road from here to the complete AVL solution. As usual, the road starts with formulas and equations that make the ideas amenable to mathematical reasoning. As a first step, we define a formal representation of a tree. The AVL method will be described in terms of a Haskell data type called SearchTree.

To avoid unnecessary details, the definition of the SearchTree represents folder information generically. The folder contains any type of data, and there is a different type of search tree for each possible type of folder. The identifying number for a folder is an integer. Each node in the tree is either a leaf node (constructed by Nub) or an interior node (constructed by Cel) containing an identifying number, a folder, and two subtrees.

data SearchTree d = Nub
                  | Cel Integer d (SearchTree d) (SearchTree d)

Figure 12.1 shows a formula for a search tree in which the folders are strings. The figure also displays a conventional diagram of the tree the formula represents. The formula is the formal representation of the tree, and the diagram is an informal presentation. This chapter will rely extensively on diagrams to illustrate ideas expressed formally in terms of formulas. To understand the chapter, you will need to develop an ability to convert between diagrams and formulas.
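To practice converting between formulas and diagrams, it may help to write the formula of Figure 12.1 as a Haskell value and print it. The following is only a sketch: the names figureTree and render are hypothetical helpers that do not appear in the text, and the sideways layout is just one convenient choice.

```haskell
-- Search tree type from the text: a leaf (Nub) or an interior node (Cel)
-- holding a key, a data item, and two subtrees.
data SearchTree d = Nub
                  | Cel Integer d (SearchTree d) (SearchTree d)

-- The tree of Figure 12.1, written as a formula.
figureTree :: SearchTree String
figureTree =
  Cel 5120 "PDA Cam"
    (Cel 1143 "Ink Jet" Nub Nub)
    (Cel 9605 "Palm Pilot" Nub Nub)

-- Hypothetical helper: one text line per interior node, right subtree first
-- and indented by depth, so the output reads as the diagram turned sideways.
render :: Show d => SearchTree d -> [String]
render = go 0
  where
    go :: Show d => Int -> SearchTree d -> [String]
    go _ Nub = []
    go depth (Cel k d lf rt) =
      go (depth + 1) rt
        ++ [replicate (4 * depth) ' ' ++ show k ++ " " ++ show d]
        ++ go (depth + 1) lf
```

Evaluating mapM_ putStrLn (render figureTree) in an interpreter prints the three nodes, with the root at the left margin and its two children indented one level.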

So far, most of the terminology has been cast in terms of the original motivating example of identifying numbers and folders of information. In the usual search tree terminology, the identifying number on which the search is based is called the key, and the information associated with the key is called, simply, the data. Most of the following discussion will describe trees in these more commonly used terms.

CHAPTER 12. THE AVL TREE MIRACLE

Cel 5120 "PDA Cam"
    (Cel 1143 "Ink Jet" Nub Nub)
    (Cel 9605 "Palm Pilot" Nub Nub)

              5120
            "PDA Cam"
           /         \
       1143           9605
     "Ink Jet"    "Palm Pilot"

Figure 12.1: Search Tree and Corresponding Diagram

Formal descriptions of properties of search trees and operations on them depend on subtrees, proper subtrees, concepts of equality between trees, and the occurrence of keys in trees. The following equations provide formal definitions of these predicates. Because the concepts are similar to the ideas of subset and being an element of a set, the usual symbols for those concepts are re-used here in this new context.

The definition of search-tree equality may seem strange because it ignores the data stored in the node. This is because the key is presumed to uniquely identify the data, so if two keys are the same, the data associated with them must be the same. There is no need to compare the data. This provides a subtle advantage: the data may be of a kind for which the equality operator is not defined. For example, it may be an aggregate including some functions, for which the equality operator cannot be defined.

tree equality

(==) :: SearchTree d → SearchTree d → Bool
Nub == Nub = True                                  {N == N}
(Cel k d lf rt) == Nub = False                     {C == N}
Nub == (Cel k d lf rt) = False                     {N == C}
(Cel x a xl xr) == (Cel y b yl yr)                 {C == C}
    = (x == y) ∧ (xl == yl) ∧ (xr == yr)
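The tree-equality equations can be transcribed into a Haskell Eq instance. The sketch below is one possible rendering, not necessarily the authors' code; the example trees t1 and t2 are hypothetical. Because the data items are never compared, the instance needs no Eq constraint on d, so the trees may even hold functions.

```haskell
data SearchTree d = Nub
                  | Cel Integer d (SearchTree d) (SearchTree d)

-- Equality compares shape and keys only; data items are never inspected,
-- so no (Eq d) constraint is required.
instance Eq (SearchTree d) where
  Nub           == Nub           = True   -- {N == N}
  Cel _ _ _ _   == Nub           = False  -- {C == N}
  Nub           == Cel _ _ _ _   = False  -- {N == C}
  Cel x _ xl xr == Cel y _ yl yr =        -- {C == C}
    x == y && xl == yl && xr == yr

-- Hypothetical example: the data items are functions, which have no Eq instance.
t1, t2 :: SearchTree (Integer -> Integer)
t1 = Cel 7 (+ 1)  (Cel 3 (* 2) Nub Nub) Nub
t2 = Cel 7 negate (Cel 3 abs   Nub Nub) Nub
```

Here t1 == t2 evaluates to True even though the stored functions differ, which is exactly the behavior the text describes: equal keys are presumed to identify equal data.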


subtree

(⊆) :: SearchTree d → SearchTree d → Bool
Nub ⊆ s = True                                     {N ⊆}
(Cel k d lf rt) ⊆ Nub = False                      {C ⊆ N}
(Cel x a xl xr) ⊆ (Cel y b yl yr)                  {C ⊆ C}
    = ((Cel x a xl xr) == (Cel y b yl yr))
      ∨ ((Cel x a xl xr) ⊆ yl) ∨ ((Cel x a xl xr) ⊆ yr)

 

proper subtree

(⊂) :: SearchTree d → SearchTree d → Bool
s ⊂ Nub = False                                    {⊂ N}
s ⊂ (Cel k d lf rt) = (s ⊆ lf) ∨ (s ⊆ rt)          {⊂ C}

 

key occurs in tree

(∈) :: Integer → SearchTree d → Bool
k ∈ Nub = False                                    {∈ N}
k ∈ (Cel x a xl xr) = (k == x) ∨ (k ∈ xl) ∨ (k ∈ xr)    {∈ C}
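The subtree, proper-subtree, and key-occurrence predicates can be written as ordinary Haskell functions. This is a sketch under the assumption that named functions (subtree, properSubtree, occurs, all hypothetical names) stand in for the set-like symbols used in the equations.

```haskell
data SearchTree d = Nub
                  | Cel Integer d (SearchTree d) (SearchTree d)

-- Key-only equality, as in the tree-equality equations.
instance Eq (SearchTree d) where
  Nub           == Nub           = True
  Cel x _ xl xr == Cel y _ yl yr = x == y && xl == yl && xr == yr
  _             == _             = False

-- subtree s t: s is t itself, or s occurs somewhere inside t.
subtree :: SearchTree d -> SearchTree d -> Bool
subtree Nub _   = True
subtree _   Nub = False
subtree s t@(Cel _ _ yl yr) = s == t || subtree s yl || subtree s yr

-- properSubtree s t: s is a subtree of one of the children of t.
properSubtree :: SearchTree d -> SearchTree d -> Bool
properSubtree _ Nub             = False
properSubtree s (Cel _ _ lf rt) = subtree s lf || subtree s rt

-- occurs k t: the key k appears in some Cel of t.
occurs :: Integer -> SearchTree d -> Bool
occurs _ Nub             = False
occurs k (Cel x _ xl xr) = k == x || occurs k xl || occurs k xr
```

Note that, by these definitions, Nub is a proper subtree of every Cel but not of Nub itself, and no tree is a proper subtree of itself.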

12.5.1 Ordered Search Trees and Tree Induction

A search tree is ordered if the key in each non-leaf node is greater than all the keys that occur in the left subtree of the node and is less than all the keys that occur in the right subtree. A leaf is ordered by default. That is, the predicate ordered, on the domain of discourse consisting of all search trees, satisfies the following equations.

ordered search tree

ordered (Nub) = True                               {ord N}
ordered (Cel k d lf rt)                            {ord C}
    = (∀x ∈ lf. x < k) ∧ (∀y ∈ rt. y > k)
      ∧ ordered (lf) ∧ ordered (rt)
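The ordered predicate can be checked mechanically. The following sketch assumes a hypothetical helper keys that lists every key in a tree, so that the two for-alls become calls to all. It is written for clarity rather than speed, since it rebuilds the key lists at every level.

```haskell
data SearchTree d = Nub
                  | Cel Integer d (SearchTree d) (SearchTree d)

-- Hypothetical helper: all keys that occur in a tree, left to right.
keys :: SearchTree d -> [Integer]
keys Nub             = []
keys (Cel k _ lf rt) = keys lf ++ [k] ++ keys rt

-- Direct transcription of {ord N} and {ord C}.
ordered :: SearchTree d -> Bool
ordered Nub = True
ordered (Cel k _ lf rt) =
  all (< k) (keys lf) && all (> k) (keys rt) && ordered lf && ordered rt

-- Example trees (hypothetical). In badTree every node is larger than its
-- left child and smaller than its right child, yet key 7 sits in the left
-- subtree of 5, so the tree is not ordered.
goodTree, badTree :: SearchTree String
goodTree = Cel 5 "e" (Cel 2 "b" Nub Nub) (Cel 8 "h" Nub Nub)
badTree  = Cel 5 "e" (Cel 2 "b" Nub (Cel 7 "g" Nub Nub)) Nub
```

The second example shows why the definition quantifies over all keys in a subtree rather than comparing a node only with its immediate children.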

From the definition of the predicate ordered, it’s not a big step to guess that an ordered search tree cannot contain duplicate keys. However, saying exactly what that means turns out to be tricky. One approach is to define a function that extracts from a search tree a sequence containing all the data elements in the tree that are associated with a given key, and then to prove that the sequence contains exactly one element if the key occurs in an ordered tree. If the key doesn’t occur in the tree, the sequence is empty.

sequence of matching keys

dataElems :: SearchTree d -> Integer -> [d]
dataElems Nub x = []                               {dataElems N}
dataElems (Cel k d lf rt) x                        {dataElems C}
  = if k == x
    then (dataElems lf x) ++ [d] ++ (dataElems rt x)
    else (dataElems lf x) ++ (dataElems rt x)

Theorem 81 (unique keys, part 1).

∀s. (ordered (s) ∧ k ∈ s) → (length (dataElems s k) == 1)

Theorem 82 (unique keys, part 2).

∀s. (ordered (s) ∧ k ∉ s) → ((dataElems s k) == [])
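Before proving the two theorems, it can be reassuring to test them on a few trees. The sketch below is a spot check, not a proof; uniqueKeysHold, keys, goodTree, and dupTree are hypothetical names introduced only for this illustration.

```haskell
data SearchTree d = Nub
                  | Cel Integer d (SearchTree d) (SearchTree d)

occurs :: Integer -> SearchTree d -> Bool
occurs _ Nub             = False
occurs k (Cel x _ xl xr) = k == x || occurs k xl || occurs k xr

keys :: SearchTree d -> [Integer]
keys Nub             = []
keys (Cel k _ lf rt) = keys lf ++ [k] ++ keys rt

ordered :: SearchTree d -> Bool
ordered Nub = True
ordered (Cel k _ lf rt) =
  all (< k) (keys lf) && all (> k) (keys rt) && ordered lf && ordered rt

dataElems :: SearchTree d -> Integer -> [d]
dataElems Nub _ = []
dataElems (Cel k d lf rt) x
  | k == x    = dataElems lf x ++ [d] ++ dataElems rt x
  | otherwise = dataElems lf x ++ dataElems rt x

-- True when Theorems 81 and 82 hold for this particular tree and key.
uniqueKeysHold :: SearchTree d -> Integer -> Bool
uniqueKeysHold s k
  | not (ordered s) = True                          -- hypotheses fail: nothing to check
  | occurs k s      = length (dataElems s k) == 1   -- Theorem 81
  | otherwise       = null (dataElems s k)          -- Theorem 82

goodTree, dupTree :: SearchTree String
goodTree = Cel 5 "e" (Cel 2 "b" Nub Nub) (Cel 8 "h" Nub Nub)
dupTree  = Cel 4 "a" (Cel 4 "b" Nub Nub) Nub        -- duplicate key, not ordered
```

For goodTree the check succeeds for every key tried, while dataElems dupTree 4 delivers two elements, which is consistent with the theorems because dupTree is not ordered.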

We will prove Theorem 82 first, and we will use a new form of induction that we will call tree induction.

Principle of Tree Induction.

(∀t. (∀s ⊂ t. P(s)) → P(t)) → (∀t. P(t))

Note: The domain of discourse of the for-alls is the set of all search trees. The principle of induction derives from basic elements of set theory, and all forms of inductive proof are equivalent when taken back to this basic framework. In practice, the form of induction used in a particular proof depends on the domain of discourse. Verifying the equivalence of various inductive forms to ordinary, mathematical induction on the natural numbers would require a major digression into details of set theory.

Using the principle of tree induction, we can prove that a predicate is true for every search tree if we can prove a certain implication. The implication is this: if the predicate is true for every proper subtree of a particular, chosen tree, then it must also be true for the chosen tree. The implication must be proved for an arbitrarily chosen tree, but once this implication is proved, the principle of tree induction delivers the conclusion that the predicate is true for every search tree.

The statement of the principle of tree induction is identical to the statement of strong induction for the domain of natural numbers, except that where the relation “less than” appears in the principle of strong induction, the relation “proper subtree” appears in the principle of tree induction. The two forms of induction share the implicit requirement that the predicate must be proved directly for the simplest element.

In the case of strong induction, the simplest element is zero. There are no natural numbers smaller than zero. Therefore, when the chosen element is zero, the universe of discourse for the for-all in the hypothesis of the implication (∀s < 0. P(s)) is empty. A for-all over an empty universe of discourse is true by default, so for the case when the chosen element is zero, the implication to be proved is (True → P(0)). The hypothesis in this implication can be of no help in proving its conclusion.


The same is true for tree induction. When the chosen element is a Nub, the universe of discourse for the hypothesis of the implication to be proved is empty. That is, we must prove ((∀s ⊂ Nub. P(s)) → P(Nub)), which is the same as (True → P(Nub)). The hypothesis of this implication (namely, “true”) cannot help in arriving at its conclusion (namely, P(Nub)). We will use tree induction to prove many properties of software that operates on search trees, and a few properties of the search trees themselves. For starters, we use tree induction to prove Theorem 82.
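The remark about the empty universe of discourse can be made concrete. In the sketch below, properSubtrees and inductionHypothesis are hypothetical helpers: the first lists every proper subtree of a tree (possibly with repeats), and the second evaluates the hypothesis of the induction step for a chosen tree and a Boolean-valued predicate.

```haskell
data SearchTree d = Nub
                  | Cel Integer d (SearchTree d) (SearchTree d)

-- Hypothetical helper: every proper subtree of a tree, as a list.
-- A Nub has none; a Cel has its two children and all of their proper subtrees.
properSubtrees :: SearchTree d -> [SearchTree d]
properSubtrees Nub             = []
properSubtrees (Cel _ _ lf rt) =
  lf : rt : properSubtrees lf ++ properSubtrees rt

-- Hypothesis of the induction step for the chosen tree t:
-- the predicate p holds for every proper subtree of t.
inductionHypothesis :: (SearchTree d -> Bool) -> SearchTree d -> Bool
inductionHypothesis p t = all p (properSubtrees t)
```

Because all p [] is True for any p, inductionHypothesis p Nub is True even for a predicate that is never satisfied, such as const False. That is the sense in which the hypothesis gives no help in the base case.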

Proof of Theorem 82 (unique keys, part 2):

∀s. (ordered (s) ∧ k ∉ s) → ((dataElems s k) == [])

Proof: by tree induction.

Base Case.

(ordered (Nub) ∧ k ∉ Nub) → ((dataElems Nub k) == [])
= {ord N, ∈ N}
(True ∧ True) → ((dataElems Nub k) == [])
= {∧ idem}
True → ((dataElems Nub k) == [])
= {dataElems N}
True → ([] == [])
= {True → True = True}
True

Inductive Case.

First, we work with just the hypothesis of the implication we’re trying to prove.

(ordered (Cel x a lf rt) ∧ k ∉ (Cel x a lf rt))
= {∈ C}
(ordered (Cel x a lf rt) ∧ ¬(x = k ∨ k ∈ lf ∨ k ∈ rt))
= {DM}
(ordered (Cel x a lf rt) ∧ (x ≠ k) ∧ (k ∉ lf) ∧ (k ∉ rt))

We are trying to prove that when the above formula is true, the formula in the conclusion of the theorem is also true. That is, we want to prove that the dataElems function delivers an empty sequence in this case.

dataElems (Cel x a lf rt) k
= {dataElems C, x ≠ k}
(dataElems lf k) ++ (dataElems rt k)
= {ord C, k ∉ lf, k ∉ rt, induction hypothesis, applied twice}
[] ++ []
= {++ []}
[]


The induction step in the proof occurred when we observed that with respect to the formula (dataElems lf k), the hypotheses of the theorem are true. That is, the tree lf is ordered (by the definition of ordered, since lf is a subtree of an ordered tree) and the key k does not occur in that tree. As lf is a proper subtree of the tree we started with, the principle of induction allows us to assume that the theorem is true for the tree lf. (Remember, induction doesn’t require a direct proof in the inductive case. It only requires that you prove an implication whose hypothesis is that the theorem is true for every proper subtree of the one you started with.) In this case, we apply the induction hypothesis twice: once for the left subtree and again for the right subtree.

Now, what about Theorem 81? Induction also provides the mechanism for its proof.

Proof of Theorem 81 (unique keys, part 1), by tree induction.

Base Case.

(ordered (Nub) ∧ k ∈ Nub) → (length (dataElems Nub k) == 1)
= {∈ N}
(ordered (Nub) ∧ False) → (length (dataElems Nub k) == 1)
= {∧ null}
False → (length (dataElems Nub k) == 1)
= {False → any = True}
True

Inductive Case.

First, we work with just the hypothesis of the implication we’re trying to prove.

(ordered (Cel x a lf rt) ∧ k ∈ (Cel x a lf rt))
= {∈ C}
(ordered (Cel x a lf rt) ∧ (x = k ∨ k ∈ lf ∨ k ∈ rt))
= {∧ over ∨}
(ordered (Cel x a lf rt) ∧ (x = k))
∨ (ordered (Cel x a lf rt) ∧ k ∈ lf)
∨ (ordered (Cel x a lf rt) ∧ k ∈ rt)

We are trying to prove that when the above formula is true, the formula in the conclusion of the theorem is also true. That is, we want to prove that the dataElems function delivers a sequence with exactly one element in this case. The implication we are trying to verify has the following form: (a ∨ b ∨ c) → d, where d is the conclusion of the theorem (that is, d = (length (dataElems s k) == 1)), and a, b, and c are the terms in the above formula. For example, a = (ordered (Cel x a lf rt) ∧ (x = k)). (The proposition named a here is distinct from the data item a that appears inside the Cel constructor.)

Using the Boolean algebra of propositions, one can verify that

((a ∨ b ∨ c) → d) = ((a → d) ∧ (b → d) ∧ (c → d))

That is, the formula ((a ∨ b ∨ c) → d) can be verified by proving that each of the terms, (a → d), (b → d), and (c → d), is true.
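One way to verify this identity without algebraic manipulation is to check all sixteen truth assignments. The following sketch does so; implies, lhs, rhs, and identityHolds are hypothetical names chosen for the illustration.

```haskell
-- Logical implication on Booleans.
implies :: Bool -> Bool -> Bool
implies p q = not p || q

-- Left-hand and right-hand sides of the identity.
lhs, rhs :: Bool -> Bool -> Bool -> Bool -> Bool
lhs a b c d = (a || b || c) `implies` d
rhs a b c d = (a `implies` d) && (b `implies` d) && (c `implies` d)

-- True when both sides agree for every assignment of a, b, c, and d.
identityHolds :: Bool
identityHolds =
  and [ lhs a b c d == rhs a b c d
      | a <- bools, b <- bools, c <- bools, d <- bools ]
  where
    bools = [False, True]
```

Since the propositions range over only two values each, this exhaustive check amounts to a proof of the identity by truth table.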

Proof of (a → d):

(ordered (Cel x a lf rt) ∧ (x = k)) → (length (dataElems (Cel x a lf rt) k) == 1)

Again, we work with the hypothesis of the implication first. Since the tree (Cel x a lf rt) is ordered, x ∉ lf and x ∉ rt. (All the keys in the left subtree must be smaller than x, and all those in the right subtree must be larger than x.) Because x = k, we conclude that k ∉ lf and k ∉ rt. The subtrees lf and rt are themselves ordered, so Theorem 82 applies to each of them. These observations take us to the conclusion of the theorem by the following logic.

length (dataElems (Cel x a lf rt) k)
= {dataElems C, x = k}
length ((dataElems lf k) ++ [a] ++ (dataElems rt k))
= {Thm 82, k ∉ lf}
length ([] ++ [a] ++ (dataElems rt k))
= {Thm 82, k ∉ rt}
length ([] ++ [a] ++ [])
= {++.1, ++ []}
length ([a])
= {Thm len}
1

It turns out that the induction hypothesis was not needed for the proof of (a → d). It will be needed for the other two proofs, however.

Proof of (b → d):

(ordered (Cel x a lf rt) ∧ k ∈ lf) → (length (dataElems (Cel x a lf rt) k) == 1)

Again, we work with the hypothesis of the implication first.

(ordered (Cel x a lf rt) ∧ k ∈ lf)
→ {definition of ordered}
(ordered (Cel x a lf rt) ∧ k ∈ lf ∧ k < x)
→ {def ord, since k < x → k ∉ rt}
(ordered (Cel x a lf rt) ∧ k ∈ lf ∧ k < x ∧ k ∉ rt)
→ {dataElems C, k < x}
((dataElems (Cel x a lf rt) k) = (dataElems lf k) ++ (dataElems rt k))
∧ (ordered (Cel x a lf rt) ∧ k ∈ lf ∧ k ∉ rt)
→ {Thm 82}
((dataElems (Cel x a lf rt) k) = (dataElems lf k) ++ [])
∧ (ordered (Cel x a lf rt) ∧ k ∈ lf ∧ k ∉ rt)
→ {++ []}
((dataElems (Cel x a lf rt) k) = (dataElems lf k))
∧ (ordered (Cel x a lf rt) ∧ k ∈ lf ∧ k ∉ rt)

Now, because lf is a proper subtree of (Cel x a lf rt), k ∈ lf, and lf is ordered, the induction hypothesis leads to the desired conclusion.

length (dataElems (Cel x a lf rt) k)
= length (dataElems lf k)
= 1    {induction hypothesis}

The proof of (c → d) is similar to the proof of (b → d), except that the induction goes down the right side of the tree instead of the left.

12.5.2 Retrieving Data from a Search Tree

According to Theorem 81, a key that occurs in an ordered search tree occurs exactly once. It occurs as one of the parameters of a Cel constructor, and that constructor will also have a data item as a parameter. Retrieving data from an ordered search tree amounts to finding the Cel constructor where a specified key occurs, then delivering the data item from that same Cel constructor.

The retrieval operation needs a way to signal whether or not the specified key is present in the tree. In our implementation, this will be done using the Maybe data type, which has two constructors: Just and Nothing. The Just constructor will be used to deliver the data item associated with the given key in the tree. For example, (Just d) delivers the data item d.

If the key is not present, the Nothing constructor will be used to signal that it is missing. So, if the retrieval function delivers Nothing, the specified key is not present in the tree.

getItem :: SearchTree d -> Integer -> Maybe d
getItem (Cel k d lf rt) x =
  if x < k
    then (getItem lf x)
    else if x > k
      then (getItem rt x)
      else (Just d)
getItem Nub x = Nothing

We can conclude that getItem works properly if we can prove that whenever the key specified in its second argument is present in the tree given in its first argument, then getItem delivers the data item associated with that key and that if the key is not present, getItem delivers Nothing. We will use tree induction to prove both of these facts.
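As an informal complement to the proofs, the retrieval function can be exercised on the tree of Figure 12.1. This is a sketch for experimentation only; figureTree and describe are hypothetical names, and describe merely shows one way a caller might handle the two Maybe constructors.

```haskell
data SearchTree d = Nub
                  | Cel Integer d (SearchTree d) (SearchTree d)

-- Retrieval function, as given in the text.
getItem :: SearchTree d -> Integer -> Maybe d
getItem (Cel k d lf rt) x =
  if x < k
    then getItem lf x
    else if x > k
      then getItem rt x
      else Just d
getItem Nub _ = Nothing

-- The tree of Figure 12.1.
figureTree :: SearchTree String
figureTree =
  Cel 5120 "PDA Cam"
    (Cel 1143 "Ink Jet" Nub Nub)
    (Cel 9605 "Palm Pilot" Nub Nub)

-- Hypothetical helper: turn the Maybe result into a message.
describe :: SearchTree String -> Integer -> String
describe s k = case getItem s k of
  Just d  -> "key " ++ show k ++ " holds " ++ d
  Nothing -> "key " ++ show k ++ " is not in the tree"
```

For instance, getItem figureTree 1143 delivers Just "Ink Jet", while getItem figureTree 42 delivers Nothing because the search runs off the left edge of the tree into a Nub.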

Theorem 83. ∀s. (ordered (s) ∧ k ∈ s) → ((getItem s k) = (Just d)), where d is the data item parameter of the Cel constructor in s for which k is the key parameter.

Proof.

(k ∈ s) → (s = (Cel x a lf rt)) ∧ ((k = x) ∨ k ∈ lf ∨ k ∈ rt)    {∈ C}

Now we are in the same situation as in the proof of Theorem 81 (not surprising, as the hypotheses of Theorem 83 are the same as those of Theorem 81). We want to prove an implication whose hypothesis is a three-way disjunction. We follow the same strategy: separate the implication into a conjunction of three implications, and prove each of them separately.

1. Proof of

(ordered (Cel x a lf rt) ∧ k ∈ (Cel x a lf rt) ∧ (k = x)) → ((getItem (Cel x a lf rt) k) = (Just d))

where d is the data item parameter of the Cel constructor in the tree (Cel x a lf rt) for which k is the key parameter.

getItem (Cel x a lf rt) k
= {k = x}
getItem (Cel k a lf rt) k
= {getItem C}
Just a

Since a is the data item parameter of the Cel constructor in the tree (Cel x a lf rt) for which k = x is the key parameter, the desired conclusion has been reached.

2. Proof of

(ordered (Cel x a lf rt) ∧ k ∈ (Cel x a lf rt) ∧ (k ∈ lf)) → ((getItem (Cel x a lf rt) k) = (Just d))

where d is the data item parameter of the Cel constructor in the tree (Cel x a lf rt) for which k is the key parameter.

(ordered (Cel x a lf rt) ∧ k ∈ (Cel x a lf rt) ∧ (k ∈ lf))
= {def ordered}
(ordered (Cel x a lf rt) ∧ k ∈ (Cel x a lf rt) ∧ (k < x))
→ {getItem C}
((getItem (Cel x a lf rt) k) = (getItem lf k))
= {induction hypothesis}
((getItem (Cel x a lf rt) k) = (Just d))

where d is the data item parameter of the Cel constructor in the tree (Cel x a lf rt) for which k is the key parameter.

3. The proof of the third implication required by the three-part disjunction is like this last proof, except that the induction goes down the right-hand side of the tree instead of the left.

12.5.3 Search Time in the Equational Model

We can use the equations for getItem as a prescription for computation. To compute the value represented by a formula involving getItem, we simply scan the formula for subformulas that match the left-hand side of one of the equations. When we find a match, we replace the subformula by the right-hand side of the equation (with appropriate substitutions for the parameters), and we continue this procedure until no subformulas match the left-hand side of either equation. This procedure of repeated substitution for subformulas is the equational model of computation.

If we want to figure out how many computational steps the equational model of computation requires to deliver the value of a formula, we count the process of matching a formula with an equation as a single computational step. We also count any use of an intrinsic operator, such as logical and (∧), if-then-else selection, the Just and Nothing constructors, and the like, as computational steps.

Consider the computation of the formula (getItem s k). At each step, the equations either deliver the value directly, requiring only the computational step of matching the formula with the second equation (this occurs if s is Nub), or the formula is replaced by an if-then-else selection. In the latter case, the process of matching the formula with the first equation counts as one computational step, and the if-then-else selection counts as another computational step.
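The step counting described above can be imitated by a version of getItem that returns a count alongside its result. The sketch below uses a deliberately simplified cost model, which is an assumption of this illustration and not the book's exact accounting: one step for each equation match and one step for each if-then-else selection evaluated, with comparisons and the Just and Nothing constructors treated as free. The names getItemSteps and figureTree are hypothetical.

```haskell
data SearchTree d = Nub
                  | Cel Integer d (SearchTree d) (SearchTree d)

-- getItem paired with a step count under the simplified cost model:
-- 1 step per equation match, 1 step per if-then-else selection.
getItemSteps :: SearchTree d -> Integer -> (Maybe d, Int)
getItemSteps (Cel k d lf rt) x
  | x < k     = addSteps 2 (getItemSteps lf x)  -- match + outer selection
  | x > k     = addSteps 3 (getItemSteps rt x)  -- match + both selections
  | otherwise = (Just d, 3)                      -- match + both selections
  where
    addSteps n (result, m) = (result, n + m)
getItemSteps Nub _ = (Nothing, 1)                -- match with the second equation

-- The tree of Figure 12.1.
figureTree :: SearchTree String
figureTree =
  Cel 5120 "PDA Cam"
    (Cel 1143 "Ink Jet" Nub Nub)
    (Cel 9605 "Palm Pilot" Nub Nub)
```

Under this model, each level of the tree visited contributes at most three steps, so the total for a search is bounded by a constant multiple of the height of the tree plus one step for a final Nub.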

In the case in which the first equation is the one that matches, the tree s must have the form (Cel x a lf rt), and the matching subformula is replaced by the if-then-else selection on the right-hand side of the equation. Where the computation goes after this substitution depends on the result of the test in the if-then-else selection. If k, the key specified in (getItem s k), is less than x (the key in (Cel x a lf rt), which constructs the tree s), the formula (getItem s k) is replaced by (getItem lf k), and the computation proceeds from there by
