Data Structures and Algorithms: CHAPTER 4: Basic Operations on Sets

Basic Operations on Sets

The set is the basic structure underlying all of mathematics. In algorithm design, sets are used as the basis of many important abstract data types, and many techniques have been developed for implementing set-based abstract data types. In this chapter we review the basic operations on sets and introduce some simple implementations for sets. We present the "dictionary" and "priority queue," two abstract data types based on the set model. Implementations for these abstract data types are covered in this and the next chapter.

4.1 Introduction to Sets

A set is a collection of members (or elements); each member of a set either is itself a set or is a primitive element called an atom. All members of a set are different, which means no set can contain two copies of the same element.

When used as tools in algorithm and data structure design, atoms usually are integers, characters, or strings, and all elements in any one set are usually of the same type. We shall often assume that atoms are linearly ordered by a relation, usually denoted "<" and read "less than" or "precedes." A linear order < on a set S satisfies two properties:

1. For any a and b in S, exactly one of a < b, a = b, or b < a is true.

2. For all a, b, and c in S, if a < b and b < c, then a < c (transitivity).
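The two properties can be checked mechanically on a small finite set. The following is a minimal sketch in Python (the book itself uses Pascal); the helper name is_linear_order and the sample relations are assumptions made for illustration only.

```python
from itertools import product

def is_linear_order(elements, less):
    """Return True if `less` satisfies properties 1 and 2 on `elements`."""
    items = list(elements)
    # Property 1: exactly one of a < b, a = b, b < a is true.
    for a, b in product(items, repeat=2):
        if [less(a, b), a == b, less(b, a)].count(True) != 1:
            return False
    # Property 2 (transitivity): a < b and b < c imply a < c.
    for a, b, c in product(items, repeat=3):
        if less(a, b) and less(b, c) and not less(a, c):
            return False
    return True

S = {1, 2, 3, 4}
usual = is_linear_order(S, lambda a, b: a < b)
# "a properly divides b" is not a linear order: 2 and 3 are incomparable.
divides = is_linear_order(S, lambda a, b: a != b and b % a == 0)
print(usual, divides)
```

The second relation fails property 1 because neither 2 < 3 nor 3 < 2 holds under it, even though 2 and 3 are different.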

Integers, reals, characters, and character strings have a natural linear ordering for which < is used in Pascal. A linear ordering can be defined on objects that consist of sets of ordered objects. We leave as an exercise how one develops such an ordering. For example, one question to be answered in constructing a linear order for a set of integers is whether the set consisting of integers 1 and 4 should be regarded as being less than or greater than the set consisting of 2 and 3.

Set Notation

A set of atoms is generally exhibited by putting curly brackets around its members, so {1, 4} denotes the set whose only members are 1 and 4. We should bear in mind that a set is not a list, even though we represent sets in this manner as if they were lists. The order in which the elements of a set are listed is irrelevant, and we could just as well have written {4, 1} in place of {1, 4}. Note also that in a set each element appears exactly once, so {1, 4, 1} is not a set.

Sometimes we represent sets by set formers, which are expressions of the form

{ x | statement about x }

where the statement about x is a predicate that tells us exactly what is needed for an arbitrary object x to be in the set. For example, {x | x is a positive integer and x ≤ 1000} is another way of representing {1, 2, . . . , 1000}, and {x | for some integer y, x = y²} denotes the set of perfect squares. Note that the set of perfect squares is infinite and cannot be represented by listing its members.
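Set formers correspond closely to set comprehensions in languages that have them. The sketch below uses Python; the names small and squares_up_to are hypothetical, and because the set of perfect squares is infinite, only a bounded part of it is built.

```python
from math import isqrt

# {x | x is a positive integer and x <= 1000}
small = {x for x in range(1, 1001)}

def squares_up_to(limit):
    """Finite part of {x | for some integer y, x = y*y}: squares not exceeding limit."""
    # Negative y give the same squares as positive y, so y >= 0 suffices.
    return {y * y for y in range(isqrt(limit) + 1)}

print(len(small), min(small), max(small))
print(sorted(squares_up_to(30)))
```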

The fundamental relationship of set theory is membership, which is denoted by the symbol ∈. That is, x ∈ A means that element x is a member of set A; the element x could be an atom or another set, but A cannot be an atom. We use x ∉ A for "x is not a member of A." There is a special set, denoted Ø and called the null set or empty set, that has no members. Note that Ø is a set, not an atom, even though the set Ø does not have any members. The distinction is that x ∈ Ø is false for every x, whereas if y is an atom, then x ∈ y doesn't even make sense; it is syntactically meaningless rather than false.

We say set A is included (or contained) in set B, written A ⊆ B, or B ⊇ A, if every member of A is also a member of B. We also say A is a subset of B and B is a superset of A, if A ⊆ B. For example, {1, 2} ⊆ {1, 2, 3}, but {1, 2, 3} is not a subset of {1, 2} since 3 is a member of the former but not the latter. Every set is included in itself, and the empty set is included in every set. Two sets are equal if each is included in the other, that is, if their members are the same. Set A is a proper subset or proper superset of set B if A ≠ B, and A ⊆ B or A ⊇ B, respectively.
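Membership and inclusion can be tried out directly with Python's built-in set type, used here only as a convenient stand-in for the mathematical notion. The function is_subset is a hypothetical helper that spells out the definition of inclusion.

```python
def is_subset(a, b):
    """A is included in B if every member of A is also a member of B."""
    return all(x in b for x in a)

A = {1, 2}
B = {1, 2, 3}
empty = set()

print(3 in B, 3 in A)                            # membership
print(is_subset(A, B), is_subset(B, A))          # inclusion
print(is_subset(empty, A), is_subset(A, A))      # empty set and self-inclusion
print(A < B, A < A)                              # proper subset (built-in operator)
print(is_subset(A, {2, 1}) and is_subset({2, 1}, A))  # equality as mutual inclusion

# Membership in the empty set is simply false; membership in an atom
# is not meaningful, which Python reports as a TypeError.
atom = 5
try:
    1 in atom
    atom_membership_ok = True
except TypeError:
    atom_membership_ok = False
```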

The most basic operations on sets are union, intersection, and difference. If A and B are sets, then A ∪ B, the union of A and B, is the set of elements that are members of A or B or both. The intersection of A and B, written A ∩ B, is the set of elements in both A and B, and the difference, A - B, is the set of elements in A that are not in B. For example, if A = {a, b, c} and B = {b, d}, then A ∪ B = {a, b, c, d}, A ∩ B = {b}, and A - B = {a, c}.
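The example with A = {a, b, c} and B = {b, d} can be reproduced with Python's built-in sets, where the operators |, & and - play the roles of union, intersection, and difference. This is only an illustration of the definitions, not an implementation discussed in the chapter.

```python
A = {"a", "b", "c"}
B = {"b", "d"}

union = A | B          # members of A or B or both
intersection = A & B   # members of both A and B
difference = A - B     # members of A that are not in B

print(sorted(union))         # ['a', 'b', 'c', 'd']
print(sorted(intersection))  # ['b']
print(sorted(difference))    # ['a', 'c']
```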

Abstract Data Types Based on Sets

We shall consider ADT's that incorporate a variety of set operations. Some collections of these operations have been given special names and have special implementations of high efficiency. Some of the more common set operations are the following.

1.-3. The three procedures UNION(A, B, C), INTERSECTION(A, B, C), and DIFFERENCE(A, B, C) take set-valued arguments A and B, and assign the result, A ∪ B, A ∩ B, or A - B, respectively, to the set variable C.

4. We shall sometimes use an operation called merge, or disjoint set union, that is no different from union, but that assumes its operands are disjoint (have no members in common). The procedure MERGE(A, B, C) assigns to the set variable C the value A ∪ B, but is not defined if A ∩ B ≠ Ø, i.e., if A and B are not disjoint.

5. The function MEMBER(x, A) takes set A and object x, whose type is the type of elements of A, and returns a boolean value -- true if x ∈ A and false if x ∉ A.

6. The procedure MAKENULL(A) makes the null set be the value for set variable A.

7. The procedure INSERT(x, A), where A is a set-valued variable, and x is an element of the type of A's members, makes x a member of A. That is, the new value of A is A ∪ {x}. Note that if x is already a member of A, then INSERT(x, A) does not change A.

8. DELETE(x, A) removes x from A, i.e., A is replaced by A - {x}. If x is not in A originally, DELETE(x, A) does not change A.

9. ASSIGN(A, B) sets the value of set variable A to be equal to the value of set variable B.

10. The function MIN(A) returns the least element in set A. This operation may be applied only when the members of the parameter set are linearly ordered. For example, MIN({2, 3, 1}) = 1 and MIN({'a','b','c'}) = 'a'. We also use a function MAX with the obvious meaning.

11. EQUAL(A, B) is a function whose value is true if and only if sets A and B consist of the same elements.

12. The function FIND(x) operates in an environment where there is a collection of disjoint sets. FIND(x) returns the name of the (unique) set of which x is a member.
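The operations listed above can be mimicked in a few lines of Python. This is a rough sketch, not one of the implementations the book develops: the wrapper class SetVar is a hypothetical stand-in for a "set variable," each operation delegates to Python's built-in set, and FIND is left out because it needs a surrounding collection of named disjoint sets.

```python
class SetVar:
    """Hypothetical set variable: a mutable holder for a set value."""
    def __init__(self, items=()):
        self.items = set(items)

def MAKENULL(A):
    A.items = set()

def UNION(A, B, C):
    C.items = A.items | B.items

def INTERSECTION(A, B, C):
    C.items = A.items & B.items

def DIFFERENCE(A, B, C):
    C.items = A.items - B.items

def MERGE(A, B, C):
    # MERGE is not defined for non-disjoint operands; this sketch rejects them.
    if A.items & B.items:
        raise ValueError("MERGE requires disjoint operands")
    C.items = A.items | B.items

def MEMBER(x, A):
    return x in A.items

def INSERT(x, A):
    A.items.add(x)        # no effect if x is already a member

def DELETE(x, A):
    A.items.discard(x)    # no effect if x is not a member

def ASSIGN(A, B):
    A.items = set(B.items)  # copy, so later changes to B do not affect A

def MIN(A):
    return min(A.items)   # members must be linearly ordered

def MAX(A):
    return max(A.items)

def EQUAL(A, B):
    return A.items == B.items

A, B, C = SetVar({1, 2, 3}), SetVar({3, 4}), SetVar()
UNION(A, B, C)
print(sorted(C.items))
print(MIN(SetVar({2, 3, 1})), MIN(SetVar({"a", "b", "c"})))
```

Passing set variables as arguments and assigning into C mirrors the procedure-style interface of the list; a more idiomatic Python design would simply return new sets.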

4.2 An ADT with Union, Intersection, and Difference

We begin by defining an ADT for the mathematical model "set" with the three basic set-theoretic operations, union, intersection, and difference. First we give an example where such an ADT is useful and then we discuss several simple implementations of this ADT.

Example 4.1. Let us write a program to do a simple form of "data-flow analysis" on flowcharts that represent procedures. The program will use variables of an abstract data type SET, whose operations are UNION, INTERSECTION, DIFFERENCE, EQUAL, ASSIGN, and MAKENULL, as defined in the previous section.

In Fig. 4.1 we see a flowchart whose boxes have been named B1, . . . , B8, and for which the data definitions (read and assignment statements) have been numbered 1, 2, . . . , 9. This flowchart happens to implement the Euclidean algorithm, to compute the greatest common divisor of inputs p and q, but the details of the algorithm are not relevant to the example.

In general, data-flow analysis refers to that part of a compiler that examines a flowchart-like representation of a source program, such as Fig. 4.1, and collects information about what can be true as control reaches each box of the flowchart. The boxes are often called blocks or basic blocks, and they represent collections of statements through which the flow-of-control proceeds sequentially. The information collected during data-flow analysis is used to help improve the code generated by the compiler. For example, if data-flow analysis told us that each time control reached block B, variable x had the value 27, then we could substitute 27 for all uses of x in block B, unless x were assigned a new value within block B. If constants can be accessed more quickly than variables, this change could speed up the code produced by the compiler.

In our example, we want to determine where a variable could last have been given a new value. That is, we want to compute for each block Bi the set DEFIN[i] of data definitions d such that there is a path from B1 to Bi in which d appears, but is not followed by any other definition of the same variable as d defines. DEFIN[i] is called the set of reaching definitions for Bi.

To see how such information could be useful, consider Fig. 4.1. The first block B1 is a "dummy" block of three data definitions, making the three variables t, p, and q have "undefined" values. If we discover, for example, that DEFIN[7] includes definition 3, which gives q an undefined value, then the program might contain a bug, as apparently it could print q without first assigning a valid value to q. Fortunately, we shall discover that it is impossible to reach block B7 without assigning to q; that is, 3 is not in DEFIN[7].

The computation of the DEFIN[i]'s is aided by several rules. First, we precompute for each block i two sets GEN[i] and KILL[i]. GEN[i] is the set of data definitions in block i, with the exception that if Bi contains two or more definitions of variable x, then only the last is in GEN[i]. Thus, GEN[i] is the set of definitions in Bi that are "generated" by Bi; they reach the end of Bi without having their variables redefined.

The set KILL[i] is the set of definitions d not in Bi such that Bi has a definition of the same variable as d. For example, in Fig. 4.1, GEN[4] = {6}, since definition 6 (of variable t) is in B4 and there are no subsequent definitions of t in B4. KILL[4] = {1, 9}, since these are the definitions of variable t that are not in B4.
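The definitions of GEN and KILL translate into a short computation. The sketch below is in Python and runs on a small made-up program, not on Fig. 4.1 (whose contents are not reproduced here). The function gen_kill, the block names, and the table of definitions are all hypothetical, and the sketch assumes that definition numbers increase in program order within a block.

```python
def gen_kill(definitions, block):
    """Compute (GEN, KILL) for `block`.

    `definitions` maps a definition number to (block name, variable defined).
    Assumption: within a block, larger numbers come later in the block.
    """
    last_def = {}
    for d in sorted(definitions):
        b, var = definitions[d]
        if b == block:
            last_def[var] = d          # a later definition of var replaces an earlier one
    gen = set(last_def.values())
    # KILL: definitions outside the block of variables the block defines.
    kill = {d for d, (b, var) in definitions.items()
            if b != block and var in last_def}
    return gen, kill

# Hypothetical program: five definitions spread over three blocks.
defs = {
    1: ("B1", "t"),
    2: ("B1", "p"),
    3: ("B2", "t"),
    4: ("B2", "t"),   # second definition of t in B2, so only 4 is generated
    5: ("B3", "p"),
}
for name in ("B1", "B2", "B3"):
    print(name, gen_kill(defs, name))
```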

Fig. 4.1. A flowchart of the Euclidean algorithm.

In addition to the DEFIN[i]'s, we compute the set DEFOUT[i] for each block Bi. Just as DEFIN[i] is the set of definitions that reach the beginning of Bi, DEFOUT[i] is the set of definitions reaching the end of Bi. There is a simple formula relating DEFIN and DEFOUT, namely

DEFOUT[i] = (DEFIN[i] - KILL[i]) ∪ GEN[i]        (4.1)

That is, definition d reaches the end of Bi if and only if it either reaches the beginning of Bi and is not killed by Bi, or it is generated in Bi. The second rule relating DEFIN and DEFOUT is that DEFIN[i] is the union, over all predecessors p of Bi, of DEFOUT[p], that is:

DEFIN[i] = ∪ DEFOUT[p], the union taken over all predecessors Bp of Bi        (4.2)

Rule (4.2) says that a data definition enters Bi if and only if it reaches the end of one of Bi's predecessors. As a special case, if Bi has no predecessors, as B1 in Fig. 4.1, then DEFIN[i] = Ø.
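A single application of each rule can be worked through in Python. GEN[4] = {6} and KILL[4] = {1, 9} are taken from the text; the value used for DEFIN[4] and the predecessor sets are hypothetical, chosen only to show the arithmetic, and the function names are assumptions.

```python
def apply_rule_4_1(defin, gen, kill):
    """DEFOUT = (DEFIN - KILL) union GEN."""
    return (defin - kill) | gen

def apply_rule_4_2(predecessor_defouts):
    """DEFIN = union of DEFOUT over all predecessors (empty if there are none)."""
    return set().union(*predecessor_defouts)

GEN4, KILL4 = {6}, {1, 9}          # values given in the text
DEFIN4 = {1, 4, 5}                 # hypothetical value, for illustration only
DEFOUT4 = apply_rule_4_1(DEFIN4, GEN4, KILL4)
print(sorted(DEFOUT4))             # [4, 5, 6]

# Hypothetical DEFOUT sets of two predecessors of some block.
print(sorted(apply_rule_4_2([{1, 2}, {2, 3}])))   # [1, 2, 3]
print(apply_rule_4_2([]))                          # set()
```

Definition 1 is removed because the block kills it, definition 6 is added because the block generates it, and a block with no predecessors receives the empty set.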

Because we have introduced a variety of new concepts in this example, we shall not try to complicate matters by writing a general algorithm for computing the reaching definitions of an arbitrary flowgraph. Rather, we shall write a part of a program that assumes GEN[i] and KILL[i] are available for i = 1, . . . , 8 and computes DEFIN[i] and DEFOUT[i] for 1, . . . , 8, assuming the particular flowgraph of Fig. 4.1. This program fragment assumes the existence of an ADT SET with operations UNION, INTERSECTION, DIFFERENCE, EQUAL, ASSIGN, and MAKENULL; we shall give alternative implementations of this ADT later.

The procedure propagate( GEN, KILL, DEFIN, DEFOUT) applies rule (4.1) to compute DEFOUT for a block, given DEFIN. If a program were loop-free, then the calculation of DEFOUT would be straightforward. The presence of a loop in the program fragment of Fig. 4.2 necessitates an iterative procedure. We approximate DEFIN[i] by starting with DEFIN[i] = Ø and DEFOUT[i] = GEN[i] for all i, and then repeatedly apply (4.1) and (4.2) until no more changes to DEFIN's and DEFOUT's occur. Since each new value assigned to DEFIN[i] or DEFOUT[i] can be shown to be a superset (not necessarily proper) of its former value, and there are only a finite number of data definitions in any program, the process must converge eventually to a solution to (4.1) and (4.2).
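The iteration described above can be sketched in Python as follows. This is not the program of Fig. 4.2: the function reaching_definitions, the four-block flowgraph with a loop between B2 and B3, and its GEN and KILL sets are all hypothetical. The sketch applies (4.2) to every block and then (4.1) to every block in each pass, which is one reading of the procedure.

```python
def reaching_definitions(preds, gen, kill):
    """Iterate rules (4.1) and (4.2) until DEFIN and DEFOUT stop changing."""
    blocks = list(preds)
    defin = {i: set() for i in blocks}          # DEFIN[i] starts empty
    defout = {i: set(gen[i]) for i in blocks}   # DEFOUT[i] starts as GEN[i]
    changed = True
    while changed:
        changed = False
        for i in blocks:                         # rule (4.2)
            new_in = set().union(*(defout[p] for p in preds[i]))
            if new_in != defin[i]:
                defin[i], changed = new_in, True
        for i in blocks:                         # rule (4.1)
            new_out = (defin[i] - kill[i]) | gen[i]
            if new_out != defout[i]:
                defout[i], changed = new_out, True
    return defin, defout

# Hypothetical flowgraph: B1 -> B2 -> B3 -> B2 (loop), and B2 -> B4.
# Definitions 1 and 3 define x; definitions 2 and 4 define y.
preds = {1: [], 2: [1, 3], 3: [2], 4: [2]}
gen = {1: {1, 2}, 2: set(), 3: {3}, 4: {4}}
kill = {1: {3, 4}, 2: set(), 3: {1}, 4: {2}}

defin, defout = reaching_definitions(preds, gen, kill)
for i in preds:
    print(i, sorted(defin[i]), sorted(defout[i]))
```

In this made-up graph, definition 3 reaches the beginning of B2 only by way of the loop, so it cannot appear in DEFIN[2] until DEFOUT[3] has been propagated back, which is why a single pass is not enough.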

The successive values of DEFIN[i] after each iteration of the repeat-loop are shown in Fig. 4.3. Note that none of the dummy assignments 1, 2, and 3 reaches a block where its variable is used, so there are no undefined variable uses in the program of Fig. 4.1. Also note that deferring the application of (4.2) for Bi until just before we apply (4.1) for Bi would, in general, make the process of Fig. 4.2 converge in fewer iterations.

4.3 A Bit-Vector Implementation of Sets

The best implementation of a SET ADT depends on the operations to be performed and on the size of the set. When all sets in our domain of discourse are subsets of a small "universal set" whose elements are the integers 1, . . . , N for some fixed N, then we can use a bit-vector (boolean array) implementation. A set is represented by a bit vector in which the ith bit is true if i is an element of the set. The major advantage of this representation

var
    GEN, KILL, DEFIN, DEFOUT: array[1..8] of SET;
    { we assume GEN and KILL are computed externally }
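The bit-vector representation introduced in Section 4.3 can be sketched in Python as a boolean array indexed by element. This is a minimal illustration under the stated assumption that all members are integers in 1, . . . , N; the class name BitVectorSet and its method names are hypothetical and do not come from the book.

```python
class BitVectorSet:
    """Subset of {1, ..., n} stored as a boolean array; bits[i] is True iff i is a member."""

    def __init__(self, n, members=()):
        self.n = n
        self.bits = [False] * (n + 1)   # index 0 is unused so element i maps to bits[i]
        for i in members:
            self.insert(i)

    def _check(self, i):
        if not 1 <= i <= self.n:
            raise ValueError(f"{i} is outside the universal set 1..{self.n}")

    def member(self, i):
        self._check(i)
        return self.bits[i]             # a single array access

    def insert(self, i):
        self._check(i)
        self.bits[i] = True

    def delete(self, i):
        self._check(i)
        self.bits[i] = False

    def _combine(self, other, op):
        if self.n != other.n:
            raise ValueError("operands must share the same universal set")
        result = BitVectorSet(self.n)
        result.bits = [op(a, b) for a, b in zip(self.bits, other.bits)]
        return result

    def union(self, other):
        return self._combine(other, lambda a, b: a or b)

    def intersection(self, other):
        return self._combine(other, lambda a, b: a and b)

    def difference(self, other):
        return self._combine(other, lambda a, b: a and not b)

    def to_set(self):
        return {i for i in range(1, self.n + 1) if self.bits[i]}

A = BitVectorSet(8, [1, 2, 3])
B = BitVectorSet(8, [3, 4])
print(sorted(A.union(B).to_set()))
print(sorted(A.intersection(B).to_set()))
print(sorted(A.difference(B).to_set()))
```

In this sketch member, insert, and delete each touch one array slot, while union, intersection, and difference scan all N positions regardless of how many members the sets have.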