Добавил:
Upload Опубликованный материал нарушает ваши авторские права? Сообщите нам.
Вуз: Предмет: Файл:

Data-Structures-And-Algorithms-Alfred-V-Aho

.pdf
Скачиваний:
122
Добавлен:
09.04.2015
Размер:
6.91 Mб
Скачать

Data Structures and Algorithms: CHAPTER 4: Basic Operations on Sets

i: integer; changed: boolean;

procedure propagate ( G, K, I: SET; var

O: SET );

{ apply (4.1) and set changed to true if a change in DEFOUT is detected }

var

TEMP: SET; begin

DIFFERENCE(I, K, TEMP);

UNION(TEMP, G, TEMP);

if not EQUAL(TEMP, O) do begin

ASSIGN(O, TEMP); changed := true

end

end; { propagate }

begin

for i:= 1 to 8 do

ASSIGN(DEFOUT[i], GEN[i]); repeat

changed := false;

{the next 8 statements apply (4.2) for the graph of Fig. 4.1 only}

MAKENULL(DEFIN[1]); ASSIGN(DEFIN[2], DEFOUT[1]); ASSIGN(DEFIN[3], DEFOUT[2]); ASSIGN(DEFIN[4], DEFOUT[3]);

UNION(DEFOUT[4], DEFOUT[8], DEFIN[5]); UNION(DEFOUT[3], DEFOUT[5], DEFIN[6]); ASSIGN(DEFIN[7], DEFOUT[6]); ASSIGN(DEFIN[8], DEFOUT[6]);

for i:= 1 to 8 do propagate(GEN[i], KILL[i],

DEFIN[i], DEFOUT[i]); until

not changed

end.

Fig. 4.2. Program to compute reaching definitions.

http://www.ourstillwaters.org/stillwaters/csteaching/DataStructuresAndAlgorithms/mf1204.htm (7 of 52) [1.7.2001 19:04:14]

Data Structures and Algorithms: CHAPTER 4: Basic Operations on Sets

Fig. 4.3. DEFIN[i] after each iteration.

is that MEMBER, INSERT, and DELETE operations can be performed in constant time by directly addressing the appropriate bit. UNION, INTERSECTION, and DIFFERENCE can be performed in time proportional to the size of the universal set.

If the universal set is sufficiently small so that a bit vector fits in one computer word, then UNION, INTERSECTION, and DIFFERENCE can be performed by single logical operations in the language of the underlying machine. Certain small sets can be represented directly in Pascal using the set construct. The maximum size of such sets depends on the particular compiler used and, unfortunately, it is often too small to be used in typical set problems. However, in writing our own programs, we need not be constrained by any limit on our set size, as long as we can treat our sets as subsets of some universal set {1, . . . , N}. We intend that if A is a set represented as a boolean array, then A[i] is true if and only if element i is a member of A. Thus, we can define an ADT SET by the Pascal declaration

const

N = { whatever value is appropriate }; type

SET = packed array[1..N] of boolean;

We can then implement the procedure UNION as shown in Fig. 4.4. To implement INTERSECTION and DIFFERENCE, we replace "or" in Fig. 4.4 by "and" and "and not," respectively. The reader can implement the other operations mentioned in Section 4.1 (except MERGE and FIND, which make little sense in this context) as easy exercises.

It is possible to use the bit-vector implementation of sets when the universal set is a finite set other than a set of consecutive integers. Normally, we would then need a way to translate between members of the universal set and the integers 1, . . . , N. Thus, in Example 4.1 we assumed that the data definitions were assigned numbers from 1 to 9. In general, the translations in

procedure UNION ( A, B: SET; var C: SET ); var

i: integer;

http://www.ourstillwaters.org/stillwaters/csteaching/DataStructuresAndAlgorithms/mf1204.htm (8 of 52) [1.7.2001 19:04:14]

Data Structures and Algorithms: CHAPTER 4: Basic Operations on Sets

begin

for i := 1 to N do

C[i] := A[i] or B[i]

end

Fig. 4.4. Implementation of UNION.

both directions could be performed by the MAPPING ADT described in Chapter 2. However, the inverse translation from integers to elements of the universal set can be accomplished better using an array A, where A[i] is the element corresponding to integer i.

4.4 A Linked-List Implementation of Sets

It should also be evident that sets can be represented by linked lists, where the items in the list are the elements of the set. Unlike the bit-vector representation, the list representation uses space proportional to the size of the set represented, not the size of the universal set. Moreover, the list representation is somewhat more general since it can handle sets that need not be subsets of some finite universal set.

When we have operations like INTERSECTION on sets represented by linked lists, we have several options. If the universal set is linearly ordered, then we can represent a set by a sorted list. That is, we assume all set members are comparable by a relation "<" and the members of a set appear on a list in the order e1, e2, . . . , en, where e1 < e2 < e3 < . . . < en. The advantage of a sorted list is that we do not need to search the entire list to determine whether an element is on the list.

An element is in the intersection of lists L1 and L2 if and only if it is on both lists. With unsorted lists we must match each element on L1 with each element on L2, a

process that takes O(n2) steps on lists of length n. The reason that sorting the lists makes intersection and some other operations easy is that if we wish to match an element e on one list L1 with the elements of another list L2, we have only to look down L2 until we either find e, or find an element greater than e; in the first case we have found the match, while in the second case, we know none exists. More importantly, if d is the element on L1 that immediately precedes e, and we have found onL2 the first element, say f, such that d f, then to search L2 for an occurrence of e we can begin with f. The conclusion from this reasoning is that we can find matches for all the elements of L1, if they exist, by scanning L1 and L2 once, provided we advance the position markers for the two lists in the proper order, always advancing the one with the smaller element. The routine to implement INTERSECTION is

http://www.ourstillwaters.org/stillwaters/csteaching/DataStructuresAndAlgorithms/mf1204.htm (9 of 52) [1.7.2001 19:04:14]

Data Structures and Algorithms: CHAPTER 4: Basic Operations on Sets

shown in Fig. 4.5. There, sets are represented by linked lists of "cells" whose type is defined

type

celltype = record element: elementtype; next: − celltype

end

Figure 4.5 assumes elementtype is a type, such as integer, that can be compared by <. If not, we have to write a function that determines which of two elements precedes the other.

The linked lists in Fig. 4.5 are headed by empty cells that serve as entry points to the lists. The reader may, as an exercise, write this program in a more general abstract form using list primitives. The program in Fig. 4.5, however, may be more efficient than the more abstract program. For example, Fig. 4.5 uses pointers to particular cells rather than "position" variables that point to previous cells. We can do so because we only append to list C, and A and B are only scanned, with no insertions or deletions done on those lists.

The operations of UNION and DIFFERENCE can be written to look surprisingly like the INTERSECTION procedure of Fig. 4.5. For UNION, we must attach all elements from either the A or B list to the C list, in their proper, sorted order, so when the elements are unequal (lines 12-14), we add the smaller to the C list just as we do when the elements are equal. We also append to list C all elements on the list not exhausted when the test of line (5) fails. For DIFFERENCE we do not add an element to the C list when equal elements are found. We only add the current A list element to the C list when it is smaller than the current B list element; for then we know the former cannot be found on the B list. Also, we add to C those elements on A when and if the test of line (5) fails because B is exhausted.

The operator ASSIGN(A, B) copies list A into list B. Note that, this operator cannot be implemented simply by making the header cell of A point to the same place as the header cell of B, because in that case, subsequent changes to B would cause unexpected changes to A. The MIN operator is especially easy; just return the first element on the list. DELETE and FIND can be implemented by finding the target item as discussed for general lists and in the case of a DELETE, disposing of its cell.

Lastly, insertion is not difficult to implement, but we must arrange to insert the new element into the proper position. Figure 4.6 shows a procedure INSERT that takes as parameters an element and a pointer to the header cell of a list, and inserts the

http://www.ourstillwaters.org/stillwaters/csteaching/DataStructuresAndAlgorithms/mf1204.htm (10 of 52) [1.7.2001 19:04:14]

Data Structures and Algorithms: CHAPTER 4: Basic Operations on Sets

element into the list. Figure 4.7 shows the crucial cells and pointers just before (solid) and after (dashed) insertion.

procedure INTERSECTION ( ahead, bhead: − celltype; var pc: − celltype );

{ computes the intersection of sorted lists A and B with header cells ahead and bhead, leaving the result

as a sorted list whose header is pointed to by pc }

var

acurrent, bcurrent, ccurrent: − celltype;

{the current cells of lists A and B, and the last cell added list C } begin

(1)new(pc); { create header for list C }

(2)acurrent := ahead−.next;

(3)bcurrent := bhead −.next;

(4)ccurrent := pc;

(5)while (acurrent <> nil) and (bcurrent <> nil) do begin

{compare current elements on lists A and B }

(6)if acurrent −.element =

bcurrent −.element then begin

 

{ add to intersection }

(7)

new( ccurrent −.next );

(8)

ccurrent := ccurrent −.next;

(9)

ccurrent −.element := acurrent

−.element;

(10

acurrent := acurrent−.next;

(11

bcurrent := bcurrent−.next

 

end

 

else { elements unequal }

(12)

if acurrent−.element <

bcurrent−.element then

(13)

acurrent := acurrent−.next

 

else

(14)

bcurrent := bcurrent−.next

 

end;

(15)

ccurrent−.next := nil

end; { INTERSECTION }

Fig. 4.5. Intersection procedure using sorted lists.

http://www.ourstillwaters.org/stillwaters/csteaching/DataStructuresAndAlgorithms/mf1204.htm (11 of 52) [1.7.2001 19:04:14]

Data Structures and Algorithms: CHAPTER 4: Basic Operations on Sets

4.5 The Dictionary

When we use a set in the design of an algorithm, we may not need powerful operations like union and intersection. Often, we only need to keep a set of "current" objects, with periodic insertions and deletions from the set. Also, from time to time we may need to know whether a particular element is in the set. A set ADT with the operations INSERT, DELETE, and MEMBER has been given the name dictionary. We shall also include MAKENULL as a dictionary operation to initialize whatever data structure is used in the implementation. Let us consider an example application of the dictionary, and then discuss implementations that are well suited for representing dictionaries.

procedure INSERT ( x: elementtype; p: − celltype );

{ inserts x onto list whose header is pointed to by p } var

current, newcell: − celltype; begin

current := p;

while current−.next <> nil do begin

if current−.next−.element = x then

return; { if x is already on the list, return } if current−.next−.element > x then

goto add; { break } current: = current−.next

end;

add: { current is now the cell after which x is to be inserted } new(newcell);

newcell−.element := x; newcell−.next :=

current−.next;

current−.next := newcell end; { INSERT }

Fig. 4.6. Insertion procedure.

Example 4.2. The Society for the Prevention of Injustice to Tuna (SPIT) keeps a database recording the most recent votes of legislators on issues of importance to tuna lovers. This database is, conceptually, two sets of legislators' names, called goodguys

http://www.ourstillwaters.org/stillwaters/csteaching/DataStructuresAndAlgorithms/mf1204.htm (12 of 52) [1.7.2001 19:04:14]

Data Structures and Algorithms: CHAPTER 4: Basic Operations on Sets

and badguys. The society is very forgiving of past mistakes, but also tends to forget former friends easily. For example, if a vote to declare Lake Erie off limits to tuna fishermen is taken, all legislators voting in favor will be inserted into goodguys and deleted from badguys, while the opposite will happen to those voting against. Legislators not voting remain in whatever set they were in, if any.

In operation, the database system accepts three commands, each represented by a single character, followed by a ten character string denoting the name of a legislator. Each command is on a separate line. The commands are:

1.F (a legislator voting favorably follows)

2.U (a legislator voting unfavorably follows)

3.? (determine the status of the legislator that follows).

We also allow the character 'E' on the input line to signal the end of processing. Figure 4.8 shows the sketch of the program, written in terms of the as-yet-undefined ADT DICTIONARY, which in this case is intended to be a set of strings of length 10.

Fig. 4.7. The insertion picture.

4.6 Simple Dictionary Implementations

A dictionary can be implemented by a sorted or unsorted linked list. Another possible implementation of a dictionary is by a bit vector, provided the elements of the underlying set are restricted to the integers 1, . . . , N for some N, or are restricted to a set that can be put in correspondence with such a set of integers.

A third possible implementation of a dictionary is to use a fixedlength array with a pointer to the last entry of the array in use. This implementation is only feasible if we can assume our sets never get larger than the length of the array. It has the advantage of simplicity over the linked-list representation, while its disadvantages are that (1) sets cannot grow arbitrarily, (2 deletion is slower, and (3) space is not utilized efficiently if sets are of varying sizes.

http://www.ourstillwaters.org/stillwaters/csteaching/DataStructuresAndAlgorithms/mf1204.htm (13 of 52) [1.7.2001 19:04:14]

Data Structures and Algorithms: CHAPTER 4: Basic Operations on Sets

It is for the last of these reasons that we did not discuss the array implementation in connection with sets whose unions and intersections were taken frequently. Since arrays as well as lists can be sorted, however, the reader could consider the array implementation we now describe for dictionaries as a possible implementation for sets in general. Figure 4.9 shows the declarations and procedures necessary to supplement Fig. 4.8 to make it a working program.

program tuna ( input, output ); { legislative database }

type

nametype = array[l..10] of char; var

command: char; legislator: nametype;

goodguys, badguys: DICTIONARY;

procedure favor (friend: nametype ); begin

INSERT(friend, goodguys) ; DELETE(friend, badguys)

end; { favor }

procedure unfavor ( foe: nametype ); begin

INSERT(foe, badguys ) ; DELETE(foe, goodguys )

end; { unfavor }

procedure report ( subject: nametype ); begin

if MEMBER(subject, goodguys) then writeln(subject, 'is a friend')

else if MEMBER(subject, badguys) then writeln(subject, 'is a foe')

else

writeln('we have no information about ', subject) end; { report }

begin { main program } MAKENULL(goodguys); MAKENULL(badguys); read(command);

while command < > 'E' do begin readln (legislator);

http://www.ourstillwaters.org/stillwaters/csteaching/DataStructuresAndAlgorithms/mf1204.htm (14 of 52) [1.7.2001 19:04:14]

Data Structures and Algorithms: CHAPTER 4: Basic Operations on Sets

if command = 'F' then favor(legislator)

else if command = 'U' then unfavor(legislator)

else if command = '?' then report(legislator)

else

error('unknown command'); read(command)

end

end. { tuna }

Fig. 4.8. Outline of the SPIT database program.

const

maxsize = { some suitable number }; type

DICTIONARY = record last: integer;

data: array[l..maxsize] of nametype end;

procedure MAKENULL ( var A: DICTIONARY ); begin

A.last := 0

end; { MAKENULL }

function MEMBER ( x: nametype; var A: DICTIONARY ): boolean;

var

i: integer; begin

for i := 1 to A.last do

if A.data [i] = x then return (true); return (false) { if x is not found }

end; { MEMBER }

procedure INSERT ( x: nametype; var A: DICTIONARY ); begin

if not MEMBER(x, A ) then

If A.last < maxsize then begin A.last := A.last + 1; A.data[A.last ] := x

end

else error ('database is full')

http://www.ourstillwaters.org/stillwaters/csteaching/DataStructuresAndAlgorithms/mf1204.htm (15 of 52) [1.7.2001 19:04:14]

Data Structures and Algorithms: CHAPTER 4: Basic Operations on Sets

end; { INSERT }

procedure DELETE ( x: nametype; var A: DICTIONARY ); var

i: integer; begin

if A.last > 0 then begin i := 1;

while (A.data[i] <> x) and (i <

A.last) do

i := i + 1;

{ when we reach here, either x has been found, or we are at the last element in set A, or both }

if A.data[i] = x then begin

A.data[i] := A.data[A.last];

{ move the last element into the place of x; Note that if i = A.last, this step does nothing, but the next step will delete x }

A.last := A.last - 1 end

end

end; { DELETE }

Fig. 4.9. Type and procedure declarations for array dictionary.

4.7 The Hash Table Data Structure

The array implementation of dictionaries requires, on the average, O(N) steps to execute a single INSERT, DELETE, or MEMBER instruction on a dictionary of N elements; we get a similar speed if a list implementation is used. The bit-vector implementation takes constant time to do any of these three operations, but we are limited to sets of integers in some small range for that implementation to be feasible.

There is another important and widely useful technique for implementing dictionaries called "hashing." Hashing requires constant time per operation, on the average, and there is no requirement that sets be subsets of any particular finite universal set. In the worst case, this method requires time proportional to the size of the set for each operation, just as the array and list implementations do. By careful design, however, we can make the probability of hashing requiring more than a constant time per operation be arbitrarily small.

We shall consider two somewhat different forms of hashing. One, called open or external hashing, allows the set to be stored in a potentially unlimited space, and

http://www.ourstillwaters.org/stillwaters/csteaching/DataStructuresAndAlgorithms/mf1204.htm (16 of 52) [1.7.2001 19:04:14]

Соседние файлы в предмете [НЕСОРТИРОВАННОЕ]