Добавил:
Upload Опубликованный материал нарушает ваши авторские права? Сообщите нам.
Вуз: Предмет: Файл:

Data-Structures-And-Algorithms-Alfred-V-Aho

.pdf
Скачиваний:
122
Добавлен:
09.04.2015
Размер:
6.91 Mб
Скачать

Data Structures and Algorithms: CHAPTER 4: Basic Operations on Sets

Let us also observe that DELETEMIN applied to a set of n elements takes O(logn) time. This follows since no path in the tree has more than 1 + logn nodes, and the process of forcing an element down the tree takes a constant time per node. Observe that for any constant c, the quantity c(1+logn) is at most 2clogn for n ³ 2. Thus c(1+logn) is O(logn).

Now let us consider how INSERT should work. First, we place the new element as far left as possible on the lowest level, starting a new level if the current lowest level is all filled. Fig. 4.22(a) shows the result of placing an element with priority 4 into Fig. 4.20. If the new element has priority lower than its parent, exchange it with its parent. The new element is now at a position where it is of lower priority than either of its children, but it may also be of lower priority than its parent, in which case we must exchange it with its parent, and repeat the process, until the new element is either at the root or has larger priority than its parent. Figures 4.22(b) and (c) show how 4 moves up by this process.

We could prove that the above steps result in a partially ordered tree. While we shall not attempt a rigorous proof, let us observe that an element with priority a can become the parent of an element with priority b in three ways. (In what follows, we identify an element with its priority.)

1.a is the new element and moves up the tree replacing the old parent of b. Let

the old parent of b have priority c. Then a < c, else the exchange would not take place. But c £ b, since the original tree was partially ordered. Thus a < b.

For example in Fig. 4.22(c) 4 became the parent of 6, replacing a parent of larger priority, 5.

2.a was pushed down the tree due to an exchange with the new element. In this

case a must have been an ancestor of b in the original partially ordered tree. Thus a £ b. For example, in Fig. 4.22(c), 5 becomes the

Fig. 4.21. Pushing an element down the tree.

http://www.ourstillwaters.org/stillwaters/csteaching/DataStructuresAndAlgorithms/mf1204.htm (37 of 52) [1.7.2001 19:04:14]

Data Structures and Algorithms: CHAPTER 4: Basic Operations on Sets

Fig. 4.22. Inserting an element.

parent of elements with priority 8 and 9. Originally, 5 was the parent of the former and the "grandparent" of the latter.

3.b might be the new element, and it moves up to become a child of a. If a > b, then a and b will be exchanged at the next step, so this violation of the partially ordered property will be removed.

The time to perform an insertion is proportional to the distance up the tree that the new element travels. As for DELETEMIN, we observe that this distance can be no greater than 1 + logn, so both INSERT and DELETEMIN take O(log n) steps.

An Array Implementation of Partially

Ordered Trees

The fact that the trees we have been considering are binary, balanced as much as possible, and have leaves at the lowest level pushed to the left means that we can use a rather unusual representation for these trees, called a heap. If there are n nodes, we use the first n positions of an array A. A [1] holds the root. The left child of the node in A[i], if it exists, is at A [2i], and the right child, if it exists, is at A [2i + 1]. An equivalent viewpoint is that the parent of A [i] is A [i div 2], for i > 1. Still another equivalent observation is that the nodes of a tree fill up A[1], A[2], . . . , A [n] level-by- level, from the top, and within a level, from the left. For example, Fig. 4.20 corresponds to an array containing the elements 3, 5, 9, 6, 8, 9, 10, 10, 18, 9.

We can declare a priority queue of elements of some type, say processtype as in Example 4.9, to consist of an array of processtype and an integer last, indicating the currently last element of the array that is in use. If we assume maxsize is the desired size of arrays for a priority queue, we can declare:

type

PRIORITYQUEUE = record

contents: array[1..maxsize] of processtype; last: integer

end;

The priority queue operations are implemented below in Fig. 4.23.

http://www.ourstillwaters.org/stillwaters/csteaching/DataStructuresAndAlgorithms/mf1204.htm (38 of 52) [1.7.2001 19:04:14]

Data Structures and Algorithms: CHAPTER 4: Basic Operations on Sets

procedure MAKENULL ( var A: PRIORITYQUEUE ); begin

A .last := 0

end; { MAKENULL }

procedure INSERT ( x: processtype; var A: PRIORITYQUEUE );

var

i: integer;

temp : processtype; begin

if A .last >= maxsize then error ('priority queue is full')

else begin

A .last := A .last + 1;

A .contents [A .last] := x;

i := A .last; { i is index of current position of x } while (i > 1) and

(p(A .contents[i]) < p(A .contents[i div 2])) do

begin { push x up the tree by exchanging it with its parent of larger priority. Recall p computes the priority of

a processtype element } temp := A .contents [i];

A .contents [i] := A .contents [i

div 2];

A .contents [i div 2] := temp; i := i div 2

end end

end; { INSERT }

function DELETEMIN ( var A: PRIORITYQUEUE ): − processtype; var

i, j: integer;

temp : processtype; minimum: − processtype;

begin

if A .last = 0 then

error('priority queue is empty') else begin

new (minimum);

http://www.ourstillwaters.org/stillwaters/csteaching/DataStructuresAndAlgorithms/mf1204.htm (39 of 52) [1.7.2001 19:04:15]

Data Structures and Algorithms: CHAPTER 4: Basic Operations on Sets

minimum − := A .contents

[1];

{we shall return a pointer to a copy of the root of A }

A .contents [1] := A .contents [A .last]; A .last := A .last - 1;

{move the last element to the beginning }

i:= 1; { i is the current position of the old last element } while i < = A .last div 2 do begin

{push old last element down tree }

if (p (A .contents [2*i]) < p

(A .contents [2*i + 1]))

or (2*i = A.last) then j := 2 * i

else

j := 2 * i + 1;

{j will be the child of i having the smaller priority or if 2*i = A.last, then j is the only child of i }

if p (A .contents [i]) > p (A

.contents [j]) then begin

{ exchange old last element with smaller priority child }

temp := A.contents[i]; A.contents[i] := A.contents[j]; A.contents[j] := temp;

i := j end else

return (minimum) { cannot push further }

end;

return (minimum) { pushed all the way to a leaf } end

end; { DELETEMIN }

Fig. 4.23. Array implementation of priority queue.

4.12 Some Complex Set Structures

In this section we consider two more complex uses of sets to represent data. The first problem is that of representing many-many relationships, as might occur in a database system. A second case study exhibits how a pair of data structures representing the same object (a mapping in our example) can be a more efficient representation than either acting alone.

http://www.ourstillwaters.org/stillwaters/csteaching/DataStructuresAndAlgorithms/mf1204.htm (40 of 52) [1.7.2001 19:04:15]

Data Structures and Algorithms: CHAPTER 4: Basic Operations on Sets

Many-Many Relationships and the

Multilist Structure

An example of a many-many relationship between students and courses is suggested in Fig. 4.24. It is called a "many-many" relationship because there can be many students taking each course and many courses taken by each student.

Fig. 4.24. An example relationship between students and courses.

From time to time the registrar may wish to insert or delete students from courses, to determine which students are taking a given course, or to know which courses a given student is taking. The simplest data structure with which these questions can be answered is obvious; just use the 2-dimensional array suggested by Fig. 4.24, where the value 1 (or true) replaces the X's and 0 (or false) replaces the blanks.

For example, to insert a student into a course, we need a mapping, MS, perhaps implemented as a hash table, that translates student names into array indices, and another, MC, that translates course names into array indices. Then, to insert student s into course c, we simply set

Enrollment[MS(s ), MC(c)] := 1.

Deletions are performed by setting this element to 0. To find the courses taken by the student with name s, run across the row MS(s), and similarly run down the column MC(c) find the students in course c.

Why might we wish to look further for a more appropriate data structure? Consider a large university with perhaps 1000 courses and 20,000 students, taking an average of three courses each. The array suggested by Fig. 4.24 would have 20,000,000 elements, of which 60,000, or 0.3%, would be 1. Such an array, which is called sparse to indicate that almost all its elements are zero, can be represented in far less space if we simply list the nonzero entries. Moreover, we can spend a great deal of time scanning a column of 20,000 entries searching for, on the average, 60 that are nonzero; row scans take considerable time as well.

http://www.ourstillwaters.org/stillwaters/csteaching/DataStructuresAndAlgorithms/mf1204.htm (41 of 52) [1.7.2001 19:04:15]

Data Structures and Algorithms: CHAPTER 4: Basic Operations on Sets

One way to do better is to begin by phrasing the problem as one of maintaining a collection of sets. Two of these sets are S and C, the sets of all students and all courses. Each element of S is actually a record of a type such as

type

studenttype = record id: integer;

name: array[l..30] of char; end

and we would invent a similar record type for courses. To implement the structure we have in mind, we need a third set of elements, E, standing for enrollments. The elements of E each represent one of the boxes in the array of Fig. 4.24 in which there is an X. The E elements are records of a fixed type. As of now, we don't know any fields that belong in these records, although we shall soon learn of them. For the moment, let us simply postulate that there is an enrollment record for each X-entry in the array and that enrollment records are distinguishable from one another somehow.

We also need sets that represent the answers to the crucial questions: given a student or course, what are the related courses or students, respectively. It would be nice if for each student s there were a set Cs of all the courses s was taking and conversely, a set Sc of the students taking course c.

Such sets would be hard to implement because there would be no limit on the number of sets any element could be in, forcing us to complicate student and course records. We could instead let Sc and Cs be sets of pointers to course and student records, rather than the records themselves, but there is a method that saves a significant fraction of the space and allows equally fast answers to questions about students and courses.

Let us make each set Cs be the set of enrollment records corresponding to student s and some course c. That is, if we think of an enrollment as a pair (s, c), then

Cs = {(s, c) | s is taking course c}

We can similarly define

Sc = {(s, c) | s is taking course c}

Note the only difference in the meaning of the two set formers above is that in the first case s is constant and in the second c is. For example, based on Fig. 4.24, CAlex = {(Alex, CS101), (Alex, CS202)} and SCS101 = {(Alex, CS101), (Amy, CS101), (Ann, CS101)}.

http://www.ourstillwaters.org/stillwaters/csteaching/DataStructuresAndAlgorithms/mf1204.htm (42 of 52) [1.7.2001 19:04:15]

Data Structures and Algorithms: CHAPTER 4: Basic Operations on Sets

Multilist Structures

In general, a multilist structure is any collection of cells some of which have more than one pointer and can therefore be on more than one list simultaneously. For each type of cell in a multilist structure, it is important to distinguish among pointer fields so we can follow one particular list and not get confused about which of several pointers in a particular cell we are following.

As a case in point, we can put one pointer field in each student and course record, pointing to the first enrollment record in the set Cs or Sc, respectively. Each enrollment record needs two pointer fields, one, which we shall call cnext, for the next enrollment on the list for the Cs set to which it belongs and the other, snext, for the Sc set to which it belongs.

It turns out that an enrollment record indicates explicitly neither the student nor the course that it represents. The information is implicit in the lists on which the enrollment record appears. Call the student and course records heading these lists the owners of the enrollment record. Thus, to tell what courses student s is taking, we must scan the enrollment records in Cs and find for each one the owner course record. We could do that by placing a pointer in each enrollment record to the owning course record, and we would also need a pointer to the owning student record.

While we might wish to use these pointers and thereby answer questions in the minimum possible time, we can save substantial spaceat the cost of slowing down some computations, if we eliminate these pointers and instead place at the end of each Sc list a pointer to the owning course and at the end of each Cs list a pointer to the owning student. Thus, each student and course record becomes part of a ring that includes all the enrollment records it owns. These rings are depicted in Fig. 4.25 for the data of Fig. 4.24. Note that each enrollment record has its cnext pointer first and its snext pointer second.

Fig. 4.25. Multilist representation of Fig. 4.24.

Example 4.11. To answer a question like "which students are taking CS101," we find the course record for CS101. How we find such a record depends on how the set of courses is maintained. For example, there might be a hash table containing all such

http://www.ourstillwaters.org/stillwaters/csteaching/DataStructuresAndAlgorithms/mf1204.htm (43 of 52) [1.7.2001 19:04:15]

Data Structures and Algorithms: CHAPTER 4: Basic Operations on Sets

records, and we obtain the desired record by applying some hash function to "CS101".

We follow the pointer in the CS101 record to the first enrollment record in the ring for CS101. This is the second enrollment record from the left. We must then find the student owner of this enrollment record, which we do by following cnext pointers (the first pointer in enrollment records) until we reach a student record.In this case, we reach the third enrollment record, then the student record for Alex; we now know that Alex is taking CS101.

Now we must find the next student in CS101. We do so by following the snext pointer (second pointer) in the second enrollment record, which leads to the fifth enrollment record. The cnext pointer in that record leads us directly to the owner, Amy, so Amy is in CS101. Finally, we follow the snext pointer in the fifth enrollment record to the eighth enrollment record. The ring of cnext pointers from that record leads to the ninth enrollment record, then to the student record for Ann, so Ann is in CS101. The snext pointer in the eighth enrollment record leads back to CS101, so we are done.

Abstractly, we can express the operation in Example 4.11 above as

for each enrollment record in the set for CS101 do begin s := the student owner of the enrollment record; print(s)

end

The above assignment to s can be elaborated as

f := e; repeat

f := f−.cnext until

f is a pointer to a student record;

s := studentname field of record pointed to by f;

where e is a pointer to the first enrollment record in the set for CS101.

In order to implement a structure like Fig. 4.25 in Pascal, we need to have only one record type, with variants to cover the cases of student, course, and enrollment records. Pascal forces this arrangement on us since the fields cnext and snext each have the ability to point to different record types. This arrangement, however, does solve one of our other problems. It is now easy to tell what type of record we have reached as we travel around a ring. Figure 4.26 shows a possible declaration of the

http://www.ourstillwaters.org/stillwaters/csteaching/DataStructuresAndAlgorithms/mf1204.htm (44 of 52) [1.7.2001 19:04:15]

Data Structures and Algorithms: CHAPTER 4: Basic Operations on Sets

records and a procedure that prints the students taking a particular class.

Dual Data Structures for Efficiency

Often, a seemingly simple representation problem for a set or mapping presents a difficult problem of data structure choice. Picking one data structure for the set makes certain operations easy, but others take too much time, and it seems that there is no one data structure that makes all the operations easy. In that case, the solution often turns out to be the use of two or more different structures for the same set or mapping.

Suppose we wish to maintain a "tennis ladder," in which each player is on a unique "rung." New players are added to the bottom, that is, the highestnumbered rung. A player can challenge the player on the rung above, and if the player below wins the match, they trade rungs. We can represent this situation as an abstract data type, where the underlying model is a mapping from names (character strings) to rungs (integers 1, 2, . . . ). The three operations we perform are

1.ADD(name) adds the named person to the ladder at the highest-numbered rung.

2.CHALLENGE(name) is a function that returns the name of the person on rung i- 1 if the named person is on rung i, i > 1.

3.EXCHANGE(i) swaps the names of the players on rungs i and i-l, i > l.

type

stype = array[l..20] of char; ctype = array[1..5] of char;

recordkinds = (student, course, enrollment); recordtype = record

case kind: recordkinds of

student: (studentname: stype; firstcourse: − recordtype); course: (coursename: ctype; firststudent: − recordtype);

enrollment: (cnext, snext: − recordtype)

end;

procedure printstudents ( cname: ctype ); var

http://www.ourstillwaters.org/stillwaters/csteaching/DataStructuresAndAlgorithms/mf1204.htm (45 of 52) [1.7.2001 19:04:15]

Data Structures and Algorithms: CHAPTER 4: Basic Operations on Sets

c, e, f: − recordtype; begin

c := pointer to course record with c−.coursename = cname;

{above depends on how course set is implemented } e := c −.firststudent;

{e runs around the ring of enrollments pointed to by c }

while e −.kind = enrollment do begin

f := e; repeat

f := f−.cnext until

f−.kind = student;

{ f now points to student owner of enrollment e − } writeln (f−.studentname ) ;

e := e−.snext end

end

Fig. 4.26. Implementation of search through a multilist.

Notice that we have chosen to pass only the higher rung number to EXCHANGE, while the other two operations take a name as argument.

We might, for example, choose an array LADDER, where LADDER[i] is the name of the person on rung i. If we also keep a count of the number of players, adding a player to the first unoccupied rung can be done in some small constant number of steps.

EXCHANGE is also easy, as we simply swap two elements of the array. However, CHALLENGE(name) requires that we examine the entire array for the name, which takes O(n) time, if n is the number of players on the ladder.

On the other hand, we might consider a hash table to represent the mapping from names to rungs. On the assumption that we can keep the number of buckets roughly proportional to the number of players, ADD takes O(1) time, on the average.

Challenging takes O(1) time on the average to look up the given name, but O(n) time to find the name on the next lower-numbered rung, since the entire hash table may have to be searched. Exchanging requires O(n) time to find the players on rungs i and i - 1.

Suppose, however, that we combine the two structures. The cells of the hash table

http://www.ourstillwaters.org/stillwaters/csteaching/DataStructuresAndAlgorithms/mf1204.htm (46 of 52) [1.7.2001 19:04:15]

Соседние файлы в предмете [НЕСОРТИРОВАННОЕ]