
Data Structures and Algorithms: CHAPTER 4: Basic Operations on Sets

the cost of inserting the next element, but the cost of the membership test for an element in the set is the average cost of all insertions made so far, which is substantially less if the table is fairly full. Deletions have the same average cost as membership testing. But unlike open hashing, deletions from a closed hash table do not help speed up subsequent insertions or membership tests. It should be emphasized that if we never fill closed hash tables to more than any fixed fraction less than one, the average cost of operations is a constant; the constant grows as the permitted fraction of capacity grows. Figure 4.15 graphs the cost of insertions, deletions and membership tests, as a function of the percentage of the table that is full at the time the operation is performed.

Fig. 4.15. Average operation cost.
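The figure itself is not reproduced in this extraction. As a rough guide (an added note, not from the text), standard estimates for a closed table with "random" probing express these costs in terms of the load factor, and they match the statement above that a membership test costs the average of all insertions made so far:

    E[\text{probes per insertion}] \approx \frac{1}{1 - \alpha},
    \qquad
    E[\text{probes per membership test}] \approx \frac{1}{\alpha} \ln \frac{1}{1 - \alpha},
    \qquad \alpha = N/B.

Both quantities stay bounded by a constant as long as α is held below any fixed fraction less than one, but blow up as α approaches 1.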

"Random" Strategies for Collision Resolution

We have observed that the linear rehashing strategy tends to group full buckets into large consecutive blocks. Perhaps we could get more "random" behavior if we probed at a constant interval greater than one. That is, let hi(x) = (h(x) + ci) mod B for some c > 1. For example, if B = 8, c = 3, and h(x) = 4, we would probe buckets 4, 7, 2, 5, 0, 3, 6, and 1, in that order. Of course, if c and B have a common factor greater than one, this strategy doesn't even allow us to search all buckets; try B = 8 and c = 2, for example. But more significantly, even if c and B are relatively prime (have no common factors), we have the same "bunching up" problem as with linear hashing, although here it is sequences of full buckets separated by distance c that tend to occur. This phenomenon slows down operations just as for linear hashing, since an attempted insertion into a full bucket will tend to travel down a chain of full buckets separated by distance c, and the length of this chain will then increase by one.
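As a quick illustration (an added sketch, not the book's code; B, c, and the starting bucket are the values assumed in the example above, and ci means c times the probe number i), the following program prints the probe sequence. Changing c to 2 shows that only half the buckets are ever reached.

program ProbeSequence;
const
    B = 8;    { number of buckets }
    c = 3;    { probe interval; try c = 2 to see the failure }
    h = 4;    { the bucket h(x) of the example }
var
    i: integer;
begin
    for i := 0 to B - 1 do
        write((h + c * i) mod B, ' ');    { prints 4 7 2 5 0 3 6 1 }
    writeln
end.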

In fact, any rehashing strategy where the target of a probe depends only on the target of the previous probe (as opposed to depending on the number of unsuccessful probes so far, the original bucket h(x), or the value of the key x itself) will exhibit the bunching property of linear hashing. Perhaps the simplest strategy in which the problem does not occur is to let hi(x) = (h(x) + di) mod B, where d1, d2, . . . , dB-1 is a "random" permutation of the integers 1, 2, . . . , B-1. Of course, the same sequence d1, . . . , dB-1 is used for all insertions, deletions and membership tests; the "random" shuffle of the integers is decided upon once, when we design the rehash algorithm.

The generation of "random" numbers is a complicated subject, but fortunately, many common methods do produce a sequence of "random" numbers that is actually a permutation of integers from 1 up to some limit. These random number generators, if reset to their initial values for each operation on the hash table, serve to generate the desired sequence d1, . . . , dB-1.

One effective approach uses "shift register sequences." Let B be a power of 2 and k a constant between 1 and B-1. Start with some number d1 in the range 1 to B-1, and generate successive numbers in the sequence by taking the previous value, doubling it, and, if the result is B or greater, subtracting B and taking the bitwise modulo 2 sum of the result and the selected constant k. The bitwise modulo 2 sum of x and y, written x ⊕ y, is computed by writing x and y in binary, with leading 0's if necessary so both are of the same length, and forming the number whose binary representation has a 1 in those positions where one, but not both, of x and y have a 1.

Example 4.7. 25 ⊕ 13 is computed by taking

         25 = 11001
         13 = 01101
              _____
    25 ⊕ 13 = 10100

Note that this "addition" can be thought of as ordinary binary addition with carries from place to place ignored.
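In a Pascal dialect whose xor operator applies to integers (Free Pascal, for instance; standard Pascal has no such operator), the computation can be checked directly:

program XorCheck;
begin
    writeln(25 xor 13)    { prints 20, whose binary form is 10100 }
end.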

Not every value of k will produce a permutation of 1, 2, . . . , B-1; sometimes a number repeats before all are generated. However, for given B, there is a small but finite chance that any particular k will work, and we need only find one k for each B.

Example 4.8. Let B = 8. If we pick k = 3, we succeed in generating all of 1, 2, . . . , 7. For example, if we start with d1 = 5, then we compute d2 by first doubling d1 to get 10. Since 10 > 8, we subtract 8 to get 2, and then compute d2 = 2 ⊕ 3 = 1. Note that x ⊕ 3 can be computed by complementing the last two bits of x.


It is instructive to see the 3-bit binary representations of d1, d2, . . . , d7. These are shown in Fig. 4.16, along with the method of their calculation. Note that multiplication by 2 corresponds to a shift left in binary. Thus we have a hint of the origin of the term "shift register sequence."

Fig. 4.16. Calculating a shift register sequence.

The reader should check that we also generate a permutation of 1, 2, . . . ,7 if we choose k = 5, but we fail for other values of k.
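A small program makes the procedure concrete. This is an added sketch, not the book's code; it assumes a dialect with an integer xor operator, and it reproduces the sequence of Fig. 4.16 when started at d1 = 5.

program ShiftRegister;
const
    B = 8;    { must be a power of 2 }
    k = 3;    { the selected constant; k = 5 also works for B = 8 }
var
    d, i: integer;
begin
    d := 5;    { d1 may be any value in 1..B-1 }
    for i := 1 to B - 1 do begin
        write(d, ' ');    { prints 5 1 2 4 3 6 7 }
        d := 2 * d;                { shift left }
        if d >= B then
            d := (d - B) xor k     { fold back and mix in k }
    end;
    writeln
end.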

Restructuring Hash Tables

If we use an open hash table, the average time for operations increases as N/B, a quantity that grows rapidly as the number of elements exceeds the number of buckets. Similarly, for a closed hash table, we saw from Fig. 4.15 that efficiency goes down as N approaches B, and of course N cannot exceed B.

To retain the constant time per operation that is theoretically possible with hash tables, we suggest that if N gets too large, for example N ≥ .9B for a closed table or N ≥ 2B for an open one, we simply create a new hash table with twice as many buckets. The insertion of the current members of the set into the new table will, on the average, take less time than it took to insert them into the smaller table, and we more than make up this cost when doing subsequent dictionary operations.
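The restructuring step itself is straightforward for an open table. The following is an added sketch, not the book's code: the sizes, nametype, and the hash function h2 for the doubled table are illustrative assumptions, and each cell is relinked into the new bucket array rather than copied.

const
    B = 128;     { assumed old table size }
    B2 = 256;    { new size, twice B }
type
    nametype = integer;    { assumption for this sketch }
    celltype = record
        element: nametype;
        next: ^celltype
    end;
    DICTIONARY = array[0..B-1] of ^celltype;
    BIGDICTIONARY = array[0..B2-1] of ^celltype;

function h2 ( x: nametype ): integer;
begin
    h2 := x mod B2    { hypothetical hash function for the new table }
end;

procedure restructure ( var old: DICTIONARY; var big: BIGDICTIONARY );
var
    i: integer;
    current, successor: ^celltype;
begin
    for i := 0 to B2 - 1 do
        big[i] := nil;
    for i := 0 to B - 1 do begin
        current := old[i];
        while current <> nil do begin
            successor := current^.next;
            { relink the cell at the head of its new bucket }
            current^.next := big[h2(current^.element)];
            big[h2(current^.element)] := current;
            current := successor
        end;
        old[i] := nil
    end
end;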

4.9 Implementation of the Mapping ADT

Recall our discussion of the MAPPING ADT from Chapter 2 in which we defined a mapping as a function from domain elements to range elements. The operations for this ADT are:

1. MAKENULL(A) initializes the mapping A by making each domain element have no assigned range value.

2. ASSIGN(A, d, r) defines A(d) to be r.

3. COMPUTE(A, d, r) returns true and sets r to A(d) if A(d) is defined; false is returned otherwise.

The hash table is an effective way to implement a mapping. The operations ASSIGN and COMPUTE are implemented much like INSERT and MEMBER operations for a dictionary. Let us consider an open hash table first. We assume the hash function h(d) hashes domain elements to bucket numbers. While for the dictionary, buckets consisted of a linked list of elements, for the mapping we need a list of domain elements paired with their range values. That is, we replace the cell definition in Fig. 4.12 by

type
    celltype = record
        domainelement: domaintype;
        range: rangetype;
        next: ^celltype
    end;

where domaintype and rangetype are whatever types domain and range elements have in this mapping. The declaration of a MAPPING is

type
    MAPPING = array[0..B-1] of ^celltype;

This array is the bucket array for a hash table. The procedure ASSIGN is written in Fig. 4.17. The code for MAKENULL and COMPUTE is left as an exercise.
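As a hint at the exercise, here is one possible sketch of COMPUTE, written in the book's pseudo-Pascal (using its return convention); it mirrors the search loop of ASSIGN in Fig. 4.17.

function COMPUTE ( var A: MAPPING; d: domaintype;
                   var r: rangetype ): boolean;
var
    current: ^celltype;
begin
    current := A[h(d)];
    while current <> nil do begin
        if current^.domainelement = d then begin
            r := current^.range;    { A(d) is defined; report it }
            COMPUTE := true;
            return
        end;
        current := current^.next
    end;
    COMPUTE := false    { d has no assigned range value }
end;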

Similarly, we can use a closed hash table as a mapping. Define cells to consist of domain element and range fields and declare a MAPPING to be an array of cells. As for open hash tables, let the hash function apply to domain elements, not range elements. We leave the implementation of the mapping operations for a closed hash table as an exercise.
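For the closed-table exercise, a sketch of ASSIGN using linear rehashing might look as follows. Again this is an added sketch, not the book's code; the boolean field empty marks unused cells, and deletion is not handled.

type
    mapcell = record
        empty: boolean;
        domainelement: domaintype;
        range: rangetype
    end;
    CLOSEDMAPPING = array[0..B-1] of mapcell;

procedure ASSIGN ( var A: CLOSEDMAPPING; d: domaintype; r: rangetype );
var
    i, bucket: integer;
begin
    for i := 0 to B - 1 do begin
        bucket := (h(d) + i) mod B;    { linear rehash sequence }
        if A[bucket].empty then begin
            A[bucket].empty := false;
            A[bucket].domainelement := d;
            A[bucket].range := r;
            return
        end
        else if A[bucket].domainelement = d then begin
            A[bucket].range := r;    { replace old value for d }
            return
        end
    end;
    error('table is full')
end;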


procedure ASSIGN ( var A: MAPPING; d: domaintype; r: rangetype );
var
    bucket: integer;
    current: ^celltype;
begin
    bucket := h(d);
    current := A[bucket];
    while current <> nil do
        if current^.domainelement = d then begin
            current^.range := r;    { replace old value for d }
            return
        end
        else
            current := current^.next;
    { at this point, d was not found on the list }
    current := A[bucket];    { use current to remember first cell }
    new(A[bucket]);
    A[bucket]^.domainelement := d;
    A[bucket]^.range := r;
    A[bucket]^.next := current
end; { ASSIGN }

Fig. 4.17. The procedure ASSIGN for an open hash table.

4.10 Priority Queues

The priority queue is an ADT based on the set model with the operations INSERT and DELETEMIN, as well as the usual MAKENULL for initialization of the data structure. To define the new operation, DELETEMIN, we first assume that elements of the set have a "priority" function defined on them; for each element a, p(a), the priority of a, is a real number or, more generally, a member of some linearly ordered set. The operation INSERT has the usual meaning, while DELETEMIN is a function that returns some element of smallest priority and, as a side effect, deletes it from the set. Thus, as its name implies, DELETEMIN is a combination of the operations DELETE and MIN discussed earlier in the chapter.

Example 4.9. The term "priority queue" comes from the following sort of use for this ADT. The word "queue" suggests that people or entities are waiting for some service, and the word "priority" suggests that the service is given not on a "first-come-first-served" basis, as for the QUEUE ADT, but rather that each person has a priority based on the urgency of need. An example is a hospital waiting room, where patients having potentially fatal problems will be taken before any others, no matter how long the respective waits are.

As a more mundane example of the use of priority queues, a timeshared computing system needs to maintain a set of processes waiting for service. Usually, the system designers want to make short processes appear to be instantaneous (in practice, response within a second or two appears instantaneous), so these are given priority over processes that have already consumed substantial time. A process that requires several seconds of computing time cannot be made to appear instantaneous, so it is a sensible strategy to defer these until all processes that have a chance to appear instantaneous have been done. However, if we are not careful, processes that have taken substantially more time than average may never get another time slice and will wait forever.

One possible way to favor short processes, yet not lock out long ones, is to give process P the priority 100 × tused(P) - tinit(P). The parameter tused gives the amount of time consumed by the process so far, and tinit gives the time at which the process was initiated, measured from some "time zero." Note that priorities will generally be large negative integers, unless we choose to measure tinit from a time in the future. Also note that the 100 in the above formula is a "magic number"; it is selected to be somewhat larger than the largest number of processes we expect to be active at once. The reader may observe that if we always pick the process with the smallest priority number, and there are not too many short processes in the mix, then in the long run a process that does not finish quickly will receive 1% of the processor's time. If that is too much or too little, another constant can replace 100 in the priority formula.

We shall represent processes by records consisting of a process identifier and a priority number. That is, we define

type
    processtype = record
        id: integer;
        priority: integer
    end;

The priority of a process is the value of the priority field, which here we have defined to be an integer. We can define the priority function as follows.

function p ( a: processtype ): integer;
begin
    return (a.priority)
end;


In selecting processes to receive a time slice, the system maintains a priority queue WAITING of processtype elements and uses two procedures, initial and select, to manipulate the priority queue by the operations INSERT and DELETEMIN. Whenever a process is initiated, procedure initial is called to place a record for that process in WAITING. Procedure select is called when the system has a time slice to award to some process. The record for the selected process is deleted from WAITING, but retained by select for reentry into the queue with a new priority; the priority is increased by 100 times the amount of time used.

We make use of the function currenttime, which returns the current time in whatever units are used by the system, say microseconds, and we use procedure execute(P) to cause the process with identifier P to execute for one time slice. Figure 4.18 shows the procedures initial and select.

procedure initial ( P: integer );
{ initial places the process with id P on the queue }
var
    process: processtype;
begin
    process.id := P;
    process.priority := -currenttime;
    INSERT(process, WAITING)
end; { initial }

procedure select;
{ select allocates a time slice to the process with highest priority }
var
    begintime, endtime: integer;
    process: processtype;
begin
    process := DELETEMIN(WAITING)^;
    { DELETEMIN returns a pointer to the deleted element }
    begintime := currenttime;
    execute(process.id);
    endtime := currenttime;
    process.priority := process.priority + 100 * (endtime - begintime);
    { adjust priority to incorporate amount of time used }
    INSERT(process, WAITING)
    { put selected process back on queue with new priority }
end; { select }


Fig. 4.18. Allocating time to processes.

4.11 Implementations of Priority Queues

With the exception of the hash table, the set implementations we have studied so far are also appropriate for priority queues. The reason the hash table is inappropriate is that there is no convenient way to find the minimum element, so hashing merely adds complications, and does not improve performance over, say, a linked list.

If we use a linked list, we have a choice of sorting it or leaving it unsorted. If we sort the list, finding a minimum is easy -- just take the first element on the list. However, insertion requires scanning half the list on the average to maintain the sorted list. On the other hand, we could leave the list unsorted, which makes insertion easy and selection of a minimum more difficult.

Example 4.10. We shall implement DELETEMIN for an unsorted list of elements of type processtype, as defined in Example 4.9. The list is headed by an empty cell. The implementations of INSERT and MAKENULL are straightforward, and we leave the implementation using sorted lists as an exercise. Figure 4.19 gives the declaration for cells, for the type PRIORITYQUEUE, and for the procedure DELETEMIN.
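For contrast with the unsorted version of Fig. 4.19, one way the sorted-list INSERT (the exercise mentioned above) might look is sketched below; it reuses the declarations of Fig. 4.19, keeps the list in nondecreasing order of priority, and scans for the first cell whose successor has priority at least p(x). DELETEMIN would then simply unlink the first cell after the header.

procedure INSERT ( x: processtype; var A: PRIORITYQUEUE );
var
    current, newcell: ^celltype;
    located: boolean;
begin
    current := A;    { A points to the header cell }
    located := false;
    { advance until the next cell has priority at least p(x) }
    while not located do
        if current^.next = nil then
            located := true
        else if p(current^.next^.element) >= p(x) then
            located := true
        else
            current := current^.next;
    new(newcell);
    newcell^.element := x;
    newcell^.next := current^.next;
    current^.next := newcell
end;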

Partially Ordered Tree Implementation of Priority Queues

Whether we choose sorted or unsorted lists to represent priority queues, we must spend time proportional to n to implement one or the other of INSERT or DELETEMIN on sets of size n. There is another implementation in which DELETEMIN and INSERT each require O(log n) steps, a substantial improvement for large n (say n ≥ 100). The basic idea is to organize the elements of the priority queue in a binary tree that is as balanced as possible; an example is in Fig. 4.20. At the lowest level, where some leaves may be missing, we require that all missing leaves are to the right of all leaves that are present on the lowest level.

Most importantly, the tree is partially ordered, meaning that the priority of node v is no greater than the priority of the children of v, where the priority of a node is the priority number of the element stored at the node. Note from Fig. 4.20 that small priority numbers need not appear at higher levels than larger priority numbers. For example, level three has 6 and 8, which are less than the priority number 9 appearing on level 2. However, the parent of 6 and 8 has priority 5, which is, and must be, at least as small as the priorities of its children.


To execute DELETEMIN, we return the minimum-priority element, which, it is easily seen, must be at the root. However, if we simply remove the root, we no longer have a tree. To maintain the property of being a partially ordered tree, as balanced as possible, with leaves at the lowest level as far left as can be, we take the rightmost leaf at the lowest level and temporarily put it at the root. Figure 4.21(a) shows this change from Fig. 4.20. Then we push this element as far down the tree as it will go, by exchanging it with the one of its children having smaller priority, until the element is either at a leaf or at a position where it has priority no larger than either of its children.

In Fig. 4.21(a) we must exchange the root with its smaller-priority child, which has priority 5. The result is shown in Fig. 4.21(b). The element being pushed down is still larger than its children, with priorities 6 and 8. We exchange it with the smaller of these, 6, to reach the tree of Fig. 4.21(c). This tree has the partially ordered property.
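The pushing-down step can be sketched concretely if we assume (as is common for such balanced trees, though not established in this excerpt) that the n elements sit in an array with the children of position i at positions 2i and 2i + 1. The capacity and type names below are assumptions of the sketch.

const
    maxsize = 100;    { assumed capacity }
type
    POTREE = array[1..maxsize] of processtype;

procedure pushdown ( var A: POTREE; n: integer );
{ push the element at the root down until the
  partially ordered property is restored }
var
    i, child: integer;
    temp: processtype;
    done: boolean;
begin
    i := 1;    { start at the root }
    done := false;
    while (2 * i <= n) and not done do begin
        child := 2 * i;
        if child < n then
            if p(A[child + 1]) < p(A[child]) then
                child := child + 1;    { pick the smaller-priority child }
        if p(A[child]) < p(A[i]) then begin
            temp := A[i];    { exchange parent and child }
            A[i] := A[child];
            A[child] := temp;
            i := child
        end
        else
            done := true    { no violation below this point }
    end
end;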

In this percolation, if a node v holds an element with priority a, its children hold elements with priorities b and c, and at least one of b and c is smaller than a, then exchanging a with the smaller of b and c results in v holding an element whose priority is no greater than that of either of its children. To prove this, suppose b ≤ c.

type
    celltype = record
        element: processtype;
        next: ^celltype
    end;

    PRIORITYQUEUE = ^celltype;
    { cell pointed to is a list header }

function DELETEMIN ( var A: PRIORITYQUEUE ): ^celltype;
var
    current: ^celltype;      { cell before the one being "scanned" }
    lowpriority: integer;    { smallest priority found so far }
    prewinner: ^celltype;    { cell before the one holding the element
                               of smallest priority }
begin
    if A^.next = nil then
        error('cannot find minimum of empty list')
    else begin
        lowpriority := p(A^.next^.element);
        { p returns priority of the first element.  Note A points
          to a header cell that does not contain an element }
        prewinner := A;
        current := A^.next;
        while current^.next <> nil do begin
            { compare priorities of current winner and next element }
            if p(current^.next^.element) < lowpriority then begin
                prewinner := current;
                lowpriority := p(current^.next^.element)
            end;
            current := current^.next
        end;
        DELETEMIN := prewinner^.next;    { return pointer to the winner }
        prewinner^.next := prewinner^.next^.next
        { remove winner from list }
    end
end; { DELETEMIN }

Fig. 4.19. Linked list priority queue implementation.

After the exchange, node v acquires b, and its children hold a and c. We assumed b ≤ c, and we were told that a is larger than at least one of b and c. Therefore b ≤ a surely holds. Thus the pushing-down procedure outlined above percolates an element down the tree until there are no more violations of the partially ordered property.

Fig. 4.20. A partially ordered tree.
