Steven S. Skiena
Who's Number 2?
In most sports playoffs, a single elimination tournament is used to decide the championship.
The Marlins were clearly the best team in the 1997 World Series, since they were the only one without a loss. But who is number 2? The Giants, Braves, and Indians all have equal claims, since only the champion beat them!
Each game can be thought of as a comparison. Given n keys, we would like to determine the k largest values. Can we do better than just sorting all of them?
In the tournament example, each team represents a leaf of the tree and each game is an internal node of the tree. Thus there are n-1 games/comparisons for n teams/leaves.
Note that the champion is identified even though no team plays more than ceil(lg n) games!
Lewis Carroll, author of ``Alice in Wonderland'', studied this problem in the 19th century in order to design better tennis tournaments!
We will seek a data structure which will enable us to repeatedly identify the largest key, and then delete it to retrieve the largest remaining key.
This data structure is called a heap, as in ``top of the heap''.
Binary Heaps
A binary heap is defined to be a binary tree with a key in each node such that:
1. All leaves are on, at most, two adjacent levels.
2. All leaves on the lowest level occur to the left, and all levels except the lowest one are completely filled.
3. The key in the root is >= all the keys in its children, and the left and right subtrees are again binary heaps.
Conditions 1 and 2 specify the shape of the tree, while condition 3 describes the labeling of the nodes of the tree.
Unlike the tournament example, each label only appears on one node.
Note that heaps are not binary search trees, but they are binary trees.
Heap Test
Where is the largest element in a heap?
Answer - the root.
Where is the second largest element? Answer - as the root's left or right child.
Where is the smallest element? Answer - it is one of the leaves.
Can we quickly search for a given key in a heap? Answer - No! A heap is not a binary search tree, and cannot be effectively used for searching.
Why Do Heaps Lean Left?
As a consequence of the structural definition of a heap, each of the n items can be assigned a number from 1 to n with the property that the left child of node number k has number 2k and the right child has number 2k+1.
Thus we can store the heap in an n element array without pointers!
If we did not enforce the left constraint, the array would have holes; for an arbitrary binary tree we might need room for up to 2^n - 1 elements just to store n things.
This implicit representation of trees saves memory but is less flexible than using pointers. For this reason, we will not be able to use them when we discuss binary search trees.
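To make the index arithmetic concrete, here is a small sketch (Python, not part of the original notes) of the implicit representation, assuming the keys sit in a 1-indexed array with slot 0 left unused:

def parent(k):
    return k // 2            # the parent of node k sits at position floor(k/2)

def left_child(k):
    return 2 * k             # left child of node k

def right_child(k):
    return 2 * k + 1         # right child of node k

# Example: a 6-key heap occupies positions 1..6 of the array.
heap = [None, 97, 83, 90, 41, 13, 57]
assert heap[left_child(1)] == 83 and heap[right_child(1)] == 90
assert heap[parent(5)] == 83     # node 5 (key 13) has parent node 2 (key 83)

Every parent/child computation is just a multiplication or division by two, which is why no pointers are needed.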
Constructing Heaps
Heaps can be constructed incrementally, by inserting new elements into the left-most open spot in the array.
If the new element is greater than its parent, swap their positions and recur.
Since at each step, we replace the root of a subtree by a larger one, we preserve the heap order.
Since all but the last level is always filled, the height h of an n element heap is bounded because 2^h <= n <= 2^(h+1) - 1, so h = floor(lg n) = O(lg n).
Doing n such insertions takes O(n lg n) time, since each insertion takes at most O(lg n) time.
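A sketch of this percolate-up insertion (Python; the name heap_insert and the 1-indexed layout are choices made here, not the notes' own code):

def heap_insert(heap, key):
    # heap is a Python list used 1-indexed (slot 0 unused).
    heap.append(key)                  # place key in the left-most open spot
    k = len(heap) - 1                 # its position in the array
    while k > 1 and heap[k // 2] < heap[k]:
        heap[k // 2], heap[k] = heap[k], heap[k // 2]   # swap with a smaller parent
        k //= 2                       # continue one level up

heap = [None]
for x in [13, 41, 97, 57, 83, 90]:
    heap_insert(heap, x)
print(heap[1])                        # 97, the maximum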
Deleting the Root
The largest (or, in a min-heap, smallest) element in the heap sits at the root.
Deleting the root can be done by replacing the root by the nth key (which must be a leaf) and letting it percolate down to its proper position!
The largest of (1) the root, (2) its left child, and (3) its right child is moved to the root. This leaves at most one of the two subtrees not in heap order, so we continue one level down.
After at most floor(lg n) steps of O(1) time each, we reach a leaf, so the deletion is completed in O(lg n) time.
This percolate-down operation is often called Heapify, for it merges two heaps with a new root.
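A sketch of the percolate-down step and the resulting root deletion (Python, same 1-indexed layout as above; heapify and extract_max are illustrative names):

def heapify(heap, k, n):
    # Percolate the key at position k down within heap[1..n] (max-heap order).
    while True:
        l, r, largest = 2 * k, 2 * k + 1, k
        if l <= n and heap[l] > heap[largest]:
            largest = l
        if r <= n and heap[r] > heap[largest]:
            largest = r
        if largest == k:              # both children are smaller: heap order holds
            return
        heap[k], heap[largest] = heap[largest], heap[k]
        k = largest                   # continue one level down

def extract_max(heap):
    # Remove and return the largest key in O(lg n) time.
    top = heap[1]
    heap[1] = heap[-1]                # replace the root by the nth (last) key
    heap.pop()
    if len(heap) > 1:
        heapify(heap, 1, len(heap) - 1)
    return top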
Heapsort
An initial heap can be constructed out of n elements by incremental insertion in O(n lg n) time:
Build-heap(A)
for i = 2 to n do
HeapInsert(A[i], A)
Exchanging the maximum element with the last element and calling heapify repeatedly gives an O(n lg n) sorting algorithm, named Heapsort.
Heapsort(A)
Build-heap(A)
for i = n to 1 do
swap(A[1],A[i])
n = n - 1
Heapify(A,1)
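The same algorithm written as a self-contained Python sketch (0-indexed, so the children of position k are 2k+1 and 2k+2; this mirrors the pseudocode above rather than being its official implementation):

def heapsort(a):
    # Sort the list a in place in O(n lg n) time using an implicit max-heap.

    def heapify(k, n):                    # percolate a[k] down within a[0..n-1]
        while True:
            l, r, largest = 2 * k + 1, 2 * k + 2, k
            if l < n and a[l] > a[largest]:
                largest = l
            if r < n and a[r] > a[largest]:
                largest = r
            if largest == k:
                return
            a[k], a[largest] = a[largest], a[k]
            k = largest

    # Build-heap by incremental insertion: bubble each new element up.
    for i in range(1, len(a)):
        k = i
        while k > 0 and a[(k - 1) // 2] < a[k]:
            a[(k - 1) // 2], a[k] = a[k], a[(k - 1) // 2]
            k = (k - 1) // 2

    # Repeatedly exchange the maximum with the last element and re-heapify.
    for n in range(len(a) - 1, 0, -1):
        a[0], a[n] = a[n], a[0]
        heapify(0, n)

data = [13, 41, 97, 57, 83, 90]
heapsort(data)
print(data)                               # [13, 41, 57, 83, 90, 97]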
Advantages of heapsort include its O(n lg n) worst-case running time and the fact that it sorts in place, using no memory beyond the array itself.
The Lesson of Heapsort
Always ask yourself, ``Can we use a different data structure?''
Selection sort scans through the entire array, repeatedly finding the smallest remaining element.
For i = 1 to n
A: Find the smallest of the first n-i+1 items.
B: Pull it out of the array and put it first.
Using arrays or unsorted linked lists as the data structure, operation A takes O(n) time and operation B takes O(1).
Using heaps, both of these operations can be done within O(lg n) time, balancing the work and achieving a better tradeoff.
Priority Queues
A priority queue is a data structure on sets of keys supporting the following operations:
Insert(S, x) - insert x into set S.
Maximum(S) - return the largest key in S.
ExtractMax(S) - return and remove the largest key in S.
These operations can be easily supported using a heap.
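For example, Python's standard heapq module supplies heap operations directly; since it maintains a min-heap, one common trick, used below, is to store negated keys to get Maximum and ExtractMax:

import heapq

S = []                          # the set, kept as a min-heap of negated keys
for x in [13, 41, 97, 57, 83, 90]:
    heapq.heappush(S, -x)       # Insert(S, x)

print(-S[0])                    # Maximum(S)    -> 97
print(-heapq.heappop(S))        # ExtractMax(S) -> 97
print(-heapq.heappop(S))        # next largest  -> 90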
Application: Heaps as stacks or queues
Both stacks and queues can be simulated by using a heap, when we add a new time field to each item and order the heap according to this time field.
This simulation is not as efficient as a normal stack/queue implementation, but it is a cute demonstration of the flexibility of a priority queue.
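A sketch of that simulation, assuming each item is tagged with an increasing timestamp: popping the minimum timestamp gives a queue (FIFO), while popping the maximum timestamp gives a stack (LIFO). With heapq's min-heap, the stack side simply negates the timestamps:

import heapq
from itertools import count

clock = count()                           # monotonically increasing "time" field

queue, stack = [], []
for item in ["a", "b", "c"]:
    t = next(clock)
    heapq.heappush(queue, (t, item))      # FIFO: smallest (oldest) time wins
    heapq.heappush(stack, (-t, item))     # LIFO: largest (newest) time wins

print([heapq.heappop(queue)[1] for _ in range(3)])   # ['a', 'b', 'c']
print([heapq.heappop(stack)[1] for _ in range(3)])   # ['c', 'b', 'a']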
Discrete Event Simulations
In simulations of airports, parking lots, and jai-alai, priority queues can be used to maintain who goes next.
In a simulation, we often need to schedule events according to a clock. When someone is born, we may then immediately decide when they will die, and we will have to be reminded when to bury them!
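A minimal event-loop sketch in this spirit (the events and time spans are made up for illustration):

import heapq

events = [(0, "birth")]                   # (time, event) pairs, earliest first

while events:
    t, what = heapq.heappop(events)       # always process the earliest pending event
    print("t=%d: %s" % (t, what))
    if what == "birth":
        heapq.heappush(events, (t + 75, "death"))   # decide the death date right away...
    elif what == "death":
        heapq.heappush(events, (t + 1, "burial"))   # ...and get reminded to bury later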
The stack and queue orders are just special cases of orderings. In real life, certain people cut in line.
Sweepline Algorithms in Computational Geometry
In the priority queue, we will store the points we have not yet encountered, ordered by x coordinate, and push the sweep line forward one step at a time.
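A toy sketch of this setup, using heapq as the priority queue and (x, y) tuples as the points, so popping always yields the next point the sweep line will hit:

import heapq

points = [(4.0, 1.0), (1.0, 5.0), (3.0, 2.0), (2.0, 7.0)]   # (x, y) pairs
frontier = list(points)
heapq.heapify(frontier)                  # not-yet-encountered points, ordered by x

while frontier:
    x, y = heapq.heappop(frontier)       # advance the sweep line to the next point
    print("sweep line stops at", (x, y))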
Greedy Algorithms
In greedy algorithms, we always pick the next thing which locally maximizes our score. By placing all the things in a priority queue and pulling them off in order, we can improve performance over linear search or sorting, particularly if the weights change.
Example: Sequential strips in triangulations.
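As a generic sketch of the greedy pattern above (not the triangulation example itself), here is a loop in the Huffman-merge style: it repeatedly pulls the two smallest weights, merges them, and re-inserts the result, exactly the situation where the weights change as we go:

import heapq

weights = [20, 5, 8, 1, 30]
heapq.heapify(weights)                   # organize all candidates up front

total_cost = 0
while len(weights) > 1:
    a = heapq.heappop(weights)           # the locally best (smallest) choice
    b = heapq.heappop(weights)
    heapq.heappush(weights, a + b)       # the merged weight re-enters the queue
    total_cost += a + b

print(total_cost)                        # 118 for this input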