In Spring 1996, I taught my Analysis of Algorithms course via EngiNet, the SUNY Stony Brook distance learning program. Each of my lectures that semester was videotaped, and the tapes made available to off-site students. I found it an enjoyable experience.
As an experiment in using the Internet for distance learning, we have digitized the complete audio of all 23 lectures, and have made this available on the WWW. We partitioned the full audio track into sound clips, each corresponding to one page of lecture notes, and linked them to the associated text and images.
In a real sense, listening to all the audio is analogous to sitting through a one-semester college course on algorithms! Properly compressed, the full semester's audio requires less than 300 megabytes of storage, which is much less than I would have imagined. The entire semester's lectures, over thirty hours of audio files, fit comfortably on The Algorithm Design Manual CD-ROM, which also includes a hypertext version of the book and a substantial amount of software.
Lecture Schedule
subject | topics | reading |
Preliminaries | Analyzing algorithms | 1-32 |
" | Asymptotic notation | 32-37 |
" | Recurrence relations | 53-64 |
Sorting | Heapsort | 140-150 |
" | Quicksort | 153-167 |
" | Linear Sorting | 172-182 |
Searching | Data structures | 200-215 |
" | Binary search trees | 244-245 |
" | Red-Black trees:insertion | 262-272 |
`` | Red-Black trees:deletion | 272-277 |
MIDTERM 1 | ||
Comb. Search | Backtracking | |
" | Elements of dynamic programming | 301-314 |
" | Examples of dynamic programming | 314-323 |
Graph Algorithms | Data structures for graphs | 465-477 |
" | Breadth/depth-first search | 477-483 |
" | Topological Sort/Connectivity | 485-493 |
" | Minimum Spanning Trees | 498-510 |
" | Single-source shortest paths | 514-532 |
" | All-pairs shortest paths | 550-563 |
MIDTERM 2 | ||
Intractability | P and NP | 916-928 |
" | NP-completeness | 929-939 |
" | NP-completeness proofs | 939-951 |
" | Further reductions | 951-960 |
" | Approximation algorithms | 964-974 |
" | Set cover / knapsack heuristics | 974-983 |
FINAL EXAM |
What Is An Algorithm?
Algorithms are the ideas behind computer programs.
An algorithm is the thing which stays the same whether the program is in Pascal running on a Cray in New York or is in BASIC running on a Macintosh in Kathmandu!
To be interesting, an algorithm has to solve a general, specified problem. An algorithmic problem is specified by describing the set of instances it must work on and what desired properties the output must have.
Example: Sorting
Input: a sequence of n numbers a_1, a_2, ..., a_n.
Output: the permutation (reordering) a'_1, a'_2, ..., a'_n of the input sequence such that a'_1 ≤ a'_2 ≤ ... ≤ a'_n.
We seek algorithms which are correct and efficient.
Correctness
For sorting, this means the algorithm must work correctly even if (1) the input is already sorted, or (2) it contains repeated elements.
Correctness is Not Obvious!
The following problem arises often in manufacturing, transportation, and testing applications.
Suppose you have a robot arm equipped with a tool, say a soldering iron. To enable the robot arm to do a soldering job, we must construct an ordering of the contact points, so the robot visits (and solders) the first contact point, then visits the second point, third, and so forth until the job is done.
Since robots are expensive, we need to find the order which minimizes the time (i.e., travel distance) it takes to assemble the circuit board.
Nearest Neighbor Tour
A very popular solution starts at some point p_0, walks to its nearest neighbor p_1 first, then repeats from p_1, and so on until done.
Pick and visit an initial point p_0
i = 0
While there are still unvisited points
    i = i + 1
    Let p_i be the closest unvisited point to p_{i-1}
    Visit p_i
Return to p_0 from p_{n-1}
This algorithm is simple to understand and implement and very efficient. However, it is not correct!
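A Python sketch of this heuristic (the lecture gives only pseudocode; the function name and the collinear test points below are my own):

```python
# Sketch of the nearest-neighbor heuristic; names are assumptions,
# not from the lecture.
from math import dist

def nearest_neighbor_tour(points):
    """Greedily visit the closest unvisited point at each step."""
    unvisited = set(range(1, len(points)))
    tour = [0]                     # pick points[0] as the initial point
    while unvisited:
        last = tour[-1]
        nxt = min(unvisited, key=lambda j: dist(points[last], points[j]))
        unvisited.remove(nxt)
        tour.append(nxt)
    return tour

# Points on a line: starting from 0, the heuristic zigzags back and
# forth across the origin instead of sweeping left to right.
pts = [(0, 0), (1, 0), (-1, 0), (3, 0), (-5, 0), (11, 0), (-21, 0)]
print(nearest_neighbor_tour(pts))
```

Running it on the collinear points shows the failure: the tour repeatedly crosses itself, so the total distance is far from optimal.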
Closest Pair Tour
Always walking to the closest point is too restrictive, since that point might trap us into making moves we don't want.
Another idea would be to repeatedly connect the closest pair of points whose connection will not cause a cycle or a three-way branch to be formed, until we have a single chain with all the points in it.
Let n be the number of points in the set
For i = 1 to n-1 do
    d = ∞
    For each pair of endpoints (x,y) of partial paths
        If dist(x,y) ≤ d then x_m = x, y_m = y, d = dist(x,y)
    Connect (x_m, y_m) by an edge
Connect the two remaining endpoints by an edge.
Although it works correctly on the previous example, other data causes trouble:
A Correct Algorithm
We could try all possible orderings of the points, then select the ordering which minimizes the total length:
d = ∞
For each of the n! permutations Π_i of the n points
    If cost(Π_i) ≤ d then d = cost(Π_i) and P_min = Π_i
Return P_min
Since all possible orderings are considered, we are guaranteed to end up with the shortest possible tour.
Because it tries all n! permutations, it is extremely slow, much too slow to use when there are more than 10-20 points.
No efficient, correct algorithm exists for the traveling salesman problem, as we will see later.
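The exhaustive approach above fits in a few lines of Python (names are mine); it is guaranteed correct but unusable beyond a dozen or so points:

```python
# Brute-force tour sketch: try all n! orderings (my own formulation).
from itertools import permutations
from math import dist

def tour_length(order, points):
    """Total length of the path visiting points in the given order."""
    return sum(dist(points[order[i]], points[order[i + 1]])
               for i in range(len(order) - 1))

def optimal_tour(points):
    """Try all n! orderings; guaranteed correct, hopelessly slow."""
    return min(permutations(range(len(points))),
               key=lambda order: tour_length(order, points))

pts = [(0, 0), (1, 0), (-1, 0), (3, 0)]
best = optimal_tour(pts)
print(best, tour_length(best, pts))
```

With 4 points this checks 24 orderings; with 20 points it would check 20! ≈ 2.4 × 10^18, which is why the method is only of theoretical interest.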
Efficiency
"Why not just use a supercomputer?"
Supercomputers are for people too rich and too stupid to design efficient algorithms!
A faster algorithm running on a slower computer will always win for sufficiently large instances, as we shall see.
Usually, problems don't have to get that large before the faster algorithm wins.
Expressing Algorithms
In order of increasing precision, we have English, pseudocode, and real programming languages. Unfortunately, ease of expression moves in the reverse order.
I prefer to describe the ideas of an algorithm in English, moving to pseudocode to clarify sufficiently tricky details of the algorithm.
The RAM Model
Algorithms are the only important, durable, and original part of computer science because they can be studied in a machine and language independent way.
The reason is that we will do all our design and analysis for the RAM model of computation:
We measure the run time of an algorithm by counting the number of steps.
This model is useful and accurate in the same sense as the flat-earth model (which is useful)!
Best, Worst, and Average-Case
The best-case complexity of the algorithm is the function defined by the minimum number of steps taken on any instance of size n.
The worst-case complexity of the algorithm is the function defined by the maximum number of steps taken on any instance of size n.
The average-case complexity of the algorithm is the function defined by the average number of steps taken over all instances of size n.
Each of these complexities defines a numerical function - time vs. size!
Insertion Sort
One way to sort an array of n elements is to start with empty list, then successively insert new elements in the proper position:
At each stage, the insertion leaves a sorted list, and after n insertions the list contains exactly the right elements. Thus the algorithm must be correct.
But how efficient is it?
Note that the run time changes with the permutation instance! (even for a fixed size problem)
How does insertion sort do on sorted permutations?
How about unsorted permutations?
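Since the analysis that follows counts line-by-line executions, it helps to have the algorithm concrete; here is a runnable Python version (0-indexed, otherwise the same as the lecture's pseudocode):

```python
def insertion_sort(a):
    """Sort list a in place by inserting a[j] into the sorted prefix a[0..j-1]."""
    for j in range(1, len(a)):
        key = a[j]
        i = j - 1
        while i >= 0 and a[i] > key:   # slide larger elements one slot right
            a[i + 1] = a[i]
            i -= 1
        a[i + 1] = key
    return a

print(insertion_sort([5, 2, 4, 6, 1, 3]))
```

On sorted input the while-loop test fails immediately every time; on reverse-sorted input every prefix element slides, which is exactly the best/worst-case gap analyzed below.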
Exact Analysis of Insertion Sort
Count the number of times each line of pseudocode will be executed.
Line | InsertionSort(A) | #Inst. | #Exec. |
1 | for j := 2 to length of A do | c1 | n |
2 | key := A[j] | c2 | n-1 |
3 | /* put A[j] into sorted A[1..j-1] */ | c3 = 0 | - |
4 | i := j-1 | c4 | n-1 |
5 | while i > 0 and A[i] > key do | c5 | Σ_{j=2}^{n} t_j |
6 | A[i+1] := A[i] | c6 | Σ_{j=2}^{n} (t_j - 1) |
7 | i := i-1 | c7 | Σ_{j=2}^{n} (t_j - 1) |
8 | A[i+1] := key | c8 | n-1 |
Within the for statement, "key:=A[j]" is executed n-1 times.
Steps 5, 6, 7 are harder to count.
Let t_j be the number of times the while-loop test in step 5 is executed for a given j, i.e. one more than the number of elements that have to be slid right to insert the jth item.
Step 5 is executed Σ_{j=2}^{n} t_j times.
Steps 6 and 7 are each executed Σ_{j=2}^{n} (t_j - 1) times.
Add up the executed instructions for all pseudocode lines to get the run-time of the algorithm:
T(n) = c1·n + (c2 + c4 + c8)(n-1) + c5·Σ_{j=2}^{n} t_j + (c6 + c7)·Σ_{j=2}^{n} (t_j - 1)
What are the t_j? They depend on the particular input.
Best Case
If the array is already sorted, the while-loop test fails immediately, so every t_j = 1. Hence, the best case time is
T(n) = C·n + D
where C and D are constants.
Worst Case
If the array is in reverse sorted order, the jth item must slide past all j-1 earlier items, so t_j = j and the worst case time is Θ(n²).
Problem 1.2-6: How can we modify almost any algorithm to have a good best-case running time?
For sorting, we can check if the values are already ordered, and if so output them. For the traveling salesman, we can check if the points lie on a line, and if so output the points in that order.
The supercomputer people pull this trick on the linpack benchmarks!
Because the average running time is usually very hard to compute, since we must somehow average over all the instances, we usually strive to analyze the worst-case running time instead.
The worst case is usually fairly easy to analyze and often close to the average or real running time.
Exact Analysis is Hard!
We have agreed that the best, worst, and average case complexity of an algorithm is a numerical function of the size of the instances.
Thus it is usually cleaner and easier to talk about upper and lower bounds of the function.
This is where the dreaded big O notation comes in!
Since running our algorithm on a machine which is twice as fast will affect the running time by a multiplicative constant of 2, we are going to have to ignore constant factors anyway.
Names of Bounding Functions
Now that we have clearly defined the complexity functions we are talking about, we can talk about upper and lower bounds on it:
Got it? C, c_1, and c_2 are all constants independent of n.
All of these definitions imply a constant beyond which they are satisfied. We do not care about small values of n.
O, Ω, and Θ
(a) f(n) = Θ(g(n)) if there exist positive constants c_1, c_2, and n_0 such that to the right of n_0, the value of f(n) always lies between c_1·g(n) and c_2·g(n) inclusive.
(b) f(n) = O(g(n)) if there are positive constants n_0 and c such that to the right of n_0, the value of f(n) always lies on or below c·g(n).
(c) f(n) = Ω(g(n)) if there are positive constants n_0 and c such that to the right of n_0, the value of f(n) always lies on or above c·g(n).
Asymptotic notation is about as precise as we can practically be in dealing with complexity functions.
What does all this mean?
Think of the equality f(n) = O(g(n)) as meaning f(n) is in the set of functions O(g(n)).
Note that time complexity is every bit as well defined a function as your bank account balance as a function of time.
Testing Dominance
f(n) dominates g(n) if lim_{n→∞} g(n)/f(n) = 0, which is the same as saying g(n) = o(f(n)).
Note the little-oh - it means ``grows strictly slower than''.
Knowing the dominance relation between common functions is important because we want algorithms whose time complexity is as low as possible in the hierarchy. If f(n) dominates g(n), f is much larger (i.e., slower) than g.
n^a dominates n^b if a > b, since lim_{n→∞} n^b / n^a = n^{b-a} → 0.
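A quick numeric illustration (the functions are my own choice): if f(n) = n³ and g(n) = n², the ratio g(n)/f(n) tends to 0, so n³ dominates n²:

```python
# Watching g(n)/f(n) -> 0 when f dominates g (here the ratio is 1/n).
f = lambda n: n ** 3
g = lambda n: n ** 2
for n in (10, 100, 1000, 10000):
    print(n, g(n) / f(n))   # 0.1, 0.01, 0.001, 0.0001
```

The same experiment with f(n) = 2^n against any polynomial shows the ratio collapsing far faster.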
Complexity | 10 | 20 | 30 | 40 | 50 | 60 |
n | 0.00001 sec | 0.00002 sec | 0.00003 sec | 0.00004 sec | 0.00005 sec | 0.00006 sec |
n² | 0.0001 sec | 0.0004 sec | 0.0009 sec | 0.0016 sec | 0.0025 sec | 0.0036 sec |
n³ | 0.001 sec | 0.008 sec | 0.027 sec | 0.064 sec | 0.125 sec | 0.216 sec |
n⁵ | 0.1 sec | 3.2 sec | 24.3 sec | 1.7 min | 5.2 min | 13.0 min |
2ⁿ | 0.001 sec | 1.0 sec | 17.9 min | 12.7 days | 35.7 years | 366 cent |
3ⁿ | 0.059 sec | 58 min | 6.5 years | 3855 cent | 2×10⁸ cent | 1.3×10¹³ cent |
Logarithms
It is important to understand deep in your bones what logarithms are and where they come from.
A logarithm is simply an inverse exponential function. Saying b^x = y is equivalent to saying that x = log_b y.
Exponential functions, like the amount owed on an n year mortgage at a fixed interest rate per year, are functions which grow distressingly fast, as anyone who has tried to pay off a mortgage knows.
Thus inverse exponential functions, i.e. logarithms, grow refreshingly slowly.
Binary search is an example of an O(log n) algorithm. After each comparison, we can throw away half of the remaining keys. Thus twenty comparisons suffice to find any name in the million-name Manhattan phone book!
If you have an algorithm which runs in O(log n) time, take it, because this is blindingly fast even on very large instances.
Properties of Logarithms
Recall the definition: log_b y = x is equivalent to b^x = y.
Asymptotically, the base of the log does not matter:
log_b n = log_a n / log_a b
Thus, log_2 n = log_100 n / log_100 2, and note that 1/log_100 2 is just a constant.
Asymptotically, any polynomial function of n inside a logarithm does not matter:
log(n^c) = c · log n = Θ(log n), since c is just a constant.
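The base-change identity can be checked numerically (a sketch; Python's `math.log` takes an optional base argument):

```python
# log_2(n) computed directly vs. via base 100: they differ only by
# the constant factor 1 / log_100(2).
from math import log

n = 1_000_000
print(log(n, 2), log(n, 100) / log(2, 100))   # both print the same value
```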
Any exponential dominates every polynomial. This is why we will seek to avoid exponential time algorithms.
Federal Sentencing Guidelines
2F1.1. Fraud and Deceit; Forgery; Offenses Involving Altered or Counterfeit Instruments other than Counterfeit Bearer Obligations of the United States.
(a) Base offense Level: 6
(b) Specific offense Characteristics
(1) If the loss exceeded $2,000, increase the offense level as follows:
Loss(Apply the Greatest) | Increase in Level |
(A) $2,000 or less | no increase |
(B) More than $2,000 | add 1 |
(C) More than $5,000 | add 2 |
(D) More than $10,000 | add 3 |
(E) More than $20,000 | add 4 |
(F) More than $40,000 | add 5 |
(G) More than $70,000 | add 6 |
(H) More than $120,000 | add 7 |
(I) More than $200,000 | add 8 |
(J) More than $350,000 | add 9 |
(K) More than $500,000 | add 10 |
(L) More than $800,000 | add 11 |
(M) More than $1,500,000 | add 12 |
(N) More than $2,500,000 | add 13 |
(O) More than $5,000,000 | add 14 |
(P) More than $10,000,000 | add 15 |
(Q) More than $20,000,000 | add 16 |
(R) More than $40,000,000 | add 17 |
(S) More than $80,000,000 | add 18 |
The federal sentencing guidelines are designed to help judges be consistent in assigning punishment. The time-to-serve is a roughly linear function of the total level.
However, notice that the increase in level grows only logarithmically in the amount of money stolen.
This very slow growth means it pays to commit one crime stealing a lot of money, rather than many small crimes adding up to the same amount, because the time served if you get caught is much less.
The Moral: ``if you are gonna do the crime, make it worth the time!''
Working with the Asymptotic Notation
Suppose f(n) = O(n) and g(n) = O(n).
What do we know about g'(n) = f(n)+g(n)? Adding the bounding constants shows g'(n) = O(n).
What do we know about g''(n) = f(n)-g(n)? Since the bounding constants don't necessarily cancel, we can only say g''(n) = O(n).
We know nothing about the lower bounds on g' and g'', because we know nothing about lower bounds on f and g.
Now suppose f(n) = Ω(n) and g(n) = Ω(n). What do we know about g'(n) = f(n)+g(n)? Adding the lower bounding constants shows g'(n) = Ω(n).
What do we know about g''(n) = f(n)-g(n)? We know nothing about the lower bound of this!
The Complexity of Songs
Suppose we want to sing a song which lasts for n units of time. Since n can be large, we want to memorize songs which require only a small amount of brain space, i.e. memory.
Let S(n) be the space complexity of a song which lasts for n units of time.
The amount of space we need to store a song can be measured in either the words or characters needed to memorize it. Note that the number of characters is Θ(the number of words), since every word in a song is at most 34 letters long - Supercalifragilisticexpialidocious!
What bounds can we establish on S(n)?
The Refrain
Most popular songs have a refrain, which is a block of text which gets repeated after each stanza in the song:
Bye, bye Miss American pie
Drove my chevy to the levy but the levy was dry
Them good old boys were drinking whiskey and rye
Singing this will be the day that I die.
Refrains make a song easier to remember, since you memorize the refrain once yet sing it O(n) times. But do they reduce the space complexity?
Not according to the big oh: if the refrain is sung once per verse, the text to memorize is only halved (when verse-size = refrain-size), so the space complexity is still O(n).
The k Days of Christmas
To reduce S(n), we must structure the song differently.
Consider ``The k Days of Christmas''. All one must memorize is:
On the kth Day of Christmas, my true love gave to me,
On the First Day of Christmas, my true love gave to me,
a partridge in a pear tree
But the time it takes to sing it is
Σ_{i=1}^{k} i = k(k+1)/2 = Θ(k²)
If n = Θ(k²), then k = Θ(√n), so S(n) = O(√n).
100 Bottles of Beer
What do kids sing on really long car trips?
n bottles of beer on the wall,
n bottles of beer.
You take one down and pass it around
n-1 bottles of beer on the wall.
All you must remember in this song is this template of constant size, plus the current value of n. The storage size for n depends on its value, but O(log n) bits suffice.
Thus for this song, S(n) = O(log n).
Finally, a song of constant size can be repeated forever, giving S(n) = O(1):
That's the way, uh-huh, uh-huh
I like it, uh-huh, uh-huh
Reference: D. Knuth, `The Complexity of Songs', Comm. ACM, April 1984, pp.18-24
Problem 2.1-2: Show that for any real constants a and b, b > 0, (n + a)^b = Θ(n^b).
Note the need for absolute values.
Problem 2.1-4:
(a) Is 2^{n+1} = O(2^n)?
(b) Is 2^{2n} = O(2^n)?
(a) Is 2^{n+1} = O(2^n)?
Yes, since 2^{n+1} = 2 · 2^n ≤ c · 2^n for all n whenever c ≥ 2.
(b) Is 2^{2n} = O(2^n)?
Note 2^{2n} = 2^n · 2^n.
Is 2^n · 2^n ≤ c · 2^n?
Is 2^n ≤ c?
No! Certainly for any constant c we can find an n such that 2^n > c, so this is not true.
Recurrence Relations
Many algorithms, particularly divide and conquer algorithms, have time complexities which are naturally modeled by recurrence relations.
A recurrence relation is an equation which is defined in terms of itself.
Why are recurrences good things?
Recursion is Mathematical Induction!
In both, we have general and boundary conditions, with the general condition breaking the problem into smaller and smaller pieces.
The initial or boundary conditions terminate the recursion.
As we will see, induction provides a useful tool to solve recurrences - guess a solution and prove it by induction.
Consider the recurrence T(n) = 2T(n-1) + 1, T(0) = 0:
n | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
T(n) | 0 | 1 | 3 | 7 | 15 | 31 | 63 | 127 |
Prove T(n) = 2^n - 1 by induction:
The basis is true, since T(0) = 2^0 - 1 = 0. Now assume it is true for T(n-1). Then
T(n) = 2T(n-1) + 1 = 2(2^{n-1} - 1) + 1 = 2^n - 1
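The induction can also be sanity-checked mechanically. A small Python sketch (the recurrence T(n) = 2T(n-1) + 1 with T(0) = 0 is inferred from the table's values):

```python
def T(n):
    """T(0) = 0, T(n) = 2*T(n-1) + 1."""
    return 0 if n == 0 else 2 * T(n - 1) + 1

# The closed form 2^n - 1 matches the recurrence for every table entry.
for n in range(8):
    assert T(n) == 2 ** n - 1
print([T(n) for n in range(8)])   # [0, 1, 3, 7, 15, 31, 63, 127]
```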
Solving Recurrences
No general procedure for solving recurrence relations is known, which is why it is an art. My approach is:
Realize that linear, finite history, constant coefficient recurrences can always be solved mechanically.
Consider the recurrence a_n = 2·a_{n-1} + a_{n-2}.
It has history = 2, degree = 1, and coefficients of 2 and 1. Thus it can be solved mechanically, via its characteristic equation. Proceed:
Systems like Mathematica and Maple have packages for doing this.
Guess a solution and prove by induction
To guess the solution, play around with small values for insight.
Note that you can do inductive proofs with big-O notation - just be sure you use it correctly.
Example: T(n) = 2T(⌊n/2⌋) + n.
Show that T(n) ≤ c·n·lg n for large enough c and n. Assume that it is true for n/2, then
T(n) ≤ 2c⌊n/2⌋ lg⌊n/2⌋ + n ≤ c·n·(lg n - 1) + n = c·n·lg n - c·n + n ≤ c·n·lg n
whenever c ≥ 1. Starting with basis cases T(2) = 4, T(3) = 5, lets us complete the proof for c ≥ 2.
Try backsubstituting until you know what is going on
Also known as the iteration method. Plug the recurrence back into itself until you see a pattern.
Example: T(n) = 2T(n/2) + 1, T(1) = 1.
Try backsubstituting:
T(n) = 1 + 2T(n/2) = 1 + 2 + 4T(n/4) = 1 + 2 + 4 + 8T(n/8) = ...
The 2^k · T(n/2^k) term should now be obvious.
Although there are only lg n terms before we get to T(1), it doesn't hurt to sum them all, since this is a fast growing geometric series:
T(n) = Σ_{k=0}^{lg n} 2^k = 2n - 1 = Θ(n)
Recursion Trees
Drawing a picture of the backsubstitution process gives you a good idea of what is going on.
We must keep track of two things - (1) the size of the remaining argument to the recurrence, and (2) the additive stuff to be accumulated during this call.
Example: T(n) = 2T(n/4) + n.
Although this tree has height log_4 n, the total sum at each level decreases geometrically (n, n/2, n/4, ...), so:
T(n) ≤ n · Σ_{i=0}^{∞} (1/2)^i = 2n = Θ(n)
The recursion tree framework made this much easier to see than with algebraic backsubstitution.
See if you can use the Master theorem to provide an instant asymptotic solution
The Master Theorem: Let a ≥ 1 and b > 1 be constants, let f(n) be a function, and let T(n) be defined on the nonnegative integers by the recurrence
T(n) = a·T(n/b) + f(n)
where we interpret n/b as ⌊n/b⌋ or ⌈n/b⌉. Then T(n) can be bounded asymptotically as follows:
1. If f(n) = O(n^{log_b a - ε}) for some constant ε > 0, then T(n) = Θ(n^{log_b a}).
2. If f(n) = Θ(n^{log_b a}), then T(n) = Θ(n^{log_b a} · lg n).
3. If f(n) = Ω(n^{log_b a + ε}) for some constant ε > 0, and if a·f(n/b) ≤ c·f(n) for some constant c < 1 and all sufficiently large n, then T(n) = Θ(f(n)).
Examples of the Master Theorem
Which case of the Master Theorem applies?
T(n) = 4T(n/2) + n
Reading from the equation, a = 4, b = 2, and f(n) = n, so n^{log_b a} = n².
Is n = O(n^{2-ε})?
Yes, for any ε ≤ 1, so case 1 applies and T(n) = Θ(n²).
T(n) = 4T(n/2) + n²
Reading from the equation, a = 4, b = 2, and f(n) = n².
Is n² = O(n^{2-ε})?
No, not for any ε > 0, but n² = Θ(n²), so case 2 applies and T(n) = Θ(n² lg n).
T(n) = 4T(n/2) + n³
Reading from the equation, a = 4, b = 2, and f(n) = n³.
Is n³ = Ω(n^{2+ε})?
Yes, for any ε ≤ 1, so case 3 might apply.
Is 4·(n/2)³ ≤ c·n³?
Yes, for c = 1/2, so there exists a c < 1 to satisfy the regularity condition, so case 3 applies and T(n) = Θ(n³).
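Case 1 can also be watched numerically (my own illustration, using the first example's parameters a = 4, b = 2, f(n) = n):

```python
# T(n) = 4*T(n/2) + n should be Theta(n^2) by case 1; the ratio
# T(n)/n^2 should settle near a constant (here it approaches 2).
from functools import lru_cache

@lru_cache(maxsize=None)
def T(n):
    return 1 if n <= 1 else 4 * T(n // 2) + n

for n in (2 ** 6, 2 ** 10, 2 ** 14):
    print(n, T(n) / n ** 2)   # ratios creep up toward 2
```

For powers of two one can check by hand that T(2^k) = 2·4^k - 2^k, so the ratio is exactly 2 - 2^{-k}.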
Why should the Master Theorem be true?
Consider T(n) = a T(n/b) + f(n).
Suppose f(n) is small enough
Then we have a recursion tree where the only contribution is at the leaves.
There will be log_b n levels, with a^l nodes at level l. Thus there are a^{log_b n} = n^{log_b a} leaves, and T(n) = Θ(n^{log_b a}).
Suppose f(n) is large enough
If f(n) grows polynomially faster than n^{log_b a}, the root's cost f(n) dominates the geometrically decreasing level sums. In fact this holds unless f(n) fails the regularity condition!
In case 3 of the Master Theorem, the additive term dominates.
In case 2, both parts contribute equally, which is why the log pops up. It is (usually) what we want to have happen in a divide and conquer algorithm.
Famous Algorithms and their Recurrence
Matrix Multiplication
The divide-and-conquer algorithm multiplies two n×n matrices using eight multiplications of n/2 × n/2 matrices, giving T(n) = 8T(n/2) + O(n²).
Since n^{log_2 8} = n³ dwarfs n², case 1 of the Master Theorem applies and T(n) = Θ(n³).
This has been ``improved'' by more and more complicated recurrences until the current best is O(n^{2.376}).
Polygon Triangulation
The simplest algorithm might be to try each pair of points and check if they see each other. If so, add the diagonal and recur on both halves, for a total of O(n³) work.
However, Chazelle gave a divide-and-conquer algorithm whose merge cost is small enough that, by case 1 of the Master Theorem, the total time is linear, i.e. T(n) = O(n).
Sorting
Mergesort satisfies the recurrence T(n) = 2T(n/2) + Θ(n). Since Θ(n) = Θ(n^{log_2 2}) but n is not O(n^{1-ε}), Case 2 of the Master Theorem applies and T(n) = Θ(n lg n).
In case 2, the divide and merge steps balance out perfectly, as we usually hope for from a divide-and-conquer algorithm.
Approaches to Algorithms Design
Incremental
A good example of this approach is insertion sort
Divide-and-Conquer
A good example of this approach is Mergesort.
4.2-2 Argue that the solution to the recurrence
T(n) = T(n/3) + T(2n/3) + n
is Ω(n lg n) by appealing to the recursion tree.
The shortest path to a leaf follows the n/3 branch each time. Its height k is given by n·(1/3)^k = 1, meaning 3^k = n, or k = log_3 n.
The longest path to a leaf follows the 2n/3 branch each time. Its height k is given by n·(2/3)^k = 1, meaning (3/2)^k = n, or k = log_{3/2} n.
The problem asks us to show that T(n) = Ω(n lg n), meaning we are looking for a lower bound.
On any full level, the additive terms sum to n. There are log_3 n full levels. Thus T(n) ≥ n·log_3 n = Ω(n lg n).
4.2-4 Use iteration to solve T(n) = T(n-a) + T(a) + n, where a ≥ 1 is a constant.
Why don't CS profs ever stop talking about sorting?!
You should have seen most of the algorithms - we will concentrate on the analysis.
Applications of Sorting
One reason why sorting is so important is that once a set of items is sorted, many other problems become easy.
Searching
Speeding up searching is perhaps the most important application of sorting.
Closest pair
Once the numbers are sorted, the closest pair will be next to each other in sorted order, so an O(n) linear scan completes the job.
Element uniqueness
Sort them and do a linear scan to check all adjacent pairs.
This is a special case of closest pair above.
Frequency distribution - Mode
Sort them and do a linear scan to measure the length of all adjacent runs.
Median and Selection
Once the keys are placed in sorted order in an array, the kth largest can be found in constant time by simply looking in the kth position of the array.
Convex hulls
Convex hulls are the most important building block for more sophisticated geometric algorithms.
Once you have the points sorted by x-coordinate, they can be inserted from left to right into the hull, since the rightmost point is always on the boundary.
Without sorting the points, we would have to check whether the point is inside or outside the current hull.
Adding a new rightmost point might cause others to be deleted.
Huffman codes
If you are trying to minimize the amount of space a text file is taking up, it is silly to assign each letter the same length (ie. one byte) code.
Example: e is more common than q, a is more common than z.
If we were storing English text, we would want a and e to have shorter codes than q and z.
To design the best possible code, the first and most important step is to sort the characters in order of frequency of use.
Character | Frequency | Code |
f | 5 | 1100 |
e | 9 | 1101 |
c | 12 | 100 |
b | 13 | 101 |
d | 16 | 111 |
a | 45 | 0 |
Selection Sort
A simple sorting algorithm is selection sort.
Sweep through all the elements to find the smallest item, then the smallest remaining item, etc. until the array is sorted.
Selection-sort(A)
for i = 1 to n
for j = i+1 to n
if (A[j] < A[i]) then swap(A[i],A[j])
It is clear this algorithm must be correct, from an inductive argument: after the ith iteration, the first i elements are in their correct positions.
It is clear that this algorithm takes O(n²) time.
It is clear that the analysis of this algorithm cannot be improved, because there will be n/2 iterations which will require at least n/2 comparisons each, so at least n²/4 comparisons will be made. More careful analysis doubles this.
Thus selection sort runs in Θ(n²) time.
Binary Heaps
A binary heap is defined to be a binary tree with a key in each node such that:
1. All leaves are on, at most, the lowest two levels.
2. All leaves on the lowest level occur to the left, and all levels above the lowest are completely filled.
3. The key in each node is ≥ the keys of its children, and the left and right subtrees are again binary heaps.
Conditions 1 and 2 specify the shape of the tree, and condition 3 the labeling of the tree.
The ancestor relation in a heap defines a partial order on its elements, which means it is reflexive, anti-symmetric, and transitive.
Partial orders can be used to model hierarchies with incomplete information or equal-valued elements. One of my favorite games with my parents is fleshing out the partial order of ``big'' old-time movie stars.
The partial order defined by the heap structure is weaker than that of the total order, which explains why a heap is easier to build than a sorted array, and why the heap alone does not let us read off the elements in sorted order.
Constructing Heaps
Heaps can be constructed incrementally, by inserting new elements into the left-most open spot in the array.
If the new element is greater than its parent, swap their positions and recur.
Since at each step, we replace the root of a subtree by a larger one, we preserve the heap order.
Since all but the last level is always filled, the height h of an n element heap is bounded because
1 + 2 + 4 + ... + 2^h = 2^{h+1} - 1 ≥ n
so h = ⌈lg n⌉.
Doing n such insertions takes O(n log n), since the last n/2 insertions may require O(log n) time each.
Heapify
The bottom up insertion algorithm gives a good way to build a heap, but Robert Floyd found a better way, using a merge procedure called heapify.
Given two heaps and a fresh element, they can be merged into one by making the new one the root and trickling down.
Build-heap(A)
    n = |A|
    for i = ⌊n/2⌋ downto 1 do
        Heapify(A,i)
Heapify(A,i)
    left = 2i
    right = 2i+1
    if left ≤ n and A[left] > A[i] then
        max = left
    else max = i
    if right ≤ n and A[right] > A[max] then
        max = right
    if max ≠ i then
        swap(A[i],A[max])
        Heapify(A,max)
Rough Analysis of Heapify
Heapify on a subtree containing n nodes takes
T(n) ≤ T(2n/3) + O(1)
The 2/3 comes from the worst-case split between the two subtrees, which occurs when the last row is exactly half filled. Besides, the asymptotic answer won't change so long as the fraction is less than one.
Solve the recurrence using the Master Theorem.
Let a = 1, b = 3/2, and f(n) = 1.
Note that n^{log_{3/2} 1} = n^0 = 1, so f(n) = Θ(n^{log_b a}).
Thus Case 2 of the Master Theorem applies, and T(n) = O(lg n).
Exact Analysis of Heapify
In fact, Build-heap performs better than O(n log n), because most of the heaps we merge are extremely small.
In general, there are at most ⌈n / 2^{h+1}⌉ nodes of height h, so the cost of building a heap is:
Σ_{h=0}^{⌊lg n⌋} ⌈n / 2^{h+1}⌉ · O(h) = O(n · Σ_{h=0}^{⌊lg n⌋} h / 2^h)
Since this sum is not quite a geometric series, we can't apply the usual identity to get the sum. But it should be clear that the series converges.
Proof of Convergence
Series convergence is the ``free lunch'' of algorithm analysis.
The identity for the sum of a geometric series is
Σ_{i=0}^{∞} x^i = 1 / (1 - x), for |x| < 1
If we take the derivative of both sides,
Σ_{i=0}^{∞} i·x^{i-1} = 1 / (1 - x)²
Multiplying both sides of the equation by x gives the identity we need:
Σ_{i=0}^{∞} i·x^i = x / (1 - x)²
Substituting x = 1/2 gives a sum of 2, so Build-heap uses at most 2n comparisons and thus linear time.
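As a numeric sanity check of that substitution (my own illustration), the partial sums of Σ h/2^h do converge to 2:

```python
# Partial sums of sum_{h>=1} h / 2^h, which the identity says equals 2.
s = 0.0
for h in range(1, 60):
    s += h / 2 ** h
print(s)   # very close to 2.0
```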
The Lessons of Heapsort, I
"Are we doing a careful analysis? Might our algorithm be faster than it seems?"
Typically in our analysis, we will say that since we are doing at most x operations of at most y time each, the total time is O(x y).
However, if we overestimate too much, our bound may not be as tight as it should be!
Heapsort
Heapify can be used to construct a heap, using the observation that an isolated element forms a heap of size 1.
Heapsort(A)
Build-heap(A)
for i = n to 1 do
swap(A[1],A[i])
n = n - 1
Heapify(A,1)
If we construct our heap from bottom to top using Heapify, we do not have to do anything with the last n/2 elements.
With the implicit tree defined by array positions, (i.e. the ith position is the parent of the 2ith and (2i+1)st positions) the leaves start out as heaps.
Exchanging the maximum element with the last element and calling Heapify repeatedly gives an O(n log n) sorting algorithm, named Heapsort.
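The whole pipeline, Build-heap followed by repeated extraction, fits in a short runnable sketch (Python, 0-indexed, so child indices are 2i+1 and 2i+2 rather than the lecture's 2i and 2i+1):

```python
def heapify(a, i, n):
    """Trickle a[i] down within a[0..n-1] to restore the max-heap order."""
    left, right = 2 * i + 1, 2 * i + 2
    biggest = i
    if left < n and a[left] > a[biggest]:
        biggest = left
    if right < n and a[right] > a[biggest]:
        biggest = right
    if biggest != i:
        a[i], a[biggest] = a[biggest], a[i]
        heapify(a, biggest, n)

def heapsort(a):
    n = len(a)
    for i in range(n // 2 - 1, -1, -1):   # Build-heap: the leaves are
        heapify(a, i, n)                  # already heaps, so start at n/2
    for end in range(n - 1, 0, -1):       # repeatedly extract the max
        a[0], a[end] = a[end], a[0]
        heapify(a, 0, end)
    return a

print(heapsort([17, 12, 6, 19, 23, 8, 5, 10]))
```

Note the first loop starts at ⌊n/2⌋ and moves toward the root, exactly the bottom-up order that makes Build-heap linear.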
The Lessons of Heapsort, II
Always ask yourself, ``Can we use a different data structure?''
Selection sort scans through the entire array, repeatedly finding the smallest remaining element.
For i = 1 to n
A: Find the smallest of the first n-i+1 items.
B: Pull it out of the array and put it first.
Using arrays or unsorted linked lists as the data structure, operation A takes O(n) time and operation B takes O(1).
Using heaps, both of these operations can be done within O(log n) time, balancing the work and achieving a better tradeoff.
Priority Queues
A priority queue is a data structure on sets of keys supporting the operations Insert(S, x), Maximum(S), and Extract-Max(S):
These operations can be easily supported using a heap.
Applications of Priority Queues
Heaps as stacks or queues
Both stacks and queues can be simulated by using a heap, when we add a new time field to each item and order the heap according to this time field.
This simulation is not as efficient as a normal stack/queue implementation, but it is a cute demonstration of the flexibility of a priority queue.
Discrete Event Simulations
The stack and queue orders are just special cases of orderings. In real life, certain people cut in line.
Sweepline Algorithms in Computational Geometry
Greedy Algorithms
Example: Sequential strips in triangulations.
Danny Heep
4-2 Find the missing integer from 0 to n using O(n) ``is bit[j] in A[i]'' queries.
Also note, the problem is asking us to minimize the number of bits we read. We can spend as much time as we want doing other things provided we don't look at extra bits.
How can we find the last bit of the missing integer?
Ask all the n integers what their last bit is, and see whether 0 or 1 occurs less often than it is supposed to. That is the last bit of the missing integer!
How can we determine the second-to-last bit?
Ask only the numbers which ended with the correct last bit what their second-to-last bit is, analyzing the bit patterns of the numbers from 0 to n which end with this bit.
By recurring on the remaining candidate numbers, we get the answer in T(n) = T(n/2) + n = O(n), by the Master Theorem.
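A sketch of this strategy in Python (my own formulation of the bit-query idea; a real solution would count the bit reads, which halve each round):

```python
def find_missing(nums, n):
    """nums holds 0..n with exactly one value missing.  Each round reads
    one bit column of the surviving numbers and keeps only the half of
    the candidates consistent with the under-represented bit."""
    candidates = list(range(n + 1))    # values that could be missing
    present = list(nums)
    bit = 0
    while len(candidates) > 1:
        ones = [x for x in present if (x >> bit) & 1]
        zeros = [x for x in present if not (x >> bit) & 1]
        cand_ones = [c for c in candidates if (c >> bit) & 1]
        cand_zeros = [c for c in candidates if not (c >> bit) & 1]
        # The side holding fewer numbers than candidates contains the gap.
        if len(ones) < len(cand_ones):
            present, candidates = ones, cand_ones
        else:
            present, candidates = zeros, cand_zeros
        bit += 1
    return candidates[0]

print(find_missing([0, 1, 2, 3, 4, 6], 6))   # 5 is missing
```

Each round discards roughly half the surviving numbers, so the total number of bits examined is n + n/2 + n/4 + ... = O(n).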
Quicksort
Although mergesort is O(n log n), it is quite inconvenient to implement with arrays, since we need extra space to merge.
In practice, the fastest sorting algorithm is Quicksort, which uses partitioning as its main idea.
Example: Pivot about 10.
17 12 6 19 23 8 5 10 - before
6 8 5 10 23 19 12 17 - after
Partitioning places all the elements less than the pivot in the left part of the array, and all elements greater than the pivot in the right part of the array. The pivot fits in the slot between them.
Note that the pivot element ends up in the correct place in the total order!
Partitioning the elements
Once we have selected a pivot element, we can partition the array in one linear scan, by maintaining three sections of the array: < pivot, > pivot, and unexplored.
Example: pivot about 10
| 17 12 6 19 23 8 5 | 10
| 5 12 6 19 23 8 | 17
5 | 12 6 19 23 8 | 17
5 | 8 6 19 23 | 12 17
5 8 | 6 19 23 | 12 17
5 8 6 | 19 23 | 12 17
5 8 6 | 23 | 19 12 17
5 8 6 ||23 19 12 17
5 8 6 10 19 12 17 23
As we scan from left to right, we move the left bound to the right when the element is less than the pivot, otherwise we swap it with the rightmost unexplored element and move the right bound one step closer to the left.
Since the partitioning step consists of at most n swaps, it takes time linear in the number of keys. But what does it buy us?
Thus we can sort the elements to the left of the pivot and the right of the pivot independently!
This gives us a recursive sorting algorithm, since we can use the partitioning approach to sort each subproblem.
Pseudocode
Sort(A)
Quicksort(A,1,n)
Quicksort(A, low, high)
if (low < high)
pivot-location = Partition(A,low,high)
Quicksort(A,low, pivot-location - 1)
Quicksort(A, pivot-location+1, high)
Partition(A,low,high)
pivot = A[low]
leftwall = low
for i = low+1 to high
if (A[i] < pivot) then
leftwall = leftwall+1
swap(A[i],A[leftwall])
swap(A[low],A[leftwall])
return(leftwall)
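The pseudocode above translates directly into runnable Python (0-indexed, but otherwise the same pivot-first partition scheme):

```python
def partition(a, low, high):
    """Partition a[low..high] around pivot a[low]; return the pivot's final slot."""
    pivot = a[low]
    leftwall = low
    for i in range(low + 1, high + 1):
        if a[i] < pivot:               # grow the < pivot region
            leftwall += 1
            a[i], a[leftwall] = a[leftwall], a[i]
    a[low], a[leftwall] = a[leftwall], a[low]
    return leftwall

def quicksort(a, low=0, high=None):
    if high is None:
        high = len(a) - 1
    if low < high:
        p = partition(a, low, high)
        quicksort(a, low, p - 1)       # sort each side independently
        quicksort(a, p + 1, high)
    return a

print(quicksort([17, 12, 6, 19, 23, 8, 5, 10]))
```

Choosing a[low] as the pivot reproduces the worst case discussed below: on already-sorted input every partition is maximally lopsided.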
Best Case for Quicksort
Since each element ultimately ends up in the correct position, the algorithm correctly sorts. But how long does it take?
The best case for divide-and-conquer algorithms comes when we split the input as evenly as possible. Thus in the best case, each subproblem is of size n/2.
The partition step on each subproblem is linear in its size. Thus the total effort in partitioning all the subproblems at any one level of the recursion tree is O(n).
The recursion tree for the best case looks like this:
Worst Case for Quicksort
Suppose instead our pivot element splits the array as unequally as possible. Thus instead of n/2 elements in the smaller half, we get zero, meaning that the pivot element is the biggest or smallest element in the array.
Thus the worst case time for Quicksort is Θ(n^2), worse than Heapsort or Mergesort.
To justify its name, Quicksort had better be good in the average case. Showing this requires some fairly intricate analysis.
The divide and conquer principle applies to real life. If you break a job into pieces, it is best to make the pieces of equal size!
Intuition: The Average Case for Quicksort
Suppose we pick the pivot element at random in an array of n keys.
Whenever the pivot element is from positions n/4 to 3n/4, the larger remaining subarray contains at most 3n/4 elements.
If we assume that the pivot element is always in this range, what is the maximum number of partitions we need to get from n elements down to 1 element?
What have we shown?
At most log_{4/3} n levels of decent partitions suffice to sort an array of n elements.
But how often when we pick an arbitrary element as pivot will it generate a decent partition?
Since any number ranked between n/4 and 3n/4 would make a decent pivot, we get one half the time on average.
If we need log_{4/3} n levels of decent partitions to finish the job, and half of the random partitions are decent, then on average the recursion tree to quicksort the array has 2 log_{4/3} n levels.
More careful analysis shows that the expected number of comparisons is about 2n ln n ≈ 1.38 n lg n.
Average-Case Analysis of Quicksort
To do a precise average-case analysis of quicksort, we formulate a recurrence for the exact expected time T(n):
T(n) = sum_{p=1}^{n} (1/n) (T(p-1) + T(n-p)) + n - 1 = (2/n) sum_{i=0}^{n-1} T(i) + n - 1
Each possible pivot p is selected with equal probability 1/n. The number of comparisons needed to do the partition is n-1.
We will need one useful fact about the Harmonic numbers H_n, namely
H_n = sum_{i=1}^{n} 1/i ≈ ln n
It is important to understand (1) where the recurrence relation comes from and (2) how the log comes out from the summation. The rest is just messy algebra.
Multiplying both sides by n, and then subtracting the same equation written for n-1, cancels out the sum:
n T(n) - (n-1) T(n-1) = 2 T(n-1) + 2(n-1)
Rearranging the terms gives us:
T(n)/(n+1) = T(n-1)/n + 2(n-1)/(n(n+1))
Substituting A(n) = T(n)/(n+1) gives
A(n) = A(n-1) + 2(n-1)/(n(n+1)), so A(n) ≈ 2 sum_{i=1}^{n} 1/i ≈ 2 ln n
We are really interested in T(n), not A(n), so
T(n) = (n+1) A(n) ≈ 2(n+1) ln n ≈ 1.38 n lg n
What is the Worst Case?
The worst case for Quicksort depends upon how we select our partition or pivot element. If we always select either the first or last element of the subarray, the worst-case occurs when the input is already sorted!
A B D F H J K
B D F H J K
D F H J K
F H J K
H J K
J K
K
Having the worst case occur when the data is sorted or almost sorted is very bad, since that is likely to be the case in certain applications.
To eliminate this problem, pick a better pivot:
1. Use the middle element of the subarray as pivot.
2. Use a random element of the subarray as pivot.
3. Use the median of the first, last, and middle elements of the subarray (the median of three) as pivot.
Whichever of these three rules we use, the worst case remains O(n^2). However, because the worst case is no longer triggered by a natural order of the input, it is much less likely to occur.
Is Quicksort really faster than Heapsort?
Since Heapsort is Θ(n lg n) and selection sort is Θ(n^2), there is no debate about which will be better for decent-sized files.
But how can we compare two algorithms to see which is faster? Using the RAM model and the big Oh notation, we can't!
When Quicksort is implemented well, it is typically 2-3 times faster than mergesort or heapsort. The primary reason is that the operations in the innermost loop are simpler. The best way to see this is to implement both and experiment with different inputs.
Since the difference between the two programs will be limited to a multiplicative constant factor, the details of how you program each algorithm will make a big difference.
If you don't want to believe me when I say Quicksort is faster, I won't argue with you. It is a question whose solution lies outside the tools we are using.
Randomization
Suppose you are writing a sorting program, to run on data given to you by your worst enemy. Quicksort is good on average, but bad on certain worst-case instances.
If you used Quicksort, what kind of data would your enemy give you to run it on? Exactly the worst-case instance, to make you look bad.
But instead of picking the median of three or the first element as pivot, suppose you picked the pivot element at random.
Now your enemy cannot design a worst-case instance to give to you, because no matter which data they give you, you would have the same probability of picking a good pivot!
Randomization is a very important and useful idea. By either picking a random pivot or scrambling the permutation before sorting it, we can say:
``With high probability, randomized quicksort runs in Θ(n lg n) time.''
Where before, all we could say is:
``If you give me random input data, quicksort runs in Θ(n lg n) expected time.''
Since the time bound now does not depend upon your input distribution, this means that unless we are extremely unlucky (as opposed to ill prepared or unpopular) we will certainly get good performance.
Randomization is a general tool to improve algorithms with bad worst-case but good average-case complexity.
The worst-case is still there, but we almost certainly won't see it.
7.1-2: Show that an n-element heap has height ⌊lg n⌋.
The height is defined as the number of edges in the longest simple path from the root.
Thus the height increases only when n = 2^k, or in other words when lg n is an integer.
7.1-5 Is a reverse sorted array a heap?
In the array representation of a heap, the descendants of the ith element are the 2ith and (2i+1)th elements.
If A is sorted in reverse order, then i < j implies that A[i] ≥ A[j].
Since 2i > i and 2i+1 > i, then A[i] ≥ A[2i] and A[i] ≥ A[2i+1].
Thus by definition A is a heap!
Can we sort faster than Θ(n lg n)?
Any comparison-based sorting program can be thought of as defining a decision tree of possible executions.
Running the same program twice on the same permutation causes it to do exactly the same thing, but running it on different permutations of the same data causes a different sequence of comparisons to be made on each.
Once you believe this, a lower bound on the time complexity of sorting follows easily.
Since any two different permutations of n elements require a different sequence of steps to sort, there must be at least n! different paths from the root to leaves in the decision tree, i.e. at least n! different leaves in the tree.
Since only binary comparisons (less than or greater than) are used, the decision tree is a binary tree.
Since a binary tree of height h has at most 2^h leaves, we know 2^h ≥ n!, or h ≥ lg(n!).
By inspection n! ≥ (n/2)^{n/2}, since the last n/2 terms of the product are each greater than n/2, so h ≥ (n/2) lg(n/2) = Ω(n lg n). By Stirling's approximation, a better bound is n! ≈ (n/e)^n, where e = 2.718.
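These bounds are easy to check numerically. A quick sanity check in Python (the helper name `lg_factorial` is mine), comparing lg(n!) against the crude (n/2) lg(n/2) lower bound and the trivial n lg n upper bound:

```python
import math

def lg_factorial(n):
    """Compute lg(n!) exactly as the sum of lg i for i = 1..n."""
    return sum(math.log2(i) for i in range(1, n + 1))

# A decision tree that sorts n keys needs height h >= lg(n!).
n = 1000
crude_lower = (n / 2) * math.log2(n / 2)   # from n! >= (n/2)^(n/2)
trivial_upper = n * math.log2(n)           # from n! <= n^n
```

For n = 1000, lg(n!) is about 8530, comfortably between the two estimates, confirming h = Θ(n lg n).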
Non-Comparison-Based Sorting
All the sorting algorithms we have seen assume binary comparisons as the basic primitive, questions of the form ``is x before y?''.
Suppose you were given a deck of playing cards to sort. Most likely you would set up 13 piles and put all cards with the same number in one pile.
A 2 3 4 5 6 7 8 9 10 J Q K
A 2 3 4 5 6 7 8 9 10 J Q K
A 2 3 4 5 6 7 8 9 10 J Q K
A 2 3 4 5 6 7 8 9 10 J Q K
With only a constant number of cards left in each pile, you can use insertion sort to order by suit and concatenate everything together.
If we could find the correct pile for each card in constant time, and each pile gets O(1) cards, this algorithm takes O(n) time.
Bucketsort
Suppose we are sorting n numbers from 1 to m, where we know the numbers are approximately uniformly distributed.
We can set up n buckets, each responsible for an interval of m/n numbers from 1 to m.
If we use an array of buckets, each item gets mapped to the right bucket in O(1) time.
With uniformly distributed keys, the expected number of items per bucket is 1. Thus sorting each bucket takes O(1) time!
The total effort of bucketing, sorting buckets, and concatenating the sorted buckets together is O(n).
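The scheme above can be sketched in a few lines of Python (the function name `bucketsort` and the bucket-index formula are my choices; keys are assumed to lie in [1, m]):

```python
def bucketsort(items, m):
    """Sort n numbers drawn from [1, m], assumed roughly uniform: expected O(n)."""
    n = len(items)
    buckets = [[] for _ in range(n)]   # bucket i covers about m/n consecutive values
    for x in items:
        buckets[(x - 1) * n // m].append(x)   # map x in [1, m] to bucket 0..n-1
    result = []
    for b in buckets:
        result.extend(sorted(b))   # each bucket holds O(1) items on average
    return result
```

With uniform keys each bucket gets about one item, so the per-bucket sorts cost O(1) each.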
What happened to our lower bound!
We can use bucketsort effectively whenever we understand the distribution of the data.
However, bad things happen when we assume the wrong distribution.
Suppose in the previous example all the keys happened to be 1. After the bucketing phase, we have:
Problems like this are why we worry about the worst-case performance of algorithms!
Such distribution techniques can be used on strings instead of just numbers. The buckets will correspond to letter ranges instead of just number ranges.
The worst case ``shouldn't'' happen if we understand the distribution of our data.
Real World Distributions
Consider the distribution of names in a telephone book.
Either make sure you understand your data, or use a good worst-case or randomized algorithm!
The Shifflett's of Charlottesville
For comparison, note that there are seven Shifflett's (of various spellings) in the 1000 page Manhattan telephone directory.
Rules for Algorithm Design
The secret to successful algorithm design, and problem solving in general, is to make sure you ask the right questions. Below, I give a possible series of questions for you to ask yourself as you try to solve difficult algorithm design problems:
8.2-3 Argue that insertion sort is better than Quicksort for sorting checks
In insertion sort, the cost of each insertion is the number of items which we have to jump over. In the check example, the expected number of moves per item is small, say c. We win if cn < n lg n, i.e. whenever c < lg n.
8.3-1 Why do we analyze the average-case performance of a randomized algorithm, instead of the worst-case?
8.3-2 How many calls are made to Random in randomized quicksort in the best and worst cases?
The number of partitions is n-1 in any run of quicksort!!
The reason - any binary tree with n leaves has n-1 internal nodes, each of which corresponds to a call to partition in the quicksort recursion tree.
Elementary Data Structures
``Mankind's progress is measured by the number of things we can do without thinking.''
Elementary data structures such as stacks, queues, lists, and heaps will be the ``off-the-shelf'' components we build our algorithms from. There are two aspects to any data structure: the abstract operations it supports, and the implementation of those operations.
The fact that we can describe the behavior of our data structures in terms of abstract operations explains why we can use them without thinking, while the fact that we have different implementations of the same abstract operations enables us to optimize performance.
Stacks and Queues
Sometimes, the order in which we retrieve data is independent of its content, being only a function of when it arrived.
A stack supports last-in, first-out operations: push and pop.
A queue supports first-in, first-out operations: enqueue and dequeue.
A deque is a double ended queue and supports all four operations: push, pop, enqueue, dequeue.
Lines in banks are based on queues, while food in my refrigerator is treated as a stack.
Both can be used to traverse a tree, but the order is completely different.
Stack Implementation
Although this implementation uses an array, a linked list would eliminate the need to declare the array size in advance.
STACK-EMPTY(S)
if top[S] = 0
then return TRUE
else return FALSE
PUSH(S, x)
top[S] = top[S] + 1
S[top[S]] = x
POP(S)
if STACK-EMPTY(S)
then error ``underflow''
else top[S] = top[S] - 1
return S[top[S] + 1]
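In Python the same design is even simpler, since lists grow on demand (the class and method names are mine):

```python
class Stack:
    """Array-backed stack; every operation is O(1) amortized."""
    def __init__(self):
        self.items = []            # Python lists grow as needed: no fixed size

    def empty(self):
        return len(self.items) == 0

    def push(self, x):
        self.items.append(x)       # append plays the role of top[S] = top[S] + 1

    def pop(self):
        if self.empty():
            raise IndexError("underflow")
        return self.items.pop()
```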
Queue Implementation
A circular queue implementation requires pointers to the head and tail elements, and wraps around to reuse array elements.
ENQUEUE(Q, x)
Q[tail[Q]] = x
if tail[Q] = length[Q]
then tail[Q] = 1
else tail[Q] = tail[Q] + 1
DEQUEUE(Q)
x = Q[head[Q]]
if head[Q] = length[Q]
then head[Q] = 1
else head[Q] = head[Q] + 1
return x
A list-based implementation would eliminate the possibility of overflow.
All are O(1) time operations.
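A sketch of the circular-array queue in Python (names and the explicit `size` counter are my choices; a list-based version would avoid the overflow case entirely):

```python
class CircularQueue:
    """Fixed-capacity queue whose head and tail indices wrap around the array."""
    def __init__(self, capacity):
        self.q = [None] * capacity
        self.head = 0    # index of the next element to dequeue
        self.tail = 0    # index of the next free slot
        self.size = 0

    def enqueue(self, x):
        if self.size == len(self.q):
            raise IndexError("overflow")
        self.q[self.tail] = x
        self.tail = (self.tail + 1) % len(self.q)   # wrap around the array end
        self.size += 1

    def dequeue(self):
        if self.size == 0:
            raise IndexError("underflow")
        x = self.q[self.head]
        self.head = (self.head + 1) % len(self.q)
        self.size -= 1
        return x
```

Both operations are O(1), matching the pseudocode above.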
Dynamic Set Operations
Perhaps the most important class of data structures maintains a set of items, indexed by keys. The dictionary operations are Search, Insert, Delete, Minimum, Maximum, and Successor/Predecessor.
There are a variety of implementations of these dictionary operations, each of which yield different time bounds for various operations.
Pointer Based Implementation
We can maintain a dictionary in either a singly or doubly linked list.
Since the extra big-Oh costs of doubly-linked lists is zero, we will usually assume they are doubly-linked, although it might not be necessary.
Singly linked to doubly-linked list is as a Conga line is to a Can-Can line.
Array Based Sets
Unsorted Arrays
Sorted Arrays
What are the costs for a heap?
Unsorted List Implementation
LIST-SEARCH(L, k)
x = head[L]
while x <> NIL and key[x] <> k
do x = next[x]
return x
Note: the while loop might require two lines in some programming languages.
LIST-INSERT(L, x)
next[x] = head[L]
if head[L] <> NIL
then prev[head[L]] = x
head[L] = x
prev[x] = NIL
LIST-DELETE(L, x)
if prev[x] <> NIL
then next[prev[x]] = next[x]
else head[L] = next[x]
if next[x] <> NIL
then prev[next[x]] = prev[x]
Sentinels
Boundary conditions can be eliminated using a sentinel element which doesn't go away.
LIST-SEARCH'(L, k)
x = next[nil[L]]
while x <> NIL[L] and key[x] <> k
do x = next[x]
return x
LIST-INSERT'(L, x)
next[x] = next[nil[L]]
prev[next[nil[L]]] = x
next[nil[L]] = x
prev[x] = NIL[L]
LIST-DELETE'(L, x)
next[prev[x]] = next[x]
prev[next[x]] = prev[x]
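A Python sketch of the sentinel idea (class and method names are mine). The sentinel node is its own predecessor and successor in an empty list, so insertion and deletion never test for NIL; search plants the target key in the sentinel so the loop needs no end-of-list check:

```python
class Node:
    def __init__(self, key=None):
        self.key = key
        self.prev = self.next = self

class SentinelList:
    """Doubly linked list with a sentinel: no boundary conditions anywhere."""
    def __init__(self):
        self.nil = Node()   # sentinel; nil.next is the head, nil.prev the tail

    def insert(self, key):
        x = Node(key)
        x.next = self.nil.next
        self.nil.next.prev = x
        self.nil.next = x
        x.prev = self.nil
        return x

    def search(self, key):
        self.nil.key = key          # guarantees the loop stops at the sentinel
        x = self.nil.next
        while x.key != key:
            x = x.next
        return x if x is not self.nil else None

    def delete(self, x):
        x.prev.next = x.next        # exactly the two lines of LIST-DELETE'
        x.next.prev = x.prev
```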
Hash Tables
Hash tables are a very practical way to maintain a dictionary. As with bucket sort, it assumes we know that the distribution of keys is fairly well-behaved.
The idea is simply that looking an item up in an array is O(1) once you have its index. A hash function is a mathematical function which maps keys to integers.
In bucket sort, our hash function mapped the key to a bucket based on the first letters of the key. ``Collisions'' were the set of keys mapped to the same bucket.
If the keys were uniformly distributed, then each bucket contains very few keys!
The resulting short lists were easily sorted, and could just as easily be searched!
Hash Functions
It is the job of the hash function to map keys to integers. A good hash function is cheap to evaluate and spreads the keys uniformly over the table positions, so no slot gets far more than its share.
The first step is usually to map the key to a big integer, for example by treating a string S as a base-128 number: f(S) = sum_{i=1}^{|S|} 128^i × char(s_i).
This large number must be reduced to an integer whose size is between 1 and the size of our hash table.
One way is by h(k) = k mod M, where M is best a large prime not too close to 2^i - 1, which would just mask off the high bits.
This works on the same principle as a roulette wheel!
Good and Bad Hash functions
The first three digits of the Social Security Number
The Birthday Paradox
No matter how good our hash function is, we had better be prepared for collisions, because of the birthday paradox.
When m = 366, this probability sinks below 1/2 when N = 23 and to almost 0 when N ≥ 50.
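The probability that all N keys land in distinct slots is the product (m/m) × ((m-1)/m) × ... × ((m-N+1)/m), which is easy to evaluate directly (the function name is mine):

```python
def prob_no_collision(n, m):
    """Probability that n keys dropped uniformly into m slots are all distinct."""
    p = 1.0
    for i in range(n):
        p *= (m - i) / m    # the (i+1)st key must avoid the i occupied slots
    return p

# With m = 366 slots (birthdays), the probability of no collision drops
# below 1/2 at n = 23 and is essentially 0 long before n = 100.
```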
Collision Resolution by Chaining
The easiest approach is to let each element in the hash table be a pointer to a list of keys.
Chaining is easy, but devotes a considerable amount of memory to pointers, which could be used to make the table larger. Still, it is my preferred method.
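A minimal chained hash table in Python (class and method names are mine; the hash function follows the base-128-then-mod recipe above):

```python
class ChainedHashTable:
    """Hash table with collision resolution by chaining; M is the table size."""
    def __init__(self, M=101):               # a prime table size, per the notes
        self.M = M
        self.table = [[] for _ in range(M)]  # one chain (list) per slot

    def _hash(self, key):
        # Treat the string as a base-128 number, reducing mod M as we go.
        h = 0
        for c in key:
            h = (h * 128 + ord(c)) % self.M
        return h

    def insert(self, key, value):
        chain = self.table[self._hash(key)]
        for pair in chain:
            if pair[0] == key:      # key already present: update in place
                pair[1] = value
                return
        chain.append([key, value])

    def search(self, key):
        for k, v in self.table[self._hash(key)]:
            if k == key:
                return v
        return None
```

With a well-behaved key distribution each chain stays short, so insert and search run in O(1) expected time.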
Open Addressing
We can dispense with all these pointers by using an implicit reference derived from a simple function, for example probing the slots h(k), h(k)+1, h(k)+2, ... (mod M) until an empty one is found.
The reason for using a more complicated probe scheme is to avoid long runs from similarly hashed keys.
Deletion in an open addressing scheme is ugly, since removing one element can break a chain of insertions, making some elements inaccessible.
Performance on Set Operations
With either chaining or open addressing:
Pragmatically, a hash table is often the best data structure to maintain a dictionary. However, we will not use it much in proving the efficiency of our algorithms, since the worst-case time is unpredictable.
The best worst-case bounds come from balanced binary trees, such as red-black trees.
9.1-3 Show that there is no sorting algorithm which sorts at least n!/2 instances in O(n) time.
Moral: there cannot be too many good cases for any sorting algorithm!
9.1-4 Show that the lower bound for sorting still holds with ternary comparisons.
So it goes for any constant base.
Binary Search Trees
``I think that I shall never see
a poem as lovely as a tree Poem's
are wrote by fools like me but only
G-d can make a tree ``
- Joyce Kilmer
Binary search trees provide a data structure which efficiently supports all six dictionary operations.
A binary tree is a rooted tree where each node contains at most two children.
Each child can be identified as either a left or right child.
Binary Search Trees
A binary search tree labels each node in a binary tree with a single key such that for any node x, all nodes in the left subtree of x have keys ≤ key[x] and all nodes in the right subtree of x have keys ≥ key[x].
The search tree labeling enables us to find where any key is. Start at the root - if that is not the one we want, search either left or right depending upon whether what we want is less than or greater than the root.
Searching in a Binary Tree
Dictionary search operations are easy in binary trees ...
TREE-SEARCH(x, k)
if (x = NIL) or (k = key[x])
then return x
if (k < key[x])
then return TREE-SEARCH(left[x],k)
else return TREE-SEARCH(right[x],k)
The algorithm works because both the left and right subtrees of a binary search tree are binary search trees - recursive structure, recursive algorithm.
This takes time proportional to the height of the tree, O(h).
Maximum and Minimum
Where are the maximum and minimum elements in a binary tree?
TREE-MAXIMUM(x)
while right[x] <> NIL
do x = right[x]
return x
TREE-MINIMUM(x)
while left[x] <> NIL
do x = left[x]
return x
Both take time proportional to the height of the tree, O(h).
Where is the predecessor?
Where is the predecessor of a node in a tree, assuming all keys are distinct?
What if a node doesn't have children?
The proof of correctness comes from looking at the in-order traversal of the tree.
Tree-Successor(x)
if right[x] <> NIL
then return Tree-Minimum(right[x])
y = p[x]
while (y <> NIL) and (x = right[y])
do x = y
y = p[y]
return y
Tree predecessor/successor both run in time proportional to the height of the tree.
In-Order Traversal
Inorder-Tree-walk(x)
if (x <> NIL)
then Inorder-Tree-Walk(left[x])
print key[x]
Inorder-Tree-walk(right[x])
A-B-C-D-E-F-G-H
Tree Insertion
Do a binary search to find where it should be, then replace the termination NIL pointer with the new item.
Tree-Insert(T,z)
y = NIL
x = root[T]
while x <> NIL
do y = x
if key[z] < key[x]
then x = left[x]
else x = right[x]
p[z] = y
if y = NIL
then root[T] = z
else if key[z] < key[y]
then left[y] = z
else right[y] = z
y is maintained as the parent of x, since x eventually becomes NIL.
The final test establishes whether the NIL was a left or right turn from y.
Insertion takes time proportional to the height of the tree, O(h).
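Search and insertion together make a working dictionary. A Python sketch (class and function names are mine; this recursive insert returns the subtree root rather than using parent pointers):

```python
class BSTNode:
    def __init__(self, key):
        self.key = key
        self.left = self.right = None

def tree_insert(root, key):
    """Insert key into the BST; return the (possibly new) root. O(h) time."""
    if root is None:
        return BSTNode(key)      # replace the NIL termination with the new item
    if key < root.key:
        root.left = tree_insert(root.left, key)
    else:
        root.right = tree_insert(root.right, key)
    return root

def tree_search(x, key):
    """Return the node holding key, or None if absent. O(h) time."""
    if x is None or x.key == key:
        return x
    if key < x.key:
        return tree_search(x.left, key)
    return tree_search(x.right, key)
```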
Tree Deletion
Deletion is somewhat more tricky than insertion, because the node to die may not be a leaf, and thus can affect other nodes.
Case (a), where the node is a leaf, is simple - just NIL out the parent's child pointer.
Case (b), where a node has one child, the doomed node can just be cut out.
Case (c), relabel the node as its successor (which has at most one child when z has two children!) and delete the successor!
This implementation of deletion assumes parent pointers to make the code nicer, but if you had to save space they could be dispensed with by keeping the pointers on the search path stored in a stack.
Tree-Delete(T,z)
if (left[z] = NIL) or (right[z] = NIL)
then y = z
else y = Tree-Successor(z)
if left[y] <> NIL
then x = left[y]
else x = right[y]
if x <> NIL
then p[x] = p[y]
if p[y] = NIL
then root[T] = x
else if (y = left[p[y]])
then left[p[y]] = x
else right[p[y]] = x
if (y <> z)
then key[z] = key[y]
/* If y has other fields, copy them, too. */
return y
Lines 1-3 determine which node y is physically removed.
Lines 4-6 identify x as the non-nil descendant, if any.
Lines 7-8 give x a new parent.
Lines 9-10 modify the root node, if necessary.
Lines 11-13 reattach the subtree, if necessary.
Lines 14-16 if the removed node is deleted, copy.
Conclusion: deletion takes time proportional to the height of the tree.
Balanced Search Trees
All six of our dictionary operations, when implemented with binary search trees, take O(h), where h is the height of the tree.
The best height we could hope to get is about lg n, if the tree was perfectly balanced, since a binary tree of height h contains at most 2^{h+1} - 1 nodes.
But if we get unlucky with our order of insertion or deletion, we could get linear height!
insert(a)
insert(b)
insert(c)
insert(d)
We can't easily use randomization - Why?
Perfectly Balanced Trees
Perfectly balanced trees require a lot of work to maintain:
Therefore, when we talk about "balanced" trees, we mean trees whose height is O(lg n), so all dictionary operations (insert, delete, search, min/max, successor/predecessor) take O(lg n) time.
Red-Black trees are binary search trees where each node is assigned a color, where the coloring scheme helps us maintain the height as O(lg n).
Red-Black Tree Definition
Red-black trees have the following properties:
1. Every node is colored either red or black.
2. Every leaf (NIL pointer) is black.
3. If a node is red, both its children must be black.
4. Every path from a node to a descendant leaf contains the same number of black nodes.
What does this mean?
If the root of a red-black tree is black can we just color it red?
No! For one of its children might be red.
No! Because now all nodes may not have the same black height.
Red-Black Tree Height
Lemma: A red-black tree with n internal nodes has height at most 2 lg(n+1).
Proof: Our strategy; first we bound the number of nodes in any subtree, then we bound the height of any subtree.
We claim that any subtree rooted at x has at least 2^{bh(x)} - 1 internal nodes, where bh(x) is the black height of node x.
Proof, by induction: if bh(x) = 0, then x must be a leaf (NIL), which contains at least 2^0 - 1 = 0 internal nodes.
Now assume it is true for all trees with black height < bh(x).
If x is black, both subtrees have black height bh(x)-1. If x is red, the subtrees have black height bh(x).
Therefore, the number of internal nodes in the subtree rooted at x is at least (2^{bh(x)-1} - 1) + (2^{bh(x)-1} - 1) + 1 = 2^{bh(x)} - 1.
Now, let h be the height of our red-black tree. At least half the nodes on any single path from root to leaf must be black if we ignore the root.
Thus bh(root) ≥ h/2, and so n ≥ 2^{h/2} - 1.
This implies that lg(n+1) ≥ h/2, so h ≤ 2 lg(n+1). ∎
Therefore red-black trees have height at most twice optimal. We have a balanced search tree if we can maintain the red-black tree structure under insertion and deletion.
11-1 For each of the four types of linked lists in the following table, what is the asymptotic worst-case running time for each dynamic-set operation listed?
singly | singly | doubly | doubly | |
unsorted | sorted | unsorted | sorted | |
Search(L, k) | O(N) | O(N) | O(N) | O(N)- |
Insert(L, x) | O(1) | O(N) | O(1) | O(N)- |
Delete(L, x) | O(N)* | O(N)* | O(1) | O(1) |
Successor(L, x) | O(N) | O(1) | O(N) | O(1) |
Predecessor(L, x) | O(N) | O(N) | O(N) | O(1) |
Minimum(L) | O(N) | O(1) | O(N) | O(1) |
Maximum(L) | O(N) | O(1)+ | O(N) | O(1)+ |
14.1-5 Describe a Red-Black tree with the largest and smallest ratio of red nodes.
Rotations
The basic restructuring steps for binary search trees are left and right rotations:
LEFT-ROTATE(T,x)
y = right[x] (* Set y *)
right[x] = left[y] (* Turn y's left subtree into x's right subtree *)
if left[y] <> NIL
then p[left[y]] = x
p[y] = p[x] (* Link x's parent to y *)
if p[x] = NIL
then root[T] = y
else if x = left[p[x]]
then left[p[x]] = y
else right[p[x]] = y
left[y] = x (* Put x on y's left *)
p[x] = y
Note the in-order property is preserved.
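Ignoring parent pointers, the heart of the rotation is only two pointer moves. A Python sketch (names are mine; this version rotates a subtree and returns its new root instead of updating a parent in place):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def left_rotate(x):
    """Rotate the subtree rooted at x to the left; return the new root y."""
    y = x.right
    x.right = y.left   # turn y's left subtree into x's right subtree
    y.left = x         # put x on y's left
    return y

def inorder(x):
    """In-order key sequence, used to check that rotation preserves it."""
    return [] if x is None else inorder(x.left) + [x.key] + inorder(x.right)
```

Comparing `inorder` before and after a rotation confirms the search-tree order is untouched.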
14.2-5 Show that any n-node tree can be transformed to any other using O(n) rotations (hint: convert to a right going chain).
First, observe that a sequence of rotations converting one tree into a right-going chain, followed by the reverse of the sequence converting the target tree into that same chain, transforms the first tree into the second.
Note that it will take at most n rotations to make the lowest valued key the root. Once it is root, all keys are to the right of it, so no more rotations need go through it to create a right-going chain. Repeating with the second lowest key, third, etc. gives that O(n^2) rotations suffice.
Note that if we try to create a completely balanced tree instead, getting the n/2 key to the root takes at most n rotations. Now each subtree has half the nodes and we can recur...
To get a linear algorithm, we must beware of trees like:
By picking the lowest node on the rightmost chain which has a left child, we can add one node per rotation to the rightmost chain!
Red-Black Insertion
Since red-black trees have O(lg n) height, if we can preserve all properties of such trees under insertion/deletion, we have a balanced tree!
Suppose we just did a regular insertion. Under what conditions does it stay a red-black tree?
Since every insertion take places at a leaf, we will change a black NIL pointer to a node with two black NIL pointers.
How can we fix two reds in a row?
It depends upon our uncle's color:
Note that after the recoloring:
If we get all the way to the root, recall we can always color a red-black tree's root black. We always will, so initially it was black, and so this process terminates.
The Case of the Black Uncle
If our uncle was black, observe that all the nodes around us have to be black:
A double rotation can be required to set things up depending upon the left-right turn sequence, but the principle is the same.
DOUBLE ROTATION ILLUSTRATION
Pseudocode and Figures
Deletion from Red-Black Trees
Recall the three cases for deletion from a binary tree:
Case (a) the node to be deleted was a leaf;
Case (b) the node to be deleted had one child;
Case (c) the node to be deleted had two children, so we relabeled it with its successor and deleted the successor instead.
Deletion Color Cases
Suppose the node we remove was red, do we still have a red-black tree?
Yes! No two reds will be together, and the black height for each leaf stays the same.
However, if the dead node y was black, we must give each of its descendants another black ancestor. If an appropriate node is red, we can simply color it black; otherwise we must restructure.
Case (a) black NIL becomes ``double black'';
Case (b) red becomes black and black becomes ``double black'';
Case (c) red becomes black and black becomes ``double black''.
Our goal will be to recolor and restructure the tree so as to get rid of the ``double black'' node.
In setting up any case analysis, we must be sure that:
In the case analysis for red-black trees, the breakdown is:
Case 1: The double black node x has a red brother.
Case 2: x has a black brother and two black nephews.
Case 3: x has a black brother, and its left nephew is red and its right nephew is black.
Case 4: x has a black brother, and its right nephew is red (left nephew can be any color).
Conclusion
Red-Black trees let us implement all dictionary operations in O(lg n). Further, in no case are more than 3 rotations done to rebalance. Certain very advanced data structures have data stored at nodes which requires a lot of work to adjust after a rotation -- red-black trees ensure it won't happen often.
Example: Each node represents the endpoint of a line, and is augmented with a list of segments in its subtree which it intersects.
We will not study such complicated structures, however.
Parallel Bubblesort
In order for me to give back your midterms, please form a line and sort yourselves in alphabetical order, from A to Z.
There is traditionally a strong correlation between the midterm grades and the number of daily problems attempted:
daily | sum | count | avg |
0 | 134 | 3 | 44.67 |
1 | 0 | 2 | 0.00 |
2 | 63 | 1 | 63.00 |
3 | 194 | 3 | 64.67 |
4 | 335 | 5 | 67.00 |
5 | 489 | 8 | 61.12 |
6 | 381 | 6 | 63.50 |
7 | 432 | 6 | 72.00 |
8 | 217 | 3 | 72.33 |
9 | 293 | 4 | 73.25 |
Combinatorial Search
We have seen how clever algorithms can reduce sorting from O(n^2) to O(n lg n). However, the stakes are even higher for combinatorially explosive problems:
The Traveling Salesman Problem
There is no known polynomial time algorithm (i.e. O(n^k) for some fixed k) for this problem, so search-based algorithms are the only way to go if you need an optimal solution.
But I want to use a Supercomputer
Moving to a faster computer can only buy you a relatively small improvement:
Can Eight Pieces Cover a Chess Board?
Consider the 8 main pieces in chess (king, queen, two rooks, two bishops, two knights). Can they be positioned on a chessboard so every square is threatened?
Of course, this is not an important problem, but we will use it as an example of how to attack a combinatorial search problem.
How many positions to test?
Picking a square for each piece gives us the bound: 64^8 ≈ 2.8 × 10^14 positions.
Anything much larger than about 10^8 positions is unreasonable to search on a modest computer in a modest amount of time.
However, we can exploit symmetry to save work. With reflections along horizontal, vertical, and diagonal axes, the queen can go in only 10 non-equivalent positions.
Even better, we can restrict the white bishop to 16 spots and the queen to 16, while being certain that we get all distinct configurations.
Backtracking
Backtracking is a systematic way to go through all the possible configurations of a search space.
In the general case, we assume our solution is a vector v = (v_1, ..., v_n), where each element v_i is selected from a finite ordered set S_i.
We build from a partial solution v = (v_1, ..., v_k) of length k and try to extend it by adding another element. After extending it, we will test whether what we have so far is still possible as a partial solution.
If it is still a candidate solution, great. If not, we delete v_k and try the next element from S_k:
Compute S_1, the set of candidate first elements of v.
k = 1
While k > 0 do
While S_k <> {} do (*advance*)
v_k = an element in S_k
S_k = S_k - {v_k}
if (v_1, ..., v_k) is a solution, print!
k = k + 1
compute S_k, the candidate kth elements given v.
k = k - 1 (*backtrack*)
Recursive Backtracking
Recursion can be used for elegant and easy implementation of backtracking.
Backtrack(a, k)
if a is a solution, print(a)
else {
k = k + 1
compute S_k
while S_k <> {} do
a_k = an element in S_k
S_k = S_k - {a_k}
Backtrack(a, k)
}
Backtracking can easily be used to iterate through all subsets or permutations of a set.
Backtracking ensures correctness by enumerating all possibilities.
For backtracking to be efficient, we must prune the search space.
Constructing all Subsets
How many subsets are there of an n-element set?
To construct all subsets, set up an array/vector of n cells, where the value of a_i is either true or false, signifying whether the ith item is or is not in the subset.
To use the notation of the general backtrack algorithm, S_k = (true, false), and a is a solution whenever k = n.
What order will this generate the subsets of {1, 2, 3}?
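The scheme can be sketched directly in Python (the function name `all_subsets` is mine; the candidate set S_k = {true, false} becomes the loop over two choices):

```python
def all_subsets(n):
    """Enumerate all 2^n subsets of {1, ..., n} by backtracking."""
    a = [False] * n        # a[i] = is item i+1 in the subset?
    out = []

    def backtrack(k):
        if k == n:         # complete solution vector: record the subset
            out.append([i + 1 for i in range(n) if a[i]])
            return
        for choice in (True, False):   # S_k = {true, false}
            a[k] = choice
            backtrack(k + 1)

    backtrack(0)
    return out
```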
Constructing all Permutations
How many permutations are there of an n-element set?
To construct all n! permutations, set up an array/vector of n cells, where the value of a_i is an integer from 1 to n which has not appeared thus far in the vector, corresponding to the ith element of the permutation.
To use the notation of the general backtrack algorithm, S_k = {1, ..., n} - (a_1, ..., a_{k-1}), and a is a solution whenever k = n.
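The same backtracking skeleton generates permutations; only the candidate set S_k changes (the function name and the `used` bookkeeping array are mine):

```python
def all_permutations(n):
    """Enumerate all n! permutations of {1, ..., n} by backtracking."""
    a = [0] * n
    used = [False] * (n + 1)   # used[v] = has value v appeared in a so far?
    out = []

    def backtrack(k):
        if k == n:
            out.append(a[:])
            return
        for v in range(1, n + 1):   # S_k = values not already in the vector
            if not used[v]:
                a[k], used[v] = v, True
                backtrack(k + 1)
                used[v] = False     # undo the choice before trying the next
    backtrack(0)
    return out
```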
The n-Queens Problem
The first use of pruning to deal with the combinatorial explosion was by the king who rewarded the fellow who discovered chess!
In the eight queens problem, we prune whenever one queen threatens another.
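Pruning makes the search feasible: a candidate column is rejected the moment it shares a column or diagonal with an earlier queen, so whole subtrees are never explored. A Python sketch (function names are mine):

```python
def n_queens(n):
    """Count solutions to the n-queens problem by backtracking with pruning."""
    cols = [0] * n   # cols[r] = column of the queen placed in row r

    def safe(r, c):
        # Prune: would a queen at (r, c) threaten any queen in rows 0..r-1?
        for pr in range(r):
            if cols[pr] == c or abs(cols[pr] - c) == r - pr:
                return False    # same column or same diagonal
        return True

    def place(r):
        if r == n:
            return 1            # all n queens placed: one complete solution
        total = 0
        for c in range(n):
            if safe(r, c):
                cols[r] = c
                total += place(r + 1)
        return total

    return place(0)
```

For the classical 8×8 board this finds the well-known 92 solutions while examining only a tiny fraction of the 8^8 placements.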
Covering the Chess Board
In covering the chess board, we prune whenever we find there is a square which we cannot cover given the initial configuration!
Specifically, each piece can threaten a certain maximum number of squares (queen 27, king 8, rook 14, etc.). Whenever the number of unthreatened squares exceeds the sum of the maximum coverage of the unplaced pieces, we can prune.
As implemented by a graduate student project, this backtrack search eliminates the vast majority of the search space, when the pieces are ordered by decreasing mobility.
With precomputing the list of possible moves, this program could search 1,000 positions per second. But this is too slow!
Although we might further speed the program by an order of magnitude, we need to prune more nodes!
By using a more clever algorithm, we eventually were able to prove no solution existed, in less than one day's worth of computing.
You too can fight the combinatorial explosion!
The Backtracking Contest: Bandwidth
The bandwidth problem takes as input a graph G, with n vertices and m edges (ie. pairs of vertices). The goal is to find a permutation of the vertices on the line which minimizes the maximum length of any edge.
The problem is NP-complete, meaning that it is exceedingly unlikely that you will be able to find an algorithm with polynomial worst-case running time. It remains NP-complete even for restricted classes of trees.
Since the goal of the problem is to find a permutation, a backtracking program which iterates through all the n! possible permutations and computes the length of the longest edge for each gives an easy algorithm. But the goal of this assignment is to find as practically good an algorithm as possible.
Rules of the Game
Producing Efficient Programs
When in doubt, keep it simple, stupid (KISS).
15.1-5 Given an element x in an n-node order-statistic binary tree and a natural number i, how can the ith successor of x be determined in O(lg n) time?
What we are interested in is Get(Rank(x)+i).
In an order statistic tree, each node x is labeled with the number of nodes contained in the subtree rooted in x.
Optimization Problems
In the algorithms we have studied so far, correctness tended to be easier than efficiency. In optimization problems, we are interested in finding a thing which maximizes or minimizes some function.
In designing algorithms for optimization problems, we must prove that the algorithm in fact gives the best possible solution.
Greedy algorithms, which make the best local decision at each step, occasionally produce a global optimum - but you need a proof!
Dynamic Programming
Computing Fibonacci Numbers
Implementing it as a recursive procedure is easy but slow!
We keep calculating the same value over and over!
How slow is slow?
Thus F(n) ≈ 1.6^n, and since our recursion tree has only 0 and 1 as leaves, this means we make ≈ 1.6^n calls!
What about Dynamic Programming?
We can calculate F(n) in linear time by storing the small values:

F[0] = 0; F[1] = 1
For i = 2 to n, F[i] = F[i-1] + F[i-2]
Moral: we traded space for time.
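As a sketch of this tradeoff in Python (the function names are my own), the bottom-up table and the two-variable refinement look like:

```python
def fib_table(n):
    """Bottom-up Fibonacci: O(n) time, O(n) space for the table."""
    f = [0, 1]
    for i in range(2, n + 1):
        f.append(f[i - 1] + f[i - 2])
    return f[n]

def fib_two_vars(n):
    """Only the last two values are ever needed: O(1) extra space."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a
```

The second version shows that once the evaluation order is fixed, we often need only a sliding window of partial results, not the whole table.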
Dynamic programming is a technique for efficiently computing recurrences by storing partial results.
Once you understand dynamic programming, it is usually easier to reinvent certain algorithms than try to look them up!
Dynamic programming is best understood by looking at a bunch of different examples.
I have found dynamic programming to be one of the most useful algorithmic techniques in practice:
Multiplying a Sequence of Matrices
Suppose we want to multiply a long sequence of matrices A1 x A2 x ... x An.
Multiplying an x × y matrix by a y × z matrix (using the common algorithm) takes x·y·z multiplications.
Matrix multiplication is not commutative, so we cannot permute the order of the matrices without changing the result.
Example
Consider the product A·B·C·D, where the four matrices have different dimensions.
There are three possible parenthesizations:
The order makes a big difference in real computation. How do we find the best order?
Let M(i,j) be the minimum number of multiplications necessary to compute A_i x A_{i+1} x ... x A_j.
The key observation is that the outermost multiplication splits the chain at some position k: computing A_i x ... x A_j costs whatever it takes to compute A_i x ... x A_k, plus the cost of A_{k+1} x ... x A_j, plus the cost of the final multiplication. A recurrence for this is:

M(i,j) = min over i <= k < j of [ M(i,k) + M(k+1,j) + d(i-1)·d(k)·d(j) ], with M(i,i) = 0

where A_i is a d(i-1) × d(i) matrix.
If there are n matrices, there are n+1 dimensions.
A direct recursive implementation of this will be exponential, since there is a lot of duplicated work as in the Fibonacci recurrence.
Divide-and-conquer seems attractive here, but it is efficient only when the subproblems do not overlap - and these do.
There are only O(n^2) substrings between 1 and n. Thus it requires only O(n^2) space to store the optimal cost for each of them.
We can represent all the possibilities in a triangular matrix. We can also store the value of k in another triangular matrix, to reconstruct the order of the optimal parenthesisation.
The diagonal moves up to the right as the computation progresses. On each element of the kth diagonal |j-i| = k.
For the previous example:
Procedure MatrixOrder
for i=1 to n do M[i, i]=0
for diagonal=1 to n-1
for i=1 to n-diagonal do
j=i+diagonal
M[i, j] = min over i <= k < j of M[i, k] + M[k+1, j] + d[i-1]·d[k]·d[j]
factor(i, j) = the k achieving this minimum
return M[1, n]
Procedure ShowOrder(i, j)
if (i=j) write (A_i)
else
k=factor(i, j)
write ``(''
ShowOrder(i, k)
write ``*''
ShowOrder (k+1, j)
write ``)''
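The two procedures above can be sketched together in Python, assuming (as above) that matrix i has dimensions dims[i-1] × dims[i]; all names here are my own:

```python
def matrix_order(dims):
    """Matrix chain ordering: n matrices described by the n+1 entries of dims."""
    n = len(dims) - 1
    M = [[0] * (n + 1) for _ in range(n + 1)]       # M[i][j]: min multiplications
    factor = [[0] * (n + 1) for _ in range(n + 1)]  # best split point k
    for diagonal in range(1, n):
        for i in range(1, n - diagonal + 1):
            j = i + diagonal
            # try every split point k; keep the cheapest, recording k
            M[i][j], factor[i][j] = min(
                (M[i][k] + M[k + 1][j] + dims[i - 1] * dims[k] * dims[j], k)
                for k in range(i, j)
            )
    return M[1][n], factor

def show_order(factor, i, j):
    """Reconstruct the optimal parenthesization from the factor table."""
    if i == j:
        return "A%d" % i
    k = factor[i][j]
    return "(%s*%s)" % (show_order(factor, i, k), show_order(factor, k + 1, j))
```

For dims = [10, 100, 5, 50] (a 10×100, a 100×5, and a 5×50 matrix), the optimal order is ((A1*A2)*A3) at a cost of 7500 multiplications.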
A dynamic programming solution has three components:

1. Formulate the answer as a recurrence relation or recursive algorithm.
2. Show that the number of different values your recurrence can take on is bounded by a (hopefully small) polynomial.
3. Specify an order of evaluation for the recurrence so the partial results you need are always available when you need them.
Approximate String Matching
A common task in text editing is string matching - finding all occurrences of a word in a text.
Unfortunately, many words are misspelled. How can we search for the string closest to the pattern?
Let P be a pattern string and T a text string over the same alphabet.
A k-approximate match between P and T is a substring of T with at most k differences.
Differences may be:
Approximate Matching is important in genetics as well as spell checking.
A 3-Approximate Match
A match with one of each of three edit operations is:
P = unescessaraly
T = unnecessarily
Finding such a matching seems like a hard problem because we must figure out where to add the blanks, but we can solve it with dynamic programming.
D[i, j] = the minimum number of differences between P_1 P_2 ... P_i and the segment of T ending at j.
D[i, j] is the minimum of the three possible ways to extend smaller strings:

1. D[i-1, j-1] if P_i = T_j, else D[i-1, j-1] + 1 (match or substitution)
2. D[i-1, j] + 1 (deletion of P_i)
3. D[i, j-1] + 1 (insertion of a character to match T_j)
Once you accept the recurrence it is easy.
To fill each cell, we need only consider three other cells, not O(n) as in other examples. This means we need only store two rows of the table. The total time is O(mn).
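A minimal Python sketch of this recurrence for full-string comparison (the function name is mine; it keeps the whole table for simplicity rather than two rows):

```python
def edit_distance(P, T):
    """D[i][j] = minimum differences between P[:i] and T[:j]; O(mn) time."""
    n, m = len(P), len(T)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        D[i][0] = i          # deleting the first i pattern characters costs i
    for j in range(m + 1):
        D[0][j] = j          # full comparison: unmatched text characters cost too
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = min(
                D[i - 1][j - 1] + (P[i - 1] != T[j - 1]),  # match / substitute
                D[i - 1][j] + 1,                           # delete from pattern
                D[i][j - 1] + 1,                           # insert into pattern
            )
    return D[n][m]
```

On the 3-approximate match example above, edit_distance("unescessaraly", "unnecessarily") is 3: one insertion, one deletion, and one substitution.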
Boundary conditions for string matching
What should the value of D[0,i] be, corresponding to the cost of matching the first i characters of the text with none of the pattern?
It depends. Are we doing string matching in the text or substring matching?
In both cases, D[i,0] = i, since we cannot excuse deleting the first i characters of the pattern without cost.
What do we return?
If we want the cost of comparing all of the pattern against all of the text, such as comparing the spelling of two words, all we are interested in is D[n,m].
But what if we want the cheapest match between the pattern anywhere in the text? Assuming the initialization for substring matching, we seek the cheapest matching of the full pattern ending anywhere in the text. This means the cost equals min over 1 <= j <= m of D[n, j].
This only gives the cost of the optimal matching. The actual alignment - what got matched, substituted, and deleted - can be reconstructed from the pattern/text and table without an auxiliary storage, once we have identified the cell with the lowest cost.
How much space do we need?
We do not need to keep all O(mn) cells: if we evaluate the recurrence filling in the columns of the matrix from left to right, we never need more than two columns of cells at a time. Thus O(m) space is sufficient to evaluate the recurrence without changing the time complexity at all.
Unfortunately, because we won't have the full matrix we cannot reconstruct the alignment, as above.
Saving space in dynamic programming is very important. Since memory on any computer is limited, O(nm) space is more of a bottleneck than O(nm) time.
Fortunately, there is a clever divide-and-conquer algorithm which computes the actual alignment in O(nm) time and O(m) space.
16.3-5 Give an algorithm to find the longest monotonically increasing subsequence in a sequence of n numbers.
Ask yourself what would you like to know about the first n-1 elements to tell you the answer for the entire sequence?
Let s_i be the length of the longest increasing subsequence ending with the ith element:
sequence | 5 | 2 | 8 | 7 | 3 | 1 | 6 | 4 |
s_i | 1 | 1 | 2 | 2 | 2 | 1 | 3 | 3 |
To find the longest subsequence, we know it ends somewhere, so Length = max over 1 <= i <= n of s_i.
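The whole computation fits in a few lines of Python (a sketch; the function name is mine):

```python
def longest_increasing_lengths(seq):
    """s[i] = length of the longest increasing subsequence ending at i; O(n^2)."""
    s = [1] * len(seq)
    for i in range(len(seq)):
        for j in range(i):               # any smaller earlier element can precede i
            if seq[j] < seq[i] and s[j] + 1 > s[i]:
                s[i] = s[j] + 1
    return s
```

Running it on the table's sequence [5, 2, 8, 7, 3, 1, 6, 4] reproduces s = [1, 1, 2, 2, 2, 1, 3, 3], so the answer is max(s) = 3.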
The Principle of Optimality
To use dynamic programming, the problem must observe the principle of optimality: whatever the initial state is, the remaining decisions must be optimal with regard to the state following from the first decision.
Combinatorial problems may have this property but may use too much memory/time to be efficient.
Example: The Traveling Salesman Problem
Here a partial solution can be any subset of the cities instead of any subinterval - hence the exponential number of subproblems.
Still, with other ideas (some type of pruning or best-first search) it can be effective for combinatorial search.
When can you use Dynamic Programming?
Dynamic programming computes recurrences efficiently by storing partial results. Thus dynamic programming can only be efficient when there are not too many partial results to compute!
There are n! permutations of an n-element set, so we cannot use dynamic programming to store the best solution for each subpermutation. There are 2^n subsets of an n-element set, so we cannot use dynamic programming to store the best solution for each.
However, there are only n(n-1)/2 contiguous substrings of a string, each described by a starting and ending point, so we can use it for string problems.
There are only n(n-1)/2 possible subtrees of a binary search tree, each described by a maximum and minimum key, so we can use it for optimizing binary search trees.
Dynamic programming works best on objects which are linearly ordered and cannot be rearranged - characters in a string, matrices in a chain, points around the boundary of a polygon, the left-to-right order of leaves in a search tree.
Whenever your objects are ordered in a left-to-right way, you should smell dynamic programming!
Minimum Length Triangulation
A triangulation of a polygon is a set of non-intersecting diagonals which partitions the polygon into triangles.
We seek to find the minimum length triangulation. For a convex polygon, or part thereof:
Evaluation proceeds as in the matrix multiplication example - there are O(n^2) values to compute, each of which takes O(j-i) time if we evaluate the sections in order of increasing size.
Dynamic Programming and High Density Bar Codes
Symbol Technology has developed a new design for bar codes, PDF-417, which has a capacity of several hundred bytes. What is the best way to encode text for this design?
Originally, Symbol used a greedy algorithm to encode a string, making local decisions only. We realized that for any prefix, you want to maintain the optimal encoding ending in each possible mode.
Our simple dynamic programming algorithm improved the capacity of PDF-417 by a significant margin on average!
Dynamic Programming and Morphing
Morphing is the problem of creating a smooth series of intermediate images given a starting and ending image.
The key problem is establishing a correspondence between features in the two images. You want to morph an eye to an eye, not an ear to an ear.
We can do this matching on a line-by-line basis:
This algorithm was incorporated into a morphing system, with the following results:
Problem Solving Techniques
Most important: make sure you understand exactly what the question is asking - if not, you have no hope of answering it!
Never be afraid to ask for another explanation of a problem until it is clear.
Play around with the problem by constructing examples to get insight into it.
Ask yourself questions. Does the first idea which comes into my head work? If not, why not?
Am I using all information that I am given about the problem?
Read Polya's book How to Solve It.
16-1: The Euclidean traveling-salesman problem is the problem of determining the shortest closed tour that connects a given set of n points in the plane.
Bentley suggested simplifying the problem by restricting attention to bitonic tours, that is, tours which start at the leftmost point, go strictly left to right to the rightmost point, and then go strictly right to left back to the starting point.
First of all, play with the problem. Why isn't it trivial?
Why will they let us assume that no two x-coordinates are the same? What does the hint mean? What happens if I scan from left to right?
If we scan from left to right, we get an open tour which uses all points to the left of our scan line.
Hey, I have got a recurrence! And look, the two parameters which describe my optimal tour are the two endpoints.
Let c[k, n] be the cost of the optimal partial tour whose two endpoints are k and n, with k < n.

c[k, n] = c[k, n-1] + d(n-1, n)   (when k < n-1)
c[n-1, n] = min over k < n-1 of [ c[k, n-1] + d(k, n) ]
c[0, 1] = d(0, 1)
This doesn't quite give the tour, just an open tour. We simply must figure out where the last edge to n must go.
Graphs
A graph G consists of a set of vertices V together with a set E of vertex pairs or edges.
Graphs are important because any binary relation is a graph, so graphs can be used to represent essentially any relationship.
Example: A network of roads, with cities as vertices and roads between cities as edges.
The Friendship Graph
Consider a graph where the vertices are people, and there is an edge between two people if and only if they are friends.
What questions might we ask about the friendship graph?
A graph is undirected if (x,y) implies (y,x). Otherwise the graph is directed. The ``heard-of'' graph is directed since countless famous people have never heard of me! The ``had-sex-with'' graph is presumably undirected, since it requires a partner.
An edge of the form (x,x) is said to be a loop. If x is y's friend several times over, that could be modeled using multiedges, multiple edges between the same pair of vertices. A graph is said to be simple if it contains no loops and no multiple edges.
A path is a sequence of edges connecting two vertices. Since Mel Brooks is my father's-sister's-husband's cousin, there is a path between me and him!
If I were trying to impress you with how tight I am with Mel Brooks, I would be much better off saying that Uncle Lenny knows him than to go into the details of how connected I am to Uncle Lenny. Thus we are often interested in the shortest path between two nodes.
A graph is connected if there is a path between any two vertices. A directed graph is strongly connected if there is a directed path between any two vertices.
The degree of a vertex is the number of edges adjacent to it.
A social clique is a group of mutual friends who all hang around together. A graph theoretic clique is a complete subgraph, where each vertex pair has an edge between them. Cliques are the densest possible subgraphs. Within the friendship graph, we would expect that large cliques correspond to workplaces, neighborhoods, religious organizations, schools, and the like.
A cycle is a path where the last vertex is adjacent to the first. A cycle in which no vertex repeats (such as 1-2-3-1 versus 1-2-3-2-1) is said to be simple. The shortest cycle in the graph defines its girth, while a simple cycle which passes through each vertex is said to be a Hamiltonian cycle.
Data Structures for Graphs
There are two main data structures used to represent graphs.
Adjacency Matrices
Can we save space if (1) the graph is undirected? (2) if the graph is sparse?
Adjacency Lists
Note that m can be much less than n^2 when the graph is sparse. If necessary, the two copies of each edge can be linked by a pointer to facilitate deletions.
Tradeoffs Between Adjacency Lists and Adjacency Matrices
Comparison | Winner |
Faster to test if (x, y) exists? | matrices |
Faster to find vertex degree? | lists |
Less memory on small graphs? | lists (m+n) vs. (n^2) |
Less memory on big graphs? | matrices (small win) |
Edge insertion or deletion? | matrices O(1) vs. O(d) |
Faster to traverse the graph? | lists (m+n) vs. (n^2) |
Better for most problems? | lists |
Both representations are very useful and have different properties, although adjacency lists are probably better for most problems.
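To make the tradeoffs concrete, here is one small undirected graph stored both ways (a Python sketch; the variable names are my own):

```python
# The same 4-vertex undirected graph in both representations.
n = 4
edge_pairs = [(0, 1), (0, 2), (1, 2), (2, 3)]

# Adjacency matrix: O(n^2) space, O(1) edge test.
matrix = [[0] * n for _ in range(n)]
for x, y in edge_pairs:
    matrix[x][y] = matrix[y][x] = 1     # store each undirected edge twice

# Adjacency list: O(n + m) space; degree of v is just len(adj[v]).
adj = [[] for _ in range(n)]
for x, y in edge_pairs:
    adj[x].append(y)
    adj[y].append(x)
```

Testing whether (0, 3) is an edge is one array lookup in the matrix, but a scan of adj[0] in the list representation.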
Traversing a Graph
One of the most fundamental graph problems is to traverse every edge and vertex in a graph. Applications include:
For efficiency, we must make sure we visit each edge at most twice.
For correctness, we must do the traversal in a systematic way so that we don't miss anything.
Since a maze is just a graph, such an algorithm must be powerful enough to enable us to get out of an arbitrary maze.
Marking Vertices
The idea in graph traversal is that we must mark each vertex when we first visit it, and keep track of what we have not yet completely explored.
For each vertex, we can maintain two flags:
We must also maintain a structure containing all the vertices we have discovered but not yet completely explored.
Initially, only a single start vertex is considered to be discovered.
To completely explore a vertex, we look at each edge going out of it. For each edge which goes to an undiscovered vertex, we mark it discovered and add it to the list of work to do.
Note that regardless of the order in which we fetch the next vertex to explore, each edge is considered exactly twice, once when each of its endpoints is explored.
Correctness of Graph Traversal
Every edge and vertex in the connected component is eventually visited.
Suppose not, i.e., suppose there exists an unvisited vertex whose neighbor was visited. This neighbor will eventually be explored, at which point we would visit the vertex:
Traversal Orders
The order we explore the vertices depends upon what kind of data structure is used:
The three possible colors of each node reflect if it is unvisited (white), visited but unexplored (grey) or completely explored (black).
Breadth-First Search
BFS(G,s)
for each vertex u in V[G] - {s} do
color[u] = white
d[u] = infinity, ie. the distance from s
p[u] = NIL, ie. the parent in the BFS tree
color[s] = grey
d[s] = 0
p[s] = NIL
Q = {s}
while Q is nonempty do
u = head[Q]
for each v in Adj[u] do
if color[v] = white then
color[v] = grey
d[v] = d[u] + 1
p[v] = u
enqueue[Q,v]
dequeue[Q]
color[u] = black
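The pseudocode above translates directly into Python, assuming the graph is given as adjacency lists indexed 0..n-1 (a sketch; names are mine):

```python
from collections import deque

def bfs(adj, s):
    """Breadth-first search from s; returns distance and parent arrays."""
    n = len(adj)
    color = ["white"] * n            # white = undiscovered
    dist = [None] * n                # None stands in for infinity
    parent = [None] * n
    color[s], dist[s] = "grey", 0
    Q = deque([s])
    while Q:
        u = Q.popleft()
        for v in adj[u]:
            if color[v] == "white":  # discovered for the first time
                color[v] = "grey"
                dist[v] = dist[u] + 1
                parent[v] = u
                Q.append(v)
        color[u] = "black"           # u is now completely explored
    return dist, parent
```

On the 4-cycle with adjacency lists [[1, 2], [0, 3], [0, 3], [1, 2]], starting from 0, the distances come out [0, 1, 1, 2].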
Depth-First Search
DFS has a neat recursive implementation which eliminates the need to explicitly use a stack.
Discovery and final times are sometimes a convenience to maintain.
DFS(G)
for each vertex u in V[G] do
color[u] = white
parent[u] = nil
time = 0
for each vertex u in V[G] do
if color[u] = white then DFS-VISIT(u)
Initialize each vertex in the main routine, then do a search from a vertex in each connected component. BFS must also start from a vertex in each component to completely visit the graph.
DFS-VISIT[u]
color[u] = grey (*u had been white/undiscovered*)
discover[u] = time
time = time+1
for each v in Adj[u] do
if color[v] = white then
parent[v] = u
DFS-VISIT(v)
color[u] = black (*now finished with u*)
finish[u] = time
time = time+1
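A Python sketch of DFS with discovery and finish times, again over adjacency lists (the names are mine):

```python
def dfs(adj):
    """DFS over the whole graph, recording discovery/finish times and parents."""
    n = len(adj)
    color = ["white"] * n
    parent = [None] * n
    discover, finish = [0] * n, [0] * n
    time = [0]                           # boxed counter shared across calls

    def visit(u):
        color[u] = "grey"                # u had been white/undiscovered
        discover[u] = time[0]; time[0] += 1
        for v in adj[u]:
            if color[v] == "white":
                parent[v] = u
                visit(v)
        color[u] = "black"               # now finished with u
        finish[u] = time[0]; time[0] += 1

    for u in range(n):                   # restart in each component
        if color[u] == "white":
            visit(u)
    return discover, finish, parent
```

On the directed path 0 -> 1 -> 2, vertex 0 is discovered first but finishes last.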
23.1-5 - The square of a directed graph G=(V,E) is the graph G^2=(V,E^2) such that (u,w) is in E^2 iff for some v in V, both (u,v) and (v,w) are in E; ie. there is a path of exactly two edges from u to w.
Give efficient algorithms for both adjacency lists and matrices.
With an adjacency matrix, since there are at most n intermediate vertices to check, and n^2 pairs of vertices to ask about, this takes O(n^3) time.
With adjacency lists, we have a list of all the edges in the graph. For a given edge (u,v), we can run through all the edges from v in O(n) time, and fill the results into an adjacency matrix of G^2, which is initially empty.
It takes O(mn) to construct the edges, and O(n^2) to initialize and read the adjacency matrix, a total of O((n+m)n). Since m >= n-1 when the graph is connected, this is usually simplified to O(mn), and is faster than the previous algorithm on sparse graphs.
Why is it called the square of a graph? Because the square of the adjacency matrix is the adjacency matrix of the square! This provides a theoretically faster algorithm.
BFS Trees
If BFS is performed on a connected, undirected graph, a tree is defined by the edges involved with the discovery of new nodes:
The proof is by induction on the length of the shortest path from the root:
The key idea about DFS
A depth-first search of a graph organizes the edges of the graph in a precise way.
In a DFS of an undirected graph, we assign a direction to each edge, from the vertex which discovers it first:
In a DFS of a directed graph, no cross edge goes to a higher numbered or rightward vertex. Thus, no edge from 4 to 5 is possible:
Edge Classification for DFS
What about the other edges in the graph? Where can they go on a search?
Every edge is either:
DFS Trees
The reason DFS is so important is that it defines a very nice ordering to the edges of the graph.
In a DFS of an undirected graph, every edge is either a tree edge or a back edge.
Why? Suppose we have a forward edge. We would have encountered (4,1) when expanding 4, so this is a back edge.
Paths in search trees
Where is the shortest path in a DFS?
DFS gives a better approximation of the longest path than BFS.
Topological Sorting
A directed, acyclic graph is a directed graph with no directed cycles.
Only a DAG can have a topological sort.
Applications of Topological Sorting
Topological sorting is often useful in scheduling jobs in their proper sequence. In general, we can use it to order things given constraints, such as a set of left-right constraints on the positions of objects.
Example: Dressing schedule from CLR.
Example: Identifying errors in DNA fragment assembly.
Certain fragments are constrained to be to the left or right of other fragments, unless there are errors.
A DFS can test if a graph is a DAG (it is one iff there are no back edges - forward edges are allowed in a DFS of a directed graph).
Algorithm
Theorem: Arranging vertices in decreasing order of DFS finishing time gives a topological sort of a DAG.
Proof: Consider any directed edge (u,v), and consider the moment we encounter it during the exploration of vertex u. The vertex v cannot be grey, since then (u,v) would be a back edge and the graph would not be a DAG. If v is white, it becomes a descendant of u and thus finishes before u; if v is black, it has already finished. Either way finish[v] < finish[u], so u appears before v in decreasing order of finishing time.
Thus we can do topological sorting in O(n+m) time.
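Since decreasing finish time gives the sort, a DFS only needs to append each vertex as it finishes and reverse the list at the end. A Python sketch, assuming the input really is a DAG (the names are mine):

```python
def topological_sort(adj):
    """Order vertices by decreasing DFS finish time; adj must describe a DAG."""
    n = len(adj)
    visited = [False] * n
    order = []

    def visit(u):
        visited[u] = True
        for v in adj[u]:
            if not visited[v]:
                visit(v)
        order.append(u)          # appended exactly at u's finish time

    for u in range(n):
        if not visited[u]:
            visit(u)
    return order[::-1]           # decreasing finish time
```

For the DAG with edges 0->1, 0->2, 1->3, 2->3, this yields the valid order [0, 2, 1, 3].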
Articulation Vertices
Suppose you are a terrorist, seeking to disrupt the telephone network. Which station do you blow up?
Clearly connectivity is an important concern in the design of any network.
Articulation vertices can be found in O(n(m+n)) - just delete each vertex in turn and do a DFS on the remaining graph to see if it is still connected.
A Faster O(n+m) DFS Algorithm
Theorem: In a DFS tree, a vertex v (other than the root) is an articulation vertex iff v is not a leaf and some subtree of v has no back edge incident upon a proper ancestor of v.
Why? Deleting v must separate a pair of vertices x and y. Because of the other tree edges, this cannot happen unless y is a descendant of v.
Conversely, if these conditions hold, then v is a non-root articulation vertex: v separates any proper ancestor of v from any descendant in the appropriate subtree.
Actually implementing this test in O(n+m) is tricky - but believable once you accept this theorem.
23.2-6 Give an efficient algorithm to test if a graph is bipartite.
We can augment either BFS or DFS: when we first discover a new vertex, color it opposite its parent, and for every other edge, check that it doesn't link two vertices of the same color. The first vertex in any connected component can be red or black!
Bipartite graphs arise in many situations, and special algorithms are often available for them. What is the interpretation of a bipartite ``had-sex-with'' graph?
How would you break people into two groups such that no group contains a pair of people who hate each other?
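A Python sketch of this two-coloring test via BFS (the names are mine):

```python
from collections import deque

def is_bipartite(adj):
    """Two-color each component by BFS; an edge within a color class fails."""
    n = len(adj)
    color = [None] * n
    for s in range(n):                    # handle every connected component
        if color[s] is not None:
            continue
        color[s] = 0                      # first vertex: either color works
        Q = deque([s])
        while Q:
            u = Q.popleft()
            for v in adj[u]:
                if color[v] is None:
                    color[v] = 1 - color[u]   # opposite of its parent
                    Q.append(v)
                elif color[v] == color[u]:    # same-color edge: odd cycle
                    return False
    return True
```

A 4-cycle is bipartite; a triangle is not.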
23.4-3 Give an O(n) algorithm to test whether an undirected graph contains a cycle.
23.4-5 Show that you can topologically sort in O(n+m) by repeatedly deleting vertices of degree 0.
Time: O(n+m), if we maintain the degree of each vertex and a queue of degree-0 vertices.
Strongly Connected Components
A directed graph is strongly connected iff there is a directed path between any two vertices.
The strongly connected components of a graph form a partition of the vertices into maximal subsets such that each subset is strongly connected.
This algorithm takes O(n+m), but why does it compute strongly connected components?
Lemma: If two vertices are in the same strong component, no path between them ever leaves the component.
Proof: Consider the first vertex v in the component to be discovered. Everything in the component is reachable from it, so we will traverse it before finishing with v.
What does DFS(G^T, v) do?
It tells you what vertices have directed paths to v, while DFS(G, v) tells what vertices have directed paths from v. But why must any vertex in the search tree of DFS(G^T, v) also have a path from v?
Example of Strong Components Algorithm
5, 6, 8 can reach 5, oldest remaining is 7.
7 can reach 7, oldest remaining is 1.
1, 2, 3 can reach 1, oldest remaining is 4.
4 can reach 4.
24.2-6 Describe an efficient algorithm that, given an undirected graph G, determines a spanning tree of G whose largest edge weight is minimum over all spanning trees of G.
Certainly! Since Kruskal's algorithm considers the edges in order of increasing weight, and stops the moment these edges form a connected graph, the tree it gives must minimize the largest edge weight.
``Hey, but then why doesn't Prim's algorithm also work?''
It gives the same thing as Kruskal's algorithm, so it must be true that any minimum spanning tree minimizes the maximum edge weight!
Proof: Give me a MST and consider the largest edge weight,
Minimum Spanning Trees
A tree is a connected graph with no cycles. A spanning tree is a subgraph of G which has the same set of vertices of G and is a tree.
A minimum spanning tree of a weighted graph G is the spanning tree of G whose edges sum to minimum weight.
There can be more than one minimum spanning tree in a graph - consider a graph with identical weight edges.
The minimum spanning tree problem has a long history - the first algorithm dates back at least to 1926!
Minimum spanning trees are always taught in algorithm courses since (1) they arise in many applications, (2) they are an important example where greedy algorithms always give the optimal answer, and (3) clever data structures are necessary to make them work efficiently.
In greedy algorithms, we make the decision of what next to do by selecting the best local option from all available choices - without regard to the global structure.
Applications of Minimum Spanning Trees
Minimum spanning trees are useful in constructing networks, by describing the way to connect a set of sites using the smallest total amount of wire. Much of the work on minimum spanning (and related Steiner) trees has been conducted by the phone company.
Minimum spanning trees provide a reasonable way for clustering points in space into natural groups.
When the cities are points in the Euclidean plane, the minimum spanning tree provides a good heuristic for traveling salesman problems. The optimum traveling salesman tour is at most twice the length of the minimum spanning tree.
Prim's Algorithm
If G is connected, every vertex will appear in the minimum spanning tree. If not, we can talk about a minimum spanning forest.
Prim's algorithm starts from one vertex and grows the rest of the tree an edge at a time.
As a greedy algorithm, which edge should we pick? The cheapest edge with which we can grow the tree by one vertex without creating a cycle.
During execution we will label each vertex as either in the tree, fringe - meaning there exists an edge from a tree vertex, or unseen - meaning the vertex is still more than one edge away from the tree.
Select an arbitrary vertex to start.
While (there are fringe vertices)
select minimum weight edge between tree and fringe
add the selected edge and vertex to the tree
Clearly this creates a spanning tree, since no cycle can be introduced via edges between tree and fringe vertices, but is it minimum?
Why is Prim's algorithm correct?
Don't be scared by the proof - the reason is really quite basic:
Theorem: Let G be a connected, weighted graph and let E' be a subset of the edges in a MST T. Let V' be the vertices incident with edges in E'. If (x,y) is an edge of minimum weight such that x is in V' and y is not in V', then E' ∪ {(x,y)} is a subset of a minimum spanning tree.
Proof: If the edge is in T, this is trivial.
Suppose (x,y) is not in T. Then there must be a path in T from x to y, since T is connected. If (v,w) is the first edge on this path with exactly one endpoint in V', deleting it and replacing it with (x,y) gives another spanning tree.
This tree can have no larger weight than T, since W(v,w) >= W(x,y) by the choice of (x,y). Thus it is also a minimum spanning tree containing E' ∪ {(x,y)}.
Thus we cannot go wrong with the greedy strategy the way we could with the traveling salesman problem.
But how fast is Prim's?
That depends on what data structures are used. In the simplest implementation, we can simply mark each vertex as tree and non-tree and search always from scratch:
Select an arbitrary vertex to start.
While (there are non-tree vertices)
select minimum weight edge between tree and fringe
add the selected edge and vertex to the tree
This can be done in O(nm) time, by doing a DFS or BFS to loop through all edges, with a constant-time test per edge, and a total of n iterations.
Can we do faster? If so, we need to be able to identify fringe vertices and the minimum cost edge associated with it, fast. We will augment an adjacency list with fields maintaining fringe information.
Vertex:
Finding the minimum weight fringe-edge takes O(n) time - just bump through fringe list.
After adding a vertex to the tree, running through its adjacency list to update the cost of adding fringe vertices (there may be a cheaper way through the new vertex) can be done in O(n) time.
Total time is O(n^2).
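A sketch of this simple O(n^2) implementation in Python, using a weight matrix with None marking absent edges (the names are mine):

```python
def prim(n, weight):
    """O(n^2) Prim: weight[u][v] is the edge weight, or None if absent."""
    INF = float("inf")
    in_tree = [False] * n
    cost = [INF] * n        # cheapest known edge from the tree to each vertex
    parent = [None] * n
    cost[0] = 0             # arbitrary start vertex
    total = 0
    for _ in range(n):
        # grab the cheapest fringe vertex
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: cost[v])
        in_tree[u] = True
        total += cost[u]
        for v in range(n):  # maybe u offers a cheaper way into the tree
            w = weight[u][v]
            if w is not None and not in_tree[v] and w < cost[v]:
                cost[v] = w
                parent[v] = u
    return total, parent
```

On a triangle with weights 1, 2, 3, it keeps the two cheapest edges, for a tree of total weight 3.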
Kruskal's Algorithm
Since an easy lower bound argument shows that every edge must be looked at to find the minimum spanning tree, and the number of edges m = O(n^2), Prim's algorithm is optimal in the worst case. Is that all she wrote?
The complexity of Prim's algorithm is independent of the number of edges. Can we do better with sparse graphs? Yes!
Kruskal's algorithm is also greedy. It repeatedly adds the smallest edge to the spanning tree that does not create a cycle. Obviously, this gives a spanning tree, but is it minimal?
Why is Kruskal's algorithm correct?
Theorem: Let G be a weighted graph and let E' be a subset of E. If E' is contained in a MST T and e is the smallest edge in E - E' which does not create a cycle when added to E', then E' ∪ {e} is contained in a MST.
Proof: As before, suppose e is not in T. Adding e to T makes a cycle. Deleting another edge from this cycle leaves a connected graph, and if it is one from E-E' the cost of this tree goes down. Since such an edge exists, T could not be a MST.
How fast is Kruskal's algorithm?
What is the simplest implementation?
The total time is O(mn), but can we do better?
Kruskal's algorithm builds up connected components. Any edge where both vertices are in the same connected component creates a cycle. Thus if we can maintain which vertices are in which component quickly, we do not have to test for cycles!
Put the edges in a heap
count=0
while (count < n-1) do
get next edge (v,w) from the heap
if (component(v) ≠ component(w))
add (v,w) to T
component(v)=component(w)
count=count+1
If we can test components in O(log n), we can find the MST in O(m log m)!
Question: Is O(m log m) better than O(n^2)?
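A Python sketch of Kruskal's algorithm; sorting stands in for the heap, and a bare-bones union-find (developed properly in the next section) tracks the components. All names are mine:

```python
def kruskal(n, edges):
    """edges: list of (weight, u, v) tuples. Sort by weight, then add each
    edge that joins two different components."""
    parent = list(range(n))     # simplest union-find: parent pointers

    def find(x):                # walk up to the component's root
        while parent[x] != x:
            x = parent[x]
        return x

    mst, total = [], 0
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:            # different components: no cycle created
            parent[ru] = rv     # merge the two components
            mst.append((u, v))
            total += w
    return total, mst
```

On 4 vertices with edges of weight 1 (0-1), 2 (1-2), 3 (0-2), and 4 (2-3), the weight-3 edge closes a cycle and is skipped, giving a tree of total weight 7.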
Union-Find Programs
Our analysis that Kruskal's MST algorithm is O(m log m) requires a fast way to test whether an edge links two vertices in the same connected component.
Thus we need a data structure for maintaining sets which can test if two elements are in the same set and merge two sets together. These can be implemented with UNION and FIND operations:
Is-same-component(v1, v2)
t = Find(v1)
u = Find(v2)
Return (Is t=u?)

Make-same-component(v1, v2)
t = Find(v1)
u = Find(v2)
Union(t, u)
Find returns the name of the set and Union sets the members of t to have the same name as u.
We are interested in minimizing the time it takes to execute any sequence of unions and finds.
A simple implementation is to represent each set as a tree, with pointers from a node to its parent. Each element is contained in a node, and the name of the set is the key at the root:
For i = 1 to n/2 do
UNION(i,i+1)
For i = 1 to n/2 do
FIND(1)
We want to limit the height of the trees, which are affected by UNIONs. When we union, we can make the tree with fewer nodes the child.
Since the number of nodes is related to the height, the height of the final tree will increase only if both subtrees are of equal height!
Lemma: If Union(t,v) attaches the root of v as a subtree of t only when the number of nodes in t is greater than or equal to the number in v, then after any sequence of unions, any tree with n nodes has height at most floor(lg n).
Proof: By induction on the number of nodes k, k=1 has height 0.
Assume it is true for trees of up to k-1 nodes, and suppose a union combines trees t1 (with k1 nodes) and t2 (with k2 <= k1 nodes) into a tree of k = k1 + k2 nodes, making t2 the child.
If the height does not increase, the bound holds by induction. The height increases only when height(t2) + 1 > height(t1), and then the new height is height(t2) + 1 <= floor(lg k2) + 1 <= floor(lg k), since k2 <= k/2.
Can we do better?
We can do unions and finds in O(log n), good enough for Kruskal's algorithm. But can we do better?
The ideal Union-Find tree has depth 1:
O(n)? Not quite ... A difficult analysis shows that a sequence of n operations takes O(n α(n)) time, where α(n) is the inverse Ackermann function and α(number of atoms in the universe) = 5.
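A Python sketch of such a structure, combining union-by-size with path compression (the names are mine):

```python
def make_union_find(n):
    """Union by size with path compression; amortized near-O(1) per operation."""
    parent = list(range(n))
    size = [1] * n

    def find(x):
        root = x
        while parent[root] != root:      # locate the root
            root = parent[root]
        while parent[x] != root:         # compress: point the path at the root
            parent[x], x = root, parent[x]
        return root

    def union(t, u):
        rt, ru = find(t), find(u)
        if rt == ru:
            return
        if size[rt] < size[ru]:          # smaller tree becomes the child
            rt, ru = ru, rt
        parent[ru] = rt
        size[rt] += size[ru]

    return find, union
```

After compression, repeated FINDs on the same elements flatten toward the ideal depth-1 tree pictured above.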
25.1-1 Give two more shortest path trees for the following graph:
There are two choices for how to get to the third vertex x, both of which cost 5.
There are two choices for how to get to vertex v, both of which cost 9.
Lessons from the Backtracking contest
Winning Optimizations
Shortest Paths
Finding the shortest path between two nodes in a graph arises in many different applications:
Shortest Paths and Sentence Disambiguation
In our work on reconstructing text typed on an (overloaded) telephone keypad, we had to select which of many possible interpretations was most likely.
The final system worked extremely well - identifying over 99% of characters correctly based on grammatical and statistical constraints.
Dynamic programming (the Viterbi algorithm) can be used on the sentences to obtain the same results, by finding the shortest paths in the underlying DAG.
Finding Shortest Paths
In an unweighted graph, the cost of a path is just the number of edges on the shortest path, which can be found in O(n+m) time via breadth-first search.
In a weighted graph, the weight of a path between two vertices is the sum of the weights of the edges on a path.
BFS will not work on weighted graphs because sometimes visiting more edges can lead to shorter distance, ie. 1+1+1+1+1+1+1 < 10.
Note that there can be an exponential number of shortest paths between two nodes - so we cannot report all shortest paths efficiently.
Note that negative cost cycles render the problem of finding the shortest path meaningless, since you can always loop around the negative cost cycle more to reduce the cost of the path.
Thus in our discussions, we will assume that all edge weights are positive. Other algorithms deal correctly with negative cost edges.
Minimum spanning trees are unaffected by negative cost edges.
Dijkstra's Algorithm
We can use Dijkstra's algorithm to find the shortest path between any two vertices s and t in G.
The principle behind Dijkstra's algorithm is that if s, ..., x, ..., t is the shortest path from s to t, then s, ..., x had better be the shortest path from s to x.
This suggests a dynamic programming-like strategy, where we store the distance from s to all nearby nodes, and use them to find the shortest path to more distant nodes.
The shortest path from s to s is d(s,s)=0. If all edge weights are positive, the smallest edge incident to s, say (s,x), defines d(s,x).
We can use an array to store the length of the shortest path to each node. Initialize each entry to ∞ to start.
As soon as we establish the shortest path from s to a new node x, we go through each of its incident edges to see if there is a better way from s to other nodes through x.
known = {s}
for i=1 to n, dist[i] = ∞
for each edge (s,v), dist[v] = d(s,v)
last = s
while (last ≠ t)
    select vnew, the unknown vertex minimizing dist[vnew]
    for each edge (vnew,x), dist[x] = min( dist[x], dist[vnew] + w(vnew,x) )
    last = vnew
    known = known ∪ {vnew}
Complexity: O(n²) if we use adjacency lists and a Boolean array to mark which vertices are known.
This is essentially the same as Prim's algorithm.
An O(m log n) implementation of Dijkstra's algorithm would be faster for sparse graphs, and comes from using a heap of the vertices (ordered by distance), and updating the distance to each vertex (if necessary) in O(log n) time for each edge out from freshly known vertices.
Even better, O(n log n + m) follows from using Fibonacci heaps, since they permit one to do a decrease-key operation in O(1) amortized time.
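The heap-based version is short to code. Below is a sketch in Python; the function name and the dictionary-of-adjacency-lists representation are my own choices, not fixed by the notes:

```python
import heapq

def dijkstra(graph, s):
    """Single-source shortest paths with non-negative edge weights,
    using a binary heap: O(m log n).  graph[v] is a list of (u, w) pairs."""
    dist = {v: float('inf') for v in graph}
    dist[s] = 0
    heap = [(0, s)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist[v]:          # stale heap entry; v already finalized
            continue
        for x, w in graph[v]:
            if d + w < dist[x]:  # relax edge (v, x)
                dist[x] = d + w
                heapq.heappush(heap, (dist[x], x))
    return dist
```

Rather than a true decrease-key, this sketch pushes duplicate entries and skips stale ones on extraction, which is the usual idiom with Python's `heapq`.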
All-Pairs Shortest Path
Notice that finding the shortest path between a pair of vertices (s,t) in worst case requires first finding the shortest path from s to all other vertices in the graph.
Many applications, such as finding the center or diameter of a graph, require finding the shortest path between all pairs of vertices.
We can run Dijkstra's algorithm n times (once from each possible start vertex) to solve the all-pairs shortest path problem in O(n³). Can we do better?
Improving the complexity is an open question, but there is a super-slick dynamic programming algorithm which also runs in O(n³).
Dynamic Programming and Shortest Paths
The four-step approach to dynamic programming is:
1. Characterize the structure of an optimal solution.
2. Recursively define the value of an optimal solution.
3. Compute this value in a bottom-up fashion.
4. Construct an optimal solution from the computed information.
From the adjacency matrix, we can construct the following matrix:
D[i,j] = ∞, if i ≠ j and (i,j) is not in E
D[i,j] = w(i,j), if (i,j) ∈ E
D[i,j] = 0, if i = j
This tells us the shortest path going through no intermediate nodes.
There are several ways to characterize the shortest path between two nodes in a graph. Note that the shortest path from i to j using at most M edges consists of the shortest path from i to k using at most M-1 edges, plus the edge weight w(k,j), for some k.
This suggests that we can compute all-pair shortest path with an induction based on the number of edges in the optimal path.
Let d[i,j]^m be the length of the shortest path from i to j using at most m edges.
What is d[i,j]^1? d[i,j]^1 = w[i,j].
What if we know d[i,j]^(m-1) for all i,j?
d[i,j]^m = min( d[i,j]^(m-1), min over k of ( d[i,k]^(m-1) + w[k,j] ) ) = min over k of ( d[i,k]^(m-1) + w[k,j] ), since w[k,k] = 0
This gives us a recurrence, which we can evaluate in a bottom up fashion:
for i=1 to n
    for j=1 to n
        for k=1 to n
            d[i,j]^m = Min( d[i,j]^m, d[i,k]^(m-1) + w[k,j] )
This is an O(n³) algorithm just like matrix multiplication, but it only goes from paths of at most m edges to paths of at most m+1 edges.
Since the shortest path between any two nodes must use at most n edges (unless we have negative cost cycles), we must repeat that procedure n times (m = 1 to n) for an O(n⁴) algorithm.
We can improve this to O(n³ log n) with the observation that any path using at most 2m edges is the composition of two paths using at most m edges each. This is just like computing a^(2m) = a^m × a^m, so a logarithmic number of multiplications suffices for exponentiation.
Although this is slick, observe that even O(n³ log n) is slower than running Dijkstra's algorithm starting from each vertex!
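The doubling idea is easy to code regardless. A minimal Python sketch, assuming the matrix convention above (0 on the diagonal, ∞ for missing edges); the function names are my own:

```python
INF = float('inf')

def min_plus(A, B):
    """One 'multiplication' in the (min, +) semiring: O(n^3)."""
    n = len(A)
    return [[min(A[i][k] + B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def all_pairs_by_squaring(W):
    """Repeatedly square the weight matrix: O(n^3 log n) overall.
    After squaring, D holds shortest paths using at most 2m edges."""
    D, m, n = W, 1, len(W)
    while m < n - 1:          # paths need at most n-1 edges
        D = min_plus(D, D)
        m *= 2
    return D
```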
The Floyd-Warshall Algorithm
An alternate recurrence yields a more efficient dynamic programming formulation. Number the vertices from 1 to n.
Let d[i,j]^k be the shortest path from i to j using only vertices from 1, 2, ..., k as possible intermediate vertices.
What is d[i,j]^0? With no intermediate vertices, any path consists of at most one edge, so d[i,j]^0 = w[i,j].
In general, adding a new vertex k+1 helps iff a path goes through it, so
d[i,j]^(k+1) = min( d[i,j]^k, d[i,k+1]^k + d[k+1,j]^k )
Although this looks similar to the previous recurrence, it isn't. The following algorithm implements it:
for k=1 to n
    for i=1 to n
        for j=1 to n
            d[i,j] = min( d[i,j], d[i,k] + d[k,j] )
This obviously runs in O(n³) time, which asymptotically is no better than n calls to Dijkstra's algorithm. However, the loops are so tight, and the program so short and simple, that it runs better in practice by a constant factor.
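The triple loop translates almost verbatim into Python. A minimal sketch, assuming the same matrix convention (0 on the diagonal, ∞ for missing edges):

```python
def floyd_warshall(W):
    """All-pairs shortest paths in O(n^3).  W[i][j] is the edge weight,
    0 on the diagonal, float('inf') where no edge exists."""
    n = len(W)
    d = [row[:] for row in W]      # work on a copy of the weight matrix
    for k in range(n):             # allow vertex k as an intermediate
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d
```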
The Theory of NP-Completeness
Several times this semester we have encountered problems for which we couldn't find efficient algorithms, such as the traveling salesman problem. We also couldn't prove an exponential time lower bound for the problem.
By the early 1970s, literally hundreds of problems were stuck in this limbo. The theory of NP-Completeness, developed by Stephen Cook and Richard Karp, provided the tools to show that all of these problems were really the same problem.
Polynomial vs. Exponential Time
n | f(n) = n | f(n) = n² | f(n) = 2ⁿ | f(n) = n! |
10 | 0.01 μs | 0.1 μs | 1 μs | 3.63 ms |
20 | 0.02 μs | 0.4 μs | 1 ms | 77.1 years |
30 | 0.03 μs | 0.9 μs | 1 sec | 8.4 × 10¹⁵ years |
40 | 0.04 μs | 1.6 μs | 18.3 min | |
50 | 0.05 μs | 2.5 μs | 13 days | |
100 | 0.1 μs | 10 μs | 4 × 10¹³ years | |
1,000 | 1.00 μs | 1 ms | | |
The Main Idea
Suppose I gave you the following algorithm to solve the bandersnatch problem:
Bandersnatch(G)
Convert G to an instance of the Bo-billy problem Y.
Call the subroutine Bo-billy on Y to solve this instance.
Return the answer of Bo-billy(Y) as the answer to G.
Such a translation from instances of one type of problem to instances of another type such that answers are preserved is called a reduction.
Now suppose my reduction translates G to Y in O(P(n)) time. Then:
1. If my Bo-billy subroutine runs in O(P'(n)), I can solve Bandersnatch in O(P(n) + P'(P(n))) time.
2. If I know that Ω(P'(n)) is a lower bound on computing Bandersnatch, then Ω(P'(n)) must (up to the cost of the translation) be a lower bound on computing Bo-billy - otherwise the reduction above would beat the Bandersnatch lower bound.
The second argument is the idea we use to prove problems hard!
Convex Hull and Sorting
A nice example of a reduction goes from sorting numbers to the convex hull problem: translate each number x to the point (x, x²), which lies on the parabola y = x².
Since this parabola is convex, every point is on the convex hull. Further, since neighboring points on the convex hull have neighboring x values, the convex hull returns the points sorted by x-coordinate, i.e. the original numbers.
Sort(S)
For each x ∈ S, create point (x, x²).
Call subroutine convex-hull on this point set.
From the leftmost point in the hull,
read off the points from left to right.
Creating and reading off the points takes O(n) time.
What does this mean? Recall the sorting lower bound of Ω(n log n). If we could do convex hull in better than n log n, we could sort faster than Ω(n log n) - which violates our lower bound.
Thus convex hull must take Ω(n log n) as well!!!
Observe that any convex hull algorithm also gives us a complicated but correct sorting algorithm as well.
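The reduction can be carried out concretely. In this sketch the hull subroutine is Andrew's monotone chain, one standard O(n log n) method (any correct convex hull algorithm would do); the input numbers are assumed distinct:

```python
def convex_hull(points):
    """Andrew's monotone chain; returns the hull in counterclockwise order."""
    pts = sorted(points)
    def chain(seq):
        h = []
        for p in seq:
            # pop while the last two points and p fail to make a left turn
            while len(h) >= 2 and \
                  (h[-1][0]-h[-2][0])*(p[1]-h[-2][1]) - \
                  (h[-1][1]-h[-2][1])*(p[0]-h[-2][0]) <= 0:
                h.pop()
            h.append(p)
        return h
    lower, upper = chain(pts), chain(reversed(pts))
    return lower[:-1] + upper[:-1]

def sort_via_hull(S):
    """Reduce sorting to convex hull: lift each x to (x, x^2) on a parabola,
    then read the hull off from the leftmost point."""
    hull = convex_hull([(x, x * x) for x in S])
    start = hull.index(min(hull))              # leftmost point of the hull
    ordered = hull[start:] + hull[:start]
    return [x for x, _ in ordered]
```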
What is a problem?
A problem is a general question, with parameters for the input and conditions on what is a satisfactory answer or solution.
An instance is a problem with the input parameters specified.
Example: The Traveling Salesman
Problem: Given a weighted graph G, what tour minimizes the total weight of the edges traversed?
Instance: a particular weighted graph, with all of its edge weights specified.
A problem with answers restricted to yes and no is called a decision problem. Most interesting optimization problems can be phrased as decision problems which capture the essence of the computation.
Example: The Traveling Salesman Decision Problem.
Given a weighted graph G and integer k, does there exist a traveling salesman tour with cost ≤ k?
Using binary search and the decision version of the problem we can find the optimal TSP solution.
For convenience, from now on we will talk only about decision problems.
Note that there are many possible ways to encode the input graph: adjacency matrices, edge lists, etc. All reasonable encodings will be within polynomial size of each other.
The fact that we can ignore minor differences in encoding is important. We are concerned with the difference between algorithms which are polynomial and exponential in the size of the input.
Satisfiability
Consider the following logic problem:
Instance: A set V of variables and a set of clauses C over V.
Question: Does there exist a satisfying truth assignment for C?
Example 1: V = {v1, v2} and C = { {v1, ¬v2}, {¬v1, v2} }.
A clause is satisfied when at least one literal in it is TRUE. C is satisfied when every clause in it is TRUE - here, setting v1 = v2 = TRUE works.
Example 2: V = {v1, v2} and C = { {v1, v2}, {v1, ¬v2}, {¬v1} }.
Although you try, and you try, and you try and you try, you can get no satisfaction.
There is no satisfying assignment, since v1 must be FALSE (third clause), so v2 must be FALSE (second clause), but then the first clause is unsatisfiable!
For various reasons, it is known that satisfiability is a hard problem. Every top-notch algorithm expert in the world (and countless other, lesser lights) have tried to come up with a fast algorithm to test whether a given set of clauses is satisfiable, but all have failed.
Further, many strange and impossible-to-believe things have been shown to be true if someone in fact did find a fast satisfiability algorithm.
Clearly, Satisfiability is in NP, since we can guess an assignment of TRUE/FALSE to the variables and check it in polynomial time.
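Checking a guessed assignment really is easy. A sketch, using signed integers for literals (a representation I am assuming, not one fixed by the notes: +i means variable i, -i its negation):

```python
def satisfies(clauses, assignment):
    """Verify a truth assignment in time linear in the formula size.
    clauses: iterable of clauses, each a set of signed-integer literals.
    assignment: dict mapping variable number to True/False."""
    return all(any((lit > 0) == assignment[abs(lit)] for lit in clause)
               for clause in clauses)
```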
P versus NP
The precise distinction between whether a problem is in P or NP is somewhat technical, requiring formal language theory and Turing machines to state correctly.
However, intuitively a problem is in P, (ie. polynomial) if it can be solved in time polynomial in the size of the input.
A problem is in NP if, given the answer, it is possible to verify that the answer is correct within time polynomial in the size of the input.
Example P - Is there a path from s to t in G of length less than k.
Example NP - Is there a TSP tour in G of length less than k. Given the tour, it is easy to add up the costs and convince me it is correct.
Example not NP - How many TSP tours are there in G of length less than k. Since there can be an exponential number of them, we cannot count them all in polynomial time.
Don't let this issue confuse you - the important idea here is of reductions as a way of proving hardness.
3-Satisfiability
Instance: A collection of clauses C where each clause contains exactly 3 literals, over a set of boolean variables V.
Question: Is there a truth assignment to V so that each clause is satisfied?
Note that this is a more restricted problem than SAT, so if 3-SAT is NP-complete it implies SAT is NP-complete, but not vice versa. Perhaps long clauses are what makes SAT difficult?!
After all, 1-Sat is trivial!
Theorem: 3-SAT is NP-Complete
Proof: 3-SAT is in NP - given an assignment, just check that each clause is satisfied. To prove it is complete, a reduction from SAT must be provided. We will transform each clause independently based on its length.
Suppose the clause C contains k literals.
If k=1, meaning C = {z1}, create two new variables v1, v2 and four new 3-literal clauses: {v1, v2, z1}, {v1, ¬v2, z1}, {¬v1, v2, z1}, {¬v1, ¬v2, z1}.
Note that the only way all four of these can be satisfied is if z1 is TRUE.
If k=2, meaning C = {z1, z2}, create one new variable v1 and two new clauses: {v1, z1, z2}, {¬v1, z1, z2}.
If k=3, meaning C = {z1, z2, z3}, copy the clause into the 3-SAT instance unchanged.
If k>3, meaning C = {z1, z2, ..., zk}, create k-3 new variables and k-2 new clauses in a chain: {z1, z2, v1}, {¬v1, z3, v2}, {¬v2, z4, v3}, ..., {¬v(k-3), z(k-1), zk}.
If none of the original literals in such a clause are TRUE, there is no way to satisfy all of its chain clauses using the additional variables: the first clause forces v1 TRUE, which forces v2 TRUE, and so on, until the last clause fails.
But if any literal zi is TRUE, we have k-3 free variables and k-3 remaining 3-clauses, so we can satisfy each of them.
Since any SAT solution will also satisfy the 3-SAT instance, and any 3-SAT solution sets variables giving a SAT solution, the problems are equivalent. If there were n clauses and m total literals in the SAT instance, this transform takes O(m) time, so SAT ∝ 3-SAT.
Note that a slight modification to this construction would prove 4-SAT, or 5-SAT,... also NP-complete. However, it breaks down when we try to use it for 2-SAT, since there is no way to stuff anything into the chain of clauses. It turns out that resolution gives a polynomial time algorithm for 2-SAT.
Having at least 3 literals per clause is what makes the problem difficult. Now that we have shown 3-SAT is NP-complete, we may use it for further reductions. Since the set of 3-SAT instances is smaller and more regular than the SAT instances, it will be easier to use 3-SAT for future reductions. Remember the direction of reduction!
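The case analysis above can be written out directly. This sketch assumes literals are encoded as signed integers and numbers the new variables after the original ones:

```python
def to_3sat(clauses, num_vars):
    """Transform a SAT instance into an equisatisfiable 3-SAT instance.
    Literals are +/- integers; new variables are numbered above num_vars.
    Returns the new clause list and the highest variable number used."""
    out, nxt = [], num_vars
    for c in clauses:
        c, k = list(c), len(c)
        if k == 1:                       # pad with two new variables, all 4 ways
            nxt += 2
            a, b = nxt - 1, nxt
            for sa in (a, -a):
                for sb in (b, -b):
                    out.append([c[0], sa, sb])
        elif k == 2:                     # pad with one new variable, both ways
            nxt += 1
            out.append(c + [nxt])
            out.append(c + [-nxt])
        elif k == 3:                     # already a 3-clause
            out.append(c)
        else:                            # chain k-3 new variables
            nxt += 1
            out.append([c[0], c[1], nxt])
            for i in range(2, k - 2):
                nxt += 1
                out.append([-(nxt - 1), c[i], nxt])
            out.append([-nxt, c[k - 2], c[k - 1]])
    return out, nxt
```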
36.4-5 Give a polynomial-time algorithm to satisfy Boolean formulas in disjunctive normal form.
Why not use this reduction to give a polynomial-time algorithm for 3-SAT? The DNF formula can become exponentially large and hence the reduction cannot be done in polynomial time.
A Perpetual Point of Confusion
Note carefully the direction of the reduction.
We must transform every instance of a known NP-complete problem to an instance of the problem we are interested in. If we do the reduction the other way, all we get is a slow way to solve x, by using a subroutine which probably will take exponential time.
This always is confusing at first - it seems bass-ackwards. Make sure you understand the direction of reduction now - and think back to this when you get confused.
Integer Programming
Instance: A set V of integer variables, a set of inequalities over these variables, a maximization function f(V), and an integer B.
Question: Does there exist an assignment of integers to V such that all inequalities are true and f(V) ≥ B?
Example: v1 ≥ 1, v2 ≥ 0, v1 + v2 ≤ 3, f(V) = 2·v2, B = 3.
A solution to this is v1 = 1, v2 = 2.
Example: the same constraints, but with B = 5.
Since the maximum value of f(V) given the constraints is 2 × 2 = 4 < 5, there is no solution.
Theorem: Integer Programming is NP-Hard
Proof: By reduction from Satisfiability
Any SAT instance has boolean variables and clauses. Our integer programming problem will have twice as many variables as the SAT instance, one for each variable and its complement, as well as the following inequalities:
For each variable vi in the SAT problem, we will add the following constraints:
1 ≥ Vi ≥ 0 and 1 ≥ V̄i ≥ 0: both IP variables are restricted to values of 0 or 1, which makes them equivalent to boolean variables restricted to TRUE/FALSE.
1 ≥ Vi + V̄i ≥ 1: exactly one of the IP variables associated with a given SAT variable is 1. This means that exactly one of vi and ¬vi is TRUE!
For each clause C = {z1, ..., zk}, construct the constraint Z1 + Z2 + ... + Zk ≥ 1, where Zi is the IP variable corresponding to literal zi. Thus at least one IP variable must be 1 in each clause! Thus satisfying the constraint is equivalent to satisfying the clause!
Our maximization function and bound are relatively unimportant: f(V) = V1, B = 0.
Clearly this reduction can be done in polynomial time.
We must show:
In any SAT solution, a TRUE literal corresponds to a 1 in the IP. Since the expression is satisfied, at least one literal per clause is TRUE, so each clause sum in the inequality is ≥ 1.
Given a solution to this IP instance, all variables will be 0 or 1. Set the literals corresponding to 1 variables TRUE and those corresponding to 0 FALSE. No boolean variable and its complement will both be true, so it is a legal assignment, which must also satisfy the clauses.
Neat, sweet, and NP-complete!
Things to Notice
36.5-2 - Given an integer m × n matrix A and an integer m-vector b, the 0-1 integer programming problem asks whether there is an integer n-vector x with elements in the set {0,1} such that Ax ≤ b. Prove that 0-1 integer programming is NP-hard (hint: reduce from 3-SAT).
Vertex Cover
Instance: A graph G=(V, E), and an integer k.
Question: Is there a subset S of at most k vertices such that every edge (x,y) ∈ E has at least one vertex in S?
Theorem: Vertex cover is NP-complete.
Proof: VC is in NP - guess a subset of vertices, count them, and show that each edge is covered.
To prove completeness, we show 3-SAT ∝ VC. From a 3-SAT instance with N variables and C clauses, we construct a graph with 2N+3C vertices.
For each variable, we create two vertices (one for the variable, one for its complement) connected by an edge. To cover each of these edges, at least one of the two vertices must be in any cover, for a total of at least N vertices. For each clause, we create three new vertices, one for each literal it contains, connected in a triangle:
At least two vertices per triangle must be in the cover to take care of edges in the triangle, for a total of at least 2C vertices.
Finally, we will connect each literal in the flat structure to the corresponding vertices in the triangles which share the same literal.
Claim: This graph will have a vertex cover of size N+2C if and only if the expression is satisfiable.
By the earlier analysis, any cover must have at least N+2C vertices. To show that our reduction is correct, we must show that:
Select the N vertices corresponding to the TRUE literals to be in the cover. Since it is a satisfying truth assignment, at least one of the three cross edges associated with each clause must already be covered - pick the other two triangle vertices to complete the cover.
Every vertex cover of size N+2C must contain N first stage vertices and 2C second stage vertices. Let the first stage vertices define the truth assignment.
For the second stage vertices to give a cover, at least one cross-edge per clause must be covered by a first stage vertex, so the truth assignment satisfies every clause.
For a cover to have N+2C vertices, all the cross edges must be incident on a selected vertex.
Let the N selected vertices from the first stage correspond to TRUE literals. If there is a satisfying truth assignment, that means at least one of the three cross edges from each triangle is incident on a TRUE vertex.
By adding the other two vertices to the cover, we cover all edges associated with the clause.
Every satisfying assignment defines a cover, and every cover defines truth values satisfying the SAT instance!
Example: a sample 3-SAT instance and its derived vertex cover graph (figure).
Starting from the Right Problem
As you can see, the reductions can be very clever and very complicated. While theoretically any NP-complete problem can be reduced to any other one, choosing the correct one makes finding a reduction much easier.
Maximum Clique
Instance: A graph G=(V,E) and an integer j.
Question: Does the graph contain a clique of j vertices, i.e. is there a subset S of V of size j such that every pair of vertices in S defines an edge of G?
Example: this graph contains a clique of size 5.
When talking about graph problems, it is most natural to work from a graph problem - the only NP-complete one we have is vertex cover!
Theorem: Clique is NP-complete
Proof: If you take a graph and find its vertex cover, the remaining vertices form an independent set, meaning there are no edges between any two vertices in the independent set, for if there were such an edge the rest of the vertices could not be a vertex cover.
Thus finding the maximum independent set must be NP-complete!
In an independent set, there are no edges between two vertices. In a clique, there is always an edge between every two vertices. Thus if we complement a graph (include an edge iff there was no edge in the original graph), a clique becomes an independent set and an independent set becomes a clique!
If VC is a vertex cover in G, then V-VC is a clique in G'. If C is a clique in G, V-C is a vertex cover in G'.
36.5-1 Prove that subgraph isomorphism is NP-complete.
Thus the following reduction suffices: let G' = G and H = Kj, the complete subgraph on j nodes.
Integer Partition (Subset Sum)
Instance: A set of integers S and a target integer T.
Problem: Is there a subset of S which adds up exactly to T?
Example: T = 3754, with S a set whose elements include 1, 16, 64, 256, 1040, 1093, and 1284.
Answer: 1 + 16 + 64 + 256 + 1040 + 1093 + 1284 = 3754 = T
Observe that integer partition is a number problem, as opposed to the graph and logic problems we have seen to date.
Theorem: Integer Partition is NP-complete.
Proof: First, we note that integer partition is in NP. Guess a subset of the input numbers and simply add them up.
To prove completeness, we show that vertex cover ∝ integer partition. We use a data structure called an incidence matrix to represent the graph G.
How many 1's are there in each column? Exactly two.
How many 1's in a particular row? Depends on the vertex degree.
The reduction from vertex cover will create n+m numbers from G, where n = |V| and m = |E|.
The numbers from the vertices will be a base-4 realization of rows from the incidence matrix, plus a high order digit:
xi = 4^m + Σ (j=0 to m-1) b[i,j] × 4^j
where b[i,j] = 1 iff edge j is incident on vertex i.
The numbers from the edges will be yj = 4^j.
The target integer will be
T = k × 4^m + Σ (j=0 to m-1) 2 × 4^j
Why? Each column (digit) represents an edge. We want a subset of vertices which covers each edge. We can use at most k vertex/numbers, because of the high order digit of the target.
We might get only one instance of each edge in a cover - but we are free to take extra edge/numbers to grab an extra 1 per column.
VC in G → Integer Partition in S
Integer Partition in S → VC in G
This subset of k vertex/numbers must contain at least one endpoint of each edge (column), since otherwise there is no way to account for the two in that column of the target integer, given that we can pick up at most one from the corresponding edge/number. (Again, the prevention of carries across digits prevents any other possibilities.)
Neat, sweet, and NP-complete!
Notice that this reduction could not be performed in polynomial time if the numbers were written in unary, e.g. 5 = 11111. Big numbers are what makes integer partition hard!
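To see why the encoding matters, note that a simple dynamic program solves integer partition in time polynomial in the target value t - pseudo-polynomial, hence still exponential in the number of bits of t. A sketch:

```python
def subset_sum(S, t):
    """Pseudo-polynomial dynamic program: O(n * t) time and space.
    Polynomial in the *value* t, so exponential in its binary length."""
    reachable = {0}                    # sums achievable with a prefix of S
    for x in S:
        reachable |= {r + x for r in reachable if r + x <= t}
    return t in reachable
```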
Hamiltonian Cycle
Instance: A graph G
Question: Does the graph contain a HC, i.e. an ordering v(1), v(2), ..., v(n) of the vertices such that every consecutive pair, including the last and first, is joined by an edge?
This problem is intimately related to the Traveling Salesman Problem.
Question: Is there an ordering of the vertices of a weighted graph such that the total cost of the resulting tour is ≤ k?
Clearly, HC ∝ TSP. Assign each edge in G weight 1, and each edge not in G weight 2. This new graph has a Traveling Salesman tour of cost n iff the original graph is Hamiltonian. Thus TSP is NP-complete if we can show HC is NP-complete.
Theorem: Hamiltonian Circuit is NP-complete
Proof: Clearly HC is in NP - guess a permutation and check it out. To show it is complete, we use vertex cover. A vertex cover instance consists of a graph G and a constant k, the size of the desired cover. We must construct another graph. Each edge in the initial graph will be represented by the following component:
We claim this graph has a HC iff G has a VC of size k.
First, suppose the new graph has a HC. Assume it starts at one of the k selector vertices. It must then go through one of the chains of gadgets until it reaches a different selector vertex.
Since the tour is a HC, all gadgets are traversed. The k chains correspond to the vertices in the cover.
Note that if both vertices associated with an edge are in the cover, the gadget will be traversed in two pieces - otherwise one chain suffices.
To avoid visiting a vertex more than once, each chain is associated with a selector vertex.
Now suppose instead that G has a vertex cover of size at most k. We can always add more vertices to the cover to bring it up to exactly size k.
For each vertex in the cover, start traversing the chain. At each entry point to a gadget, check if the other vertex is in the cover and traverse the gadget accordingly.
Select the selector edges to complete the circuit.
Neat, sweet, and NP-complete.
To show that Longest Path or Hamiltonian Path is NP-complete, add start and stop vertices and distinguish the first and last selector vertices.
Other NP-complete Problems
Open: Graph Isomorphism, Composite Number, Minimum Length Triangulation.
Polynomial or Exponential?
Just changing a problem a little can make the difference between it being in P or NP-complete:
P | NP-complete | |
Shortest Path | Longest Path | |
Eulerian Circuit | Hamiltonian Circuit | |
Edge Cover | Vertex Cover |
Techniques for Proving NP-completeness
The Art of Proving Hardness
Proving that problems are hard is a skill. Once you get the hang of it, it is surprisingly straightforward and pleasurable to do. Indeed, the dirty little secret of NP-completeness proofs is that they are usually easier to recreate than explain, in the same way that it is usually easier to rewrite old code than to try to understand it.
I offer the following advice to those needing to prove the hardness of a given problem:
Never use the general traveling salesman problem (TSP) as a target problem. Instead, use TSP on instances restricted to the triangle inequality. Better, use Hamiltonian cycle, i.e. TSP where all the weights are 1 or ∞. Even better, use Hamiltonian path instead of cycle. Best of all, use Hamiltonian path on directed, planar graphs where each vertex has total degree 3. All of these problems are equally hard, and the more you can restrict the problem you are reducing, the less work your reduction has to do.
Don't be afraid to add extra constraints or freedoms in order to make your problem more general (at least temporarily).
Selecting the right source problem makes a big difference in how difficult it is to prove a problem hard. This is the first and easiest place to go wrong.
I usually consider four and only four problems as candidates for my hard source problem. Limiting them to four means that I know a lot about these problems - which variants of these problems are hard and which are soft. My favorites are 3-SAT, vertex cover, integer partition, and Hamiltonian path.
You are trying to translate one problem into another, while making them stay the same as much as possible. The easiest way to do this is to be bold with your penalties, to punish anyone trying to deviate from your proposed solution. ``If you pick this, then you have to pick up this huge set which dooms you to lose.'' The sharper the consequences for doing what is undesired, the easier it is to prove if and only if.
You should be asking these kinds of questions. ``How can I force that either A or B but not both are chosen?'' ``How can I force that A is taken before B?'' ``How can I clean up the things I did not select?''
Sometimes the reason you cannot prove hardness is that there is an efficient algorithm to solve your problem! When you can't prove hardness, it likely pays to change your thinking at least for a little while to keep you honest.
Now watch me try it!
To demonstrate how one goes about proving a problem hard, I accept the challenge of showing how a proof can be built on the fly.
I need a volunteer to pick a random problem from the 400+ hard problems in the back of Garey and Johnson.
Dealing with NP-complete Problems
Option 1: Algorithm fast in the Average case
Option 2: Heuristics
Note that the theory of NP-completeness does not stipulate that it is hard to get close to the answer, only that it is hard to get the optimal answer.
Often, we can prove performance bounds on heuristics, that the resulting answer is within C times that of the optimal one.
36.5-5 Prove that Hamiltonian Path is NP-complete.
The easiest argument says that G contains a HP but no HC iff there is a pair of vertices x, y with (x,y) ∉ E such that adding edge (x,y) to G causes it to have a HC, so O(n²) calls to a HC subroutine solve HP.
The cleanest proof modifies the VC and HC reduction from the book:
Approximating Vertex Cover
As we have seen, finding the minimum vertex cover is NP-complete. However, a very simple strategy (heuristic) can get us a cover at most twice that of the optimal.
While the graph has edges
    pick an arbitrary edge (u,v)
    add both u and v to the cover
    delete all edges incident on either u or v
If the graph is represented by an adjacency list this can be implemented in O(m+n) time.
This heuristic must always produce a cover, since an edge is only deleted when it is adjacent to a cover vertex.
Further, any cover must use at least half as many vertices as the greedy cover.
Why? The edges we picked form a matching - no two of these edges share a vertex. Therefore, any cover of just these edges must include at least one vertex per edge, or half the greedy cover!
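The heuristic is only a few lines. In this sketch, scanning the edge list and skipping already-covered edges has the same effect as deleting all edges incident on the chosen endpoints:

```python
def approx_vertex_cover(edges):
    """Greedy 2-approximation: take both endpoints of each still-uncovered
    edge.  The chosen edges form a matching, so any cover needs at least
    half as many vertices as this one uses."""
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:   # edge still uncovered
            cover.add(u)
            cover.add(v)
    return cover
```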
Things to Notice
Example: Pick just one of the two vertices instead of both (after all, the middle edge is already covered). The optimal cover is one vertex and the greedy heuristic gives two vertices, while this new/bad heuristic can be as bad as n-1.
The Euclidean Traveling Salesman
In the traditional version of TSP, a salesman wants to plan a drive to visit all his customers exactly once and get back home.
Euclidean geometry satisfies the triangle inequality: d(u,w) ≤ d(u,v) + d(v,w).
TSP remains hard even when the distances are Euclidean distances in the plane.
However, we can approximate the optimal Euclidean TSP tour using minimum spanning trees.
Claim: the cost of a MST is a lower bound on the cost of a TSP tour.
Why? Deleting any edge from a TSP tour leaves a path, which is a tree of weight at least that of the MST!
If we were allowed to visit cities more than once, doing a depth-first traversal of the MST, and then walking out the tour specified, would cost at most twice the MST. Why? We use each tree edge exactly twice.
However, how can we avoid revisiting cities?
We can take a shortest path to the next unvisited vertex. The improved tour is 1-2-3-5-8-9-6-4-7-10-11-1. Because we replaced a chain of edges by a single edge, the triangle inequality ensures the tour only gets shorter. Thus this is still within twice optimal!
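The whole twice-around-the-tree heuristic fits in a short sketch. Here Prim's algorithm builds the MST and a preorder (depth-first) walk supplies the shortcut tour; the function name and the point-list representation are my own choices, and this quadratic version is not tuned for speed:

```python
import math

def mst_tsp_tour(points):
    """Twice-around-the-MST heuristic for Euclidean TSP.  The returned
    tour costs at most twice the optimal, by the triangle inequality."""
    n = len(points)
    dist = lambda i, j: math.dist(points[i], points[j])
    # Prim's algorithm, rooted at vertex 0
    in_tree, parent = {0}, {}
    best = {v: (dist(0, v), 0) for v in range(1, n)}
    while len(in_tree) < n:
        v = min(best, key=lambda u: best[u][0])   # cheapest fringe vertex
        _, p = best.pop(v)
        in_tree.add(v)
        parent[v] = p
        for u in best:
            if dist(v, u) < best[u][0]:
                best[u] = (dist(v, u), v)
    # preorder walk of the tree = DFS traversal with shortcuts
    children = {v: [] for v in range(n)}
    for v, p in parent.items():
        children[p].append(v)
    tour, stack = [], [0]
    while stack:
        v = stack.pop()
        tour.append(v)
        stack.extend(reversed(children[v]))
    return tour
```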
37.1-3 Give an efficient greedy algorithm that finds an optimal vertex cover of a tree in linear time.
Every tree has at least two leaves, meaning that there is always an edge which is adjacent to a leaf. Which vertex can we never go wrong picking? The non-leaf, since it is the only one which can also cover other edges!
After trimming off the covered edges, we have a smaller tree. We can repeat the process until the tree has 0 or 1 edges. When the tree consists only of an isolated edge, pick either vertex.
All leaves can be identified and trimmed in O(n) time during a DFS.
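A sketch of the leaf-trimming procedure; the adjacency-set bookkeeping and function name are my own choices:

```python
from collections import defaultdict

def tree_vertex_cover(edges):
    """Optimal vertex cover of a tree: repeatedly take the non-leaf
    endpoint of an edge incident on a leaf, then trim covered edges."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    cover = set()
    leaves = [v for v in adj if len(adj[v]) == 1]
    while leaves:
        leaf = leaves.pop()
        if not adj[leaf]:             # its edge was already trimmed
            continue
        p = next(iter(adj[leaf]))     # the non-leaf endpoint (or either,
        cover.add(p)                  # for an isolated edge)
        for u in list(adj[p]):        # trim every edge p now covers
            adj[u].discard(p)
            if len(adj[u]) == 1:
                leaves.append(u)      # u has just become a leaf
        adj[p].clear()
    return cover
```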
Formal Languages and the Theory of NP-completeness
The theory of NP-completeness is based on formal languages and Turing machines, so we must work on a more abstract level than usual.
For a given alphabet of symbols, such as {0, 1, &}, we can form an infinite set of strings or words by arranging them in any order: '&10', '111111', '&&&', and '&'.
A subset of the set of strings over some alphabet is a formal language.
Formal language theory concerns the study of how powerful a machine you need to recognize whether a string is from a particular language.
Example: Is the string a binary representation of an even number? A simple finite state machine can check whether the last symbol is zero:
Observe that solving decision problems can be thought of as formal language recognition: the problem instances are encoded as strings, and a string is in the language if and only if the answer to the decision problem is YES!
What kind of machine is necessary to recognize this language? A Turing Machine!
A Turing machine has a finite-state-control (its program), a two way infinite tape (its memory) and a read-write head (its program counter)
So, where are we?
Each instance of an optimization or decision problem can be encoded as a string over some alphabet. The set of all instances which return TRUE for some problem defines a language.
Hence, any algorithm which solves this problem is equivalent to a machine which recognizes whether an instance is in the language!
The goal of all this is going to be a formal way to talk about the set of problems which can be solved in polynomial time, and the set that cannot be.
Non-deterministic Turing Machines
Suppose we buy a guessing module peripheral for our Turing machine, which looks at a Turing machine program and problem instance and, in polynomial time, writes something it says is an answer. To convince ourselves it really is an answer, we can run another program to check it.
Ex: The Traveling Salesman Problem
The guessing module can easily write a permutation of the vertices in polynomial time. We can check if it is correct by summing up the weights of the edges specified by the permutation and seeing that the total is at most k.
The class of languages we can recognize in time polynomial in the length of the string on a non-deterministic Turing Machine is called NP.
Clearly, P ⊆ NP, since for any DTM program we can run it on a non-deterministic machine, ignore what the guessing module is doing, and it will run just as fast.
P ?= NP
Observe that any NDTM program which takes time P(n) can be simulated in time O(2^P(n)) on a deterministic machine, by running the checking program once on each of the 2^P(n) possible guessed strings (assuming a binary guess alphabet).
The $10,000 question is whether a polynomial time simulation exists, or in other words whether P=NP?. Do there exist languages which can be verified in polynomial time and still take exponential time on deterministic machines?
This is the most important question in computer science. Since proving an exponential time lower bound for a problem in NP would make us famous, we assume that we cannot do it.
What we can do is prove that it is at least as hard as any problem in NP. A problem in NP for which a polynomial time algorithm would imply all languages in NP are in P is called NP-complete.
Turing Machines and Cook's Theorem
Cook's Theorem proves that satisfiability is NP-complete by reducing all non-deterministic Turing machines to SAT.
Each Turing machine has access to a two-way infinite tape (read/write) and a finite state control, which serves as the program.
We know a problem is in NP if we have an NDTM program to solve it in worst-case time p(n), where p is a polynomial and n is the size of the input.
Cook's Theorem - Satisfiability is NP-complete!
Proof: We must show that any problem in NP is at least as hard as SAT. Any problem in NP has a non-deterministic TM program which solves it in polynomial time, specifically P(n).
We will take this program and create from it an instance of satisfiability such that it is satisfiable if and only if the input string was in the language.
Our transformation will use boolean variables to maintain the state of the TM:
Variable | Range | Intended meaning |
Q[i,j] | 0 ≤ i ≤ p(n), 0 ≤ j ≤ r | At time i, M is in state qj |
H[i,j] | 0 ≤ i ≤ p(n), -p(n) ≤ j ≤ p(n)+1 | At time i, the read-write head is scanning tape square j |
S[i,j,k] | 0 ≤ i ≤ p(n), -p(n) ≤ j ≤ p(n)+1, 0 ≤ k ≤ v | At time i, the contents of tape square j is symbol sk |
Note that there are O(p(n)²) literals, a polynomial number whenever p(n) is a polynomial.
We will now have to add clauses to ensure that these variables take on exactly the values they would in the TM computation.
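For instance, one group of clauses forces M to be in exactly one state at each time step: an "at least one" clause over Q[i,0], ..., Q[i,r], plus pairwise "at most one" clauses. The tuple encoding below is my own sketch, not the lecture's notation.

```python
from itertools import combinations

def state_clauses(p_n, r):
    """Clauses forcing M to be in exactly one state at each time 0..p_n.
    A positive literal is a tuple ('Q', i, j), meaning Q[i,j] is true;
    a negated literal is ('-', var)."""
    clauses = []
    for i in range(p_n + 1):
        # at least one state at time i: Q[i,0] or ... or Q[i,r]
        clauses.append([("Q", i, j) for j in range(r + 1)])
        # at most one state: for every pair, not Q[i,j] or not Q[i,j']
        for j, jp in combinations(range(r + 1), 2):
            clauses.append([("-", ("Q", i, j)), ("-", ("Q", i, jp))])
    return clauses

# p(n) = 2 and two states gives 3 * (1 + 1) = 6 clauses
print(len(state_clauses(2, 1)))  # 6
```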
The group 6 clauses enforce the transition function of the machine. If the read-write head is not on tape square j at time i, the contents of square j cannot change between time i and time i+1.
There are O(p(n)^2) literals and O(p(n)^3) clauses in all, so the transformation is done in polynomial time!
Polynomial Time Reductions
A decision problem is NP-hard if the time complexity on a deterministic machine is within a polynomial factor of the complexity of any problem in NP.
A problem is NP-complete if it is NP-hard and in NP. Cook's theorem proved SATISFIABILITY was NP-hard by using a polynomial time reduction translating each problem in NP into an instance of SAT.
The proof of Cook's Theorem, while quite clever, was certainly difficult and complicated. We had to show that all problems in NP could be reduced to SAT to make sure we didn't miss a hard one.
But now that we have a known NP-complete problem in SAT, we never need such a complicated argument again: for any other problem, we can prove it NP-hard by polynomially transforming SAT to it!
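To see what such a transformation looks like, here is a hedged sketch of the classic SAT-to-3-SAT reduction, which splits each long clause using fresh variables. (Clauses shorter than three literals would also need padding to exactly three; that case is omitted here for brevity.)

```python
def sat_to_3sat(clauses, n_vars):
    """Polynomial-time transformation of a SAT instance into an
    equisatisfiable instance with at most 3 literals per clause.
    (l1 or l2 or rest) is replaced by (l1 or l2 or y) and (not y or rest),
    where y is a fresh variable."""
    out, next_var = [], n_vars + 1
    for c in clauses:
        while len(c) > 3:
            y, next_var = next_var, next_var + 1
            out.append([c[0], c[1], y])
            c = [-y] + c[2:]
        out.append(c)
    return out

# A 5-literal clause becomes three 3-literal clauses with fresh vars 6 and 7.
print(sat_to_3sat([[1, 2, 3, 4, 5]], 5))  # [[1, 2, 6], [-6, 3, 7], [-7, 4, 5]]
```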
Finding the Optimal Spouse
Candidates are interviewed one at a time in random order. After each interview we learn only that candidate's rank relative to those already seen, and must accept or reject on the spot. For example, if the first three candidates of the input permutation arrive in relative order (3, 1, 2), that is all we know after three candidates.
Picking the first or last candidate gives us a probability of 1/n of getting the best.
Since we seek to maximize our chances of getting the best, it never pays to pick someone who is not the best we have seen so far.
The optimal strategy is clearly to sample some fraction of the candidates, then pick the first one who is better than the best we have seen.
But what is the fraction?
For a given fraction 1/f, what is the probability of finding the best?
Suppose the highest ranked person in the first n/f candidates is the (i+1)st best overall. Then the candidates ranked 1 through i all occur among the last n(1 - 1/f) candidates, and we hire the first of them to appear. We win whenever the best candidate occurs before any of those ranked 2 to i.

There is a 1/i probability of that, and summing over the possible values of i weighted by their probabilities gives

P(win) ≈ (1/f) ln f.

This is maximized by taking f = e: the optimal strategy samples the first n/e candidates, and succeeds with probability about 1/e ≈ 0.37.
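This claim is easy to check by simulation. The Monte Carlo sketch below is my own code, with an assumed pool of n = 100 candidates: observe the first n/e, then hire the first candidate better than everyone seen in that sample.

```python
import random

def secretary_trial(n, cutoff):
    """One trial: sample the first `cutoff` candidates, then hire the first
    candidate better than all of them. Returns True iff we hired the best."""
    ranks = list(range(n))        # n-1 is the best candidate
    random.shuffle(ranks)         # random arrival order
    best_seen = max(ranks[:cutoff], default=-1)
    for r in ranks[cutoff:]:
        if r > best_seen:
            return r == n - 1     # hired; did we get the best?
    return False                  # never hired anyone

random.seed(0)
n, trials = 100, 20000
cutoff = int(n / 2.718281828)     # sample the first n/e candidates
wins = sum(secretary_trial(n, cutoff) for _ in range(trials))
print(wins / trials)              # typically close to 1/e, about 0.37
```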
This document was generated using the LaTeX2HTML translator Version 96.1 (Feb 5, 1996) Copyright © 1993, 1994, 1995, 1996, Nikos Drakos, Computer Based Learning Unit, University of Leeds.
The translation was initiated by Algorithms on Mon Jun 2 09:21:39 EDT 1997