CSE 373/548 - Analysis of Algorithms

Spring 1996

Steven Skiena
Department of Computer Science
SUNY Stony Brook

In Spring 1996, I taught my Analysis of Algorithms course via EngiNet, the SUNY Stony Brook distance learning program. Each of my lectures that semester was videotaped, and the tapes made available to off-site students. I found it an enjoyable experience.

As an experiment in using the Internet for distance learning, we have digitized the complete audio of all 23 lectures, and have made this available on the WWW. We partitioned the full audio track into sound clips, each corresponding to one page of lecture notes, and linked them to the associated text and images.

In a real sense, listening to all the audio is analogous to sitting through a one-semester college course on algorithms! Properly compressed, the full semester's audio requires less than 300 megabytes of storage, which is much less than I would have imagined. The entire semester's lectures, over thirty hours of audio files, fit comfortably on The Algorithm Design Manual CD-ROM, which also includes a hypertext version of the book and a substantial amount of software.


Lecture 1 - analyzing algorithms

Listening To Part 1-7

Lecture Schedule

Subject            Topics                             Reading
Preliminaries      Analyzing algorithms               1-32
"                  Asymptotic notation                32-37
"                  Recurrence relations               53-64
Sorting            Heapsort                           140-150
"                  Quicksort                          153-167
"                  Linear Sorting                     172-182
Searching          Data structures                    200-215
"                  Binary search trees                244-245
"                  Red-Black trees: insertion         262-272
"                  Red-Black trees: deletion          272-277
Comb. Search       Backtracking
"                  Elements of dynamic programming    301-314
"                  Examples of dynamic programming    314-323
Graph Algorithms   Data structures for graphs         465-477
"                  Breadth/depth-first search         477-483
"                  Topological Sort/Connectivity      485-493
"                  Minimum Spanning Trees             498-510
"                  Single-source shortest paths       514-532
"                  All-pairs shortest paths           550-563
Intractability     P and NP                           916-928
"                  NP-completeness                    929-939
"                  NP-completeness proofs             939-951
"                  Further reductions                 951-960
"                  Approximation algorithms           964-974
"                  Set cover / knapsack heuristics    974-983

Listening To Part 1-8

What Is An Algorithm?

Algorithms are the ideas behind computer programs.  

An algorithm is the thing which stays the same whether the program is in Pascal running on a Cray in New York or is in BASIC running on a Macintosh in Kathmandu!

To be interesting, an algorithm has to solve a general, specified problem. An algorithmic problem is specified by describing the set of instances it must work on and what desired properties the output must have.  

Example: Sorting

Input: A sequence of n numbers $a_1, a_2, \ldots, a_n$

Output: The permutation (reordering) of the input sequence such that $a'_1 \le a'_2 \le \cdots \le a'_n$.

We seek algorithms which are correct and efficient.


For any algorithm, we must prove that it always returns the desired output for all legal instances of the problem.  

For sorting, this means even if (1) the input is already sorted, or (2) it contains repeated elements.

Listening To Part 1-9

Correctness is Not Obvious!

The following problem arises often in manufacturing and transportation testing applications.

Suppose you have a robot arm equipped with a tool, say a soldering iron. To enable the robot arm to do a soldering job, we must construct an ordering of the contact points, so the robot visits (and solders) the first contact point, then visits the second point, third, and so forth until the job is done.   

Since robots are expensive, we need to find the order which minimizes the time (ie. travel distance) it takes to assemble the circuit board.

You are given the job to program the robot arm. Give me an algorithm to find the best tour!

Listening To Part 1-10

Nearest Neighbor Tour

A very popular solution starts at some point $p_0$ and then walks to its nearest neighbor $p_1$ first, then repeats from $p_1$, etc. until done.

    Pick and visit an initial point $p_0$
    i = 0
    While there are still unvisited points
        i = i+1
        Let $p_i$ be the closest unvisited point to $p_{i-1}$
        Visit $p_i$
    Return to $p_0$ from $p_i$

This algorithm is simple to understand and implement and very efficient. However, it is not correct!

Always starting from the leftmost point or any other point will not fix the problem.
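
The nearest neighbor rule is easy to try out. The following is a minimal Python sketch, not the course's own code: the function names `nearest_neighbor_tour` and `tour_length` are made up for illustration, points are assumed to be coordinate tuples, and ties are broken by taking the point with the smaller index.

```python
import math

def nearest_neighbor_tour(points, start=0):
    """Return a visiting order (list of indices) built by the nearest-neighbor rule."""
    unvisited = set(range(len(points)))
    unvisited.remove(start)
    tour = [start]
    while unvisited:
        last = points[tour[-1]]
        # closest unvisited point; ties go to the smaller index (an arbitrary choice)
        nxt = min(unvisited, key=lambda j: (math.dist(last, points[j]), j))
        unvisited.remove(nxt)
        tour.append(nxt)
    return tour

def tour_length(points, tour):
    """Length of the closed tour, including the return to the starting point."""
    n = len(tour)
    return sum(math.dist(points[tour[i]], points[tour[(i + 1) % n]])
               for i in range(n))
```

On the collinear points 0, 1, -1, 3, -5, 11, -21 (taken as x-coordinates with y = 0 and starting from 0), this rule zig-zags and produces a tour of length 84, while simply sweeping from one end to the other and back costs only 64.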

Listening To Part 1-11

Closest Pair Tour

Always walking to the closest point is too restrictive, since that point might trap us into making moves we don't want.  

Another idea would be to repeatedly connect the closest pair of points whose connection will not cause a cycle or a three-way branch to be formed, until we have a single chain with all the points in it.

    Let n be the number of points in the set
    For i=1 to n-1 do
        d = $\infty$
        For each pair of endpoints (x,y) of distinct partial paths
            If $dist(x,y) \le d$ then
                $x_m = x$, $y_m = y$, d = dist(x,y)
        Connect $(x_m, y_m)$ by an edge
    Connect the two endpoints by an edge.

Although it works correctly on the previous example, other data causes trouble:

This algorithm is not correct!

Listening To Part 1-12

A Correct Algorithm

We could try all possible orderings of the points, then select the ordering which minimizes the total length:  


    d = $\infty$
    For each of the n! permutations $\Pi_i$ of the n points
        If $cost(\Pi_i) \le d$ then
            $d = cost(\Pi_i)$ and $P_{min} = \Pi_i$
    Return $P_{min}$

Since all possible orderings are considered, we are guaranteed to end up with the shortest possible tour.

Because it tries all n! permutations, it is extremely slow, much too slow to use when there are more than 10-20 points.

No efficient, correct algorithm is known for the traveling salesman problem, as we will see later.
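
The exhaustive search can be sketched in a few lines of Python. This is an illustration under assumptions, not the lecture's code: `optimal_tour` is a made-up name, points are coordinate tuples, and the sketch fixes point 0 as the start, which cuts the work to (n-1)! orderings without changing the best closed tour.

```python
import itertools
import math

def optimal_tour(points):
    """Try every ordering that starts at point 0 and keep the shortest closed tour."""
    n = len(points)
    best_cost, best_order = math.inf, None
    for rest in itertools.permutations(range(1, n)):
        order = (0,) + rest
        cost = sum(math.dist(points[order[i]], points[order[(i + 1) % n]])
                   for i in range(n))
        if cost < best_cost:
            best_cost, best_order = cost, order
    return best_cost, best_order
```

Even with the start fixed, the loop body runs 720 times for 7 points and more than 10^17 times for 20 points, which is why this approach stops being usable so quickly.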

Listening To Part 1-13


"Why not just use a supercomputer?"

Supercomputers are for people too rich and too stupid to design efficient algorithms!  

A faster algorithm running on a slower computer will always win for sufficiently large instances, as we shall see.

Usually, problems don't have to get that large before the faster algorithm wins.

Expressing Algorithms

We need some way to express the sequence of steps comprising an algorithm.

In order of increasing precision, we have English, pseudocode, and real programming languages. Unfortunately, ease of expression moves in the reverse order.

I prefer to describe the ideas of an algorithm in English, moving to pseudocode to clarify sufficiently tricky details of the algorithm.  

Listening To Part 1-14

The RAM Model

Algorithms are the only important, durable, and original part of computer science because they can be studied in a machine and language independent way.

The reason is that we will do all our design and analysis for the RAM model of computation:   

We measure the run time of an algorithm by counting the number of steps.

This model is useful and accurate in the same sense as the flat-earth model (which is useful)!  

Listening To Part 1-15

Best, Worst, and Average-Case

The worst case complexity of the algorithm is the function defined by the maximum number of steps taken on any instance of size n.  

The best case complexity of the algorithm is the function defined by the minimum number of steps taken on any instance of size n.  

The average-case complexity of the algorithm is the function defined by an average number of steps taken on any instance of size n.  

Each of these complexities defines a numerical function - time vs. size!

Insertion Sort

One way to sort an array of n elements is to start with an empty list, then successively insert new elements in the proper position:


At each stage, the inserted element leaves a sorted list, and after n insertions contains exactly the right elements. Thus the algorithm must be correct.

But how efficient is it?

Note that the run time changes with the permutation instance! (even for a fixed size problem)

How does insertion sort do on sorted permutations?

How about unsorted permutations?

Exact Analysis of Insertion Sort

Count the number of times each line of pseudocode will be executed.

Line  InsertionSort(A)                        #Inst.  #Exec.
1     for j:=2 to len. of A do                c1      n
2         key:=A[j]                           c2      n-1
3         /* put A[j] into A[1..j-1] */       c3=0    /
4         i:=j-1                              c4      n-1
5         while i > 0 and A[i] > key do       c5      t_j
6             A[i+1]:=A[i]                    c6
7             i:=i-1                          c7
8         A[i+1]:=key                         c8      n-1

The for statement is executed (n-1)+1 times (why?)

Within the for statement, "key:=A[j]" is executed n-1 times.

Steps 5, 6, 7 are harder to count.

Let $t_j$ be the number of times the while-loop test in line 5 is executed for a given j; this is one more than the number of elements that have to be slid right to insert the jth item.

Step 5 is executed $\sum_{j=2}^{n} t_j$ times.

Step 6 is executed $\sum_{j=2}^{n} (t_j - 1)$ times.

Add up the executed instructions for all pseudocode lines to get the run-time of the algorithm:

$$T(n) = c_1 n + c_2 (n-1) + c_4 (n-1) + c_5 \sum_{j=2}^{n} t_j + c_6 \sum_{j=2}^{n} (t_j - 1) + c_7 \sum_{j=2}^{n} (t_j - 1) + c_8 (n-1)$$

What are the $t_j$? They depend on the particular input.
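
To see how these counts depend on the input, here is a small Python sketch of insertion sort that records, for each inserted item, how many times the while-loop test runs. The name `insertion_sort_counts` and the dictionary of counts are assumptions made for illustration; Python lists are 0-based, so the counts are stored under the 1-based index j used in the table.

```python
def insertion_sort_counts(a):
    """Sort a copy of `a`; return (sorted_list, t) with t[j] = loop tests for item j."""
    a = list(a)
    t = {}
    for j in range(1, len(a)):       # 0-based; corresponds to j = 2..n in the table
        key = a[j]
        i = j - 1
        tests = 1                    # the final test, which fails and ends the loop
        while i >= 0 and a[i] > key:
            a[i + 1] = a[i]          # slide one element to the right
            i -= 1
            tests += 1
        a[i + 1] = key
        t[j + 1] = tests             # store under the 1-based index
    return a, t
```

On an already sorted input every count is 1, and on a descending input the count for item j is j, which are the two extremes analyzed next.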

Best Case

If it's already sorted, all $t_j$'s are 1.

Hence, the best case time is

$$c_1 n + (c_2 + c_4 + c_5 + c_8)(n-1) = Cn + D$$

where C and D are constants.

Worst Case

If the input is sorted in descending order, we will have to slide all of the already-sorted elements, so $t_j = j$, and step 5 is executed

$$\sum_{j=2}^{n} j = \frac{n(n+1)}{2} - 1$$

times.



Lecture 2 - asymptotic notation

Listening To Part 2-1

Problem 1.2-6:   How can we modify almost any algorithm to have a good best-case running time?

To improve the best case, all we have to do is be able to solve one instance of each size efficiently. We could modify our algorithm to first test whether the input is the special instance we know how to solve, and then output the canned answer.

For sorting, we can check if the values are already ordered, and if so output them. For the traveling salesman, we can check if the points lie on a line, and if so output the points in that order.

The supercomputer people pull this trick on the LINPACK benchmarks!

Because it is so easy to cheat with the best case running time, we usually don't rely too much on it.

Because it is usually very hard to compute the average running time, since we must somehow average over all the instances, we usually strive to analyze the worst case running time.

The worst case is usually fairly easy to analyze and often close to the average or real running time.

Listening To Part 2-2

Exact Analysis is Hard!

We have agreed that the best, worst, and average case complexity of an algorithm is a numerical function of the size of the instances.

However, it is difficult to work with exactly because it is typically very complicated!

Thus it is usually cleaner and easier to talk about upper and lower bounds of the function.   

This is where the dreaded big O notation comes in!  

Since running our algorithm on a machine which is twice as fast will affect the running times by a multiplicative constant of 2, we are going to have to ignore constant factors anyway.

Listening To Part 2-3

Names of Bounding Functions

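Now that we have clearly defined the complexity functions we are talking about, we can talk about upper and lower bounds on them:

    f(n) = O(g(n)) means $C \times g(n)$ is an upper bound on f(n).
    $f(n) = \Omega(g(n))$ means $C \times g(n)$ is a lower bound on f(n).
    $f(n) = \Theta(g(n))$ means $C_1 \times g(n)$ is an upper bound on f(n) and $C_2 \times g(n)$ is a lower bound on f(n).

Got it? C, $C_1$, and $C_2$ are all constants independent of n.

All of these definitions imply a constant $n_0$ beyond which they are satisfied. We do not care about small values of n.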

Listening To Part 2-4

O, $\Omega$, and $\Theta$

The value of $n_0$ shown is the minimum possible value; any greater value would also work.

(a) $f(n) = \Theta(g(n))$ if there exist positive constants $n_0$, $c_1$, and $c_2$ such that to the right of $n_0$, the value of f(n) always lies between $c_1 g(n)$ and $c_2 g(n)$ inclusive.

(b) f(n) = O(g(n)) if there are positive constants $n_0$ and c such that to the right of $n_0$, the value of f(n) always lies on or below $c g(n)$.

(c) $f(n) = \Omega(g(n))$ if there are positive constants $n_0$ and c such that to the right of $n_0$, the value of f(n) always lies on or above $c g(n)$.

Asymptotic notation ($O$, $\Omega$, $\Theta$) is about as well as we can practically deal with complexity functions.
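
The definitions can be spot-checked numerically for a given pair of witness constants. The Python sketch below is only a sanity check over a finite range, not a proof; the helper name `holds_from` and the example function $f(n) = 3n^2 - 100n + 6$ are chosen here for illustration.

```python
def holds_from(f, g, c, n0, n_max=10_000, upper=True):
    """Check that c*g(n) bounds f(n) from above (or below) for every n0 <= n <= n_max."""
    for n in range(n0, n_max + 1):
        if upper and f(n) > c * g(n):
            return False
        if not upper and f(n) < c * g(n):
            return False
    return True

# Hypothetical example functions for the check
f = lambda n: 3 * n * n - 100 * n + 6
g = lambda n: n * n

upper_ok = holds_from(f, g, c=3, n0=1)                   # witnesses for f = O(n^2)
lower_ok = holds_from(f, g, c=2, n0=100, upper=False)    # witnesses for f = Omega(n^2)
```

With c = 2 the lower bound fails for small n (for instance n = 1), which is exactly why the definitions only ask for the bound to hold to the right of some $n_0$.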

Listening To Part 2-5

What does all this mean?




Think of the equality sign as meaning "is in the set of functions".

Note that time complexity is every bit as well defined a function as $\sin(x)$ or your bank account as a function of time.

Listening To Part 2-6

Testing Dominance

f(n) dominates g(n) if $\lim_{n \to \infty} g(n)/f(n) = 0$, which is the same as saying g(n)=o(f(n)).

Note the little-oh - it means "grows strictly slower than".

Knowing the dominance relation between common functions is important because we want algorithms whose time complexity is as low as possible in the hierarchy. If f(n) dominates g(n), f is much larger (ie. slower) than g.

Complexity  n = 10       n = 20       n = 30       n = 40       n = 50                n = 60
n           0.00001 sec  0.00002 sec  0.00003 sec  0.00004 sec  0.00005 sec           0.00006 sec
$n^2$       0.0001 sec   0.0004 sec   0.0009 sec   0.0016 sec   0.0025 sec            0.0036 sec
$n^3$       0.001 sec    0.008 sec    0.027 sec    0.064 sec    0.125 sec             0.216 sec
$n^5$       0.1 sec      3.2 sec      24.3 sec     1.7 min      5.2 min               13.0 min
$2^n$       0.001 sec    1.0 sec      17.9 min     12.7 days    35.7 years            366 cent
$3^n$       0.059 sec    58 min       6.5 years    3855 cent    $2 \times 10^8$ cent  $1.3 \times 10^{13}$ cent
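
The entries in this table can be recomputed with a short Python sketch. It assumes one basic step per microsecond, which is what the first row implies (n = 10 takes 0.00001 sec); the names `STEP_SECONDS`, `GROWTH`, and `running_time_seconds` are illustrative.

```python
STEP_SECONDS = 1e-6   # assumption: one basic step takes one microsecond

GROWTH = {
    "n":   lambda n: n,
    "n^2": lambda n: n ** 2,
    "n^3": lambda n: n ** 3,
    "n^5": lambda n: n ** 5,
    "2^n": lambda n: 2 ** n,
    "3^n": lambda n: 3 ** n,
}

def running_time_seconds(name, n):
    """Seconds needed to execute GROWTH[name](n) steps at STEP_SECONDS per step."""
    return GROWTH[name](n) * STEP_SECONDS

SECONDS_PER_CENTURY = 100 * 365 * 24 * 3600

for name in GROWTH:
    print(name, [running_time_seconds(name, n) for n in (10, 20, 30, 40, 50, 60)])
```

Dividing the $2^n$ entry for n = 60 by `SECONDS_PER_CENTURY` gives roughly 366 centuries, while the polynomial rows never leave the range of minutes.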

Listening To Part 2-7


It is important to understand deep in your bones what logarithms are and where they come from.   

A logarithm is simply an inverse exponential function. Saying $b^x = y$ is equivalent to saying that $x = \log_b y$.

Exponential functions, like the amount owed on an n-year mortgage at an interest rate of $c\%$ per year, are functions which grow distressingly fast, as anyone who has tried to pay off a mortgage knows.

Thus inverse exponential functions, ie. logarithms, grow refreshingly slowly.

Binary search is an example of an $O(\lg n)$ algorithm. After each comparison, we can throw away half the possible number of keys. Thus twenty comparisons suffice to find any name in the million-name Manhattan phone book!

If you have an algorithm which runs in $O(\lg n)$ time, take it, because this is blindingly fast even on very large instances.
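
The halving argument can be checked directly. The following Python sketch of binary search also counts how many keys it inspects; the function name and the choice to count one three-way comparison per probe are assumptions made for this illustration.

```python
def binary_search(keys, target):
    """Return (index or None, comparisons) for `target` in the sorted list `keys`."""
    lo, hi = 0, len(keys) - 1
    comparisons = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        comparisons += 1             # one three-way comparison against keys[mid]
        if keys[mid] == target:
            return mid, comparisons
        if keys[mid] < target:
            lo = mid + 1             # discard the left half
        else:
            hi = mid - 1             # discard the right half
    return None, comparisons
```

For a sorted list of one million keys the loop can run at most 20 times, since $2^{20}$ exceeds one million.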

Listening To Part 2-8

Properties of Logarithms

Recall the definition, $c^{\log_c x} = x$.

Asymptotically, the base of the log does not matter:

$$\log_b a = \frac{\log_c a}{\log_c b}$$

Thus, $\log_2 n = (1/\log_{100} 2) \times \log_{100} n$, and note that $1/\log_{100} 2 = 6.643$ is just a constant.

Asymptotically, any polynomial function of n inside the log does not matter:

Note that

$$\log(n^{473} + n^2 + n + 96) = O(\log n)$$

since $n^{473} + n^2 + n + 96 = O(n^{473})$, and $\log(n^{473}) = 473 \log n$.

Any exponential dominates every polynomial. This is why we will seek to avoid exponential time algorithms.

Listening To Part 2-9

Federal Sentencing Guidelines

2F1.1. Fraud and Deceit; Forgery; Offenses Involving Altered or Counterfeit Instruments other than Counterfeit Bearer Obligations of the United States.  

(a) Base offense Level: 6

(b) Specific offense Characteristics

(1) If the loss exceeded $2,000, increase the offense level as follows:

Loss(Apply the Greatest) Increase in Level
(A) $2,000 or less no increase
(B) More than $2,000 add 1
(C) More than $5,000 add 2
(D) More than $10,000 add 3
(E) More than $20,000 add 4
(F) More than $40,000 add 5
(G) More than $70,000 add 6
(H) More than $120,000 add 7
(I) More than $200,000 add 8
(J) More than $350,000 add 9
(K) More than $500,000 add 10
(L) More than $800,000 add 11
(M) More than $1,500,000 add 12
(N) More than $2,500,000 add 13
(O) More than $5,000,000 add 14
(P) More than $10,000,000 add 15
(Q) More than $20,000,000 add 16
(R) More than $40,000,000 add 17
(S) More than $80,000,000 add 18

Listening To Part 2-10

The federal sentencing guidelines are designed to help judges be consistent in assigning punishment. The time-to-serve is a roughly linear function of the total level.

However, notice that the increase in level grows only logarithmically in the amount of money stolen.

This very slow growth means it pays to commit one crime stealing a lot of money, rather than many small crimes adding up to the same amount of money, because the time to serve if you get caught is much less.

The Moral: "if you are gonna do the crime, make it worth the time!"

Listening To Part 2-11

Working with the Asymptotic Notation

Suppose $f(n) = O(n^2)$ and $g(n) = O(n^2)$.

What do we know about g'(n) = f(n)+g(n)? Adding the bounding constants shows $g'(n) = O(n^2)$.

What do we know about g''(n) = f(n)-g(n)? Since the bounding constants don't necessarily cancel, $g''(n) = O(n^2)$.

We know nothing about the lower bounds on g' and g'' because we know nothing about lower bounds on f, g.

Suppose $f(n) = \Omega(n^2)$ and $g(n) = \Omega(n^2)$.

What do we know about g'(n) = f(n)+g(n)? Adding the lower bounding constants shows $g'(n) = \Omega(n^2)$.

What do we know about g''(n) = f(n)-g(n)? We know nothing about the lower bound of this!

Listening To Part 2-12

The Complexity of Songs

Suppose we want to sing a song which lasts for n units of time. Since n can be large, we want to memorize songs which require only a small amount of brain space, i.e. memory.    

Let S(n) be the space complexity of a song which lasts for n units of time.

The amount of space we need to store a song can be measured in either the words or characters needed to memorize it. Note that the number of characters is $\Theta(\text{words})$ since every word in a song is at most 34 letters long - Supercalifragilisticexpialidocious!

What bounds can we establish on S(n)?

Listening To Part 2-13

The Refrain

Most popular songs have a refrain, which is a block of text which gets repeated after each stanza in the song:  

Bye, bye Miss American pie
Drove my chevy to the levy but the levy was dry
Them good old boys were drinking whiskey and rye
Singing this will be the day that I die.

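Refrains make a song easier to remember, since you memorize it once yet sing it O(n) times. But do they reduce the space complexity?

Not according to the big oh. If

$$n = \text{repetitions} \times (\text{verse-size} + \text{refrain-size})$$

then the space complexity is still O(n), since it is only halved (if the verse-size = refrain-size):

$$S(n) = \text{repetitions} \times \text{verse-size} + \text{refrain-size}$$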

Listening To Part 2-14

The k Days of Christmas

To reduce S(n), we must structure the song differently.

Consider "The k Days of Christmas". All one must memorize is:

On the kth Day of Christmas, my true love gave to me, $gift_k$
On the First Day of Christmas, my true love gave to me, a partridge in a pear tree

But the time it takes to sing it is

$$\sum_{i=1}^{k} i = \frac{k(k+1)}{2} = \Theta(k^2)$$

If $n = O(k^2)$, then $k = O(\sqrt{n})$, so $S(n) = O(\sqrt{n})$.
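
As a rough illustration of the gap between what is memorized and what is sung, the Python sketch below counts lines for a song with k days. The function name and the line-counting convention (one opening line per day, one line per gift) are assumptions made for this example.

```python
def days_of_christmas(gifts):
    """Return (lines_memorized, lines_sung) for a song with len(gifts) days."""
    k = len(gifts)
    sung = 0
    for day in range(1, k + 1):
        sung += 1          # "On the day-th Day of Christmas, my true love gave to me"
        sung += day        # gifts number day, day-1, ..., 1
    memorized = k + 1      # one template line plus one line per gift
    return memorized, sung
```

For the traditional twelve gifts this gives 13 memorized lines against 90 sung lines, so the sung length grows quadratically while the memorized text grows linearly in k.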

Listening To Part 2-15

100 Bottles of Beer

What do kids sing on really long car trips?

n bottles of beer on the wall,
n bottles of beer.
You take one down and pass it around
n-1 bottles of beer on the wall.

All you must remember in this song is this template of size $\Theta(1)$, and the current value of n. The storage size for n depends on its value, but $\log_2 n$ bits suffice.

Thus for this song, $S(n) = O(\lg n)$.

Is there a song which eliminates even the need to count?

That's the way, uh-huh, uh-huh
I like it, uh-huh, huh

Reference: D. Knuth, `The Complexity of Songs', Comm. ACM, April 1984, pp.18-24


Lecture 3 - recurrence relations

Listening To Part 3-1

Problem 2.1-2: Show that for any real constants a and b, b > 0,

$$(n+a)^b = \Theta(n^b)$$

To show $f(n) = \Theta(g(n))$, we must show O and $\Omega$. Go back to the definition!

Note the need for absolute values.

Listening To Part 3-2

Problem 2.1-4:

(a) Is $2^{n+1} = O(2^n)$?

(b) Is $2^{2n} = O(2^n)$?

(a) Is $2^{n+1} = O(2^n)$?

Is $2^{n+1} \le c \cdot 2^n$?

Yes, if $c \ge 2$ for all n.

(b) Is $2^{2n} = O(2^n)$?

Is $2^{2n} \le c \cdot 2^n$?

Note $2^{2n} = 2^n \cdot 2^n$.

Is $2^n \cdot 2^n \le c \cdot 2^n$?

Is $2^n \le c$?

No! Certainly for any constant c we can find an n such that this is not true.

Listening To Part 3-3

Recurrence Relations

Many algorithms, particularly divide and conquer algorithms, have time complexities which are naturally modeled by recurrence relations.  

A recurrence relation is an equation which is defined in terms of itself.

Why are recurrences good things?

  1. Many natural functions are easily expressed as recurrences:




  2. It is often easy to find a recurrence as the solution of a counting problem. Solving the recurrence can be done for many special cases as we will see, although it is somewhat of an art.

Listening To Part 3-4

Recursion is Mathematical Induction!

In both, we have general and boundary conditions, with the general condition breaking the problem into smaller and smaller pieces.   

The initial or boundary condition terminates the recursion.

As we will see, induction provides a useful tool to solve recurrences - guess a solution and prove it by induction.


Consider the recurrence $T_n = 2T_{n-1} + 1$, $T_0 = 0$:

n      0  1  2  3   4   5   6    7
$T_n$  0  1  3  7  15  31  63  127

Guess what the solution is?

Prove $T_n = 2^n - 1$ by induction:

  1. Show that the basis is true: $T_0 = 2^0 - 1 = 0$.
  2. Now assume true for $T_{n-1}$.
  3. Using this assumption show:

$$T_n = 2T_{n-1} + 1 = 2(2^{n-1} - 1) + 1 = 2^n - 1$$
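
A guessed closed form can be checked against the recurrence before attempting the proof. The Python sketch below assumes the recurrence behind the table is $T_n = 2T_{n-1} + 1$ with $T_0 = 0$, which reproduces the tabulated values; the function name is made up for illustration.

```python
def t_recursive(n):
    """T_n from the assumed recurrence T_n = 2*T_(n-1) + 1 with T_0 = 0."""
    return 0 if n == 0 else 2 * t_recursive(n - 1) + 1

table = [t_recursive(n) for n in range(8)]
guess_holds = all(t_recursive(n) == 2 ** n - 1 for n in range(64))
```

Agreement on finitely many values is only evidence for the guess; the induction is what turns it into a proof.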

Listening To Part 3-5

Solving Recurrences

No general procedure for solving recurrence relations is known, which is why it is an art. My approach is:  

Realize that linear, finite history, constant coefficient recurrences always can be solved

Check out any combinatorics or differential equations book for a procedure.

Consider tex2html_wrap_inline13634 , tex2html_wrap_inline13636 , tex2html_wrap_inline13638

It has history = 2, degree = 1, and coefficients of 2 and 1. Thus it can be solved mechanically! Proceed:


Systems like Mathematica and Maple have packages for doing this.   

Listening To Part 3-6

Guess a solution and prove by induction

To guess the solution, play around with small values for insight.

Note that you can do inductive proofs with big-O notation - just be sure you use it right.

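Example: $T(n) \le 2T(\lfloor n/2 \rfloor) + n$.

Show that $T(n) \le c \cdot n \lg n$ for large enough c and n. Assume that it is true for n/2, then

$$T(n) \le 2c \lfloor n/2 \rfloor \lg(\lfloor n/2 \rfloor) + n \le c n \lg(n/2) + n = c n \lg n - c n + n \le c n \lg n \quad \text{for } c \ge 1$$

Starting with basis cases T(2)=4, T(3)=5 lets us complete the proof for $c \ge 2$.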

Listening To Part 3-7

Try backsubstituting until you know what is going on

Also known as the iteration method. Plug the recurrence back into itself until you see a pattern.  

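Example: $T(n) = 3T(\lfloor n/4 \rfloor) + n$.

Try backsubstituting:

$$T(n) = n + 3T(\lfloor n/4 \rfloor) = n + 3\lfloor n/4 \rfloor + 9T(\lfloor n/16 \rfloor) = n + 3\lfloor n/4 \rfloor + 9\lfloor n/16 \rfloor + 27T(\lfloor n/64 \rfloor)$$

The $(3/4)^i n$ term should now be obvious.

Although there are only $\log_4 n$ terms before we get to T(1), it doesn't hurt to sum them all since this is a fast-converging geometric series:

$$T(n) \le n \sum_{i=0}^{\infty} (3/4)^i + \Theta(n^{\log_4 3}) = 4n + o(n) = O(n)$$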

Listening To Part 3-8

Recursion Trees

Drawing a picture of the backsubstitution process gives you an idea of what is going on.

We must keep track of two things - (1) the size of the remaining argument to the recurrence, and (2) the additive stuff to be accumulated during this call.

Example: $T(n) = 2T(n/2) + n^2$

The remaining arguments are on the left, the additive terms on the right.

Although this tree has height $\lg n$, the total sum at each level decreases geometrically, so:

$$T(n) = \sum_{i=0}^{\lg n} \frac{n^2}{2^i} \le n^2 \sum_{i=0}^{\infty} \frac{1}{2^i} = 2n^2, \quad \text{so } T(n) = \Theta(n^2)$$

The recursion tree framework made this much easier to see than with algebraic backsubstitution.

Listening To Part 3-9

See if you can use the Master theorem to provide an instant asymptotic solution

The Master Theorem: Let $a \ge 1$ and b>1 be constants, let f(n) be a function, and let T(n) be defined on the nonnegative integers by the recurrence

$$T(n) = a T(n/b) + f(n)$$

where we interpret n/b as $\lfloor n/b \rfloor$ or $\lceil n/b \rceil$. Then T(n) can be bounded asymptotically as follows:

  1. If $f(n) = O(n^{\log_b a - \epsilon})$ for some constant $\epsilon > 0$, then $T(n) = \Theta(n^{\log_b a})$.
  2. If $f(n) = \Theta(n^{\log_b a})$, then $T(n) = \Theta(n^{\log_b a} \lg n)$.
  3. If $f(n) = \Omega(n^{\log_b a + \epsilon})$ for some constant $\epsilon > 0$, and if $a f(n/b) \le c f(n)$ for some constant c<1, and all sufficiently large n, then $T(n) = \Theta(f(n))$.
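
For the common situation where the additive term is a plain power, $f(n) = n^k$, picking the case is mechanical. The Python sketch below handles only that restricted form; the function name `master_theorem_poly` and the string descriptions it returns are assumptions for illustration, and it does not cover functions such as $n \lg n$ that fall between the cases.

```python
import math

def master_theorem_poly(a, b, k):
    """Classify T(n) = a*T(n/b) + n**k; return (case number, description of the bound)."""
    if a < 1 or b <= 1:
        raise ValueError("need a >= 1 and b > 1")
    crit = math.log(a, b)                    # the critical exponent log_b a
    if math.isclose(k, crit, abs_tol=1e-12):
        return 2, f"Theta(n^{crit:.3g} log n)"
    if k < crit:
        return 1, f"Theta(n^{crit:.3g})"
    # k > crit: the regularity condition holds for pure powers,
    # since a*(n/b)^k = (a / b^k) * n^k and a / b^k < 1
    return 3, f"Theta(n^{k:.3g})"
```

For example, `master_theorem_poly(2, 2, 1)` reports case 2, and `master_theorem_poly(7, 2, 2)` reports case 1 with an exponent of about 2.81.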

Listening To Part 3-10

Examples of the Master Theorem

Which case of the Master Theorem applies?

Listening To Part 3-11

Why should the Master Theorem be true?

Consider T(n) = a T(n/b) + f(n).

Suppose f(n) is small enough

Say f(n)=0, ie. T(n) = a T(n/b).

Then we have a recursion tree where the only contribution is at the leaves.  

There will be $\log_b n$ levels, with $a^l$ nodes at level l.

$$T(n) = a^{\log_b n} = n^{\log_b a}$$

So long as f(n) is small enough that it is dwarfed by this, we have case 1 of the Master Theorem!

Listening To Part 3-12

Suppose f(n) is large enough

If we draw the recursion tree for T(n) = a T(n/b) + f(n) and f(n) is a big enough function, the one top call can be bigger than the sum of all the little calls.

Example: tex2html_wrap_inline13766 . In fact this holds unless tex2html_wrap_inline13768 !

In case 3 of the Master Theorem, the additive term dominates.

In case 2, both parts contribute equally, which is why the log pops up. It is (usually) what we want to have happen in a divide and conquer algorithm.

Listening To Part 3-13

Famous Algorithms and their Recurrence

Matrix Multiplication

The standard matrix multiplication algorithm for two $n \times n$ matrices is $O(n^3)$.

Strassen discovered a divide-and-conquer algorithm which takes $T(n) = 7T(n/2) + O(n^2)$ time.

Since $O(n^{\lg 7})$ dwarfs $O(n^2)$, case 1 of the master theorem applies and $T(n) = O(n^{2.81})$.

This has been "improved" by more and more complicated recurrences until the current best is $O(n^{2.38})$.

Listening To Part 3-14

Polygon Triangulation

Given a polygon in the plane, add diagonals so that each face is a triangle. None of the diagonals are allowed to cross.

Triangulation is an important first step in many geometric algorithms.

The simplest algorithm might be to try each pair of points and check if they see each other. If so, add the diagonal and recur on both halves, for a total of $O(n^3)$.

However, Chazelle gave an algorithm which runs in $T(n) = 2T(n/2) + O(\sqrt{n})$ time. Since $\sqrt{n} = O(n^{1-\epsilon})$, by case 1 of the Master Theorem, Chazelle's algorithm is linear, ie. T(n) = O(n).


The classic divide and conquer recurrence is Mergesort's T(n) = 2 T(n/2) + O(n), which divides the data into equal-sized halves and spends linear time merging the halves after they are sorted.  

Since $n = O(n^{\log_2 2}) = O(n)$ but not $n = O(n^{1-\epsilon})$, Case 2 of the Master Theorem applies and $T(n) = O(n \log n)$.

In case 2, the divide and merge steps balance out perfectly, as we usually hope for from a divide-and-conquer algorithm.
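
The recurrence can be read straight off an implementation. The following is a minimal Python sketch of Mergesort, written for clarity rather than speed (it copies sublists); the function name is an illustrative choice.

```python
def merge_sort(a):
    """Return a new sorted list; two half-size calls plus a linear-time merge."""
    if len(a) <= 1:
        return list(a)
    mid = len(a) // 2
    left = merge_sort(a[:mid])       # T(n/2)
    right = merge_sort(a[mid:])      # T(n/2)
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):   # O(n) merge of the two sorted halves
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged
```

The two recursive calls give the 2T(n/2) term and the merge loop gives the O(n) term.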


Approaches to Algorithm Design

Incremental

Job is partly done - do a little more, repeat until done.

A good example of this approach is insertion sort.

Divide-and-Conquer

A recursive technique.

A good example of this approach is Mergesort.


Lecture 4 - heapsort

Listening To Part 4-1

4.2-2 Argue the solution to

$$T(n) = T(n/3) + T(2n/3) + n$$

is $\Omega(n \lg n)$ by appealing to the recursion tree.

Draw the recursion tree.

How many levels does the tree have? This is equal to the longest path from the root to a leaf.

The shortest path to a leaf occurs when we take the n/3 branch each time. The height k is given by $n (1/3)^k = 1$, meaning $n = 3^k$ or $k = \log_3 n$.

The longest path to a leaf occurs when we take the 2n/3 branch each time. The height k is given by $n (2/3)^k = 1$, meaning $n = (3/2)^k$ or $k = \log_{3/2} n$.

The problem asks to show that $T(n) = \Omega(n \lg n)$, meaning we are looking for a lower bound.

On any full level, the additive terms sum to n. There are $\log_3 n$ full levels. Thus $T(n) \ge n \log_3 n = \Omega(n \lg n)$.

Listening To Part 4-2

4.2-4 Use iteration to solve T(n) = T(n-a) + T(a) + n, where $a \ge 1$ is a constant.

Note iteration is backsubstitution.  


Listening To Part 4-3

Why don't CS profs ever stop talking about sorting?!

  1. Computers spend more time sorting than anything else, historically 25% on mainframes.    
  2. Sorting is the best studied problem in computer science, with a variety of different algorithms known.
  3. Most of the interesting ideas we will encounter in the course can be taught in the context of sorting, such as divide-and-conquer, randomized algorithms, and lower bounds.

You should have seen most of the algorithms - we will concentrate on the analysis.

Listening To Part 4-4

Applications of Sorting

One reason why sorting is so important is that once a set of items is sorted, many other problems become easy.  


Searching

Binary search lets you test whether an item is in a dictionary in $O(\lg n)$ time.

Speeding up searching is perhaps the most important application of sorting.

Closest pair

Given n numbers, find the pair which are closest to each other.  

Once the numbers are sorted, the closest pair will be next to each other in sorted order, so an O(n) linear scan completes the job.
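
A minimal Python sketch of this idea follows; the function name `closest_pair` is an illustrative choice, and it assumes at least two numbers are supplied.

```python
def closest_pair(numbers):
    """Return the two closest values in `numbers` as a tuple (smaller, larger)."""
    if len(numbers) < 2:
        raise ValueError("need at least two numbers")
    s = sorted(numbers)                                     # O(n log n)
    return min(zip(s, s[1:]), key=lambda p: p[1] - p[0])    # O(n) scan of neighbors
```

The sort dominates the cost, so the whole computation takes O(n log n) time instead of the O(n^2) needed to compare every pair.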

Listening To Part 4-5

Element uniqueness

Given a set of n items, are they all unique or are there any duplicates?    

Sort them and do a linear scan to check all adjacent pairs.

This is a special case of closest pair above.

Frequency distribution - Mode

Given a set of n items, which element occurs the largest number of times?   

Sort them and do a linear scan to measure the length of all adjacent runs.
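
Both element uniqueness and the mode reduce to a sort followed by one pass over neighboring items. The Python sketch below uses made-up function names and breaks ties for the mode in favor of the smallest value, which is an arbitrary choice.

```python
def all_unique(items):
    """True if no value occurs twice: after sorting, duplicates must be adjacent."""
    s = sorted(items)
    return all(x != y for x, y in zip(s, s[1:]))

def mode(items):
    """Return (value, count) for the longest run in sorted order."""
    s = sorted(items)
    if not s:
        raise ValueError("mode of an empty collection is undefined")
    best_value, best_count = s[0], 0
    run_value, run_count = s[0], 0
    for x in s:
        if x == run_value:
            run_count += 1
        else:
            run_value, run_count = x, 1       # a new run starts here
        if run_count > best_count:
            best_value, best_count = run_value, run_count
    return best_value, best_count
```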

Median and Selection

What is the kth largest item in the set?   

Once the keys are placed in sorted order in an array, the kth largest can be found in constant time by simply looking in the kth position of the array.

Listening To Part 4-6

Convex hulls

Given n points in two dimensions, find the smallest area polygon which contains them all.  

The convex hull is like a rubber band stretched over the points.

Convex hulls are the most important building block for more sophisticated geometric algorithms.  

Once you have the points sorted by x-coordinate, they can be inserted from left to right into the hull, since the rightmost point is always on the boundary.

Without sorting the points, we would have to check whether the point is inside or outside the current hull.

Adding a new rightmost point might cause others to be deleted.

Huffman codes

If you are trying to minimize the amount of space a text file is taking up, it is silly to assign each letter the same length (ie. one byte) code.   

Example: e is more common than q, a is more common than z.

If we were storing English text, we would want a and e to have shorter codes than q and z.

To design the best possible code, the first and most important step is to sort the characters in order of frequency of use.

Character   Frequency   Code
f           5           1100
e           9           1101
c           12          100
b           13          101
d           16          111
a           45          0
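The notes do not spell out how the code is built from the frequencies. As a hedged sketch, the standard Huffman construction repeatedly merges the two least frequent groups, here with Python's `heapq` as the priority queue. The function name `huffman_code_lengths` is hypothetical, and the sketch assumes a non-empty frequency table. It reports only code lengths, which is what determines the space used.

```python
import heapq
from itertools import count

def huffman_code_lengths(freq):
    """Map each character to the length of its Huffman code."""
    tiebreak = count()  # unique counter so the dicts are never compared
    heap = [(f, next(tiebreak), {ch: 0}) for ch, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, depths1 = heapq.heappop(heap)  # two least frequent groups
        f2, _, depths2 = heapq.heappop(heap)
        # Merging pushes every character in both groups one level deeper.
        merged = {ch: d + 1 for ch, d in {**depths1, **depths2}.items()}
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]
```

On the frequencies in the table, the lengths agree with the codes shown: one bit for a, three bits for b, c, and d, and four bits for e and f.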

Listening to Part 4-8

Selection Sort

A simple $O(n^2)$ sorting algorithm is selection sort.

Sweep through all the elements to find the smallest item, then the smallest remaining item, etc. until the array is sorted.


for i = 1 to n
    for j = i+1 to n
        if (A[j] < A[i]) then swap(A[i],A[j])

It is clear this algorithm must be correct from an inductive argument, since after the ith iteration the ith element is in its correct position.

It is clear that this algorithm takes $O(n^2)$ time.

It is clear that the analysis of this algorithm cannot be improved because there will be n/2 iterations which will require at least n/2 comparisons each, so at least $n^2/4$ comparisons will be made. More careful analysis doubles this.

Thus selection sort runs in $\Theta(n^2)$ time.
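A runnable sketch of selection sort that also counts comparisons. Note one deliberate difference from the pseudocode in these notes: this version remembers the index of the smallest remaining item and swaps once per pass, which is an assumption of the sketch rather than the lecture's exact formulation. The comparison count is the same either way.

```python
def selection_sort(a):
    """Sort list a in place; return the number of key comparisons made."""
    n = len(a)
    comparisons = 0
    for i in range(n):
        smallest = i
        for j in range(i + 1, n):
            comparisons += 1
            if a[j] < a[smallest]:
                smallest = j
        # Put the smallest remaining item into position i.
        a[i], a[smallest] = a[smallest], a[i]
    return comparisons
```

The count is always n(n-1)/2, regardless of the input order, which matches the quadratic bound above.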

Listening to Part 4-9

Binary Heaps

A binary heap is defined to be a binary tree with a key in each node such that:  

  1. All leaves are on, at most, two adjacent levels.
  2. All leaves on the lowest level occur to the left, and all levels except the lowest one are completely filled.
  3. The key in root is $\geq$ all its children, and the left and right subtrees are again binary heaps.

Conditions 1 and 2 specify shape of the tree, and condition 3 the labeling of the tree.

Listening to Part 4-10

The ancestor relation in a heap defines a partial order on its elements, which means it is reflexive, anti-symmetric, and transitive.  

  1. Reflexive: x is an ancestor of itself.
  2. Anti-symmetric: if x is an ancestor of y and y is an ancestor of x, then x=y.
  3. Transitive: if x is an ancestor of y and y is an ancestor of z, x is an ancestor of z.

Partial orders can be used to model hierarchies with incomplete information or equal-valued elements. One of my favorite games with my parents is fleshing out the partial order of ``big'' old-time movie stars.

The partial order defined by the heap structure is weaker than that of the total order, which explains

  1. Why it is easier to build.
  2. Why it is less useful than sorting (but still very important).

Listening to Part 4-11

Constructing Heaps

Heaps can be constructed incrementally, by inserting new elements into the left-most open spot in the array.  

If the new element is greater than its parent, swap their positions and recur.

Since at each step, we replace the root of a subtree by a larger one, we preserve the heap order.

Since all but the last level is always filled, the height h of an n element heap is bounded because:

$\sum_{i=0}^{h} 2^i = 2^{h+1} - 1 \geq n$

so $h = \lfloor \lg n \rfloor$.

Doing n such insertions takes $\Theta(n \lg n)$, since the last n/2 insertions require $O(\lg n)$ time each.

Listening to Part 4-12


The bottom up insertion algorithm gives a good way to build a heap, but Robert Floyd found a better way, using a merge procedure called heapify.  

Given two heaps and a fresh element, they can be merged into one by making the new one the root and trickling down.


Build-heap(A)
    n = |A|
    For i = floor(n/2) to 1 do
        Heapify(A,i)

Heapify(A,i)
    left = 2i
    right = 2i+1
    if (left <= n) and (A[left] > A[i]) then
        max = left
    else max = i
    if (right <= n) and (A[right] > A[max]) then
        max = right
    if (max <> i) then
        swap(A[i],A[max])
        Heapify(A,max)

Rough Analysis of Heapify

Heapify on a subtree containing n nodes takes

$T(n) \leq T(2n/3) + O(1)$

The 2/3 comes from merging heaps whose levels differ by one. The last row could be exactly half filled. Besides, the asymptotic answer won't change so long as the fraction is less than one.

Solve the recurrence using the Master Theorem.

Let a = 1, b = 3/2 and f(n) = 1.

Note that $\Theta(n^{\log_{3/2} 1}) = \Theta(1)$, since $\log_{3/2} 1 = 0$.

Thus Case 2 of the Master Theorem applies, giving $T(n) = \Theta(\lg n)$.

The Master Theorem: Let $a \geq 1$ and b>1 be constants, let f(n) be a function, and let T(n) be defined on the nonnegative integers by the recurrence

$T(n) = a T(n/b) + f(n)$

where we interpret n/b to mean either $\lfloor n/b \rfloor$ or $\lceil n/b \rceil$. Then T(n) can be bounded asymptotically as follows:

  1. If $f(n) = O(n^{\log_b a - \epsilon})$ for some constant $\epsilon > 0$, then $T(n) = \Theta(n^{\log_b a})$.
  2. If $f(n) = \Theta(n^{\log_b a})$, then $T(n) = \Theta(n^{\log_b a} \lg n)$.
  3. If $f(n) = \Omega(n^{\log_b a + \epsilon})$ for some constant $\epsilon > 0$, and if $a f(n/b) \leq c f(n)$ for some constant c<1, and all sufficiently large n, then $T(n) = \Theta(f(n))$.

Listening to Part 4-14

Exact Analysis of Heapify

In fact, Heapify performs better than $O(n \lg n)$, because most of the heaps we merge are extremely small.

In a full binary tree on n nodes, there are n/2 nodes which are leaves (i.e. height 0), n/4 nodes which are height 1, n/8 nodes which are height 2, ...

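In general, there are at most $\lceil n/2^{h+1} \rceil$ nodes of height h, so the cost of building a heap is:

$\sum_{h=0}^{\lfloor \lg n \rfloor} \lceil n/2^{h+1} \rceil \, O(h) = O\left( n \sum_{h=0}^{\lfloor \lg n \rfloor} h/2^h \right)$

Since this sum is not quite a geometric series, we can't apply the usual identity to get the sum. But it should be clear that the series converges.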

Listening to Part 4-15

Proof of Convergence

Series convergence is the ``free lunch'' of algorithm analysis.    

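The identity for the sum of a geometric series (for |x| < 1) is

$\sum_{k=0}^{\infty} x^k = \frac{1}{1-x}$

If we take the derivative of both sides, ...

$\sum_{k=0}^{\infty} k x^{k-1} = \frac{1}{(1-x)^2}$

Multiplying both sides of the equation by x gives the identity we need:

$\sum_{k=0}^{\infty} k x^k = \frac{x}{(1-x)^2}$

Substituting x = 1/2 gives a sum of 2, so Build-heap uses at most 2n comparisons and thus linear time.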
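A quick numeric sanity check of the series, offered only as an illustration (the helper name `partial_sum` is made up for this sketch): the partial sums of h/2^h climb toward 2 and never exceed it.

```python
def partial_sum(terms):
    """Sum of h / 2**h for h = 0 .. terms-1."""
    return sum(h / 2**h for h in range(terms))
```

Evaluating `partial_sum` for a growing number of terms shows the rapid convergence that makes the linear bound possible.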

Listening to Part 4-16

The Lessons of Heapsort, I

"Are we doing a careful analysis? Might our algorithm be faster than it seems?"

Typically in our analysis, we will say that since we are doing at most x operations of at most y time each, the total time is O(x y).

However, if we overestimate too much, our bound may not be as tight as it should be!

Listening to Part 4-17


Heapify can be used to construct a heap, using the observation that an isolated element forms a heap of size 1.  



Heapsort(A)
    Build-heap(A)
    for i = n to 1 do
        swap(A[1],A[i])
        n = n - 1
        Heapify(A,1)


If we construct our heap from bottom to top using Heapify, we do not have to do anything with the last n/2 elements.

With the implicit tree defined by array positions, (i.e. the ith position is the parent of the 2ith and (2i+1)st positions) the leaves start out as heaps.

Exchanging the maximum element with the last element and calling heapify repeatedly gives an $O(n \lg n)$ sorting algorithm, named Heapsort.
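Putting the pieces together, here is a sketch of Heapsort in Python. It keeps the 1-based indexing of the notes by leaving slot 0 of the array unused, and it passes the heap size explicitly rather than shrinking a global n; both are choices made for this example.

```python
def heapify(a, i, n):
    """Trickle a[i] down until the subtree rooted at i is a heap (a[1..n])."""
    left, right = 2 * i, 2 * i + 1
    largest = i
    if left <= n and a[left] > a[largest]:
        largest = left
    if right <= n and a[right] > a[largest]:
        largest = right
    if largest != i:
        a[i], a[largest] = a[largest], a[i]
        heapify(a, largest, n)

def build_heap(a, n):
    """Bottom-up construction: the last n/2 elements are already heaps."""
    for i in range(n // 2, 0, -1):
        heapify(a, i, n)

def heapsort(items):
    """Return a new sorted list."""
    a = [None] + list(items)  # slot 0 unused so children of i are 2i and 2i+1
    n = len(items)
    build_heap(a, n)
    for i in range(n, 1, -1):
        a[1], a[i] = a[i], a[1]  # move the current maximum to its final slot
        heapify(a, 1, i - 1)     # restore the heap on the remaining prefix
    return a[1:]
```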


Heapsort Animations

Listening to Part 4-18

The Lessons of Heapsort, II

Always ask yourself, ``Can we use a different data structure?''

Selection sort scans through the entire array, repeatedly finding the smallest remaining element.

For i = 1 to n

A: Find the smallest of the first n-i+1 items.

B: Pull it out of the array and put it first.

Using arrays or unsorted linked lists as the data structure, operation A takes O(n) time and operation B takes O(1).

Using heaps, both of these operations can be done within $O(\lg n)$ time, balancing the work and achieving a better tradeoff.

Listening to Part 4-19

Priority Queues

A priority queue is a data structure on sets of keys supporting the following operations:  

These operations can be easily supported using a heap.

Listening to Part 4-20

Applications of Priority Queues

Heaps as stacks or queues


Both stacks and queues can be simulated by using a heap, when we add a new time field to each item and order the heap according to this time field.

This simulation is not as efficient as a normal stack/queue implementation, but it is a cute demonstration of the flexibility of a priority queue.
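The time-field trick can be sketched with Python's `heapq` module, which provides a min-heap. The class names `HeapQueue` and `HeapStack` are invented for this illustration. A queue orders by increasing arrival time, while a stack negates the time so the newest item comes out first.

```python
import heapq
from itertools import count

class HeapQueue:
    """FIFO behavior: the item with the smallest timestamp leaves first."""
    def __init__(self):
        self._heap, self._clock = [], count()

    def put(self, item):
        heapq.heappush(self._heap, (next(self._clock), item))

    def get(self):
        return heapq.heappop(self._heap)[1]

class HeapStack:
    """LIFO behavior: timestamps are negated, so the newest item leaves first."""
    def __init__(self):
        self._heap, self._clock = [], count()

    def push(self, item):
        heapq.heappush(self._heap, (-next(self._clock), item))

    def pop(self):
        return heapq.heappop(self._heap)[1]
```

Each operation costs logarithmic time here, versus constant time for an ordinary stack or queue.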

Listening to Part 4-21

Discrete Event Simulations

In simulations of airports, parking lots, and jai-alai, priority queues can be used to maintain who goes next.

The stack and queue orders are just special cases of orderings. In real life, certain people cut in line.

Sweepline Algorithms in Computational Geometry


In the priority queue, we will store the points we have not yet encountered, ordered by x coordinate, and push the line forward one stop at a time.

Listening to Part 4-22

Greedy Algorithms

In greedy algorithms, we always pick the next thing which locally maximizes our score. By placing all the things in a priority queue and pulling them off in order, we can improve performance over linear search or sorting, particularly if the weights change.  

Example: Sequential strips in triangulations.

Danny Heep



Lecture 5 - quicksort

Listening to Part 5-1

4-2 Find the missing integer from 0 to n using O(n) ``is bit[j] in A[i]'' queries.

Note - there are a total of $n \lg n$ bits, so we are not allowed to read the entire input!

Also note, the problem is asking us to minimize the number of bits we read. We can spend as much time as we want doing other things provided we don't look at extra bits.

How can we find the last bit of the missing integer?

Ask all the n integers what their last bit is and see whether 0 or 1 is the bit which occurs less often than it is supposed to. That is the last bit of the missing integer!

How can we determine the second-to-last bit?

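Ask the $n/2$ numbers which ended with the correct last bit! By analyzing the bit patterns of the numbers from 0 to n which end with this bit, we know how often each value of the second-to-last bit should occur, and the value which occurs less often than it is supposed to belongs to the missing integer.

By recurring on the remaining candidate numbers, we get the answer in T(n) = T(n/2) + n = O(n), by the Master Theorem.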
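Here is a hedged sketch of the bit-by-bit elimination. The function name `find_missing` and the inner `bit` helper are assumptions of this example. Only calls to `bit` count as queries; the bookkeeping over the expected numbers 0 to n is the "other work" we are allowed to do for free.

```python
def find_missing(a, n):
    """a holds the integers 0..n with one missing; return (missing, queries)."""
    queries = 0

    def bit(i, j):
        """The only access to the input: is bit j set in a[i]?"""
        nonlocal queries
        queries += 1
        return (a[i] >> j) & 1

    candidates = list(range(len(a)))  # indices of a still consistent so far
    expected = list(range(n + 1))     # values that could still be missing
    j = 0
    while len(expected) > 1:
        zeros, ones = [], []
        for i in candidates:
            (ones if bit(i, j) else zeros).append(i)
        exp_zeros = [x for x in expected if not (x >> j) & 1]
        exp_ones = [x for x in expected if (x >> j) & 1]
        # The bit value seen less often than expected belongs to the missing number.
        if len(zeros) < len(exp_zeros):
            candidates, expected = zeros, exp_zeros
        else:
            candidates, expected = ones, exp_ones
        j += 1
    return expected[0], queries
```

The candidate list roughly halves each round, which is where the T(n) = T(n/2) + n recurrence comes from.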

Listening to Part 5-2


Although mergesort is $O(n \lg n)$, it is quite inconvenient for implementation with arrays, since we need space to merge.

In practice, the fastest sorting algorithm is Quicksort, which uses partitioning as its main idea.  

Example: Pivot about 10.

17 12 6 19 23 8 5 10 - before

6 8 5 10 23 19 12 17 - after

Partitioning places all the elements less than the pivot in the left part of the array, and all elements greater than the pivot in the right part of the array. The pivot fits in the slot between them.  

Note that the pivot element ends up in the correct place in the total order!

Listening to Part 5-3

Partitioning the elements

Once we have selected a pivot element, we can partition the array in one linear scan, by maintaining three sections of the array: < pivot, > pivot, and unexplored.

Example: pivot about 10

| 17 12 6 19 23 8 5 | 10

| 5 12 6 19 23 8 | 17

5 | 12 6 19 23 8 | 17

5 | 8 6 19 23 | 12 17

5 8 | 6 19 23 | 12 17

5 8 6 | 19 23 | 12 17

5 8 6 | 23 | 19 12 17

5 8 6 ||23 19 12 17

5 8 6 10 19 12 17 23

As we scan from left to right, we move the left bound to the right when the element is less than the pivot, otherwise we swap it with the rightmost unexplored element and move the right bound one step closer to the left.

Listening to Part 5-4

Since the partitioning step consists of at most n swaps, it takes time linear in the number of keys. But what does it buy us?

  1. The pivot element ends up in the position it retains in the final sorted order.
  2. After a partitioning, no element flops to the other side of the pivot in the final sorted order.

Thus we can sort the elements to the left of the pivot and the right of the pivot independently!

This gives us a recursive sorting algorithm, since we can use the partitioning approach to sort each subproblem.
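The recursive scheme can be sketched in Python as follows. This version uses the first element of each subarray as the pivot and a single left-to-right scan with a "left wall"; it is one reasonable partitioning strategy, not the only one, and it uses 0-based indexing.

```python
def partition(a, low, high):
    """Partition a[low..high] about a[low]; return the pivot's final index."""
    pivot = a[low]
    leftwall = low
    for i in range(low + 1, high + 1):
        if a[i] < pivot:
            leftwall += 1
            a[i], a[leftwall] = a[leftwall], a[i]
    # Drop the pivot into the slot between the two sections.
    a[low], a[leftwall] = a[leftwall], a[low]
    return leftwall

def quicksort(a, low=0, high=None):
    """Sort list a in place."""
    if high is None:
        high = len(a) - 1
    if low < high:
        p = partition(a, low, high)
        quicksort(a, low, p - 1)   # everything left of the pivot
        quicksort(a, p + 1, high)  # everything right of the pivot
```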

Listening to Part 5-5

Quicksort Animations

Listening to Part 5-6




Quicksort(A, low, high)
    if (low < high)
        pivot-location = Partition(A,low,high)
        Quicksort(A,low, pivot-location - 1)
        Quicksort(A, pivot-location+1, high)

Partition(A,low,high)
    pivot = A[low]
    leftwall = low
    for i = low+1 to high
        if (A[i] < pivot) then
            leftwall = leftwall+1
            swap(A[i],A[leftwall])
    swap(A[low],A[leftwall])
    return leftwall



Listening to Part 5-7

Best Case for Quicksort

Since each element ultimately ends up in the correct position, the algorithm correctly sorts. But how long does it take?  

The best case for divide-and-conquer algorithms comes when we split the input as evenly as possible. Thus in the best case, each subproblem is of size n/2.

The partition step on each subproblem is linear in its size. Thus the total effort in partitioning the $2^k$ problems of size $n/2^k$ is O(n).

The recursion tree for the best case looks like this:

The total partitioning on each level is O(n), and it takes $\lg n$ levels of perfect partitions to get to single element subproblems. When we are down to single elements, the problems are sorted. Thus the total time in the best case is $O(n \lg n)$.

Listening to Part 5-8

Worst Case for Quicksort

Suppose instead our pivot element splits the array as unequally as possible. Thus instead of n/2 elements in the smaller half, we get zero, meaning that the pivot element is the biggest or smallest element in the array.

Now we have n-1 levels, instead of $\lg n$, for a worst case time of $\Theta(n^2)$, since the first n/2 levels each have $\geq n/2$ elements to partition.

Thus the worst case time for Quicksort is worse than Heapsort or Mergesort.

To justify its name, Quicksort had better be good in the average case. Showing this requires some fairly intricate analysis.

The divide and conquer principle applies to real life. If you break a job into pieces, it is best to make the pieces of equal size!

Listening to Part 5-9

Intuition: The Average Case for Quicksort

Suppose we pick the pivot element at random in an array of n keys.

Half the time, the pivot element will be from the center half of the sorted array.

Whenever the pivot element is from positions n/4 to 3n/4, the larger remaining subarray contains at most 3n/4 elements.

If we assume that the pivot element is always in this range, what is the maximum number of partitions we need to get from n elements down to 1 element?

$(3/4)^l \cdot n = 1 \;\Rightarrow\; n = (4/3)^l \;\Rightarrow\; l = \log_{4/3} n$


Listening to Part 5-10

What have we shown?

At most $\log_{4/3} n$ levels of decent partitions suffice to sort an array of n elements.

But how often when we pick an arbitrary element as pivot will it generate a decent partition?

Since any number ranked between n/4 and 3n/4 would make a decent pivot, we get one half the time on average.

If we need $\log_{4/3} n$ levels of decent partitions to finish the job, and half of random partitions are decent, then on average the recursion tree to quicksort the array has $\approx 2 \log_{4/3} n$ levels.

Since O(n) work is done partitioning on each level, the average time is $O(n \lg n)$.

More careful analysis shows that the expected number of comparisons is $\approx 2 n \ln n \approx 1.39 n \lg n$.

Listening to Part 5-11

Average-Case Analysis of Quicksort

To do a precise average-case analysis of quicksort, we formulate a recurrence given the exact expected time T(n):

$T(n) = \sum_{p=1}^{n} \frac{1}{n} \left( T(p-1) + T(n-p) \right) + n - 1$

Each possible pivot p is selected with equal probability. The number of comparisons needed to do the partition is n-1.

We will need one useful fact about the Harmonic numbers $H_n$, namely

$H_n = \sum_{i=1}^{n} 1/i \approx \ln n$
It is important to understand (1) where the recurrence relation comes from and (2) how the log comes out from the summation. The rest is just messy algebra.

Listening to Part 5-12






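Since each T(i) appears twice in the sum, $T(n) = \frac{2}{n} \sum_{p=1}^{n} T(p-1) + n - 1$.

Multiplying by n: $n T(n) = 2 \sum_{p=1}^{n} T(p-1) + n(n-1)$

Applying this to n-1: $(n-1) T(n-1) = 2 \sum_{p=1}^{n-1} T(p-1) + (n-1)(n-2)$

Subtracting: $n T(n) - (n-1) T(n-1) = 2 T(n-1) + 2(n-1)$

rearranging the terms gives us:

$n T(n) = (n+1) T(n-1) + 2(n-1)$

substituting $a_n = T(n)/(n+1)$ gives

$a_n = a_{n-1} + \frac{2(n-1)}{n(n+1)} = \sum_{i=1}^{n} \frac{2(i-1)}{i(i+1)} \approx 2 \sum_{i=1}^{n} \frac{1}{i+1} \approx 2 \ln n$

We are really interested in T(n), so

$T(n) = (n+1) a_n \approx 2(n+1) \ln n \approx 1.39 n \lg n$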
Listening to Part 5-13

What is the Worst Case?

The worst case for Quicksort depends upon how we select our partition or pivot element. If we always select either the first or last element of the subarray, the worst-case occurs when the input is already sorted!








Having the worst case occur when they are sorted or almost sorted is very bad, since that is likely to be the case in certain applications.

To eliminate this problem, pick a better pivot:

  1. Use the middle element of the subarray as pivot.
  2. Use a random element of the array as the pivot.
  3. Perhaps best of all, take the median of three elements (first, last, middle) as the pivot. Why should we use median instead of the mean?

Whichever of these three rules we use, the worst case remains $O(n^2)$. However, because the worst case is no longer a natural order, it is much less likely to occur.
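The third rule can be sketched as a small helper that returns the index of the median of the first, middle, and last elements. The name `median_of_three` and the choice to return an index rather than a value are assumptions of this example.

```python
def median_of_three(a, low, high):
    """Return the index (low, mid, or high) holding the median of those three keys."""
    mid = (low + high) // 2
    # Sort the three (key, index) pairs and take the middle one.
    candidates = sorted([(a[low], low), (a[mid], mid), (a[high], high)])
    return candidates[1][1]
```

On an already sorted subarray this picks the middle element, so the natural "sorted input" order gives an even split instead of the worst case.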

Listening to Part 5-14

Is Quicksort really faster than Heapsort?

Since Heapsort is $\Theta(n \lg n)$ and selection sort is $\Theta(n^2)$, there is no debate about which will be better for decent-sized files.

But how can we compare two $\Theta(n \lg n)$ algorithms to see which is faster? Using the RAM model and the big Oh notation, we can't!

When Quicksort is implemented well, it is typically 2-3 times faster than mergesort or heapsort. The primary reason is that the operations in the innermost loop are simpler. The best way to see this is to implement both and experiment with different inputs.

Since the difference between the two programs will be limited to a multiplicative constant factor, the details of how you program each algorithm will make a big difference.

If you don't want to believe me when I say Quicksort is faster, I won't argue with you. It is a question whose solution lies outside the tools we are using.

Listening to Part 5-15


Suppose you are writing a sorting program, to run on data given to you by your worst enemy. Quicksort is good on average, but bad on certain worst-case instances.  

If you used Quicksort, what kind of data would your enemy give you to run it on? Exactly the worst-case instance, to make you look bad.

But instead of picking the median of three or the first element as pivot, suppose you picked the pivot element at random.

Now your enemy cannot design a worst-case instance to give to you, because no matter which data they give you, you would have the same probability of picking a good pivot!

Randomization is a very important and useful idea. By either picking a random pivot or scrambling the permutation before sorting it, we can say:

``With high probability, randomized quicksort runs in $\Theta(n \lg n)$ time.''

Where before, all we could say is:

``If you give me random input data, quicksort runs in expected $\Theta(n \lg n)$ time.''

Since the time bound now does not depend upon your input distribution, this means that unless we are extremely unlucky (as opposed to ill prepared or unpopular) we will certainly get good performance.

Randomization is a general tool to improve algorithms with bad worst-case but good average-case complexity.

The worst-case is still there, but we almost certainly won't see it.
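A minimal sketch of randomized quicksort. For brevity it builds new lists instead of partitioning in place, which is a simplification made for this example and not how one would write a tuned implementation. The optional `rng` parameter is hypothetical and only there so runs can be made reproducible.

```python
import random

def randomized_quicksort(items, rng=random):
    """Return a sorted copy, choosing each pivot uniformly at random."""
    if len(items) <= 1:
        return list(items)
    pivot = items[rng.randrange(len(items))]
    less = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    greater = [x for x in items if x > pivot]
    return randomized_quicksort(less, rng) + equal + randomized_quicksort(greater, rng)
```

Feeding it sorted input, the bad case for a first-element pivot, is no worse than feeding it any other permutation.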


Lecture 6 - linear sorting

Listening to Part 6-1

7.1-2: Show that an n-element heap has height $\lfloor \lg n \rfloor$.

Since it is a balanced binary tree, the height of a heap is clearly $O(\lg n)$, but the problem asks for an exact answer.

The height is defined as the number of edges in the longest simple path from the root.

The number of nodes in a complete balanced binary tree of height h is $2^{h+1} - 1$.

Thus the height increases only when $n = 2^h$ for some integer h, or in other words when $\lg n$ is an integer.

Listening to Part 6-2

7.1-5 Is a reverse sorted array a heap?

In a heap, each element is greater than or equal to each of its descendants.

In the array representation of a heap, the children of the ith element are the 2ith and (2i+1)th elements.

If A is sorted in reverse order, then $i \leq j$ implies that $A[i] \geq A[j]$.

Since 2i > i and 2i+1 > i then $A[2i] \leq A[i]$ and $A[2i+1] \leq A[i]$.

Thus by definition A is a heap!
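The claim is easy to check mechanically. The sketch below uses 0-based indexing, so the parent of position i is (i-1)//2 rather than i/2; the function name `is_max_heap` is an assumption of this example.

```python
def is_max_heap(a):
    """True if every element is <= its parent in the implicit (0-based) tree."""
    return all(a[(i - 1) // 2] >= a[i] for i in range(1, len(a)))
```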

Listening to Part 6-3

Can we sort in better than $O(n \lg n)$?

Any comparison-based sorting program can be thought of as defining a decision tree of possible executions.  

Running the same program twice on the same permutation causes it to do exactly the same thing, but running it on different permutations of the same data causes a different sequence of comparisons to be made on each.

Claim: the height of this decision tree is the worst-case complexity of sorting.  

Listening to Part 6-4

Once you believe this, a lower bound on the time complexity of sorting follows easily.  

Since any two different permutations of n elements require a different sequence of steps to sort, there must be at least n! different paths from the root to leaves in the decision tree, i.e. at least n! different leaves in the tree.

Since only binary comparisons (less than or greater than) are used, the decision tree is a binary tree.

Since a binary tree of height h has at most $2^h$ leaves, we know $n! \leq 2^h$, or $h \geq \lg(n!)$.

By inspection $n! > (n/2)^{n/2}$, since the last n/2 terms of the product are each greater than n/2. By Stirling's approximation, a better bound is $n! \geq (n/e)^n$ where e=2.718. Either bound gives $h \geq \lg(n!) = \Omega(n \lg n)$.
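To get a feel for the bound, the exact value of the decision-tree lower bound can be computed for small n. This is only an illustrative sketch; the function name `min_comparisons` is invented here.

```python
import math

def min_comparisons(n):
    """ceil(lg(n!)): no comparison sort can beat this many comparisons in the worst case."""
    return math.ceil(math.log2(math.factorial(n)))
```

For instance, sorting 5 items needs at least 7 comparisons in the worst case, since 5! = 120 leaves do not fit in a binary tree of height 6.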


Listening to Part 6-5

Non-Comparison-Based Sorting

All the sorting algorithms we have seen assume binary comparisons as the basic primitive, questions of the form ``is x before y?''.

Suppose you were given a deck of playing cards to sort. Most likely you would set up 13 piles and put all cards with the same number in one pile.

A 2 3 4 5 6 7 8 9 10 J Q K

A 2 3 4 5 6 7 8 9 10 J Q K

A 2 3 4 5 6 7 8 9 10 J Q K

A 2 3 4 5 6 7 8 9 10 J Q K

With only a constant number of cards left in each pile, you can use insertion sort to order by suit and concatenate everything together.

If we could find the correct pile for each card in constant time, and each pile gets O(1) cards, this algorithm takes O(n) time.

Listening to Part 6-6


Suppose we are sorting n numbers from 1 to m, where we know the numbers are approximately uniformly distributed.  

We can set up n buckets, each responsible for an interval of m/n numbers from 1 to m

Given an input number x, it belongs in bucket number $\lceil xn/m \rceil$.

If we use an array of buckets, each item gets mapped to the right bucket in O(1) time.

With uniformly distributed keys, the expected number of items per bucket is 1. Thus sorting each bucket takes O(1) time!

The total effort of bucketing, sorting buckets, and concatenating the sorted buckets together is O(n).
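A sketch of bucket sort for integer keys in the range 1 to m. It uses 0-based bucket indices, computed as (x-1)n/m rounded down, instead of the 1-based formula in the notes; that change, and the function name `bucket_sort`, are choices made for this example.

```python
def bucket_sort(numbers, m):
    """Sort integers drawn from 1..m using n buckets of width about m/n."""
    n = len(numbers)
    if n == 0:
        return []
    buckets = [[] for _ in range(n)]
    for x in numbers:
        # Keys 1..m map monotonically onto bucket indices 0..n-1 in O(1) time.
        buckets[(x - 1) * n // m].append(x)
    result = []
    for bucket in buckets:
        bucket.sort()  # expected O(1) items per bucket under a uniform distribution
        result.extend(bucket)
    return result
```

The output is correct for any input, but the linear running time depends entirely on the keys being spread evenly across the buckets.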

What happened to our $\Omega(n \lg n)$ lower bound?

Listening to Part 6-7

We can use bucketsort effectively whenever we understand the distribution of the data.

However, bad things happen when we assume the wrong distribution.

Suppose in the previous example all the keys happened to be 1. After the bucketing phase, we have:

We spent linear time distributing our items into buckets and learned nothing. Perhaps we could split the big bucket recursively, but it is not certain that we will ever win unless we understand the distribution.

Problems like this are why we worry about the worst-case performance of algorithms!

Such distribution techniques can be used on strings instead of just numbers. The buckets will correspond to letter ranges instead of just number ranges.

The worst case ``shouldn't'' happen if we understand the distribution of our data.

Listening to Part 6-8

Real World Distributions

Consider the distribution of names in a telephone book.  

Either make sure you understand your data, or use a good worst-case or randomized algorithm!

The Shiffletts of Charlottesville

For comparison, note that there are seven Shiffletts (of various spellings) in the 1000 page Manhattan telephone directory.

Listening to Part 6-10

Rules for Algorithm Design

The secret to successful algorithm design, and problem solving in general, is to make sure you ask the right questions. Below, I give a possible series of questions for you to ask yourself as you try to solve difficult algorithm design problems:    

  1. Do I really understand the problem?

    1. What exactly does the input consist of?
    2. What exactly are the desired results or output?
    3. Can I construct some examples small enough to solve by hand? What happens when I solve them?
    4. Are you trying to solve a numerical problem? A graph algorithm problem? A geometric problem? A string problem? A set problem? Might your problem be formulated in more than one way? Which formulation seems easiest?

  2. Can I find a simple algorithm for the problem?

    1. Can I solve my problem exactly by searching all subsets or arrangements and picking the best one?

      1. If so, why am I sure that this algorithm always gives the correct answer?
      2. How do I measure the quality of a solution once I construct it?

        Listening to Part 6-11

      3. Does this simple, slow solution run in polynomial or exponential time?
      4. If I can't find a slow, guaranteed correct algorithm, am I sure that my problem is well defined enough to permit a solution?
    2. Can I solve my problem by repeatedly trying some heuristic rule, like picking the biggest item first? The smallest item first? A random item first?
      1. If so, on what types of inputs does this heuristic rule work well? Do these correspond to the types of inputs that might arise in the application?
      2. On what types of inputs does this heuristic rule work badly? If no such examples can be found, can I show that in fact it always works well?
      3. How fast does my heuristic rule come up with an answer?

  3. Are there special cases of this problem I know how to solve exactly?

    1. Can I solve it efficiently when I ignore some of the input parameters?
    2. What happens when I set some of the input parameters to trivial values, such as 0 or 1?

      Listening to Part 6-12

    3. Can I simplify the problem to create a problem I can solve efficiently? How simple do I have to make it?
    4. If I can solve a certain special case, why can't this be generalized to a wider class of inputs?

  4. Which of the standard algorithm design paradigms seem most relevant to the problem?

    1. Is there a set of items which can be sorted by size or some key? Does this sorted order make it easier to find what might be the answer?
    2. Is there a way to split the problem in two smaller problems, perhaps by doing a binary search, or a partition of the elements into big and small, or left and right? If so, does this suggest a divide-and-conquer algorithm?
    3. Are there certain operations being repeatedly done on the same data, such as searching it for some element, or finding the largest/smallest remaining element? If so, can I use a data structure to speed up these queries, like hash tables or a heap/priority queue?

  5. Am I still stumped?

    1. Why don't I go back to the beginning of the list and work through the questions again? Do any of my answers from the first trip change on the second?


Lecture 7 - elementary data structures

Listening to Part 7-1

8.2-3 Argue that insertion sort is better than Quicksort for sorting checks

In the best case, Quicksort takes $\Theta(n \lg n)$. Although using median-of-three turns the sorted permutation into the best case, we lose if insertion sort is better on the given data.

In insertion sort, the cost of each insertion is the number of items which we have to jump over. In the check example, the expected number of moves per item is small, say c. We win if $c \ll \lg n$.

Listening to Part 7-2

8.3-1 Why do we analyze the average-case performance of a randomized algorithm, instead of the worst-case?

In a randomized algorithm, the worst case is not a matter of the input but only of luck. Thus we want to know what kind of luck to expect. Every input we see is drawn from the uniform distribution.  

Listening to Part 7-3

8.3-2 How many calls are made to Random in randomized quicksort in the best and worst cases?

Random is called once in each call to partition.

The number of partitions is $\Theta(n)$ in any run of quicksort!!

There is some potential variation depending upon what you do with intervals of size 1 - do you call partition on intervals of size one? However, there is no asymptotic difference between best and worst case.

The reason - any binary tree with n leaves has n-1 internal nodes, each of which corresponds to a call to partition in the quicksort recursion tree.

Listening to Part 7-4

Elementary Data Structures

``Mankind's progress is measured by the number of things we can do without thinking.''

Elementary data structures such as stacks, queues, lists, and heaps will be the ``off-the-shelf'' components we build our algorithms from. There are two aspects to any data structure:

The fact that we can describe the behavior of our data structures in terms of abstract operations explains why we can use them without thinking, while the fact that we have different implementations of the same abstract operations enables us to optimize performance.

Listening to Part 7-5

Stacks and Queues

Sometimes, the order in which we retrieve data is independent of its content, being only a function of when it arrived.    

A stack supports last-in, first-out operations: push and pop.

A queue supports first-in, first-out operations: enqueue and dequeue.

A deque is a double ended queue and supports all four operations: push, pop, enqueue, dequeue.

Lines in banks are based on queues, while food in my refrigerator is treated as a stack.  

Both can be used to traverse a tree, but the order is completely different.

Which order is better for WWW crawler robots?

Listening to Part 7-6

Stack Implementation

Although this implementation uses an array, a linked list would eliminate the need to declare the array size in advance.


STACK-EMPTY(S)
    if top[S] = 0
        then return TRUE
        else return FALSE

PUSH(S, x)
    top[S] = top[S] + 1
    S[top[S]] = x

POP(S)
    if STACK-EMPTY(S)
        then error ``underflow''
        else top[S] = top[S] - 1
             return S[top[S] + 1]

All are O(1) time operations.
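An array-based stack can be sketched in Python as below. The class name `ArrayStack`, the fixed capacity, and the choice of exceptions are assumptions of this example; it uses 0-based indexing, so `_top` counts the stored items.

```python
class ArrayStack:
    """Fixed-capacity stack stored in an array; all operations are O(1)."""
    def __init__(self, capacity):
        self._data = [None] * capacity
        self._top = 0  # number of items currently on the stack

    def is_empty(self):
        return self._top == 0

    def push(self, x):
        if self._top == len(self._data):
            raise OverflowError("overflow")
        self._data[self._top] = x
        self._top += 1

    def pop(self):
        if self.is_empty():
            raise IndexError("underflow")
        self._top -= 1
        return self._data[self._top]
```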

Listening to Part 7-7

Queue Implementation

A circular queue implementation requires pointers to the head and tail elements, and wraps around to reuse array elements.


ENQUEUE(Q, x)
    Q[tail[Q]] = x
    if tail[Q] = length[Q]
        then tail[Q] = 1
        else tail[Q] = tail[Q] + 1

DEQUEUE(Q)
    x = Q[head[Q]]
    if head[Q] = length[Q]
        then head[Q] = 1
        else head[Q] = head[Q] + 1
    return x

A list-based implementation would eliminate the possibility of overflow.

All are O(1) time operations.
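A sketch of the circular queue in Python. Unlike the pseudocode, this version tracks the head and the current size and derives the tail from them, which makes overflow and underflow easy to detect; that design, and the name `CircularQueue`, are assumptions of this example.

```python
class CircularQueue:
    """Fixed-capacity FIFO queue that wraps around to reuse array slots."""
    def __init__(self, capacity):
        self._data = [None] * capacity
        self._head = 0
        self._size = 0

    def enqueue(self, x):
        if self._size == len(self._data):
            raise OverflowError("overflow")
        tail = (self._head + self._size) % len(self._data)  # wrap around
        self._data[tail] = x
        self._size += 1

    def dequeue(self):
        if self._size == 0:
            raise IndexError("underflow")
        x = self._data[self._head]
        self._head = (self._head + 1) % len(self._data)
        self._size -= 1
        return x
```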

Listening to Part 7-8

Dynamic Set Operations

Perhaps the most important class of data structures maintains a set of items, indexed by keys.

There are a variety of implementations of these dictionary operations, each of which yields different time bounds for various operations.

Listening to Part 7-9

Pointer Based Implementation

We can maintain a dictionary in either a singly or doubly linked list.   

We gain extra flexibility on predecessor queries at a cost of doubling the number of pointers by using doubly-linked lists.

Since the extra big-Oh cost of doubly-linked lists is zero, we will usually assume our lists are doubly-linked, although it might not be necessary.

Singly linked to doubly-linked list is as a Conga line is to a Can-Can line.


Array Based Sets

Unsorted Arrays

Listening to Part 7-10

Sorted Arrays

What are the costs for a heap?

Listening to Part 7-11

Unsorted List Implementation


LIST-SEARCH(L, k)
    x = head[L]
    while x <> NIL and key[x] <> k
        do x = next[x]
    return x

Note: the while loop might require two lines in some programming languages.

LIST-INSERT(L, x)
    next[x] = head[L]
    if head[L] <> NIL
        then prev[head[L]] = x
    head[L] = x
    prev[x] = NIL

LIST-DELETE(L, x)
    if prev[x] <> NIL
        then next[prev[x]] = next[x]
        else head[L] = next[x]
    if next[x] <> NIL
        then prev[next[x]] = prev[x]


Boundary conditions can be eliminated using a sentinel element which doesn't go away.   



LIST-SEARCH'(L, k)

x = next[nil[L]]

while x <> nil[L] and key[x] <> k

do x = next[x]

return x

LIST-INSERT'(L, x)

next[x] = next[nil[L]]

prev[next[nil[L]]] = x

next[nil[L]] = x

prev[x] = nil[L]

LIST-DELETE'(L, x)

next[prev[x]] = next[x]

prev[next[x]] = prev[x]
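As a sanity check on the sentinel idea, here is a small Python sketch. The class and method names are illustrative assumptions, not part of the lecture; the sentinel is a dummy node whose `next` is the head and whose `prev` is the tail.

```python
class Node:
    def __init__(self, key=None):
        self.key = key
        self.prev = self   # a lone node points to itself
        self.next = self


class SentinelList:
    def __init__(self):
        self.nil = Node()  # sentinel: never removed from the list

    def search(self, k):
        x = self.nil.next
        while x is not self.nil and x.key != k:
            x = x.next
        return None if x is self.nil else x

    def insert(self, x):
        # Splice x in at the front; no NIL tests are needed.
        x.next = self.nil.next
        self.nil.next.prev = x
        self.nil.next = x
        x.prev = self.nil

    def delete(self, x):
        # Two pointer updates, valid even at the head or the tail.
        x.prev.next = x.next
        x.next.prev = x.prev

    def keys(self):
        out, x = [], self.nil.next
        while x is not self.nil:
            out.append(x.key)
            x = x.next
        return out
```

Because the sentinel always exists, `delete` needs no special case for the first or last element.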

Listening to Part 7-13

Hash Tables

Hash tables are a very practical way to maintain a dictionary. As with bucket sort, they assume we know that the distribution of keys is fairly well-behaved.

The idea is simply that looking an item up in an array is $\Theta(1)$ once you have its index. A hash function is a mathematical function which maps keys to integers.

In bucket sort, our hash function mapped the key to a bucket based on the first letters of the key. ``Collisions'' were the set of keys mapped to the same bucket.

If the keys were uniformly distributed, then each bucket contains very few keys!

The resulting short lists were easily sorted, and could just as easily be searched!

Listening to Part 7-14

Hash Functions

It is the job of the hash function to map keys to integers. A good hash function:  

  1. Is cheap to evaluate
  2. Tends to use all positions from $0 \ldots M-1$ with uniform frequency.
  3. Tends to put similar keys in different parts of the tables (Remember the Shifletts!!)

The first step is usually to map the key to a big integer, for example

$h = \sum_{i=0}^{keylength} 128^i \times char(key[i])$

This large number must be reduced to an integer between 0 and M-1, where M is the size of our hash table.

One way is by $h(k) = k \bmod M$, where M is best a large prime not too close to a power of two $2^i$, since reducing modulo $2^i$ would just mask off the high bits.

This works on the same principle as a roulette wheel!
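A short Python sketch of the two steps, under the assumption that the key is a string whose characters are treated as digits of a base-128 number. The function name and the `radix` parameter are illustrative choices, not the lecture's.

```python
def string_hash(key, table_size, radix=128):
    """Map a string to a big integer, then reduce it modulo table_size.

    Assumption: character i contributes radix**i * ord(key[i]).
    """
    big = sum(radix ** i * ord(ch) for i, ch in enumerate(key))
    return big % table_size
```

With a prime `table_size` such as 1009, every character of the key influences the result, so keys that differ only in one position usually land in different slots.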

Listening to Part 7-15

Good and Bad Hash functions

The first three digits of the Social Security Number  

The last three digits of the Social Security Number

Listening to Part 7-16

The Birthday Paradox

No matter how good our hash function is, we had better be prepared for collisions, because of the birthday paradox.  

The probability of there being no collisions after n insertions into an m-element table is

$(m/m) \times ((m-1)/m) \times \ldots \times ((m-n+1)/m) = \prod_{i=0}^{n-1} (m-i)/m$

When m = 366, this probability sinks below 1/2 when n = 23 and to almost 0 when $n \geq 50$.
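The product is easy to evaluate numerically. The helper below is a hypothetical name introduced only for this check; it multiplies the n factors (m-i)/m directly.

```python
def prob_no_collision(n, m):
    """Probability that n uniformly random insertions into m slots never collide."""
    p = 1.0
    for i in range(n):
        p *= (m - i) / m
    return p
```

For m = 366 the value first drops below 1/2 at n = 23, and it is under 0.05 by n = 50.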

Listening to Part 7-17

Collision Resolution by Chaining

The easiest approach is to let each element in the hash table be a pointer to a list of keys.  

Insertion, deletion, and query reduce to the corresponding problems on linked lists. If the n keys are distributed uniformly in a table of size m, each list holds about n/m keys, so each operation takes O(n/m) expected time.

Chaining is easy, but devotes a considerable amount of memory to pointers, which could be used to make the table larger. Still, it is my preferred method.
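A minimal chaining sketch in Python. Two assumptions are made for brevity: Python lists stand in for the linked lists, and Python's built-in `hash` stands in for the hash function. The class name is illustrative.

```python
class ChainedHashTable:
    def __init__(self, m=11):
        self.buckets = [[] for _ in range(m)]  # one chain per table slot

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:                 # key already present: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))

    def search(self, key):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return None

    def delete(self, key):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                del bucket[i]
                return True
        return False
```

Every operation scans a single chain, so its cost is proportional to that chain's length.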

Listening to Part 7-18

Open Addressing

We can dispense with all these pointers by using an implicit reference derived from a simple function:  

If the space we want to use is filled, we can examine the remaining locations:
  1. Sequentially $h, h+1, h+2, \ldots$
  2. Quadratically $h, h+1^2, h+2^2, h+3^2, \ldots$
  3. Linearly $h, h+k, h+2k, h+3k, \ldots$

The reason for using a more complicated scheme is to avoid long runs from similarly hashed keys.

Deletion in an open addressing scheme is ugly, since removing one element can break a chain of insertions, making some elements inaccessible.
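One common way around the deletion problem is to mark removed slots with a "deleted" marker rather than emptying them. The sketch below uses sequential probing and assumes that marker technique, which the lecture does not spell out; all names are illustrative.

```python
_EMPTY = object()     # slot never used
_DELETED = object()   # slot once held a key that was removed


class LinearProbingSet:
    def __init__(self, m=8):
        self.slots = [_EMPTY] * m

    def _probe(self, key):
        m = len(self.slots)
        h = hash(key) % m
        for i in range(m):            # h, h+1, h+2, ... (mod m)
            yield (h + i) % m

    def insert(self, key):
        first_free = None
        for j in self._probe(key):
            s = self.slots[j]
            if s is _EMPTY:
                if first_free is None:
                    first_free = j
                break
            if s is _DELETED:
                if first_free is None:
                    first_free = j    # reusable, but keep scanning for key
            elif s == key:
                return                # already present
        if first_free is None:
            raise OverflowError("table is full")
        self.slots[first_free] = key

    def contains(self, key):
        for j in self._probe(key):
            s = self.slots[j]
            if s is _EMPTY:
                return False          # a never-used slot ends the run
            if s is not _DELETED and s == key:
                return True
        return False

    def delete(self, key):
        for j in self._probe(key):
            s = self.slots[j]
            if s is _EMPTY:
                return
            if s is not _DELETED and s == key:
                self.slots[j] = _DELETED   # keep the probe run intact
                return
```

Searches skip over deleted markers, so keys inserted after the removed one stay reachable.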

Listening to Part 7-19

Performance on Set Operations

With either chaining or open addressing:

Pragmatically, a hash table is often the best data structure to maintain a dictionary. However, we will not use it much in proving the efficiency of our algorithms, since the worst-case time is unpredictable.

The best worst-case bounds come from balanced binary trees, such as red-black trees.


Lecture 8 - binary trees

Listening to Part 8-1

9.1-3 Show that there is no sorting algorithm which sorts at least $(1/2^n) \times n!$ instances in O(n) time.

Think of the decision tree which can do this. What is the shortest tree with $(1/2^n) \times n!$ leaves?

$h \geq \lg(n!/2^n) = \lg(n!) - n = \Theta(n \lg n)$

Moral: there cannot be too many good cases for any sorting algorithm!

Listening to Part 8-2

9.1-4 Show that the $\Omega(n \lg n)$ lower bound for sorting still holds with ternary comparisons.

The maximum number of leaves in a tree of height h is $3^h$, so

$h \geq \log_3(n!) = \lg(n!)/\lg 3 = \Theta(n \lg n)$
So it goes for any constant base.

Listening to Part 8-3

Binary Search Trees

``I think that I shall never see
a poem as lovely as a tree
Poems are made by fools like me
but only G-d can make a tree''
- Joyce Kilmer

Binary search trees provide a data structure which efficiently supports all six dictionary operations.   

A binary tree is a rooted tree where each node has at most two children.

Each child can be identified as either a left or right child.

A binary tree can be implemented where each node has left and right pointer fields, an (optional) parent pointer, and a data field.

Listening to Part 8-4

Binary Search Trees

A binary search tree labels each node in a binary tree with a single key such that for any node x, all nodes in the left subtree of x have keys $\leq key[x]$ and all nodes in the right subtree of x have keys $\geq key[x]$.

Figure. Left: A binary search tree. Right: A heap but not a binary search tree.

The search tree labeling enables us to find where any key is. Start at the root - if that is not the one we want, search either left or right depending upon whether what we want is smaller or larger than the key at the root.

Listening to Part 8-5

Searching in a Binary Tree

Dictionary search operations are easy in binary trees ...


TREE-SEARCH(x, k)

if (x = NIL) or (k = key[x])

then return x

if (k < key[x])

then return TREE-SEARCH(left[x],k)

else return TREE-SEARCH(right[x],k)

The algorithm works because both the left and right subtrees of a binary search tree are binary search trees - recursive structure, recursive algorithm.

This takes time proportional to the height of the tree, O(h).

Listening to Part 8-6

Maximum and Minimum

Where are the maximum and minimum elements in a binary tree?  



TREE-MAXIMUM(x)

while right[x] <> NIL

do x = right[x]

return x

TREE-MINIMUM(x)

while left[x] <> NIL

do x = left[x]

return x

Both take time proportional to the height of the tree, O(h).

Listening to Part 8-7

Where is the predecessor?

Where is the predecessor of a node in a tree, assuming all keys are distinct?   

If x has two children, its predecessor is the maximum value in its left subtree and its successor the minimum value in its right subtree.

Listening to Part 8-8

What if a node doesn't have children?

If it does not have a left child, a node's predecessor is its first left ancestor.

The proof of correctness comes from looking at the in-order traversal of the tree.


TREE-SUCCESSOR(x)

if right[x] <> NIL

then return Tree-Minimum(right[x])

y = p[x]

while (y <> NIL) and (x = right[y])

do x = y

y = p[y]

return y

Tree predecessor/successor both run in time proportional to the height of the tree.

Listening to Part 8-9

In-Order Traversal



INORDER-TREE-WALK(x)

if (x <> NIL)

then Inorder-Tree-Walk(left[x])

print key[x]

Inorder-Tree-Walk(right[x])



Listening to Part 8-10

Tree Insertion

Do a binary search to find where it should be, then replace the termination NIL pointer with the new item.  



TREE-INSERT(T, z)

y = NIL

x = root[T]

while x <> NIL

do y = x

if key[z] < key[x]

then x = left[x]

else x = right[x]

p[z] = y

if y = NIL

then root[T] = z

else if key[z] < key[y]

then left[y] = z

else right[y] = z

y is maintained as the parent of x, since x eventually becomes NIL.

The final test establishes whether the NIL was a left or right turn from y.

Insertion takes time proportional to the height of the tree, O(h).
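The search, minimum/maximum, traversal, and insertion routines above can be sketched together in Python. This version assumes no parent pointers and has `tree_insert` return the root; the function names and the `height` helper are illustrative additions.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def tree_insert(root, key):
    """Insert key; return the (possibly new) root."""
    z = Node(key)
    y, x = None, root
    while x is not None:          # y trails x as its parent
        y = x
        x = x.left if key < x.key else x.right
    if y is None:
        return z                  # the tree was empty
    if key < y.key:
        y.left = z
    else:
        y.right = z
    return root


def tree_search(x, k):
    while x is not None and k != x.key:
        x = x.left if k < x.key else x.right
    return x


def tree_minimum(x):
    while x.left is not None:
        x = x.left
    return x


def tree_maximum(x):
    while x.right is not None:
        x = x.right
    return x


def inorder(x, out=None):
    """Return the keys in sorted order."""
    if out is None:
        out = []
    if x is not None:
        inorder(x.left, out)
        out.append(x.key)
        inorder(x.right, out)
    return out


def height(x):
    return -1 if x is None else 1 + max(height(x.left), height(x.right))
```

Inserting keys in sorted order produces a chain of height n-1, which is the bad case the O(h) bounds warn about.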

Listening to Part 8-12

Tree Deletion

Deletion is somewhat more tricky than insertion, because the node to die may not be a leaf, and thus may affect other nodes.

Case (a), where the node is a leaf, is simple - just NIL out the parent's child pointer.

Case (b), where a node has one child, the doomed node can just be cut out.

Case (c), relabel the node as its successor (which has at most one child when z has two children!) and delete the successor!

This implementation of deletion assumes parent pointers to make the code nicer, but if you had to save space they could be dispensed with by keeping the pointers on the search path stored in a stack.


TREE-DELETE(T, z)

if (left[z] = NIL) or (right[z] = NIL)

then y = z

else y = Tree-Successor(z)

if left[y] <> NIL

then x = left[y]

else x = right[y]

if x <> NIL

then p[x] = p[y]

if p[y] = NIL

then root[T] = x

else if (y = left[p[y]])

then left[p[y]] = x

else right[p[y]] = x

if (y <> z)

then key[z] = key[y]

/* If y has other fields, copy them, too. */

return y

Lines 1-3 determine which node y is physically removed.

Lines 4-6 identify x as the non-nil descendant, if any.

Lines 7-8 give x a new parent.

Lines 9-10 modify the root node, if necessary.

Lines 11-13 reattach the subtree, if necessary.

Lines 14-16 copy y's key (and any other fields) into z if the node physically removed was z's successor rather than z itself.
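A Python sketch of the same deletion procedure, assuming nodes carry parent pointers in a field `p` and the tree object holds a `root` field. The names follow the pseudocode, but the supporting helpers are illustrative.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.p = None


class Tree:
    def __init__(self):
        self.root = None


def tree_insert(T, key):
    z, y, x = Node(key), None, T.root
    while x is not None:
        y = x
        x = x.left if key < x.key else x.right
    z.p = y
    if y is None:
        T.root = z
    elif key < y.key:
        y.left = z
    else:
        y.right = z


def tree_search(x, k):
    while x is not None and k != x.key:
        x = x.left if k < x.key else x.right
    return x


def tree_successor(x):
    if x.right is not None:
        x = x.right
        while x.left is not None:
            x = x.left
        return x
    y = x.p
    while y is not None and x is y.right:
        x, y = y, y.p
    return y


def tree_delete(T, z):
    # y is the node physically removed: z itself, or z's successor.
    y = z if z.left is None or z.right is None else tree_successor(z)
    x = y.left if y.left is not None else y.right   # y's only child, if any
    if x is not None:
        x.p = y.p
    if y.p is None:
        T.root = x
    elif y is y.p.left:
        y.p.left = x
    else:
        y.p.right = x
    if y is not z:
        z.key = y.key        # relabel z with its successor's key
    return y


def inorder(x):
    return [] if x is None else inorder(x.left) + [x.key] + inorder(x.right)
```

The test data below exercises a leaf, a one-child node, a two-child node, and the root in turn.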

Conclusion: deletion takes time proportional to the height of the tree.

Listening to Part 8-13

Balanced Search Trees

All six of our dictionary operations, when implemented with binary search trees, take O(h) time, where h is the height of the tree.

The best height we could hope to get is $\lg n$, if the tree was perfectly balanced, since a binary tree of height h holds at most $2^{h+1} - 1$ nodes:

$n \leq 2^{h+1} - 1 \Rightarrow h \geq \lceil \lg(n+1) \rceil - 1$


But if we get unlucky with our order of insertion or deletion, we could get linear height!






In fact, random search trees on average have $\Theta(\lg n)$ height, but we are worried about worst case height.

We can't easily use randomization - Why?

Listening to Part 8-14

Perfectly Balanced Trees

Perfectly balanced trees require a lot of work to maintain:

If we insert the key 1, we must move every single node in the tree to rebalance it, taking $\Theta(n)$ time.

Therefore, when we talk about "balanced" trees, we mean trees whose height is $O(\lg n)$, so all dictionary operations (insert, delete, search, min/max, successor/predecessor) take $O(\lg n)$ time.

Red-Black trees are binary search trees where each node is assigned a color, where the coloring scheme helps us maintain the height as $\Theta(\lg n)$.

Listening to Part 8-15

Red-Black Tree Definition

Red-black trees have the following properties:  

  1. Every node is colored either red or black.
  2. Every leaf (NIL pointer) is black.
  3. If a node is red then both its children are black.
  4. Every simple path from a node to a descendant leaf contains the same number of black nodes.

Listening to Part 8-16

What does this mean?

If the root of a red-black tree is black can we just color it red?

No! For one of its children might be red.

If an arbitrary node is red can we color it black?

No! Because then not all paths need have the same black height.


What tree maximizes the number of nodes in a tree of black height h?

Listening to Part 8-17

What does a red-black tree with two real nodes look like?

Not (1) - consecutive reds. Not (2), (4) - non-uniform black height.

Listening to Part 8-18

Red-Black Tree Height

Lemma: A red-black tree with n internal nodes has height at most $2 \lg(n+1)$.

Proof: Our strategy: first we bound the number of nodes in any subtree, then we bound the height of any subtree.

We claim that any subtree rooted at x has at least $2^{bh(x)} - 1$ internal nodes, where bh(x) is the black height of node x.

Proof, by induction:

$bh(x) = 0 \Rightarrow x = NIL$, so the subtree has $2^0 - 1 = 0$ internal nodes.

Now assume it is true for all trees with black height < bh(x).

If x is black, both subtrees have black height bh(x)-1. If x is red, the subtrees have black height bh(x). In either case each subtree has black height at least bh(x)-1.

Therefore, the number of internal nodes in any subtree is

$n \geq 2^{bh(x)-1} - 1 + 2^{bh(x)-1} - 1 + 1 = 2^{bh(x)} - 1$


Listening to Part 8-19

Now, let h be the height of our red-black tree. At least half the nodes on any single path from root to leaf must be black if we ignore the root.

Thus $bh(x) \geq h/2$ and $n \geq 2^{h/2} - 1$, so $n + 1 \geq 2^{h/2}$.

This implies that $\lg(n+1) \geq h/2$, so $h \leq 2 \lg(n+1)$. $\Box$

Therefore red-black trees have height at most twice optimal. We have a balanced search tree if we can maintain the red-black tree structure under insertion and deletion.


Lecture 9 - catch up

Listening to Part 9-1

11-1 For each of the four types of linked lists in the following table, what is the asymptotic worst-case running time for each dynamic-set operation listed?  

                     singly      singly    doubly      doubly
                     unsorted    sorted    unsorted    sorted
Search(L, k)         O(N)        O(N)      O(N)        O(N)-
Insert(L, x)         O(1)        O(N)      O(1)        O(N)-
Delete(L, x)         O(N)*       O(N)*     O(1)        O(1)
Successor(L, x)      O(N)        O(1)      O(N)        O(1)
Predecessor(L, x)    O(N)        O(N)      O(N)        O(1)
Minimum(L)           O(N)        O(1)      O(N)        O(1)
Maximum(L)           O(N)        O(1)+     O(N)        O(1)+


Lecture 10 - tree restructuring

Listening to Part 10-1

14.1-5 Describe a Red-Black tree with the largest and smallest ratio of red nodes.

To minimize the ratio of red to black nodes, make all nodes black (possible for $n = 2^k - 1$).

To maximize the ratio of red nodes, alternate black and red levels, with red nodes as the real leaves.





Listening to Part 10-2


The basic restructuring steps for binary search trees are left and right rotation:

  1. Rotation is a local operation changing O(1) pointers.
  2. An in-order search tree before a rotation stays an in-order search tree.
  3. In a rotation, one subtree gets one level closer to the root and one subtree one level further from the root.



LEFT-ROTATE(T, x)

y = right[x] (* Set y *)

right[x] = left[y] (* Turn y's left into x's right *)

if left[y] <> NIL

then p[left[y]] = x

p[y] = p[x] (* Link x's parent to y *)

if p[x] = NIL

then root[T] = y

else if x = left[p[x]]

then left[p[x]] = y

else right[p[x]] = y

left[y] = x (* Put x on y's left *)

p[x] = y



Note the in-order property is preserved.
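A left rotation in Python, as a sketch assuming nodes with `left`, `right`, and parent `p` fields and a tree object with a `root` field. The small `link` helper is only there to build a test tree by hand.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.p = None


class Tree:
    def __init__(self, root=None):
        self.root = root


def link(parent, left=None, right=None):
    """Attach children to parent and set their parent pointers."""
    parent.left, parent.right = left, right
    for child in (left, right):
        if child is not None:
            child.p = parent


def left_rotate(T, x):
    y = x.right               # set y
    x.right = y.left          # turn y's left subtree into x's right subtree
    if y.left is not None:
        y.left.p = x
    y.p = x.p                 # link x's parent to y
    if x.p is None:
        T.root = y
    elif x is x.p.left:
        x.p.left = y
    else:
        x.p.right = y
    y.left = x                # put x on y's left
    x.p = y


def inorder(x):
    return [] if x is None else inorder(x.left) + [x.key] + inorder(x.right)
```

Only a constant number of pointers change, and the in-order sequence of keys is the same before and after.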

Listening to Part 10-3

Listening to Part 10-4

14.2-5 Show that any n-node tree can be transformed to any other using O(n) rotations (hint: convert to a right going chain).

I will start by showing weaker bounds - that $O(n^2)$ and $O(n \log n)$ rotations suffice - because that is how I proceeded when I first saw the problem.

First, observe that creating a right-going chain from $T_1$ and then reversing the same construction for $T_2$ gives a path of rotations from $T_1$ to $T_2$.

Note that it will take at most n rotations to make the lowest valued key the root. Once it is root, all keys are to the right of it, so no more rotations need go through it to create a right-going chain. Repeating with the second lowest key, third, etc. gives that $O(n^2)$ rotations suffice.

Now suppose we try to create a completely balanced tree instead. To get the n/2 key to the root takes at most n rotations. Now each subtree has half the nodes and we can recur, which gives $O(n \log n)$ rotations.


Listening to Part 10-5

To get a linear algorithm, we must beware of trees like:

The correct answer is that n-1 rotations suffice to get to a rightmost chain.

By picking the lowest node on the rightmost chain which has a left child and rotating right about it, we can add one node per rotation to the rightmost chain!

Initially, the rightmost chain contained at least 1 node, so after at most n-1 rotations it contains all n. Slick!
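The counting argument can be checked with a short Python sketch. It is a variant, not the lecture's exact rule: it walks the chain from the top and rotates right wherever a chain node has a left child. Each rotation still adds exactly one node to the chain. The names and the dummy-node trick are assumptions made for this illustration.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root


def to_right_chain(root):
    """Flatten a BST into a right-going chain; return (new_root, rotations)."""
    dummy = Node(None)        # placeholder parent above the root
    dummy.right = root
    parent, cur = dummy, root
    rotations = 0
    while cur is not None:
        if cur.left is not None:
            left = cur.left   # rotate right about cur
            cur.left = left.right
            left.right = cur
            parent.right = left
            cur = left        # left has now joined the chain
            rotations += 1
        else:
            parent, cur = cur, cur.right
    return dummy.right, rotations
```

The rotation count equals n minus the length of the initial rightmost chain, so it never exceeds n-1.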

Listening to Part 10-6

Red-Black Insertion

Since red-black trees have $\Theta(\lg n)$ height, if we can preserve all properties of such trees under insertion/deletion, we have a balanced tree!

Suppose we just did a regular insertion. Under what conditions does it stay a red-black tree?

Since every insertion takes place at a leaf, we will change a black NIL pointer to a node with two black NIL pointers.

To preserve the black height of the tree, the new node must be red. If its new parent is black, we can stop, otherwise we must restructure!

Listening to Part 10-7

How can we fix two reds in a row?

It depends upon our uncle's color:

If our uncle is red, reversing our relatives' color either solves the problem or pushes it higher!


Note that after the recoloring:

  1. The black height is unchanged.
  2. The shape of the tree is unchanged.
  3. We are done if our great-grandparent is black.

If we get all the way to the root, recall we can always color a red-black tree's root black. We always will, so initially it was black, and so this process terminates.

Listening to Part 10-8

The Case of the Black Uncle

If our uncle was black, observe that all the nodes around us have to be black:

Solution - rotate right about B:

Since the root of the subtree is now black with the same black-height as before, we have restored the colors and can stop!

Listening to Part 10-9

A double rotation can be required to set things up depending upon the left-right turn sequence, but the principle is the same.


Listening to Part 10-10

Pseudocode and Figures

Listening to Part 10-11

Deletion from Red-Black Trees

Recall the three cases for deletion from a binary tree:  

Case (a) The node to be deleted was a leaf;

Case (b) The node to be deleted had one child;

Case (c) relabel the node as its successor and delete the successor.

Listening to Part 10-12

Deletion Color Cases

Suppose the node we remove was red, do we still have a red-black tree?

Yes! No two reds will be together, and the black height for each leaf stays the same.

However, if the dead node y was black, we must give each of its descendants another black ancestor. If an appropriate node is red, we can simply color it black; otherwise we must restructure.

Case (a) black NIL becomes ``double black'';

Case (b) red $x$ becomes black and black $x$ becomes ``double black'';

Case (c) red $x$ becomes black and black $x$ becomes ``double black''.

Our goal will be to recolor and restructure the tree so as to get rid of the ``double black'' node.

Listening to Part 10-13

In setting up any case analysis, we must be sure that:

  1. All possible cases are covered.
  2. No case is covered twice.

In the case analysis for red-black trees, the breakdown is:

Case 1: The double black node x has a red brother.

Case 2: x has a black brother and two black nephews.

Case 3: x has a black brother, and its left nephew is red and its right nephew is black.

Case 4: x has a black brother, and its right nephew is red (left nephew can be any color).

Listening to Part 10-14


Red-Black trees let us implement all dictionary operations in $O(\lg n)$ time. Further, in no case are more than 3 rotations done to rebalance. Certain very advanced data structures have data stored at nodes which requires a lot of work to adjust after a rotation - red-black trees ensure it won't happen often.

Example: Each node represents the endpoint of a line, and is augmented with a list of segments in its subtree which it intersects.

We will not study such complicated structures, however.


Lecture 11 - backtracking

Listening to Part 11-1

Parallel Bubblesort

In order for me to give back your midterms, please form a line and sort yourselves in alphabetical order, from A to Z.  

There is traditionally a strong correlation between the midterm grades and the number of daily problems attempted:

daily: 0, sum: 134, count: 3, avg: 44.67

daily: 1, sum: 0, count: 2, avg: 0.00

daily: 2, sum: 63, count: 1, avg: 63.00

daily: 3, sum: 194, count: 3, avg: 64.67

daily: 4, sum: 335, count: 5, avg: 67.00

daily: 5, sum: 489, count: 8, avg: 61.12

daily: 6, sum: 381, count: 6, avg: 63.50

daily: 7, sum: 432, count: 6, avg: 72.00

daily: 8, sum: 217, count: 3, avg: 72.33

daily: 9, sum: 293, count: 4, avg: 73.25

Listening to Part 11-2

Combinatorial Search

We have seen how clever algorithms can reduce sorting from $\Theta(n^2)$ to $\Theta(n \log n)$. However, the stakes are even higher for combinatorially explosive problems:

The Traveling Salesman Problem

Given a weighted graph, find the shortest cycle which visits each vertex exactly once.

Applications include minimizing plotter movement, printed-circuit board wiring, transportation problems, etc.

There is no known polynomial time algorithm (i.e., $O(n^k)$ for some fixed k) for this problem, so search-based algorithms are the only way to go if you need an optimal solution.

Listening to Part 11-3

But I want to use a Supercomputer

Moving to a faster computer can only buy you a relatively small improvement:  

Listening to Part 11-4

Can Eight Pieces Cover a Chess Board?

Consider the 8 main pieces in chess (king, queen, two rooks, two bishops, two knights). Can they be positioned on a chessboard so every square is threatened?  

Only 63 squares are threatened in this configuration. Since 1849, no one had been able to find an arrangement with bishops on different colors to cover all squares.

Of course, this is not an important problem, but we will use it as an example of how to attack a combinatorial search problem.

Listening to Part 11-5

How many positions to test?

Picking a square for each piece gives us the bound:

$64!/(64-8)! = 178,462,987,637,760 \approx 1.8 \times 10^{14}$

Anything much larger than $10^8$ is unreasonable to search on a modest computer in a modest amount of time.

However, we can exploit symmetry to save work. With reflections along the horizontal, vertical, and diagonal axes, the queen can go in only 10 non-equivalent positions.

Even better, we can restrict the white bishop to 16 spots and the queen to 16, while being certain that we get all distinct configurations.



Listening to Part 11-6


Backtracking is a systematic way to go through all the possible configurations of a search space.  

In the general case, we assume our solution is a vector $v = (a_1, a_2, \ldots, a_n)$ where each element $a_i$ is selected from a finite ordered set $S_i$.

We build from a partial solution of length k, $v = (a_1, a_2, \ldots, a_k)$, and try to extend it by adding another element. After extending it, we will test whether what we have so far is still possible as a partial solution.

If it is still a candidate solution, great. If not, we delete $a_k$ and try the next element from $S_k$:

Compute $S_1$, the set of candidate first elements of v.

k = 1

While k > 0 do

While $S_k \neq \emptyset$ do (*advance*)

$a_k$ = an element in $S_k$

$S_k = S_k - \{a_k\}$

if ( $a_1, a_2, \ldots, a_k$ ) is solution, print!

k = k + 1

compute $S_k$, the candidate kth elements given v.

k = k - 1 (*backtrack*)

Listening to Part 11-7

Recursive Backtracking

Recursion can be used for elegant and easy implementation of backtracking.  

Backtrack(a, k)

if a is a solution, print(a)

else {

k = k +1

compute $S_k$

while $S_k \neq \emptyset$ do

$a_k$ = an element in $S_k$

$S_k$ = $S_k - \{a_k\}$

Backtrack(a, k)

}


Backtracking can easily be used to iterate through all subsets or permutations of a set.

Backtracking ensures correctness by enumerating all possibilities.

For backtracking to be efficient, we must prune the search space.

Listening to Part 11-8

Constructing all Subsets

How many subsets are there of an n-element set?  

To construct all $2^n$ subsets, set up an array/vector of n cells, where the value of $a_i$ is either true or false, signifying whether the ith item is or is not in the subset.

To use the notation of the general backtrack algorithm, $S_k = (true, false)$, and v is a solution whenever $k \geq n$.

What order will this generate the subsets of $\{1, 2, 3\}$?
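To see the order concretely, here is a recursive backtracking sketch in Python. The `backtrack` signature with `candidates` and `report` callbacks is an illustrative design, not the lecture's; candidates are tried in the order (true, false).

```python
def backtrack(a, n, candidates, report):
    """Extend the partial solution a one element at a time."""
    if len(a) == n:
        report(list(a))
        return
    for c in candidates(a):
        a.append(c)           # extend
        backtrack(a, n, candidates, report)
        a.pop()               # undo and try the next candidate


def all_subsets(n):
    found = []

    def report(a):
        found.append({i + 1 for i, used in enumerate(a) if used})

    backtrack([], n, lambda a: (True, False), report)
    return found
```

With true tried before false, `all_subsets(3)` yields {1,2,3}, {1,2}, {1,3}, {1}, {2,3}, {2}, {3}, and finally the empty set.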







Listening to Part 11-9

Constructing all Permutations

How many permutations are there of an n-element set?  

To construct all n! permutations, set up an array/vector of n cells, where the value of $a_i$ is an integer from 1 to n which has not appeared thus far in the vector, corresponding to the ith element of the permutation.

To use the notation of the general backtrack algorithm, $S_k = \{1, \ldots, n\} - v$, and v is a solution whenever $k \geq n$.
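The same idea for permutations, as a minimal Python sketch. The candidate set at each level is every value from 1 to n not yet in the partial vector; the function name is an assumption.

```python
def all_permutations(n):
    found = []
    v = []                      # the partial solution

    def extend():
        if len(v) == n:
            found.append(tuple(v))
            return
        for c in range(1, n + 1):
            if c not in v:      # candidate set: {1..n} minus v
                v.append(c)
                extend()
                v.pop()

    extend()
    return found
```

Candidates are tried in increasing order, so the permutations come out in lexicographic order.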


The n-Queens Problem

The first use of pruning to deal with the combinatorial explosion was by the king who rewarded the fellow who discovered chess!  

In the eight queens problem, we prune whenever one queen threatens another.

Listening to Part 11-11

Covering the Chess Board

In covering the chess board, we prune whenever we find there is a square which we cannot cover given the initial configuration!

Specifically, each piece can threaten a certain maximum number of squares (queen 27, king 8, rook 14, etc.). Whenever the number of unthreatened squares exceeds the sum of the maximum coverage of the pieces not yet placed, we can prune.

As implemented by a graduate student project, this backtrack search eliminates $95\%$ of the search space, when the pieces are ordered by decreasing mobility.

With precomputing the list of possible moves, this program could search 1,000 positions per second. But this is too slow!


Although we might further speed the program by an order of magnitude, we need to prune more nodes!

By using a more clever algorithm, we eventually were able to prove no solution existed, in less than one day's worth of computing.

You too can fight the combinatorial explosion!

Listening to Part 11-12

The Backtracking Contest: Bandwidth

The bandwidth problem takes as input a graph G, with n vertices and m edges (i.e., pairs of vertices). The goal is to find a permutation of the vertices on the line which minimizes the maximum length of any edge.

The bandwidth problem has a variety of applications, including circuit layout, linear algebra, and optimizing memory usage in hypertext documents.

The problem is NP-complete, meaning that it is exceedingly unlikely that you will be able to find an algorithm with polynomial worst-case running time. It remains NP-complete even for restricted classes of trees.

Since the goal of the problem is to find a permutation, a backtracking program which iterates through all the n! possible permutations and computes the length of the longest edge for each gives an easy $O(n! \cdot m)$ algorithm. But the goal of this assignment is to find as practically good an algorithm as possible.
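For reference, the brute-force baseline can be sketched in a few lines of Python. It uses `itertools.permutations` in place of a hand-written backtrack and does no pruning, so it is only a correctness yardstick; the function name and the 0-based vertex labels are assumptions.

```python
from itertools import permutations


def bandwidth(n, edges):
    """Return (minimum bandwidth, one optimal vertex order) by exhaustive search."""
    best_width, best_order = None, None
    for order in permutations(range(n)):
        # position[v] is where vertex v sits on the line
        position = {v: i for i, v in enumerate(order)}
        width = max((abs(position[u] - position[v]) for u, v in edges), default=0)
        if best_width is None or width < best_width:
            best_width, best_order = width, order
    return best_width, best_order
```

A path on four vertices has bandwidth 1, a 4-cycle has bandwidth 2, and the complete graph on four vertices has bandwidth 3.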

Listening to Part 12-4

Rules of the Game

  1. Everyone must do this assignment separately. Just this once, you are not allowed to work with your partner. The idea is to think about the problem from scratch.
  2. If you do not completely understand what the bandwidth of a graph is, you don't have the slightest chance of producing a working program. Don't be afraid to ask for a clarification or explanation!!!!!
  3. There will be a variety of different data files of different sizes. Test on the smaller files first. Do not be afraid to create your own test files to help debug your program.
  4. The data files are available via the course WWW page.
  5. You will be graded on how fast and clever your program is, not on style. No credit will be given for incorrect programs.
  6. The programs are to run on whatever computer you have access to, although it must be vanilla enough that I can run the program on something I have access to.
  7. You are to turn in a listing of your program, along with a brief description of your algorithm and any interesting optimizations, sample runs, and the time it takes on sample data files. Report the largest test file your program could handle in one minute or less of wall clock time.
  8. The top five self-reported times / largest sizes will be collected and tested by me to determine the winner.

Listening to Part 12-5

Producing Efficient Programs

  1. Don't optimize prematurely: Worrying about recursion vs. iteration is counter-productive until you have worked out the best way to prune the tree. That is where the money is.  
  2. Choose your data structures for a reason: What operations will you be doing? Is ease of insertion/deletion more crucial than fast retrieval?

    When in doubt, keep it simple, stupid (KISS).

  3. Let the profiler determine where to do final tuning: Your program is probably spending time where you don't expect.


Lecture 12 - introduction to dynamic programming

Listening to Part 12-1

15.1-5 Given an element x in an n-node order-statistic binary tree and a natural number i, how can the ith successor of x be determined in $O(\lg n)$ time?

This problem can be solved if our data structure supports two operations: Rank(x), which returns the position of x in sorted order, and Get(i), which returns the ith smallest element.

What we are interested in is Get(Rank(x)+i).

In an order statistic tree, each node x is labeled with the number of nodes contained in the subtree rooted at x.

Implementing both operations involves keeping track of how many nodes lie to the left of our path.
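A Python sketch of both operations on a size-labeled binary search tree. It assumes distinct keys and, for brevity, uses a plain unbalanced tree, so the running time here is O(h) rather than the O(lg n) a balanced tree would give. All names are illustrative.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.size = 1            # nodes in the subtree rooted here


def size(x):
    return x.size if x is not None else 0


def insert(x, key):
    if x is None:
        return Node(key)
    if key < x.key:
        x.left = insert(x.left, key)
    else:
        x.right = insert(x.right, key)
    x.size += 1                  # one more node below x
    return x


def get(x, i):
    """Return the node holding the ith smallest key (1-based), or None."""
    while x is not None:
        r = size(x.left) + 1     # rank of x within its own subtree
        if i == r:
            return x
        if i < r:
            x = x.left
        else:
            i -= r               # skip x and its whole left subtree
            x = x.right
    return None


def rank(root, key):
    """Return the 1-based rank of key, counting nodes left of the search path."""
    r, x = 0, root
    while x is not None:
        if key < x.key:
            x = x.left
        elif key > x.key:
            r += size(x.left) + 1
            x = x.right
        else:
            return r + size(x.left) + 1
    return None


def ith_successor(root, key, i):
    return get(root, rank(root, key) + i)
```

Each operation follows one root-to-node path and adds up subtree sizes to the left of it.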

Listening to Part 12-6

Optimization Problems

In the algorithms we have studied so far, correctness tended to be easier than efficiency. In optimization problems, we are interested in finding a thing which maximizes or minimizes some function.  

In designing algorithms for optimization problems, we must prove that the algorithm in fact gives the best possible solution.

Greedy algorithms, which make the best local decision at each step, occasionally produce a global optimum - but you need a proof!

Dynamic Programming

Dynamic Programming is a technique for computing recurrence relations efficiently by storing partial results.

Listening to Part 12-9

Computing Fibonacci Numbers



$F_n = F_{n-1} + F_{n-2}$, with $F_0 = 0$ and $F_1 = 1$.

Implementing it as a recursive procedure is easy but slow!

We keep calculating the same value over and over!


How slow is slow?

$F_{n+1}/F_n \approx \phi = (1 + \sqrt{5})/2 \approx 1.61803$

Thus $F_n \approx 1.6^n$, and since our recursion tree has only 0 and 1 as leaves, summing up to such a large number means we have $\approx 1.6^n$ calls!

Listening to Part 12-10

What about Dynamic Programming?

We can calculate $F_n$ in linear time by storing small values:

$F_0 = 0$

$F_1 = 1$

For i=2 to n

$F_i = F_{i-1} + F_{i-2}$


Moral: we traded space for time.
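Both versions side by side in Python, as a small sketch. The function names are illustrative; the table `f` holds the stored partial results.

```python
def fib_recursive(n):
    """Direct recursion: recomputes the same values exponentially often."""
    if n < 2:
        return n
    return fib_recursive(n - 1) + fib_recursive(n - 2)


def fib_dp(n):
    """Dynamic programming: fill a table from the bottom up in linear time."""
    if n < 2:
        return n
    f = [0] * (n + 1)
    f[1] = 1
    for i in range(2, n + 1):
        f[i] = f[i - 1] + f[i - 2]
    return f[n]
```

`fib_dp(50)` returns 12586269025 immediately, while the recursive version would need billions of calls for the same input.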

Dynamic programming is a technique for efficiently computing recurrences by storing partial results.

Once you understand dynamic programming, it is usually easier to reinvent certain algorithms than try to look them up!

Dynamic programming is best understood by looking at a bunch of different examples.

I have found dynamic programming to be one of the most useful algorithmic techniques in practice:

Listening to Part 12-11

Multiplying a Sequence of Matrices

Suppose we want to multiply a long sequence of matrices $A \times B \times C \times D \ldots$.

Multiplying an $X \times Y$ matrix by a $Y \times Z$ matrix (using the common algorithm) takes $X \times Y \times Z$ multiplications.

We would like to avoid big intermediate matrices, and since matrix multiplication is associative, we can parenthesize however we want.

Matrix multiplication is not commutative, so we cannot permute the order of the matrices without changing the result.

Listening to Part 12-12


Consider tex2html_wrap_inline14896 , where A is tex2html_wrap_inline14898 , B is tex2html_wrap_inline14900 , C is tex2html_wrap_inline14902 , and D is tex2html_wrap_inline14904 .

There are three possible parenthesizations:




The order makes a big difference in real computation. How do we find the best order?

Let M(i,j) be the minimum number of multiplications necessary to compute the product $A_i \times A_{i+1} \times \cdots \times A_j$.

The key observations are

Listening to Part 12-13

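A recurrence for this is:

$$M(i,j) = \min_{i \le k < j} \left[ M(i,k) + M(k+1,j) + d_{i-1} d_k d_j \right], \qquad M(i,i) = 0$$

If there are n matrices, there are n+1 dimensions $d_0, d_1, \ldots, d_n$, where matrix $A_i$ is $d_{i-1} \times d_i$.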

A direct recursive implementation of this will be exponential, since there is a lot of duplicated work as in the Fibonacci recurrence.

Divide-and-conquer is efficient when the subproblems do not overlap, but here they do.

There are only O(n^2) substrings between 1 and n. Thus it requires only O(n^2) space to store the optimal cost for each of them.

We can represent all the possibilities in a triangular matrix. We can also store the value of k in another triangular matrix to reconstruct the order of the optimal parenthesization.

The diagonal moves up to the right as the computation progresses. On each element of the kth diagonal |j-i| = k.

For the previous example:

Listening to Part 13-3

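Procedure MatrixOrder (matrix $A_i$ has dimensions $d_{i-1} \times d_i$)

for i=1 to n do M[i, i]=0

for diagonal=1 to n-1

    for i=1 to n-diagonal do

        j=i+diagonal

        $M[i,j] = \min_{k=i}^{j-1} \left[ M[i,k] + M[k+1,j] + d_{i-1} d_k d_j \right]$

        factor(i, j) = the k achieving the minimum

return M[1, n]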

Procedure ShowOrder(i, j)

if (i=j) write ( $A_i$ )

else

k=factor(i, j)

write ``(''

ShowOrder(i, k)

write ``*''

ShowOrder (k+1, j)

write ``)''
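
The two procedures above can be sketched in Python as follows. This is an illustration under stated assumptions, not the lecture's own code: the list `d` of n+1 dimensions (matrix i is `d[i-1]` by `d[i]`) and the function names are hypothetical, and `show_order` returns a string instead of writing it.

```python
def matrix_order(d):
    """Return (M, factor) for a chain whose i-th matrix is d[i-1] x d[i]."""
    n = len(d) - 1
    # Tables are 1-indexed to match the notes; row and column 0 are unused.
    M = [[0] * (n + 1) for _ in range(n + 1)]
    factor = [[0] * (n + 1) for _ in range(n + 1)]
    for diagonal in range(1, n):
        for i in range(1, n - diagonal + 1):
            j = i + diagonal
            # Try every split point k and remember the cheapest one.
            M[i][j], factor[i][j] = min(
                (M[i][k] + M[k + 1][j] + d[i - 1] * d[k] * d[j], k)
                for k in range(i, j)
            )
    return M, factor


def show_order(factor, i, j):
    """Rebuild the optimal parenthesization from the stored split points."""
    if i == j:
        return f"A{i}"
    k = factor[i][j]
    return "(" + show_order(factor, i, k) + "*" + show_order(factor, k + 1, j) + ")"
```

For example, with the made-up dimensions `d = [10, 100, 5, 50]`, `matrix_order(d)` gives `M[1][3] == 7500` and `show_order(factor, 1, 3)` returns `"((A1*A2)*A3)"`.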

Listening to Part 13-4

A dynamic programming solution has three components:


  1. Formulate the answer as a recurrence relation or recursive algorithm.
  2. Show that the number of different instances of your recurrence is bounded by a polynomial.
  3. Specify an order of evaluation for the recurrence so you always have what you need.

Listening to Part 13-5

Approximate String Matching

A common task in text editing is string matching - finding all occurrences of a word in a text.     

Unfortunately, many words are misspelled. How can we search for the string closest to the pattern?

Let P be a pattern string and T a text string over the same alphabet.

A k-approximate match between P and T is a substring of T with at most k differences.

Differences may be:

  1. the corresponding characters may differ: KAT → CAT
  2. P is missing a character from T: CAAT → CAT
  3. T is missing a character from P: CT → CAT

Approximate Matching is important in genetics as well as spell checking.

Listening to Part 13-6

A 3-Approximate Match

A match with one of each of three edit operations is:

P = unescessaraly

T = unnecessarily

Finding such a matching seems like a hard problem because we must figure out where to add blanks, but we can solve it with dynamic programming.

D[i, j] = the minimum number of differences between $P_1, P_2, \ldots, P_i$ and the segment of T ending at j.

D[i, j] is the minimum of the three possible ways to extend smaller strings:

  1. If $P_i = T_j$ then D[i-1, j-1] else D[i-1, j-1]+1 (corresponding characters do or do not match)
  2. D[i-1, j]+1 (character in pattern which is not in text - we do not advance the text pointer).
  3. D[i, j-1]+1 (extra character in text - we do not advance the pattern pointer).

Once you accept the recurrence it is easy.

To fill each cell, we need only consider three other cells, not O(n) as in other examples. This means we need only store two rows of the table. The total time is O(mn).

Listening to Part 13-10

Boundary conditions for string matching

What should the value of D[0,i] be, corresponding to the cost of matching the first i characters of the text with none of the pattern?  

It depends. Are we doing string matching in the text or substring matching?

In both cases, D[i,0] = i, since we cannot excuse deleting the first i characters of the pattern without cost.

Listening to Part 13-9

What do we return?

If we want the cost of comparing all of the pattern against all of the text, such as comparing the spelling of two words, all we are interested in is D[n,m].

But what if we want the cheapest match between the pattern anywhere in the text? Assuming the initialization for substring matching, we seek the cheapest matching of the full pattern ending anywhere in the text. This means the cost equals $\min_{1 \le i \le m} D[n,i]$.

This only gives the cost of the optimal matching. The actual alignment - what got matched, substituted, and deleted - can be reconstructed from the pattern/text and table without any auxiliary storage, once we have identified the cell with the lowest cost.
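
The recurrence, the boundary conditions and the two possible return values can be put together in a short Python sketch. The function names and the `substring` flag are hypothetical. The first-row initialization is an assumption, since the notes do not spell it out here: it uses the usual choice of D[0][j] = j when matching whole strings and D[0][j] = 0 when the pattern may start anywhere in the text.

```python
def match_table(P, T, substring=False):
    """Fill D[i][j]: fewest differences between P[:i] and text ending at j."""
    n, m = len(P), len(T)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        D[i][0] = i  # unmatched pattern characters always cost 1 each
    for j in range(m + 1):
        D[0][j] = 0 if substring else j  # assumed boundary condition
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = min(
                D[i - 1][j - 1] + (P[i - 1] != T[j - 1]),  # match or substitute
                D[i - 1][j] + 1,  # pattern character with no text partner
                D[i][j - 1] + 1,  # text character with no pattern partner
            )
    return D


def match_cost(P, T, substring=False):
    """Cost of the best full match, or of the best match ending anywhere."""
    D = match_table(P, T, substring)
    return min(D[len(P)]) if substring else D[len(P)][len(T)]
```

Here `match_cost("unescessaraly", "unnecessarily")` evaluates to 3, in agreement with the example above.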

Listening to Part 13-11

How much space do we need?


Do we need to keep all O(mn) cells? No: if we evaluate the recurrence filling in the columns of the matrix from left to right, we will never need more than two columns of cells to do what we need. Thus O(m) space is sufficient to evaluate the recurrence without changing the time complexity at all.

Unfortunately, because we won't have the full matrix we cannot reconstruct the alignment, as above.

Saving space in dynamic programming is very important. Since memory on any computer is limited, O(nm) space is more of a bottleneck than O(nm) time.

Fortunately, there is a clever divide-and-conquer algorithm which computes the actual alignment in O(nm) time and O(m) space.


Lecture 13 - dynamic programming applications

Listening to Part 13-1

16.3-5 Give an O(n^2) algorithm to find the longest monotonically increasing sequence in a sequence of n numbers.

Build an example first: (5, 2, 8, 7, 3, 1, 6, 4)

Ask yourself what would you like to know about the first n-1 elements to tell you the answer for the entire sequence?

  1. The length of the longest sequence in the first n-1 elements. (seems obvious)
  2. The length of the longest sequence that the nth element will extend! (not as obvious - this is the idea!)

Let $s_i$ be the length of the longest sequence ending with the ith character:

sequence   5  2  8  7  3  1  6  4
$s_i$      1  1  2  2  2  1  3  3

How do we compute $s_i$?

$$s_i = 1 + \max_{0 < j < i,\; x_j < x_i} s_j$$

where $x_j$ denotes the jth element of the sequence and the maximum over an empty set is taken to be 0.

To find the longest sequence - we know it ends somewhere, so Length = $\max_{1 \le i \le n} s_i$
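
A small Python sketch of this O(n^2) computation follows. The function name and the `prev` array, which is used to recover one longest sequence, are illustrative additions that are not part of the notes.

```python
def longest_increasing(seq):
    """Return (s, best): s[i] is the longest increasing run ending at i."""
    n = len(seq)
    if n == 0:
        return [], []
    s = [1] * n        # every element alone is a sequence of length 1
    prev = [-1] * n    # predecessor of i in the best sequence ending at i
    for i in range(n):
        for j in range(i):
            if seq[j] < seq[i] and s[j] + 1 > s[i]:
                s[i] = s[j] + 1
                prev[i] = j
    # The longest sequence ends somewhere: take the position with the largest s.
    end = max(range(n), key=lambda i: s[i])
    best = []
    while end != -1:
        best.append(seq[end])
        end = prev[end]
    return s, best[::-1]
```

On the sequence 5 2 8 7 3 1 6 4 this yields the lengths 1 1 2 2 2 1 3 3 and the sequence 2 3 6.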

Listening to Part 14-5

The Principle of Optimality

To use dynamic programming, the problem must observe the principle of optimality: whatever the initial state is, the remaining decisions must be optimal with regard to the state following from the first decision.

Combinatorial problems may have this property but may use too much memory/time to be efficient.

Example: The Traveling Salesman Problem

Let $T(i; j_1, j_2, \ldots, j_k)$ be the cost of the optimal tour from i to 1 that goes through each of the other cities $j_1, j_2, \ldots, j_k$ once.



Here there can be any subset of $j_1, j_2, \ldots, j_k$ instead of any subinterval - hence exponential.

Still, with other ideas (some type of pruning or best-first search) it can be effective for combinatorial search.

Listening to Part 14-6

When can you use Dynamic Programming?

Dynamic programming computes recurrences efficiently by storing partial results. Thus dynamic programming can only be efficient when there are not too many partial results to compute!  

There are n! permutations of an n-element set - we cannot use dynamic programming to store the best solution for each subpermutation. There are $2^n$ subsets of an n-element set - we cannot use dynamic programming to store the best solution for each.

However, there are only n(n-1)/2 contiguous substrings of a string, each described by a starting and ending point, so we can use it for string problems.

There are only n(n-1)/2 possible subtrees of a binary search tree, each described by a maximum and minimum key, so we can use it for optimizing binary search trees.

Dynamic programming works best on objects which are linearly ordered and cannot be rearranged - characters in a string, matrices in a chain, points around the boundary of a polygon, the left-to-right order of leaves in a search tree.

Whenever your objects are ordered in a left-to-right way, you should smell dynamic programming!

Listening to Part 14-7

Minimum Length Triangulation

A triangulation of a polygon is a set of non-intersecting diagonals which partitions the polygon into triangles.

The length of a triangulation is the sum of the diagonal lengths.

We seek to find the minimum length triangulation. For a convex polygon, or part thereof:

Once we identify the correct connecting vertex, the polygon is partitioned into two smaller pieces, both of which must be triangulated optimally!


Evaluation proceeds as in the matrix multiplication example - O(n^2) values of t, each of which takes O(j-i) time if we evaluate the sections in order of increasing size.

What if there are points in the interior of the polygon?

Listening to Part 14-8

Dynamic Programming and High Density Bar Codes

Symbol Technology has developed a new design for bar codes, PDF-417, that has a capacity of several hundred bytes. What is the best way to encode text for this design?

They developed a complicated mode-switching data compression scheme.

Latch commands permanently put you in a different mode. Shift commands temporarily put you in a different mode.

Listening to Part 14-9

Originally, Symbol used a greedy algorithm to encode a string, making local decisions only. We realized that for any prefix, you want an optimal encoding which might leave you in every possible mode.

$M[i,j] = \min_k$ ( $M[i-1,k]$ + the cost of encoding the ith character and ending up in mode j).

Our simple dynamic programming algorithm improved the capacity of PDF-417 by an average of tex2html_wrap_inline15061 !

Listening to Part 14-10

Dynamic Programming and Morphing

Morphing is the problem of creating a smooth series of intermediate images given a starting and ending image.   

The key problem is establishing a correspondence between features in the two images. You want to morph an eye to an eye, not an eye to an ear.

We can do this matching on a line-by-line basis:

This should sound like string matching, but with a different set of operations:

This algorithm was incorporated into a morphing system, with the following results:



Lecture 14 - data structures for graphs

Listening to Part 14-1

Problem Solving Techniques

Most important: make sure you understand exactly what the question is asking - if not, you have no hope of answering it!

Never be afraid to ask for another explanation of a problem until it is clear.

Play around with the problem by constructing examples to get insight into it.

Ask yourself questions. Does the first idea which comes into my head work? If not, why not?

Am I using all information that I am given about the problem?

Read Polya's book How to Solve it.

Listening to Part 14-2

16-1: The Euclidean traveling-salesman problem is the problem of determining the shortest closed tour that connects a given set of n points in the plane.    

Bentley suggested simplifying the problem by restricting attention to bitonic tours, that is tours which start at the leftmost point, go strictly left to right to the rightmost point, and then go strictly right to left back to the starting point.

Describe an O(n^2) algorithm for finding the optimal bitonic tour. You may assume that no two points have the same x-coordinate. (Hint: scan left to right, maintaining optimal possibilities for the two parts of the tour.)

Make sure you understand what a bitonic tour is, or else it is hopeless.

First of all, play with the problem. Why isn't it trivial?

Listening to Part 14-3

Am I using all the information?

Why will they let us assume that no two x-coordinates are the same? What does the hint mean? What happens if I scan from left to right?

If we scan from left to right, we get an open tour which uses all points to the left of our scan line.  

In the optimal tour, the kth point is connected to exactly one point to the left of k (for $k \neq n$). Once I decide which point that is, say x, I need the optimal partial tour where the two endpoints are x and k-1, because if it isn't optimal I could come up with a better one.

Listening to Part 14-4

Hey, I have got a recurrence! And look, the two parameters which describe my optimal tour are the two endpoints.

Let c[k,n] be the optimal cost partial tour where the two endpoints are k<n.

$c[k,n] = c[k,n-1] + d[n-1,n]$ (when k < n-1)

$c[n-1,n] = \min_{k < n-1} \left( c[k,n-1] + d[k,n] \right)$

c[0, 1]=d[0, 1]

c[n-1, n] takes O(n) to update, c[k, n] k<n-1 takes O(1) to update. Total time is O(n^2).

This doesn't quite give the tour, just an open tour. We simply must figure out where the last edge to n must go.


Listening to Part 15-1


A graph G consists of a set of vertices V together with a set E of vertex pairs or edges.    

Graphs are important because any binary relation is a graph, so graphs can be used to represent essentially any relationship.

Example: A network of roads, with cities as vertices and roads between cities as edges.

Example: An electronic circuit, with junctions as vertices and components as edges.

To understand many problems, we must think of them in terms of graphs!

Listening to Part 15-2

The Friendship Graph

Consider a graph where the vertices are people, and there is an edge between two people if and only if they are friends.  

This graph is well-defined on any set of people: SUNY SB, New York, or the world.

What questions might we ask about the friendship graph?

Listening to Part 15-5

Data Structures for Graphs

There are two main data structures used to represent graphs.

Adjacency Matrices

An adjacency matrix is an $n \times n$ matrix, where M[i,j] = 0 iff there is no edge from vertex i to vertex j.

It takes $\Theta(1)$ time to test if (i,j) is in a graph represented by an adjacency matrix.

Can we save space if (1) the graph is undirected? (2) if the graph is sparse?

Listening to Part 15-6

Adjacency Lists

An adjacency list consists of an $n \times 1$ array of pointers, where the ith element points to a linked list of the edges incident on vertex i.

To test if edge (i,j) is in the graph, we search the ith list for j, which takes $\Theta(d_i)$, where $d_i$ is the degree of the ith vertex.

Note that $d_i$ can be much less than n when the graph is sparse. If necessary, the two copies of each edge can be linked by a pointer to facilitate deletions.
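
As a rough illustration of the two representations, the Python sketch below builds both from the same edge list of an undirected graph. The function names, the 0-based vertex numbering and the use of Python lists in place of linked lists are assumptions made for the example.

```python
def build_matrix(n, edges):
    """n x n matrix with M[i][j] = 1 iff edge (i, j) is present."""
    M = [[0] * n for _ in range(n)]
    for i, j in edges:
        M[i][j] = M[j][i] = 1  # undirected: store both directions
    return M


def build_lists(n, edges):
    """adj[i] lists the neighbours of vertex i; total size is O(n + m)."""
    adj = [[] for _ in range(n)]
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)  # each undirected edge appears in two lists
    return adj


def has_edge_lists(adj, i, j):
    """Scanning the ith list takes time proportional to the degree of i."""
    return j in adj[i]
```

With the matrix, testing for an edge is a single lookup `M[i][j]`. With the lists, the test scans one list, but the structure stays small on sparse graphs.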

Listening to Part 15-7

Tradeoffs Between Adjacency Lists and Adjacency Matrices

Comparison                         Winner
Faster to test if (x, y) exists?   matrices
Faster to find vertex degree?      lists
Less memory on small graphs?       lists (m+n) vs. (n^2)
Less memory on big graphs?         matrices (small win)
Edge insertion or deletion?        matrices O(1)
Faster to traverse the graph?      lists m+n vs. n^2
Better for most problems?          lists

Both representations are very useful and have different properties, although adjacency lists are probably better for most problems.

Listening to Part 16-2

Traversing a Graph

One of the most fundamental graph problems is to traverse every edge and vertex in a graph. Applications include:  

For efficiency, we must make sure we visit each edge at most twice.

For correctness, we must do the traversal in a systematic way so that we don't miss anything.

Since a maze is just a graph, such an algorithm must be powerful enough to enable us to get out of an arbitrary maze.  

Listening to Part 16-3

Marking Vertices

The idea in graph traversal is that we must mark each vertex when we first visit it, and keep track of what we have not yet completely explored.

For each vertex, we can maintain two flags:

We must also maintain a structure containing all the vertices we have discovered but not yet completely explored.

Initially, only a single start vertex is considered to be discovered.

To completely explore a vertex, we look at each edge going out of it. For each edge which goes to an undiscovered vertex, we mark it discovered and add it to the list of work to do.

Note that regardless of the order in which we fetch the next vertex to explore, each edge is considered exactly twice, once when each of its endpoints is explored.

Listening to Part 16-4

Correctness of Graph Traversal

Every edge and vertex in the connected component is eventually visited.

Suppose not, ie. there exists a vertex which was unvisited whose neighbor was visited. This neighbor will eventually be explored so we would visit it:

Listening to Part 16-5

Traversal Orders

The order we explore the vertices depends upon what kind of data structure is used:

The three possible colors of each node reflect if it is unvisited (white), visited but unexplored (grey) or completely explored (black).

Listening to Part 16-6

Breadth-First Search



BFS(G,s)

for each vertex $u \in V[G] - \{s\}$ do

    color[u] = white

    $d[u] = \infty$, ie. the distance from s

    p[u] = NIL, ie. the parent in the BFS tree

color[s] = grey

d[s] = 0

p[s] = NIL

$Q = \{s\}$

while $Q \neq \emptyset$ do

    u = head[Q]

    for each $v \in Adj[u]$ do

        if color[v] = white then

            color[v] = grey

            d[v] = d[u] + 1

            p[v] = u

            enqueue[Q,v]

    dequeue[Q]

    color[u] = black
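
A compact Python rendering of the procedure is sketched below. It assumes the graph is a dictionary mapping each vertex to a list of neighbours, and it folds the white/grey/black colors into a single test: a vertex whose distance is still `None` is undiscovered.

```python
from collections import deque


def bfs(adj, s):
    """Return (d, p): distance from s and BFS-tree parent for every vertex."""
    d = {u: None for u in adj}  # None plays the role of "white"
    p = {u: None for u in adj}
    d[s] = 0
    Q = deque([s])
    while Q:
        u = Q.popleft()          # head[Q] followed by dequeue
        for v in adj[u]:
            if d[v] is None:     # first time we see v
                d[v] = d[u] + 1
                p[v] = u
                Q.append(v)
    return d, p
```

Vertices that cannot be reached from `s` keep a distance of `None`, which corresponds to the infinite distance in the pseudocode.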

Listening to Part 16-8

Depth-First Search

DFS has a neat recursive implementation which eliminates the need to explicitly use a stack.  

Discovery and final times are sometimes a convenience to maintain.


DFS(G)

for each vertex $u \in V[G]$ do

    color[u] = white

    parent[u] = nil

time = 0

for each vertex $u \in V[G]$ do

    if color[u] = white then DFS-VISIT[u]

Initialize each vertex in the main routine, then do a search from each connected component. BFS must also start from a vertex in each component to completely visit the graph.  


DFS-VISIT[u]

color[u] = grey (*u had been white/undiscovered*)

discover[u] = time

time = time+1

for each $v \in Adj[u]$ do

    if color[v] = white then

        parent[v] = u

        DFS-VISIT[v]

color[u] = black (*now finished with u*)

finish[u] = time

time = time+1
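
The recursive procedure translates almost line for line into Python. The sketch below assumes a dictionary-of-lists graph and returns the parent, discovery and finishing tables. Python's default recursion limit makes it suitable only for small graphs.

```python
def dfs(adj):
    """Depth-first search of every component; returns (parent, discover, finish)."""
    color = {u: "white" for u in adj}
    parent = {u: None for u in adj}
    discover, finish = {}, {}
    time = 0

    def visit(u):
        nonlocal time
        color[u] = "grey"        # u had been white/undiscovered
        discover[u] = time
        time += 1
        for v in adj[u]:
            if color[v] == "white":
                parent[v] = u
                visit(v)
        color[u] = "black"       # now finished with u
        finish[u] = time
        time += 1

    for u in adj:                # restart from every undiscovered vertex
        if color[u] == "white":
            visit(u)
    return parent, discover, finish
```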


Lecture 15 - DFS and BFS

Listening to Part 15-8

23.1-5 - The square of a directed graph G=(V,E) is the graph $G^2 = (V, E^2)$ such that $(u,w) \in E^2$ iff for some $v \in V$, both $(u,v) \in E$ and $(v,w) \in E$; ie. there is a path of exactly two edges.

Give efficient algorithms for both adjacency lists and matrices.

Given an adjacency matrix, we can check in constant time whether a given edge exists. To discover whether there is an edge $(u,w) \in E^2$, for each possible intermediate vertex v we can check whether (u,v) and (v,w) exist in O(1).

Since there are at most n intermediate vertices to check, and $n^2$ pairs of vertices to ask about, this takes $O(n^3)$ time.

With adjacency lists, we have a list of all the edges in the graph. For a given edge (u,v), we can run through all the edges from v in O(n) time, and fill the results into an adjacency matrix of $G^2$, which is initially empty.

It takes O(mn) to construct the edges, and $O(n^2)$ to initialize and read the adjacency matrix, a total of O((n+m)n). Since $n \le m+1$ unless the graph is disconnected, this is usually simplified to O(mn), and is faster than the previous algorithm on sparse graphs.

Why is it called the square of a graph? Because the square of the adjacency matrix is the adjacency matrix of the square! This provides a theoretically faster algorithm.
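
The adjacency-list approach can be sketched in a few lines of Python. As an assumption made for brevity, the result is collected in a set per vertex rather than in the adjacency matrix described above. The running time is still O(mn), because each of the m edges (u,v) triggers one scan of the list of v.

```python
def square_lists(adj):
    """Return sq with w in sq[u] iff some v has edges (u, v) and (v, w)."""
    sq = {u: set() for u in adj}
    for u in adj:
        for v in adj[u]:          # every edge (u, v) ...
            sq[u].update(adj[v])  # ... extends to every edge (v, w)
    return sq
```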

Listening to Part 16-10

BFS Trees

If BFS is performed on a connected, undirected graph, a tree is defined by the edges involved with the discovery of new nodes:  

This tree defines a shortest path from the root to every other node in the tree.

The proof is by induction on the length of the shortest path from the root:

Listening to Part 16-11

The key idea about DFS

A depth-first search of a graph organizes the edges of the graph in a precise way.  

In a DFS of an undirected graph, we assign a direction to each edge, from the vertex which discovers it:

In a DFS of an undirected graph, every edge is either a tree edge or a back edge.

In a DFS of a directed graph, no cross edge goes to a higher numbered or rightward vertex. Thus, no edge from 4 to 5 is possible:

Listening to Part 16-12

Edge Classification for DFS

What about the other edges in the graph? Where can they go on a search?

Every edge is either:

On any particular DFS or BFS of a directed or undirected graph, each edge gets classified as one of the above.

Listening to Part 17-3

DFS Trees

The reason DFS is so important is that it defines a very nice ordering to the edges of the graph.

In a DFS of an undirected graph, every edge is either a tree edge or a back edge.  

Why? Suppose we have a forward edge. We would have encountered (4,1) when expanding 4, so this is a back edge.  

Suppose we have a cross-edge  


Paths in search trees

Where is the shortest path in a DFS?

It could use multiple back and tree edges, where BFS only uses tree edges.

DFS gives a better approximation of the longest path than BFS.

Listening to Part 17-4

Topological Sorting

A directed, acyclic graph is a directed graph with no directed cycles.    

A topological sort of a graph is an ordering on the vertices so that all edges go from left to right.

Only a DAG can have a topological sort.

Any DAG has (at least one) topological sort.

Listening to Part 17-5

Applications of Topological Sorting

Topological sorting is often useful in scheduling jobs in their proper sequence. In general, we can use it to order things given constraints, such as a set of left-right constraints on the positions of objects.

Example: Dressing schedule from CLR.

Example: Identifying errors in DNA fragment assembly.  

Certain fragments are constrained to be to the left or right of other fragments, unless there are errors.

Solution - build a DAG representing all the left-right constraints. Any topological sort of this DAG is a consistent ordering. If there are cycles, there must be errors.

A DFS can test if a graph is a DAG (it is iff there are no back edges - forward edges are allowed for DFS on directed graph).

Listening to Part 17-6


Theorem: Arranging vertices in decreasing order of DFS finishing time gives a topological sort of a DAG.

Proof: Consider any directed edge u,v, when we encounter it during the exploration of vertex u:

Thus we can do topological sorting in O(n+m) time.
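
A possible Python sketch of this method: run a DFS, append each vertex to a list when it finishes, and reverse the list at the end. The dictionary-of-lists input and the choice to raise `ValueError` on a back edge are assumptions of the example.

```python
def topological_sort(adj):
    """Order the vertices of a DAG by decreasing DFS finishing time."""
    color = {u: "white" for u in adj}
    order = []  # vertices in increasing order of finishing time

    def visit(u):
        color[u] = "grey"
        for v in adj[u]:
            if color[v] == "grey":
                # An edge to a vertex still being explored is a back edge.
                raise ValueError("back edge found: graph is not a DAG")
            if color[v] == "white":
                visit(v)
        color[u] = "black"
        order.append(u)

    for u in adj:
        if color[u] == "white":
            visit(u)
    return order[::-1]
```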

Listening to Part 17-8

Articulation Vertices

Suppose you are a terrorist, seeking to disrupt the telephone network. Which station do you blow up?    

An articulation vertex is a vertex of a connected graph whose deletion disconnects the graph.

Clearly connectivity is an important concern in the design of any network.  

Articulation vertices can be found in O(n(m+n)) - just delete each vertex in turn and do a DFS on the remaining graph to see if it is connected.

Listening to Part 17-9

A Faster O(n+m) DFS Algorithm

Theorem: In a DFS tree, a vertex v (other than the root) is an articulation vertex iff v is not a leaf and some subtree of v has no back edge incident on a proper ancestor of v.

Proof: (1) v is an articulation vertex $\Rightarrow$ v cannot be a leaf.

Why? Deleting v must separate a pair of vertices x and y. Because of the other tree edges, this cannot happen unless y is a descendant of v.

Listening to Part 17-10

v separating x,y implies there is no back edge in the subtree of y to a proper ancestor of v.

(2) Conditions $\Rightarrow$ v is a non-root articulation vertex. v separates any ancestor of v from any descendant in the appropriate subtree.

Actually implementing this test in O(n+m) is tricky - but believable once you accept this theorem.


Lecture 16 - applications of DFS and BFS

Listening to Part 16-1

23.2-6 Give an efficient algorithm to test if a graph is bipartite.

Bipartite means the vertices can be colored red or black such that no edge links vertices of the same color.   

Suppose we color a vertex red - what color must its neighbors be? black!

We can augment either BFS or DFS: when we first discover a new vertex, color it opposite to its parent, and for each other edge, check that it doesn't link two vertices of the same color. The first vertex in any connected component can be red or black!

Bipartite graphs arise in many situations, and special algorithms are often available for them. What is the interpretation of a bipartite ``had-sex-with'' graph?

How would you break people into two groups such that no group contains a pair of people who hate each other?
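
One way to realize the augmented traversal is the BFS-based Python sketch below. It assumes an undirected graph stored as a dictionary of neighbour lists, and the function name is illustrative.

```python
from collections import deque


def is_bipartite(adj):
    """Two-color the graph by BFS; fail on an edge joining equal colors."""
    color = {}
    for s in adj:                 # handle every connected component
        if s in color:
            continue
        color[s] = "red"          # the first vertex may take either color
        Q = deque([s])
        while Q:
            u = Q.popleft()
            for v in adj[u]:
                if v not in color:
                    color[v] = "black" if color[u] == "red" else "red"
                    Q.append(v)
                elif color[v] == color[u]:
                    return False  # edge links two vertices of the same color
    return True
```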

Listening to Part 17-1

23.4-3 Give an O(n) algorithm to test whether an undirected graph contains a cycle.  

If you do a DFS, you have a cycle iff you have a back edge. This gives an O(n+m) algorithm. But where does the m go? If the graph contains more than n-1 edges, it must contain a cycle! Thus we never need look at more than n edges if we are given an adjacency list representation!
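
The observation can be sketched in Python with an explicit stack. The code assumes a simple undirected graph (no parallel edges) given as a dictionary of neighbour lists. It stops at the first edge that leads to an already discovered vertex other than the parent, so it inspects only O(n) edges before answering.

```python
def has_cycle(adj):
    """Return True iff the undirected graph contains a cycle."""
    seen = set()
    for s in adj:
        if s in seen:
            continue
        seen.add(s)
        stack = [(s, None)]        # pairs of (vertex, its tree parent)
        while stack:
            u, parent = stack.pop()
            for v in adj[u]:
                if v == parent:
                    continue       # the tree edge we arrived by
                if v in seen:
                    return True    # non-tree edge closes a cycle
                seen.add(v)
                stack.append((v, u))
    return False
```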

Listening to Part 17-7

23.4-5 Show that you can topologically sort in O(n+m) by repeatedly deleting vertices of indegree 0.

The correctness of this algorithm follows since in a DAG there must always be a vertex of indegree 0, and such a vertex can be first in topological sort. Suppose each vertex is initialized with its indegree (do DFS on G to get this). Deleting a vertex v takes O(degree v). Reduce the indegree of each adjacent vertex - and keep a list of indegree-0 vertices to delete next.

Time: $\sum_{v \in V} O(\deg(v)) = O(n+m)$
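
A short Python sketch of the deletion method is given below. As a simplification that is not in the notes, indegrees are counted by one pass over the adjacency lists instead of a DFS, and a leftover vertex is reported as a cycle.

```python
def topsort_by_indegree(adj):
    """Repeatedly remove a vertex of indegree 0 from a directed graph."""
    indeg = {u: 0 for u in adj}
    for u in adj:
        for v in adj[u]:
            indeg[v] += 1
    ready = [u for u in adj if indeg[u] == 0]  # vertices deletable now
    order = []
    while ready:
        u = ready.pop()
        order.append(u)
        for v in adj[u]:          # deleting u costs O(outdegree of u)
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    if len(order) != len(adj):
        raise ValueError("graph has a cycle")
    return order
```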

Listening to Part 17-12

Strongly Connected Components

A directed graph is strongly connected iff there is a directed path between any two vertices.  

The strongly connected components of a graph is a partition of the vertices into subsets (maximal) such that each subset is strongly connected.

Observe that no vertex can be in two maximal components, so it is a partition.

There is an amazingly elegant, linear time algorithm to find the strongly connected components of a directed graph, using DFS.

Listening to Part 17-13

This algorithm takes O(n+m), but why does it compute strongly connected components?

Lemma: If two vertices are in the same strong component, no path between them ever leaves the component.

Lemma: In any DFS forest, all vertices in the same strongly connected component are in the same tree.

Proof: Consider the first vertex v in the component to be discovered. Everything in the component is reachable from it, so we will traverse it before finishing with v.

Listening to Part 17-14

What does DFS( $G^T$ , v) Do?

Here $G^T$ is G with every edge reversed. DFS( $G^T$ , v) tells you what vertices have directed paths to v, while DFS( $G$ ,v) tells what vertices have directed paths from v. But why must any vertex in the search tree of DFS( $G^T$ , v) also have a path from v?

Because there is no edge from any previous DFS tree into the last tree!! Because we ordered the vertices by decreasing order of finish time, we can peel off the strongly connected components from right to left just by doing a DFS( $G^T$ ).
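
The two-pass idea discussed here can be sketched in Python as follows. This is an illustration assuming a dictionary-of-lists directed graph. The helper `visit` and the use of reversed adjacency lists for $G^T$ are choices made for the example, not details taken from the lecture.

```python
def strong_components(adj):
    """Return the strongly connected components as sorted lists of vertices."""
    seen = set()

    def visit(u, graph, out):
        # Plain recursive DFS that appends each vertex when it finishes.
        seen.add(u)
        for v in graph[u]:
            if v not in seen:
                visit(v, graph, out)
        out.append(u)

    # Pass 1: DFS on G, recording vertices in increasing finishing time.
    finished = []
    for u in adj:
        if u not in seen:
            visit(u, adj, finished)

    # Build the graph with every edge reversed.
    rev = {u: [] for u in adj}
    for u in adj:
        for v in adj[u]:
            rev[v].append(u)

    # Pass 2: DFS on the reversed graph in decreasing finishing time.
    seen.clear()
    components = []
    for u in reversed(finished):
        if u not in seen:
            comp = []
            visit(u, rev, comp)   # everything reached forms one component
            components.append(sorted(comp))
    return components
```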

Listening to Part 17-16

Example of Strong Components Algorithm

9, 10, 11, 12 can reach 9, oldest remaining finished is 5.

5, 6, 8 can reach 5, oldest remaining is 7.

7 can reach 7, oldest remaining is 1.

1, 2, 3 can reach 1, oldest remaining is 4.

4 can reach 4.



Lecture 17 - minimum spanning trees

Listening to Part 19-4

24.2-6 Describe an efficient algorithm that, given an undirected graph G, determines a spanning tree of G whose largest edge weight is minimum over all spanning trees of G.

First, make sure you understand the question

``Hey, doesn't Kruskal's algorithm do something like this.''

Certainly! Since Kruskal's algorithm considers the edges in order of increasing weight, and stops the moment these edges form a connected graph, the tree it gives must minimize the maximum edge weight.

``Hey, but then why doesn't Prim's algorithm also work?''

It gives the same thing as Kruskal's algorithm, so it must be true that any minimum spanning tree minimizes the maximum edge weight!

Proof: Give me a MST and consider the largest edge weight,

Listening to Part 19-5

Deleting it disconnects the MST. If there were a lower-weight edge connecting the two subtrees, I didn't have a MST!

Listening to Part 18-2

Minimum Spanning Trees

A tree is a connected graph with no cycles. A spanning tree is a subgraph of G which has the same set of vertices of G and is a tree.  

A minimum spanning tree of a weighted graph G is the spanning tree of G whose edges sum to minimum weight.  

There can be more than one minimum spanning tree in a graph - consider a graph with identical weight edges.

The minimum spanning tree problem has a long history - the first algorithm dates back at least to 1926!

Minimum spanning tree is always taught in algorithm courses since (1) it arises in many applications, (2) it is an important example where greedy algorithms always give the optimal answer, and (3) clever data structures are necessary to make it work.

In greedy algorithms, we make the decision of what next to do by selecting the best local option from all available choices - without regard to the global structure.

Listening to Part 18-3

Applications of Minimum Spanning Trees

Minimum spanning trees are useful in constructing networks, by describing the way to connect a set of sites using the smallest total amount of wire. Much of the work on minimum spanning (and related Steiner) trees has been conducted by the phone company.  

Minimum spanning trees provide a reasonable way for clustering points in space into natural groups.  

When the cities are points in the Euclidean plane, the minimum spanning tree provides a good heuristic for traveling salesman problems. The optimum traveling salesman tour is at most twice the length of the minimum spanning tree.  

Listening to Part 18-4

Prim's Algorithm

If G is connected, every vertex will appear in the minimum spanning tree. If not, we can talk about a minimum spanning forest.  

Prim's algorithm starts from one vertex and grows the rest of the tree an edge at a time.

As a greedy algorithm, which edge should we pick? The cheapest edge with which we can grow the tree by one vertex without creating a cycle.

During execution we will label each vertex as either in the tree, fringe - meaning there exists an edge from a tree vertex, or unseen - meaning the vertex is more than one edge away.

Select an arbitrary vertex to start.

While (there are fringe vertices)

select minimum weight edge between tree and fringe

add the selected edge and vertex to the tree

Clearly this creates a spanning tree, since no cycle can be introduced via edges between tree and fringe vertices, but is it minimum?

Listening to Part 18-5

Why is Prim's algorithm correct?

Don't be scared by the proof - the reason is really quite basic:

Theorem: Let G be a connected, weighted graph and let tex2html_wrap_inline15430 be a subset of the edges in a MST tex2html_wrap_inline15432 . Let V' be the vertices incident with edges in E'. If (x,y) is an edge of minimum weight such that tex2html_wrap_inline15440 and y is not in V', then tex2html_wrap_inline15444 is a subset of a minimum spanning tree.

Proof: If the edge is in T, this is trivial.

Suppose (x,y) is not in T. Then there must be a path in T from x to y, since T is connected. If (v,w) is the first edge on this path with exactly one endpoint in V', then deleting it and replacing it with (x,y) gives a spanning tree.

This tree weighs no more than T, since W(v,w) >= W(x,y) by the choice of (x,y). If the inequality is strict, T could not have been the MST; otherwise the new tree is an MST which contains (x,y).

Prim's Algorithm is correct!

Thus we cannot go wrong with the greedy strategy the way we could with the traveling salesman problem.

Listening to Part 18-6

But how fast is Prim's?

That depends on what data structures are used. In the simplest implementation, we can simply mark each vertex as tree and non-tree and search always from scratch:

Select an arbitrary vertex to start.

While (there are non-tree vertices)

select minimum weight edge between tree and fringe

add the selected edge and vertex to the tree

This can be done in O(n m) time, by doing a DFS or BFS to loop through all edges, with a constant time test per edge, and a total of n iterations.

Can we do faster? If so, we need to be able to identify the fringe vertices, and the minimum cost edge associated with each of them, fast. We will augment an adjacency list with fields maintaining fringe information.


fringelink: pointer to next vertex in fringe list.
fringeweight: cheapest edge linking v to l.
parent: other vertex with v having fringeweight.
status: intree, fringe, unseen.
adjacency list: the list of edges.

Listening to Part 18-8

Finding the minimum weight fringe-edge takes O(n) time - just bump through fringe list.

After adding a vertex to the tree, running through its adjacency list to update the cost of adding fringe vertices (there may be a cheaper way through the new vertex) can be done in O(n) time.

Total time is $O(n^2)$.
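The fringe bookkeeping above can be sketched in Python as follows. This is a minimal sketch, not the course's code: it assumes a connected graph given as an adjacency matrix (`math.inf` for missing edges), and the names `prim_mst`, `fringe_weight` and `parent` are illustrative stand-ins for the fields listed earlier.

```python
import math

def prim_mst(weights):
    """weights[i][j] is the edge cost, or math.inf when there is no edge.
    Returns (total weight, list of tree edges). Assumes a connected graph."""
    n = len(weights)
    in_tree = [False] * n
    fringe_weight = [math.inf] * n   # cheapest known edge into the tree
    parent = [None] * n              # other endpoint of that edge
    fringe_weight[0] = 0             # arbitrary start vertex
    total, edges = 0, []
    for _ in range(n):
        # O(n) scan for the cheapest vertex not yet in the tree
        v = min((u for u in range(n) if not in_tree[u]),
                key=lambda u: fringe_weight[u])
        in_tree[v] = True
        total += fringe_weight[v]
        if parent[v] is not None:
            edges.append((parent[v], v))
        # O(n) update: there may be a cheaper way in through the new vertex
        for u in range(n):
            if not in_tree[u] and weights[v][u] < fringe_weight[u]:
                fringe_weight[u] = weights[v][u]
                parent[u] = v
    return total, edges
```

Each of the n rounds does two O(n) scans, which gives the quadratic bound regardless of the number of edges.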

Listening to Part 18-9

Kruskal's Algorithm

Since an easy lower bound argument shows that every edge must be looked at to find the minimum spanning tree, and the number of edges m can be $\Theta(n^2)$, Prim's algorithm is optimal in the worst case. Is that all she wrote?

The complexity of Prim's algorithm is independent of the number of edges. Can we do better with sparse graphs? Yes!  

Kruskal's algorithm is also greedy. It repeatedly adds the smallest edge to the spanning tree that does not create a cycle. Obviously, this gives a spanning tree, but is it minimal?

Listening to Part 18-10

Why is Kruskal's algorithm correct?

Theorem: Let G be a weighted graph and let $E' \subset E$. If E' is contained in a MST T and e is the smallest edge in E-E' which does not create a cycle, then $E' \cup \{e\} \subseteq T$.

Proof: As before, suppose e is not in T. Adding e to T makes a cycle. Deleting another edge from this cycle leaves a connected graph, and if it is one from E-E' the cost of this tree goes down. Since such an edge exists, T could not be a MST.

Listening to Part 18-11

How fast is Kruskal's algorithm?

What is the simplest implementation?

The total time is O(mn), but can we do better?

Kruskal's algorithm builds up connected components. Any edge where both vertices are in the same connected component creates a cycle. Thus if we can maintain which vertices are in which component fast, we do not have to test for cycles!

Put the edges in a heap


while (count < n-1) do

get next edge (v,w)

if (component (v) $\neq$ component(w))

add to T

component (v)=component(w)

If we can test components in $O(\log n)$, we can find the MST in $O(m \log m)$!

Question: Is $O(m \log n)$ better than $O(m \log m)$?
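The loop above might look like this in Python. Two simplifications are assumptions of this sketch: the edges are sorted once instead of being kept in a heap, and components are merged by relabeling an array in O(n) time, so the merge step is slower than the logarithmic bound the analysis asks for. The function name `kruskal_mst` is hypothetical.

```python
def kruskal_mst(n, edges):
    """edges is a list of (weight, v, w) with vertices numbered 0..n-1.
    Returns (total weight, list of tree edges)."""
    component = list(range(n))          # component[v] = label of v's component
    tree, total = [], 0
    for weight, v, w in sorted(edges):  # cheapest edge first
        if component[v] != component[w]:
            tree.append((v, w))
            total += weight
            old, new = component[v], component[w]
            for u in range(n):          # naive O(n) merge of the two components
                if component[u] == old:
                    component[u] = new
            if len(tree) == n - 1:
                break
    return total, tree
```

Replacing the relabeling loop with a union-find structure is what brings the whole algorithm down to the cost of sorting the edges.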

Listening to Part 19-6

Union-Find Programs

Our analysis that Kruskal's MST algorithm is $O(m \lg m)$ requires a fast way to test whether an edge links two vertices in the same connected component.

Thus we need a data structure for maintaining sets which can test if two elements are in the same set and merge two sets together. These can be implemented by UNION and FIND operations:

Is $s_i \equiv s_j$

t = Find($s_i$)

u = Find($s_j$)

Return (Is t=u?)

Make $s_i \equiv s_j$

t = Find($s_i$)

u = Find($s_j$)

Union(t, u)

Find returns the name of the set and Union sets the members of t to have the same name as u.

We are interested in minimizing the time it takes to execute any sequence of unions and finds.

A simple implementation is to represent each set as a tree, with pointers from a node to its parent. Each element is contained in a node, and the name of the set is the key at the root:

Listening to Part 19-7

In the worst case, these structures can be very unbalanced:

For i = 1 to n/2 do


For i = 1 to n/2 do


We want to limit the height of our trees, which is affected by UNIONs. When we union, we can make the tree with fewer nodes the child.

Since the number of nodes is related to the height, the height of the final tree will increase only if both subtrees are of equal height!

Lemma: If Union(t,v) attaches the root of v as a subtree of t iff the number of nodes in t is greater than or equal to the number in v, then after any sequence of unions, any tree with k nodes has height at most $\lfloor \lg k \rfloor$.

Listening to Part 19-8

Proof: By induction on the number of nodes k, k=1 has height 0.

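Assume true to k-1 nodes. Let $d_i$ be the height of the tree $t_i$ and $k_i$ its number of nodes, where $t_2$ is attached below the root of $t_1$, so $k_1 \geq k_2$ and $k = k_1 + k_2$.

If $d_1 > d_2$ then $d = d_1 \leq \lfloor \lg k_1 \rfloor \leq \lfloor \lg (k_1 + k_2) \rfloor = \lfloor \lg k \rfloor$

If $d_1 \leq d_2$, then $d = d_2 + 1 \leq \lfloor \lg k_2 \rfloor + 1 = \lfloor \lg 2 k_2 \rfloor \leq \lfloor \lg (k_1 + k_2) \rfloor = \lfloor \lg k \rfloor$, since $k_1 \geq k_2$.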


Listening to Part 19-9

Can we do better?

We can do unions and finds in $O(\log n)$, good enough for Kruskal's algorithm. But can we do better?

The ideal Union-Find tree has depth 1:

On a find, if we are going down a path anyway, why not change the pointers to point to the root?

This path compression will let us do better than $O(n \lg n)$ for n union-finds.

O(n)? Not quite ... Difficult analysis shows that it takes $O(n \alpha(n))$ time, where $\alpha(n)$ is the inverse Ackermann function and $\alpha$(number of atoms in the universe)=5.
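Both ideas, making the smaller tree the child on a union and redirecting pointers to the root on a find, fit in a few lines of Python. This is a sketch under the assumption that elements are the integers 0..n-1; the class and method names are illustrative.

```python
class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))   # each element starts as its own root
        self.size = [1] * n            # number of nodes in the tree rooted here

    def find(self, x):
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # path compression: point every node on the path directly at the root
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False               # already in the same set
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra            # make the smaller tree the child
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        return True

    def same(self, a, b):
        return self.find(a) == self.find(b)
```

In Kruskal's algorithm, `same(v, w)` is the component test and `union(v, w)` is the merge.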


Lecture 18 - shortest path algorithms

Listening to Part 20-7

25.1-1 Give two more shortest path trees for the following graph:


Run through Dijkstra's algorithm, and see where there are ties which can be arbitrarily selected.  

There are two choices for how to get to the third vertex x, both of which cost 5.

There are two choices for how to get to vertex v, both of which cost 9.

Listening to Part 19-1

Lessons from the Backtracking contest

Listening to Part 19-3

Winning Optimizations

Listening to Part 19-10

Shortest Paths

Finding the shortest path between two nodes in a graph arises in many different applications:  

Listening to Part 20-1

Shortest Paths and Sentence Disambiguation

In our work on reconstructing text typed on an (overloaded) telephone keypad, we had to select which of many possible interpretations was most likely.   

We constructed a graph where the vertices were the possible words/positions in the sentence, with an edge between possible neighboring words.

Listening to Part 20-2

The weight of each edge is a function of the probability that these two words will be next to each other in a sentence. `hive me' would be less than `give me', for example.

The final system worked extremely well - identifying over 99% of characters correctly based on grammatical and statistical constraints.

Dynamic programming (the Viterbi algorithm) can be used on the sentences to obtain the same results, by finding the shortest paths in the underlying DAG.  

Listening to Part 20-3

Finding Shortest Paths

In an unweighted graph, the cost of a path is just the number of edges on it, so the shortest path can be found in O(n+m) time via breadth-first search.

In a weighted graph, the weight of a path between two vertices is the sum of the weights of the edges on a path.

BFS will not work on weighted graphs because sometimes visiting more edges can lead to shorter distance, i.e. 1+1+1+1+1+1+1 < 10.

Note that there can be an exponential number of shortest paths between two nodes - so we cannot report all shortest paths efficiently.

Note that negative cost cycles render the problem of finding the shortest path meaningless, since you can always loop around the negative cost cycle more to reduce the cost of the path.

Thus in our discussions, we will assume that all edge weights are positive. Other algorithms deal correctly with negative cost edges.

Minimum spanning trees are unaffected by negative cost edges.

Listening to Part 20-4

Dijkstra's Algorithm

We can use Dijkstra's algorithm to find the shortest path between any two vertices s and t in G.  

The principle behind Dijkstra's algorithm is that if $s, \ldots, x, \ldots, t$ is the shortest path from s to t, then $s, \ldots, x$ had better be the shortest path from s to x.

This suggests a dynamic programming-like strategy, where we store the distance from s to all nearby nodes, and use them to find the shortest path to more distant nodes.

The shortest path from s to s, d(s,s)=0. If all edge weights are positive, the smallest edge incident to s, say (s,x), defines d(s,x).

We can use an array to store the length of the shortest path to each node. Initialize each to $\infty$ to start.

As soon as we establish the shortest path from s to a new node x, we go through each of its incident edges to see if there is a better way from s to other nodes through x.

Listening to Part 20-5


known = {s}

for i=1 to n, dist[i] = $\infty$

for each edge (s,v), dist[v]=d(s,v)

last=s

while ( last $\neq$ t )

select v such that dist(v) = $\min_{unknown}$ dist(i)

for each (v,x), dist[x] = min(dist[x], dist[v]+w(v,x))

last=v

known = known $\cup$ {v}



Complexity: $O(n^2)$ if we use adjacency lists and a Boolean array to mark what is known.

This is essentially the same as Prim's algorithm.

An $O(m \lg n)$ implementation of Dijkstra's algorithm would be faster for sparse graphs, and comes from using a heap of the vertices (ordered by distance), and updating the distance to each vertex (if necessary) in $O(\lg n)$ time for each edge out from freshly known vertices.

Even better, $O(n \lg n + m)$ follows from using Fibonacci heaps, since they permit one to do a decrease-key operation in O(1) amortized time.
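A heap-based version can be sketched with Python's `heapq`. Since `heapq` has no decrease-key operation, this sketch pushes a fresh entry whenever a distance improves and skips stale entries when they are popped; that substitution is an assumption of the sketch, and it still costs a logarithmic factor per edge. The graph format (a dict mapping each vertex to a list of `(neighbor, weight)` pairs, with every vertex present as a key) is also illustrative.

```python
import heapq
import math

def dijkstra(adj, s):
    """adj[v] is a list of (neighbor, weight) pairs, all weights >= 0.
    Returns a dict of shortest distances from s."""
    dist = {v: math.inf for v in adj}
    dist[s] = 0
    known = set()
    heap = [(0, s)]
    while heap:
        d, v = heapq.heappop(heap)
        if v in known:
            continue                  # stale entry left over from an older push
        known.add(v)                  # d is now the final distance to v
        for x, w in adj[v]:
            if d + w < dist[x]:       # a better way to x through v
                dist[x] = d + w
                heapq.heappush(heap, (dist[x], x))
    return dist
```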

Listening to Part 20-8

All-Pairs Shortest Path

Notice that finding the shortest path between a pair of vertices (s,t) in worst case requires first finding the shortest path from s to all other vertices in the graph.  

Many applications, such as finding the center or diameter of a graph, require finding the shortest path between all pairs of vertices.

We can run Dijkstra's algorithm n times (once from each possible start vertex) to solve all-pairs shortest path problem in $O(n^3)$. Can we do better?

Improving the complexity is an open question but there is a super-slick dynamic programming algorithm which also runs in $O(n^3)$.

Listening to Part 20-9

Dynamic Programming and Shortest Paths

The four-step approach to dynamic programming is:

  1. Characterize the structure of an optimal solution.
  2. Recursively define the value of an optimal solution.
  3. Compute this recurrence in a bottom-up fashion.
  4. Extract the optimal solution from computed information.

From the adjacency matrix, we can construct the following matrix:

D[i,j] = $\infty$, if $i \neq j$ and $(v_i, v_j)$ is not in E

D[i,j] = w(i,j), if $(v_i, v_j) \in E$

D[i,j] = 0, if i=j

This tells us the shortest path going through no intermediate nodes.

There are several ways to characterize the shortest path between two nodes in a graph. Note that the shortest path from i to j, $i \neq j$, using at most M edges consists of the shortest path from i to k using at most M-1 edges + W(k, j) for some k.

Listening to Part 20-10

This suggests that we can compute all-pair shortest path with an induction based on the number of edges in the optimal path.

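Let $d[i,j]^m$ be the length of the shortest path from i to j using at most m edges.

What is $d[i,j]^0$?

$d[i,j]^0 = 0$ if i=j, and $\infty$ if $i \neq j$

What if we know $d[i,j]^{m-1}$ for all i,j?

$d[i,j]^m = \min(d[i,j]^{m-1}, \min_k (d[i,k]^{m-1} + w[k,j])) = \min_k (d[i,k]^{m-1} + w[k,j])$

since w[k, k]=0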

This gives us a recurrence, which we can evaluate in a bottom up fashion:

for i=1 to n

for j=1 to n

$d[i,j]^m = \infty$

for k=1 to n

$d[i,j]^m$ = Min( $d[i,j]^m$ , $d[i,k]^{m-1} + w[k,j]$ )

This is an $O(n^3)$ algorithm just like matrix multiplication, but it only goes from m to m+1 edges.

Listening to Part 20-11

Since the shortest path between any two nodes must use at most n edges (unless we have negative cost cycles), we must repeat that procedure n times (m=1 to n) for an $O(n^4)$ algorithm.

We can improve this to $O(n^3 \log n)$ with the observation that any path using at most 2m edges is a function of paths using at most m edges each. This is just like computing $a^n = a^{n/2} \times a^{n/2}$. So a logarithmic number of multiplications suffices for exponentiation.

Although this is slick, observe that even $O(n^3 \log n)$ is slower than running Dijkstra's algorithm starting from each vertex!
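The doubling idea can be sketched as a "min-plus" matrix product squared repeatedly. The sketch assumes the weight matrix has 0 on the diagonal and `float("inf")` for missing edges, and that there are no negative cost cycles; the function names are illustrative.

```python
def min_plus(a, b):
    """'Multiply' two distance matrices, with min in place of + and + in place of *."""
    n = len(a)
    return [[min(a[i][k] + b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def all_pairs_by_squaring(w):
    """w[i][j] is the edge weight, 0 on the diagonal, inf if there is no edge."""
    n = len(w)
    d, edges = w, 1              # d holds shortest paths using at most `edges` edges
    while edges < n - 1:
        d = min_plus(d, d)       # paths of at most 2m edges from paths of at most m
        edges *= 2
    return d
```

Each product costs cubic time and only a logarithmic number of them are needed.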

Listening to Part 20-12

The Floyd-Warshall Algorithm


An alternate recurrence yields a more efficient dynamic programming formulation. Number the vertices from 1 to n.

Let $d[i,j]^k$ be the shortest path from i to j using only vertices from 1, 2,..., k as possible intermediate vertices.

What is $d[i,j]^0$? With no intermediate vertices, any path consists of at most one edge, so $d[i,j]^0 = w[i,j]$.

In general, adding a new vertex k+1 helps iff a path goes through it, so

$d[i,j]^k = w[i,j]$ if k=0

$d[i,j]^k = \min(d[i,j]^{k-1}, d[i,k]^{k-1} + d[k,j]^{k-1})$ if $k \geq 1$
Although this looks similar to the previous recurrence, it isn't. The following algorithm implements it:


for k=1 to n

for i=1 to n

for j=1 to n

$d[i,j]^k = \min(d[i,j]^{k-1}, d[i,k]^{k-1} + d[k,j]^{k-1})$
This obviously runs in $O(n^3)$ time, which asymptotically is no better than n calls to Dijkstra's algorithm. However, the loops are so tight and it is so short and simple that it runs better in practice by a constant factor.
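In Python the three nested loops look like this. The sketch updates a single matrix in place rather than keeping a separate matrix for each k, which is the usual space-saving variant and an assumption here; input conventions (0 on the diagonal, `float("inf")` for missing edges, no negative cost cycles) are also assumed.

```python
def floyd_warshall(w):
    """w[i][j] is the edge weight, 0 on the diagonal, inf if there is no edge.
    Returns the matrix of all-pairs shortest path lengths."""
    n = len(w)
    d = [row[:] for row in w]            # d starts as the zero-intermediate case
    for k in range(n):                   # allow vertex k as an intermediate
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d
```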


Lecture 19 - satisfiability

Listening to Part 21-7

The Theory of NP-Completeness

Several times this semester we have encountered problems for which we couldn't find efficient algorithms, such as the traveling salesman problem. We also couldn't prove an exponential time lower bound for the problem.  

By the early 1970s, literally hundreds of problems were stuck in this limbo. The theory of NP-Completeness, developed by Stephen Cook and Richard Karp, provided the tools to show that all of these problems were really the same problem.

Listening to Part 21-8

Polynomial vs. Exponential Time


n       f(n) = n    f(n) = n^2    f(n) = 2^n           f(n) = n!
10      0.01 μs     0.1 μs        1 μs                 3.63 ms
20      0.02 μs     0.4 μs        1 ms                 77.1 years
30      0.03 μs     0.9 μs        1 sec                8.4 x 10^15 years
40      0.04 μs     1.6 μs        18.3 min
50      0.05 μs     2.5 μs        13 days
100     0.1 μs      10 μs         4 x 10^13 years
1,000   1.00 μs     1 ms

Listening to Part 21-9

The Main Idea

Suppose I gave you the following algorithm to solve the bandersnatch problem:   


Convert G to an instance of the Bo-billy problem Y.

Call the subroutine Bo-billy on Y to solve this instance.

Return the answer of Bo-billy(Y) as the answer to G.

Such a translation from instances of one type of problem to instances of another type such that answers are preserved is called a reduction.  

Now suppose my reduction translates G to Y in O(P(n)):

  1. If my Bo-billy subroutine ran in O(P'(n)) I can solve the Bandersnatch problem in O(P(n)+P'(n))
  2. If I know that $\Omega(P'(n))$ is a lower-bound to compute Bandersnatch, then $\Omega(P'(n) - P(n))$ must be a lower-bound to compute Bo-billy.

The second argument is the idea we use to prove problems hard!

Listening to Part 21-10

Convex Hull and Sorting

A nice example of a reduction goes from sorting numbers to the convex hull problem:   

We must translate each number to a point. We can map x to $(x, x^2)$.

Why? That means each integer is mapped to a point on the parabola $y = x^2$.

Listening to Part 21-11

Since this parabola is convex, every point is on the convex hull. Further since neighboring points on the convex hull have neighboring x values, the convex hull returns the points sorted by x-coordinate, i.e. the original numbers.


For each $x_i$, create point $(x_i, x_i^2)$.

Call subroutine convex-hull on this point set.

From the leftmost point in the hull,

read off the points from left to right.

Creating and reading off the points takes O(n) time.

What does this mean? Recall the sorting lower bound of $\Omega(n \lg n)$. If we could do convex hull in better than $n \lg n$, we could sort faster than $\Omega(n \lg n)$ - which violates our lower bound.

Thus convex hull must take $\Omega(n \lg n)$ as well!!!

Observe that any $O(n \lg n)$ convex hull algorithm also gives us a complicated but correct $O(n \lg n)$ sorting algorithm as well.
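To see the reduction run, here is a sketch in Python that sorts distinct numbers by calling a convex hull subroutine. The hull routine is gift wrapping, chosen only because it does not itself sort; it takes quadratic time, so this illustrates the translation, not the lower bound. All names are illustrative, and the input numbers are assumed distinct.

```python
def cross(o, a, b):
    """Positive if o->a->b turns counterclockwise, negative if clockwise."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Gift wrapping. Returns hull vertices counterclockwise from the leftmost point.
    Assumes distinct points with no three collinear."""
    if len(points) < 2:
        return list(points)
    start = min(points)                      # leftmost point
    hull = [start]
    while True:
        cur, cand = hull[-1], None
        for p in points:
            # keep the candidate that has no other point on its clockwise side
            if p != cur and (cand is None or cross(cur, cand, p) < 0):
                cand = p
        if cand == start:
            return hull
        hull.append(cand)

def sort_by_hull(numbers):
    points = [(x, x * x) for x in numbers]   # map each x onto the parabola
    return [x for x, _ in convex_hull(points)]
```

Reading the hull counterclockwise from the leftmost point walks along the bottom of the parabola, which is exactly increasing x.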

Listening to Part 22-2

What is a problem?

A problem is a general question, with parameters for the input and conditions on what is a satisfactory answer or solution.   

An instance is a problem with the input parameters specified.

Example: The Traveling Salesman

Problem: Given a weighted graph G, what tour $\{v_1, v_2, \ldots, v_n\}$ minimizes $\sum_{i=1}^{n-1} d[v_i, v_{i+1}] + d[v_n, v_1]$?

Instance: tex2html_wrap_inline15827 , tex2html_wrap_inline15829 , tex2html_wrap_inline15831 , tex2html_wrap_inline15833 , tex2html_wrap_inline15835 , tex2html_wrap_inline15837

Solution: tex2html_wrap_inline15839 cost= 27

A problem with answers restricted to yes and no is called a decision problem. Most interesting optimization problems can be phrased as decision problems which capture the essence of the computation.  

Listening to Part 22-3

Example: The Traveling Salesman Decision Problem.  

Given a weighted graph G and integer k, does there exist a traveling salesman tour with cost $\leq$ k?

Using binary search and the decision version of the problem we can find the optimal TSP solution.
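A small Python sketch of that binary search follows. The decision procedure here is a brute-force stand-in (it tries every tour), since only its yes/no answers matter to the search; nonnegative integer distances are assumed, and the function names are hypothetical. The search recovers the optimal cost.

```python
from itertools import permutations

def tour_cost(d, order):
    """Cost of visiting the vertices in `order` and returning to the start."""
    return sum(d[order[i]][order[(i + 1) % len(order)]] for i in range(len(order)))

def tsp_decision(d, k):
    """Is there a tour of cost <= k?  Brute-force stand-in for the decision oracle."""
    n = len(d)
    return any(tour_cost(d, (0,) + p) <= k for p in permutations(range(1, n)))

def tsp_optimum_cost(d):
    """Binary search on k using only yes/no answers from tsp_decision."""
    lo, hi = 0, sum(max(row) for row in d)   # every tour costs at most hi
    while lo < hi:
        mid = (lo + hi) // 2
        if tsp_decision(d, mid):
            hi = mid                         # a tour this cheap exists
        else:
            lo = mid + 1                     # every tour costs more than mid
    return lo
```

The number of oracle calls is logarithmic in the largest possible tour cost, hence polynomial in the size of the input.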

For convenience, from now on we will talk only about decision problems.

Note that there are many possible ways to encode the input graph: adjacency matrices, edge lists, etc. All reasonable encodings will be within polynomial size of each other.

The fact that we can ignore minor differences in encoding is important. We are concerned with the difference between algorithms which are polynomial and exponential in the size of the input.

Listening to Part 22-4


Consider the following logic problem:   

Instance: A set V of variables and a set of clauses C over V.

Question: Does there exist a satisfying truth assignment for C?

Example 1: $V = v_1, v_2$ and $C = \{ \{v_1, \overline{v_2}\}, \{\overline{v_1}, v_2\} \}$

A clause is satisfied when at least one literal in it is TRUE. C is satisfied when $v_1 = v_2 =$ TRUE.

Example 2: $V = v_1, v_2$, $C = \{ \{v_1, v_2\}, \{v_1, \overline{v_2}\}, \{\overline{v_1}\} \}$


Although you try, and you try, and you try and you try, you can get no satisfaction.  

There is no satisfying assignment since $v_1$ must be FALSE (third clause), so $v_2$ must be FALSE (second clause), but then the first clause is unsatisfiable!
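Small instances like these can be checked by trying every truth assignment. In the sketch below a literal is a nonzero integer, +i for variable i and -i for its complement; that encoding and the function name are assumptions of the sketch. The running time is exponential in the number of variables, which is exactly the behavior no one knows how to avoid.

```python
from itertools import product

def satisfying_assignment(num_vars, clauses):
    """Return a tuple of truth values satisfying every clause, or None."""
    for values in product([True, False], repeat=num_vars):
        # a clause is satisfied when at least one of its literals is TRUE
        if all(any(values[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            return values
    return None
```

With a first clause set of (v1 or not v2), (not v1 or v2) the search stops at the all-TRUE assignment, while the three clauses (v1 or v2), (v1 or not v2), (not v1) exhaust all four assignments.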

For various reasons, it is known that satisfiability is a hard problem. Every top-notch algorithm expert in the world (and countless other, lesser lights) has tried to come up with a fast algorithm to test whether a given set of clauses is satisfiable, but all have failed.

Listening to Part 22-5

Further, many strange and impossible-to-believe things have been shown to be true if someone in fact did find a fast satisfiability algorithm.

Clearly, Satisfiability is in NP, since we can guess an assignment of TRUE, FALSE to the literals and check it in polynomial time.

Listening to Part 22-10

P versus NP

The precise distinction between whether a problem is in P or NP is somewhat technical, requiring formal language theory and Turing machines to state correctly.  

However, intuitively a problem is in P (i.e. polynomial) if it can be solved in time polynomial in the size of the input.

A problem is in NP if, given the answer, it is possible to verify that the answer is correct within time polynomial in the size of the input.  

Example P - Is there a path from s to t in G of length less than k.

Example NP - Is there a TSP tour in G of length less than k. Given the tour, it is easy to add up the costs and convince me it is correct.

Example not NP - How many TSP tours are there in G of length less than k. Since there can be an exponential number of them, we cannot count them all in polynomial time.

Don't let this issue confuse you - the important idea here is of reductions as a way of proving hardness.

Listening to Part 22-7


Instance: A collection of clauses C where each clause contains exactly 3 literals, over a set of boolean variables v.

Question: Is there a truth assignment to v so that each clause is satisfied?

Note that this is a more restricted problem than SAT. If 3-SAT is NP-complete, it implies SAT is NP-complete but not vice versa; perhaps long clauses are what makes SAT difficult?!

After all, 1-Sat is trivial!

Theorem: 3-SAT is NP-Complete

Proof: 3-SAT is in NP - given an assignment, just check that each clause is covered. To prove it is complete, a reduction SAT $\propto$ 3-SAT must be provided. We will transform each clause independently based on its length.

Suppose the clause $C_i$ contains k literals.

If none of the original variables in a clause are TRUE, there is no way to satisfy all of them using the additional variable:


But if any literal is TRUE, we have n-3 free variables and n-3 remaining 3-clauses, so we can satisfy each of them. $\Box$

Since any SAT solution will also satisfy the 3-SAT instance and any 3-SAT solution sets variables giving a SAT solution - the problems are equivalent. If there were n clauses and m total literals in the SAT instance, this transform takes O(m) time, so SAT $\propto$ 3-SAT.

Note that a slight modification to this construction would prove 4-SAT, or 5-SAT,... also NP-complete. However, it breaks down when we try to use it for 2-SAT, since there is no way to stuff anything into the chain of clauses. It turns out that resolution gives a polynomial time algorithm for 2-SAT.
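A sketch of a clause-by-clause transformation in Python is given below. It uses the usual textbook padding for short clauses and a chain of new variables for long ones; whether the details match the construction used in lecture is an assumption. Literals are nonzero integers (+i for variable i, -i for its complement), clauses are assumed nonempty, and the brute-force `satisfiable` helper is only there to compare small instances.

```python
from itertools import product

def sat_to_3sat(clauses, num_vars):
    """Return (3-clauses, new variable count) for a list of nonempty clauses."""
    out, last = [], num_vars
    def fresh():
        nonlocal last
        last += 1                     # allocate a new variable
        return last
    for c in clauses:
        if len(c) == 1:               # pad with two new variables, all sign patterns
            a, b = fresh(), fresh()
            out += [[c[0], a, b], [c[0], a, -b], [c[0], -a, b], [c[0], -a, -b]]
        elif len(c) == 2:             # pad with one new variable
            a = fresh()
            out += [[c[0], c[1], a], [c[0], c[1], -a]]
        elif len(c) == 3:
            out.append(list(c))
        else:                         # chain: (z1 z2 a1)(-a1 z3 a2)...(-a z_{k-1} z_k)
            a = fresh()
            out.append([c[0], c[1], a])
            for z in c[2:-2]:
                b = fresh()
                out.append([-a, z, b])
                a = b
            out.append([-a, c[-2], c[-1]])
    return out, last

def satisfiable(clauses, num_vars):
    """Exponential brute force, for checking small instances only."""
    return any(all(any(v[abs(l) - 1] == (l > 0) for l in c) for c in clauses)
               for v in product([False, True], repeat=num_vars))
```

A clause of k > 3 literals uses k-3 new variables and k-2 clauses, so the output is linear in the total number of literals.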

Listening to Part 22-9

Having at least 3-literals per clause is what makes the problem difficult. Now that we have shown 3-SAT is NP-complete, we may use it for further reductions. Since the set of 3-SAT instances is smaller and more regular than the SAT instances, it will be easier to use 3-SAT for future reductions. Remember the direction of reduction!



Lecture 20 - integer programming

Listening to Part 22-6

36.4-5 Give a polynomial-time algorithm to satisfy Boolean formulas in disjunctive normal form.  

Satisfying one clause in DNF satisfies the whole formula. One clause can always be satisfied iff it does not contain both a variable and its complement.

Why not use this reduction to give a polynomial-time algorithm for 3-SAT? The DNF formula can become exponentially large and hence the reduction cannot be done in polynomial time.
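The DNF test itself is a single pass over the formula, as this Python sketch shows. A formula is assumed to be a list of clauses (conjunctions), each a list of nonzero integer literals with +i for variable i and -i for its complement; the function name is illustrative.

```python
def dnf_satisfiable(terms):
    """True iff some conjunction contains no variable together with its complement."""
    for term in terms:
        literals = set(term)
        if all(-lit not in literals for lit in literals):
            return True          # set exactly these literals TRUE
    return False
```

The check is linear in the size of the DNF formula; the catch is only that the DNF formula equivalent to a 3-SAT instance may be exponentially long.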

Listening to Part 24-2

A Perpetual Point of Confusion

Note carefully the direction of the reduction.  

We must transform every instance of a known NP-complete problem to an instance of the problem we are interested in. If we do the reduction the other way, all we get is a slow way to solve x, by using a subroutine which probably will take exponential time.

This always is confusing at first - it seems bass-ackwards. Make sure you understand the direction of reduction now - and think back to this when you get confused.

Listening to Part 24-3

Integer Programming

Instance: A set v of integer variables, a set of inequalities over these variables, a function f(v) to maximize, and integer B.  

Question: Does there exist an assignment of integers to v such that all inequalities are true and $f(v) \geq B$?





A solution to this is tex2html_wrap_inline15939 , tex2html_wrap_inline15941 .





Since the maximum value of f(v) given the constraints is tex2html_wrap_inline15945 , there is no solution.

Theorem: Integer Programming is NP-Hard

Proof: By reduction from Satisfiability

Any SAT instance has boolean variables and clauses. Our Integer programming problem will have twice as many variables as the SAT instance, one for each variable and its complement, as well as the following inequalities:

Listening to Part 24-4

For each variable $v_i$ in the SAT problem, we will add the following constraints:

Our maximization function and bound are relatively unimportant: tex2html_wrap_inline15961 B=0.

Clearly this reduction can be done in polynomial time.

Listening to Part 24-5

We must show:

  1. Any SAT solution gives a solution to the IP problem.

    In any SAT solution, a TRUE literal corresponds to a 1 in the IP, since if the expression is SATISFIED, at least one literal per clause is TRUE, so the sum in the inequality is $\geq$ 1.

  2. Any IP solution gives a SAT solution.

    Given a solution to this IP instance, all variables will be 0 or 1. Set the literals corresponding to 1 variables to TRUE and those corresponding to 0 variables to FALSE. No boolean variable and its complement will both be true, so it is a legal assignment which also must satisfy the clauses.
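The reduction can be sketched in Python as follows. The constraint set used here is the usual one and is an assumption of the sketch: each of the 2n integer variables lies between 0 and 1, each variable and its complement sum to exactly 1, and the literals of each clause sum to at least 1. The objective function and bound are left out since they play no role, and the feasibility check is brute force over 0/1 values only, which the range constraints justify.

```python
from itertools import product

def sat_to_ip(num_vars, clauses):
    """Constraints as (coeffs, low, high), meaning low <= coeffs . x <= high.
    x[i-1] stands for variable i, x[num_vars+i-1] for its complement."""
    def index(lit):
        return lit - 1 if lit > 0 else num_vars - lit - 1
    size = 2 * num_vars
    constraints = []
    for j in range(size):                       # 0 <= x_j <= 1
        row = [0] * size
        row[j] = 1
        constraints.append((row, 0, 1))
    for i in range(1, num_vars + 1):            # v_i + complement of v_i = 1
        row = [0] * size
        row[index(i)] = row[index(-i)] = 1
        constraints.append((row, 1, 1))
    for clause in clauses:                      # at least one TRUE literal
        row = [0] * size
        for lit in clause:
            row[index(lit)] += 1
        constraints.append((row, 1, len(clause)))
    return constraints

def ip_feasible(constraints, size):
    """Exponential brute force over 0/1 vectors, for small checks only."""
    return any(all(low <= sum(c * v for c, v in zip(row, x)) <= high
                   for row, low, high in constraints)
               for x in product((0, 1), repeat=size))
```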

Neat, sweet, and NP-complete!

Listening to Part 24-6

Things to Notice

  1. The reduction preserved the structure of the problem. Note that the reduction did not solve the problem - it just put it in a different format.
  2. The possible IP instances which result are a small subset of the possible IP instances, but since some of them are hard, the problem in general must be hard.
  3. The transformation captures the essence of why IP is hard - it has nothing to do with big coefficients or big ranges on variables; for restricting to 0/1 is enough. A careful study of what properties we do need for our reduction tells us a lot about the problem.
  4. It is not obvious that IP $\in$ NP, since the numbers assigned to the variables may be too large to write in polynomial time - don't be too hasty!


Lecture 21 - vertex cover

Listening to Part 24-7

36.5-2 - Given an integer $m \times n$ matrix A, and an integer m-vector b, the 0-1 integer programming problem asks whether there is an integer n-vector x with elements in the set {0,1} such that $Ax \leq b$. Prove that 0-1 integer programming is NP-hard (hint: reduce from 3-SAT).

This is really the exact same problem as the previous integer programming problem, slightly concealed by:

Listening to Part 24-8

Vertex Cover

Instance: A graph G=(V, E), and integer $k \leq |V|$

Question: Is there a subset of at most k vertices such that every $e \in E$ has at least one vertex in the subset?

Here, four of the eight vertices are enough to cover. It is trivial to find a vertex cover of a graph - just take all the vertices. The tricky part is to cover with as small a set as possible.

Theorem: Vertex cover is NP-complete.  

Proof: VC is in NP - guess a subset of vertices, count them, and show that each edge is covered.

To prove completeness, we show 3-SAT $\propto$ VC. From a 3-SAT instance with N variables and C clauses, we construct a graph with 2N+3C vertices.

Listening to Part 24-9

For each variable, we create two vertices connected by an edge:

To cover each of these edges, at least N vertices must be in the cover, one for each pair. For each clause, we create three new vertices, one for each literal in each clause. Connect these in a triangle.

At least two vertices per triangle must be in the cover to take care of edges in the triangle, for a total of at least 2C vertices.

Finally, we will connect each literal in the flat structure to the corresponding vertices in the triangles which share the same literal.

Listening to Part 24-10

Claim: This graph will have a vertex cover of size N+2C if and only if the expression is satisfiable.

By the earlier analysis, any cover must have at least N+2C vertices. To show that our reduction is correct, we must show that:

  1. Every satisfying truth assignment gives a cover.

    Select the N vertices corresponding to the TRUE literals to be in the cover. Since it is a satisfying truth assignment, at least one of the three cross edges associated with each clause must already be covered - pick the other two vertices to complete the cover.

  2. Every vertex cover gives a satisfying truth assignment.

    Every vertex cover must contain N first stage vertices and 2C second stage vertices. Let the first stage vertices define the truth assignment.

    To give the cover, at least one cross-edge must be covered, so the truth assignment satisfies.

For a cover to have N+2C vertices, all the cross edges must be incident on a selected vertex.

Let the N selected vertices from the first stage correspond to TRUE literals. If there is a satisfying truth assignment, that means at least one of the three cross edges from each triangle is incident on a TRUE vertex.

By adding the other two vertices to the cover, we cover all edges associated with the clause.

Every satisfying assignment defines a cover, and every cover defines truth values for the SAT instance!
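The graph construction is short enough to write out. In this Python sketch, literals are nonzero integers (+i for variable i, -i for its complement) and vertex names are tuples; both choices, and the function names, are illustrative. The brute-force cover test is exponential and only meant for checking tiny instances by hand.

```python
from itertools import combinations

def three_sat_to_vertex_cover(num_vars, clauses):
    """Return (vertices, edges, k) with k = N + 2C for a 3-SAT instance."""
    edges = []
    for i in range(1, num_vars + 1):
        edges.append((("lit", i), ("lit", -i)))           # one edge per variable
    for c, clause in enumerate(clauses):
        tri = [("clause", c, p) for p in range(3)]
        edges += [(tri[0], tri[1]), (tri[1], tri[2]), (tri[0], tri[2])]
        for p, lit in enumerate(clause):
            edges.append((("lit", lit), tri[p]))          # cross edge to the literal
    vertices = list(dict.fromkeys(v for e in edges for v in e))
    return vertices, edges, num_vars + 2 * len(clauses)

def has_vertex_cover(vertices, edges, k):
    """Brute force: does some set of k vertices touch every edge?"""
    return any(all(u in chosen or v in chosen for u, v in edges)
               for chosen in map(set, combinations(vertices, k)))
```

For example, the clauses (v1, v2, not v3) and (not v1, not v2, v3) give a graph on 12 vertices with a cover of size 3 + 2*2 = 7.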

Example: tex2html_wrap_inline15996 , tex2html_wrap_inline15998 .

Listening to Part 25-1

Starting from the Right Problem

As you can see, the reductions can be very clever and very complicated. While theoretically any NP-complete problem can be reduced to any other one, choosing the correct one makes finding a reduction much easier.



Maximum Clique

Instance: A graph G=(V,E) and integer $j \leq |V|$.

Question: Does the graph contain a clique of j vertices, i.e. is there a subset of V of size j such that every pair of vertices in the subset defines an edge of G?

Example: this graph contains a clique of size 5.

Listening to Part 25-2

When talking about graph problems, it is most natural to work from a graph problem - the only NP-complete one we have is vertex cover!

Theorem: Clique is NP-complete

Proof: If you take a graph and find its vertex cover, the remaining vertices form an independent set, meaning there are no edges between any two vertices in the independent set, for if there were such an edge the rest of the vertices could not be a vertex cover.  

Clearly the smallest vertex cover gives the biggest independent set, and so the problems are equivalent - delete the subset of vertices in one from the total set of vertices to get the other!

Thus finding the maximum independent set must be NP-complete!

Listening to Part 25-3

In an independent set, there are no edges between two vertices. In a clique, there is always an edge between two vertices. Thus if we complement a graph (have an edge iff there was no edge in the original graph), a clique becomes an independent set and an independent set becomes a Clique!

Thus finding the largest clique is NP-complete:

If VC is a vertex cover in G, then V-VC is a clique in G'. If C is a clique in G, V-C is a vertex cover in G'.
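The relationship between covers and cliques can be checked on a small graph. The sketch below is only an illustration; the helper names (complement, is_vertex_cover, is_clique) and the representation of vertices as integers 0..n-1 are assumptions made for this example, not part of the lecture.

```python
from itertools import combinations

def complement(n, edges):
    """Return the edges of G', which has an edge exactly where G has none."""
    present = {frozenset(e) for e in edges}
    return [(u, v) for u, v in combinations(range(n), 2)
            if frozenset((u, v)) not in present]

def is_vertex_cover(edges, cover):
    """Every edge must touch at least one vertex of the cover."""
    return all(u in cover or v in cover for u, v in edges)

def is_clique(edges, subset):
    """Every pair of vertices in the subset must be joined by an edge."""
    present = {frozenset(e) for e in edges}
    return all(frozenset(pair) in present
               for pair in combinations(sorted(subset), 2))

# Path 0-1-2-3: {1, 2} covers every edge, so {0, 3} is a clique in G'.
n = 4
edges = [(0, 1), (1, 2), (2, 3)]
cover = {1, 2}
rest = set(range(n)) - cover
print(is_vertex_cover(edges, cover), is_clique(complement(n, edges), rest))
```

Since taking the complement and the set difference are both polynomial-time operations, this is the kind of cheap translation a reduction needs.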

Listening to Part 25-4

36.5-1 Prove that subgraph isomorphism is NP-complete.  

  1. Guessing a subgraph of G and proving it is isomorphic to H takes tex2html_wrap_inline16008 time, so it is in NP.
  2. Clique reduces to subgraph isomorphism. We must transform all instances of clique into some instances of subgraph isomorphism. Clique is a special case of subgraph isomorphism!

    Thus the following reduction suffices. Let G=G' and tex2html_wrap_inline16012 , the complete subgraph on k nodes.

Listening to Part 25-5

Integer Partition (Subset Sum)

Instance: A set of integers S and a target integer t.   

Problem: Is there a subset of S which adds up exactly to t?

Example: tex2html_wrap_inline16014 and T=3754

Answer: 1+16+64+256+1040+1093+1284 = T

Observe that integer partition is a number problem, as opposed to the graph and logic problems we have seen to date.

Theorem: Integer Partition is NP-complete.

Proof: First, we note that integer partition is in NP. Guess a subset of the input numbers and simply add them up.

To prove completeness, we show that vertex cover tex2html_wrap_inline16020 integer partition. We use a data structure called an incidence matrix to represent the graph G.

Listening to Part 25-6

How many 1's are there in each column? Exactly two.

How many 1's in a particular row? Depends on the vertex degree.

The reduction from vertex cover will create n+m numbers from G.

The numbers from the vertices will be a base-4 realization of rows from the incidence matrix, plus a high order digit:


ie. tex2html_wrap_inline16028 becomes tex2html_wrap_inline16030 .

The numbers from the edges will be tex2html_wrap_inline16032 .

The target integer will be


Why? Each column (digit) represents an edge. We want a subset of vertices which covers each edge. We can only use k vertex/numbers, because of the high order digit of the target.

tex2html_wrap_inline16034 tex2html_wrap_inline16036 tex2html_wrap_inline16038
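The exact formulas for this construction did not survive in the text above, so the following sketch assumes the usual form of the encoding: each vertex number has a 1 in the high-order digit plus a 1 in the digit of every incident edge, each edge number is a single 1 in its own digit, and the target has k in the high-order digit and 2 in every edge digit, all in base 4. The function names are hypothetical, and the brute-force checker is exponential and only meant for tiny graphs.

```python
from itertools import combinations

def vc_to_subset_sum(n, edges, k):
    """Build the n + m numbers and the target from a vertex cover instance."""
    m = len(edges)
    vertex_numbers = []
    for v in range(n):
        value = 4 ** m                      # high-order digit: counts chosen vertices
        for j, (a, b) in enumerate(edges):
            if v in (a, b):
                value += 4 ** j             # a 1 in the column of each incident edge
        vertex_numbers.append(value)
    edge_numbers = [4 ** j for j in range(m)]   # one spare 1 per edge column
    target = k * 4 ** m + sum(2 * 4 ** j for j in range(m))
    return vertex_numbers, edge_numbers, target

def has_subset_sum(numbers, target):
    """Exponential brute force over all subsets, for demonstration only."""
    return any(sum(chosen) == target
               for size in range(len(numbers) + 1)
               for chosen in combinations(numbers, size))

# A triangle needs two vertices to cover its three edges.
triangle = [(0, 1), (1, 2), (0, 2)]
for k in (1, 2):
    vs, es, t = vc_to_subset_sum(3, triangle, k)
    print(k, has_subset_sum(vs + es, t))
```

Note that the numbers have m + 1 digits each, so the construction writes only O((n + m) m) digits in total, which keeps the reduction polynomial.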

Listening to Part 25-7

We might get only one instance of each edge in a cover - but we are free to take extra edge/numbers to grab an extra 1 per column.

VC in G tex2html_wrap_inline16040 Integer Partition in S

Given k vertices covering G, pick the k corresponding vertex/numbers. Each edge in G is incident on one or two cover vertices. If it is one, include the corresponding edge/number to give two per column.

Integer Partition in S tex2html_wrap_inline16042 VC in G

Any solution to S must contain exactly k vertex/numbers. Why? It cannot be more because the target in that digit is k, and it cannot be less because, with at most three 1's per edge/digit-column, no sum of these can carry over into the next column. (This is why base-4 numbers were chosen.)

This subset of k vertex/numbers must contain at least one edge-list per column, since if not there is no way to account for the two in each column of the target integer, given that we can pick up at most one edge-list using the edge number. (Again, the prevention of carries across digits rules out any other possibilities.)

Neat, sweet, and NP-complete!

Notice that this reduction could not be performed in polynomial time if the numbers were written in unary (5 = 11111). Big numbers are what make integer partition hard!


Lecture 22 - techniques for proving hardness

Hamiltonian Cycle

Instance: A graph G  

Question: Does the graph contain a HC, i.e. an ordering of the vertices tex2html_wrap_inline16072 ?

This problem is intimately related to the Traveling Salesman problem.

Question: Is there an ordering of the vertices of a weighted graph such that tex2html_wrap_inline16074 ?

Clearly, tex2html_wrap_inline16076 . Assign each edge in G weight 1, any edge not in G weight 2. This new graph has a Traveling Salesman tour of cost n iff the graph is Hamiltonian. Thus TSP is NP-complete if we can show HC is NP-complete.
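A small sketch of this weight assignment follows. The function names and the integer vertex labels are assumptions for illustration, the graph is assumed to have at least three vertices, and the tour search is plain brute force, so it only serves to check the equivalence on toy inputs.

```python
from itertools import permutations

def hc_to_tsp(n, edges):
    """Weight 1 for edges of G, weight 2 for non-edges; the budget is n."""
    present = {frozenset(e) for e in edges}
    weight = [[0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            if u != v:
                weight[u][v] = 1 if frozenset((u, v)) in present else 2
    return weight, n

def has_tour_within(weight, budget):
    """Brute force: try every cyclic order that starts at vertex 0."""
    n = len(weight)
    for perm in permutations(range(1, n)):
        tour = (0,) + perm
        cost = sum(weight[tour[i]][tour[(i + 1) % n]] for i in range(n))
        if cost <= budget:
            return True
    return False

square = [(0, 1), (1, 2), (2, 3), (3, 0)]   # a 4-cycle, Hamiltonian
star = [(0, 1), (0, 2), (0, 3)]             # a tree, not Hamiltonian
print(has_tour_within(*hc_to_tsp(4, square)))
print(has_tour_within(*hc_to_tsp(4, star)))
```

A tour of n vertices uses exactly n edges, so it can cost n only if every one of them has weight 1, that is, only if every edge of the tour lies in G.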

Theorem: Hamiltonian Circuit is NP-complete

Proof: Clearly HC is in NP: guess a permutation and check it. To show it is complete, we use vertex cover. A vertex cover instance consists of a graph and a constant k, the maximum size of an acceptable cover. We must construct another graph. Each edge in the initial graph will be represented by the following component:

All further connections to this gadget will be through vertices tex2html_wrap_inline16078 , tex2html_wrap_inline16080 , tex2html_wrap_inline16082 and tex2html_wrap_inline16084 . The key observation about this gadget is that there are only three ways to traverse all the vertices:  

Note that in each case, we exit out the same side we entered. Each side of each edge gadget is associated with a vertex. Assuming some arbitrary order to the edges incident on a particular vertex, we can link successive gadgets by edges forming a chain of gadgets. Doing this for all vertices in the original graph creates n intertwined chains with n entry points and n exits.

Thus we have encoded the information about the initial graph. What about k? We set up k additional vertices and connect each of these to the n start points and n end points of each chain.

Total size of new graph: GE+K vertices and 12E+2kN+2E edges tex2html_wrap_inline16086 construction is polynomial in size and time.

We claim this graph has a HC iff G has a VC of size k.

  1. Suppose tex2html_wrap_inline16088 is a HC.

    Assume it starts at one of the k selector vertices. It must then go through one of the chains of gadgets until it reaches a different selector vertex.

    Since the tour is a HC, all gadgets are traversed. The k chains correspond to the vertices in the cover.

    Note that if both vertices associated with an edge are in the cover, the gadget will be traversed in two pieces - otherwise one chain suffices.

    To avoid visiting a vertex more than once, each chain is associated with a selector vertex.

  2. Now suppose we have a vertex cover of size tex2html_wrap_inline16090 .

    We can always add more vertices to the cover to bring it up to size k.

    For each vertex in the cover, start traversing the chain. At each entry point to a gadget, check if the other vertex is in the cover and traverse the gadget accordingly.

    Select the selector edges to complete the circuit.

Neat, sweet, and NP-complete.

To show that Longest Path or Hamiltonian Path is NP-complete, add start and stop vertices and distinguish the first and last selector vertices.  

This has a Hamiltonian path from start to stop iff the original graph has a vertex cover of size k.

Listening to Part 26-2

Other NP-complete Problems

Open: Graph Isomorphism, Composite Number, Minimum Length Triangulation.

Listening to Part 26-3

Polynomial or Exponential?

Just changing a problem a little can make the difference between it being in P or NP-complete:

P                   NP-complete
Shortest Path       Longest Path
Eulerian Circuit    Hamiltonian Circuit
Edge Cover          Vertex Cover

The first thing you should do when you suspect a problem might be NP-complete is look in Garey and Johnson, Computers and Intractability. It contains a list of several hundred problems known to be NP-complete. Either what you are looking for will be there or you might find a closely related problem to use in a reduction.  

Listening to Part 26-4

Techniques for Proving NP-completeness

  1. Restriction - Show that a special case of the problem you are interested in is NP-complete. For example, the problem of finding a path of length k is really Hamiltonian Path.  
  2. Local Replacement - Make local changes to the structure. An example is the reduction tex2html_wrap_inline16094 . Another example is showing isomorphism is no easier for bipartite graphs:  

    For any graph, replacing an edge with makes it bipartite.
  3. Component Design - These are the ugly, elaborate constructions  

Listening to Part 26-5

The Art of Proving Hardness

Proving that problems are hard is a skill. Once you get the hang of it, it is surprisingly straightforward and pleasurable to do. Indeed, the dirty little secret of NP-completeness proofs is that they are usually easier to recreate than explain, in the same way that it is usually easier to rewrite old code than to try to understand it.

I offer the following advice to those needing to prove the hardness of a given problem:

Listening to Part 26-8

Now watch me try it!

To demonstrate how one goes about proving a problem hard, I accept the challenge of showing how a proof can be built on the fly.

I need a volunteer to pick a random problem from the 400+ hard problems in the back of Garey and Johnson.

Listening to Part 27-2

Dealing with NP-complete Problems


Option 1: Algorithm fast in the Average case

Examples are Branch-and-bound for the Traveling Salesman Problem, backtracking algorithms, etc.

Option 2: Heuristics

Heuristics are rules of thumb; fast methods to find a solution with no requirement that it be the best one.

Note that the theory of NP-completeness does not stipulate that it is hard to get close to the answer, only that it is hard to get the optimal answer.

Often, we can prove performance bounds on heuristics, that the resulting answer is within C times that of the optimal one.


Lecture 23 - approximation algorithms and Cook's theorem

Listening to Part 26-1

36.5-5 Prove that Hamiltonian Path is NP-complete.

This is not a special case of Hamiltonian cycle! (G may have a HP but not cycle)  

The easiest argument says that G contains a HP but no HC iff there is a pair of vertices (x, y) such that adding edge (x, y) to G causes it to have a HC, so tex2html_wrap_inline16132 calls to a HC function solve HP.

The cleanest proof modifies the VC and HC reduction from the book:


Listening to Part 27-3

Approximating Vertex Cover

As we have seen, finding the minimum vertex cover is NP-complete. However, a very simple strategy (heuristic) can get us a cover at most twice that of the optimal.   

While the graph has edges:
    pick an arbitrary edge (u, v)
    add both u and v to the cover
    delete all edges incident on either u or v

If the graph is represented by an adjacency list this can be implemented in O(m+n) time.

This heuristic must always produce a cover, since an edge is only deleted when it is incident on a cover vertex.

Further, any cover uses at least half as many vertices as the greedy cover.

Why? Delete all edges from the graph except the edges we selected.

No two of these edges share a vertex. Therefore, any cover of just these edges must include one vertex per edge, or half the greedy cover!
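The heuristic can be sketched in a few lines. This version assumes the graph is given as a list of edges (a representation chosen for the example) and, rather than physically deleting edges, simply skips any edge that already has an endpoint in the cover, which has the same effect.

```python
def approx_vertex_cover(edges):
    """Greedy 2-approximation: take both endpoints of each uncovered edge."""
    cover = set()
    for u, v in edges:
        # An edge with an endpoint in the cover counts as already deleted.
        if u not in cover and v not in cover:
            cover.add(u)
            cover.add(v)
    return cover

# On the path 0-1-2-3 the optimal cover is {1, 2}; the heuristic may take all four.
path = [(0, 1), (1, 2), (2, 3)]
print(approx_vertex_cover(path))
```

The path example shows that the factor of two can actually be reached: the selected edges (0, 1) and (2, 3) share no vertex, so the heuristic uses four vertices where two suffice.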

Listening to Part 27-4

Things to Notice

Listening to Part 27-5

The Euclidean Traveling Salesman

In the traditional version of TSP, a salesman wants to plan a drive to visit all his customers exactly once and get back home.

Euclidean geometry satisfies the triangle inequality, tex2html_wrap_inline16138 .

TSP remains hard even when the distances are Euclidean distances in the plane.

Note that the cost of airfares is an example of a distance function which violates the triangle inequality.

However, we can approximate the optimal Euclidean TSP tour using minimum spanning trees.

Claim: the cost of a MST is a lower bound on the cost of a TSP tour.

Why? Deleting any edge from a TSP tour leaves a path, which is a tree of weight at least that of the MST!

Listening to Part 27-6

If we were allowed to visit cities more than once, we could do a depth-first traversal of a MST and walk out the tour it specifies, which costs at most twice the MST. Why? We will be using each edge exactly twice.

Every edge is used exactly twice in the DFS tour: 1.

However, how can we avoid revisiting cities?

We can take a shortest path to the next unvisited vertex. The improved tour is 1-2-3-5-8-9-6-4-7-10-11-1. Because we replaced a chain of edges by a single edge, the triangle inequality ensures the tour only gets shorter. Thus this is still within twice optimal!
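The whole heuristic, an MST followed by a depth-first walk with shortcuts, can be sketched as follows. This is an illustrative version assuming the cities are points in the plane; the function names are made up for the example, and the MST is built with a simple O(n^2) form of Prim's algorithm.

```python
import math

def dist(p, q):
    """Euclidean distance between two points in the plane."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def mst_children(points):
    """Prim's algorithm from vertex 0; returns the tree as lists of children."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n
    parent = [-1] * n
    best[0] = 0.0
    children = [[] for _ in range(n)]
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            children[parent[u]].append(u)
        for v in range(n):
            d = dist(points[u], points[v])
            if not in_tree[v] and d < best[v]:
                best[v] = d
                parent[v] = u
    return children

def approx_tsp_tour(points):
    """Visit vertices in DFS preorder of the MST, i.e. the walk with shortcuts."""
    children = mst_children(points)
    tour, stack = [], [0]
    while stack:
        u = stack.pop()
        tour.append(u)
        stack.extend(reversed(children[u]))
    return tour

def tour_cost(points, tour):
    """Length of the closed tour, including the edge back to the start."""
    return sum(dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

cities = [(0, 0), (0, 1), (1, 1), (1, 0)]
tour = approx_tsp_tour(cities)
print(tour, tour_cost(cities, tour))
```

Listing the vertices in preorder is exactly the shortcut step: each vertex is recorded the first time the depth-first walk reaches it, and repeat visits are skipped.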

Listening to Part 27-1

37.1-3 Give an efficient greedy algorithm that finds an optimal vertex cover of a tree in linear time.  

In a vertex cover we need to have at least one vertex for each edge.

Every tree has at least two leaves, meaning that there is always an edge which is adjacent to a leaf. Which vertex can we never go wrong picking? The non-leaf, since it is the only one which can also cover other edges!

After trimming off the covered edges, we have a smaller tree. We can repeat the process until the tree has 0 or 1 edges. When the tree consists only of an isolated edge, pick either vertex.

All leaves can be identified and trimmed in O(n) time during a DFS.
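One way to realize the trimming is sketched below. Note that it keeps a queue of current leaves instead of running a DFS, which is a substitution made for brevity; the function name and the edge-list input are likewise assumptions. Each edge is removed once, so the running time stays linear.

```python
from collections import deque

def tree_vertex_cover(n, edges):
    """Repeatedly pick the neighbor of a leaf and trim the edges it covers."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    cover = set()
    leaves = deque(v for v in range(n) if len(adj[v]) == 1)
    while leaves:
        leaf = leaves.popleft()
        if len(adj[leaf]) != 1:
            continue                      # its edge was already trimmed
        parent = next(iter(adj[leaf]))    # the non-leaf end of the edge
        cover.add(parent)
        for w in list(adj[parent]):
            adj[w].discard(parent)        # delete every edge covered by parent
            if len(adj[w]) == 1:
                leaves.append(w)          # w has become a new leaf
        adj[parent].clear()
    return cover

print(tree_vertex_cover(4, [(0, 1), (1, 2), (2, 3)]))
```

On an isolated edge both endpoints are leaves, and the code picks the neighbor of whichever leaf it dequeues first, matching the "pick either vertex" rule.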

Formal Languages and the Theory of NP-completeness

The theory of NP-completeness is based on formal languages and Turing machines, and so we must work on a more abstract level than usual.

For a given alphabet of symbols tex2html_wrap_inline16142 0, 1, &, we can form an infinite set of strings or words by arranging them in any order: `&10', `111111',`&&&', and `&'.

A subset of the set of strings over some alphabet is a formal language.  

Formal language theory concerns the study of how powerful a machine you need to recognize whether a string is from a particular language.

Example: Is the string a binary representation of an even number? A simple finite machine can check if the last symbol is zero:

No memory is required, except for the current state.
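Such a machine can be sketched as a two-state recognizer. The state names are invented for this example, and the empty string is treated as not being in the language, which is a choice the notes do not specify.

```python
def is_even_binary(string):
    """Two-state machine: remember only whether the last symbol read was 0."""
    state = "not_even"                    # start state; also covers the empty string
    for symbol in string:
        if symbol not in "01":
            raise ValueError("symbol outside the alphabet {0, 1}")
        state = "even" if symbol == "0" else "not_even"
    return state == "even"

print(is_even_binary("10"), is_even_binary("111"))
```

The loop reads the input once from left to right and stores nothing but the current state, which is what makes this a finite machine.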

Observe that solving decision problems can be thought of as formal language recognition. The problem instances are encoded as strings, and a string is in the language if and only if the answer to the decision problem is YES!

What kind of machine is necessary to recognize this language? A Turing Machine!  

A Turing machine has a finite-state-control (its program), a two way infinite tape (its memory) and a read-write head (its program counter)


So, where are we?

Each instance of an optimization or decision problem can be encoded as a string on some alphabet. The set of all instances which return True for some problem defines a language.

Hence, any program which solves this problem is equivalent to a machine which recognizes whether an instance is in the language!

The goal of all this is going to be a formal way to talk about the set of problems which can be solved in polynomial time, and the set that cannot be.

Non-deterministic Turing Machines

Suppose we buy a guessing module peripheral for our Turing machine, which looks at a Turing machine program and problem instance and in polynomial time writes something it says is an answer. To convince ourselves it really is an answer, we can run another program to check it.

Ex: The Traveling Salesman Problem

The guessing module can easily write a permutation of the vertices in polynomial time. We can check if it is correct by summing up the weights of the edges between successive vertices in the permutation and seeing that the total is less than k.
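The checking program is the easy half, as this sketch suggests. It assumes the instance is a weight matrix and the guess is a list of vertex indices, and it accepts a tour whose cost is at most k; whether the bound is strict is a convention, so treat that detail as an assumption.

```python
def verify_tsp_certificate(weight, order, k):
    """Check a guessed tour in polynomial time: valid permutation, cost within k."""
    n = len(weight)
    if sorted(order) != list(range(n)):
        return False                      # not a permutation of the vertices
    cost = sum(weight[order[i]][order[(i + 1) % n]] for i in range(n))
    return cost <= k

weights = [[0, 1, 1],
           [1, 0, 1],
           [1, 1, 0]]
print(verify_tsp_certificate(weights, [0, 1, 2], 3))
```

The check sorts n numbers and adds n weights, so it runs in polynomial time no matter how the guess was produced.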

The class of languages which we can recognize in time polynomial in the size of the string on a deterministic Turing Machine (without guessing module) is called P.

The class of languages we can recognize in time polynomial in the length of the string on a non-deterministic Turing Machine is called NP.

Clearly, tex2html_wrap_inline16144 , since for any DTM program we can run it on a non-deterministic machine, ignore what the guessing module is doing, and it will run just as fast.

P ?= NP

Observe that any NDTM program which takes time P(n) can be simulated in tex2html_wrap_inline16148 time on a deterministic machine, by running the checking program tex2html_wrap_inline16150 times, once on each possible guessed string.

The $10,000 question is whether a polynomial time simulation exists, or in other words whether P=NP. Do there exist languages which can be verified in polynomial time and still take exponential time on deterministic machines?

This is the most important question in computer science. Since proving an exponential time lower bound for a problem in NP would make us famous, we assume that we cannot do it.

What we can do is prove that it is at least as hard as any problem in NP. A problem in NP for which a polynomial time algorithm would imply all languages in NP are in P is called NP-complete.

Turing Machines and Cook's Theorem

Cook's Theorem proves that satisfiability is NP-complete by reducing all non-deterministic Turing machines to SAT.  

Each Turing machine has access to a two-way infinite tape (read/write) and a finite state control, which serves as the program.

A program for a non-deterministic TM is:
  1. Space on the tape for guessing a solution and certificate to permit verification.
  2. A finite set of tape symbols
  3. A finite set of states tex2html_wrap_inline16154 for the machine, including the start state tex2html_wrap_inline16156 and final states tex2html_wrap_inline16158
  4. A transition function, which takes the current machine state, and current tape symbol and returns the new state, symbol, and head position.

We know a problem is in NP if we have a NDTM program to solve it in worst-case time p(n), where p is a polynomial and n is the size of the input.

Cook's Theorem - Satisfiability is NP-complete!

Proof: We must show that SAT is at least as hard as any problem in NP. Any problem in NP has a non-deterministic TM program which solves it in polynomial time, specifically p(n).

We will take this program and create from it an instance of satisfiability such that it is satisfiable if and only if the input string was in the language.

If a polynomial time transform exists, then SAT must be NP-complete, since a polynomial solution to SAT gives a polynomial time algorithm to anything in NP.

Our transformation will use boolean variables to maintain the state of the TM:

Variable    Range                                                   Intended meaning
Q[i, j]     tex2html_wrap_inline16166 , tex2html_wrap_inline16168   At time i, M is in state tex2html_wrap_inline16170
H[i,j]      tex2html_wrap_inline16174 , tex2html_wrap_inline16176   At time i, the read-write head is scanning tape square j
S[i,j,k]    tex2html_wrap_inline16180 , tex2html_wrap_inline16182   At time i, the contents of tape square j is symbol tex2html_wrap_inline16184

Note that there are tex2html_wrap_inline16188 literals, a polynomial number if p(n) is polynomial.

We will now have to add clauses to ensure that these variables take on the same values as in the TM computation.

The group 6 clauses enforce the transition function of the machine. If the read-write head is not on tape square j at time i, it doesn't change ....

There are tex2html_wrap_inline16192 literals and tex2html_wrap_inline16194 clauses in all, so the transformation is done in polynomial time!

Polynomial Time Reductions

A decision problem is NP-hard if the time complexity on a deterministic machine is within a polynomial factor of the complexity of any problem in NP.  

A problem is NP-complete if it is NP-hard and in NP. Cook's theorem proved SATISFIABILITY was NP-hard by using a polynomial time reduction translating each problem in NP into an instance of SAT:

Since a polynomial time algorithm for SAT would imply a polynomial time algorithm for everything in NP, SAT is NP-hard. Since we can guess a solution to SAT, it is in NP and thus NP-complete.

The proof of Cook's Theorem, while quite clever, was certainly difficult and complicated. We had to show that all problems in NP could be reduced to SAT to make sure we didn't miss a hard one.

But now we have a known NP-complete problem in SAT. For any other problem, we can prove it NP-hard by polynomially transforming SAT to it!

Since the composition of two polynomial time reductions can be done in polynomial time, all we need show for a new problem x is that SAT reduces to x, i.e. any instance of SAT can be translated to an instance of x in polynomial time.

Listening to Part 27-7

Finding the Optimal Spouse


  1. There are up to n possible candidates we will see over our lifetime, one at a time.
  2. We seek to maximize our probability of getting the single best possible spouse.
  3. Our assessment of each candidate is relative to what we have seen before.
  4. We must decide either to marry or reject each candidate as we see them. There is no going back once we reject someone.
  5. Each candidate is ranked from 1 to n, and all permutations are equally likely.

Listening to Part 27-8

For example, if the input permutation is


we see (3,1,2) after three candidates.

Picking the first or last candidate gives us a probability of 1/n of getting the best.

Since we seek to maximize our chances of getting the best, it never pays to pick someone who is not the best we have seen.

The optimal strategy is clearly to sample some fraction of the candidates, then pick the first one who is better than the best we have seen.

But what is the fraction?

Listening to Part 27-9

For a given fraction 1/f, what is the probability of finding the best?

Suppose i+1 is the highest ranked person in the first n/f candidates. We win whenever the best candidate occurs before any number from 2 to i in the last n(1 - 1/f) candidates.

There is a 1/i probability of that, so,


In fact, the optimal is obtained by sampling the first n/e candidates.
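For small n the best sample size can be found exactly by trying every permutation. The sketch below does just that; the function names are invented for the example, rank 1 denotes the best candidate, and a strategy that never finds anyone better than its sample is assumed to be stuck with the last candidate.

```python
from fractions import Fraction
from itertools import permutations

def win_probability(n, skip):
    """Exact chance of ending up with rank 1 when the first `skip` are only sampled."""
    wins = 0
    total = 0
    for order in permutations(range(1, n + 1)):
        total += 1
        best_seen = min(order[:skip]) if skip > 0 else n + 1
        chosen = order[-1]                # forced choice if nobody beats the sample
        for rank in order[skip:]:
            if rank < best_seen:          # first candidate better than the sample
                chosen = rank
                break
        if chosen == 1:
            wins += 1
    return Fraction(wins, total)

def best_skip(n):
    """Sample size that maximizes the probability of getting the best."""
    return max(range(n), key=lambda s: win_probability(n, s))

for n in (4, 6):
    s = best_skip(n)
    print(n, s, win_probability(n, s))
```

Enumerating n! permutations is only feasible for very small n, but it gives exact values to compare against the n/e rule, which describes the behavior as n grows.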


About this document ...

This document was generated using the LaTeX2HTML translator Version 96.1 (Feb 5, 1996) Copyright © 1993, 1994, 1995, 1996, Nikos Drakos, Computer Based Learning Unit, University of Leeds.

The command line arguments were:
latex2html all.tex.

The translation was initiated by Algorithms on Mon Jun 2 09:21:39 EDT 1997
