CSE 526
Spring 2005
Stony Brook
Principles of Programming Languages
Annie Liu
Homework 10
Handout H10
Apr. 12, 2005
Due Apr. 19

Implementing Graph Reachability using Set and Map Operations, in Python.

This assignment has two parts, each worth 50% of the grades. A number of bonus problems are also given, together worth an additional 100% of the grades; the points indicated are based on a decent work, and there is no limit on how many points an exceptionally beautiful solution could get. The assignment is due before class on Tuesday April 19. Please submit your file(s) as a single attachment (made using tar or zip if necessary) in an email to cse526ATcsDOTsunysbDOTedu.

Note: All solutions should be written in one Python file, where non-python code is included as comments (which may refer to figures in seperate postscript or pdf files if needed). Besides the solutions, your file should also include instructions for running and testing your programs and reproducing your experimental results.

Note that work in this assignment is also part of the final project. It is given here so that you don't need to do a big job at the end. If you loose points for this assignment, you may still get them for your project if you fix the problems at the end.

Part I. Implementing graph reachability in Python.

You are given complete code in both high-level SETL2 and low-level SETL2 for computing reachable vertices in graphs. You are asked to implement both using Python. Your Python programs should be as close to the given SETL2 programs as possible.

Part II. Testing and analyzing performance.

To test and analyze your programs, you first need to program the following utilities in Python:

1. Reading data. Write a function that reads input strings for sets and maps in SETL syntax and returns the appropriate Python set and map representations; the SETL syntax is simpler and more general. Your program should be general so as to allow arbitrary nesting of sets and maps. For the base case, allow only integers. You don't need to allow general tuples, just pairs. For sets, use Python set; for maps, you may use Python dictionaries or sets of pairs.

2. Comparing equality. Write a function that takes two arguments for sets or maps and returns true if and only if they are equal mathematically. This will help us compare the output with that from the SETL implementation automatically.

3. Generating tests. Write a function that takes a integer parameter and generates a random input graph of the parameter size. Note that the parameter is the number of edges in the graph, not vertices.

Then, you should do two kinds of experiments:

1. Correctness testing. Test your programs on this input as well as on a number of randomly generated inputs (report what kinds of inputs you generated, include size and number) and compare the results.

2. Running time measurements. Measure the running times of your programs systematically (on a number of inputs of a number of different sizes), plot the results, and conclude what kind of curves you are getting.

To run SETL2 programs, you could install SETL2 yourself, following the link under Systems; this is about the easiest system to install in the world. Or, you could simply use my installation by setting the following in your environment and then simply calling stll once to set up your library, stlc to compile your programs, and stlx to run your programs:

set path=(/home/facfs1/liu/tools/setl2-2.3/bin $path)
setenv SETL2_LIB directory_and_file_name_for_your_library.lib 

Bonus Problems.

B1. Making high-level programs clearer and low-level programs more efficient (25%). Can you write a Python program that is clearer than the Python version that is closest to the high-level SETL2 version? Can you write a Python program that is more efficient than the Python version that is closest to the low-level SETL2 version? (Note that if your programs from Part I are not close enough to the given SETL2 programs, and you make them closer here, you can't get bonus points.)

To show that you did any improvements in terms of clarity and/or efficiency, you need to show either better counts or better measurement results, or both. Make sure that you first show that your new programs are correct by giving formal arguments or doing test, or both.

B2. Detailed running time analysis (25%). Measure and analyze running times in detail to show how much of the times is spent running which pieces of code. This may include separate running times for each set operation, for garbage collection, for input and output, etc. This may also include counts for how many times a loop is executed, for how many times certain operations (e.g., membership test, creation of a new set) are performed, etc.

B3. Detailed space usage analysis (25%). Do detailed measurements and analyses for space usage.

B4. Time complexity analysis (25%). Give a realistic cost model for the set and map operations used and analyze the time complexity of both high and low level versions of both SETL2 and Python programs. You should analyze each given operation in SETL2 and Python implementations separately, then analyze small functions written using them, analyze the code fragments in the loop bodies, and finally analyze entire loops and then the entire programs.