CSE 526: Principles of Programming Languages
Spring 2005, Stony Brook
Annie Liu

Project Description
Handout P, Apr 29, 2005 (due May 13)

Comparing Programs for Graph Reachability
Written in Different Languages and at Different Levels

All major parts of this project have been described in Handouts H10-H12; if you did not do the bonus problems in Handouts H11 and H12, you need to do them now. This description summarizes what you need to do to put the results together, in terms of what needs to be submitted. The percentage of the grade for each part is indicated next to that part. The project is due at noon on Friday, May 13. Please submit your files as a single attachment (made with tar or zip if necessary) in an email to cse526ATcsDOTsunysbDOTedu. Please also submit a hardcopy of your report to Annie by noon on May 13.

1. Source programs for reachability [30%].

This includes source programs written in:

  1. high-level SETL and low-level SETL (2 programs given),
  2. high-level Python and low-level Python (2 programs, one clearest and one fastest, from H10),
  3. XSB with and without tabling and varying orders of rules and hypotheses (8 versions from H11),
  4. FLORA using attributes and classes and varying orders of rules and hypotheses (8 versions from H12),
  5. your hand-coded C (from H11) and APTS-generated C (from H12), and
  6. possibly in Java, C++, and other languages, as described in the bonus part below.
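
As a point of reference, here is a minimal sketch of what the two levels look like in Python (the names and graph representation are illustrative, not the required H10 code): a high-level version that iterates set operations to a fixed point, and a low-level version that maintains an explicit worklist.

  # A hedged sketch of the two levels of graph reachability in Python.
  # Assumptions (illustrative, not the required H10 code): 'edges' is a
  # dict mapping each node to the set of its successors, and 'sources'
  # is the set of starting nodes.

  def reach_high_level(edges, sources):
      # High-level: add all successors of reached nodes until no change,
      # mirroring the high-level SETL program.
      reached = set(sources)
      while True:
          new = {w for v in reached for w in edges.get(v, set())} - reached
          if not new:
              return reached
          reached |= new

  def reach_low_level(edges, sources):
      # Low-level: a worklist algorithm that considers each edge at most
      # once, mirroring the low-level SETL program.
      reached = set(sources)
      worklist = list(sources)
      while worklist:
          v = worklist.pop()
          for w in edges.get(v, ()):
              if w not in reached:
                  reached.add(w)
                  worklist.append(w)
      return reached

  if __name__ == "__main__":
      g = {1: {2, 3}, 2: {4}, 3: {4}, 4: {1}, 5: {6}}
      assert reach_high_level(g, {1}) == reach_low_level(g, {1}) == {1, 2, 3, 4}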

2. Python code for handling input and output [10%].

This should include the functionalities (from H10-H12) of

  1. generating sets and maps randomly,
  2. comparing sets,
  3. reading sets and maps in SETL syntax, and
  4. printing sets and maps in SETL, Prolog, and FLORA syntax, where you need only consider the relations used in this problem.
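
As a rough illustration of the printing part, a sketch along the following lines may help. The SETL and Prolog forms follow the usual conventions, but the FLORA line shows just one plausible attribute encoding; treat it as an assumption and follow the formats from H10-H12.

  # A hedged sketch of printers for an edge relation given as a dict
  # from node to successor set.  The FLORA encoding shown is an
  # assumption, not necessarily the required H12 format.

  def edge_pairs(edges):
      # Yield the edge relation as sorted (source, target) pairs.
      for v in sorted(edges):
          for w in sorted(edges[v]):
              yield v, w

  def print_setl(edges):
      # SETL: a map is a set of pairs, e.g. {[1, 2], [1, 3]}.
      pairs = ", ".join("[%s, %s]" % (v, w) for v, w in edge_pairs(edges))
      print("{" + pairs + "}")

  def print_prolog(edges):
      # Prolog: one fact per edge, e.g. edge(1, 2).
      for v, w in edge_pairs(edges):
          print("edge(%s, %s)." % (v, w))

  def print_flora(edges):
      # FLORA: one plausible attribute encoding, e.g. n1[edge -> n2].
      for v, w in edge_pairs(edges):
          print("n%s[edge -> n%s]." % (v, w))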

3. Python/Perl/.../Shell script for tests and performance measurements [10%].

Automate the tests and measurements as much as possible. You may do this in any language you like. Make sure that your tests and measurements run on the compserv machines (your programs should not need any machine-dependent features). We will be running your scripts to confirm the tests and measurements, so make this part as easy to do as possible.
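
A driver along the following lines is one way to structure this part. This is a sketch in Python; the program command lines and input file names are placeholders for your own.

  # A hedged sketch of a test-and-measurement driver.  The program
  # command lines and input file names below are placeholders.
  import subprocess
  import time

  PROGRAMS = {
      "python-high": ["python", "reach_high.py"],   # placeholder
      "python-low":  ["python", "reach_low.py"],    # placeholder
  }
  INPUTS = ["in1.setl", "in2.setl"]                 # placeholder

  for name, cmd in sorted(PROGRAMS.items()):
      for inp in INPUTS:
          with open(inp) as f:
              start = time.time()
              result = subprocess.run(cmd, stdin=f, capture_output=True,
                                      text=True, check=True)
              elapsed = time.time() - start
          print("%-12s %-10s %.3fs" % (name, inp, elapsed))
          # save result.stdout per (program, input) and compare across
          # programs for the correctness tests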

4. Report [40%].

For this part, tables and figures should help significantly; make sure you explain exactly what each table and figure shows.

4.1. Information about the source programs for reachability. This includes code size (number of lines, with and without comments, excluding libraries; you may also give the number of characters or other metrics) and other characterizations (e.g., 100 small methods, many typecasts, 5 loops, 80% of the code for initialization; for each measure that you think is important, report it for all programs).
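
For the line counts, a small helper such as this sketch may save time; the '#' comment marker is an assumption, so adjust it per language (e.g., '%' for XSB and FLORA).

  # A hedged sketch of counting lines with and without comments.
  # Assumes whole-line comments starting with 'marker'; it does not
  # handle block comments or trailing comments.
  def count_lines(path, marker="#"):
      total = code = 0
      with open(path) as f:
          for line in f:
              total += 1
              stripped = line.strip()
              if stripped and not stripped.startswith(marker):
                  code += 1
      return total, code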

4.2. What kind of input did you generate and why? Describe (1) how your code for generating sets and maps works (e.g., what are the parameters? the number of nodes and/or the number of edges), (2) how many inputs you generate and of what sizes (at least 5 to 10, spread out so that they show the performance trend), and (3) why these are good for testing correctness and/or measuring performance.
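
For example, a generator parameterized by the number of nodes and the number of edges might look like this sketch (the names and representation are assumptions):

  # A hedged sketch of a parameterized input generator: n nodes,
  # m distinct random edges, optional seed for reproducibility.
  import random

  def random_graph(n, m, seed=None):
      assert m <= n * n, "cannot place m distinct edges on n nodes"
      rng = random.Random(seed)
      chosen = set()
      while len(chosen) < m:
          chosen.add((rng.randrange(n), rng.randrange(n)))
      edges = {}
      for v, w in chosen:
          edges.setdefault(v, set()).add(w)
      return edges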

4.3. Correctness test results. Report (1) whether all programs produced the same output on all inputs (they should; if not, explain the difference and why), (2) characterizations of the output (the output size for this problem, i.e., the number of nodes reachable), and (3) how certain you are that each program always produces correct output, and why.
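
Since set output may be printed in any order, compare outputs order-insensitively. A sketch, assuming one reached node per output line (adjust to your actual output format):

  # A hedged sketch of an order-insensitive output comparison,
  # assuming each program prints one reached node per line.
  def same_output(file_a, file_b):
      def node_set(path):
          with open(path) as f:
              return {line.strip() for line in f if line.strip()}
      return node_set(file_a) == node_set(file_b)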

4.4. Performance measurement results. Describe (1) how you performed the measurements and exactly what times you measured (e.g., used Unix timing on a program that includes parsing, computing reachability, and printing, ran it 10 times, and took the average; or used Java Runtime to time a piece of code that called the reachability computation 10 times and divided the time by 10; for small inputs, you often must run each program multiple times to get good precision, e.g., 10 times each or for at least 10 minutes), (2) your time measurements for all programs on all inputs, presented in tables and plotted figures, and (3) your conclusions about what kinds of curves you got.
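
The repeat-and-divide idea from (1) looks like this in Python (a sketch; perf_counter is a high-resolution wall-clock timer):

  # A hedged sketch of timing by repetition: run the measured call
  # several times and divide, so short runs still give usable precision.
  import time

  def time_call(f, *args, repeat=10):
      start = time.perf_counter()
      for _ in range(repeat):
          f(*args)
      return (time.perf_counter() - start) / repeat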

4.5. How to replay the tests you did, and how to run new tests. When grading the project, we will follow this to check your tests and measurements. The results may affect 95% of the grade.

4.6. Summary and conclusion. Describe whatever you think is most interesting (e.g., which program is clearest? which program is fastest? what are the interesting ratios?).

5. README file [5%, a must to get any points for other parts].

Say what is where.

6. Comparison of results [5%].

Form groups of 4. Compare your results, discuss the differences, and find explanations (but do not exchange code). Each group should choose one or more people to present a summary and comparison of the results in class on May 5 (each group will be given 15 minutes). Each member of a group will be given the same grade based on the presentation, but you may improve your grade by including a comparison of your results with others' in your report.

The presentation should include, at least, who the team members are and a summary of the items in 4.1-4.4 and 4.6 above (with comparisons among the team members' results, the differences, and explanations).

You may use slides and put them on a flash drive. We will have a projector and a laptop, and you may come a bit early to set up; remember, we have only 15 minutes for each team.


Bonus Problems.

B1. Detailed running time and space measurements and analysis [15%]. See the corresponding Bonus Problems in Handout H10 and do the same for the other programs too. Anything you did before for this problem will count if you submit it; you may also do more.

B2. Time complexity analysis [15%]. See the corresponding Bonus Problem in Handout H10 and do the same for the other programs too. Anything you did before for this problem will count if you submit it; you may also do more.

B3. Optimizations [open]. Make your hand-coded C code as fast as possible. Demonstrate the performance improvement and show that your code is still correct (at least by testing).

B4. Reachability in Java and C++ [16.67%, equivalent to 1 homework]. Implement two versions in each language, corresponding to the high-level SETL program (or as clear as possible) and the low-level SETL program (or as efficient as possible), respectively, and do the testing and measurements as for the other programs.

B5. Reachability in yet another language [open]. Write the clearest code and the fastest code you can in that language and do the testing and measurements as for the other programs. In particular, it would be interesting to write one in the functional programming paradigm (known to be difficult for graph algorithms) and compare. If you do this problem, let me know first what language you plan to use.