CSE647: Reliable (Distributed) Software
Spring 2002
Scott Stoller

Table of Contents

Course Description


Scott Stoller

Class: Monday and Wednesday, 3:20pm-04:40pm. Life Science 030.

Instructor's Office Hours: Tuesday and Thursday, 3:30pm-4:30pm. Computer Science 1429.

Course Description

Note: The course name "Reliable (Distributed) Software" reflects the content this semester. The registrar's official course name is "Program Semantics and Verification".

The Spring 2002 incarnation of cse647 covers two topics: design of fault-tolerant or secure distributed systems, and automated checking of program correctness properties. Efficient automated techniques for approximate checking of interesting classes of correctness properties are recently showing considerable promise; for example, Engler's techniques (see reference below) found thousands of errors in Linux, OpenBSD, and other systems. IBM T.J. Watson Research Center and Microsoft Research recently started projects with similar aims. One goal of the course is to understand how to extend these techniques to efficiently handle concurrency and distribution. This requires a good understanding of concurrent and distributed system design and the kinds of properties that are most useful to check.

Automated program checking is very useful for finding some important classes of errors, but development techniques that prevent errors are always preferable. While studying distributed systems, we will ask ourselves how the designs could be developed more systematically. This includes development of designs (pseudo-code) from requirements, and development of implementations from designs. Current techniques for both steps are unsatisfactory. For the former step, the state-of-the-art is represented by Butler Lampson's work, which we will study (see below). For the latter step, it would be nice to implement systems in a high-level programming language close to what we now consider pseudo-code. How large is the gap between code and pseudo-code? One small example comes from the cse533 project last semester. Students implemented (in Java) a fault-tolerant and intrusion-tolerant replication algorithm that is described in 96 lines of pseudo-code. The implementations averaged about 3,000 lines of code, excluding blank lines and comments.

The course involves a project. You may work in teams or alone, depending on the size of the project. Different teams may do different projects. I will suggest some possibilities, e.g., implement a distributed algorithm, give a derivation (in the style of Lampson) of a distributed algorithm, implement a checker for some class of correctness properties, design (and, if time permits, implement) a checker that handles concurrency or distribution better than other existing checkers, apply an existing checker to some interesting programs (this last option depends on availability of an existing checker, which is currently uncertain). You are also welcome to propose your own project. A wide variety of projects is acceptable, ranging from theory to implementation. If you already have something in mind, feel free to stop by and discuss it with me.

Most of the course will be spent reading and discussing conference and journal articles. Here is a sampling of articles we might cover.

Distributed System Design

Program Checking

How to Submit an Assignment

To submit a homework that does not involve programming, either email it to
stoller AT cs DOT sunysb DOT edu or put hardcopy under the door of Room 1429. Note that you might not have access to my office door after normal business hours, because the door to the suite is locked.

To submit an assignment that involves programming,

Details regarding the .zip file:


The following information may be slightly inaccurate, due to score adjustments, late submissions, etc.

Item Mean Std.Dev. Histogram
hw1 (out of 20) 18.7 0.9
hw2 (out of 20) 16.1 1.8
hw3 (out of 20) 19.4 1.0
hw4 (out of 20) 19.3 0.7
hw5 (out of 20) 18.7 1.4
hw6 (out of 20) 15.6 3.2


May 15

Grades: Please let me know if you did not receive your course grade by email.

May 1

Homework 6 Solution is available. Homework 5 solution will be distributed in class.

Apr 29

Reading: The next paper we'll discuss is
Tal Lev-Ami and Tom Reps and Mooly Sagiv and Reinhard Wilhelm. Putting Static Analysis to Work for Verification: A Case Study. In Proceedings of the ACM International Symposium on Software Testing and Analysis (ISSTA), 2000.

Apr 21

Homework: Homework6 is available.

Apr 17

Reading: The next paper we'll discuss is by Boyapati and Rinard (see link under Announcments of Apr 4).

Apr 11

Paxos: Here's a new paper about classic Paxos by Lamport, who wrote the original paper on classic Paxos. It would be nice to see a similarly simple treatment of Byzantine Paxos.
Leslie Lamport. Paxos Made Simple. To appear in SIGACT News.

Apr 6

Reading: The next paper we'll discuss is:
John Whaley and Martin Rinard. Compositional pointer and escape analysis for Java programs. In Proceedings of the 14th Annual ACM SIGPLAN Conference on Object-Oriented Programming Systems, Languages, and Applications (OOPSLA), 1999.

Apr 5

Homework: Homework5 is available.

Apr 4

Paper: I mentioned in class that there is a more elaborate type system for locks. In case you're interested, here it is.
Chandrasekar Boyapati and Martin C. Rinard. A Parameterized Type System for Race-Free Java Programs. In Proceedings of the 16th Annual ACM Symposium on Object-Oriented Programming Systems, Languages, and Applications (OOPSLA), 2001.

Mar 20

Homework: Homework4 is available. I'll bring printouts to class.

Mar 19

Reading: We'll also briefly discuss
Cormac Flanagan and Stephen Freund. Detecting Race Conditions in Large Programs. In Proceedings of the ACM SIGPLAN Workshop on Program Analysis for Software Tools and Engineering, pp. 90-96, June 2001.
before moving on to another analysis for concurrent Java programs:
Gleb Naumovich, George S. Avrunin, and Lori A. Clarke. An Efficient Algorithm for Computing MHP Information for Concurrent Java Programs. In Proceedings of the Seventh European Software Engineering Conference and Seventh ACM SIGSOFT Symposium on the Foundations of Software Engineering, pp. 338-354, September 1999.

Mar 11

Reading: The next paper we'll discuss is: Cormac Flanagan and Stephen Freund. Type-based race detection for Java. In Proceedings of the SIGPLAN Conference on Programming Language Design and Implementation. ACM Press, June 2000.

Mar 7

Homework: Homework2 Solution is available.

Homework: Homework3 is available.

PAG: The Program Analyzer Generator is a tool for generating program analyzers from concise high-level descriptions. If you are interested in using it, I suggest starting with this paper and then reading the manual.

Handout: Today I distributed a draft of Engler et al.'s forthcoming PLDI 2002 paper.

Feb 20

Reading: The third paper we'll discuss is: Engler et al, Checking System Rules Using System-Specific, Programmer-Written Compiler Extensions, OSDI 2000.

Feb 19

Office Hours: I finally added official office hours (under Administrivia above). Sorry for the delay. If the proposed time is inconvenient for you, let me know. If enough people respond, I'll re-schedule. You are still welcome to make an appointment to see me any time.

Homework: Homework2 is available.

Feb 5

Homework: Homework1 is available.

Reading: The second paper we'll discuss is The ABCDs of PAxos. You might want to bring a printout to class.

Jan 28

Reading: We'll discuss the PREfix paper first, due in part to its high ranking on the questionnaire. I'll bring hardcopies to today's class.