Class: Monday and Wednesday, 3:20pm-04:40pm. Life Science 030.
Instructor's Office Hours: Tuesday and Thursday, 3:30pm-4:30pm. Computer Science 1429.
The Spring 2002 incarnation of cse647 covers two topics: design of fault-tolerant or secure distributed systems, and automated checking of program correctness properties. Efficient automated techniques for approximate checking of interesting classes of correctness properties are recently showing considerable promise; for example, Engler's techniques (see reference below) found thousands of errors in Linux, OpenBSD, and other systems. IBM T.J. Watson Research Center and Microsoft Research recently started projects with similar aims. One goal of the course is to understand how to extend these techniques to efficiently handle concurrency and distribution. This requires a good understanding of concurrent and distributed system design and the kinds of properties that are most useful to check.
Automated program checking is very useful for finding some important classes of errors, but development techniques that prevent errors are always preferable. While studying distributed systems, we will ask ourselves how the designs could be developed more systematically. This includes development of designs (pseudo-code) from requirements, and development of implementations from designs. Current techniques for both steps are unsatisfactory. For the former step, the state-of-the-art is represented by Butler Lampson's work, which we will study (see below). For the latter step, it would be nice to implement systems in a high-level programming language close to what we now consider pseudo-code. How large is the gap between code and pseudo-code? One small example comes from the cse533 project last semester. Students implemented (in Java) a fault-tolerant and intrusion-tolerant replication algorithm that is described in 96 lines of pseudo-code. The implementations averaged about 3,000 lines of code, excluding blank lines and comments.
The course involves a project. You may work in teams or alone, depending on the size of the project. Different teams may do different projects. I will suggest some possibilities, e.g., implement a distributed algorithm, give a derivation (in the style of Lampson) of a distributed algorithm, implement a checker for some class of correctness properties, design (and, if time permits, implement) a checker that handles concurrency or distribution better than other existing checkers, apply an existing checker to some interesting programs (this last option depends on availability of an existing checker, which is currently uncertain). You are also welcome to propose your own project. A wide variety of projects is acceptable, ranging from theory to implementation. If you already have something in mind, feel free to stop by and discuss it with me.
Most of the course will be spent reading and discussing conference and journal articles. Here is a sampling of articles we might cover.
Distributed System Design
To submit an assignment that involves programming,
zip -rto archive the entire directory. This lets us extract everyone's files without filename conflicts.
|hw1 (out of 20)||18.7||0.9|
|hw2 (out of 20)||16.1||1.8|
|hw3 (out of 20)||19.4||1.0|
|hw4 (out of 20)||19.3||0.7|
|hw5 (out of 20)||18.7||1.4|
|hw6 (out of 20)||15.6||3.2|
Tal Lev-Ami and Tom Reps and Mooly Sagiv and Reinhard Wilhelm. Putting Static Analysis to Work for Verification: A Case Study. In Proceedings of the ACM International Symposium on Software Testing and Analysis (ISSTA), 2000.
Leslie Lamport. Paxos Made Simple. To appear in SIGACT News.
John Whaley and Martin Rinard. Compositional pointer and escape analysis for Java programs. In Proceedings of the 14th Annual ACM SIGPLAN Conference on Object-Oriented Programming Systems, Languages, and Applications (OOPSLA), 1999.
Chandrasekar Boyapati and Martin C. Rinard. A Parameterized Type System for Race-Free Java Programs. In Proceedings of the 16th Annual ACM Symposium on Object-Oriented Programming Systems, Languages, and Applications (OOPSLA), 2001.
Cormac Flanagan and Stephen Freund. Detecting Race Conditions in Large Programs. In Proceedings of the ACM SIGPLAN Workshop on Program Analysis for Software Tools and Engineering, pp. 90-96, June 2001.before moving on to another analysis for concurrent Java programs:
Gleb Naumovich, George S. Avrunin, and Lori A. Clarke. An Efficient Algorithm for Computing MHP Information for Concurrent Java Programs. In Proceedings of the Seventh European Software Engineering Conference and Seventh ACM SIGSOFT Symposium on the Foundations of Software Engineering, pp. 338-354, September 1999.
Homework: Homework3 is available.
PAG: The Program Analyzer Generator is a tool for generating program analyzers from concise high-level descriptions. If you are interested in using it, I suggest starting with this paper and then reading the manual.
Handout: Today I distributed a draft of Engler et al.'s forthcoming PLDI 2002 paper.
Homework: Homework2 is available.
Reading: The second paper we'll discuss is The ABCDs of PAxos. You might want to bring a printout to class.