B649 (Spring 1999)
Fault-Tolerance and Security of Distributed Systems

Instructor

Scott Stoller

A.I.

Ying Feng

Table of Contents

Hours
Course Description
Textbook
Project
Project Page
Grading
Academic Integrity
Homework Guidelines
Exam Guidelines
Course Newsgroup
Syllabus
Grading Statistics
Announcements

Hours

Lectures: Tue and Thu, 5:30pm - 6:45pm, Ballantine Hall 215.

Office Hours: Tue and Thu, 11am - noon. Lindley Hall 201C.

Section: 1350

Course Description

Core Topics

Additional Topics

These topics are representative: we won't have time to discuss all of them, and we might cover some topics not listed here. The overlap with Spring 1997's B649 is relatively small (less than 15%).

Textbook

Several of the readings will be conference or journal articles. Most of the course will not follow a textbook. The following "recommended book" contains some relevant and useful background material, so you might want to buy it (and read it). Another book that covers similar material is

Project

In addition to homeworks and exams, students will do a course project. One or more suggestions will be made; students are encouraged to suggest their own projects as well. Projects may be theoretical or implementation-oriented (ideally both!).

Grading

Grades will be based on homework, two exams, and the project. The weights are:
Homework 35%
Project 30%
Exams 35%

Academic Integrity

Be sure to read and follow the Computer Science Department Statement on Academic Integrity.

Homework Guidelines

You are free to discuss homework with everyone, but you must write and submit your own solutions. Homework will be graded on the basis of correctness and clarity. Homework submitted late is subject to a penalty of up to 20% per day. Furthermore, no credit will be given for homework submitted after solutions have been posted. Solutions to homework will usually be posted sometime during the day after the assignment is due.

By default, all assignments in this course are due by the end (midnight) of the specified day. To submit a problem set, either email it to stoller@cs.indiana.edu or put hardcopy under the door of LH201C or in the homework drop box near Lindley 215. If you use the homework drop box, be sure the instructor's name appears on your problem set, so the staff knows to whom to give it. If you submit electronically, please use ASCII, PostScript, or PDF; Microsoft Word (.doc) files are less convenient.

Exam Guidelines

Exams are potentially cumulative, e.g., exam2 might contain questions about material covered on exam1. On the other hand, it is reasonable to expect that exam2 will focus on material not covered on exam1.

Exams are "open book"; specifically, you may use course handouts, your notes, your homeworks, and the recommended textbook (CDK). Of course, the exams will be on material covered in lecture or in course handouts, so it probably does not matter much whether you have the textbook. You may not use someone else's notes or homeworks or mechanical reproductions of all or part thereof. You may not use other textbooks or mechanical reproductions of all or part thereof. You may use a dictionary, if you like.

Course Newsgroup

The course newsgroup is ac.csci.b649. If you have questions, this is a good place to ask them, so that other students can also benefit. If you know the answer to someone's question, post it! This will help your classmates and show us that you know what's going on.

Syllabus

ClassTopic
1-2Intro. Failure models. Communication paradigms.
3-5Logical Time. Snapshots.
6-7Leader Election.
8-10Group Communication in Amoeba.
11-19Transaction Processing
20-21Cryptography
22-26Electronic Payment. Digital Cash.

Grading Statistics

ItemOut OfMeanStddevHistogram
hw12015.15.6
hw22015.93.1
hw33024.87.9
hw44030.116.1
hw52016.65.0
hw62018.63.3
exam16041.3 (69%)11.4histogram
hw73022.14.7
hw82013.13.1
hw9109.60.6
exam25032.2 (64%)4.3histogram
project10097.95.9


Announcements

27 Apr

Virtual Handout: We will discuss the following paper:
A related paper that you might want to look at is:

26 Apr

Course Eval: Please fill out the CS Dept's official On-line Course Evaluation. Your comments and suggestions would be appreciated.

23 Apr

Project: For people doing the voting project... It would be helpful to include (preferably in your final report) brief descriptions of all the non-trivial testcases you tried and their outcomes; as substantiation, you might consider including pointers to TESTCASE files (as in P536) containing transcripts of those testcases, or to scripts that run those testcases. Also... The grading guidelines said: "Additional extra credit: demo items 3-6 for n=7, b=2, f=2, V=8." It makes more sense to take n=9 here.

22 Apr

Exam: Grading statistics for exam2 have been added above.

Exam: Here is a Exam2 Solution.

21 Apr

Project: Final reports are due on Sat night, 24 Apr. Your final report should explain and justify your design decisions, and describe significant aspects of your data structures and code (if your project involves coding). Your report should clearly distinguish between what you actually achieved and what else (if anything) you would have liked to achieve if more time had been available. If your code does not work in certain situations, your report should say so. For people doing the voting project, you can assume the reader of your report is familiar with the Phalanx paper. Finally, if you have a teammate, recall (from the project page): The status report and final report must describe the contributions of each team member. This information may be integrated into the body of the report or placed in an appendix.

Homework: Grading statistics for hw8 and hw9 have been added above.

Crypto in Java: The crypto package I mentioned in class yesterday is Cryptix, in case anyone is interested. If you get it to work, let me know.

19 Apr

Demo Schedule: Here is the Demo Schedule. If you haven't already signed up, please choose a free timeslot and sign up.

Practice Problems: If you have questions about (the answer to) a practice problem, please post your question to the newsgroup. One of your classmates (perhaps the author of the problem) might be able to answer your question more promptly than I can.

Practice Problems: Here are the Practice Problems from Homework 9 with Solutions.

Practice Problem: Here is the Second "Missing" Practice Problem from Homework 9. I also added it to Practice Problems from Homework 9.

Exam: Exam2 covers material up to and including Topic VIII (Digital Cash), Section 3 (An e-cash protocol) in the lecture notes.

18 Apr

Practice Problem: Here is a "Missing" Practice Problem from Homework 9. I also added it to Practice Problems from Homework 9.

Homework: Here is a Homework 8 Solution.

Practice Problems: Here are the Practice Problems from Homework 9.

16 Apr

Handout: The following paper was distributed during lecture yesterday:

15 Apr

Homework: Grading statistics for hw7 have been added above.

13 Apr

Homework: Here is Homework 9.

Project: For people working on the voting project... Here is a draft of Grading Guidelines for Voting Project.

11 Apr

Reading: Read (at least) sections 3.1-3.3, 3.5-3.7, and 3.11-3.12 of the crypto handout (chapter 3 of O'Mahony et al.).

10 Apr

Project: When you submit your final report (on Apr 24), make all files related to your report readable by Scott and Ying, and include a pointer to those files in your report or in an email. People doing the voting project should use ACLs, instead of making their files world-readable. People doing other projects are free to use either method.

Project: By default, demos will be held in Ying's office (LH 310).

9 Apr

Homework: Here is a Homework 7 Solution.

8 Apr

Handout: Today's handout is chapter 3 of Donal O'Mahony, Michael Peirce, and Hitesh Tewari, Electronic Payment Systems, Artech House, Inc., Boston, 1997. If you didn't get a copy, the leftovers are in the plastic stand outside my office door.

6 Apr

Project: For people working on the voting project... Some teams seem to have some confusion about the following point. If a client performing a read or write operation does not receive responses from an entire quorum, then the operation did not succeed. The client must retry those servers or try a different quorum.

Homework: Homework 8 is distributed today in class.

2 Apr

Reading: Read Bernstein et al., chapters 7 and 8, except sections 7.5, 8.5-8.7, and 8.10. If you didn't receive the handout containing chapters 7 and 8 of Bernstein et al., please print one off the web, or photocopy one from the copy of the book on reserve in Swain Hall Library.

1 Apr

Homework: Homework 7 is distributed today in class.

30 Mar

Schedule Change: We will meet 2pm-3:15pm on Sat, 10 Apr, in our usual location (BH 215).

27 Mar

Project: Reminder: Your status report is due on Mon, 5 Apr.

22 Mar

Project: Everyone should be working on their projects this week.

16 Mar

Reading: Photocopies of Chapters 7 and 8 of Bernstein et al. are available from a container outside my office door. Please take a copy and start reading chapter 7.

13 Mar

ENJOY THE SPRING BREAK!

Exam: I added the following sentence to the solution to problem 2 of exam1: "If this correspondence is not ensured, a message might be delivered more than once (in scenarios in which a server crashes and recovers multiple times)." Constructing such a scenario is a good exercise (if you didn't do it already during the exam).

11 Mar

Exam: Here is an Exam 1 Solution. Grading statistics for exam1 have been added above. If you want to pick up your exam booklet tomorrow (Friday), please come between 3pm and 4pm. If you cannot come then, and you would like me to give your exam booklet to a friend, email me their name.

Homework: I just added the following note to the solution to hw6, problem 2: "This solution assumes that locks are granted one at a time." I believe this assumption holds for typical implementations of Strict 2PL. If anyone has evidence to the contrary, I'd be interested to see it.

10 Mar

Schedule: I will be at a conference during the week after spring break, so b649 is cancelled on 23 Mar and 25 Mar. We will try to re-schedule those classes sometime later in the semester.

Homework: I just updated the solution to hw6, problem 2, as follows. If FCFS order (for granting of locks) is not acceptable for the applications at hand (e.g., a priority-based order is required), then adding Tk-->Ti is better than adding Tk-->Tj. Thanks to Yu Ma for pointing this out.

9 Mar

Grading: I updated the Grading table to show the breakdown between homework and project.

Homework: If you lost points on hw5, problem 1, because your histories were not complete, please put your homework in Ying's mailbox, and it will be re-graded.

5 Mar

Exam: As mentioned in lecture on 2 Mar, exam1 covers material up to and including the topics discussed on 2 Mar, i.e., up to and including Topic VI (Transactions), Section 3.4 (Strict 2PL) in the lecture notes.

Exam: It is a good idea to bring all of the course handouts to the exam.

Homework: Here is a Homework 5 Solution.

3 Mar

Handout: Photocopies of chapter 6 of Bernstein et al. are available in a container outside my office door.

2 Mar

Homework: Homework 6 was distributed today in class. It is due Sunday night, 7 Mar. It must be submitted by email. Homework 6 Solution will become readable at 8am on 8 Mar.

1 Mar

Handout: Photocopies of chapter 3 of Bernstein et al. are available in a container outside my office door.

26 Feb

Voting Project: For teams working on this project, your status report and final report must justify the design of your system. Even if you end up implementing Dalia and Mike's design with no modifications, your report should analyze that design and discuss its strengths and weaknesses.

Turing Award: Jim Gray is the winner of the 1998 A. M. Turing Award, for his contributions to database and transaction processing. His contributions include seminal work on 2 phase locking and 2 phase commit, which we will be studying in the next few weeks.

Homework: The due date for hw5 has been postponed to 4 March. If you have already submitted hw5, you are welcome to re-submit it later, if your answers change.

18 Feb

On-line Book: The entire book by Bernstein et al. is available here.

Handout: Chapter 2 of Bernstein et al. should be available from a box outside my door by about 4pm tomorrow (Friday).

Homework: Homework 5 was distributed today in class.

Handout: Chapter 1 of the following book was distributed in lecture on 16 Feb:

Group Communication: Frans Kaashoek is mailing me a copy of his Ph.D. thesis. I'll let you know if it provides more insight.

17 Feb

Correction: I said in lecture yesterday that the read and write operations in a transaction do not include accesses to auxiliary data structures, such as indices. On second thought, it seems much more sensible to include those accesses as part of the transaction, so that the transaction properties (especially serializability) apply to them. Sorry for the confusion.

16 Feb

Extra Credit: Correct pseudo-code for the Amoeba membership-change algorithm and an informal argument that the code satisfies the requirements outlined informally in lecture (and in the paper on Amoeba) would be extra credit worth approximately as much as 2 typical homework assignments (this is equivalent to getting 200% extra credit on a typical homework assignment). You can submit this any time before April 1. You are free work on this in a team of 2 or 3, in which case you'll each get 2/3 or 1/2 as much extra credit, respectively.

Group Communication: If I had realized sooner how vague the description of Amoeba's membership change algorithm is, I would have discussed the Transis group communication system instead: it is somewhat more complicated (partly because it has more functionality), but at least the presentation is clear and detailed. If you are interested, here is a paper about Transis:

Warning: I believe there are some technical problems with the specification of the membership service (Requisites M.1-M.5) in this paper. Don't let that discourage you; this paper also contains lots of good stuff. :-)

12 Feb

Clarification: My solution to hw4 uses a clique-based spec, instead of an independent-set-based spec, because it is easy to show that a clique-based spec satisfies the desiderata mentioned in lecture, in particular, that the spec is satisfied by algs that (unlike the Invit Alg) always put disconnected nodes in different groups. An independent-set-based spec might also have this desirable property, but it's not obvious (to me). Does anyone have a proof of this?

Homework: Here is a Solution to Homework 4, Problem 1. For problem 2, here are some typos/errors in Chow and Johnson's presentation of the Invitation Alg:

Project: Please notice the due dates at the top of the project page. Your assignment for the coming week is to choose a project and form a team.

11 Feb

Project: Information about the course project is available via the Project Page link in the Table of Contents.

9 Feb

Exams: Exam1 will be on Thu, 11 Mar. Exam2 will be on Thu, 22 Apr. Both exams will be "in class". If you have a schedule conflict with these dates, let me know immediately. Here are Exam Guidelines.

8 Feb

Schedule Change (Reminder): We will meet at 7:45pm-9pm on Wed, 10 Feb, in addition to the usual timeslots on Tue and Thu.

5 Feb

Homework: Homework 4 was distributed yesterday in class.

Homework: Here is a Homework 3 Solution. If you have a mailbox in LH 215, I probably put a copy there; please check before printing it.

4 Feb

Clarification: In the clarification of 29 Jan, I referred to the three wait stmts in Algorithm Listing 10.20. I miscounted; there are four.

3 Feb

Contest Results: Here are some typos/errors: The first error is the one I expected people to find. Congratulations to the two people who found it!

31 Jan

Clarification: In the clarification of 29 Jan, I meant that the three wait stmts and the if stmts following them should not be indented.

29 Jan

Clarification: In Algorithm listing 10.20, I think the three "wait up to T seconds ..." statements should not be indented (i.e., not be in the scope of the for loop), because for each of those statements, the process waits for at most time T (not N*T, where N is the number of processes). At least, that's my interpretation. (This is not an answer to the contest.)

Contest: There is (at least) one typo/error in the pseudo-code for the Bully Algorithm in the handout from Chow and Johnson. Anyone who reports it (by email to me) by noon on Sunday gets 20% extra credit on hw3.

Schedule Change: My office hour on 2 Feb will be moved to 3 Feb, 11am-noon.

Homework: Here is a Homework 2 Solution.

Handout: The following handout was distributed during lecture yesterday: Homework 3.

28 Jan

Schedule Change: We will not meet on 2 Feb. Instead, we will meet at 7:45pm-9pm on Wed, 10 Feb (the only proposed alternative timeslot that no one objected to). There was a typo in my email about rescheduling the class; I typed "Wed, 9 feb" instead of "Web, 10 feb". I hope that doesn't affect anyone's response.

26 Jan

Handout: The following handout was distributed during lecture: Snapshots: Passive Monitor using Approximate Logical Clocks.

25 Jan

Homework: Yu pointed out some inconsistencies in the use of d in the proposed solution to hw1, prob1. I just revised Homework 1 Solution (the revised version contains "(v3)" in the first line).

23 Jan

Homework: Yu pointed out that my solution to 2(b) didn't deal with the possibility that a RECOVERING msg gets lost, so I just revised Homework 1 Solution (the revised version contains "(v2)" in the first line).

22 Jan

Homework: Here is a Homework 1 Solution.

Readings: The next two handouts, which are readings for the next two topics, are available in a tray near the door of LH201C. They are: (1) R. Chow and T. Johnson, Leader Election. (2) M. Frans Kaashoek and Andrew S. Tanenbaum, Efficient Reliable Group Communication for Distributed Systems.

21 Jan

Homework: Homework 2 was distributed during lecture.

19 Jan

Clarification: Regarding hw1, problem 2. In part (c), I accidentally omitted the words "and recovery" from assumption 1. Since the assignment is due soon, it's OK to answer the question either way (i.e., assuming recovery is possible, or assuming it isn't). Sorry for the confusion.

Clarification: Regarding hw1, problem 2. When a host crashes and recovers, assume that its clock is reset to zero. Do not assume the existence of an external time service that can tell the recovering host the "current time". In parts (a) and (b), it is sufficient to consider only server crashes; for extra credit, you can consider client crashes, too. In part (c), consider both client and server crashes.

16 Jan

Current Events: Here's an article related to our discussion of at-most-once delivery: The risks of a first failure.

14 Jan

Clarification: Let r be the rate a clock is advancing relative to the rate a perfect clock advances. The clock drift is r-1. Thus, a perfect clock has r=1 and drift rate equal to 0. In problem 1 of hw1, I meant to say that d is an upper bound on the absolute value of the clock drift. Sorry for the confusion.

Handout: The following handout was distributed during lecture: Homework 1.

Reading: If you decided to get the recommended textbook (hereafter called CDK), you might want to read chapters 1-3 for general background on distributed systems and networks. Also, Section 4.3 of CDK discusses request-reply communication.

Reading: Read the System Models handout.

12 Jan

Handout: The following handout was distributed during lecture: System Models.