CSE626 (Spring `03)
Advanced Programming Languages

[ General Information | Course Outline | Lectures | Handouts | Other Pointers | Requirements ]
[ Announcements ]

General Information

Course description: This course is for students interested in optimization of programs that are written in very high-level languages. We will look particularly at database languages (including XML query languages) and collection classes in object-oriented languages (such as Java collection classes). The course will mainly discuss general and systematic methods for developing correct and efficient algorithms and programs starting from specifications or queries written in these very high-level languages. Each student will do a project that involves designing and implementing an advanced database application, evaluating and improving an XML implementation, developing and implementing a general optimization method, finding and fixing performance bugs in an existing application, or other tasks that exploit or support the methods studied. | Prerequisites: a programming language or compiler course, an algorithm course, a database course, and skills for programming in a high-level language like Java; or permission of the instructor. | Credits: 3.

Instructor: Annie Liu | Email: liuATcsDOTsunysbDOTedu | Office: Computer Science 1433 | Phone: 632-8463.

Hours: Wed 10:00AM-12:30PM, in Computer Science 1211. | Office hours: after class or by appointment.

Textbook: There is no textbook for the course; relevant papers and additional references will be given as the course proceeds. Taking good lecture notes is an important part of the course.

Grading: Weekly homework and a course project, each worth 50% of the grade; no exams. No credit for late submissions.

Course homepage: http://www.cs.sunysb.edu/~liu/cse626/, containing all course-related information.

Course Outline

We will first overview the core of database languages (including relational, object, and XML query languages) and collection classes in object-oriented languages, and discuss the problem of developing correct and efficient algorithms and programs starting from specifications or queries written in these very high-level languages.

We will then focus on the main part of the course: a systematic method for optimization of programs and for design of algorithms and data structures starting with very high-level problem specifications. The method has three important components: the central component embodies the core idea, which is to replace repeated expensive computations with efficient incremental operations; a higher-level component determines how computation proceeds iteratively, and a lower-level component determines appropriate data-structure support.

The method will be discussed for our favorite language X, yet to be defined:-). In particular, we will first concentrate on operations involving sets and maps, the most important high-level constructs for data manipulation. We will then discuss the same method for optimizing the likely more familiar constructs, loops and arrays, and perhaps less familiar constructs, recursive functions and structures. We will finally discuss the method applied to rules and objects in general.

We will use nontrivial problems (taken from many applications in computer science) as examples throughout the course, and we will add more such problems as time permits. We will also motivate many program (i.e., sophisticated data) analysis problems as we proceed.

Course work: The course project will be determined based on the interests of both you and me and will be due on the last day of class. The weekly homework will mainly be to complete and organize the lecture notes you take in each class, and to answer some questions given in class, and will be due on the following Tuesday before 11am; it will be partly graded, possibly with suggestions for improvement, and returned to you before the next class; finally, your entire notes will be completely graded near the end of the class. You will get full credit if you take perfect notes, but since one can not generally do so, you may still get full credit by adding your own examples or explanations, asking good questions, etc.


Lecture 1 (01/22/03): Overview: algorithm design and program construction; correctness, efficiency, productivity, and tradeoffs; DB languages and collection classes in OO languages; very high-level languages and systematic transformation/compilation methods.
Lecture 2: A systematic method for algorithm design and program development based on set-based languages: iterate, incrementalize, and implement; graph reachability example; basic ideas. Reading: Chapter 5 of Introduction to Algorithms by Cormen, Leiserson, and Rivest. Homework: Lecture notes and project ideas.
Lecture 3 (01/29/03): A language with built-in sets and maps: set and map operations, set formers, quantified expressions, for-loop, map assignments, aggregate operations.
Lecture 4: Programming with sets and maps: graph reachability, tree center, topological sort, cycle detection, attribute closure. Reading: The SETL2 Programming Language (Parts 2 and 3). Homework: Lecture notes and more on project ideas.
Lecture 5 (02/05/03): Incrementalize / finite differencing: basic ideas; derivative: pre and post; derivative w.r.t. code block; sequential composition, chain rule.
Lecture 6: Discussion and auxiliary maps: pre vs post derivatives; strict operations; discontinuity removal using auxiliary expressions. Reading: Handout Paige and Koenig paper (pages 402-423, Appendix B). Homework: Lecture notes.
Lecture 7 (02/12/03): Finite differencing examples: canonical forms, expensive computations and auxiliary maps, use of chain rule and derivatives.
Lecture 8: Loop optimization using finite differencing: initialization, vertical and horizontal fusion; eliminating dead code. Reading: Handout Paige and Koenig paper (pages 429-442, Appendix A). Homework: Lecture notes.
Lecture 9 (02/19/03): Iterate / dominated convergence: overview; partial-order set, chain, lattice, monotone, inflationary; language with fixed points.
Lecture 10: Basic theory, chaotic iteration; dominated convergence, workset, increment; simplifying initialization. Reading: Posted Cai and Paige paper (Sec.1-3.2, Th.5, Ex.11, Sec.6-7). Homework: Lecture notes.
No class: February 26. Homework: Project description, plan, initial effort and results.
Lecture 13 (03/05/03): Implement / real-time simulation: set machine vs RAM; problems with hashing and hidden copy; classes of operations; based representations.
Lecture 14: Types and constraints, associating representations (data structures, in picture:-) with types; inferring types based on operations; inferring values. Reading: Handout Paige 89 paper and posted notes by Paige. Homework: Lecture notes.
Lecture 15 (03/12/03): Viewing a program transformation system at work: attribute closure example; fixed-point spec; iterate, min step; incrementalize, an aux map; implement, two bases; complexity.
Lecture 16: Generate C code using APTS, additional examples in APTS. Summary of methods, additional issues and results, other examples, future work. Reading: Handout Paige PLILP 94 paper. Homework: Lecture notes and using APTS.
Spring break: March 17-22.
Lecture 17 (03/25/03): Optimizing loops and arrays: incrementalizing loop body, saving additional values; folding initialization, replacing termination test, minimizing loop body.
Lecture 18: Optimizing aggregate array computations: AAC and subscript update operation, incrementalization, forming optimized loop, caching additional values. Reading: Handout slides for two papers. Homework: An example.
No class: April 2.
Lecture 21 (04/09/03): Optimizing recursive functions: determining increment, caching, incrementalizing, pruning, forming optimized programs; indexed vs recursive data structures, selective caching.
Lecture 22: Transforming recursion to iteration: basic approach; multiple base cases, recursive cases, recursive calls; optimizations on linear data structures.Reading: Handout slides from papers. Homework: An example.
Another break: April 16-18.
Lecture 23 (04/23/03): Optimizing rules and objects: from Datalog rules to analysis programs; iterate, incrementalize, implement; computing worst-case time and space complexity, optimality.
Lecture 24: Analysis of object queries and updates: incrementalizing queries with respect to updates; analyzing expensive queries, primitive updates, and performance. Reading: Handout slides. Homework: Posted Paige and Yang "read" paper.
Lecture 25 (04/30/03): Additional examples and applications: high-level reading and data structure compilation: problem, challenge, 3 stages, AST, ASD.
Lecture 26: Multiset discrimination of objects, tuples, and sets (explained by Tom Rothamel). Reading: More on the "read" paper. Homework: Project report.
Lecture 27 (05/07/03): Program transformations, analysis, and systems: motivation, synthesis vs reuse; synthesis, optimizations (composition, specialization, incrementalization), refinement.
Lecture 28: Program analysis: data-flow analysis, abstract interpretation, type-based, set constraints, automata, formal languages. Program manipulation systems: syntax, semantics, transformations, interface. Reading: Anything you like. Homework: Have a nice summer!


Handout Q: Questionnaire

Papers and slides are handed out in class.

Other Pointers

SETL2: A set-based programming language. Version 2.3 (under directories binaries-2.3 and ide-2.3) supports major Unix platforms and Windows; version is 2.2 (under directory binaries-2.3) supports more older platforms.
SETL2 is documented in The SETL2 Programming Language by W. Kirk Snyder. An introduction by Robert Dewar and a description of developments including object-oriented features by Kirk Synder also come with the distribution.
Some sample programs and inputs are provided by Bob Paige (you could follow the installation instructions there, but you need to use the new ftp site ftp://cs.nyu.edu/pub/languages/setl2 and you can get the newer version 2.3).

Finite differencing of computable expressions, by Robert Paige and Shaye Koenig.
ACM Transactions on Programming Languages and Systems, 4(3):402-454, 1982.

Program Derivation By Fixed Point Computation, by Jiazhen Cai and Robert Paige.
Science of Computer Programming, 11:197-261, 1989. (ps)

Real-time simulation of a set machine on a RAM, by Robert Paige.
Computing and Information, Vol. II, pages 69-73, 1989. (ps | notes by Paige)

Viewing a program transformation system at work, by Robert Paige.
Proceedings of the Joint 6th Intl. Conf. on Programming Language Implementation and Logic Programming (PLILP) and 4th Intl. Conf. on Algebraic and Logic Programming (ALP), pages 5-24, 1994. (ps)

Old APTS: old version of the Automatic Program Transformation System developed by Paige (guide to local installation, usage, examples). Development of old APTS stopped around 1993 when Paige took on an effort to develop new APTS, aiming for a more general automatic program transformation system. Most unfortunately, Paige passed away in 1999, and new APTS was not completed and could not run the transformations developed by Paige.

Papers for the handout slides on optimizing loops [Liu-IFIP 97] and arrays [LS-ICCL 98 / LSLT-TR 02 (under minor revisions to be accepted by TOPLAS)], recursive functions and recursive data structures [LS-ESOP 99 / HOSC 03, LS-PEPM 00, LS-PEPM 02], and rules [LS-TR 02/03] can be found following links under Selected Publications on Annie's homepage

Ghostscript, Ghostview and GSview: Ghostview and GSview are for viewing and printing Postscript documents (some of the handouts for this course will be in this format). If you use Linux, then this software is already installed on your machine. On Windows, you need to download both Ghostview and Ghostscript (and also some fonts). Unzip Ghostview and run setup.exe. It will unpack and install the rest.

The gzip homepage: GNU zip for compression.


The official title of the course is Advanced Programming Languages, but you do not need to have taken CSE526 (or other graduate-level courses); the most important prerequisites are programming skills, knowledge of algorithms and data structures, and motivation for studying a systematic method for developing correct and efficient algorithms and programs.

You should learn all information on the course homepage. Check the homepage periodically for Announcements.

Do all course work. The homeworks and projects are integral parts of the course as they provide concrete experiences with the abstract ideas covered in the class.

Your handins, whether on papers or in electronic forms, should include the following information at the top: your name, course number, homework/project number, and due date.

Your work should be submitted in a neat and organized fashion; for handins on papers, if your handwriting is hard to read, then your work needs to be typed.

Your approach to solving problems is as important as your final solutions; you need to show how you arrived at your solutions and include appropriate explanations.

You are strongly encouraged to discuss with others and look up references, but you should write up your own solutions independently and credit all sources that you used. Any plagiarism or other forms of cheating will result in an F or worse.

Disability: If you have a physical, psychological, medical or learning disability that may have an impact on your ability to carry out assigned course work, please contact the staff in the Disabled Student Services office (DSS), Room 133 Humanities, 632-6748/TDD. DSS will review your concerns and determine with you what accommodations are necessary and appropriate. All information and documentation of disability are confidential.

Annie Liu