CSE 304: Compiler Design

[Announcements][Projects]



Instructor: Radu Grosu ( grosu@cs.sunysb.edu),
Class: Mon/Wed 3:50pm-5:10pm, SB union 236 West campus
Office hours: Mon/Wed 2:30pm-3:30pm, CS Building room 1425, and by appointment

TA: Zhengyu Su ( < zhsu@cs.stonybrook.edu>),
Office hours: Tue 2:30pm-3:30pm, CS Building room 2110


Contents


Motivation

Compilers and interpreters are very widely used. Every programmer uses these tools to execute programs written in high level languages. More importantly, every programmer writes an interpreter whenever it has to design a new software tool and a compiler whenever it has to connect two existing software tools. The latter is often the first task a programmer gets assigned when joining a company. Compiler technology is also a core technology for cybersecurity and embedded systems. Last but not least, it teaches you best software engineering practice and how automata theory can be put to work.

It is hence essential for a computer scientist to understand the process by which programs written in high-level languages are translated and executed. The main objective of this course is to gain an in depth understanding of the above process. Most complex software systems are not monolithic; they are programmable using special purpose languages. An understanding of the language translation process thus plays an important role in the design of real life systems.


Objectives

By the end of this course the students will be provided with detailed knowledge and substantial experience in:

  • The major phases of compilation, particularly lexical analysis, parsing, semantic analysis, and code generation.
  • The use of formal attributed grammars for specifying the syntax and semantics of programming languages, and their impact on compiler design.
  • Software engineering by designing and implementing a significant portion of a compiler for a language chosen by the instructor.

  • Prerequisites

    You must have completed CSE 214, CSE 220, and CSE 303.

    Being a project intensive course, a good working experience with Java programming and the UNIX environment (not just familiarity) is essential. You will also need to know how to set up "make" files and use symbolic debuggers. At the very least, you should be able to pick these up within the first couple of weeks.

    Familiarity with concepts from object oriented languages (classes, inheritance etc.) will be useful but not necessary: I will briefly cover the fundamental concepts we will need for the projects.


    Overview

    In the class, we will discuss the theoretical aspects of designing a compiler. In the projects, you will then apply the theory you have learned in the class to develop a complete compiler for a high level language. The problem of language translation is traditionally decomposed into many phases. Most common are: All phases use a symbol table to keep track of the properties (types, number of parameters etc.) of each symbol (variables, procedures etc.). In the end, code is generated for an abstract machine which may be interpreted (as in Java), or may be translated further into machine instructions (as in C). We will cover each of these topics in detail.

    The compiler will be written in Java.  Advanced topics such as program optimization, machine code generation and storage management will be discussed in the lectures.


    Reading

    The course will use the textbook titled  Compilers: Principles, Techniques, and Tools, by Aho, Lam, Sethi, and Ullman, Addison-Wesley, 2007, 1009pp. ISBN 0-321-48681-1.  Slides will be posted as links in the tentative schedule below.

    Following books are recommended for additional reading:


    Software

    We will use the Java programming language and the JavaCC compiler compiler.  Documentation and free download may be found at the associated web pages. In doing the projects you might want to use the  JBuilder javacc support plugin. It is very small and easy to install and does syntax highlighting. I also displays the .jj and .jjt file structure in the structure pane.
     


    Communication

    This class is project intensive and you may want to exchange ideas and experience with each other. The best way to do this is by using the news group created specially for cse304. To subscribe to this group, proceed as follows (e.g. with  the netscape browser): (1) right click the news.sunysb.edu, (2) select subscribe to Newsgroups, (3) choose the search panel and search for sbcs.cse304, (4) select the sbcs.cse304 newsgroup and click on the subscribe button. Now you are ready to send and retrieve messages.


    Grading

    Your performance on the programming projects will be the major part of the final grades. In addition, there will be a mid-term exam and a final exam. The relative weights for the above components
    will be: The grade for projects is split between programming assignments (30%) and project presentation (10%). Near the end of the semester, each student will make a presentation of a particular portion of his/her project work to the instructor and TAs. The portion of project to be presented will be determined by the instructor and the TAs at that time; the student is expected to come prepared on all aspects of his/her code. The motivation for the presentation is to guage the understanding and insight the student had obtained by performing the project. There will be 6 projects in all. Students will work in teams of two on the projects. Both members of the team are expected to know the code.

    You are advised to start working on the projects at the earliest possible time even if the deadlines are far away. The date the projects are due will be clearly specified in the project handout, on the web; any changes in deadlines will be posted on the course homepages.

    Projects are due by 11:59pm on the specified date. Late projects will be penalized at the rate of 20% per day. So, there is no point submitting a project more than 4 days late! Each team can submit one project late by seven calendar days (1 week) without late penalty. You don't need to send me mail about this. No extensions will be allowed beyond this.


    Your Rights and Responsibilities


    Special Needs

    If you have a physical, psychological, medical or learning disability that may impact on your ability to carry out assigned course work, you are urged to contact the staff in the Disabled

    Student Services office (DSS), Room 133 Humanities, 632-6748/TDD. DSS will review your concerns and determine, with you, what accommodations are necessary and appropriate.

    All information and documentation of disability is confidential.

    The Importance of Being Earnest

    Because a primary goal of the course is to teach professionalism, any academic dishonesty will be viewed as evidence that this goal has not been achieved. Any act of cheating will be treated with utmost seriousness.

    You can discuss the course material with other students, but not the homework assignments themselves. In effect, you can discuss the problems but not the solutions. If you help another student with a homework, use examples that do not resemble those in the homework. Remember that there are many different ways to solve the same problem; even solutions with the same central idea can be formulated in many different ways. Therefore, suspiciously similar homework solutions will be considered as evidence of disallowed collaboration or copying.

    In case you have any questions about whether an act of collaboration may constitute "cheating", please come and talk to the instructor beforehand to clarify the issue.

    Copying an assignment from another student in this class or obtaining a solution from some other source will lead to an automatic F for this course and to a disciplinary action. Allowing another student to copy one's work will be treated as an act of academic dishonesty, leading to the same penalty as copying. You should learn how to protect your data. Failure to do so is also unprofessional and it may expose you to the danger that someone will copy your homework and will submit it as his or her own (see above). In this case, you may be given a score of 0 for the assignment in question (and the other party will get an F).

     All cases of academic dishonesty will be reviewed by the Engineeing College's committee (CASA).

    Survival Tips

    Do not postpone working on assignments. Start working on programming assignments as soon as they are handed out. Do not wait till the day before the deadline. You will see that assignments take much more time when you work on them under pressure, than when you are more relaxed. Remember that no late submissions are allowed.

    Do not postpone working on assignments! This cannot be understated. Despite the above warning, most students will end up working only around the deadline. Remember, the homeworks usually take more time that it initially appears. Furthermore, I expect both the TA and me to be swamped on the office hours before projects are due. So, you, being wiser than the rest, should start earlier and beat the rush!

    Design before you code. Writing a well designed piece of code is always easier than staring with some code that "almost works" and adding patches to make it "really work".

    Follow good programming discipline. Why spend hours debugging when you can party? A little bit of thought before you do some esoteric pointer manipulation will save you a lot of time, headache and may be a grade!


    Tentative Schedule


     
      Date Topic Chapter Projects
    1. Aug, 29 Organization. Overview of Compilation.  l1 1  
    2. Aug, 31 Building a simple compiler 2  
    *** Sep 5 Rosh Hashanah (no class)    
    3. Sep 7 Building a simple compiler 2  
    4. Sep 12 Tokens, Regular exp.  l2 3.1-3.3  
    5. Sep 14 Recognizers, NFA, DFA   l4, l5 3.4, 3.6
    6. Sep 19 Regexp2NFA, NFA2DFA, Lex.  l5 3.5, 3.7,3.8  
    7. Sep 21 Grammars, Recursive Descent Parsing.  l6, l7 4.4  
    8.  Sep 26 Javacc 1 2 3 4 5 6 7 8 9 10, LL Parsing.   l8, l9 4.4  P1 Out  
    9. Sep 28 Bottom-up Parsing,   l10 4.5
    10. Oct 3 LR Parsers,   l11 4.7  
    11. Oct 5 Item set construction, 1234 SLR, LR, LALR,   l12 4.7  
    12. Oct 10 Attributes,   l16, l17 4.9 P1 due, P2 out  
    *** Oct 12 Yom Kipur (no class)    
    18. Oct 17 Syntax-Directed Translation,   l15,  att 5.1
    14. Oct 19 Symbol Tables,   l13,st 7.6  
    15. Oct 24 Name Resolution,    l14, l15 -- P2 due, P3 out  
    16. Oct 26 Review for Mid-Term Exam    
    17. Oct 31 Mid-Term Exam.     Previous:00e,00a;98e,98a;97e,97a,  02    
    18. Nov 2 Abstract Syntax Trees l18,  ast 5.2
    19. Nov 7 Interpreters l18 2.8, 2.9  P3 due, P4 out  
    20. Nov 9 Runtime storage organization l23 7.2, 7.3  
    21. Nov 14 Types: type checking expressions and operations l19 6.1, 6.2  
    22. Nov 16 Types & Equivalences l20 6.3, 6.4  
    23. Nov 21 Types in OO-languages: method resolution l21 6.5 P4 due, P5 out  
    24. Nov 23 Code generation: expressions l22  8.1-8.3
    25. Nov 28 Runtime Storage Organization l23,   pCall 8.4-8.7  
    26. Nov 30 Code generation: control structures l24  10.1-10.3  
    27. Dec 5 Code generation: continuations, optimization l25  10.5 P5 due
    30. Dec 14 Final Exam (5-7:30 pm)  Previous: 98e,98a;97e,97a01    


    Last updated on 2011 by Radu Grosu