CSE 304/504: Compiler Design

Spring 2016

Homework #2

Due: Feb. 29 Mar. 2


Contents:


Read the following description carefully before starting on the assignment. Also, go through the brief overview of the project path that describes the steps involved. A list of what materials are available and what you are expected to submit are also given.

Homework Description

This assignment expects you to build a lexical analyzer and parser for Decaf. The description of Decaf's syntax is given in its manual. Your lexical analyzer and parser will be built using PLY. In this assignment, you are expected to write a "syntax checker" that uses your lexical analyzer and parser. The syntax checker takes a single command-line argument, which is a file name, and determines whether or not that file contains a syntactically valid Decaf program. The syntax checker should simply output "Yes" if the program is syntactically valid, and reasonable error message(s) if the program has syntax errors.

CSE 504 vs CSE 304:

Project Path

  1. Read through Decaf manual, and make sure you understand the overall structure of the language.

  2. Identify the set of tokens that will be recognized by your lexical analyzer. You may need to revise this set when you work on the later steps as well. It may be useful to begin with a slice of Decaf, e.g. a (rather useless) subset of the language with no constructor or method definitions, and build a lexical analyzer for the tokens in this subset. Moreover, you will find that definition of certain tokens will be significantly simpler than others; you can specify stubs for the difficult ones at first, and then come back to finish the job.

  3. In its manual, Decaf's syntax is given in Extended Backus-Naur Form (EBNF). Specifically, grammar rules in EBNF are of the form

    L ::= r1 | r2 | ... | rn

    where L is a non-terminal symbol (left hand side of a production) and each ri is a (possibly empty) right hand side of a production. In the standard notation of grammars described in class, the right hand side of a production is a sequence of terminal and non-terminal symbols. In EBNF, the right hand side may be written using a regular expression-like notation as well. For instance, the following EBNF rule represents a string of one or more a's separated by b's (e.g. ababa):
    S ::= a (b a)*

    The single rule above can be represented by the set of productions in standard notation below:

    Sa T
    T → ε
    Tb a T

    Your task in this assignment is to come up with a PLY specification for Decaf syntax specified in EBNF. PLY/yacc generates an LALR(1) parser, which means that it can handle only a subset (but a large subset) of context free grammar specifications. Hence you will need to make sure that the expanded grammar can be handled by PLY. If your specification is ambiguous, make sure that you also specify disambiguating rules unless you are sure that PLY's default behaviour is what you want.

    As you construct the PLY/yacc specification, you may revise the set of tokens you defined in step 2.

  4. Once your token set is nearly completely determined, complete the construction of the lexical analyzer using PLY/lex.

  5. Interface your lexical analyzer properly with your parser. Write glue code to take input from a specified file name (argument to your syntax checker), and make your lexical analyzer generate tokens from the contents of that file.

  6. Wrap up by adding code to write suitable messages in case of scanning or parsing errors. Make sure the error messages refer to appropriate line/character position in the input.

Hints:

Deliverables

Your syntax checker for Decaf should be organized into the following files:
  1. Lexer:    decaflexer.py, PLY/lex scanner specification file.
  2. Parser:    decafparser.py, PLY/yacc parser specification file.
  3. Main Top-Level:    decafch.py, containing the main python function to put together the parser and lexer, take input from given file, etc., and perform syntax checking.
  4. Documentation:    README.txt which documents the contents of all the other files.
Your PLY/yacc-generated parser may have shift-reduce conflicts. Although in most cases such conflicts indicate coding errors, your may have verified that the conflicts and the way PLY/yacc resolves them are ok for your syntax checker. If this is the case, make sure you explain this in the README file.

We will be looking for a submission with the four files specified above, with the specified names. Deviating from these specifications may entail loss of points.

Grading

This maximum score on this assignment will be 20, divided as follows. 7 points for token specifications for lexical analysis; 7 points for grammar specifications; 2 points for the interfaces between lexical analyzer and parser, and between these two modules and your main syntax checker program; 3 points for quality of error messages; and 1 point for documentation/program readability.

As usual, complicated programs may not get the full grade unless accompanied by comments documenting it.

Submission

Submit your homework using Git, as done for HW1, before midnight of Mon., Feb 29.

We will assume that HW2 will be submitted by the same teams that submitted HW1. If there is any change in the team structure, you must inform the instructor the Monday before the submission deadline (Mon. Feb 22) and get the submission repositories for the new teams set up.

You may submit the homework multiple times; the last submission will be graded.

Change Log


C. R. Ramakrishnan
Last modified: Mon Feb 15 22:46:22 EST 2016