CSE 304/504: Compiler Design
Spring 2016
Homework #2
Due: Feb. 29 Mar. 2
Contents:
Read the following description carefully before starting on the
assignment. Also, go through the brief overview of the project path
that describes the steps involved. A list of what materials are
available and what you are expected to submit are also given.
This assignment expects you to build a lexical analyzer and parser for
Decaf. The description of Decaf's syntax is given
in its manual. Your lexical analyzer and
parser will be
built using PLY.
In this assignment, you are expected to
write a "syntax checker" that uses your lexical analyzer and parser.
The syntax checker takes a single command-line argument, which is a
file name, and determines whether or not that file contains a
syntactically valid Decaf program. The syntax checker should
simply output "Yes" if the program is syntactically valid, and
reasonable error message(s) if the program has syntax
errors.
CSE 504 vs CSE 304:
- CSE 504 students are expected to build a syntax checker for the entire
Decaf language. Their syntax checkers are also expected to
identify multiple errors in a source program, and not just the first
error.
- CSE 304 students are expected to build a syntax checker for the
following subset of Decaf:
- Simple string constants. String constants may
themselves not contain the delimiting "
character. Thus, a string constant begins a " and ends
with a "
with no " in between.
- Simple floating point constants. Floating point
constants will be only of the first kind (page 2 of the manual): a
decimal point `.' with a sequence of one or more digits on either
side of it. That is, floating point constants will not use the
exponent notation.
- No arrays! This subset will not support the declaration of
array-valued fields and variables (page 3 of the manual), array expressions
(array_access in page 6), and array creation (new_array in page
7).
Moreover, their syntax checkers are expected to identify only the
first syntax error in a program.
- Read through Decaf manual, and
make sure you understand the overall structure of the language.
- Identify the set of tokens that will be recognized
by your lexical analyzer. You may need to revise this set when you
work on the later steps as well. It may be useful to begin with a
slice of Decaf, e.g. a (rather useless) subset of the
language with no constructor or method definitions, and build a
lexical analyzer for the tokens in this subset. Moreover, you will
find that definition of certain tokens will be significantly simpler
than others; you can specify stubs for the difficult ones at first,
and then come back to finish the job.
- In its manual, Decaf's syntax is given in Extended
Backus-Naur Form (EBNF).
Specifically, grammar rules in EBNF are of the form
L ::= r1 | r2 | ... | rn
where L is a non-terminal symbol (left hand side of a production)
and each ri is a (possibly empty) right hand side of a production.
In the standard notation of grammars
described in class, the right hand side of a
production is a sequence of terminal and non-terminal symbols. In
EBNF, the right hand side may be written using a regular
expression-like notation as well. For instance, the following EBNF
rule represents a string of one or more a's separated by
b's (e.g. ababa):
S ::= a (b a)*
The single rule above can be represented by the set of productions in
standard notation below:
S → a T
T → ε
T → b a T
Your task in this assignment is to come up with a PLY specification
for Decaf syntax specified in EBNF. PLY/yacc
generates an LALR(1) parser, which means that it can handle only a
subset (but a large subset) of context free grammar specifications.
Hence you will need to make sure that the expanded grammar can be
handled by PLY. If your specification is ambiguous, make sure that
you also specify disambiguating rules unless you are sure that PLY's
default behaviour is what you want.
As you construct the PLY/yacc specification, you may revise the set of
tokens you defined in step 2.
- Once your token set is nearly completely determined, complete
the construction of the lexical analyzer using PLY/lex.
- Interface your lexical analyzer properly with your
parser. Write glue code to take input from a specified file
name (argument to your syntax checker), and make your lexical analyzer
generate tokens from the contents of that file.
- Wrap up by adding code to write suitable messages in case of
scanning or parsing errors. Make sure the error messages refer to
appropriate line/character position in the input.
Hints:
- NOTE: To avoid running into difficulties with
conflicts in the grammar, you should add the grammar rules gradually --
as few at a time as possible. Then run through PLY/yacc to ensure that you
do not run into any conflicts. If you do run into conflicts this way,
then the problem is most likely to be in the rules that you just added.
You may start with only a part of the grammar, as long as the
portion you take is self-contained, to gain experience with the
construction process. For example, you may start with the top-level
structure of the language, going through
class definitions, field definitions and method definitions in that order.
You may also build a complete syntax checker
for a different small grammar (e.g. the set of valid propositional
boolean formulae) to figure out how the plumbing works, and then
build the checker for Decaf.
- Make sure you can read and understand the parser.out
file produced by PLY/yacc. This
file contains the productions, states and transitions used by the parser.
This file will also tell you where the conflicts are, i.e., which states
and which productions.
From this information, you should be able to make out what the offending
grammar rules are and how they should be modified. If, after several
attempts, you are unable to resolve the problem, contact the Instructor or
the TA.
Your syntax checker for Decaf should be organized into the
following files:
- Lexer: decaflexer.py, PLY/lex scanner
specification file.
- Parser: decafparser.py, PLY/yacc parser
specification file.
- Main Top-Level: decafch.py, containing the main
python function to put together the parser and lexer, take input
from given file, etc., and perform syntax checking.
- Documentation: README.txt which documents the contents of all the other
files.
Your PLY/yacc-generated parser may have shift-reduce conflicts. Although
in most cases such conflicts indicate coding errors, your may have
verified that the conflicts and the way PLY/yacc resolves them are ok for
your syntax checker. If this is the case, make sure you explain this
in the README file.
We will be looking for a submission with the four files specified
above, with the specified names. Deviating from these specifications
may entail loss of points.
This maximum score on this assignment will be
20, divided as follows. 7 points for token specifications for
lexical analysis; 7 points for grammar specifications; 2 points
for the interfaces between lexical analyzer and parser, and
between these two modules and your main syntax checker program; 3
points for quality of error messages; and 1 point for
documentation/program readability.
As usual, complicated programs may not get the full grade unless
accompanied by comments documenting it.
Submit your homework using Git, as done for HW1,
before midnight of Mon., Feb 29.
We will assume that HW2 will be submitted by the same teams that
submitted HW1. If there is any change in the team structure, you
must inform the instructor the Monday before the submission
deadline (Mon. Feb 22) and get the submission repositories for the
new teams set up.
You may submit the homework multiple times; the last submission
will be graded.
- Original posting on Mon Feb 15 22:13:15 EST 2016.
C. R. Ramakrishnan
Last modified: Mon Feb 15 22:46:22 EST 2016