CSE-304 Programming Assignment #4
Abstract Syntax Tree
Read the following description carefully before starting the assignment.
Also, go through the brief overview of the project path
that describes the steps involved. A list of what materials
are available and what you are expected to submit
are also given. Finally, make sure you understand and follow the
submission requirements.
Description:
In this assignment, you will build the abstract syntax tree (AST) for the
Decaf compiler. The structure of the AST is given below. You will
build the AST by adding actions to the parser for Decaf. The lexical
analysis and symbol table modules that you defined will also be needed.
Below, Notational conventions are defined first,
followed by the definition of the AST for Decaf.
Notational Conventions
In the following, we use a term-notation to represent trees.
Each node in a tree is labeled with a symbol whose arity
specifies how many children (references) the node can have
in any tree. Clearly, symbols that occur only at the leaves of a tree have
arity of zero.
For instance, ROOT_SYM(tree1, tree2)
denotes a tree that has the binary symbol ROOT_SYM as the root,
with tree1 as the first child (reference) and tree2
as the second child (reference).
We denote the symbols that occur in the trees by identifiers that contain
only capital letters. We denote sets of trees by identifiers that contain
lower-case letters. Sets of trees are also called tree types.
We use a notation similar to context free grammars to define tree types.
For instance, the notation Tt0 :: ROOT_SYM(Tt1,Tt2)
means that the set of trees represented by Tt0 includes
all and only those trees that can be constructed with ROOT_SYM as
the root symbol and two trees of type Tt1 and Tt2
as the first and second children respectively. For example, INT_EXPR(5) is an element of the tree type INT_EXPR(Int_val)., where Int_val
is the primitive type int.
As ususal, we use T* to represent a list of zero or more trees
of type T and T1|T2 to represent the
union of the tree sets (types) T1 and T2.
NOTE: in Java, the tree type Tt0 may be implemented as a class
with constructor ROOT_SYM and with two attributes of (tree) types T1
and T2, respectively. Moreover, a tree type
Tt :: T1 | T2 may
be implemented by defining a "dummy" class Tt which is extended to T1
and T1.
Abstract Syntax Tree
We build abstract syntax trees only for the expressions, statements
and types that occur in a program. On encountering declarations, such as class
declarations, method (function) declarations and field (variable) declarations,
in addition to creating abstract syntax trees and binding their entries to the
corresponding entity-blocks, we update the symbol table
entry for the entities declared.
We will use the info field of Entity_Block
(from the symbol tables assignment) to hold the information about each symbol.
Below, we first describe the AST-structure for Types, Expressions and
Statements, followed by a desciption of how symbol tables are updated.
Types, Expressions and Statements
Type :: TYPE_ERROR
| TYPE_VOID
| TYPE_INT
| TYPE_BOOL
| TYPE_FLOAT
| TYPE_USER(Entity)
| TYPE_ARROW(Type*, Type)
| TYPE_UNKNOWN
Expression :: INT_EXPR(Int_val)
| FLOAT_EXPR(Float_val)
| BOOL_EXPR(Bool_val)
| STRING_EXPR(String_val)
| ID_EXPR(Entity, Name)
| NEW_CLASS(Entity, Expression*)
| NEW_ARRAY(Type, Expression)
| METHOD_EXPR(Expression, Expression*)
| ARRAY_EXPR(Expression, Expression)
| FIELD_EXPR(Expression, Name)
| BINARY_EXPR(BinaryOp, Expression, Expression)
| UNARY_EXPR(UnaryOp, Expression)
| NULL_EXPR
| SUPER_EXPR
| THIS_EXPR
BinaryOp :: BINOP_PLUS
| BINOP_MINUS
| BINOP_MULTIPLY
| BINOP_DIVIDE
| BINOP_AND
| BINOP_OR
| BINOP_EQUAL_EQUAL
| BINOP_NOT_EQUAL
| BINOP_LESSER
| BINOP_GREATER
| BINOP_LESSER_OR_EQUAL
| BINOP_GREATER_OR_EQUAL
| BINOP_ASSIGNMENT
UnaryOp :: UNARY_MINUS
| UNARY_NOT
Statement :: IF_THEN_ELSE_STMT(Expression, Statement, Statement)
| WHILE_STMT(Expression, Statement)
| FOR_STMT(Statement, Expression, Statement, Statement)
| BLOCK_STMT(Statement*)
| VARIABLE_DECL_STMT(Entity*)
| EMPTY_STMT
| EXPRESSION_STMT(Expression)
| RETURN_STMT(Expression)
| BREAK_STMT
| CONTINUE_STMT
The following tree types correspond to (primitive) types:
Int_val == int
Float_val == float
String_val == String
Bool_val == boolean
Tree type Entity corresponds to (references to) the entity blocks used in the symbol
table module. Name is a string, represented by a String object.
Declarations
When a field (variable), method (function) or class is defined, the symbol
table is updated with the appropriate information. For VarDecl, the list of
entities declared by this statement are entered in the AST --
see VARIABLE_DECL_STMT(Entity*) above.
For other declarations, the AST is simply the list of entities that
were declared: Declaration :: Entity*.
When a class is defined, an entity of kind CLASS_ENTITY must be
created. The relevant information that must be stored with a class entity are:
- its visibility (public/private/static),
- a pointer to the super-class (if the class declaration extends another class), and
- a list of pointers to the entities corresponding to all fields and methods defined in the class.
When a method is defined, an entity of kind METHOD_ENTITY must be
created. The relevant information for a method entity are:
- its visibility (public/private/static),
- list of entities that correspond to the method's formal parameters,
- the type of its return value (i.e., the AST corresponding to its type), and
- the AST of the statements in the body of the method.
The return type of constructor methods is the class in which the method is
defined. For instance, constructor methods in class A return objects
of type TYPE_USER(A).
When a variable is defined, an entity of kind VARIABLE_ENTITY must
be created. The relevant information stored with a variable entity are:
- its visibility (public/private/static),
- its type (i.e., the AST corresponding to its type),
The other fields in class, method and variable entities will be filled in
and used in subsequent phases of compilation; you need not worry about them
for the current project.
Block Structure
While parsing, whenever we see a {, we enter a new scope; similarly,
we leave the current scope and re-enter the enclosing one upon encountering
}. You must invoke enter_scope() and leave_scope()
at the appropriate times when parsing a program, to ensure that the symbol
table is properly maintained.
Error handling
A nominal amount of error handling, such as detecting duplicate class
and field (variable) definitions, is expected. Do not attempt to
check for type-correctness at this stage; that will be the next
assignment. Your program is expected to continue parsing and constructing the
AST even after errors have been detected.
Project Path
To create the AST you will have to use jjtree. To generate only
the appropriate nodes, rename them appropriately, insert
actions and proceed as discussed in the class. You may also want to look at a
the examples provided by the jjtree package. To define the appropriate fields
inside the nodes, you have to edit the classes generated by jjtree and
javacc. The number of children is already stored in each node and accessible via
the jjtree node interface. The symbol table, has to be declared at the top level
of your program, such that it is accessible to the actions.
Available
Material
- Sample input files in* are in ~cse304/pub/ast/tests.
Deliverables
The handin must contain:
- a jjtree file that can be compiled with make file and run.
- the main program should dump at the end the created AST.
Submission Requirements
Please note the following submission requirements. These are not simply
conventions, but are requirements. They make grading of the projects
easier, and lets us spend more time wading through the code to award partial
credit! They are not difficult to follow, so failure to do so will result
in severe penalties (project may not be graded!). If you have trouble with
the set-up, sen me an email.
- The Makefile must be such that it can successfully recompile your program
with the files you submit.
- You will not be submitting .class files for any class. You
are strongly encouraged to get your Makefile to such a shape that your classes
can be successfully compiled in any directory using only the Makefile
itself, and those source files that you have written. That is, instead of
copying the common modules, change you Makefile to refer to the common files
directly. Again, if you have any trouble with Makefile (how they work etc.
etc.) done hesitate to ask me.
- Make sure that there are no absolute references in your Makefile to
files in your own directory. Remember: the files you submit are packaged
and sent to a grading account; so the files you submit must be capable of
building a stand-alone system.
- Here is a suggestion for making sure that you submit all the necessary
files and that `make' will work.
$ mkdir Handin
$ ...copy all the files to be submitted into Handin ...
$ cd Handin
$ make # check that all goes well; if so, proceed
$ make clean # remove unnecessary .class files
$ rm DecafParser # as well as the executable
$ handin ast *
If your submission is in multiple directories, there MUST be a Makefile
at the top level, and it must generate the executable DecafParser also at
the top level.
Please note: The Makefile MUST SUCCESSFULLY MAKE DecafParser
WITHOUT REQUIRING ANY ADDITIONAL FILES. OTHERWISE, YOUR PROJECT MAY NOT BE
GRADED!
Radu Grosu