CSE-304 Programming Assignment #4

Abstract Syntax Tree


Read the following description carefully before starting the assignment. Also, go through the brief overview of the project path that describes the steps involved. A list of what materials are available and what you are expected to submit are also given. Finally, make sure you understand and follow the submission requirements.


Description:

In this assignment, you will build the abstract syntax tree (AST) for the Decaf compiler. The structure of the AST is given below. You will build the AST by adding actions to the parser for Decaf. The lexical analysis and symbol table modules that you defined will also be needed.  Below, Notational conventions are defined first, followed by the definition of the AST for Decaf.


Notational Conventions

In the following, we use a term-notation to represent trees. Each node in a tree is labeled with a symbol whose arity specifies how many children (references) the node can have in any tree. Clearly, symbols that occur only at the leaves of a tree have arity of zero.

For instance, ROOT_SYM(tree1, tree2) denotes a tree that has the binary symbol ROOT_SYM as the root, with tree1 as the first child (reference) and tree2 as the second child (reference).

We denote the symbols that occur in the trees by identifiers that contain only capital letters. We denote sets of trees by identifiers that contain lower-case letters. Sets of trees are also called tree types.

We use a notation similar to context free grammars to define tree types. For instance, the notation Tt0 :: ROOT_SYM(Tt1,Tt2) means that the set of trees represented by Tt0 includes all and only those trees that can be constructed with ROOT_SYM as the root symbol and two trees of type Tt1 and Tt2 as the first and second children respectively. For example, INT_EXPR(5) is an element of the tree type INT_EXPR(Int_val)., where Int_val is the primitive type int.

As ususal, we use T* to represent a list of zero or more trees of type T and T1|T2 to represent the union of the tree sets (types) T1 and T2.

NOTE: in Java, the tree type Tt0 may be implemented as a class with constructor ROOT_SYM and with two attributes of (tree) types T1 and T2, respectively. Moreover, a tree type Tt :: T1 | T2 may be implemented by defining a "dummy" class Tt which is extended to T1 and T1.


Abstract Syntax Tree

We build abstract syntax trees only for the expressions, statements and types that occur in a program. On encountering declarations, such as class declarations, method (function) declarations and field (variable) declarations, in addition to creating abstract syntax trees and binding their entries to the corresponding entity-blocks, we update the symbol table entry for the entities declared.

We will use the info field of Entity_Block (from the symbol tables assignment) to hold the information about each symbol. Below, we first describe the AST-structure for Types, Expressions and Statements, followed by a desciption of how symbol tables are updated.

Types, Expressions and Statements

          Type :: TYPE_ERROR
                | TYPE_VOID
                | TYPE_INT
                | TYPE_BOOL
                | TYPE_FLOAT
                | TYPE_USER(Entity)
                | TYPE_ARROW(Type*, Type)
                | TYPE_UNKNOWN


    Expression :: INT_EXPR(Int_val)
                | FLOAT_EXPR(Float_val)
                | BOOL_EXPR(Bool_val)
                | STRING_EXPR(String_val)
                | ID_EXPR(Entity, Name)
                | NEW_CLASS(Entity, Expression*)
                | NEW_ARRAY(Type, Expression)
                | METHOD_EXPR(Expression, Expression*)
                | ARRAY_EXPR(Expression, Expression)
                | FIELD_EXPR(Expression, Name)
                | BINARY_EXPR(BinaryOp, Expression, Expression)
                | UNARY_EXPR(UnaryOp, Expression)
                | NULL_EXPR
                | SUPER_EXPR
                | THIS_EXPR

      BinaryOp :: BINOP_PLUS
                | BINOP_MINUS
                | BINOP_MULTIPLY
                | BINOP_DIVIDE
                | BINOP_AND
                | BINOP_OR
                | BINOP_EQUAL_EQUAL
                | BINOP_NOT_EQUAL
                | BINOP_LESSER
                | BINOP_GREATER
                | BINOP_LESSER_OR_EQUAL
                | BINOP_GREATER_OR_EQUAL
                | BINOP_ASSIGNMENT

       UnaryOp :: UNARY_MINUS
                | UNARY_NOT

     Statement :: IF_THEN_ELSE_STMT(Expression, Statement, Statement)
                | WHILE_STMT(Expression, Statement)
                | FOR_STMT(Statement, Expression, Statement, Statement)
                | BLOCK_STMT(Statement*)
                | VARIABLE_DECL_STMT(Entity*)
                | EMPTY_STMT
                | EXPRESSION_STMT(Expression)
                | RETURN_STMT(Expression)
                | BREAK_STMT
                | CONTINUE_STMT

The following tree types correspond to (primitive) types:
         Int_val == int
       Float_val == float
      String_val == String  
        Bool_val == boolean     
Tree type Entity corresponds to (references to) the entity blocks used in the symbol table module. Name is a string, represented by a String object.

Declarations

When a field (variable), method (function) or class is defined, the symbol table is updated with the appropriate information. For VarDecl, the list of entities declared by this statement are entered in the AST -- see VARIABLE_DECL_STMT(Entity*) above. For other declarations, the AST is simply the list of entities that were declared: Declaration :: Entity*.

When a class is defined, an entity of kind CLASS_ENTITY must be created. The relevant information that must be stored with a class entity are: When a method is defined, an entity of kind METHOD_ENTITY must be created. The relevant information for a method entity are: The return type of constructor methods is the class in which the method is defined. For instance, constructor methods in class A return objects of type TYPE_USER(A).

When a variable is defined, an entity of kind VARIABLE_ENTITY must be created. The relevant information stored with a variable entity are: The other fields in class, method and variable entities will be filled in and used in subsequent phases of compilation; you need not worry about them for the current project.

Block Structure

While parsing, whenever we see a {, we enter a new scope; similarly, we leave the current scope and re-enter the enclosing one upon encountering }. You must invoke enter_scope() and leave_scope() at the appropriate times when parsing a program, to ensure that the symbol table is properly maintained.

Error handling

A nominal amount of error handling, such as detecting duplicate class and field (variable) definitions, is expected. Do not attempt to check for type-correctness at this stage; that will be the next assignment. Your program is expected to continue parsing and constructing the AST even after errors have been detected.

Project Path

To create the AST you will have to use jjtree. To generate only the appropriate nodes, rename them appropriately, insert actions and proceed as discussed in the class. You may also want to look at a the examples provided by the jjtree package. To define the appropriate fields inside the nodes, you have to edit the classes generated by jjtree and javacc. The number of children is already stored in each node and accessible via the jjtree node interface. The symbol table, has to be declared at the top level of your program, such that it is accessible to the actions.
 


Available Material


Deliverables

The handin must contain:

Submission Requirements

Please note the following submission requirements. These are not simply conventions, but are requirements. They make grading of the projects easier, and lets us spend more time wading through the code to award partial credit! They are not difficult to follow, so failure to do so will result in severe penalties (project may not be graded!). If you have trouble with the set-up, sen me an email.



Radu Grosu