CSE 304/504: Compiler Design

Spring 2016

Homework #4, Type Checking & Name Resolution

Due: April 4

Homework Description (Reference: Decaf manual, in PDF)
Project Path
Deliverables
Grading
Submission
Change Log

Read the following description carefully before starting on the assignment. Also, go through the brief overview of the project path that describes the steps involved. A list of what materials are available and what you are expected to submit are also given.

Homework Description

In this assignment, you will build a type checker for Decaf. Recall the description of Decaf's syntax is given in its manual, and the AST you built in HW #3. The type checker you will build in this assignment will:

Add type information to AST
Resolve names used in field access and method invocation expressions

As in the past, you will produce a compiler front end that will take a single command-line argument, which is a file name; if that file contains a syntactically valid Decaf program, then it will construct an AST for that program. Upon successful construction of the AST, it check the type constraints as outlined in this handout (see section on type checking), and resolve any names that could not be resolved when the AST was constructed (see section on name resolution). Any errors found during type checking and name resolution should be reported with sufficient detail (including the nature of the error and the line numbers in the source program corresponding to the error). If type checking and name resolution proceeded without any error, it will then print the AST decorated with types to standard output (see section on printing).

Changes to the AST Structure
Type Checking Constraints
Name Resolution
Additional Error Checks
Printing the AST
Initialization
CSE 504 vs. CSE 304

Changes to AST Structure:

The AST structure remains mostly unchanged from HW #3. The changes are as follows. (AST structures not mentioned below are unchanged).

Type Record:

From HW #3:

Each type is either an array type are an elementary type. Elementary types, in turn, may be one of built-in types: int, float, boolean, or string; or a user-defined type, which is a class name.
An array type is parameterized by its "base type", which is another type.

To the above, we add void as a built-in elementary type.
This was an omission in HW#3.
We add a class-literal type, which has a class name as a parameter. While user(A) denotes an object belonging to class "A", class-literal(A) denotes the class "A" itself.
The use of class-literal types will be described when type constraints for expressions are explained (see section type checking below).
We also add error and null as a built-in elementary type to indicate type errors and null object references (respectively).

Subtype Relation:

The type checker will rely on a subtype relation defined over types. This relation follows the discussion in class, with one addition: for the purposes of type checking, we will consider int to be a subtype of float. In summary:

Type T is a subtype of itself (i.e. the subtype relation is reflexive).
int is a subtype of float.
user(A) is a subtype of user(B) if A is a subclass of B.
null is a subtype of user(A) for any class A.
array(T1) is a subtype of array(T2) if T1 is a subtype of T2.
null is a subtype of array(T) for any type T.
class-literal(A) is a subtype of class-literal(B) if A is a subclass of B.

Changes to Expressions:

Three expressions--- field access, method invocation, and new object creation--- will all have a new piece of information in the AST denoting the actual field/method/constructor accessed in this expression. This information is determined by name resolution. For instance, consider a field access expression e.x. With name resolution, (in the absence of any errors), we will be able to determine that the expression accesses field with id j (note field id's are unique). Then the field access expression e.x will contain information that the expression refers to field j. Similarly, with name resolution, we will determine the unique method or constructor used in a method invocation or new object creation expressions (respectively).

Type Checking Constraints

Statement Record:

Each statement record will now have an additional field indicating whether or not the statement is type correct (i.e. has no type errors).

If-stmt: This statement is type correct if if the condition part is of boolean type; and the "Then" and "Else" parts are type correct.
While-stmt: This statement is type correct if if the condition part is of boolean type; and the loop body is type correct.
For-stmt: This statement is type correct if if the condition part is of boolean type; and the initializer expression, update expression and the loop body are all type correct.
Return-stmt: This statement is type correct if the type of the expression matches the declared return type of the current method.
- If the expression is empty, then the declared return type of the method must be void (and vice versa).
- If the expression is not empty, then its must be type correct, and its type must be a sub-type of the declared return type of the method.
Expr-stmt: This statement is type correct if the expression is type correct.
Block-stmt: This statement is type correct if all the statements in the sequence are type correct.

Expression Record:

Each expression record will now have an additional field indicating its type.

Constant-expressions:
- Integer-constant: has type int.
- Float-constant: has type float.
- String-constant: has type string.
- True and False have type boolean.
- Null has type null.
Var-expression: has, as its type, the declared type of the corresponding variable.
Unary-expressions:
- uminus e: has the same type as e if e's type is int or float; has type error otherwise.
- neg e: has type boolean if e's type is boolean; has type error otherwise.
Binary-expressions:
- Arithmetic operations add, sub, mul, div: have type int if both operands have type int; otherwise, have type float if both operands have type int or float; otherwise have type error.
- Boolean operations and, or: have type ~~int~~ boolean if both operands have type boolean; otherwise have type error.
- Arithmetic comparisons lt, leq, gt, geq: have type boolean if both operands have type int or float; otherwise have type error.
- Equality comparisons eq, neq: have type boolean if the type of one operand is the subtype of another; otherwise have type error.
Assign-expression: Consider e1 = e2. The type of this expression is e2's type, provided:
- e1 and e2 are type correct,
- e2's type is a subtype of e1's type.
Auto-expression: has the same type as its argument expression e if e's type is int or float; has type error otherwise.
Field-access-expression: Let p.x be the field access expression, and let z be the field obtained by name resolution. Then this expression's type is same as z's declared type, provided:
- p's type is user(A), and z is a non-static field.
- p's type is class-literal(A) and z is a static field.
This expressions type is error if name resolution had errors or the above conditions do not hold.
Method-call-expression: Let p.f(e1, ..., en) be the method access expression, and let h be the method obtained by name resolution. Then this expression's type is same as h's declared return type, provided:
- p's type is user(A), and h is a non-static method.
- p's type is class-literal(A) and h is a static method.
This expressions type is error if name resolution had errors or the above conditions do not hold.
New-object-expression: Let A(e1,...,en) be the constructor and arguments in this expression. The type of this expression is user(A) if name resolution succeeds.
This-expression: has type user(A) where A is the current class.
Super-expression: has type user(B) if B is the immediate superclass of the current class; and error if the current class has no superclass.
Class-reference-expression: has type class-literal(A) where A is the literal class name in this expression.
Array-access-expression:^{504 only} has type T if the base expression's type is array(T) and the index expression's type is int; has type error otherwise.
New-array-expression:^{504 only} Let T be the base type and e1,...,en be the sequence of array dimensions. Then, this expression has type array(array(...repeated n times (T)...)), provided each expression ei is of type int. This expression has type error if any of the dimension expressions is not of int type.

Name Resolution:

Decaf resolves overloaded names statically, at compile-time, using type information.

Note that all local variables are resolved while constructing the AST. The only names that are not resolved at that time are the fields, methods and constructors.

Fields:

Consider a field access expression of the form e.x for some base expression e.

If e's type is user(A): Then the expression resolves to field with id j if B is the most specific superclass of A such that
- B has a declared non-static field named x with id j,
- and the field is accessible (based on the public/private declarations) from the current method.
If e's type is class-literal(A): same as above except B has a static field named x.

Methods:^{Simpler for 304}

Consider a field access expression of the form e.f(e1,...,en) for some base expression e.

Let the types of e1,...,en be T1,...,Tn, and all the argument expressions are type correct.

If e's type is user(A): Then the expression resolves to method with id j if B is the most specific superclass of A such that
- B has a declared non-static method named f with id j such that
- the method j is applicable: i.e., the types of the formal parameters of that method are T1',...,Tn' and each Ti is a subtype of Ti',
- there is no other non-static method named f in B that is also applicable and has formal parameter types T1'',...,Tn'' that are not all subtypes of T1',...,Tn'.
- and the method is accessible (based on the public/private declarations) from the current method.
If e's type is class-literal(A): same as above except B has a static method named f.

For CSE 304 students: note that you do not have to deal with method overloading. Hence, you need to find the applicable method, and if successful, whether it is accessible.

Constructors:^{Simpler for 304}

Consider a new object creation expression of the form A(e1,...,en) for some class A.

Let the types of e1,...,en be T1,...,Tn, and all the argument expressions are type correct.

The constructor resolved for this expression is the constructor with id j declared in class A such that:

the constructor j is applicable: i.e., the types of the formal parameters of that constructor are T1',...,Tn' and each Ti is a subtype of Ti',
there is no other constructor in A that is also applicable and has formal parameter types T1'',...,Tn'' that are not all subtypes of T1',...,Tn'.
and the constructor is accessible (based on the public/private declarations) from the current method.

For CSE 304 students: note that you do not have to deal with constructor overloading. Hence, you need to find the applicable constructor, and if successful, whether it is accessible.

Additional error checks:504 only

While performing type checking, when an error is encountered, you are expected to print out information about the error (e.g. "Type error at line... int expression found when boolean is expected."). In addition, you should detect the following errors related to overloaded methods and constructors.

Note that there may be multiple methods (and, of course, constructors) with the same name defined in a single class. For overloaded methods/constructors, we will now impose an additional constraint. For every pair of methods with the same name (or constructors) defined in a single class:

Let T1 → S1 and T2 → S2 be their signatures. Then either T1 is a strict subclass of T2 or T2 is a strict subclass of T1.
Here, we say that T is a strict subclass of T' if T is a subclass of T', and T is not identical to T'.

Your type checker is expected to check the above constraint and report any errors.

Printing the AST

AST printing will follow the same form as in HW #3 with the following changes:

Each field access, method invocation and new object creation expression will contain additional information: the id of the resolved field/method/constructor respectively. For instance, instead of
```
Field-access(This, x)
```
you should print
```
Field-access(This, x, 1)
```
(if "1" is the id of the resolved field named "x").
Each assignment expression will additionally contain the types of the LHS and RHS expressions. Note this is only for assignment expressions, and not for others. For instance, instead of
```
  Expr( Assign(Variable(1), Method-call(This, f, [])) )
```
you should print
```
  Expr( Assign(Variable(1), Method-call(This, f, []), int, int) )
```

Initialization

Same as in HW #3.

CSE 504 vs CSE 304

CSE 504 students are expected to construct a type checker for the entire Decaf language.
CSE 304 students are expected to construct a type checker for the a subset of Decaf that is the whole language except the following:
- Arrays: there are no arrays in the CSE 304 subset.
- Implicit field accesses: all field accesses in the CSE 304 subset will specify the object explicitly.
- Implicit method calls: all method calls in the CSE 304 subset will specify the object explicitly.
- No overloaded methods/constructors: Assume that in each class, there is at most one constructor, and at most one method with a given name. Also assume that names of methods defined in a class are distinct from method names in its super classes. You do not need to check for these constraints. Your type checker can safely assume these are true!.
- The above assumption leads to a significant simplification in name resolution. This also eliminates additional error checks.

Project Path

Start with a working AST builder.
Assume no method invocations, constructors or field accesses; Write a type checker that works on this restricted subset of programs. You may want to work with even smaller subsets than these.
Proceed by adding field accesses first, and finally method invocations and constructors.

Deliverables

Your front-end for a Decaf compiler should be organized into the following files:

Lexer: decaflexer.py, PLY/lex scanner specification file. (unchanged from HW2)
Parser: decafparser.py, PLY/yacc parser specification file. (unchanged from HW3)
AST: ast.py, table and class definitions for Decaf's AST (modified from HW3 to add type-related fields and definitions).
Type Checker: typecheck.py, definitions for evaluating the type constraints and for name resolution.
Note: This may be folded into ast.py as long as readability is maintained.
Main Top-Level: decafch.py, containing the main python function that ties the modules together, takes command line argument of input file name, and processes the Decaf program in that file.
Documentation: README.txt which documents the contents of all the other files.

We will be looking for a submission with the five/six files specified above, with the specified names. Deviating from these specifications may entail loss of points.

Grading

This maximum score on this assignment will be 40, divided as follows: 2 points for correct type inference for simple expressions (those that do not have other expressions as parts), 4 points for other simple expressions (binary/unary/auto) 8 points for correct resolution of field names and typechecking of field expressions; 8 points for correct resolution of method names and typechecking of method invocations; 4 points for correct resolution of constructors and typechecking of new object creation expressions; 8 points for correct typechecking of other expressions and statements; 4 points for error reporting; and 2 points for documentation/program readability.

As usual, complicated programs may not get the full grade unless accompanied by comments documenting it.

Submission

Submit your homework using Git, as done for HW1, before midnight of Mon., April 4.

We will assume that HW4 will be submitted by the same teams that submitted HW3. If there is any change in the team structure, you must inform the instructor the Friday before the submission deadline (Fri. April 1) and get the submission repositories for the new teams set up.

You may submit the homework multiple times; the last submission will be graded.

Change Log

Tue Mar 29 20:08:43: Types for "and" and "or" binary operators were incorrectly specified in the original handout. Expressions with these operators at root should be boolean, not int!

C. R. Ramakrishnan

Last modified: Tue Mar 29 20:10:00 EDT 2016