CSE 304/504: Compiler Design
Spring 2016
Homework #4, Type Checking & Name Resolution
Due: April 4
Contents
Read the following description carefully before starting on the
assignment. Also, go through the brief overview of the project path
that describes the steps involved. A list of what materials are
available and what you are expected to submit are also given.
In this assignment, you will build a type checker for Decaf.
Recall the description of Decaf's syntax is given
in its manual, and the AST you built in HW #3. The type checker you will build in this
assignment will:
- Add type information to AST
- Resolve names used in field access and method invocation
expressions
As in the past, you will produce a compiler front end that
will take a single command-line argument, which is a
file name; if that file contains a
syntactically valid Decaf program, then it will
construct an AST for that program. Upon successful construction of
the AST, it check the type constraints as outlined in this handout
(see section on type checking),
and resolve any names that could not be resolved when the AST was
constructed (see section on name resolution).
Any errors found during type checking and name resolution should be
reported with sufficient detail (including the nature of the error and
the line numbers in the source program corresponding to the error).
If type checking and name resolution proceeded without any error,
it will then print the AST decorated with types to standard output
(see section on printing).
- Changes to the AST Structure
- Type Checking Constraints
- Name Resolution
- Additional Error Checks
- Printing the AST
- Initialization
- CSE 504 vs. CSE 304
Changes to AST Structure:
The AST structure remains mostly unchanged from HW #3. The changes
are as follows. (AST structures not mentioned below are unchanged).
Type Record:
From HW #3:
Each type is either an array
type are an elementary type. Elementary types, in turn,
may be one of built-in types: int, float,
boolean, or string; or a user-defined type, which is a class
name.
An array type is parameterized by its "base type", which
is another type.
Subtype Relation:
The type checker will rely on a subtype relation defined over types.
This relation follows the discussion in class, with one addition: for
the purposes of type checking, we
will consider int to be a subtype of float. In
summary:
- Type T is a subtype of itself (i.e. the subtype relation
is reflexive).
- int is a subtype of float.
- user(A) is a subtype of
user(B) if A is a subclass of
B.
- null is a subtype of user(A)
for any class A.
- array(T1) is a subtype of
array(T2) if T1 is a subtype of
T2.
- null is a subtype of array(T)
for any type T.
- class-literal(A) is a subtype of
class-literal(B) if A is a subclass of
B.
Changes to Expressions:
Three expressions--- field access, method invocation, and new object
creation--- will all have a new piece of information in the AST
denoting the actual field/method/constructor accessed in this
expression. This information is determined by name resolution. For instance, consider a
field access expression e.x. With name resolution, (in the
absence of any errors), we will be
able to determine that the expression accesses field with id j
(note field id's are unique). Then the field access expression
e.x will contain information that the expression refers to
field j. Similarly, with name resolution, we will determine
the unique method or constructor used in a method invocation or new
object creation expressions (respectively).
Type Checking Constraints
Statement Record:
Each statement record will now have an additional field indicating
whether or not the statement is type correct (i.e. has no type errors).
- If-stmt: This statement is type correct if
if the condition part
is of boolean type; and the "Then" and "Else" parts are type correct.
- While-stmt: This statement is type correct if
if the condition part
is of boolean type; and the loop body is type correct.
- For-stmt: This statement is type correct if
if the condition part
is of boolean type; and the initializer expression, update
expression and the loop body are all type correct.
- Return-stmt: This statement is type correct if
the type of the expression matches the declared return type of the
current method.
- If the expression is empty, then the declared return type of
the method must be void (and vice versa).
- If the expression is not empty, then its must be type
correct, and its type must be a sub-type of the declared
return type of the method.
- Expr-stmt: This statement is type correct if the
expression is type correct.
- Block-stmt: This statement is type correct if all the
statements in the sequence are type correct.
Expression Record:
Each expression record will now have an additional field indicating
its type.
- Constant-expressions:
- Integer-constant: has type int.
- Float-constant: has type float.
- String-constant: has type string.
- True and False have type boolean.
- Null has type null.
- Var-expression: has, as its type, the declared type of
the corresponding variable.
- Unary-expressions:
- uminus e: has the same type as e if
e's type is int or float; has type
error otherwise.
- neg e: has type boolean if
e's type is boolean; has type
error otherwise.
- Binary-expressions:
- Arithmetic operations add, sub,
mul, div: have type int if both
operands have type int; otherwise, have type float
if both operands have type int or float; otherwise
have type error.
- Boolean operations and, or:
have type
int boolean if both
operands have type boolean; otherwise
have type error.
- Arithmetic comparisons lt, leq, gt,
geq: have type boolean if both
operands have type int or float; otherwise
have type error.
- Equality comparisons
eq,
neq:
have type boolean if the type of one operand is the
subtype of another; otherwise
have type error.
- Assign-expression: Consider e1 = e2. The type of
this expression is e2's type, provided:
- e1 and e2 are type correct,
- e2's type is a subtype of e1's type.
- Auto-expression: has the same type as its argument
expression e if
e's type is int or float; has type
error otherwise.
- Field-access-expression: Let p.x be the field
access expression, and let z be the field
obtained by name resolution. Then this
expression's type is same as z's declared type, provided:
- p's type is user(A), and z is a
non-static field.
- p's type is class-literal(A) and z is a
static field.
This expressions type is error if name resolution had
errors or the above conditions do not hold.
- Method-call-expression: Let p.f(e1, ..., en) be the method
access expression, and let h be the method
obtained by name resolution. Then this
expression's type is same as h's declared return type, provided:
- p's type is user(A), and h is a
non-static method.
- p's type is class-literal(A) and h is a
static method.
This expressions type is error if name resolution had
errors or the above conditions do not hold.
- New-object-expression: Let A(e1,...,en) be the
constructor and arguments in this expression. The type of this
expression is user(A) if name
resolution succeeds.
- This-expression: has type
user(A) where A is the current class.
- Super-expression: has type
user(B) if B is the immediate
superclass of the current class; and error if the current
class has no superclass.
- Class-reference-expression: has type
class-literal(A) where
A is the literal
class name in this expression.
- Array-access-expression:504 only has type T if
the base expression's type is
array(T) and the index expression's type is
int; has type error otherwise.
- New-array-expression:504 only Let T be the base type and
e1,...,en be the sequence of array dimensions. Then, this
expression has type array(array(...repeated n
times (T)...)), provided each expression ei is
of type int. This expression has type error if
any of the dimension expressions is not of int type.
Name Resolution:
Decaf resolves overloaded names statically, at compile-time,
using type information.
Note that all local variables are resolved while constructing the
AST. The only names that are not resolved at that time are the
fields, methods and constructors.
Fields:
Consider a field access expression of the form
e.x for some base expression e.
- If e's type is user(A): Then the
expression resolves to field with id j if B is the
most specific superclass of A such that
- B has a
declared non-static field named x with id j,
- and
the field is accessible (based on the public/private declarations)
from the current method.
- If e's type is class-literal(A): same as above
except B has a static field named x.
Consider a field access expression of the form
e.f(e1,...,en) for some base expression e.
Let the types of e1,...,en be T1,...,Tn, and all the
argument expressions are type correct.
- If e's type is user(A): Then the
expression resolves to method with id j if B is the
most specific superclass of A such that
- B has a
declared non-static method named f with id
j such that
- the method j is applicable: i.e., the types of the formal
parameters of that method are T1',...,Tn' and each
Ti is a subtype of Ti',
- there is no other non-static method named f in
B that is also applicable and has formal parameter types
T1'',...,Tn'' that are not all subtypes of
T1',...,Tn'.
- and
the method is accessible (based on the public/private declarations)
from the current method.
- If e's type is class-literal(A): same as above
except B has a static method named f.
For CSE 304 students: note that you do not have to deal with method
overloading. Hence, you need to find the
applicable method, and if successful, whether it is accessible.
Consider a new object creation expression of the
form A(e1,...,en) for some class A.
Let the types of e1,...,en be T1,...,Tn, and all the
argument expressions are type correct.
The constructor resolved for this expression is the constructor with
id j declared in class A such that:
- the constructor j is applicable: i.e., the types of the formal
parameters of that constructor are T1',...,Tn' and each
Ti is a subtype of Ti',
- there is no other constructor in
A that is also applicable and has formal parameter types
T1'',...,Tn'' that are not all subtypes of
T1',...,Tn'.
- and
the constructor is accessible (based on the public/private declarations)
from the current method.
For CSE 304 students: note that you do not have to deal with constructor
overloading. Hence, you need to find the
applicable constructor, and if successful, whether it is accessible.
Additional error checks:504 only
While performing type checking, when an error is encountered, you are
expected to print out information about the error (e.g. "Type error
at line... int expression found when boolean is expected."). In
addition, you should detect the following errors related to overloaded
methods and constructors.
Note that there may be multiple methods (and, of course, constructors)
with the same name defined in a single class. For overloaded
methods/constructors, we will now impose an additional constraint.
For every pair of methods with the same name (or constructors)
defined in a single class:
- Let T1 → S1 and T2 → S2 be their
signatures.
Then either T1 is a strict subclass of T2 or
T2 is a strict subclass of T1.
Here, we say that T is a strict subclass of T' if
T is a subclass of T', and T is not identical
to T'.
Your type checker is expected to check the above constraint and report
any errors.
Printing the AST
AST printing will follow the same form as in HW #3 with the following changes:
- Each field access, method invocation and new object creation
expression will contain additional information: the id of the resolved
field/method/constructor respectively. For instance, instead of
Field-access(This, x)
you should print
Field-access(This, x, 1)
(if "1" is the id of the resolved field named "x").
- Each assignment expression will additionally contain
the types of the LHS and RHS expressions.
Note this is only for assignment expressions, and not for others.
For instance, instead of
Expr( Assign(Variable(1), Method-call(This, f, [])) )
you should print
Expr( Assign(Variable(1), Method-call(This, f, []), int, int) )
Initialization
Same as in HW #3.
CSE 504 vs CSE 304
- CSE 504 students are expected to construct a type checker for the entire
Decaf language.
- CSE 304 students are expected to construct a type checker for the a subset of
Decaf that is the whole language except the following:
- Arrays: there are no arrays in the CSE 304 subset.
- Implicit field accesses: all field accesses in the CSE 304
subset will specify the object explicitly.
- Implicit method calls: all method calls in the CSE 304
subset will specify the object explicitly.
- No overloaded methods/constructors:
Assume that in each class, there is at most one constructor, and
at most one method with a given name. Also assume that names of
methods defined in a class are distinct from method names in its
super classes. You do not need to check for these
constraints. Your type checker can safely assume these are
true!.
- The above assumption leads to a significant simplification in
name resolution. This also eliminates
additional error checks.
- Start with a working AST builder.
- Assume no method invocations, constructors or field accesses;
Write a type checker that works on this restricted subset of
programs. You may want to work with even smaller subsets than these.
- Proceed by adding field accesses first, and finally method
invocations and constructors.
Your front-end for a Decaf compiler should be organized into the
following files:
- Lexer: decaflexer.py, PLY/lex scanner
specification file. (unchanged from HW2)
- Parser: decafparser.py, PLY/yacc parser
specification file. (unchanged from HW3)
- AST: ast.py, table and class
definitions for Decaf's AST (modified from HW3 to add
type-related fields and definitions).
- Type Checker: typecheck.py,
definitions for evaluating the type constraints and for name
resolution.
Note: This may be folded into ast.py as long as
readability is maintained.
- Main Top-Level: decafch.py, containing the main
python function that ties the modules together, takes command line
argument of input file name, and processes the Decaf program in that file.
- Documentation: README.txt which documents the contents of all the other
files.
We will be looking for a submission with the five/six files specified
above, with the specified names. Deviating from these specifications
may entail loss of points.
This maximum score on this assignment will be
40, divided as follows:
2 points for correct type inference for simple expressions (those that do not have other expressions as parts),
4 points for other simple expressions (binary/unary/auto)
8 points for correct resolution of field names and typechecking of
field expressions;
8 points for correct resolution of method names and typechecking of
method invocations;
4 points for correct resolution of constructors and typechecking of
new object creation expressions;
8 points for correct typechecking of other expressions and statements;
4 points for error reporting; and
2 points for documentation/program readability.
As usual, complicated programs may not get the full grade unless
accompanied by comments documenting it.
Submit your homework using Git, as done for HW1,
before midnight of Mon., April 4.
We will assume that HW4 will be submitted by the same teams that
submitted HW3. If there is any change in the team structure, you
must inform the instructor the Friday before the submission
deadline (Fri. April 1) and get the submission repositories for the
new teams set up.
You may submit the homework multiple times; the last submission
will be graded.
- Tue Mar 29 20:08:43: Types for "and" and "or" binary
operators were incorrectly specified in the original handout.
Expressions with these operators at root should be boolean, not int!
C. R. Ramakrishnan
Last modified: Tue Mar 29 20:10:00 EDT 2016