Read the following description carefully before starting on the assignment. Also, go through the brief overview of the project path that describes the steps involved. A list of what materials are available and what you are expected to submit are also given.
In this assignment, you will complete intermediate code generation for Decaf. Recall the description of Decaf's syntax is given in its manual, the AST you built in HW #3, and the type checker you built in HW #4. This assignment will be based on the AST obtained after type checking and name resolution. From the set of classes in the AST, you will generate intermediate code for a register machine.
As in the past, you will produce a compiler front end that will take a single command-line argument, which is a file name; if that file contains a syntactically valid Decaf program, then it will construct an AST, perform type checking and name resolution. If there are no errors until this stage, the compiler than generates code for the target register machine, and prints the code out (see section on printing).
When a procedure is invoked, its arguments should be supplied in a0, a1, ... (upto the number of arguments). The caller should assume that the values in these registers may be updated by the called procedure; so it should not assume that the values in these registers will be preserved. Register a0 will be used to communicate the return value; so the callee should update a0 with the computed return value before returning to the caller.
For procedure calls, the caller should assume that the values in these registers may be updated by the called procedure.
The machine has instructions to save and restore register values on an internal stack. All arithmetic, comparison and branch instructions in the machine refer to argument or temporary registers.
In addition to these two sets of registers, the machine has one special-purpose read-only register, called the "static area pointer" (sap). This register holds the base address of for all static fields; the static fields themselves are accessed at specific offsets from sap. The sap register is set by the machine when it loads a program.
Abstract Machine Instructions:
In the following, r represents an argument or temporary
register; i represents an literal integer value, f
represents a literal floating point value, l represents a
symbolic label.
There is a special label attached to the first instruction in each method. This label will be of the form "M_%s_%d", where the %s argument will be the name of the method, and %d will be the unique id of the method. Note that method ids are unique across the entire program. For example, consider a method fie with id 12. Then the first instruction of the method will have label M_fie_12, and any call to the method will generate the instruction call M_fie_12.
The translated abstract program should follow the static single assignment form, with the following structure. That is, no register can be a target in more than one instruction in a function.
phi tn, < sequence of predecessor blocks>, < sequence of registers in predecessor blocks>where tn is the target register.
For instance, consider registers t3 being assigned in two different blocks, labelled L1 and L2. Let t3 be used in a block labeled L3 that is a successor of L1 and L2. One way to make turn this into SSA form is to rename t3 in its definition (and subsequent uses) in blocks L1 and L2. Let's say we rename t3 to t1 in L1 and t2 in L2. We will then introduce a φ-node in L3 to merge these two values:
phi t3, [L1, L2], [t1, t2]Based on SSA form, this instruction says that t3 takes its value from t1 if we came from block L1; and t2 if we came from L2.
The SSA form is in preparation for the next homework where data flow
analysis, register allocation and optimization will exploit it.
Storage Allocation
For the purposes of this homework, you may assume that static methods do not refer to "this".
Each local variable or an intermediate value computed when evaluating an expression is kept in a temporary register.
Consider an object a of class C. Object a contains a unique location for all non-static fields in class C as well as all its superclasses. The object layout is as follows. Let B be a base class (with no super classes), with two non-static fields w and x. Then w and x are given unique offsets, starting at 0, so that the fields can be accessed from the base address of B objects. Let the offsets of w and x be 0 and 1 respectively. Whether the fields x and y are public or not does not matter. Let C be derived from B and have an additional non-static field y. Fields defined in C are given offsets that follow all fields in B (regardless of whether they are public or not). Hence y should be at offset 2. Moving further, let D be derived from C with two new fields, one called z, and another coincidentally called x. D's fields will have offsets following all fields of C. Hence field z will be at offset 3 and field x of D will be at offset 4. For computing the offsets and determining the layout, it does not matter whether a field is private or public.
The abstract machine has a special-purpose register called sap, the static area pointer, that holds the reference to the base of static area. All static fields should be allocated in the static area, with each field having a unique offset. It does not matter if the field is public or private for the purposes of allocation.
For instance, let there be two classes B, a base class, and C, a class derived from B. Let B have two static fields x and y defined in it; and C have two static fields y and z defined in it. Then the four fields will be allocated in the static area with directive ".static_data 4"; x and y of B will be at offsets 0 and 1, respectively; and y and z of C will be at offsets 2 and 3, respectively.
Continuing with the above example, let all the static fields in the example be public, and consider an assignment in a source program of the form "C.x = C.y + C.z;". The following abstract machine code will be a valid translation of the assignment:
move_immed_i t0, 2 # Offset of y in C hload t1, sap, t0 # t1 = C.y move_immed_i t2, 3 # Offset of z in C hload t3, sap, t0 # t3 = C.y iadd t4, t1, t3 move_immed_i t5, 0 # Offset of x in C (which is x in B) hstore sap, t5, t4In the above, "#..." is used to represent comments in order to explain the code.
For instance, consider a constructor of the form A(int i). This constructor will be compiled to an initializer function that takes a reference to a newly created object as a0, the value of formal parameter i as a1, executes the code corresponding to the body of the constructor. The first instruction in this sequence will be labeled as C_%d, where %d is the unique id of the constructor. Recall that constructor ids are unique across the whole program. Initializer functions return nothing.
Code for new object creation will have two parts. The first part should create an object of the correct size on the heap (using the halloc instruction). The second should call the initializer method with appropriate arguments. For instance, let the instances of the above class A have 5 instance fields (note: this includes non-static fields of superclasses, if any). Let the id of the matching constructor be 7. Then the expression new A(2) may be translated to the following sequence of machine instructions:
move_immed t0, 5 halloc t0, 5 # allocate 5 cells for object move_immed_i t1, 2 # Actual argument save a0 # save current argument registers save a1 save t0 # save reference to new object move a0, t0 move a1, t1 call C_7 # call initializer function restore t0 # restore original register values restore a1 restore a0Note: the above code is to illustrate the use of the initializer function; the saves are restores may be necessary only in certain contexts.
Print one instruction per line. Each instruction will be its opcode, followed by the sequence of arguments. Include at least one space between opcode and the arguments. Arguments shall be comma separated. The entire instruction--- opcode and all arguments--- should be placed in a single line; an instruction shall not spill over two lines.
Labels of instructions, if any shall be printed on a line of their own. Label of an instruction shall be followed by a ":". For instance, an "iadd" instruction with label "L12" will be printed as in:
L12: iadd t3, t1, t2If an instruction has more than one label, each label shall be printed on a line of their own.
Abstract machine programs may have single-line comments, starting with "#" and ending at the end of line. Comments may appear at end of instructions or as a line of their own.
Note: This may be folded into ast.py as long as readability is maintained.
Note: This may be folded into ast.py as long as readability is maintained.
decafc takes a single command line argument, with the name of the Decaf program to be translated. For example, if the Decaf program is in fum.decaf, then the invocation will be as
python decafc.py fumor
python decafc.py fum.decafThe translated machine instructions will be written to a file with the same base name with ".ami" extension (fum.ami in the above example).
Note the change of main top-level. It is no longer decafch; we are compiling not doing checking alone anymore.
As usual, complicated programs may not get the full grade unless accompanied by comments documenting it.
We will assume that HW5 will be submitted by the same teams that submitted HW4. If there is any change in the team structure, you must inform the instructor the Monday before the submission deadline and get the submission repositories for the new teams set up.
You may submit the homework multiple times; the last submission will be graded.