CSE 304/504: Compiler Design

Spring 2016

Homework 1

Due: Fri. Feb 12, Mon. Feb 15, 2015


This homework has two related parts. The first part is to be done by CSE 504 as well as 304 students. The second part is for CSE 504 students only.

Part 1: SSM Interpreter:

In the first part, you are expected to implement, in Python, an interpreter for SSM, a simple stack machine-based assembly language. Your program should take input from standard input and write output to standard output.

The program's input consists of a sequence of instructions for a stack machine. This machine has no general purpose registers. It has two separate memory areas: the first, called the store that is directly addressed; and the second, called the stack that holds operands for operations. There are two kinds of machine instructions: load/store to move values between the two memory areas, and the rest which manipulate the top (few) entries on the stack. The machine manipulates only integer data.

The assembly language consists of the following instructions.

An assembly language program is a sequence of instructions. Each instruction may be preceded by an optional label and a semi-colon (":"). A label is a sequence of alphabetic characters or numeric characters or underscore ("_"), beginning with an alphabetic character. An integer num is a sequence of numeric characters, optionally preceded by a minus ("-"). The instructions in the assembly language should always be in lower-case.

Note that instructions ildc, jz, jnz and jmp take one argument; the other instructions have no arguments.

The label may be separated from the following instruction by a sequence of zero or more whitespace (blank, tab, newline) characters. An instruction and its argument will be separated by a sequence of one or more whitespace characters. In the examples below, each instruction is on a line of its own; in general, though, there may be more than one instruction in a line; or a single instruction may be split over multiple lines.

The input assembly program may also have comments. A comment begins with "#" symbol and ends at the end of the line.

When a program is evaluated, it starts with an empty stack. Program evaluation continues until the last instruction in the program is evaluated. When an input program is completely evaluated, the interpreter should print the top-most value on the stack and exit.

If an input program has errors (e.g., improperly formed labels, invalid instruction, invalid format for numbers, labels that appear in destinations of jumps but not defined elsewhere, etc.), the interpreter should give an error message and exit without executing a single instruction. The details (i.e. the source of the error, line number, etc.) in the error message are up to you. All I expect is that your interpreter should at least say that there is some error and exit instead of trying to execute an erroneous program.

Similarly, when executing a program, if the program tries to access values in an empty stack, the interpreter should signal an error and exit; and if the program tries to access a value in a cell in store without initializing the cell first, it should signal an error and exit. Again, you can choose the error message format.

Part 2: SC Compiler

SC is a simple calculator language. An SC program consists of a sequence of assignment statements. Values are stored in variables (as in any procedural language). For this assignment, SC manipulates only integer values. As stated earlier, an SC program is a sequence of statements. There is no need to complicate SC with comments and other kinds of statements at this time.

Your task in Part 2 is to write a compiler that translates SC programs to SSM programs. Your compiler should take, from standard input, text representing an SC program. It should then output, on standard output, an SSM program that performs the same computation as the SC program.

Static Semantics: Your compiler will check if the input program is syntactically valid. It should also check that every variable is initialized before it is used. It should end with appropriate error messages if the above two constraints are violated. It should output a valid SSM program if the above two constraints are satisfied.

Examples:

Project Path:

Work on SSM first. Initially assume that your input will always be only valid (error-free) programs, and that too, without labels. Example 1 above is one such program.

Second, consider SSM programs without labels that may have errors. Now you have to first check for errors before you try to execute an input program. Modify your code to do this.

Only after you have successfully completed the first two steps, consider programs with labels and comments.

After finishing SSM, take up SC. Again, start with assuming that the input is always a valid SC program (i.e. syntactically correct and does all variables are initialized before use). Work on signalling syntax errors next, and finally, work on checking for initialization.

Grading:

I am not asking for extensive documentation, but it must be easy to understand what is going on from your code and associated comments.

Note that CSE 304 students are expected to submit only Part 1, which means they will be graded out of 20 points. Howeve, CSE 504 students are expected to submit both parts, so their maximum grade will be 30 points.

Submission:

Submit the assignment by placing the program files (one for each part) in HW1 folder in your local homework working copy. Tag each submission with a tag such as "s1", "s2" etc. referring to submitted versions. Commit and push these folders to the remote git repository. We will grade the latest version unless you specifically request an earlier version to be graded.

Submissions received before end of day (11:59pm) on the due date will be considered on time. Submissions done a little after midnight may also be considered without penalty, at the instructor's and grader's discretion.


Change Log:


C. R. Ramakrishnan
Last modified: Wed Feb 10 20:08:51 EST 2016