CSE-304 Programming Project #3

Symbol Table Class

Read the following description carefully before starting on the assignment. Also, go through the brief overview of the project path that describes the steps involved. A list of what materials are available and what you are expected to submit are also given. Finally, make sure you understand and follow the submission requirements.

Description:

In this assignment, you will build the symbol table class for the Decafcompiler.

The symbol table class is used throughout the compiler to hold information about the various entities in the source program (classes, instances, methods, variables etc.) and their properties (types, arguments, visibility, memory requirement etc.). This information is collected incrementally and used by various phases of the compiler. The symbol table keeps track of the associations of names in the program with the entities they denote; these associations are established (by the parser and later phases) based on the syntactic structure of the program.

The symbol table you will build will have two main components, the Name Table and the Entity Table. The name table is used to uniquely identify different names that occur in the program, and each entity in the program is represented by a unique entry in the entity table. Note that symbols with the same name may refer to different objects (methods, classes, variables, etc.). Each distinct use of a symbol, however, refers to a distinct entity, and is associated with a different entry in the entity table. The following example will make this point clearer. Consider the program:

public class Example1{
  int f;
  String s;

  int f(int arg) {
    String s;
    return arg + 1;
  }
}

In the above program, the symbol f is used to represent two distinct entities: a variable and a method. While there is only one entry for the symbol f in the name table, the variable f and the method f are two distinct entities. The two entities are distinguished by what we call as the kind of the name: whether it is a variable, method, or class name. Moreover, observe the two variables named s in the above program. Both variables have the same name (even the same type) but refer to two different entities: the first occurrence is a variable in class Example1 while the second occurrence is a local variable in method f.

Thus, the same name may denote different entities in different scopes (e.g., method declarations) and contexts (i.e., the kind of the name). We want to maintain a unique representation for each name throughout the compilation. At the same time, we want to ensure that while the parser moves around in different scopes, each name is associated with the correct entity. This association is maintained by a careful implementation of two abstract datatypes Name Table and Entity Table described below. Please note that you may implement the two datatypes in any way you see fit. For an efficient implementation of Name Table, you may use hashing techniques described in Chapter 7.6 of the text book. The Name Table datastructure is completely internal to the symbol table class: it is not directly used from the outside; it is described below solely as a hint for coding. An efficient datastructure for implementing entity tables, which you may find useful in your implementation, is also sketched below. Look also at the symbol table slides from the class. They are on the web right now.

Name Table

The Name Table entries have class NameBlock. They are used to access the entity blocks associated with a given name. A NameBlock holds a name (a string), a reference to the "current" entity block associated with that name and a reference to the next name block in the same hash list..

The NameTable class exports the following method to the SymbolTable:

NameBlock name2nameBlock(String): Given a string find the corresponding entry (NameBlock) in the name table.

If the string does not exist in the table, create a new entry and add it in to the table.

Implementation note: Given a name, we need a mechanism to quickly access its entry in the name table. A hash table is traditionally used for this purpose. See pages 433--438 of the text book (Aho/Sethi/Ullman) for a description of hash tables.

The text book goes into a little bit of detail about hash functions and how to select them. Note that your projects will not be graded on efficiency, but only on correctness (and, if you want partial credit, readability). So, choose a simple hash function. (In fact, you may implement the name table in any way you want; the hashing scheme is only a suggestion.) A simple hash function is obtained by adding the ASCII values of all characters in the given string (ignoring overflows) and then taking the modulus of the sum with the size of the hash table.

Entity Table

The Entity Table entries have the class EntityBlock. Each is associated with an entity declared in the program (variables, methods classes, etc.). We will recognize four kinds of entities: variables, methods, classes and labels. Other information to be stored in an EntityBlock is suggested below.

The SymbolTable class exports the following methods to manipulate the EntityTable and NameTable:

1. EntityBlock lookup(String, EntityKind, Integer): find an entity accessible from the current scope.
2. EntityBlock insert(String, EntityKind): create a new entity block in the current scope, with the given name and kind. Note: this function does not check for duplicate entities in the same scope. We will assume that the rest of the compiler will invoke this function only after ensuring (using lookup) that duplicate entries will not be created.

First parameter: the name of the entity. Type in.

Second parameter: the kind of the entity. Type in.

Return value: pointer to the created entity block.
Clearly, the parser needs to inform the symbol table class whenever a new scope is entered or exited. The parser does so by using the two functions:
3. void enterBlock(): invoked whenever the compiler enters a new scope.
4. void leaveBlock(): invoked when the compiler leaves a scope (and reenters the outer one).

As you can see, the entity blocks are stored in a relatively complex data structure. The symbol table class must provide a function constructor that the rest of the compiler can use to initialize any part of the entity datastructure that needs initialization at the beginning of compilation.

Suggested Implementation

At first sight, implementing the above functions may appear daunting. However, armed with knowledge of elementary data structures (stacks, lists and trees) we can implement the symbol table class with very little (and relaively straightforward) code. The following describes a suggested implementation strategy. Please note that this is only a suggestion; you may implement the symbol table class in any manner you feel best, as long as it respects the symbol table interface.

The scopes defined by a program can be viewed as a tree, with the outermost scope as the root, the next inner scopes as its children and so on. We will call such a tree as a scope tree. The entities accessible from any given scope, say s, are only those that are defined in s itself, or a scope that is an ancestor of s in the scope tree. In particular, to find an entity of a given a name and kind, we can search for definition of such an entity in the current scope, and if not found, progressively search in the parent scope, grandparent scope and so on until the root scope is reached. If such an entity was not found even upon reaching the root, then no entity of that given name and kind is accessible from the current scope.

This tree of scopes can be built incrementally. We can first ensure that, at any given point, the entry for a name in the name table points to the entity of that name (any kind) that is defined in the nearest enclosing scope. That entity, in turn, points to the entity of the same name (again, any kind) that is defined in the nearest enclosing scope, and so on. What we now have is a stack of entities, all having the same name, that were defined in all the enclosing scopes. We call this the same-name stack. Entities defined closer to the current scope are nearer to the top of the same-name stack than entities defined in outer scopes. Clearly, any new entity that is added is added to the top of the corresponding same-name stack. Using a simple list traversal, we can find if there is an entity of a given name and kind accessible from the current scope.

To distinguish between entities defined in the current scope and those defined in the enclosing scopes, we can associate a level number with each scope, and mark each entity with the level number of scope in which it was defined. The outermost scope (the entire file) has level number 0, the next inner scope has level number 1 and so on. Note that a level number does not uniquely identify a scope, but only specifies the depth of nesting of a scope with respect to the outermost scope. We can easily maintain the current level number by incrementing and decrementing a (global) counter when enterScope() and leaveScope() are invoked.

Observe that entities declared in a scope are not accessible from the outer (parent) scope. Hence, when we leave a scope, we have to discard the entities declared in this scope from the sameName stack. This can be efficiently done by keeping all entities that are declared in the same scope in a list, called the sameScope list. When leaving a scope, we can simply traverse the sameScope list, and remove each element of sameScope list from the corresponding sameName stack. An important point must be noted: the entities declared in the scope that we are leaving must be removed only from the sameName stacks, and must not be deallocated (free-ed)! A multi-pass compiler may enter the same scope in different passes, so we need to preserve the entities declared in each scope, even if they are not accessible from the current scope. We will return to this point later in the course, but for now, just make sure that the entity blocks live on even after the scope of their definition is exited.

Using the sameScope list, we know which entities were declared in the current scope. How do we preserve this information when we enter and later leave new (children) scopes? Elementary data structures to the rescue! We can maintain the head of sameScope list on a stack, called the scopeStack. The top of the scope stack points to the head of sameScope list of the current scope, the next element of the scope stack points to the head of the sameScope list of the parent scope and so on. When a new scope is entered, a NULL element is pushed on the scope stack (since no entities have been declared yet in the new scope). When leaving a scope, the top element is simply popped. (Remember to do so only after removing each element in the sameScope list from the corresponding sameName stack.) The scope stack itself can be implemented using an array of EntityBlock pointers, and using the level number counter as the stack pointer.

Note that the only one-time initialization needed is for the level number counter. All other initialization can be performed on the fly by ensuring that whenever a new entity block is allocated.

The data structures that I used (following the scheme outlined above) looks like:

class NameBlock
{
  String id;
  EntityBlock entity;
  NameBlock next;
}

class EntityBlock
{
  NameBlock name;
  EntityKind kind;
  int nestingLevel;
  EntityBlock sameName;   /* Next entity of same name */
  EntityBlock sameScope;  /* Next entity created in same scope */
}

Grading

If your program can correclty handle any sequence of lookup and insert operations within one block (i.e., only one enterScope() in the beginning and one leaveScope() at the end), you will get 65% of the points. If you can correctly handle lookups and inserts in one nested sequence of enter- and leave- blocks, (i.e., no leaveScope()s before an enterScope()) you will receive 90%. You will get full credit if you can correcty handle any sequence of operations, as long as enterScope()s and leaveScope()s are paired (i.e., at any point there are no more leaveScope()s than enterScope()s).

Available Material

Sample input file in ~cse304/pub/sym/tests.

Deliverables

The parser handin must contain:

the source code for symbol table class that implements the 5 methods described above.
a "DecafParser" demo driver that reads commands from the standard input and updates the symbol table. While reading input, the demo should dump the created and updated symbol table objects on the standard output.
a Makefile that builds the class and links it with the provided driver, and if needed the list library. The Makefile must create an executable called ``DecafParser''.
Any other files that are necessary to successfully recompile your program. Please read carefully the submission requirements below.

Submission Requirements

Please note the following submission requirements. These are not simply conventions, but are requirements. They make grading of the projects easier, and lets us spend more time wading through the code to award partial credit! They are not difficult to follow, so failure to do so will result in severe penalties (project may not be graded!). If you have trouble with the set-up, post your problem to sbcs.cse304.

The Makefile must be such that it can successfully recompile your program with only the files you submit.
Please make sure that you submit all source files needed to compile your program. If you use any of the common classes (those for the earlier assignments, or the driver class), you MUST ensure that your Makefile refers to these common files. Please note this point very very carefully. The handin program will not transmit .class files (look carefully at its output to check if all files have been submitted). Again, if you have any trouble with Makefile (how they work etc. etc.) done hesitate to ask me or the TA (better yet, post it in the newsgroup).
Make sure that there are no absolute references in your Makefile to files in your own directory. Remember: the files you submit are packaged and sent to a grading account; so the files you submit must be capable of building a stand-alone system.
Here is a suggestion for making sure that you submit all the necessary files and that `make' will work.

        $ mkdir Handin
        $ ...copy all the files to be submitted into Handin ...
        $ cd Handin
        $ make        # check that all goes well; if so, proceed
        $ make clean  # remove unnecessary .class files
        $ handin sym *

If your submission is in multiple directories, there MUST be a Makefile at the top level, and it must generate the executable demo also at the top level.
Please note: The Makefile MUST SUCCESSFULLY MAKE demo WITHOUT REQUIRING ANY ADDITIONAL FILES. OTHERWISE, FOR YOUR PROJECT MAY NOT BE GRADED!

Radu Grosu