The symbol table class is used throughout the compiler to hold information about the various entities in the source program (classes, instances, methods, variables etc.) and their properties (types, arguments, visibility, memory requirement etc.). This information is collected incrementally and used by various phases of the compiler. The symbol table keeps track of the associations of names in the program with the entities they denote; these associations are established (by the parser and later phases) based on the syntactic structure of the program.
The symbol table you will build will have two main components, the Name Table and the Entity Table. The name table is used to uniquely identify different names that occur in the program, and each entity in the program is represented by a unique entry in the entity table. Note that symbols with the same name may refer to different objects (methods, classes, variables, etc.). Each distinct use of a symbol, however, refers to a distinct entity, and is associated with a different entry in the entity table. The following example will make this point clearer. Consider the program:
In the above program, the symbol f is used to represent two distinct entities: a variable and a method. While there is only one entry for the symbol f in the name table, the variable f and the method f are two distinct entities. The two entities are distinguished by what we call as the kind of the name: whether it is a variable, method, or class name. Moreover, observe the two variables named s in the above program. Both variables have the same name (even the same type) but refer to two different entities: the first occurrence is a variable in class Example1 while the second occurrence is a local variable in method f.public class Example1{ int f; String s; int f(int arg) { String s; return arg + 1; } }
Thus, the same name may denote different entities in different scopes (e.g., method declarations) and contexts (i.e., the kind of the name). We want to maintain a unique representation for each name throughout the compilation. At the same time, we want to ensure that while the parser moves around in different scopes, each name is associated with the correct entity. This association is maintained by a careful implementation of two abstract datatypes Name Table and Entity Table described below. Please note that you may implement the two datatypes in any way you see fit. For an efficient implementation of Name Table, you may use hashing techniques described in Chapter 7.6 of the text book. The Name Table datastructure is completely internal to the symbol table class: it is not directly used from the outside; it is described below solely as a hint for coding. An efficient datastructure for implementing entity tables, which you may find useful in your implementation, is also sketched below. Look also at the symbol table slides from the class. They are on the web right now.
The NameTable class exports the following method to the SymbolTable:
NameBlock name2nameBlock(String): Given a string find the corresponding entry (NameBlock) in the name table.
If the string does not exist in the table, create a new entry and add it in to the table.
Implementation note: Given a name, we need a mechanism to quickly access its entry in the name table. A hash table is traditionally used for this purpose. See pages 433--438 of the text book (Aho/Sethi/Ullman) for a description of hash tables.
The text book goes into a little bit of detail about hash functions and how to select them. Note that your projects will not be graded on efficiency, but only on correctness (and, if you want partial credit, readability). So, choose a simple hash function. (In fact, you may implement the name table in any way you want; the hashing scheme is only a suggestion.) A simple hash function is obtained by adding the ASCII values of all characters in the given string (ignoring overflows) and then taking the modulus of the sum with the size of the hash table.
The SymbolTable class exports the following methods to manipulate the EntityTable and NameTable:
The scopes defined by a program can be viewed as a tree, with the outermost scope as the root, the next inner scopes as its children and so on. We will call such a tree as a scope tree. The entities accessible from any given scope, say s, are only those that are defined in s itself, or a scope that is an ancestor of s in the scope tree. In particular, to find an entity of a given a name and kind, we can search for definition of such an entity in the current scope, and if not found, progressively search in the parent scope, grandparent scope and so on until the root scope is reached. If such an entity was not found even upon reaching the root, then no entity of that given name and kind is accessible from the current scope.
This tree of scopes can be built incrementally. We can first ensure that, at any given point, the entry for a name in the name table points to the entity of that name (any kind) that is defined in the nearest enclosing scope. That entity, in turn, points to the entity of the same name (again, any kind) that is defined in the nearest enclosing scope, and so on. What we now have is a stack of entities, all having the same name, that were defined in all the enclosing scopes. We call this the same-name stack. Entities defined closer to the current scope are nearer to the top of the same-name stack than entities defined in outer scopes. Clearly, any new entity that is added is added to the top of the corresponding same-name stack. Using a simple list traversal, we can find if there is an entity of a given name and kind accessible from the current scope.
To distinguish between entities defined in the current scope and those defined in the enclosing scopes, we can associate a level number with each scope, and mark each entity with the level number of scope in which it was defined. The outermost scope (the entire file) has level number 0, the next inner scope has level number 1 and so on. Note that a level number does not uniquely identify a scope, but only specifies the depth of nesting of a scope with respect to the outermost scope. We can easily maintain the current level number by incrementing and decrementing a (global) counter when enterScope() and leaveScope() are invoked.
Observe that entities declared in a scope are not accessible from the outer (parent) scope. Hence, when we leave a scope, we have to discard the entities declared in this scope from the sameName stack. This can be efficiently done by keeping all entities that are declared in the same scope in a list, called the sameScope list. When leaving a scope, we can simply traverse the sameScope list, and remove each element of sameScope list from the corresponding sameName stack. An important point must be noted: the entities declared in the scope that we are leaving must be removed only from the sameName stacks, and must not be deallocated (free-ed)! A multi-pass compiler may enter the same scope in different passes, so we need to preserve the entities declared in each scope, even if they are not accessible from the current scope. We will return to this point later in the course, but for now, just make sure that the entity blocks live on even after the scope of their definition is exited.
Using the sameScope list, we know which entities were declared in the current scope. How do we preserve this information when we enter and later leave new (children) scopes? Elementary data structures to the rescue! We can maintain the head of sameScope list on a stack, called the scopeStack. The top of the scope stack points to the head of sameScope list of the current scope, the next element of the scope stack points to the head of the sameScope list of the parent scope and so on. When a new scope is entered, a NULL element is pushed on the scope stack (since no entities have been declared yet in the new scope). When leaving a scope, the top element is simply popped. (Remember to do so only after removing each element in the sameScope list from the corresponding sameName stack.) The scope stack itself can be implemented using an array of EntityBlock pointers, and using the level number counter as the stack pointer.
Note that the only one-time initialization needed is for the level number counter. All other initialization can be performed on the fly by ensuring that whenever a new entity block is allocated.
The data structures that I used (following the scheme outlined above) looks like:
class NameBlock { String id; EntityBlock entity; NameBlock next; } class EntityBlock { NameBlock name; EntityKind kind; int nestingLevel; EntityBlock sameName; /* Next entity of same name */ EntityBlock sameScope; /* Next entity created in same scope */ }
$ mkdir Handin $ ...copy all the files to be submitted into Handin ... $ cd Handin $ make # check that all goes well; if so, proceed $ make clean # remove unnecessary .class files $ handin sym *