ISE 390 -- Programming Challenges -- Week 13 Notes
This Week's Goals:
To review some basic principles of grammars and parsing.
Administrative Notes:
Please email me all the programs you have worked on since the
previous submission (or if you prefer just send the whole semesters
programs) to me (skiena@cs.sunysb.edu) as a tar, shar, or zip file.
Final Exam:
I would like to have a genuine, timed, team programming contest
as the course final exam. The date of the final is Thursday 5/10
from 11AM-1:30PM. Is there any chance we can get a longer slot?
The previous exam period is (TuThu 9:50-1110AM), the subsequent
period is (TuThu 2:20-3:40PM)
I want to assign students to teams so as to practice for the
contest next year (i.e. non-graduating students together)
The Language Hierarchy
In an automata theory course, you learn about different types
of mathematical machines to recoginize classes of languages:
finite state automata: a graph of states with transitions
out from each state based on the next input symbol.
regular expressions: sequences which allow controlled repetition
and alternation. For example:
(a+b)* - all sequences on {a,b}
(b*ab*ab*)* -- all sequences with an even number of a's
These are exactly as powerful as finite state automata.
context free grammars: collections of rules where the symbol on
the left hand side is defined by a sequences of symbols
on the right hand side.
there are divisions among context-free grammers based
on how many symbols you must "look ahead" to resolve the
structure
context sensitive grammers, which have multiple symbols
on the left hand of each rule, are much harder to recognize.
Parsing context free grammars
Recursive descent parsers mirror the recursive structure of
a grammar, consisting of one routine for each non-terminal symbol
which look ahead in the text to decide which rule to apply next.
Not all grammars can be parsed in such a top-down recursive way,
but the simple ones you are likely to encounter likely will be
parse-able by recursive descent.
Sample grammar to recognize regular expressions:
::= | +
::= |
::= () | v | * | v*
where v denotes any single letter terminal.
Code (from Sedgewick)
expression()
{
term();
if (p[j] == '+')
{ j++; expression(); }
}
term()
{
factor();
if ((p[j] == '(') || letter(p[j])) term();
}
factor()
{
if (p[j] == '(')
{
j++; expression();
if (p[j] == ')') j++; else error();
}
else if (letter(p[j])) j++; else error();
if (p[j] == '*') j++;
}
/* The parse tree of (A*B+AC)D */
expression
term
factor
(
expression
term
factor A *
term
factor B
+
expression
term
factor A
term
factor C
)
term
factor D
Ambiguities and other trouble
Certain grammars are ambiguous, meaning there are more than one
ways to parse it. Beware.
Precidences among symbols (e.g. * being higher than +) can be
specified with the proper grammar. So can left or right associative
precidences.
In a recursive descent parser, decisions are made on observed
terminals. We may have to look ahead more than one rule to decide.
For example, the term routine above.
Rules which are left recursive are big trouble, meaning you can't
apply recursive descent:
badexpression();
{
if (letter(p[j])) j++; else
{
badexpression();
if (p[j] == '+') { j++; term(); }
else error();
}
}
Assigned Problems:
172 -- Parse a simple calculator language and interpret it.
271 -- Parse a recursively defined langauge, but write the grammar first.
310 -- Can a given string be derived from a set of simple productions?
Can you find a brute force solution, since the strings are small?
Is there a more elegant solution?
327 -- Parse and evaluate a more substantial (C-like) calculator language,
where ++ and -- operators require some lookahead.
384 -- Recognize a set of strings described (implicitly) by a
regular expression or simple grammar. Make the expression
explicit and try recursive descent parsing.