ISE 390 -- Programming Challenges -- Week 13 Notes

This Week's Goals: 

	To review some basic principles of grammars and parsing.

Administrative Notes:

	Please email me all the programs you have worked on since the
	previous submission (or if you prefer just send the whole semesters
	programs) to me (skiena@cs.sunysb.edu) as a tar, shar, or zip file.

Final Exam:

	I would like to have a genuine, timed, team programming contest
	as the course final exam.   The date of the final is Thursday 5/10
	from 11AM-1:30PM.   Is there any chance we can get a longer slot?
	The previous exam period is (TuThu 9:50-1110AM), the subsequent
	period is (TuThu 2:20-3:40PM)

	I want to assign students to teams so as to practice for the
	contest next year (i.e. non-graduating students together)


The Language Hierarchy

	In an automata theory course, you learn about different types
	of mathematical machines to recoginize classes of languages:

	finite state automata: a graph of states with transitions
		out from each state based on the next input symbol.

	regular expressions: sequences which allow controlled repetition
		and alternation.  For example:

			(a+b)*	- all sequences on {a,b}
			(b*ab*ab*)* -- all sequences with an even number of a's

		These are exactly as powerful as finite state automata.

	context free grammars: collections of rules where the symbol on
		the left hand side is defined by a sequences of symbols
		on the right hand side.

		there are divisions among context-free grammers based
		on how many symbols you must "look ahead" to resolve the
		structure

		context sensitive grammers, which have multiple symbols
		on the left hand of each rule, are much harder to recognize.
	

Parsing context free grammars

	Recursive descent parsers mirror the recursive structure of
	a grammar, consisting of one routine for each non-terminal symbol
	which look ahead in the text to decide which rule to apply next.

	Not all grammars can be parsed in such a top-down recursive way,
	but the simple ones you are likely to encounter likely will be
	parse-able by recursive descent.


Sample grammar to recognize regular expressions:

	<expression> ::= <term> | <term> + <expression>
	<term> ::= <factor> | <factor><term>
	<factor> ::= (<expression>) | v | <expression>* | v*

	where v denotes any single letter terminal.


Code (from Sedgewick)

expression()
  {  
    term();
    if (p[j] == '+') 
      { j++;  expression(); }
  }
 
term()
  {
    factor();
    if ((p[j] == '(') || letter(p[j])) term();
  }
 
factor()
  {
    if (p[j] == '(')
      {
        j++; expression();
        if (p[j] == ')') j++; else error();
      }
    else if (letter(p[j])) j++; else error();
    if (p[j] == '*') j++;
  }
 

/* The parse tree of (A*B+AC)D	*/

expression
   term
      factor
         (
         expression
            term
               factor   A   *
               term
                  factor   B
            +
            expression
               term
                  factor   A
                  term
                     factor   C
         )
      term
         factor   D
                  

Ambiguities and other trouble

	Certain grammars are ambiguous, meaning there are more than one
	ways to parse it.  Beware.

	Precidences among symbols (e.g. * being higher than +) can be
	specified with the proper grammar.  So can left or right associative
	precidences.

	In a recursive descent parser, decisions are made on observed 
	terminals.  We may have to look ahead more than one rule to decide.
	For example, the term routine above.

	Rules which are left recursive are big trouble, meaning you can't
	apply recursive descent:

badexpression();
  {
    if (letter(p[j])) j++; else
      {
        badexpression();
        if (p[j] == '+') { j++; term(); }
        else error();
      }
  }


Assigned Problems:

172 -- Parse a simple calculator language and interpret it.

271 -- Parse a recursively defined langauge, but write the grammar first.

310 -- Can a given string be derived from a set of simple productions?
	Can you find a brute force solution, since the strings are small?
	Is there a more elegant solution?

327 -- Parse and evaluate a more substantial (C-like) calculator language,
	where ++ and -- operators require some lookahead.

384 -- Recognize a set of strings described (implicitly) by a
	regular expression or simple grammar.   Make the expression
	explicit and try recursive descent parsing.