ISE 390 -- Programming Challenges -- Week 13 Notes This Week's Goals: To review some basic principles of grammars and parsing. Administrative Notes: Please email me all the programs you have worked on since the previous submission (or if you prefer just send the whole semesters programs) to me (skiena@cs.sunysb.edu) as a tar, shar, or zip file. Final Exam: I would like to have a genuine, timed, team programming contest as the course final exam. The date of the final is Thursday 5/10 from 11AM-1:30PM. Is there any chance we can get a longer slot? The previous exam period is (TuThu 9:50-1110AM), the subsequent period is (TuThu 2:20-3:40PM) I want to assign students to teams so as to practice for the contest next year (i.e. non-graduating students together) The Language Hierarchy In an automata theory course, you learn about different types of mathematical machines to recoginize classes of languages: finite state automata: a graph of states with transitions out from each state based on the next input symbol. regular expressions: sequences which allow controlled repetition and alternation. For example: (a+b)* - all sequences on {a,b} (b*ab*ab*)* -- all sequences with an even number of a's These are exactly as powerful as finite state automata. context free grammars: collections of rules where the symbol on the left hand side is defined by a sequences of symbols on the right hand side. there are divisions among context-free grammers based on how many symbols you must "look ahead" to resolve the structure context sensitive grammers, which have multiple symbols on the left hand of each rule, are much harder to recognize. Parsing context free grammars Recursive descent parsers mirror the recursive structure of a grammar, consisting of one routine for each non-terminal symbol which look ahead in the text to decide which rule to apply next. Not all grammars can be parsed in such a top-down recursive way, but the simple ones you are likely to encounter likely will be parse-able by recursive descent. Sample grammar to recognize regular expressions: ::= | + ::= | ::= () | v | * | v* where v denotes any single letter terminal. Code (from Sedgewick) expression() { term(); if (p[j] == '+') { j++; expression(); } } term() { factor(); if ((p[j] == '(') || letter(p[j])) term(); } factor() { if (p[j] == '(') { j++; expression(); if (p[j] == ')') j++; else error(); } else if (letter(p[j])) j++; else error(); if (p[j] == '*') j++; } /* The parse tree of (A*B+AC)D */ expression term factor ( expression term factor A * term factor B + expression term factor A term factor C ) term factor D Ambiguities and other trouble Certain grammars are ambiguous, meaning there are more than one ways to parse it. Beware. Precidences among symbols (e.g. * being higher than +) can be specified with the proper grammar. So can left or right associative precidences. In a recursive descent parser, decisions are made on observed terminals. We may have to look ahead more than one rule to decide. For example, the term routine above. Rules which are left recursive are big trouble, meaning you can't apply recursive descent: badexpression(); { if (letter(p[j])) j++; else { badexpression(); if (p[j] == '+') { j++; term(); } else error(); } } Assigned Problems: 172 -- Parse a simple calculator language and interpret it. 271 -- Parse a recursively defined langauge, but write the grammar first. 310 -- Can a given string be derived from a set of simple productions? Can you find a brute force solution, since the strings are small? Is there a more elegant solution? 327 -- Parse and evaluate a more substantial (C-like) calculator language, where ++ and -- operators require some lookahead. 384 -- Recognize a set of strings described (implicitly) by a regular expression or simple grammar. Make the expression explicit and try recursive descent parsing.