CSE307
Spring 2009
Stony Brook
Principles of Programming Languages
Annie Liu
Assignment 2
Handout A2
Feb. 10, 2009
Due Feb. 24

Syntax Analysis --- Reading Rules

This assignment has three goals: (1) experiencing with parsing using a parser generator, (2) writing programs in Python and comparing the use of Python and a parser generator with what you did in Assignment 1, and (3) parsing the security policy rules for a proposed national electronic health record (EHR) service. For our assignments, we will use Python 2.6 instead of Python 3.0.

Parsing security policy rules

The EHR policy rules are in four files: spine, pds, hospital, and ra. These rules are more complicated than logic rules of the form

p0(args0) <- p1(args1), ..., pk(argsk) //P0(args0) holds if p1(args1) through pk(argsk) hold
in two ways. First, a predicate p may be of the form loc@iss.p, meaning predicate p of issuer iss at location loc. Second, a hypothesis may not only be of the form p(args) but also be a constraint on integers, ranges, or sets. Note that a hypothesis or conclusion of the form p(args) is called an atom. These four files are from Moritz Becker (slightly modified to remove a few syntax errors); these rules are a result of his dissertation work at Cambridge.

You may copy these four files into your working directory for this assignment.

Using a parser generator

TPG is a little parser generator for Python. While you can get everything about it on its homepage, you only need the file tpg.py to run an actual parsing program.

For parsing the EHR policy rules, you can use the file ehrparse.py, which contains a TPG grammar and classes for representing the parsed policy rules. This file was written by Tuncay Tekle (then cleaned up in various ways), as part of one of my research projects at Stony Brook.

You may copy both Python files above into your working directory. Then, in another Python file, you may write

import ehrparse
def count():
    rules = ehrparse.parse()
    print 'number of rules:', len(rules)
When count() is called, it will print the number of rules parsed.

To show that you understand parsing and the objects built from parsing the EHR policy, your are asked to write Python code to count the following
1. the number of rules, as shown above;
2. the number of facts, i.e., rules that have an empty list of hypotheses;
3. the number of different predicates that appear in the conclusion atoms of rules;
4. the number of different predicates that appear in the hypothesis atoms of rules;
5. the number of remote atoms that are used in hypotheses of rules.

Programming in Python and using TPG

Write a program to do the task of reading facts as in Assignment 1, but use Python and TGP. Try to write the simplest, most clear program.

Extra credit suggestions

Here are some ideas. (1) Write items 1 to 5 using comprehensions instead of loops. (2) Built on items 3 and 4 above, sort all different predicates that appear in rules, and print the sorted list. (3) Contrast to item 5 above, count the number of different remote atoms that are used in hypotheses of rules. (4) Think of other interesting and/or useful things to analyze about the EHR policy, and do them; feel free to discuss with me. (5) Think of ways to improve the EHR grammar, and do them; again, feel free to discuss with me.


Handins

Hand in everything electronically, using Blackboard, by 5pm on the due date. Your handin should include your README file, code, test data, and anything else you have to show your work.

Grading

This assignment will be graded based on 100 points, allocated in proportion to the estimated amount of work for each part. You may do this assignment in a team of two people; the two people will receive the same points. Exceptionally well thought-out and well written work will receive appropriate extra credit. Extra credit work will be graded based on the estimated amount of extra work.