CSE307 Spring 2009 Stony Brook |
Principles of Programming Languages
Annie Liu Assignment 2 |
Handout A2 Feb. 10, 2009 Due Feb. 24 |
Syntax Analysis --- Reading Rules
This assignment has three goals: (1) experiencing with parsing using a parser generator, (2) writing programs in Python and comparing the use of Python and a parser generator with what you did in Assignment 1, and (3) parsing the security policy rules for a proposed national electronic health record (EHR) service. For our assignments, we will use Python 2.6 instead of Python 3.0.
Parsing security policy rules
The EHR policy rules are in four files: spine, pds, hospital, and ra. These rules are more complicated than logic rules of the form
p0(args0) <- p1(args1), ..., pk(argsk) //P0(args0) holds if p1(args1) through pk(argsk) holdin two ways. First, a predicate
p
may be of the form
loc@iss.p
, meaning predicate p
of issuer
iss
at location loc
. Second, a hypothesis
may not only be of the form p(args)
but also be a
constraint on integers, ranges, or sets. Note that a hypothesis or
conclusion of the form p(args) is called an atom. These four
files are from Moritz Becker (slightly modified to remove a few syntax
errors); these rules are a result of his dissertation
work at Cambridge.
You may copy these four files into your working directory for this assignment.
Using a parser generator
TPG is a little
parser generator for Python. While you can get everything about it on
its homepage, you only need the file tpg.py
to run an actual parsing program.
For parsing the EHR policy rules, you can use the file ehrparse.py
, which contains a TPG
grammar and classes for representing the parsed policy rules. This
file was written by Tuncay Tekle (then cleaned up in various ways), as
part of one of my research projects at Stony Brook.
You may copy both Python files above into your working directory. Then, in another Python file, you may write
import ehrparse def count(): rules = ehrparse.parse() print 'number of rules:', len(rules)When
count()
is called, it will print the number of rules
parsed.
To show that you understand parsing and the objects built from
parsing the EHR policy, your are asked to write Python code to count
the following
1. the number of rules, as shown above;
2. the number of facts, i.e., rules that have an empty list of hypotheses;
3. the number of different predicates that appear in the conclusion atoms of rules;
4. the number of different predicates that appear in the hypothesis atoms of rules;
5. the number of remote atoms that are used in hypotheses of rules.
Programming in Python and using TPG
Write a program to do the task of reading facts as in Assignment 1, but use Python and TGP. Try to write the simplest, most clear program.
Extra credit suggestions
Here are some ideas. (1) Write items 1 to 5 using comprehensions instead of loops. (2) Built on items 3 and 4 above, sort all different predicates that appear in rules, and print the sorted list. (3) Contrast to item 5 above, count the number of different remote atoms that are used in hypotheses of rules. (4) Think of other interesting and/or useful things to analyze about the EHR policy, and do them; feel free to discuss with me. (5) Think of ways to improve the EHR grammar, and do them; again, feel free to discuss with me.
Handins
Hand in everything electronically, using Blackboard, by 5pm on the due date. Your handin should include your README file, code, test data, and anything else you have to show your work.
Grading
This assignment will be graded based on 100 points, allocated in proportion to the estimated amount of work for each part. You may do this assignment in a team of two people; the two people will receive the same points. Exceptionally well thought-out and well written work will receive appropriate extra credit. Extra credit work will be graded based on the estimated amount of extra work.