CSE 391
Spring 2004
Stony Brook
Web Queries: Methods and Tools
Annie Liu
Assignment 2
Handout A2
Feb. 3, 2004
Due Feb. 17

XML Data Description

This homework is on XML data description using DTD and XML Schema for a database application.

We consider the following kinds of data:

  1. person: name, organizational unit, email address, home page URL;
  2. funding: id, source, amount, start date, end date;
  3. project: id, name, description, members, fundings, products;
  4. product: id, name, form, date, authors;
  5. general description (may mention the names of persons, projects, or products);
with the following kinds of additional constraints:
  1. each person has a unique email address;
  2. each person belongs to exactly one organizational unit;
  3. each product has at least one author;
  4. each product of a project has at least one author that is a member of the project;
and consider the following kinds of queries:
  1. find the organizational unit that has the most people;
  2. list all information about people who are neither members of any project nor authors of any product;
  3. find the total funding amount of a project (assuming that when a funding supports multiple projects, its amount is divided equally among them);
  4. list the names of all products of every project that has a funding with the most recent start date;
  5. find the source that funded every project all of whose products' names appear in the general description.
The complete application might have more than one kind of user, different kinds of transactions for different kinds of users, and GUIs, as treated in a standard database course; we ignore these in this course and focus on XML data description and manipulation.
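Although the queries are not implemented in this assignment, the division rule in query 3 affects the data design, since one funding may be referenced by several projects. As a worked example: a funding of amount A shared by k projects contributes A/k to each of them, and a project's total is the sum of these shares. The Python sketch below assumes a hypothetical in-memory layout (funding ids mapped to amounts, project ids mapped to lists of funding ids) standing in for whatever XML structure you choose; exact fractions avoid rounding errors.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical sample data (not from the handout):
# funding id -> amount, and project id -> ids of the fundings supporting it.
funding_amount = {"f1": 90000, "f2": 30000}
project_fundings = {
    "p1": ["f1", "f2"],
    "p2": ["f1"],
    "p3": ["f1"],
}


def total_funding(project_id: str) -> Fraction:
    """Sum of this project's equal shares of each funding it receives."""
    # Count how many distinct projects share each funding.
    sharers = Counter(f for fs in project_fundings.values() for f in set(fs))
    return sum(
        (Fraction(funding_amount[f], sharers[f])
         for f in set(project_fundings[project_id])),
        Fraction(0),
    )


print(total_funding("p1"))  # prints 60000
```

Here f1 is shared by three projects, so p1 receives 90000/3 + 30000 = 60000, while p2 and p3 each receive 30000.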

For this assignment, you only need to work on data descriptions, using both DTD and XML Schema, not on the queries. Make as many of the constraints as possible, including the necessary referential integrity constraints, part of your data descriptions.
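Some of these constraints map onto schema features: ID/IDREF attributes in a DTD, and xs:unique, xs:key, and xs:keyref in XML Schema, can capture some uniqueness and reference constraints. A constraint like number 4, which relates a product's authors to a project's members, may not be expressible in either language, so it can help to check such constraints on your sample data separately and document how you did so. The Python sketch below checks constraints 1, 3, and 4, plus references to persons and products, using the standard xml.etree.ElementTree module. The document layout and all element and attribute names in it (person/@id, author/@ref, member/@ref, and a project's product/@ref) are hypothetical assumptions, not a required design.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; all element and attribute names are assumptions.
SAMPLE = """<db>
  <person id="ann"><name>Ann</name><unit>CS</unit><email>ann@cs.example</email></person>
  <person id="bob"><name>Bob</name><unit>Math</unit><email>bob@cs.example</email></person>
  <product id="pr1"><name>Tool</name><author ref="ann"/></product>
  <project id="pj1"><name>Web</name><member ref="ann"/><product ref="pr1"/></project>
</db>"""


def check_constraints(root: ET.Element) -> list[str]:
    """Return the violations found (an empty list means none were found)."""
    errors = []
    people = root.findall("person")
    person_ids = {p.get("id") for p in people}
    # Constraint 1: each person has a unique email address.
    emails = [p.findtext("email") for p in people]
    if len(emails) != len(set(emails)):
        errors.append("duplicate email address")
    # Map each top-level product id to the set of its authors' ids.
    authors_of = {}
    for prod in root.findall("product"):
        pid = prod.get("id")
        authors = {a.get("ref") for a in prod.findall("author")}
        # Constraint 3: each product has at least one author.
        if not authors:
            errors.append(f"product {pid} has no author")
        # Referential integrity: every author refers to an existing person.
        errors += [f"product {pid}: unknown person {a}" for a in authors - person_ids]
        authors_of[pid] = authors
    for proj in root.findall("project"):
        jid = proj.get("id")
        members = {m.get("ref") for m in proj.findall("member")}
        # Referential integrity: every member refers to an existing person.
        errors += [f"project {jid}: unknown person {m}" for m in members - person_ids]
        for ref in (p.get("ref") for p in proj.findall("product")):
            if ref not in authors_of:
                errors.append(f"project {jid}: unknown product {ref}")
            # Constraint 4: some author of the product is a project member.
            elif not authors_of[ref] & members:
                errors.append(f"project {jid}: product {ref} has no member author")
    return errors


print(check_constraints(ET.fromstring(SAMPLE)))  # prints []
```

Changing Bob's email to ann@cs.example in SAMPLE would make the sketch report a duplicate email address. Validation against your DTD and XML Schema remains the primary check; a script like this only complements it for constraints the schema languages cannot state.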

You also need to prepare sample XML data. It is your responsibility to come up with reasonable test data. If your data consists of just a few items that do not illustrate the schemas and constraints well, your grade will be affected accordingly.

Your XML data must conform to your data descriptions, and you must try to validate it using XMLSPY (installed on the machines in the translab) or other tools that you can find. To save work, you may use such a tool, as much as possible, to generate data descriptions from XML data and to convert between DTD and XML Schema.

Besides your DTD, XML Schema, and sample XML data, you should also document the effort and results of your design (including important considerations and comparisons among any alternatives you thought of), as well as the effort and results of using XMLSPY or other tools.


The course projects, starting with this homework, are to be done in teams of two. Please form teams as soon as possible, and no later than Thursday, Feb. 5. Once two of you decide to form a team, email me both names and email addresses, and I will post them with our wish list (of handins from Assignment 1), which contains a list of all students. Those who do not yet have a teammate may look for one based on the list.


Before class on the due date, send your completed homework to cse391@cs.sunysb.edu (not to the instructor or TA). If your homework consists of more than one file, please zip the files before sending. There should be one file (either a README file or the single file of your homework) that describes where everything is and gives instructions for testing your homework.

Also, in class, hand in a printout of the documentation part of your work. You do not need to print out your code and data.


This homework is worth about 6% of the course grade. Exceptionally well-thought-out and well-written homework will receive appropriate extra credit.