CSE532 (Fall `02)
Advanced Database Systems

[ General Information | Course Outline | Lectures | Handouts | Other Pointers | Requirements ]
[ Selected Projects ] [ Announcements ] [ Proposed Projects ]


General Information

Course description: This course is for students interested in advanced database system design and implementation. The course will quickly cover relational databases and then cover advanced topics in modern database systems, including object-oriented databases, XML databases, distributed databases, and on-line analytical processing. We will discuss various data description and query languages, database design, and query processing and optimization, and also look at distributed object model, and data mining and data warehouses. Students undertake a semester project that includes the design and implementation of a database system, which includes the use of object-oriented features and XML. | Prerequisites: CSE305; or discrete math (sets, relations, functions, logic, etc), data structures and algorithms, good knowledge of Java, plus quick grasp of 305 materials. | Credits: 3.

Instructor: Annie Liu | Email: liuATcsDOTsunysbDOTedu | Office: Computer Science 1433 | Phone: 632-8463.

TA: Bin Tang | Email: bintangATcsDOTsunysbDOTedu

Hours: Tue Thu 11:20-12:40AM, in Computer Science 2129/31.
Annie's office hours: Tue 10-11:20AM, Thu 2-3:20PM, in CS 1433.
Bin's office hours: Mon 1-3 PM, in CS 2110.

Textbook: Database and Transaction Processing: An Application-Oriented Approach by Philip M. Lewis, Arthur Bernstein, and Michael Kifer. Addison Wesley, 2002. You will be given access to the slides from the textbook, but you should also take good course notes yourself.

Grading: A project description and two parts of the course project are worth 5%, 20%, and 10%, respectively, of the grade. A midterm exam and a final exam are worth 25% and 40%, respectively, of the grade. There will also be bonus points with the projects and exams as they fit. No late handins will receive any credit. All work must be done individually unless permitted explicitly; you may discuss with others and look up references, but you must write up your own solutions independently and credit all sources that you used. Any plagiarism or other forms of cheating will result in an F or worse.

Course homepage: http://www.cs.sunysb.edu/~liu/cse532/, containing all course related information.


Course Outline

We will first quickly review relational databases, including entity-relationship model, relational algebra, SQL, relational normalization, and SQL in application programs, and then more carefully discuss relational calculus. Just before discussing relational calculus, we will have a quiz.

We will then focus on two main parts of the course: (1) object-oriented databases, and (2) XML databases. For the first part, we will discuss the motivation and concepts of object data model, the ODMG standard, object-oriented extensions of SQL, and the distributed object model CORBA. For the second part, we will first discuss semistructured data, XML basics and DTD, and XML schema, and then discuss XML query languages including XPath, XSLT, and XQuery. Just before covering XML query languages, we will have a review lecture followed by an in-class midterm exam.

We will next discuses how queries are processed and optimized, followed by a study of distributed databases, and finally cover OLAP together with data mining and data warehouses. At the end, there will be a review lecture and a final exam.

The course project is on the design and implementation of a database system and has two parts. The first part includes the use of object-oriented features; it will start right after the first quiz and be due on the day of the midterm review. The second part includes the use of XML; it will start right after the midterm exam and be due on the day of the final review.


Lectures

Lecture 1 (09/03/02): Overview of course. Review of relational databases: ER model/diagrams to relations/SQL DDL, keys and constraints. Reading: Ch.5. Homework: Handout H1.
Lecture 2 (09/05/02): Relational algebra and SQL DML, aggregates, views, and updates. Reading: Ch.6.
Lecture 3 (09/10/02): Relational normalization: redundancy, anomaly, functional dependence, normal forms BCNF and 3NF, decomposition, lossless, dependency preserving. Reading: Ch.8. Homework: Self review.
Lecture 4 (09/12/02): SQL in the real world: embedded SQL, data passing, status, cursor, connection, transaction, stored procedure; dynamic SQL, parameter, descriptor; JDBC; SQLJ; ODBC. Reading: Ch.10.
No class on Tuesday September 17; follow Monday schedule.
Lecture 5 (09/19/02): Quiz: closed all. Project Part I starts. Homework: Complete project description & team.
Lecture 6 (09/24/02): Relational calculus: tuple RC, relationship with SQL, domain RC. Reading: Ch.7:1-3.
Lecture 7 (09/26/02): Visual query languages, calculi vs algebra, recursion in SQL. Reading: Ch.7:4-7.
Lecture 8 (10/01/02): Object databases: from relational to object-oriented: less redundancies, simpler quries, etc.; conceptural object data model: objects (oid, value) & values (prim, ref, tuple, set), classes (type, method signatures, extent) & types (also 4 kinds), subtypes, database schema & instance. Reading: Ch.16:1-3.
Lecture 9 (10/03/02): Object-relational databases, objects in SQL:1999. Reading: Ch.16:3.4&5.
Lecture 10 (10/08/02): ODMG standard: architecture, ODL, OQL, language bindings. Reading: Ch.16:4.
Lecture 11 (10/10/02): CORBA: IDL, ORB, dynamic invocation, DB services. Reading: Ch.16:6-8.
Lecture 12 (10/15/02): Review. XML databases: description and query of semi-structured, nested, complex data; XML basics: elements, attributes, vs database objects, attribute types. Reading: Ch.17:1-2.2.
Lecture 13 (10/17/02): XML namespaces; DTD; XML Schema: namespaces, include and import, simple and complex types. Reading: Ch.17:2.3-3.3. Homework: Handout H2.
Lecture 14 (10/22/02): XML Schema:more on types,integrity constraints; XPath,XPointer. Reading:Ch.17:3.4-4.1.
Lecture 15 (10/24/02): Project Part I due.   Midterm review: scope, XML review, queries, recursion in SQL.
Midterm (10/29/02): In-class exam: open your own handwritten notes only.   Project Part II starts.
Lecture 17 (10/31/02): XML query languages. XSLT: stylesheet, templates, evaluation. Reading: Ch.17:4.2.
Lecture 18 (11/05/02): XQuery: FLWR expression, evaluation, built-in functions, user-defined functions, aggregation, quantification. Reading: Ch.17:4.3.
Lecture 19 (11/07/02): More XQuery: data and types; XQuery and XML Schema; proj, sel, construction, group, join, recursive function, wildcard types, XqueryX; XPath and XQuery, laws. Reading: Ch.17:4.4-4.5, extra slides.
Lecture 20 (11/12/02): Query processing: external sort, duplicate removal, computing projections, computing selections, access path. Reading: Ch.13:1-3.
Lecture 21 (11/14/02): Computing joins: block nested loops, index-nested loops, sort-merge, hash; star joins: using join index, bitmap indices; choosing indices. Reading: Ch.13:4-8.
Lecture 22 (11/19/02): Query optimization: equivalence rules; heuristics: pushing selections and projections in, pipelining; cost estimation. Reading: Ch.14:1-3.
Lecture 23 (11/21/02): Estimating output size; choosing query evaluation plan; incremental update for optimization, example of computing averages. Reading: Ch.14:4-7.
Lecture 24 (11/26/02): Distributed databases: kinds of data distribution, views of developers; data fragmentation, replication, performance analysis. Reading: Ch.18:1-2.
Thanksgiving break: November 27-30.
Lecture 25 (12/03/02): Query planning,cost analysis,semijoin,pushing. Reading:Ch.18:3-4. Homework:Handout H3.
Lecture 26 (12/05/02): OLAP, data mining, and data warehouses: OLAP, vs OLTP, vs data mining; multidimensional model, star schema; aggregation, drilling, rolling, slicing, dicing; CUBE, ROLLUP. Reaching: Ch.19:1-3.
Lecture 27 (12/10/02): Materialized views, ROLAP and MOLAP; data mining, associations, priori algorithm, other kinds, machine learning; data warehouse, ETL tools, metadata, incremental updates. Reading: Ch.19:4-8.
Lecture 28 (12/12/02): Project Part II due.   Final review: scope: data/schema, op/query, impl/cost.
Final Exam (12/19/02): 11:00AM-1:30PM, CS 2129. Final exam: open your own handwritten notes only.

Handouts

Handout Q: Questionnaire

Slides for Ch.  5: Entity-Relationship Model
Slides for Ch.  6: Relational Algebra and SQL
Slides for Ch.  8: Relational Normalization Theory
Slides for Ch.10: SQL in the Real World

Slides for Ch.  7: Relational Calculus and Visual Query Languages

Slides for Ch.16: Object Databases

Slides for Ch.17: XML and Web Data
Slides for Ch.17 new version of the part on XQuery: XQuery
Slides for an XQuery tutorial that includes types in XQuery: XQuery Tutorial

Slides for Ch.13: Query Processing Basics
Slides for Ch.14: Query Optimization Overview

Slides for Ch.18: Distributed Databases
Slides for Ch.19: OLAP and Data Mining

Handout H1: Homework 1: Description of a Database System Application

Handout Qz: Quiz

Handout P1: Project Part I: An Object-Relational Database System

Handout H2: Homework 2 and Solution: Practice Problems for Midterm

Handout M: Midterm Exam
Handout MS: Solution and Grading Guide for Midterm Exam

Handout P2: Project Part II: An XML-Based Database System

Handout H3: Homework 3 and Solution: Practice Problems for Final

Handout F: Final Exam
Handout FS: Solution to Final Exam


Other Pointers

Windows NT Transaction Processing Labs: policies, FAQ, connecting to DB2, DB2 Docs, JBuilder, links, etc.

IBM's DB2 V7 manuals, including
SQL Getting Started (html) | (pdf) | (pdf at Trans Labs)
SQL References, V7.1 (html) | (pdf) | (pdf at Trans Labs)
SQL References, V7.2 (pdf)
Application Development Guide, V7.1 (html) | (direct link to html found by Alex, faster too!) | (pdf)
Application Development Guide, V7.2 (pdf) | (direct link to pdf found by Alex)

Java 1.4.1 API Specification
JDBC 3.0 API Specification

W3C XML Query page: specifications, implementations, links, etc.
Quick reference for XML Path Language (XPath) 1.0
Quick reference for XML Schema (XSD) Part I Structures 1.0 | Part II Datatypes 1.0

Ghostscript, Ghostview and GSview: Ghostview and GSview are for viewing and printing Postscript documents (some of the handouts for this course will be in this format). If you use Linux, then this software is already installed on your machine. On Windows, you need to download both Ghostview and Ghostscript (and also some fonts). Unzip Ghostview and run setup.exe. It will unpack and install the rest.

The gzip homepage: GNU zip for compression.


Requirements

This course includes both theory and implementation aspects, so you should have the prerequisites for both. The official title of the course is Theory of Database Systems, but the official content is really Advanced Database Systems, hence the title we use. For this semester, the exact content is as described at the top of the course homepage, which is very similar to the content of the course in Spring 2002.

You should learn all information on the course homepage. Check the homepage periodically for Announcements.

Do all course work. The projects and exercises are integral parts of the course as they provide concrete experiences with the abstract ideas covered in the class.

The following information must be prominently visible in program file headers and in items submitted as a hard copy: student number, name, email address, course number, project number, due date.

All work must be submitted in a neat and organized fashion, and project reports must be typed and stapled; you may lose points if these conditions are not met.

Your approach to solving problems is as important as your final solution; you must show how you arrived at your solutions and include appropriate explanations.

If you feel your grade was assigned incorrectly, please see the TA who graded your work no later than two weeks after the assignment was returned to the class.

Computing facilities: You will be given an account in the graduate NT machines lab and sbpus Solaris machines room. Never let anyone else use your account; it is against the rules. Please be conscious of security in the lab; theft or vandalism will be punished severely. If you have any problems with the hardware or software in the lab (other than with the requirements of the project itself), please email ntadminATcsDOTsunysbDOTedu with a copy to me; neither the TA nor I could fix such problems.

Disability: If you have a physical, psychological, medical or learning disability that may have an impact on your ability to carry out assigned course work, please contact the staff in the Disabled Student Services office (DSS), Room 133 Humanities, 632-6748/TDD. DSS will review your concerns and determine with you what accommodations are necessary and appropriate. All information and documentation of disability are confidential.


liuATcsDOTsunysbDOTedu