CSE532 (Fall `02)
Advanced Database Systems

[ General Information | Course Outline | Lectures | Handouts | Other Pointers | Requirements ]
[ Selected Projects ] [ Announcements ] [ Proposed Projects ]


General Information

Course description: This course is for students interested in advanced database system design and implementation. The course will quickly cover relational databases and then cover advanced topics in modern database systems, including object-oriented databases, XML databases, distributed databases, and on-line analytical processing. We will discuss various data description and query languages, database design, and query processing and optimization, and also look at distributed object model, and data mining and data warehouses. Students undertake a semester project that includes the design and implementation of a database system, which includes the use of object-oriented features and XML. | Prerequisites: CSE305; or discrete math (sets, relations, functions, logic, etc), data structures and algorithms, good knowledge of Java, plus quick grasp of 305 materials. | Credits: 3.

Instructor: Annie Liu | Email: liuATcsDOTsunysbDOTedu | Office: Computer Science 1433 | Phone: 632-8463.

TA: Bin Tang | Email: bintangATcsDOTsunysbDOTedu

Hours: Tue Thu 11:20-12:40AM, in Computer Science 2129/31.
Annie's office hours: Tue 10-11:20AM, Thu 2-3:20PM, in CS 1433.
Bin's office hours: Mon 1-3 PM, in CS 2110.

Textbook: Database and Transaction Processing: An Application-Oriented Approach by Philip M. Lewis, Arthur Bernstein, and Michael Kifer. Addison Wesley, 2002. You will be given access to the slides from the textbook, but you should also take good course notes yourself.

Grading: A project description and two parts of the course project are worth 5%, 20%, and 10%, respectively, of the grade. A midterm exam and a final exam are worth 25% and 40%, respectively, of the grade. There will also be bonus points with the projects and exams as they fit. No late handins will receive any credit. All work must be done individually unless permitted explicitly; you may discuss with others and look up references, but you must write up your own solutions independently and credit all sources that you used. Any plagiarism or other forms of cheating will result in an F or worse.

Course homepage: http://www.cs.sunysb.edu/~liu/cse532/, containing all course related information.


Course Outline

We will first quickly review relational databases, including entity-relationship model, relational algebra, SQL, relational normalization, and SQL in application programs, and then more carefully discuss relational calculus. Just before discussing relational calculus, we will have a quiz.

We will then focus on two main parts of the course: (1) object-oriented databases, and (2) XML databases. For the first part, we will discuss the motivation and concepts of object data model, the ODMG standard, object-oriented extensions of SQL, and the distributed object model CORBA. For the second part, we will first discuss semistructured data, XML basics and DTD, and XML schema, and then discuss XML query languages including XPath, XSLT, and XQuery. Just before covering XML query languages, we will have a review lecture followed by an in-class midterm exam.

We will next discuses how queries are processed and optimized, followed by a study of distributed databases, and finally cover OLAP together with data mining and data warehouses. At the end, there will be a review lecture and a final exam.

The course project is on the design and implementation of a database system and has two parts. The first part includes the use of object-oriented features; it will start right after the first quiz and be due on the day of the midterm review. The second part includes the use of XML; it will start right after the midterm exam and be due on the day of the final review.


Lectures

Lecture 1 (09/03/02): Overview of course. Review of relational databases: ER model/diagrams to relations/SQL DDL, keys and constraints. Reading: Ch.5. Homework: Handout H1.
Lecture 2 (09/05/02): Relational algebra and SQL DML, aggregates, views, and updates. Reading: Ch.6.
Lecture 3 (09/10/02): Relational normalization: redundancy, anomaly, functional dependence, normal forms BCNF and 3NF, decomposition, lossless, dependency preserving. Reading: Ch.8. Homework: Self review.
Lecture 4 (09/12/02): SQL in the real world: embedded SQL, data passing, status, cursor, connection, transaction, stored procedure; dynamic SQL, parameter, descriptor; JDBC; SQLJ; ODBC. Reading: Ch.10.
No class on Tuesday September 17; follow Monday schedule.
Lecture 5 (09/19/02): Quiz: closed all. Project Part I starts. Homework: Complete project description & team.
Lecture 6 (09/24/02): Relational calculus: tuple RC, relationship with SQL, domain RC. Reading: Ch.7:1-3.
Lecture 7 (09/26/02): Visual query languages, calculi vs algebra, recursion in SQL. Reading: Ch.7:4-7.
Lecture 8 (10/01/02): Object databases: from relational to object-oriented: less redundancies, simpler quries, etc.; conceptural object data model: objects (oid, value) & values (prim, ref, tuple, set), classes (type, method signatures, extent) & types (also 4 kinds), subtypes, database schema & instance. Reading: Ch.16:1-3.
Lecture 9 (10/03/02): Object-relational databases, objects in SQL:1999. Reading: Ch.16:3.4&5.
Lecture 10 (10/08/02): ODMG standard: architecture, ODL, OQL, language bindings. Reading: Ch.16:4.
Lecture 11 (10/10/02): CORBA: IDL, ORB, dynamic invocation, DB services. Reading: Ch.16:6-8.
Lecture 12 (10/15/02): Review. XML databases: description and query of semi-structured, nested, complex data; XML basics: elements, attributes, vs database objects, attribute types. Reading: Ch.17:1-2.2.
Lecture 13 (10/17/02): XML namespaces; DTD; XML Schema: namespaces, include and import, simple and complex types. Reading: Ch.17:2.3-3.3. Homework: Handout H2.
Lecture 14 (10/22/02): XML Schema:more on types,integrity constraints; XPath,XPointer. Reading:Ch.17:3.4-4.1.
Lecture 15 (10/24/02): Project Part I due.   Midterm review: scope, XML review, queries, recursion in SQL.
Midterm (10/29/02): In-class exam: open your own handwritten notes only.   Project Part II starts.
Lecture 17 (10/31/02): XML query languages. XSLT: stylesheet, templates, evaluation. Reading: Ch.17:4.2.
Lecture 18 (11/05/02): XQuery: FLWR expression, evaluation, built-in functions, user-defined functions, aggregation, quantification. Reading: Ch.17:4.3.
Lecture 19 (11/07/02): More XQuery: data and types; XQuery and XML Schema; proj, sel, construction, group, join, recursive function, wildcard types, XqueryX; XPath and XQuery, laws. Reading: Ch.17:4.4-4.5, extra slides.
Lecture 20 (11/12/02): Query processing: external sort, duplicate removal, computing projections, computing selections, access path. Reading: Ch.13:1-3.
Lecture 21 (11/14/02): Computing joins: block nested loops, index-nested loops, sort-merge, hash; star joins: using join index, bitmap indices; choosing indices. Reading: Ch.13:4-8.
Lecture 22 (11/19/02): Query optimization: equivalence rules; heuristics: pushing selections and projections in, pipelining; cost estimation. Reading: Ch.14:1-3.
Lecture 23 (11/21/02): Estimating output size; choosing query evaluation plan; incremental update for optimization, example of computing averages. Reading: Ch.14:4-7.
Lecture 24 (11/26/02): Distributed databases: kinds of data distribution, views of developers; data fragmentation, replication, performance analysis. Reading: Ch.18:1-2.
Thanksgiving break: November 27-30.
Lecture 25 (12/03/02): Query planning,cost analysis,semijoin,pushing. Reading:Ch.18:3-4. Homework:Handout H3.
Lecture 26 (12/05/02): OLAP, data mining, and data warehouses: OLAP, vs OLTP, vs data mining; multidimensional model, star schema; aggregation, drilling, rolling,