CSE532 (Fall `02)
Advanced Database Systems
[ General Information
| Course Outline
| Other Pointers
[ Selected Projects
[ Proposed Projects
This course is for students interested in advanced database system
design and implementation. The course will quickly cover relational
databases and then cover advanced topics in modern database systems,
including object-oriented databases, XML databases, distributed
databases, and on-line analytical processing. We will discuss various
data description and query languages, database design, and query
processing and optimization, and also look at distributed object
model, and data mining and data warehouses. Students undertake a
semester project that includes the design and implementation of a
database system, which includes the use of object-oriented features
and XML. | Prerequisites: CSE305; or discrete math (sets,
relations, functions, logic, etc), data structures and algorithms,
good knowledge of Java, plus quick grasp of 305 materials. |
Instructor: Annie Liu |
Email: liuATcsDOTsunysbDOTedu |
Office: Computer Science 1433 | Phone: 632-8463.
TA: Bin Tang |
Hours: Tue Thu 11:20-12:40AM, in Computer Science 2129/31.
Annie's office hours: Tue 10-11:20AM, Thu 2-3:20PM, in CS 1433.
Bin's office hours: Mon 1-3 PM, in CS 2110.
Textbook: Database and Transaction Processing: An
Application-Oriented Approach by Philip M. Lewis, Arthur
Bernstein, and Michael Kifer. Addison Wesley, 2002.
You will be given
access to the slides from the textbook, but you should also take good
course notes yourself.
Grading: A project description and two parts of the course
project are worth 5%, 20%, and 10%, respectively, of the grade. A
midterm exam and a final exam are worth 25% and 40%, respectively, of
the grade. There will also be bonus points with the projects and
exams as they fit. No late handins will receive any credit. All work
must be done individually unless permitted explicitly; you may discuss
with others and look up references, but you must write up your own
solutions independently and credit all sources that you used. Any
plagiarism or other forms of cheating will result in an F or worse.
Course homepage: http://www.cs.sunysb.edu/~liu/cse532/,
containing all course related information.
We will first quickly review relational databases, including
entity-relationship model, relational algebra, SQL, relational
normalization, and SQL in application programs, and then more
carefully discuss relational calculus. Just before discussing
relational calculus, we will have a quiz.
We will then focus on two main parts of the course: (1)
object-oriented databases, and (2) XML databases. For the first part,
we will discuss the motivation and concepts of object data model, the
ODMG standard, object-oriented extensions of SQL, and the distributed
object model CORBA. For the second part, we will first discuss
semistructured data, XML basics and DTD, and XML schema, and then
discuss XML query languages including XPath, XSLT, and XQuery. Just
before covering XML query languages, we will have a review lecture
followed by an in-class midterm exam.
We will next discuses how queries are processed and optimized,
followed by a study of distributed databases, and finally cover OLAP
together with data mining and data warehouses. At the end, there
will be a review lecture and a final exam.
The course project is on the design and implementation of a
database system and has two parts. The first part includes the use of
object-oriented features; it will start right after the first quiz and
be due on the day of the midterm review. The second part includes the
use of XML; it will start right after the midterm exam and be due on
the day of the final review.
Lecture 1 (09/03/02): Overview of course. Review of
relational databases: ER model/diagrams to
relations/SQL DDL, keys and constraints. Reading: Ch.5. Homework: Handout H1.
Lecture 2 (09/05/02): Relational algebra and
SQL DML, aggregates, views, and updates. Reading: Ch.6.
Lecture 3 (09/10/02): Relational
normalization: redundancy, anomaly, functional dependence, normal
forms BCNF and 3NF, decomposition, lossless, dependency
preserving. Reading: Ch.8.
Homework: Self review.
Lecture 4 (09/12/02): SQL in the real world:
embedded SQL, data passing, status, cursor, connection, transaction,
stored procedure; dynamic SQL, parameter, descriptor; JDBC; SQLJ;
ODBC. Reading: Ch.10.
No class on Tuesday September 17; follow Monday schedule.
Lecture 5 (09/19/02): Quiz: closed
all. Project Part I starts. Homework: Complete project description & team.
Lecture 6 (09/24/02): Relational calculus: tuple RC, relationship with SQL, domain RC. Reading: Ch.7:1-3.
Lecture 7 (09/26/02): Visual query languages,
calculi vs algebra, recursion in SQL. Reading:
Lecture 8 (10/01/02): Object databases: from relational to object-oriented: less redundancies,
simpler quries, etc.; conceptural object data model: objects (oid,
value) & values (prim, ref, tuple, set), classes (type, method
signatures, extent) & types (also 4 kinds), subtypes, database schema
& instance. Reading: Ch.16:1-3.
Lecture 9 (10/03/02): Object-relational
databases, objects in SQL:1999. Reading:
Lecture 10 (10/08/02): ODMG standard:
architecture, ODL, OQL, language bindings. Reading: Ch.16:4.
Lecture 11 (10/10/02): CORBA: IDL, ORB,
dynamic invocation, DB services. Reading:
Lecture 12 (10/15/02): Review. XML databases:
description and query of semi-structured, nested,
complex data; XML basics: elements, attributes, vs database objects,
attribute types. Reading: Ch.17:1-2.2.
Lecture 13 (10/17/02): XML namespaces; DTD;
XML Schema: namespaces, include and import, simple and complex
types. Reading: Ch.17:2.3-3.3.
Homework: Handout H2.
Lecture 14 (10/22/02): XML Schema:more on
types,integrity constraints; XPath,XPointer. Reading:Ch.17:3.4-4.1.
Lecture 15 (10/24/02): Project Part I due.
Midterm review: scope, XML review, queries,
recursion in SQL.
Midterm (10/29/02): In-class exam: open your own handwritten notes only.
Project Part II starts.
Lecture 17 (10/31/02): XML query languages. XSLT: stylesheet, templates, evaluation. Reading: Ch.17:4.2.
Lecture 18 (11/05/02): XQuery: FLWR
expression, evaluation, built-in functions, user-defined functions,
aggregation, quantification. Reading:
Lecture 19 (11/07/02): More XQuery: data and
types; XQuery and XML Schema; proj, sel, construction, group, join,
recursive function, wildcard types, XqueryX; XPath and XQuery,
laws. Reading: Ch.17:4.4-4.5, extra
Lecture 20 (11/12/02): Query processing: external sort, duplicate removal, computing projections,
computing selections, access path. Reading:
Lecture 21 (11/14/02): Computing joins: block
nested loops, index-nested loops, sort-merge, hash; star joins: using
join index, bitmap indices; choosing indices. Reading: Ch.13:4-8.
Lecture 22 (11/19/02): Query optimization: equivalence rules; heuristics: pushing selections and
projections in, pipelining; cost estimation. Reading: Ch.14:1-3.
Lecture 23 (11/21/02): Estimating output
size; choosing query evaluation plan; incremental update for
optimization, example of computing averages. Reading: Ch.14:4-7.
Lecture 24 (11/26/02): Distributed databases: kinds of data distribution, views of developers; data
fragmentation, replication, performance analysis. Reading: Ch.18:1-2.
Thanksgiving break: November 27-30.
Lecture 25 (12/03/02): Query planning,cost
analysis,semijoin,pushing. Reading:Ch.18:3-4. Homework:Handout H3.
Lecture 26 (12/05/02): OLAP, data mining, and data
warehouses: OLAP, vs OLTP, vs data mining;
multidimensional model, star schema; aggregation, drilling, rolling,