Spring 2002
Stony Brook
Principles of Database Systems
Annie Liu
Homework 5
Handout H5
Apr. 26, 2002
Due May 7

Hand in your homework any time on or before May 7 (put under my office door if I am not in).

SQL Triggers and Application Programs, Data Storage and Indexing, Query Processing and Optimization, and Some Basics.
You should understand all basic concepts, not just the ones exercised below. This includes mainly what we covered recently (on SQL triggers and application programs, data storage and indexing, query processing and optimization), and the most basic and important aspects covered earlier (on keys, constraints, and redundancy; and data definitions and queries in SQL).
Note: You will need to understand the problems and the solutions to them, not just memorize the answers, because slightly different questions may lead to completely different answers, and seemingly different questions may have exactly the same answer.
More exercises are parts of Project 4. Please make sure that you can do them yourself; do not just let your teammate do them. If you would like more problems to solve, do other exercises in the textbook.

Problem 0. Timing.
The kinds of problems in this homework are meant to help you prepare for the final, but the amount of work here might not be the same as that required for the final. To help us estimate how many problems to put in the actual exam, you are asked to report how long it takes you to do this homework. Put the time right next to your homework header information (name, etc).

Problem 1. Triggers.
a. What is the basic structure of a trigger? Triggers are a part of which version of SQL standard?
b. Give an example of a trigger.
c. What is a trigger event in general, and what can be a trigger event in SQL?
d. List two of the issues to consider in trigger handling.

Problem 2. Application programs.
a. What is the difference between statement-level interface and call-level interface?
b. What is the difference between embedded SQL and dynamic SQL?
c. What are the similarity and difference between dynamic SQL and a call-level interface?
d. What kinds of properties can a cursor have?
e. Where does stored procedure reside? List two advantages of using stored procedures.
f. What is a transaction in JDBC?

Problem 3. Data storage and indexing.
a. What are the goals in designing efficient data storage methods? What is one of the principles in achieving the goals?
b. Suppose a phone book contains 1000 pages, and each page can contain up to 500 records. Suppose we want to search for a particular name in the book. Give a worst-case bound on the number pages that must be looked at to perform a search using each of the following methods: (i) linear search, as if the book is organized as a heap file, (ii) binary search, since the book is ordered by names, (iii) with an index for the name of the first entry on each page.
c. What is the difference between an integrated storage structure and a separate storage structure?
d. Is an integrated storage structure always clustered? Why?
e. Can an file have more than one clustered index? Why?
f. Can an unclustered index be sparse? Why?

Problem 4. Locating index entries.
a. What are two basic efficient ways of locating index entries?
b. What are the advantages of tree indexing over hash indexing and vice versa?
c. Exercise 11.8 (a) in textbook.
d. Search key value s is to be inserted in the following B+ tree. Show what the tree looks like after the insertion.

    | |  f  | |  p  | |
    /        |        \
   V         V         V
  -----    -----    -----
  -----    -----    -----
e. Suppose a family of hash functions h_k(v)=h(v) mod 2^k is used for extendable hashing. If k=2, and we have the following search key value v and hash function value h(v), form the buckets.
v    h(v)
pete 11010
mary 00000
jane 11110
bill 00000
john 01001
tony 10101
karl 10111
f. Suppose each bucket in the above example can contain at most two records. If the key value lucy, where h(lucy)=10001, is to be inserted in the above file, what happens to k and to the buckets?

Problem 5. Query processing and optimization
Suppose a database has the following schema:
Professor(id: INTEGER, name: STRING, deptId: DEPT)
Teaching(profId: INTEGER, crsCode: COURSE, semester: SEMESTER)

a. Write an SQL query that returns the name of all professors in the CS department who have taught a course in F1994.
b. Translate the SQL query in (a) into a most directly corresponding relational algebra expression.
c. Translate the relational algebra expression in (b) into an equivalent expression using pushing.
d. Translate the relational algebra expression in (c) into a most directly corresponding SQL query.

Problem 6. Database design in context.
Suppose schema S has attributes A, B, and C; A uniquely determines B; and B uniquely determines C. Suppose we decompose S into S1 and S2, such that S1 contains A and B, and S2 contains B and C.
a. What are the keys of S, S1, and S2, respectively?
b. Is the decomposition lossless? Why?
c. Does the decomposition lead to tables that are overall smaller? Why?
d. Does the decomposition lead to tables that enable faster query and update operations? Why?