13.2.4.
You need to use integration. E.g., use "dx" as the track-width, and set up a formula
to integrate over. The answer you will get is actually quite close to 1/3.
#2b (#2a is easier).
Explanation:
IF there are say 9 keys, then three of these keys will appear in 3 tuples each, three will appear
in 2 records each, and the remaining three will appear in 1 record each.
The above is just an example to explain the meaning of the first statement of the question.
HINT:
Consider the three cases of number of duplicates separately (each occuring with a
probability of 1/3). For each case, figure out the probability that you will have
to access ONE OR TWO buckets (note that the records for any key can be in at most 2
buckets). Then, sum up the appropriate times to find the expected I/O cost.
14.2.8.
See the solution on the textbook's webpage for (a). Based on that, you should be
able to solve (b) and (c).
For solution to (a): See http://infolab.stanford.edu/~ullman/dscbsols/sol13.html,
13.3.9 (a).
14.2.9.
Start with an empty tree, and insert a few keys in the order. You will begin to
see a pattern.
14.3.8.
This is perhaps the most difficult problem (so leave it for the end).
(a)
There are total n PHYSICAL buckets.
How many VIRTUAL buckets are there? Note that the n is at least 1/2 the TOTAL (physical
and virtual) number of buckets. This can be computed as a function of n (I leave it to
you to figure out). E.g., if n = 5, then the number of virtual buckets is 3.
If n = 11, then the number of virtual buckets is 5. HINT: Use the log and the CEILING
functions.
Let V be the number of virtual bucket, and T be the total number.
Then, each of the (physical and virtual) buckets gets r/T tuples. BUT, since the virtual
buckets don't exist -- their tuples go to some of the physical buckets. So, SOME of the
physical buckets will get 2r/T tuples, while the others will get r/T tuples.
You need to find out how many of the n buckets get 2r/T tuples, and how many get r/T tuples.
The number of blocks to store 2r/T tuples is CEILING(2r/kT).
(b)
This is a more difficult version of (a) -- you will have to look up some properties of
poisson distribution.
Grid Problem.
Consider the "effect" of what ONE grid line (either vertical or horizontal) has on the
x=y line.
14.5.6.
(a) is straightforward.
(b) requires you to differentiate the formula in (a), and THEN ARGUE about when the function
minimizes.
14.6.7.
Straightforward (once you understand k-d trees).
14.6.11
As in 14.6.7, here too you will have to traverse MULTIPLE children at each node. The (average)
number of children you need to traverse depends on the average overlap between the subregions
of the children.
More hint:
The point is that: In a "where am I query" when you are going down the tree to
find all the rectangles that contain a point — lets say you are at a node P in
the tree. The big question is on an average how many children of P should you
explore? The 150% number will help you answer that question. Once you answer
that — rest is simple.