Steven S. Skiena
How good are Random Trees?
We have seen that binary trees can have heights ranging from $\lg n$ to $n$. How tall are they on average?
Using an intuitive argument, like the one I gave for quicksort, I will convince you that a random tree is usually quite close to balanced. The text contains a more rigorous proof, which you should look at.
Consider the first insertion into an empty tree. This node becomes the root and never changes. Since in a binary search tree all keys less than the root go in the left subtree, the root acts as a partition or pivot element!
Let's say a key is a 'good' pivot element if it is in the center half of the sorted space of keys. Half of the time, our root will be a 'good' pivot element.
The next insertion will form the root of a subtree, and will be drawn at random from the items either > root or < root. Again, half the time each insertion will be a 'good' partition of the appropriate subset of keys.
The bigger half of a good partition contains at most $3n/4$ items. Thus the maximum depth of good splits $k$ satisfies $(3/4)^k \cdot n = 1$, that is, $n = (4/3)^k$,
so $k = \log_{4/3} n$.
Doubling the depth to account for bad splits still makes it $\Theta(\lg n)$ on average!
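To make the constant in this argument explicit, the logarithm base $4/3$ can be converted to base 2. This is only a worked restatement of the intuitive bound, not the rigorous analysis from the text:

$$
\log_{4/3} n = \frac{\lg n}{\lg (4/3)} = \frac{\lg n}{2 - \lg 3} \approx 2.41 \lg n,
\qquad
2 \log_{4/3} n \approx 4.82 \lg n .
$$

So even after doubling for bad splits, the depth is a constant multiple of $\lg n$. The constant is pessimistic, but the growth rate is what matters here.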
On average, random search trees are very good: more careful analysis shows the average depth of a node after $n$ insertions is about $2 \ln n$. Since $2 \ln n \approx 1.386 \lg n$, this is only 39% more than a perfectly balanced tree.
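The claim can be checked empirically. The following is a minimal sketch, not code from the lecture: the names `Node`, `insert`, `average_depth`, and `random_tree_average_depth` are made up for illustration. It builds an unbalanced binary search tree from a shuffled key sequence and reports the mean depth of a node, which can be compared against $2 \ln n$ and $\lg n$.

```python
import math
import random


class Node:
    """A plain (unbalanced) binary search tree node."""

    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Insert key iteratively; return (root, depth of the new node)."""
    if root is None:
        return Node(key), 0
    node, depth = root, 0
    while True:
        depth += 1
        if key < node.key:
            if node.left is None:
                node.left = Node(key)
                return root, depth
            node = node.left
        else:
            if node.right is None:
                node.right = Node(key)
                return root, depth
            node = node.right


def average_depth(keys):
    """Insert keys in the given order and return the mean node depth."""
    root, total = None, 0
    for key in keys:
        root, depth = insert(root, key)
        total += depth
    return total / len(keys)


def random_tree_average_depth(n, seed=0):
    """Average node depth of a BST built from a random permutation of 0..n-1."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    keys = list(range(n))
    rng.shuffle(keys)
    return average_depth(keys)


if __name__ == "__main__":
    n = 2000
    print("measured average depth:", random_tree_average_depth(n))
    print("2 ln n:", 2 * math.log(n))
    print("lg n:", math.log2(n))
```

For moderate $n$ the measured value tends to sit somewhat below $2 \ln n$ because of lower-order terms, but it should stay a small constant multiple of $\lg n$.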
Of course, if we get unlucky and insert keys in sorted order, we are doomed to the worst-case performance.
insert(a)
insert(b)
insert(c)
insert(d)
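The degenerate case is easy to reproduce. This sketch (the function name `bst_height` and the list-based node layout are assumptions chosen for brevity) inserts keys in the given order into an unbalanced binary search tree and returns the resulting height, counted in edges.

```python
def bst_height(keys):
    """Insert keys in order into an unbalanced BST; return its height in edges."""
    root = None  # each node is a mutable list [key, left, right]
    height = 0
    for key in keys:
        if root is None:
            root = [key, None, None]
            continue
        node, depth = root, 0
        while True:
            depth += 1
            side = 1 if key < node[0] else 2  # index 1 = left, 2 = right
            if node[side] is None:
                node[side] = [key, None, None]
                break
            node = node[side]
        height = max(height, depth)
    return height


if __name__ == "__main__":
    print(bst_height("abcd"))  # sorted order: every key goes to the right
    print(bst_height("badc"))  # a luckier order of the same keys
```

Inserting `a, b, c, d` in sorted order produces a chain of height 3, one less than the number of keys, while the order `b, a, d, c` gives height 2.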
What we want is an insertion/deletion procedure which adjusts the tree a little after each insertion, keeping it close enough to balanced so the maximum height is logarithmic, but flexible enough so we can still update fast!
Perfectly Balanced Trees
Perfectly balanced trees require a lot of work to maintain:
If we insert the key 1, we must move every single node in the tree to rebalance it, taking $\Theta(n)$ time.
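One simple way to restore perfect balance, assumed here purely for illustration, is to rebuild the whole tree from its keys in sorted order. The sketch below (with the hypothetical helpers `build_balanced`, `height`, and `inorder`) does exactly that, and it necessarily touches every key once, which is where the linear cost comes from.

```python
def build_balanced(sorted_keys, lo=0, hi=None):
    """Return a perfectly balanced BST over sorted_keys[lo:hi] as (key, left, right)."""
    if hi is None:
        hi = len(sorted_keys)
    if lo >= hi:
        return None
    mid = (lo + hi) // 2  # the middle key becomes the root of this subtree
    return (sorted_keys[mid],
            build_balanced(sorted_keys, lo, mid),
            build_balanced(sorted_keys, mid + 1, hi))


def height(tree):
    """Height in edges; the empty tree has height -1."""
    if tree is None:
        return -1
    return 1 + max(height(tree[1]), height(tree[2]))


def inorder(tree):
    """Keys in sorted order, used to confirm the search-tree property."""
    if tree is None:
        return []
    return inorder(tree[1]) + [tree[0]] + inorder(tree[2])


if __name__ == "__main__":
    keys = list(range(2, 9))             # a full tree on the keys 2..8
    before = build_balanced(keys)
    after = build_balanced([1] + keys)   # inserting 1 forces a full rebuild
    print(height(before), height(after))
```

Every call to `build_balanced` processes all $n$ keys, so paying this price on each insertion would make updates linear rather than logarithmic.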
Therefore, when we talk about "balanced" trees, we mean trees whose height is $O(\lg n)$, so all dictionary operations (insert, delete, search, min/max, successor/predecessor) take $O(\lg n)$ time.
Red-black trees are binary search trees in which each node is assigned a color; the coloring scheme helps us maintain the height as $\Theta(\lg n)$.