Trees

Contents

Dictionary ADT

Dictionary ADT represents a collection of items, where each item can be a key or a (key, value) pair.

ADT Item Ordered? Duplicates? Implementation
Set key Hash table
Sorted set key Balanced tree
Multiset key Hash table
Sorted multiset key Balanced tree
Map (key, value) Hash table
Sorted map (key, value) Balanced tree
Multimap (key, value) Hash table
Sorted multimap (key, value) Balanced tree

Map

A map is a collection of key-value pairs (k, v), where keys are unique.

Key Value
Dictionary word Word meaning
User ID User record
Employee ID Employee record
Student ID Student record
Patient ID Patient record
Profile ID Person details
Order ID Order details
Transaction ID Transaction details
URL Web page
Full file name File

Set ADT (java.util.Set interface)

Method Functionality
add(e) Adds the element e to S if it is not already present.
remove(e) Removes the element e from S if it is present.
contains(e) Returns whether e is an element of S.
iterator() Returns an iterator of the elements of S.
addAll(T) Updates S to also include all elements of set T, effectively replacing S by S ∪ T.
retainAll(T) Updates S so that it only keeps those elements that are also elements of set T, effectively replacing S by S ∩ T.
removeAll(T) Updates S by removing any of its elements that also occur in set T, effectively replacing S by S − T.

Sorted set ADT (java.util.SortedSet interface)

Method Functionality
first() Returns the smallest element in S.
last() Returns the largest element in S.
ceiling(e) Returns the smallest element ≥ e.
floor(e) Returns the largest element ≤ e.
lower(e) Returns the largest element < e.
higher(e) Returns the smallest element > e.
subSet(e1,e2) Returns an iteration of all elements greater than or equal to e1, but strictly less than e2.
pollFirst() Returns and removes the smallest element in S.
pollLast() Returns and removes the largest element in S.

Multiset ADT

Method Functionality
add(e) Adds a single occurrence of e to the multiset.
contains(e) Returns true if the multiset contains an element equal to e.
count(e) Returns the number of occurrences of e in the multiset.
remove(e) Removes a single occurrence of e from the multiset.
remove(e, n) Removes n occurrences of e from the multiset.
size() Returns the number of elements of the multiset (including duplicates).
iterator() Returns an iteration of all elements of the multiset (repeating those with multiplicity greater than one).

Dictionary operations (for unique keys)

Worst case Average case
Data structure Search Insert Delete Search Insert Delete
Sorted array $\Theta(\log n)$ $\Theta(n)$ $\Theta(n)$ $\Theta(\log n)$ $\Theta(n)$ $\Theta(n)$
Unsorted list $\Theta(n)$ $\Theta(1)$ $\Theta(n)$ $\Theta(n)$ $\Theta(1)$ $\Theta(n)$
Hashing $\Theta(n)$ $\Theta(n)$ $\Theta(n)$ $\Theta(1)^*$ $\Theta(1)^*$ $\Theta(1)^*$
BST $\Theta(n)$ $\Theta(n)$ $\Theta(n)$ $\Theta(\log n)$ $\Theta(\log n)$ $\Theta(\log n)$
Splay tree $\Theta(\log n)^*$ $\Theta(\log n)^*$ $\Theta(\log n)^*$ $\Theta(\log n)^*$ $\Theta(\log n)^*$ $\Theta(\log n)^*$
Scapegoat tree $\Theta(\log n)$ $\Theta(\log n)^*$ $\Theta(\log n)^*$ $\Theta(\log n)$ $\Theta(\log n)$ $\Theta(\log n)$
AVL tree $\Theta(\log n)$ $\Theta(\log n)$ $\Theta(\log n)$ $\Theta(\log n)$ $\Theta(\log n)$ $\Theta(\log n)$
Red-black tree $\Theta(\log n)$ $\Theta(\log n)$ $\Theta(\log n)$ $\Theta(\log n)$ $\Theta(\log n)$ $\Theta(\log n)$
AA tree $\Theta(\log n)$ $\Theta(\log n)$ $\Theta(\log n)$ $\Theta(\log n)$ $\Theta(\log n)$ $\Theta(\log n)$
(a,b)-tree $\Theta(\log n)$ $\Theta(\log n)$ $\Theta(\log n)$ $\Theta(\log n)$ $\Theta(\log n)$ $\Theta(\log n)$
B-tree $\Theta(\log n)$ $\Theta(\log n)$ $\Theta(\log n)$ $\Theta(\log n)$ $\Theta(\log n)$ $\Theta(\log n)$

* = Amortized

General Trees

Family tree

Family tree

Company organization tree

Company organization tree

File system tree

File system tree

Book organization tree

Book organization tree

Terminology

Term Meaning
Tree ADT that stores elements hierarchically
Parent node Immediate previous-level node
Child nodes Immediate next-level nodes
Root node Top node of the tree
Sibling nodes Nodes that are children of the same parent
External nodes Nodes without children
Internal nodes Nodes with one or more children
Ancestor node Parent node or ancestor of parent node
Descendent node Child node or descendent of child node
Subtree Tree consisting of the node and its descendants
Edge Pair of nodes denoting a parent-child relation
Path Pair of nodes denoting an ancestor-descendant relation
Ordered tree Tree with a meaningful linear order among child nodes

Terminology

Terminology

Binary Trees

A binary tree is an ordered tree with the following properties:

  1. Every node has at most two children.
  2. Each child node is labeled as a left child or a right child.
  3. A left child precedes a right child in the order of children.

A recursive definition of the binary tree:

Decision tree

Decision tree

Arithmetic expression tree

Arithmetic expression tree

Tree represents $((((3+1)*3)/((9-5)+2))-((3*(7-4))+6))$.

Terminology

Term Meaning
Left subtree Subtree rooted at the left child of an internal node
Right subtree Subtree rooted at the right child of an internal node
Proper/full tree A tree in which every node has either 0 or 2 children
Complete tree Tree in which all except possibly the last level is completely filled and the nodes in the last level are as far left as possible
Perfect tree Complete tree in which the last level is completely filled

Tree examples

Binary tree normal

Binary ✓, Proper ✗, Complete ✗, Perfect ✗

Proper binary tree

Binary ✓, Proper ✓, Complete ✗, Perfect ✗

Proper binary tree 2

Binary ✓, Proper ✓, Complete ✗, Perfect ✗

Complete binary tree

Binary ✓, Proper ✗, Complete ✓, Perfect ✗

Complete binary tree 2

Binary ✓, Proper ✓, Complete ✓, Perfect ✗

Perfect binary tree

Binary ✓, Proper ✓, Complete ✓, Perfect ✓

Levels and maximum number of nodes

Levels and maximum number of nodes

Properties of binary tree

Let

Then

Properties of proper binary tree

If $T$ is a proper nonempty binary tree,

Implementing a binary tree using linked structure

Implementing a binary tree using linked structure 1 Implementing a binary tree using linked structure 2

Implementing a binary tree using array

Implementing a binary tree using array 1
Implementing a binary tree using array 2

Implementing a binary tree using array

Implementing a binary tree using array 1
Implementing a binary tree using array 2

Implementing a binary tree using array

$$ f(p) = \begin{cases} 0 & \text{if $p$ is the root} \\ 2f(q) + 1 & \text{if $p$ is the left child of position $q$} \\ 2f(q) + 2 & \text{if $p$ is the right child of position $q$} \end{cases} $$

Implementing a general tree using linked structure

Implementing a general tree using linked structure 1 Implementing a general tree using linked structure 2

Tree traversals


Traversal Binary tree? General tree?
Preorder traversal
Inorder traversal
Postorder traversal
Breadth-first traversal

Preorder / inorder / postorder traversals

\begin{algorithm}
\begin{algorithmic}
\FUNCTION{PreorderTraversal}{$root$}
    \IF{$root \ne null$}
        \STATE \CALL{Visit}{$root$}
        \STATE \CALL{PreorderTraversal}{$root.left$}
        \STATE \CALL{PreorderTraversal}{$root.right$}
    \ENDIF
\ENDFUNCTION
\end{algorithmic}
\end{algorithm}
\begin{algorithm}
\begin{algorithmic}
\FUNCTION{InorderTraversal}{$root$}
    \IF{$root \ne null$}
        \STATE \CALL{InorderTraversal}{$root.left$}
        \STATE \CALL{Visit}{$root$}
        \STATE \CALL{InorderTraversal}{$root.right$}
    \ENDIF
\ENDFUNCTION
\end{algorithmic}
\end{algorithm}
\begin{algorithm}
\begin{algorithmic}
\FUNCTION{PostorderTraversal}{$root$}
    \IF{$root \ne null$}
        \STATE \CALL{PostorderTraversal}{$root.left$}
        \STATE \CALL{PostorderTraversal}{$root.right$}
        \STATE \CALL{Visit}{$root$}
    \ENDIF
\ENDFUNCTION
\end{algorithmic}
\end{algorithm}
Preorder inorder postorder traversals 1 Preorder inorder postorder traversals 2

Preorder traversal

Preorder traversal 1

Preorder traversal: Table of contents

Preorder traversal table of contents 1

Postorder traversal

Postorder traversal 1

Postorder traversal: Compute disk space

\begin{algorithm}
\begin{algorithmic}
\FUNCTION{ComputeDiskSpace}{$root$}
    \STATE $space \gets root.key$
    \FORALL{child $child$ of $root$ node}
        \STATE $space \gets space +$ \CALL{ComputeDiskSpace}{$root.child$}
    \ENDFOR
    \RETURN $space$
\ENDFUNCTION
\end{algorithmic}
\end{algorithm}

Inorder traversal: Arithmetic expression

Inorder traversal arithmetic expression 1

$(((3+1) \times 3) / ((9-5) + 2)) - ((3 \times (7-4)) + 6)$

Breadth-first traversal: Game trees

Breadth-first traversal game trees 1

Breadth-first traversal

General tree.

\begin{algorithm}
\begin{algorithmic}
\FUNCTION{BreadthFirstTraversal}{$root$}
    \STATE $Q$.Enqueue($root$)
    \WHILE{$Q$ is not empty}
        \STATE $curr \gets Q$.Dequeue$()$
        \STATE \CALL{Visit}{$curr$}
        \FORALL{child $child$ of $curr$ node}
            \STATE $Q$.Enqueue$(curr.child)$
        \ENDFOR
    \ENDWHILE
\ENDFUNCTION
\end{algorithmic}
\end{algorithm}

Binary tree.

\begin{algorithm}
\begin{algorithmic}
\FUNCTION{BreadthFirstTraversal}{$root$}
    \STATE $Q$.Enqueue($root$)
    \WHILE{$Q$ is not empty}
        \STATE $curr \gets Q$.Dequeue$()$
        \STATE \CALL{Visit}{$curr$}
        \IF{$curr.left \ne null$}
            \STATE $Q$.Enqueue$(curr.left)$
        \ENDIF
        \IF{$curr.right \ne null$}
            \STATE $Q$.Enqueue$(curr.right)$
        \ENDIF
    \ENDWHILE
\ENDFUNCTION
\end{algorithmic}
\end{algorithm}

Binary Search Trees (BST)

Binary search tree (BST)

A binary search tree is a binary tree $T$ such that, for each internal node $p$ of $T$:

Binary search tree BST 1

Binary search tree node

class Node<T>
{
    T key;
    Node<T> left;
    Node<T> right;

    Node(T item, Node<T> lchild, Node<T> rchild)
    { key = item; left = lchild; right = rchild; }

    Node(T item)
    { this(item, null, null); }
}

Search: 65 exists

Search 65 exists 1

Search: 68 does not exist

Search 68 does not exist 1

Search: Recursive algorithm

\begin{algorithm}
\begin{algorithmic}
\FUNCTION{Search}{$curr, target$}
    \IF{$curr = null$}
        \RETURN $curr$ \COMMENT{unsuccessful search}
    \ENDIF
    \IF{$target < curr.key$}
        \RETURN \CALL{Search}{$curr.left, target$} \COMMENT{recur on left subtree}
    \ENDIF
    \IF{$target > curr.key$}
        \RETURN \CALL{Search}{$curr.right, target$} \COMMENT{recur on right subtree}
    \ENDIF
    \IF{$target = curr.key$}
        \RETURN $curr$ \COMMENT{successful search}
    \ENDIF
\ENDFUNCTION
\end{algorithmic}
\end{algorithm}

Search: Non-recursive algorithm

\begin{algorithm}
\begin{algorithmic}
\FUNCTION{Search}{$curr, target$}
    \WHILE{$curr \ne null$}
        \IF{$target < curr.key$}
            \STATE $curr \gets curr.left$ \COMMENT{recur on left subtree}
        \ELSIF{$target > curr.key$}
            \STATE $curr \gets curr.right$ \COMMENT{recur on right subtree}
        \ELSIF{$target = curr.key$}
            \RETURN $curr$ \COMMENT{successful search}
        \ENDIF
    \ENDWHILE
    \RETURN $null$ \COMMENT{unsuccessful search}
\ENDFUNCTION
\end{algorithmic}
\end{algorithm}

Search: Analysis

Search analysis 1

$\text{Time} = O(h) = O(n)$

Add 68

Add 68 1

Add 68 2

Add: Recursive algorithm

\begin{algorithm}
\begin{algorithmic}
\FUNCTION{Add}{$curr, item$}
    \IF{$curr = null$}
        \STATE $curr \gets$ Node$(item)$ \COMMENT{item does not exist}
    \ELSIF{$curr \ne null$}
        \IF{$item < curr.key$}
            \STATE $curr.left \gets$ \CALL{Add}{$curr.left, item$} \COMMENT{recur on left subtree}
        \ELSIF{$item > curr.key$}
            \STATE $curr.right \gets$ \CALL{Add}{$curr.right, item$} \COMMENT{recur on right subtree}
        \ELSIF{$item = curr.key$}
            \STATE do nothing \COMMENT{item exists}
        \ENDIF
    \ENDIF

    \RETURN $curr$
\ENDFUNCTION
\end{algorithmic}
\end{algorithm}

Add: Non-recursive algorithm

\begin{algorithm}
\begin{algorithmic}
\FUNCTION{Add}{$curr, item$}
    \STATE $parent \gets null$
    \WHILE{$curr \ne null$}
        \STATE $parent \gets curr$
        \IF{$item < curr.key$}
            \STATE $curr \gets curr.left$ \COMMENT{recur on left subtree}
        \ELSIF{$item > curr.key$}
            \STATE $curr \gets curr.right$ \COMMENT{recur on right subtree}
        \ELSIF{$item = curr.key$}
            \RETURN $curr$ \COMMENT{item exists}
        \ENDIF
    \ENDWHILE

    \STATE $curr \gets$ Node$(item)$ \COMMENT{item does not exist}
    \IF{$parent \ne null$}
        \IF{$item < parent.key$}
            \STATE $parent.left \gets curr$
        \ENDIF
        \IF{$item > parent.key$}
            \STATE $parent.right \gets curr$
        \ENDIF
    \ENDIF

    \RETURN $curr$
\ENDFUNCTION
\end{algorithmic}
\end{algorithm}

Add: Analysis

Add analysis 1

$\text{Time} = O(h) = O(n)$

Remove 32: Node 32 has one child

Remove 32 node 32 has one child 1

Remove 32 node 32 has one child 2

Remove 88: Node 88 has two children

Remove 88 node 88 has two children 1

Remove 88 node 88 has two children 2

Remove

Removing a node (with a particular key) has four cases:

  1. Node is not found.
    Do nothing.
  2. Node is found and it has 0 nonempty children.
    Remove the node.
  3. Node is found and it has 1 nonempty child.
    Remove the node.
    Its nonempty child will take the location of the node.
  4. Node is found and it has 2 nonempty children.
    Locate the predecessor of the node.
    Predecessor = curr.left.right.right........right
    Predecessor will take the location of the node.
    Predecessor's left child will take the location of the predecessor.
    (Can we use successor instead of predecessor?)

Remove: Recursive algorithm

\begin{algorithm}
\begin{algorithmic}
\FUNCTION{Remove}{$curr, item$}
    \IF{$curr = null$}
        \STATE do nothing \COMMENT{item does not exist}
    \ELSIF{$item < curr.key$}
        \STATE $curr.left \gets$ \CALL{Remove}{$curr.left, item$} \COMMENT{recur on left}
    \ELSIF{$item > curr.key$}
        \STATE $curr.right \gets$ \CALL{Remove}{$curr.right, item$} \COMMENT{recur on right}
    \ELSE
        \IF{$curr.left = null$}
            \STATE $curr \gets curr.right$ \COMMENT{0 or 1 child}
        \ELSIF{$curr.right = null$}
            \STATE $curr \gets curr.left$ \COMMENT{1 child}
        \ELSE
            \STATE $curr.key \gets$ \CALL{Maximum}{$curr.left$}.key \COMMENT{find predecessor}
            \STATE $curr.left \gets$ \CALL{Remove}{$curr.left, curr.key$} \COMMENT{remove predecessor}
        \ENDIF
    \ENDIF

    \RETURN $curr$
\ENDFUNCTION
\end{algorithmic}
\end{algorithm}

Remove: Non-recursive algorithm

How do you write a non-recursive algorithm for removing an item?

Remove: Analysis

Remove analysis 1

$\text{Runtime} = O(h) = O(n)$

Balanced Search Trees

Balanced search trees: Motivation

Data structure Search Add Remove
Binary search tree $O(n)$ $O(n)$ $O(n)$
Balanced search tree $O(\log n)$ $O(\log n)$ $O(\log n)$

(2,4)-trees

(2,4)-trees 1

There are three types of non-empty nodes:

Search: 24 exists

Search 24 exists 1

Search: 12 does not exist

Search 12 does not exist 1

Add 17

Add 17 1

Size and depth properties are satisfied.

Add 17 2

Overflow: Size property is violated at [13 14 15 17].

Add 17 3

Size property at [13 14 15 17] will be fixed via split operation.

Add 17 4

Overflow: Size property is violated at [5 10 12 15].

Add 17 5

Size property at [5 10 12 15] will be fixed via split operation.

Add 17 6

Size and depth properties are satisfied.

Add: Node split

Add node split 1 Add node split 2 Add node split 3

Add 4, 6, 12, 15

Add 4 6 12 15 1 Add 4 6 12 15 2 Add 4 6 12 15 3
Add 4 6 12 15 4 Add 4 6 12 15 5

Add 3, 5

Add 3 5 1 Add 3 5 2
Add 3 5 3 Add 3 5 4

Add 10, 8

Add 10 8 1
Add 10 8 2 Add 10 8 3

Remove 4

Remove 4 1

Underflow: Size property is violated is [4].

Remove 4 2

Size property will be fixed via transfer operation.

Remove 12

Remove 12 1

Remove 12 2

Underflow: Size property is violated is [12], which has non-empty children. It will be fixed via swap with predecessor.

Underflow: Size property is violated is [11].

Remove 12 3

Size property will be fixed via fusion operation.

Remove 12 4

Remove 13

Remove 13 1

Remove 13 2

Remove

$n_e =$ node with empty children

$n_{\ne e} =$ node with non-empty children

$s_{3,4} =$ immediate sibling of $n_e$ is a 3-node or a 4-node

$s_2 =$ immediate sibling of $n_e$ is a 2-node

$p =$ parent of $n_e$

(2,4)-trees: Complexity

Method Running time
Search $O(\log n)$
Add $O(\log n)$
Remove $O(\log n)$

B Trees

Computer memory

Computer memory 1

Cache-efficient algorithms: Example

How do you efficiently sort a 1 GB file of natural numbers

Do you want to use quicksort or merge sort, usually implemented in a standard library's sorting algorithm? Your computer program might still take hours to run. Reason? Your algorithm is computation-efficient but not communication-efficient and communication is more expensive than computation.

Reducing communication (via good use of cache) leads to reduced running time. An algorithm that makes good use of cache is called cache-efficient. A cache-efficient sorting algorithm might take just a few minutes to sort a 1 GB file of numbers.

Example: External-memory merge sort.

Cache data locality

An algorithm must have the following two features in order to make good use of cache.

  1. Spatial data locality
  2. Temporal data locality

Spatial data locality:

Temporal data locality:

Cache complexity

Cache complexity diagram 1

Cache-efficient algorithms

Problem Cache-inefficient algorithm Cache-efficient algorithm
Sorting Merge sort
$O(n \log n)$
Ext-memory merge sort
$O(\frac{n}{B}\log_{\frac{M}{B}}\frac{n}{B})$
Balanced tree (2,4)-tree
$O(\log n)$
B tree
$O(\log_B n)$
Matrix product Iterative
$O(n^3)$
Recursive D&C
$O(\frac{n^3}{B\sqrt{M}})$

$(a, b)$-trees

  1. $2 \le a \le (b+1)/2$
  2. Size property. Every non-empty node has children in the range $[a, b]$.
  3. Depth property. All empty nodes have the same depth.

B trees

B trees 1

B trees: Complexity

(2,4)-tree B tree
Method Communication Computation Communication Computation
Search $O(\log n)$ $O(\log n)$ $O(\log_B n)$ $O(\log n)$
Add $O(\log n)$ $O(\log n)$ $O(\log_B n)$ $O(\log n)$
Remove $O(\log n)$ $O(\log n)$ $O(\log_B n)$ $O(\log n)$

Applications

B trees (and variants such as B+ trees, B* trees, B# trees) are used for file systems and databases.