Dictionary ADT represents a collection of items, where each item can be a key or a (key, value) pair.
| ADT | Item | Ordered? | Duplicates? | Implementation |
|---|---|---|---|---|
| Set | key | ✗ | ✗ | Hash table |
| Sorted set | key | ✓ | ✗ | Balanced tree |
| Multiset | key | ✗ | ✓ | Hash table |
| Sorted multiset | key | ✓ | ✓ | Balanced tree |
| Map | (key, value) | ✗ | ✗ | Hash table |
| Sorted map | (key, value) | ✓ | ✗ | Balanced tree |
| Multimap | (key, value) | ✗ | ✓ | Hash table |
| Sorted multimap | (key, value) | ✓ | ✓ | Balanced tree |
A map is a collection of key-value pairs (k, v), where keys are unique.
| Key | Value |
|---|---|
| Dictionary word | Word meaning |
| User ID | User record |
| Employee ID | Employee record |
| Student ID | Student record |
| Patient ID | Patient record |
| Profile ID | Person details |
| Order ID | Order details |
| Transaction ID | Transaction details |
| URL | Web page |
| Full file name | File |
| Method | Functionality |
|---|---|
| add(e) | Adds the element e to S if it is not already present. |
| remove(e) | Removes the element e from S if it is present. |
| contains(e) | Returns whether e is an element of S. |
| iterator() | Returns an iterator of the elements of S. |
| addAll(T) | Updates S to also include all elements of set T, effectively replacing S by S ∪ T. |
| retainAll(T) | Updates S so that it only keeps those elements that are also elements of set T, effectively replacing S by S ∩ T. |
| removeAll(T) | Updates S by removing any of its elements that also occur in set T, effectively replacing S by S − T. |
| Method | Functionality |
|---|---|
| first() | Returns the smallest element in S. |
| last() | Returns the largest element in S. |
| ceiling(e) | Returns the smallest element ≥ e. |
| floor(e) | Returns the largest element ≤ e. |
| lower(e) | Returns the largest element < e. |
| higher(e) | Returns the smallest element > e. |
| subSet(e1,e2) | Returns an iteration of all elements greater than or equal to e1, but strictly less than e2. |
| pollFirst() | Returns and removes the smallest element in S. |
| pollLast() | Returns and removes the largest element in S. |
| Method | Functionality |
|---|---|
| add(e) | Adds a single occurrence of e to the multiset. |
| contains(e) | Returns true if the multiset contains an element equal to e. |
| count(e) | Returns the number of occurrences of e in the multiset. |
| remove(e) | Removes a single occurrence of e from the multiset. |
| remove(e, n) | Removes n occurrences of e from the multiset. |
| size() | Returns the number of elements of the multiset (including duplicates). |
| iterator() | Returns an iteration of all elements of the multiset (repeating those with multiplicity greater than one). |
| Worst case | Average case | |||||
|---|---|---|---|---|---|---|
| Data structure | Search | Insert | Delete | Search | Insert | Delete |
| Sorted array | $\Theta(\log n)$ | $\Theta(n)$ | $\Theta(n)$ | $\Theta(\log n)$ | $\Theta(n)$ | $\Theta(n)$ |
| Unsorted list | $\Theta(n)$ | $\Theta(1)$ | $\Theta(n)$ | $\Theta(n)$ | $\Theta(1)$ | $\Theta(n)$ |
| Hashing | $\Theta(n)$ | $\Theta(n)$ | $\Theta(n)$ | $\Theta(1)^*$ | $\Theta(1)^*$ | $\Theta(1)^*$ |
| BST | $\Theta(n)$ | $\Theta(n)$ | $\Theta(n)$ | $\Theta(\log n)$ | $\Theta(\log n)$ | $\Theta(\log n)$ |
| Splay tree | $\Theta(\log n)^*$ | $\Theta(\log n)^*$ | $\Theta(\log n)^*$ | $\Theta(\log n)^*$ | $\Theta(\log n)^*$ | $\Theta(\log n)^*$ |
| Scapegoat tree | $\Theta(\log n)$ | $\Theta(\log n)^*$ | $\Theta(\log n)^*$ | $\Theta(\log n)$ | $\Theta(\log n)$ | $\Theta(\log n)$ |
| AVL tree | $\Theta(\log n)$ | $\Theta(\log n)$ | $\Theta(\log n)$ | $\Theta(\log n)$ | $\Theta(\log n)$ | $\Theta(\log n)$ |
| Red-black tree | $\Theta(\log n)$ | $\Theta(\log n)$ | $\Theta(\log n)$ | $\Theta(\log n)$ | $\Theta(\log n)$ | $\Theta(\log n)$ |
| AA tree | $\Theta(\log n)$ | $\Theta(\log n)$ | $\Theta(\log n)$ | $\Theta(\log n)$ | $\Theta(\log n)$ | $\Theta(\log n)$ |
| (a,b)-tree | $\Theta(\log n)$ | $\Theta(\log n)$ | $\Theta(\log n)$ | $\Theta(\log n)$ | $\Theta(\log n)$ | $\Theta(\log n)$ |
| B-tree | $\Theta(\log n)$ | $\Theta(\log n)$ | $\Theta(\log n)$ | $\Theta(\log n)$ | $\Theta(\log n)$ | $\Theta(\log n)$ |
* = Amortized
| Term | Meaning |
|---|---|
| Tree | ADT that stores elements hierarchically |
| Parent node | Immediate previous-level node |
| Child nodes | Immediate next-level nodes |
| Root node | Top node of the tree |
| Sibling nodes | Nodes that are children of the same parent |
| External nodes | Nodes without children |
| Internal nodes | Nodes with one or more children |
| Ancestor node | Parent node or ancestor of parent node |
| Descendent node | Child node or descendent of child node |
| Subtree | Tree consisting of the node and its descendants |
| Edge | Pair of nodes denoting a parent-child relation |
| Path | Pair of nodes denoting an ancestor-descendant relation |
| Ordered tree | Tree with a meaningful linear order among child nodes |
A binary tree is an ordered tree with the following properties:
A recursive definition of the binary tree:
Tree represents $((((3+1)*3)/((9-5)+2))-((3*(7-4))+6))$.
| Term | Meaning |
|---|---|
| Left subtree | Subtree rooted at the left child of an internal node |
| Right subtree | Subtree rooted at the right child of an internal node |
| Proper/full tree | A tree in which every node has either 0 or 2 children |
| Complete tree | Tree in which all except possibly the last level is completely filled and the nodes in the last level are as far left as possible |
| Perfect tree | Complete tree in which the last level is completely filled |
Binary ✓, Proper ✗, Complete ✗, Perfect ✗
Binary ✓, Proper ✓, Complete ✗, Perfect ✗
Binary ✓, Proper ✓, Complete ✗, Perfect ✗
Binary ✓, Proper ✗, Complete ✓, Perfect ✗
Binary ✓, Proper ✓, Complete ✓, Perfect ✗
Binary ✓, Proper ✓, Complete ✓, Perfect ✓
Let
Then
If $T$ is a proper nonempty binary tree,
$$ f(p) = \begin{cases} 0 & \text{if $p$ is the root} \\ 2f(q) + 1 & \text{if $p$ is the left child of position $q$} \\ 2f(q) + 2 & \text{if $p$ is the right child of position $q$} \end{cases} $$
| Traversal | Binary tree? | General tree? |
|---|---|---|
| Preorder traversal | ✓ | ✓ |
| Inorder traversal | ✓ | ✗ |
| Postorder traversal | ✓ | ✓ |
| Breadth-first traversal | ✓ | ✓ |
\begin{algorithm}
\begin{algorithmic}
\FUNCTION{PreorderTraversal}{$root$}
\IF{$root \ne null$}
\STATE \CALL{Visit}{$root$}
\STATE \CALL{PreorderTraversal}{$root.left$}
\STATE \CALL{PreorderTraversal}{$root.right$}
\ENDIF
\ENDFUNCTION
\end{algorithmic}
\end{algorithm}
\begin{algorithm}
\begin{algorithmic}
\FUNCTION{InorderTraversal}{$root$}
\IF{$root \ne null$}
\STATE \CALL{InorderTraversal}{$root.left$}
\STATE \CALL{Visit}{$root$}
\STATE \CALL{InorderTraversal}{$root.right$}
\ENDIF
\ENDFUNCTION
\end{algorithmic}
\end{algorithm}
\begin{algorithm}
\begin{algorithmic}
\FUNCTION{PostorderTraversal}{$root$}
\IF{$root \ne null$}
\STATE \CALL{PostorderTraversal}{$root.left$}
\STATE \CALL{PostorderTraversal}{$root.right$}
\STATE \CALL{Visit}{$root$}
\ENDIF
\ENDFUNCTION
\end{algorithmic}
\end{algorithm}
\begin{algorithm}
\begin{algorithmic}
\FUNCTION{ComputeDiskSpace}{$root$}
\STATE $space \gets root.key$
\FORALL{child $child$ of $root$ node}
\STATE $space \gets space +$ \CALL{ComputeDiskSpace}{$root.child$}
\ENDFOR
\RETURN $space$
\ENDFUNCTION
\end{algorithmic}
\end{algorithm}
$(((3+1) \times 3) / ((9-5) + 2)) - ((3 \times (7-4)) + 6)$
General tree.
\begin{algorithm}
\begin{algorithmic}
\FUNCTION{BreadthFirstTraversal}{$root$}
\STATE $Q$.Enqueue($root$)
\WHILE{$Q$ is not empty}
\STATE $curr \gets Q$.Dequeue$()$
\STATE \CALL{Visit}{$curr$}
\FORALL{child $child$ of $curr$ node}
\STATE $Q$.Enqueue$(curr.child)$
\ENDFOR
\ENDWHILE
\ENDFUNCTION
\end{algorithmic}
\end{algorithm}
Binary tree.
\begin{algorithm}
\begin{algorithmic}
\FUNCTION{BreadthFirstTraversal}{$root$}
\STATE $Q$.Enqueue($root$)
\WHILE{$Q$ is not empty}
\STATE $curr \gets Q$.Dequeue$()$
\STATE \CALL{Visit}{$curr$}
\IF{$curr.left \ne null$}
\STATE $Q$.Enqueue$(curr.left)$
\ENDIF
\IF{$curr.right \ne null$}
\STATE $Q$.Enqueue$(curr.right)$
\ENDIF
\ENDWHILE
\ENDFUNCTION
\end{algorithmic}
\end{algorithm}
A binary search tree is a binary tree $T$ such that, for each internal node $p$ of $T$:
class Node<T>
{
T key;
Node<T> left;
Node<T> right;
Node(T item, Node<T> lchild, Node<T> rchild)
{ key = item; left = lchild; right = rchild; }
Node(T item)
{ this(item, null, null); }
}
\begin{algorithm}
\begin{algorithmic}
\FUNCTION{Search}{$curr, target$}
\IF{$curr = null$}
\RETURN $curr$ \COMMENT{unsuccessful search}
\ENDIF
\IF{$target < curr.key$}
\RETURN \CALL{Search}{$curr.left, target$} \COMMENT{recur on left subtree}
\ENDIF
\IF{$target > curr.key$}
\RETURN \CALL{Search}{$curr.right, target$} \COMMENT{recur on right subtree}
\ENDIF
\IF{$target = curr.key$}
\RETURN $curr$ \COMMENT{successful search}
\ENDIF
\ENDFUNCTION
\end{algorithmic}
\end{algorithm}
\begin{algorithm}
\begin{algorithmic}
\FUNCTION{Search}{$curr, target$}
\WHILE{$curr \ne null$}
\IF{$target < curr.key$}
\STATE $curr \gets curr.left$ \COMMENT{recur on left subtree}
\ELSIF{$target > curr.key$}
\STATE $curr \gets curr.right$ \COMMENT{recur on right subtree}
\ELSIF{$target = curr.key$}
\RETURN $curr$ \COMMENT{successful search}
\ENDIF
\ENDWHILE
\RETURN $null$ \COMMENT{unsuccessful search}
\ENDFUNCTION
\end{algorithmic}
\end{algorithm}
$\text{Time} = O(h) = O(n)$
\begin{algorithm}
\begin{algorithmic}
\FUNCTION{Add}{$curr, item$}
\IF{$curr = null$}
\STATE $curr \gets$ Node$(item)$ \COMMENT{item does not exist}
\ELSIF{$curr \ne null$}
\IF{$item < curr.key$}
\STATE $curr.left \gets$ \CALL{Add}{$curr.left, item$} \COMMENT{recur on left subtree}
\ELSIF{$item > curr.key$}
\STATE $curr.right \gets$ \CALL{Add}{$curr.right, item$} \COMMENT{recur on right subtree}
\ELSIF{$item = curr.key$}
\STATE do nothing \COMMENT{item exists}
\ENDIF
\ENDIF
\RETURN $curr$
\ENDFUNCTION
\end{algorithmic}
\end{algorithm}
\begin{algorithm}
\begin{algorithmic}
\FUNCTION{Add}{$curr, item$}
\STATE $parent \gets null$
\WHILE{$curr \ne null$}
\STATE $parent \gets curr$
\IF{$item < curr.key$}
\STATE $curr \gets curr.left$ \COMMENT{recur on left subtree}
\ELSIF{$item > curr.key$}
\STATE $curr \gets curr.right$ \COMMENT{recur on right subtree}
\ELSIF{$item = curr.key$}
\RETURN $curr$ \COMMENT{item exists}
\ENDIF
\ENDWHILE
\STATE $curr \gets$ Node$(item)$ \COMMENT{item does not exist}
\IF{$parent \ne null$}
\IF{$item < parent.key$}
\STATE $parent.left \gets curr$
\ENDIF
\IF{$item > parent.key$}
\STATE $parent.right \gets curr$
\ENDIF
\ENDIF
\RETURN $curr$
\ENDFUNCTION
\end{algorithmic}
\end{algorithm}
$\text{Time} = O(h) = O(n)$
Removing a node (with a particular key) has four cases:
\begin{algorithm}
\begin{algorithmic}
\FUNCTION{Remove}{$curr, item$}
\IF{$curr = null$}
\STATE do nothing \COMMENT{item does not exist}
\ELSIF{$item < curr.key$}
\STATE $curr.left \gets$ \CALL{Remove}{$curr.left, item$} \COMMENT{recur on left}
\ELSIF{$item > curr.key$}
\STATE $curr.right \gets$ \CALL{Remove}{$curr.right, item$} \COMMENT{recur on right}
\ELSE
\IF{$curr.left = null$}
\STATE $curr \gets curr.right$ \COMMENT{0 or 1 child}
\ELSIF{$curr.right = null$}
\STATE $curr \gets curr.left$ \COMMENT{1 child}
\ELSE
\STATE $curr.key \gets$ \CALL{Maximum}{$curr.left$}.key \COMMENT{find predecessor}
\STATE $curr.left \gets$ \CALL{Remove}{$curr.left, curr.key$} \COMMENT{remove predecessor}
\ENDIF
\ENDIF
\RETURN $curr$
\ENDFUNCTION
\end{algorithmic}
\end{algorithm}
$\text{Runtime} = O(h) = O(n)$
| Data structure | Search | Add | Remove |
|---|---|---|---|
| Binary search tree | $O(n)$ | $O(n)$ | $O(n)$ |
| Balanced search tree | $O(\log n)$ | $O(\log n)$ | $O(\log n)$ |
There are three types of non-empty nodes:
Size and depth properties are satisfied.
Overflow: Size property is violated at [13 14 15 17].
Size property at [13 14 15 17] will be fixed via split operation.
Overflow: Size property is violated at [5 10 12 15].
Size property at [5 10 12 15] will be fixed via split operation.
Size and depth properties are satisfied.
Underflow: Size property is violated is [4].
Size property will be fixed via transfer operation.
Underflow: Size property is violated is [12], which has non-empty children. It will be fixed via swap with predecessor.
Underflow: Size property is violated is [11].
Size property will be fixed via fusion operation.
$n_e =$ node with empty children
$n_{\ne e} =$ node with non-empty children
$s_{3,4} =$ immediate sibling of $n_e$ is a 3-node or a 4-node
$s_2 =$ immediate sibling of $n_e$ is a 2-node
$p =$ parent of $n_e$
| Method | Running time |
|---|---|
| Search | $O(\log n)$ |
| Add | $O(\log n)$ |
| Remove | $O(\log n)$ |
Reducing communication (via good use of cache) leads to reduced running time. An algorithm that makes good use of cache is called cache-efficient. A cache-efficient sorting algorithm might take just a few minutes to sort a 1 GB file of numbers.
Example: External-memory merge sort.An algorithm must have the following two features in order to make good use of cache.
Spatial data locality:
Temporal data locality:
| Problem | Cache-inefficient algorithm | Cache-efficient algorithm |
|---|---|---|
| Sorting | Merge sort $O(n \log n)$ |
Ext-memory merge sort $O(\frac{n}{B}\log_{\frac{M}{B}}\frac{n}{B})$ |
| Balanced tree | (2,4)-tree $O(\log n)$ |
B tree $O(\log_B n)$ |
| Matrix product | Iterative $O(n^3)$ |
Recursive D&C $O(\frac{n^3}{B\sqrt{M}})$ |
| (2,4)-tree | B tree | |||
|---|---|---|---|---|
| Method | Communication | Computation | Communication | Computation |
| Search | $O(\log n)$ | $O(\log n)$ | $O(\log_B n)$ | $O(\log n)$ |
| Add | $O(\log n)$ | $O(\log n)$ | $O(\log_B n)$ | $O(\log n)$ |
| Remove | $O(\log n)$ | $O(\log n)$ | $O(\log_B n)$ | $O(\log n)$ |
B trees (and variants such as B+ trees, B* trees, B# trees) are used for file systems and databases.