Next: Genome Examples Up: Tabling and Datalog Programming Previous: Other Datalog Examples

# Some Simple Graph Problems

Consider the problem of finding connected components in a directed graph. Assume we have a node and we want to find all the nodes that are in the same connected component as the given node.

The first thought that comes to mind is: given a node X, find those nodes Y that are reachable from X and from which you can get back to X. So we will assume that edges are given by an `edge/2` relation:

```sameSCC(X,Y) :- reach(X,Z), reach(Z,Y).

:- table reach/2.
reach(X,X).
reach(X,Y) :- reach(X,Z), edge(Z,Y).
```

Indeed given a node X, this will find all nodes in the same strongly connected component as X, however it will in general take O(n*e) time, where n is the number of nodes and e is the number of edges. The reason is that given an X, there are n possible Z values and for each of them, we will find everything reachable from them, and each search can take O(e) time.

However, we can do better. It is known that this problem can be solved in O(e) time. The idea is, given a node X, to find all nodes reachable from X following edges forward. Then find all nodes reachable from X following edges backward (i.e., follow edges against the arrow.) Then intersect the two sets. That will be the set of nodes in X's SCC, because if Y is in both these sets, you can follow the edges forward from X to Y and then since there is also a backwards path from X to Y, there is forward path from Y to X, so you can get from X to Y and back to X following edges forward. So the program that does this is:

```% sameSCC(+X,-Y)
sameSCC(X,Y) :- reachfor(X,Y), reachback(X,Y).

:- table reachfor/2, reachback/2.
reachfor(X,X).
reachfor(X,Y) :- reachfor(X,Z),edge(Z,Y).

reachback(X,X).
reachback(X,Y) :- reachback(X,Z),edge(Y,Z).
```

Let's now consider its complexity to see why it is O(e). For a fixed value X, the computation of the query `reachfor(X,Y)` takes O(e) time. Then we may have O(n) calls to `reachback(X,Y)` (one for each Y) but they all use one underlying call to `reachback(X,_)` which takes O(e) and is done only once. So when we add all that up (assuming a connected graph), we get O(e) time.

Next: Genome Examples Up: Tabling and Datalog Programming Previous: Other Datalog Examples
David S. Warren
1999-07-31