On the Complexity of Tabled Datalog Programs

Next: Datalog Optimization in XSB Up: Inferring When to Table Previous: Inferring When to Table

On the Complexity of Tabled Datalog Programs

The worst-case complexity of a Datalog program (with every predicate tabled) is:

$\begin{displaymath}\sum_{clause} (len(clause) + k^{num\_of\_vars(body(clause))})\end{displaymath}$

where k is the number of constants in the Herbrand base (i.e., in the program). One can see how this can be achieved by making all base relations to be cross products of the set of constants in the program. Assume the call is completely open. Then if there are v₁ variables in the first subgoal, there will be k^v₁ tuples. Each of theses tuples will be extended through the second subgoal, and consider how many tuples from the second subgoal there can be: k^v₂ where v₂ is the number of variables appearing in the second subgoal and not appearing in the first. So to get through the second subgoal will take time k^v₁*k^v₂. And similarly through the entire body of the clause. Each subgoal multiplies by a factor k^v where v is the number of new variables. And every variable in the body of the clause is new once and only once. This is the reason for the second component in the summation above. The first component is just in case there are no variables in the clause. For an entire program one can see that the complexity (for a nonpropositional) datalog program is O(k^v) where v is the maximum number of variables in the body of any clause.

We can use folding to try to improve the worst-case efficiency of a Datalog program. Consider the query:

(7) :- p(A,B,C,D),q(B,F,G,A),r(A,C,F,D),s(D,G,A,E),t(A,D,F,G).

It has 7 variables (as indicated by the number in parentheses that precedes the query), so its worst-case efficiency is O(n⁷). However, we can fold the first two subgoals by introducing a new predicate, obtaining the following program:

(6) :- f1(A,C,D,F,G),r(A,C,F,D),s(D,G,A,E),t(A,D,F,G).
(6) f1(A,C,D,F,G) :- p(A,B,C,D),q(B,F,G,A).

This one has a maximum of 6 variables in the query or in the right-hand-side of any rule, and so has a worst-case complexity of O(n⁶).

We can do a couple of more folding operations as follows:

(5) :- f2(A,D,F,G),s(D,G,A,E),t(A,D,F,G).
(5) f2(A,D,F,G) :- f1(A,C,D,F,G),r(A,C,F,D).
(6) f1(A,C,D,F,G) :- p(A,B,C,D),q(B,F,G,A).

(4) :- f2(A,D,F,G),f3(D,G,A),t(A,D,F,G).
(4) f3(D,G,A) :- s(D,G,A,E).
(5) f2(A,D,F,G) :- f1(A,C,D,F,G),r(A,C,F,D).
(6) f1(A,C,D,F,G) :- p(A,B,C,D),q(B,F,G,A).

Thus far, we have maintained the order of the subgoals. If we allow re-ordering, we could do the following. For each variable, find all the variables that appear in some subgoal that it appears in. Choose the variable so associated with the fewest number of other variables. Factor those subgoals, which removes that variable (at least). Continue until all variables have the same number of associated variables.

Let's apply this algorithm to the initial query above. First we give each variable and the variables that appear in subgoals it appears in.

A:BCDEFG
B:ACDFG
C:ABDF
D:BCDEFG
E:ADG
F:ABGCD
G:ABFDE

Now E is the variable associated with the fewest number of other variables, so we fold all the literals (here only one) containing E, and obtain the program:

(6) :- p(A,B,C,D),q(B,F,G,A),r(A,C,F,D),f1(D,G,A),t(A,D,F,G).
(4) f1(D,G,A) :- s(D,G,A,E).

Now computing the new associated variables for the first clause, and then choosing to eliminate C, we get:

A:BCDFG
B:ACDFG
C:ABDF
D:ABCFG
F:ABGCD
G:ABFD

(5) :- f2(A,B,D,F),q(B,F,G,A),f1(D,G,A),t(A,D,F,G).
(4) f1(D,G,A) :- s(D,G,A,E).
(5) f2(A,B,D,F) :- p(A,B,C,D),r(A,C,F,D).

Now computing the associated variables for the query, we get:

a:bdfg
b:adfg
d:abfg
f:abdg
g:abfd

All variables are associated with all other variables, so no factoring can help the worst-case complexity, and the complexity is O(k⁵).

However, there is still some factoring that will eliminate variables, and so might improve some queries, even though it doesn't guarantee to reduce the worst-case complexity.

(4) :- f3(A,D,F,G),f1(D,G,A),t(A,D,F,G).
(5) f3(A,D,F,G) :- f2(A,B,D,F),q(B,F,G,A).
(4) f1(D,G,A) :- s(D,G,A,E).
(5) f2(A,B,D,F) :- p(A,B,C,D),r(A,C,F,D),

(3) :- f4(A,D,G),f1(D,G,A).
(4) f4(A,D,G) :- f3(A,D,F,G),t(A,D,F,G).
(5) f3(A,D,F,G) :- f2(A,B,D,F),q(B,F,G,A).
(4) f1(D,G,A) :- s(D,G,A,E).
(5) f2(A,B,D,F) :- p(A,B,C,D),r(A,C,F,D).

The general problem of finding an optimal factoring is conjectured to be NP hard. (Steve Skiena has the sketch of a proof.)

Next: Datalog Optimization in XSB Up: Inferring When to Table Previous: Inferring When to Table

David S. Warren
1999-07-31