The worst-case complexity of a Datalog program (with every predicate
tabled) is:
We can use folding to try to improve the worst-case efficiency of a Datalog program. Consider the query:
(7) :- p(A,B,C,D),q(B,F,G,A),r(A,C,F,D),s(D,G,A,E),t(A,D,F,G).
It has 7 variables (as indicated by the number in parentheses that precedes the query), so its worst-case efficiency is O(n7). However, we can fold the first two subgoals by introducing a new predicate, obtaining the following program:
(6) :- f1(A,C,D,F,G),r(A,C,F,D),s(D,G,A,E),t(A,D,F,G). (6) f1(A,C,D,F,G) :- p(A,B,C,D),q(B,F,G,A).
This one has a maximum of 6 variables in the query or in the right-hand-side of any rule, and so has a worst-case complexity of O(n6).
We can do a couple of more folding operations as follows:
(5) :- f2(A,D,F,G),s(D,G,A,E),t(A,D,F,G). (5) f2(A,D,F,G) :- f1(A,C,D,F,G),r(A,C,F,D). (6) f1(A,C,D,F,G) :- p(A,B,C,D),q(B,F,G,A). (4) :- f2(A,D,F,G),f3(D,G,A),t(A,D,F,G). (4) f3(D,G,A) :- s(D,G,A,E). (5) f2(A,D,F,G) :- f1(A,C,D,F,G),r(A,C,F,D). (6) f1(A,C,D,F,G) :- p(A,B,C,D),q(B,F,G,A).
Thus far, we have maintained the order of the subgoals. If we allow re-ordering, we could do the following. For each variable, find all the variables that appear in some subgoal that it appears in. Choose the variable so associated with the fewest number of other variables. Factor those subgoals, which removes that variable (at least). Continue until all variables have the same number of associated variables.
Let's apply this algorithm to the initial query above. First we give each variable and the variables that appear in subgoals it appears in.
A:BCDEFG B:ACDFG C:ABDF D:BCDEFG E:ADG F:ABGCD G:ABFDE
Now E is the variable associated with the fewest number of other variables, so we fold all the literals (here only one) containing E, and obtain the program:
(6) :- p(A,B,C,D),q(B,F,G,A),r(A,C,F,D),f1(D,G,A),t(A,D,F,G). (4) f1(D,G,A) :- s(D,G,A,E).
Now computing the new associated variables for the first clause, and then choosing to eliminate C, we get:
A:BCDFG B:ACDFG C:ABDF D:ABCFG F:ABGCD G:ABFD (5) :- f2(A,B,D,F),q(B,F,G,A),f1(D,G,A),t(A,D,F,G). (4) f1(D,G,A) :- s(D,G,A,E). (5) f2(A,B,D,F) :- p(A,B,C,D),r(A,C,F,D).
Now computing the associated variables for the query, we get:
a:bdfg b:adfg d:abfg f:abdg g:abfd
All variables are associated with all other variables, so no factoring can help the worst-case complexity, and the complexity is O(k5).
However, there is still some factoring that will eliminate variables, and so might improve some queries, even though it doesn't guarantee to reduce the worst-case complexity.
(4) :- f3(A,D,F,G),f1(D,G,A),t(A,D,F,G). (5) f3(A,D,F,G) :- f2(A,B,D,F),q(B,F,G,A). (4) f1(D,G,A) :- s(D,G,A,E). (5) f2(A,B,D,F) :- p(A,B,C,D),r(A,C,F,D), (3) :- f4(A,D,G),f1(D,G,A). (4) f4(A,D,G) :- f3(A,D,F,G),t(A,D,F,G). (5) f3(A,D,F,G) :- f2(A,B,D,F),q(B,F,G,A). (4) f1(D,G,A) :- s(D,G,A,E). (5) f2(A,B,D,F) :- p(A,B,C,D),r(A,C,F,D).
The general problem of finding an optimal factoring is conjectured to be NP hard. (Steve Skiena has the sketch of a proof.)