Research Overview

I work primarily on logic programming and formal methods. The main theme of my work has been to
  1. Formulate analysis and verification problems as query evaluation or inference over logic programs;
  2. Develop novel inference techniques to expand the scope of analysis problems that can be posed as query evaluation;
  3. Develop evaluation techniques so that the complexity and efficiency of the query-based formulations match those of specialized algorithms for the same analysis.
On the formal methods side, this approach has been remarkably fruitful for problems ranging from traditional static analysis of programs to verification of complex properties of concurrent systems, analysis of security policies, and analysis of properties of probabilistic systems. On the logic programming side, it has produced efficient techniques for incremental evaluation, for handling constraint-solving specifications, and for inference and learning over logical and probabilistic knowledge. A salient aspect of this approach is that improvements in the inference and evaluation techniques apply in general to other knowledge-based applications as well.

Most of the work described below was carried out as part of my students' Ph.D. theses. The following is a brief summary of the significant developments described in detail in our publications.

Formal Methods

Model Checking

Our work on posing model checking as query evaluation over logic programs started with a model checker for the alternation-free modal mu-calculus for verifying properties of processes written in CCS [RRR+97]. Since then, we have used this approach productively to address model checking and equivalence checking for richer classes of systems, and to explore alternative ways of formulating the verification problems. Throughout, our approach has been characterized by its focus on a high-level logical encoding of the semantics of process algebras and temporal logics, followed by logic-based optimizations and/or generalizations of inference algorithms. Below is a short overview of our work on model checking, starting from the most recent.

Probabilistic Systems

In this work, we sought to leverage recent efforts in the Statistical Relational Learning community on performing inference over a combination of probabilistic and logical knowledge. We posed the problem of model checking probabilistic systems as one of query evaluation over probabilistic logic programs. Our formulation covers a rich class of probabilistic models (discrete-time probabilistic models, including recursive Markov chains) and probabilistic temporal logics (including PCTL and GPL, which subsumes PCTL*). However, the existing query evaluation algorithms for probabilistic logic-programming systems handle only queries with a finite number of explanations. In our formulation, explanations correspond to executions of the system model being checked, which may be infinite in general; hence existing evaluation algorithms are insufficient. We developed a more general inference algorithm that uses finite generative structures (similar to automata) to represent families of explanations, and computes the probability of a possibly infinite set of explanations directly from the finite generative structure. We have implemented this inference algorithm in XSB Prolog, and use the implementation to answer queries over our formulation of model checkers for a variety of probabilistic temporal logics. The complexity of the resulting model checker is the same as that of algorithms specially devised for these temporal logics, and the implementation shows comparable performance as well [GRS12].
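
As a small illustration of the underlying computation (a Python toy, not our actual XSB-based implementation): for a finite discrete-time Markov chain, the probability of a reachability query is the least fixpoint of a system of linear equations, which can be approximated by iteration. The chain below is hypothetical.

```python
# Toy: probability of eventually reaching a target state in a small
# discrete-time Markov chain, by least-fixpoint iteration (the core
# computation behind PCTL reachability queries).

def reach_prob(trans, target, iters=200):
    """trans: {state: [(prob, successor), ...]}; returns {state: P(reach target)}."""
    states = set(trans) | {s for outs in trans.values() for _, s in outs}
    x = {s: (1.0 if s == target else 0.0) for s in states}
    for _ in range(iters):
        x = {s: (1.0 if s == target else
                 sum(p * x[t] for p, t in trans.get(s, [])))
             for s in states}
    return x

# s0 reaches goal directly with prob. 0.5, or via s1 (which may sink).
chain = {
    's0': [(0.5, 'goal'), (0.5, 's1')],
    's1': [(0.5, 's0'), (0.5, 'sink')],
}
probs = reach_prob(chain, 'goal')
```

Here the fixpoint equations are x(s0) = 0.5 + 0.5 x(s1) and x(s1) = 0.5 x(s0), so the probability from s0 converges to 2/3.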

We are currently working on expanding the class of models handled by our formulation, and on formulating equivalence checking over such models.

Mobile Ad-Hoc Networks

We developed a process algebra, called the omega-calculus, for formally modeling and reasoning about mobile ad hoc network (MANET) protocols. The omega-calculus inherits many constructs of the pi-calculus, and includes a novel local broadcast construct that enables a MANET node to broadcast a message to any other node within its physical transmission range. Nodes can also move in and out of the transmission range of other nodes in the network. The omega-calculus uses the notion of groups to model local broadcast-based communication, and separates the description of the actions of a protocol (called processes) from that of the network topology (specified by sets of groups, called the interfaces of processes). As a result, the problems of verifying reachability properties and bisimilarity are decidable for a large class of omega-calculus specifications (even in the presence of arbitrary node movement).

We developed a symbolic algorithm for verifying ad hoc network protocols over unknown network topologies [SRS10]. Ad hoc network protocols pose a unique verification problem, called instance explosion, because an ad hoc network with a fixed number of nodes can assume an exponential number of topologies; we developed a symbolic verification technique to combat this problem [SRS09]. Furthermore, we developed a parameterized verification technique for verifying ad hoc network protocols with an arbitrary number of nodes.

Pi Calculus

We developed MMC, a model checker for verifying mobile systems. Properties are specified in the pi-mu-calculus (which extends the alternation-free modal mu-calculus with constructs to deal with pi-calculus-like names), and processes are specified in the finite-control fragment of the pi-calculus. The key aspect of this work is the formulation of the model checker as query evaluation, mapping pi-calculus names to logical variables. This leverages the standard renaming done as part of resolution to rename pi-calculus names as needed [YRS03]. This elegant (and, in hindsight, straightforward) mapping allowed us to construct MMC such that the model checker showed little performance overhead relative to XMC, our earlier model checker for value-passing CCS. We subsequently developed a compilation technique to make the model checking of pi-calculus systems more efficient, and to exploit symmetry in the process structure using associative-commutative matching; this work won the Most Practical Paper award at the 2005 Practical Aspects of Declarative Languages (PADL) conference [YDRS05]. We also developed a technique based on partial model checking for the verification of parameterized families of finite-control pi-calculus processes [YBR06].

Parameterized Systems

Model checking techniques have traditionally been applied to finite-state systems, or to systems whose behavior can be abstracted by a finite-state system. One of the challenging problems is to deploy similar techniques for the verification of parameterized families of finite-state systems. While each member of the family has finitely many states, the family itself is unbounded. An example of a parameterized system is a specification of a locking protocol with m objects and n threads: for each m and n we get a finite-state system, but the family has countably many such systems. The problem here is to verify properties for the entire family of such systems, not just for specific instances.

We have investigated two ways of tackling this problem. The most recent approach is to use partial model checking techniques for generating a sequence of formulae, and computing the limit of this sequence. More precisely, consider a parameterized system with n instances of process P. To verify if Pn satisfies formula φ, we first compute the quotient of φ with respect to P, which is a formula φ' such that, for all E, P | E satisfies φ if, and only if, E satisfies φ'. The quotient is computed by a partial model checker. For parameterized verification, we compute the sequence φ0 = φ, φ1, φ2, ..., such that, for each i, φi+1 is the quotient of φi with respect to P. If this sequence converges, say after i = k, and the zero process satisfies the limit, then we know that Pn satisfies φ for all n ≥ k.
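
The overall quotienting loop can be rendered schematically as follows. This is a Python sketch with hypothetical stand-ins: the toy "formulas" are just sets of pending obligations and the quotient function discharges one obligation per step; the actual quotient over mu-calculus formulas computed by our partial model checkers is far richer.

```python
# Schematic quotienting loop for parameterized verification: iterate
# phi_{i+1} = quotient(phi_i, P) until the sequence converges, then check
# the limit against the zero (nil) process.

def parameterized_check(phi0, quotient, nil_satisfies, max_iters=1000):
    """Returns (k, verdict): if the sequence converges at step k and nil
    satisfies the limit, the property holds for all n >= k instances."""
    phi = phi0
    for k in range(max_iters):
        nxt = quotient(phi)
        if nxt == phi:                      # sequence converged
            return k, nil_satisfies(phi)
        phi = nxt
    raise RuntimeError("quotient sequence did not converge")

# Toy instance: formulas are sets of pending obligations; each quotient
# step discharges one; nil satisfies a formula with no obligations left.
phi0 = frozenset({'a', 'b', 'c'})
quotient = lambda phi: frozenset(sorted(phi)[1:])
nil_satisfies = lambda phi: len(phi) == 0
k, ok = parameterized_check(phi0, quotient, nil_satisfies)
```

The interesting (and hard) part in practice is that the real quotient sequence need not converge syntactically; our partial model checkers detect convergence up to formula equivalence.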

We constructed partial model checkers for CCS/modal mu-calculus [BR03, BR06] and for the pi-calculus/pi-mu-calculus [YBR06], and used this approach for verifying parameterized families of complex systems: the Java meta-locking protocol with the former, and the handover protocol with the latter.

Earlier, we had investigated the use of logic programming transformations for deriving automated induction proofs for verification of parameterized families [RR04]. The key idea here was to use the fold transformation, which replaces a part of a clause body by an appropriate clause head, to simulate the effect of the induction step. To apply this strategy for verification proofs, we extended the folding step to work over clauses containing disjunction as well as negation [RKRR04, RKRR02].

Push-Down Systems

Our work on the verification of recursive programs has taken two approaches. The most recent one is a corollary to our work on the verification of probabilistic systems. We have shown that reachability properties of probabilistic push-down systems can be verified in terms of GPL (roughly, modal mu-calculus for probabilistic systems) properties of reactive probabilistic labeled transition systems. As a corollary, reachability of non-probabilistic push-down systems can be cast as verification of modal mu-calculus properties of finite state labeled transition systems. This result ties together the works on the verification of recursive processes and the verification of mu-calculus-based properties [GRS12].

We have also looked at the verification of linear-time properties of recursive programs more directly [BKPR02]. In that work, we separated infinite runs into finite-stack runs (where the program stack remains finite, but without an a priori bound) and possibly stack-divergent runs. We constructed a model checker to check for satisfaction of LTL properties for finite-stack runs, separately from the satisfaction of the properties for all runs.

Infinite-State Systems

We developed model checkers for a variety of infinite-state systems by posing the model checking problem as one of query evaluation over constraint logic programs. For timed systems, the evaluation was over clock constraints (e.g. [PRR02]). For data-independent systems, the evaluation was over equality and disequality constraints [SR03a]. The salient advance here is the ability to verify complex value-passing systems that handle data over unbounded domains. While earlier work on data independence needed a separate proof mapping the verification problem to a finite-domain one, our work constructs the required finite domain (in terms of equality and disequality constraints) on the fly. This work also motivated the development of an efficient logical inference procedure that combines tabled evaluation with constraint processing [SR07].

Program Analysis

The "logical encoding of semantics" approach we have successfully employed for the verification problems can also be used for static analysis of programs.

Alias Analysis of C Programs

As part of our work on evaluating incremental evaluation techniques, we looked at Andersen's popular flow-insensitive, inclusion-based alias analysis. Andersen's rules can be readily encoded as a logic program, and the analysis can be cast as query evaluation. One interesting side effect is that, when queries are evaluated using top-down tabled resolution, we get demand-driven analysis: doing only the work needed to answer alias queries about specific variables [SR05a]. We were further able to make demand-driven analysis efficient by using subsumption-based tabled evaluation (which we had developed earlier, e.g. [RRR96]). Moreover, the analysis was readily made incremental (w.r.t. addition as well as deletion of statements in the source program) by using our incremental query evaluation techniques.
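
The core of the inclusion-based analysis is a simple fixpoint over set constraints. Below is a minimal, exhaustive Python sketch of that fixpoint; our actual analysis is a tabled logic program that is demand-driven and incremental, and also handles loads and stores, whereas the sketch covers only address-of and copy statements.

```python
# Andersen-style inclusion-based points-to analysis: address-of statements
# seed the points-to sets, and copy statements induce subset constraints
# pts(q) <= pts(p), which are solved to a fixpoint.

def andersen(addr_of, copies):
    """addr_of: (p, x) pairs for `p = &x`; copies: (p, q) pairs for `p = q`."""
    pts = {}
    for p, x in addr_of:
        pts.setdefault(p, set()).add(x)
    changed = True
    while changed:
        changed = False
        for p, q in copies:
            new = pts.get(q, set()) - pts.get(p, set())
            if new:
                pts.setdefault(p, set()).update(new)
                changed = True
    return pts

# p = &a; q = &b; r = p; r = q  ==>  r may point to both a and b.
pts = andersen(addr_of=[('p', 'a'), ('q', 'b')],
               copies=[('r', 'p'), ('r', 'q')])
```

Query evaluation with tabling computes the same fixpoint, but only the portion relevant to the variables actually queried.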

Security Policy Analysis

We have studied a number of problems in the analysis of security properties of systems. We have used query evaluation over logic programs as a way to specify and implement analyzers for detecting vulnerabilities in computer system configurations [RS02]. More recently, we built a deductive spreadsheet system that incorporates logical rules to relate the values of cells in a spreadsheet [RRW07, RRW06], and used it for specifying configurations of networked systems and analyzing them for vulnerabilities [SRRSW07].

We have also considered several problems in the analysis of access control policies. Role-Based Access Control (RBAC) is a popular model for stating access control policies. Administrative RBAC (ARBAC) models are used to express how administrators in organizations can change the RBAC policies of that organization. Changes made by different administrators, all following the same ARBAC policy, may interact with each other in unforeseen ways. We considered the problem of analyzing ARBAC policies to check if RBAC policies with specific attributes are reachable via changes allowed by the ARBAC policy. By reducing the propositional planning problem to the ARBAC policy analysis problem, we established that the analysis problem is PSPACE-complete [SYSR11], and we characterized the computational complexity of several subclasses of ARBAC policies.
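
To convey the flavor of the reachability question, here is a toy Python sketch with a deliberately simplified, hypothetical model: a single user, roles as a set, and can-assign rules with positive preconditions only. The ARBAC policies analyzed in our work are considerably richer (negative preconditions, revocation, multiple users and administrative roles).

```python
# Toy ARBAC-style reachability as state-space search over role sets.
# can_assign rules are (precondition role set, role to add) pairs.

from collections import deque

def reachable(initial, can_assign, goal_role):
    """Can the user ever acquire goal_role under the given rules?"""
    start = frozenset(initial)
    seen = {start}
    queue = deque([start])
    while queue:
        roles = queue.popleft()
        if goal_role in roles:
            return True
        for pre, new in can_assign:
            if pre <= roles and new not in roles:
                nxt = frozenset(roles | {new})
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return False

# Hypothetical policy: employee -> engineer -> deployer; both needed for admin.
rules = [(frozenset({'employee'}), 'engineer'),
         (frozenset({'engineer'}), 'deployer'),
         (frozenset({'deployer', 'engineer'}), 'admin')]
ok = reachable({'employee'}, rules, 'admin')
```

The search space of role combinations is exponential in the number of roles, which is consistent with the PSPACE-completeness result mentioned above.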

We also identified interesting classes of ARBAC policies for which we could devise analysis algorithms whose complexity is high only in a parameter that characterizes the hardness of the input and is usually small in practice [SYRG07]. These algorithms were implemented at a high level as logic programs. More recently, we investigated symbolic techniques for analyzing parameterized ARBAC policies, where roles and permissions have parameters. Since the parameters may range over infinite types, the analysis problem is undecidable in general. We developed a semi-decision procedure for this problem, and implemented it via query evaluation over tabled constraint logic programs [SYGR11].

Other Static Analysis

We have modeled a number of other analysis problems in terms of query evaluation over logic programs. These range from the analysis of logic programs for reducing non-determinism in their evaluation [RRS95, DRRS93], strictness analysis of functional programs, and dataflow analysis of imperative programs [DRW96], to the analysis of interactions in virtual enterprises [DKPRRD99].

Logic Programming

As mentioned earlier, one of the themes of my research has been to formulate analysis and verification problems as query evaluation or inference over logic programs. For many problems, the logic programming paradigm and standard query evaluation techniques are insufficient for this purpose. Consequently, I have worked on improving the expressiveness of the paradigm, particularly with respect to the available query evaluation techniques, so that complex verification and program analysis problems can be encoded and solved. I have also worked on improving the efficiency of query evaluation, to ensure that it is practical to derive good implementations from these high-level encodings.

Inference and Learning in Probabilistic Logic Programming

Since 2010 we have been studying the combination of logical and statistical reasoning. We have been especially influenced by Sato and Kameya's PRISM framework, which brings the power of statistical knowledge representation and reasoning to traditional logic programming by adding the msw (multi-valued switch) relation to encode random processes and their outcomes. PRISM's distribution semantics gives a solid semantic foundation for combining statistical and logical reasoning, and the framework provides a powerful way to learn the distribution parameters of a combined logical-statistical model from examples. Foundational as it is, several interesting and important problems arise when performing inference and parameter learning in PRISM.

Verification of Probabilistic Systems:

PRISM's evaluation technique (as well as those of more recent systems following PRISM) computes the probability of a query answer by considering the set of all explanations of the answer, and requires this set to be finite. This restriction prevents the use of such systems for encoding model checkers for probabilistic systems. We developed an alternative evaluation technique that represents the set of explanations in an automaton-like form. This form, called Factored Explanation Diagrams (FEDs), is a generalization of BDDs that allows one diagram to "call" another; FEDs are thus analogous to (probabilistic) context-free grammars. Using this new technique, we can once again use query evaluation for model checking a wide variety of probabilistic process specification languages and probabilistic temporal logics [GRS12]. We are now studying the interactions between probabilistic choice (decisions that are based on the outcomes of probabilistic processes) and non-deterministic choice using this framework.
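
As a small illustration of why grammar-like structures matter (a Python toy, not the FED construction itself): when explanations can recursively spawn sub-explanations, their total probability is the least root of a polynomial equation, which can be computed by iteration. Consider a hypothetical process that halts with probability p and otherwise spawns two independent copies of itself.

```python
# The termination probability x of the process above satisfies the
# polynomial fixpoint equation x = p + (1 - p) * x**2; iterating from
# x = 0 converges to its least root.

def termination_prob(p, iters=10000):
    x = 0.0
    for _ in range(iters):
        x = p + (1.0 - p) * x * x
    return x
```

For p = 0.4 the least root is 2/3 (termination is not almost sure); for any p >= 1/2 it is 1. No finite enumeration of the (infinitely many) terminating runs could produce this value.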

Inference with Continuous Random Variables:

Another problem that arises from the need to consider the set of all explanations is inference in the presence of continuous random variables. We developed a new evaluation technique based on representing explanations in a symbolic form (symbolic derivations), which can be applied when Gaussian random variables and linear equality constraints between such variables are used. Linear equality constraints are sufficient to model dependencies between Gaussian variables. With this extension, we can encode statistical models using Gaussian variables, such as finite mixture models and Kalman filters, as probabilistic logic programs, and perform inference over them using our new evaluation technique [IRR12a].
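
The key observation can be sketched in a few lines of Python (a hypothetical toy API, not PRISM's or our system's): under linear relations, Gaussian variables remain Gaussian, so a symbolic derivation can carry (mean, variance) pairs instead of enumerating an uncountable set of outcomes.

```python
# Toy symbolic Gaussian: linear operations on Gaussians stay in closed form.

class Gaussian:
    def __init__(self, mean, var):
        self.mean, self.var = mean, var
    def __add__(self, other):
        # sum of independent Gaussians: means and variances add
        return Gaussian(self.mean + other.mean, self.var + other.var)
    def scale(self, a):
        # Y = a * X: mean scales by a, variance by a^2
        return Gaussian(a * self.mean, a * a * self.var)

# One step of a toy Kalman-style state update: x' = 0.5 * x + noise
x = Gaussian(2.0, 1.0)
noise = Gaussian(0.0, 0.25)
x_next = x.scale(0.5) + noise
```

Here x' is N(1.0, 0.5), computed symbolically in constant time.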

Learning Distribution Parameters of Continuous Random Variables:

Symbolic derivations also form the basis for learning the distribution parameters of Gaussian random variables in a probabilistic logic program. For inference, each symbolic derivation is associated with a formula that (symbolically) encodes the probability of the set of explanations represented by the derivation. We perform parameter learning by also adding formulas representing expectations of the mean and variance of the Gaussian variables. We use these expectations, in the style of EM learning algorithms, to iteratively compute distribution parameters that (locally) maximize the likelihood of a set of observations [IRR12b].
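
For intuition, the E and M steps that such an algorithm iterates can be sketched on a standard one-dimensional two-Gaussian mixture (plain Python over concrete data, not our symbolic-derivation-based implementation):

```python
# Standard EM for a 1-D two-component Gaussian mixture: the E-step computes
# expected component memberships, the M-step re-estimates weights, means,
# and variances from those expectations.

import math

def em_gmm(data, mu, iters=50):
    var = [1.0, 1.0]
    w = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of each component for each point
        resp = []
        for x in data:
            d = [w[k] / math.sqrt(2 * math.pi * var[k])
                 * math.exp(-(x - mu[k]) ** 2 / (2 * var[k])) for k in range(2)]
            s = sum(d)
            resp.append([dk / s for dk in d])
        # M-step: maximize expected log-likelihood
        for k in range(2):
            n_k = sum(r[k] for r in resp)
            w[k] = n_k / len(data)
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / n_k
            var[k] = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, data)) / n_k
            var[k] = max(var[k], 1e-6)  # guard against variance collapse
    return mu, var, w

data = [0.0, 0.1, -0.1, 5.0, 5.1, 4.9]
mu, var, w = em_gmm(data, mu=[0.5, 4.0])
```

The component means converge to the two clusters (near 0 and 5); in our setting the expectations driving these updates come from formulas attached to symbolic derivations rather than from an explicit data loop.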

Incremental Evaluation

We have done extensive work on incremental query evaluation in systems that use tabled resolution. In such systems, when base facts change, the derived relations in memo tables become stale. We found that the Delete-Rederive (DRed) algorithm developed for deductive databases was not efficient enough. We developed support graphs to remember the dependencies between answers in different tables and base facts, so that when a fact is deleted, the graph can be traversed to find the affected answers [SR03b]. While the use of support graphs improved the time efficiency of incremental query evaluation, they were not space efficient; we developed the symbolic support graph to compactly represent large support graphs while still permitting efficient traversal [SR05b]. One particularly noteworthy application of this technique is its use to implement a very efficient incremental version of Andersen's may-point-to analysis for C programs [SR05a]. We subsequently expanded the work on incremental query evaluation to consider updates to facts as well [SR06a]. All of this work considered definite logic programs and ignored impure features of languages like Prolog. For the more general setting, we maintained dependencies between queries (calls) and facts rather than between individual answers, and developed an algorithm to maintain these dependencies and traverse them efficiently to mark tables that may be affected when underlying facts are updated [SR06b]. This algorithm is now used in the standard release of the XSB logic programming system.
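
The deletion phase of support-graph-based incremental evaluation can be sketched as follows (a much-simplified Python toy; the actual data structures of [SR03b] are considerably more elaborate): each derived answer records the sets of premises supporting it, and deleting a fact invalidates, transitively, exactly the answers left with no support.

```python
# Toy support-graph deletion: propagate the loss of supports transitively.

def delete_fact(supports, fact):
    """supports: {answer: list of frozensets of premises supporting it}.
    Returns the set of answers invalidated when `fact` is deleted."""
    dead, frontier = set(), {fact}
    while frontier:
        f = frontier.pop()
        dead.add(f)
        for ans, supps in supports.items():
            if ans in dead:
                continue
            remaining = [s for s in supps if not (s & dead)]
            supports[ans] = remaining
            if not remaining:           # no support left: answer dies too
                frontier.add(ans)
    return dead - {fact}

# path(a,c) is supported only via edge(a,b); path(a,d) has two supports.
supports = {
    ('path', 'a', 'c'): [frozenset({('edge', 'a', 'b'), ('edge', 'b', 'c')})],
    ('path', 'a', 'd'): [frozenset({('edge', 'a', 'd')}),
                         frozenset({('edge', 'a', 'b'), ('edge', 'b', 'd')})],
}
gone = delete_fact(supports, ('edge', 'a', 'b'))
```

Deleting edge(a,b) kills path(a,c) but leaves path(a,d) intact through its remaining support, with no rederivation step needed for answers that keep a support.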

Tabled Evaluation

We have done extensive work on ensuring that the verification and analysis systems derived from their high-level encodings perform well enough to be practical. Our approach has been to develop query evaluation techniques in which memo tables share as much computation as possible. One aspect of this work is the incremental evaluation described above, where computations done before a fact changes are shared (not redone) after the change. Another aspect of the sharing problem arises from how sharable computations are identified. In the original implementations of memo tables, the computation of answers is shared only between two queries that are identical (modulo names of variables). We developed data structures and algorithms to ensure sharing when a later query is subsumed by (i.e., is an instance of) an earlier one. The key issue here is one of indexing, that is, quickly identifying the set of answers relevant to a more specific query, in a way that works even when the underlying tables are not completely computed [JRRR99, RRR98].
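
In miniature, tabling amounts to computing answer tables to a fixpoint and sharing them across all subsequent queries. The Python sketch below is only an illustration of that flavor (real tabled resolution interleaves the fixpoint computation with goal-directed search, and subsumption-based tabling additionally shares answers between non-identical queries):

```python
# Tabling in miniature: reachability tables for a (possibly cyclic) graph,
# computed once to a fixpoint and then shared by every query.

def transitive_closure(edges):
    """edges: {node: [successors]}; returns {node: set of reachable nodes}."""
    table = {x: set(ys) for x, ys in edges.items()}
    changed = True
    while changed:
        changed = False
        for x, rx in table.items():
            new = set()
            for y in rx:                    # one derivation step via the table
                new |= table.get(y, set())
            if not new <= rx:
                rx |= new
                changed = True
    return table

edges = {'a': ['b'], 'b': ['c', 'a'], 'c': []}
table = transitive_closure(edges)
```

The cycle between a and b is handled naturally: the fixpoint terminates with a and b each reaching {a, b, c}, where naive unmemoized recursion would loop forever.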

We have also considered the problem of combining constraint processing with tabled evaluation, primarily to handle larger classes of verification and analysis problems. For instance, we integrated the handling of clock constraints into tabled evaluation in order to construct model checkers for timed systems [PRR02, DRS00], and disequality constraints to construct model checkers and symbolic bisimulation checkers for value-passing systems [SYGR11, SRS09, YRS04, SR03a, BMRRV01]. One central problem in combining constraint solving and tabled evaluation is that traditional constraint solvers maintain a global constraint store: newly discovered constraints are added to this store, and entailment and consistency checks are performed against it. This strategy is followed even when constraint solvers are specified at a higher level, e.g. using constraint handling rules. In tabled evaluation, we need to keep multiple constraint stores, local to each query and answer. We developed an efficient technique for storing constraints in local stores and evaluating constraint handling rules over such stores [SR07], thereby combining the power of tabled evaluation and constraint solving.

C. R. Ramakrishnan
Last modified: Thu Feb 7 18:29:25 EST 2013