Next: Push-Down Automata Up: Finite State Machines Previous: Minimization of FSM's

## Regular Expressions

Regular expressions are another way of specifying finite state languages, that is sets of strings of symbols. A regular expression over an alphabet is:

1.
a symbol from , or
2.
an expression (RE1*RE2), where RE1 and RE2 are regular expressions, or
3.
an expression (RE1+RE2), where RE1 and RE2 are regular expressions, or
4.
an expression @(RE), where RE is a regular expression.

We associate with each regular expression (RE) a set of strings over (i.e., a language.) We will use XSB (Prolog) rules to define the set of strings associated with a RE. An RE will be represented as a Prolog term, and the definition below shows that we are using * for concatentation, + for alternation, and @ for iteration.

The following program, when given a regular expression, accepts strings that are in the language represented by that expression:

```% Is StringName in the language represented by Exp?
reacc(Exp,StringName) :-
reacc(Exp,StringName,0,F),
stringlen(StringName,F).

% An atom represents itself
reacc(A,S,From,To) :- atomic(A),string(S,From,A,To).
% Concatenation of E1 and E2
reacc((E1*E2),S,From,To) :-
reacc(E1,S,From,M),
reacc(E2,S,M,To).
% Alternation
reacc((E1+_E2),S,From,To) :- reacc(E1,S,From,To).
reacc((_E1+E2),S,From,To) :- reacc(E2,S,From,To).
% Iteration if 0 or more occurrences
reacc(@(_E),_S,From,From).
reacc(@(E),S,From,To) :-
reacc(@(E),S,From,Mid),
reacc(E,S,Mid,To).
```

Now we can test whether string s1 (00112) is in the language represented by the regular expression @(0)* @(1)* @(2):

```| ?- reacc(@(0)* @(1)* @(2),s1).
++Warning: Removing incomplete tables...

yes
| ?-
```
and it is.

It turns out that regular expressions represent exactly the same languages that finite state machines do. Given a regular expression, we can construct a finite state machine that recognizes the same language (and vice versa.) So first we will construct a machine given a regular expression. To make the construction easy, we will represent machine names and states in an interesting way. Given a regular expression RE, we will use m(RE) to name the machine for recognizing the same langauge as RE. And the initial state for the constructed machine that recognizes the language of RE will be named i(RE) and the final state (there will always be exactly one in our construction) will be f(RE). Note that we are free to name the machines and states anything we want, so this choice is simply a convenience.

The following rules define the FSM given a regular expression:

```% The machine for an atomic RE, simply transits from its initial state
%    to its final state on the given atomic symbol.
%    (All the others will be epsilon transitions.)
m(re(RE),i(RE),RE,f(RE)) :- atomic(RE).

% To recognize concatenated expressions:
% Connect the initial state of the compound expr to the initial state of
%    the first subexpr.
m(re(RE1*RE2),i(RE1*RE2),'',i(RE1)).

% Connect the final state of the first subexpr to the initial state of
%     the second subexpr.
m(re(RE1*RE2),f(RE1),'',i(RE2)).

% Connect the final state of the second subexpr to the final state of
%     the compound expr.
m(re(RE1*RE2),f(RE2),'',f(RE1*RE2)).

% And finally must add the transitions of the machines for the
%    subexpressions.
m(re(RE1*_RE2),S,Sy,T) :- m(re(RE1),S,Sy,T).
m(re(_RE1*RE2),S,Sy,T) :- m(re(RE2),S,Sy,T).

% The process is analogous for alternation.
m(re(RE1+RE2),i(RE1+RE2),'',i(RE1)).
m(re(RE1+RE2),i(RE1+RE2),'',i(RE2)).
m(re(RE1+RE2),f(RE1),'',f(RE1+RE2)).
m(re(RE1+RE2),f(RE2),'',f(RE1+RE2)).
m(re(RE1+_RE2),S,Sy,T) :- m(re(RE1),S,Sy,T).
m(re(_RE1+RE2),S,Sy,T) :- m(re(RE2),S,Sy,T).

% and for iteration
m(re(@(RE)),i(@(RE)),'',f(@(RE))).
m(re(@(RE)),i(@(RE)),'',i(RE)).
m(re(@(RE)),f(RE),'',f(@(RE))).
m(re(@(RE)),f(@(RE)),'',i(@(RE))).
m(re(@(RE)),S,Sy,T) :- m(re(RE),S,Sy,T).

% and the initial and final states are just those named i() and f().
mis(re(RE),i(RE)).
mfs(re(RE),f(RE)).
```

As an example, consider the following execution

```| ?- m(re(a*b*c),S,Sy,T),writeln(m(re(a*b*c),S,Sy,T)),fail.
m(re(a * b * c),i(a * b * c),,i(a * b))
m(re(a * b * c),f(a * b),,i(c))
m(re(a * b * c),f(c),,f(a * b * c))
m(re(a * b * c),i(a * b),,i(a))
m(re(a * b * c),f(a),,i(b))
m(re(a * b * c),f(b),,f(a * b))
m(re(a * b * c),i(a),a,f(a))
m(re(a * b * c),i(b),b,f(b))
m(re(a * b * c),i(c),c,f(c))

no
| ?- m(det(efree(re(a*b*c))),S,Sy,T),writeln(m(re(a*b*c),S,Sy,T)),fail.
m(re(a * b * c),[i(a * b * c)],a,[f(a),i(b)])
m(re(a * b * c),[f(a),i(b)],b,[f(b),f(a * b),i(c)])
m(re(a * b * c),[f(b),f(a * b),i(c)],c,[f(c),f(a * b * c)])

no
| ?-
```

Here we first constructed the FSM that recognizes the single string abc, which has 9 transitions, most of them epsilon transitions. In the second query, we found the deterministic version of that machine. Notice that this dropped the number of transitions to the minimal three.

The final problem concerning FSM's and regular expressions that we will consider is that of, given a machine, constructing an equivalent regular expression.

```re(S,T,0,Mach,RE) :- is_state(S),is_state(T),S\==T,tsetof(Sy,m(Mach,S,Sy,T),RE).
re(S,S,0,Mach,[''|RE]) :- is_state(S),tsetof(Sy,m(Mach,S,Sy,T),RE).
re(I,J,K,Mach,[RE1* @(RE2) * RE3,RE4]) :- K>0,
K1 is K-1,
re(I,K,K1,Mach,RE1),
re(K,K,K1,Mach,RE2),
re(K,J,K1,Mach,RE3),
re(I,J,K1,Mach,RE4).
```

Next: Push-Down Automata Up: Finite State Machines Previous: Minimization of FSM's
David S. Warren
1999-07-31