re_match/5 provides us with the offsets. How can we actually get the matched substrings? This is done with the help of the predicate re_substring/4:
re_substring(+String, +BeginOffset, +EndOffset, -Result).
This predicate works exactly like substring/4, described in Section 1.6, except that the resulting substring is not interned (if it is an atom). The only thing you can safely do with this string is to convert it immediately, either into a list of character codes (using atom_codes/2) or into a true atom (using intern_string/2, which must be imported from module machine).
The reason for these complications is to let the user control the size of the atom table. At present, XSB has no atom table garbage collection, so heavy use of string manipulation functions can overflow the atom table. This danger is particularly severe when XSB is used to process HTML pages. This predicate will become an alias for substring/4 once atom garbage collection is added to XSB.
On the other hand, converting strings into lists (without interning them first) is safe, because lists are garbage-collected in XSB Version 2.0.
Here is a complete example that shows matching followed by a subsequent extraction of the matches:
    | ?- import intern_string/2 from machine.

    | ?- Str = 'abbbcd\\bbo',
         re_match("a(b*)cd\\\\",Str,0,_,[match(X,Y), match(V,W)|L]),
         re_substring(Str,X,Y,UninternedMatch),
         intern_string(UninternedMatch,Match),
         re_substring(Str,V,W,UninternedParen1),
         atom_codes(UninternedParen1,Paren1).

    Str = abbbcd\bbo
    X = 0
    Y = 7
    V = 1
    W = 4
    L = []
    UninternedMatch = bbb
    Match = abbbcd\
    UninternedParen1 = bbb
    Paren1 = [98,98,98]

Note that the strings UninternedMatch and UninternedParen1 cannot be used by themselves. In the first case, we converted the string into a Prolog atom, and in the second case into a list of character codes. The resulting objects (Match and Paren1) can be used in further computations.
Observe that XSB reports that UninternedMatch and UninternedParen1 are both equal to the string "bbb", while Match, the atom obtained from UninternedMatch, is different. This is because UninternedMatch and UninternedParen1 are uninterned and both occupy the same physical space. Thus, the second call to re_substring/4 overwrites the value stored in this location by the first call.
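Because each call to re_substring/4 may overwrite the previous uninterned result, a safe discipline is to convert every substring before making the next call. The following is a minimal sketch of a helper that does this for a whole match list. The predicate names match_atoms/3 and matches_to_atoms/3 are hypothetical, importing re_match/5 and re_substring/4 from module regmatch is an assumption about how the package is loaded, and the sketch assumes that every parenthesized subexpression actually took part in the match.

    :- import re_match/5, re_substring/4 from regmatch.   % assumed module name
    :- import intern_string/2 from machine.

    % match_atoms(+Regexp, +Str, -Atoms)   (hypothetical helper)
    % Atoms is the whole match followed by each parenthesized subexpression,
    % all converted into proper (interned) atoms.
    match_atoms(Regexp, Str, Atoms) :-
        re_match(Regexp, Str, 0, _, Matches),
        matches_to_atoms(Matches, Str, Atoms).

    % Intern each substring right away, before the next re_substring/4
    % call can overwrite the shared buffer holding the uninterned string.
    matches_to_atoms([], _, []).
    matches_to_atoms([match(B,E)|Ms], Str, [A|As]) :-
        re_substring(Str, B, E, Uninterned),
        intern_string(Uninterned, A),
        matches_to_atoms(Ms, Str, As).

With the pattern and string from the example above, match_atoms/3 should return the atoms 'abbbcd\\' and bbb. Since each string is interned before the following re_substring/4 call, the earlier result is not clobbered. To get character code lists instead, atom_codes/2 can replace intern_string/2 in the same position.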