The advanced XSB/Prolog interface uses only one data type: prolog_term. A Prolog term (as the name suggests) can be bound to any XSB term. On the C side, the type of the term can be checked and the term can then be processed accordingly. For instance, if the term turns out to be a structure, it can be decomposed and the functor can be extracted along with the arguments. If the term happens to be a list, it can be processed in a loop and each list member can be further decomposed into its atomic components. The advanced interface also provides functions for checking the types of these atomic components and for converting them into C types.
As with the basic C interface, the file emu/cinterf.h must be included in the C program in order to make the prototypes of the relevant functions known to the C compiler.
The first set of functions is typically used to check the type of Prolog terms passed into the C program.
is_nil(T): returns TRUE if the term T is bound to the [] (nil) value, and FALSE otherwise.
After checking the types of the arguments passed in from the Prolog side, the next task usually is to convert Prolog data into the types understood by C. This is done with the following functions. The first three convert between the basic types. The last two extract the functor name and the arity. Extraction of the components of a list and the arguments of a structured term is explained later.
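The check-then-convert pattern described above can be sketched without an XSB installation. The fragment below is only a toy model: toy_term and every toy_* function are hypothetical stand-ins invented for illustration (they play the roles of prolog_term, is_var, is_int, is_string, int_val, and string_val) and say nothing about how XSB actually represents terms.

```c
#include <stdio.h>
#include <string.h>

/* Toy stand-in for prolog_term. This is NOT XSB's representation; it only
   lets the fragment compile and run without XSB. */
typedef enum { TOY_VAR, TOY_INT, TOY_STRING } toy_tag;

typedef struct {
    toy_tag tag;
    long int_value;          /* meaningful only when tag == TOY_INT */
    const char *str_value;   /* meaningful only when tag == TOY_STRING */
} toy_term;

/* Hypothetical analogues of is_var, is_int, and is_string. */
static int toy_is_var(const toy_term *t)    { return t->tag == TOY_VAR; }
static int toy_is_int(const toy_term *t)    { return t->tag == TOY_INT; }
static int toy_is_string(const toy_term *t) { return t->tag == TOY_STRING; }

/* Hypothetical analogues of int_val and string_val: they are valid only
   after the matching type check has succeeded. */
static long toy_int_val(const toy_term *t)           { return t->int_value; }
static const char *toy_string_val(const toy_term *t) { return t->str_value; }

/* Check the type first, then convert. Returns 1 on success and 0 when the
   term has a type that this function does not handle. */
static int describe_term(const toy_term *t, char *out, size_t out_size)
{
    if (toy_is_int(t))
        snprintf(out, out_size, "int:%ld", toy_int_val(t));
    else if (toy_is_string(t))
        snprintf(out, out_size, "string:%s", toy_string_val(t));
    else if (toy_is_var(t))
        snprintf(out, out_size, "unbound");
    else
        return 0;
    return 1;
}
```

In real interface code the same shape appears with the XSB functions in place of the toy ones: test the type, and only then call the conversion that matches it.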
The next batch of functions support conversion of data in the opposite direction: from basic C types to the type prolog_term. These c2p_* functions all return a boolean value TRUE if successful and FALSE if unsuccessful. The XSB term argument must always contain an XSB variable, which will be bound to the indicated value as a side effect of the function call.
The following functions create Prolog data structures within a C program. This is usually done in order to pass these structures back to the Prolog side.
c2p_nil(var): binds the Prolog variable var to the atom [] (nil).
To use the above functions, one must be able to get access to the components of the structured Prolog terms. This is done with the help of the following functions:
It is very important to realize that these functions return the actual Prolog term that is, say, the head of a list or the actual argument of a structured term. Thus, assigning a value to such a prolog term also modifies the head of the corresponding list or the relevant argument of the structured term. It is precisely this feature that allows passing structured terms and lists from the C side to the Prolog side. For instance,
    prolog_term plist,       /* a Prolog list */
                structure;   /* something like f(a,b,c) */
    prolog_term tail, arg;
    ..........
    tail = p2p_cdr(plist);         /* get the list tail */
    arg = p2p_arg(structure, 2);   /* get the second arg */
    /* Assume that the list tail was supposed to be a prolog variable */
    if (is_var(tail))
        c2p_nil(tail);             /* terminate the list */
    else {
        fprintf(stderr, "Something wrong with the list tail!");
        exit(1);
    }
    /* Assume that the argument was supposed to be a prolog variable */
    c2p_string("abcdef", arg);
In the above program fragment, we assume that both the tail of the list and the second argument of the term were supposed to be bound to Prolog variables. In case of the tail, we check if this is, indeed, the case. In case of the argument, no checks are done; XSB will issue an error (which might be hard to track down) if the second argument is not currently bound to a variable.
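The point that the accessor functions return the actual component, and not a copy, is what makes it possible to build a list incrementally from C. The following toy model sketches that idea. The cell type and all toy_* names are hypothetical, introduced only so that the code runs without XSB; they mimic the roles of p2p_new, c2p_list, c2p_int, c2p_nil, p2p_car, and p2p_cdr, under the assumption that binding a variable to a list cell leaves its head and tail as fresh variables.

```c
#include <stdlib.h>

/* Toy heap cell. This is NOT how XSB stores terms. */
typedef enum { CELL_VAR, CELL_INT, CELL_NIL, CELL_LIST } cell_tag;

typedef struct cell {
    cell_tag tag;
    long int_value;              /* used when tag == CELL_INT */
    struct cell *head, *tail;    /* used when tag == CELL_LIST */
} cell;

/* Analogue of p2p_new: a fresh unbound variable. */
static cell *toy_new(void)
{
    cell *c = calloc(1, sizeof *c);
    if (c == NULL) abort();
    c->tag = CELL_VAR;
    return c;
}

/* Analogues of c2p_list, c2p_int, c2p_nil: each binds an unbound variable
   in place and returns 0 if the argument is not an unbound variable. */
static int toy_c2p_list(cell *var)
{
    if (var->tag != CELL_VAR) return 0;
    var->tag = CELL_LIST;
    var->head = toy_new();
    var->tail = toy_new();
    return 1;
}
static int toy_c2p_int(long value, cell *var)
{
    if (var->tag != CELL_VAR) return 0;
    var->tag = CELL_INT;
    var->int_value = value;
    return 1;
}
static int toy_c2p_nil(cell *var)
{
    if (var->tag != CELL_VAR) return 0;
    var->tag = CELL_NIL;
    return 1;
}

/* Analogues of p2p_car and p2p_cdr: they return the actual component. */
static cell *toy_p2p_car(cell *list) { return list->head; }
static cell *toy_p2p_cdr(cell *list) { return list->tail; }

/* Build the list [values[0], ..., values[n-1]] by repeatedly binding the
   current open tail, then close the list with nil. */
static cell *build_list(const long *values, size_t n)
{
    cell *list = toy_new();
    cell *tail = list;
    for (size_t i = 0; i < n; i++) {
        toy_c2p_list(tail);                         /* tail becomes [_|_] */
        toy_c2p_int(values[i], toy_p2p_car(tail));  /* bind the head */
        tail = toy_p2p_cdr(tail);                   /* move to the new tail */
    }
    toy_c2p_nil(tail);
    return list;
}

/* Walk the finished list and add up its integer members. */
static long sum_list(const cell *list)
{
    long total = 0;
    for (; list->tag == CELL_LIST; list = list->tail)
        total += list->head->int_value;
    return total;
}

/* Release every cell reachable from the list. */
static void free_list(cell *list)
{
    while (list != NULL) {
        cell *next = (list->tag == CELL_LIST) ? list->tail : NULL;
        if (list->tag == CELL_LIST) free(list->head);
        free(list);
        list = next;
    }
}
```

Because toy_p2p_cdr hands back the very cell that sits in the tail position, binding it extends the list that the caller already holds; no pointer to the list head has to be updated.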
The last batch of functions is useful for passing data in and out of the Prolog side of XSB. The first function is the only way to get a prolog_term out of the Prolog side; the second function is sometimes needed in order to pass complex structures from C into Prolog.
For instance, consider the Prolog call test(X, f(Z)), which is implemented by a C function with the following fragment:
    prolog_term newterm, newvar, z_var, arg2;
    .....
    /* process argument 1 */
    c2p_functor("func",1,reg_term(1));
    c2p_string("str",p2p_arg(reg_term(1),1));
    /* process argument 2 */
    arg2 = reg_term(2);
    z_var = p2p_arg(arg2, 1);      /* get the var Z */
    /* bind newterm to abc(V), where V is a new var */
    newterm = p2p_new();           /* c2p_functor needs an unbound variable */
    c2p_functor("abc", 1, newterm);
    newvar = p2p_arg(newterm, 1);  /* V: already a fresh unbound variable */
    ....
    /* return TRUE (success), if unify; FALSE (failure) otherwise */
    return p2p_unify(z_var, newterm);

On exit, the variable X will be bound to the term func(str). Processing argument 2 is more interesting. Here, argument 2 is used both for input and output. If test is called as above, then on exit Z will be bound to abc(_h123), where _h123 is some new Prolog variable. But if the call is test(X,f(1)), then this call will fail (fail as in Prolog, i.e., it is not an error), because the term passed back, abc(_h123), does not unify with 1. This effect is achieved by the use of p2p_unify above. Note that the fragment takes the first argument of argument 2 without checking its functor or arity, so a call such as test(X,f(Z,V)) would have to be rejected by an explicit check on the C side.
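Why returning the result of p2p_unify gives Prolog-style success or failure can be illustrated with a toy unifier. This is a sketch under loud assumptions: uterm, deref, toy_unify, and the make_* helpers are all hypothetical names, the term model is not XSB's, and the unifier omits the occurs check and does not undo bindings on failure.

```c
#include <string.h>

#define TOY_MAX_ARITY 2

/* Toy term model for illustration only. */
typedef enum { U_VAR, U_INT, U_STRUCT } u_tag;

typedef struct uterm {
    u_tag tag;
    struct uterm *binding;               /* U_VAR: NULL while unbound */
    long int_value;                      /* U_INT */
    const char *functor;                 /* U_STRUCT */
    int arity;                           /* U_STRUCT */
    struct uterm *args[TOY_MAX_ARITY];   /* U_STRUCT */
} uterm;

static uterm make_var(void)
{
    uterm t = {0};
    t.tag = U_VAR;
    return t;
}
static uterm make_int(long value)
{
    uterm t = {0};
    t.tag = U_INT;
    t.int_value = value;
    return t;
}
static uterm make_struct1(const char *functor, uterm *arg)
{
    uterm t = {0};
    t.tag = U_STRUCT;
    t.functor = functor;
    t.arity = 1;
    t.args[0] = arg;
    return t;
}

/* Follow variable bindings until an unbound variable or a non-variable. */
static uterm *deref(uterm *t)
{
    while (t->tag == U_VAR && t->binding != NULL)
        t = t->binding;
    return t;
}

/* Returns 1 if the two terms unify (binding variables as a side effect)
   and 0 otherwise. */
static int toy_unify(uterm *a, uterm *b)
{
    a = deref(a);
    b = deref(b);
    if (a == b) return 1;
    if (a->tag == U_VAR) { a->binding = b; return 1; }
    if (b->tag == U_VAR) { b->binding = a; return 1; }
    if (a->tag != b->tag) return 0;
    if (a->tag == U_INT) return a->int_value == b->int_value;
    if (a->arity != b->arity || strcmp(a->functor, b->functor) != 0)
        return 0;
    for (int i = 0; i < a->arity; i++)
        if (!toy_unify(a->args[i], b->args[i]))
            return 0;
    return 1;
}
```

In this model, unifying an unbound variable with abc(V) succeeds and binds the variable, while unifying the integer 1 with abc(V) returns 0, which corresponds to the failing call test(X,f(1)).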
We conclude with two real examples of functions that pass complex data in and out of the Prolog side of XSB. These functions are part of the Posix regular expression matching package of XSB. The first function uses argument 2 to accept a list of complex prolog terms from the Prolog side and does the processing on the C side. The second function does the opposite: it constructs a list of complex Prolog terms on the C side and passes it over to the Prolog side in argument 5.
    /* input_buffer, output_buffer, and first_call are declared elsewhere in
       the package (not shown here). */

    /* XSB string substitution entry point: replace substrings specified in Arg2
       with strings in Arg3.
       In:
           Arg1: string
           Arg2: substring specification, a list [s(B1,E1),s(B2,E2),...]
           Arg3: list of replacement strings
       Out:
           Arg4: new (output) string
       Always succeeds, unless error.
    */
    int do_regsubstitute__(void)
    {
      /* Prolog args are first assigned to these, so we could examine the types
         of these objects to determine if we got strings or atoms. */
      prolog_term input_term, output_term;
      prolog_term subst_reg_term, subst_spec_list_term, subst_spec_list_term1;
      prolog_term subst_str_term=(prolog_term)0,
        subst_str_list_term, subst_str_list_term1;
      char *input_string=NULL;    /* string where matches are to be found */
      char *subst_string=NULL;
      prolog_term beg_term, end_term;
      int beg_offset=0, end_offset=0, input_len;
      int last_pos = 0; /* last scanned pos in input string */
      /* the output buffer is made large enough to include the input string and
         the substitution string. */
      char subst_buf[MAXBUFSIZE];
      char *output_ptr;
      int conversion_required=FALSE; /* from C string to Prolog char list */

      input_term = reg_term(1);  /* Arg1: string to find matches in */
      if (is_string(input_term)) /* check it */
        input_string = string_val(input_term);
      else if (is_list(input_term)) {
        input_string =
          p_charlist_to_c_string(input_term, input_buffer, sizeof(input_buffer),
                                 "RE_SUBSTITUTE", "input string");
        conversion_required = TRUE;
      } else
        xsb_abort("RE_SUBSTITUTE: Arg 1 (the input string) must be an atom or a character list");

      input_len = strlen(input_string);

      /* arg 2: substring specification */
      subst_spec_list_term = reg_term(2);
      if (!is_list(subst_spec_list_term) && !is_nil(subst_spec_list_term))
        xsb_abort("RE_SUBSTITUTE: Arg 2 must be a list [s(B1,E1),s(B2,E2),...]");

      /* handle substitution string */
      subst_str_list_term = reg_term(3);
      if (! is_list(subst_str_list_term))
        xsb_abort("RE_SUBSTITUTE: Arg 3 must be a list of strings");

      output_term = reg_term(4);
      if (! is_var(output_term))
        xsb_abort("RE_SUBSTITUTE: Arg 4 (the output) must be an unbound variable");

      subst_spec_list_term1 = subst_spec_list_term;
      subst_str_list_term1 = subst_str_list_term;

      if (is_nil(subst_spec_list_term1)) {
        strncpy(output_buffer, input_string, sizeof(output_buffer));
        goto EXIT;
      }
      if (is_nil(subst_str_list_term1))
        xsb_abort("RE_SUBSTITUTE: Arg 3 must not be an empty list");

      /* initialize output buf */
      output_ptr = output_buffer;

      do {
        subst_reg_term = p2p_car(subst_spec_list_term1);
        subst_spec_list_term1 = p2p_cdr(subst_spec_list_term1);

        if (!is_nil(subst_str_list_term1)) {
          subst_str_term = p2p_car(subst_str_list_term1);
          subst_str_list_term1 = p2p_cdr(subst_str_list_term1);

          if (is_string(subst_str_term)) {
            subst_string = string_val(subst_str_term);
          } else if (is_list(subst_str_term)) {
            subst_string =
              p_charlist_to_c_string(subst_str_term, subst_buf, sizeof(subst_buf),
                                     "RE_SUBSTITUTE", "substitution string");
          } else
            xsb_abort("RE_SUBSTITUTE: Arg 3 must be a list of strings");
        }

        beg_term = p2p_arg(subst_reg_term,1);
        end_term = p2p_arg(subst_reg_term,2);

        if (!is_int(beg_term) || !is_int(end_term))
          xsb_abort("RE_SUBSTITUTE: Non-integer in Arg 2");
        else {
          beg_offset = int_val(beg_term);
          end_offset = int_val(end_term);
        }
        /* -1 means end of string */
        if (end_offset < 0)
          end_offset = input_len;
        if ((end_offset < beg_offset) || (beg_offset < last_pos))
          xsb_abort("RE_SUBSTITUTE: Substitution regions in Arg 2 not sorted");

        /* do the actual replacement */
        strncpy(output_ptr, input_string + last_pos, beg_offset - last_pos);
        output_ptr = output_ptr + beg_offset - last_pos;
        if (sizeof(output_buffer)
            > (output_ptr - output_buffer + strlen(subst_string)))
          strcpy(output_ptr, subst_string);
        else
          xsb_abort("RE_SUBSTITUTE: Substitution result size %d > maximum %d",
                    (int)(beg_offset + strlen(subst_string)),
                    (int)sizeof(output_buffer));

        last_pos = end_offset;
        output_ptr = output_ptr + strlen(subst_string);

      } while (!is_nil(subst_spec_list_term1));

      if (sizeof(output_buffer) > (output_ptr-output_buffer+input_len-end_offset))
        strcat(output_ptr, input_string+end_offset);

     EXIT:
      /* get result out */
      if (conversion_required)
        c_string_to_p_charlist(output_buffer,output_term,"RE_SUBSTITUTE","Arg 4");
      else
        /* DO NOT intern. When atom table garbage collection is in place, then
           replace the instruction with this:
                   c2p_string(output_buffer, output_term);
           The reason for not interning is that in Web page
           manipulation it is often necessary to process the same string many
           times. This can cause atom table overflow. Not interning allows us to
           circumvent the problem. */
        ctop_string(4, output_buffer);

      return(TRUE);
    }

    /* XSB regular expression matcher entry point
       In:
           Arg1: regexp
           Arg2: string
           Arg3: offset
           Arg4: ignorecase
       Out:
           Arg5: list of the form [match(bo0,eo0), match(bo1,eo1),...]
                 where bo*,eo* specify the beginning and ending offsets of the
                 matched substrings.
                 All matched substrings are returned. Parenthesized expressions
                 are ignored.
    */
    int do_bulkmatch__(void)
    {
      prolog_term listHead, listTail;
      /* Prolog args are first assigned to these, so we could examine the types
         of these objects to determine if we got strings or atoms. */
      prolog_term regexp_term, input_term, offset_term;
      prolog_term output_term = p2p_new();
      char *regexp_ptr=NULL;      /* regular expression ptr */
      char *input_string=NULL;    /* string where matches are to be found */
      int ignorecase=FALSE;
      int return_code, paren_number, offset;
      regmatch_t *match_array;
      int last_pos=0, input_len;
      char regexp_buffer[MAXBUFSIZE];

      if (first_call)
        initialize_regexp_tbl();

      regexp_term = reg_term(1);  /* Arg1: regexp */
      if (is_string(regexp_term)) /* check it */
        regexp_ptr = string_val(regexp_term);
      else if (is_list(regexp_term))
        regexp_ptr =
          p_charlist_to_c_string(regexp_term, regexp_buffer, sizeof(regexp_buffer),
                                 "RE_MATCH", "regular expression");
      else
        xsb_abort("RE_MATCH: Arg 1 (the regular expression) must be an atom or a character list");

      input_term = reg_term(2);  /* Arg2: string to find matches in */
      if (is_string(input_term)) /* check it */
        input_string = string_val(input_term);
      else if (is_list(input_term)) {
        input_string =
          p_charlist_to_c_string(input_term, input_buffer, sizeof(input_buffer),
                                 "RE_MATCH", "input string");
      } else
        xsb_abort("RE_MATCH: Arg 2 (the input string) must be an atom or a character list");

      input_len = strlen(input_string);

      offset_term = reg_term(3); /* arg3: offset within the string */
      if (! is_int(offset_term))
        xsb_abort("RE_MATCH: Arg 3 (the offset) must be an integer");
      offset = int_val(offset_term);
      if (offset < 0 || offset > input_len)
        xsb_abort("RE_MATCH: Arg 3 (=%d) must be between 0 and %d",
                  offset, input_len);

      /* If arg 4 is bound to anything, then consider this as ignore case flag */
      if (! is_var(reg_term(4)))
        ignorecase = TRUE;

      last_pos = offset;
      /* returned result */
      listTail = output_term;
      while (last_pos < input_len) {
        c2p_list(listTail);           /* make it into a list */
        listHead = p2p_car(listTail); /* get head of the list */

        return_code = xsb_re_match(regexp_ptr, input_string+last_pos, ignorecase,
                                   &match_array, &paren_number);
        /* exit on no match */
        if (! return_code) break;

        /* bind i-th match to listHead as match(beg,end) */
        c2p_functor("match", 2, listHead);
        c2p_int(match_array[0].rm_so+last_pos, p2p_arg(listHead,1));
        c2p_int(match_array[0].rm_eo+last_pos, p2p_arg(listHead,2));

        listTail = p2p_cdr(listTail);
        last_pos = match_array[0].rm_eo+last_pos;
      }
      c2p_nil(listTail); /* bind tail to nil */
      return p2p_unify(output_term, reg_term(5));
    }
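The copying logic at the heart of the substitution function can be studied in isolation from the XSB interface. The sketch below is not the package's code: region and substitute_regions are hypothetical names, the regions arrive as a plain C array instead of a Prolog list of s(B,E) terms, and every region must come with its own replacement string. It keeps the same conventions for sorted regions and for a negative end offset meaning "end of the input", and it checks the output size before each copy.

```c
#include <stddef.h>
#include <string.h>

/* One region [beg, end) of the input to be replaced. A negative end means
   "up to the end of the input". */
typedef struct {
    int beg;
    int end;
} region;

/* Replace each region of input with the corresponding entry of repl and
   write the NUL-terminated result to out.
   Returns  0 on success,
           -1 if the regions are unsorted, overlapping, or out of range,
           -2 if the result does not fit into out_size bytes. */
static int substitute_regions(const char *input,
                              const region *regions,
                              const char *const *repl,
                              size_t count,
                              char *out, size_t out_size)
{
    size_t input_len = strlen(input);
    size_t last_pos = 0;   /* first input position not yet consumed */
    size_t used = 0;       /* bytes written to out so far */

    for (size_t i = 0; i < count; i++) {
        if (regions[i].beg < 0)
            return -1;
        size_t beg = (size_t)regions[i].beg;
        size_t end = regions[i].end < 0 ? input_len : (size_t)regions[i].end;
        if (beg < last_pos || end < beg || end > input_len)
            return -1;

        size_t keep = beg - last_pos;        /* untouched text before region */
        size_t repl_len = strlen(repl[i]);
        if (used + keep + repl_len >= out_size)
            return -2;

        memcpy(out + used, input + last_pos, keep);
        used += keep;
        memcpy(out + used, repl[i], repl_len);
        used += repl_len;
        last_pos = end;
    }

    /* Copy whatever follows the last region. */
    size_t rest = input_len - last_pos;
    if (used + rest >= out_size)
        return -2;
    memcpy(out + used, input + last_pos, rest);
    used += rest;
    out[used] = '\0';
    return 0;
}
```

Returning an error code instead of aborting is a design choice made for this sketch; in the interface function above the same conditions are reported through xsb_abort.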