Further specification: A Stochastic Inductive Learner
Pointers: Contact Michele Sebag, http://www.lri.fr/ia/sebag/introduction.en.html
STILL extends the Disjunctive Version Space (DiVS) approach (Sebag 1996) to Logic Programming and Constraint Logic Programming. We briefly recall DiVS and then describe how this approach is extended to First Order Logic with polynomial complexity.
DiVS is a hybrid of Version Space and AQ: for each example Ex (positive or negative), DiVS constructs the set H(Ex) of all hypotheses covering Ex and covering no example of a class other than that of Ex. A further instance E of the problem domain is then classified through a nearest-neighbour-like process: E is assigned the majority class among its neighbours, where E is a neighbour of Ex iff E is covered by some hypothesis in H(Ex).
This approach would be exponential even in attribute-value languages if H(Ex) were characterized as a disjunction of conjunctions (after Haussler, 1988). But when H(Ex) is characterized as a conjunction of disjunctions, the process becomes linear in the number of attributes and in the number of examples (Sebag 1996).
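As an illustration in an attribute-value setting, the linear coverage test and the majority vote can be sketched as follows. The representation (tuples of attribute values, simple equality selectors) and the helper names are our own simplifications, not STILL's actual data structures: E is a neighbour of Ex iff, for every counter-example Ce, some attribute discriminates Ce from both Ex and E.

```python
# Sketch of DiVS coverage and classification in an attribute-value
# language, assuming plain equality selectors (illustrative only).

def is_neighbour(E, Ex, counter_examples):
    """E is covered by some hypothesis in H(Ex) iff, for every
    counter-example Ce, at least one attribute discriminates Ce
    from both Ex and E. The test is linear in the number of
    attributes and in the number of counter-examples."""
    return all(
        any(ex_i != ce_i and e_i != ce_i
            for e_i, ex_i, ce_i in zip(E, Ex, Ce))
        for Ce in counter_examples
    )

def classify(E, examples):
    """Majority vote among the neighbours of E.
    `examples` is a list of (attribute-tuple, label) pairs."""
    votes = {}
    for Ex, label in examples:
        counter = [ce for ce, lab in examples if lab != label]
        if is_neighbour(E, Ex, counter):
            votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get) if votes else None
```

Each call to `is_neighbour` implicitly tests E against the whole conjunction-of-disjunctions characterizing H(Ex), without ever enumerating the hypotheses themselves.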
The theoretical extension of DiVS to Logic Programming and Constraint Logic Programming was achieved in ILP (Sebag and Rouveirol 1995). Practical applications then face the fact that induction and deduction in First Order Logic are both exponential in the number of literals built on a given predicate. That is to say, if a clause involves three literals built on the predicate atom, and a molecule involves 40 atoms, there are on the order of 40^3 possible matchings between the clause and the example. And one must consider all such matchings in order to check whether the clause discriminates the example.
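The count in this example depends on whether two clause literals may be mapped onto the same atom; under either assumption it is in the tens of thousands:

```python
import math

# Number of ways to map 3 clause literals onto 40 atoms.
injective = math.perm(40, 3)   # distinct atoms: 40 * 39 * 38 = 59280
with_repeats = 40 ** 3         # if the same atom may be reused: 64000

print(injective, with_repeats)
```

Either count grows exponentially with the number of literals built on the predicate, which is the source of the complexity discussed above.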
In the ILP literature, this problem is tackled by considering a judiciously restricted hypothesis space and exhaustively exploring the matchings between a hypothesis and an example. In contrast, we chose to set no restriction on the hypothesis space, but to stochastically explore the set of matchings between a hypothesis and an example. The complexity of induction then becomes quadratic in the size of the data and linear in the number of samples allowed. The same heuristic applies to deduction: the hypotheses learned can be used with polynomial complexity.
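The stochastic exploration can be sketched as follows. The representation (a matching as a random injective mapping of clause literals onto the example's atoms) and the `covers` callback are our own simplifications for illustration, not STILL's actual procedure:

```python
import random

def clause_discriminates(clause_literals, example_atoms, covers, K):
    """Approximate test that a clause rejects (discriminates) an
    example: sample K injective matchings of clause literals onto
    the example's atoms; if no sampled matching makes the clause
    cover the example, the clause is deemed discriminant.
    Cost is linear in K instead of exponential in len(clause_literals)."""
    n = len(clause_literals)
    for _ in range(K):
        matching = random.sample(example_atoms, n)  # random injective matching
        if covers(clause_literals, matching):
            return False  # a covering matching was found: not discriminant
    return True  # no sampled matching covers the example (approximate answer)
```

The answer `True` is only approximate: a covering matching may exist among the unsampled ones, which is exactly the trade-off discussed below.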
The combination of DiVS with this stochastic heuristic is named STILL, for Stochastic Inductive Learner. This work introduces a new type of bias for induction, namely a stochastic bias. In contrast to a syntactic bias, a stochastic bias does not require any a priori knowledge about the expected solutions; nor does it tend to favor some kinds of solutions, in contrast to a search bias.
The price to pay is that STILL constructs hypotheses that are only approximately consistent: one cannot guarantee from the sampled matchings that no matching exists between a hypothesis and a negative example. The number of samples allowed is supplied by the user, who thereby controls both the induction cost and the consistency of the hypotheses produced: the more samples allowed, the more consistent the hypotheses found.
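To see how the number of samples controls consistency, suppose a single covering matching hides among N possible ones; K independent uniform samples all miss it with probability (1 - 1/N)^K, which decreases exponentially with K. The numbers below are illustrative, not taken from STILL's experiments:

```python
# Probability that K independent uniform samples all miss one
# covering matching among N possible matchings (illustrative).
def miss_probability(N, K):
    return (1 - 1 / N) ** K

N = 59280  # e.g. injective matchings of 3 literals onto 40 atoms
for K in (300, 3000, 30000):
    print(K, miss_probability(N, K))
```

The probability of missing an inconsistency thus shrinks as the sample budget grows, at a cost linear in K.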
STILL involves three main parameters. The first one is the number of samples allowed, which is set to 300 in the following experiments (to be compared with the total number of possible matchings).
The second parameter corresponds to the acceptable number of inconsistencies for one hypothesis.
The third parameter, noted M, makes it possible to tune the generality of the hypotheses: for M = 1, H(Ex) includes all maximally discriminant hypotheses covering Ex, which are too general if the data are sparse. The generality of the hypotheses decreases as M increases.