System: | CPROGOL |
Version: | 4.1, 4.2 |
Further specification: | empirical ILP system |
Pointers: | http://www.doc.ic.ac.uk/~shm/index.html |
Code: | C source code |
References: | (Muggleton 1995, 1996) |
Other comments: | Distributed together with manual and examples |
PROGOL employs a covering approach like, e.g., FOIL. That
is, it selects an example to be generalised and finds a consistent clause
covering the example. All clauses made redundant by the found clause including
all examples covered by the clause are removed from the theory. The example
selection and generalisation cycle is repeated until all examples are covered.
When constructing hypothesis clauses consistent with the examples, PROGOL
conducts a general-to-specific search in the theta-subsumption lattice
of a single clause hypothesis. In contrast to other general-to-specific
searching systems, PROGOL computes the most specific clause covering the
seed example and belonging to the hypothesis language. This most specific
clause bounds the theta-subsumption lattice from below. On top, the lattice
is bounded by the empty clause. The search strategy is an -like
algorithm guided by an approximate compression measure. Each invocation
of the search returns a clause which is guaranteed to maximally compress
the data, however, the set of all found hypotheses is not necessarily the
most compressive set of clauses for the given example set. PROGOL can learn
ranges and functions with numeric data (integer and floating point) by
making use of the built-in predicates ``is'', <, =<, etc.
The hypothesis language of PROGOL is restricted by the means of mode declarations provided by the user. The mode declarations specify the atoms to be used as head literals or body literals in hypothesis clauses. For each atom, the mode declaration indicates the argument types, and whether an argument is to be instantiated with an input variable, an output variable, or a constant. Furthermore, the mode declaration bounds the number of alternative solutions for instantiating the atom. The types are defined in the background knowledge by unary predicates, or by Prolog built-in functions.
Arbitrary Prolog programs are allowed as background knowledge. Besides the background theory provided by the user, standard primitive predicates are built into PROGOL and are available as background knowledge. Positive examples are represented as arbitrary definite clauses. Negative examples and integrity constraints are represented as headless Horn clauses. Using negation by failure (CWA), PROGOL is able to learn arbitrary integrity constraints.
PROGOL provides a range of parameters for controlling the generalisation process. These parameters specify the maximum cardinality of hypothesis clauses, a depth bound for the theorem prover, the maximum layers of new variables, and an upper bound on the nodes to be explored when searching for a consistent clause. PROGOL allows to relax consistency by setting an upper bound on the number of negatives that can be covered by an acceptable clause.
PROGOL4.2 (Muggleton 1996) is an upward compatible with
version 4.1 but learns from positive-only data. The system is available
from the author upon request. The paper (Muggleton 1996) describing PROGOL4.2,
can be obtained as Postscript by anonymous ftp from ftp.comlab.ox.ac.uk
in the file
pub/Packages/ILP/Papers/poslearn1.ps.