FOIL is a system for learning
intensional concept definitions from relational tuples.
It is one of the best-known and most successful empirical
ILP systems and has inspired much further research.
FOIL induces concept definitions represented as function-free
Horn clauses, optionally containing negated body literals. The background knowledge
predicates are represented extensionally as sets of ground tuples.
FOIL employs a heuristic search strategy which prunes vast parts of the
hypothesis space.
As its general search strategy, FOIL adopts a covering approach.
Induction of a single clause starts with a clause with an empty body which
is specialised by repeatedly adding a body literal to the clause built so
far. As candidate body literals, FOIL considers the literals which
are constructed by variabilising the
predicates (including the target predicate), that is, by distributing variables
to their argument places.
Additionally, FOIL takes into account literals stating
(in)equality of variables in the head or body literals.
Furthermore, literals may contain constants
which the user has declared as theory (i.e. relevant) constants.
All literals have to conform to the type restrictions of the predicates. For
further control of the language bias, FOIL provides parameters
limiting the total number and maximum depth of variables in a single
clause. In addition, FOIL incorporates mechanisms for excluding
literals which might lead to endless loops in recursive hypothesis
clauses. FOIL offers limited number handling capabilities by
generating literals which compare numeric variables to each other or to
thresholds.
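The variabilisation step can be sketched as follows. The function name, its parameters, and the limit of one new variable per literal are illustrative simplifications; the type, mode, and depth restrictions described above are omitted:

```python
from itertools import product

def variabilise(pred, arity, old_vars, max_new=1):
    """Enumerate candidate body literals for `pred` by distributing
    variables over its argument places (a simplified sketch of FOIL's
    candidate generation)."""
    new_vars = [f"V{i}" for i in range(len(old_vars), len(old_vars) + max_new)]
    pool = list(old_vars) + new_vars
    candidates = []
    for args in product(pool, repeat=arity):
        # At least one variable must already occur in the clause built so
        # far, which keeps the new literal connected to the clause.
        if any(a in old_vars for a in args):
            candidates.append((pred, args))
    return candidates
```

For example, with clause variables X and Y, `variabilise("parent", 2, ["X", "Y"])` produces literals such as `("parent", ("X", "Y"))` and `("parent", ("Y", "V2"))`, where V2 is a fresh variable.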
Among the candidate literals, FOIL selects one literal to be added to
the body of the hypothesis clause according to the information gain
heuristic, an information-based measure estimating
the utility of a literal in dividing positive from negative examples.
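The gain measure from Quinlan and Cameron-Jones can be written down directly: with p and n the numbers of positive and negative tuples before (subscript 0) and after (subscript 1) adding the literal, and t the positive tuples that remain covered, the gain is t multiplied by the reduction in the information needed to signal a positive tuple:

```python
from math import log2

def foil_gain(p0, n0, p1, n1, t):
    """FOIL's information gain for a candidate literal:
    t * (I(p0, n0) - I(p1, n1)), where I(p, n) = -log2(p / (p + n))
    is the information needed to signal that a tuple is positive."""
    def info(p, n):
        return -log2(p / (p + n))
    return t * (info(p0, n0) - info(p1, n1))
```

A literal that leaves the positive/negative ratio unchanged has zero gain; one that filters out negatives while retaining many positives scores highly.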
FOIL stops adding literals to the hypothesis clause if the clause
reaches a predefined minimum accuracy or if the encoding length of the
clause exceeds the number of bits needed for explicitly encoding the
positive examples it covers. This second stopping criterion prevents the
induction of overly long and specific clauses in noisy domains. Induction
of further hypothesis clauses stops if all positive examples are covered
or if the set of induced hypothesis clauses violates the encoding length
restriction. In a postprocessing stage, FOIL removes unnecessary
literals from induced clauses as well as redundant clauses from the
concept definition.
FOIL's greedy search strategy makes it very efficient, but also prone
to exclude the intended concept definitions from the search space. Some
refinements of the hill-climbing search alleviate its short-sightedness,
such as including a certain class of literals with zero information gain
into the hypothesis clause and a simple backtracking mechanism.
FOIL is a batch learning system which reads in all learning input
from a single input file. Positive as well as negative
examples are required for learning. A user may provide negative examples explicitly
or, alternatively, instruct FOIL to generate negative examples
automatically according to the Closed World Assumption (CWA). In the latter
case, the set of positive examples must be complete up to a certain
example complexity. For predicates with high arity, the CWA may generate a
huge number of negative examples. FOIL offers a command line option
allowing the user to specify the percentage of randomly selected negative
examples to be used for induction.
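The CWA generation and sampling can be illustrated as follows; this is a sketch of the idea, not FOIL's implementation, and the function and parameter names are invented:

```python
from itertools import product
import random

def cwa_negatives(positives, domains, sample_pct=100.0, seed=0):
    """Under the Closed World Assumption, every tuple over the argument
    domains that is not listed as positive counts as negative.
    `sample_pct` mirrors the option of keeping only a random percentage
    of the generated negatives."""
    pos = set(positives)
    negatives = [t for t in product(*domains) if t not in pos]
    if sample_pct < 100.0:
        rng = random.Random(seed)
        k = round(len(negatives) * sample_pct / 100.0)
        negatives = rng.sample(negatives, k)
    return negatives
```

Note that the full cross product grows exponentially with the predicate's arity, which is exactly why sampling becomes necessary for high-arity predicates.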
Examples and background knowledge for FOIL have to be formatted as
tuples, that is, each ground instance of a predicate is represented as a
sequence of argument values. For each predicate, the user provides a header
defining its name and argument types. Optionally, the user may indicate
the input/output mode of the predicates, thus further limiting the number
of candidate body literals.
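As an illustration, the learning input might be modelled with structures like the following; these are schematic Python data only, not FOIL's actual input-file syntax:

```python
# Argument types with their value sets.
types = {"Person": {"ann", "bob", "cara", "dan"}}

# Background predicate: a header (name plus argument types) together
# with its extension as a set of ground tuples.
background = {
    ("parent", ("Person", "Person")): {
        ("ann", "bob"), ("bob", "cara"), ("cara", "dan"),
    },
}

# Target predicate with its positive examples, also as ground tuples.
target = ("ancestor", ("Person", "Person"))
positives = {("ann", "bob"), ("ann", "cara"), ("ann", "dan")}
```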
For convenient testing of the induced hypothesis, the user may provide
test cases (i.e. classified examples) for the target predicates together
with the learning input. FOIL then checks the hypothesis on these
cases and reports the results.
References
J.R. Quinlan and R.M. Cameron-Jones.
FOIL: A midterm report.
In P. Brazdil, editor, Proceedings of the 6th European
Conference on Machine Learning, volume 667 of Lecture Notes in
Artificial Intelligence, pages 3-20. Springer-Verlag, 1993.