FOIL is a system for learning
intensional concept definitions from relational tuples.
It is one of the best-known and most successful empirical
ILP systems and has inspired much further research.
FOIL induces concept definitions represented as function-free
Horn clauses, optionally containing negated body literals. The background knowledge
predicates are represented extensionally as sets of ground tuples.
FOIL employs a heuristic search strategy which prunes vast parts of the
hypothesis space.
As its general search strategy, FOIL adopts a covering approach.
Induction of a single clause starts with a clause with an empty body which
is specialised by repeatedly adding a body literal to the clause built so
far. As candidate body literals, FOIL considers the literals which
are constructed by variabilising the
predicates (including the target predicate), that is, by distributing variables
to their argument places.
Additionally, FOIL takes into account literals stating
(in)equality of variables in the head or body literals.
Furthermore, literals may contain constants
which the user has declared as theory (i.e. relevant) constants.
All literals have to conform to the type restrictions of the predicates. For
further control of the language bias, FOIL provides parameters
limiting the total number and maximum depth of variables in a single
clause. In addition, FOIL incorporates mechanisms for excluding
literals which might lead to endless loops in recursive hypothesis
clauses. FOIL offers limited number handling capabilities by
generating literals which compare numeric variables to each other or to
thresholds.
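The variabilisation step can be sketched as follows. The function name, its parameters, and the limit of one new variable per literal are illustrative simplifications; the type, mode, and depth restrictions described above are omitted:

```python
from itertools import product

def variabilise(pred, arity, old_vars, max_new=1):
    """Enumerate candidate body literals for `pred` by distributing
    variables over its argument places (a simplified sketch of FOIL's
    candidate generation)."""
    new_vars = [f"V{i}" for i in range(len(old_vars), len(old_vars) + max_new)]
    pool = list(old_vars) + new_vars
    candidates = []
    for args in product(pool, repeat=arity):
        # At least one variable must already occur in the clause built so
        # far, which keeps the new literal connected to the clause.
        if any(a in old_vars for a in args):
            candidates.append((pred, args))
    return candidates
```

For example, with clause variables X and Y, `variabilise("parent", 2, ["X", "Y"])` produces literals such as `("parent", ("X", "Y"))` and `("parent", ("Y", "V2"))`, where V2 is a fresh variable.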
Among the candidate literals, FOIL selects one literal to be added to
the body of the hypothesis clause according to the information gain
heuristic, an information-based measure estimating
the utility of a literal in dividing positive from negative examples.
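The gain measure from Quinlan and Cameron-Jones can be written down directly: with p and n the numbers of positive and negative tuples before (subscript 0) and after (subscript 1) adding the literal, and t the positive tuples that remain covered, the gain is t multiplied by the reduction in the information needed to signal a positive tuple:

```python
from math import log2

def foil_gain(p0, n0, p1, n1, t):
    """FOIL's information gain for a candidate literal:
    t * (I(p0, n0) - I(p1, n1)), where I(p, n) = -log2(p / (p + n))
    is the information needed to signal that a tuple is positive."""
    def info(p, n):
        return -log2(p / (p + n))
    return t * (info(p0, n0) - info(p1, n1))
```

A literal that leaves the positive/negative ratio unchanged has zero gain; one that filters out negatives while retaining many positives scores highly.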
FOIL stops adding literals to the hypothesis clause if the clause
reaches a predefined minimum accuracy or if the encoding length of the
clause exceeds the number of bits needed for explicitly encoding the
positive examples it covers. This second stopping criterion prevents the
induction of overly long and specific clauses in noisy domains. Induction
of further hypothesis clauses stops if all positive examples are covered
or if the set of induced hypothesis clauses violates the encoding length
restriction. In a postprocessing stage, FOIL removes unnecessary
literals from induced clauses as well as redundant clauses from the
concept definition.
FOIL's greedy search strategy makes it very efficient, but also prone
to exclude the intended concept definitions from the search space. Some
refinements of the hill-climbing search alleviate its short-sightedness,
such as including a certain class of literals with zero information gain
into the hypothesis clause and a simple backtracking mechanism.
FOIL is a batch learning system which reads in all learning input
from a single input file. Positive as well as negative
examples are required for learning. A user may provide negative examples explicitly
or, alternatively, instruct FOIL to generate negative examples
automatically according to the Closed World Assumption (CWA). In the latter
case, the set of positive examples must be complete up to a certain
example complexity. For predicates with high arity, the CWA may generate a
huge number of negative examples. FOIL offers a command line option
allowing the user to specify the percentage of randomly selected negative
examples to be used for induction.
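The CWA generation and sampling can be illustrated as follows; this is a sketch of the idea, not FOIL's implementation, and the function and parameter names are invented:

```python
from itertools import product
import random

def cwa_negatives(positives, domains, sample_pct=100.0, seed=0):
    """Under the Closed World Assumption, every tuple over the argument
    domains that is not listed as positive counts as negative.
    `sample_pct` mirrors the option of keeping only a random percentage
    of the generated negatives."""
    pos = set(positives)
    negatives = [t for t in product(*domains) if t not in pos]
    if sample_pct < 100.0:
        rng = random.Random(seed)
        k = round(len(negatives) * sample_pct / 100.0)
        negatives = rng.sample(negatives, k)
    return negatives
```

Note that the full cross product grows exponentially with the predicate's arity, which is exactly why sampling becomes necessary for high-arity predicates.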
Examples and background knowledge for FOIL have to be formatted as
tuples, that is, each ground instance of a predicate is represented as a
sequence of argument values. For each predicate, the user provides a header
defining its name and argument types. Optionally, the user may indicate
the input/output mode of the predicates, thus further limiting the number
of candidate body literals.
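As an illustration, the learning input might be modelled with structures like the following; these are schematic Python data only, not FOIL's actual input-file syntax:

```python
# Argument types with their value sets.
types = {"Person": {"ann", "bob", "cara", "dan"}}

# Background predicate: a header (name plus argument types) together
# with its extension as a set of ground tuples.
background = {
    ("parent", ("Person", "Person")): {
        ("ann", "bob"), ("bob", "cara"), ("cara", "dan"),
    },
}

# Target predicate with its positive examples, also as ground tuples.
target = ("ancestor", ("Person", "Person"))
positives = {("ann", "bob"), ("ann", "cara"), ("ann", "dan")}
```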
For convenient testing of the induced hypothesis, the user may provide
test cases (i.e. classified examples) for the target predicates together
with the learning input. FOIL then checks the hypothesis on these
cases and reports the results.
References
J.R. Quinlan and R.M. Cameron-Jones.
FOIL: A midterm report.
In P. Brazdil, editor, Proceedings of the 6th European
Conference on Machine Learning, volume 667 of Lecture Notes in
Artificial Intelligence, pages 3-20. Springer-Verlag, 1993.