Further specification: First order regression system
Code: C mixed with SICStus Prolog
References: Karalic and Bratko 1996
Handling numerical constraints in the normal ILP setting takes the form of inducing classification or regression rules that involve real numbers, predicting a discrete or real-valued class in the presence of background knowledge. Within the ILP project, a transformation approach to this problem was developed that uses propositional systems as subroutines. However, this approach works only for determinate background knowledge, which rules out its application to domains such as predicting the activity or mutagenicity of chemical compounds.
A new approach developed in ILP, called First Order Regression (FOR), is a combination of ILP and numerical regression. First-order logic descriptions are induced to carve out those subspaces that are amenable to numerical regression among real-valued variables. The program FORS (First Order Regression System) is an implementation of this idea, where numerical regression is focused on a distinguished continuous argument of the target predicate. This can be viewed as a generalisation of the usual ILP problem. Namely, the target predicate in usual ILP can be modified by adding an extra ``continuous'' attribute whose value is determined by the truth value of each example: 1.0 for positive examples and 0.0 for negative ones. The regression formulas would then involve only this attribute, and FORS would tend to find rules that cover subsets of only positive or only negative examples.
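As a toy illustration of this reformulation (the predicate, example facts, and representation below are hypothetical, chosen only to make the idea concrete, and are not taken from FORS itself):

```python
# Hypothetical illustration of recasting a binary ILP problem as
# regression: each example of the target predicate is given an extra
# continuous attribute, 1.0 for positive and 0.0 for negative examples.

# Hypothetical ground facts for a target predicate active(Compound).
positive = ["methane", "ethane"]   # active/1 holds
negative = ["water"]               # active/1 does not hold

# Regression examples over the added continuous attribute.
regression_examples = (
    [(compound, 1.0) for compound in positive] +
    [(compound, 0.0) for compound in negative]
)
```

A rule covering only positive examples would then predict the constant 1.0, and a rule covering only negative examples the constant 0.0, which is why regression over this attribute recovers the usual ILP behaviour.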
FORS uses a covering approach similar to that of FOIL. Clause construction proceeds top-down: the algorithm starts with the most general candidate clause, which covers the entire example set, and then specializes it by adding literals, using beam search to guide the algorithm through the space of possible clauses. As part of the system, pruning based on the minimum description length (MDL) principle was developed that can also handle continuous variables. It turned out that MDL pruning helps to build more comprehensible models while preserving their predictive performance. FORS can handle noisy data and can also model dynamic systems (i.e. learn from time series).
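The covering loop and top-down beam search described above can be sketched as follows. This is a deliberately simplified outline, not the FORS implementation: clauses are represented as lists of Python predicates, the regression model inside each clause is reduced to a constant (mean) predictor scored by squared error, and MDL pruning is omitted.

```python
# Simplified sketch of FORS-style covering with top-down beam search.
# Representation, refinement, and scoring are placeholders.

def refine(clause, literals):
    """Specialize a clause by appending one candidate literal."""
    return [clause + [lit] for lit in literals if lit not in clause]

def covers(clause, example):
    """A clause covers an example if every literal holds for it.
    The empty (most general) clause covers everything."""
    return all(lit(example) for lit in clause)

def error(clause, examples):
    """Squared error of a constant (mean) predictor on covered examples;
    a stand-in for the regression fit inside a clause."""
    covered = [y for x, y in examples if covers(clause, x)]
    if not covered:
        return float("inf")
    mean = sum(covered) / len(covered)
    return sum((y - mean) ** 2 for y in covered)

def learn(examples, literals, beam_width=3, steps=2):
    """Covering loop: build one clause at a time by top-down beam
    search, then remove the examples that clause covers."""
    theory = []
    while examples:
        beam = [[]]  # start from the most general clause
        for _ in range(steps):
            # Keep parents in the candidate pool so specialization
            # is only preferred when it actually reduces the error.
            candidates = beam + [c for cl in beam for c in refine(cl, literals)]
            candidates.sort(key=lambda c: error(c, examples))
            beam = candidates[:beam_width]
        best = min(beam, key=lambda c: error(c, examples))
        theory.append(best)
        remaining = [(x, y) for x, y in examples if not covers(best, x)]
        if len(remaining) == len(examples):
            break  # no progress; stop to avoid looping forever
        examples = remaining
    return theory
```

For instance, with numeric examples `[(1, 1.0), (2, 1.0), (-1, 0.0)]` and candidate literals testing the sign of the example, the first induced clause covers exactly the examples with target 1.0.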