System |
DISTILL |

Version |
1.3 |

Code |
C++ on HP workstation |

References |
ILP97 |

Pointers |
http://www.lri.fr/ia/sebag/introduction.en.html |

Full name |
Distance Induction with STILL |

DISTILL aims at mapping the problem domain onto a metric space, because a number of robust data analysis algorithms are applicable into metric spaces, to various aims (discrimination, regression, clustering, visualization,...).

One first remarks that a set of disjunctive hypotheses
defines a mapping from the problem
domain onto the space of integer
vectors .
Let *h* be given as , where
are conjunctive hypotheses in
. Let *<E,h>* denote the
number of which subsume *E*.
The mapping from the problem
domain onto
is defined as follows:

By the way, this mapping onto the metric space automatically defines a metric structure on :

Expectedly, the relevance of this metric structure increases with the
number *p* of hypotheses hi considered, and the diversity of these
hypotheses. This relevance will be empirically estimated from the
predictive accuracy of a k-nearest neighbor classification process based on
the above distance.

Hypotheses are termed dimensions of
the mapping , and *p* is the
user-supplied number of dimensions. DISTILL uses fragments of hypotheses
built by STILL (see) as dimensions of :
*p* pairs of examples (,
) are randomly selected; hypothesis
is set to the disjunctive hypothesis
D(, ),
which covers and discriminates
. Thanks to the stochastic subsumption
mechanism of STILL, can be approximately
characterized and used with linear complexity in the size of the examples.
DISTILL complexity (computing the distance of any two examples) finally is

where *p* is the number of dimensions,
the number of samples considered during induction, *K* the number of
samples considered during classification, and *p* the maximum number of
literals of an example.

DISTILL involves one parameter less-than STILL: parameters
and *M*,
which control the consistency and generality requirements in STILL, are
replaced by the number of dimensions *p*.

back to index