MUTAGENESIS (LRI)

Application domain: Mutagenesis, regression-unfriendly
Source: Oxford
Dataset size: 2 556 facts
Data format: text
Systems Used: STILL, DISTILL
Pointers: http://www.lri.fr/~sebag/

The data (Oxford)

The mutagenesis dataset has been extensively studied within the ILP project; the task is to discriminate organic compounds belonging to two classes (active vs inactive).

Most experiments concern the 188-compound dataset, known as regression-friendly since numerical regression reaches around 86% predictive accuracy on it.

The following experiments concern the 42 remaining compounds, i.e. the regression-unfriendly dataset, which is considered harder than the 188-compound dataset (the best prediction rate reported on it [SriKin96-ILP96] is 64%). This dataset comprises 29 inactive and 13 active compounds.

Experiments with STILL (LRI)

$\epsilon$   M    OK (%)   ? (%)   Mis (%)   std ($\pm$)   time (s)
0            1    88.9     1.9     9.26      $\pm$ 15      1
0            2    88.3     3.3     8.33      $\pm$ 15      1
0            3    80       8.3     11.7      $\pm$ 18      2
0            4    78.3     18      3.33      $\pm$ 22      4
1            1    73.3     8.3     18.3      $\pm$ 19      1
1            2    90       0       10        $\pm$ 14      1
1            3    86.7     1.7     11.7      $\pm$ 17      2
1            4    83.3     3.3     13.3      $\pm$ 18      2
2            1    68.3     3.3     28.3      $\pm$ 16      1
2            2    73.3     0       26.7      $\pm$ 13      1
2            3    85       1.7     13.3      $\pm$ 18      1
2            4    86.7     1.7     11.7      $\pm$ 17      1
Average Predictive Accuracy of STILL on the regression-unfriendly dataset

The two parameters of stochastic subsumption, $\eta$ and $K$, are set to 300 and 3 respectively; we focus on the influence of the parameters $\epsilon$ and $M$, where $\epsilon = 0$ corresponds to perfect consistency and $M = 1$ to maximal generality.

The table above summarizes the predictive accuracy of STILL on the test set, averaged over 25 independent selections of a 4-example test set with the same class distribution as the whole dataset. The third, fourth and fifth columns respectively give the percentage of correctly classified, unclassified and misclassified test examples. Column 6 gives the standard deviation of the predictive accuracy, and column 7 the total computational time (induction plus classification of the test examples), in seconds on an HP-710 workstation.
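To make the protocol concrete, here is a minimal Python sketch of the evaluation scheme only (not of STILL itself): classify(train, example) is a hypothetical placeholder that may return a predicted class or None for an unclassified example; the stratified drawing of 4-example test sets and the averaging over 25 selections follow the setting described above.

  # Sketch of the evaluation protocol: 25 independent 4-example test sets,
  # drawn with the same class distribution as the whole dataset.
  # `classify` is a placeholder for the learner (STILL is not reproduced here);
  # it may return the predicted class or None (unclassified example).
  import random
  import statistics

  def stratified_test_split(examples, test_size=4, seed=0):
      """Draw a test set whose class distribution mirrors the whole dataset."""
      rng = random.Random(seed)
      by_class = {}
      for x, y in examples:
          by_class.setdefault(y, []).append((x, y))
      test = []
      for y, items in by_class.items():
          k = round(test_size * len(items) / len(examples))
          test.extend(rng.sample(items, max(1, k)))
      test = test[:test_size]
      train = [e for e in examples if e not in test]
      return train, test

  def evaluate(examples, classify, n_runs=25):
      """Average OK / unclassified / misclassified rates over n_runs selections."""
      ok, unk, mis = [], [], []
      for run in range(n_runs):
          train, test = stratified_test_split(examples, seed=run)
          counts = {"ok": 0, "unk": 0, "mis": 0}
          for x, y in test:
              pred = classify(train, x)
              key = "unk" if pred is None else ("ok" if pred == y else "mis")
              counts[key] += 1
          ok.append(100.0 * counts["ok"] / len(test))
          unk.append(100.0 * counts["unk"] / len(test))
          mis.append(100.0 * counts["mis"] / len(test))
      # averages plus the standard deviation of the predictive accuracy
      return (statistics.mean(ok), statistics.mean(unk),
              statistics.mean(mis), statistics.stdev(ok))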

The results obtained for reasonable values of $\epsilon$ and $M$ are satisfactory, and the computational cost is negligible; the main drawback is the high variance of the predictive accuracy.

Experiments with DISTILL (LRI)

p     Average OK (%)   Range (%)        Average variance   time (s)
10    89.8             [82.5, 95]       $\pm$ 12           1
20    88.8             [82.5, 95]       $\pm$ 13           2
30    92.1             [85, 97.5]       $\pm$ 12           4
40    93               [87.5, 97.5]     $\pm$ 11           4
50    93.1             [90, 97.5]       $\pm$ 11           5
60    93.5             [90, 97.5]       $\pm$ 11           5
70    94.2             [87.5, 97.5]     $\pm$ 11           8
80    93.4             [87.5, 97.5]     $\pm$ 11           9
90    94.5             [90, 97.5]       $\pm$ 10           10
100   95.1             [92.5, 97.5]     $\pm$ 10           12
Average Predictive Accuracy of DISTILL on the regression-unfriendly dataset

The two parameters of stochastic subsumption, $\eta$ and $K$, are set to 10 and 3 respectively; we focus on the influence of the number of dimensions $p$.

The above table summarizes the predictive accuracy of DISTILL, with the following experimental setting. A run corresponds to a 10-fold cross-validation; column 2 gives the average predictive accuracy over 20 independent runs and column 3 the range of variation of this predictive accuracy. Column 4 gives the average variance of the cross-validation, and column 5 the computational time (induction of the hypotheses, mapping of all examples onto $\mathbb{N}^p$ and k-NN classification of the test examples), in seconds on an HP-710 workstation.
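As an illustration of this setting only, the following Python sketch runs 20 independent 10-fold cross-validations with a plain k-NN classifier. Here phi(hypotheses, example) stands for a hypothetical mapping of an example onto $\mathbb{N}^p$, and both the L1 distance and the default k=3 are arbitrary choices for the sketch; DISTILL's own induction and mapping are not reproduced.

  # Sketch of the DISTILL evaluation setting: 20 independent runs of a
  # 10-fold cross-validation, classifying examples mapped onto N^p with k-NN.
  # `phi(hypotheses, example)` is a hypothetical mapping returning a tuple of
  # p non-negative integers; the L1 distance and k=3 are arbitrary defaults.
  import random
  import statistics
  from collections import Counter

  def knn_predict(train_vecs, query, k=3):
      """Majority vote among the k nearest neighbours (L1 distance)."""
      dist = lambda u, v: sum(abs(a - b) for a, b in zip(u, v))
      nearest = sorted(train_vecs, key=lambda item: dist(item[0], query))[:k]
      return Counter(label for _, label in nearest).most_common(1)[0][0]

  def cross_validate(examples, phi, hypotheses, n_folds=10, n_runs=20, k=3):
      """Average accuracy and range over n_runs independent n_folds-fold CVs."""
      accuracies = []
      for run in range(n_runs):
          rng = random.Random(run)
          shuffled = examples[:]
          rng.shuffle(shuffled)
          folds = [shuffled[i::n_folds] for i in range(n_folds)]
          correct, total = 0, 0
          for i, test in enumerate(folds):
              train = [e for j, f in enumerate(folds) if j != i for e in f]
              train_vecs = [(phi(hypotheses, x), y) for x, y in train]
              for x, y in test:
                  correct += int(knn_predict(train_vecs, phi(hypotheses, x), k) == y)
                  total += 1
          accuracies.append(100.0 * correct / total)
      return statistics.mean(accuracies), (min(accuracies), max(accuracies))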

The results obtained for sufficiently large values of $p$ ($p > 50$) are satisfactory and degrade gracefully as $p$ decreases; the computational cost remains moderate. DISTILL achieves slightly better and, overall, more stable performance than STILL, while involving one parameter fewer.

