|Application domain:||Predicting biodegradability|
|Dataset size:||62 compounds, 83 KB|
|References:||(Van Laer et al. 1997)|
The dataset was constructed in cooperation with Boris Kompare of the Faculty of Civil Engineering, University of Ljubljana. The task is similar to that of predicting mutagenicity. The variable to predict is the average half-life of aerobic aqueous biodegradation. Since this target is numeric, we have essentially a relational regression problem. A discrete version of the class was also provided by the expert. There are four classes: resistant (half-life of more than six months), slow, moderate, and fast (half-life of up to one week).
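The mapping from a numeric half-life to the four expert classes can be sketched as a simple threshold lookup. This is only an illustration: the text gives just the two outer limits (up to one week for fast, more than six months for resistant), so the 28-day boundary between moderate and slow, the approximation of six months as 183 days, and the names `DAY_LIMITS`, `CLASS_NAMES`, and `half_life_class` are all assumptions made for the example.

```python
from bisect import bisect_left

# Upper bounds (in days, inclusive) of the first three classes.
# Only 7 days ("fast") and six months ("resistant" lies above it) come from
# the text; 28 days and the rounding of six months to 183 days are assumed.
DAY_LIMITS = [7.0, 28.0, 183.0]
CLASS_NAMES = ["fast", "moderate", "slow", "resistant"]


def half_life_class(days: float) -> str:
    """Return the discrete class for a half-life given in days."""
    if days < 0:
        raise ValueError("half-life must be non-negative")
    # bisect_left places a value equal to a limit in the lower class,
    # so exactly 7 days is still "fast" and exactly 183 days is still "slow".
    return CLASS_NAMES[bisect_left(DAY_LIMITS, days)]


if __name__ == "__main__":
    for d in (3, 7, 10, 90, 200):
        print(d, half_life_class(d))
```

Keeping the limits in one sorted list makes it easy to substitute the expert's actual boundaries if they are known.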
Biodegradation rates were available for 330 chemicals. QUANTA structural information was kindly provided by Ross King for 62 of these chemicals. Thus, atom and bond information is available. Molecular weight was also added as background knowledge. This is a small dataset and the compounds are structurally diverse, making the prediction of biodegradability a difficult problem.
ICL was applied to the problem of predicting the speed of aqueous biodegradation of chemical compounds from their structure. Six-fold cross-validation was performed, and four different parameter settings of ICL were tried. Discretization was used to handle the real-valued attributes in the data. The best parameter setting achieved an accuracy of 58.1% on unseen cases.
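For readers unfamiliar with the evaluation protocol, the following is a minimal sketch of how six-fold cross-validation over 62 compounds can be organised. It does not reproduce ICL or the original fold assignment: the shuffling seed, the round-robin fold construction, and the names `six_fold_splits` and `cross_validated_accuracy` are assumptions for illustration, and the "learner" is a placeholder callable supplied by the caller.

```python
import random


def six_fold_splits(n_items=62, n_folds=6, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # assumed: random fold assignment
    # Round-robin slicing gives folds whose sizes differ by at most one.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        test = folds[k]
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test


def cross_validated_accuracy(labels, fit_predict, n_folds=6, seed=0):
    """Pool predictions on held-out folds and return the overall accuracy.

    fit_predict(train_indices, test_indices) must return one predicted
    label per test index; it stands in for training and applying a learner.
    """
    correct = 0
    for train, test in six_fold_splits(len(labels), n_folds, seed):
        predictions = fit_predict(train, test)
        correct += sum(p == labels[i] for p, i in zip(predictions, test))
    return correct / len(labels)
```

With 62 compounds and six folds, this construction yields two test folds of 11 compounds and four of 10, and every compound is held out exactly once.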