|Application domain:||Predicting chemical properties of river water|
Physical and chemical properties give a specific picture of river water quality at a particular point in time, while the biota (living organisms) act as continuous monitors and give a more general picture of water quality over a period of time. This has increased the relative importance of biological methods for monitoring water quality. The problem of inferring the chemical properties from the biota is practically relevant, especially in countries where extensive biological monitoring is conducted. Regular monitoring for a very wide range of chemical pollutants would be very expensive, if not impossible. On the other hand, the state of the biota can reflect an increase in pollution and indicate likely causes/sources.
We used data on biological and chemical samples from Slovenian rivers collected through the monitoring program of the Hydrometeorological Institute of Slovenia (Dzeroski et al. 2000). We used pairs of biological and chemical samples that were taken at the same site at approximately the same time: there were 1061 such pairs, collected over six years. Data on biological samples list all the species/taxa present at the site and their abundances. Chemical samples contain the measured values of 16 physical and chemical parameters: biological oxygen demand (BOD), chlorine concentration (Cl), CO2 concentration, electrical conductivity, chemical oxygen demand (COD, determined with both K2Cr2O7 and KMnO4 and counted as two parameters), concentrations of ammonia (NH4), NO2, NO3 and dissolved oxygen (O2), alkalinity (pH), PO4, oxygen saturation, SiO2, water temperature, and total hardness.
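The pairing of samples by site and time can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the record fields (`site`, `date`), the function name `pair_samples`, the 7-day tolerance standing in for "approximately the same time", and the sample records themselves, which are made up and not taken from the monitoring data.

```python
from datetime import date

def pair_samples(bio, chem, max_days=7):
    """Pair each biological sample with the closest-in-time chemical
    sample from the same site, if one lies within max_days (assumed)."""
    pairs = []
    for b in bio:
        # Chemical samples from the same site within the time tolerance.
        candidates = [
            c for c in chem
            if c["site"] == b["site"]
            and abs((c["date"] - b["date"]).days) <= max_days
        ]
        if candidates:
            closest = min(candidates,
                          key=lambda c: abs((c["date"] - b["date"]).days))
            pairs.append((b, closest))
    return pairs

# Made-up records, for illustration only.
bio = [
    {"site": "site_A", "date": date(2000, 5, 10)},
    {"site": "site_B", "date": date(2000, 6, 1)},
]
chem = [
    {"site": "site_A", "date": date(2000, 5, 12)},
    {"site": "site_A", "date": date(2000, 5, 16)},
    {"site": "site_B", "date": date(2000, 7, 15)},
]

pairs = pair_samples(bio, chem)
for b, c in pairs:
    print(b["site"], b["date"], "<->", c["date"])
```

In this sketch a biological sample with no chemical sample close enough in time is simply left out; how such cases were handled in the actual study is not stated here.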
We used regression tree induction to learn a separate predictive model for each of the 16 parameters (Dzeroski et al. 2000). The models for the most important indicators of pollution (ammonia, biological oxygen demand, chemical oxygen demand) had the best predictive power. We also used clustering trees (Blockeel et al. 1998) to predict the values of all 16 parameters simultaneously with a single model (Blockeel et al. 1999); this improved accuracy compared with the individual predictions for each of the 16 parameters.
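The difference between one tree per parameter and one tree for all parameters can be sketched in a few lines. This is a simplified variance-reduction tree written for illustration, not the regression tree or clustering tree systems used in the cited work. The function names, the toy abundances in `X`, and the two target columns in `Y` (standing in for, say, BOD and NH4) are all assumptions. A single-target tree is the special case where each row of `Y` has one value.

```python
from statistics import fmean

def sse(rows):
    """Squared error around the per-target mean, summed over all targets."""
    if not rows:
        return 0.0
    total = 0.0
    for column in zip(*rows):
        mean = fmean(column)
        total += sum((v - mean) ** 2 for v in column)
    return total

def best_split(X, Y):
    """Return (feature, threshold, error) of the best binary split, or None."""
    best = None
    for j in range(len(X[0])):
        # Every distinct value except the largest is a candidate threshold.
        for t in sorted({x[j] for x in X})[:-1]:
            left = [y for x, y in zip(X, Y) if x[j] <= t]
            right = [y for x, y in zip(X, Y) if x[j] > t]
            err = sse(left) + sse(right)
            if best is None or err < best[2]:
                best = (j, t, err)
    return best

def build_tree(X, Y, min_samples=2, depth=0, max_depth=3):
    """Grow a tree; each leaf is the list of target means of its samples."""
    split = None
    if len(X) >= min_samples and depth < max_depth:
        split = best_split(X, Y)
    if split is None or split[2] >= sse(Y):
        return [fmean(column) for column in zip(*Y)]
    j, t, _ = split
    left = [(x, y) for x, y in zip(X, Y) if x[j] <= t]
    right = [(x, y) for x, y in zip(X, Y) if x[j] > t]
    return (j, t,
            build_tree(*zip(*left), min_samples, depth + 1, max_depth),
            build_tree(*zip(*right), min_samples, depth + 1, max_depth))

def predict(tree, x):
    """Follow the splits down to a leaf and return its target means."""
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if x[j] <= t else right
    return tree

# Made-up abundances of two taxa and two made-up chemical targets.
X = [[0, 3], [0, 2], [1, 3], [3, 0], [4, 1], [5, 0]]
Y = [[1.0, 0.1], [1.2, 0.1], [1.1, 0.2], [6.0, 1.5], [6.5, 1.7], [7.0, 1.6]]

# One tree predicting both targets at once.
joint = build_tree(X, Y, max_depth=1)
# One tree per target, each trained on a single column of Y.
separate = [build_tree(X, [[y[k]] for y in Y], max_depth=1) for k in range(2)]

print(predict(joint, [5, 0]))
print([predict(tree, [5, 0])[0] for tree in separate])
```

The only change between the two settings is the split criterion: the joint tree sums the squared error over all target columns, so a split is chosen for how well it separates the samples on every parameter together. In this sketch the targets are not rescaled, so a parameter with a large numeric range would dominate the sum; standardizing the targets first is one common way to avoid that.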