Foreword

to the book Relational Data Mining,
edited by Saso Dzeroski and Nada Lavrac
Springer, Berlin, 2001

The area of data mining, or knowledge discovery in databases, started to receive a lot of attention in the 1990s. Developments in sensing, communications and storage technologies made it possible to collect and store large collections of scientific and industrial data. The ability to analyze such data sets had not developed as fast. Data mining research arose at the intersection of several different research areas, notably statistics, databases, machine learning and algorithms. The area can loosely be defined as the analysis of large collections of data for finding models or patterns that are interesting or valuable.

The development of data mining methods requires the solution of several different types of problems. The data can have a very large number of dimensions, so that, for example, examining every pair of variables is impractical. The data can have hundreds of millions of observations, and therefore only a limited number of passes through the data can be made. The data can be observations of a process about which very little is known; hence there is no background knowledge available, and thus selection of appropriate models can be challenging. Or there can be heaps of background knowledge available, and methods that overlook it are destined to fail.

Most data mining methods have been developed for data in the traditional matrix form: rows represent observations, and columns represent variables. This representation has been the traditional one used in statistics, and it has many advantages. For example, matrix operations can be used to represent several data analytic procedures quite succinctly, and these representations make it possible to devise efficient algorithms.

However, data about the real world is seldom in this form. Rather, the application domain contains several different types of entities, about which different kinds of data are known. Only recently has a large body of research aimed at data mining on such data emerged.

Relational data mining studies methods for knowledge discovery in databases when the database has information about several types of objects. This, of course, is usually the case when the database has more than one table. Hence there is little doubt as to the relevance of the area; indeed, one can wonder why most data mining research has concentrated on the single-table case.

Relational data mining has its roots in inductive logic programming, an area at the intersection of machine learning and programming languages. The early work in this area aimed at the synthesis of nontrivial programs from examples and background knowledge. The results were quite fascinating, but the true applicability of the techniques became clear only when the focus changed to the discovery of useful pieces of information from large collections of data, i.e., when the techniques started to be applied to data mining issues.

The present book Relational Data Mining provides a thorough overview of different techniques and strategies used in knowledge discovery from multi-relational data. The chapters describe a broad selection of practical inductive logic programming approaches to relational data mining and give a good overview of several interesting applications. I hope that the book will stimulate interest in practical applications of relational data mining and further research in the development of relational data mining techniques.

Heikki Mannila, Helsinki, June 2001