Multi-Relational Data Mining (MRDM) is the multi-disciplinary field dealing with knowledge discovery from relational databases consisting of multiple tables. Mining data which consists of complex/structured objects also falls within the scope of this field, since the normalized representation of such objects in a relational database requires multiple tables. The field aims at integrating results from existing fields such as inductive logic programming, KDD, machine learning and relational databases; producing new techniques for mining multi-relational data; and applying such techniques in practice.
Typical data mining approaches look for patterns in a single relation of a database. For many applications, squeezing data from multiple relations into a single table requires much thought and effort and can lead to loss of information. An alternative for these applications is to use multi-relational data mining. Multi-relational data mining can analyze data from a multi-relation database directly, without the need to transfer the data into a single table first. Thus the relations mined can reside in a relational or deductive database. Using multi-relational data mining it is often also possible to take into account background knowledge, which often corresponds to views in the database.
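The information loss mentioned above can be illustrated with a minimal sketch. The two relations, their column names and the helper functions below are hypothetical and chosen only for illustration. Joining the relations repeats each customer once per order, which skews customer-level statistics, while aggregating to one row per customer discards the individual order amounts.

```python
from collections import defaultdict

# Hypothetical toy database: two relations linked by customer_id.
customers = [
    {"customer_id": 1, "segment": "retail"},
    {"customer_id": 2, "segment": "business"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "amount": 20.0},
    {"order_id": 11, "customer_id": 1, "amount": 35.0},
    {"order_id": 12, "customer_id": 1, "amount": 5.0},
    {"order_id": 13, "customer_id": 2, "amount": 400.0},
]


def join_flatten(customers, orders):
    """Option A: natural join. One row per order, so a customer with
    several orders appears several times in the single table."""
    by_id = {c["customer_id"]: c for c in customers}
    return [{**by_id[o["customer_id"]], **o} for o in orders]


def aggregate_flatten(customers, orders):
    """Option B: one row per customer. Orders are summarised by count
    and total, so the individual amounts are no longer available."""
    stats = defaultdict(lambda: {"n_orders": 0, "total": 0.0})
    for o in orders:
        s = stats[o["customer_id"]]
        s["n_orders"] += 1
        s["total"] += o["amount"]
    return [{**c, **stats[c["customer_id"]]} for c in customers]


joined = join_flatten(customers, orders)
aggregated = aggregate_flatten(customers, orders)

# Half of the customers are retail, but most rows of the joined table are.
retail_share_in_join = sum(r["segment"] == "retail" for r in joined) / len(joined)
print(len(joined), retail_share_in_join)
print(aggregated)
```

Neither single table is wrong, but each answers a different question than the original database does. This is the trade-off that mining the relations directly is meant to avoid.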
Present MRDM approaches consider all of the main data mining tasks, including association analysis, classification, clustering, learning probabilistic models and regression. The pattern languages used by single-table data mining approaches for these data mining tasks have been extended to the multiple-table case. Relational pattern languages now include relational association rules, relational classification rules, relational decision trees, and probabilistic relational models, among others. MRDM algorithms have been developed to mine for patterns expressed in relational pattern languages. Typically, data mining algorithms have been upgraded from the single-table case: for example, distance-based algorithms for prediction and clustering have been upgraded by defining distance measures between examples/instances represented in relational logic.
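As a rough sketch of what such an upgraded distance measure might look like, the code below treats each example as a set of related tuples and compares two examples by averaging the distance from each tuple to its closest counterpart in the other set. This is one simple illustrative choice, not the measure used by any particular MRDM system; the function names and the toy molecule data are assumptions.

```python
def tuple_distance(a, b):
    """Distance in [0, 1] between two tuples of equal length.
    Numeric fields use an absolute difference capped at 1;
    all other fields use a 0/1 mismatch."""
    total = 0.0
    for x, y in zip(a, b):
        if isinstance(x, (int, float)) and isinstance(y, (int, float)):
            total += min(abs(x - y), 1.0)
        else:
            total += 0.0 if x == y else 1.0
    return total / len(a)


def set_distance(A, B):
    """Symmetric average of minimal tuple distances between two
    non-empty sets of tuples."""
    a_to_b = sum(min(tuple_distance(a, b) for b in B) for a in A) / len(A)
    b_to_a = sum(min(tuple_distance(b, a) for a in A) for b in B) / len(B)
    return (a_to_b + b_to_a) / 2


# Hypothetical examples: each molecule is a set of (element, charge) atoms,
# i.e. the tuples of an atom relation that belong to that molecule.
m1 = {("c", 0.1), ("o", -0.4)}
m2 = {("c", 0.1), ("o", -0.4), ("h", 0.2)}
m3 = {("n", 0.3)}

print(set_distance(m1, m2))
print(set_distance(m1, m3))
```

With a measure of this kind in place, a single-table distance-based method such as nearest-neighbour prediction can be applied to examples that span several tuples, without first flattening them into one row.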
MRDM methods have been successfully applied across many application areas, ranging from the analysis of business data, through bioinformatics (including the analysis of complete genomes) and pharmacology (drug design) to Web mining (information extraction from text and Web sources).
The aim of the workshop is to bring together researchers and practitioners of data mining interested in methods for finding patterns in expressive languages from complex / multi-relational / structured data and their applications.
This workshop is the fourth of its kind. It follows the success of the workshops on Multi-Relational Data Mining, held at SIGKDD 2002, 2003, and 2004, reports on which appear in SIGKDD Explorations [Vols 4(2), 5(2) and 6(2)].
Further information on the workshops can be found at the MRDM-2002, MRDM-2003 and MRDM-2004 web sites.
Based on MRDM-2002, a special issue of SIGKDD Explorations [Vol 5(1)] was co-edited by Saso Dzeroski and Luc De Raedt.
Why is the topic of interest?
An increasing number of data mining applications involve the analysis of complex and structured types of data (such as sequences in genome analysis, HTML and XML documents) and require the use of expressive pattern languages. There is thus a clear need for multi-relational data mining (MRDM) techniques.
At the same time, there is a wealth of recent work concerned with upgrading successful data mining approaches to relational logic. Kernel methods (support-vector machines) are a case in point: the development of kernels for structured and richer data types is a hot research topic. Another example is the development of probabilistic relational representations and methods for learning in them (e.g., probabilistic relational models, first-order Bayesian networks and stochastic logic programs).
In the latter case, a whole new research topic, called Statistical Relational Learning (SRL), has emerged and has recently attracted significant attention. Several successful workshops on this topic have been organized at the AAAI, IJCAI and ICML conferences. MRDM-2005 welcomes submissions on SRL related to data mining and knowledge discovery. Note that no SRL workshop is planned for this conference season and that several prominent members of the SRL community are included in the MRDM-2005 PC.
A final topic of particular interest is multi-relational data mining from continuous data streams. Mining streams instead of a static database has recently been receiving more attention in the data mining community, but within relational data mining the topic has received less attention. We explicitly solicit work addressing the intersection of these two largely disjoint areas.
Non-exclusive list of topics, listed in alphabetical order:
- Applications of (multi-)relational data mining
- Data mining problems that require (multi-)relational methods
- Distance-based methods for structured/relational data
- Inductive databases
- Kernel methods for structured/relational data
- Link analysis and discovery
- Methods for (multi-)relational data mining
- Mining structured data, such as amino-acid sequences, chemical compounds, HTML and XML documents, ...
- Mining relational data from continuous streams
- Propositionalization methods for transforming (multi-)relational data mining problems to single-table data mining problems
- Relational neural networks
- Relational pattern languages
- Statistical relational learning (Learning in probabilistic relational representations)
The interest of the KDD community in MRDM has increased sharply over the last few years. Evidence for this includes the success of the previous three MRDM workshops, as well as the MRDM tutorial at KDD-2003 (given by Saso Dzeroski and Luc De Raedt), all of which attracted many participants.
Contact information of organizers
Saso Dzeroski
Jozef Stefan Institute, Jamova 39, SI-1000 Ljubljana, Slovenia.
phone: +386 1 477 3217, fax: +386 1 477 3131
Katholieke Universiteit Leuven, Department of Computer Science
Celestijnenlaan 200A, B-3001 Heverlee, Belgium
Program Committee Members
- Jean-Francois Boulicaut (University of Lyon)
- Jeffrey Coble (University of Texas at Arlington)
- Diane Cook (University of Texas at Arlington)
- Luc Dehaspe (PharmaDM)
- Pedro Domingos (University of Washington)
- Peter Flach (University of Bristol)
- Thomas Gaertner (Fraunhofer Institute for Autonomous Intelligent Systems, Sankt Augustin)
- Lise Getoor (University of Maryland)
- Jiawei Han (University of Illinois at Urbana-Champaign)
- David Jensen (University of Massachusetts at Amherst)
- Kristian Kersting (Albert-Ludwigs-Universitaet Freiburg)
- Joerg-Uwe Kietz (kdlabs AG, Zurich)
- Ross King (University of Aberystwyth)
- Joost Kok (Leiden University)
- Stefan Kramer (Technical University Munich)
- Nada Lavrac (Jozef Stefan Institute) - to be confirmed
- Donato Malerba (University of Bari)
- Stan Matwin (University of Ottawa)
- Hiroshi Motoda (University of Osaka)
- David Page (University of Wisconsin at Madison)
- Alexandrin Popescul (University of Pennsylvania)
- Raghu Ramakrishnan (University of Wisconsin - Madison)
- Foster Provost (Stern School of Business, New York University)
- Celine Rouveirol (University Paris Sud XI)
- Michele Sebag (University Paris Sud XI) - to be confirmed
- Arno Siebes (Universiteit Utrecht)
- Ashwin Srinivasan (IBM India) - to be confirmed
- Jan Struyf (Katholieke Universiteit Leuven)
- Takashi Washio (University of Osaka)
- Stefan Wrobel (Fraunhofer Institute for Autonomous Intelligent Systems, Sankt Augustin / University of Bonn)
- Mohammed Zaki (Rensselaer Polytechnic Institute)
Deadline for submissions: June 10, 2005
Notification: July 4, 2005
Camera ready: July 20, 2005
Workshop day: August 21, 2005