Data Mining: Algorithms and Limitations

Usama Fayyad

Invited talk at ILP-97


Knowledge Discovery in Databases (KDD) and Data Mining are concerned with the extraction of high-level information (knowledge) from low-level data (usually stored in large databases). We give an overview of this rapidly growing area, define its goals, present the motivation, and give a high-level definition of the KDD Process and how it relates to Data Mining. We then focus on data mining methods. These methods have their origins in statistics, pattern recognition, artificial intelligence, visualization, databases, and parallel computing. We cover a sample of data mining methods at a basic level to give a feel for what the methods do and how they are used. We outline current limitations of these methods and directions for extending them to address some of the standing challenges, namely dealing with large databases (scalability) and making the methods easier for nonexperts to use (automation).

Some biographical data on Usama Fayyad:

Usama Fayyad is a Senior Researcher at Microsoft Research. After receiving his Ph.D. in 1991, he joined the Jet Propulsion Laboratory (JPL), California Institute of Technology, where he worked until 1996. At JPL, he headed the Machine Learning Systems Group, where he developed data mining systems for the analysis of large scientific databases. He remains affiliated with JPL as a Distinguished Visiting Scientist. Fayyad received the JPL 1993 Lew Allen Award for Excellence in Research and the 1994 NASA Exceptional Achievement Medal. He was program co-chair of KDD-94 and KDD-95 (the First International Conference on Knowledge Discovery and Data Mining). He is general chair of KDD-96, editor-in-chief of the journal Data Mining and Knowledge Discovery, and co-editor of the new MIT Press book (1996): Advances in Knowledge Discovery and Data Mining. His research interests include knowledge discovery in large databases, data mining, machine learning, statistical pattern recognition, and clustering. More information is available on Usama Fayyad's home page at Microsoft Research.