Conference Program

The joint (DS and ALT) conference program is now available in PDF format. You can also view the DS06 program schedule or the ALT06 program schedule in HTML format.

Joint Invited Speaker (DS and ALT)

Andrew Ng, Stanford University, USA
Reinforcement Learning and Apprenticeship Learning for Robotic Control

Many control problems, such as autonomous helicopter flight, legged robot locomotion, and autonomous driving, are difficult because (i) it is hard to write down, in closed form, a formal specification of the control task (for example, what is the cost function for "driving well"?); (ii) it is difficult to learn good models of the robot's dynamics; and (iii) it is expensive to find closed-loop controllers for high-dimensional, highly stochastic domains. Using apprenticeship learning (in which we learn from a human demonstration of a task) as a unifying theme, I will present formal results showing how many control problems can be efficiently addressed given access to a demonstration. In presenting these ideas, I will also draw on a number of case studies, including applications in autonomous helicopter flight, quadruped obstacle negotiation, snake robot locomotion, and high-speed off-road navigation. Finally, I will describe the application of these ideas to the STAIR (STanford AI Robot) project, which has the long-term goal of integrating methods from all major areas of AI (including spoken dialog/NLP, manipulation, vision, navigation, and planning) to build a general-purpose, "intelligent" home/office robotic assistant.

Joint work with Pieter Abbeel, Adam Coates, Ashutosh Saxena, Jeremy Kolter, Honglak Lee, Yirong Shen, Justin Driemeyer, Justin Kearns, and Chioma Osondu.
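The abstract does not spell out the formal setting. One common way to formalize learning from a demonstration, assumed here for illustration rather than taken from the talk, is to suppose the unknown reward is linear in known state features, R(s) = w · φ(s), so that a policy's expected discounted return equals w · μ, where μ is its expected discounted feature count. The sketch below estimates μ for the demonstrator from recorded trajectories; the function names, trajectory representation, and toy data are all hypothetical.

```python
from typing import List, Sequence

# A trajectory is a list of feature vectors phi(s_t), one per time step (assumed representation).
Trajectory = List[Sequence[float]]


def feature_expectations(demos: List[Trajectory], gamma: float) -> List[float]:
    """Empirical discounted feature expectations of the demonstrator:
    mu = (1/m) * sum over demos i of sum_t gamma**t * phi(s_t^(i)).
    """
    if not demos or not demos[0]:
        raise ValueError("need at least one non-empty demonstration")
    k = len(demos[0][0])
    mu = [0.0] * k
    for traj in demos:
        for t, phi in enumerate(traj):
            discount = gamma ** t
            for j in range(k):
                mu[j] += discount * phi[j]
    return [total / len(demos) for total in mu]


def linear_return(w: Sequence[float], mu: Sequence[float]) -> float:
    """Expected discounted return under the assumed linear reward R(s) = w . phi(s)."""
    return sum(wi * mi for wi, mi in zip(w, mu))


if __name__ == "__main__":
    # Two toy demonstrations with two hypothetical features each.
    demos = [
        [[1.0, 0.0], [0.0, 1.0]],
        [[1.0, 0.0], [1.0, 0.0]],
    ]
    mu_expert = feature_expectations(demos, gamma=0.5)
    print(mu_expert)                             # [1.25, 0.25]
    print(linear_return([1.0, 0.0], mu_expert))  # 1.25
```

Because the return is w · μ for every weight vector w, a controller whose feature expectations are close to the demonstrator's performs nearly as well under any bounded w (by the Cauchy-Schwarz inequality the gap is at most ‖w‖ · ‖μ_E - μ‖). This is one way a demonstration can stand in for an explicitly written cost function.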
Invited Speakers (DS)

Carole Goble, University of Manchester, UK
Putting Semantics into e-Science and the Grid

e-Science is scientific investigation performed through distributed global collaborations between scientists and their resources, together with the computing infrastructure that enables this. Scientific progress increasingly depends on pooling know-how and results; making connections between ideas, people, and data; and finding and interpreting knowledge generated by strangers in ways other than those intended when it was collected. It is about harvesting and harnessing the collective intelligence of the scientific community. It has as much to do with intelligent information management as with sharing scarce resources such as large-scale compute power or expensive instrumentation. The Semantic Web is an initiative to enable and operate a semantic infrastructure for gathering and exploiting the Web's collective intelligence, drawing primarily on technologies from artificial intelligence and data management. Applying the Semantic Web paradigm to e-Science looks like a potential winner. Moreover, e-Science looks promising as the nursery that the fledgling Semantic Web needs in order to mature. This talk makes the case that e-Science needs the Semantic Web and the Semantic Web needs e-Science, drawing on my experience of working in the life sciences.

Padhraic Smyth, University of California, Irvine, USA
Data-Driven Discovery using Probabilistic Hidden Variable Models

Generative probabilistic models have proven to be a very useful framework for machine learning from scientific data. Key ideas underlying the generative approach include (a) representing complex stochastic phenomena using the structured language of graphical models, (b) using latent (hidden) variables to make inferences about unobserved phenomena, and (c) leveraging Bayesian ideas for learning and prediction.
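The simplest model with a hidden variable, idea (b) above, is a finite mixture in which each observation's component label is unobserved; the expectation-maximization (EM) algorithm alternates between inferring those labels softly and re-estimating the parameters. The sketch below is a minimal illustration, assuming a one-dimensional Gaussian mixture fitted by maximum likelihood. EM is a standard technique chosen here for illustration, not necessarily one covered in the talk, and the sketch omits the Bayesian treatment of idea (c); the function names and toy data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def normal_pdf(x: float, mean: float, var: float) -> float:
    """Density of a univariate Gaussian N(mean, var) at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_gaussian_mixture_1d(
    xs: Sequence[float],
    means: Sequence[float],
    variances: Sequence[float],
    weights: Sequence[float],
    iters: int = 50,
    min_var: float = 1e-6,
) -> Tuple[List[float], List[float], List[float]]:
    """Maximum-likelihood EM for a 1-D Gaussian mixture (illustrative sketch).

    The component label of each point is the hidden variable. Assumes every
    point has non-negligible density under at least one component.
    """
    means, variances, weights = list(means), list(variances), list(weights)
    k, n = len(means), len(xs)
    for _ in range(iters):
        # E-step: posterior probability that each point's hidden label equals j.
        resp = []
        for x in xs:
            p = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: re-estimate parameters from the soft assignments.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var_j, min_var)  # guard against collapse onto one point
    return means, variances, weights


if __name__ == "__main__":
    data = [0.0, 0.1, -0.1, 0.2, 5.0, 5.1, 4.9, 5.2]  # toy data with two obvious clusters
    mu, var, w = em_gaussian_mixture_1d(data, [0.0, 4.0], [1.0, 1.0], [0.5, 0.5])
    print(mu, w)  # means near 0.05 and 5.05, weights near 0.5 each
```

The responsibilities computed in the E-step are exactly the "inferences about unobserved phenomena" for this toy model: each point ends up assigned, with a probability, to the cluster that most plausibly generated it.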
This talk will begin with a brief review of learning from data with hidden variables and then discuss some exciting recent work in this area that applies directly to a broad range of scientific problems. Several scientific data sets will be used to illustrate these ideas in probabilistic learning, including time-course microarray expression data, functional magnetic resonance imaging (fMRI) data of the human brain, text documents from the biomedical literature, and sets of cyclone trajectories.

Invited Speakers (ALT)

Gunnar Raetsch, Max Planck Society, Germany
The Solution of Semi-Infinite Linear Programs using Boosting-like Methods

We consider methods for solving large linear optimization problems, in particular so-called Semi-Infinite Linear Programs (SILPs), which have a finite number of variables but infinitely many linear constraints. We illustrate that such optimization problems frequently appear in machine learning and discuss several examples, including maximum margin boosting, multiple kernel learning, and structure learning. In the second part, we review methods for solving SILPs, with particular interest in methods related to boosting. We review recent theoretical results concerning the convergence of these algorithms and conclude with a discussion of empirical results comparing them.

Hans Ulrich Simon, Ruhr-University Bochum, Germany
The Usage of the Spectral Norm in Learning Theory: Some Selected Topics

In this talk, we review some known results about the statistical query complexity of a concept class and the spectral norm of its correlation matrix. Since spectral norms are widely used in various other areas, we are then able to place statistical query complexity in a broader context. We briefly describe some surprising connections to (seemingly) different topics in learning theory, complexity theory, and cryptography.
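For ±1-valued concepts f_1, ..., f_k and a distribution D on the domain, the correlation matrix has entries C_ij = E_D[f_i(x) f_j(x)]. Roughly speaking, the known results tie a small spectral norm of this matrix (concepts that are nearly uncorrelated) to high statistical query complexity; the precise bounds are not reproduced here. The sketch below, assuming the uniform distribution on {0,1}^n with n small enough to compute each expectation exactly, contrasts the parity functions (pairwise uncorrelated) with a class of identical concepts. All function names are illustrative.

```python
import itertools
import math
import random
from typing import Callable, List, Sequence

# A concept maps a bit vector x in {0,1}^n to a label in {-1, +1}.
Concept = Callable[[Sequence[int]], int]


def parity(subset: Sequence[int]) -> Concept:
    """Parity function chi_S(x) = (-1)^(sum of x_i for i in S)."""
    def chi(x: Sequence[int]) -> int:
        return -1 if sum(x[i] for i in subset) % 2 else 1
    return chi


def correlation_matrix(concepts: List[Concept], n: int) -> List[List[float]]:
    """Entry (i, j) is E[f_i(x) f_j(x)] under the uniform distribution on {0,1}^n (assumed)."""
    points = list(itertools.product([0, 1], repeat=n))
    values = [[f(x) for x in points] for f in concepts]
    m = len(points)
    return [[sum(a * b for a, b in zip(vi, vj)) / m for vj in values] for vi in values]


def spectral_norm_psd(mat: List[List[float]], iters: int = 200, seed: int = 0) -> float:
    """Power iteration. A correlation matrix is symmetric positive semidefinite,
    so its spectral norm equals its largest eigenvalue."""
    k = len(mat)
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(k)]  # random start avoids an orthogonal start vector
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    estimate = 0.0
    for _ in range(iters):
        w = [sum(mat[i][j] * v[j] for j in range(k)) for i in range(k)]
        estimate = math.sqrt(sum(x * x for x in w))
        if estimate == 0.0:
            return 0.0
        v = [x / estimate for x in w]
    return estimate


if __name__ == "__main__":
    n = 3
    parities = [parity(s) for r in range(n + 1) for s in itertools.combinations(range(n), r)]
    clones = [parity((0,))] * len(parities)  # 8 copies of one concept
    print(spectral_norm_psd(correlation_matrix(parities, n)))  # ≈ 1.0: pairwise uncorrelated
    print(spectral_norm_psd(correlation_matrix(clones, n)))    # ≈ 8.0: perfectly correlated
```

The two extremes show what the spectral norm measures: for the 2^n parities the correlation matrix is the identity, so the norm stays at 1 however large the class grows, while k identical concepts give the all-ones matrix with norm k.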
A connection to the so-called Hidden Number Problem, which plays an important role in proving the bit-security of cryptographic functions, will be discussed in somewhat more detail. Slides of the talk are available.

Tutorials

Michael May, Fraunhofer Institute for Autonomous Intelligent Systems, Germany
Geographic and Spatial Data Mining

The widespread use of ubiquitous and mobile technologies such as sensor networks, GPS, mobile phones, and RFID, as well as the recent success of Google Earth, has led to a situation where more and more data mining applications must deal with nontrivial problems of spatio-temporal data analysis. Applications range from telecommunications, retail, and market research to scientific applications in ecology or epidemiology. Despite the importance of spatial information, standard data mining tools and methods cannot adequately deal with it. Consequently, important information is thrown away, leading to suboptimal results. Recent years have seen several lines of research that try to change this situation. Various classes of data mining algorithms (e.g., clustering, association rules, decision trees, subgroup discovery) have been upgraded to handle geographic objects such as lines, points, and polygons, together with their spatial relationships. These approaches nicely complement classical methods pioneered in geostatistics (e.g., Kriging, point pattern analysis) and are often rooted in some form of Multi-Relational Data Mining. In this tutorial, we will first clarify the various data types relevant to geographic data mining and work out the specific characteristics and challenges of geographic data. Next, we discuss several examples of algorithms that take advantage of these data types. Finally, we present a wide range of applications to illustrate the potential, successes, and shortcomings of current spatial data mining approaches. We conclude by pointing out some future challenges and directions.
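Upgrading an algorithm to geographic objects means it must evaluate spatial predicates between points, lines, and polygons. As an illustration of one such predicate (the algorithm is our choice, not taken from the tutorial), here is the classic even-odd ray-casting test for whether a point lies inside a simple polygon. It is a minimal sketch assuming planar coordinates; real GIS libraries handle boundaries, holes, and map projections more carefully. The function name and coordinates are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def point_in_polygon(p: Point, polygon: List[Point]) -> bool:
    """Even-odd ray casting: cast a ray from p in the +x direction and count
    how many polygon edges it crosses; an odd count means p is inside.

    Assumes a simple (non-self-intersecting) polygon given as a vertex list;
    points exactly on the boundary may be classified either way.
    """
    x, y = p
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # wrap around to close the polygon
        # Half-open test: the edge straddles the horizontal line through p.
        # This also skips horizontal edges, so the division below is safe.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


if __name__ == "__main__":
    # Toy planar coordinates (hypothetical), not real geographic data.
    l_shape = [(0.0, 0.0), (4.0, 0.0), (4.0, 1.0), (1.0, 1.0), (1.0, 4.0), (0.0, 4.0)]
    samples = [(0.5, 2.0), (2.0, 2.0), (3.0, 0.5)]
    print([point_in_polygon(s, l_shape) for s in samples])  # [True, False, True]
```

The L-shaped polygon shows why a simple bounding-box check is not enough: (2.0, 2.0) lies inside the bounding box but in the notch, outside the region. In a spatial join, the same predicate would be applied to (point, region) pairs, for example to count sensor readings per district, usually behind a spatial index so that not every pair has to be tested.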
Luis Torgo, University of Porto, Portugal
Using R for Data Mining and Scientific Discovery

R is a freely downloadable language and environment for data analysis. The R community has been growing at a very fast rate, as has the list of available add-on packages, which address a very large set of application domains. The main purpose of this tutorial is to illustrate R's capabilities on typical data mining and scientific discovery tasks. We aim to convince you that R is an excellent tool for implementing ideas that solve specific tasks within these areas. We will pursue this goal by presenting a set of concrete case studies. We will describe each case study and provide all the steps necessary to reach the results using R, both to introduce you to R and to allow you to continue, adapt, and change these "solutions" after attending the tutorial. An associated web site will be made available containing all the code and data necessary to replicate what is shown in the tutorial, following the open source spirit of the R project. Our presentation of R will be illustrated by three case studies. The first is an ecological modelling task, where the main objective is to obtain models that can forecast harmful algae blooms early in a river dam used to collect potable water. The second case study relates to stock market trading: we will show how to obtain models for these complex dynamic systems, and how to use these models for decision making. Finally, the third case study addresses the exploratory analysis of microarray genomic data, which is common in bioinformatics applications.

List of Accepted Papers

Long Papers
Regular (Short) Papers
