Research Challenges for Data Mining (1)
- Scalability
  - efficient and sufficient sampling schemes
  - in-memory vs. disk-based data processing
  - choice of right subset of techniques to span most tasks
  - interfaces to large warehouses, use of metadata to optimize access
  - client-server issues, where to perform the processing (where and when to mine)
  - exploiting parallelism, distributed computing over a network of workstations
General systems challenge: what will a system that enables exploration, visualization, and analysis over large databases look like?
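
To make the sampling item in the list above concrete, here is a minimal sketch of one well-known scheme, reservoir sampling (Algorithm R), which draws a uniform sample of fixed size from data too large to hold in memory by reading it once. The slide does not name a particular scheme, so the choice of technique, the function name `reservoir_sample`, and its parameters are assumptions made for illustration only.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Return up to k items drawn uniformly at random from a stream.

    Hypothetical helper for illustration: one pass over the data,
    O(k) memory, so it suits disk-resident or streamed tables.
    """
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Pick a slot in [0, i]; the new item survives with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir


# Usage: sample 10 rows from a source that is only iterated once.
sample = reservoir_sample(range(1_000_000), k=10, seed=42)
```

Whether a sample of a given size is "sufficient" depends on the mining task, which this sketch does not address; it only shows that a uniform sample can be obtained without loading the full data set.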
