Panel: The Art of Data Mining, and
the Quest for Greater Insight
Moderator:
Kate Smith-Miles, Deakin University, Australia
Panelists:
Kristin Bennett, Rensselaer Polytechnic Institute, USA
Sven Crone, Lancaster University, UK
Wlodzislaw Duch, Nicolaus Copernicus University, Poland
Isabelle Guyon, ClopiNet, USA
Nik Kasabov, Auckland University of Technology, New Zealand
Zhi-Hua Zhou, Nanjing University, China
Overview
The data mining process requires a number of decisions
to be made in each stage:
selection of data and variables,
choice of suitable sampling methods,
data pre-processing steps,
selection of the best knowledge discovery algorithms, and
selection of parameters.
With so many choices, each of which can have a significant
impact on the eventual success of the results, data mining
can sometimes seem more art than science unless
the user is highly knowledgeable.
Is there a science to data mining? Or is it still more art
than science? What insights do our experts have about
which methods to use when?
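To make the size of the decision space concrete, here is a minimal sketch, not a method proposed by the panel. It enumerates every combination of one pre-processing choice and one parameter choice, and scores each by cross-validation. The toy data, the k-nearest-neighbour classifier, and all function names (make_data, standardize, cv_accuracy, search) are assumptions made for illustration only.

```python
import itertools
import random
import statistics


def make_data(n=60, seed=0):
    """Toy two-class data: feature 1 is informative, feature 2 is large-scale noise."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        label = i % 2
        x1 = rng.gauss(label * 2.0, 1.0)   # informative, small scale
        x2 = rng.gauss(0.0, 100.0)         # uninformative, large scale
        rows.append(([x1, x2], label))
    return rows


def identity(train, test):
    """Pre-processing option 1: leave the data untouched."""
    return train, test


def standardize(train, test):
    """Pre-processing option 2: z-score each feature using training-fold statistics."""
    cols = list(zip(*[x for x, _ in train]))
    mus = [statistics.mean(c) for c in cols]
    sds = [statistics.pstdev(c) or 1.0 for c in cols]

    def scale(rows):
        return [([(v - m) / s for v, m, s in zip(x, mus, sds)], y) for x, y in rows]

    return scale(train), scale(test)


def knn_predict(train, x, k):
    """Majority vote among the k training rows closest to x (k is odd, so no ties)."""
    nearest = sorted(train, key=lambda r: sum((a - b) ** 2 for a, b in zip(r[0], x)))[:k]
    votes = sum(y for _, y in nearest)
    return 1 if votes * 2 > k else 0


def cv_accuracy(data, preprocess, k, folds=5):
    """Fraction of rows classified correctly over a simple interleaved k-fold split."""
    correct = 0
    for f in range(folds):
        test = data[f::folds]
        train = [r for i, r in enumerate(data) if i % folds != f]
        train, test = preprocess(train, test)
        correct += sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(data)


def search(data):
    """Score every (pre-processing, k) combination; returns {(name, k): accuracy}."""
    grid = itertools.product(
        {"none": identity, "standardize": standardize}.items(),
        (1, 3, 7),
    )
    return {(name, k): cv_accuracy(data, fn, k) for (name, fn), k in grid}


if __name__ == "__main__":
    for config, acc in sorted(search(make_data()).items(), key=lambda kv: -kv[1]):
        print(config, round(acc, 3))
```

Even with two pre-processing options and three parameter values there are already six pipelines to compare. Adding variable selection, sampling, and a choice of algorithm multiplies that count, which is why exhaustive trial and error quickly becomes impractical.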
Aims
This panel discussion aims to bring together
experts in data mining to see if we can come up
with some ideas about:
our collective knowledge of when certain techniques
(algorithms, pre-processing methods, etc.) are
expected to perform well.
How much insight do we have into the most effective
data mining process?
How can recent research in model selection and
meta-learning help us to gain greater insight into the
most effective data mining steps for a given problem?
Can we take some of the mystery and need for trial
and error out of the process, come up with some
expert guidelines, and lay the foundations for
merging this information with large-scale empirical
analysis in the future?
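One common way to frame meta-learning is as a lookup: describe each dataset by a few meta-features, record which method worked best on it, and recommend the method of the most similar past dataset. The sketch below illustrates that idea only. The meta-features chosen, the function names (describe, recommend), and every record in META_BASE are hypothetical placeholders, not results from any study or from the panelists.

```python
import math

# Hypothetical meta-knowledge base. Each record pairs a dataset's meta-features
# with the method that did best on it. All values are invented placeholders.
META_BASE = [
    ({"log_n_samples": 2.0, "n_features": 2.0, "minority_fraction": 0.5}, "method_A"),
    ({"log_n_samples": 4.0, "n_features": 50.0, "minority_fraction": 0.1}, "method_B"),
    ({"log_n_samples": 3.0, "n_features": 10.0, "minority_fraction": 0.3}, "method_C"),
]


def describe(rows):
    """Compute simple meta-features for a dataset of (features, label) rows."""
    labels = [y for _, y in rows]
    counts = [labels.count(c) for c in set(labels)]
    return {
        "log_n_samples": math.log10(len(rows)),
        "n_features": float(len(rows[0][0])),
        "minority_fraction": min(counts) / len(rows),
    }


def recommend(meta, base=META_BASE):
    """Return the method recorded for the nearest dataset in meta-feature space."""
    def dist(a, b):
        return math.sqrt(sum((a[key] - b[key]) ** 2 for key in a))

    return min(base, key=lambda rec: dist(meta, rec[0]))[1]


if __name__ == "__main__":
    new_data = [([0.0, 1.0], 0)] * 50 + [([1.0, 0.0], 1)] * 50
    print(recommend(describe(new_data)))
```

In practice the records would have to come from large-scale empirical comparisons, and the meta-features would need scaling so that no single one dominates the distance. Both are left out here to keep the sketch short.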
Questions for discussion
Is there a science to data mining?
Do you have your own rules (developed by experience)
about when certain methods should be used, or not used, for:
selection of data and variables,
choice of suitable sampling methods,
data pre-processing steps,
selection of the best knowledge discovery algorithms, and
selection of parameters?
What about empirical studies (meta-learning, model
selection, etc.) that aim to learn these rules?
What would we need to do to take the trial and error and
art out of the process to make data mining more
user-friendly and effective?
Next steps?