Why Data Mining Does Not Contribute to Business?
DMBiz’05 Porto, Portugal October 3, 2005
Why Data Mining Research Does
Not Contribute to Business?
Mykola Pechenizkiy, Seppo Puuronen
Department of Computer Science
University of Jyväskylä
Finland
Alexey Tsymbal
Department of Computer Science
Trinity College Dublin
Ireland
Outline
• Introduction and What is our message?
• Where are we? – rigor vs. relevance in DM
• Towards the new framework for DM research
– DM System as adaptive Information System (IS)
– DM research as IS Development: DM system as
artefact
– DM success model: success factors
• Further plans and Discussion
Why Data Mining Research Does Not Contribute to Business? by M. Pechenizkiy, S. Puuronen, A. Tsymbal
Our Message
• DM is still a technology that carries great expectations of
enabling organizations to benefit more from their huge
databases.
• There are some success stories in which organizations
have managed to gain a competitive advantage from DM.
• Still, the strong focus of most DM researchers on
technology-oriented topics does not support expanding the
scope into less rigorous but practically very relevant
sub-areas.
• Research in the IS discipline has a strong tradition of
taking into account the human and organizational aspects
of systems besides the technical ones.
Our Message
• DM-supporting processes that take human and organizational
aspects into account are still in their infancy.
• The DM community might benefit, at least from a practical point of
view, from looking at older sub-areas of IT that have a tradition of
considering solution-driven concepts with a focus also on human and
organizational aspects.
• By becoming more amenable to the research results of the IS
community, the DM community might be able to increase its collective
understanding of
– how DM artifacts are developed: conceived, constructed, and
implemented,
– how DM artifacts are used, supported and evolved,
– how DM artifacts impact and are impacted by the contexts in
which they are embedded.
Existing Frameworks for DM
• Theory-oriented
– Databases;
– Statistics;
– Machine learning;
– Data compression
• Process-oriented
– Fayyad’s
– CRISP-DM
– Reinartz’s
• Remarks
– The reductionist approach of viewing DM as statistics has the
advantages of a strong background and easily formulated problems.
– DM tasks such as clustering, regression and classification
fit easily into these approaches.
– More recent (process-oriented) frameworks address the issues
related to a view of DM as a process, and its iterative and
interactive nature.
Rigor and Relevance in DM Research
• Lin (in Wu et al.) notes that a successful new
industry (such as DM) can follow consecutive phases:
1. discovering a new idea,
2. ensuring its applicability,
3. producing small-scale systems to test the market,
4. gaining a better understanding of the new technology, and
5. producing a fully scaled system.
• At present there are several dozen DM systems,
none of which is comparable in scale to a DBMS.
– This indicates that we are still in the 3rd phase in
the DM area!
Rigor vs Relevance in DM Research
[Figure: rigor vs. relevance diagram; labels: Relevance, Rigor]
Where is the focus?
• Still! … speeding-up, scaling-up, and increasing the accuracies of
DM techniques.
• Piatetsky-Shapiro: “we see many papers proposing incremental
refinements in association rules algorithms, but very few papers
describing how the discovered association rules are used”
• Lin claims that the goals of DM research and DM development are
quite different:
– research is knowledge-oriented while development is
profit-oriented.
– Thus, DM research is concentrated on the development of new
algorithms or their enhancements,
– but the DM developers in domain areas are aware of cost
considerations: investment in research, product development,
marketing, and product support.
• However, we believe that the study of the DM development and DM
use processes is as important as the technological aspects, and
therefore such research activities are likely to emerge within the DM
field.
Towards the new framework for
DM research …
DMS in the Kernel of an Organization
[Figure: nested layers, from outermost to innermost: Environment, Organization, DM Task(s), DMS (Artifact)]
• DM is a fundamentally application-oriented area motivated by business
and scientific needs to make sense of mountains of data.
• A DMS is generally used by human beings to support or perform some
task(s) in an organizational environment; both the users and the
organization have their own expectations of the DMS.
• Further, the organization has its own environment, which has its own
interests related to the DMS, e.g. that people's privacy is not violated.
The ISs-based paradigm for DM
[Figure: The Information Subsystem (ISS) within The External Environment and The Organizational Environment, together with the User Environment (The Use Process), the IS Development Environment (The Development Process) and the IS Operations Environment (The Operation Process)]
Ives B., Hamilton S., Davis G. (1980). “A Framework for Research in Computer-based MIS”
Management Science, 26(9), 910-934.
“Information systems are powerful instruments for organizational
problem solving through formal information processing”
Lyytinen, K., 1987, “Different perspectives on ISs: problems and solutions.” ACM Computing Surveys, 19(1), 5-46.
DM Artifact Development
[Figure: diagram linking Theory Building, DM Artifact Development, Observation and Experimentation]
A multimethodological approach to the construction of an artefact for DM
Adapted from: Nunamaker, W., Chen, M., and Purdin, T. 1990-91, Systems development
in information systems research, Journal of Management Information
Systems, 7(3), 89-106.
The Action Research and Design Science
Approach to Artifact Creation
[Figure: steps: Awareness of business problem, Action planning, Artifact Development, Artifact Evaluation, Action taking, Conclusion; knowledge elements: Contextual Knowledge, Business Knowledge, Design Knowledge]
DM Artifact Use: Success Model 1 of 3
[Figure: success model with the constructs System Quality, Information Quality, Service Quality, Use, User Satisfaction, Individual Impact and Organizational Impact]
Adapted from D&M IS Success Models
DM Artifact Use: Success Model 2 of 3
• What are the key factors of successful use and impact of a
DMS at both the individual and organizational levels?
1. how the system is used, and also supported and
evolved, and
2. how the system impacts and is impacted by the
contexts in which it is embedded.
• Coppock: the failure factors of DM-related projects
– have nothing to do with the skill of the modeler or the
quality of data.
– But those do include:
1. persons in charge of the project did not formulate
actionable insights,
2. the sponsors of the work did not communicate the
insights derived to key constituents,
3. the results don't agree with institutional truths
• The leadership, communication skills and understanding of
the culture of the organization are no less important than the
traditionally emphasized technological job of turning data
into insights.
DM Artifact Use: Success Model 3 of 3
• Hermiz stated his belief that there are
four critical success factors for DM projects:
• (1) having a clearly articulated business problem that needs
to be solved and for which DM is a proper tool;
• (2) ensuring that the problem being pursued is supported by
the right type of data of sufficient quality and in sufficient
quantity for DM;
• (3) recognizing that DM is a process with many components
and dependencies – the entire project cannot be "managed"
in the traditional business sense of the word;
• (4) planning to learn from the DM process regardless of the
outcome, and clearly understanding that there is no
guarantee that any given DM project will be successful.
New Research Framework for DM Research
[Figure: research framework, overview]
• Environment: People, Organizations, Technology
– passes Business Needs to DM Research (Relevance)
• DM Research: Develop/Build and Justify/Evaluate, linked by Assess and Refine
– (Un-)Successful Applications in the appropriate environment
– Contribution to Knowledge Base
• Knowledge Base: Foundations, Design knowledge
– passes Applicable Knowledge to DM Research (Rigor)
New Research Framework for DM Research
[Figure: research framework, detailed]
• Environment (Business Needs; Relevance)
– People: Roles, Capabilities, Characteristics
– Organizations: Strategy, Structure & Culture, Processes
– Technology: Infrastructure, Applications, Communications
Architecture, Development Capabilities
• DM Research
– Develop/Build: Theories, Artifacts
– Assess, Refine
– Justify/Evaluate: Analytical, Case Study, Experimental,
Field Study, Simulation
– (Un-)Successful Applications in the appropriate environment
– Contribution to Knowledge Base
• Knowledge Base (Applicable Knowledge; Rigor)
– Foundations: Base-level theories, Frameworks, Models,
Instantiation, Validation Criteria
– Design knowledge: Methodologies, Validation Criteria
– (not instantiations of models but KDD processes, services,
systems)
Further Work
• Definition of the concept of relevance in DM research
• The revision of the book chapter
• Further work on the new framework for DM
research
• Organization of Workshop/Working conf. or ST on
– more social directions in DM research likely with one of the
focuses on IS as a sister discipline.
– SIAM DM 2006 interests include
• Human Factors and Social Issues:
Ethics of Data Mining; Intellectual Ownership;
Privacy Models; Privacy Preservation Techniques;
Risk Analysis; User Interfaces;
Data and Result Visualization
Thank You!
Feedback is very welcome:
• Questions
• Suggestions
• Collaboration
Book chapter draft is available on request from
Mykola Pechenizkiy
Department of Computer Science and Information Systems,
University of Jyväskylä, FINLAND
E-mail: [email protected]
Tel.: +358 14 2602472 Fax: +358 14 260 3011
http://www.cs.jyu.fi/~mpechen