Meta-Knowledge Management in Multistrategy Process


TAKMA’05 Copenhagen, Denmark August 22-26, 2005
Knowledge Management Challenges
in Knowledge Discovery Systems
Mykola Pechenizkiy, Seppo Puuronen
Department of Computer Science
University of Jyväskylä
Finland
Alexey Tsymbal
Department of Computer Science
Trinity College Dublin
Ireland
Outline
• Introduction
– KDD
– Selection of DM strategy for a problem at hand
– Meta-learning
• Our goal
– To propose a knowledge-driven approach to enhance
the selection of DM strategies in KDSs.
• Need for KM
• What are the challenges
– KM processes wrt problem of DM strategy selection
• Further research
• Discussion
Knowledge Management Challenges in Knowledge Discovery Systems by Pechenizkiy, Tsymbal, Puuronen
Knowledge discovery as a process
Fayyad, U., Piatetsky-Shapiro, G., Smyth, P., Uthurusamy, R.,
Advances in Knowledge Discovery and Data Mining, AAAI/MIT Press, 1997.
CRISP-DM
http://www.crisp-dm.org/
KDD Process: “Vertical Solutions”
[Figure: the KDD process stages Business Understanding, Data Understanding, Data Preparation, Data Exploration, Data Mining, Evaluation & Interpretation, and Deployment, with experience accumulation alongside them.]
Reinartz, T. 1999, Focusing Solutions for Data Mining.
LNAI 1623, Berlin Heidelberg.
The Search for Scientific Methods and Meta-Learning
• Adequate scientific methods make induction easier with
a smaller number of examples.
• The choice of methods needs to be based on a higher
level induction or on meta-learning in the context of
machine learning.
• “knowledge concerning the most appropriate method for
a given goal can be obtained by induction on the
database of history of science a collection of problems of
different methods, different goals and different degrees of
success” [Laudan]
• Meta-learning can produce rules concerning the use of
the alternative strategies, methodological knowledge, or
correct predictions concerning the best rank of strategies
for a new task.
Dynamic Selection of DM Methods
• … in KDSs has been under active study
• 2 contexts of dynamic selection:
– multi-classifier systems that apply different
ensemble techniques (Dietterich, 1997).
• Their general idea is usually to select one classifier
on a dynamic basis, taking into account its local
performance (e.g. generalisation accuracy) in the
instance space.
– multistrategy learning (Michalski)
• applies a strategy selection approach which takes
into account the characteristics of the classification
problem (meta-data).
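The first context, dynamic selection by local performance, can be illustrated with a minimal sketch. It assumes a labelled validation set and Euclidean distance in the instance space; the function `select_classifier`, the two toy classifiers and the data are hypothetical and are not taken from the cited work.

```python
import math

def select_classifier(x, classifiers, validation, k=3):
    """Return the name of the classifier with the best accuracy on the
    k validation instances nearest to x (local performance)."""
    # Neighbourhood of x in the instance space (Euclidean distance).
    neighbours = sorted(validation, key=lambda item: math.dist(x, item[0]))[:k]
    best_name, best_acc = None, -1.0
    for name, predict in classifiers.items():
        hits = sum(1 for features, label in neighbours if predict(features) == label)
        acc = hits / len(neighbours)
        if acc > best_acc:  # on ties the earlier classifier is kept
            best_name, best_acc = name, acc
    return best_name

# Two toy (hypothetical) classifiers, each reliable in one region only.
classifiers = {
    "threshold_low": lambda f: int(f[0] > 0.2),
    "threshold_high": lambda f: int(f[0] > 0.8),
}

# Validation set of (features, true label) pairs.
validation = [
    ((0.0,), 0), ((0.1,), 0), ((0.3,), 1),  # low region, boundary near 0.2
    ((0.7,), 0), ((0.9,), 1), ((1.0,), 1),  # high region, boundary near 0.8
]

chosen_low = select_classifier((0.25,), classifiers, validation)
chosen_high = select_classifier((0.75,), classifiers, validation)
```

Here a different classifier is chosen for each query point, because each one is more accurate in its own neighbourhood. Only the data itself is used; no meta-data about the problem is involved.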
Selection of the most appropriate DM technique
• Motivation
– No Free Lunch theorem;
– many empirical studies show
• one learning strategy can perform significantly better than another
strategy on a group of problems that are characterised by some
properties (Kiang, 2003).
• Problem
– Selection is usually not straightforward.
– Some knowledge is required to decide on the selection of appropriate
techniques and on the construction of a DM strategy for a problem at hand.
• We distinguish 2 levels of knowledge:
– the knowledge extracted from data that represents the problem to be
mined by means of applying a DM technique
– the higher-level knowledge (from the KDS perspective) required for
managing techniques’ selection, combination and application => meta-knowledge.
Meta-learning
• or “learning to learn” – the effort to
automatically induce dependencies
– between learning tasks and learning strategies.
• based on the assumptions that it is
possible
– to evaluate and compare learning strategies,
– to measure the benefits of early learning on
subsequent learning,
– to use such evaluations to reason about
learning strategies
• select useful ones and disregard the useless or
misleading strategies (Schmidhuber et al., 1996).
in Meta-learning …
• in the context of classifier ensembles, where only
the data itself is used to make decisions about
method selection,
– rather good practical results are shown in experiments
supported by theoretical studies as well;
• in dynamic integration of DM strategies for a data
set at hand:
– a multistrategy approach based on the ideas of
constructive induction and conceptual clustering
(Michalski, 1997)
– several studies on automatic classifier selection via
meta-learning (Kalousis, 2002)
• No practical success!
Meta-Learning
[Figure: meta-learning scheme with the components: collection of data sets, collection of techniques, performance criteria, meta-learning space, knowledge repository, meta-model, a new data set, suggested technique, evaluation.]
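The components named on this slide can be tied together in a small sketch. It assumes that the knowledge repository stores, for each previously seen data set, a vector of meta-features and the technique that performed best, and that the meta-model is a 1-nearest-neighbour lookup. The chosen meta-features, the repository entries and the technique names are all hypothetical.

```python
import math
from collections import Counter

def characterise(rows, labels):
    """Describe a data set by three simple meta-features (an assumed choice):
    log10 of the number of instances, number of features, class entropy."""
    n = len(labels)
    counts = Counter(labels)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return (math.log10(n), float(len(rows[0])), entropy)

def suggest_technique(meta_features, repository):
    """1-nearest-neighbour meta-model: return the technique that was best
    on the most similar data set recorded in the repository."""
    nearest = min(repository, key=lambda rec: math.dist(meta_features, rec["meta"]))
    return nearest["best"]

# Hypothetical knowledge repository built from earlier experiments.
repository = [
    {"meta": (2.0, 4.0, 1.0), "best": "naive_bayes"},
    {"meta": (5.0, 50.0, 0.5), "best": "decision_tree"},
]

# A new data set: 100 instances, 4 features, two balanced classes.
rows = [(0.1, 0.2, 0.3, 0.4)] * 100
labels = [0] * 50 + [1] * 50

meta = characterise(rows, labels)
suggestion = suggest_technique(meta, repository)
```

In a fuller system the suggested technique would then be evaluated on the new data set and the outcome written back to the repository.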
Problems with Meta-Learning for DM Strategy Selection (DM SS)
• Representativeness of meta-data samples
– Meta-learning space is large
– Computationally expensive to produce meta-data
samples
– Curse of dimensionality
– Many possible irrelevant features wrt
collected/produced meta-data
• Complexity of statistical measures
– Why do we need to spend time to characterize the
dataset if we can use this time to try different DM
approaches and select the best one?
Our goal and focus: KM perspective
• to propose a knowledge-driven approach to enhance
the dynamic integration of DM strategies in
knowledge discovery systems;
• focus on KM aimed to organise a systematic process
of knowledge capture and refinement over time.
• We consider the basic knowledge management
processes of
– knowledge creation and identification,
– representation, collection and organization,
– sharing and integration,
– adaptation and application
with respect to the introduced concept of meta-knowledge.
Introducing KM to DM SS
• Generally, the problem of knowledge capture, storage, and
dissemination is similar to data and information management in ISs,
and therefore some executives prefer to view KM as a natural
extension to IS functions (Alavi and Leidner, 1999).
• Zack (1999) – the most practical way to define KM is to show on the
existing IT infrastructure the involvement of:
– (1) knowledge repositories,
– (2) best-practices and lessons-learned systems,
– (3) expert networks [these are DM experts], and
– (4) communities of practice [these are end-users].
[Figure: KM processes: Knowledge Creation & Acquisition; Knowledge Organization & Storage; Knowledge Distribution & Integration; Knowledge Adaptation & Application; all accompanied by Knowledge Evaluation, Validation and Refinement.]
Transformations of data and knowledge concepts
[Figure (adapted from Spiegler, 2000): transformations from Reality to Data, Information, Knowledge and Wisdom. Labels: entities; attributes; capture, transmission, representation, recording, storage, archiving, deletion; data processing; information processing; knowledge processing; knowing that and what; knowing how and why; knowing when, where and what for.]
Knowledge is “justified belief that increases an entity’s capacity for effective
action” (Nonaka, 1994).
A long history of epistemological debates, and discussion of knowledge from
different perspectives in Polanyi (1962).
Different types of knowing
Knowing         Analysis         Context
that and what   Conceptual       concepts, relationships, i.e. declarative knowledge
how             Functional       hypothesis, i.e. procedural knowledge
where           Spatial          data set characterization
when            Temporal         temporal context
why             Causal           higher-level abstraction
who             Organizational   integration, sharing
how much        Economical       benefits, risks, resources
what for        Strategic        business DM goals, domain knowledge
Knowledge distribution and knowledge integration
4 potential sources of knowledge that have to be
integrated in the repository of a KDS:
– (1) knowledge from an expert in data-mining, knowledge
discovery, statistics and related fields;
– (2) knowledge from a data-mining practitioner;
– (3) knowledge from laboratory experiments on synthetic
data sets; and, finally,
– (4) knowledge from field experiments on real-world
problems.
– Besides this, research and business communities, and
similar KDSs themselves, can organize different trusted
networks, where participants are motivated to share their
knowledge.
Knowledge Repository Lifecycle (1 of 2)
• Once the repository is created it tends to grow, and
at some point it naturally begins to collapse under its
own weight, requiring major reorganization.
– it needs to be updated continuously,
• some content needs to be deleted (if misleading), deactivated
or archived (if it is potentially useful).
• if similar contributions are combined, generalized and
restructured, the content may become less fragmented and
redundant.
• The process of filtering knowledge claims into
accepted or suppressed is important
– when plenty of claims are produced automatically, they
need to be filtered automatically.
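One way to picture automatic filtering of knowledge claims is the sketch below. It assumes that each claim carries counts of how often following it helped or hurt, and that fixed thresholds decide whether a claim stays active, is archived as potentially useful, or is deleted as misleading. The `Claim` fields, the function `maintain` and the threshold values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Claim:
    rule: str
    successes: int  # times following the rule helped
    failures: int   # times following the rule hurt

def maintain(claims, min_trials=5, accept_at=0.6, delete_below=0.3):
    """Split claims into (active, archived); misleading claims are dropped."""
    active, archived = [], []
    for claim in claims:
        trials = claim.successes + claim.failures
        if trials < min_trials:
            # Too little evidence yet: keep as potentially useful.
            archived.append(claim)
            continue
        rate = claim.successes / trials
        if rate >= accept_at:
            active.append(claim)    # accepted claim
        elif rate >= delete_below:
            archived.append(claim)  # suppressed, but kept for later review
        # Otherwise the claim is treated as misleading and deleted.
    return active, archived

claims = [
    Claim("A", successes=8, failures=2),
    Claim("B", successes=2, failures=1),
    Claim("C", successes=1, failures=9),
    Claim("D", successes=4, failures=6),
]
active, archived = maintain(claims)
```

With these assumed thresholds, claim A stays active, B and D are archived, and C is deleted. Combining and generalizing similar contributions would be a separate step that this sketch does not cover.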
Knowledge Repository Lifecycle (2 of 2)
• “knowing when” and “knowing where” contexts:
– when the environment changes, general rules that do not
specify their context could become invalid.
– some knowledge should exist that would guide an organization to
change the repository when the environment calls for it.
• Some knowledge claims are naturally in constant competition
with the other claims.
– Disagreements within the knowledge repository need to be
resolved by means of generalization of some parts and
contextualization of the others.
• In order to increase the quality and validity of knowledge, it
needs to be continually tested, improved or removed.
• Some basic principles of triggers can be introduced.
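Resolving disagreements by contextualization can be sketched as follows. The sketch assumes that every claim states the context in which it holds as key-value conditions, that a general rule has an empty context, and that the most specific applicable claim wins. The claims, context keys and advice strings are hypothetical.

```python
def applicable(claim_context, situation):
    """A claim applies when every condition in its context holds."""
    return all(situation.get(key) == value for key, value in claim_context.items())

def resolve(claims, situation):
    """Among competing claims, prefer the one with the most specific
    matching context; return None when no claim applies."""
    matching = [c for c in claims if applicable(c["context"], situation)]
    if not matching:
        return None
    return max(matching, key=lambda c: len(c["context"]))["advice"]

claims = [
    # General rule: no context specified, so it applies everywhere.
    {"context": {}, "advice": "use feature extraction"},
    # Contextualized rule that competes with the general one.
    {"context": {"noise": "high"}, "advice": "skip feature extraction"},
]

noisy = resolve(claims, {"noise": "high", "size": "small"})
clean = resolve(claims, {"noise": "low"})
```

The general rule still answers when nothing more specific matches, while the contextualized claim overrides it in the situation it was stated for.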
Knowledge validity and knowledge quality
• The contexts “knowing when” and “knowing where” can be discovered
before they appear in a real situation.
– Active learning
– Zooming in and zooming out procedures
– Search for a balance between generality, compactness, interpretability
and understandability on the one hand, and sensitivity to the context,
exactness, precision and adequacy of (meta-)knowledge on the other.
– Context conditions can be important for knowledge quality estimation.
• The quality of knowledge can be estimated by its ability to help a KDS
produce solutions faster and more effectively.
• Knowledge claims have both a degree of utility and a degree of
satisfaction.
• To determine the relative quality of a validated knowledge claim,
evaluation criteria should be defined:
– complexity, usefulness, and predictive power are well formalised and easy
to estimate;
– understandability, reliability of source, and explanatory power are rather
subjective, and their estimates are therefore inaccurate.
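As an illustration of how the well-formalised criteria could be combined into one relative quality score, consider the sketch below. It assumes that complexity, usefulness and predictive power have already been scaled to [0, 1] and that a weighted sum is acceptable; the function `claim_quality` and its weights are hypothetical, and the subjective criteria are left out on purpose.

```python
def claim_quality(complexity, usefulness, predictive_power,
                  weights=(0.2, 0.4, 0.4)):
    """Weighted score in [0, 1]; lower complexity counts as better."""
    for value in (complexity, usefulness, predictive_power):
        if not 0.0 <= value <= 1.0:
            raise ValueError("criteria must be scaled to [0, 1]")
    w_complexity, w_usefulness, w_power = weights
    return (w_complexity * (1.0 - complexity)
            + w_usefulness * usefulness
            + w_power * predictive_power)

# Two hypothetical validated claims with equal usefulness and predictive power.
simple_claim = claim_quality(complexity=0.1, usefulness=0.7, predictive_power=0.8)
complex_claim = claim_quality(complexity=0.9, usefulness=0.7, predictive_power=0.8)
```

The simpler claim scores higher here only because the assumed weights reward low complexity; other weightings would rank claims differently.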
Limitations
• The goal of KM here is to make more effective
and efficient use of available DM techniques.
• The most important issues in knowledge
management:
– (1) executive/strategic management,
– (2) operational management,
• the identification of available knowledge,
• seeking ways to capture it in a KM process,
• and analysing the ability to design a KM
(sub)system including its tools and applications
– (3) costs, benefits, and risks management, and
– (4) standards in the KM technology and communication.
Further Research
[Figure: KM processes: Knowledge Creation & Acquisition; Knowledge Organization & Storage; Knowledge Distribution & Integration; Knowledge Adaptation & Application; all accompanied by Knowledge Evaluation, Validation and Refinement.]
• Implementation of the presented knowledge-driven
framework for a KDS that contains a limited
number of DM techniques of a certain type
– Feature extraction techniques and classification
techniques
• Evaluation of the framework in practice for
real-world problems in a distributed environment
Thank You!
Feedback is very welcome:
• Questions
• Suggestions
• Guidelines
• Collaboration
Contact Info:
Mykola Pechenizkiy
Department of Computer Science and Information Systems,
University of Jyväskylä, FINLAND
E-mail: [email protected]
Tel.: +358 14 2602472 Fax: +358 14 260 3011
http://www.cs.jyu.fi/~mpechen