Panel Discussion
on Foundations of Data Mining
at RSCTC2004
J. T. Yao
University of Regina
Email: [email protected]
Web: http://www2.cs.uregina.ca/~jtyao
What is the Foundations of Data
Mining?



DM research mainly focuses on algorithms
and methodologies.
There is a lack of study on mathematical
modeling of, or foundations of, data mining.
The study of foundations of data mining is
in its infancy, and there are probably more
questions than answers. (Mannila 2000)
What is the Foundations of Data
Mining?

Chen's approach (2002): data mining can
be studied from three different but related
dimensions.



The philosophical dimension deals with the
nature and scope of data mining.
The technical dimension covers data mining
methods and techniques.
The social dimension concerns the social
impact and consequences of data mining.
What is the Foundations of Data
Mining?

Xie and Raghavan's approach (2002):
logical foundation of data mining based on
Bacchus' probability logic.


Precise definition of intuitive notions, such as
"pattern", "previously unknown knowledge"
and "potentially useful knowledge".
A logic induction operator is defined for
discovering "previously unknown and
potentially useful knowledge".
What is the Foundations of Data
Mining?

Lin's (2002), Tsumoto's (2002), and Yao's (2001)
approaches: granular computing as a basis for
data mining.
A concept consists of two parts, the intension and
the extension of the concept.
The intension of a concept consists of the properties
of its objects.
The extension of a concept is the set of its instances.
A rule can be expressed in the form φ ⇒ ψ,
where φ and ψ are the intensions of two concepts.
Rules are interpreted using the extensions of the two
concepts.
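
As a minimal sketch of the intension/extension view above, the Python below represents a concept as a pair and interprets a rule φ ⇒ ψ by comparing two extensions. The toy objects, the `Concept` class, and the helpers `make_concept` and `interpret_rule` are illustrative assumptions and are not taken from the cited papers.

```python
from dataclasses import dataclass

# Hypothetical toy universe: object id -> attribute values (not from the talk).
OBJECTS = {
    1: {"colour": "red", "size": "small", "shape": "round"},
    2: {"colour": "red", "size": "large", "shape": "round"},
    3: {"colour": "blue", "size": "small", "shape": "square"},
    4: {"colour": "red", "size": "small", "shape": "square"},
}


@dataclass(frozen=True)
class Concept:
    """A concept as a pair: intension (properties) and extension (instances)."""
    intension: tuple      # attribute/value pairs every instance must have
    extension: frozenset  # ids of the objects that have all those properties


def make_concept(**properties):
    """Build a concept by collecting every object with the given properties."""
    instances = frozenset(
        oid for oid, row in OBJECTS.items()
        if all(row.get(a) == v for a, v in properties.items())
    )
    return Concept(tuple(sorted(properties.items())), instances)


def interpret_rule(phi, psi):
    """Interpret phi => psi through extensions only.

    Returns the instances of phi that are also instances of psi,
    and the instances of phi that are not (the exceptions).
    """
    supporting = phi.extension & psi.extension
    exceptions = phi.extension - psi.extension
    return supporting, exceptions


red = make_concept(colour="red")
round_ = make_concept(shape="round")
supporting, exceptions = interpret_rule(red, round_)
print(sorted(supporting), sorted(exceptions))
```

In this toy data the rule "red ⇒ round" is supported by some red objects and contradicted by another, which is the kind of information an extension-based reading of a rule makes available.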

A Multi-level Framework for
Modeling Data Mining



The kernel focuses on the study of knowledge
without reference to data mining algorithms.
The technique levels focus on data mining
algorithms without reference to a particular
application.
The application levels focus on the utility of
discovered knowledge with respect to particular
domains of applications.
How do Rough Sets Contribute to
FDM?



Knowledge is an entity in the semantic levels of
data. Knowledge embedded in data is related to
semantic interpretations of data.
The existence of knowledge in data is unrelated
to whether we have an algorithm to extract it.
We need to separate the study of knowledge
and the study of data mining algorithms, and in
turn to separate them from the study of utility of
discovered knowledge.
How do Rough Sets Contribute to
FDM?

Concepts are used as a primitive notion of data
mining:





Every concept is understood as a unit of thought that
consists of two parts, the intension and the extension
of the concept.
Tarski's approach is used to study concepts through
the notions of a model and satisfiability.
An information table is used as a model.
The intension of a concept is expressed by a formula
of a decision language in the information table.
The extension of a concept is expressed by a subset
of objects.
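
The model-and-satisfiability reading above can be sketched as follows. This is only an illustration under stated assumptions: the table contents, the tuple encoding of formulas, and the function names `satisfies` and `meaning` are hypothetical, and the connectives chosen (atom, not, and, or) are one plausible shape for a decision language rather than the definition used in the cited work.

```python
# Hypothetical information table (the model): each row describes one object.
TABLE = {
    "o1": {"outlook": "sunny", "windy": "no", "play": "yes"},
    "o2": {"outlook": "sunny", "windy": "yes", "play": "no"},
    "o3": {"outlook": "rainy", "windy": "yes", "play": "no"},
    "o4": {"outlook": "rainy", "windy": "no", "play": "yes"},
}

# Formulas are nested tuples (an assumed encoding):
#   ("atom", attribute, value), ("not", f), ("and", f, g), ("or", f, g)


def satisfies(obj, formula):
    """Return True if the object (one row of the table) satisfies the formula."""
    op = formula[0]
    if op == "atom":
        _, attribute, value = formula
        return obj.get(attribute) == value
    if op == "not":
        return not satisfies(obj, formula[1])
    if op == "and":
        return satisfies(obj, formula[1]) and satisfies(obj, formula[2])
    if op == "or":
        return satisfies(obj, formula[1]) or satisfies(obj, formula[2])
    raise ValueError(f"unknown connective: {op!r}")


def meaning(formula, table=TABLE):
    """Extension of the concept whose intension is `formula`."""
    return {name for name, row in table.items() if satisfies(row, formula)}


calm = ("atom", "windy", "no")
sunny_and_calm = ("and", ("atom", "outlook", "sunny"), calm)
print(sorted(meaning(calm)))
print(sorted(meaning(sunny_and_calm)))
print(sorted(meaning(("not", calm))))
```

The formula plays the role of the intension, and the set returned by `meaning` plays the role of the extension, so the same formula can have different extensions in different tables.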
How do Rough Sets Contribute to
FDM?

Rules are used to express relationships.


Rules can be interpreted and classified in terms of
extensions of concepts and are based on probability
theory.
Many classes of rules can be defined:



association rules, exception rules, peculiarity rules, similarity rules,
negative association rules, and conditional association rules.
Both concepts and rules are used as examples
to illustrate the focus of discussion at the kernel
level.
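
To make the probabilistic reading of rules concrete, the sketch below computes a few standard frequency-based estimates from two extensions alone. The measure names, the `rule_measures` and `classify` helpers, the toy sets, and the labelling of a rule by the sign of P(ψ | φ) − P(ψ) are assumptions made for illustration; the talk does not specify how each class of rule is defined.

```python
from fractions import Fraction


def rule_measures(universe, ext_phi, ext_psi):
    """Probability-style measures of phi => psi computed from extensions only."""
    if not universe or not ext_phi:
        raise ValueError("universe and extension of phi must be non-empty")
    both = ext_phi & ext_psi
    confidence = Fraction(len(both), len(ext_phi))  # estimate of P(psi | phi)
    prior = Fraction(len(ext_psi), len(universe))   # estimate of P(psi)
    return {
        "support": Fraction(len(both), len(universe)),  # estimate of P(phi and psi)
        "confidence": confidence,
        "change": confidence - prior,  # how much phi shifts the chance of psi
    }


def classify(measures):
    """Illustrative labelling only, based on the sign of the change."""
    if measures["change"] > 0:
        return "positive association"
    if measures["change"] < 0:
        return "negative association"
    return "independent"


# Hypothetical universe of ten objects and two extensions.
U = set(range(1, 11))
phi = {1, 2, 3, 4}
psi = {2, 3, 4, 5, 6}

m = rule_measures(U, phi, psi)
print(m["support"], m["confidence"], m["change"], classify(m))

# The same condition against the complement of psi.
m_neg = rule_measures(U, phi, U - psi)
print(m_neg["confidence"], m_neg["change"], classify(m_neg))
```

Exact fractions are used instead of floats so that the measures can be compared without rounding concerns.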
References






Chen, Z. The three dimensions of data mining
foundation, FDM’02, 119-124, 2002.
Lin, T.Y. Issues in modeling for data mining,
COMPSAC’02, 1152-1157, 2002.
Mannila, H. Theoretical frameworks for data mining,
SIGKDD Explorations, (2), 30-32, 2000.
Tsumoto, S., Lin, T.Y., Peters, J.F. Foundations of data
mining via granular and rough computing,
COMPSAC’02, 1123-1124, 2002.
Yao, Y.Y. Modeling data mining with granular
computing, COMPSAC’01, 638-643, 2001.
Yao, Y.Y. A step towards the foundations of data
mining, SPIE Vol. 5098, 254-263, 2003.
