Efficient Rule-Based Attribute-Oriented Induction for Data Mining
Download
Report
Transcript Efficient Rule-Based Attribute-Oriented Induction for Data Mining
Efficient Rule-Based
Attribute-Oriented Induction
for Data Mining
Authors: Cheung et al.
Graduate: Yu-Wei Su
Advisor: Dr. Hsu
Outline
Motivation
Objective
Introduction
Basic attribute-oriented induction
Rule-based concept generalization
Rule-based attribute-oriented induction
Path relation
An efficient rule-based AOI
Performance study
Conclusion
Opinion
Motivation
AOI induction capability is limited by the
unconditional concept generalization
Basic AOI has the problem of time
complexity
Objective
Extending the concept generalization to
rule-based concept hierarchy
Developing an efficient algorithm to
facilitate induction
Introduction
The growth in size of existing databases
has far exceeded the human abilities to
analyze
AOI has been implemented in DBMiner
The engine for concept generalization in
basic AOI is the concept ascension
To further enhance the capability of AOI,
need to replace concept ascension
Introduction( cont.)
Rule-based Concept Graph can be
generalized to more than one higher level
concept
Induction anomaly would occur, when a
concept tree is applied directly to a
concept graph
Instead multi-dimensional data cube to
generalized relation can improve
performance
Basic attribute-oriented induction
The purpose of AOI is to discover rules
from relations
The primary is concept generalization
Basic attribute-oriented induction( cont.)
By merging the records which have the
same generalized values, characteristics of
the data can be captured
Primitives in AOI
Task-relevant data
Data that want to be analyzed, called initial relation
Background knowledge
To support concept hierarchies
Basic attribute-oriented induction( cont.)
Representation of learning results
Concept generalization
Notation
Desirable level: an attribute contain a small number
of distinct values
Desirable attribute threshold: to control the number
of distinct values of an attribute
Minimum desirable level: when generalized to a level
lower than the current one, the distinct number would
more than desirable attribute threshold
Basic attribute-oriented induction( cont.)
1.
2.
Prime relation: every attribute is at minimum
desirable level R’
Inducing the prime relation R’ from initial
relation
Progressive generalization
Basic attribute-oriented induction( cont.)
Rule-based concept generalization
Rule-based concept generalization is a
more general scheme and a concept can
ascend via more than one path
Generalization rule can be determine
whether a concept can be generalized
along one path
Rule-based concept generalization( cont.)
Deductive rule generalization
The rule associated with a generalization path
Concept hierarchy associated with deduction
generalization rules is called deduction-rulebased concept graph
Rule-based attribute-oriented induction
General model definition (DB, CH, DS, KR,
t a)
DB: underlying database
CH: a set of rule-based concept hierarchies
DS: deduction system supporting CH
KR: representation scheme of learned result
Ta: desirable attribute threshold
The generalization and rule creation
process is fundamentally the same as
basic AOI
Rule-based attribute-oriented induction( cont.)
Concept ascension require additional
information and it may not available in the
prime relation, called Induction anomaly
An depending attribute has been removed
An depending attribute can be generalized too
high to match the condition
An depending condition can only be evaluated
against the initial relation
Rule-based attribute-oriented induction( cont.)
Path relation
Further generalization in the rule-based
case has to be started again from the
initial relation and it is costly and wasteful
Using a path relation to capture the
generalization result from one of the rules
on the initial relation
A tuple in initial relation can only be
generalized via a unique path to the root
Path relation( cont.)
Before induction starts, a preprocessing is
used to identify and label the
generalization paths of all the attributes
Every tuple can be transformed into a
tuple of ids of the associated
generalization paths
Path relation has captured completely the
generalization result of the initial relation
Path relation( cont.)
Path relation( cont.)
Another issue in rule-based
generalization is the cyclic dependency
Generalization is depended on another
attribute, if dependency is cyclic, the
deadlock will happened
Path relation( cont.)
Path relation( cont.)
Data structure for generalization
An efficient rule-based AOI
An efficient rule-based AOI( cont.)
Performance study
The time complexity of relation path
algorithm is O(n) and backtracking
algorithm is O(n log n)
The difference between two algorithms
Backtracking uses generalized relation as data
structure
Further generalization after prime relation has
been generated is based in the prime relation
Performance study( cont.)
Experiment data is a synthesized student
data
Attribute name: name, status, sex, age,
GPA
Conditions
Graduate students are at least 22 years
Ph.D. student are at least 25 years
Graduate student’s GPA are at least 3
Generalization order
Performance study( cont.)
Performance study( cont.)
Conclusion
Concept hierarchies may contain
unconditional and conditional rules, which
enlarges the application domain
Path relation algorithm has an improved
complexity of O(n)
Rule-based induction can extend the
capability of DBMiner
Opinion
The numerical attribute problem still exist
Experiments are to few and can’t see the
improved results