Slide 1 - National Yunlin University of Science and Technology
National Yunlin University of Science and Technology
Toward Exploratory Test-Instance-Centered Diagnosis in High-Dimensional Classification
Charu C. Aggarwal
TKDE, Vol. 19, No. 8, 2007, pp. 1001-1015.
Presenter : Wei-Shen Tai
Advisor : Professor Chung-Chian Hsu
2007/10/3
Intelligent Database Systems Lab
Outline
Introduction
Quantification of discriminatory subspaces
Exploratory construction of decision paths
N.Y.U.S.T.
I. M.
Determination of subspace alternatives for path construction
Construction of visual density profiles
Isolation of instance-centered local data segments
Experimental results
Conclusion and summary
Comments
Motivation
[Figure: example decision tree with nodes Root, Age, Education, Gender, Family, Salary]
Decision tree and rule-based systems
Strict hierarchical partitioning splits a single cluster across many different nodes.
Rule-based systems produce a large number of overlapping rules that are not particularly optimized to the test instance.
Basic limitation of classification methods
The succinct summary may fail to capture such instance-specific characteristics.
This incompleteness in data characterization may make the particular structure of a classifier more or less suited to particular kinds of test instances.
Objective
Diagnostic classification
Comprehensive exploratory ability for individual test
instances.
Decision path construction
For an exploratory classification of high dimensional data.
Finding the diagnostic classification behavior of a particular
test instance.
Providing the user with a visual representation of the data in a small
number of well-chosen subspaces.
Quantification of discriminatory subspaces
Kernel density estimation
One way of intuitively characterizing the discrimination in a
subspace is to quantify the difference in class distribution at
each point in the space.
Accuracy density A(x, Ci, D) for the class Ci
Interest density for the class Ci
The class Ci is overrepresented at x when the interest density is
larger than 1.
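The two densities can be illustrated with a small sketch. The slide does not preserve the formulas, so the definitions below are assumptions: the accuracy density is taken as the share of kernel mass at x that comes from class Ci, and the interest density as the class-conditional density divided by the overall density, which agrees with the statement that Ci is overrepresented when the interest density exceeds 1. The Gaussian product kernel, the bandwidth h, and all function names are illustrative choices, not taken from the paper.

```python
import math

def gaussian_kernel(u, h):
    """One-dimensional Gaussian kernel with bandwidth h."""
    return math.exp(-0.5 * (u / h) ** 2) / (h * math.sqrt(2.0 * math.pi))

def density(x, points, h=1.0):
    """Kernel density estimate at x: mean of product kernels over the points."""
    if not points:
        return 0.0
    total = 0.0
    for p in points:
        k = 1.0
        for xi, pi in zip(x, p):
            k *= gaussian_kernel(xi - pi, h)
        total += k
    return total / len(points)

def accuracy_density(x, data, labels, cls, h=1.0):
    """Assumed definition: share of the kernel mass at x contributed by cls."""
    class_points = [p for p, c in zip(data, labels) if c == cls]
    num = len(class_points) * density(x, class_points, h)
    den = len(data) * density(x, data, h)
    return num / den if den > 0 else 0.0

def interest_density(x, data, labels, cls, h=1.0):
    """Assumed definition: class-conditional density over overall density."""
    class_points = [p for p, c in zip(data, labels) if c == cls]
    den = density(x, data, h)
    return density(x, class_points, h) / den if den > 0 else 0.0

# Toy data: class "a" clusters near the origin, class "b" near (3, 3).
data = [(0.0, 0.0), (0.2, 0.1), (0.1, -0.1), (3.0, 3.0), (3.1, 2.9), (2.9, 3.2)]
labels = ["a", "a", "a", "b", "b", "b"]
x = (0.1, 0.0)

print(accuracy_density(x, data, labels, "a", h=0.5))
print(interest_density(x, data, labels, "a", h=0.5))
```

Under these assumed definitions the accuracy densities of all classes sum to 1 at any point, and the accuracy density equals the class prior times the interest density.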
Exploratory construction of decision paths
Subspace determination process
Finds the appropriate local discriminative subspaces for
a given test example.
In each of these subspaces, the user is provided
with a visual profile of the accuracy density.
If the chosen decision path is not strongly indicative of any class,
the user can backtrack to a higher-level node and explore a different
path of the tree.
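The backtracking behavior can be pictured as a stack of subspace choices. The sketch below is only an illustration of that bookkeeping: the class name DecisionPath, its methods, and the attribute pairs used in the example are hypothetical and do not come from the paper.

```python
class DecisionPath:
    """Stack of subspace choices made while exploring one test instance."""

    def __init__(self):
        self.nodes = []      # current path, from the root downward
        self.rejected = []   # choices abandoned by backtracking

    def descend(self, subspace):
        """Go one level deeper by choosing a 2D subspace (a pair of attributes)."""
        self.nodes.append(subspace)

    def backtrack(self, levels=1):
        """Undo the last `levels` choices and return the remaining path."""
        for _ in range(min(levels, len(self.nodes))):
            self.rejected.append(self.nodes.pop())
        return list(self.nodes)

# Example: the second choice turns out to be uninformative, so the user
# steps back one level and tries a different subspace.
path = DecisionPath()
path.descend(("age", "salary"))
path.descend(("education", "gender"))
path.backtrack()
path.descend(("family", "salary"))
print(path.nodes)
```

Keeping the rejected choices lets an interface avoid offering the same uninformative subspace again at that level.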
Determination of subspace alternatives for path construction
Speeding up subspace determination
Representative points
For the first level of the decision process, we randomly
sampled maxrep points from the database and
computed their (dominant) interest density and class in
each of the possible 2D combinations.
Maximum size sample
For subsequent iterations (lower levels of the decision
process), only a random sample of at most maxsiz points
was used to determine the classification subspace.
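The two speed-ups can be sketched as follows. This is a minimal illustration under stated assumptions: the interest density is taken as class-conditional kernel density divided by overall density, the kernel is an unnormalized Gaussian (its constant cancels in the ratio), and the function names, the bandwidth h, and the seed parameter are hypothetical. Only maxrep and maxsiz come from the slide.

```python
import itertools
import math
import random

def kde(q, points, h):
    """Unnormalized Gaussian kernel density at q."""
    if not points:
        return 0.0
    total = 0.0
    for p in points:
        d2 = sum((a - b) ** 2 for a, b in zip(q, p))
        total += math.exp(-0.5 * d2 / (h * h))
    return total / len(points)

def dominant_interest(x, data, labels, dims, h=0.5):
    """Return (interest density, class) of the dominant class at x in dims."""
    proj = [tuple(p[d] for d in dims) for p in data]
    q = tuple(x[d] for d in dims)
    overall = kde(q, proj, h)
    best_value, best_class = 0.0, None
    for cls in sorted(set(labels)):
        pts = [r for r, c in zip(proj, labels) if c == cls]
        value = kde(q, pts, h) / overall if overall > 0 else 0.0
        if value > best_value:
            best_value, best_class = value, cls
    return best_value, best_class

def precompute_representatives(data, labels, maxrep, seed=0):
    """First level: score maxrep sampled points in every 2D combination."""
    rng = random.Random(seed)
    reps = rng.sample(range(len(data)), min(maxrep, len(data)))
    table = {}
    for dims in itertools.combinations(range(len(data[0])), 2):
        for i in reps:
            table[(i, dims)] = dominant_interest(data[i], data, labels, dims)
    return reps, table

def sample_for_level(indices, maxsiz, seed=0):
    """Lower levels: cap the working set at maxsiz randomly chosen points."""
    indices = list(indices)
    if len(indices) <= maxsiz:
        return indices
    return random.Random(seed).sample(indices, maxsiz)

# Toy data with three attributes; the third attribute carries no class signal.
data = [(0.0, 0.1, 5.0), (0.2, 0.0, 1.0), (0.1, 0.2, 3.0), (0.3, 0.1, 2.0),
        (3.0, 3.1, 4.0), (3.2, 2.9, 1.5), (2.9, 3.0, 2.5), (3.1, 3.2, 3.5)]
labels = ["a", "a", "a", "a", "b", "b", "b", "b"]
reps, table = precompute_representatives(data, labels, maxrep=4)
print(len(table))
```

With d attributes there are d(d-1)/2 two-dimensional combinations, so the table holds maxrep times that many entries regardless of the database size.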
Construction of visual density profiles
Visual profile of the accuracy density
Once the most discriminatory subspaces have been
determined at a given node, the accuracy density profile
is constructed in each of these projections.
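One way to prepare such a profile for plotting is to tabulate the accuracy density on a regular grid over the chosen 2D projection. The sketch below assumes the accuracy density is the kernel-weighted share of class Ci at a point; the grid resolution steps, the bandwidth h, and the function names are illustrative assumptions rather than details from the paper.

```python
import math

def accuracy_density_2d(q, points, labels, cls, h):
    """Kernel-weighted share of class cls at q (kernel constant cancels)."""
    num = den = 0.0
    for (px, py), c in zip(points, labels):
        w = math.exp(-0.5 * ((q[0] - px) ** 2 + (q[1] - py) ** 2) / (h * h))
        den += w
        if c == cls:
            num += w
    return num / den if den > 0 else 0.0

def density_profile(points, labels, cls, steps=5, h=0.5):
    """Grid of accuracy densities over the bounding box (steps must be >= 2)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_lo, x_hi = min(xs), max(xs)
    y_lo, y_hi = min(ys), max(ys)
    grid = []
    for j in range(steps):
        y = y_lo + (y_hi - y_lo) * j / (steps - 1)
        row = []
        for i in range(steps):
            x = x_lo + (x_hi - x_lo) * i / (steps - 1)
            row.append(accuracy_density_2d((x, y), points, labels, cls, h))
        grid.append(row)
    return grid

# Toy projection: class "a" near the origin, class "b" near (3, 3).
points = [(0.0, 0.0), (0.2, 0.1), (0.1, -0.1), (3.0, 3.0), (3.1, 2.9), (2.9, 3.2)]
labels = ["a", "a", "a", "b", "b", "b"]
profile = density_profile(points, labels, "a")
for row in profile:
    print(" ".join(f"{v:.2f}" for v in row))
```

The resulting grid can be passed to any surface or contour plotting routine; the printed rows give a rough text view of where class "a" dominates.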
Isolation of instance-centered local data segments
Concept
An easy way of isolating smaller segments of the data
is for the user to specify an accuracy density threshold.
A well-defined local region around the test instance
can be clearly distinguished.
Such a judgment can be effectively made only by
human perception and intuition.
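The thresholding step can be sketched as a simple filter. This is a simplification under stated assumptions: it keeps every point whose accuracy density for the chosen class reaches the user's threshold, and it ignores spatial connectivity, which a fuller version would need in order to keep the segment local to the test instance. The accuracy density is assumed to be the kernel-weighted class share, and the function names and bandwidth h are hypothetical.

```python
import math

def accuracy_density_2d(q, points, labels, cls, h):
    """Kernel-weighted share of class cls at q (kernel constant cancels)."""
    num = den = 0.0
    for (px, py), c in zip(points, labels):
        w = math.exp(-0.5 * ((q[0] - px) ** 2 + (q[1] - py) ** 2) / (h * h))
        den += w
        if c == cls:
            num += w
    return num / den if den > 0 else 0.0

def isolate_segment(test_point, points, labels, cls, threshold, h=0.5):
    """Indices of points at or above the threshold for cls.

    Returns an empty list when the test point itself falls below the
    threshold, since no region around it would then be isolated.
    """
    if accuracy_density_2d(test_point, points, labels, cls, h) < threshold:
        return []
    return [i for i, p in enumerate(points)
            if accuracy_density_2d(p, points, labels, cls, h) >= threshold]

# Toy projection: class "a" near the origin, class "b" near (3, 3).
points = [(0.0, 0.0), (0.2, 0.1), (0.1, -0.1), (3.0, 3.0), (3.1, 2.9), (2.9, 3.2)]
labels = ["a", "a", "a", "b", "b", "b"]
segment = isolate_segment((0.1, 0.0), points, labels, "a", threshold=0.8)
print(segment)
```

The retained indices would form the data set for the next level of the decision path; the choice of threshold stays with the user, as the slide notes.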
Termination
User determination
The method provides the user with an open-ended exploratory
ability; the final decision to terminate rests with the user.
Cumulative dominance level
A statistical measure of the significance of a given path
is obtained by computing the cumulative dominance level
of each class Ci along PATH.
Experimental results
Commercial document from UCI
The class label is binary; the data set contains 150,000 records
with 46 attributes corresponding to topical content.
Conclusion and summary
SD-Path method
An effective exploratory, instance-based approach to
decision path construction for high-dimensional data sets.
Combines the data mining process with human interaction
to provide a good understanding of the
classification characteristics of a given test instance.
The ability to explore multiple paths of an instance-specific process
provides the user with multiple perspectives on the important
characteristics of the instance.
Comments
Advantage
The proposed method provides a novel way to find decision paths
for instances that general classification methods leave indeterminate.
The diagnosis helps the user understand the combinations of
dimensions that reveal this contradictory behavior.
Drawback
Could the reliance on human intuition be removed from the
proposed method if the performance of the SD (Subspace Decision) path
could be measured with explicit equations?
Application
Exploring indeterminate instances that general classification
methods cannot classify.