Transcript 6-PBC
Perception-Based Classification (PBC) System
Salvador Ledezma
April 25, 2002
Introduction
Concepts
Demo of PBC
References:
“Towards an Effective Cooperation of the User and Computer for Classification”
“Visual Data Mining with Pixel-oriented Visualization Techniques”
“Visual Classification: An Interactive Approach to Decision Tree Construction”
Mihael Ankerst, author or coauthor
Data Mining
Exploration and analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns and rules
Part of the Knowledge Discovery in Databases (KDD) process
Classification
Major task of Data Mining
Assign an object to one of a set of given classes based on its attributes
Classification Algorithms
Decision Tree Classifier
Training set – set of objects whose attributes and class are already known
Using the training set, the tree classifier determines a classification function represented by a decision tree
Model for the class attribute as a function of the values of the other attributes
Test set – validates the classification function
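To make "classification function represented by a decision tree" concrete, here is a minimal sketch, assuming binary numeric splits of the form "attribute <= threshold". The classes `Leaf` and `Node`, the functions `classify` and `accuracy`, and the attribute names are hypothetical illustrations, not PBC's data model. The tree routes each object from the root to a leaf, and the test set validates the function by the fraction of objects it labels correctly.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    label: str  # class assigned to every object that reaches this leaf


@dataclass
class Node:
    attribute: str   # hypothetical attribute name tested at this node
    threshold: float
    left: "Tree"     # objects with value <= threshold
    right: "Tree"    # objects with value > threshold


Tree = Union[Leaf, Node]


def classify(tree, obj):
    """Follow the splits from the root down to a leaf and return its class."""
    while isinstance(tree, Node):
        tree = tree.left if obj[tree.attribute] <= tree.threshold else tree.right
    return tree.label


def accuracy(tree, test_set):
    """Fraction of (object, true class) pairs in the test set classified correctly."""
    correct = sum(1 for obj, label in test_set if classify(tree, obj) == label)
    return correct / len(test_set)
```

For example, with a hand-built tree `Node("age", 30, Leaf("A"), Node("income", 50_000, Leaf("B"), Leaf("A")))`, an object with `age` 40 and `income` 30 000 ends up in class `"B"`.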
Classification Example
Classification (cont.)
Classification algorithms are usually black boxes with no user interaction or intervention
Reasons for user involvement in decision tree construction:
Use human pattern recognition capabilities
User gains a better understanding of the tree
User provides domain knowledge
Visual Data Mining
Tackle data mining tasks by:
Enabling human involvement
Incorporating the perceptual abilities of humans
Visual Classification
Construction of decision trees is decomposed into
substeps
Enables human involvement
Example: PBC
Data visualization is based on two concepts:
Each attribute of the training data is visualized in a separate part of the screen
Different class labels of training objects are represented by different colors
Pixel-Oriented Visualization Techniques
Represent each attribute value as a single colored pixel
Map the range of possible attribute values to a fixed color map
Maximizes the amount of information represented at one time, without any overlap
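The mapping of an attribute's value range onto a fixed color map can be sketched as follows, assuming a linear scale and a 256-entry grayscale map in which higher values are lighter (matching the light = high convention in the stock example). The function names are hypothetical; the published techniques may use other color maps or scalings. Every value yields exactly one pixel, so nothing overlaps.

```python
def make_gray_colormap(n=256):
    """Fixed color map: n RGB gray levels from black (0) to white (n - 1)."""
    return [(i, i, i) for i in range(n)]


def value_to_color(value, lo, hi, colormap):
    """Linearly map value in [lo, hi] to one entry of the fixed color map."""
    if hi == lo:
        return colormap[0]  # constant attribute: every value gets the same color
    t = (value - lo) / (hi - lo)
    t = min(max(t, 0.0), 1.0)  # clamp values outside the range
    return colormap[round(t * (len(colormap) - 1))]


def attribute_to_pixels(values, colormap):
    """One colored pixel per attribute value (assumed linear scaling)."""
    lo, hi = min(values), max(values)
    return [value_to_color(v, lo, hi, colormap) for v in values]
```

For instance, `attribute_to_pixels([0, 5, 10], make_gray_colormap())` gives black, mid-gray, and white pixels.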
Circle Segments Technique
Data is displayed as a circle divided into segments
Each segment represents an attribute
Each attribute value is mapped to a single colored pixel; the arrangement starts at the center and proceeds outward
Example
Light = high stock price; dark = low stock price
Represents 50 stocks; one circle shows the prices of the different stocks at the same time
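The center-outward arrangement of the circle segments technique can be approximated as below. This is a simplified sketch under loudly stated assumptions: rings are spaced one pixel apart, each ring's arc within a segment holds about as many pixels as its arc length, and pixels are spread evenly along that arc. The exact pixel ordering of the published technique may differ, and the function name is hypothetical.

```python
import math


def circle_segment_positions(num_values, segment, num_segments):
    """(x, y) pixel centres for the values of one attribute, filled from the
    center outward inside its segment (simplified, assumed layout)."""
    width = 2 * math.pi / num_segments  # angular width of every segment
    start = segment * width             # first angle of this attribute's segment
    positions = []
    ring = 0
    while len(positions) < num_values:
        radius = ring + 1
        # Assumption: about one pixel per unit of arc length, at least one.
        capacity = max(1, int(radius * width))
        count = min(capacity, num_values - len(positions))
        for i in range(count):
            angle = start + (i + 0.5) * width / capacity
            positions.append((radius * math.cos(angle), radius * math.sin(angle)))
        ring += 1
    return positions
```

With four attributes, attribute `j` would call `circle_segment_positions(n, j, 4)`; values earlier in the ordering land nearer the center.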
Bar Visualization
For each attribute:
Attribute values are sorted into attribute lists
Classes are represented by colors
Within a bar, the sorted attribute values are mapped to pixels line by line
Each attribute is placed in a different bar
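The bar visualization steps above can be sketched as follows, assuming each bar has a fixed width in pixels and that a pixel's color is simply the color assigned to its object's class. `bar_visualization` and `visualize_training_data` are hypothetical names, and colors are plain strings to keep the sketch self-contained.

```python
def bar_visualization(values, labels, class_colors, bar_width):
    """One bar for one attribute: rows of class colors, one pixel per object."""
    # Attribute list: (value, class) pairs sorted by attribute value only.
    attribute_list = sorted(zip(values, labels), key=lambda pair: pair[0])
    pixels = [class_colors[label] for _, label in attribute_list]
    # Fill the bar line by line (row-major), bar_width pixels per line.
    return [pixels[i:i + bar_width] for i in range(0, len(pixels), bar_width)]


def visualize_training_data(data, labels, class_colors, bar_width):
    """Place each attribute (column of values) in its own bar."""
    return {
        attr: bar_visualization(column, labels, class_colors, bar_width)
        for attr, column in data.items()
    }
```

Long runs of one color in a bar indicate value ranges dominated by a single class, which is what the user looks for when choosing a split.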
DNA Training Data
Visually, attributes 85 and 90 are good candidates for splitting the tree
The algorithm picks attribute 90 as the optimal split
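The slides do not say which criterion the algorithm uses to pick the "optimal split". This sketch assumes the Gini index, a common impurity measure in decision tree classifiers such as CART and SLIQ. It scans a sorted attribute list once, trying a split between each pair of distinct neighboring values, and then picks the attribute whose best split scores lowest. All function names are hypothetical.

```python
from collections import Counter


def gini(counts, total):
    """Gini impurity from a Counter of class labels (0.0 if empty)."""
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def best_split(values, labels):
    """Return (threshold, weighted Gini) of the best split 'value <= threshold'."""
    pairs = sorted(zip(values, labels), key=lambda p: p[0])  # attribute list
    n = len(pairs)
    left, right = Counter(), Counter(label for _, label in pairs)
    best = (None, float("inf"))
    for i in range(n - 1):
        value, label = pairs[i]
        left[label] += 1   # move object i from the right side to the left side
        right[label] -= 1
        if value == pairs[i + 1][0]:
            continue  # cannot split between equal values
        n_left, n_right = i + 1, n - i - 1
        score = (n_left * gini(left, n_left) + n_right * gini(right, n_right)) / n
        if score < best[1]:
            best = ((value + pairs[i + 1][0]) / 2, score)
    return best


def best_attribute(data, labels):
    """Attribute whose best split has the lowest weighted Gini (assumed criterion)."""
    return min(data, key=lambda attr: best_split(data[attr], labels)[1])
```

Under this assumption, an attribute whose bar shows cleanly separated color regions (like attribute 90 in the DNA example) would receive a low score.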
PBC
Uses pixel-oriented visualization
Visualizes training data in order to support
interactive decision tree construction
Examples of use
Automatic
Automatic-manual (top 2 levels)
Manual-automatic
Manual
Actual use lies somewhere along this spectrum
Additional Functionality
Propose split
Look-ahead
For a hypothetical split
Expand tree
Automatic expansion and construction
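"Look-ahead" and "Expand tree" can be sketched together, assuming the tree is grown greedily with the Gini index (an assumption, as in the split sketch) and that a look-ahead is judged by its training-set errors. In PBC the look-ahead is shown to the user visually; counting errors here is only a stand-in. `expand` grows a subtree automatically to a depth limit; `look_ahead` applies a hypothetical split chosen by the user and expands both sides. Trees are nested dicts, and all names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0


def best_split(rows, labels):
    """(attribute, threshold, weighted Gini) of the best split, or None."""
    best = None
    for attr in rows[0]:
        # Candidate thresholds: every distinct value except the largest.
        for t in sorted({row[attr] for row in rows})[:-1]:
            left = [lab for row, lab in zip(rows, labels) if row[attr] <= t]
            right = [lab for row, lab in zip(rows, labels) if row[attr] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[2]:
                best = (attr, t, score)
    return best


def grow(rows, labels, attr, threshold, depth):
    """Split on attr <= threshold, then expand each side up to `depth` levels."""
    left = [(r, lab) for r, lab in zip(rows, labels) if r[attr] <= threshold]
    right = [(r, lab) for r, lab in zip(rows, labels) if r[attr] > threshold]
    return {
        "attr": attr, "threshold": threshold,
        "left": expand([r for r, _ in left], [lab for _, lab in left], depth),
        "right": expand([r for r, _ in right], [lab for _, lab in right], depth),
    }


def expand(rows, labels, depth):
    """Automatically grow a subtree of at most `depth` split levels."""
    if not labels:
        return {"leaf": None}  # empty side of a user-chosen split
    majority = Counter(labels).most_common(1)[0][0]
    if depth == 0 or len(set(labels)) == 1:
        return {"leaf": majority}
    split = best_split(rows, labels)
    if split is None:  # all attributes constant: cannot split further
        return {"leaf": majority}
    attr, threshold, _ = split
    return grow(rows, labels, attr, threshold, depth - 1)


def classify(tree, row):
    while "leaf" not in tree:
        tree = tree["left"] if row[tree["attr"]] <= tree["threshold"] else tree["right"]
    return tree["leaf"]


def look_ahead(rows, labels, attr, threshold, depth):
    """Hypothetical split plus automatic expansion; returns (tree, training errors)."""
    tree = grow(rows, labels, attr, threshold, depth)
    errors = sum(classify(tree, r) != lab for r, lab in zip(rows, labels))
    return tree, errors
```

Comparing `look_ahead` results for several candidate splits, or for the same split at different depths, mirrors the idea of inspecting a split's consequences before committing to it.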
PBC demo