Transcript 6-PBC

Perception-Based Classification (PBC) System
Salvador Ledezma
April 25, 2002
Introduction
- Concepts
- Demo of PBC
- References:
    - “Towards an Effective Cooperation of the User and Computer for Classification”
    - “Visual Data Mining with Pixel-oriented Visualization Techniques”
    - “Visual Classification: An Interactive Approach to Decision Tree Construction”
    - Mihael Ankerst: author or coauthor of all three
Data Mining
- Exploration and analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns and rules
- Part of the Knowledge Discovery in Databases (KDD) process
Classification
- A major task of data mining
- Assigns an object to one of a set of given classes based on the object's attributes
Classification Algorithms
- Decision tree classifier
    - Training set: a set of objects whose attributes and class are already known
    - Using the training set, the tree classifier determines a classification function represented by a decision tree
        - The tree models the class attribute as a function of the values of the other attributes
    - Test set: validates the classification function
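A minimal sketch of the point above: the decision tree itself is the classification function, and a labeled test set validates it by measuring accuracy. The `Leaf`/`Node` classes, the "value <= threshold" test, the attributes and all data are hypothetical; the slides do not say how PBC represents its trees, and building the tree from the training set is not shown here.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Leaf:
    label: str            # class assigned to every object reaching this leaf

@dataclass
class Node:
    attribute: str        # attribute tested at this node
    threshold: float      # objects with value <= threshold go left
    left: "Tree"
    right: "Tree"

Tree = Union[Leaf, Node]

def classify(tree: Tree, obj: dict) -> str:
    """The classification function: walk from the root down to a leaf."""
    while isinstance(tree, Node):
        tree = tree.left if obj[tree.attribute] <= tree.threshold else tree.right
    return tree.label

def accuracy(tree: Tree, test_set: list) -> float:
    """Validate the classification function on (object, true class) pairs."""
    hits = sum(classify(tree, obj) == label for obj, label in test_set)
    return hits / len(test_set)

# Hypothetical tree and test set, not taken from the slides.
tree = Node("age", 30.0, Leaf("low risk"),
            Node("income", 50000.0, Leaf("high risk"), Leaf("low risk")))
test_set = [
    ({"age": 25, "income": 20000}, "low risk"),
    ({"age": 40, "income": 30000}, "high risk"),
    ({"age": 55, "income": 90000}, "low risk"),
    ({"age": 35, "income": 10000}, "low risk"),
]
print(accuracy(tree, test_set))   # 0.75: the last test object is misclassified
```

The test set is kept separate from the training set so that the accuracy estimate is not inflated by objects the tree was built from.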
Classification Example
Classification (cont)
- Classification algorithms are usually black boxes that allow no user interaction or intervention
- Reasons for user involvement in decision tree construction:
    - Exploit human pattern recognition capabilities
    - The user gains a better understanding of the tree
    - The user provides domain knowledge
Visual Data Mining
- Tackles data mining tasks by
    - enabling human involvement
    - incorporating human perception
Visual Classification
- Construction of decision trees is decomposed into substeps
- Enables human involvement
- Example: PBC
- Data visualization based on two concepts:
    - Each attribute of the training data is visualized in a separate part of the screen
    - Different class labels of training objects are represented by different colors
Pixel-Oriented Visualization Techniques
- Represent each attribute value as a single colored pixel
- Map the range of possible attribute values to a fixed color map
- Maximize the amount of information represented at one time, without any overlap
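A minimal sketch of mapping each attribute value to one pixel of a fixed color map. It assumes a linear mapping from the observed [min, max] range and a hypothetical grayscale map in which dark means low and light means high, matching the stock example's legend; `to_color_index` and `pixel_colors` are illustrative names, not part of PBC.

```python
def to_color_index(value, lo, hi, n_colors):
    """Linearly map a value in [lo, hi] to an index into a fixed color map."""
    if hi == lo:                      # constant attribute: use the first color
        return 0
    t = (value - lo) / (hi - lo)      # normalize to [0, 1]
    return min(int(t * n_colors), n_colors - 1)   # value == hi -> last color

def pixel_colors(values, n_colors=256):
    """Return one RGB pixel per attribute value; requires n_colors >= 2."""
    lo, hi = min(values), max(values)
    # Hypothetical grayscale map: index 0 is black (low), last index is white (high).
    gray = [(i * 255 // (n_colors - 1),) * 3 for i in range(n_colors)]
    return [gray[to_color_index(v, lo, hi, n_colors)] for v in values]

prices = [10.0, 12.5, 15.0, 20.0]     # hypothetical stock prices
print(pixel_colors(prices, n_colors=5))
# [(0, 0, 0), (63, 63, 63), (127, 127, 127), (255, 255, 255)]
```

Because every value gets exactly one pixel and the color map is fixed, many values can be shown at once without overlapping glyphs.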
Circle Segments Technique
- The display is a circle divided into segments
- Each segment represents an attribute
- Each attribute value is mapped to a single colored pixel; the arrangement starts at the center and proceeds outward
- Example: 50 stocks
    - Light = high stock price, dark = low stock price
    - Each concentric circle represents the prices of the different stocks at the same time
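A simplified sketch of the circle segments layout, assuming unit pixel spacing and that each ring of a segment is filled before moving outward. The published technique uses its own pixel ordering inside a segment, which this does not attempt to reproduce, and spacing is only approximate near the center; `circle_segments_layout` is a hypothetical name.

```python
import math

def circle_segments_layout(n_attributes: int, n_values: int) -> dict:
    """Map (attribute j, value index t) to an (x, y) pixel center.

    Segment j covers the angles [j * width, (j + 1) * width). Values are
    placed from the center outward: ring 1 of the segment is filled, then
    ring 2, and so on, so earlier values sit closer to the center.
    """
    width = 2 * math.pi / n_attributes          # angular width of one segment
    layout = {}
    for j in range(n_attributes):
        start = j * width
        t, ring = 0, 1
        while t < n_values:
            # roughly how many unit-spaced pixels fit on this ring's arc
            slots = max(1, int(ring * width))
            for k in range(min(slots, n_values - t)):
                angle = start + (k + 0.5) * width / slots   # strictly inside segment j
                layout[(j, t)] = (ring * math.cos(angle), ring * math.sin(angle))
                t += 1
            ring += 1
    return layout

# Hypothetical example: 4 attributes (e.g. stocks) with 10 values (time points) each.
layout = circle_segments_layout(4, 10)
print(len(layout))                              # one pixel per attribute value
print(math.hypot(*layout[(0, 0)]), math.hypot(*layout[(0, 9)]))
```

Since every segment is filled in the same order, pixels at the same distance from the center in different segments correspond to the same (or nearby) time points, which is why a concentric circle can be read as the prices of all stocks at one time.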
Bar Visualization
- For each attribute:
    - Attribute values are sorted into attribute lists
    - Classes are represented by colors
- Within a bar, sorted attribute values are mapped to pixels line by line
- Each attribute is placed in a different bar
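A minimal sketch of the bar visualization described above, assuming each object's class label stands in for its pixel color and a hypothetical bar width; `bar_pixels`, `bars` and the data are illustrative, not PBC's implementation.

```python
def bar_pixels(values, labels, width):
    """Lay out one attribute as a bar of class-colored pixels."""
    # Attribute list: (value, class) pairs sorted by attribute value.
    attribute_list = sorted(zip(values, labels), key=lambda pair: pair[0])
    # Each pixel takes the color of its object's class label.
    colors = [label for _, label in attribute_list]
    # Fill the bar line by line, `width` pixels per line.
    return [colors[i:i + width] for i in range(0, len(colors), width)]

def bars(training_data, labels, width):
    """One bar per attribute; training_data maps attribute name -> values."""
    return {attr: bar_pixels(vals, labels, width) for attr, vals in training_data.items()}

# Hypothetical training data with two classes, "A" and "B".
training_data = {
    "x": [5.1, 2.3, 7.8, 3.3, 6.0, 1.2, 4.4],
    "y": [0.2, 0.9, 0.1, 0.8, 0.3, 0.7, 0.4],
}
labels = ["A", "A", "B", "A", "B", "A", "B"]
for attr, bar in bars(training_data, labels, width=3).items():
    print(attr, bar)
```

In the bar for `x`, the three smallest values all belong to class A, so a long run of one color at the start of the bar hints at a split point; spotting such runs is the visual cue the DNA example relies on.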
DNA Training Data
- Visually, attributes 85 and 90 are good candidates for splitting the tree
- The algorithm picks attribute 90 as the optimal split
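The slide does not say which criterion the algorithm uses to pick attribute 90. As a hedged illustration only, this sketch scores every binary split "value <= threshold" of every attribute with the Gini index (a common impurity measure for decision trees, used for example by CART) and returns the lowest; `gini`, `best_split` and the toy data are hypothetical and unrelated to the DNA data set.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(data, labels):
    """Return (weighted Gini, attribute, threshold) of the best binary split
    'value <= threshold' over all attributes; data maps attribute -> values."""
    n = len(labels)
    best = None
    for attribute, values in data.items():
        pairs = sorted(zip(values, labels), key=lambda p: p[0])
        for i in range(1, n):
            if pairs[i - 1][0] == pairs[i][0]:
                continue                  # no threshold separates equal values
            left = [label for _, label in pairs[:i]]
            right = [label for _, label in pairs[i:]]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            if best is None or score < best[0]:
                best = (score, attribute, threshold)
    return best

# Hypothetical toy data: attr_b separates the two classes perfectly.
data = {
    "attr_a": [1, 4, 2, 5, 3, 6],
    "attr_b": [1, 2, 3, 4, 5, 6],
}
labels = ["X", "X", "X", "Y", "Y", "Y"]
print(best_split(data, labels))   # (0.0, 'attr_b', 3.5)
```

Sorting each attribute once mirrors the sorted attribute lists behind the bar visualization: the candidate thresholds are exactly the boundaries between neighboring pixels in a bar.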
PBC
- Uses pixel-oriented visualization
- Visualizes the training data in order to support interactive decision tree construction
- Examples of use:
    - Automatic
    - Automatic-manual (top 2 levels)
    - Manual-automatic
    - Manual
- Actual use lies somewhere along this spectrum
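The slide lists the four modes without further detail. A hedged sketch, assuming the modes differ only in who picks the split at each node, that "top 2 levels" means depths 0 and 1, and that manual-automatic uses the same depth cutoff with the roles swapped (an assumption); `combine`, `auto`, `user` and `handover_depth` are hypothetical names.

```python
MODES = {"automatic", "automatic-manual", "manual-automatic", "manual"}

def combine(mode, auto, user, handover_depth=2):
    """Return choose(rows, depth), which delegates the split decision at a
    node to the algorithm (`auto`) or the user (`user`) by mode and depth."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")

    def choose(rows, depth):
        top = depth < handover_depth          # node lies in the top levels
        if mode == "automatic":
            return auto(rows, depth)
        if mode == "manual":
            return user(rows, depth)
        if mode == "automatic-manual":        # algorithm builds the top levels
            return auto(rows, depth) if top else user(rows, depth)
        return user(rows, depth) if top else auto(rows, depth)   # manual-automatic
    return choose

# Hypothetical stand-ins that only report who made the decision.
def auto(rows, depth):
    return "algorithm"

def user(rows, depth):
    return "user"

chooser = combine("automatic-manual", auto, user)
print([chooser([], depth) for depth in range(4)])
# ['algorithm', 'algorithm', 'user', 'user']
```

In a full tree builder, `choose` would be called at each node and return a split (or nothing, to make a leaf); intermediate mixtures correspond to changing who answers at which nodes.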
Additional Functionality
- Propose split
- Look-ahead
    - For a hypothetical split
- Expand tree
    - Automatic expanding and construction
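One plausible reading of "look-ahead for a hypothetical split" is a preview of what the split would produce before the user commits to it. This sketch assumes a binary split "row[attribute] <= threshold" and reports the class counts of both children plus the training error if each child became a leaf predicting its majority class; `look_ahead` and the data are hypothetical, not PBC's interface.

```python
from collections import Counter

def look_ahead(rows, labels, attribute, threshold):
    """Preview the hypothetical split 'row[attribute] <= threshold'.

    Nothing is modified. Returns the class counts of the left and right
    child and the training error if each child became a leaf that
    predicts its majority class.
    """
    left, right = Counter(), Counter()
    for row, label in zip(rows, labels):
        (left if row[attribute] <= threshold else right)[label] += 1
    # Misclassified objects = objects outside each child's majority class.
    errors = sum(sum(c.values()) - max(c.values()) for c in (left, right) if c)
    return dict(left), dict(right), errors / len(labels)

# Hypothetical data: one attribute "x", classes "A" and "B".
rows = [{"x": 1}, {"x": 2}, {"x": 3}, {"x": 4}]
labels = ["A", "A", "B", "B"]
for threshold in (1.5, 2.5, 3.5):
    print(threshold, look_ahead(rows, labels, "x", threshold))
```

Comparing the previews lets a user see that the threshold 2.5 yields two pure children before committing to any split.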
PBC demo