PowerPoint slides - University of Maryland at College Park

Interactive Exploration of
Hierarchical Clustering Results
HCE (Hierarchical Clustering Explorer)
Jinwook Seo and Ben Shneiderman
Human-Computer Interaction Lab
Department of Computer Science
University of Maryland, College Park
[email protected]
Cluster Analysis of Microarray
Experiment Data
• About 100 to 20,000 gene samples
• Under 2 to 80 experimental conditions
• Identify similar gene samples
– starting point for studying unknown genes
• Identify similar experimental conditions
– develop a better treatment for a special group
• Clustering algorithms
– Hierarchical, K-means, etc.
Dendrogram
[Figure: dendrogram display; scale from -3.64 to 4.87]
Interactive Exploration Techniques
• Dynamic Query Controls
– Number of clusters, Level of detail
• Coordinated Display
– Bi-directional interaction with 2D scattergrams
• Overview of the entire dataset
– Coupled with detail view
• Visual Comparison of Different Results
– Different results by different methods
Demonstration
• 99 Yeast genes
• 7 variables (time points)
• Download HCE at
– www.cs.umd.edu/hcil/multi-cluster
• More demonstration
– A.V. Williams Bldg, 3174
– 3:30-5:00pm, May 31.
Dynamic Query Controls
Filter out less similar genes
– By pulling down the minimum similarity bar
– Show only the clusters that satisfy the minimum similarity threshold
– Help users determine the proper number of clusters
– Easy to find the most similar genes
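The effect of the minimum similarity bar can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not HCE's actual code: the `Node` class, the `clusters_above` helper and the toy similarity values are all hypothetical, and similarity is assumed to decrease toward the root of the dendrogram.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Node:
    """Internal dendrogram node; `similarity` is that of its two merged children."""
    similarity: float
    left: "Tree"
    right: "Tree"


Tree = Union[str, Node]  # a leaf is just a gene name


def leaves(tree: Tree) -> List[str]:
    """Return the gene names under a subtree, left to right."""
    if isinstance(tree, str):
        return [tree]
    return leaves(tree.left) + leaves(tree.right)


def clusters_above(tree: Tree, min_similarity: float) -> List[List[str]]:
    """Return the largest subtrees whose root similarity meets the threshold.

    Single genes that belong to no such subtree are filtered out.
    """
    if isinstance(tree, str):
        return []
    if tree.similarity >= min_similarity:
        return [leaves(tree)]
    return (clusters_above(tree.left, min_similarity)
            + clusters_above(tree.right, min_similarity))


# Hypothetical toy dendrogram; the similarity values are made up.
toy = Node(0.2, Node(0.9, "g1", "g2"), Node(0.5, "g3", Node(0.8, "g4", "g5")))

print(clusters_above(toy, 0.7))  # [['g1', 'g2'], ['g4', 'g5']]
print(clusters_above(toy, 0.4))  # [['g1', 'g2'], ['g3', 'g4', 'g5']]
```

Lowering the threshold from 0.7 to 0.4 lets g3 join a cluster, which is how moving the bar changes the number and size of the clusters shown.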
Dynamic Query Controls
Adjust level of detail
– By dragging up the detail cutoff bar
– Show the representative pattern of each cluster
– Hide detail below the bar
– Easy to view global structure
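One plausible way to compute a representative pattern for a cluster is the element-wise mean of its members' profiles. The slides do not say how HCE derives the pattern, so the averaging rule, the function name `representative_pattern` and the sample values below are assumptions made for illustration.

```python
from typing import List, Sequence


def representative_pattern(profiles: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of the members' expression profiles (assumed rule)."""
    if not profiles:
        raise ValueError("cluster has no members")
    n = len(profiles)
    # zip(*profiles) walks the profiles one condition (column) at a time.
    return [sum(column) / n for column in zip(*profiles)]


# Hypothetical cluster: three genes measured under four conditions.
cluster = [
    [1.0, 2.0, 3.0, 4.0],
    [3.0, 2.0, 1.0, 0.0],
    [2.0, 2.0, 2.0, 2.0],
]
print(representative_pattern(cluster))  # [2.0, 2.0, 2.0, 2.0]
```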
Coordinated Displays
• Two experimental conditions for the x and y
axes
• Two-dimensional scattergrams
– limited to two variables at a time
– readily understood by most users
– users can concentrate on the data without
distraction
• Bi-directional interactions between displays
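Bi-directional coordination is commonly built on a shared selection that every view both updates and listens to. The sketch below shows that pattern only; the class names `SelectionModel` and `View` are hypothetical and are not taken from HCE.

```python
from typing import Callable, Iterable, List, Set


class SelectionModel:
    """Shared selection; every subscribed view is told about each change."""

    def __init__(self) -> None:
        self._listeners: List[Callable[[Set[str]], None]] = []

    def subscribe(self, listener: Callable[[Set[str]], None]) -> None:
        self._listeners.append(listener)

    def select(self, genes: Iterable[str]) -> None:
        selected = set(genes)
        for listener in self._listeners:
            listener(set(selected))  # give each view its own copy


class View:
    """Stand-in for a display such as the dendrogram or a scattergram."""

    def __init__(self, name: str, model: SelectionModel) -> None:
        self.name = name
        self.model = model
        self.highlighted: Set[str] = set()
        model.subscribe(self._on_selection)

    def _on_selection(self, genes: Set[str]) -> None:
        self.highlighted = genes

    def user_selects(self, genes: Iterable[str]) -> None:
        # A selection made in this view goes through the shared model.
        self.model.select(genes)


model = SelectionModel()
dendrogram = View("dendrogram", model)
scattergram = View("scattergram", model)

scattergram.user_selects({"g1", "g2"})
print(sorted(dendrogram.highlighted))   # ['g1', 'g2']
dendrogram.user_selects({"g3"})
print(sorted(scattergram.highlighted))  # ['g3']
```

Because both views talk only to the model, a selection made in either display is reflected in the other.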
Overview in a limited screen space
• What if there are more than 1,600 items to display?
• Compressed Overview: averaging adjacent leaves
• Easy to locate interesting spots
Melanoma Microarray Experiment (3614 x 38)
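Averaging adjacent leaves can be sketched as follows. The function name `compress_overview` and the even-split binning rule are assumptions, and the sketch works on one value per leaf, whereas a real overview would average whole rows or columns of the color mosaic.

```python
from typing import List, Sequence


def compress_overview(values: Sequence[float], width: int) -> List[float]:
    """Average runs of adjacent leaves so that at most `width` columns remain."""
    if width <= 0:
        raise ValueError("width must be positive")
    n = len(values)
    if n <= width:
        return list(values)  # everything already fits
    compressed = []
    for i in range(width):
        # Split the n leaves into `width` nearly equal, contiguous runs.
        start = i * n // width
        end = (i + 1) * n // width
        chunk = values[start:end]
        compressed.append(sum(chunk) / len(chunk))
    return compressed


print(compress_overview([1, 3, 5, 7, 9, 11], 3))  # [2.0, 6.0, 10.0]
```

With 3,614 leaves and a 1,600-column display, each output column averages two or three adjacent leaves.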
Overview in a limited screen space
• What if there are more than 1,600 items to display?
• Alternative Overview: changing bar width (2~10)
• Show more detail, but need scrolling
Cluster Comparison
• There is no perfect clustering algorithm!
• Different Distance Measures
• Different Linkage Methods
• Two dendrograms at the same time
– Show the mapping of each gene between the two
dendrograms
– Busy screen with crossing lines
– Easy to see anomalies
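The mapping between two dendrograms, and how busy the connecting lines become, can be sketched by comparing leaf orders. The helper names `leaf_mapping` and `count_crossings` and the second leaf order are hypothetical; the quadratic pair count is chosen for clarity rather than speed.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def leaf_mapping(order_a: Sequence[str], order_b: Sequence[str]) -> List[Tuple[str, int, int]]:
    """For each gene, its leaf position in dendrogram A and in dendrogram B."""
    position_b = {gene: i for i, gene in enumerate(order_b)}
    return [(gene, i, position_b[gene]) for i, gene in enumerate(order_a)]


def count_crossings(order_a: Sequence[str], order_b: Sequence[str]) -> int:
    """Number of pairs of mapping lines that cross each other."""
    links = leaf_mapping(order_a, order_b)
    return sum(
        1
        for (_, a1, b1), (_, a2, b2) in combinations(links, 2)
        if (a1 - a2) * (b1 - b2) < 0  # the pair is ordered differently in A and B
    )


first = ["A", "D", "C", "B"]   # leaf order of one result
second = ["A", "C", "D", "B"]  # hypothetical order from another method
print(leaf_mapping(first, second))     # [('A', 0, 0), ('D', 1, 2), ('C', 2, 1), ('B', 3, 3)]
print(count_crossings(first, second))  # 1
```

A gene whose line crosses many others sits in very different places in the two results, which is the kind of anomaly the side-by-side view makes visible.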
Cluster Comparison
Conclusion
• Integrate four features to interactively
explore clustering results to gain a stronger
understanding of the significance of the
clusters
– Overview, Dynamic Query, Coordination,
Cluster Comparison
• Powerful algorithms + Interactive tools
• Bioinformatics Visualization
www.cs.umd.edu/hcil/multi-cluster
July 2002 IEEE Computer Special Issue on BioInformatics
Hierarchical Clustering
Initial Data Items: A, B, C, D
Distance Matrix
Dist   A    B    C    D
A           20   7    2
B                10   25
C                     3
D
Hierarchical Clustering
Single Linkage
Current Clusters: A, B, C, D
Distance Matrix
Dist   A    B    C    D
A           20   7    2
B                10   25
C                     3
D
Merge A and D (distance 2)
Hierarchical Clustering
Single Linkage
Current Clusters: AD, B, C
Distance Matrix
Dist   AD   B    C
AD          20   3
B                10
C
Hierarchical Clustering
Single Linkage
Current Clusters: AD, B, C
Distance Matrix
Dist   AD   B    C
AD          20   3
B                10
C
Merge AD and C (distance 3)
Hierarchical Clustering
Single Linkage
Current Clusters: ADC, B
Distance Matrix
Dist   ADC  B
ADC         10
B
Hierarchical Clustering
Single Linkage
Current Clusters: ADC, B
Distance Matrix
Dist   ADC  B
ADC         10
B
Merge ADC and B (distance 10)
Hierarchical Clustering
Single Linkage
Final Result: one cluster, ADCB (leaf order A, D, C, B)
Distance Matrix
Dist   ADCB
ADCB
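The single-linkage walkthrough can be reproduced with a short Python sketch. It reads the initial matrix as d(A,B)=20, d(A,C)=7, d(A,D)=2, d(B,C)=10, d(B,D)=25, d(C,D)=3, an interpretation of the slide layout that agrees with the merged values 20, 3 and 10 on the later slides. The function names are hypothetical and the naive search over all cluster pairs is meant for clarity only.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Cluster = Tuple[str, ...]

# Pairwise distances as read from the slides' initial distance matrix.
DIST: Dict[FrozenSet[str], int] = {
    frozenset("AB"): 20, frozenset("AC"): 7, frozenset("AD"): 2,
    frozenset("BC"): 10, frozenset("BD"): 25, frozenset("CD"): 3,
}


def single_link(c1: Cluster, c2: Cluster, dist: Dict[FrozenSet[str], int]) -> int:
    """Single linkage: distance between the closest pair of members."""
    return min(dist[frozenset((a, b))] for a in c1 for b in c2)


def single_linkage(items: str, dist: Dict[FrozenSet[str], int]) -> List[Tuple[str, str, int]]:
    """Merge the two closest clusters until one remains; return the merges."""
    clusters: List[Cluster] = [(item,) for item in items]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(
            combinations(clusters, 2),
            key=lambda pair: single_link(pair[0], pair[1], dist),
        )
        merges.append(("".join(c1), "".join(c2), single_link(c1, c2, dist)))
        # Put the merged cluster first so its name grows as on the slides.
        clusters = [c1 + c2] + [c for c in clusters if c not in (c1, c2)]
    return merges


print(single_linkage("ABCD", DIST))
# [('A', 'D', 2), ('AD', 'C', 3), ('ADC', 'B', 10)]
```

The merge heights 2, 3 and 10 are the values highlighted step by step on the slides, and the final leaf order is A, D, C, B.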