Interactive Exploration of Hierarchical Clustering Results
HCE (Hierarchical Clustering Explorer)
Jinwook Seo and Ben Shneiderman
Human-Computer Interaction Lab
Department of Computer Science
University of Maryland, College Park
[email protected]
Cluster Analysis of Microarray Experiment Data
• About 100 to 20,000 gene samples
• Under 2 to 80 experimental conditions
• Identify similar gene samples
– starting point for studying unknown genes
• Identify similar experimental conditions
– develop a better treatment for a specific group
• Clustering algorithms
– Hierarchical, K-means, etc.
Dendrogram
[Figure: dendrogram display; scale labels -3.64 and 4.87]
Interactive Exploration Techniques
• Dynamic Query Controls
– Number of clusters, Level of detail
• Coordinated Display
– Bi-directional interaction with 2D scattergrams
• Overview of the entire dataset
– Coupled with detail view
• Visual Comparison of Different Results
– Different results by different methods
Demonstration
• 99 Yeast genes
• 7 variables (time points)
• Download HCE at
– www.cs.umd.edu/hcil/multi-cluster
• More demonstration
– A.V. Williams Bldg, 3174
– 3:30-5:00pm, May 31.
Dynamic Query Controls
Filter out less similar genes
• By pulling down the minimum similarity bar
• Show only the clusters that satisfy the minimum similarity threshold
• Help users determine the proper number of clusters
• Easy to find the most similar genes
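The effect of the minimum similarity bar can be sketched in a few lines. This is only an illustration, not HCE's implementation: the tuple-based tree format, the function names, and the similarity values are all assumptions made up for the example, and it assumes similarity never increases toward the root.

```python
# A dendrogram node is either a leaf name (str) or a tuple
# (similarity, left_subtree, right_subtree). All values are made up.

def leaves(node):
    """Return the leaf names under a node, left to right."""
    if isinstance(node, str):
        return [node]
    _, left, right = node
    return leaves(left) + leaves(right)

def clusters_above(node, min_similarity):
    """Return the largest subtrees whose merge similarity meets the threshold.

    Leaves that belong to no such subtree are dropped, which mimics
    filtering out the less similar genes.
    """
    if isinstance(node, str):
        return []                       # a lone leaf is not a cluster
    similarity, left, right = node
    if similarity >= min_similarity:
        return [leaves(node)]           # the whole subtree qualifies
    return (clusters_above(left, min_similarity)
            + clusters_above(right, min_similarity))

# Hypothetical dendrogram over five genes.
tree = (0.2,
        (0.9, "g1", "g2"),
        (0.5, (0.8, "g3", "g4"), "g5"))

for threshold in (0.1, 0.6, 0.85):
    print(threshold, clusters_above(tree, threshold))
# 0.1  -> [['g1', 'g2', 'g3', 'g4', 'g5']]
# 0.6  -> [['g1', 'g2'], ['g3', 'g4']]
# 0.85 -> [['g1', 'g2']]
```

Raising the threshold changes the number of clusters shown, which is how the bar can help in choosing a cluster count.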
Dynamic Query Controls
Adjust level of detail
• By dragging up the detail cutoff bar
• Show the representative pattern of each cluster
• Hide detail below the bar
• Easy to view global structure
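One way to picture the detail cutoff is to collapse every subtree at or below the bar into a single pattern. The sketch below assumes the representative pattern is the mean expression profile of the cluster members; the slides do not say how HCE computes it, and the tree format, names, and numbers are hypothetical.

```python
# A node is a leaf name (str) or a tuple (height, left_subtree, right_subtree).

def leaves(node):
    """Return the leaf names under a node, left to right."""
    if isinstance(node, str):
        return [node]
    _, left, right = node
    return leaves(left) + leaves(right)

def collapse(node, cutoff, profiles):
    """Return (members, pattern) pairs for the subtrees at or below the cutoff.

    Assumption: the pattern of a collapsed subtree is the mean of its
    members' profiles. A leaf above the cutoff keeps its own profile.
    """
    if isinstance(node, str):
        return [([node], profiles[node])]
    height, left, right = node
    if height <= cutoff:
        members = leaves(node)
        width = len(profiles[members[0]])
        mean = [sum(profiles[m][i] for m in members) / len(members)
                for i in range(width)]
        return [(members, mean)]
    return collapse(left, cutoff, profiles) + collapse(right, cutoff, profiles)

# Hypothetical data: five genes measured under two conditions.
profiles = {"g1": [1.0, 2.0], "g2": [3.0, 4.0], "g3": [0.0, 0.0],
            "g4": [2.0, 6.0], "g5": [4.0, 0.0]}
tree = (10.0, (2.0, "g1", "g2"), (5.0, "g3", (1.0, "g4", "g5")))

for members, pattern in collapse(tree, 5.0, profiles):
    print(members, pattern)
# ['g1', 'g2'] [2.0, 3.0]
# ['g3', 'g4', 'g5'] [2.0, 2.0]
```

A higher cutoff hides more detail and leaves fewer, larger clusters on screen.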
Coordinated Displays
• Two experimental conditions for the x and y axes
• Two-dimensional scattergrams
– limited to two variables at a time
– readily understood by most users
– users can concentrate on the data without distraction
• Bi-directional interactions between displays
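The bi-directional link between the two displays comes down to two lookups, one in each direction. The sketch below is a simplified illustration with made-up gene names and expression values; the function names are hypothetical and do not come from HCE.

```python
# Hypothetical expression values: gene -> (condition on x axis, condition on y axis).
points = {"g1": (0.5, 1.2), "g2": (2.0, 2.1), "g3": (-1.0, 0.3), "g4": (1.8, 2.4)}

def genes_in_box(points, x_range, y_range):
    """Scattergram to dendrogram: genes inside a rectangular selection."""
    (x_lo, x_hi), (y_lo, y_hi) = x_range, y_range
    return sorted(gene for gene, (x, y) in points.items()
                  if x_lo <= x <= x_hi and y_lo <= y <= y_hi)

def points_of_cluster(points, cluster):
    """Dendrogram to scattergram: coordinates of the genes in a selected cluster."""
    return {gene: points[gene] for gene in cluster if gene in points}

print(genes_in_box(points, (1.5, 2.5), (2.0, 2.5)))   # ['g2', 'g4']
print(points_of_cluster(points, ["g1", "g3"]))        # {'g1': (0.5, 1.2), 'g3': (-1.0, 0.3)}
```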
Overview in a limited screen space
• What if there are more than 1,600 items to display?
• Compressed Overview: averaging adjacent leaves
• Easy to locate interesting spots
Melanoma Microarray Experiment (3614 x 38)
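Averaging adjacent leaves can be sketched as follows. The example assumes one screen column per displayed item and splits the leaves into nearly equal runs; the function name and the exact grouping rule are assumptions, since the slides only say that adjacent leaves are averaged.

```python
def compress(values, width):
    """Average runs of adjacent values so that at most `width` values remain."""
    n = len(values)
    if n <= width:
        return list(values)             # everything already fits
    compressed = []
    for k in range(width):
        start = k * n // width          # first leaf of the k-th run
        end = (k + 1) * n // width      # one past the last leaf of the run
        run = values[start:end]
        compressed.append(sum(run) / len(run))
    return compressed

print(compress([1, 3, 5, 7, 9, 11], 3))   # [2.0, 6.0, 10.0]

# 3614 leaves squeezed into 1,600 columns, as in the melanoma example.
row = list(range(3614))
print(len(compress(row, 1600)))           # 1600
```

In a heat map, the same function would be applied to each row of values in dendrogram leaf order.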
Overview in a limited screen space
• What if there are more than 1,600 items to display?
• Alternative Overview: changing bar width (2 to 10)
• Show more detail, but need scrolling
Cluster Comparison
• There is no perfect clustering algorithm!
• Different Distance Measures
• Different Linkage Methods
• Two dendrograms at the same time
– Show the mapping of each gene between the two dendrograms
– Busy screen with crossing lines
– Easy to see anomalies
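The mapping between two dendrograms can be thought of as one line per gene, joining its position in the first leaf order to its position in the second. The sketch below uses two hypothetical leaf orders and counts how many lines cross, as a rough measure of how busy the screen would be. The function names are made up for the example.

```python
def mapping(order_a, order_b):
    """Return one (position in A, position in B) pair per gene."""
    position_b = {gene: i for i, gene in enumerate(order_b)}
    return [(i, position_b[gene]) for i, gene in enumerate(order_a)]

def count_crossings(lines):
    """Two lines cross when their endpoints appear in opposite order."""
    crossings = 0
    for i in range(len(lines)):
        for j in range(i + 1, len(lines)):
            (a1, b1), (a2, b2) = lines[i], lines[j]
            if (a1 - a2) * (b1 - b2) < 0:
                crossings += 1
    return crossings

# Hypothetical leaf orders produced by two different clustering settings.
order_a = ["A", "D", "C", "B"]
order_b = ["A", "D", "B", "C"]

lines = mapping(order_a, order_b)
print(lines)                    # [(0, 0), (1, 1), (2, 3), (3, 2)]
print(count_crossings(lines))   # 1
```

A gene whose line crosses many others sits in very different neighborhoods in the two results, which is the kind of anomaly the side-by-side view makes visible.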
Conclusion
• Integrate four features to interactively explore clustering results to gain a stronger understanding of the significance of the clusters
– Overview, Dynamic Query, Coordination, Cluster Comparison
• Powerful algorithms + Interactive tools
• Bioinformatics Visualization
www.cs.umd.edu/hcil/multi-cluster
July 2002 IEEE Computer Special Issue on BioInformatics
Hierarchical Clustering
Initial Data Items: A, B, C, D
Distance Matrix
Dist    A    B    C    D
A
B      20
C       7   10
D       2   25    3
Hierarchical Clustering
Single Linkage
Current Clusters: A and D merge at distance 2
Distance Matrix
Dist    A    B    C    D
A
B      20
C       7   10
D       2   25    3
Hierarchical Clustering
Single Linkage
Current Clusters: AD, B, C; AD and C merge at distance 3
Distance Matrix
Dist   AD    B    C
AD
B      20
C       3   10
Hierarchical Clustering
Single Linkage
Current Clusters: ADC, B; ADC and B merge at distance 10
Distance Matrix
Dist   ADC    B
ADC
B       10
Hierarchical Clustering
Single Linkage
Final Result: one cluster, ADCB (dendrogram leaf order A, D, C, B)
Distance Matrix
Dist    ADCB
ADCB
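The single linkage walkthrough can be reproduced with a short script. This is a minimal sketch of the textbook procedure, not code from HCE. It uses the six distances from the slides (A-B 20, A-C 7, A-D 2, B-C 10, B-D 25, C-D 3) and takes the distance between two clusters to be the smallest distance between any pair of their members. The function name and data layout are assumptions.

```python
from itertools import combinations

def single_linkage(items, dist):
    """Agglomerative clustering with single linkage.

    items: list of item names
    dist:  dict mapping frozenset({a, b}) to the distance between a and b
    Returns the merges in order as (cluster_1, cluster_2, distance).
    """
    def cluster_distance(pair):
        # Single linkage: the smallest distance between members of the two clusters.
        c1, c2 = pair
        return min(dist[frozenset((a, b))] for a in c1 for b in c2)

    clusters = [(name,) for name in items]   # each cluster is a tuple of items
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=cluster_distance)
        merges.append(("".join(c1), "".join(c2), cluster_distance((c1, c2))))
        rest = [c for c in clusters if c not in (c1, c2)]
        clusters = [c1 + c2] + rest          # the merged cluster replaces its parts
    return merges

# Distances from the slides.
dist = {
    frozenset("AB"): 20, frozenset("AC"): 7, frozenset("AD"): 2,
    frozenset("BC"): 10, frozenset("BD"): 25, frozenset("CD"): 3,
}

for step in single_linkage(["A", "B", "C", "D"], dist):
    print(step)
# ('A', 'D', 2)
# ('AD', 'C', 3)
# ('ADC', 'B', 10)
```

The intermediate matrices follow the same rule: after A and D merge, the distance from AD to B is min(20, 25) = 20 and from AD to C is min(7, 3) = 3; after C joins, the distance from ADC to B is min(20, 10) = 10.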