An Evaluation of Microarray Visualization Tools for Biological Insight
Purvi Saraiya
Chris North
Dept. of Computer Science
Virginia Polytechnic Institute
and State University
Karen Duca
Virginia Bioinformatics
Institute
Virginia Polytechnic Institute
and State University
Presented by
Tugrul Ince and Nir Peer
University of Maryland
Goals
Evaluate five popular visualization tools
Cluster/TreeView
TimeSearcher
Hierarchical Clustering Explorer (HCE)
Spotfire
GeneSpring
Do so in the context of bioinformatics data
exploration
Goals
Research Questions
How successful are these tools in stimulating insight?
How do various visualization techniques affect the
users’ perception of data?
How does a user’s background affect tool usage?
How do these tools support hypothesis generation?
Can insight be measured in a controlled experiment?
Visualization Evaluations
Typically evaluations consist of
controlled measurements of user performance and
accuracy on predetermined tasks
We are looking for an evaluation that better
simulates a bioinformatics data analysis scenario
We use a protocol that focuses on
recognition and quantification of insights gained from
actual exploratory use of visualizations
Insights
Hard to define what an “insight” is
We need this term to be quantifiable and
reproducible
Solution
Encourage users to think aloud
and report any findings they have about the dataset
Videotape a session to capture and characterize
individual insights as they occur
This generally provides more information than subjective
measures from post-experiment surveys
Insights
Define insight as
an individual observation about the data by the
participant
a unit of discovery
Essentially, any data observation made during the
think aloud protocol
Now we can quantify some characteristics of
each insight
Insight Characteristics
Observation – the actual finding about the data
Time – the amount of time taken to reach the insight
Domain Value – the significance of the insight, coded by a domain expert
Hypotheses – hypothesis and direction of research
Directed vs. Unexpected – recall that participants are asked to identify questions they want to explore
Correctness
Breadth vs. Depth
Insight Characteristics
Category
Overview – overall distributions of gene expression
Patterns – identification or comparison across data
attributes
Groups – identification or comparison of groups of
genes
Details – focused information about specific genes
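The characteristics and categories listed above could be recorded per insight roughly as follows. This is only a sketch of one possible coding record: the field names, the numeric domain-value scale, and the example values are assumptions made for illustration and are not taken from the study.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Category(Enum):
    OVERVIEW = "overall distributions of gene expression"
    PATTERNS = "identification or comparison across data attributes"
    GROUPS = "identification or comparison of groups of genes"
    DETAILS = "focused information about specific genes"


@dataclass
class Insight:
    observation: str            # the actual finding about the data
    minutes_to_insight: float   # time taken to reach the insight
    domain_value: int           # significance coded by a domain expert (numeric scale is assumed)
    led_to_hypothesis: bool     # did it suggest a hypothesis or research direction?
    directed: bool              # True: answers a question listed up front; False: unexpected
    correct: Optional[bool]     # None until a coder has judged correctness
    deep: bool                  # True for depth, False for breadth
    category: Category


# A made-up record, purely to show the shape of the data.
example = Insight(
    observation="(made-up) two gene groups show opposite trends",
    minutes_to_insight=12.5,
    domain_value=3,
    led_to_hypothesis=False,
    directed=True,
    correct=None,
    deep=False,
    category=Category.GROUPS,
)
print(example.category.name, example.domain_value)
```

Keeping one record per spoken observation is what makes the later counts and sums possible.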
Experiment Design
A 3×5 between-subjects design
Between-subjects: different subjects for each dataset-tool pair
Dataset: 3 treatments
Visualization tool: 5 treatments
Experiment Design
Participants
2 participants per dataset per tool
Have at least a Bachelor’s degree in a biological field
Assigned to tools they had never worked with before,
both to prevent any advantage and to measure learning time
Categories
10 Domain Experts – senior researchers with extensive experience in microarray experiments
and microarray data analysis
11 Domain Novices – lab technicians or graduate student research assistants
9 Software Developers – professionals who implement microarray software tools
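As a quick arithmetic check on the design, the sketch below enumerates the dataset-tool cells and assigns two distinct participants to each. The dataset labels and participant ids are placeholders; the actual assignment procedure is not described in these slides.

```python
from itertools import product

DATASETS = ["dataset_1", "dataset_2", "dataset_3"]  # placeholder labels
TOOLS = ["Cluster/TreeView", "TimeSearcher", "HCE", "Spotfire", "GeneSpring"]
PARTICIPANTS_PER_CELL = 2

# Every (dataset, tool) combination is one cell of the 3x5 design.
cells = list(product(DATASETS, TOOLS))
total_participants = len(cells) * PARTICIPANTS_PER_CELL

# Between-subjects: each participant id appears in exactly one cell.
assignment = {}
next_id = 1
for cell in cells:
    assignment[cell] = [next_id + i for i in range(PARTICIPANTS_PER_CELL)]
    next_id += PARTICIPANTS_PER_CELL

print(len(cells), total_participants)  # 15 30
```

The total of 30 matches the participant categories on the slide (10 + 11 + 9).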
Protocol and Measures
Chose new users with only minimal tool training: success in the initial
usage period is critical for a tool’s adoption by biologists
Participants received initial training: a background description
of the dataset and a 15-minute tool tutorial
Participants listed some analysis questions
Instructed to examine the data with the tool as long as
needed
They were allowed to ask for help about the tool
Simulates training by colleagues
Protocol and Measures
Every 15 minutes, participants estimated the percentage
of the total potential insight they had obtained so far
Finally, they assessed their overall experience with the
tool during the session
Entire session was videotaped for later analysis
Later, all individual occurrences of insights were
identified and codified
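Once insights have been coded, per-tool measures such as the number of insights, the total domain value, and the average time to first insight can be tallied. The sketch below assumes a simple tuple layout and uses invented toy records with hypothetical tool and participant names; it does not reproduce the study's data or its exact scoring.

```python
from collections import defaultdict
from statistics import mean

# (tool, participant, minutes_into_session, domain_value): invented toy records
coded = [
    ("tool_a", "p1", 4.0, 2),
    ("tool_a", "p1", 11.0, 3),
    ("tool_a", "p2", 6.0, 1),
    ("tool_b", "p3", 9.0, 4),
]


def summarize(records):
    """Tally insight count, total domain value, and mean time to first insight per tool."""
    count = defaultdict(int)
    value = defaultdict(int)
    first = defaultdict(dict)  # tool -> participant -> earliest insight time
    for tool, pid, minute, domain_value in records:
        count[tool] += 1
        value[tool] += domain_value
        earliest = first[tool].get(pid)
        if earliest is None or minute < earliest:
            first[tool][pid] = minute
    return {
        tool: {
            "insights": count[tool],
            "total_domain_value": value[tool],
            "avg_time_to_first_insight": mean(first[tool].values()),
        }
        for tool in count
    }


summary = summarize(coded)
print(summary["tool_a"])
```

Time to first insight is averaged over participants rather than over insights, so a talkative participant does not dominate that measure.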
Show me pictures
Here are the tools!!!
Cluster/TreeView = ClusterView
Cluster – clusters the data
TreeView – visualizes the clusters
Uses heat-maps
TimeSearcher 1
Parallel Coordinate
Visualization
Interactive Filtering
Line Graphs for each
data entity
HCE
Clusters data
Several Visualizations
Heat-Maps
Parallel Coordinates
Scatter Plots
Histograms
Brushing and Linking
Spotfire
General Purpose
Visualization Tool
Several Displays
Scatter Plots
Bar Graphs
Histograms
Pie/Line Charts
Others…
Dynamic Query Sliders
Brushing and Linking
GeneSpring
Suitable for Microarray
data analysis
Shows physical positions
on genomes
Array layouts
Pathways
Gene-to-gene comparison
Brushing and Linking
Clustering capability
Enough about Tools,
Tell me the Results!!!
Number of Insights
[Chart: number of insights for ClusterView, TimeSearcher 1, HCE, Spotfire, and GeneSpring]
Spotfire: Highest number of insights
HCE: poorest
Total Domain Value
[Chart: total domain value for ClusterView, TimeSearcher 1, HCE, Spotfire, and GeneSpring]
Spotfire: Highest insight value
HCE, GeneSpring: poorer
Avg. Final Amount Learned
[Chart: average final amount learned for ClusterView, TimeSearcher 1, HCE, Spotfire, and GeneSpring]
Spotfire: high value in learning
ClusterView and HCE are poor
Avg. Time to First Insight
[Chart: average time to first insight for ClusterView, TimeSearcher 1, HCE, Spotfire, and GeneSpring]
ClusterView: very short time to first insight
TimeSearcher 1 and Spotfire are also quick
Avg. Total Time
[Chart: average total time for ClusterView, TimeSearcher 1, HCE, Spotfire, and GeneSpring]
Total time users spent using the tool
Low values: the tool is either efficient or not useful for insight
Unexpected Insights
HCE revealed several unexpected results
ClusterView provided a few
TimeSearcher 1 did so for the time series data
Spotfire contributed to 2 unexpected insights
Hypotheses
A few insights led to hypotheses
Spotfire: 3
ClusterView: 2
TimeSearcher 1: 1
HCE: 1
Tools vs. Datasets
Insight Categories
Overall Gene Expression – overview of genes in general
Expression Patterns – searching for patterns is critical; clustering is useful
Grouping – some users wanted to group genes; GeneSpring enables grouping
Detail Information – users want detailed information about genes that are familiar
to them
Visual Representations and Interactions
Although some tools have many visualization
techniques, users tend to use only a few
Spotfire users preferred heat-maps
GeneSpring users preferred parallel coordinates
Lupus dataset: visualized best with heat-maps
Most users preferred outputs of clustering
algorithms
HCE is not useful when a particular column
arrangement is needed
Running out of time, So, wrap up
Use a Visualization tool (that’s why we’re here!)
Spotfire: best general performance
GeneSpring: Hard to use
Dataset dictates best tool!
Time Series data: TimeSearcher
Others: Spotfire, GeneSpring?
Interaction is the key
Grouping and Clustering are necessary features
Critique
In all fairness, measuring insights is really hard! Here
are some possible issues
Subjectivity
Experiment relies on users always thinking aloud
Also, depends on a domain expert to evaluate insights
Results may vary widely based on participants’ expertise (only
two per tool-dataset pair)
Some insight characteristics are inherently subjective
Domain Value
Breadth vs. Depth
Critique
How does one count insights?
Assumes honest reporting by participants
Some insights may be of no great value
What if a discovery just reaffirms a known fact? Is
that an insight?
Measuring time taken to reach an insight
Maybe instead of measuring from beginning of
session we should measure from last insight
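The two timing conventions contrasted above can be compared with a short sketch. The function names and the timestamps (minutes into a session) are made up for illustration only.

```python
def time_from_start(timestamps):
    """Time to each insight, measured from the beginning of the session."""
    return sorted(timestamps)


def time_since_previous(timestamps):
    """Time to each insight, measured from the previous insight (or session start for the first)."""
    ordered = sorted(timestamps)
    previous = [0.0] + ordered[:-1]
    return [t - p for t, p in zip(ordered, previous)]


stamps = [5.0, 7.0, 20.0]  # made-up insight times in minutes
print(time_from_start(stamps))      # [5.0, 7.0, 20.0]
print(time_since_previous(stamps))  # [5.0, 2.0, 13.0]
```

Under the second convention, a late insight that quickly follows another one is no longer penalized for occurring late in the session.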