Ruebel, O., Weber, G.H., Huang, M.-Y., Bethel, E.W., Biggin, M.D.


Applications of Visualization and Data Clustering to 3D Gene Expression Data
Oliver Rübel [1,2,3,7], Gunther H. Weber [3,7], Min-Yu Huang [1,7], E. Wes Bethel [3], Mark D. Biggin [4,7], Charless C. Fowlkes [5,7], Cris L. Luengo Hendriks [6,7], Soile V. E. Keränen [4,7],
Michael B. Eisen [4,7], David W. Knowles [6,7], Jitendra Malik [5,7], Hans Hagen [2], and Bernd Hamann [1,2,3,7]
1. Institute for Data Analysis and Visualization, University of California, Davis, One Shields Avenue, Davis CA 95616, USA
2. International Research Training Group “Visualization of Large and Unstructured Data Sets,” University of Kaiserslautern, Germany
3. Computational Research Division, Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley CA 94620, USA
4. Genomics Division, Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley CA 94620, USA
5. Computer Science Division, University of California, Berkeley, CA, USA
6. Life Sciences Division, Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley CA 94620, USA
7. Berkeley Drosophila Transcription Network Project, Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley CA 94620, USA, http://bdtnp.lbl.gov/Fly-Net/
Biological Background
Animals comprise dynamic 3D arrays of cells that express gene products in intricate spatial and temporal patterns. These patterns of
gene expression determine the shape and form of the animal. Biologists have typically analyzed gene expression and morphology by
visual inspection of 2D microscopic images. A rigorous understanding of developmental processes requires methods that can
quantitatively analyze these phenomenally complex arrays at cellular resolution.
3D Gene Expression Data
The BDTNP has developed a suite of methods to quantitate the expression of genes in 3D at cellular resolution from whole Drosophila
embryos. Drosophila embryos are first imaged using two-photon fluorescence microscopy. The resulting 3D image stacks
are segmented in order to extract information about the expression of genes on a per-cell basis. Currently, datasets with information
on up to about 100 genes at up to six different time steps are available.
Single Pattern Analysis
Genes are frequently expressed in complex patterns consisting of quantitative differences in
expression between cells of an embryo. Clustering can be used effectively to discretize the
expression pattern of a gene. Discretization of expression patterns can be very useful, e.g., to
create logical models of gene networks. Here the pattern of eve (a) is classified into 2, 3, and
6 levels (b-d). Based on the results shown in (d), seven clusters, each selecting one stripe of
the eve pattern, are created using cluster post-processing techniques. Characteristics of the
seven stripes are revealed in the scatter plot of three eve regulators: gt, hb, and Kr.
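The discretization of a single expression pattern into a fixed number of levels, as described above for eve, can be illustrated with a small sketch. The poster does not state which clustering algorithm was used, so the 1D k-means (Lloyd's algorithm) below is an assumption chosen for simplicity. The function name `discretize_expression` and the sample values are hypothetical and are not BDTNP data.

```python
def discretize_expression(values, k, iterations=50):
    """Assign each cell's expression value to one of k levels with 1D k-means.

    This is an illustrative stand-in; the clustering method actually used
    for the poster's figures is not specified in the source.
    """
    lo, hi = min(values), max(values)
    # Deterministic start: spread the k centers evenly over the value range.
    centers = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each cell goes to the nearest center.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Update step: move each center to the mean of its member cells.
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


# Hypothetical per-cell expression values for one gene (not real data).
eve_values = [0.02, 0.05, 0.10, 0.48, 0.52, 0.55, 0.90, 0.95, 0.97]
levels, level_centers = discretize_expression(eve_values, k=3)
print(levels)
```

Because the centers start in ascending order, the returned labels can be read directly as ordered expression levels (0 = lowest), which is the form a logical gene network model would typically consume.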
PointCloudXplore: A Framework for Visualization and Clustering of 3D Gene Expression Data
In our software, PointCloudXplore, we have linked dedicated physical and information visualization views of the data via the
concept of brushing (cell selection). A user can select and highlight cells of interest in any view. All brushes (cell selectors) are then
stored in a central cell selector management system, allowing one to highlight all selections in any view. Data clustering provides
a means for automatic detection and definition of data features by classifying cells into groups of similar behavior, the
clusters. Clusters, each defining a selection of cells, can be managed and visualized in the same way as user-defined cell selections.
Visualization is used for validation and improvement of clustering results, while clustering is used to analyze the data as well as to
improve the visualization. For improvement of clustering results we have developed dedicated cluster post-processing techniques,
such as splitting, merging, and filtering of clusters based on spatial cell positions.
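The idea of a central store in which brushes and clusters are both just named sets of cells can be sketched as follows. This is not PointCloudXplore's actual API. The class `CellSelectorManager`, its method names, and the sample positions are hypothetical and only illustrate how merging and position-based filtering could operate on such a store.

```python
class CellSelectorManager:
    """Hypothetical central store of named cell selections (brushes and clusters)."""

    def __init__(self, num_cells):
        self.num_cells = num_cells
        self.selectors = {}  # selector name -> set of selected cell indices

    def add(self, name, cell_indices):
        self.selectors[name] = set(cell_indices)

    def merge(self, new_name, names):
        """Merge several selectors into one (union of their cells)."""
        self.add(new_name, set().union(*(self.selectors[n] for n in names)))

    def filter_by_position(self, new_name, name, positions, predicate):
        """Keep only the cells of a selector whose position satisfies predicate."""
        kept = {i for i in self.selectors[name] if predicate(positions[i])}
        self.add(new_name, kept)

    def highlight_mask(self):
        """For every cell, list the selectors containing it, so any view can highlight it."""
        return [sorted(n for n, s in self.selectors.items() if i in s)
                for i in range(self.num_cells)]


# Hypothetical (x, y, z) cell positions; x runs from anterior (0) to posterior (1).
positions = [(0.1, 0.0, 0.0), (0.2, 0.0, 0.0), (0.7, 0.0, 0.0), (0.9, 0.0, 0.0)]

manager = CellSelectorManager(num_cells=4)
manager.add("brush", [0, 1])          # e.g. drawn by the user in a scatter plot
manager.add("cluster", [1, 2, 3])     # e.g. produced by data clustering
manager.filter_by_position("cluster_anterior", "cluster", positions,
                           lambda p: p[0] < 0.5)
manager.merge("both", ["brush", "cluster"])
print(manager.highlight_mask())
```

Treating clusters and brushes identically is what lets every view highlight every selection without knowing where it came from.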
Temporal Variation Analysis
Gene expression patterns are not static but are highly dynamic. Understanding the temporal profile of a gene expression pattern is
therefore essential if we are to understand complex relationships between genes. To assist in the analysis of the spatio-temporal
expression pattern of genes we use PointCloudXplore to cluster cells into groups based on the similarity of their temporal expression
profiles. The example here shows the classification of the
spatio-temporal pattern of giant (gt) expression. Cluster
statistics, such as average temporal expression profiles
of clusters, reveal the complex changes of gene patterns
and allow quantitation of their temporal variation.
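Grouping cells by the similarity of their temporal expression profiles, and then reading off each cluster's average profile, might look like the sketch below. The poster does not name the algorithm, so k-means with a simple deterministic initialization is an assumption. The function names and the gt-like sample profiles are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two temporal profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_profile(rows):
    """Average expression per time step over a group of cells."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def cluster_temporal_profiles(profiles, k, iterations=100):
    """Cluster cells by temporal profile (k-means; an assumed, illustrative choice).

    Returns (labels, average_profiles), one average profile per cluster.
    """
    # Deterministic start: take k profiles spread evenly through the input.
    step = len(profiles) / k
    centers = [list(profiles[int(i * step)]) for i in range(k)]
    labels = None
    for _ in range(iterations):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                      for p in profiles]
        if new_labels == labels:
            break  # assignments are stable, so the centers are too
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:
                centers[c] = mean_profile(members)
    return labels, centers


# Hypothetical profiles: one row per cell, one column per time step.
gt_profiles = [
    [0.1, 0.5, 0.9], [0.2, 0.6, 1.0], [0.0, 0.4, 0.8],   # expression rising
    [0.9, 0.5, 0.1], [1.0, 0.6, 0.2], [0.8, 0.4, 0.0],   # expression falling
]
profile_labels, average_profiles = cluster_temporal_profiles(gt_profiles, k=2)
print(profile_labels)
print(average_profiles)
```

The average profiles returned here correspond to the kind of cluster statistic the text mentions for quantifying temporal variation.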
[Figure: PointCloudXplore system diagram. Components: Visualization (Physical Views, Abstract Views), Cell Selector Statistics, Cell Selector Management, Data Selection, Data Clustering, Clusters, Post-Processing.]
Multiple Pattern Analysis
To dissect the complex regulatory interactions between genes, the expression patterns of multiple potential regulatory transcription
factors can be used as input to cluster analysis. Cells are classified into clusters that have similar combinations of expression for the
input set of regulators. Each cluster describes one potential sub-pattern that a regulatory network composed of these factors could give
rise to. The results of such a clustering can also be compared to the expression patterns of suspected target genes to assess possible
regulatory relationships. Here, the patterns of the genes giant (gt), hunchback (hb), and Krüppel (Kr) have been used as input to the
clustering. Clustering results are compared to stripe two of the eve expression pattern, suggesting that the anterior and posterior
borders of the stripe as well as the ventral dip in eve expression can be modeled using gt, hb, and Kr expression levels.
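One way to make the comparison between regulator-based clusters and a suspected target pattern quantitative is an overlap score. The poster does not describe how the comparison was performed (it may well have been visual), so the Jaccard index, the expression threshold, and the function names below are assumptions for illustration only. The cluster labels are taken as given, for example from clustering cells on their gt, hb, and Kr values.

```python
def jaccard(a, b):
    """Jaccard index of two sets of cell indices (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_clusters_against_target(cluster_labels, target_expression, threshold):
    """Rank clusters by overlap with the cells expressing the target gene.

    cluster_labels[i] is the cluster of cell i; target_expression[i] is the
    target gene's expression in cell i. Cells at or above the (assumed)
    threshold count as part of the target pattern.
    """
    target_cells = {i for i, v in enumerate(target_expression) if v >= threshold}
    clusters = {}
    for i, lab in enumerate(cluster_labels):
        clusters.setdefault(lab, set()).add(i)
    scores = {lab: jaccard(cells, target_cells) for lab, cells in clusters.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical inputs: cluster labels from the regulators, and target expression.
regulator_clusters = [0, 0, 1, 1, 1, 2, 2, 2]
target_values = [0.1, 0.2, 0.9, 0.8, 0.7, 0.1, 0.6, 0.0]
ranking = rank_clusters_against_target(regulator_clusters, target_values, threshold=0.5)
print(ranking)
```

A cluster that scores highly marks a combination of regulator levels that coincides with the target pattern, which is a starting point for a hypothesis rather than evidence of regulation.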
Clustering-based False Coloring
Using hierarchical clustering, one can define a linear order of the cells. This linear order can be used as a basis for false coloring of the
data. By defining ranges in this linear cell order, one can also easily define data features based on cell similarity.
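The step from a hierarchical clustering to a linear cell order and then to colors can be sketched as follows. The poster does not specify the linkage criterion or the color map, so naive average linkage and a hue ramp are assumptions, and the function names and sample data are hypothetical. The naive implementation is only practical for small inputs.

```python
import colorsys


def hierarchical_cell_order(expression):
    """Return cell indices in dendrogram leaf order (naive average linkage)."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(expression[a], expression[b])) ** 0.5

    def linkage(c1, c2):
        return sum(dist(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))

    clusters = [[i] for i in range(len(expression))]
    while len(clusters) > 1:
        # Find the closest pair of clusters.
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        # Merging concatenates the leaf lists, so similar cells end up adjacent.
        merged = clusters[i] + clusters[j]
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)] + [merged]
    return clusters[0]


def false_colors(order):
    """Map each cell's rank in the linear order to an RGB color via its hue."""
    n = len(order)
    hues = [0.0] * n
    for rank, cell in enumerate(order):
        hues[cell] = rank / n
    return hues, [colorsys.hsv_to_rgb(h, 1.0, 1.0) for h in hues]


# Hypothetical expression of two genes in four cells; cells 0/2 and 1/3 are similar.
cells = [[0.9, 0.1], [0.1, 0.9], [0.85, 0.15], [0.15, 0.85]]
order = hierarchical_cell_order(cells)
hues, colors = false_colors(order)
feature = order[0:2]  # a range in the linear order selects a group of similar cells
print(order, feature)
```

Selecting a slice of `order` is the counterpart of defining a range in the linear cell order: because similar cells sit next to each other, a contiguous range picks out a group of similar cells.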
[Figure panels: hunchback (hb), tailless (tll), Krüppel (Kr), giant (gt)]