D. melanogaster - Institute for Data Analysis and Visualization

Computational embryology can reveal subtle
differences between species
On computational analysis of quantitative, 3D spatial
expression in Drosophila blastoderm
With cellular resolution data, new morphological features and morphogenetic
phenomena can be discovered, which will also lead to new insights in comparative
embryology.
Soile V. E. Keränen1, Angela DePace2, Ann Hammonds1, Bill Fisher1, Oliver Rübel1, Gunther Weber1, Clara
Henriquez1, Charless Fowlkes3, Cris L. Luengo Hendriks4, E. Wes Bethel1, Hans Hagen5, Bernt Hamann6,
Jitendra Malik7, Susan E. Celniker1, David W. Knowles1, Michael B. Eisen7, Mark D. Biggin1
1) Lawrence Berkeley National Laboratory, Berkeley, CA, USA; 2) Harvard Medical School, Boston, MA, USA; 3) UC Riverside,
Riverside, CA, USA; 4) Uppsala University, Uppsala, Sweden; 5) University of Kaiserslautern, Kaiserslautern, Germany; 6) UC Davis,
Davis, CA, USA; 7) UC Berkeley, Berkeley, CA, USA
3D cellular resolution quantitation of spatial gene expression
(http://bdtnp.lbl.gov/)
[Figure panels labeled eve and Kr; cohort labels t5:26-50%, t5:51-75%, t5:76-100%]
BDTNP data
>6800 PointClouds for stages 4 and 5
~3.5 Tb raw image data
~6.8 Gb stage 4 and 5 PointCloud data
Nuclear segmentation
Different views of 6 genes in a D. melanogaster VirtualEmbryo displayed in
PointCloudXplore (see Rübel et al., 2006). Top, a 3D embryo view. Center, a
cylindrical projection. Bottom, a cylindrical projection with a height map
showing gene expression levels.
id,x,y,z,Nx,Ny,Nz,Da,Db,Vn,Vc,eve,ftz,hb,kni,kr,rho,slp1,sna,tll,croc,fkh,twi,trn,gt
******
1009,142.45,124.288,38.4751,-0.2286,0.37424,-0.89871,0,0,317.559,864.635,13.7058,81.2576,14.1843,21.9375,4.9696,7.9223,10.912,6.5325,18.3249,17.3787,18.499,10.9727,38.7896,80.
2018,159.302,123.878,164.709,-0.10815,0.3775,0.91967,0,0,308.755,865.85,70.6029,10.0973,25.2076,17.8543,8.8861,33.3451,13.8321,58.1229,17.7753,27.252,20.3735,70.5197,10.4731,2
3027,111.273,115.159,158.502,-0.23647,0.3794,0.8945,0,0,268.984,685.818,12.2596,5.7213,8.0693,69.3157,6.614,28.261,70.3689,58.5122,15.6545,22.3175,15.5599,63.354,45.7422,40.88
4036,341.931,30.2893,75.7972,0.18502,-0.94545,-0.26812,0,0,231.338,537.361,19.9463,53.7362,65.8337,21.3702,8.2404,19.4855,8.0923,5.6487,25.3556,18.5914,22.0013,10.0438,22.6887
>130 genes including:
D. pseudoobscura and D. yakuba wild type
23 CRM-lines in D. melanogaster background
CRMs showing weak to moderate blastoderm expression are
expressed at much higher levels in late embryos, often at multiple
stages and cell types.
CRM-8493
[Figure: relative expression intensity (%) plotted between the axis labels D and V for
D. melanogaster stripe 1, D. melanogaster stripe 5 and D. pseudoobscura stripe 5, in
stage 5 cohorts 4-8%, 9-25%, 26-50%, 51-75% and 76-100%; y-axis ticks 0 to 80.
Sample sizes per cohort as listed: n = 126, 192, 390, 362, 289 and n = 54, 194, 174, 55, 77.]
To make cellular resolution comparisons between the
expression output of a CRM-transgene and the wild
type expression, we have measured the 3D
expression patterns of a set of CRM-transgenes and
aligned them to a VirtualEmbryo containing wild type
gene expression patterns.
[Figure panel: D. pseudoobscura stripe 1; cohorts 4-8%, 9-25%, 26-50%, 51-75% and 76-100%]
3D analyses of cis-regulatory output show subtle quantitative differences to the wild type pattern
The atlas data can be used for computational analyses,
e.g., to make predictions on putative regulators of
spatial gene expression domains (Fowlkes et al., 2008)
1 Mb text file for computational analyses
To analyze how sequence is read out as an expression
pattern, the BDTNP is using expression constructs to
test suspected or known cis-regulatory modules (CRMs)
identified by sequence analysis. Visual examination of
2D photographs shows that many of these putative CRMs
partially recapitulate the wild type blastoderm patterns.
[Figure axis label: % egg length]
The density differences between the species are likely to depend on subtle
quantitative differences in the expression patterns of developmental regulators and
in the cellular responsiveness to developmental signaling between the species.
PointCloud file excerpt, continued (rows are truncated in the source):
5045,168.5,25.747,67.1179,-0.10967,-0.90025,-0.42134,0,0,224.052,630.261,51.517,8.0767,22.2996,23.0654,11.2735,46.5546,12.3596,5.6873,15.8975,24.598,25.5458,9.9865,16.8356,20.
6054,119.682,108.526,168.746,-0.19491,0.21631,0.95667,0,0,170.923,505.484,67.6004,6.7196,10.9492,24.8757,7.5595,58.568,66.6675,36.9671,23.4779,15.9494,18.4064,59.0062,17.4559,
1,175.229,155.285,55.1238,-0.14639,0.80211,-0.57896,0,0,285.682,582.293,10.2466,89.2554,23.3279,18.3,10.1307,16.224,13.789,5.0749,15.5772,21.7618,22,15.5339,34.8946,16.5044
...
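The excerpt rows shown in this transcript are comma-separated, with a header line naming the columns (id, x, y, z, ..., eve, ftz, ...) followed by a "******" line. As a minimal sketch, assuming a file laid out like that excerpt, the rows could be read into per-nucleus records as follows. The function name read_pointcloud is hypothetical, the sample is cut after the ftz column for brevity, and real PointCloud files may contain additional metadata that this sketch does not handle.

```python
import csv
import io

# Header and rows copied from the excerpt, cut after the ftz column for brevity.
SAMPLE = """id,x,y,z,Nx,Ny,Nz,Da,Db,Vn,Vc,eve,ftz
******
1009,142.45,124.288,38.4751,-0.2286,0.37424,-0.89871,0,0,317.559,864.635,13.7058,81.2576
2018,159.302,123.878,164.709,-0.10815,0.3775,0.91967,0,0,308.755,865.85,70.6029,10.0973
3027,111.273,115.159,158.502,-0.23647,0.3794,0.8945,0,0,268.984,685.818,12.2596,5.7213
"""


def read_pointcloud(text):
    """Parse a header line plus numeric rows into a list of dicts keyed by column name."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    nuclei = []
    for row in reader:
        # Skip blank lines and non-data lines such as the '******' separator.
        if len(row) != len(header):
            continue
        record = {name: float(value) for name, value in zip(header, row)}
        record["id"] = int(row[0])  # keep the nucleus id as an integer
        nuclei.append(record)
    return nuclei


nuclei = read_pointcloud(SAMPLE)
eve_levels = {n["id"]: n["eve"] for n in nuclei}
print(eve_levels)
```

Keying each record by column name keeps later analyses independent of column order, which helps when different PointClouds carry different gene sets.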
The Berkeley Drosophila Transcription Network Project (BDTNP) is
developing cellular resolution, quantitative expression maps of
Drosophila embryos in a computationally analyzable format. Files
representing data from one embryo are called PointClouds. Multiple
PointClouds are then aligned to a common framework, termed a
VirtualEmbryo, to allow modeling and simulation of the regulation
of multiple genes in a 3D environment.
nuclei / µm2
Analysis of Drosophila melanogaster blastoderm (left) shows significant anterior-posterior (AP) and
dorsal-ventral (DV) nuclear density differences. Similar analyses on D. pseudoobscura blastoderm
(right) show analogous but different density patterns.
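The poster does not state how the density maps were computed. As an illustration only, one simple way to estimate a local density in nuclei per unit area is to count the nuclei within a fixed radius of a point and divide by the area of that disc. The sketch below assumes a flat 2D patch (the blastoderm surface is curved, so this would only hold locally), and the function name local_density and the grid data are hypothetical. With coordinates in µm the result would be in nuclei / µm2.

```python
import math


def local_density(points, center, radius):
    """Nuclei per unit area within `radius` of `center` on a flat 2D patch."""
    cx, cy = center
    count = sum(1 for x, y in points if math.hypot(x - cx, y - cy) <= radius)
    return count / (math.pi * radius ** 2)


# Synthetic test patch: nuclei on a square grid, one nucleus per square unit.
grid = [(float(x), float(y)) for x in range(-10, 11) for y in range(-10, 11)]
density = local_density(grid, center=(0.0, 0.0), radius=5.0)
print(f"{density:.3f} nuclei per square unit")
```

On the synthetic grid the estimate comes out close to the true value of one nucleus per square unit; the small deviation reflects the discreteness of counting points inside a disc.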
Drosophila melanogaster wild type + 7 mutants
300 Mb image stack
For Drosophila blastoderm stage embryos, we now have
data for the protein and mRNA expression of over 100
genes, 7 patterning mutant strains, 23 transgenic
promoter constructs, and 3 Drosophila species.
The development of species specific morphologies results from complex,
quantitative action of gene expression networks. The analysis of such
networks requires computationally analyzable, cellular resolution
datasets of spatial gene expression. The methods to construct them
should be as robust and automated as possible to increase the speed of
analysis and to decrease the error.
Computational analyses of gene expression show that the qualitatively similar eve stripe 1 and eve
stripe 5 differ in relative intensity between D. melanogaster and D. pseudoobscura. The
error bars show 95% confidence intervals.
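The poster does not say how the 95% confidence intervals were calculated. A minimal sketch, assuming the common normal approximation (mean plus or minus 1.96 standard errors of the mean) applied to the per-embryo relative intensities of one stripe in one cohort, could look like this. The function name mean_with_ci95 and the sample values are hypothetical.

```python
import math
import statistics


def mean_with_ci95(values):
    """Return (mean, half_width) using the normal approximation 1.96 * SEM."""
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, 1.96 * sem


# Hypothetical relative intensities (%) for one stripe in one cohort.
intensities = [10.0, 12.0, 14.0, 16.0, 18.0]
mean, half_width = mean_with_ci95(intensities)
print(f"{mean:.2f} +/- {half_width:.2f}")
```

For small cohorts a t-distribution multiplier would give wider, more conservative intervals than the fixed 1.96 used here.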
CRM-8198
We think that small quantitative expression differences are very common even between
closely related species and that they play a significant role in the evolution of
species-specific morphologies and in speciation in general.
CRM-8331
We found that most of the studied CRM-transgenes
show quantitative differences in expression, some
subtle and some pronounced, compared to intact genes.
While discrepancies between CRM and intact gene
patterns have been noted in a few cases before (e.g.,
Schroeder et al., 2004), more often the similarities
have been emphasized. In comparison, our results
suggest that quantitative and qualitative differences
are so common that the current gene regulatory
models based on CRM sequences need to be
calibrated against actual experimental data.
Three example D. melanogaster
CRM-transgenes with blastoderm
patterns, 2D photographs vs 3D
expression intensity maps: 8001
(above) accurately replicates gt
posterior band; 8086 (right) has
some similarity to kni pattern but
with clear spatial and quantitative
a/p and d/v differences; 8010 (far
right) replicates odd stripes 3 and 6
but with differences in d/v intensity
profile. (red CRM-LacZ, green wild
type expression pattern)
Cellular resolution data can also be used for rapidly
measuring features that are in principle
analyzable by hand counts, such as the total
number of nuclei and egg size, in large numbers of
embryos (left). With a sufficiently large sample size,
we can show that nuclear numbers at stage 5 scale
with egg size in both Drosophila species, and that
the distributions of nuclear numbers differ between
the species, though individual embryos may fall within
the range of the other species.
Three example D. melanogaster CRMs with weak early pattern and strong late pattern
in D. melanogaster embryos.
We believe that many CRMs that drive early gene expression are
multifunctional. This is likely to impose extra evolutionary
constraints on the sequence and structure of the CRM-promoter
combinations. It is also possible that the early expression patterns
are largely non-functional and simply tolerated because of the
later gene function. Such patterns would be free to acquire new,
essential functions, thus imposing further constraints on the CRM.
Further basic measurements using large
data sets are likely to reveal further
developmental rules, or constraints to
morphological evolution.
Future prospects
Interactive displays in computational modeling and mining the PointCloud data: PointCloudXplore/Matlab interface
While the spatial patterns of regulator availability (network input)
may have multiple implications for the function and evolution of
regulatory sequences, a VirtualEmbryo represents the spatial
expression profile of a blastoderm as a text file tens of Mb in size,
which can be nonintuitive to an average biologist. Hence, to
facilitate the use of these quantitative cellular resolution embryo
atlases, we have provided a visualization and analysis tool,
PointCloudXplore, which can display the expression data both as
an embryo and as an abstract view.
To increase the flexibility of PointCloudXplore, we have now added
to it a Matlab interface, which can be used for exporting the
PointCloud data for Matlab analyses, running the analysis scripts
and/or importing the results back into PointCloudXplore (Rübel et
al., submitted). This enables versatile use of various Matlab
functions for rapid development of new analysis and data plotting
scripts without having to alter PointCloudXplore itself.
As an example of a newly added data
mining capability, we selected the early and
late hb expression patterns (above right)
with the PointCloudXplore/Matlab interface. An
associated short Matlab script then
computed their difference, which was returned
to PointCloudXplore to be viewed as an
artificial expression channel (below right).
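The poster's script was written in Matlab and is not reproduced here. Purely as an illustration of the idea, the following Python sketch computes a per-nucleus difference between two expression channels that cover the same nuclei in the same order. The function name difference_channel, the sign convention (late minus early) and the sample values are all assumptions.

```python
def difference_channel(early, late):
    """Per-nucleus difference (late minus early) between two expression channels."""
    if len(early) != len(late):
        raise ValueError("channels must cover the same nuclei in the same order")
    return [l - e for e, l in zip(early, late)]


# Hypothetical hb levels for four nuclei at an early and a late time point.
hb_early = [80.0, 60.0, 20.0, 5.0]
hb_late = [70.0, 65.0, 10.0, 30.0]
hb_change = difference_channel(hb_early, hb_late)
print(hb_change)
```

The resulting list has one value per nucleus, so it can be treated like any other expression channel when it is handed back to a viewer.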
To better disseminate the additional data mining capabilities that are now
easy to write and implement, we are planning a curated databank where users
can submit or download scripts for manipulating PointCloud data.

The schematic view of the PointCloudXplore/Matlab interface: 1) The various
expression views and the built-in data filtering capabilities in PointCloudXplore
allow easy selection of genes of interest. 2) The exported data is analyzed using a
Matlab script. 3) The output is returned to PointCloudXplore for easy visualization
or further data filtering.

The PointCloudXplore/Matlab interface can also be used for more complicated
simulation and analysis programs. We have used it to run a genetic algorithm to
see if we can identify potential regulators of individual eve stripes. We found
that while the simulations were quite successful in identifying the known
regulators of eve stripe 2, the results for eve stripe 7 were less promising. We
assume that this is due to an insufficient gene regulatory model for our
optimization algorithm. We also found that eve stripes 2 and 7 could theoretically
be co-regulated, as also suggested by some earlier authors (Hare et al. 2008,
Janssens et al. 2006).

[Figure panels: predicted regulators, target pattern, output pattern]
eve stripe 2: corr=97.31%
eve stripe 7: corr=88.47%
eve stripe 2+7: corr=89.44%

The input data (above right) used for the ab initio attempt to find potential
regulators of individual eve stripes, and the results (right). The predictions that
agree with experimental data are shown in green, the predictions that disagree are
shown in red, and the results that neither agree nor disagree with experimental data
are shown in black, as are the stripe 2+7 results. The stripe 7 results show that the
output of a regulatory network can be mimicked by a set of regulators that is
different from the experimentally verified one.

Since differences between ab initio modeling and biological experiments can be
useful both for revealing weaknesses in theoretical models and for discovering new
phenomena, we believe that cellular resolution computer atlases of embryos and
organs, and in silico experimentation with these, will become an important tool in
biology, parallel to in vitro and in vivo experiments.

Uses of cellular resolution embryo atlases:
Digital reference maps of gene expression and fine level anatomical detail
from anatomy to systems genomics (“computational histology”)
Computational platforms for modeling and simulating gene regulatory network
function and pattern formation in a realistic environment (“virtual embryos as
in silico experimental model organisms”)
Novel form of computational biology resources for data mining and rapid
multidimensional analyses of morphogenesis, gene expression and potential
modes of microevolution and speciation

We are currently:
Analyzing the transcription factor binding in putative CRMs to combine
ChIP-chip and ChIP-seq data with the spatial pattern data
Expanding the expression pattern data set for wild type target gene mRNAs,
wild type regulator proteins and CRM-LacZ reporter strains
Expanding the methods for generating cellular resolution embryo atlases in
later stage Drosophila embryos
Developing approaches for computational analyses of spatial PointCloud data

Future needs:
PointCloud maps of multiple late embryonic stages and more species
Developments in dissemination of computational tools and datasets to the
Drosophila community and developmental biologists in general
Development of an interactive ontology map for the Drosophila embryo describing
each cell’s organ and tissue identity, expression profile, and lineage
More sophisticated algorithms and computational strategies for handling and
analysis of spatial expression data and for integrating it with other forms of
genomic data (sequence, microarray, protein-interaction, etc.)
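The poster reports a "corr" percentage for each eve stripe fit but does not define it. One plausible reading, offered here only as an assumption, is a Pearson correlation between the target pattern and the model's output pattern, evaluated per nucleus. The sketch below shows how such a score could be computed; the function name pearson and the two example patterns are hypothetical and are not BDTNP data.

```python
import math


def pearson(target, output):
    """Pearson correlation coefficient between two equal-length patterns."""
    n = len(target)
    mean_t = sum(target) / n
    mean_o = sum(output) / n
    cov = sum((t - mean_t) * (o - mean_o) for t, o in zip(target, output))
    var_t = sum((t - mean_t) ** 2 for t in target)
    var_o = sum((o - mean_o) ** 2 for o in output)
    return cov / math.sqrt(var_t * var_o)


# Hypothetical per-nucleus values: a target stripe and a model's output.
target_pattern = [0.0, 10.0, 80.0, 90.0, 15.0, 0.0]
output_pattern = [5.0, 20.0, 70.0, 85.0, 30.0, 5.0]
score = pearson(target_pattern, output_pattern)
print(f"corr={100 * score:.2f}%")
```

A score of this kind could serve as the fitness value in an optimization such as a genetic algorithm, although a high correlation alone does not show that the predicted regulators are the biologically correct ones, as the stripe 7 result illustrates.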