Inference of Signaling Networks Using
Quantitative Morphological Signatures:
Parallel Computing Framework
Oaz Nir
18.337J
May 13, 2008
Challenges in Describing Signaling Networks
1 Connectivity
2 Flow of Signaling Information
3 Subcellular Distribution of Local Networks
Cell Morphology = Signaling State
[Figure: example cell images, including a normal cell, for three activation states; annotated measurements include perimeter]
Rho activation (GTP-bound Rho): active Rho gives “stress fibers”; Morphological Signature = F1, F2, F3, …, Fn
Rac activation (GTP-bound Rac): active Rac gives “ruffles/lamellipodia”; Morphological Signature = F1, F2, F3, …, Fn
Cdc42 activation (GTP-bound Cdc42): active Cdc42 gives “spiky” morphology; Morphological Signature = F1, F2, F3, …, Fn
Understanding How Signaling Networks that Regulate Morphology are Organized and How Information Flows through these Networks
[Diagram components: Transmembrane Receptors, GPCRs/G-proteins, Polarity Complexes, Adhesion Structures, Rho GTPases with their GEFs and GAPs, Actin regulators, Actin/MT coordinators, MT regulators, Lipid Regulators]
Acquiring Morphological Signatures from Complex Images
1. Cell Culturing: + GFP, +/- dsRNA, +/- gene overexpression; N Treatment Conditions (TCs)
2. Image Acquisition (GFP); channels: DAPI, GFP, F-Actin
3. CellSegmenter: GFP image to cell segment
Output: normalized feature values for each cell under each condition (cell “x1”, condition “a1”; cell “x2”, condition “a1”; …)
Example features: Half Mass from Centroid, Half Mass from Boundary, Ruffle Area, Edge, Process Area, Drainage Area, Gaussian Fit, Low Smooth/Best Ellipse Fit, High Smooth/Best Ellipse Fit
x=0.358
y=0.357
=-0.248
Raw Morphological Data and Data Reduction
145 phenotypic features
[Figure: classes A and B plotted over all features (Feature a, Feature b, …, Feature y, …, Feature n); Reduce Dimensionality; Classifier (test)]
Using Feature Graphs to Model Single-cell Distributions
For each dsRNA treatment, define a graph as follows:
• Draw a vertex for each feature (neural network classifier)
• Draw an edge if the correlation between the corresponding features among the single-cell data exceeds a threshold
[Figure: example feature graphs on vertices i, j, k. dsRNA x, feature values i,j,k: -3i,-3j,-4k and 3i,3j,-4k. dsRNA y, feature values i,j,k: 2i,2j,-2k and 4i,4j,-4k]
• Create structures that allow inference of signaling pathways
• Utilize single-cell data
• Linear correlations are fast and easy to compute
• Graphs on the same vertex set are comparable by various algorithms from graph theory
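The construction on this slide can be sketched in a few lines of Python. This is only an illustration, not the code used in the talk: the names `pearson` and `feature_graph`, the threshold of 0.9, the use of the absolute value of the correlation, the handling of constant features, and the toy data are all assumptions made for the example.

```python
import math
from itertools import combinations

def pearson(u, v):
    """Pearson correlation of two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    if var_u == 0 or var_v == 0:
        return 0.0  # assumption: a constant feature gets no edges
    return cov / math.sqrt(var_u * var_v)

def feature_graph(cells_by_feature, threshold=0.9):
    """Edge set of the feature graph for one treatment condition.

    cells_by_feature maps a feature name to its values across the single
    cells of that condition. Vertices are the feature names; an edge joins
    two features whose |correlation| exceeds the threshold.
    """
    edges = set()
    for f, g in combinations(sorted(cells_by_feature), 2):
        if abs(pearson(cells_by_feature[f], cells_by_feature[g])) > threshold:
            edges.add((f, g))
    return edges

# Toy condition with three features measured on five cells (made-up values).
toy = {
    "i": [1, 2, 3, 4, 5],
    "j": [2, 4, 6, 8, 11],
    "k": [5, 1, 4, 2, 3],
}
print(feature_graph(toy))
```

Because every condition yields a graph on the same vertex set, the resulting edge sets can be compared directly with ordinary set operations.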
Inference Based on Feature Graphs
This is the unknown signaling network we will infer. For this slide, assume we know the signaling network ahead of time.
Intuition for Inference Based on Feature Graphs
Question: What is the relationship between feature graphs of genes in a signaling pathway?
[Diagram: pathway with genes A, B, and C, annotated with the feature sets [F1, F2], [F1, F2, F3], and [F1, F2, F4]]
RNAi Gene A: expect feature graph to have a relatively large number of edges
RNAi Gene B: expect feature graph to have a relatively large number of edges
RNAi Gene C: expect feature graph to have a relatively small number of edges; feature graph is approximately the intersection of the feature graphs for RNAi A and RNAi B
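The intersection idea can be illustrated with plain edge sets. The sketch below assumes the diagram labels map as A to [F1, F2, F3], B to [F1, F2, F4], and C to [F1, F2], and, purely for illustration, treats each feature list as a fully connected graph. The helper names `complete_graph` and `jaccard` are hypothetical, and the Jaccard score is just one possible way to compare two graphs on the same vertex set, not necessarily the comparison used in the talk.

```python
from itertools import combinations

def complete_graph(features):
    """Hypothetical stand-in: all edges among the given features."""
    return set(combinations(sorted(features), 2))

def jaccard(edges_1, edges_2):
    """Shared edges divided by total distinct edges (1.0 if both are empty)."""
    union = edges_1 | edges_2
    return len(edges_1 & edges_2) / len(union) if union else 1.0

graph_a = complete_graph(["F1", "F2", "F3"])  # RNAi Gene A: many edges
graph_b = complete_graph(["F1", "F2", "F4"])  # RNAi Gene B: many edges

# Expected graph for RNAi Gene C: the edges shared by A and B.
predicted_c = graph_a & graph_b
print(predicted_c)

# An observed graph for C can then be scored against the prediction.
observed_c = complete_graph(["F1", "F2"])
print(jaccard(predicted_c, observed_c))
```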
Focus on Details of Feature Graph Construction
Drosophila Data Set
N = 250 Treatment Conditions (TCs)
C = 50 cells per TC
F = 150 features per cell
(future data sets will be larger)
For each dsRNA treatment (TC), define a feature graph (FG) as follows:
• Draw a vertex for each feature (neural network classifier)
• Draw an edge if the correlation between the corresponding features among the single-cell data exceeds a threshold
For each FG, we need to compute a linear correlation over C = 50 data points for all F*(F-1)/2 = 150*149/2 = 11,175 pairs of features.
Since there are N = 250 TCs, there are a total of N*F*(F-1)/2 = 2,793,750 linear correlations to compute.
How to compute all pairwise correlations efficiently?
The full correlation matrix can be built from two matrix products: a matmul of FxC and CxF, and a matmul of Fx1 and 1xF.
Computation is dominated by the matmul of FxC and CxF.
** Matlab built-in “corr” does not work with ppeval
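One way to read the two products named on this slide: with X the FxC data matrix of one TC and m the Fx1 vector of feature means, the covariance matrix is X*X'/C - m*m', and dividing entry (i, j) by the standard deviations of features i and j gives the correlation. The pure-Python sketch below follows that reading. It is an assumption about the intended formula, not the talk's Matlab code, and the helper names are made up for the example. In Matlab or NumPy each product would be a single call.

```python
import math

def transpose(a):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b]
            for row in a]

def correlation_matrix(x):
    """All pairwise Pearson correlations between the rows of x (F x C).

    Assumes no feature is constant across cells; a constant row would
    give a zero standard deviation and a division by zero.
    """
    f, c = len(x), len(x[0])
    means = [[sum(row) / c] for row in x]        # F x 1
    gram = matmul(x, transpose(x))               # (F x C)(C x F): dominant cost
    outer = matmul(means, transpose(means))      # (F x 1)(1 x F)
    cov = [[gram[i][j] / c - outer[i][j] for j in range(f)] for i in range(f)]
    std = [math.sqrt(cov[i][i]) for i in range(f)]
    return [[cov[i][j] / (std[i] * std[j]) for j in range(f)]
            for i in range(f)]

# Three features measured on four cells (made-up values).
x = [
    [1, 2, 3, 4],
    [2, 4, 6, 8],   # perfectly correlated with the first row
    [4, 3, 2, 1],   # perfectly anti-correlated with the first row
]
r = correlation_matrix(x)
print(r[0][1], r[0][2])
```

The FxC by CxF product costs on the order of F*F*C multiplications, while the outer product costs only F*F, which matches the slide's remark about which term dominates.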
Parallelize in Dimension of TCs
Speed-up?
[Figure: Serial vs. Parallel]
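Since each TC's feature graph depends only on that TC's cells, the TCs form independent tasks that can be mapped over workers. The sketch below shows that structure with Python's standard `concurrent.futures`. It is not the ppeval setup mentioned in the slides, and the function names, threshold, worker count, and toy data are assumptions. Threads are used only to keep the example self-contained; for pure-Python arithmetic like this they illustrate the task layout and should not be expected to run faster than the serial loop.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations

def pearson(u, v):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    if var_u == 0 or var_v == 0:
        return 0.0
    return cov / math.sqrt(var_u * var_v)

def count_edges(tc, threshold=0.9):
    """Number of feature-graph edges for one TC, given as F rows of cell values."""
    return sum(1 for u, v in combinations(tc, 2)
               if abs(pearson(u, v)) > threshold)

def edges_per_tc(tcs, parallel=False, max_workers=4):
    """Process every TC, either in a serial loop or mapped over a worker pool."""
    if not parallel:
        return [count_edges(tc) for tc in tcs]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(count_edges, tcs))  # map preserves TC order

# Two toy TCs with three features each; the TCs have 4 and 5 cells.
tcs = [
    [[1, 2, 3, 4], [2, 4, 6, 8], [4, 1, 3, 2]],
    [[1, 2, 3, 4, 5], [5, 4, 3, 2, 1], [1, 3, 2, 5, 4]],
]
print(edges_per_tc(tcs), edges_per_tc(tcs, parallel=True))
```

Each task here is small relative to the cost of handing data to a worker, so overhead can easily outweigh any gain from splitting along the TC dimension.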
What if the TCs Have Different Numbers of Cells (C)?
[Figure: Serial vs. Parallel]
Summary and Conclusions
• Feature graph construction depends on computation of numerous linear correlations
• Parallelization was implemented
• But speed-ups were not realized (why not?)
• In fact, slower because of time required to move data to/from the server
• Speed-ups are realized for *very* large data sets because the server can handle larger data more smoothly than a typical PC. But this is not due to parallelization, rather due to hard drive usage.
• Why didn’t parallelization result in gains in speed?
• Interactive Supercomputing doesn’t preallocate matrices in Matlab
• Structure of problem?
• Coding?
Acknowledgments: Chris Bakal, Bonnie Berger, John Aach, Norbert Perrimon, George Church