Understanding multi-cellular
systems
Jin Chen
CSE891-001
2012 Fall
1
EXAMPLE
Multi-cellular signaling network of 169 genes/proteins regulated during the first 4 hours of
EGFR activation in human mammary cells. Results were obtained by integrating microarray,
proteomic and Western blot data at PNNL.
http://www.sysbio.org/sysbio/multicellular/index.stm
2
Why multi-cellular systems?
• Understanding multi-cellular systems is essential for
adapting systems biology studies to medical
applications
• Example: understand host-pathogen interactions
– How cells on the defensive front lines interact with an invader
– How different host cell types involved in the inflammatory
cascade triggered by an infection interact with one another
• Studying signaling among different cell types in a single
organism is important for understanding diseases and the
processes involved in disease, such as metastasis.
http://www.sysbio.org/sysbio/multicellular/index.stm
3
Strategies to study multi-cellular networks
• Expand the capabilities of analytical and
computational tools
• Couple the development of modeling and
experimental approaches
• Reduce the enormous complexity of biological
organisms to the simplest terms
http://www.sysbio.org/sysbio/multicellular/index.stm
4
Rules to follow
1. The usefulness of high-throughput data can be
greatly increased by integrating multiple data types
that are obtained in parallel on the same system
2. Signaling networks can be accurately modeled as a
series of functional modules instead of networks of
individual interactions
http://www.sysbio.org/sysbio/multicellular/index.stm
5
PNNL’s functional module approach
• Reduce network complexity by describing networks in terms of functional
modules rather than individual molecular reactions, and follow the flow of
information among multiple cell types to connect molecular events to
overall system behavior
• A top-down perspective
1. Consider a biological system as an input-output system.
2. Collect baseline, multidimensional data on the overall system.
3. Analyze the data in the context of the literature and current databases to
identify functional modules.
4. Construct a high-level model based on tentatively identified functional
modules.
5. Make testable predictions of outputs in response to specific inputs and test
those predictions with experiments specifically designed to enable step 6.
6. Use differences between predictions and experiments to refine the original
model, adding more modules when necessary.
7. Repeat step 5.
6
MetaCore
• MetaCore is an integrated knowledge database and software suite
for pathway analysis of experimental data and gene lists.
• MetaCore is based on a proprietary, manually curated database of
human protein-protein, protein-DNA and protein-compound
interactions, and of metabolic and signaling pathways for human, mouse
and rat, supported by proprietary ontologies and a controlled
vocabulary.
• The analytical package includes tools for search, data visualization,
mapping and exchange, biological networks and interactome.
– MetaRodent
– MetaLink
– MetaSearch
http://www.genego.com
7
Multi-cellular differentiation
2009
• Understanding the processes involved in multi-cellular pattern formation
is a central problem of developmental biology.
• Defining suitable computational techniques for development modeling,
able to perform in silico simulation experiments, is an open and
challenging problem.
Bonzanni N et al. Bioinformatics 2009;25:2049-2056
8
Background
• Many efforts have been undertaken to elucidate how cells
coordinate different and sometimes conflicting signals to
produce a precise phenotype during animal
organogenesis
• C. elegans vulva development provides an elegant and
relatively well-charted model to study how multiple pathways,
in multiple cells, interact to produce developmental patterns.
VPC
9
Background
• The first diagrammatic model, describing the regulatory network
underlying vulval precursor cell (VPC) determination, was proposed by
Sternberg and Horvitz (1989). Since then, global understanding of the
biological network has improved greatly.
• The first computational model, proposed by Kam et al. (2003), combined
multiple experimental ‘scenarios’ from Sternberg and Horvitz (1986) into a
single model, using Live Sequence Charts (LSCs).
• Afterwards, in two landmark papers, Fisher et al. (2005, 2007) suggested
two state-based mechanistic models.
• Two other insightful models of C. elegans vulval development have been
published. Giurumescu et al. (2006) proposed a partial model based on
ODEs, while Sun and Hong (2007) developed a model based on
automatically learned dynamic Bayesian networks with discrete states.
• Li et al. (2009) recently modeled part of C. elegans vulval development
using hybrid functional Petri nets with extensions.
10
Petri nets
• Apply a discrete, non-deterministic Petri net-based
model to C. elegans vulval development.
• Petri nets are a convenient formalism to represent
biological networks. This formalism models process
synchronization, asynchronous events, conflicts and in
general concurrent systems in a natural way.
• Petri nets offer direct insights into causal relationships,
and allow a graphical visualization that resembles the
diagrams used to describe biological knowledge.
11
Petri net
A Petri net is a directed bipartite graph, in which the nodes represent
transitions (bars) and places (circles). The directed arcs describe which places
are pre- and/or post-conditions for which transitions.
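The definition above can be sketched as a small data structure. This is only an illustration: the class name `PetriNet`, its fields, and the place and transition names are assumptions, not taken from the slides. The one property it enforces is bipartiteness, i.e. arcs run between a place and a transition, never between two nodes of the same kind.

```python
from dataclasses import dataclass, field


@dataclass
class PetriNet:
    """A directed bipartite graph over places and transitions (hypothetical sketch)."""
    places: set = field(default_factory=set)
    transitions: set = field(default_factory=set)
    # Maps (source, target) -> arc weight.
    arcs: dict = field(default_factory=dict)

    def add_arc(self, source, target, weight=1):
        """Add an arc, rejecting place-place and transition-transition arcs."""
        place_to_transition = source in self.places and target in self.transitions
        transition_to_place = source in self.transitions and target in self.places
        if not (place_to_transition or transition_to_place):
            raise ValueError(f"arc {source!r} -> {target!r} breaks bipartiteness")
        self.arcs[(source, target)] = weight

    def preconditions(self, transition):
        """Places with an arc leading into the transition."""
        return {s for (s, t) in self.arcs if t == transition}

    def postconditions(self, transition):
        """Places with an arc leading out of the transition."""
        return {t for (s, t) in self.arcs if s == transition}


# Illustrative net: a gene is the pre-condition and a protein the post-condition.
net = PetriNet(places={"gene", "protein"}, transitions={"expression"})
net.add_arc("gene", "expression")
net.add_arc("expression", "protein")
```

Here `net.preconditions("expression")` yields the place `gene`, and an attempt to connect `gene` directly to `protein` raises `ValueError`.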
Some sources state that Petri nets were invented in August 1939 by Carl Adam
Petri — at the age of 13 — for the purpose of describing chemical processes.
He documented the Petri net in 1962 as part of his dissertation.
http://en.wikipedia.org/wiki/Petri_net
12
Petri net
• Places in a Petri net contain a discrete number of marks called tokens.
• Any distribution of tokens over the places will represent a configuration of
the net called a marking.
• A transition of a Petri net may fire whenever there are sufficient
tokens at the start of all its input arcs.
• When it fires, it consumes these tokens, and places tokens at the end of all
output arcs. A firing is atomic, i.e., a single non-interruptible step.
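The firing rule above can be sketched in a few lines. The function names, the dictionary representation of a marking, and the substrate/enzyme/product example are assumptions made for illustration; arc weights are given as tokens consumed or produced per firing.

```python
def is_enabled(marking, inputs):
    """True if every input place holds at least the tokens its arc requires."""
    return all(marking.get(place, 0) >= need for place, need in inputs.items())


def fire(marking, inputs, outputs):
    """Return the marking after one atomic firing; the input marking is not modified."""
    if not is_enabled(marking, inputs):
        raise ValueError("transition is not enabled")
    new_marking = dict(marking)
    for place, need in inputs.items():
        new_marking[place] = new_marking.get(place, 0) - need      # consume tokens
    for place, produced in outputs.items():
        new_marking[place] = new_marking.get(place, 0) + produced  # place tokens
    return new_marking


# Hypothetical reaction: the enzyme is consumed and returned, so it acts as a catalyst.
marking = {"substrate": 2, "enzyme": 1, "product": 0}
inputs = {"substrate": 1, "enzyme": 1}
outputs = {"product": 1, "enzyme": 1}
after = fire(marking, inputs, outputs)
```

Returning a new marking rather than mutating the old one mirrors the atomicity of a firing: an observer sees either the marking before the step or the marking after it.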
13
Model Design
• Aim: to mimic the underlying biological mechanisms as much as
possible, and not only to reproduce the expected phenotype
according to a specific set of mutations.
• To achieve this, the model applies a principle of maximal parallelism
together with bounded execution with overshooting.
• Using this simple framework, we can identify different modules,
each corresponding to different biological functions.
• Thus, combining functional modules into cells, and joining such cells
together, we iteratively developed the whole network.
14
Model Design
• Focus on preserving the simplicity of the formalism, and
develop an execution semantics which resembles biology
– Places = genes, protein species and complexes
– Transitions = biological processes
– Firing of a transition is execution of a process, e.g. consuming
substrates or creating products
– Number of tokens is interpreted in two ways.
• For genes as a Boolean value, 0 means not present and 1 present.
• For proteins, we use abstract concentration levels 0−6: going from not
present, via low, medium and high concentration to saturated level.
• The rationale behind this approach is to abstract away from unknown
absolute molecule concentration levels, as we intend to represent
relative concentrations.
15
Model Design - Maximal parallelism
• The maximally parallel execution semantics can be summarized
informally as: greedily execute as many transitions as possible in one
step.
• Definition: A maximally parallel step 𝒮 is a step that leaves no
enabled transitions in the net, and in principle should be developed
in such a way that it corresponds to one time step in the evolution
of the biological system.
• The modeler can capture relative speeds using appropriate weights
on arcs. Typically, if in one time unit a protein A is produced four
times more than a protein B, then the transition that captures
production of A should have a weight that is four times as large as
the weight of the one that captures B production.
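A small worked example of the weighting rule above, assuming both production transitions fire exactly once per time step. The weights and the names `produce_A` and `produce_B` are hypothetical.

```python
# Hypothetical output-arc weights: tokens added to the product place per firing.
WEIGHTS = {"produce_A": 4, "produce_B": 1}


def run(steps):
    """Fire each production transition once per time step and count the tokens."""
    tokens = {"A": 0, "B": 0}
    for _ in range(steps):
        tokens["A"] += WEIGHTS["produce_A"]
        tokens["B"] += WEIGHTS["produce_B"]
    return tokens


tokens = run(3)
```

After any number of steps the token count of A is four times that of B, which is how relative speed is encoded without introducing explicit rates.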
16
Model Design - Maximal parallelism
• Implementing a pure maximally parallel semantics requires
generating all possible partitions of tokens and selecting one
uniformly at random. However, as the network grows, this
procedure becomes prohibitively slow.
• The paper approximates it by building a maximally parallel
step incrementally, selecting one transition after another
at random until all enabled transitions have been exhausted.
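The incremental construction can be sketched as follows. This is not the paper's implementation, and it rests on two assumptions: tokens produced during a step only become available in the next step, and every transition consumes at least one token (otherwise the loop would not terminate). All names are illustrative.

```python
import random


def enabled(tokens, inputs):
    """True if every input place still holds enough tokens."""
    return all(tokens.get(p, 0) >= n for p, n in inputs.items())


def maximal_step(marking, transitions, rng):
    """Build one approximately maximally parallel step, one random firing at a time."""
    available = dict(marking)  # tokens that can still be consumed in this step
    produced = {}              # tokens created in this step (assumed usable next step)
    fired = []
    while True:
        candidates = [name for name, (ins, _) in transitions.items()
                      if enabled(available, ins)]
        if not candidates:     # no enabled transition left: the step is maximal
            break
        name = rng.choice(candidates)
        ins, outs = transitions[name]
        for p, n in ins.items():
            available[p] = available.get(p, 0) - n
        for p, n in outs.items():
            produced[p] = produced.get(p, 0) + n
        fired.append(name)
    new_marking = dict(available)
    for p, n in produced.items():
        new_marking[p] = new_marking.get(p, 0) + n
    return new_marking, fired


# Hypothetical net: both transitions compete for ligand tokens.
transitions = {
    "activate": ({"ligand": 1, "receptor": 1}, {"complex": 1}),
    "degrade": ({"ligand": 1}, {}),
}
marking = {"ligand": 3, "receptor": 1, "complex": 0}
new_marking, fired = maximal_step(marking, transitions, random.Random(0))
```

Each pass costs one scan over the transitions, so a step is linear in the number of firings rather than exponential in the number of token partitions. The trade-off is that the resulting steps are not guaranteed to be sampled uniformly.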
17
Model Design - Maximum capacity
• Unrestricted production of proteins is usually not realistic, as in
nature the cell would saturate with the product, and the reaction
would slow down or stop.
• To guarantee that the highest concentration level can be attained,
we introduced bounded execution with overshooting.
– Each place has a predefined maximum capacity 𝒩 = 6
– A transition can only fire if each output place holds fewer than 𝒩
tokens
• Since each transition can possibly move more than one token at
once into its output places, each transition can overshoot the
pre-given capacity 𝒩 at most once. Therefore, the network is bounded
with a finite bound k ≥ 𝒩.
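A minimal sketch of the capacity rule, assuming a single production transition with a hypothetical output-arc weight of 4. Under that assumption the place can never exceed 𝒩 − 1 + 4 tokens, which is one concrete instance of a bound k ≥ 𝒩.

```python
CAPACITY = 6  # maximum capacity N stated on the slide


def can_fire(marking, outputs, capacity=CAPACITY):
    """Capacity rule: every output place must hold fewer than `capacity` tokens."""
    return all(marking.get(place, 0) < capacity for place in outputs)


def produce(marking, outputs, capacity=CAPACITY):
    """Fire a pure production transition if the capacity rule allows it."""
    if not can_fire(marking, outputs, capacity):
        return marking, False
    new_marking = dict(marking)
    for place, weight in outputs.items():
        new_marking[place] = new_marking.get(place, 0) + weight
    return new_marking, True


outputs = {"protein": 4}  # hypothetical arc weight
marking = {"protein": 0}
history = []
for _ in range(4):
    marking, fired = produce(marking, outputs)
    history.append((marking["protein"], fired))
```

The place goes from 0 to 4, then overshoots to 8 because 4 is still below the capacity, and the transition stays disabled from then on.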
18
Model design – example
VAV-1 down-regulation by decreasing
the translation rate of the gene vav-1.
If mir-61 is not present, only the reaction
VAV-1 PRO is enabled and it produces
the protein. When mir-61 is present,
the reaction VAV-1 DR is also
enabled and has a 50% chance of firing
compared with VAV-1 PRO, so the
production of VAV-1 is halved.
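The halving effect can be checked with a short simulation. The sketch assumes that at every step exactly one of the enabled transitions fires, chosen uniformly at random, and that VAV-1 DR fires without producing protein; the function name and the step count are illustrative.

```python
import random


def simulate_vav1(steps, mir61_present, rng):
    """Count VAV-1 tokens produced when PRO and DR compete for each firing."""
    produced = 0
    for _ in range(steps):
        # VAV-1 DR is only enabled when mir-61 is present.
        enabled = ["VAV-1 PRO", "VAV-1 DR"] if mir61_present else ["VAV-1 PRO"]
        if rng.choice(enabled) == "VAV-1 PRO":
            produced += 1
    return produced


steps = 10_000
without_mir61 = simulate_vav1(steps, False, random.Random(1))
with_mir61 = simulate_vav1(steps, True, random.Random(1))
ratio = with_mir61 / without_mir61
```

Without mir-61 every step produces one token. With mir-61 the ratio settles close to 0.5, matching the statement that production halves.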
Two connected basic modules: gene
expression and the
endocytosis-mediated down-regulation
of LIN-12. Activation
of the Ras/MAPK cascade leads
to the transcription of a hitherto
unknown gene that enhances
LIN-12 endocytosis.
19
Results
• Petri net model for cell fate determination during C. elegans vulval
induction.
• The entire network comprises 600 nodes (places and transitions) and 1000
arcs. It includes the VPC network, built from six interconnected cells that
are identical modules of a multi-potent cell. Separate blocks for the anchor
cell (AC, producing the inductive signal) and for hyp7 are also built.
• It helps us to identify different modules that correspond to different
biological functions, such as gene expression, protein activation and
protein degradation.
20
Schematic representation of the whole system.
How the six VPCs, AC and hyp7 modules are connected. Adjacent cells are linked with each
other, the hyp7 connects to all six cells, and the AC can directly influence cells P5.p, P6.p and
P7.p.
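The wiring described in this caption can be written down as a small graph. The slide names only P5.p, P6.p and P7.p; the remaining VPC names (P3.p, P4.p, P8.p) and the linear ordering of the cells are assumptions based on the standard C. elegans nomenclature, and the function name is illustrative.

```python
# Assumed names and order of the six vulval precursor cells.
VPCS = ["P3.p", "P4.p", "P5.p", "P6.p", "P7.p", "P8.p"]


def build_connections():
    """Return a mapping from each module to the set of modules it is linked to."""
    links = {name: set() for name in VPCS + ["AC", "hyp7"]}

    def connect(a, b):
        """Add an undirected link between two modules."""
        links[a].add(b)
        links[b].add(a)

    for left, right in zip(VPCS, VPCS[1:]):
        connect(left, right)        # adjacent cells are linked with each other
    for cell in VPCS:
        connect("hyp7", cell)       # hyp7 connects to all six cells
    for cell in ("P5.p", "P6.p", "P7.p"):
        links["AC"].add(cell)       # directed: the AC influences these cells
    return links


links = build_connections()
```

Because each VPC is an identical module, a layout like this is all that distinguishes one cell from another: the cell's position determines which signals it receives.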
Results
• Multi-level network
– Level 1: basic biological functions
– Level 2: protein interactions
– Level 3: pathways
– Level 4: cells
– Level 5: multi-cellular interactions
[Figures: the network shown at Level 3, Level 4 and Level 5]
22-24
Multi-level network
25
A single VPC cell
26
Results
Comparison between photomicrographs of gene activity, shown by fluorescently
labeled gene products, and simulation results. (a) Photomicrographs of the
graded expression of the inductive signal, adapted from Yoo et al. (2004). (b)
Time series plot generated by this model, showing the graded expression of
the inductive signal, initially faintly present in P5.p and P6.p. The horizontal
axis shows maximally parallel steps.
27