Molecule as Computation
Ehud Shapiro
Weizmann Institute of Science
Joint work with Aviv Regev and Bill Silverman
In collaboration with Corrado Priami, Naama Barkai and
Luca Cardelli
The talk has three parts:
1. Briefly introduce molecular biology
2. Computer-based consolidation of molecular biology
3. Our work on helping this happen
Part I
Brief Introduction to
Molecular Biology
Pentium II vs. E. coli
Pentium II
 3 million transistors
 1/4 million bytes of memory
 80 million operations per second
E. coli
 1 million macromolecules
 1 million bytes of static genetic memory
 1 million amino acids per second
Comparison courtesy of Erik Winfree
[Figure: a Pentium II and an E. coli cell shown side by side, each with a 1 micron scale bar]
Inside E. coli
[Figure: the interior of an E. coli cell, labeled "(1Mbyte)"]
Ribosomes in operation
Ribosomes translate RNA to Proteins
RNA Polymerase transcribes DNA to RNA
[Figure: ribosomes in operation; one label reads "(= protein)"]
Computationally: A stateless string transducer from the RNA alphabet of nucleotides
to the Protein alphabet of amino acids
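The transducer view can be sketched in a few lines of Python. This is only an illustration, not part of the talk: the function names `transcribe` and `translate` are made up here, the codon table is a small excerpt of the standard genetic code, and `transcribe` assumes the DNA coding strand is given.

```python
# A minimal sketch of "ribosome as string transducer" (illustrative names).
# Excerpt of the standard genetic code; None marks a stop codon.
CODON_TABLE = {
    "AUG": "Met",  # also the usual start codon
    "UUU": "Phe",
    "GGC": "Gly",
    "GCU": "Ala",
    "AAA": "Lys",
    "UAA": None,
    "UAG": None,
    "UGA": None,
}


def transcribe(dna):
    """RNA polymerase, simplified: copy the coding strand, replacing T with U."""
    return dna.replace("T", "U")


def translate(rna):
    """Ribosome, simplified: map each codon (3 nucleotides) to one amino acid.

    Each output symbol depends only on the current codon, which is what
    makes the transducer stateless. Translation ends at a stop codon.
    """
    protein = []
    usable = len(rna) - len(rna) % 3  # ignore a trailing partial codon
    for i in range(0, usable, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]  # KeyError if codon not in the excerpt
        if amino_acid is None:
            break
        protein.append(amino_acid)
    return protein
```

For example, `translate(transcribe("ATGTTTGGCTAA"))` reads the codons AUG, UUU, GGC and then stops at UAA.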
Sequences and String Transducers
Ribosomes translate RNA to Proteins
RNA Polymerase transcribes DNA to RNA
Molecular Biology in One Slide
 Sequence: Sequence of DNA and Proteins
What about “The Rest” of
biology: the function, activity
and interaction of molecular
systems in cells?
Part III
An Abstraction for Molecular
Systems
The “New Biology”
 The cell as an information processing
device
 Cellular information processing and
passing are carried out by networks of
interacting molecules
 Ultimate understanding of the cell
requires an information processing model
 Which?
“We have no real ‘algebra’ for describing
regulatory circuits across different
systems...”
- T. F. Smith (TIG 14:291-293, 1998)
“The data are accumulating and the
computers are humming, what we are lacking
are the words, the grammar and the syntax
of a new language…”
- D. Bray (TIBS 22:325-326, 1997)
Our Proposal:
Molecule as Computational Process
A system of interacting molecular
entities is described and modelled
by a system of interacting
computational entities.
“Cellular Abstractions: Cells as Computation”,
to appear in Nature, September 26th, 2002
Composition of two processes is a
process, therefore:
 Molecular ensembles as processes
 Molecular networks as processes
 Cells as processes (virtual cell)
 Multi-cellular organisms as processes
 Collections of organisms as processes
Towards “Molecule as Process”
1. Use the π-calculus process algebra as
molecule description language
The π-calculus
(Milner, Walker and Parrow 1989)
 A program specifies a network of interacting
processes
 Processes are defined by their potential
communication activities
 Communication occurs on complementary
channels, identified by names
 Message content: Channel name
π-calculus key constructs
Parallel: A|B
Choice: A;B
Communication: X ! M or X ? Y
Recursion, with state change: P :- … P’…
Molecules as Processes
Molecule ~ Process
Interaction capability ~ Channel
Interaction ~ Communication
Modification ~ State change
Na + Cl < Na+ + ClNa | Na | … | Na | Cl | Cl | … | Cl
Na::= e ! [] , Na_plus .
Na_plus::= e ? [] , Na .
Cl::= e ? [] , Cl_minus .
Cl_minus::= e ! [] , Cl .
Processes, guarded communication, alternation
between two states.
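The four definitions above can be mimicked with a small Python sketch. It is a rough illustration under stated assumptions, not BioSPI: the names `DEFINITIONS`, `step` and `run` are invented here, there are no rates, and any process offering `e !` may pair with any process offering `e ?`, chosen uniformly at random.

```python
import random
from collections import Counter

# Each state offers one guarded action on channel e ("!" send, "?" receive)
# and names the state it continues as, mirroring the four definitions.
DEFINITIONS = {
    "Na":       ("!", "Na_plus"),
    "Na_plus":  ("?", "Na"),
    "Cl":       ("?", "Cl_minus"),
    "Cl_minus": ("!", "Cl"),
}


def step(soup, rng):
    """Perform one communication on e; return False if none is possible."""
    senders = [i for i, p in enumerate(soup) if DEFINITIONS[p][0] == "!"]
    receivers = [i for i, p in enumerate(soup) if DEFINITIONS[p][0] == "?"]
    if not senders or not receivers:
        return False
    s, r = rng.choice(senders), rng.choice(receivers)
    soup[s] = DEFINITIONS[soup[s]][1]  # sender moves to its next state
    soup[r] = DEFINITIONS[soup[r]][1]  # receiver moves to its next state
    return True


def run(n_na, n_cl, steps, seed=0):
    """Start from Na | ... | Na | Cl | ... | Cl and count the states reached."""
    rng = random.Random(seed)
    soup = ["Na"] * n_na + ["Cl"] * n_cl
    for _ in range(steps):
        if not step(soup, rng):
            break
    return Counter(soup)
```

Because every communication turns one sender into a receiver and one receiver into a sender, the number of Na+ ions always equals the number of Cl- ions when the run starts from neutral atoms.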
The RTK-MAPK pathway
 16 molecular species
 24 domains; 15 sub-domains
 Four cellular compartments
 Binding, dimerization, phosphorylation, de-phosphorylation, conformational changes, translocation
 ~100 literature articles
 250 lines of code
[Figure: pathway diagram with components labeled GF, RTK, SHC, GRB2, SOS, RAS, GAP, RAF, PP2A, MKK1, MKP1, ERK1, MP1, IEP, J F, IEG]
Molecular systems with π-calculus
 Can express, qualitatively, the behavior of
many complex molecular systems
 Cannot express quantitative aspects
Towards “Molecule as Process”
1. Use the π-calculus process algebra as
molecule description language
2. Provide a biochemistry-oriented stochastic
extension (with Corrado Priami)
Stochastic π-Calculus
(Priami, 1995;
Regev, Priami, Shapiro, Silverman 2000)
 Every channel x is assigned a base rate r
 A global (external) clock is maintained
 The clock is advanced and a communication is selected according to a race condition
 Rate calculation and race condition adapted for chemical reactions:
  Rate(A+B → C) = BaseRate * [A] * [B]
  [A] = number of A’s willing to communicate with B’s.
  [B] = number of B’s willing to communicate with A’s.
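As a rough numerical illustration of the rate rule and the race, the sketch below computes channel rates and the chance that each channel wins. The function names and the molecule counts are made up for the example. It assumes the usual Gillespie-style reading of the race: waiting times are exponential, so a channel wins with probability equal to its share of the total rate, and the expected clock advance is the reciprocal of the total rate.

```python
def channel_rate(base_rate, n_a, n_b):
    """Rate(A+B -> C) = BaseRate * [A] * [B], with [A], [B] as molecule counts."""
    return base_rate * n_a * n_b


def race_probabilities(rates):
    """Chance that each channel fires next: its share of the total rate."""
    total = sum(rates.values())
    return {name: rate / total for name, rate in rates.items()}


def mean_waiting_time(rates):
    """Expected clock advance before the next communication."""
    return 1.0 / sum(rates.values())


# Hypothetical snapshot: two channels with base rates 100 and 10.
rates = {
    "e1": channel_rate(100, 10, 10),  # 10 senders and 10 receivers on e1
    "e2": channel_rate(10, 5, 5),     # 5 senders and 5 receivers on e2
}
probabilities = race_probabilities(rates)
```

In this snapshot the rate on `e1` is 40 times the rate on `e2`, so `e1` wins the race about 98% of the time.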
BioSPI implementation:
π-calculus + Gillespie’s algorithm
 Gillespie (1977): Accurate stochastic simulation of chemical reactions
 The BioSPI system:
  Compiles (full) π-calculus
  Runtime incorporates Gillespie’s algorithm
Na + Cl < Na+ + Cl100
90
80
global(e1(100),e2(10)).
70
60
50
40
30
Na::= e1 ! [] , Na_plus .
20
10
Na_plus::= e2 ? [] , Na .
Cl::= e1 ? [] , Cl_minus .
Cl_minus::= e2 ! [] , Cl .
0
0
0.5
1
1.5
2
2.5
3
3.5
4
-3
100
x 10
90
80
70
60
50
40
30
20
10
0
0
0.005
0.01
0.015
0.02
0.025
0.03
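A direct Gillespie-style simulation of this two-channel system can be sketched in Python as follows. This is not BioSPI's runtime, only a hand-written stand-in. The base rates 100 and 10 come from the `global` declaration; the initial counts of 100 Na and 100 Cl and the end time are assumptions read off the plot axes, and the function name is invented here.

```python
import random


def gillespie_na_cl(n_na=100, n_cl=100, k_e1=100.0, k_e2=10.0,
                    t_end=0.03, seed=0):
    """Simulate Na + Cl <-> Na+ + Cl- and return [(time, Na, Na+), ...]."""
    rng = random.Random(seed)
    na, cl, na_plus, cl_minus = n_na, n_cl, 0, 0
    t = 0.0
    trajectory = [(t, na, na_plus)]
    while True:
        rate_e1 = k_e1 * na * cl            # Na sends on e1, Cl receives
        rate_e2 = k_e2 * na_plus * cl_minus  # Cl- sends on e2, Na+ receives
        total = rate_e1 + rate_e2
        if total == 0:
            break
        t += rng.expovariate(total)          # advance the global clock
        if t > t_end:
            break
        if rng.random() * total < rate_e1:   # e1 wins the race
            na, cl = na - 1, cl - 1
            na_plus, cl_minus = na_plus + 1, cl_minus + 1
        else:                                # e2 wins the race
            na, cl = na + 1, cl + 1
            na_plus, cl_minus = na_plus - 1, cl_minus - 1
        trajectory.append((t, na, na_plus))
    return trajectory
```

Plotting the second and third tuple fields against the first gives curves comparable in kind to the plots above, with a different random path for each seed.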
Programming Experience with
Stochastic Pi Calculus
 Taught a semester-long M.Sc. course (available online) with many examples, exercises and final projects
 Textbook examples from chemistry, organic
chemistry, enzymatic reactions, metabolic
pathways, signal-transduction pathways…
Circadian Clocks
J. Dunlap, Science (1998) 280 1548-9
The circadian clock machinery
(Barkai and Leibler, Nature 2000)
[Figure: circuit diagram of the clock. Labels: proteins A and R; A_GENE and R_GENE; A_RNA and R_RNA; sites PA, PR, UTRA and UTRR; arrows for transcription, translation and degradation]
Differential rates: Very fast, fast and slow
The machinery in π-calculus: “A” molecules
A_Gene:
A_GENE::= PROMOTED_A + BASAL_A
PROMOTED_A::= pA ? {e}.ACTIVATED_TRANSCRIPTION_A(e)
BASAL_A::= bA ? [].( A_GENE | A_RNA)
ACTIVATED_TRANSCRIPTION_A(e)::=
t1 . (ACTIVATED_TRANSCRIPTION_A(e) | A_RNA) +
e ? [] . A_GENE
A_RNA:
A_RNA::= TRANSLATION_A + DEGRADATION_mA
TRANSLATION_A::= utrA ? [] . (A_RNA | A_PROTEIN)
DEGRADATION_mA::= degmA ? [] . 0
A_protein:
A_PROTEIN::= (new e1,e2,e3)
PROMOTION_A-R + BINDING_R + DEGRADATION_A
PROMOTION_A-R ::=
pA!{e2}.e2![]. A_PROTEIN +
pR!{e3}.e3![]. A_PROTEIN
BINDING_R ::= rbs ! {e1} . BOUND_A_PROTEIN
BOUND_A_PROTEIN::= e1 ? [].A_PROTEIN + degpA ? [].e1 ![].0
DEGRADATION_A::= degpA ? [].0
The machinery in π-calculus: “R” molecules
R_Gene:
R_GENE::= PROMOTED_R + BASAL_R
PROMOTED_R::= pR ? {e}.ACTIVATED_TRANSCRIPTION_R(e)
BASAL_R::= bR ? [].( R_GENE | R_RNA)
ACTIVATED_TRANSCRIPTION_R(e)::=
t2 . (ACTIVATED_TRANSCRIPTION_R(e) | R_RNA) +
e ? [] . R_GENE
R_RNA:
R_RNA::= TRANSLATION_R + DEGRADATION_mR
TRANSLATION_R::= utrR ? [] . (R_RNA | R_PROTEIN)
DEGRADATION_mR::= degmR ? [] . 0
R_protein:
R_PROTEIN::= BINDING_A + DEGRADATION_R
BINDING_A ::= rbs ? {e} . BOUND_R_PROTEIN(e)
BOUND_R_PROTEIN(e)::= e ? [] . R_PROTEIN + degpR ? [].e ![].0
DEGRADATION_R::= degpR ? [].0
BioSPI simulation
[Figure: two plots, A and R, showing simulated levels over time; vertical axes 0 to 600, horizontal axes 0 to 10000]
Robust to random perturbations
The A hysteresis module
[Figure: A plotted against R (both axes 0 to 600), with an ON branch and an OFF branch joined by fast transitions]
 The entire population of A molecules (gene, RNA, and protein) behaves as one bi-stable module
Hysteresis module
ON:
ON_H_MODULE(CA)::=
{CA<=T1} . OFF_H_MODULE(CA) +
{CA>T1} .
(rbs ! {e1} . ON_DECREASE +
e1 ! [] . ON_H_MODULE +
pR ! {e2} . (e2 ! [] .0 | ON_H_MODULE) +
t1 . ON_INCREASE)
ON_INCREASE::= {CA++} . ON_H_MODULE
ON_DECREASE::= {CA--} . ON_H_MODULE
OFF:
OFF_H_MODULE(CA)::=
{CA>T2} . ON_H_MODULE(CA) +
{CA<=T2} .
(rbs ! {e1} . OFF_DECREASE +
e1 ! [] . OFF_H_MODULE +
t2 . OFF_INCREASE )
OFF_INCREASE::= {CA++} . OFF_H_MODULE
OFF_DECREASE::= {CA--} . OFF_H_MODULE
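The switching logic of the module, stripped of its channels, can be sketched as a two-state machine over the counter CA. The Python below is only an illustration: the function names and the threshold values in the usage note are hypothetical, and hysteresis assumes T1 < T2.

```python
def next_state(state, ca, t1, t2):
    """Apply the two guards of the module to the current counter value.

    ON  falls back to OFF when CA <= T1.
    OFF switches to ON   when CA >  T2.
    Between the thresholds the module keeps its current state.
    """
    if state == "ON" and ca <= t1:
        return "OFF"
    if state == "OFF" and ca > t2:
        return "ON"
    return state


def run(counts, t1, t2, state="OFF"):
    """Feed a sequence of counter values through the module; return the states."""
    states = []
    for ca in counts:
        state = next_state(state, ca, t1, t2)
        states.append(state)
    return states
```

With the hypothetical thresholds `t1=100` and `t2=200`, the counter value 150 leaves the module OFF on the way up and ON on the way down, which is the history dependence that makes the module bi-stable.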
Modular cell biology
 Build two representations in the π-calculus
 Implementation (how?): molecular level
 Specification (what?): functional module level
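The circadian specification
[Figure: the clock circuit with the A molecules replaced by a Counter_A module with ON and OFF states; the R side keeps R_GENE, R_RNA and protein R, with PR, UTRR, transcription, translation and degradation]
R (gene, RNA, protein) processes are unchanged (modular; compositional)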
BioSPI simulation
[Figure: two plots, "Module, R protein and R RNA" (horizontal axis 0 to 10000) and "R (module vs. molecules)" (horizontal axis 7500 to 10000)]
Modular cell biology
 Build two representations in the π-calculus
 Implementation (how?): molecular level
 Specification (what?): functional module level
 Ascribing a function to a biomolecular
system ~ equivalence between
specification and implementation
Limitation of stochastic π-calculus:
Lack of location information
 Membranes: Cells and cellular
compartments, “inside” and “outside”
 Molecular proximity: The identity of
complexes and single molecules
 Limited solution: programming tricks
Towards “Molecule as Process”
1. Use the π-calculus process algebra as
molecule description language
2. Provide a biochemistry-oriented stochastic
extension (with Corrado Priami)
3. Provide an Ambient Calculus extension (with
Luca Cardelli)
Mobile compartments
Compartment: Compartment mobility
 Cells: Cell movement
 Organelles and vesicles: Merging, budding, bursting
 Multimolecular complexes: Form and break
Process mobility
 Trans-membranal molecules (receptors, channels, transporters)
 Molecule entry and exit
 Bind and unbind to molecular scaffolds
The ambient calculus
(Cardelli and Gordon)
 An ambient is a bounded place where computation happens
 The ambient’s boundary restricts process interactions across it
 Processes can move in and out of ambients
 Ambients are mobile processes, too!
[Figure: an ambient drawn as a boundary enclosing processes]
Compartments as ambients
[Figure: a Cell containing processes P, Q and R and a Nucleus, which contains R]
cell [ P | Q | R | nuc [R] ]
Cells, vesicles, compartments ~ Ambients
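One way to picture the bracket notation is as a tree of named compartments. The Python sketch below is an illustration only, not an implementation of the ambient calculus: the class `Ambient` and its methods `render` and `locations` are invented names, and processes are reduced to plain labels.

```python
from dataclasses import dataclass, field


@dataclass
class Ambient:
    """A named, bounded place holding processes and nested ambients."""
    name: str
    processes: list = field(default_factory=list)  # labels of processes directly inside
    children: list = field(default_factory=list)   # nested Ambient objects

    def render(self):
        """Write the ambient back in bracket notation."""
        parts = list(self.processes) + [c.render() for c in self.children]
        return f"{self.name}[{' | '.join(parts)}]"

    def locations(self, process, path=()):
        """List the compartment paths in which a process label occurs."""
        here = path + (self.name,)
        found = [here] if process in self.processes else []
        for child in self.children:
            found += child.locations(process, here)
        return found


# The example from the slide: cell [ P | Q | R | nuc [R] ]
cell = Ambient("cell", ["P", "Q", "R"], [Ambient("nuc", ["R"])])
```

Here `cell.locations("R")` distinguishes the R in the cytoplasm from the R in the nucleus, which is the location information the plain stochastic π-calculus lacks.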
Synchronized ambient movement
enter/accept
exit/expel
merge+/merge-
vesicle[merge- c. P|Q] | lysosome[merge+ c. R|S]
→ (merge)
lysosome[P|Q|R|S]
[Figure: a vesicle and a lysosome, with enter, exit and merge moves]
Enter, exit, merge ~
Budding-in or -out, endo- or exo-cytosis
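The synchronized merge can be sketched in Python as a check that two sibling ambients offer complementary capabilities on the same channel. This is a simplified illustration, not the calculus itself: `Ambient` and `try_merge` are made-up names, each ambient offers at most one capability, and the merged ambient is assumed to keep the name of the `merge+` side, as in the vesicle example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Ambient:
    name: str
    contents: list                                 # labels of the processes inside
    capability: Optional[Tuple[str, str]] = None   # e.g. ("merge+", "c")


def try_merge(a, b):
    """Merge two sibling ambients if one offers merge+ and the other merge-
    on the same channel; otherwise return None."""
    if a.capability is None or b.capability is None:
        return None
    (kind_a, chan_a), (kind_b, chan_b) = a.capability, b.capability
    if chan_a != chan_b or {kind_a, kind_b} != {"merge+", "merge-"}:
        return None
    keeper, donor = (a, b) if kind_a == "merge+" else (b, a)
    # The capability is consumed; the donor's contents join the keeper's.
    return Ambient(keeper.name, donor.contents + keeper.contents)


vesicle = Ambient("vesicle", ["P", "Q"], ("merge-", "c"))
lysosome = Ambient("lysosome", ["R", "S"], ("merge+", "c"))
merged = try_merge(vesicle, lysosome)
```

Requiring both sides to agree on the channel is what makes the movement synchronized: a vesicle offering `merge-` on `c` cannot fuse with a compartment that only accepts merges on another channel.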
Molecules and complexes
enter/accept
exit/expel
merge+/merge-
Mol1[P|merge+ c.Q] | Mol2[merge- c. R|S]
→
Complex[P|Q|R|S]
[Figure: molecules Mol1 (P, Q) and Mol2 (R, S) joining into one Complex (P, Q, R, S)]
Merge, enter, exit (with private channels) ~
Complex formation and breakage,
molecule re-localization
Vesicle merging
[Figure: a Vesicle merging with a Cell]
Single substrate reactions:
Enzyme and substrate as ambients
[Figure: ambients S, X and P around an Enzyme ambient, connected by enter and exit moves]
Bi-substrate reactions:
Inter-ambient communication
[Figure: ambients S1, S2, X, Y, P1 and P2 around an Enzyme ambient, connected by enter and exit moves and a channel s2s]
Example: Multi-cellular system
(hypothalamic body weight control system)
[Figure: diagram of the control system. Labels recovered from the diagram:
 Afferent signal and input: Fat cell mass, Glucose utilization in adipocytes, Leptin expression, Insulin expression, Insulin resistance
 Receptors and signaling: LR, IR, JAK, STAT, IRS-1, tub, NPYR, MC4, Gi, Gs, cAMP, PKA
 1st order: NPY/AgRP expression (NPY*/AgRP*), POMC (POMC*/CART*), cleavage, CART, aMSH expression, NPY, AgRP, aMSH
 2nd order: Orexin, MCH, TRH*, CRH*, OXY
 Regions: ARC, VMN, PVN, PFA, LHA
 Efferent signal and controlled system: Food intake, Energy expenditure, Thyroid axis, Hypothalamic Pituitary Adrenal axis, Uterine function, Weight gain / Weight loss]
Conclusions
 The most advanced tools for computer
process description seem to be also the
best tools for the description of
biomolecular systems
 This intellectual economy validates the
decades-long study of concurrency in
computer science
 An essential foundation for the
forthcoming “Virtual Cell Project”