Programming Languages for Biology

Bor-Yuh Evan Chang
November 25, 2003
OSQ Group Meeting
Biological Perspective
[Figures not preserved. Image sources: http://www.nocturnalvisions.freeservers.com/page6.html; Matsudaira et al., Molecular Cell Biology 4.0, Freeman, 2000]
Traditional Biological Research
• Experiments must focus on a small, specific
piece of a system
– isolate the variable
– feasibility
• Have led to an enormous wealth of
(detailed) knowledge but in a fragmented
form
[Figure labels: Virus Expert, Cell Receptor Expert]
Systems Biology
• Emerging area of biology
– study of the relationships and interactions
between biological components
– many thousands of molecules interact in a
complex series of reactions to perform some
function (called a pathway)
• e.g., lactose interacting with a receptor triggers a
series of actions to create the enzyme capable of
breaking it down into usable form
– “pathways” may overlap
Approaching Systems Biology
• Need a common language for
describing/modeling all components of a
system
– must be modular, compositional, and provide
varying levels of abstraction
• Abstraction is an absolute necessity
– 1 ribosome (eukaryotic) ≈ 82 proteins + rRNA
• 1 protein ≈ hundreds/thousands of amino acids
– 1 membrane ≈ thousands of molecules (lipids,
proteins, carbohydrates)
The Biologist’s View
• How do biologists think about or view
biological entities (e.g., proteins)?
– an entity can interact with certain other types of
entities
– an entity can be in a certain “state”
– interaction causes some action or state change
• Analogous to a system of thousands of
concurrent computational processes
– Walter Fontana, a theoretical biologist,
examined -calculus and linear logic for
describing biological systems (¼1995).
Example “Textbook” Description
http://vcell.ndsu.nodak.edu/~christjo/vcell/animationSite/lacOperon/
Our Role
• Finding suitable abstractions for describing
computation is our specialty!
• Discovering/proving/checking properties of such
descriptions (i.e., programs) is also our specialty!
• Goal:
– Find a mathematical abstraction convenient for
describing, reasoning about, and simulating biological systems
• DNA → string over the alphabet {A,C,G,T}
– enables the use of string comparison algorithms
• Cellular Pathways → ?
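To make the DNA-as-string abstraction concrete, here is a minimal Python sketch of one string comparison algorithm, edit distance. The function names and the choice of edit distance are illustrative assumptions; the slides do not name a particular algorithm.

```python
# Illustrative sketch only: DNA modeled as a string over {A, C, G, T}.
ALPHABET = frozenset("ACGT")

def is_dna(seq: str) -> bool:
    """True if every character of seq is in the DNA alphabet."""
    return set(seq) <= ALPHABET

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))      # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                      # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]
```

Once the abstraction is fixed, generic string algorithms apply without any biology-specific code, which is the point of the slide.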
Outline
• Why is PL at all related to Biology?
• Previous Abstractions in Biology
• Possible Directions of Work
• PML
• Conclusion
Previous Abstractions
• Chemical kinetic models
– can derive differential equations
– well-studied, with considerable theoretical basis
– variables do not directly correspond with
biological entities
– may become difficult to see how multiple
equations relate to each other
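As a rough illustration of how a kinetic model turns into differential equations, the Python sketch below integrates mass-action equations for a single reversible binding step followed by catalysis (E + S ⇌ ES → E + P) with forward Euler. The reaction scheme, the parameter names `k`, `k_rev`, `kcat`, and all numeric values are assumptions chosen for the example, not taken from the talk.

```python
def simulate(e0, s0, k, k_rev, kcat, dt=1e-3, steps=20_000):
    """Forward-Euler integration of mass-action kinetics for
    E + S <-> ES -> E + P. Returns final (E, S, ES, P) concentrations."""
    e, s, es, p = e0, s0, 0.0, 0.0
    for _ in range(steps):
        bind = k * e * s        # rate of E + S -> ES
        unbind = k_rev * es     # rate of ES -> E + S
        cat = kcat * es         # rate of ES -> E + P
        e += (-bind + unbind + cat) * dt
        s += (-bind + unbind) * dt
        es += (bind - unbind - cat) * dt
        p += cat * dt
    return e, s, es, p

# Hypothetical parameter values, for illustration only.
e, s, es, p = simulate(e0=1.0, s0=10.0, k=1.0, k_rev=0.5, kcat=0.3)
```

Even in this tiny case the model is a set of coupled equations over concentrations; with many reactions it becomes hard to see which term belongs to which biological event, which is the drawback the slide points at.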
Previous Abstractions
• Pathway Databases (e.g., EcoCyc, KEGG)
– store information in a symbolic form and provide ways
to query the database
– behavior of biological entities not directly described
• Petri nets
– directed bipartite multigraph (P,T,E) of places,
transitions, and edges; places contain tokens
– place = molecular species, token = molecule, transition
= reaction
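The Petri net reading above (place = molecular species, token = molecule, transition = reaction) can be sketched in a few lines of Python. The class and the enzyme example are hypothetical illustrations, not part of any tool mentioned in the talk.

```python
from collections import Counter

class PetriNet:
    """Minimal Petri net: a marking (tokens per place) plus transitions."""

    def __init__(self, marking):
        self.marking = Counter(marking)   # place -> number of tokens
        self.transitions = {}             # name -> (inputs, outputs)

    def add_transition(self, name, inputs, outputs):
        self.transitions[name] = (Counter(inputs), Counter(outputs))

    def enabled(self, name):
        """A transition is enabled if every input place has enough tokens."""
        inputs, _ = self.transitions[name]
        return all(self.marking[place] >= n for place, n in inputs.items())

    def fire(self, name):
        """Consume input tokens and produce output tokens."""
        if not self.enabled(name):
            raise ValueError(f"transition {name!r} is not enabled")
        inputs, outputs = self.transitions[name]
        self.marking.subtract(inputs)
        self.marking.update(outputs)

# Hypothetical example: one enzyme molecule, two substrate molecules.
net = PetriNet({"E": 1, "S": 2})
net.add_transition("bind", {"E": 1, "S": 1}, {"ES": 1})
net.add_transition("catalyze", {"ES": 1}, {"E": 1, "P": 1})
net.fire("bind")
net.fire("catalyze")
```

Edge multiplicities in the multigraph correspond to the counts in `inputs` and `outputs`.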
Previous Abstractions
• Concurrent computational processes
– each biological entity is a process that may
carry some state and interacts with other
processes
– each process described by a “program”
– prior proposals based on process algebras,
such as the π-calculus [Regev et al. ’01]
Possible Directions of Work
• Biologically-motivated “process calculi”
– finding a suitable machine model to serve as a common
basis for describing biological systems
– Cardelli, Danos, Laneve, …
• High-level languages
– find suitable high-level languages to make descriptions
closer to informal ones
– [Chang and Sridharan ’03]
• Program analyses, simulation, and other tools
– simulation will likely be insufficient
• Creating models for obtaining results in biology
Modeling in the π-calculus
• The π-calculus is concise and compact, yet
powerful [Milner ’90]
– take this as the underlying machine model
– not looking for another machine model
• However, it is far too low-level for direct
modeling (ad-hoc structuring)
Informal Graphical Diagrams
[Figure: informal diagram of Enzyme and Protein with rate constants k, k-1, and kcat; annotations: sites, rules, domains]
PML: Enzyme
[Figure: PML code for Enzyme, not preserved; labels: Enzyme, bind_substrate; annotations: parameterized, declared in outer scope, interactions within the complex]
PML: Protein
[Figure: PML code for Protein, not preserved; labels: Protein, bind_substrate, bind_product]
PML: A Simple System
Larger Models
• Modeled a general description of ER
cotranslational-translocation
– unclearly or incompletely specified aspects
became apparent
• e.g., can the signal sequence and translocon bind
without SRP? Yes [Herskovits and Bibi ’00]
• Extended to model targeting ER membrane
with minor modifications
PML: Summary
• Domains
– set of mutually dependent binding sites
– defines, at the lowest level, the reactions a biological
entity can undergo
• Groups
– static structure for controlling namespace
– may represent a large biological entity
• large complex, a system, etc.
• [Compartments]
– special groups that define boundaries
• Semantics defined via a translation to the π-calculus
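The PML syntax itself is not preserved in this transcript, so the following Python sketch only mirrors the concepts summarized above: a domain as a set of binding sites, and a group as a static container that scopes names. All class names, fields, and the qualified-name scheme are hypothetical and are not PML.

```python
from dataclasses import dataclass, field

@dataclass
class Domain:
    """A set of binding sites belonging to one biological entity."""
    name: str
    sites: frozenset

@dataclass
class Group:
    """Static container for domains or nested groups; scopes their names."""
    name: str
    members: list = field(default_factory=list)
    is_compartment: bool = False   # a compartment is a special group

def qualified_sites(node, prefix=""):
    """List every binding site with its fully qualified, dotted name."""
    path = f"{prefix}.{node.name}" if prefix else node.name
    if isinstance(node, Domain):
        return sorted(f"{path}.{site}" for site in node.sites)
    names = []
    for member in node.members:
        names.extend(qualified_sites(member, path))
    return names

# Hypothetical system reusing names that appear on the slides.
system = Group("Cytosol", [
    Domain("Enzyme", frozenset({"bind_substrate"})),
    Domain("Protein", frozenset({"bind_substrate", "bind_product"})),
], is_compartment=True)
names = qualified_sites(system)
```

Two sites that share the local name `bind_substrate` stay distinct once the enclosing group and domain are part of the name, which is one way to read "static structure for controlling namespace".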
PML: Summary
• Benefits
– easier to write and understand because of a more
direct biological metaphor
– block structure for controlling namespace and
modularity
• Future Work
– naming?
– proximity of molecules
– integrating quantitative information (reaction rates, etc.)
– type-checking PML specifications
– exceptional / higher-level specifications
– graphical and simulation tools
Conclusion
• Systems biology needs a mathematical
foundation
– languages for describing concurrent computation seem
like a step in the right direction
• Status: all very preliminary
– biologically-motivated process calculi
• BioSPI, BioAmbients, Brane Calculus, …
– high-level languages
• PML
– analyses and tools (emerging)
– creating models for results in biology (emerging)
Conclusion
• Abundance of new challenges for PL
– language design: biologically-motivated
operators
– analysis and simulation: dealing with the scale
–…
• How much biology does one need to learn
to begin?
Bonus Slides
Compartments
• Critical part of biological pathways
– prevents interactions that would otherwise
occur
• Description of the behavior of a molecule
should not depend on the compartment
• Regev et al. use “private” channels in the π-calculus for both complexing and
compartmentalization
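A minimal Python sketch of the two requirements above: compartments block interactions that would otherwise occur, while the description of a molecule (here, its binding site) stays the same wherever it is placed. The `Molecule` type, the `can_bind` rule, and the placement of MolA and MolB are assumptions for illustration; this is neither PML nor the private-channel encoding of Regev et al.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Molecule:
    name: str
    site: str          # the single binding site this toy molecule exposes
    compartment: str   # where the molecule currently resides

def can_bind(a: Molecule, b: Molecule) -> bool:
    """Molecules interact only if their sites match AND they share a compartment."""
    return a.site == b.site and a.compartment == b.compartment

mol_a = Molecule("MolA", "bind_a", "ER")
mol_b = Molecule("MolB", "bind_a", "Cytosol")
# Moving MolB changes only its location, not its behavioral description.
moved_b = replace(mol_b, compartment="ER")
```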
PML: Simple Compartments Example
[Figure series (three slides), not preserved; labels: MolA, MolB, bind_a, ER, Cytosol, CytERBridge]
Semantics of PML
• Defined in terms of the π-calculus via two
translations
– from PML to CorePML
• “flattens” compartments, removes bridges
– from CorePML to the π-calculus
Syntax of PML
Example: Cotranslational Translocation
• Ribosome translates mRNA exposing a signal
sequence
• Signal sequence attracts SRP, stopping translation
• SRP receptor (on ER membrane) attracts SRP
• Signal sequence interacts with translocon; SRP
dissociates, resuming translation
• Signal peptidase cleaves the signal sequence in
the ER lumen, Hsc70 chaperones aid in protein
folding
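The steps above can be read as a small state machine. The Python sketch below encodes that reading with a transition table; the state and event names are hypothetical labels invented for this example, and the strictly linear ordering is a simplification (the talk notes elsewhere that the signal sequence and translocon can bind without SRP).

```python
# (current state, event) -> next state; names are illustrative only.
TRANSITIONS = {
    ("translating", "signal_exposed"): "signal_exposed",
    ("signal_exposed", "srp_binds"): "paused_by_srp",
    ("paused_by_srp", "srp_receptor_binds"): "docked_at_er",
    ("docked_at_er", "translocon_binds"): "translocating",
    ("translocating", "signal_cleaved"): "cleaved_in_lumen",
}

# States in which translation is stopped, per the bullets above:
# from SRP binding until SRP dissociates at the translocon.
PAUSED = {"paused_by_srp", "docked_at_er"}

def step(state, event):
    """Apply one event, rejecting anything the table does not allow."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} not allowed in state {state!r}")

def run(events, state="translating"):
    """Apply events in order and return the final state."""
    for event in events:
        state = step(state, event)
    return state

PATHWAY = ["signal_exposed", "srp_binds", "srp_receptor_binds",
           "translocon_binds", "signal_cleaved"]
```

Writing the description this way forces each ordering assumption to be explicit, which is how unclear or incompletely specified aspects tend to surface.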