Mathematical Models in Molecular Biology

Harvey J. Greenberg
and
William L. Briggs
Mathematics Department
University of Colorado at Denver
What purpose does a mathematical model serve?
• Insight
– Identifying crucial dependencies
– Understanding dynamics
– Interaction effects
• Finding “best” experiments
– Guide to the most information for least cost ($, time)
– Learning (feedback) paradigm
• Ability to predict in silico
– Fundamental use of models
– Quality could be relative, rather than absolute
(measuring change could be accurate even if both predictions are off)
Some History
• Genetics – Statistics (Mendel, 1866)
• Population Genetics – Differential & difference eqns.
(Fisher, Sewall Wright, 1920s)
• Epidemiology – Differential eqns, statistics (1950s)
• Neurology – Networks (McCulloch-Pitts, 1943),
Partial Differential eqns (Hodgkin-Huxley, 1952)
• DNA segments & cloning – Graph theory (Benzer, 1959)
Human Genome Project
Genome for E. coli (1997)
– 4.7 million base pairs
Human genome published (2001)
– 3 billion base pairs
Exponential Growth in Databases
Protein Data Bank
GenBank
Databases double in less than 18 months
(outpacing Moore’s Law for growth of computer power)
Birth of a New Field
from Inevitable Marriage of Mathematics,
Computer Science, and Biosciences
Surge of data and computer power
Bioinformatics/Computational (Molecular) Biology
Problems
• Sequencing
• Homology
• Phylogenetics
• Assembly
• Gene finding
• Gene mapping
• Structure recognition
• Structure prediction
• Pathway inference
Math. Models & Comp. Methods
• Graph theory
• Combinatorics
• Differential equations
• Dynamical systems
• Information theory
• Neural networks
• Optimization
• Probability
• Statistics
… much more
in vivo → in vitro → in silico
Computer Science
So much to learn!
Life
Biochemistry
DNA/RNA
Evolution
Organisms
Genes
Cells
Genomics
Proteomics
Instruments
Opportunities galore!
Alignment Models
What
• DNA – fragments, chromosomes, genes
• RNA – coils, sheets, turns
• Proteins – sequences, structures
How
• Minimizing edit distance
• Maximizing similarity
(used by BLAST for database searches)
DNA Alignment
Simplistic distance measure = # replacements:
    GCTACTG
    CGTCACT
    D = 6
Other evolutionary events
– insertion/deletion:
    –GCTACTG
    CGTCACT–
    D = 2 + 2i
– reversal:
    –G CT ACTG
    CG TC ACT–
    D = 2i + r
More evolutionary events can be accounted for with more complex
mathematical scoring, leading to challenges in algorithm design.
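The first two scores in the DNA example can be checked with a short script. This is a minimal Python sketch, not the scoring used by BLAST or any particular tool: `replacement_distance` counts mismatched positions, and `edit_distance` is the standard dynamic program over replacements and insertions/deletions. The function names and the default unit costs (taking i = 1 in the slide's notation) are assumptions for illustration, and reversals are not modeled.

```python
def replacement_distance(a: str, b: str) -> int:
    """Count positions where two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def edit_distance(a: str, b: str, replace_cost: int = 1, indel_cost: int = 1) -> int:
    """Minimum total cost of replacements and insertions/deletions turning a into b."""
    # prev[j] = cost of turning the first i-1 letters of a into the first j letters of b
    prev = [j * indel_cost for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * indel_cost]
        for j in range(1, len(b) + 1):
            substitute = prev[j - 1] + (0 if a[i - 1] == b[j - 1] else replace_cost)
            delete = prev[j] + indel_cost
            insert = curr[j - 1] + indel_cost
            curr.append(min(substitute, delete, insert))
        prev = curr
    return prev[-1]


print(replacement_distance("GCTACTG", "CGTCACT"))  # 6
print(edit_distance("GCTACTG", "CGTCACT"))         # 4
```

With unit costs the sketch gives 6 for the replacement-only measure and 4 when insertions and deletions are allowed, matching D = 6 and D = 2 + 2i with i = 1.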
Protein Similarity at Native State
Contact map represents amino acid neighbors in native state
[Figure: two contact maps, one with 44 residues and 43 contacts, the other with 58 residues and 53 contacts; 31 shared contacts.]
Source: R. Carr, G. Lancia and S. Istrail, RECOMB 2001.
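Counting shared contacts for a given alignment is simple to state in code. The sketch below is an illustration only, with a hypothetical function name and toy data; it scores one fixed alignment and does not search for the best one, which is the hard part of the problem. It assumes a contact map is a set of residue-index pairs and an alignment is an order-preserving map from residues of one protein to residues of the other.

```python
def shared_contacts(contacts_a, contacts_b, alignment):
    """Count contacts of protein A whose aligned images are also contacts of protein B.

    contacts_a, contacts_b: iterables of (i, j) residue-index pairs.
    alignment: dict mapping residue indices of A to residue indices of B.
    """
    # An alignment must keep residues in the same order along both chains.
    images = [b for _, b in sorted(alignment.items())]
    if any(later <= earlier for earlier, later in zip(images, images[1:])):
        raise ValueError("alignment must preserve residue order")

    # Store contacts as sorted pairs so (i, j) and (j, i) are the same contact.
    set_a = {tuple(sorted(c)) for c in contacts_a}
    set_b = {tuple(sorted(c)) for c in contacts_b}

    count = 0
    for i, j in set_a:
        if i in alignment and j in alignment:
            image = tuple(sorted((alignment[i], alignment[j])))
            if image in set_b:
                count += 1
    return count


# Toy data (hypothetical, not taken from the figure).
toy_a = [(0, 2), (1, 3), (2, 4)]
toy_b = [(0, 3), (2, 5), (3, 6)]
print(shared_contacts(toy_a, toy_b, {0: 0, 1: 2, 2: 3, 3: 5, 4: 6}))  # 3
print(shared_contacts(toy_a, toy_b, {0: 0, 1: 1, 2: 2, 3: 3, 4: 4}))  # 0
```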
Protein Folding
Predict: Primary Structure → Tertiary Structure
Primary Structure = Sequence of amino acids
Tertiary Structure = Folded protein (native state)
Lattice Model
[Figure: lattice fold showing hydrophobic and hydrophilic residues, with hydrophobic contacts marked]
Score = # hydrophobic contacts
• Grossly oversimplified – yes, but
biological insights come from surprising folds, not from best predictions
• NP-hard – yes, but
approximation algorithms are getting better
• Mathematically complex – yes, but
new approaches are under development (e.g., symmetry exclusion)
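The lattice score itself is easy to compute once a fold is given; the difficulty lies in searching over folds. The following is a minimal sketch under stated assumptions: a 2D square lattice, a sequence written with "H" (hydrophobic) and "P" (hydrophilic), and a fold given as a list of lattice coordinates. The function name and the example fold are hypothetical.

```python
def hydrophobic_contacts(sequence, path):
    """Count H-H pairs that are lattice neighbors but not chain neighbors."""
    if len(sequence) != len(path):
        raise ValueError("need one lattice point per residue")
    if len(set(path)) != len(path):
        raise ValueError("fold must be self-avoiding")
    for (x1, y1), (x2, y2) in zip(path, path[1:]):
        if abs(x1 - x2) + abs(y1 - y2) != 1:
            raise ValueError("consecutive residues must be lattice neighbors")

    position = {point: k for k, point in enumerate(path)}
    score = 0
    for k, (x, y) in enumerate(path):
        if sequence[k] != "H":
            continue
        # Look only right and up so that each pair is counted once.
        for neighbor in ((x + 1, y), (x, y + 1)):
            m = position.get(neighbor)
            if m is not None and sequence[m] == "H" and abs(m - k) > 1:
                score += 1
    return score


square = [(0, 0), (1, 0), (1, 1), (0, 1)]   # chain folded into a unit square
line = [(0, 0), (1, 0), (2, 0), (3, 0)]     # chain fully extended
print(hydrophobic_contacts("HPPH", square))  # 1
print(hydrophobic_contacts("HPPH", line))    # 0
```

Folding the four-residue chain into a square brings its two hydrophobic ends together, which is the single contact counted; the extended chain has none.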
Phylogenetic Trees
Goal: understand evolutionary relations
(any scale – species to genes)
Models & Methods:
• Hierarchical clustering (of sequences)
• Maximum likelihood
• Maximum parsimony
Example: Phylogenetic tree of placental mammals with a marsupial as the root.
This tree used the 2,947 bp nuclear sequences, which were available for a wider
range of species than the longer 5,808 bp sequences (mixture of nuclear and
mitochondrial sequences). The letters at each branch point indicate a decreasing
likelihood, with “a” being the most likely rating. Blue arrow highlights the
location of the human branch.
Source: A.M. Campbell & L.J. Heyer, Discovering Genomics, Proteomics,
& Bioinformatics, 2003
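Of the three methods listed, maximum parsimony is the simplest to illustrate: for a fixed tree, count the fewest character changes needed to explain the sequences at the leaves, and prefer the tree with the lower count. The sketch below uses Fitch's algorithm on a rooted binary tree. The tree encoding (nested pairs of leaf names), the function names, and the four toy sequences are assumptions for illustration and are unrelated to the mammal data in the figure.

```python
def site_changes(tree, states):
    """Fitch's algorithm for one site.

    tree: a leaf name (str) or a pair (left_subtree, right_subtree).
    states: dict mapping leaf name to the character at this site.
    Returns (candidate ancestral states, minimum number of changes).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_states, left_cost = site_changes(tree[0], states)
    right_states, right_cost = site_changes(tree[1], states)
    common = left_states & right_states
    if common:
        return common, left_cost + right_cost
    # No shared state: one change is needed on a branch below this node.
    return left_states | right_states, left_cost + right_cost + 1


def parsimony_score(tree, sequences):
    """Sum the minimum number of changes over all sites of aligned sequences."""
    length = len(next(iter(sequences.values())))
    if any(len(seq) != length for seq in sequences.values()):
        raise ValueError("sequences must be aligned to equal length")
    return sum(
        site_changes(tree, {name: seq[k] for name, seq in sequences.items()})[1]
        for k in range(length)
    )


toy_sequences = {"A": "AAG", "B": "AAG", "C": "ACG", "D": "ACT"}
print(parsimony_score((("A", "B"), ("C", "D")), toy_sequences))  # 2
print(parsimony_score((("A", "C"), ("B", "D")), toy_sequences))  # 3
```

On this toy input the tree that groups A with B and C with D needs two changes, while the alternative grouping needs three, so parsimony prefers the first.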
Pathway Inference
Importance
• Discover cause of disease
• Find drug targets
• Predict drug side effects
• Find optimal drug dose
• Reduce animal models needed for testing
Mathematical Methods
• Boolean networks/Finite state machines
• Linear programming/Stoichiometry
• Logical/Integer programming
• Graph theory
• Differential equations
Ras-MAPK Cascade
(Boolean network of cell signaling)
Source: F. Schacherer
Equilibrium 4-cycle
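In a Boolean network, each node is on or off, all nodes update together, and the long-run behavior is either a steady state or a repeating cycle of states. The sketch below shows how such a cycle can be found by iterating until a state repeats. The two-node feedback loop is a hypothetical toy, not the Ras-MAPK network in the figure; it was chosen only because it settles into a cycle of length four.

```python
def find_attractor(update, state):
    """Iterate a synchronous Boolean update until a state repeats.

    update: function mapping a state (tuple of bools) to the next state.
    Returns the list of states in the cycle that the trajectory falls into;
    a list of length 1 is a steady state.
    """
    first_seen = {}
    trajectory = []
    while state not in first_seen:
        first_seen[state] = len(trajectory)
        trajectory.append(state)
        state = update(state)
    return trajectory[first_seen[state]:]


def toy_update(state):
    """Hypothetical negative feedback loop: y inhibits x, x activates y."""
    x, y = state
    return (not y, x)


def and_update(state):
    """Hypothetical network where both nodes need both inputs on."""
    x, y = state
    return (x and y, x and y)


cycle = find_attractor(toy_update, (False, False))
print(len(cycle))                                  # 4
print(find_attractor(and_update, (True, False)))   # [(False, False)]
```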
ODE Models
S-systems
$$\frac{dS}{dt} = -\frac{V_{\max} S}{K_M + S} = -\frac{dP}{dt}$$
Flux-Balance Analysis (FBA)
$$\frac{dx}{dt} = Av - b$$
Generalized Mass Action (GMA)
$$\frac{dx_i}{dt} = \sum_k r_{ik} \prod_j x_j^{f_{ijk}}$$
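The first rate law on this slide, with substrate S consumed at rate Vmax S / (KM + S) and product P formed at the same rate, can be integrated numerically in a few lines. The sketch below uses the classical fourth-order Runge-Kutta step in plain Python. The function names, parameter values, and step size are assumptions for illustration. The helper `integrated_form_residual` evaluates the closed-form relation KM ln(S0/S) + (S0 − S) = Vmax t, which the numerical solution should satisfy closely.

```python
import math


def simulate(s0, vmax, km, dt, steps):
    """Integrate dS/dt = -vmax*S/(km+S) and dP/dt = -dS/dt with RK4.

    Returns a list of (t, S, P) tuples, starting from S = s0 and P = 0.
    """
    def ds_dt(s):
        return -vmax * s / (km + s)

    s, p = s0, 0.0
    history = [(0.0, s, p)]
    for n in range(1, steps + 1):
        k1 = ds_dt(s)
        k2 = ds_dt(s + 0.5 * dt * k1)
        k3 = ds_dt(s + 0.5 * dt * k2)
        k4 = ds_dt(s + dt * k3)
        change = dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        s += change
        p -= change  # every unit of substrate lost becomes product
        history.append((n * dt, s, p))
    return history


def integrated_form_residual(s0, s, t, vmax, km):
    """Residual of km*ln(s0/s) + (s0 - s) = vmax*t; near zero for an accurate solution."""
    return km * math.log(s0 / s) + (s0 - s) - vmax * t


# Hypothetical parameter values, in arbitrary units.
history = simulate(s0=10.0, vmax=1.0, km=2.0, dt=0.1, steps=100)
t_end, s_end, p_end = history[-1]
print(t_end, s_end, p_end)
print(integrated_form_residual(10.0, s_end, t_end, 1.0, 2.0))
```

Because the two equations differ only in sign, S + P stays constant along the trajectory, which gives a second quick check on the integration.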
Optimization Models
Objectives to set phenotype range:
• maximize growth
• minimize by-product production
• minimize mass nutrient uptake
Objectives to filter pathways:
• maximize reliability
• minimize number of reactions
• minimize gene regulation
Constraints:
• Stoichiometric equations: $Av = b$ ($v_j$ = flux of reaction $j$)
• Flux bounds: $L \le v \le U$
• Logical: conditional inclusion/exclusion
Mixed Integer Programming Model
optimize $cv + dx$ : $Av = b$, $L_j x_j \le v_j \le U_j x_j$, $x_j \in \{0, 1\}$
$x_j = 0 \Rightarrow$ reaction $j$ suppressed
inhibit pathway $P$: $\sum_{j \in P} x_j \le |P| - 1$
Turn off one member of pathway
(can choose, by some criteria)
Extends to include multiple gene regulation, with arbitrary
logical conditions to determine forced expressions and inhibitions.
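The constraints of the mixed integer model can be made concrete by checking a candidate solution against them. The sketch below does only that; it is not a solver, and a real model would be passed to a mixed integer programming package. It assumes that "inhibit pathway P" means at least one reaction in P is switched off, following the slide's "turn off one member of pathway". The function name and the four-reaction toy network are hypothetical.

```python
def violated_constraints(A, b, v, x, lower, upper, inhibited, tol=1e-9):
    """List the constraints that a candidate (v, x) violates; empty means feasible.

    A, b: stoichiometric matrix (list of rows) and right-hand side.
    v: fluxes; x: 0/1 switches, one per reaction.
    lower, upper: flux bounds that apply when a reaction is switched on.
    inhibited: dict mapping a pathway name to the reaction indices in it.
    """
    problems = []
    for i, row in enumerate(A):
        if abs(sum(a * vj for a, vj in zip(row, v)) - b[i]) > tol:
            problems.append(f"stoichiometry row {i}")
    for j, (vj, xj) in enumerate(zip(v, x)):
        if xj not in (0, 1):
            problems.append(f"x[{j}] is not binary")
        elif not (lower[j] * xj - tol <= vj <= upper[j] * xj + tol):
            problems.append(f"flux bound {j}")  # includes v_j != 0 when x_j = 0
    for name, members in inhibited.items():
        if sum(x[j] for j in members) > len(members) - 1:
            problems.append(f"pathway {name} not inhibited")
    return problems


# Toy network: reaction 0 imports A, reactions 1 and 2 both convert A to B,
# reaction 3 exports B. Rows are the steady-state balances for A and B.
A = [[1, -1, -1, 0],
     [0, 1, 1, -1]]
b = [0, 0]
lower = [0, 0, 0, 0]
upper = [10, 6, 6, 10]
inhibited = {"via reaction 1": [0, 1, 3]}

print(violated_constraints(A, b, [6, 0, 6, 6], [1, 0, 1, 1], lower, upper, inhibited))
print(violated_constraints(A, b, [10, 4, 6, 10], [1, 1, 1, 1], lower, upper, inhibited))
```

The first candidate switches off reaction 1 and routes all flux through reaction 2, so it satisfies every constraint; the second leaves the whole pathway on and is reported as not inhibited.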
Frontiers
• Better models
– Scope (depth; breadth)
– Flexibility (manipulate structures, parameters)
– Features (fragility, uncertainty)
• Better algorithms
– Scalability (parallel)
– Robustness
– Greater complexity & size
• Analysis support
– Visualization
– Structural analysis
– Simplification