Discovering Cellular Machinery

Download Report

Transcript Discovering Cellular Machinery

Understanding Science Through
the Lens of Computation
Richard M. Karp
Nov. 3, 2007
The Power of the Computational
Perspective
• Exposes the computational nature of natural
processes and provides a language for their
description.
• Brings to bear fundamental algorithmic concepts:
adversarial and probabilistic models, asymptotic
analysis, intractability, computational learning
theory, threshold behavior, fault tolerance, …
• Alters the worldviews of many scientific fields.
Computational Processes in the
Sciences
• Regulation of protein production,
metabolism and embryonic development
• Mechanisms of learning
• Molecular self-assembly
• Strategic behavior of companies
A Computational View of
Quantum Mechanics
• QM is the right setting for studying computation at
subatomic levels.
• Theoretical computer science provides a
mathematical model for a quantum computer, in
which the unit of information is the qubit, a
superposition of the values 0 and 1; n qubits exist
in 2n states at the same time.
• There is evidence that quantum computers are
more powerful than classical computers.
• The quest to realize quantum computation will test
the foundations of QM.
“Quantum Computation is as
much about testing Quantum
Physics as it is about building
powerful computers”
Umesh Vazirani
Links Between Statistical Physics
and Computer Science
• Both fields study how macroscopic
properties of large systems arise from local
interactions.
– Statistical physics: properties of water and
magnetic materials
– Computer science: global properties of World
Wide Web, structure of complex combinatorial
problems
Similarities of Models and
Methods
• Probabilistic models capture statistical
behavior of large, complex, heterogeneous
and incompletely known systems.
• Phase transitions in statistical physics have
close parallels with sharp thresholds in
computer science.
Areas of Convergence
• Constraint satisfaction problems
• Belief propagation and error-correcting
codes
• Markov Chain Monte Carlo
• Percolation and sensor networks
Computational Models of the
Web
• The Internet and the Web are
simultaneously computational, social and
economic. They support new modes of
interaction.
• Novel algorithmic problems: ranking
methods of search engines, reputation
systems, recommendation systems,
mechanism design.
Social Sciences and the Web
• The Web is a powerful laboratory for
studying social and economic systems as
computational processes.
• Insights from algorithmic game theory are
indispensable for understanding the new
markets and economic mechanisms that the
Internet has spawned.
“The Internet is an equilibrium,
we just have to identify the
game.”
Scott Shenker
Game Theory
Payoff Matrix
Left Middle Right
Top
0/0
1,2
2,1
Middle 2,1
0.0
1,2
Bottom 1,2
2,1
0,0
• Nash equilibrium: each player chooses
each pure strategy with probability 1/3
The Computational Lens on
Game Theory
• Finding a Nash equilibrium is an intractable
problem.
• But computational tractability is an
important modeling prerequisite.
• “If your laptop can’t find it, then neither can
the market.”
Kamal Jain
Computational Processes in
Biology
• Learning in neural networks
• Response of immune system to an invading
microbe
• Specialization of cells during embryonic
development
• Collective behavior of animal communities:
flocking of birds, self-organization of ant colonies
• Design of sensor-actuator control systems for
regulation of biological processes
• Evolution of species
Algorithmic Challenges in
Computational Molecular
Biology
Revolution in Biology
• Advances in computation and instrumentation
enable a quantitative characterization of biological
systems.
• Opportunity to advance understanding of
molecular processes of life and change the ways
we diagnose and treat disease.
• Multidisciplinary field: involves the biological,
physical, engineering and mathematical sciences.
Biological background
The eukaryotic cell
Goals of Computational
Molecular Biology
• Sequence and compare the genomes of
many organisms.
• Identify the genes and determine the
functions of the proteins they encode.
• Understand how genes, proteins and other
molecules act in concert to control cellular
processes.
Goals of Computational
Molecular Biology
• Trace the evolutionary history and
evolutionary relationships of existing
species.
• Understand the structure, function and
evolutionary history of proteins.
• Identify the associations between genetic
mutations and disease.
Regulation of Gene Expression
• Animals can be viewed as highly complex,
precisely regulated spatial and temporal
arrays of differential gene expression.
• Gene expression is regulated by a complex
network of interactions among proteins,
genomic DNA, RNA and chemicals within
the cell.
Levels of Regulation
• Genome: spells the names of the proteins.
• Transcription of genes to mRNA: regulated by
binding of transcription factors to DNA in control
regions of genes.
• Translation of mRNA into functioning proteins,
regulated by complex networks of protein-protein
and protein-RNA interactions, and by posttranslational modifications of proteins.
Levels of Regulation (Cont.)
• Regulation of metabolic processes: complex
network of chemical reactions catalyzed by
enzymes.
• Global phenotype such as disease: regulated
by interaction of many metabolic processes.
Regulatory Networks
“We can approach understanding how the whole
genome works by breaking it down into groups of
genes that interact strongly with each other. Once
researchers identify and understand these network
modules, the next step will be to figure out the
interactions within networks of networks, and so
on until we eventually understand how the whole
genome works, many years from now. ”
Garrett Odell
Key Research Areas
• Analysis of protein-DNA interactions: breaking
the cis-regulatory code.
“ Regulatory interactions mandated by circuitry
encoded in the genome determine whether each
gene is expressed in each cell, throughout
developmental space and time, and, if so, at what
amplitude.” Eric Davidson
• Analysis of protein-protein interactions:
identification of molecular machines and signal
transduction cascades.
Tools for Analysis
• Measurement of protein-DNA and protein-protein
interactions, and of mRNA production under
selected perturbed conditions.
• DNA sequence analysis to identify genes, their
regulatory regions and the transcription factor
binding sites within them.
• Phylogenetic analysis to identify regulatory
structures conserved across species.
• Classification of proteins according to structure
and function.
Regulatory Control of
Transcription
• E.H. Davidson Genomic Regulatory
Systems
• Regulatory interactions mandated by
circuitry encoded in the genome determine
whether each gene is expressed in each cell,
throughout developmental space and time
and, if so, at what amplitude.
The Ultimate Goal
• ``Portions of the endo16 cis-regulatory system of
Strongylocentrotus are to date the most
extensively explored of any, with respect to the
functional meaning of each interaction that takes
place within them. What emerges is almost
astounding: a network of logic interactions
programmed into the DNA sequence that amounts
essentially to a hardwired biological
computational device.
Analysis of Protein-DNA
Interactions
• Transcription is regulated by proteins called
transcription factors that bind to DNA near the
transcription start site of a gene and influence the
rate of transcription.
• Goals: identify the transcription factors,
characterize the sites they bind to in the genome,
and determine how they act in combination to
enhance or inhibit transcription. This information
is referred to as the cis-regulatory code.
Transcription Factors and
Binding Sites
Binding Motifs
• Motif: short sequence pattern recognized by
a transcription factor.
• Occurs repeatedly in the genome, but with
considerable stochastic variation. Some
positions are highly conserved, others
exhibit great variation.
• Certain combinations of motifs occur
repeatedly in clusters.
Occurrences of A Motif
5’- TCTCTCTCCACGGCTAATTAGGTGATCATGAAAAAATGAAAAATTCATGAGAAAAGAGTCAGACATCGAAACATACAT …HIS7
5’- ATGGCAGAATCACTTTAAAACGTGGCCCCACCCGCTGCACCCTGTGCATTTTGTACGTTACTGCGAAATGACTCAACG …ARO4
5’- CACATCCAACGAATCACCTCACCGTTATCGTGACTCACTTTCTTTCGCATCGCCGAAGTGCCATAAAAAATATTTTTT …ILV6
5’- TGCGAACAAAAGAGTCATTACAACGAGGAAATAGAAGAAAATGAAAAATTTTCGACAAAATGTATAGTCATTTCTATC …THR4
5’- ACAAAGGTACCTTCCTGGCCAATCTCACAGATTTAATATAGTAAATTGTCATGCATATGACTCATCCCGAACATGAAA …ARO1
5’- ATTGATTGACTCATTTTCCTCTGACTACTACCAGTTCAAAATGTTAGAGAAAAATAGAAAAGCAGAAAAAATAAATAA …HOM2
1:
2:
.
.
.
.
.
M:
AAAAGAGTCA
AAATGACTCA
AAGTGAGTCA
AAAAGAGTCA
GGATGAGTCA
AAATGAGTCA
GAATGAGTCA
AAAAGAGTCA
Informativeness:
2+∑bpbllog2pbl
5’- GGCGCCACAGTCCGCGTTTGGTTATCCGGCTGACTCATTCTGACTCTTTTTTGGAAAGTGTGGCATGTGCTTCACACA …PRO3
1YSA
Regulatory Module
• Set of mutually cooperating transcription factors
that can bind to the control regions of many genes
to enhance or inhibit transcription.
• Given a database of transcription factors and their
binding motifs, can identify such modules by
searching for sets of transcription factors whose
binding sites tend to co-occur in control regions.
Discovery of Conserved Protein
Networks
Signaling and regulatory pathways and
protein complexes consist of genes, proteins
and small molecules ``wired’’ together in a
complex network of intermolecular
interactions.
Two proteins interact if they bind to each
other or one of them binds to the control
regions of the other’s gene.
Conserved Complexes and
Pathways
Many complexes and pathways are conserved – they
have evolved over evolutionary time and occur, in
modified forms, in many organisms.
Our goal: using databases of protein-protein
interactions in several species, in conjunction
with data about protein sequence, structure,
function and expression, identify conserved
protein complexes and signaling pathways.
Discovery of conserved pathways and complexes
allows transfer of functional annotation and
prediction of interactions from one species tto
another.
Pairwise Network Alignment
Bacterial pathogen
(Helicobacter pylori)
http://www.pathblast.org
Baker’s yeast
(Saccharomyes cerevisiae)
Kelley et al. PNAS 2003
Sharan et al. RECOMB 2004
3-way Comparison
S. cerevisiae
• 4389 proteins
• 14319 interactions
C. elegans
• 2718 proteins
• 3926 interactions
D. melanogaster
• 7038 proteins
• 20720 interactions
Putative Complexes Conserved
in Yeast, Worm and Fly
173 complexes associated with:
DNA, RNA and phosphorus metabolism
intracellular transport
regulation of transcription
protein folding, synthesis and degradation
homeostasis
cell proliferation, development and growth
RNA localization
Future Work
• Identifying the proteins that interact to form
a molecular machine is only a first step. The
next, and far more difficult step, is to
construct detailed predictive dynamical
models of the behavior of molecular
machines, under both normal and perturbed
conditions.