Summary of Protein and RNA Folding - Parasol Laboratory

Download Report

Transcript Summary of Protein and RNA Folding - Parasol Laboratory

Exploring Folding Landscapes
with Motion Planning Techniques
Bonnie Kirkpatrick
Montana State University
Dr. Nancy Amato
Guang Song
Xinyu Tang
Parallel Architectures, Algorithms,
and Optimizations Laboratory
Texas A&M University
Outline

Motivation: Biopolymers

Goal: Folding Landscapes

Method: Motion Planning

Application 1: Protein Folding

Application 2: RNA Folding
Motivation: Biopolymers
Protein and RNA Molecules

Protein and RNA molecules are a
complex 3-dimensional folding of a
sequence of bases.

Primary Structure
–
–
–
–

Sequence of bases
Each base is represented by a
letter of the alphabet
i.e. ACGUGCCAUCG
Obtained from experiment
Tertiary Structure
–
The sequence loops back on itself
and folds in 3-dimensions.
Tertiary representation of an
RNA molecule.
Tertiary Structure

Chemical bonds (or contacts) form
between complementary bases in close
proximity.

There are many possible conformations of
the primary sequence.
–
–

Example sequence: CACAGAGUGU
Two possible conformations are shown.
Potential energy calculations based on
number and types of bonds are used to
classify conformations.
–
–
The lowest energy conformation is known as
the native structure.
Conformations with few bonds and high
energy are referred to as unfolded.
Two possible
conformations of
the sequence.
Bonds are blue.
Goal: Folding Landscapes
The Folding Process (The Black Box)
Unfolded Conformation (high energy)
AGGCUACUGGGAGCCUUCUCCCC
Physical
Laws
cause
folding
Native Conformation (low energy)
Folding Landscapes



Description of the “black box”
A space in which every point
corresponds to a conformation
(or set of conformations) and
its associated potential energy
value (C-space).
A complete folding landscape
contains a point for every
possible conformation of a
given sequence.
Conformation
Space
Potential
Native State
Tetrahymena Ribozyme
Landscape
[Russell, Zhuang, Babcock, Millett,
Doniach, Chu, and Herschlag, 2002]
Folding Landscapes (cont.)

Conformational changes describe how a
molecule changes physically to fold from one
conformation to another
–
Continuous


–
Protein Folding Model
Bond angles change with continuous rotations
Discrete


RNA Folding Model
Bonds either exist or do not exist
Features of Folding Landscapes

Folding pathways consist of the set of
conformational changes a molecule is
likely to fold though when moving from
one conformation to another.
–

Native
State
Energy barriers are areas of the
landscape with high energy that separate
groups of conformations.
–

N to X to Y
Y is separated from X and N
Intermediate states are conformations
lying on the folding pathway represent
local minimums of potential.
–
Y and X
Mutant α mRNA fragment
[Chen and Dill, 2000]
A Protein Folding Pathway
unfolded
folded
A RNA Folding Pathway
Unfolded
Energy Barrier
Native
State
Phenylalanine tRNA [Hofacker, 1998]
Mapping Folding Landscapes
•
Existing techniques for mapping landscapes are
limited to relatively short sequences (~200
nucleotides).
•
A robotics motion planning technique called PRM
has successfully been applied to protein folding.
Method: Motion Planning
Motion Planning
(Basic) Motion Planning
(in a nutshell):
Motion Planning for
Foldable Objects:
Given a movable object, find a
sequence of valid configurations
that moves the object from the
start to the goal.
Given a foldable object, find a
valid folding sequence that
transforms the object from
one folded state to another.
start
goal
obstacles
[Kavraki, Svestka, and Latombe, 1996]
Probabilistic Roadmap Method (PRM)
Conformation space
Construct the roadmap:
1. Generate nodes.
2. Connect to form roadmap
The Roadmap is like
a net being laid
down on protein’s
potential landscape.
Potential
Native state
A conformation
Now the roadmap can be used:
1. To find a path
2. To extract multiple paths
Application 1: Protein Folding
Outline

Probabilistic Roadmap Methods (PRMs) for
Protein Folding
–
–

the native fold is known
bias sampling around native fold
Results
–
–
Protein folding landscapes
Secondary structure formation order and validation

–
timed contact map
Folding kinetics
[Song and Amato, 2001]
Model of a protein


amino acid: pair of phi/psi angles
protein: a sequence of amino acids.
– conformation node is:
PRM: Node Generation

Take advantage of the known native state.
–
–
–

map the potential landscape/funnel leading to it.
sample around it and gradually grow out.
generate conformations by randomly selecting phi/psi
angles
Criterion for accepting a node: Compute potential
energy E of each node and retain it with probability
P(E):
N
PRM: Node Generation


Start with native structure.
Gradually grow out.
Denser distribution
around native state
Native state
PRM: Roadmap Connection
1. Find k closest nodes for each
roadmap node
2. Assign edge weight to reflect
energetic feasibility:
lower weight  more feasible
[Singh, Latombe, Brutlag, 1999]
Native state
Energy Computation

Potential (ref. Levitt’83)
–
–

van der Waals + hydrogen bonds + disulphide bonds +
hydrophobic effect
All-atom model
Free Energy (ref. Fiebig & Dill ’93, Munoz & Eaton’99)
where
PRMs for Protein Folding: Key Issues

Validation
–

In RECOMB ‘01 (Song & Amato), our results
validated with hydrogen exchange experiments.
[Li & Woodward 1999]
Energy Functions
–
The degree to which the roadmap accurately
reflects folding landscape depends on the quality
of energy calculation.
Analysis of Landscape

Folding Potential Landscape

Secondary structure formation order
–
–

timed contact map
experimental validation
Studying Folding Kinetics
–
–
–
2-state folding kinetics
calculation of folding rates
identifying 2-state, 3-state, … k-state kinetics
Distributions for different types:
Potential Energy vs. RMSD for roadmap nodes
all alpha
alpha + beta
all beta
protein G (domain B1)
Timed Contact Map:
formation order for a Path
residue #
(IV:  1-4)
1-2
135
140 143
 1-4
142
140 143 140
141 142 144
139 143 143
Average T = 142

114
Formation order:
,  3-4,  1-2,  1-4
 3-4
131
Protein GB1 (56 amino acids)
—
1 alpha helix & 4 beta-strands
Validating Folding Pathways
Hydrogen Exchange Results
first helix, and beta 3-4
[Li & Woodward 1999]
Our Paths
80%: helix, beta 3-4, beta 1-2, beta 1-4
20%: helix, beta 1-2, beta 3-4, beta 1-4
• our paths are: from all the nodes with little
structure to the native state
• secondary structure formation order checked
on each path w/ timed contact map
Secondary structure formation order
and validation
PDB name
Num of
Residues
2nd structures
Comparison w/ Exp.
[Li & Woodward ’99]
1GB1
56
1 alpha + 4 beta
Agreed
1BDD
60
3 alpha
Agreed
1SHG
62
5 beta
N/a
1COA
64
1 alpha + 4beta
Agreed
1SRL
64
5 beta
N/a
1CSP
67
7 beta
N/a
1NYF
67
3 beta
N/a
1MJC
69
7 beta
N/a
2AIT
74
7 beta
N/a
1UBQ
76
1alpha + 5 beta
Agreed
1PKS
79
1 alpha + 5 beta
N/a
1PBA
81
3 alpha + 3 beta
N/a
2ABD
86
5 alpha
N/a
•Proteins primarily
from [Munoz & Eaton
PNAS’99] for
comparison purposes
•Contact us if you
want us to analyze
your proteins!
Folding kinetics:
statistical mechanical model




Proteins treated as statistical system
Define interactions => partition function
Free energy as a function of reaction coordinate (R)
Then decide folding kinetics and folding rate

[Munoz & Eaton, Alm & Baker. PNAS’99]

Assumption and limitation of statistical model
–
–
–

F
U
Very limited interactions to simplify partition function calculation
Assume selected reaction coordinate good (monotonically increasing)
Cannot provide folding trajectories
Strength: as a theoretical model, it is good for analysis
N
R
Free Energy Landscape:
2-state folding kinetics
our
roadmap
model
statistical
model
[Munoz & Eaton
PNAS’99]
Native Contacts
Blue, red and green lines are
three levels of approximation



Blue line: free energy average
Our method can produce similar results (plus more).
– 2-state folding kinetics indicated
Both plots ‘imply’ nativelikeness should always increase
Plots like these lose info because of averaging effect
Studying folding kinetics
at pathway level
A
B
cluster paths into several
groups
extract 2-state,3-state, …, kstate kinetics from same
roadmap
A
2-state
B
3-state
Average
2-state
not possible with statistical
mechanical models
Protein G
Studying folding kinetics
at pathway level
Native contacts
Free energy
• trajectories not
available from
statistical model
• native contact
not monotonically
increasing
•Diverse free
energy profiles
Protein Folding:
Conclusion & Future Work
• PRM roadmaps approximate folding landscapes
• Efficiently produce multiple folding pathways
– Secondary structure formation order
– better than trajectory-based simulation methods, such as
Monte Carlo, molecular dynamics
• Provide a good way to study folding kinetics
– multiple folding kinetics in same landscape (roadmap)
– natural way to study the statistical behavior of folding
– more realistic than statistical models (e.g. Lattice models,
Baker’s model PNAS’99, Munoz’s model, PNAS’99)
Application 2: RNA Folding
Outline






RNA Model using secondary structure
Conformation Space
Node Generation
Node Evaluation
Node Connection
Edge Weights
RNA Secondary Structure




Two-dimensional representation of the tertiary structure
Planar representation
Sufficient structural information
Pseudo knots are considered a tertiary structure, rather than a
secondary structure
Secondary Structure Formalized


A secondary structure conformation is specified by a set of
intra-chain contacts (bonds between base pairs) that follow
certain rules.
Given any two intra-chain contacts [i, j] with i < j and [k, l]
with k < l, then:
1) If i = k, then j = l
•
Each base can appear in only one contact pair
2) If k < j, then i < k < l < j
•
No pseudo-knots
Violates criteria (1)
Violates criteria (2)
C-space
PRM: Conformation Space

Let U be the set of every possible combination of contact pairs.
Where n is the number of possible contact pairs


Let C-space (the conformation space), C, be the sub-set of U
containing only valid secondary structures.
C-space is smaller than U, but is still very large.
–
–
–

Sequence: (ACGU)10
Length: 40 nucleotides
C-Space: 1.6x108 structures
Purpose: generate nodes in C-space that describe the space
without covering it
C-space
PRM: Node Generation

Random Node Generation Algorithm
–
–
–


Starting with an empty configuration, c, random contacts are
added to c one at a time.
Each step preserves the condition that c contains a valid set
of base pair contacts.
Contacts are added until there are no more contacts that do
not conflict with the contact set of c.
Every node generated has valid secondary structure
and is a member of C-space.
Since every generated node has the maximal
number of contacts, the sampling is biased toward
the area of C-space near the native state.
PRM: Node Evaluation

Evaluation of Nodes
–
–
–
Potential energy determines how good a node is.
Only add a node to the roadmap if it has a low
energy.
Probability of a node q being added to the
roadmap:
PRM: Node Connection

Given two nodes in C-Space, C1 and C2, find a path between
them consisting of a sequence of nodes:
{ C1 = S1, S2, …, Sn-1, Sn = C2 }
C1 = S1

S2
…
Sn-1
Sn = C2
The path must have the property that for each i, 1 < i < n, the
set of contact pair of Si differs from that of Si-1 by the application
of one transformation operation:
(1) open or (2) close
a single contact pair.
Node Connection (cont.)

There exists a path between any two nodes in
C-Space.

Not just any path will do; we want a good one.
Bad paths have high energy nodes in them.
How do we find the lowest energy path?


Node Connection (cont.)


more contacts  less potential energy
Heuristic: if a contact is opened by the
transition from one node to another, try to
close a contact in the next transition
c1 = s1:
..(.((..)))..
open
s2:
..(.(....))..
close
s3:
..(.((.).))..
open
s4:
..(..(.)..)..
close
C2 = s5:
..(.((.)).)..
Edge Weight



Depends on the nodes generated in the node
connection phase.
Difference in potential energy
ΔEi = E(si+1) – E(si)
Future Work

Analysis of the roadmap
–
–

Finding the low energy folding pathways
Shortest path algorithm
Validation
–
–
How do we know if our results agree with experimental results?
Proposal


–
Compare a fully enumerated roadmap to experimental folding rates
Compare a more sparse roadmap with a fully enumerated roadmap
Proposal


Solve the master equation using stacking pairs (they are
representative of all the dynamics) and our model
Compare our results with results from the Zhang and Chen’s statistical
mechanical model [2002]
References
Shi-Jie Chen and Ken A. Dill. Rna folding energy landscapes. PNAS, 97:646-651, 2000.
Ivo L. Hofacker. Rna secondary structures: A tractable model of biopolymer folding. J.Theor.Biol.,
212:35-46, 1998.
Ivo L. Hofacker Jan Cupal and Peter F. Stadler. Dynamic programming algorithm for the density of
states of rna secondry structures. Computer Science and Biology 96, 96:184-186, 1996.
L. Kavraki, P. Svestka, J. C. Latombe, and M. Overmars. Probabilistic roadmaps for path planning in
high-dimensional conguration spaces. IEEE Trans. Robot. Automat., 12(4):566-580, August 1996.
J. C. Latombe. Robot Motion Planning. Kluwer Academic Publishers, Boston, MA, 1991.
R. Li and C. Woodward. The hydrogen exchange core and protein folding. Protein Sci., 8:1571-1591,
1999.
John S. McCaskill. The equilibrium partition function and base pair binding probabilities for rna
secondary structure. Biopolymers, 29:1105-1119, 1990.
Ruth Nussinov, George Piecznik, Jerrold R. Griggs, and Danel J. Kleitman. Algorithms for loop
matching. SIAM J. Appl. Math., 35:68-82, 1972.
R. Russell, X. Zhuang, H. Babcock, I. Millet, S. Doniach, S. Chu, and D. Herschlag. Exploring the
folding landscape of a structured RNA. Proc. Natl. Acad. Sci., 99:155-60., 2002.
Proc. Natl. Acad. Sci. U.S.A. 99, 155-60.
D. Sanko and J.B. Kruskal. Time warps, string edits and macromolecules: the theory and practice of
sequence comparison. Addison Wesley, London, 1983.
A.P. Singh, J.C. Latombe, and D.L. Brutlag. A motion planning approach to exible ligand binding. In
7th Int. Conf. on Intelligent Systems for Molecular Biology (ISMB), pages 252-261, 1999.
G. Song and N. M. Amato. Using motion planning to study protein folding pathways. In Proc. Int. Conf.
Comput. Molecular Biology (RECOMB), pages 287-296, 2001.
Stefan Wuchty. Suboptimal secondary structures of rna. Master Thesis, 1998.