Using Motion Planning to Map
Protein Folding Landscapes
Nancy M. Amato
Parasol Lab,Texas A&M University
http://parasol.tamu.edu
Paper Folding via Motion Planning
Polyhedron: 25 dof (10 samples, 2 sec)
Box: 12 (5) dof (218 samples, 3 sec)
Soccer Ball: 31 dof (10 samples, 6 sec)
Periscope: 11 dof (450 samples, 6 sec)
Protein Folding via Motion Planning
Folding Paths for Proteins G & L
Protein G
Protein L
Protein Folding

We are interested in the folding process
– how the protein folds to its native structure

Different from protein structure prediction
– Predict native structure given amino acid sequence
– Native 3D structure is important because it influences function
TTCCPSIVARSNFNVCRLPGTPEALCATYTGCIIIPGATCPGDYAN
Why Study Folding Pathways?

Importance of Studying Pathways
– insight into protein interactions & function
– may lead to better structure prediction algorithms
– diseases such as Alzheimer’s & Mad Cow are related to misfolded proteins
[Figure: prion protein, normal vs. misfolded form]

Computational Techniques Critical
– Hard to study experimentally (happens too fast)
– Can study folding for thousands of already solved structures
– Help guide/design future experiments
Folding Landscapes
[Figure: potential plotted over configuration space, with the native state marked]

Each conformation has a potential energy
– Native state is global minimum

Set of all conformations forms landscape
Shape of landscape reflects folding behavior
Different proteins → different landscapes → different folding behaviors
Using Motion Planning to Map
Folding Landscapes [RECOMB 01, 02, 04; PSB 03]
[Figure: potential plotted over configuration space, with a conformation and the native state marked]

Use Probabilistic Roadmap (PRM) method from motion planning to build roadmap
Roadmap approximates the folding landscape
– Characterizes the main features of landscape
– Can extract multiple folding pathways from roadmap
– Compute population kinetics for roadmap
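Once a roadmap exists, pulling a folding pathway out of it can be treated as a graph search. The sketch below is a minimal illustration under stated assumptions, not the lab's implementation: it assumes the roadmap is stored as a dictionary of non-negative edge weights (lower meaning more feasible) and uses Dijkstra's algorithm to find the most feasible path to the native state. The names `extract_pathway` and `toy`, and all weights, are hypothetical.

```python
import heapq


def extract_pathway(roadmap, start, native):
    """Return the minimum-total-weight path from start to native, or None.

    roadmap: dict mapping node -> {neighbor: edge_weight}, weights >= 0.
    """
    best = {start: 0.0}      # lowest known cost to reach each node
    parent = {}              # predecessor of each node on its best path
    queue = [(0.0, start)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == native:
            break
        if cost > best.get(node, float("inf")):
            continue         # stale queue entry, a cheaper route was found
        for neighbor, weight in roadmap.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                parent[neighbor] = node
                heapq.heappush(queue, (new_cost, neighbor))
    if native not in best:
        return None          # native state not reachable from start
    path = [native]
    while path[-1] != start:
        path.append(parent[path[-1]])
    return path[::-1]


# Hypothetical four-node roadmap; weights are made up for illustration.
toy = {
    "unfolded": {"a": 1.0, "b": 4.0},
    "a": {"unfolded": 1.0, "b": 1.0, "native": 5.0},
    "b": {"unfolded": 4.0, "a": 1.0, "native": 1.0},
    "native": {"a": 5.0, "b": 1.0},
}
print(extract_pathway(toy, "unfolded", "native"))
```

Calling the function from many different start nodes is one way a single roadmap could yield multiple pathways.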
Related Work
Approach | Folding landscape | Trajectory (path #) | Path quality | Time dependent (running time) | Folding kinetics | Native state needed
--- | --- | --- | --- | --- | --- | ---
Molecular Dynamics | No | Yes (1) | good | Yes (very long) | No | No
Monte Carlo | No | Yes (1) | good | Yes (very long) | No | No
Statistical Model | Yes | No | N/A | No (short) | Yes (only average) | Yes
Our PRM approach (RECOMB 01, 02, 04; PSB 03) | Yes | Yes (many) | approximate | No (short) | Yes, multiple kinetics | Yes

Other PRM-Based approaches for studying molecular motions
– Other work on protein folding
([Apaydin et al, ICRA’01,RECOMB’02])
– Ligand binding
([Singh, Latombe, Brutlag, ISMB’99], [Bayazit, Song, Amato, ICRA’01])
– RNA Folding (Tang, Kirkpatrick, Thomas, Song, Amato [RECOMB 04])
Modeling Proteins
Primary Structure: TTCCPSIVARSNFNVCRLPGTPEALCATYTGCIIIPGATCPGDYAN (each letter is one amino acid)
Secondary Structure: α helix + β sheet + variable loops = Tertiary Structure
We model an amino acid with 2 torsional degrees of freedom (phi and psi):
– Standard practice by biochemists
Roadmap Construction:
Node Generation
• Sample using known native state
– sample around it, gradually grow out
– generate conformations by randomly selecting phi/psi angles
• Criterion for accepting a node:
– Compute potential energy E of each node and retain it with probability: [equation not preserved in transcript]
[Figure: native state N, with a denser distribution of samples around the native state]
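The node generation step can be sketched as follows. This is an illustration only: the transcript does not preserve the acceptance formula, so the linear ramp between two thresholds used here is an assumption, as are the names `wrap_angle`, `accept_probability`, `sample_nodes`, `toy_energy`, the thresholds `e_min` and `e_max`, and the idea of "growing out" by increasing the Gaussian standard deviation. A conformation is a flat list of phi/psi angles in degrees, two per residue.

```python
import random


def wrap_angle(angle):
    """Map an angle in degrees into the range [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def accept_probability(energy, e_min, e_max):
    """Assumed rule: always keep low-energy nodes, never keep high-energy
    ones, and ramp linearly in between."""
    if energy <= e_min:
        return 1.0
    if energy >= e_max:
        return 0.0
    return (e_max - energy) / (e_max - e_min)


def sample_nodes(native, energy_fn, sigmas, per_sigma, e_min, e_max, rng):
    """Perturb the native angles with Gaussian noise of growing width and
    keep each candidate with an energy-dependent probability."""
    nodes = [list(native)]
    for sigma in sigmas:                      # small sigma first, then grow out
        for _ in range(per_sigma):
            candidate = [wrap_angle(a + rng.gauss(0.0, sigma)) for a in native]
            p = accept_probability(energy_fn(candidate), e_min, e_max)
            if rng.random() < p:
                nodes.append(candidate)
    return nodes


NATIVE = [-60.0, -45.0, -120.0, 130.0]        # hypothetical 2-residue native state


def toy_energy(conformation):
    """Stand-in potential: mean squared angular deviation from NATIVE."""
    diffs = [wrap_angle(a - b) for a, b in zip(conformation, NATIVE)]
    return sum(d * d for d in diffs) / len(diffs)


nodes = sample_nodes(NATIVE, toy_energy, sigmas=[5.0, 20.0, 60.0],
                     per_sigma=50, e_min=100.0, e_max=4000.0,
                     rng=random.Random(0))
print(len(nodes), "nodes kept")
```

Because acceptance falls off with energy and the noise starts small, the kept nodes are densest near the native state, which matches the distribution described on the slide.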
[Figure: Ramachandran plots for different sampling techniques: uniform sampling, Gaussian sampling, iterative Gaussian sampling]
[Figure: distributions for different types, potential energy vs. RMSD for roadmap nodes: all alpha, alpha + beta, all beta]
Roadmap Construction
Node Connection
1. Find k closest nodes for each roadmap node (k=20)
• use Euclidean distance
2. Assign edge weight to reflect energetic feasibility:
• intermediate conformations c1, c2, c3, …, cn lie along the edge from u to v
• Edge weight w(u,v) = f(E(c1), E(c2), …, E(cn))
• lower weight → more feasible
[Figure: roadmap around the native state, with numeric labels 1, 13, 152, 681]
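The two connection steps can be sketched as below. The slide leaves the function f unspecified, so the choice here is an assumption: each uphill energy step along the edge adds a Metropolis-style penalty ΔE/kT and downhill steps add nothing. The helper names (`k_nearest`, `interpolate`, `edge_weight`), the number of intermediate conformations, and the value of `kT` are all hypothetical, and the linear interpolation ignores angle wraparound for brevity.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two conformations (lists of angles)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def k_nearest(nodes, index, k):
    """Indices of the k nodes closest to nodes[index] (brute force)."""
    others = [j for j in range(len(nodes)) if j != index]
    others.sort(key=lambda j: euclidean(nodes[index], nodes[j]))
    return others[:k]


def interpolate(u, v, n):
    """n evenly spaced intermediate conformations c1..cn between u and v."""
    return [[a + (b - a) * t / (n + 1) for a, b in zip(u, v)]
            for t in range(1, n + 1)]


def edge_weight(u, v, energy_fn, n_steps=5, kT=1.0):
    """Assumed f: sum of uphill energy changes along u, c1..cn, v, in units
    of kT. Equivalent to summing -log(min(1, exp(-dE/kT))) per step."""
    chain = [u] + interpolate(u, v, n_steps) + [v]
    weight = 0.0
    for previous, current in zip(chain, chain[1:]):
        delta = energy_fn(current) - energy_fn(previous)
        weight += max(delta, 0.0) / kT
    return weight


def bowl(conformation):
    """Toy energy with its minimum at the origin."""
    return sum(a * a for a in conformation)


points = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [0.0, 2.0]]
print(k_nearest(points, 0, 2))                 # neighbors of the first node
print(edge_weight([2.0], [0.0], bowl))         # downhill edge
print(edge_weight([0.0], [2.0], bowl, n_steps=1))  # uphill edge
```

Note that this weight is direction-dependent: moving toward lower energy is cheap, moving away is costly, so a roadmap built this way would be a directed graph.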
PRMs for Protein Folding: Key Issues
• Energy Functions
– The degree to which the roadmap accurately reflects
folding landscape depends on the quality of energy
calculation.
– We use our own coarse potential (fast) and a well-known
all-atom potential (slow)
• Validation
– In [ICRA’01, RECOMB ’01, JCB ’02], results validated with
experimental results [Li & Woodward 1999].
One Folding Path of Protein A
A nice movie…. But so what?
Ribbon Model
B domain of staphylococcal protein A
Space-fill Model
Roadmap Analysis
Secondary Structure Formation Order
[RECOMB’01, JCB’02, RECOMB’02, JCB’03, PSB’03]
Order in which secondary structure forms during folding
helix
hairpin 1,2
Q: Which forms first?
Formation Time Calculation

Secondary structure has formed when x% of the native contacts are present
– native contact: less than 7 Å between Cα atoms in native state
[Figure: a structure with five native contacts, each labeled with the time step at which it forms: 10, 20, 30, 40, 50]
If we pick x% as 60%, then at time step 30 three contacts are present and the structure is considered formed
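The slide's worked example can be reproduced with a few lines. This is a minimal sketch that assumes every native contact of the structure forms at some point along the path; the function name `formation_time` and the integer `percent` parameter are illustrative choices.

```python
def formation_time(contact_times, percent):
    """Time step at which `percent` % of the native contacts are present.

    contact_times: time step at which each native contact forms.
    percent: integer threshold x, e.g. 60 for 60%.
    """
    # Number of contacts required, rounded up with integer arithmetic.
    needed = max(1, (percent * len(contact_times) + 99) // 100)
    # The structure is formed once the `needed`-th earliest contact appears.
    return sorted(contact_times)[needed - 1]


# Five contacts forming at steps 10, 30, 20, 40, 50, threshold 60%.
print(formation_time([10, 30, 20, 40, 50], 60))  # 30
```

With five contacts and a 60% threshold, three contacts are required, and the third one forms at step 30.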
Contact Map
A contact map is a triangular
matrix which identifies all the
native contacts among
residues
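A contact map of this kind can be computed directly from Cα coordinates. The sketch below uses the 7 Å cutoff from the formation-time definition; the function name `native_contacts`, the `min_separation` parameter (used to skip residues that are close simply because they are neighbors in the chain), and the toy coordinates are assumptions for illustration.

```python
import math


def native_contacts(ca_coords, cutoff=7.0, min_separation=2):
    """Return the set of residue pairs (i, j), i < j, whose C-alpha atoms
    are closer than `cutoff` angstroms. Only the upper triangle is stored,
    which is why the map is triangular."""
    contacts = set()
    n = len(ca_coords)
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(ca_coords[i], ca_coords[j]) < cutoff:
                contacts.add((i, j))
    return contacts


# Hypothetical coordinates: four residues in a line, a fifth folded back.
coords = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (7.6, 0.0, 0.0),
    (11.4, 0.0, 0.0),
    (3.8, 5.0, 0.0),
]
print(sorted(native_contacts(coords)))
```

Here only the fifth residue is within 7 Å of non-adjacent residues, so all contacts involve index 4.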
Contact Maps
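Secondary Structure Formation Order:
Timed Contact Map of a Path [JCB’02]
[Figure: timed contact map for protein G (domain B1); axes are residue #; contacts are labeled with the time step at which they form]
Formation times read from the map:
α: 114
β3-4: 131
β1-2: 135
β1-4 (region IV): average T = 142 (individual contact times shown range from 139 to 144)
Formation order: α, β3-4, β1-2, β1-4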
Secondary Structure Formation Order:
Validation Sample Summary
PDB | # of residues | # of orders | % of paths | Secondary structure formation order | Exp.
--- | --- | --- | --- | --- | ---
1GB1 | 56 | 2 | 66 | α, β3-4, β1-2, β1-4 | Agreed
 | | | 34 | α, β1-2, β3-4, β1-4 |
1BDD | 60 | 1 | 100 | α2, α3, α1, α2-α3, α1-α3 | Agreed
1COA | 64 | 2 | 90 | α, β3-4, β2-3, β1-4, α-β4 | Agreed
 | | | 10 | α, β3-4, β2-3, α-β4, β1-4 |
2AIT | 74 | 66 | 9.1 | β4-5, β1-2 … | Agreed
 | | | 7.4 | β1-2, β4-5 … |
1UBQ | 76 | 3 | 80 | α, β3-4, β1-2, β3-5, β1-5 | Agreed
 | | | 15 | β3-4, α, β1-2, β3-5, β1-5 |
1BRN | 110 | 4 | 75 | α1, α2, α3 … | Not sure
 | | | 8.3 | α1, α3, α2 … |
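The "% of paths" column could be produced by sorting each path's formation times into an order and tallying the orders across paths. The sketch below is illustrative: the function names are hypothetical, the first path uses the protein G formation times from the timed contact map (α 114, β3-4 131, β1-2 135, β1-4 142), and the other two paths are made up.

```python
from collections import Counter


def formation_order(times):
    """Secondary structure names sorted by formation time, earliest first."""
    return sorted(times, key=times.get)


def order_percentages(paths):
    """Percentage of paths that show each distinct formation order."""
    counts = Counter(tuple(formation_order(times)) for times in paths)
    total = sum(counts.values())
    return {order: 100.0 * count / total for order, count in counts.items()}


slide_path = {"alpha": 114, "beta3-4": 131, "beta1-2": 135, "beta1-4": 142}
made_up_paths = [
    {"alpha": 100, "beta3-4": 120, "beta1-2": 125, "beta1-4": 150},
    {"alpha": 90, "beta3-4": 140, "beta1-2": 110, "beta1-4": 160},
]

print(formation_order(slide_path))
for order, pct in order_percentages([slide_path] + made_up_paths).items():
    print(f"{pct:5.1f}%  {', '.join(order)}")
```

Ties in formation time are broken by dictionary order here; a real analysis would need an explicit rule for them.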
Detailed Study of Proteins G & L
[PSB’03]
[Figure: Protein L and Protein G]
• Protein G & Protein L
• Similar structure (1 helix, 2 beta hairpins), but only 15% sequence identity
• Fold differently
• Protein G: helix, beta 3-4, beta 1-2, beta 1-4 [Kuszewski et al. 1994, Orban et al. 1995]
• Protein L: helix, beta 1-2, beta 3-4, beta 1-4 [Yi & Baker 1996, Yi et al. 1997]
• Can our approach detect the difference? Yes!
• 75% of Protein G paths & 80% of Protein L paths have the “right” order
• Increases to 90% & 100%, resp., when using the all-atom potential
Helix and Beta Strands
Coarse Potential [PSB’03]
• Protein G (β3-4 forms first), over 2k paths analyzed
Analyze First x% Contacts (entries: % of paths)
Contacts | SS Formation Order | 20 | 40 | 60 | 80 | 100
--- | --- | --- | --- | --- | --- | ---
all | α, β3-… | 76 | 66 | 77 | 55 | 58
all | α, … | 23 | 34 | 23 | 45 | 42
hydrophobic | α, β3-… | 85 | 78 | 77 | 62 | 67
hydrophobic | α, β3-… | 11 | 11 | 9 | 8 | 8
hydrophobic | α, … | 4 | 10 | 14 | 29 | 24
• Protein L (β1-2 forms first), over 2k paths
Analyze First x% Contacts (entries: % of paths)
Contacts | SS Formation Order | 20 | 40 | 60 | 80 | 100
--- | --- | --- | --- | --- | --- | ---
all | α, … | 67 | 76 | 78 | 78 | 92
all | α, … | 15 | 4 | 4 | 4 | 4
all | α, … | 19 | 20 | 18 | 18 | 4
hydrophobic | α, … | 54 | 65 | 74 | 73 | 86
hydrophobic | α, … | 3 | 3 | 3 | 2 | 2
hydrophobic | α, … | 36 | 32 | 23 | 26 | 13
Helix and Beta Strands
All-atom Potential
• Protein G (β3-4 forms first)
Analyze First x% Contacts (entries: % of paths)
Contacts | SS Formation Order | 20 | 40 | 60 | 80 | 100
--- | --- | --- | --- | --- | --- | ---
all | α, β3-… | 79 | 79 | 74 | 82 | 90
all | α, … | 21 | 21 | 26 | 18 | 10
hydrophobic | α, β3-… | 77 | 74 | 71 | 77 | 81
hydrophobic | α, β1-… | 23 | 26 | 29 | 23 | 19
• Protein L (β1-2 forms first)
Analyze First x% Contacts (entries: % of paths)
Contacts | SS Formation Order | 20 | 40 | 60 | 80 | 100
--- | --- | --- | --- | --- | --- | ---
all | α, … | 100 | 100 | 100 | 100 | 100
hydrophobic | α, … | 99 | 100 | 99 | 99 | 99
hydrophobic | α, … | 1 | 0 | 1 | 1 | 1
Summary: PRM-Based Protein Folding
• PRM roadmaps approximate energy landscapes
• Efficiently produce multiple folding pathways
– Secondary structure formation order (e.g. G and L)
– better than trajectory-based simulation methods, such as Monte
Carlo, molecular dynamics
• Provide a good way to study folding kinetics
– multiple folding kinetics in same landscape (roadmap)
– natural way to study the statistical behavior of folding
– more realistic than statistical models (e.g. Lattice models, Baker’s
model PNAS’99, Munoz’s model, PNAS’99)
RNA Folding Results
X. Tang, B. Kirkpatrick, S. Thomas, G. Song
[RECOMB’04 ]
RNA energy landscape can be completely described by huge roadmaps.

Heuristics are used to approximate energy landscape using small roadmaps.

Our roadmaps contain many folding pathways.
[Figure: energy profile vs. folding steps]

Population kinetics analysis on the roadmaps shows that heuristic 1 can efficiently
describe the energy landscape using a small subset of nodes
[Figure: population vs. folding steps for three roadmaps]
Map1 (Complete): 142 Nodes
Map2 (Heuristic 1): 15 Nodes
Map3 (Heuristic 2): 33 Nodes
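Population kinetics on a roadmap can be illustrated with a discrete-time master equation: start with all population on one node and repeatedly redistribute it along the edges. The slides do not give the transition rule, so the Metropolis-style rule below is an assumption, as are the function names, the value of `kT`, and the three-node toy roadmap.

```python
import math


def transition_matrix(energies, neighbors, kT=1.0):
    """Row-stochastic matrix P, where P[i][j] is the chance of moving from
    node i to node j in one step (assumed Metropolis rule on the graph)."""
    n = len(energies)
    d_max = max(len(nbrs) for nbrs in neighbors)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in neighbors[i]:
            accept = min(1.0, math.exp(-(energies[j] - energies[i]) / kT))
            P[i][j] = accept / d_max
        P[i][i] = 1.0 - sum(P[i])   # remaining probability: stay put
    return P


def evolve(population, P, steps):
    """Apply p(t+1) = p(t) P for `steps` steps; return the full history."""
    n = len(population)
    history = [list(population)]
    for _ in range(steps):
        population = [sum(population[i] * P[i][j] for i in range(n))
                      for j in range(n)]
        history.append(population)
    return history


# Hypothetical chain: unfolded (0) - intermediate (1) - native (2).
energies = [3.0, 1.0, 0.0]
neighbors = [[1], [0, 2], [1]]
P = transition_matrix(energies, neighbors)
history = evolve([1.0, 0.0, 0.0], P, steps=200)
print([round(p, 3) for p in history[-1]])
```

Plotting each node's entry of `history` against the step index gives population curves of the kind shown in the figures; with this rule the long-time populations approach a Boltzmann distribution over the roadmap nodes.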
Ligand Binding
[IEEE ICRA`01]
• Docking: Find a configuration of the ligand near the protein that satisfies
geometric, electrostatic and chemical constraints
• PRM Approach (Singh, Latombe, Brutlag, 1999)
– rapidly explores high dimensional space
– We use OBPRM: better suited for generating conformations in binding site (near
protein surface)
• Haptic User interaction
– haptics (sense of touch) helps user understand molecular interaction
– User assists planner by suggesting promising regions, and planner will postprocess and ‘improve’
Contact Information
For more information, check out our
website:
http://parasol.tamu.edu/~amato/
Credits:
My students: Guang Song (now a Postdoc at Iowa
State), Shawna Thomas, Xinyu Tang
&
Ken Dill (UCSF) and Marty Scholtz (Texas A&M)