No Slide Title

Transcript No Slide Title

Exploring landscapes . . .for protein folding, binding,
and fitness
energy
700 K replica
200 K replica
“important coordinates”
Effective potential
Exploring landscapes for protein folding
Important coordinates
• Effective potentials - the AGBNP all atom model
•Sampling - replica exchange molecular dynamics (REMD)
• Network models for polypeptide folding pathways and kinetics
Solvation Models
1990s
Explicit
2000s
Implicit
e, r, ...
•
Most detailed/(accurate)
• Continuum approximation
•
Thermodynamics requires
averaging over solvent
coordinates
• Based on solvent PMF
•
Computationally expensive #
degrees of freedom average
solvent reaction field
• Relative solvation free energies
from single point energy
calculations.
Typical Modern Implicit Solvent Model
G  Gnp  Gel
Non-Polar Component:
• Surface Area Model,
Gnp  A
Electrostatic Component:
• Continuum Dielectric Models
– Poisson-Boltzmann solvers (accurate but numerical and
slow).
– Generalized Born Models (faster, analytical).
The AGBNP Implicit Solvent Model
Analytical Generalized Born + Non-Polar
Parameter-free pairwise descreening
implementation.
Cavity/vdW dispersion
decomposition.
Combined with OPLS-AA: OPLS/AGBNP
Motivation:
• Applicable to small and large molecules, arbitrary functional groups:
• Suitable for MD sampling:
– Analytical with analytical gradients
• Computationally efficient
Gallicchio E., and R.M. Levy. J. Comp. Chem., 25, 479-499 (2004)
Generalized Born Models
- +
Charging free energy in linear dielectric
medium:
qi q j
G   1  1  1 
2  e in e w 
f (r )
ij ij ij
1/ 2
 2
 rij2

fij  rij  Bi B j exp   4 B B 

i j 



Bi is the Born radius of atom i defined by (in the Coulomb
field approximation):
qi2
qi2
i




1
1
1
1
1
1
Gsingle   

 

d 3r


2  e in e w  B
8  e in e w 
4
i
r

r
i
Vw

GB implementations differ mostly in the way that Born radii
are computed.
AGBNP: Pairwise Descreening Scheme
E. Gallicchio, R. Levy, J. Comp. Chem. 25, 479 (2004)
Born radii: rescaled pairwise descreening approximation:
1
1
  41  s j Qij
Bi Ri
j
i
Rescale according to self-volume of j:
sj 
V j (self)
Vj
Self-volume of j (Poincarè formula):
Vj (self )  V j  12
 Vjk  13  Vjkl 
k
kl
Gaussian functions gi to estimate overlap volumes:
V jkl   g j g k g l dV
j
Non-Polar Hydration Free Energy Estimator
Cavity and van der Waals dispersion decomposition
Gnp  Gcav  GvdW
Explicit solvent hydration free
energies of alkanes
1. SASA alone is unable to describe
hydration free energy of linear vs.
cyclic alkanes
2. Cavity component correlates well with
SASA
3. Solute-solvent Van der Waals energy
not directly correlated with SASA
depends on number of atoms
4. Hydration free energy results from a
balance between the opposing cavity
and van der Waals components
Gallicchio, E., M. Kubo, and R.M. Levy. J. Phys. Chem., 104, 6271-6285 (2000)
Replica exchange molecular dynamics
rough energy landscapes and distributed computing
700 K
MD
MD
MD
MD
MD
450 K
320 K
energy
200 K
Y. Sugita, Y. Okamoto
Chem. Phys. Let., 314,
261 (1999)
“important coordinates”
Replica exchange molecular dynamics
rough energy landscapes and distributed computing
MD
MD
MD
MD
MD
MD
450 K
320 K
200 K
700 K replica
walker 1
walker 2
walker 3
energy
replica
700 K
MD
walker 4
200 K replica
Y. Sugita, Y. Okamoto (1999)
Chem. Phys. Let., 314:261
“important coordinates”
Protein folding: REMD and kinetic network
models
• free energy surfaces of the GB1 peptide from
REM and comparison with experiment
• kinetic network models and folding pathways
Andrec M, Felts AK, Gallicchio E, Levy RM.. PNAS (2005) 102:6801.
F2
U2
F1
U1
• kinetic network model of REMD
(simulations of simulations)
Zheng W, Andrec M, Gallicchio E, Levy RM. PNAS (2007) 104:15340.
The -Hairpin of B1 Domain of Protein G
Folding nucleus of the B1 domain
Blanco, Serrano. Eur. J. Biochem. 1995, 230, 634.
Kobayashi, Honda, Yoshii, Munekata. Biochemistry 2000, 39, 6564.
Features of a small protein: stabilized by 1) formation of secondary structure
2) association of hydrophobic residues
Munoz, Thompson, Hofrichter, Eaton. Nature 1997, 390, 196.
Computational studies using Explicit and Implicit solvent models
Pande, PNAS 1999
Ma & Nussinov, JMB, 2000
Garcia & Sanbonmatsu, Proteins, 2001
Dinner,Lazaridis,Karplus,PNAS,1999
Pande, et al., JMB, 2001
Zhou & Berne, PNAS, 2002
The -Hairpin of B1 Domain of Protein G
Simple (surf area) nonpolar model
OPLS/AGBNP
The potential of mean force of the capped peptide.
A Felts, Y. Harano, E. Gallicchio, and R. Levy, Proteins, 56, 310 (2004)
The -Hairpin of B1 Domain of Protein G
Simple (surf area) nonpolar model
OPLS/AGBNP
-hairpin
-helix
G
> 90%
< 10%
~ 2 kcal/mol
The potential of mean force of the capped peptide.
A Felts, Y. Harano, E. Gallicchio, and R. Levy, Proteins, 56, 310 (2004)
Kinetic network models for folding
Andrec, Felts, Gallicchio & Levy (2005) PNAS, 102, 6801
Network nodes are snapshots from
multiple temperatures of a replica
exchange simulation.
800,000 nodes
7.4 billion edges
Tcold
Thot
Dynamical/kinetic considerations:
Transition rates (edges) are motivated by Kramers theory: transitions are allowed if there
is sufficient structural similarity, and forbidden otherwise.
Simulations are performed using the Gillespie algorithm for simulating Markov
processes on discrete states:
• Waiting time in a state is an exponential random variable with mean = 1/(Sj kij)
• Next state is chosen with probability proportional to kij
Equilibrium considerations:
Sufficiently long trajectories must reproduce WHAM results.
Connection between kinetic model and equilibrium populations
Equilibrium populations for temperature T0 are
preserved if for each pair of nodes (i, j) the ratio
of transition rates follows WHAM weighting:
node i from temperature
TAhaving energy Ei
node j from temperature TB
having energy Ej
where fA(0) and fB(0) are free energy weights
for the TA and TB simulations at reference
temperature T0
These weights are order-parameter
independent and will give correct PMFs
for any projection.
T-WHAM PMF at low temperature contains
information from high temperature simulations
The majority of beta-hairpin folding trajectories pass
through alpha helical intermediate states

t = 2500 units ≈ 50 µs


t = 9 units ≈ 180 ns

Fraction of hairpin conformation averaged over 4000 stochastic trajectories run at 300 K
and begun from an initial state ensemble equilibrated at 700 K.
91% of 4000 temperature-quenched stochastic trajectories begun from
high-energy coil states pass through states with -helical content
Andrec, Felts, Gallicchio & Levy (2005) PNAS, 102, 6801
Exploring stability and fitness landscapes: a
biophysical view of protein evolution
from DePristo, Weinreich & Hartl (2005) Nature Rev. Genetics 6:678
Approach:
• Analyze mutational patterns in sequence databases (information theory)
• Develop biophysical models
• Relate to structures
Example: Evolution of Drug Resistance
in HIV Protease
Study of 13,608 HIV-1
protease amino acid
sequences from the
Stanford HIV Drug
Resistance Database
(8,229 drug-naïve, 2,677
PI-monotherapy, 2,702
multi-drug therapy).
drug-naïve patients
patients treated with
1 protease inhibitor
patients treated with
2 or more protease
inhibitors
Pairwise and higher-order correlations among drugresistance mutations in HIV protease
residues
type
drug
contact?
distance
correlation
(φ value)
82-71
primaryaccessory
yes-no
17 Å
0.33
46-10
primaryaccessory
no-no
22 Å
0.31
10-71-82
triplet
???
10-46-71-82
quartet
????
20–32–46–48–53–54–58–74–82–90
exposed
to 2 drugs
exposed
to 6 drugs
10-mer
??????????
Sequence Correlations and Biophysics
Correlations reflect the distribution of protein stabilities:
folded sequences
sequence
correlations: f, Ic(2),
etc.
all possible sequences
G
• Narrow stability range implies sequence correlation
• Given sequence correlation, what can we say about stability and the fitness
landscape?
Distribution of charged residues in observed
sequences of HIV Protease
Goal: Relate the charge patterns (~ 1200 distinct patterns in 13,000 sequences)
to structure and stability
positive charges
6
8
9
10
11
12
13
14
15
16
17
18
7
1
11
36
1 31
18
1 4
1
1
negative charges
8
9
10
6
3
1
31
33
13
96
148 69
127 (WT)
171 120
82
127 60
22
17
7
2
4
1
4
3
1
11
12
8
29
5
1
1
Reduced model for electrostatics
• 3D structure of HIV Protease, charges placed on backbone and side chains of charged
residues
• Scoring uses The Generalized Born Model: AGBNP2
• Estimate stability as the difference in energies of observed sequences and random
sequences
Observed mutations increase electrostatic
stability of HIV protease
<Eelectrostatic> for:
probability
5
4
3
2
all mutations with net
charge (+3) and total
charge (19)
observed sequences
random sequences
total electrostatic energy (kcal/mol)
Conclusions
• Drug resistance mutations help HIV protease evade
drugs, but at the cost of stability and activity
• Evolution of accessory mutations compensate for this loss
of fitness
• Correlations amongst these mutations are complex and
involve higher-order interactions beyond just pairs of
residues (connected information)
• We are trying to develop biophysical models for these
mutational patterns which link sequences with structure,
and can explain e.g. the mutation patterns observed for
charged residues
Effective potential
Exploring landscapes . . . for protein folding, binding,
and fitness
Important coordinates
• Effective potentials for folding and binding (AGBNP)
Emilio Gallicchio
• Protein folding pathways, network models, and kinetics
Michael Andrec, Weihua Zheng, Tony Felts, E.G.
• Protein stability and fitness landscapes
Omar Hag, Michael Andrec, Alex Morozov
Effective potential
Exploring landscapes for protein folding and binding
using replica exchange simulations
Important coordinates
• The AGBNP all atom effective solvation potential & REMD
• Peptide free energy surfaces & folding pathways from all atom
simulations and network models
• Temp. dependence of folding: physical kinetics and replica
exchange kinetics with a network model
• Replica exchange on a 2-d continuous potential with an entropic
barrier to folding
Drug treatment and mutational
correlations
Higher-order
Higher-order
Given the observed distribution
interactions are 28%
of total correlation
interactions are 41%
of total correlation
we fit
to maximize the likelihood of the
observed sample and calculate their
Shannon entropies
The total contribution from correlation is
The contribution from 2-body interactions is
20–32–46–48–53–
54–58–74–82–90
10–20–33–36–46–
54–55–63–71–73–
74–82–84–90–93
Recovering Folding Kinetics using Replica Exchange
Simulations, a Kinetic Network model and
Effective Stochastic Dynamics
Weihua Zheng†, Michael Andrec‡, Emilio Gallicchio‡ and Ronald M. Levy‡

No Slide Title

Transcript No Slide Title

Directory