Physics and structure of biomacromolecules

Konstantin Zeldovich
LRB 1004, x62354
Protein structure
• PDB, the Protein Data Bank: ~63,000 structures
• Primary, secondary, tertiary, … structure
• Domains
• Methods: X-ray and NMR
• Computational approaches
• Diverse structures: from globular to knotted and intrinsically disordered, but a limited repertoire of ~1000 folds
Branden & Tooze, Introduction to protein structure
Interactions within a protein
• Van der Waals
• Hydrophobic forces
• Electrostatic
• Hydrogen bonds
• Role of solvent
• Hierarchy of energies (bond strength)
Many interactions of a similar energy scale (except chemical bonds).
Overall, a 300-residue protein has ΔG ~ 5 kcal/mol:
- per residue, a very small difference between folded and unfolded states
- a SUBTLE BALANCE
Hydrophobic interactions drive folding to the compact structure
Thermodynamics of folding
Methods: calorimetry, thermal or chemical denaturation
Small proteins fold in a two-state fashion; folding is reversible
[Figure: lysozyme heat capacity; free energy G along the reaction coordinate, showing the unfolded (U), transition, and native (N) states]
Privalov, J Chem Thermodyn 29: 447 (1997)
Kinetics of folding
For many proteins, folding rate is determined by their topology (contact order)
Contact order (CO) = average sequence separation between contacting residue pairs
Relative CO: normalized by chain length
However, newer research (C. R. Matthews lab) suggests strong outliers.
Plaxco et al, JMB 277:985 (1998);
Biochemistry 39:11177 (2000)
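The contact order definition can be sketched in a few lines. This is only an illustration: one point per residue, the distance cutoff, and the exclusion of chain neighbors (`min_separation`) are assumptions made here, not the exact protocol of the cited papers.

```python
import math

def contact_order(coords, cutoff=6.0, min_separation=2):
    """Return (absolute CO, relative CO) for one (x, y, z) point per residue.

    cutoff and min_separation are illustrative assumptions.
    """
    n = len(coords)
    total_separation = 0
    n_contacts = 0
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                total_separation += j - i   # sequence separation of this contact
                n_contacts += 1
    if n_contacts == 0:
        return 0.0, 0.0
    absolute = total_separation / n_contacts
    return absolute, absolute / n           # relative CO: normalized by chain length

# Hypothetical six-residue hairpin, 3.8 units between consecutive residues.
hairpin = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
           (7.6, 3.8, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
co, relative_co = contact_order(hairpin, cutoff=4.0)
print(co, relative_co)
```

In this toy hairpin the only contacts are the pairs (0, 5) and (1, 4), so the contacts are on average four residues apart along the chain.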
Most proteins are densely packed
Radius of gyration vs. chain length
$V \approx a^3 N$
$R_g^3 \sim V$
$R_g \sim a N^{1/3}$
All bacterial proteins from the PDB, June 2009
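A quick numerical check of the dense-packing scaling, as a sketch only: residues are placed on a filled cubic lattice (an idealized globule, not real PDB data), and the spacing a = 3.8 is an assumed value. If the chain is compact, the ratio R_g / (a N^(1/3)) should stay roughly constant as N grows.

```python
import math

def radius_of_gyration(coords):
    """Root-mean-square distance of the points from their centroid."""
    n = len(coords)
    center = [sum(c[k] for c in coords) / n for k in range(3)]
    return math.sqrt(sum(math.dist(c, center) ** 2 for c in coords) / n)

def compact_cube(k, a=3.8):
    """k*k*k points on a cubic lattice with spacing a (assumed residue size)."""
    return [(a * x, a * y, a * z)
            for x in range(k) for y in range(k) for z in range(k)]

for k in (3, 5, 7):
    n = k ** 3
    rg = radius_of_gyration(compact_cube(k))
    # The last column is R_g / (a * N^(1/3)); it should be nearly constant.
    print(n, round(rg, 2), round(rg / (3.8 * n ** (1 / 3)), 3))
```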
Anfinsen’s thermodynamic hypothesis
• Native state is entirely defined by sequence
• Native state is a minimum of free energy
– Unique
– Stable
– Kinetically accessible
All computational efforts depend on these ideas
Anfinsen, Science 181: 223 (1973)
How does sequence define structure?
• Protein is a heteropolymer
• How can a specific structure arise at all?
• Protein-like sequences and energy gap
• Folding landscape and “funnels”
Review papers:
Dill et al., Annu. Rev. Biophys. 2008 37:289-316
Shakhnovich, Chem. Rev. 2006 106:1559-1588
Onuchic, Luthey-Schulten, Wolynes, Annu. Rev. Phys. Chem. 1997 48:545-600
Toy models address basic questions
27-residue compact chain on 3x3x3 lattice
Conformational space is discrete: 103,346 structures
Pairwise contact potentials: only nearest neighbors interact
Simulations are very quick
Discrete conformational space -> we can calculate the energy of the toy protein in each and every possible configuration.
The configuration with the lowest energy is the native state
Lau & Dill, Macromolecules 22, 3986 (1989)
Shakhnovich & Gutin, J Chem Phys 93, 5967 (1990)
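The exhaustive-enumeration idea can be sketched on a smaller system. The example below uses a 3x3 square lattice and a 9-residue chain as a 2D stand-in for the 3x3x3 cube, so that it runs instantly. The two-letter H/P potential and the sequence are hypothetical, and walks related by symmetry or by reversing the chain are counted separately.

```python
from itertools import product

def compact_conformations(side=3):
    """Yield every self-avoiding walk that fills a side x side square lattice."""
    sites = set(product(range(side), repeat=2))

    def extend(path, visited):
        if len(path) == len(sites):
            yield tuple(path)
            return
        x, y = path[-1]
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if nxt in sites and nxt not in visited:
                path.append(nxt)
                visited.add(nxt)
                yield from extend(path, visited)
                visited.remove(nxt)
                path.pop()

    for start in sorted(sites):
        yield from extend([start], {start})

def energy(sequence, conformation, contact_energy):
    """Sum contact energies over lattice neighbors that are not chain neighbors."""
    position = {site: i for i, site in enumerate(conformation)}
    total = 0.0
    for (x, y), i in position.items():
        for nxt in ((x + 1, y), (x, y + 1)):   # visit each lattice bond once
            j = position.get(nxt)
            if j is not None and abs(i - j) > 1:
                total += contact_energy(sequence[i], sequence[j])
    return total

def hp_contact(a, b):
    # Hypothetical potential: only H-H contacts are favorable.
    return -1.0 if a == b == "H" else 0.0

sequence = "HPHPPHPPH"                          # hypothetical 9-mer
conformations = list(compact_conformations())
energies = sorted(energy(sequence, c, hp_contact) for c in conformations)
print(len(conformations), "walks; lowest energies:", energies[:4])
```

The lowest entry of `energies` plays the role of the native state. The same loop applies to the 27-mer, only with a 3D lattice and a longer enumeration.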
Proteins have a large energy gap
[Figure: energy spectra (E) of two 27-mer sequences, WEDNMIQAGWYCPLTRRHIFQFYCHFY and WHPCECQLLRYGNNDFRNLDMLFISFR]
$$P(T) = \frac{e^{-E_0/T}}{\sum_{i=0}^{103345} e^{-E_i/T}}$$
Gap!
Also, a sparse spectrum for low E
compact lattice 27-mers with 10,000 possible conformations
Energy gap leads to stability
What is the probability to find a protein in its native state?
$$P_N(T) = \frac{p_N}{\sum_{i=0}^{M} p_i} = \frac{e^{-E_N/T}}{\sum_{i=0}^{M} e^{-E_i/T}} = \frac{1}{\sum_{i=0}^{M} e^{-(E_i - E_N)/T}} = \frac{1}{1 + e^{-(E_1 - E_N)/T} + \dots}$$
Gap!
The larger the gap, the more populated the native state is compared to other states
[Figure: p_N vs. T for a protein and for a random polypeptide]
The P_N vs. T curve is roughly equivalent to a thermal denaturation curve measured by CD spectroscopy
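The effect of the gap on the native-state population can be checked numerically. This is a sketch with made-up spectra: both share the same 100 excited levels and differ only in how far the ground state lies below them. Temperature is in energy units (k_B = 1), as in the formula for P_N(T).

```python
import math

def native_probability(energies, temperature):
    """P_N(T) = 1 / sum_i exp(-(E_i - E_N) / T), with E_N the lowest energy."""
    e_native = min(energies)
    return 1.0 / sum(math.exp(-(e - e_native) / temperature) for e in energies)

# Hypothetical spectra: identical excited levels, different ground-state gap.
excited = [0.1 * k for k in range(100)]
gapped = [-3.0] + excited    # "protein-like": gap of 3 energy units
gapless = [-0.1] + excited   # "random": almost continuous spectrum

for t in (0.3, 0.6, 1.0):
    print(t,
          round(native_probability(gapped, t), 3),
          round(native_probability(gapless, t), 3))
```

At every temperature in this range the gapped spectrum keeps a much larger native-state population than the gapless one.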
Kinetics of folding and “funnels”
How does the protein find its native state?
Levinthal paradox: a brute-force search of all possible configurations would be outrageously long. In reality, proteins fold in milliseconds.
Answer: the native state must be kinetically accessible
Empirically (from simulations), a large gap is necessary for fast folding
The lower the energy, the more similar the conformations are.
Folding thus converges to the single native state
Dill et al., Annu. Rev. Biophys. 2008 37:289
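A back-of-the-envelope version of the Levinthal estimate is sketched below. All three numbers are illustrative assumptions, not values from the slides: 100 residues, 3 conformations per residue, and 10^13 conformations sampled per second.

```python
# Illustrative assumptions, not measured values.
residues = 100
states_per_residue = 3
sampling_rate = 1e13                      # conformations tried per second

conformations = states_per_residue ** residues
seconds = conformations / sampling_rate
years = seconds / (365.25 * 24 * 3600)
print(f"{conformations:.2e} conformations, about {years:.1e} years of random search")
```

Even with these generous assumptions the search time exceeds the age of the universe by many orders of magnitude, which is why folding cannot be a random search.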
To crystallize or to simulate?
To crystallize is hard, to sequence is cheap. Structure from sequence?
In a perfect world: knowing all of the interactions, find the conformation corresponding to the minimum energy. Voilà, this is the native state.
Practical challenges:
- Interactions are not known exactly
- Interactions with solvent
- Very large parameter space (# of bond angles ~ # of atoms ~ 10^5)
- Rugged energy landscape with deep local minima: search algorithms are inefficient
• Protein structure prediction
• Homology modeling vs molecular simulations
• Structural genomics
• CASP competition
Threading using energies
Given a set of structures, determine which one is the best match for the given sequence
Rationale: the number of folds is limited
Thread the sequence into each structure (possibly with gaps), then
evaluate the energy of amino acid contacts.
Select the threading which yields the lowest energy (cf. the gap)
Works well even at low sequence homology
Jones, Taylor, Thornton, Nature 1992
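A minimal sketch of gapless threading by energy. The fold library, the contact lists, and the hydrophobic-only potential are all hypothetical; real threading codes use empirical potentials and allow gaps.

```python
def threading_energy(sequence, contacts, contact_energy):
    """Energy of a sequence placed on a structure given as a list of contacts (i, j)."""
    return sum(contact_energy(sequence[i], sequence[j]) for i, j in contacts)

def best_fold(sequence, fold_library, contact_energy):
    """Thread the sequence onto every fold and return the lowest-energy one."""
    scores = {name: threading_energy(sequence, contacts, contact_energy)
              for name, contacts in fold_library.items()}
    return min(scores, key=scores.get), scores

HYDROPHOBIC = set("AVLIMFWC")

def toy_contact_energy(a, b):
    # Hypothetical potential: reward hydrophobic-hydrophobic contacts only.
    return -1.0 if a in HYDROPHOBIC and b in HYDROPHOBIC else 0.0

# Hypothetical two-fold library for an 8-residue chain.
fold_library = {
    "hairpin": [(0, 7), (1, 6), (2, 5)],
    "helix_like": [(0, 4), (1, 5), (2, 6), (3, 7)],
}
name, scores = best_fold("LKVTDVKI", fold_library, toy_contact_energy)
print(name, scores)
```

The difference between the best and the second-best score is the threading analogue of the energy gap.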
Threading using profiles
Profile: one row per position, one column per residue type
For each position, assess:
- secondary structure
- fraction polar
- buried area, …
Average over homologous sequences with known structures

position   A     C     D     E    …
1          32    84   -92    23
2          -6    87    34    -5
…          …
Create profiles for different folds (using known structures with homologous sequences)
For a given sequence with unknown structure, match it to all profiles (with gaps)
Select the profile with best score.
Bowie, Luthy, Eisenberg, Science 1991
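Profile matching without gaps reduces to a table lookup and a sum. In the sketch below the fold names and the second profile are invented for illustration, and the first profile only reuses the two example positions from the slide.

```python
def profile_score(sequence, profile, missing=0):
    """Gapless score: add the profile entry for each residue at its position.

    Assumes the sequence and the profile have the same length.
    """
    return sum(column.get(residue, missing)
               for residue, column in zip(sequence, profile))

def best_profile(sequence, profiles):
    """Score the sequence against every profile and return the best match."""
    scores = {fold: profile_score(sequence, profile)
              for fold, profile in profiles.items()}
    return max(scores, key=scores.get), scores

# One dict per position, keyed by residue type. "fold_2" is hypothetical.
profiles = {
    "fold_1": [{"A": 32, "C": 84, "D": -92, "E": 23},
               {"A": -6, "C": 87, "D": 34, "E": -5}],
    "fold_2": [{"A": -10, "C": 5, "D": 60, "E": 40},
               {"A": 50, "C": -20, "D": -30, "E": 10}],
}
fold, scores = best_profile("CD", profiles)
print(fold, scores)
```

Allowing gaps turns this sum into a dynamic-programming alignment, which is not shown here.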
Homology modeling
Pairwise sequence alignment with PDB (BLAST)
Match to multiple sequence alignment (PSI-BLAST)
Threading, or 3D template matching to PDB
Rigid-body assembly
Segment matching (aligning conserved atoms)
Satisfaction of spatial restraints
Fold correctness? (by sequence similarity?)
Stereochemistry
Solvent accessibility
Positions of charged and hydrophobic groups
…
Marti-Renom,… Sali, Annu. Rev. Biophys. Biomol. Struct. 2000. 29:291–325
ab initio structure prediction
Anfinsen’s hypothesis:
-native structure is entirely determined by the sequence
-native structure is a unique energy minimum
Assuming we know the interactions between the amino acids, can we just look for this minimum?
Karplus, Scheraga, …
Polymer modeling is extensively used in materials science.
Is it applicable to proteins?
Two main methods:
- molecular dynamics: deterministic, reflects dynamics
- Monte Carlo: stochastic, no dynamics
Force fields and potentials
How do we know the strength of each interaction between atoms in a protein?
Ab initio approach: quantum chemistry can calculate the electron density profiles, and thus the energy (isn’t a protein just one big Schroedinger equation?)
Potentials optimized to correctly predict known structures of small molecules
CHARMM, AMBER
Statistical approach: learn from the PDB by counting the contacts
Boltzmann law: $p_{ij} \propto e^{-U_{ij}/RT}$
Inverting: $U_{ij} = -RT \log p_{ij} = -RT \log \frac{N_{ij}}{N_i N_j}$
where $N_{ij}$ is the number of contacts and $N_i$, $N_j$ are the molar fractions
Training set must be carefully chosen: various folds, no homology, …
Miyazawa & Jernigan 1985, 1996
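The contact-counting recipe can be sketched as follows. This is a simplified illustration, not the Miyazawa-Jernigan procedure itself: contact counts are converted to fractions of ordered pairs so that they can be compared with the product of molar fractions, and the data are invented.

```python
import math
from collections import Counter

RT = 0.593  # kcal/mol, approximately room temperature

def contact_potential(contacts, composition):
    """U_ab = -RT log(observed contact fraction / (x_a * x_b)).

    contacts: list of (a, b) residue-type pairs seen in contact.
    composition: Counter with the number of residues of each type.
    """
    ordered = Counter()
    for a, b in contacts:          # count each contact in both orders
        ordered[(a, b)] += 1
        ordered[(b, a)] += 1
    total_contacts = sum(ordered.values())
    total_residues = sum(composition.values())
    potential = {}
    for (a, b), n_ab in ordered.items():
        observed = n_ab / total_contacts
        expected = (composition[a] / total_residues) * (composition[b] / total_residues)
        potential[(a, b)] = -RT * math.log(observed / expected)
    return potential

# Hypothetical training data: L-L contacts are over-represented.
contacts = [("L", "L")] * 6 + [("L", "K")] * 2 + [("K", "K")] * 2
composition = Counter({"L": 5, "K": 5})
potential = contact_potential(contacts, composition)
print({pair: round(u, 3) for pair, u in potential.items()})
```

Pairs seen in contact more often than expected from composition get a negative (attractive) energy, and under-represented pairs get a positive one.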
Molecular dynamics: F = ma for a while
For the i-th atom:
force $F_{ij} = -\frac{dU_{ij}(x_i - x_j)}{dx_i}$
$U_{ij} = U_{ij}^{bond} + U_{ij}^{VdW} + U_{ij}^{electr} + U_{ij}^{HB} + \dots$
$a_i = \frac{1}{m_i} \sum_{\text{all } j} F_{ij}$
$v_i = v_i + a_i \Delta t$
$x_i = x_i + v_i \Delta t$
[Figure: atoms i and j; trajectories x of all atoms vs. time]
Pros:
- Most detailed, most realistic
- True dynamics
Cons:
- Time-consuming
Main issue: needs Δt ~ 10^-12 s (picosecond) to reproduce bond vibrations, but folding occurs on a microsecond-to-seconds timescale, so at least 10^7 iterations are needed
Tools: AMBER, CHARMM, GROMACS, NAMD, …
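The update rule from this slide (first the velocity, then the position) can be tried on a toy system. The sketch below uses two atoms joined by a harmonic bond in one dimension. The spring constant, masses, and time step are arbitrary illustrative values in reduced units. Production packages typically use more careful integrators, such as leap-frog or velocity Verlet.

```python
def harmonic_force(x_i, x_j, k=1.0, r0=1.0):
    """Force on atom i from the bond U = k/2 * (|x_i - x_j| - r0)^2, in 1D."""
    r = x_i - x_j
    dist = abs(r)
    return -k * (dist - r0) * (r / dist)

def md_step(x, v, masses, dt):
    """One step: a_i = sum_j F_ij / m_i, then v_i += a_i dt, then x_i += v_i dt."""
    n = len(x)
    forces = [sum(harmonic_force(x[i], x[j]) for j in range(n) if j != i)
              for i in range(n)]
    v = [v[i] + forces[i] / masses[i] * dt for i in range(n)]
    x = [x[i] + v[i] * dt for i in range(n)]
    return x, v

def total_energy(x, v, masses, k=1.0, r0=1.0):
    """Kinetic plus bond energy for the two-atom toy system."""
    kinetic = sum(0.5 * m * vi ** 2 for m, vi in zip(masses, v))
    return kinetic + 0.5 * k * (abs(x[0] - x[1]) - r0) ** 2

# Start with the bond stretched from its rest length 1.0 to 1.5.
x, v, masses = [0.0, 1.5], [0.0, 0.0], [1.0, 1.0]
e0 = total_energy(x, v, masses)
for _ in range(10_000):
    x, v = md_step(x, v, masses, dt=0.01)
print(e0, total_energy(x, v, masses))
```

With a small time step the total energy stays close to its initial value. Increasing `dt` toward the vibration period makes the trajectory unstable, which is the origin of the timescale problem described on the slide.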
Applications of molecular dynamics
• Protein-ligand interactions
• Dynamics of protein folding
• Membrane proteins and ion channels
• Sidechain packing
D. E. Shaw Research has developed Anton, a dedicated supercomputer built from custom chips (ASIC and FPGA) and designed to run MD simulations much faster than commodity clusters.
Milliseconds are becoming accessible!
D.E.Shaw et al 2009, Proceedings of the ACM/IEEE
Conference on Supercomputing (SC09)
Monte-Carlo simulation
Sacrifices information about dynamics to better explore the full energy landscape
[Figure: energy landscape with a trial move from E_old to E_new]
Elementary step: make a trial move, and accept or reject the new configuration
- $E_{new} \le E_{old}$: always accept
- $E_{new} > E_{old}$: accept with probability $p = e^{-(E_{new} - E_{old})/k_B T}$
(Metropolis sampling)
Different conformations are visited with the same frequency as in molecular dynamics
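The Metropolis rule can be checked on a toy landscape. In this sketch the "conformations" are four discrete states on a ring with invented energies, and the trial move is a step to a random neighbor. With enough steps the visit frequencies should approach the Boltzmann weights.

```python
import math
import random

def metropolis_accept(e_old, e_new, kT, rng):
    """Always accept downhill moves; accept uphill moves with exp(-dE / kT)."""
    if e_new <= e_old:
        return True
    return rng.random() < math.exp(-(e_new - e_old) / kT)

def sample(energy, n_steps, kT, seed=0):
    """Random walk over a ring of discrete states; returns visit frequencies."""
    rng = random.Random(seed)
    n_states = len(energy)
    state = 0
    visits = [0] * n_states
    for _ in range(n_steps):
        trial = (state + rng.choice((-1, 1))) % n_states   # symmetric trial move
        if metropolis_accept(energy[state], energy[trial], kT, rng):
            state = trial
        visits[state] += 1      # a rejected move counts the old state again
    return [count / n_steps for count in visits]

energy = [0.0, 1.0, 2.0, 1.0]   # hypothetical four-state landscape
kT = 1.0
frequencies = sample(energy, n_steps=200_000, kT=kT)
weights = [math.exp(-e / kT) for e in energy]
boltzmann = [w / sum(weights) for w in weights]
print([round(f, 3) for f in frequencies])
print([round(b, 3) for b in boltzmann])
```

Counting the old state again after a rejected move is essential. Dropping those counts would bias the sampling toward high-energy states.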
Monte-Carlo simulation (cont’d)
Typical moves are rotations around bonds
- local move: rotation of one atom relative to its two neighbors
- global move: pivoting of the entire chain around a bond
Advantage over MD: no small/large timescale problem
However,
- no direct information about dynamics
- calculating rotations is expensive (trigonometry!)
Often used in coarse-grained simulations to explore large conformational space
and find basins of attraction (energy valleys).
If needed, these valleys can then be further explored by molecular dynamics
Tools: ProFASi
Hybrid techniques: I-TASSER
Wu, Skolnick, Zhang, BMC Biology 5:17 (2007)
Hybrid techniques: ROBETTA
Sequences parsed into putative domains
If homology is found, comparative modeling
If low homology, ab initio folding
Fragments from 3- or 9-residue fragment libraries are assembled
Selected decoys are clustered; cluster centroids are used as models
Sidechains are repacked by MC simulations using a rotamer library
Kim, Chivian, Baker, NAR 2004, vol. 32 W526–W531
Structural databases: SCOP, CATH
http://scop.mrc-lmb.cam.ac.uk/scop/
• Hierarchical structural classification
• Class: all-alpha, all-beta, alpha/beta, alpha+beta, multidomain, membrane, small
• Fold
• Superfamily
• Family
Murzin et al, JMB 247:536(1995)
http://www.cathdb.info/
• Hierarchical domain classification
• Class: mainly-alpha, mainly-beta and alpha-beta
• Architecture
• Topology (fold family)
• Homologous superfamily
Orengo et al, Structure 5:1093 (1997)
Tools & servers
PDB www.rcsb.org
Structure prediction servers and tools (just a few)
I-TASSER http://zhanglab.ccmb.med.umich.edu/I-TASSER/
ROBETTA http://robetta.bakerlab.org/
MODELLER http://salilab.org/modeller/
Molecular dynamics packages (general)
AMBER http://ambermd.org/
CHARMM http://www.charmm.org/
GROMACS http://www.gromacs.org/
NAMD http://www.ks.uiuc.edu/Research/namd/
Monte Carlo protein modeling
ProFASi http://cbbp.thep.lu.se/activities/profasi/
Structural biology software database
http://www.ks.uiuc.edu/Development/biosoftdb/