Talk Transcript
Means, Methods and Results in the
Statistical Mechanics of Polymeric Systems.
Toronto 21-22 June 2012
Exploring the Universe of
Protein Structures beyond the
Protein Data Bank
Flavio Seno
Dipartimento di Fisica e Astronomia
Università di Padova
Protein structures
What are their distinctive properties?
Secondary structures stabilized by hydrogen bonds
Folds: arrangements of secondary structures in space
There is a limited set of folds:
the same folds are used to perform different
functions
There is no macroscopic evolution of folds:
the same folds were discovered multiple separate times during the
course of evolution
~ 7000 structures (new sequences) determined every year
Platonic folds: intrinsic features of the order of
nature (Denton and Marshall, Nature 2002)
“SIMILARITY OF PROTEIN STRUCTURES IMPOSED BY
SOME PHYSICAL REGULARITIES” (Finkelstein-Ptitsyn 2002)
Are protein folds determined only by physical and geometrical laws
(as crystal structures are) and not by the chemistry of the amino-acid
sequence?
Is it possible to reproduce them in terms of general principles? Maybe
through a homopolymer that captures the main features common to all
the amino acids?
Are the observed folds in one-to-one correspondence with the whole
universe of possible folds?
If not, why? Is there a selection principle?
Minimal Coarse-Grained Model
T.X. Hoang, L. Marsella, A. Trovato, J.R. Banavar, A. Maritan, F.S. PNAS, vol 103, 6883 (2006)
Cα representation
• Excluded volume (self-avoiding tube)
• Hydrogen bonding geometric constraint
• Hydrophobic interaction
• Local bending penalty
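The ingredients listed above can be illustrated with a toy energy function for a Cα chain. This is a minimal sketch, not the published model: the hydrogen-bond geometric constraint is omitted, excluded volume is a simple pairwise hard core rather than the self-avoiding tube, and all parameter values (`HARD_CORE`, `ATTR_CUTOFF`, `EPS_ATTR`, `K_BEND`) are hypothetical choices made only for illustration.

```python
import numpy as np

# Hypothetical parameters for illustration only (not the values of the published model).
BOND = 3.8          # C-alpha to C-alpha distance used to build example chains (angstrom)
HARD_CORE = 3.0     # minimum allowed distance between non-bonded beads (angstrom)
ATTR_CUTOFF = 7.5   # range of the uniform pairwise attraction (angstrom)
EPS_ATTR = -1.0     # energy per attractive ("hydrophobic") contact
K_BEND = 0.5        # bending stiffness, multiplies (1 - cos theta) at each internal bead


def toy_energy(coords):
    """Energy of an (N, 3) C-alpha chain; hydrogen bonding is NOT modelled here."""
    n = len(coords)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    i, j = np.triu_indices(n, k=2)              # pairs that are not chain neighbours
    d = dist[i, j]
    if np.any(d < HARD_CORE):                   # excluded volume as a hard core
        return float("inf")
    e_attr = EPS_ATTR * np.count_nonzero(d < ATTR_CUTOFF)
    bonds = np.diff(coords, axis=0)
    unit = bonds / np.linalg.norm(bonds, axis=1, keepdims=True)
    cos_theta = np.sum(unit[:-1] * unit[1:], axis=1)   # cosine between consecutive bonds
    e_bend = K_BEND * np.sum(1.0 - cos_theta)   # zero for a straight chain
    return float(e_attr + e_bend)


straight = np.array([[BOND * k, 0.0, 0.0] for k in range(10)])
turn = np.array([[0, 0, 0], [BOND, 0, 0], [BOND, BOND, 0], [0, BOND, 0]], dtype=float)
print(toy_energy(straight), toy_energy(turn))   # 0.0 -2.0
```

A fully extended chain has no contacts and no bending, so its energy is zero; the compact turn gains three attractive contacts at the cost of two right-angle bends. The competition between these terms is what the phase diagram of the full model explores.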
Homopolypeptide ground-state phase diagram (figure): structures in the
‘marginally compact’ phase (compact + H-bonds) are protein-like
METADYNAMICS
A. Laio, M. Parrinello, Escaping free-energy minima,
Proc. Natl. Acad. Sci. USA 99, 12562 (2002)
HOW TO FIND STABLE MINIMA SEPARATED BY BARRIERS
THAT CANNOT BE CROSSED IN THE AVAILABLE SIMULATION TIME
THE METHOD IS BASED ON AN ARTIFICIAL
DYNAMICS (METADYNAMICS)
1) IDENTIFY COLLECTIVE VARIABLES S
WHICH ARE ASSUMED TO PROVIDE A
RELEVANT COARSE-GRAINED DESCRIPTION
OF THE SYSTEM
2) BIAS THE DYNAMICS ALONG THESE
VARIABLES
3) RUN SEVERAL MOLECULAR DYNAMICS SIMULATIONS IN PARALLEL,
EACH BIASED WITH A METADYNAMICS POTENTIAL
4) SWAP CONFIGURATIONS BETWEEN THE SIMULATIONS
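Steps 1 and 2 above can be illustrated on a one-dimensional toy problem. The sketch below is a minimal, single-replica version of metadynamics under stated assumptions: the collective variable is the coordinate s itself, the system is an overdamped Langevin particle in a hypothetical double-well potential U(s) = 5(s^2 - 1)^2, and the Gaussian height, width and deposition stride are arbitrary illustrative values. Steps 3 and 4 (parallel replicas and swaps) are not included.

```python
import numpy as np


def metadynamics_1d(n_steps=20000, dt=0.005, kT=1.0, height=0.2, sigma=0.2,
                    stride=50, seed=0):
    """Overdamped Langevin dynamics of s in U(s) = 5 (s^2 - 1)^2 plus a history-dependent bias.

    Every `stride` steps a Gaussian of the given height and width is added at the current s,
    so V(s) = sum_k height * exp(-(s - s_k)^2 / (2 sigma^2)) progressively fills visited minima.
    """
    rng = np.random.default_rng(seed)
    centers = np.empty(n_steps // stride + 1)
    n_dep = 0
    s = -1.0                                    # start in the left well
    traj = np.empty(n_steps)
    for step in range(n_steps):
        force = -20.0 * s * (s * s - 1.0)       # -dU/ds
        c = centers[:n_dep]
        g = height * np.exp(-(s - c) ** 2 / (2.0 * sigma ** 2))
        force += np.sum(g * (s - c)) / sigma ** 2   # -dV/ds from all deposited Gaussians
        s += force * dt + np.sqrt(2.0 * kT * dt) * rng.standard_normal()
        traj[step] = s
        if step % stride == 0:                  # deposit a new Gaussian at the current s
            centers[n_dep] = s
            n_dep += 1
    return traj, centers[:n_dep]


def bias(s_grid, centers, height=0.2, sigma=0.2):
    """Metadynamics bias V(s) evaluated on a grid of CV values."""
    return height * np.exp(-(s_grid[:, None] - centers[None, :]) ** 2
                           / (2.0 * sigma ** 2)).sum(axis=1)


traj, centers = metadynamics_1d()
grid = np.linspace(-1.5, 1.5, 7)
free_energy_estimate = -bias(grid, centers)     # F(s) is approximated by -V(s) + const
```

The walker starts in the left well; as Gaussians accumulate there, the bias lifts it over the barrier, which is the "escaping free-energy minima" idea. In the long-time limit the negative of the accumulated bias is the usual metadynamics estimate of the free energy along s, up to an additive constant and fluctuations of the order of the Gaussian height.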
ATOMISTIC MODEL
60-RESIDUE POLYVALINE (VAL60)
• Why valine? It is small, but not too small
• MD simulations with the AMBER force field
and the GROMACS package
• Bias-exchange METADYNAMICS with 6
replicas
• Six collective variables linked to secondary
structure elements
50 microseconds of molecular dynamics simulation
We generated an ensemble of 30000 all-atom conformations
SIGNIFICANT SECONDARY STRUCTURE CONTENT AND SMALL RADIUS OF GYRATION
We verified that they are also local minima for ALA60
Their structural quality resembles that of real proteins
(Figure panels: Ramachandran plot; H-bond energy; quality measure (G-factor)
computed with PROCHECK; fragment distance < 0.6 Å and < 0.7 Å)
FIRST RESULT
FINDING, BY ALL-ATOM MOLECULAR
DYNAMICS, A LIBRARY OF 30000
PROTEIN-LIKE STRUCTURES
http://datadryad.org/handle/10255/dryad.1922
RELATION BETWEEN VAL60
AND REAL PROTEINS
The Class, Architecture, Topology and Homologous
superfamily (CATH) protein structure classification is one of
the main databases providing a hierarchical classification of
protein domain structures.
300 FOLDS WITH LENGTH 40 < L < 75
SIMILARITY: TM-SCORE (Zhang, Skolnick 2005)
ALIGNMENT OF SECONDARY STRUCTURES, ALLOWING INSERTIONS AND
DELETIONS (COVERAGE)
MINIMIZATION OF THE RELATIVE DISTANCE
BETWEEN ALIGNED RESIDUES (RMSD)
SIMILARITY THRESHOLD: TM = 0.45
Example structures (figure): 1x9b, 1ib8, 1g29
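The two similarity measures can be sketched for the simplest case, in which the residue-to-residue alignment is already known and both chains have the same length. This is an illustrative sketch, not the TM-score or TM-align programs: those also search over alignments (insertions and deletions) and over superpositions, whereas here the TM-score is evaluated on the RMSD-optimal (Kabsch) superposition, so it generally underestimates the optimized value. The length scale d0 = 1.24 (L - 15)^(1/3) - 1.8, floored at 0.5 Å, is the standard TM-score normalization; the toy chain below is random, not a real structure.

```python
import numpy as np


def kabsch_superpose(mobile, target):
    """Rigidly rotate and translate `mobile` onto `target` (both (N, 3)), minimizing RMSD."""
    mob_c = mobile - mobile.mean(axis=0)
    tgt_c = target - target.mean(axis=0)
    h = mob_c.T @ tgt_c                                   # 3x3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = 1.0 if np.linalg.det(vt.T @ u.T) > 0 else -1.0    # exclude reflections
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return mob_c @ rot.T + target.mean(axis=0)


def rmsd_and_tm(model, target):
    """C-alpha RMSD and TM-score for residues that are ALREADY aligned one-to-one."""
    length = len(target)
    d0 = max(1.24 * np.cbrt(length - 15) - 1.8, 0.5)
    moved = kabsch_superpose(model, target)
    dist = np.linalg.norm(moved - target, axis=1)
    rmsd = float(np.sqrt(np.mean(dist ** 2)))
    tm = float(np.mean(1.0 / (1.0 + (dist / d0) ** 2)))  # normalized by target length
    return rmsd, tm


rng = np.random.default_rng(1)
target = np.cumsum(rng.normal(scale=2.0, size=(60, 3)), axis=0)   # random toy chain
a = 0.7
rot = np.array([[np.cos(a), -np.sin(a), 0.0], [np.sin(a), np.cos(a), 0.0], [0.0, 0.0, 1.0]])
model = target @ rot.T + np.array([5.0, -3.0, 2.0])              # rotated, shifted copy
print(rmsd_and_tm(model, target))
```

A rigidly moved copy gives RMSD close to 0 and TM close to 1. Unlike RMSD, each residue's contribution to TM-score saturates, so a few badly placed residues cannot dominate the score; under the threshold on this slide, pairs with TM ≥ 0.45 would count as similar.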
COMPARISON: VAL60 VS CATH (40 < L < 75, 300 FOLDS)
Example structure (figure): 1uxy
SECOND RESULT
THE COMPUTATIONAL SETUP
USED IN THIS WORK ALLOWS US
TO EXPLORE THE MAJORITY OF
THE FOLDS IN NATURE (AT
LEAST FOR THESE LENGTHS)
COMPARISON: POLYVAL VS CATH
NOT ALL VAL60 STRUCTURES ARE PRESENT IN CATH!
(Figure labels: TM = 0.45; VAL60: 7000; CATH: 300)
THIS MIGHT JUST DEPEND ON THE CHOSEN SIMILARITY THRESHOLD
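Whether the mismatch is a threshold artifact can be checked by recomputing the coverage for several thresholds. The sketch below assumes a precomputed similarity matrix `sim` (rows: VAL60 structures, columns: CATH domains, entries: TM-scores); the 4 x 3 matrix in the example is made-up toy data, not results from this work.

```python
import numpy as np


def coverage(sim, threshold):
    """sim[i, j] = similarity (e.g. TM-score) of library structure i and CATH domain j.

    Returns (fraction of CATH domains matched by at least one library structure,
             fraction of library structures matching at least one CATH domain).
    """
    hits = sim >= threshold
    return float(hits.any(axis=0).mean()), float(hits.any(axis=1).mean())


# Toy 4 x 3 matrix (made-up numbers, not data from this work).
sim = np.array([[0.80, 0.30, 0.20],
                [0.50, 0.40, 0.30],
                [0.30, 0.30, 0.35],
                [0.20, 0.60, 0.10]])
for t in (0.30, 0.45, 0.70):
    cath_cov, lib_cov = coverage(sim, t)
    print(f"TM >= {t:.2f}: CATH covered {cath_cov:.2f}, library matched {lib_cov:.2f}")
```

If the fraction of library structures with a CATH match stays well below 1 over a wide range of thresholds while CATH itself remains covered, the asymmetry is not just a consequence of one particular cutoff.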
DO STRUCTURAL DESCRIPTORS DISCRIMINATE BETWEEN
CATH AND VAL60?
CONTACT ORDER:
average sequence separation
between contacting residues
(related to folding rates; Plaxco,
Simons, Baker 1998)
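Contact order can be computed directly from Cα coordinates. This is a sketch under assumptions: contacts are defined as Cα pairs closer than `cutoff` and separated by at least `min_sep` residues (both hypothetical defaults; Plaxco, Simons and Baker defined contacts between heavy atoms). The function returns both the absolute contact order (the average sequence separation on this slide) and the relative contact order, which divides it by the chain length L.

```python
import numpy as np


def contact_order(coords, cutoff=8.0, min_sep=3):
    """Absolute and relative contact order of an (N, 3) C-alpha chain.

    cutoff (angstrom) and min_sep are illustrative, hypothetical defaults.
    """
    n = len(coords)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    i, j = np.triu_indices(n, k=min_sep)        # pairs with j - i >= min_sep
    sep = (j - i)[dist[i, j] < cutoff]          # sequence separations of contacting pairs
    if sep.size == 0:
        return 0.0, 0.0
    abs_co = float(sep.mean())                  # average sequence separation
    return abs_co, abs_co / n                   # relative CO divides by chain length L


square = np.array([[0, 0, 0], [3.8, 0, 0], [3.8, 3.8, 0], [0, 3.8, 0]], dtype=float)
print(contact_order(square, cutoff=6.0, min_sep=2))   # contacts (0,2), (0,3), (1,3)
```

Local structure such as helices contributes contacts with small separations and lowers the contact order, while contacts between distant parts of the chain raise it.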
- Real protein structures were selected
under a bias towards low CO
- Protein structures are selected to be
topologically less entangled
THIRD RESULT
THERE IS NO ONE-TO-ONE
CORRESPONDENCE BETWEEN THE PDB
LIBRARY AND THE ENSEMBLE OF
COMPACT STRUCTURES WITH
SIGNIFICANT SECONDARY
STRUCTURE CONTENT (VAL60)
SUMMARY
• THE VAL60 SET IS REPRESENTATIVE OF REAL PROTEINS
(PROTEIN FOLDS ARE SELECTED BY GEOMETRY AND SYMMETRY,
NOT BY THE CHEMISTRY OF THE SEQUENCE)
• KNOWN FOLDS FORM ONLY A SMALL FRACTION OF
THE FULL VAL60 DATABASE
• NATURAL FOLDS ARE CHARACTERIZED BY LOW
CONTACT ORDER
WHY?
• KINETIC ACCESSIBILITY
• HIGHER CO, HIGHER TENDENCY TO AGGREGATE?
APPLICATIONS
• REALISTIC DECOYS
• DESIGN NEW PROTEINS
• CHECK PREDICTIONS IN SYNTHETIC
BIOLOGY
• MODELS FOR MISFOLDED STRUCTURES
RELATED TO NEURODEGENERATIVE
DISEASES
COLLABORATORS
• PILAR COSSIO (NIH WASHINGTON)
• ALESSANDRO LAIO (SISSA TRIESTE)
• DANIELE GRANATA (SISSA TRIESTE)
• FABIO PIETRUCCI (CECAM – LAUSANNE)
• AMOS MARITAN (PADOVA)
• ANTONIO TROVATO (PADOVA)
PLoS Computational Biology 6, e1000957 (2010)
Scientific Reports 2, Art. No. 351 (2012)
CORRELATION BETWEEN POTENTIAL ENERGY AND
CONTACT ORDER FOR VAL60 AND ALA60 STRUCTURES
Similarity between the VAL60 and CATH databases
CATH and VAL60 are explored with equal probability
Distribution of the radius of gyration for the VAL60,
VAL60 + water, ALA60 and CATH 55–65 sets of structures.
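The radius of gyration compared in this figure can be computed as the root-mean-square distance of the atoms from their centroid. The sketch below assumes equal weights on Cα atoms (a mass-weighted version would differ slightly) and takes a hypothetical list `structures` of (N, 3) coordinate arrays.

```python
import numpy as np


def radius_of_gyration(coords):
    """Unweighted radius of gyration of an (N, 3) coordinate array."""
    centered = coords - coords.mean(axis=0)
    return float(np.sqrt(np.mean(np.sum(centered ** 2, axis=1))))


def rg_histogram(structures, bins=20):
    """Histogram of Rg over a list of (N, 3) arrays; returns (counts, bin_edges)."""
    values = np.array([radius_of_gyration(s) for s in structures])
    return np.histogram(values, bins=bins)
```

For example, the four corners of a square with side 2 are all at distance √2 from the centre, so `radius_of_gyration` returns √2 for them.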
Cα RMSD distributions for the 30,000 VAL60 and the
1500 ALA60 structures minimized by steepest descent (SD).
Probability of finding a structure in the VAL60
trajectory for different CO classes.
Number of independent structures
Bias-Exchange Metadynamics
S. Piana, A. Laio, A bias-exchange approach to protein folding, J. Phys. Chem. B 111, 4553 (2007)
IT IS AN APPROACH DESIGNED
FOR ACCELERATING RARE
EVENTS IN VERY COMPLEX
CASES, IN WHICH MORE THAN
2 OR 3 VARIABLES ARE
RELEVANT FOR THE PROCESS
1) List all the collective variables
2) Run several molecular dynamics simulations in parallel, each biased with a metadynamics potential
3) Swap configurations between the simulations
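Step 3 is usually implemented as a Metropolis move on the bias potentials, following the bias-exchange scheme of Piana and Laio: replicas a and b, with biases V_a and V_b, exchange their configurations x_a and x_b with probability min(1, exp[(V_a(x_a) + V_b(x_b) - V_a(x_b) - V_b(x_a))/kT]). A minimal sketch, in which the bias functions passed to the hypothetical helper `attempt_swap` are arbitrary callables:

```python
import math
import random


def bias_exchange_acceptance(v_a_xa, v_a_xb, v_b_xa, v_b_xb, kT):
    """Metropolis probability of swapping configurations x_a and x_b between replica a
    (bias V_a) and replica b (bias V_b); v_a_xb means V_a evaluated at x_b, etc."""
    delta = (v_a_xb + v_b_xa) - (v_a_xa + v_b_xb)   # change in total bias energy
    if delta <= 0.0:
        return 1.0                                   # downhill swaps are always accepted
    return math.exp(-delta / kT)


def attempt_swap(bias_a, bias_b, x_a, x_b, kT, rng):
    """Return the (possibly swapped) configurations of replicas a and b."""
    p = bias_exchange_acceptance(bias_a(x_a), bias_a(x_b), bias_b(x_a), bias_b(x_b), kT)
    if rng.random() < p:
        return x_b, x_a
    return x_a, x_b
```

Because each replica biases a different collective variable, a configuration that has escaped a minimum along one variable can be handed to a replica that then explores a different direction; the acceptance rule keeps the combined ensemble consistent.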
Are compact hydrogen-bonded polypeptide structures in one-to-one correspondence with protein structures
from the Protein Data Bank (PDB)?
Homopolypeptide (side chains represented by Cβ atoms) with a
minimal potential consisting of H-bonding, excluded volume,
and a uniform, pairwise attractive potential between side
chains.
YES!?
PNAS 103, 2605-2010 (2006)