Protein Structure Prediction
[Based on Structural Bioinformatics, section VII]
Predicting protein 3d structure
Goal: 3d structure from 1d sequence
What kind of fold may the given sequence adopt?
• An existing fold: fold recognition, comparative modeling
• A new fold: ab-initio
Measuring progress
CASP – Critical Assessment of Structure Prediction
CAFASP – Critical Assessment of Fully Automated Structure Prediction
Targets: unpublished NMR or X-ray structures
Goal: predict target 3d structure and submit it for independent and comparative review
What Forces Hold the Structure?
• Hydrogen bonds
• Charge-charge interactions: positively charged groups prefer to be situated against negatively charged groups
• Hydrophobic effect
• Disulfide bonds: S-S bonds between cysteine residues
Homology modeling
Based on two major observations:
1. The structure of a protein is uniquely defined by its amino acid sequence.
2. Similar sequences adopt practically identical structures; distantly related sequences still fold into similar structures.
Growth of the Protein Data Bank
Fraction of New Folds
Two zones of sequence alignment
[Rost, Protein Eng. 1999]
The 7 steps to homology modeling
1. Template recognition and initial alignment
― BLAST, FASTA
2. Alignment correction
― Better alignment, MSA
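Both template recognition and alignment correction depend on judging how similar the query is to a candidate template. A minimal sketch, assuming the alignment has already been produced (for example by BLAST or FASTA) and is given as two equal-length gapped strings; the function name `percent_identity` and the example strings are hypothetical, and the denominator used here (columns where both sequences have a residue) is only one of several conventions:

```python
def percent_identity(aligned_query, aligned_template, gap="-"):
    """Percent of identical residues over columns where neither sequence has a gap."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns in which both sequences contribute a residue.
    pairs = [
        (q, t)
        for q, t in zip(aligned_query, aligned_template)
        if q != gap and t != gap
    ]
    if not pairs:
        return 0.0
    identical = sum(1 for q, t in pairs if q == t)
    return 100.0 * identical / len(pairs)


# Hypothetical toy alignment: 4 residue-residue columns, 3 of them identical.
identity = percent_identity("ACD-E", "ACEFE")
```

A low value on a short alignment is a hint that the alignment needs correction, or that fold recognition is a better route than comparative modeling.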
The 7 steps to homology modeling
3. Backbone generation
― Copy backbone atoms [and side-chains of conserved residues]
4. Loop modeling
― Knowledge based
― Energy based
The 7 steps to homology modeling
5. Side-chain modeling
― Rotamer: a low energy side-chain conformation
― Rotamer library [backbone independent, dependent]
― HUGE search space [~5^N]
High accuracy for residues in the hydrophobic core [90%], much lower for residues on the surface [50%]
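To see why the rotamer search space is called huge, the sketch below assumes about 5 rotamers per residue and N residues, so that exhaustive search visits 5^N combinations. The energy function `toy_energy` is hypothetical and stands in for a real side-chain scoring function; brute force is shown only because it is feasible for very small N.

```python
from itertools import product


def search_space_size(n_residues, rotamers_per_residue=5):
    """Number of rotamer combinations when every residue picks independently."""
    return rotamers_per_residue ** n_residues


def best_assignment(n_residues, energy, rotamers_per_residue=5):
    """Exhaustively enumerate all combinations and return the lowest-energy one."""
    choices = range(rotamers_per_residue)
    return min(product(choices, repeat=n_residues), key=energy)


def toy_energy(assignment):
    """Hypothetical energy: each residue prefers rotamer 2, neighbors clash if equal."""
    self_term = sum((r - 2) ** 2 for r in assignment)
    clash_term = sum(1 for a, b in zip(assignment, assignment[1:]) if a == b)
    return self_term + clash_term


small = search_space_size(3)        # feasible to enumerate
large = search_space_size(100)      # far too many combinations to enumerate
best = best_assignment(3, toy_energy)
```

The clash term is what makes the problem combinatorial: without interactions between residues, each side-chain could be optimized on its own.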
The 7 steps to homology modeling
6. Model optimization
― Predict the side-chains, then the resulting shifts in the backbone, then the rotamers for the new backbone …
7. Model validation
― Calculating the model’s energy
― Determination of normality indices:
  ― Bond lengths, bond and torsion angles
  ― Inside/outside distribution of polar residues
  ― Radial distribution function
Fold recognition
Which of the known folds is likely to be similar to
the (unknown) fold of a new protein when only its
amino-acid sequence is known?
Fraction of new folds
(PDB new entries in 1998)
Koppensteiner et al., 2000,
JMB 296:1139-1152.
Unrelated proteins adopt similar folds
Only 100 folds account for ~50% of all protein
superfamilies
Possible explanations:
1. Divergent evolution
2. Convergent evolution
3. Limited number of folds
4. Misguided analysis
Proteins as seen by a Biologist
Does a new protein sequence belong to a given
family of proteins (with a specific set of
mutation rules)?
Fold recognition is based on:
• Sequence alignment, multiple sequence alignment
• Profile HMM, PSI-BLAST
Proteins as seen by a Physicist
“Thermodynamic hypothesis”: The native
conformation of a protein corresponds to a
global free energy minimum of the system
(protein + solvent)
Naïve approach: having a correct energy function,
search for the native structure in the
conformational space
Threading
Threading: energy based fold recognition
Define:
1. Protein model and interaction description
2. Alignment algorithm
3. Energy parameterization
E = \sum_{positions i,j} E_{a_i b_j}
E_ab:
      A    C    D    E   ...
A    -3   -1    0    0   ...
C    -1   -4    1    2   ...
D     0    1    5    6   ...
E     0    2    6    7   ...
...
[Figure: a sequence threaded onto a template structure with positions 1 to 10; labeled residues include A1, C2, C3, E4]
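The threading energy on this slide is a sum of pair energies E_ab over interacting positions i, j. A minimal sketch using the E_ab values from the slide's table for residue types A, C, D and E; the example sequence and the contact list are hypothetical, since the slide's figure does not fully survive in the transcript:

```python
# Pair energies E_ab from the slide's table (symmetric, so only one triangle is stored).
E_AB = {
    ("A", "A"): -3, ("A", "C"): -1, ("A", "D"): 0, ("A", "E"): 0,
    ("C", "C"): -4, ("C", "D"): 1, ("C", "E"): 2,
    ("D", "D"): 5, ("D", "E"): 6,
    ("E", "E"): 7,
}


def pair_energy(a, b):
    """Look up E_ab regardless of the order of the two residue types."""
    return E_AB[(a, b)] if (a, b) in E_AB else E_AB[(b, a)]


def threading_energy(sequence, contacts):
    """Sum E_ab over all pairs of positions (i, j) that are in contact in the template."""
    return sum(pair_energy(sequence[i], sequence[j]) for i, j in contacts)


# Hypothetical example: 4 residues placed on 4 template positions (0-based),
# with contacts between consecutive positions only.
sequence = "ACCE"
contacts = [(0, 1), (1, 2), (2, 3)]
energy = threading_energy(sequence, contacts)
```

With this sign convention, a lower total indicates a better fit of the sequence to the fold, which is why the alignment algorithm searches for the placement with the lowest sum.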
Find best fold for a protein sequence:
Fold recognition (threading)
[Figure: the query sequence MAHFPGFGQSLLFGYPVYVFGD... is threaded onto each potential fold 1) ... 56) ... n) and scored; example scores shown are -10, -123 and 20.5]
GenTHREADER
(Jones, 1999, JMB 287:797-815)
Essentials of GenTHREADER:
• For each template, provide an MSA
• Align the query sequence with the MSA
• Assess the alignment by sequence alignment score
• Assess the alignment by pairwise potentials
• Assess the alignment by solvation function
• Record the lengths of the alignment, query, and template
Ab-initio folding
Goal: Predict structure from “first principles”
Requires:
• A free energy function, sufficiently close to the “true potential”
• A method for searching the conformational space
Benefits:
• Works for novel folds
• Shows that we understand the process
Ab-initio folding – the challenge
1. Current potential functions have limited accuracy
2. The conformational space is HUGE
Possible simplifications:
• Reduced representation
• Simplified potentials
• Coarse search strategies
Representation
Detailed representation – includes all atoms of the protein and the surrounding solvent; computationally expensive
Simplifications:
• Implicit solvent models
• United atom representation
• Side-chain as centroid or Cα
• Restricted side-chain configurations (rotamers)
• Restricted backbone torsion angles
Rosetta
[Simons et al. 1997]
• “Structural” signatures recur within protein structures
• Use these as cues during structure search
I-sites Library – a catalog of local sequence-structure correlations
Examples: serine hairpin, type-I hairpin, frayed helix
Rosetta: a folding simulation program
Fragment insertion Monte Carlo (over backbone torsion angles):
1. Choose a fragment from the fragments
2. Change the backbone angles
3. Convert to 3D
4. Evaluate the energy function
5. Accept or reject
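The fragment insertion loop can be sketched as a Metropolis Monte Carlo over (phi, psi) pairs. This is a simplified illustration and not Rosetta's actual code: the acceptance rule, the fragment list and `toy_energy` are assumptions, and the "convert to 3D" step is omitted because the toy energy is evaluated directly on the torsion angles.

```python
import math
import random


def fragment_insertion_mc(torsions, fragments, energy,
                          steps=2000, temperature=1.0, seed=0):
    """Metropolis Monte Carlo that replaces windows of (phi, psi) with fragments."""
    rng = random.Random(seed)
    current = list(torsions)
    current_e = energy(current)
    best, best_e = list(current), current_e
    for _ in range(steps):
        # Choose a fragment and a window of the chain to overwrite.
        fragment = rng.choice(fragments)
        start = rng.randrange(len(current) - len(fragment) + 1)
        # Change the backbone angles in that window.
        trial = current[:start] + list(fragment) + current[start + len(fragment):]
        # Evaluate the energy (a real program would first rebuild 3D coordinates).
        trial_e = energy(trial)
        delta = trial_e - current_e
        # Accept downhill moves always, uphill moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_e = trial, trial_e
            if current_e < best_e:
                best, best_e = list(current), current_e
    return best, best_e


def toy_energy(torsions):
    """Hypothetical energy: squared deviation from helix-like angles (-60, -45)."""
    return sum(((phi + 60.0) / 100.0) ** 2 + ((psi + 45.0) / 100.0) ** 2
               for phi, psi in torsions)


# Hypothetical inputs: an extended 9-residue chain and three 3-residue fragments.
extended = [(-120.0, 130.0)] * 9
fragments = [
    [(-60.0, -45.0)] * 3,
    [(-120.0, 130.0)] * 3,
    [(-60.0, -45.0), (-90.0, 0.0), (-120.0, 130.0)],
]
best, best_e = fragment_insertion_mc(extended, fragments, toy_energy)
```

Restricting moves to fragments keeps every local stretch of the chain in a conformation that has been observed before, which is how the local sequence-structure cues narrow the search.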
Potential functions
• Molecular mechanics – models the forces that determine protein conformation
  • Van der Waals: Lennard-Jones 12-6
  • Electrostatic: Coulomb’s law
• Scoring functions – empirically derived from solved structures
  • Useful with reduced complexity models
  • Useful in treating aspects of protein thermodynamics
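The two molecular mechanics terms named above can be written down directly. A minimal sketch, assuming the common form of the Lennard-Jones 12-6 potential with well depth `epsilon` and zero-crossing distance `sigma`, and Coulomb's law with distances in Å, charges in units of the elementary charge and energies in kcal/mol; the constant of about 332 and the parameter names are assumptions of this example, not values from the slides:

```python
def lennard_jones(r, epsilon, sigma):
    """Lennard-Jones 12-6 energy: 4 * epsilon * ((sigma/r)^12 - (sigma/r)^6)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def coulomb(r, q1, q2, dielectric=1.0, k=332.06):
    """Coulomb energy in kcal/mol for charges in e and distance r in Angstrom.

    k is the approximate electrostatic constant in kcal*Angstrom/(mol*e^2).
    """
    return k * q1 * q2 / (dielectric * r)


# The 12-6 potential is zero at r = sigma and has its minimum, -epsilon,
# at r = 2^(1/6) * sigma.
r_min = 2.0 ** (1.0 / 6.0)
well_depth = lennard_jones(r_min, epsilon=1.0, sigma=1.0)
attraction = coulomb(3.0, q1=+1.0, q2=-1.0)
```

Empirical scoring functions replace such physical terms with statistics gathered from solved structures, as in the pair-energy table used for threading.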
Search methods
• Molecular dynamics – simulates the motion of a molecule in a given potential
  • Impractical …
• Coarse sampling of energy landscape:
  • Simulated annealing, genetic algorithms, …
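As an illustration of coarse sampling, here is a minimal simulated annealing sketch on a one-dimensional toy landscape. The geometric cooling schedule, the move size and the function `rugged_energy` are assumptions of this example; a real application would use a protein conformation as the state and one of the potentials above as the energy.

```python
import math
import random


def simulated_annealing(x0, energy, neighbor,
                        t_start=10.0, t_end=0.01, steps=5000, seed=0):
    """Metropolis sampling while the temperature decays from t_start to t_end."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for step in range(steps):
        # Geometric cooling: high temperature early (exploration), low late (refinement).
        t = t_start * (t_end / t_start) ** (step / (steps - 1))
        candidate = neighbor(x, rng)
        candidate_e = energy(candidate)
        # Always accept improvements; accept worse states with probability exp(-dE/t).
        if candidate_e <= e or rng.random() < math.exp((e - candidate_e) / t):
            x, e = candidate, candidate_e
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e


def rugged_energy(x):
    """Hypothetical landscape: a parabola with many local minima superimposed."""
    return x * x + 10.0 * math.sin(3.0 * x)


def small_step(x, rng):
    """Propose a nearby state by a small random displacement."""
    return x + rng.uniform(-0.5, 0.5)


best_x, best_e = simulated_annealing(4.0, rugged_energy, small_step)
```

Accepting some uphill moves at high temperature is what lets the search escape local minima, at the cost of giving up any guarantee that the global minimum is found.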