Protein Folding Programs
By Asım OKUR
CSE 549
November 14, 2002
Protein Structure
DNA Sequence → Protein Sequence → Structure → (Mis)function
It is believed that all the information necessary
to determine the structure of a protein is
present in its primary sequence.
Protein Folding Programs
Protein folding is one of the biggest computational challenges
Different types of folding and structure prediction programs:
• Simulations
• Homology modeling approaches
Simulations
Simulate the real behavior of proteins
High detail, short time scales
Two main simulation types:
• Molecular Dynamics
• Monte Carlo
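As a rough illustration of the Monte Carlo approach, the sketch below runs Metropolis sampling on a toy one-dimensional double-well potential. The potential, the temperature `kT`, the step size, and all function names are assumptions chosen for illustration; a real folding simulation would move many atoms and evaluate a full force field.

```python
import math
import random


def energy(x):
    """Toy 1-D double-well potential (hypothetical stand-in for a force field)."""
    return (x * x - 1.0) ** 2


def metropolis(n_steps, kT=0.2, step=0.5, x0=0.0, seed=0):
    """Sample configurations with the Metropolis acceptance rule."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    samples = []
    accepted = 0
    for _ in range(n_steps):
        # Propose a small random displacement of the current configuration.
        x_new = x + rng.uniform(-step, step)
        e_new = energy(x_new)
        # Always accept downhill moves; accept uphill moves with
        # probability exp(-dE / kT).
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / kT):
            x, e = x_new, e_new
            accepted += 1
        samples.append(x)
    return samples, accepted / n_steps


samples, acceptance = metropolis(20000)
mean_energy = sum(energy(x) for x in samples) / len(samples)
print(f"acceptance ratio: {acceptance:.2f}, mean energy: {mean_energy:.3f}")
```

Molecular Dynamics would instead integrate the equations of motion in small time steps, which is why it gives high detail but only short time scales.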
The Energy Function
Calculate energies for each particle
Because long-range interactions are important, the pair-wise interactions must be calculated for every pair of particles
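$$
E_{\text{pair}} = \sum_{\text{bonds}} K_r (r - r_{eq})^2
  + \sum_{\text{angles}} K_\theta (\theta - \theta_{eq})^2
  + \sum_{\text{torsions}} \frac{V_n}{2} \left[ 1 + \cos(n\phi - \gamma) \right]
  + \sum_{i<j} \left[ \frac{A_{ij}}{R_{ij}^{12}} - \frac{B_{ij}}{R_{ij}^{6}} + \frac{q_i q_j}{\epsilon R_{ij}} \right]
$$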
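The non-bonded part of the energy function (the sum over pairs i < j of the A_ij/R^12, B_ij/R^6, and q_i q_j/(εR) terms) can be sketched as a loop over all particle pairs. This is a minimal illustration, not a force-field implementation: the function name, the argument layout, and the assumption that charges are already scaled so that q_i q_j/(εR) is an energy are all hypothetical.

```python
import math
from itertools import combinations


def nonbonded_energy(coords, charges, a, b, dielectric=1.0):
    """Sum Lennard-Jones and Coulomb terms over every pair i < j.

    coords  : list of (x, y, z) positions
    charges : list of partial charges (assumed pre-scaled to energy units)
    a, b    : square matrices of A_ij and B_ij coefficients
    """
    total = 0.0
    for i, j in combinations(range(len(coords)), 2):
        r = math.dist(coords[i], coords[j])
        lennard_jones = a[i][j] / r**12 - b[i][j] / r**6
        coulomb = charges[i] * charges[j] / (dielectric * r)
        total += lennard_jones + coulomb
    return total


# Two hypothetical particles one distance unit apart, no charges.
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
a = [[0.0, 1.0], [1.0, 0.0]]
b = [[0.0, 2.0], [2.0, 0.0]]
print(nonbonded_energy(coords, [0.0, 0.0], a, b))
```

The double loop visits N(N-1)/2 pairs, which is the main reason the cost of a simulation grows quickly with system size.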
Homology Modeling
Template Selection and Fold Assignment
Target – Template Alignment
Model Building
Loop Modeling
Sidechain Modeling
Model Evaluation
Fold Assignment and Template Selection
Identify all protein structures with sequences related to the target, then select templates
3 main classes of comparison methods:
• Pair-wise sequence – sequence comparison: compare the target sequence with each database sequence independently (BLAST and FASTA)
• Multiple sequence comparisons to improve sensitivity (PSI-BLAST)
• Threading or 3-D template matching methods
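The first class, independent pair-wise comparison of the target against each database sequence, can be sketched with a Smith-Waterman local alignment score. This is only an illustration of the idea: BLAST and FASTA use faster heuristics and substitution matrices, whereas the scoring values, function names, and the tiny database below are assumptions made up for the example.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between sequences a and b (linear gaps)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def rank_templates(target, database):
    """Score the target against every database entry, best score first."""
    scored = [(smith_waterman_score(target, seq), name)
              for name, seq in database.items()]
    return sorted(scored, reverse=True)


# Hypothetical target and template sequences.
database = {"template_1": "MKTAYIAK", "template_2": "GGGG"}
print(rank_templates("MKTAYIAK", database))
```

Each database sequence is scored on its own, so weak but real relationships can be missed; profile methods such as PSI-BLAST address this by pooling information from multiple related sequences.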
Target – Template Alignment
Most important step in homology modeling
A specialized method should be used for alignment
Above 40% sequence identity, the alignment is likely to be correct.
Regions of low local sequence similarity become common when overall sequence identity is under 40%. (Saqi et al., Protein Eng. 1999)
The alignment becomes difficult below 30% sequence identity. (Rost, Protein Eng. 1999)
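To make the identity thresholds concrete, the sketch below computes percent identity from two already aligned sequences and maps it onto the regimes quoted above. The function names are hypothetical, and the choice to count identity over aligned (non-gap) columns only is an assumption; other definitions of the denominator are in use.

```python
def percent_identity(aligned_target, aligned_template):
    """Percent identical residues over columns where neither row has a gap."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aligned_target, aligned_template)
             if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


def alignment_regime(identity):
    """Map percent identity onto the regimes described in the slide."""
    if identity > 40.0:
        return "alignment likely correct"
    if identity >= 30.0:
        return "regions of low local similarity become common"
    return "alignment difficult"


# Hypothetical aligned pair: one gap column, one mismatch.
identity = percent_identity("ACDE-F", "ACGEKF")
print(identity, alignment_regime(identity))
```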
Model Building
Construct a 3-D model of the target sequence based on its alignment to the template structures
Three different model building approaches:
• Modeling by rigid body assembly
• Modeling by segment matching
• Modeling by satisfaction of spatial restraints
Accuracies of these models are similar
Template selection and alignment have a larger impact on the model
Screenshots from the Homology Modeling Server Swiss-Model
• Construct a framework using known protein structures
• Generate the location of the target amino acids on the framework
• If loop regions are not determined, use an additional database search or short simulations
Swiss-MOD Web Server
Procedure of the MODELLER program
• After obtaining restraints, run a geometry optimization or real-space optimization to satisfy them
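The idea of optimizing coordinates until a set of spatial restraints is satisfied can be sketched with a steepest-descent loop over harmonic distance restraints. This is a toy under stated assumptions and not the MODELLER procedure itself: the harmonic penalty, the step size, the function names, and the three-point example are all hypothetical.

```python
import math


def restraint_violation(coords, restraints):
    """Sum of squared deviations from the target distances."""
    return sum((math.dist(coords[i], coords[j]) - d) ** 2
               for i, j, d in restraints)


def satisfy_restraints(coords, restraints, step=0.05, n_iter=2000):
    """Steepest descent on the violation; restraints are (i, j, distance)."""
    coords = [list(p) for p in coords]
    for _ in range(n_iter):
        grads = [[0.0] * len(p) for p in coords]
        for i, j, d in restraints:
            r = math.dist(coords[i], coords[j])
            if r == 0.0:
                continue  # direction undefined for coincident points
            factor = 2.0 * (r - d) / r
            for k in range(len(coords[i])):
                g = factor * (coords[i][k] - coords[j][k])
                grads[i][k] += g
                grads[j][k] -= g
        # Move every point a small step against its gradient.
        for p, g in zip(coords, grads):
            for k in range(len(p)):
                p[k] -= step * g[k]
    return coords


# Three hypothetical points restrained to form an equilateral triangle.
start = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.5)]
restraints = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0)]
model = satisfy_restraints(start, restraints)
print(restraint_violation(start, restraints), restraint_violation(model, restraints))
```

In a real model the restraints would be derived from the target – template alignment and from stereochemistry, and many of them would conflict, so the optimizer finds the least-violating structure rather than an exact solution.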
Errors in Homology Models
a. Errors in sidechain packing
b. Distortions and shifts in correctly aligned regions
c. Errors in regions without a template
d. Errors due to misalignment
e. Incorrect templates
Model Building Programs
(Type: P = program, S = server)

Name        Type  URL
COMPOSER    P     www-cryst.bioc.cam.ac.uk
CONGEN      P     www.congenomics.com/congen/congen.html
CPH models  S     www.cbs.dtu.dk/services/CPHmodels/
DRAGON      P     www.nimr.mrc.ac.uk/~mathbio/a-aszodi/dragon.html
ICM         P     www.molsoft.com
InsightII   P     www.msi.com
MODELLER    P     guitar.rockefeller.edu/modeller/modeller.html
LOOK        P     www.mag.com
QUANTA      P     www.msi.com
SYBYL       P     www.tripos.com
SCWRL       P     www.cmpharm.ucsf.edu/~bower/scrwl/scrwl.html
SWISS-MOD   S     www.expasy.ch/swissmod
WHAT IF     P     www.sander.embl-heidelberg.de/whatif/
Applications
Critical Assessment of protein
Structure Prediction (CASP)
Venclovas et al. Proteins, 2001
Conclusions
Computer simulations are powerful for showing detailed motions, but they cannot cover time spans long enough to simulate the folding of large systems
Homology modeling techniques can be successful if the target protein has a known fold
The higher the sequence similarity, the more likely the model will be successful
With the implementation of better techniques, the errors in fold assignment, alignment, and sidechain and loop modeling are decreasing
Theoretically, if at least one member of every possible fold is known, it is possible to predict the structure of every coding sequence to within a certain accuracy