Protein Folding Programs

By Asım OKUR
CSE 549
November 14, 2002

Protein Structure

• DNA Sequence → Protein Sequence → Structure → (Mis)function
• It is believed that all the information necessary to determine the structure of a protein is present in its primary sequence.

Protein Folding Programs

• Protein folding is one of the biggest computational challenges
• Different types of folding and structure prediction programs:
  • Simulations
  • Homology modeling approaches

Simulations

• Simulate the real behavior of proteins
• High detail, short time scales
• Two main simulation types:
  • Molecular Dynamics
  • Monte Carlo
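The two simulation types can be illustrated with a minimal sketch on a toy system. It assumes a single particle in a one-dimensional harmonic potential; the function names, the potential, and all parameter values are hypothetical and chosen only for illustration. Molecular dynamics is represented by a velocity Verlet integrator, Monte Carlo by the Metropolis acceptance rule.

```python
import math
import random

def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Molecular dynamics: integrate Newton's equations for one particle in 1-D."""
    f = force(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # force at the new position
        v = v_half + 0.5 * dt * f / mass   # complete the velocity update
    return x, v

def metropolis_step(x, energy, kT, max_move, rng):
    """Monte Carlo: propose a random move, accept it with the Metropolis criterion."""
    x_new = x + rng.uniform(-max_move, max_move)
    delta = energy(x_new) - energy(x)
    if delta <= 0 or rng.random() < math.exp(-delta / kT):
        return x_new   # move accepted
    return x           # move rejected, keep the old state

# Toy system (assumption): harmonic potential E(x) = 0.5 * k * x^2.
k = 1.0
energy = lambda x: 0.5 * k * x * x
force = lambda x: -k * x

# MD: follow the deterministic trajectory and check that total energy is conserved.
x_md, v_md = velocity_verlet(1.0, 0.0, force, mass=1.0, dt=0.01, n_steps=1000)
total_energy = energy(x_md) + 0.5 * v_md ** 2

# MC: sample configurations at temperature kT without following real dynamics.
rng = random.Random(0)
x_mc, samples = 1.0, []
for _ in range(20000):
    x_mc = metropolis_step(x_mc, energy, kT=1.0, max_move=1.0, rng=rng)
    samples.append(x_mc)
mean_potential = sum(energy(s) for s in samples) / len(samples)
```

The MD trajectory has a physical time axis, set by the time step `dt`, which is why such simulations are limited to short time scales. The MC chain has no time axis and only samples configurations according to their energies.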

The Energy Function

• Calculate the energy of each particle
• Since long-range interactions are important, the pairwise interactions should be calculated for each pair of particles

$$
E_{\text{pair}} = \sum_{\text{bonds}} K_r (r - r_{eq})^2
+ \sum_{\text{angles}} K_\theta (\theta - \theta_{eq})^2
+ \sum_{\text{torsions}} \frac{V_n}{2} \left[ 1 + \cos(n\phi - \gamma) \right]
+ \sum_{i<j} \left[ \frac{A_{ij}}{R_{ij}^{12}} - \frac{B_{ij}}{R_{ij}^{6}} + \frac{q_i q_j}{\epsilon R_{ij}} \right]
$$

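As a rough sketch of how the terms of such an energy function might be evaluated, the snippet below implements the bond, angle, torsion, and pairwise nonbonded terms as plain Python functions. Two simplifications are assumptions made here: a single pair of coefficients `a` and `b` is used for all atom pairs, and every pair i < j is included in the nonbonded sum. All names and parameter values are hypothetical.

```python
import math

def bond_energy(r, k_r, r_eq):
    """Harmonic bond term K_r (r - r_eq)^2."""
    return k_r * (r - r_eq) ** 2

def angle_energy(theta, k_theta, theta_eq):
    """Harmonic angle term K_theta (theta - theta_eq)^2."""
    return k_theta * (theta - theta_eq) ** 2

def torsion_energy(phi, v_n, n, gamma):
    """Periodic torsion term (V_n / 2) [1 + cos(n phi - gamma)]."""
    return 0.5 * v_n * (1.0 + math.cos(n * phi - gamma))

def nonbonded_energy(coords, charges, a, b, eps=1.0):
    """Sum A/R^12 - B/R^6 + q_i q_j / (eps R) over all pairs i < j.

    Assumption: one (a, b) coefficient pair is shared by all atom pairs.
    """
    total = 0.0
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):          # each pair is visited once
            r = math.dist(coords[i], coords[j])
            total += a / r ** 12 - b / r ** 6
            total += charges[i] * charges[j] / (eps * r)
    return total

def pair_count(n):
    """Number of pairwise interactions for n particles."""
    return n * (n - 1) // 2

# Two neutral particles at distance 1 with a = 1, b = 2 (hypothetical values).
e_two = nonbonded_energy([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [0.0, 0.0], 1.0, 2.0)
```

The double loop makes the cost of the pairwise term explicit: the number of pairs grows as n(n - 1)/2, so it dominates the calculation for large systems.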
Homology Modeling

• Template Selection and Fold Assignment
• Target – Template Alignment
• Model Building
• Loop Modeling
• Sidechain Modeling
• Model Evaluation

Fold Assignment and Template Selection

• Identify all protein structures with sequences related to the target, then select templates
• Three main classes of comparison methods:
  • Pair-wise sequence – sequence comparison: compare the target sequence with each database sequence independently (BLAST and FASTA)
  • Multiple sequence comparisons to improve sensitivity (PSI-BLAST)
  • Threading or 3-D template matching methods
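To make the first class of methods concrete, here is a minimal sketch of pair-wise comparison against a database. It uses the Smith-Waterman local alignment score, the exact dynamic-programming method that BLAST and FASTA approximate with faster heuristics; it is not how those programs are implemented. The scoring values (match, mismatch, gap) and the tiny database are assumptions for illustration; real searches use substitution matrices such as BLOSUM62.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between sequences a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best

def rank_templates(target, database):
    """Compare the target with each database sequence independently; best first."""
    return sorted(database,
                  key=lambda name: smith_waterman_score(target, database[name]),
                  reverse=True)

# Hypothetical target and template sequences.
target = "MKTAYIAK"
database = {"t1": "MKTAYIAK", "t2": "GGGGGGGG", "t3": "MKTAAAAA"}
ranking = rank_templates(target, database)
```

Each database sequence is scored on its own, which is what distinguishes this class from the multiple-sequence methods, where information from many related sequences is combined.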

Target – Template Alignment

• Most important step in Homology Modeling
• A specialized method should be used for alignment
  • Above 40% sequence identity, the alignment is likely to be correct.
  • Regions of low local sequence similarity become common when overall sequence identity is under 40%. (Saqi et al., Protein Eng. 1999)
  • The alignment becomes difficult below 30% sequence identity. (Rost, Protein Eng. 1999)
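The identity thresholds above can be checked for a given alignment with a short sketch. It assumes that percent identity is computed over aligned columns in which neither sequence has a gap; other conventions for the denominator exist, so the numbers depend on that choice. The function names and the example alignment are hypothetical.

```python
def percent_identity(aligned_target, aligned_template, gap="-"):
    """Percent of identical residues over aligned, non-gap columns (assumed convention)."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    aligned = identical = 0
    for x, y in zip(aligned_target, aligned_template):
        if x == gap or y == gap:
            continue                      # skip columns containing a gap
        aligned += 1
        if x == y:
            identical += 1
    return 100.0 * identical / aligned if aligned else 0.0

def alignment_outlook(identity):
    """Map percent identity to the qualitative ranges quoted on the slide."""
    if identity > 40.0:
        return "alignment likely correct"
    if identity >= 30.0:
        return "low-similarity regions become common"
    return "alignment difficult"

# Hypothetical alignment with one gap column.
identity = percent_identity("ACDE-FG", "ACDQWFG")
outlook = alignment_outlook(identity)
```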

Model Building

• Construct a 3-D model of the target sequence based on its alignment to template structures
• Three different model building approaches:
  • Modeling by rigid-body assembly
  • Modeling by segment matching
  • Modeling by satisfaction of spatial restraints
• The accuracies of these models are similar
• Template selection and alignment have a larger impact on the model

Screenshots from the Homology Modeling Server Swiss-Model

• Construct a framework using known protein structures
• Generate the location of the target amino acids on the framework
• If loop regions are not determined, use an additional database search or short simulations

Swiss-MOD Web Server

Procedure of the MODELLER program

• After obtaining restraints, run a geometry optimization or real-space optimization to satisfy them

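The idea of optimizing coordinates until a set of restraints is satisfied can be sketched with a toy example. This is not MODELLER's objective function or optimizer: the sketch assumes simple harmonic distance restraints between points and plain gradient descent, and every name and value in it is hypothetical.

```python
import math

def restraint_penalty(coords, restraints):
    """Sum of squared violations (d - d0)^2 over all distance restraints."""
    return sum((math.dist(coords[i], coords[j]) - d0) ** 2
               for i, j, d0 in restraints)

def satisfy_restraints(coords, restraints, step=0.05, n_iter=2000):
    """Move the points down the gradient of the penalty (plain gradient descent)."""
    coords = [list(p) for p in coords]
    for _ in range(n_iter):
        grads = [[0.0, 0.0, 0.0] for _ in coords]
        for i, j, d0 in restraints:
            d = math.dist(coords[i], coords[j])
            if d == 0.0:
                continue                  # direction undefined, skip this pair
            factor = 2.0 * (d - d0) / d   # derivative of (d - d0)^2 along the pair
            for k in range(3):
                g = factor * (coords[i][k] - coords[j][k])
                grads[i][k] += g
                grads[j][k] -= g
        for p, g in zip(coords, grads):
            for k in range(3):
                p[k] -= step * g[k]
    return coords

# Three points with target distances (i, j, d0); values are illustrative only.
restraints = [(0, 1, 3.8), (1, 2, 3.8), (0, 2, 6.0)]
start = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.5, 0.0)]
model = satisfy_restraints(start, restraints)
final_penalty = restraint_penalty(model, restraints)
```

In this toy case all restraints can be met at once, so the penalty approaches zero. With restraints derived from real templates they generally conflict, and the optimization settles on a compromise.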
Errors in Homology Models
a. Errors in sidechain packing
b. Distortions and shifts in correctly aligned regions
c. Errors in regions without a template
d. Errors due to misalignment
e. Incorrect templates
Model Building Programs

Name         Type   Address
COMPOSER     P      www-cryst.bioc.cam.ac.uk
CONGEN       P      www.congenomics.com/congen/congen.html
CPH models   S      www.cbs.dtu.dk/services/CPHmodels/
DRAGON       P      www.nimr.mrc.ac.uk/~mathbio/a-aszodi/dragon.html
ICM          P      www.molsoft.com
InsightII    P      www.msi.com
MODELLER     P      guitar.rockefeller.edu/modeller/modeller.html
LOOK         P      www.mag.com
QUANTA       P      www.msi.com
SYBYL        P      www.tripos.com
SCWRL        P      www.cmpharm.ucsf.edu/~bower/scrwl/scrwl.html
SWISS-MOD    S      www.expasy.ch/swissmod
WHAT IF      P      www.sander.embl-heidelberg.de/whatif/

Type: P = program, S = server

Applications
Critical Assessment of protein Structure Prediction (CASP)
Venclovas et al., Proteins, 2001

Conclusions

• Computer simulations are powerful for showing detailed motions, but they cannot cover time spans long enough to simulate folding of large systems
• Homology modeling techniques can be successful if the target protein has a known fold
  • The higher the sequence similarity, the more likely the model will be successful
  • With the implementation of better techniques, the errors in fold assignment, alignment, and sidechain and loop modeling are decreasing
  • Theoretically, if at least one member of every possible fold is known, it is possible to predict the structure of every coding sequence to within a certain accuracy