lecture09_09

Structural Bioinformatics
Protein Tertiary Structure Prediction
The Different Levels of Protein Structure
Primary: amino acid linear sequence.
Secondary: α-helices, β-sheets and loops.
Tertiary: the 3D shape of the fully folded polypeptide chain.
How can we view the protein structure?
• Download the coordinates of the structure from the PDB
http://www.rcsb.org/pdb/
• Launch a 3D viewer program
For example, we will use the program PyMOL, which can be downloaded freely from the PyMOL homepage: http://pymol.sourceforge.net/
• Load the coordinates into the viewer
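The downloaded coordinates can also be read directly, without a viewer. The following is a minimal sketch, assuming the file uses the standard fixed-column PDB ATOM record layout; the function name `read_ca_atoms` is illustrative and not part of any tool named in the lecture.

```python
def read_ca_atoms(pdb_lines):
    """Return (chain, residue_number, residue_name, (x, y, z)) for each C-alpha atom."""
    atoms = []
    for line in pdb_lines:
        # Only ATOM records are considered; HETATM records are skipped.
        if not line.startswith("ATOM"):
            continue
        # The atom name occupies columns 13-16 of the fixed-width record.
        if line[12:16].strip() != "CA":
            continue
        res_name = line[17:20].strip()   # columns 18-20
        chain = line[21]                 # column 22
        res_num = int(line[22:26])       # columns 23-26
        # Orthogonal coordinates in Angstroms: columns 31-38, 39-46, 47-54.
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        atoms.append((chain, res_num, res_name, xyz))
    return atoms
```

Assuming the entry was saved locally as `1aqb.pdb` (a hypothetical file name), it could be read with `read_ca_atoms(open("1aqb.pdb"))`.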
PyMOL example
• Launch PyMOL
• Open file “1aqb” (PDB coordinate file)
• Display sequence
• Hide everything
• Show main chain / hide main chain
• Show cartoon
• Color by ss
• Color red
• Color green, resi 1:40
Help http://pymol.sourceforge.net/newman/user/toc.html
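The same steps can be collected into a PyMOL command script instead of being clicked through. The sketch below only builds the script text. Several details are assumptions not taken from the slide: the local file name `1aqb.pdb`, the choice of sticks for the main chain, and the helix/strand/loop colors.

```python
def build_pymol_script(pdb_file="1aqb.pdb"):
    """Return the viewing steps from the slide as lines of a PyMOL command script."""
    return [
        f"load {pdb_file}",             # open the PDB coordinate file
        "set seq_view, 1",              # display the sequence
        "hide everything",
        "show sticks, name N+CA+C+O",   # show main chain (backbone atoms)
        "hide sticks, name N+CA+C+O",   # hide main chain
        "show cartoon",
        "color red, ss h",              # color by secondary structure: helices
        "color yellow, ss s",           # strands
        "color green, ss l+''",         # loops and unassigned residues
        "color red",                    # color everything red
        "color green, resi 1-40",       # recolor residues 1 to 40
    ]
```

Writing the returned lines to a `.pml` file, one command per line, gives a script that PyMOL can run in one go.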
Predicting 3D Structure
An outstandingly difficult problem
Based on sequence homology
– Comparative modeling (homology)
Based on structural homology
– Fold recognition (threading)
Comparative Modeling
Similar sequences suggest similar structure
Sequence and structure alignments of two retinol-binding proteins
Structure Alignments
There are many different algorithms for structural alignment.
The outputs of a structural alignment are a superposition of the atomic coordinates and a minimal root mean square distance (RMSD) between the structures. The RMSD of two aligned structures indicates their divergence from one another.
Low RMSD values mean similar structures.
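To make the RMSD concrete, here is a minimal sketch that computes it for two coordinate lists. It assumes the structures are already superimposed and that atom i in one list is paired with atom i in the other; finding the superposition that minimizes the RMSD is the harder part and is not shown.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square distance between paired (x, y, z) coordinates."""
    if len(coords_a) != len(coords_b) or len(coords_a) == 0:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared distance between the two atoms of this pair.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))
```

Identical structures give an RMSD of 0, and the value grows as the paired atoms move apart.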
Dali (Distance mAtrix aLIgnment)
DALI offers pairwise alignments of protein structures. The algorithm uses the three-dimensional coordinates of each protein to calculate distance matrices comparing residues.
See Holm L and Sander C (1993) J. Mol. Biol. 233:123-138.
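The distance matrix that such a comparison starts from can be sketched as follows. This shows only the matrix construction for one protein, assuming one representative point per residue (for example the C-alpha atom). It is not the DALI alignment algorithm itself.

```python
import math


def distance_matrix(coords):
    """Symmetric matrix of pairwise distances between residue coordinates."""
    n = len(coords)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])
            # Distances are symmetric, so fill both halves at once.
            matrix[i][j] = d
            matrix[j][i] = d
    return matrix
```

Because the matrix holds only internal distances, it does not change when the protein is rotated or translated, so two proteins can be compared without first superimposing them.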
SALIGN http://salilab.org/DBALI/?page=tools
Fold classification based on structure-structure alignment of proteins (FSSP)
FSSP is based on a comprehensive comparison of PDB proteins (greater than 30 amino acids in length) using DALI. Representative sets exclude sequence homologs sharing > 25% amino acid identity.
http://www.ebi.ac.uk/dali/fssp
Page 293
Comparative Modeling
Similar sequence suggests similar structure
Comparative structure prediction produces an all-atom model of a sequence, based on its alignment to one or more related protein structures in the database.
Comparative Modeling
• Accuracy of the comparative model is related to the sequence identity on which it is based
>50% sequence identity = high accuracy
30%-50% sequence identity = 90% modeled
<30% sequence identity = low accuracy (many errors)
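The three bands above can be applied to an existing pairwise alignment as in the sketch below. One assumption is made: identity is counted over aligned positions where neither sequence has a gap. Other definitions of percent identity exist and give somewhat different numbers.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over the gap-free aligned positions."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    aligned = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # positions with a gap are not counted
        aligned += 1
        if a == b:
            identical += 1
    return 100.0 * identical / aligned if aligned else 0.0


def expected_accuracy(identity):
    """Map a percent identity onto the three bands given on the slide."""
    if identity > 50:
        return "high accuracy"
    if identity >= 30:
        return "90% modeled"
    return "low accuracy (many errors)"
```

For example, `expected_accuracy(percent_identity("ACDE-F", "ACDQGF"))` falls in the high-accuracy band, since four of the five gap-free positions match.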
Homology Threshold for Different Alignment Lengths
Figure: homology threshold t (axis 0 to 90) plotted against alignment length L (axis 0 to 100).
A sequence alignment between two proteins is considered to imply structural homology if the sequence identity is equal to or above the homology threshold t in a sequence region of a given length L.
The threshold values t(L) are derived from the PDB.
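The slide does not give a formula for t(L). As a sketch, the code below assumes the curve is the Sander and Schneider (1991) threshold used for the HSSP database, t(L) = 290.15 · L^(-0.562) for alignment lengths from 10 to 80 residues and 24.8% for longer alignments. If the lecture used a different curve, only `homology_threshold` would change.

```python
def homology_threshold(length):
    """Percent identity threshold t(L); assumed Sander-Schneider form."""
    if length < 10:
        raise ValueError("the assumed curve is not defined below 10 residues")
    if length > 80:
        return 24.8  # plateau for long alignments
    return 290.15 * length ** -0.562


def implies_structural_homology(identity, length):
    """True if the percent identity reaches the threshold for this alignment length."""
    return identity >= homology_threshold(length)
```

Under this assumed curve, 30% identity over 100 residues passes the test, while 30% identity over 20 residues does not, because short alignments need a much higher identity.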
Comparative Modeling
• Similarity particularly high in core
– Alpha helices and beta sheets preserved
– Even near-identical sequences vary in loops
Comparative Modeling Methods
MODELLER (Sali, Rockefeller/UCSF)
SCWRL (Dunbrack, UCSF)
SWISS-MODEL
http://swissmodel.expasy.org//SWISS-MODEL.html
Comparative Modeling
Modeling of a sequence based on known structures
Consists of four major steps:
1. Finding one or more known structures (templates) related to the sequence to be modeled, using sequence comparison methods such as PSI-BLAST
2. Aligning the sequence with the templates
3. Building a model
4. Assessing the model
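Step 3 can be illustrated in its crudest form: copy the template coordinates onto the target wherever the alignment pairs two residues, and leave unaligned target residues empty. This is a toy sketch with a hypothetical function name. Real programs such as MODELLER go much further (loops, side chains, refinement), and this is not their algorithm.

```python
def copy_aligned_coordinates(target_aln, template_aln, template_coords, gap="-"):
    """Return [(target_residue, coords_or_None), ...] from a pairwise alignment."""
    if len(target_aln) != len(template_aln):
        raise ValueError("aligned sequences must have the same length")
    if len(template_coords) != sum(1 for r in template_aln if r != gap):
        raise ValueError("one coordinate is needed per template residue")
    model = []
    template_index = 0
    for target_res, template_res in zip(target_aln, template_aln):
        coords = None
        if template_res != gap:
            # Consume the next template residue, whether or not it is used.
            coords = template_coords[template_index]
            template_index += 1
        if target_res != gap:
            # Target residues opposite a gap keep None: they must be built separately.
            model.append((target_res, coords))
    return model
```

Residues left as `None` correspond to insertions in the target, which typically fall in loops, the regions where even closely related structures differ.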
Fold Recognition
Protein Folds
• A combination of secondary structural units
– Forms basic level of classification
• Each protein family belongs to a fold
• Different sequences can share similar folds
Protein Folds: sequential and spatial arrangement of secondary structures
Hemoglobin
TIM
Protein Folds
Similar folds usually mean similar function
Homeodomain
Transcription factors
Protein Folds
The same fold can have multiple functions
Rossmann: 12 functions
TIM barrel: 31 functions
SCOP: Structural Classification of Proteins
Fold classification:
• Class: all alpha, all beta, alpha/beta, alpha+beta
• Fold
• Superfamily
• Family
Retinol Binding Protein
Fold Recognition
• Methods of protein fold recognition attempt to detect similarities between protein 3D structures that have no significant sequence similarity.
• They search for folds that are compatible with a particular sequence.
• These methods "turn the protein folding problem on its head": rather than predicting how a sequence will fold, they predict how well a fold will fit a sequence.
Basic steps in Fold Recognition:
Compare the sequence against a library of all known protein folds (a finite number)
Query sequence
MTYGFRIPLNCERWGHKLSTVILKRP...
Goal: find the fold template that the sequence fits best
There are different ways to evaluate sequence-structure fit
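Figure: the query sequence (MAHFPGFGQSLLFGYPVYVFGD...) is scored against each fold in the library (1 ... 56 ... n). Example scores shown: -10, -123 and 20.5; one fold is labeled "Potential fold".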
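One very simple way to evaluate sequence-structure fit is sketched below. Everything in it is a hypothetical simplification for illustration: each fold is reduced to a string of buried ("B") or exposed ("E") positions, hydrophobic residues are rewarded in buried positions and polar residues in exposed ones, no gaps are allowed, and a higher score means a better fit. It is not the scoring function of any program named in the lecture.

```python
# Assumed hydrophobic residue set; other classifications are possible.
HYDROPHOBIC = set("AVLIMFWC")


def fit_score(sequence, environments):
    """+1 for each residue whose hydrophobicity matches its environment, -1 otherwise."""
    if len(sequence) != len(environments):
        raise ValueError("sequence and fold must have the same length")
    score = 0
    for residue, env in zip(sequence, environments):
        is_hydrophobic = residue in HYDROPHOBIC
        is_buried = env == "B"
        score += 1 if is_hydrophobic == is_buried else -1
    return score


def best_fold(sequence, fold_library):
    """Return (fold_name, score) of the best-scoring fold of matching length."""
    scores = {
        name: fit_score(sequence, envs)
        for name, envs in fold_library.items()
        if len(envs) == len(sequence)
    }
    if not scores:
        raise ValueError("no fold in the library matches the sequence length")
    name = max(scores, key=scores.get)
    return name, scores[name]
```

With a toy library `{"fold_a": "BEBE", "fold_b": "EBEB"}`, the sequence `"LKVE"` fits `fold_a` best, because its hydrophobic residues land on the buried positions.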
Programs for fold recognition
• TOPITS (Rost 1995)
• GenTHREADER (Jones 1999)
• SAM-T02 (UCSC HMM)
• 3D-PSSM http://www.sbg.bio.ic.ac.uk/~3dpssm/
Ab Initio Modeling
• Compute molecular structure from the laws of physics and chemistry alone
Theoretically the ideal solution
Practically nearly impossible
Why?
– Exceptionally complex calculations
– Biophysics understanding incomplete
Ab Initio Methods
• Rosetta (Baker lab, Seattle)
• Undertaker (Karplus, UCSC)
CASP - Critical Assessment of Structure Prediction
• Competition among different groups in predicting the 3D structure of proteins that are about to be solved experimentally.
• Current state
– Ab initio - the worst, but greatly improved in recent years.
– Modeling - performs very well when homologous sequences with known structures exist.
– Fold recognition - performs well.
What’s Next
Predicting function from structure
Structural Genomics: a large-scale structure determination project designed to cover all representative protein structures
ATP binding domain of protein MJ0577
Zarembinski et al., Proc. Natl. Acad. Sci. USA 95:15189 (1998)
Currently: ~800 unique folds; ~300 unique folds in PDB
Estimated: ~1000-3000 unique folds in “structure space”
Structure Genomics expectations: ~10000-15000 new structures expected; ~5 proteins to characterize the sequence space corresponding to 1 fold
As a result of the Structural Genomics initiative, many structures of proteins with unknown function will be solved
Wanted!
Automated methods to predict function from the protein structures resulting from the structural genomics project.
Approaches for predicting function from structure
ConSurf - Mapping the evolutionary conservation on the protein structure http://consurf.tau.ac.il/
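The idea of a per-position conservation score can be sketched with a much simpler stand-in than ConSurf's own phylogeny-based rate estimates: the Shannon entropy of each column of a multiple sequence alignment, rescaled so that 1.0 means fully conserved. The function name and the rescaling by log2(20) are illustrative choices, not ConSurf's method.

```python
import math
from collections import Counter


def column_conservation(alignment, gap="-"):
    """One score per alignment column: 1.0 = fully conserved, lower = more variable."""
    if not alignment or len({len(seq) for seq in alignment}) != 1:
        raise ValueError("need a non-empty alignment of equal-length sequences")
    max_entropy = math.log2(20)  # entropy of 20 equally frequent amino acids
    scores = []
    for column in zip(*alignment):
        residues = [r for r in column if r != gap]
        if not residues:
            scores.append(0.0)  # all-gap column: no evidence of conservation
            continue
        total = len(residues)
        entropy = -sum(
            (count / total) * math.log2(count / total)
            for count in Counter(residues).values()
        )
        scores.append(1.0 - entropy / max_entropy)
    return scores
```

Scores like these can then be mapped onto the residues of a structure to highlight conserved surface regions, which is the kind of picture ConSurf produces.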
Approaches for predicting function from structure
PFplus – Identifying positive electrostatic patches on the protein structure http://pfp.technion.ac.il/
Approaches for predicting function from structure
SHARP2 – Predicting protein-protein interaction sites on the protein structure http://www.bioinformatics.sussex.ac.uk/SHARP2
