lecture10_12
Structural Bioinformatics
Protein Tertiary Structure Prediction
The Different levels of Protein Structure
Primary: amino acid linear sequence.
Secondary: α-helices, β-sheets and loops.
Tertiary: the 3D shape of the fully folded
polypeptide chain
Predicting 3D Structure
An outstanding, difficult problem
Based on sequence homology
– Comparative modeling (homology)
Based on structural homology
– Fold recognition (threading)
Comparative Modeling
Similar sequences suggest similar structures
Sequence and structure alignments of two retinol-binding proteins
Structure Alignments
There are many different algorithms for structural alignment.
The outputs of a structural alignment are a superposition of the
atomic coordinates and a minimal root-mean-square deviation
(RMSD) between the structures. The RMSD of two aligned
structures indicates their divergence from one another.
Low RMSD values mean similar structures.
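As a minimal sketch of the RMSD calculation described above, assuming the two structures are already superposed and their atoms are paired one-to-one. The function name and the toy coordinates are illustrative and not taken from the lecture; a real structural alignment program would also search for the superposition that minimizes this value.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two paired, pre-superposed lists of (x, y, z) tuples."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared distance between the two paired atoms.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Toy example: structure b is structure a shifted by 1 angstrom along x.
a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
b = [(1.0, 0.0, 0.0), (2.5, 0.0, 0.0), (4.0, 0.0, 0.0)]
print(rmsd(a, b))  # 1.0
```

Identical structures give an RMSD of 0; the value grows as the paired atoms drift apart.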
Comparative Modeling
Similar sequence suggests similar structure
Builds a protein structure model based on
its alignment to one or more related
protein structures in the database
Comparative Modeling
• Accuracy of the comparative model is
related to the sequence identity on which it is
based
>50% sequence identity = high accuracy
30%-50% sequence identity = about 90% of the structure can be modeled
<30% sequence identity = low accuracy (many errors)
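The rule of thumb above can be written as a small lookup. This is only a sketch of the three bands on the slide; the function name and the returned labels are illustrative, and the slide does not say how the exact boundary values 30% and 50% should be assigned, so placing them in the middle band is an assumption.

```python
def expected_model_quality(percent_identity):
    """Map target-template sequence identity (in %) to the slide's rough band."""
    if not 0 <= percent_identity <= 100:
        raise ValueError("percent identity must be between 0 and 100")
    if percent_identity > 50:
        return "high"
    if percent_identity >= 30:
        # Assumption: the boundaries 30 and 50 fall into the middle band.
        return "medium"
    return "low"


for pid in (72, 41, 18):
    print(pid, expected_model_quality(pid))
```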
Homology Threshold for Different Alignment Lengths
[Figure: homology threshold (t), y-axis from 0 to 90, plotted against alignment length (L), x-axis from 0 to 100]
A sequence alignment between two proteins is considered to imply
structural homology if the sequence identity is equal to or above the
homology threshold t in a sequence region of a given length L.
The threshold values t(L) are derived from the PDB.
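The criterion above (identity at or above t for an aligned region of length L) can be sketched as follows. The function names are illustrative, and `placeholder_threshold` uses made-up numbers purely so the example runs; the actual t(L) curve derived from the PDB is not reproduced here and would have to be supplied by the reader.

```python
def percent_identity(aligned_a, aligned_b):
    """Return (percent identity, aligned length) over columns without gaps."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0, 0
    identical = sum(1 for x, y in pairs if x == y)
    return 100.0 * identical / len(pairs), len(pairs)


def implies_structural_homology(aligned_a, aligned_b, threshold):
    """True if the identity is at or above threshold(L) for the aligned length L."""
    identity, length = percent_identity(aligned_a, aligned_b)
    return identity >= threshold(length)


def placeholder_threshold(length):
    # HYPOTHETICAL values for illustration only, not the real t(L) curve.
    return 50.0 if length < 40 else 30.0


seq_a = "MKT-AYIAK"
seq_b = "MKTWAYLAK"
print(percent_identity(seq_a, seq_b))  # (87.5, 8)
print(implies_structural_homology(seq_a, seq_b, placeholder_threshold))
```

Passing the threshold in as a function keeps the check independent of whichever t(L) table is actually used.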
Comparative Modeling
• Similarity particularly high in core
– Alpha helices and beta sheets preserved
– Even near-identical sequences vary in loops
Comparative Modeling Methods
MODELLER (Sali, Rockefeller/UCSF)
SCWRL (Dunbrack, UCSF)
SWISS-MODEL
http://swissmodel.expasy.org//SWISS-MODEL.html
Comparative Modeling
Modeling of a sequence based on known structures
Consists of four major steps:
1. Finding a known structure(s) related to the sequence
to be modeled (template), using sequence comparison
methods such as PSI-BLAST
2. Aligning sequence with the templates
3. Building a model
4. Assessing the model
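Steps 1 and 2 above can be sketched with a toy template search. This is not how PSI-BLAST works: a plain Needleman-Wunsch global alignment with unit scores is swapped in as a simple stand-in, and the template names and sequences are hypothetical. Steps 3 and 4 (building and assessing the model) are left to dedicated tools.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment; returns the two gapped strings."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Trace back from the bottom-right corner, preferring the diagonal move.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
            match if a[i - 1] == b[j - 1] else mismatch
        ):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def identity(aln_a, aln_b):
    """Percent of alignment columns holding the same residue in both rows."""
    same = sum(1 for x, y in zip(aln_a, aln_b) if x == y and x != "-")
    return 100.0 * same / len(aln_a)


def pick_template(target, templates):
    """Steps 1-2: align the target to every template, keep the most identical."""
    best = None
    for name, seq in templates.items():
        alignment = global_align(target, seq)
        pid = identity(*alignment)
        if best is None or pid > best[1]:
            best = (name, pid, alignment)
    return best


# Hypothetical target and template sequences, for illustration only.
templates = {"tmplA": "MKTAYLAKQR", "tmplB": "GSHWPLLE"}
name, pid, alignment = pick_template("MKTAYIAKQR", templates)
print(name, pid)
print(alignment[0])
print(alignment[1])
```

The chosen alignment is what a model-building program would take as input for step 3.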
Fold Recognition
Protein Folds: sequential and spatial
arrangement of secondary structures
Hemoglobin
TIM
Similar folds usually mean similar function
Homeodomain transcription factors
The same fold can have multiple functions
Rossmann: 12 functions
TIM barrel: 31 functions
Fold Recognition
• Methods of protein fold recognition attempt to
detect similarities between protein 3D structures
that have no significant sequence similarity.
• They search for folds that are compatible with a
particular sequence.
• They "turn the protein folding problem on its head":
rather than predicting how a sequence will fold,
they predict how well a fold will fit a sequence.
Basic steps in Fold Recognition:
Compare the sequence against a library of all known protein folds (a finite number)
Query sequence
MTYGFRIPLNCERWGHKLSTVILKRP...
Goal: find the fold template that the sequence fits best
There are different ways to evaluate sequence-structure fit
[Figure: the sequence MAHFPGFGQSLLFGYPVYVFGD... is scored against each fold in the library (1 ... 56 ... n); the example scores shown are -10, -123 and 20.5, and one entry is marked as the potential fold]
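The idea of scoring one sequence against every fold in a library can be sketched as below. Everything specific here is an assumption made for illustration: folds are reduced to strings of secondary-structure states (H, E, C), the residue preferences in `FAVOURED` are invented, the alignment is ungapped, and a higher score is taken to mean a better fit. Real threading programs use far richer scoring functions.

```python
# HYPOTHETICAL residue preferences per structural state, for illustration only.
FAVOURED = {
    "H": set("AELMQKR"),  # assumed helix-friendly residues
    "E": set("VIYFWT"),   # assumed strand-friendly residues
    "C": set("GPNDS"),    # assumed loop-friendly residues
}


def fit_score(sequence, fold_states):
    """Ungapped fit: +1 if a residue suits the state at its position, else -1."""
    n = min(len(sequence), len(fold_states))
    return sum(1 if sequence[i] in FAVOURED[fold_states[i]] else -1 for i in range(n))


def rank_folds(sequence, fold_library):
    """Score the sequence against every fold and sort from best to worst fit."""
    scores = {name: fit_score(sequence, states) for name, states in fold_library.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical two-entry fold library.
fold_library = {
    "all_helix": "HHHHHHHHHHH",
    "helix_loop_strand": "HHHHHCCEEEE",
}
ranking = rank_folds("AELKMGPVIYV", fold_library)
print(ranking)  # [('helix_loop_strand', 11), ('all_helix', -1)]
```

The top-ranked entry plays the role of the potential fold for the query sequence.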
Programs for fold recognition
• TOPITS (Rost 1995)
• GenTHREADER (Jones 1999)
• SAM-T02 (UCSC HMM)
• 3D-PSSM
http://www.sbg.bio.ic.ac.uk/~3dpssm/
Ab Initio Modeling
• Compute molecular structure from laws of
physics and chemistry alone
Theoretically the ideal solution
Practically nearly impossible
Why?
– Exceptionally complex calculations
– Biophysics understanding incomplete
Ab Initio Methods
• Rosetta (Baker lab, Seattle)
• Undertaker (Karplus, UCSC)
CASP - Critical Assessment of
Structure Prediction
• Competition among different groups to predict
the 3D structure of proteins that are about to be
solved experimentally.
• Current state:
– Ab initio: the worst performer, but greatly improved in recent years.
– Comparative modeling: performs very well when homologous
sequences with known structures exist.
– Fold recognition: performs well.
What can you do?
FOLDIT
Solve Puzzles for Science
A computer game to fold proteins
http://fold.it/portal/puzzles
What’s Next
Predicting function from structure
Structural Genomics : a large scale structure
determination project designed to cover all
representative protein structures
ATP binding domain of protein MJ0577
Zarembinski et al., Proc. Natl. Acad. Sci. USA, 95:15189 (1998)
As a result of the Structural Genomics
initiative, many structures of proteins
with unknown function will be solved
Wanted!
Automated methods to predict
function from the protein
structures resulting from the
structural genomics project.
Approaches for predicting function from structure
ConSurf - Mapping evolutionary conservation onto the
protein structure http://consurf.tau.ac.il/
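To give a feel for what a per-position conservation score looks like, here is a minimal sketch that uses normalized Shannon entropy over the columns of a multiple sequence alignment. This is a deliberately simple swapped-in measure and is not ConSurf's own scoring method; the function name and the toy alignment are illustrative.

```python
import math
from collections import Counter


def column_conservation(msa):
    """Per-column score in [0, 1]: 1 means fully conserved, lower means variable."""
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for column in zip(*msa):
        residues = [r for r in column if r != "-"]  # ignore gaps
        if not residues:
            scores.append(0.0)
            continue
        counts = Counter(residues)
        total = len(residues)
        entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
        # Normalize by the maximum entropy of a 20-letter amino acid alphabet.
        scores.append(1.0 - entropy / math.log2(20))
    return scores


# Toy alignment: column 0 is invariant, the others vary.
msa = ["MKTA", "MKSA", "MRTA", "MKTG"]
scores = column_conservation(msa)
print([round(s, 2) for s in scores])
```

Scores like these could then be mapped onto the residues of a structure, for example as a color scale.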
Approaches for predicting function from structure
PFPlus – Identifying positive electrostatic patches on the
protein structure http://pfp.technion.ac.il/
A method to distinguish DNA from RNA-binding
proteins
DNA binding interface
RNA binding interface
RNA and DNA binding interfaces tend to
have different geometric features
Applying Differential Geometry to
characterize DNA and RNA binding proteins
k1: minimal curvature
k2: maximal curvature
Mean curvature: H = (k1 + k2) / 2
Gaussian curvature: K = k1 * k2
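The two definitions above translate directly into code. A minimal sketch, with illustrative function names, where k1 and k2 are the minimal and maximal curvature at a surface point:

```python
def mean_curvature(k1, k2):
    """H = (k1 + k2) / 2"""
    return (k1 + k2) / 2


def gaussian_curvature(k1, k2):
    """K = k1 * k2"""
    return k1 * k2


# On a sphere of radius 2 both curvatures have magnitude 1/2.
print(mean_curvature(0.5, 0.5), gaussian_curvature(0.5, 0.5))      # 0.5 0.25
# Curvatures of opposite sign give a negative Gaussian curvature.
print(mean_curvature(-0.5, 0.25), gaussian_curvature(-0.5, 0.25))  # -0.125 -0.125
```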
Applying Differential Geometry to
characterize DNA and RNA proteins
Surface types: peak, flat, pit, minimal surface, ridge, saddle ridge, valley, saddle valley
[Figure: frequency of points for each surface type]
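The eight surface types can be assigned from the signs of the mean curvature H and the Gaussian curvature K. The sketch below assumes the common convention in which a convex bump (peak) has negative H; the slides do not state which sign convention was used, so the peak/pit and ridge/valley labels would swap under the opposite convention. The tolerance `eps` and the function names are illustrative.

```python
from collections import Counter


def surface_type(H, K, eps=1e-6):
    """Classify a surface point from the signs of H and K (assumed convention)."""
    h = 0 if abs(H) < eps else (1 if H > 0 else -1)
    k = 0 if abs(K) < eps else (1 if K > 0 else -1)
    table = {
        (-1, 1): "peak",
        (1, 1): "pit",
        (-1, 0): "ridge",
        (1, 0): "valley",
        (0, 0): "flat",
        (-1, -1): "saddle ridge",
        (1, -1): "saddle valley",
        (0, -1): "minimal surface",
    }
    # H = 0 with K > 0 cannot occur for real curvatures, only through rounding.
    return table.get((h, k), "undefined")


def type_frequencies(hk_pairs):
    """Fraction of surface points that fall into each surface type."""
    counts = Counter(surface_type(H, K) for H, K in hk_pairs)
    total = sum(counts.values())
    return {name: count / total for name, count in counts.items()}


# Toy (H, K) values for four surface points.
points = [(-0.5, 0.25), (-0.4, 0.1), (0.3, -0.2), (0.0, 0.0)]
print(type_frequencies(points))
```

A frequency profile of this kind, computed over a binding surface, is the sort of feature vector the following slides compare between DNA and RNA binding proteins.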
Applying Differential Geometry for
DNA and RNA function prediction
RNA binding surfaces are distinguished
from DNA binding surfaces based on
differential geometric features
76% RNA-binding, 78% DNA-binding
[Figure: frequency of points]
Differential Geometry can correctly determine
whether a given binding domain binds
RNA or DNA
RNA pattern
DNA pattern
Shazman et al, NAR 2011
How can we view the protein structure?
• Download the coordinates of the structure from the PDB
http://www.rcsb.org/pdb/
• Launch a 3D viewer program
For example, we will use the program PyMOL
The program can be downloaded freely from
the PyMOL homepage http://pymol.org
• Load the coordinates into the viewer
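The download step can also be scripted. The sketch below assumes the RCSB file server URL pattern `https://files.rcsb.org/download/<ID>.pdb`, which is not mentioned on the slides, and reads C-alpha coordinates from the fixed-column ATOM records of the PDB format. Function names are illustrative, and the network call is only made if `fetch_pdb_text` is invoked.

```python
from urllib.request import urlopen


def pdb_download_url(pdb_id):
    """Build the download URL for a PDB entry (assumed RCSB URL pattern)."""
    return "https://files.rcsb.org/download/%s.pdb" % pdb_id.upper()


def fetch_pdb_text(pdb_id):
    """Download the coordinate file as text; requires network access."""
    with urlopen(pdb_download_url(pdb_id)) as response:
        return response.read().decode("ascii", errors="replace")


def ca_coordinates(pdb_text):
    """Extract (x, y, z) of C-alpha atoms from ATOM records.

    PDB format columns: atom name 13-16, x 31-38, y 39-46, z 47-54.
    """
    coords = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            coords.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    return coords


print(pdb_download_url("1aqb"))
# Example use (needs network access):
# coords = ca_coordinates(fetch_pdb_text("1aqb"))
```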
PyMOL example
• Launch PyMOL
• Open file “1aqb” (PDB coordinate file)
• Display sequence
• Hide everything
• Show main chain / hide main chain
• Show cartoon
• Color by ss
• Color red
• Color green, resi 1:40
Help: http://pymol.org
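The menu steps above can also be typed as PyMOL commands. The sketch below writes them into a command script that PyMOL can run; it is one possible translation and not part of the lecture. The file names `1aqb.pdb` and `view_1aqb.pml` are hypothetical, the main chain is shown as sticks on the backbone atoms, "color by ss" is approximated by coloring helices and strands only, and the residue range is written in the `1-40` form.

```python
def build_pymol_script(pdb_path="1aqb.pdb"):
    """Return PyMOL commands mirroring the slide's steps, one per line."""
    commands = [
        "load %s" % pdb_path,             # open the PDB coordinate file
        "set seq_view, 1",                # display sequence
        "hide everything",
        "show sticks, name N+CA+C+O",     # show main chain
        "hide sticks, name N+CA+C+O",     # hide main chain
        "show cartoon",
        "color red, ss h",                # color by ss: helices
        "color yellow, ss s",             # color by ss: strands
        "color red",                      # color everything red
        "color green, resi 1-40",         # color residues 1 to 40 green
    ]
    return "\n".join(commands) + "\n"


def write_pymol_script(script_path="view_1aqb.pml", pdb_path="1aqb.pdb"):
    """Write the command script to disk so PyMOL can run it."""
    with open(script_path, "w") as handle:
        handle.write(build_pymol_script(pdb_path))


print(build_pymol_script())
```

After calling `write_pymol_script()`, the script can be opened from within PyMOL or passed to it on the command line.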