lecture10_13

Structural Bioinformatics
Protein Tertiary
Structure Prediction
The Different levels of Protein Structure
Primary: amino acid linear sequence.
Secondary: -helices, β-sheets and loops.
Tertiary: the 3D shape of the fully folded
polypeptide chain
The 3D structure of a protein is
stored in a coordinate file
Each atom is represented by
a coordinate in 3D (X, Y, Z)
The coordinate file can be viewed
graphically
RBP
Description is given in slides 35-36
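To make the idea of a coordinate file concrete, here is a minimal sketch of reading atom positions from PDB-format text. It assumes the standard fixed-width ATOM/HETATM record layout of the PDB format; the names Atom, parse_atom_line and read_atoms are illustrative and do not come from the lecture.

```python
from typing import Iterable, List, NamedTuple


class Atom(NamedTuple):
    name: str      # atom name, e.g. "CA"
    res_name: str  # residue name, e.g. "GLU"
    chain: str     # chain identifier
    res_seq: int   # residue sequence number
    x: float       # coordinates in Angstroms
    y: float
    z: float


def parse_atom_line(line: str) -> Atom:
    """Parse one fixed-width ATOM/HETATM record of a PDB file."""
    return Atom(
        name=line[12:16].strip(),      # columns 13-16
        res_name=line[17:20].strip(),  # columns 18-20
        chain=line[21],                # column 22
        res_seq=int(line[22:26]),      # columns 23-26
        x=float(line[30:38]),          # columns 31-38
        y=float(line[38:46]),          # columns 39-46
        z=float(line[46:54]),          # columns 47-54
    )


def read_atoms(lines: Iterable[str]) -> List[Atom]:
    """Collect every atom record from an iterable of PDB lines."""
    return [parse_atom_line(l) for l in lines
            if l.startswith(("ATOM", "HETATM"))]
```

With a locally saved coordinate file (the file name here is hypothetical), the atoms could be read with: with open("1aqb.pdb") as fh: atoms = read_atoms(fh).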
Predicting 3D Structure
An outstanding, difficult problem
Based on sequence homology
– Comparative modeling (homology)
Based on structural homology
– Fold recognition (threading)
Comparative Modeling
Similar sequences suggest similar structures
Sequence and structure alignments of two Retinol Binding Proteins
Structure Alignments
There are many different algorithms for structural alignment.
The outputs of a structural alignment are a superposition of the
atomic coordinates and a minimal Root Mean Square Deviation
(RMSD) between the structures. The RMSD of two aligned
structures indicates their divergence from one another.
Low RMSD values mean similar structures.
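As a rough illustration of the RMSD value described above, the sketch below computes it for two equally long lists of paired atom coordinates. It assumes the structures are already superposed and the atoms already matched one to one; finding the optimal superposition (for example with the Kabsch algorithm) is a separate step that is not shown. The function name rmsd is illustrative.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """RMSD = sqrt( (1/N) * sum_i |a_i - b_i|^2 ) over N paired atoms.

    Assumes the two coordinate sets are already superposed.
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(a, b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(a))
```

Identical coordinate sets give 0; the value grows as the paired atoms move further apart.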
Comparative Modeling
Similar sequence suggests similar structure
Builds a protein structure model based on
its alignment (sequence) to one or more
related protein structures in the database
Comparative Modeling
• The accuracy of a comparative model is
usually related to the sequence identity on
which it is based:
>50% sequence identity = high accuracy
30%-50% sequence identity = about 90% of the structure can be modeled
<30% sequence identity = low accuracy (many errors)
However, other parameters (such as the length of the
aligned region) can influence the results
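The thresholds above can be sketched in code. This is a minimal example, assuming a pairwise alignment given as two equal-length strings with "-" for gaps, and assuming identity is measured over the columns where both sequences have a residue (other denominators are also in use). The function names and the handling of values exactly at 30% and 50% are illustrative choices.

```python
from typing import Tuple


def percent_identity(aln_a: str, aln_b: str) -> Tuple[float, int]:
    """Return (percent identity, number of aligned columns).

    Columns in which either sequence has a gap ('-') are ignored.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    aligned = matches = 0
    for a, b in zip(aln_a, aln_b):
        if a == "-" or b == "-":
            continue
        aligned += 1
        if a == b:
            matches += 1
    if aligned == 0:
        return 0.0, 0
    return 100.0 * matches / aligned, aligned


def expected_accuracy(identity: float) -> str:
    """Map percent identity to the rough accuracy classes of the slide."""
    if identity > 50:
        return "high accuracy"
    if identity >= 30:
        return "about 90% can be modeled"
    return "low accuracy (many errors)"
```

Returning the aligned length together with the identity makes it easy to spot a high identity that rests on only a short aligned region.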
Comparative Modeling
Modeling of a sequence based on known structures
Consists of four major steps:
1. Finding a known structure(s) related to the sequence
to be modeled (template), using sequence comparison
methods such as PSI-BLAST
2. Aligning sequence with the templates
3. Building a model
4. Assessing the model
What is a good model?
Fold Recognition
Protein Folds: sequential and spatial
arrangement of secondary structures
Examples: Globin, TIM
Similar folds usually mean similar function
Homeodomain: transcription factors
The same fold can have multiple functions
Rossmann: 12 different functions
TIM barrel: 31 different functions
Fold Recognition
• Fold recognition attempts to detect similarities
between protein 3D structures that have no
significant sequence similarity.
• It searches for folds that are compatible with a
particular sequence.
• These methods "turn the protein folding problem on its head":
rather than predicting how a sequence will fold,
they predict how well a fold will fit a sequence.
Basic steps in Fold Recognition:
Compare the sequence against a library of all known protein folds (a finite number)
Query sequence
MTYGFRIPLNCERWGHKLSTVILKRP...
Goal: find the fold template that the sequence fits best
There are different ways to evaluate sequence-structure fit
[Figure: the query sequence MAHFPGFGQSLLFGYPVYVFGD... is scored against folds 1) ... 56) ... n) of the library; scores shown: -10, -123, 20.5; one fold is labeled "Potential fold"]
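One simple way to picture a sequence-structure fit score is the toy sketch below. It is not the scoring scheme of any particular threading program. It assumes each fold template is reduced to a string of environments ("B" for buried, "E" for exposed), rewards hydrophobic residues at buried positions and polar residues at exposed ones, and compares positions without gaps. The residue grouping, the template names, and the convention that a higher score means a better fit are all assumptions made for illustration.

```python
from typing import Dict, Tuple

# Assumed grouping of hydrophobic residues (illustrative only).
HYDROPHOBIC = set("AVLIMFWC")


def fit_score(sequence: str, environments: str) -> int:
    """Score how well a sequence suits a template's buried/exposed pattern.

    Positions are compared one to one (no gaps); zip() stops at the
    shorter of the two strings.
    """
    score = 0
    for residue, env in zip(sequence, environments):
        is_hydrophobic = residue in HYDROPHOBIC
        is_buried = env == "B"
        # +1 when residue type matches the environment, -1 otherwise
        score += 1 if is_hydrophobic == is_buried else -1
    return score


def best_fold(sequence: str, library: Dict[str, str]) -> Tuple[str, Dict[str, int]]:
    """Score the sequence against every template and return the best one."""
    scores = {name: fit_score(sequence, env) for name, env in library.items()}
    return max(scores, key=scores.get), scores
```

Real fold recognition methods use much richer scoring (for example residue-environment profiles or pairwise contact potentials) and allow gaps in the sequence-structure alignment.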
Ab Initio Modeling
• Compute molecular structure from laws of
physics and chemistry alone
In theory the ideal solution
In practice nearly impossible
Why?
– Exceptionally complex calculations
– Biophysics understanding incomplete
How do we know what is a good prediction?
CASP - Critical Assessment of Structure Prediction
• A competition among different groups to predict
the 3D structures of proteins that are about to be
solved experimentally.
• Current state:
– Ab initio: the worst performer, but greatly improved in
recent years.
– Modeling: performs very well when homologous
sequences with known structures exist.
– Fold recognition: performs well.
What can you do?
FOLDIT
Solve Puzzles for Science
A computer game to fold proteins
http://fold.it/portal/puzzles
What’s Next
Predicting function from structure
Structural Genomics: a large-scale structure
determination project designed to cover all
representative protein structures
ATP binding domain of protein MJ0577
Zarembinski et al., Proc. Natl. Acad. Sci. USA 95:15189 (1998)
As a result of the Structural Genomics
initiative, many structures of proteins
with unknown function have been solved
Wanted!
Automated methods to predict
function from the protein
structures resulting from the
structural genomics project.
An “out of the box” approach for predicting
function from structure
DNA binding interface
RNA binding interface
RNA and DNA binding interfaces tend to
have different geometric features
Applying Differential Geometry to
characterize DNA and RNA binding proteins
k1: minimal curvature
k2: maximal curvature
Mean curvature: H = (k1 + k2) / 2
Gaussian curvature: K = k1 * k2
Applying Differential Geometry to
characterize DNA and RNA binding proteins
Surface types: peak, flat, pit, minimal surface, ridge, saddle ridge, valley, saddle valley
[Plot: frequency of points for each surface type]
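The curvature definitions and surface types above can be combined in a short sketch. It computes H and K from the two principal curvatures and assigns one of the eight surface types from their signs. The sign table follows a common convention in which negative mean curvature marks peak-like shapes; that convention, the tolerance eps, and the function names are assumptions and are not taken from the lecture.

```python
from typing import Tuple


def mean_and_gaussian(k1: float, k2: float) -> Tuple[float, float]:
    """Return (H, K) with H = (k1 + k2) / 2 and K = k1 * k2."""
    return (k1 + k2) / 2.0, k1 * k2


def _sign(value: float, eps: float) -> int:
    """-1, 0 or +1, treating |value| < eps as zero."""
    if abs(value) < eps:
        return 0
    return 1 if value > 0 else -1


# (sign of H, sign of K) -> surface type, under the assumed sign convention
SURFACE_TYPES = {
    (-1, 1): "peak",
    (-1, 0): "ridge",
    (-1, -1): "saddle ridge",
    (0, 0): "flat",
    (0, -1): "minimal surface",
    (1, 1): "pit",
    (1, 0): "valley",
    (1, -1): "saddle valley",
}


def surface_type(k1: float, k2: float, eps: float = 1e-6) -> str:
    """Classify a surface point from its minimal (k1) and maximal (k2) curvature."""
    h, k = mean_and_gaussian(k1, k2)
    # H = 0 with K > 0 cannot occur for real curvatures, hence the fallback
    return SURFACE_TYPES.get((_sign(h, eps), _sign(k, eps)), "undefined")
```

Counting how often each label occurs over all points of a binding surface would give a frequency profile of the kind shown in the plots.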
Applying Differential Geometry for
DNA and RNA function prediction
RNA binding surfaces are distinguished
from DNA binding surfaces based on
Differential Geometric features
[Plot: frequency of points; labels shown: 76% RNA-binding, 78% DNA-binding]
Differential Geometry can correctly determine
whether a given binding domain binds
RNA or DNA
RNA pattern
DNA pattern
Shazman et al, NAR 2011
How can we view the protein structure?
• Download the coordinates of the structure from the PDB
http://www.rcsb.org/pdb/
• Launch a 3D viewer program
For example, we will use the program PyMOL
The program can be downloaded freely from
the PyMOL homepage http://pymol.org
• Load the coordinates into the viewer
PyMOL example
• Launch PyMOL
• Open file "1aqb" (PDB coordinate file)
• Display sequence
• Hide everything
• Show main chain / hide main chain
• Show cartoon
• Color by ss
• Color red
• Color green, resi 1:40
Help: http://pymol.org
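The viewing steps above can also be scripted. The sketch below is a hedged example that replays them through PyMOL's Python command interface. The function takes the command object as a parameter; inside a PyMOL session it would be called as show_rbp(cmd) after "from pymol import cmd". The function name, the local file name "1aqb.pdb", the choice of sticks for the main chain, and the colors used for secondary structure are assumptions.

```python
def show_rbp(cmd, path="1aqb.pdb"):
    """Replay the slide's viewing steps; `cmd` is PyMOL's `pymol.cmd` module."""
    backbone = "name N+CA+C+O"           # main-chain atoms

    cmd.load(path)                       # open the PDB coordinate file
    cmd.set("seq_view", 1)               # display sequence
    cmd.hide("everything")               # hide everything
    cmd.show("sticks", backbone)         # show main chain
    cmd.hide("sticks", backbone)         # hide main chain
    cmd.show("cartoon")                  # show cartoon

    # color by secondary structure (colors are an arbitrary choice)
    cmd.color("red", "ss h")             # helices
    cmd.color("yellow", "ss s")          # strands
    cmd.color("green", "not (ss h+s)")   # loops and everything else

    cmd.color("red")                     # color the whole structure red
    cmd.color("green", "resi 1-40")      # color residues 1 to 40 green
```

Passing the command object in, rather than importing it at the top, keeps the function importable on machines where PyMOL is not installed.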