lecture10_15_new

Structural Bioinformatics
Proteins
Structure Predictions
Reminder
3.1 Final date to choose a project
10.1 Submission project overview (one page)
-Title
-Main question
-Major tools you are planning to use to answer the question
11.1 / 18.1 – Meetings on projects
9.3 Poster submission
16.3 Poster presentation
The first high-resolution structure of a protein, myoglobin,
was solved in 1958 by Max Perutz and John Kendrew of Cambridge University
(they won the 1962 Nobel Prize in Chemistry).
On 22.12.2015 there were 114,402 protein structures in the
protein structure database.
The 3D structure of a protein is
stored in a coordinate file
Each atom is represented by
a coordinate in 3D (X, Y, Z)
The coordinate file can be viewed
graphically
RBP
MERFGYTRAANCEAP….
>10,000,000
>100,000
What can we do to bridge the gap?
Predicting the three-dimensional structure of a protein
from its sequence is very hard
(sometimes impossible).
However, we can predict the secondary structure
with relatively high precision.
What do we mean by
Secondary Structure ?
Secondary structure elements are the building blocks of
the protein structure.
What do we mean by
Secondary Structure ?
Secondary structure is usually divided into
three categories:
Alpha helix
Beta strand (sheet)
Anything else –
turn/loop
The different secondary structures are
combined together to form the
Tertiary Structure of the Proteins
[Figure labels: RBP, Globin; Secondary?, Tertiary?]
Secondary Structure Prediction
• Given a primary sequence
ADSGHYRFASGFTYKKMNCTEAA
what secondary structure will it adopt
(alpha helix, beta strand or random coil)?
Secondary Structure Prediction
Methods
• Statistical methods
– Based on amino acid frequencies
– HMM (Hidden Markov Model)
• Machine learning methods
– SVM , Neural networks
Statistical Methods for SS prediction
The propensity of an amino
acid to be part of a certain
secondary structure (e.g.,
proline has a low
propensity for being in an
alpha helix or beta sheet; it is a
"breaker")
Chou and Fasman
(1974)
Name            P(a)   P(b)   P(turn)
Alanine          142     83      66
Arginine          98     93      95
Aspartic Acid    101     54     146
Asparagine        67     89     156
Cysteine          70    119     119
Glutamic Acid    151     37      74
Glutamine        111    110      98
Glycine           57     75     156
Histidine        100     87      95
Isoleucine       108    160      47
Leucine          121    130      59
Lysine           114     74     101
Methionine       145    105      60
Phenylalanine    113    138      60
Proline           57     55     152
Serine            77     75     143
Threonine         83    119      96
Tryptophan       108    137      96
Tyrosine          69    147     114
Valine           106    170      50
Not very useful for predictions
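To make the idea of propensities concrete, here is a minimal sketch that averages the P(a) and P(b) values from the table above over a sliding window. It is a simplified illustration, not the full Chou and Fasman procedure: the one-letter keys, the window size of 5, the threshold of 100, and the function names `window_propensities` and `naive_call` are all assumptions made for this example.

```python
# Chou-Fasman propensities from the table, keyed by one-letter code: (P(a), P(b)).
PROPENSITY = {
    "A": (142, 83), "R": (98, 93), "D": (101, 54), "N": (67, 89),
    "C": (70, 119), "E": (151, 37), "Q": (111, 110), "G": (57, 75),
    "H": (100, 87), "I": (108, 160), "L": (121, 130), "K": (114, 74),
    "M": (145, 105), "F": (113, 138), "P": (57, 55), "S": (77, 75),
    "T": (83, 119), "W": (108, 137), "Y": (69, 147), "V": (106, 170),
}


def window_propensities(seq, window=5):
    """Return (start, mean P(a), mean P(b)) for every window in the sequence."""
    results = []
    for start in range(len(seq) - window + 1):
        residues = seq[start:start + window]
        mean_a = sum(PROPENSITY[r][0] for r in residues) / window
        mean_b = sum(PROPENSITY[r][1] for r in residues) / window
        results.append((start, mean_a, mean_b))
    return results


def naive_call(mean_a, mean_b, threshold=100):
    """Hypothetical rule: pick the larger mean if it reaches the threshold."""
    if mean_a >= threshold and mean_a > mean_b:
        return "H"  # helix-favoring window
    if mean_b >= threshold and mean_b >= mean_a:
        return "E"  # strand-favoring window
    return "C"      # neither: call it coil/turn


# Example sequence taken from the prediction slide.
for start, mean_a, mean_b in window_propensities("ADSGHYRFASGFTYKKMNCTEAA"):
    print(start, round(mean_a, 1), round(mean_b, 1), naive_call(mean_a, mean_b))
```

Each window is scored independently here, so the sketch ignores any dependency between neighboring positions.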
What is missing?
HMM (Hidden Markov Model)
An approach for predicting
secondary structure that considers
dependencies between positions
• HMM enables us to calculate the
probability of assigning a sequence to a
specific secondary structure
TGTAGPOLKCHIQWML
HHHHHHHLLLLBBBBB
p=?
[Figure: HMM probability tables, annotated as follows]
• Beginning with an α-helix
• α-helix followed by α-helix
• The probability of observing alanine as part of a β-sheet
• The probability of observing a residue which belongs to an
α-helix followed by a residue belonging to a turn = 0.15
Table built from a large database of known secondary
structures
• Example
What is the probability that the sequence TGQ
will be in a helical structure?
TGQ
HHH
p = 0.45 x 0.041 x 0.8 x 0.028 x 0.8 x 0.0635 =
0.000020995 (about 2.1 x 10^-5)
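The product in this example can be reproduced with a few lines of code. The sketch below assumes one reading of the six factors: 0.45 is the probability of beginning in a helix, 0.8 is the helix-to-helix transition probability, and 0.041, 0.028 and 0.0635 are the probabilities of observing T, G and Q in a helix. That interpretation and the function name `path_probability` are assumptions for illustration.

```python
def path_probability(start_p, emissions, transitions):
    """Multiply start, emission and transition probabilities along one state path."""
    if len(transitions) != len(emissions) - 1:
        raise ValueError("need exactly one transition between consecutive residues")
    p = start_p * emissions[0]
    for trans, emit in zip(transitions, emissions[1:]):
        p *= trans * emit
    return p


# TGQ assigned to HHH, using the six factors from the slide.
p = path_probability(0.45, [0.041, 0.028, 0.0635], [0.8, 0.8])
print(f"{p:.3e}")  # 2.099e-05
```

For long sequences such products become very small, so practical implementations usually sum log probabilities instead of multiplying.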
• What can we learn from secondary structure
predictions?
Mad Cow Disease
PrPc to PrPsc
[Figure labels: PrPc, PrPsc]
Predicting 3D Structure
based on homology
Comparative modeling / homology modeling
Similar sequences suggest similar structures
Sequence and structure alignments of two retinol-binding proteins
How do we evaluate structure
similarity?
Structure Alignment
Structure Alignments
There are many different algorithms for structural alignment.
The outputs of a structural alignment are a
superposition of the atomic coordinates and a
minimal Root Mean Square Distance (RMSD)
between the structures.
The RMSD of two aligned structures indicates
their divergence from one another.
[Figure: each atom N (x, y, z) in protein V is paired with atom N (x, y, z) in protein W]
Low values of RMSD mean similar structures
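As a rough illustration, the sketch below computes the RMSD between paired atoms of two structures, i.e. the square root of the mean squared distance over all N atom pairs. It assumes the structures are already superimposed and that atom i of one is paired with atom i of the other. The function name `rmsd` and the toy coordinates are made up for this example. A real structural alignment program would also search for the superposition that minimizes this value.

```python
import math


def rmsd(coords_v, coords_w):
    """Root mean square distance between paired (x, y, z) atom coordinates."""
    if len(coords_v) != len(coords_w) or not coords_v:
        raise ValueError("both structures need the same non-zero number of atoms")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(coords_v, coords_w):
        # Squared distance between the two atoms of this pair.
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(coords_v))


# Toy example: protein W is protein V shifted by 1 along z.
protein_v = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
protein_w = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (2.0, 0.0, 1.0)]
print(rmsd(protein_v, protein_w))  # 1.0
```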
Different sequences can result in similar structures
[Figure labels: 1ecd, 2hhd; RMSD<1]
We can learn about the important features
which determine structure and function by
comparing the sequences and structures.
The Globin Family
Why is proline 36 conserved throughout the globin family?
Where are the gaps?
The gaps in the pairwise alignment map to the loop regions.
How are remote homologs related in terms of their structure?
RBD
retinol-binding protein
apolipoprotein D
β-lactoglobulin
odorant-binding protein
PSI-BLAST alignment of RBP and β-lactoglobulin: iteration 3
Score = 159 bits (404), Expect = 1e-38
Identities = 41/170 (24%), Positives = 69/170 (40%), Gaps = 19/170 (11%)

Query: 3   WVWALLLLAAWAAAERD--------CRVSSFRVKENFDKARFSGTWYAMAKKDPEGLFLQ 54
            V  L+ LA  A              +  S  V+ENFD  ++ G WY + K
Sbjct: 1   MVTMLMFLATLAGLFTTAKGQNFHLGKCPSPPVQENFDVKKYLGRWYEIEKIPASFE-KG 59

Query: 55  DNIVAEFSVDETGQMSATAKGRVRLLNNWDVCADMVGTFTDTEDPAKFKMKYWGVASFLQ 114
           + I A +S+ E G +    K          V  +     ++  +PAK +++++ +
Sbjct: 60  NCIQANYSLMENGNIEVLNKELSPDGTMNQVKGE--AKQSNVSEPAKLEVQFFPL----- 112

Query: 115 KGNDDHWIVDTDYDTYAVQYSCRLLNLDGTCADSYSFVFSRDPNGLPPEA 164
                +WI+ TDY+ YA+ YSC            +  ++ R+P  LPPE
Sbjct: 113 MPPAPYWILATDYENYALVYSCTTFFWL--FHVDFFWILGRNPY-LPPET 159
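The identity and gap counts in the summary line of this alignment can be checked by counting columns directly. The sketch below is a minimal illustration, assuming the three Query and Sbjct blocks are concatenated in order; the function name `alignment_stats` is made up for this example. Positives are not counted, because they depend on the position-specific scores PSI-BLAST built, which the slide does not show.

```python
# Aligned rows copied from the three blocks of the alignment, joined in order.
QUERY = (
    "WVWALLLLAAWAAAERD--------CRVSSFRVKENFDKARFSGTWYAMAKKDPEGLFLQ"
    "DNIVAEFSVDETGQMSATAKGRVRLLNNWDVCADMVGTFTDTEDPAKFKMKYWGVASFLQ"
    "KGNDDHWIVDTDYDTYAVQYSCRLLNLDGTCADSYSFVFSRDPNGLPPEA"
)
SBJCT = (
    "MVTMLMFLATLAGLFTTAKGQNFHLGKCPSPPVQENFDVKKYLGRWYEIEKIPASFE-KG"
    "NCIQANYSLMENGNIEVLNKELSPDGTMNQVKGE--AKQSNVSEPAKLEVQFFPL-----"
    "MPPAPYWILATDYENYALVYSCTTFFWL--FHVDFFWILGRNPY-LPPET"
)


def alignment_stats(a, b, gap="-"):
    """Return (alignment length, identical columns, columns containing a gap)."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have the same length")
    identities = sum(1 for x, y in zip(a, b) if x == y and x != gap)
    gaps = sum(1 for x, y in zip(a, b) if x == gap or y == gap)
    return len(a), identities, gaps


length, identities, gaps = alignment_stats(QUERY, SBJCT)
print(length, identities, gaps)                   # 170 41 19
print(f"identity = {100 * identities / length:.0f}%")  # identity = 24%
```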
The retinol-binding protein
β-lactoglobulin
Taken together
MERFGYTRAANCEAP….
FUNCTION
Comparative Modeling
Similar sequence suggests similar structure
Builds a structural model of a protein based on
its sequence alignment to one or more
related protein structures in the database
Comparative Modeling
General algorithm
Modeling of a sequence based on known structures
Consists of four major steps:
1. Finding a known structure(s) related to the sequence
to be modeled (template), using sequence comparison
methods such as PSI-BLAST
2. Aligning sequence with the templates
3. Building a model
4. Assessing the model
Comparative Modeling
• Accuracy of the comparative model is
usually related to the sequence identity on
which it is based
>50% sequence identity = high accuracy
30%-50% sequence identity = 90% can be modeled
<30% sequence identity = low accuracy (many errors)
However, other parameters (such as the length of the
region of identity) can influence the results
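The identity thresholds above can be written as a small lookup. This is only a sketch of the rule of thumb on the slide: the function name `expected_accuracy` and the handling of the exact boundary values (50% falls in the middle band, 30% too) are assumptions, and it ignores the other parameters the slide mentions.

```python
def expected_accuracy(identity_percent):
    """Map a template sequence identity (in %) to the slide's accuracy band."""
    if not 0 <= identity_percent <= 100:
        raise ValueError("identity must be a percentage between 0 and 100")
    if identity_percent > 50:
        return "high accuracy"
    if identity_percent >= 30:
        return "90% can be modeled"
    return "low accuracy (many errors)"


for identity in (75, 40, 24):
    print(identity, "->", expected_accuracy(identity))
```

For example, a template at 24% identity falls in the low-accuracy band.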
What is a good model?
ModBase- for homology modelling
https://modbase.compbio.ucsf.edu/
Extra Slides (for your interest)
Alpha Helix: Pauling (1951)
• A consecutive stretch of 5-40 amino
acids (average 10).
• A right-handed spiral conformation.
• 3.6 amino acids per turn.
• Stabilized by hydrogen bonds.
[Figure labels: 3.6 residues, 5.6 Å]
Beta Strand: Pauling and Corey (1951)
β-strand
> An extended polypeptide chain
is called a β-strand
(consists of 5-10 amino acids)
> The chains are connected together
by hydrogen bonds to form a β-sheet
β-sheet
Loops
• Connect the secondary
structure elements
(alpha helices and beta
strands).
• Have various lengths
and shapes.