lecture09_14Class

Structural Bioinformatics
Proteins Secondary
Structure Predictions
The first high-resolution structure of a protein, myoglobin,
was solved in 1958 by Max Perutz and John Kendrew of Cambridge University
(they won the 1962 Nobel Prize in Chemistry).
On 12.12.2013 there were 89,110 protein structures in the protein
structure database.
A great increase, but still an order of magnitude lower than the number
of protein sequences in the sequence databases (close to 1,000,000).
2
What can we do to bridge the gap??
MERFGYTRAANCEAP….
Predicting the three-dimensional structure
of a protein from its sequence is very hard
(sometimes impossible).
However, we can predict the secondary structure
with relatively high accuracy.
3
What do we mean by
Secondary Structure ?
Secondary structure elements are the building blocks of
the protein structure.
What do we mean by
Secondary Structure ?
Secondary structure is usually divided into
three categories:
Alpha helix
Beta strand (sheet)
Anything else (turn/loop)
5
The different secondary structures are
combined together to form the
Tertiary Structure of the Proteins
6
[Figure: tertiary structures of RBP and a globin; which secondary structure elements does each contain?]
7
Secondary Structure Prediction
• Given a primary sequence
ADSGHYRFASGFTYKKMNCTEAA
what secondary structure will it adopt
(alpha helix, beta strand or random coil) ?
8
Secondary Structure Prediction
Methods
• Statistical methods
– Based on amino acid frequencies
– HMM (Hidden Markov Model)
• Machine learning methods
– SVM , Neural networks
9
Statistical Methods for SS prediction
The propensity of an amino acid to be part of a certain
secondary structure (e.g. proline has a low propensity
for being in an alpha helix or a beta sheet; it acts as a breaker)
Chou and Fasman (1974)

Name             P(a)   P(b)   P(turn)
Alanine           142     83        66
Arginine           98     93        95
Aspartic Acid     101     54       146
Asparagine         67     89       156
Cysteine           70    119       119
Glutamic Acid     151     37        74
Glutamine         111    110        98
Glycine            57     75       156
Histidine         100     87        95
Isoleucine        108    160        47
Leucine           121    130        59
Lysine            114     74       101
Methionine        145    105        60
Phenylalanine     113    138        60
Proline            57     55       152
Serine             77     75       143
Threonine          83    119        96
Tryptophan        108    137        96
Tyrosine           69    147       114
Valine            106    170        50
Success rate of 50%
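To make the propensity idea concrete, here is a minimal Python sketch that looks up the table values for a short peptide and averages them. The one-letter keys, the function names and the example peptides are illustrative assumptions; the numbers are copied from the table (they appear to be propensities scaled by 100). This is only the lookup step, not the full Chou and Fasman procedure.

```python
# Propensity values copied from the Chou and Fasman table,
# keyed by one-letter amino acid code: (P(a), P(b), P(turn)).
PROPENSITY = {
    "A": (142, 83, 66),  "R": (98, 93, 95),   "D": (101, 54, 146),
    "N": (67, 89, 156),  "C": (70, 119, 119), "E": (151, 37, 74),
    "Q": (111, 110, 98), "G": (57, 75, 156),  "H": (100, 87, 95),
    "I": (108, 160, 47), "L": (121, 130, 59), "K": (114, 74, 101),
    "M": (145, 105, 60), "F": (113, 138, 60), "P": (57, 55, 152),
    "S": (77, 75, 143),  "T": (83, 119, 96),  "W": (108, 137, 96),
    "Y": (69, 147, 114), "V": (106, 170, 50),
}
STATES = ("helix", "strand", "turn")  # labels chosen for this sketch


def mean_propensity(segment):
    """Average P(a), P(b) and P(turn) over a peptide segment."""
    totals = [0, 0, 0]
    for residue in segment:
        for i, value in enumerate(PROPENSITY[residue]):
            totals[i] += value
    return {state: total / len(segment) for state, total in zip(STATES, totals)}


def favoured_state(segment):
    """Return the state with the highest average propensity."""
    means = mean_propensity(segment)
    return max(means, key=means.get)


# Hypothetical example peptides, not taken from the lecture.
for peptide in ("AELM", "VIY", "PGN"):
    print(peptide, favoured_state(peptide), mean_propensity(peptide))
```

Averaging is the simplest possible scoring choice; a real predictor would add nucleation and extension rules on top of it.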
10
Secondary Structure Method
Improvements
‘Sliding window’ approach
• Most alpha helices are ~12 residues long
• Most beta strands are ~6 residues long
-> Look at all windows of size 6/12
-> Calculate a score for each window. If the score is above a threshold,
   predict that the window is an alpha helix/beta sheet
TGTAGPQLKCHIQWMLPLKK
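The sliding-window idea can be sketched as follows. The score used here (the mean propensity of the window) and the threshold of 100 are assumptions made for illustration; the slides do not say which score or threshold was used. The propensity values are the P(a) and P(b) columns of the Chou and Fasman table, in table order.

```python
# One-letter codes in the same order as the rows of the propensity table.
ORDER = "ARDNCEQGHILKMFPSTWYV"
P_HELIX = dict(zip(ORDER, [142, 98, 101, 67, 70, 151, 111, 57, 100, 108,
                           121, 114, 145, 113, 57, 77, 83, 108, 69, 106]))
P_STRAND = dict(zip(ORDER, [83, 93, 54, 89, 119, 37, 110, 75, 87, 160,
                            130, 74, 105, 138, 55, 75, 119, 137, 147, 170]))


def window_scores(sequence, table, size):
    """Return (start, mean propensity) for every window of the given size."""
    scores = []
    for start in range(len(sequence) - size + 1):
        window = sequence[start:start + size]
        scores.append((start, sum(table[aa] for aa in window) / size))
    return scores


def predict(sequence, table, size, threshold):
    """Mark every residue covered by a window scoring above the threshold."""
    marked = [False] * len(sequence)
    for start, score in window_scores(sequence, table, size):
        if score > threshold:
            for i in range(start, start + size):
                marked[i] = True
    return marked


seq = "TGTAGPQLKCHIQWMLPLKK"
helix = predict(seq, P_HELIX, size=12, threshold=100)   # threshold is an assumption
strand = predict(seq, P_STRAND, size=6, threshold=100)  # threshold is an assumption
print(seq)
print("".join("H" if h else "-" for h in helix))
print("".join("B" if b else "-" for b in strand))
```

A residue can be marked as both helix and strand in this sketch; resolving such overlaps is left out on purpose.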
11
Improvements since the 1980s
• Adding information from conservation in
MSA
• Smarter algorithms (e.g. Machine learning,
HMM).
12
HMM (Hidden Markov Model)
approach for predicting
Secondary Structure
• HMM enables us to calculate the
probability of assigning a sequence to a
secondary structure
TGTAGPQLKCHIQWML
HHHHHHHLLLLBBBBB
p=?
13
[Figure: HMM probability tables, with the following annotations]
• Beginning with an α-helix
• α-helix followed by α-helix
• The probability of observing alanine as part of a β-sheet
• The probability of observing a residue which belongs to an
  α-helix followed by a residue belonging to a turn = 0.15
Tables built from a large database of known secondary structures
14
• Example
What is the probability that the sequence TGQ
will be in a helical structure?
TGQ
HHH
p = 0.45 x 0.041 x 0.8 x 0.028 x 0.8 x 0.0635 ≈ 0.000021 (2.0995 x 10^-5)
Success rate of HMM-based methods: 75%-80%
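The TGQ/HHH calculation can be reproduced with a few lines of Python. The six numbers come from the example; reading them as one start probability, three emission probabilities and two helix-to-helix transition probabilities is an assumption based on the figure annotations. Multiplying the six factors gives about 2.1 x 10^-5.

```python
# Numbers from the TGQ/HHH example. Their roles (start, emission,
# transition) are an assumed reading of the slide.
START = {"H": 0.45}                      # beginning with an alpha helix
TRANSITION = {("H", "H"): 0.8}           # alpha helix followed by alpha helix
EMISSION = {("H", "T"): 0.041,           # observing T in a helix
            ("H", "G"): 0.028,           # observing G in a helix
            ("H", "Q"): 0.0635}          # observing Q in a helix


def path_probability(sequence, states):
    """Joint probability of a residue sequence and a state path in an HMM."""
    if len(sequence) != len(states):
        raise ValueError("sequence and state path must have equal length")
    p = START[states[0]] * EMISSION[(states[0], sequence[0])]
    for i in range(1, len(sequence)):
        p *= TRANSITION[(states[i - 1], states[i])]
        p *= EMISSION[(states[i], sequence[i])]
    return p


p = path_probability("TGQ", "HHH")
print(f"p(TGQ, HHH) = {p:.4e}")
```

For long sequences the product underflows quickly, so practical implementations usually sum log probabilities instead.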
15
• What can we learn from secondary structure
predictions??
Mad Cow Disease: PrPc to PrPsc
[Figure: structures of PrPc and PrPsc]
How does the protein structure relate
to the primary protein sequence?
18
SEQUENCE
- Early experiments have shown that the
sequence of the protein is sufficient to
determine its structure (Anfinsen)
- Protein structure is more conserved than
protein sequence and more closely related
to function.
19
How (Can) Different Amino Acid
Sequences Determine Similar Protein
Structures?
Lesk and Chothia 1980
20
The Globin Family
21
Different sequences can result in similar structures
1ecd
2hhd
22
We can learn about the important features
that determine structure and function by
comparing sequences and structures.
23
The Globin Family
24
Why is proline 36 conserved throughout the globin family?
25
Where are the gaps??
The gaps in the pairwise alignment are mapped to the loop regions
26
How are remote homologs related in terms of their structure?
RBD
retinol-binding protein
apolipoprotein D
β-lactoglobulin
odorant-binding protein
27
PSI-BLAST alignment of RBP and β-lactoglobulin: iteration 3
Score = 159 bits (404), Expect = 1e-38
Identities = 41/170 (24%), Positives = 69/170 (40%), Gaps = 19/170 (11%)

Query: 3   WVWALLLLAAWAAAERD--------CRVSSFRVKENFDKARFSGTWYAMAKKDPEGLFLQ 54
            V  L+ LA  A              +  S  V+ENFD  ++ G WY + K
Sbjct: 1   MVTMLMFLATLAGLFTTAKGQNFHLGKCPSPPVQENFDVKKYLGRWYEIEKIPASFE-KG 59

Query: 55  DNIVAEFSVDETGQMSATAKGRVRLLNNWDVCADMVGTFTDTEDPAKFKMKYWGVASFLQ 114
           + I A +S+ E G +    K          V  +     ++  +PAK +++++ +
Sbjct: 60  NCIQANYSLMENGNIEVLNKELSPDGTMNQVKGE--AKQSNVSEPAKLEVQFFPL----- 112

Query: 115 KGNDDHWIVDTDYDTYAVQYSCRLLNLDGTCADSYSFVFSRDPNGLPPEA 164
                +WI+ TDY+ YA+ YSC            + ++  R+P  LPPE
Sbjct: 113 MPPAPYWILATDYENYALVYSCTTFFWL--FHVDFFWILGRNPY-LPPET 159
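The identity and gap counts in the alignment header can be recomputed directly from the two aligned rows. The sketch below joins the three alignment blocks and counts columns; the function name is an illustrative choice. The "Positives" count is not reproduced because it needs a substitution matrix, which this sketch leaves out.

```python
# The three alignment blocks, concatenated ("-" marks a gap).
QUERY = (
    "WVWALLLLAAWAAAERD--------CRVSSFRVKENFDKARFSGTWYAMAKKDPEGLFLQ"
    "DNIVAEFSVDETGQMSATAKGRVRLLNNWDVCADMVGTFTDTEDPAKFKMKYWGVASFLQ"
    "KGNDDHWIVDTDYDTYAVQYSCRLLNLDGTCADSYSFVFSRDPNGLPPEA"
)
SBJCT = (
    "MVTMLMFLATLAGLFTTAKGQNFHLGKCPSPPVQENFDVKKYLGRWYEIEKIPASFE-KG"
    "NCIQANYSLMENGNIEVLNKELSPDGTMNQVKGE--AKQSNVSEPAKLEVQFFPL-----"
    "MPPAPYWILATDYENYALVYSCTTFFWL--FHVDFFWILGRNPY-LPPET"
)


def alignment_stats(a, b):
    """Return (alignment length, identical columns, gapped columns)."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have the same length")
    identities = sum(x == y and x != "-" for x, y in zip(a, b))
    gaps = sum(x == "-" or y == "-" for x, y in zip(a, b))
    return len(a), identities, gaps


length, identities, gaps = alignment_stats(QUERY, SBJCT)
print(f"Identities = {identities}/{length} ({100 * identities / length:.0f}%)")
print(f"Gaps = {gaps}/{length} ({100 * gaps / length:.0f}%)")
```

Only about a quarter of the columns are identical, which is the point of the slide: the two proteins are remote homologs at the sequence level.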
28
The Retinol Binding Protein
β-lactoglobulin
29
Taken together
MERFGYTRAANCEAP….
FUNCTION
30
Pfam
Database that contains a large collection of
multiple sequence alignments of protein families
(common structures)
Very useful for function prediction.
http://pfam.sanger.ac.uk/
The zinc-finger family (domain)
Known family of Transcription Factors
Protein sequence
ZINC FINGER DOMAIN
Pfam
Based on profile hidden Markov models (HMMs), which
represent the protein family.
Compared to a PSSM, an HMM is a model
that considers dependencies between the
different columns in the matrix (different
residues) and is thus much more powerful.
http://pfam.sanger.ac.uk/
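Profile HMM (Hidden Markov Model)
can accurately represent an MSA
[Figure: profile HMM for alignment columns 16-19, with match (M16-M19), insert (I16-I19) and delete (D16-D19) states]
Match state emission probabilities:
M16: D 0.8, S 0.2
M17: P 0.4, R 0.6
M18: T 1.0
M19: R 0.4, S 0.6
Insert states (I16-I19) emit any residue (X).
Transition probabilities shown in the figure: 50%, 50%, 100%, 100%, 100%, 100%
Aligned sequences (columns 16-19):
DRTR
DRTS
S--S
SPTR
DRTR
DPTS
D--S
D--S
D--S
D--R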
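The emission probabilities of the match states can be recovered from the ten aligned rows by counting residues per column. The sketch below does this with plain frequencies (no pseudocounts), reading each row as a four-character string over columns 16-19; the function names and that reading of the rows are assumptions made for illustration.

```python
from collections import Counter

# The ten aligned rows from the figure, columns 16-19 ("-" marks a gap).
MSA = ["DRTR", "DRTS", "S--S", "SPTR", "DRTR",
       "DPTS", "D--S", "D--S", "D--S", "D--R"]
FIRST_COLUMN = 16


def match_emissions(msa):
    """Residue frequencies per column, ignoring gaps (no pseudocounts)."""
    emissions = {}
    for offset, column in enumerate(zip(*msa)):
        counts = Counter(res for res in column if res != "-")
        total = sum(counts.values())
        emissions[FIRST_COLUMN + offset] = {
            res: n / total for res, n in counts.items()
        }
    return emissions


def delete_fraction(msa):
    """Fraction of sequences with a gap in each column (delete state usage)."""
    return {FIRST_COLUMN + offset: column.count("-") / len(column)
            for offset, column in enumerate(zip(*msa))}


for column, probs in match_emissions(MSA).items():
    print(f"M{column}:", probs)
print("gap fraction per column:", delete_fraction(MSA))
```

Half of the sequences skip columns 17 and 18, which matches the 50% split between the match and delete paths in the figure.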
Extra Slides (for your interest)
35
Alpha Helix: Pauling (1951)
• A consecutive stretch of 5-40 amino
acids (average 10).
• A right-handed spiral conformation.
• 3.6 amino acids (residues) per turn; each turn rises
about 5.4 Å along the helix axis.
• Stabilized by hydrogen bonds.
36
Beta Strand: Pauling and Corey (1951)
> An extended polypeptide chain
is called a β-strand
(it consists of 5-10 amino acids).
> The strands are connected together
by hydrogen bonds to form a β-sheet.
37
Loops
• Connect the secondary structure elements
(alpha helices and beta strands).
• Have various lengths and shapes.
38