lecture08_11

Structural Bioinformatics
Protein Secondary Structure Prediction
Structure Prediction: Motivation
• Better understand protein function
• Broaden homology detection
– Detect similar function where sequences differ
(only ~50% of remote homologs can be detected from sequence alone)
• Explain disease
– Explain the effect of mutations
– Design drugs
Myoglobin – the first high-resolution protein structure
Solved by John Kendrew and colleagues at Cambridge University (a 6 Å model in 1958, refined to 2 Å in 1960).
Kendrew shared the 1962 Nobel Prize in Chemistry with Max Perutz.
“Perhaps the most remarkable features of the molecule are its
complexity and its lack of symmetry. The arrangement seems to
be almost totally lacking in the kind of regularities which one
instinctively anticipates.”
MERFGYTRAANCEAP…
Predicting the three-dimensional structure of a protein from its
sequence is very hard (sometimes impossible).
However, we can predict the secondary structure with relatively
high precision.
What do we mean by Secondary Structure?
Secondary structure elements are the building blocks of
protein structure.
Secondary structure is usually divided into
three categories:
Alpha helix
Beta strand (sheet)
Anything else – turn/loop
Alpha Helix: Pauling (1951)
• A consecutive stretch of 5-40 amino acids (average 10).
• A right-handed spiral conformation.
• 3.6 amino acids per turn; pitch (rise per turn) of 5.4 Å.
• Stabilized by hydrogen bonds between the backbone C=O of residue i
and the N-H of residue i+4.
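With 3.6 residues per turn, each successive residue sits 360°/3.6 = 100° further around the helix axis. The short Python sketch below is a hypothetical helper (not from the slides) that computes these angles, which is the basis of a helical-wheel projection; the peptide in the usage lines is invented.

```python
def helical_wheel_angles(sequence: str, residues_per_turn: float = 3.6) -> list[tuple[str, float]]:
    """Return (residue, angle in degrees) pairs for a helical-wheel projection.

    Consecutive residues are rotated by 360 / residues_per_turn degrees,
    i.e. 100 degrees for an ideal alpha helix with 3.6 residues per turn.
    """
    step = 360.0 / residues_per_turn
    return [(aa, round((i * step) % 360.0, 1)) for i, aa in enumerate(sequence)]


if __name__ == "__main__":
    # Hypothetical peptide, used only to illustrate the output format.
    for residue, angle in helical_wheel_angles("LKKLLEELLKK"):
        print(f"{residue}: {angle:5.1f} degrees")
```

Residues whose angles cluster on one side of the wheel face the same side of the helix.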
Beta Strand: Pauling and Corey (1951)
• Segments of the polypeptide chain (from the same or different
chains) run alongside each other and are linked together by
hydrogen bonds.
• Each segment is called a β-strand and consists of 5-10 amino acids.
Beta Sheet
The strands become adjacent to each other, forming a beta sheet.
[Figure: antiparallel and parallel beta sheets, with labeled distances of 3.47 Å, 3.25 Å and 4.6 Å]
Loops
• Connect the secondary structure elements.
• Have various lengths and shapes.
• Are located at the surface of the folded protein and therefore
may play an important role in biological recognition processes.
Three-Dimensional (Tertiary) Structure
Describes the packing of alpha helices, beta sheets
and random coils with respect to each other at the
level of one whole polypeptide chain.
[Figure: secondary and tertiary structures of RBP and globin]
How do the (secondary and tertiary)
structures relate to the primary
protein sequence?
SEQUENCE → STRUCTURE
- Early experiments showed that the sequence of a protein
is sufficient to determine its structure (Anfinsen).
- Protein structure is more conserved than protein sequence
and more closely related to function.
How can different amino acid sequences
determine similar protein structures?
Lesk and Chothia 1980
The Globin Family
Different sequences can result in similar structures (PDB entries 1ecd and 2hhd)
We can learn which features determine structure and function
by comparing sequences and structures.
The Globin Family
Why is proline 36 conserved across the entire globin family?
Where are the gaps?
The gaps in the pairwise alignment map to loop regions.
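One way to check this observation is to walk along a pairwise alignment and, for every gap in one row, look up the secondary-structure state of the residue opposite it in a partner whose structure is known. The Python sketch below assumes one H (helix), E (strand) or L (loop) label per residue of the known sequence; the function name `gap_states` and the toy data are hypothetical. It only counts gaps in the first row.

```python
from collections import Counter


def gap_states(aligned_query: str, aligned_known: str, known_ss: str) -> Counter:
    """Count the structure labels of known-structure residues that sit opposite a gap.

    aligned_query / aligned_known are rows of a pairwise alignment ('-' = gap);
    known_ss has one label (H, E or L) per residue of the known sequence, gaps excluded.
    """
    if len(aligned_query) != len(aligned_known):
        raise ValueError("aligned rows must have equal length")
    if len(known_ss) != len(aligned_known.replace("-", "")):
        raise ValueError("need exactly one structure label per residue of the known sequence")
    counts: Counter = Counter()
    residue_index = -1  # position in known_ss of the current known-sequence residue
    for q, k in zip(aligned_query, aligned_known):
        if k != "-":
            residue_index += 1
            if q == "-":
                counts[known_ss[residue_index]] += 1
    return counts


if __name__ == "__main__":
    # Toy alignment and labels, invented for illustration.
    print(gap_states("AC--DE", "ACGHDE", "HHLLEE"))
```

If the pattern holds, most of the counts will fall under L.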
How are remote homologs related in terms of their structure?
[Figure: structures of retinol-binding protein (RBP), apolipoprotein D, β-lactoglobulin and odorant-binding protein]
PSI-BLAST alignment of RBP and β-lactoglobulin: iteration 3
Score = 159 bits (404), Expect = 1e-38
Identities = 41/170 (24%), Positives = 69/170 (40%), Gaps = 19/170 (11%)

Query: 3   WVWALLLLAAWAAAERD--------CRVSSFRVKENFDKARFSGTWYAMAKKDPEGLFLQ 54
            V  L+ LA  A              +  S  V+ENFD  ++ G WY + K
Sbjct: 1   MVTMLMFLATLAGLFTTAKGQNFHLGKCPSPPVQENFDVKKYLGRWYEIEKIPASFE-KG 59

Query: 55  DNIVAEFSVDETGQMSATAKGRVRLLNNWDVCADMVGTFTDTEDPAKFKMKYWGVASFLQ 114
           + I A +S+ E G +    K          V  +     ++  +PAK +++++ +
Sbjct: 60  NCIQANYSLMENGNIEVLNKELSPDGTMNQVKGE--AKQSNVSEPAKLEVQFFPL----- 112

Query: 115 KGNDDHWIVDTDYDTYAVQYSCRLLNLDGTCADSYSFVFSRDPNGLPPEA 164
                +WI+ TDY+ YA+ YSC            + ++  R+P  LPPE
Sbjct: 113 MPPAPYWILATDYENYALVYSCTTFFWL--FHVDFFWILGRNPY-LPPET 159
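The Identities and Gaps figures in the header can be recomputed directly from the aligned rows. This Python sketch joins the three Query and Sbjct blocks of the alignment into single strings and counts identical columns and columns containing a gap; the function name `alignment_stats` is hypothetical. Positives are left out because they depend on the scoring matrix used.

```python
# Aligned rows from the PSI-BLAST output above, with the three blocks joined.
QUERY = (
    "WVWALLLLAAWAAAERD--------CRVSSFRVKENFDKARFSGTWYAMAKKDPEGLFLQ"
    "DNIVAEFSVDETGQMSATAKGRVRLLNNWDVCADMVGTFTDTEDPAKFKMKYWGVASFLQ"
    "KGNDDHWIVDTDYDTYAVQYSCRLLNLDGTCADSYSFVFSRDPNGLPPEA"
)
SBJCT = (
    "MVTMLMFLATLAGLFTTAKGQNFHLGKCPSPPVQENFDVKKYLGRWYEIEKIPASFE-KG"
    "NCIQANYSLMENGNIEVLNKELSPDGTMNQVKGE--AKQSNVSEPAKLEVQFFPL-----"
    "MPPAPYWILATDYENYALVYSCTTFFWL--FHVDFFWILGRNPY-LPPET"
)


def alignment_stats(a: str, b: str) -> tuple[int, int, int]:
    """Return (identical columns, columns with a gap, alignment length)."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    identities = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    gaps = sum(1 for x, y in zip(a, b) if x == "-" or y == "-")
    return identities, gaps, len(a)


if __name__ == "__main__":
    ident, gaps, length = alignment_stats(QUERY, SBJCT)
    print(f"Identities = {ident}/{length} ({100 * ident / length:.0f}%)")
    print(f"Gaps = {gaps}/{length} ({100 * gaps / length:.0f}%)")
```

Run as a script, it prints Identities = 41/170 (24%) and Gaps = 19/170 (11%), matching the header.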
[Figure: structures of the retinol-binding protein and β-lactoglobulin]
Structure Prediction: Motivation
• Hundreds of thousands of gene sequences
translated to proteins (GenBank, SwissProt, PIR)
• Only about 50,000 solved protein structures
• Experimental methods are time-consuming and
not always possible
• Goal: predict protein structure based
on sequence information
Prediction Approaches
• Two-stage
1. Primary (sequence) to secondary structure
2. Secondary to tertiary
• One-stage
- Primary to tertiary structure
According to the most simplified model:
• In a first step, the secondary structure is
predicted based on the sequence.
• The secondary structure elements are then
arranged to produce the tertiary structure,
i.e. the structure of a protein chain.
• For molecules which are composed of
different subunits, the protein chains are
arranged to form the quaternary structure.
Secondary Structure Prediction
• Given a primary sequence
ADSGHYRFASGFTYKKMNCTEAA
what secondary structure will it adopt?
Secondary Structure Prediction Methods
• Chou-Fasman / GOR method
– Based on amino acid frequencies
• Machine learning methods
– PHDsec and PSIpred
• HMM (Hidden Markov Model)
Chou and Fasman (1974)
The propensity of an amino acid to be part of a certain secondary
structure (e.g. proline has a low propensity of being in an alpha
helix or beta sheet, so it acts as a helix/sheet breaker).
Propensities are scaled by 100: values above 100 mean the residue is
found in that structure more often than average.

Name           P(a)   P(b)   P(turn)
Alanine         142     83        66
Arginine         98     93        95
Aspartic Acid   101     54       146
Asparagine       67     89       156
Cysteine         70    119       119
Glutamic Acid   151     37        74
Glutamine       111    110        98
Glycine          57     75       156
Histidine       100     87        95
Isoleucine      108    160        47
Leucine         121    130        59
Lysine          114     74       101
Methionine      145    105        60
Phenylalanine   113    138        60
Proline          57     55       152
Serine           77     75       143
Threonine        83    119        96
Tryptophan      108    137        96
Tyrosine         69    147       114
Valine          106    170        50

Success rate: ~50%
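A minimal sketch of how a propensity table like this can be used: average each propensity over a segment and compare the averages. The dictionaries transcribe the table above using one-letter codes. The function names and the "highest average wins" rule are simplifications for illustration. They are not the full Chou-Fasman nucleation-and-extension procedure.

```python
# Chou-Fasman propensities (x100) from the table above, keyed by one-letter code.
P_ALPHA = {"A": 142, "R": 98, "D": 101, "N": 67, "C": 70, "E": 151, "Q": 111, "G": 57, "H": 100, "I": 108,
           "L": 121, "K": 114, "M": 145, "F": 113, "P": 57, "S": 77, "T": 83, "W": 108, "Y": 69, "V": 106}
P_BETA = {"A": 83, "R": 93, "D": 54, "N": 89, "C": 119, "E": 37, "Q": 110, "G": 75, "H": 87, "I": 160,
          "L": 130, "K": 74, "M": 105, "F": 138, "P": 55, "S": 75, "T": 119, "W": 137, "Y": 147, "V": 170}
P_TURN = {"A": 66, "R": 95, "D": 146, "N": 156, "C": 119, "E": 74, "Q": 98, "G": 156, "H": 95, "I": 47,
          "L": 59, "K": 101, "M": 60, "F": 60, "P": 152, "S": 143, "T": 96, "W": 96, "Y": 114, "V": 50}
TABLES = {"helix": P_ALPHA, "strand": P_BETA, "turn": P_TURN}


def average_propensities(segment: str) -> dict[str, float]:
    """Mean helix, strand and turn propensity over a segment of one-letter residues."""
    if not segment:
        raise ValueError("segment must not be empty")
    return {state: sum(table[aa] for aa in segment) / len(segment) for state, table in TABLES.items()}


def likeliest_state(segment: str) -> str:
    """Simplified rule: the state with the highest mean propensity wins."""
    averages = average_propensities(segment)
    return max(averages, key=averages.get)


if __name__ == "__main__":
    for segment in ("AEM", "VIY", "NGP"):
        print(segment, likeliest_state(segment), average_propensities(segment))
```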
Secondary Structure Method Improvements
‘Sliding window’ approach
• Most alpha helices are ~12 residues long
• Most beta strands are ~6 residues long
• Look at all windows of size 12 (helix) or 6 (strand)
• Calculate a score for each window; if it exceeds a threshold,
predict an alpha helix / beta strand
TGTAGPOLKCHIQWMLPLKK
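The sliding-window idea can be sketched in Python as follows. Every window of the given width is scored by its mean propensity, and every residue covered by a window above the threshold gets the label. The window widths (12 for helix, 6 for strand) come from the slide. The threshold of 100 (the "average" propensity) and the example sequence are assumptions for illustration only. Residue letters outside the 20 standard one-letter codes raise a `KeyError`.

```python
# Helix and strand propensities (x100) from the Chou-Fasman table, one-letter codes.
P_ALPHA = {"A": 142, "R": 98, "D": 101, "N": 67, "C": 70, "E": 151, "Q": 111, "G": 57, "H": 100, "I": 108,
           "L": 121, "K": 114, "M": 145, "F": 113, "P": 57, "S": 77, "T": 83, "W": 108, "Y": 69, "V": 106}
P_BETA = {"A": 83, "R": 93, "D": 54, "N": 89, "C": 119, "E": 37, "Q": 110, "G": 75, "H": 87, "I": 160,
          "L": 130, "K": 74, "M": 105, "F": 138, "P": 55, "S": 75, "T": 119, "W": 137, "Y": 147, "V": 170}


def mark_windows(sequence: str, table: dict[str, int], width: int, threshold: float, label: str) -> str:
    """Label every residue covered by a window whose mean propensity exceeds threshold.

    Residues not covered by any such window are marked '-'.
    """
    marks = ["-"] * len(sequence)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(table[aa] for aa in window) / width
        if score > threshold:
            for i in range(start, start + width):
                marks[i] = label
    return "".join(marks)


if __name__ == "__main__":
    seq = "AEMLKAEMLKAEGPNGVIVIVY"  # hypothetical example sequence
    print(seq)
    print(mark_windows(seq, P_ALPHA, 12, 100.0, "H"))  # helix windows of 12
    print(mark_windows(seq, P_BETA, 6, 100.0, "E"))    # strand windows of 6
```

Helix and strand calls can overlap. A real predictor needs a rule to resolve such conflicts, which this sketch omits.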
Improvements since the 1980s
• Adding information from conservation in the
MSA (multiple sequence alignment)
• Smarter algorithms (e.g. machine learning, HMM)
Success rate: ~75-80%
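One simple way to quantify conservation in an MSA column is Shannon entropy. This is an illustrative choice, not necessarily what any specific predictor uses. A fully conserved column scores 0 bits, and the score rises as the column becomes more variable. The sketch assumes the MSA is given as equal-length strings with '-' for gaps. The function names are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column: str) -> float:
    """Shannon entropy (bits) of the residues in one MSA column, ignoring gaps.

    0.0 means the column is fully conserved; higher values mean more variability.
    Returns NaN for a column that contains only gaps.
    """
    residues = [aa for aa in column if aa != "-"]
    if not residues:
        return math.nan
    counts = Counter(residues)
    n = len(residues)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def conservation_profile(msa: list[str]) -> list[float]:
    """Entropy of every column of an MSA given as equal-length aligned strings."""
    if len({len(row) for row in msa}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    return [column_entropy("".join(row[j] for row in msa)) for j in range(len(msa[0]))]


if __name__ == "__main__":
    print(conservation_profile(["ACD", "ACE", "ACF", "ACG"]))
```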
Machine learning approach for predicting
Secondary Structure (PHD, PSIpred)
Step 1:
Generating a multiple sequence alignment
[Figure: the query sequence is searched against SwissProt and aligned with the subject sequences found]
Step 2:
Additional sequences are added using a profile. We end up with
an MSA that represents the protein family.
[Figure: the seed MSA of the query and subject sequences]
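A minimal version of turning a family MSA into a profile is a per-column table of amino acid frequencies. The real methods use their own profile formats (PSIpred, for example, works from a PSI-BLAST position-specific scoring matrix). This Python sketch assumes the MSA is given as equal-length strings. The optional pseudocount gives residues not seen in a column a small non-zero frequency. The function name is hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def frequency_profile(msa: list[str], pseudocount: float = 0.0) -> list[dict[str, float]]:
    """Per-column amino acid frequencies of an MSA (gaps and non-standard letters ignored).

    `pseudocount` is added to every amino acid count in every column before normalising.
    """
    if len({len(row) for row in msa}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for j in range(len(msa[0])):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for row in msa:
            if row[j] in counts:
                counts[row[j]] += 1
        total = sum(counts.values())
        profile.append({aa: (c / total if total else 0.0) for aa, c in counts.items()})
    return profile


if __name__ == "__main__":
    for j, column in enumerate(frequency_profile(["ACDK", "ACEK", "SCD-"])):
        print(j, {aa: round(f, 2) for aa, f in column.items() if f > 0})
```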
Step 3:
The sequence profile of the protein family is compared (by machine
learning methods) to sequences with known secondary structure.
[Figure: the seed MSA (query and subject sequences) is passed to a machine learning approach trained on known structures]
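PHD and PSIpred make this comparison with neural networks trained on proteins of known structure. The sketch below uses a much simpler stand-in. This is a deliberate swap and not what either tool does. It slides a window over a profile to build one feature vector per residue, then assigns the label of the nearest class centroid learned from labelled examples. The window half-width, the function names and the toy data in the usage lines are assumptions.

```python
import math


def window_features(profile: list[list[float]], center: int, half_width: int) -> list[float]:
    """Concatenate the profile rows around `center`, padding past either end with zeros."""
    width = len(profile[0])
    features: list[float] = []
    for i in range(center - half_width, center + half_width + 1):
        row = profile[i] if 0 <= i < len(profile) else [0.0] * width
        features.extend(row)
    return features


def train_centroids(examples: list[tuple[list[float], str]]) -> dict[str, list[float]]:
    """Average the feature vectors of each label (e.g. 'H', 'E', 'L')."""
    sums: dict[str, list[float]] = {}
    counts: dict[str, int] = {}
    for x, label in examples:
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, value in enumerate(x):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def classify(x: list[float], centroids: dict[str, list[float]]) -> str:
    """Return the label whose centroid is closest to x in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(x, centroids[label]))


if __name__ == "__main__":
    # Toy two-column "profile" rows and labels, invented for illustration only.
    profile = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
    training = [(window_features(profile, i, 1), label) for i, label in enumerate("HHE")]
    centroids = train_centroids(training)
    print(classify(window_features(profile, 1, 1), centroids))
```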
HMM approach for predicting
Secondary Structure (SAM)
• An HMM lets us calculate the probability of a given
secondary structure assignment for a sequence
TGTAGPOLKCHIQWML
HHHHHHHLLLLBBBBB
p = ?
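The probability asked for here is the joint probability of the residue sequence and the state path. It is the start probability of the first state, multiplied by each state-to-state transition probability and by the probability of each residue being emitted in its state. The Python sketch below makes three assumptions. It uses states H (helix), B (beta) and L (loop/turn) as in the example. It uses a reduced three-letter residue alphabet. Its probability tables are invented, except that the 0.15 helix-to-turn transition echoes the slides.

```python
# Hypothetical toy HMM. Only TRANS["H"]["L"] = 0.15 echoes the slides;
# every other number is made up for illustration.
START = {"H": 0.5, "B": 0.3, "L": 0.2}
TRANS = {
    "H": {"H": 0.8, "B": 0.05, "L": 0.15},
    "B": {"H": 0.05, "B": 0.75, "L": 0.2},
    "L": {"H": 0.3, "B": 0.3, "L": 0.4},
}
EMIT = {
    "H": {"A": 0.5, "G": 0.1, "V": 0.4},
    "B": {"A": 0.2, "G": 0.1, "V": 0.7},
    "L": {"A": 0.3, "G": 0.6, "V": 0.1},
}


def path_probability(sequence: str, path: str) -> float:
    """Joint probability P(sequence, path) under the toy HMM above."""
    if len(sequence) != len(path) or not sequence:
        raise ValueError("sequence and path must be non-empty and of equal length")
    p = START[path[0]] * EMIT[path[0]][sequence[0]]
    for prev, state, residue in zip(path, path[1:], sequence[1:]):
        p *= TRANS[prev][state] * EMIT[state][residue]
    return p


if __name__ == "__main__":
    print(path_probability("AAVG", "HHBL"))
```

For long sequences these products underflow quickly, so practical implementations sum log-probabilities instead.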
[Figure: HMM probability tables, e.g. the probability of beginning with an α-helix, of an α-helix being followed by an α-helix, and of observing alanine as part of a β-sheet]
The probability of observing a residue which belongs to an
α-helix followed by a residue belonging to a turn = 0.15.
The tables are built from a large database of known secondary
structures.
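Tables like these can be estimated by counting how often each state starts a sequence, follows another state, and emits each residue in a database of known structures. The Python sketch below computes these maximum-likelihood estimates from labelled (sequence, structure) pairs. The name `estimate_tables` and the training pairs in the usage lines are hypothetical. Real tools usually add pseudocounts or other smoothing, which is omitted here.

```python
from collections import Counter, defaultdict


def normalise(counts: Counter) -> dict[str, float]:
    """Turn raw counts into probabilities that sum to 1."""
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()}


def estimate_tables(training: list[tuple[str, str]]):
    """Estimate start, transition and emission probabilities by counting.

    `training` holds (residue sequence, structure string) pairs of equal length,
    e.g. ("TGQ", "HHH"). Returns (start, trans, emit) as nested dicts.
    """
    start: Counter = Counter()
    trans: defaultdict = defaultdict(Counter)
    emit: defaultdict = defaultdict(Counter)
    for sequence, path in training:
        if len(sequence) != len(path):
            raise ValueError("sequence and structure string must have equal length")
        if not sequence:
            continue
        start[path[0]] += 1
        for i, (residue, state) in enumerate(zip(sequence, path)):
            emit[state][residue] += 1
            if i > 0:
                trans[path[i - 1]][state] += 1
    return (
        normalise(start),
        {state: normalise(c) for state, c in trans.items()},
        {state: normalise(c) for state, c in emit.items()},
    )


if __name__ == "__main__":
    start, trans, emit = estimate_tables([("TGQA", "HHHL"), ("AVVG", "BBBL")])
    print(start, trans, emit, sep="\n")
```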
• The above tables enable us to calculate the probability of
assigning a secondary structure to a protein
• Example
TGQ
HHH
p = 0.45 x 0.041 x 0.8 x 0.028 x 0.8 x 0.0635 =
0.000020995 (about 2.1 x 10^-5)
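Evaluating the product of the six factors in the TGQ/HHH example is a quick check on the arithmetic. Reading the factors, in order, as start(H), emit(T|H), H→H, emit(G|H), H→H and emit(Q|H) is our interpretation of the table entries. The slides do not label them.

```python
import math

# The six factors from the TGQ / HHH example, in the order given on the slide.
factors = [0.45, 0.041, 0.8, 0.028, 0.8, 0.0635]
p = math.prod(factors)
print(f"p = {p:.4e}")
```

This prints p = 2.0995e-05.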
Secondary structure prediction
• AGADIR - An algorithm to predict the helical content of peptides
• APSSP - Advanced Protein Secondary Structure Prediction Server
• GOR - Garnier et al., 1996
• HNN - Hierarchical Neural Network method (Guermeur, 1997)
• Jpred - A consensus method for protein secondary structure prediction at the University of Dundee
• JUFO - Protein secondary structure prediction from sequence (neural network)
• nnPredict - University of California at San Francisco (UCSF)
• PredictProtein - PHDsec, PHDacc, PHDhtm, PHDtopology, PHDthreader, MaxHom, EvalSec from Columbia University
• Prof - Cascaded Multiple Classifiers for Secondary Structure Prediction
• PSA - BioMolecular Engineering Research Center (BMERC) / Boston
• PSIpred - Various protein structure prediction methods at Brunel University
• SOPMA - Geourjon and Deléage, 1995
• SSpro - Secondary structure prediction using bidirectional recurrent neural networks at the University of California
• DLP - Domain linker prediction at RIKEN