Proteins Secondary Structure Predictions

Proteins
Secondary Structure Predictions
2
Specific databases of protein sequences and structures
• Swiss-Prot
• PIR
• TrEMBL (translated from DNA)
• PDB (three-dimensional structures)
3
Protein Structure
• Primary: amino acid sequence
• Secondary: alpha helices, beta sheets, and loops
• Tertiary: packing of secondary elements
• Quaternary: packing of several polypeptide chains
4
Symbols for the 20 amino acids
A ala alanine
C cys cysteine
D asp aspartic acid
E glu glutamic acid
F phe phenylalanine
G gly glycine
H his histidine
I ile isoleucine
K lys lysine
L leu leucine
M met methionine
N asn asparagine
P pro proline
Q gln glutamine
R arg arginine
S ser serine
T thr threonine
V val valine
W trp tryptophan
Y tyr tyrosine
5
The 20 Amino Acids
Grouping amino acids by physicochemical properties
7
Myoglobin – the first high resolution protein structure
Solved in 1958 by Max Perutz and John Kendrew of Cambridge University.
They won the 1962 Nobel Prize in Chemistry.
“Perhaps the most remarkable features of the molecule are its
complexity and its lack of symmetry. The arrangement seems to
be almost totally lacking in the kind of regularities which one
instinctively anticipates.”
8
Alpha Helices
• Right-handed spiral
– 5 to 40 amino acids (10 average)
– 3.6 amino acids per turn
– Some a.a. are more frequent than others in helices.
9
Beta Sheets
• Parallel – Strands run in the same direction (C to N)
• Anti-parallel – Strands run in opposite directions
– Each strand has 5-10 amino acids (6 average)
– Some a.a. are more frequent than others
[Figure: parallel and anti-parallel strands with N and C termini labeled]
10
Loop Regions
• All other protein regions
– Irregular shape and size
– Connect the secondary structure elements
11
Structure Presentation
Ribbon diagram:
Alpha helix
Beta Sheet
12
Structure Presentation
TOPS cartoon:
• beta sheets are triangles
• alpha helices are circles
• the peptide chain runs from N terminus to C terminus
13
Structure Prediction: Motivation
• Hundreds of thousands of gene sequences translated to proteins (GenBank, Swiss-Prot, PIR)
• Only about 28,000 solved structures (PDB)
• Goal: predict protein structure based on sequence information
14
Structure Prediction: Motivation
• Understand protein function
– Locate binding sites
• Broaden homology
– Detect similar function where sequence differs
• Explain disease
– See effect of amino acid changes
– Design suitable compensatory drugs
15
Prediction Approaches
• Primary (sequence) to secondary structure
– Sequence characteristics
• Secondary to tertiary structure
– Fold recognition
– Threading against known structures
• Primary to tertiary structure
– Ab initio modelling
16
Can we predict the secondary structure from sequence?
[Figure: α-helix and β-sheet with polar and nonpolar faces labeled]
Secondary structures have an amphiphilic nature: one face is polar and the other nonpolar.
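The amphiphilic pattern can be put into numbers with a hydrophobic-moment style calculation. The sketch below is only an illustration and is not taken from the slides. It assumes a crude two-class hydrophobicity scale (the `NONPOLAR` set is an illustrative choice) and a rotation of 100° per residue, which follows from 3.6 residues per helical turn. The function name `hydrophobic_moment` is hypothetical.

```python
import cmath
import math

# Assumption: a crude two-class scale. These residues count as nonpolar (+1),
# everything else as polar (-1). Published scales assign graded values.
NONPOLAR = set("AVLIMFWC")


def hydrophobic_moment(seq, angle_deg=100.0):
    """Per-residue magnitude of the hydrophobicity vector sum.

    Residue i is placed around the helix axis at angle_deg * i
    (100 degrees = 360 / 3.6 residues per turn). A large value means
    the nonpolar residues cluster on one face of the helix.
    """
    if not seq:
        return 0.0
    theta = math.radians(angle_deg)
    total = 0j
    for i, aa in enumerate(seq.upper()):
        h = 1.0 if aa in NONPOLAR else -1.0
        total += h * cmath.exp(1j * theta * i)
    return abs(total) / len(seq)


amphipathic = "LKKLLKKLLKKLLKKLLK"  # nonpolar residues recur every 3-4 positions
uniform = "L" * 18                  # nonpolar all the way around the helix
print(round(hydrophobic_moment(amphipathic), 2))
print(round(hydrophobic_moment(uniform), 2))
```

The amphipathic sequence should score much higher than the uniform one, because its nonpolar residues line up on one side of the helix while the uniform sequence has no preferred face.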
17
Secondary Structure Prediction
• Why is it complex?
• A huge space of possible structures
– Assume a 100 aa chain
– Only 2 possible conformations for each residue
– 2^100 ≈ 10^30 different conformations for the chain as a whole
• Inferring secondary structure from sequence is problematic:
– Similar sequences may result in different structures (mutations, different environments).
– Different sequences may result in similar structures (the globin fold).
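The size of the conformation space quoted above can be checked with a few lines of arithmetic. This is a minimal sketch; the function name `conformation_count` is hypothetical, and it assumes every residue chooses its conformation independently.

```python
import math


def conformation_count(n_residues, states_per_residue=2):
    """Number of chain conformations if each residue independently
    adopts one of `states_per_residue` conformations."""
    return states_per_residue ** n_residues


count = conformation_count(100)
print(count)              # exact integer value of 2^100
print(math.log10(count))  # order of magnitude, just above 30
```

Raising `states_per_residue` to 3 (for example helix, strand, coil) makes the count grow even faster, which is why exhaustive enumeration is not an option.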
18
Secondary Structure Prediction Methods
• Chou-Fasman / GOR Method
– Based on amino acid frequencies
– No more than 60% accurate
• Artificial Neural Network (ANN) methods
– PHDsec and PSIpred
• Use multiple sequences
– Secondary structure based on family
• Best accuracy now ~78%
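To illustrate what "based on amino acid frequencies" means, the sketch below derives a propensity for each amino acid as its share among residues in a given state divided by its share overall. It is a simplified illustration in the spirit of frequency-based methods, not the published Chou-Fasman or GOR procedure. The training data is made up, and the name `propensities` is hypothetical.

```python
from collections import Counter


def propensities(sequences, structures, state="H"):
    """Propensity of each amino acid for `state`.

    propensity = (fraction of the amino acid among residues in `state`)
                 / (fraction of the amino acid among all residues)
    Values above 1 mean the residue is over-represented in that state.
    """
    total = Counter()
    in_state = Counter()
    for seq, ss in zip(sequences, structures):
        if len(seq) != len(ss):
            raise ValueError("sequence and structure strings must match in length")
        for aa, s in zip(seq, ss):
            total[aa] += 1
            if s == state:
                in_state[aa] += 1
    n_total = sum(total.values())
    n_state = sum(in_state.values())
    if n_state == 0:
        return {}
    return {aa: (in_state[aa] / n_state) / (total[aa] / n_total) for aa in total}


# Toy data, invented for illustration: H = helix, E = strand, C = coil.
sequences = ["AAEELLKKGGPP", "GAVVIGPA"]
structures = ["HHHHHHHHCCCC", "CCEEECCH"]
helix = propensities(sequences, structures, state="H")
for aa in sorted(helix, key=helix.get, reverse=True):
    print(aa, round(helix[aa], 2))
```

A real propensity table would be estimated from many solved structures. A prediction method would then scan a sequence for stretches of residues with high helix or strand propensity.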
19
PHDsec and PSIpred
• PHDsec
– Rost & Sander, 1993
– Based on sequence family alignments
• PSIpred
– Jones, 1999
– Based on a position-specific scoring matrix (PSSM) generated by PSI-BLAST
• Both consider long-range interactions
20
Brain Neurons
• Outgoing signal determined by incoming
• Connected together in networks
• Learn from experience
21
SS prediction using ANN
[Figure: inputs for one position, with one unit per symbol (A C D E F G H I K L M N P Q R S T V W Y and ".") marking the amino acid at that position]
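The input layout, one unit per symbol at each position, is a one-hot encoding. The sketch below shows how a sequence window could be turned into such inputs. Two things are assumptions rather than details from the slides: the "." symbol is used here to pad positions beyond the ends of the sequence, and the window is 13 residues wide. The names `encode_residue` and `encode_window` are hypothetical.

```python
ALPHABET = "ACDEFGHIKLMNPQRSTVWY."  # 20 amino acids plus "." (21 input units)


def encode_residue(aa):
    """One-hot vector with a single 1 at the residue's index.

    Raises ValueError for symbols outside ALPHABET.
    """
    vec = [0] * len(ALPHABET)
    vec[ALPHABET.index(aa)] = 1
    return vec


def encode_window(seq, center, half_width=6):
    """Concatenate one-hot vectors for center-half_width .. center+half_width.

    Assumption: positions outside the sequence are encoded as ".".
    """
    inputs = []
    for pos in range(center - half_width, center + half_width + 1):
        aa = seq[pos] if 0 <= pos < len(seq) else "."
        inputs.extend(encode_residue(aa))
    return inputs


window = encode_window("MKTAYIAKQR", center=0)
print(len(window))  # 13 positions x 21 units
```

Sliding `center` along the sequence yields one input vector per residue, and the network then predicts the state of the central residue of each window.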
22
Position-Specific Scoring Matrix
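A position-specific scoring matrix holds one score per amino acid for every alignment column. The sketch below builds a simplified version from a toy alignment. It assumes a uniform background frequency and a flat pseudocount, and skips gap characters. PSI-BLAST's own procedure is more elaborate, and the name `build_pssm` is hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0):
    """Return one {amino_acid: log-odds score} dict per alignment column.

    Assumptions: uniform background (1/20 per amino acid) and a flat
    pseudocount added to every amino acid in every column.
    """
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for col in range(length):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for row in alignment:
            if row[col] in counts:  # skip gaps and unknown symbols
                counts[row[col]] += 1
        total = sum(counts.values())
        pssm.append({aa: math.log2(counts[aa] / total / background)
                     for aa in AMINO_ACIDS})
    return pssm


# Toy alignment, invented for illustration.
pssm = build_pssm(["ACD", "ACE", "AGD"])
print(round(pssm[0]["A"], 2), round(pssm[0]["C"], 2))
```

Positive scores mark residues seen more often in a column than the background would suggest. Negative scores mark residues that are rare or absent there.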
23
PHDsec Neural Net
[Figure: network diagram]
• Inputs for one position: one unit per symbol (A C D E F G H I K L M N P Q R S T V W Y and "."), marking the amino acid at that position
• Hidden layer
• Outputs: H = helix, E = strand, C = coil
• Confidence: 0 = low, 9 = high
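A network of this shape (inputs, one hidden layer, three outputs) can be sketched in a few lines. The code below is a toy forward pass, not the PHDsec implementation. The weights are random rather than trained, the layer sizes and softmax output are assumptions, and the 0-9 confidence is taken here as ten times the gap between the two strongest outputs, which is also an assumption. All function names are hypothetical.

```python
import math
import random

STATES = "HEC"  # helix, strand, coil


def make_layer(n_in, n_out, rng):
    """Random weights and zero biases; a trained network would learn these."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(inputs, hidden_layer, output_layer):
    """One hidden layer of sigmoid units, then a softmax over H/E/C."""
    w_h, b_h = hidden_layer
    hidden = [1.0 / (1.0 + math.exp(-(sum(w * x for w, x in zip(row, inputs)) + b)))
              for row, b in zip(w_h, b_h)]
    w_o, b_o = output_layer
    scores = [sum(w * h for w, h in zip(row, hidden)) + b
              for row, b in zip(w_o, b_o)]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(probs):
    """Top state plus a 0-9 confidence from the gap between the two best outputs."""
    ranked = sorted(probs, reverse=True)
    confidence = min(9, int(10 * (ranked[0] - ranked[1])))
    return STATES[probs.index(ranked[0])], confidence


rng = random.Random(0)
n_inputs = 21 * 13  # assumed: 21 units per position, 13-residue window
hidden_layer = make_layer(n_inputs, 10, rng)
output_layer = make_layer(10, 3, rng)
inputs = [0] * n_inputs
for position in range(13):  # dummy one-hot input: unit 0 ("A") at every position
    inputs[position * 21] = 1
probs = forward(inputs, hidden_layer, output_layer)
print(predict(probs))
```

With untrained weights the prediction itself is meaningless. The point is the data flow: one-hot inputs, a hidden layer, three output values, and a confidence digit derived from how clearly one output wins.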
24
Secondary structure prediction
• AGADIR - An algorithm to predict the helical content of peptides
• APSSP - Advanced Protein Secondary Structure Prediction Server
• GOR - Garnier et al., 1996
• HNN - Hierarchical Neural Network method (Guermeur, 1997)
• Jpred - A consensus method for protein secondary structure prediction at University of Dundee
• JUFO - Protein secondary structure prediction from sequence (neural network)
• nnPredict - University of California at San Francisco (UCSF)
• PredictProtein - PHDsec, PHDacc, PHDhtm, PHDtopology, PHDthreader, MaxHom, EvalSec from Columbia University
• Prof - Cascaded Multiple Classifiers for Secondary Structure Prediction
• PSA - BioMolecular Engineering Research Center (BMERC) / Boston
• PSIpred - Various protein structure prediction methods at Brunel University
• SOPMA - Geourjon and Deléage, 1995
• SSpro - Secondary structure prediction using bidirectional recurrent neural networks at University of California
• DLP - Domain linker prediction at RIKEN
25