Protein Structure and Structure Prediction
Protein Structure, Structure Classification and Prediction
Bioinformatics X3
January 2005
P. Johansson, D. Madsen
Dept. of Cell & Molecular Biology,
Uppsala University
Overview
• Introduction to proteins, structure & classification
• Protein Folding
• Experimental techniques for structure determination
• Structure prediction
Proteins
• Proteins play a crucial role in virtually all biological processes
with a broad range of functions.
• The activity of an enzyme or the function of a protein is
governed by its three-dimensional structure
20 amino acids - the building blocks
The Amino Acids
Hydrophilic or hydrophobic?
• Virtually all soluble proteins feature a hydrophobic core
surrounded by a hydrophilic surface
• But the peptide backbone is inherently polar
• Solution: neutralize the backbone's potential H-bond donors and
acceptors by pairing them with each other in ordered secondary structure
Secondary Structure: α-helix
Secondary Structure: α-helix
• 3.6 residues per turn
• Axial dipole moment
• Proline and glycine are disfavoured (helix breakers)
• Often found at protein surfaces
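Because an ideal α-helix has 3.6 residues per turn, each residue is rotated 360°/3.6 = 100° about the helix axis relative to the previous one. The Python sketch below (assuming an ideal helix; the function names and the example peptide are hypothetical) projects residues onto a "helical wheel" and shows why residues i, i+3 and i+4 land on roughly the same face, which is one way a helix can present a hydrophobic side to the core and a polar side to the surface.

```python
RESIDUES_PER_TURN = 3.6                      # ideal alpha-helix (assumption)
DEG_PER_RESIDUE = 360.0 / RESIDUES_PER_TURN  # 100 degrees of rotation per residue


def wheel_angles(sequence):
    """(residue, angle in degrees 0-360) when the helix is viewed down its axis."""
    return [(aa, (i * DEG_PER_RESIDUE) % 360.0) for i, aa in enumerate(sequence)]


def angular_separation(i, j):
    """Smallest angle (degrees) between residues i and j on the helical wheel."""
    diff = (abs(i - j) * DEG_PER_RESIDUE) % 360.0
    return min(diff, 360.0 - diff)


if __name__ == "__main__":
    for aa, angle in wheel_angles("LKKLLKKLLKKL"):  # hypothetical peptide
        print(f"{aa} {angle:6.1f}")
    # i+3 and i+4 sit close to residue i on the wheel (same face);
    # i+2 lies almost on the opposite side.
    for offset in (2, 3, 4):
        print(offset, round(angular_separation(0, offset), 1))
```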
Secondary Structure: β-sheets
Secondary Structure: β-sheets
• Parallel or antiparallel
• Side chains alternate between the two faces of the sheet
• No mixing
• Connecting loops often contain polar amino acids
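Since consecutive side chains in a β-strand point to opposite faces of the sheet, every second residue shares a face. A minimal Python sketch of that bookkeeping; the hydrophobic residue set and the example strand are assumptions chosen only for illustration.

```python
HYDROPHOBIC = set("AVLIMFWC")  # assumption: a crude hydrophobic/hydrophilic split


def strand_faces(strand):
    """Split a beta-strand into its two faces: even positions point to one
    side of the sheet, odd positions to the other."""
    return strand[0::2], strand[1::2]


def hydrophobic_fraction(residues):
    """Fraction of residues in HYDROPHOBIC (0.0 for an empty string)."""
    if not residues:
        return 0.0
    return sum(aa in HYDROPHOBIC for aa in residues) / len(residues)


if __name__ == "__main__":
    face_1, face_2 = strand_faces("VKVTVEV")  # hypothetical edge strand
    print(face_1, hydrophobic_fraction(face_1))  # this face could pack into the core
    print(face_2, hydrophobic_fraction(face_2))  # this face could point to solvent
```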
Structural classification
• Databases
– SCOP, ‘Structural Classification of Proteins’,
manual classification
– CATH, ‘Class, Architecture, Topology, Homology’,
based on the SSAP algorithm
– FSSP, ‘Family of Structurally Similar Proteins’,
based on the DALI algorithm
– PClass, ‘Protein Classification’,
based on the LOCK and 3Dsearch algorithms
Structural classification, CATH
• Class, four types:
– Mainly α
– α/β structures
– Mainly β
– No secondary structure
• Architecture (overall arrangement of secondary structures)
• Topology (fold)
• Homology (superfamily)
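Each CATH domain carries a four-part code, one number per level (C.A.T.H), so comparing two domains amounts to comparing code prefixes. A minimal Python sketch, assuming CATH's own class numbering (1 = Mainly Alpha, 2 = Mainly Beta, 3 = Alpha Beta, 4 = Few Secondary Structures); the helper names and the second example code are hypothetical.

```python
from typing import NamedTuple

# CATH's numbering of the four classes.
CATH_CLASSES = {
    1: "Mainly Alpha",
    2: "Mainly Beta",
    3: "Alpha Beta",
    4: "Few Secondary Structures",
}
LEVELS = ("class", "architecture", "topology", "homologous superfamily")


class CathCode(NamedTuple):
    class_: int
    architecture: int
    topology: int
    homology: int

    @property
    def class_name(self):
        return CATH_CLASSES.get(self.class_, "unknown")

    def prefix(self, depth):
        """Code truncated to `depth` levels (1 = class ... 4 = superfamily)."""
        return ".".join(str(x) for x in self[:depth])


def parse_cath(code):
    parts = [int(p) for p in code.split(".")]
    if len(parts) != 4:
        raise ValueError(f"expected a C.A.T.H code, got {code!r}")
    return CathCode(*parts)


def shared_depth(code_a, code_b):
    """Number of leading CATH levels two domains have in common (0-4)."""
    depth = 0
    for x, y in zip(parse_cath(code_a), parse_cath(code_b)):
        if x != y:
            break
        depth += 1
    return depth


if __name__ == "__main__":
    a, b = "3.40.50.300", "3.40.50.720"  # example codes (b is hypothetical)
    d = shared_depth(a, b)
    print(parse_cath(a).class_name, "| shared down to:", LEVELS[d - 1] if d else "nothing")
```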
Structural classification..
Structural classification..
• Two types of algorithms:
– Inter-molecular (3D, rigid body): structural alignment in a
common coordinate system (hard), e.g. the VAST and LOCK algorithms
– Intra-molecular (2D, internal geometry): structural
alignment using internal distances and angles, e.g. the DALI,
STRUCTURAL and SSAP algorithms
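The practical difference between the two families: coordinates change when a molecule is rotated or translated, so rigid-body methods must first superpose the structures, while internal distances do not change at all. A short Python sketch that demonstrates this on a hypothetical four-residue Cα trace (all function names are illustrative).

```python
import math


def distance_matrix(coords):
    """Internal geometry: all pairwise distances within one structure."""
    return [[math.dist(p, q) for q in coords] for p in coords]


def rotate_z(coords, angle):
    """Rigid rotation about the z axis (angle in radians)."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in coords]


def translate(coords, shift):
    return [tuple(u + v for u, v in zip(p, shift)) for p in coords]


def coordinate_rmsd(a, b):
    """RMSD of matching atoms *without* any superposition."""
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(a, b)) / len(a))


def max_abs_difference(m1, m2):
    return max(abs(x - y) for r1, r2 in zip(m1, m2) for x, y in zip(r1, r2))


# Hypothetical C-alpha trace (Angstrom) and a rotated + translated copy of it.
ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.0, 3.6, 0.0), (8.2, 4.1, 1.9)]
moved = translate(rotate_z(ca, math.radians(90)), (10.0, -5.0, 2.0))

print(coordinate_rmsd(ca, moved))  # large: a rigid-body method must superpose first
print(max_abs_difference(distance_matrix(ca), distance_matrix(moved)))  # ~0
```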
Structural classification, SSAP
• SSAP, ‘Sequential Structure Alignment Program’
Basic idea: the similarity between residue i in molecule A
and residue k in molecule B is characterised in terms of their
structural surroundings.
This similarity can be quantified as a score, $S_{ik}$.
Based on this similarity score and a specified gap penalty,
dynamic programming is used to find the optimal structural
alignment.
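The final step is ordinary global dynamic programming over a residue-by-residue score matrix. A minimal Python sketch, assuming a precomputed matrix S and a simple linear gap penalty; the function name align_structures and the example scores are hypothetical.

```python
def align_structures(S, gap):
    """Global DP alignment of molecule A (rows) against molecule B (columns).

    S[i][k] is the similarity of residue i in A and residue k in B; every
    residue left unaligned costs `gap` (linear gap penalty).
    Returns (total score, list of aligned (i, k) pairs).
    """
    n, m = len(S), len(S[0])
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = -gap * i
    for k in range(1, m + 1):
        F[0][k] = -gap * k
    for i in range(1, n + 1):
        for k in range(1, m + 1):
            F[i][k] = max(F[i - 1][k - 1] + S[i - 1][k - 1],  # align i with k
                          F[i - 1][k] - gap,                 # residue i unaligned
                          F[i][k - 1] - gap)                 # residue k unaligned
    # Trace back from the bottom-right corner to recover the aligned pairs.
    pairs, i, k = [], n, m
    while i > 0 and k > 0:
        if F[i][k] == F[i - 1][k - 1] + S[i - 1][k - 1]:
            pairs.append((i - 1, k - 1))
            i, k = i - 1, k - 1
        elif F[i][k] == F[i - 1][k] - gap:
            i -= 1
        else:
            k -= 1
    return F[n][m], pairs[::-1]


# Hypothetical scores: A has 2 residues, B has 3; residue 1 of B is an insertion.
S = [[5.0, 0.0, 0.0],
     [0.0, 0.0, 5.0]]
print(align_structures(S, gap=1.0))
```

Here the best alignment pairs residue 0 with 0 and 1 with 2, paying one gap for the inserted residue in B.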
Structural classification, SSAP
[Figure: the structural neighborhood of residue i in A compared to residue k in B]
Structural classification, SSAP..
Distance between residues i and j in molecule A: $d^A_{ij}$
Similarity for two pairs of residues, i, j in A and k, l in B:
$$s_{ij,kl} = \frac{a}{\left| d^A_{ij} - d^B_{kl} \right| + b}, \qquad a, b \text{ constants}$$
Similarity between residue i in A and residue k in B:
$$S_{i,k} = \sum_{m=-n}^{n} \frac{a}{\left| d^A_{i,i+m} - d^B_{k,k+m} \right| + b}$$
Idea: $S_{i,k}$ is large if the distances from residue i in A to its 2n
nearest neighbours along the chain are similar to the corresponding
distances around k in B
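The two formulas translate directly into code. A minimal Python sketch computing $S_{i,k}$ from Cα coordinates; the values a = 1 and b = 2, the coordinates, and the rule of skipping offsets that run off a chain end are all assumptions for illustration (the slide does not specify them).

```python
import math


def pair_similarity(d_a, d_b, a, b):
    """s_{ij,kl} = a / (|d^A_ij - d^B_kl| + b)."""
    return a / (abs(d_a - d_b) + b)


def residue_similarity(coords_a, coords_b, i, k, n, a, b):
    """S_{i,k}: sum over m = -n..n of a / (|d^A_{i,i+m} - d^B_{k,k+m}| + b).
    Offsets that fall off either chain end are skipped (an assumption)."""
    total = 0.0
    for m in range(-n, n + 1):
        if 0 <= i + m < len(coords_a) and 0 <= k + m < len(coords_b):
            d_a = math.dist(coords_a[i], coords_a[i + m])
            d_b = math.dist(coords_b[k], coords_b[k + m])
            total += pair_similarity(d_a, d_b, a, b)
    return total


def similarity_matrix(coords_a, coords_b, n, a, b):
    """S_{i,k} for all residue pairs, ready for dynamic programming."""
    return [[residue_similarity(coords_a, coords_b, i, k, n, a, b)
             for k in range(len(coords_b))]
            for i in range(len(coords_a))]


# Hypothetical C-alpha coordinates (Angstrom), compared with themselves.
ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.0, 3.6, 0.0),
      (8.2, 4.1, 1.9), (9.0, 7.5, 3.0)]
S = similarity_matrix(ca, ca, n=2, a=1.0, b=2.0)
print(round(S[2][2], 2), round(S[2][3], 2))  # matching residues score highest
```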
Structural classification, SSAP..
This works well for small structures and local structural
alignments; however, insertions and deletions cause problems,
because the fixed offsets then compare unrelated distances:
A : HSERAHVFIM..   (i = 5)
B : GQ-VMAC-NW..   (k = 4)
• The real algorithm uses dynamic programming on two levels:
first to decide which distances to compare when computing $S_{ik}$, then to
align the structures using these scores
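The two-level idea can be sketched in a simplified form. In this Python sketch (an illustration only; the published SSAP procedure differs in detail), the lower level aligns the distance profile of residue i in A with that of residue k in B by dynamic programming, so insertions no longer force unrelated distances to be compared, and the best lower-level score becomes $S_{ik}$; the upper level then aligns the structures using S. The constants a, b and both gap penalties are hypothetical.

```python
import math


def global_align(score, n, m, gap):
    """Needleman-Wunsch-style DP; score(i, k) rewards pairing i with k.
    Returns (best total, list of aligned (i, k) pairs)."""
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = -gap * i
    for k in range(1, m + 1):
        F[0][k] = -gap * k
    for i in range(1, n + 1):
        for k in range(1, m + 1):
            F[i][k] = max(F[i - 1][k - 1] + score(i - 1, k - 1),
                          F[i - 1][k] - gap, F[i][k - 1] - gap)
    pairs, i, k = [], n, m
    while i > 0 and k > 0:
        if F[i][k] == F[i - 1][k - 1] + score(i - 1, k - 1):
            pairs.append((i - 1, k - 1))
            i, k = i - 1, k - 1
        elif F[i][k] == F[i - 1][k] - gap:
            i -= 1
        else:
            k -= 1
    return F[n][m], pairs[::-1]


def distance_profile(coords, i):
    """Distances from residue i to every residue of the same molecule."""
    return [math.dist(coords[i], c) for c in coords]


def double_dp(coords_a, coords_b, a, b, gap_low, gap_high):
    prof_a = [distance_profile(coords_a, i) for i in range(len(coords_a))]
    prof_b = [distance_profile(coords_b, k) for k in range(len(coords_b))]
    # Lower level: S[i][k] = best alignment score of the two distance profiles.
    S = [[global_align(lambda j, l: a / (abs(pa[j] - pb[l]) + b),
                       len(pa), len(pb), gap_low)[0]
          for pb in prof_b]
         for pa in prof_a]
    # Upper level: align the two structures using S.
    return global_align(lambda i, k: S[i][k], len(S), len(S[0]), gap_high)


# Hypothetical C-alpha trace aligned with itself.
ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.0, 3.6, 0.0),
      (8.2, 4.1, 1.9), (9.0, 7.5, 3.0)]
print(double_dp(ca, ca, a=1.0, b=2.0, gap_low=0.5, gap_high=0.5))
```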
Experimental techniques for structure
determination
• X-ray Crystallography
• Nuclear Magnetic Resonance
spectroscopy (NMR)
• Electron Microscopy/Diffraction
• Free-electron lasers?
X-ray Crystallography
X-ray Crystallography..
• From small molecules to viruses
• Information about the positions of
individual atoms
• Limited information about
dynamics
• Requires crystals
NMR
• Limited to molecules up to ~50 kDa
(good quality up to ~30 kDa)
• Distances between pairs of
hydrogen atoms
• Lots of information about dynamics
• Requires soluble, non-aggregating
material
• Assignment problem: each signal must be assigned to a specific atom
Electron Microscopy/Diffraction
• Low to medium resolution
• Limited information about
dynamics
• Can use very small crystals
(nm range)
• Can be used for very large
molecules and complexes
Structure Prediction
[Figure: from sequence (GPSRYIV…) to structure?]
Protein Folding
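• Different sequence → different structure
• The free-energy difference between the folded and unfolded states
is small, because the favourable enthalpy change on folding is largely
offset by a large decrease in entropy:
$\Delta G = \Delta H - T\Delta S$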
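The arithmetic behind "small difference of large terms" is easy to see with numbers. The values below are hypothetical, chosen only so that a large favourable ΔH and a large entropy loss nearly cancel; the Python function name is illustrative.

```python
def delta_g(delta_h, delta_s, temperature):
    """Gibbs free energy change, dG = dH - T*dS.
    Units: dH in kJ/mol, dS in kJ/(mol*K), T in K -> dG in kJ/mol."""
    return delta_h - temperature * delta_s


# Hypothetical folding values.
dH = -200.0   # kJ/mol, favourable interactions formed on folding
dS = -0.6     # kJ/(mol*K), loss of chain entropy on folding
T = 298.0     # K
dG = delta_g(dH, dS, T)
print(f"T*dS = {T * dS:.1f} kJ/mol, dG = {dG:.1f} kJ/mol")
```

Two terms of roughly 200 kJ/mol leave a net stability of only about 20 kJ/mol.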
Structure Prediction
Why are structure prediction, and especially ab
initio calculations, so hard?
• Many degrees of freedom per residue
• Remote noncovalent interactions
• Nature does not go through all conformations
• Folding assisted by enzymes & chaperones
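A back-of-the-envelope count shows why exhaustive search is hopeless and why nature cannot be trying every conformation. This Python sketch assumes, purely for illustration, 3 backbone states per residue, a 100-residue chain and 10^13 conformations sampled per second.

```python
def conformation_count(n_residues, states_per_residue):
    """Conformations if every residue independently adopts one of a few states."""
    return states_per_residue ** n_residues


def exhaustive_search_years(n_residues, states_per_residue, conformations_per_second):
    seconds = conformation_count(n_residues, states_per_residue) / conformations_per_second
    return seconds / (365.25 * 24 * 3600)


# Hypothetical numbers: 100 residues, 3 states each, 1e13 conformations per second.
count = conformation_count(100, 3)
years = exhaustive_search_years(100, 3, 1e13)
print(f"{count:.1e} conformations, about {years:.0e} years to enumerate")
```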
Molecular dynamics
Ab initio calculations are used
for smaller problems:
• Calculation of affinity
• Enzymatic pathways
Sequence Classification (revisited)
• Class: secondary structure content.
• Fold: major structural similarity.
• Superfamily: probable common
evolutionary origin.
• Family: clear evolutionary relationship.
Structure Prediction
• Search sequence data banks for homologs
• Search methods, e.g. BLAST, PSI-BLAST,
FASTA…
• Homologue in the PDB?
IVTY…PGGG HYW…QHG
Structure Prediction
Multiple sequence / structure alignment
• Contains more information than a single sequence
for applications like homology modeling and
secondary structure prediction
• Gives the location of conserved regions
and of residues likely to be buried in
the protein core or exposed to solvent
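One common way to locate conserved positions in a multiple alignment is the Shannon entropy of each column: 0 bits means every sequence has the same residue there. A minimal Python sketch; counting gaps as an ordinary symbol and the toy alignment are assumptions for illustration.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of one alignment column; 0 = fully conserved.
    Gaps ('-') count as just another symbol (a simplifying assumption)."""
    counts = Counter(column)
    total = len(column)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def conservation_profile(alignment):
    """Entropy of every column of an alignment given as equal-length strings."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    return [column_entropy([seq[j] for seq in alignment]) for j in range(length)]


# Hypothetical toy alignment.
msa = ["ACDEG",
       "ACDFG",
       "ACEWG"]
profile = conservation_profile(msa)
conserved = [j for j, h in enumerate(profile) if h == 0]
print([round(h, 2) for h in profile], conserved)
```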
Multiple alignment example
HFD fingerprint
Secondary Structure Prediction
• Statistical analysis (old-fashioned):
– For each amino acid type, assign its ‘propensity’
to be in a helix, sheet, or coil.
• Limited accuracy, ~55-60%.
• Random prediction, ~38%.
MTLLALGINHKTAP...
CCEEEEEECCCCCC...
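A propensity can be estimated as (fraction of a residue type found in a given state) divided by (fraction of all residues in that state), so values above 1 mean the residue favours the state; accuracy figures like those above are usually per-residue three-state (H/E/C) agreement. A minimal Python sketch of both; the function names, the tiny training set and the "observed" string are hypothetical (only the predicted string comes from the example above).

```python
from collections import Counter


def propensities(sequences, structures, state):
    """Propensity of each residue type for one state ('H', 'E' or 'C').
    Raises ZeroDivisionError if `state` never occurs in the data."""
    total, in_state = Counter(), Counter()
    for seq, ss in zip(sequences, structures):
        for aa, s in zip(seq, ss):
            total[aa] += 1
            if s == state:
                in_state[aa] += 1
    background = sum(in_state.values()) / sum(total.values())
    return {aa: (in_state[aa] / n) / background for aa, n in total.items()}


def q3(predicted, observed):
    """Fraction of residues whose H/E/C state is predicted correctly."""
    if len(predicted) != len(observed):
        raise ValueError("strings must have the same length")
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)


# Hypothetical training data: one sequence with its observed structure.
helix_prop = propensities(["AALLGG"], ["HHHHCC"], "H")
print(helix_prop)  # A and L above 1, G below 1 in this toy set
print(q3("CCEEEEEECCCCCC", "CCEEEECCCCCCCC"))  # vs a hypothetical observation
```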
The Chou & Fasman Method
• Each residue is classified as:
– Hα/Hβ, strong helix/strand former.
– hα/hβ, weak helix/strand former.
– I, indifferent.
– bα/bβ, weak helix/strand breaker.
– Bα/Bβ, strong helix/strand breaker.
The Chou & Fasman Method..
• Score each residue:
– Hα/hα = 1, Iα = 0 or ½, Bα/bα = -1.
– Hβ/hβ = 1, Iβ = 0 or ½, Bβ/bβ = -1.
• Helix nucleation:
– Score > 4 in a “window” of 6 residues.
• Strand nucleation:
– Score > 3 in a “window” of 5 residues.
• Propagate until score < 1 in a 4-residue “window”.
The Chou & Fasman Method..
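Example: GPSRYIVTLANGK

Residue          G     P     S     R     Y     I     V     T     L     A     N     G     K
Helix score     -1    -1     0     0    -1     1     1     0     1     1    -1    -1     1
Strand score    -1    -1    -1    .5     1     1     1     1     1     0     0    -1    -1

• Helix: 6-residue window sums -2, 0, 1, 2, 3, 3, 1; none exceeds 4, so no nucleation.
• Strand: 5-residue window sums -1.5, .5, 2.5, 4.5, 5, 4, 3, 1, -1; windows above 3 give nucleation.
• Propagate: 4-residue window sums -2.5, -.5, 1.5 … 3, 1, -1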
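The window rules can be reproduced in a few lines. This Python sketch uses the helix and strand score rows from the example and recomputes the nucleation windows; the propagation step is one simple reading of "propagate until score < 1 in a 4-residue window" (grow the segment one residue at a time while the 4-residue window at the new end still scores at least 1), so its output is an assumption, not the slide's result. All function names are hypothetical.

```python
def window_sums(scores, width):
    """Sum of the scores in every window of `width` consecutive residues."""
    return [sum(scores[s:s + width]) for s in range(len(scores) - width + 1)]


def nucleation_sites(scores, width, threshold):
    """Start indices of windows whose summed score exceeds `threshold`."""
    return [s for s, total in enumerate(window_sums(scores, width)) if total > threshold]


def propagate(scores, start, end, width=4, stop_below=1.0):
    """Extend [start, end] (inclusive) while the `width`-residue window at the
    new end still scores >= stop_below (assumed reading of the rule)."""
    while start > 0 and sum(scores[start - 1:start - 1 + width]) >= stop_below:
        start -= 1
    while end < len(scores) - 1 and sum(scores[max(0, end + 2 - width):end + 2]) >= stop_below:
        end += 1
    return start, end


sequence = "GPSRYIVTLANGK"
helix = [-1, -1, 0, 0, -1, 1, 1, 0, 1, 1, -1, -1, 1]     # helix scores (example)
strand = [-1, -1, -1, .5, 1, 1, 1, 1, 1, 0, 0, -1, -1]   # strand scores (example)

print(window_sums(helix, 6), nucleation_sites(helix, 6, 4))  # no helix nucleus
sites = nucleation_sites(strand, 5, 3)           # 5-residue windows scoring > 3
nucleus = (min(sites), max(sites) + 5 - 1)       # union of nucleating windows
start, end = propagate(strand, *nucleus)
print(sites, nucleus, sequence[start:end + 1])
```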
Modern methods
• Neural networks (e.g. the PHD server):
– Input (training): a set of protein sequences with known
secondary structure.
– Output: a trained network that predicts
secondary structure elements with ~70%
accuracy.
• Use many different methods and compare them
(e.g. the JPred server)!
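Window-based networks typically see each residue together with its sequence neighbours, encoded as one input unit per amino-acid type, and comparing several methods can be as simple as a per-residue majority vote. The Python sketch below shows both ideas; the window size of 13, the zero-padding at chain ends, the tie-breaking rule and the three input predictions are assumptions for illustration, not the PHD or JPred implementations.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, one input unit each


def encode_window(sequence, centre, half_width):
    """One-hot input vector for the residue at `centre` and its neighbours.
    Positions past either chain end (or unknown letters) stay all-zero."""
    features = []
    for pos in range(centre - half_width, centre + half_width + 1):
        unit = [0] * len(AMINO_ACIDS)
        if 0 <= pos < len(sequence) and sequence[pos] in AMINO_ACIDS:
            unit[AMINO_ACIDS.index(sequence[pos])] = 1
        features.extend(unit)
    return features


def consensus(predictions):
    """Per-residue majority vote over several H/E/C prediction strings.
    On a tie, the state seen first (earliest prediction) wins."""
    length = len(predictions[0])
    return "".join(
        Counter(p[i] for p in predictions).most_common(1)[0][0] for i in range(length)
    )


x = encode_window("MTLLALGINHKTAP", centre=0, half_width=6)  # hypothetical 13-residue window
print(len(x), sum(x))
print(consensus(["HHEC", "HCEC", "HHCC"]))  # three hypothetical method outputs
```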
Summary
• The function of a protein is governed by its structure
• Different sequence → different structure
• PDB, the Protein Data Bank
• Secondary structure prediction is hard; tertiary
structure prediction is even harder
• Use homologs whenever possible, or compare different methods
to assess prediction quality