Protein Structure and Structure Prediction

Protein Structure, Structure
Classification and Prediction
Bioinformatics X3
January 2005
P. Johansson, D. Madsen
Dept. of Cell & Molecular Biology,
Uppsala University
Overview
• Introduction to proteins, structure & classification
• Protein Folding
• Experimental techniques for structure determination
• Structure prediction
Proteins
• Proteins play a crucial role in virtually all biological processes
with a broad range of functions.
• The activity of an enzyme or the function of a protein is
governed by its three-dimensional structure
20 amino acids - the building blocks
The Amino Acids
Hydrophilic or hydrophobic..?
• Virtually all soluble proteins feature a hydrophobic core
surrounded by a hydrophilic surface
• But the peptide backbone is inherently polar?
• Solution: neutralize potential H-bond donors & acceptors using
ordered secondary structure
Secondary Structure: α-helix
• 3.6 residues / turn
• Axial dipole moment
• Proline & glycine are disfavoured
• Often found at protein surfaces
Secondary Structure: β-sheets
• Parallel or antiparallel
• Alternating side-chains
• No mixing
• Loops often have polar amino acids
Structural classification
• Databases
– SCOP, ‘Structural Classification of Proteins’,
manual classification
– CATH, ‘Class Architecture Topology Homology’,
based on the SSAP algorithm
– FSSP, ‘Family of Structurally Similar Proteins’,
based on the DALI algorithm
– PClass, ‘Protein Classification’,
based on the LOCK and 3Dsearch algorithms
Structural classification, CATH
• Class, four types:
– Mainly α
– α / β structures
– Mainly β
– No secondary structure
• Architecture (fold)
• Topology (superfamily)
• Homology (family)
Structural classification..
• Two types of algorithms
– Inter-molecular, 3D, rigid body: structural alignment in a
common coordinate system (hard), e.g. the VAST and LOCK algorithms
– Intra-molecular, 2D, internal geometry: structural
alignment using internal distances and angles, e.g. the DALI,
STRUCTURAL and SSAP algorithms
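The reason internal-geometry methods avoid the superposition problem can be shown with a small sketch: the matrix of internal distances does not change when a molecule is rotated or translated. The coordinates below are hypothetical Cα positions invented for illustration, and the helper names are not taken from any of the programs listed.

```python
import math

def distance_matrix(coords):
    """Return the matrix of pairwise Euclidean distances between 3D points."""
    return [[math.dist(p, q) for q in coords] for p in coords]

def rotate_z_and_shift(coords, angle, shift):
    """Rotate points about the z axis by `angle` radians, then translate by `shift`."""
    c, s = math.cos(angle), math.sin(angle)
    return [
        (c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])
        for x, y, z in coords
    ]

# Hypothetical C-alpha coordinates (in Angstrom), for illustration only
ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.0, 3.6, 0.0), (8.1, 4.9, 2.0)]
moved = rotate_z_and_shift(ca, angle=1.2, shift=(10.0, -4.0, 7.5))

d_orig = distance_matrix(ca)
d_moved = distance_matrix(moved)

# Largest change in any internal distance after the rigid-body move
max_diff = max(
    abs(a - b) for row_a, row_b in zip(d_orig, d_moved) for a, b in zip(row_a, row_b)
)
print(f"largest change in any internal distance: {max_diff:.2e}")
```

The difference is zero up to floating-point rounding, so two structures can be compared through their distance matrices without first placing them in a common coordinate system.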
Structural classification, SSAP
• SSAP, ‘Sequential Structure Alignment Program’
Basic idea: the similarity between residue i in molecule A
and residue k in molecule B is characterised in terms of their
structural surroundings
This similarity can be quantified into a score, $S_{ik}$
Based on this similarity score and some specified gap penalty,
dynamic programming is used to find the optimal structural
alignment
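The dynamic programming step can be sketched as a standard global alignment over a matrix of residue-pair scores. This is a generic Needleman-Wunsch-style sketch with a linear gap penalty, not the actual SSAP implementation; the function name `align`, the toy score matrix and the gap value are assumptions made for illustration.

```python
def align(score, gap):
    """Global alignment by dynamic programming.

    score[i][k] is the similarity of residue i in A and residue k in B,
    gap is the penalty for each residue left unaligned.
    Returns (total score, list of aligned (i, k) index pairs).
    """
    n, m = len(score), len(score[0])
    # best[i][k] = best score for aligning the first i residues of A
    # with the first k residues of B
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        best[i][0] = -gap * i
    for k in range(1, m + 1):
        best[0][k] = -gap * k
    for i in range(1, n + 1):
        for k in range(1, m + 1):
            best[i][k] = max(
                best[i - 1][k - 1] + score[i - 1][k - 1],  # align i with k
                best[i - 1][k] - gap,                      # residue of A unaligned
                best[i][k - 1] - gap,                      # residue of B unaligned
            )
    # Trace back from the bottom-right corner to recover the aligned pairs
    pairs = []
    i, k = n, m
    while i > 0 and k > 0:
        if best[i][k] == best[i - 1][k - 1] + score[i - 1][k - 1]:
            pairs.append((i - 1, k - 1))
            i, k = i - 1, k - 1
        elif best[i][k] == best[i - 1][k] - gap:
            i -= 1
        else:
            k -= 1
    pairs.reverse()
    return best[n][m], pairs

# Toy score matrix (hypothetical): A has 4 residues, B has 3,
# and residue 1 of A has no good partner in B.
toy_scores = [
    [10, 1, 0],
    [1, 1, 1],
    [0, 10, 1],
    [0, 1, 10],
]
total, pairs = align(toy_scores, gap=2)
print(total, pairs)
```

With these toy values the best path pairs residues 0, 2 and 3 of A with residues 0, 1 and 2 of B and pays one gap penalty for residue 1 of A.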
Figure: The structural neighborhood of residue i in A compared
to residue k in B
Structural classification, SSAP..
Distance between residues i and j in molecule A: $d^A_{i,j}$
Similarity for two pairs of residues, i, j in A and k, l in B:
$$s_{ij,kl} = \frac{a}{\left| d^A_{i,j} - d^B_{k,l} \right| + b}, \qquad a, b \text{ constants}$$
Similarity between residue i in A and residue k in B:
$$S_{i,k} = \sum_{m=-n}^{n} \frac{a}{\left| d^A_{i,i+m} - d^B_{k,k+m} \right| + b}$$
Idea: $S_{i,k}$ is big if the distances from residue i in A to the 2n
nearest neighbours are similar to the corresponding distances
around k in B
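A minimal sketch of the residue similarity score, assuming the form a / (|d_A - d_B| + b) summed over m = -n..n. The values of a, b and n and the coordinates are arbitrary illustrative choices, and skipping neighbours that fall outside a chain is an assumption of this sketch rather than something the slides specify.

```python
import math

def residue_similarity(coords_a, coords_b, i, k, n=2, a=50.0, b=2.0):
    """Sum a / (|d_A(i, i+m) - d_B(k, k+m)| + b) over m = -n..n.

    Neighbours that fall outside either chain are skipped
    (an assumption of this sketch).
    """
    total = 0.0
    for m in range(-n, n + 1):
        if not (0 <= i + m < len(coords_a) and 0 <= k + m < len(coords_b)):
            continue
        d_a = math.dist(coords_a[i], coords_a[i + m])
        d_b = math.dist(coords_b[k], coords_b[k + m])
        total += a / (abs(d_a - d_b) + b)
    return total

# Hypothetical, irregular C-alpha trace; chain B is a translated copy of chain A
chain_a = [
    (0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.0, 3.6, 0.0),
    (8.1, 4.9, 2.0), (9.0, 8.5, 2.5), (12.5, 9.0, 4.0),
]
chain_b = [(x + 5.0, y - 2.0, z + 1.0) for x, y, z in chain_a]

same = residue_similarity(chain_a, chain_b, i=2, k=2)     # equivalent residues
shifted = residue_similarity(chain_a, chain_b, i=2, k=3)  # out of register
print(f"S(2,2) = {same:.2f}, S(2,3) = {shifted:.2f}")
```

Equivalent residues reach the maximum possible score, (2n + 1) · a / b, because all distance differences vanish; the out-of-register pair scores lower.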
Structural classification, SSAP..
This works well for small structures and local structural
alignments. However, insertions and deletions cause problems
→ unrelated distances are compared
i=5
A : HSERAHVFIM..
B : GQ-VMAC-NW..
k=4
- The real algorithm uses dynamic programming on two levels,
first to find which distances to compare → $S_{ik}$, then to align the
structures using these scores
Experimental techniques for structure
determination
• X-ray Crystallography
• Nuclear Magnetic Resonance
spectroscopy (NMR)
• Electron Microscopy/Diffraction
• Free-electron lasers?
X-ray Crystallography..
• From small molecules to viruses
• Information about the positions of
individual atoms
• Limited information about
dynamics
• Requires crystals
NMR
• Limited to molecules up to ~50 kDa
(good quality up to 30 kDa)
• Distances between pairs of
hydrogen atoms
• Lots of information about dynamics
• Requires soluble, non-aggregating
material
• Assignment problem
Electron Microscopy/ Diffraction
• Low to medium resolution
• Limited information about
dynamics
• Can use very small crystals
(nm range)
• Can be used for very large
molecules and complexes
Structure Prediction
GPSRYIV… → ?
Protein Folding
• Different sequence → Different
structure
• Free energy difference small due
to large entropy decrease,
ΔG = ΔH − TΔS
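A small worked example of why the net free energy is small: a large favourable enthalpy is almost cancelled by the entropy lost on folding. The numbers are hypothetical values chosen only to illustrate the arithmetic, not measurements for any particular protein.

```python
def gibbs_free_energy(delta_h, temperature, delta_s):
    """dG = dH - T*dS, with dH in kJ/mol, T in K and dS in kJ/(mol*K)."""
    return delta_h - temperature * delta_s

# Hypothetical folding values, for illustration only (not measured data)
delta_h = -250.0     # kJ/mol, favourable enthalpy from new interactions
delta_s = -0.70      # kJ/(mol*K), entropy lost when the chain becomes ordered
temperature = 298.0  # K

delta_g = gibbs_free_energy(delta_h, temperature, delta_s)
print(f"dH = {delta_h:.1f}, T*dS = {temperature * delta_s:.1f}, dG = {delta_g:.1f} kJ/mol")
```

Two terms of a few hundred kJ/mol leave a difference of only a few tens of kJ/mol, so small errors in either term can change the sign of the prediction.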
Structure Prediction
Why are structure prediction, and especially ab
initio calculations, hard?
• Many degrees of freedom / residue
• Remote noncovalent interactions
• Nature does not go through all conformations
• Folding assisted by enzymes & chaperones
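The first and third points can be made concrete with a back-of-the-envelope count, often called Levinthal's paradox. The figures used here (3 conformations per residue, a 100-residue chain, 10^13 conformations sampled per second) are commonly used illustrative assumptions, not values from the slides.

```python
# Assumed illustrative numbers (not from the slides)
residues = 100
conformations_per_residue = 3
samples_per_second = 1e13

# Exhaustive enumeration of every backbone conformation
total_conformations = conformations_per_residue ** residues
seconds = total_conformations / samples_per_second
years = seconds / (3600 * 24 * 365)

print(f"conformations: {total_conformations:.2e}")
print(f"time to try them all: {years:.2e} years")
```

Even under these generous assumptions an exhaustive search would take far longer than the age of the universe, so neither nature nor a prediction program can work by trying every conformation.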
Molecular dynamics
Ab initio calculations are used
for smaller problems:
• Calculation of affinity
• Enzymatic pathways
Sequence Classification rev.
• Class : Secondary structure content
• Fold : Major structural similarity.
• Superfamily : Probable common
evolutionary origin.
• Family : Clear evolutionary relationship.
Structure Prediction
• Search sequence data banks for homologs
• Search methods, e.g. BLAST, PSI-BLAST,
FASTA…
• Homolog in PDB?
IVTY…PGGG HYW…QHG
Structure Prediction
Multiple sequence / structure alignment
• Contains more information than a single sequence
for applications like homology modeling and
secondary structure prediction
• Gives location of conserved parts
and residues likely to be buried in
the protein core or exposed to solvent
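One simple way to locate conserved positions is to score each alignment column by how dominant its most common residue is. The sketch below uses a made-up toy alignment; the function name and the choice to count gaps against conservation are assumptions for illustration, and real tools use more refined measures.

```python
from collections import Counter

def column_conservation(alignment):
    """Per-column fraction of sequences sharing the most common residue.

    Gaps ('-') are not counted as residues but stay in the denominator,
    so gapped columns score lower.
    """
    scores = []
    for column in zip(*alignment):
        residues = [r for r in column if r != "-"]
        if not residues:
            scores.append(0.0)
            continue
        most_common_count = Counter(residues).most_common(1)[0][1]
        scores.append(most_common_count / len(column))
    return scores

# Hypothetical toy alignment (equal-length rows), for illustration only
toy_alignment = [
    "HFDGK",
    "HFDAR",
    "HFD-K",
    "HYDGE",
]
conservation = column_conservation(toy_alignment)
print(conservation)
```

Columns 0 and 2 are fully conserved in this toy example, while the last two columns vary and would be less informative for finding a motif.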
Multiple alignment example
HFD fingerprint
Secondary Structure Prediction
• Statistical Analysis (old fashioned):
– For each amino acid type, assign its ‘propensity’
to be in a helix, sheet, or coil.
• Limited accuracy ~55-60%.
• Random prediction ~38%.
MTLLALGINHKTAP...
CCEEEEEECCCCCC...
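Accuracy figures like these are usually per-residue agreement between predicted and observed three-state strings (H, E, C), often called Q3; that reading is assumed here. The predicted string is the one shown above, while the observed string is invented for illustration.

```python
def q3_accuracy(predicted, observed):
    """Fraction of positions where predicted and observed states agree."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("strings must be non-empty and of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(predicted)

predicted = "CCEEEEEECCCCCC"  # prediction from the example above
observed = "CEEEEEEECCCCHH"   # hypothetical observed states, for illustration

q3 = q3_accuracy(predicted, observed)
print(f"Q3 = {q3:.1%}")
```

Here 11 of 14 positions agree, which gives a Q3 of about 79% for this invented pair.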
The Chou & Fasman Method
• Each residue is classified as:
– Hα/Hβ, strong helix / strand former.
– hα/hβ, weak helix / strand former.
– I, indifferent.
– bα/bβ, weak helix / strand breaker.
– Bα/Bβ, strong helix / strand breaker.
The Chou & Fasman Method..
• Score each residue:
– Hα/hα = 1, Iα = 0 or ½, Bα/bα = −1.
– Hβ/hβ = 1, Iβ = 0 or ½, Bβ/bβ = −1.
• Helix nucleation:
– Score > 4 in a “window” of 6 residues.
• Strand nucleation:
– Score > 3 in a “window” of 5 residues.
• Propagate until score < 1 in a 4 residue “window”.
The Chou & Fasman Method..
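Sequence:       G   P   S   R   Y   I   V   T   L   A   N   G   K
Helix scores:  -1  -1   0   0  -1   1   1   0   1   1  -1  -1   1
Helix, 6-residue window sums: -2 0 1 2 3 3 1
→ No nucleation
Strand scores: -1  -1  -1  .5   1   1   1   1   1   0   0  -1  -1
Strand nucleation, 5-residue window sums: -1.5 .5 2.5 4.5 5 4 3 1 -1
Propagate, 4-residue window sums: -2.5 -.5 1.5 … 3 1 -1
Result: GPSRYIVTLANGK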
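The nucleation rules can be sketched as sliding-window sums over the per-residue scores. The scores below are the ones given for GPSRYIVTLANGK in the example; the function names are illustrative, and the propagation step is left out because its exact convention is not spelled out on the slides.

```python
def window_sums(scores, size):
    """Sum of the scores in every window of `size` consecutive residues."""
    return [sum(scores[i:i + size]) for i in range(len(scores) - size + 1)]

def nucleation_sites(scores, size, threshold):
    """0-based start positions of windows whose summed score exceeds the threshold."""
    return [i for i, total in enumerate(window_sums(scores, size)) if total > threshold]

sequence = "GPSRYIVTLANGK"
helix_scores = [-1, -1, 0, 0, -1, 1, 1, 0, 1, 1, -1, -1, 1]
strand_scores = [-1, -1, -1, 0.5, 1, 1, 1, 1, 1, 0, 0, -1, -1]

# Helix: score > 4 in a window of 6; strand: score > 3 in a window of 5
helix_sites = nucleation_sites(helix_scores, size=6, threshold=4)
strand_sites = nucleation_sites(strand_scores, size=5, threshold=3)

print("helix nucleation windows:", [sequence[i:i + 6] for i in helix_sites])
print("strand nucleation windows:", [sequence[i:i + 5] for i in strand_sites])
```

No helix window exceeds 4, while three overlapping strand windows (RYIVT, YIVTL, IVTLA) exceed 3, in line with the "no nucleation" and "nucleation" rows of the example.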
Modern methods
• Neural networks (e.g. the PHD server):
– Input: a number of protein sequences +
secondary structure.
– Output: a trained network that predicts
secondary structure elements with ~70%
accuracy.
• Use many different methods and compare
(e.g. the JPred server)!
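Comparing several methods can be as simple as a per-position majority vote over their predictions. This is only a sketch of the idea and not how the JPred server actually combines methods; the three prediction strings are hypothetical, and falling back to coil ('C') on a tie is an assumption.

```python
from collections import Counter

def consensus(predictions, tie_state="C"):
    """Per-position majority vote over equal-length secondary structure strings."""
    result = []
    for column in zip(*predictions):
        counts = Counter(column)
        top = max(counts.values())
        winners = [state for state, count in counts.items() if count == top]
        # Ties fall back to `tie_state` (an assumption of this sketch)
        result.append(winners[0] if len(winners) == 1 else tie_state)
    return "".join(result)

# Hypothetical outputs of three different prediction methods
predictions = [
    "CCEEEEECCC",
    "CEEEEEECCC",
    "CCEEEHHCCC",
]
combined = consensus(predictions)
print(combined)
```

Positions where the methods disagree are also the ones to treat with the least confidence.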
Summary
• The function of a protein is governed by its structure
• Different sequence → Different structure
• PDB, Protein Data Bank
• Secondary structure prediction is hard, tertiary
structure prediction is even harder
• Use homologs whenever possible or different methods
to assess quality