Structure Classification
Protein Structure
Lesk, chapter 5
Details on SCOP and CATH can be found in
Structural Bioinformatics, Bourne/Weissig, chapters 12 and 13
Michael Schroeder
BioTechnological Center
TU Dresden
Biotec
Folding
Proteins are linear polymers: a mainchain
carrying different amino
acid side chains
Proteins fold spontaneously
reaching a state of minimal
energy
Side and main chains
interact with one another and
with solvent
Example movie
Jones, D.T. (1997) Successful ab initio
prediction of the tertiary structure of NKLysin using multiple sequences and
recognized supersecondary structural
motifs. PROTEINS. Suppl. 1, 185-191
By Michael Schroeder, Biotec,
Examining Proteins
Specialised tools with
different views of
structure
Corey, Pauling, Koltun
(CPK)
Diameter of sphere ~
atomic radius
Hydrogen white,
carbon grey, nitrogen
blue, oxygen red,
sulphur yellow
Cartoon
Wire
Balls
Protein Folding
Conformation of residue
Rotation around N-Cα bond (φ, phi)
Rotation around Cα-C bond (ψ, psi)
Rotation around peptide bond (ω, omega)
Residue
Peptide bond tends to be
planar and
in one of two states:
trans, ω = 180° (usually), and
cis, ω = 0° (rarely, and mostly at proline)
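The torsion angles φ, ψ and ω are each defined by four consecutive backbone atoms. The following is a minimal sketch, assuming atoms are given as (x, y, z) tuples; the function name `dihedral` is hypothetical and the sign convention is chosen so that cis gives 0° and trans gives 180°, as on the slide.

```python
import math

def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees around the p1-p2 bond, in the range [-180, 180]."""
    def sub(a, b):
        return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

    def dot(a, b):
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    b1 = sub(p1, p0)
    b2 = sub(p2, p1)          # the bond we rotate around
    b3 = sub(p3, p2)
    n1 = cross(b1, b2)        # normal of the plane p0, p1, p2
    n2 = cross(b2, b3)        # normal of the plane p1, p2, p3
    x = dot(n1, n2)
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    return math.degrees(math.atan2(y, x))

# Toy geometry (not real atom coordinates): bond along the z axis.
cis = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
```

For φ the four atoms would be C(i-1), N, Cα, C; for ψ they would be N, Cα, C, N(i+1); for ω they would be Cα, C, N(i+1), Cα(i+1).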
Image taken from www.expasy.org/swissmod/course
Sasisekharan-Ramakrishnan-Ramachandran plot
Solid line =
energetically preferred
Outside dotted line =
disallowed
Most amino acids fall into the
αR region (right-handed
alpha helix) or the β region
(beta-strand)
Glycine has additional
conformations (e.g. left-handed
alpha helix = αL region) and in
the lower right panel
Ramachandran plot
Plot for a protein with
mostly beta-sheets
Example for conformations
Helices and Strands
Consecutive residues in alpha or beta
conformation generate alpha-helices and beta-strands, respectively
Such secondary structure elements are stabilised by
weak hydrogen bonds
They are connected by turns or loops, regions in which the
chain alters direction
Turns are often surface exposed and tend to
contain charged or polar residues
Alpha Helix
Residue j is hydrogen-bonded
to residue j+4
3.6 residues per turn
1.5 Å rise per residue
Repeat (pitch) every 3.6 × 1.5 Å = 5.4 Å
φ = -60°, ψ = -45°
Beta strand
Beta Sheets
Turn
Residue j is hydrogen-bonded to
residue j+3
Often proline and
glycine
How to Fold a Structure
All residues must have stereochemically allowed
conformations
Buried polar atoms must be hydrogen-bonded
If a few are missed, it might be energetically preferable
to bond these to solvent
Enough hydrophobic surface must be buried and
interior must be sufficiently densely packed
There is evidence that folding occurs hierarchically:
First secondary structure elements, then supersecondary, …
This justifies a hierarchic approach when simulating
folding
Structure Alignment
Slides from Hanekamp, University of Wyoming, www.uwyo.edu
In the same way that we align sequences, we wish to
align structures
Let’s start simple: How to score an alignment
Sequences: E.g. percentage of matching residues
Structure: rmsd (root mean square deviation)
Root Mean Square Deviation
What is the distance between two points a with
coordinates (x_a, y_a, z_a) and b with coordinates
(x_b, y_b, z_b)?
Euclidean distance:
d(a,b) = \sqrt{(x_a - x_b)^2 + (y_a - y_b)^2 + (z_a - z_b)^2}
Root Mean Square Deviation
In a structure alignment the score measures how far
the aligned atoms are from each other on average
Given the distances d_i between n aligned atoms, the
root mean square deviation is defined as
rmsd = \sqrt{\frac{1}{n} \sum_{i=1}^{n} d_i^2}
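The two definitions above can be sketched in a few lines. This is a minimal illustration, assuming the two structures are already superimposed and that the aligned atoms are given as equally long lists of (x, y, z) tuples; the function names are hypothetical.

```python
import math

def distance(a, b):
    """Euclidean distance between two points given as (x, y, z) tuples."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def rmsd(coords_a, coords_b):
    """Root mean square deviation over pairs of aligned atoms."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = [distance(a, b) ** 2 for a, b in zip(coords_a, coords_b)]
    return math.sqrt(sum(squared) / len(squared))

# Toy coordinates in Å (not taken from a real structure).
structure_1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
structure_2 = [(0.0, 0.0, 3.0), (1.0, 4.0, 0.0)]
score = rmsd(structure_1, structure_2)
```

Note that the sketch only scores a given superposition; finding the rotation and translation that minimise the RMSD is a separate step.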
Quality of Alignment and Example
Unit of RMSD => e.g. Ångstroms
Identical structures => RMSD = “0”
Similar structures => RMSD is small (1 – 3 Å)
Distant structures => RMSD > 3 Å
Structural superposition of gamma-chymotrypsin and
Staphylococcus aureus epidermolytic toxin A
Pitfalls of RMSD
all atoms are treated equally
(e.g. residues on the surface have a higher degree of
freedom than those in the core)
best alignment does not always mean minimal
RMSD
significance of RMSD is size dependent
From www.uwyo.edu/molecbio/LectureNotes/MOLB5650
Alternative RMSDs
aRMSD = best root-mean-square deviation calculated over all
aligned alpha-carbon atoms
bRMSD = the RMSD over the highest scoring residue pairs
wRMSD = weighted RMSD
Source: W. Taylor(1999), Protein Science, 8: 654-665.
http://www.prosci.uci.edu/Articles/Vol8/issue3/8272/8272.html#relat
Computing Structural Alignments
DALI (Distance-matrix-ALIgnment) is one of the first tools for structural
alignment
How does it work?
Atoms:
Given two structures’ atomic coordinates
Compute two distance matrices:
Compute for each structure all pairwise inter-atom distances.
This step is done as the computed distances are independent of a
coordinate system
The two original atomic coordinate sets cannot be compared directly; the two
distance matrices can
Align two distance matrices:
Find small (e.g. 6x6) sub-matrices along diagonal that match
Extend these matches to form overall alignment
This method is a bit similar to how BLAST works.
SSAP (double dynamic programming) is covered in term 3.
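The reason for working with distance matrices can be illustrated with a short sketch. It is not DALI itself, only a demonstration of the first step, under the assumption that atoms are given as (x, y, z) tuples; all function names are hypothetical. Rotating and shifting a structure changes its coordinates but leaves its distance matrix unchanged.

```python
import math

def distance_matrix(coords):
    """All pairwise inter-atom distances of one structure."""
    return [[math.dist(a, b) for b in coords] for a in coords]

def rotate_z_and_shift(coords, angle_deg, shift):
    """Rotate around the z axis, then translate: a change of coordinate system."""
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    return [(c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])
            for x, y, z in coords]

def matrices_match(m1, m2, tol=1e-9):
    """True if two equally sized matrices agree entry by entry."""
    return all(abs(u - v) <= tol
               for row1, row2 in zip(m1, m2)
               for u, v in zip(row1, row2))

# Toy coordinates in Å (not taken from a real structure).
original = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.0, 3.6, 0.0), (8.5, 4.0, 1.2)]
moved = rotate_z_and_shift(original, 40.0, (10.0, -5.0, 2.0))
same = matrices_match(distance_matrix(original), distance_matrix(moved))
```

The later steps (matching small sub-matrices and extending the matches) are not covered by this sketch.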
DALI Example
The regions of common fold, as determined by the program
DALI by L. Holm and C. Sander, in the TIM-barrel proteins
mouse adenosine deaminase [1fkx] (black) and Pseudomonas
diminuta phosphotriesterase [1pta] (red):
Protein zinc finger (4znf)
Superimposed 3znf and 4znf
30 CA atoms RMS = 0.70Å
248 atoms RMS = 1.42Å
Superimposed 3znf and 4znf
backbones
30 CA atoms RMS = 0.70Å
RMSD vs. Sequence Similarity
At low sequence identity, good structural
alignments possible
Picture from www.jenner.ac.uk/YBF/DanielleTalbot.ppt
Structure Classification
Why classify structures?
Structure similarity is good indicator for homology,
therefore classify structures
Classification at different levels
Similar general folding patterns (structures not
necessarily related)
Possibly low sequence similarity, but similar structure
and function implies very likely homology
High sequence similarity implies similar structures
and homology
Classification can be used to investigate
evolutionary relationships and possibly infer
function
Structure Classification
SCOP: Structural Classification of Proteins
Hand curated (Alexei Murzin, Cambridge) with some
automation
CATH: Class, Architecture, Topology, Homology
Automated, where possible, some checks by hand
FSSP: Fold classification based on Structure-Structure alignment of Proteins
Fully automated
Reasonable correspondence between the three classifications (>80%)
Evolutionary Relation
Strong sequence similarity is assumed to be sufficient to
infer homology
Close structural and functional similarity together are also
considered sufficient to infer homology
Similar structure alone is not sufficient, as proteins may have
converged on a structure due to physicochemical necessity
Similar function alone is not sufficient, as proteins may have
developed it due to functional selection
In general, structure is more conserved than sequence
Beware: Descendants of an ancestor may have different
function, structure, and sequence! This is difficult to detect
What is a domain?
Single and Multi-Domain Proteins
What is a domain?
Functional: Domain is “independent” functional
unit, which occurs in more than one protein
Physicochemical: Domain has a hydrophobic core
Topological: Intra-domain distances of atoms are
minimal, Inter-domain distances maximal
Difficult to exactly define domain
Difficult to agree on exact domain border
Domains re-occur
A domain re-occurs in different structures and possibly in the context of different other domains
P-loop domain in
1goj: Structure of a fast kinesin: implications for ATPase mechanism and interactions with microtubules. Motor protein (single domain)
1ii6: Crystal structure of the mitotic kinesin Eg5 in complex with Mg-ADP. Cell cycle (two domains)
Domains re-occur
1in5: interaction of P-loop domain
(green & orange) and winged helix
DNA binding domain
1a5t: interaction of P-loop domain
(green & orange) and DNA
polymerase III domain
Domains have hydrophobic core
Kyte J., Doolittle R.F., J. Mol. Biol. 157:105-132 (1982).
Figure: Hydrophobicity plot for 1GOJ kinesin motor (x-axis: residue, 1 to 301; y-axis: hydrophobicity, -3 to 3)
Hydrophobicity scale:
Ala:  1.800   Arg: -4.500   Asn: -3.500   Asp: -3.500   Cys:  2.500
Gln: -3.500   Glu: -3.500   Gly: -0.400   His: -3.200   Ile:  4.500
Leu:  3.800   Lys: -3.900   Met:  1.900   Phe:  2.800   Pro: -1.600
Ser: -0.800   Thr: -0.700   Trp: -0.900   Tyr: -1.300   Val:  4.200
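A hydrophobicity plot like the one for 1GOJ is usually obtained by averaging the scale values over a sliding window along the sequence. The sketch below uses the scale values listed above with one-letter residue codes; the window length of 9 and the function name are assumptions, since the slide does not state how its plot was smoothed.

```python
# Hydrophobicity scale from the slide, keyed by one-letter amino acid code.
HYDROPHOBICITY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(sequence, window=9):
    """Mean hydrophobicity of every window of consecutive residues."""
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [HYDROPHOBICITY[aa] for aa in sequence]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]

# Toy sequence: a hydrophobic stretch followed by a charged stretch.
profile = hydropathy_profile("IIIIRRRR", window=4)
```

Stretches with a high average are candidates for the buried, hydrophobic core of a domain.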
Intra-domain distances minimal
Distances between atoms
within domain are minimal
Distances between atoms of
two different domains are
maximal
PDB, Proteins, and Domains
Ca. 20,000 structures in PDB
50% single domain
50% multiple domain
90% have fewer than 5 domains
Dom#   Freq.
1      8464
2      4358
3      926
4      1888
5      148
6      624
7      42
8      491
9      22
10     58
…      …
30     7
31     1
32     16
36     1
40     8
42     1
48     3
49     1
Figure: Distribution of Number of Domains (x-axis: Number of Domains, y-axis: Frequency)
A structure with 49 domains
1AON, Asymmetric Chaperonin Complex Groel/Groes/(ADP)7
SCOP: Structural Classification of Proteins
top
CLASS
All alpha (218)
All Beta (144)
Alpha+Beta (279)
Alpha/Beta (136)
FOLD
Trypsin-like serine proteases (1)
Immunoglobulin-like (23)
SUPERFAMILY
Transglutaminase (1)
Immunoglobulin (6)
FAMILY
C1 set domains
(antibody constant)
V set domains
(antibody variable)
Class
All beta
(possibly small alpha
adornments)
All alpha
(possibly small beta
adornments)
Class
Alpha/beta (alpha and beta) =
single beta sheet with alpha helices
joining C-terminus of one strand to
the N-terminus of the next
subclass: beta sheet forming barrel
surrounded by alpha helices
subclass: central planar beta sheet
Alpha+beta (alpha plus beta) =
Alpha and beta units are largely
separated
Strands joined by hairpins leading
to antiparallel sheets
Class
Multi-domain proteins
have domains placed in
different classes
domains have not been
observed elsewhere
E.g. 1hle
Class
Membrane (few and most
unique) and cell surface
proteins
E.g. Aquaporin 1ih5
Class
Small Proteins
E.g. Insulin, 1pid
Class
Coiled coil proteins
E.g. 1i4d, Arfaptin-Rac
binding fragment
Class
Low-resolution structures,
peptides, designed proteins
E.g. 1cis, a designed
protein, hybrid protein
between chymotrypsin
inhibitor CI-2 and helix E
from subtilisin Carlsberg
from Barley (Hordeum
vulgare), hiproly strain
Fold, Superfamily, Family
Fold
Common core structure
i.e. same secondary structure elements in the same
arrangement with the same topological structure
Superfamily
Very similar structure and function
Family
Sequence identity (>30%) or extremely similar
structure and function
Distribution (2007)
Class                       Fold   Superfamily   Family
All alpha                   259    459           772
All beta                    165    331           679
Alpha/beta                  141    232           736
Alpha+beta                  334    488           897
Multidomain                 53     53            74
Membrane and cell surface   50     92            104
Small proteins              85     122           202
Total                       1086   1777          3464
Uses of SCOP
Automatic classification
Understanding of protein enzymatic function
Use superfamily and fold to study distantly related
proteins
Study sequence and structure variability
Derive substitution matrices for sequence
comparison
Extract structural principles for design
Study decomposition of multi domain proteins
Estimate total number of folds
Derived databases
PDB, Proteins, Domains revisited
80% of PDB have only one type of SCOP superfamily
15% of PDB have two different SCOP superfamilies
Figure: Frequency of Number of SCOP Superfamilies (x-axis: Number of Superfamilies, y-axis: Frequency)
sfNo   sfNoFreq
1      13960
2      2721
3      495
4      178
5      33
6      25
7      1
9      4
20     9
21     1
22     1
23     6
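The two percentages can be checked against the frequency table. The sketch below assumes that the table lists every row (no counts are elided); the variable and function names are hypothetical.

```python
# Number of PDB entries (value) per number of distinct SCOP superfamilies (key),
# copied from the table on the slide.
superfamily_counts = {
    1: 13960, 2: 2721, 3: 495, 4: 178, 5: 33, 6: 25,
    7: 1, 9: 4, 20: 9, 21: 1, 22: 1, 23: 6,
}

total_entries = sum(superfamily_counts.values())

def share(number_of_superfamilies):
    """Fraction of entries with exactly this many distinct superfamilies."""
    return superfamily_counts[number_of_superfamilies] / total_entries

print(f"one superfamily:   {100 * share(1):.1f}%")
print(f"two superfamilies: {100 * share(2):.1f}%")
```

Under this assumption the shares come out close to the rounded 80% and 15% quoted above.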
A structure with 23 different superfamilies
1k9m: Co-crystal structure of tylosin bound to the 50S ribosomal subunit of Haloarcula marismortui. Ribosome
The 20 Most Frequently Occurring Superfamilies
SCOP ID   #PDB   Superfamily
b.1.1     823    Immunoglobulin
d.2.1     777    Lysozyme-like
b.47.1    649    Trypsin-like serine proteases
c.37.1    521    P-loop containing nucleotide triphosphate hydrolases
c.2.1     384    NAD(P)-binding Rossmann-fold domains
a.1.1     384    Globin-like
c.1.8     332    (Trans)glycosidases
b.50.1    288    Acid proteases
b.29.1    230    Concanavalin A-like lectins/glucanases
c.47.1    217    Thioredoxin-like
a.39.1    212    EF-hand
c.69.1    195    alpha/beta-Hydrolases
b.6.1     178    Cupredoxins
c.55.3    178    Ribonuclease H-like
c.67.1    176    PLP-dependent transferases
c.94.1    171    Periplasmic binding protein-like II
b.74.1    169    Carbonic anhydrase
d.92.1    169    Metalloproteases ("zincins"), catalytic domain
c.3.1     162    FAD/NAD(P)-binding domain
a.3.1     161    Cytochrome c
CATH
Class
secondary structure
composition
Architecture
orientation in 3D
Topology
connectivity
Homology
Grouped by evidence for
homology (sequence,
structure and function)
Generating CATH
1. Identify close relatives by pairwise sequence
alignment
2. Detect more distant relatives using
2a. sequence profiles and
2b. structure alignment
3. Structures still unclassified after 1. and 2. are
examined by hand to detect domain boundaries
4. Try 2. and 3. again
5. If still unclassified assign manually
CATH step 1:
Sequence-based Identification of
Homologous Structures
> 30% sequence similarity implies similar
structure
Relatives identified using pairwise alignment are
clustered using hierarchical clustering with single
linkage
Reminder…
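Hierarchical Clustering
Step 1: closest pair is 1 and 2 (distance 2)
      1    2    3    4    5
1     0    2    6   10    9
2          0    5    9    8
3               0    4    5
4                    0    3
5                         0
Step 2: closest pair is 4 and 5 (distance 3)
        (1,2)   3    4    5
(1,2)     0     5    9    8
3               0    4    5
4                    0    3
5                         0
Step 3: closest pair is 3 and (4,5) (distance 4)
        (1,2)   3   (4,5)
(1,2)     0     5     8
3               0     4
(4,5)                 0
Step 4: merge (1,2) and (3,(4,5)) (distance 5)
            (1,2)   (3,(4,5))
(1,2)         0         5
(3,(4,5))               0
Figure: dendrogram over leaves 1 2 3 4 5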
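The worked example can be reproduced with a short script. This is a minimal sketch of agglomerative clustering with single linkage, not the CATH implementation; the function name and the data layout (a dictionary keyed by unordered pairs) are assumptions made for illustration.

```python
from itertools import combinations

def single_linkage(labels, dist):
    """Repeatedly merge the two closest clusters; return (name, distance) per merge.

    dist maps frozenset({a, b}) to the distance between items a and b.
    """
    clusters = [(str(label), [label]) for label in labels]  # (name, members)
    merges = []
    while len(clusters) > 1:
        best = None
        for (i, (_, mem_a)), (j, (_, mem_b)) in combinations(enumerate(clusters), 2):
            # Single linkage: cluster distance is the minimum member distance.
            d = min(dist[frozenset((a, b))] for a in mem_a for b in mem_b)
            if best is None or d < best[0]:
                best = (d, i, j)
        d, i, j = best
        name = f"({clusters[i][0]},{clusters[j][0]})"
        members = clusters[i][1] + clusters[j][1]
        merges.append((name, d))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append((name, members))
    return merges

# Distance matrix of the five items from the slide.
pairs = {(1, 2): 2, (1, 3): 6, (1, 4): 10, (1, 5): 9, (2, 3): 5,
         (2, 4): 9, (2, 5): 8, (3, 4): 4, (3, 5): 5, (4, 5): 3}
dist = {frozenset(p): d for p, d in pairs.items()}
merges = single_linkage([1, 2, 3, 4, 5], dist)
```

The list `merges` records the order in which clusters are joined and at which distance, which is the information a dendrogram displays.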
Hierarchical Clustering:
How to define distance between clusters?
Single linkage:
Minimum
Example: Distance (A,B) to C is 1
Complete linkage:
Maximum
Example: Distance (A,B) to C is 2
Average linkage:
Average
Example: Distance (A,B) to C is 1.5
Distance matrix used in the examples:
    A   B   C
A   0   1   2
B       0   1
C           0
Are dendrograms always the same
independent of the linkage method?
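The three definitions differ only in how the member-to-member distances are combined. A minimal sketch, using the A, B, C distances from the slide; the function name and the dictionary layout are hypothetical.

```python
def cluster_distance(cluster_a, cluster_b, dist, linkage="single"):
    """Distance between two clusters under the chosen linkage method."""
    values = [dist[frozenset((a, b))] for a in cluster_a for b in cluster_b]
    if linkage == "single":
        return min(values)                 # nearest pair of members
    if linkage == "complete":
        return max(values)                 # farthest pair of members
    if linkage == "average":
        return sum(values) / len(values)   # mean over all pairs
    raise ValueError(f"unknown linkage: {linkage}")

dist = {frozenset("AB"): 1, frozenset("AC"): 2, frozenset("BC"): 1}
for method in ("single", "complete", "average"):
    print(method, cluster_distance(["A", "B"], ["C"], dist, method))
```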
Hierarchical Clustering: Chaining
Beware of chaining when using single linkage
As the nearest neighbour is selected, it appears that all members of the cluster are very similar to each other, when in fact A and Z are very different
    A   B   C   D   …   Z
A   0   1   2   3   …   25
B       0   1   2   …   24
C           0   1   …   23
D               0   …   22
…                   …
Z                       0
Figure: dendrogram over leaves A B C D … Z
CATH and single linkage
It is argued that
structural data is quite sparse,
hence it cannot be expected that all cluster
members will be very similar (in terms of
sequence) to each other,
so that the chaining effect is even useful
CATH step 2a:
Profile-based methods such as PSI-BLAST are used
to detect distant relatives
Build profiles using all sequence data available
(rather than only sequences for which structure
exists)
This increases quality of profiles dramatically
51% distant relatives retrieved using profiles based on
sequences with known structure only
82% distant relatives retrieved using profile based on
all sequences
CATH step 2b: Structure-based
methods to detect distant relatives
For ca. 15% of structures, sequence-based method
does not work
Example: For globins sequence similarity can fall
below 10%, yet structure and function (oxygen-binding) are preserved
Use SSAP, the Sequential Structure Alignment
Program
Clustering Result of
Structure Alignment
Relatives identified using pairwise alignment
are clustered using hierarchical clustering
with single linkage
Improving Efficiency: GRATH
Screening large structures (>300 residues) against database
can take days
Idea of GRATH (Graphical Representation of CATH):
Improve efficiency by filtering at a higher level before doing
detailed comparison
Represent protein as graph where
Nodes are secondary structure elements represented as
their midpoint, tilt, and rotation
Edges are distances between midpoints of secondary structure
elements
Use an algorithm to determine subgraph isomorphism (i.e. does
one graph occur in another one)
If yes, then do a detailed comparison using SSAP
Structure Prediction and Modelling
Structure Prediction:
Four Main Problem Areas
Given a sequence with unknown structure, predict its structure
Secondary structure prediction
Predict regions of helices and strands
Homology modelling
Predict structure from known structures of one or more related
proteins
Fold recognition
Given a library of structures, determine which one (if any) is
the fold of the given sequence
Prediction of novel folds: A-priori and knowledge-based
methods
Structure Prediction of Novel Folds:
Two Approaches
A priori:
Most approaches aim to reproduce inter-atomic
interactions by
defining an energy function and
trying to find global minimum
Problem:
Inadequacy of the energy function
Algorithms get stuck in local minima
Knowledge-based:
Find similarities to known structures or substructures
Secondary Structure Prediction
A successful tool for secondary structure prediction is PROF
PROF uses a neural network to learn secondary structure from
known structures
About three quarters of PROF’s predictions are correct
At CASP 2000 it predicted e.g. the following
Residues 1-50
Sequence   ALVEDPPLKVSEGGLIREGYDPDLDALRAAHREGVAYFLELEERERERTG
Prediction HH------------EEE------HHHHHHHHHH-HHHHHHHHHHHHHHH
Experiment -E-------------E-----HHHHHHHHHHHHHHHHHHHHHHHHHHHH
Residues 51-100
Sequence   IPTLKVGYNAVFGYYLEVTRPYYERVPKEYRPVQTLKDRQRYTLPEMKEK
Prediction --EEEEEEEEEEEEEEEE-----------EEEEEEEE--EEEE-HHHHHH
Experiment ----EEEEE---EEEEEEEHHHHHH-----EEEEE---EEEEE-HHHHHH
Residues 101-129
Sequence   EREVYRLEALIRRREEEVFLEVRERAKRQ
Prediction/Experiment HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH-
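The statement that about three quarters of the predictions are correct refers to a per-residue comparison of predicted and observed states (H = helix, E = strand, - = other). A minimal sketch of such a score, commonly called Q3, is shown below; the function name is hypothetical and the strings are toy data, not the CASP example.

```python
def q3(predicted, observed):
    """Fraction of residues whose predicted state (H, E or -) matches the observed one."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty state strings of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(predicted)

# Toy strings: five of the six states agree.
accuracy = q3("HHHE--", "HHEE--")
```

The same function can be applied to each aligned prediction/experiment pair shown above, provided the two strings have the same length.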
PROF’s prediction
The regions
predicted by the
PROF server of
Rost to be
helical are
shown as wider
ribbons. The
prediction
missed only a
short helix, at
the top left of
the picture
Homology modelling
Define the model of an unknown structure by making
minimal changes to a relative with known structure
Align amino acid sequences of target and one or more
known structures
Insertions and deletions should be in loop regions
Determine mainchain segments to represent the regions
containing insertions and deletions and stitch these into the
known structure
Replace the sidechains of the residues that have been
mutated
Examine the model (by hand and computationally) to detect
collisions between atoms
Refine the model by limited energy minimisation
Accuracy of Homology Modelling
Works for >40-50% sequence similarity
Example: SWISS-MODEL Prediction of neurotoxin of red
scorpion (1DQ7) from neurotoxin of yellow scorpion (1PTX)
Fold Recognition: 3D Profiles
Given a sequence determine which (if any) fold is most similar
Can we build profiles to represent structures of similar fold
(similar to sequence profiles)?
3D profiles:
Classify the environment of each residue
Secondary structure:
Is it part of helix, sheet or other (determined by Mainchain
hydrogen bonding interactions)
Surface exposure:
<40 Å², 40-114 Å², or >114 Å² accessible surface area
Polar or non-polar nature of environment
Total of 18 residue classes, one of which each residue is part of
Sequence of these residue classes is 3D profile
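One way to read the count of 18 classes is 3 secondary structure states × 3 exposure ranges × 2 polarity states. The sketch below follows that reading, which is an assumption; the class labels and function names are hypothetical, and only the thresholds 40 Å² and 114 Å² come from the slide.

```python
from itertools import product

SECONDARY_STRUCTURES = ("helix", "sheet", "other")
EXPOSURES = ("buried", "intermediate", "exposed")
POLARITIES = ("non-polar", "polar")

def exposure_bin(area):
    """Bin the accessible surface area (in Å²) using the thresholds 40 and 114."""
    if area < 40.0:
        return "buried"
    if area <= 114.0:
        return "intermediate"
    return "exposed"

def environment_class(secondary_structure, accessible_area, polar_environment):
    """Environment class of one residue as a (structure, exposure, polarity) triple."""
    if secondary_structure not in SECONDARY_STRUCTURES:
        raise ValueError(f"unknown secondary structure: {secondary_structure}")
    polarity = "polar" if polar_environment else "non-polar"
    return (secondary_structure, exposure_bin(accessible_area), polarity)

def profile_3d(residues):
    """3D profile: the sequence of environment classes along the chain."""
    return [environment_class(*residue) for residue in residues]

ALL_CLASSES = list(product(SECONDARY_STRUCTURES, EXPOSURES, POLARITIES))

# Toy residues: (secondary structure, accessible area in Å², polar environment?)
example = profile_3d([("helix", 20.0, False), ("sheet", 120.0, True)])
```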
3D Profiles and Alignments
Structure-Structure Alignment:
3D profiles of two known structures can be aligned against each other
Sequence-Structure Alignment:
Based on existing 3D profiles, probability can be determined for a
residue occurring in a residue class.
Using this probability, we can assign 3D profile to a sequence
And hence align the sequence 3D profile to a structure 3D profile
For correctly determined protein structures, the structure 3D profile
fits the sequence 3D profile well
However, other proteins may score even better
If a structure does not match its own 3D profile well it is likely that
there is an error in the structure determination
Threading
Pull query sequence
through known structure
and rate the score
Necessary:
Method to score the models
to select best one
Method to calibrate the
scores to decide which of
the best is correct
Homology modelling            Threading
Identify homologues           Try all possible parents
Determine optimal alignment   Try many alignments
Optimize one model            Evaluate many rough models
Scoring for Threading
Empirical patterns of residue neighbours derived
from known structures
Observe distribution of inter-residue distances for
all 20 x 20 residue pairs
Derive probability distribution as function of
distance in space and on sequence
Boltzmann equation relates probability and energy
Reverse this and derive energy function from
probability distribution
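The Boltzmann relation states that the probability of a state falls off exponentially with its energy, p ∝ exp(-E/kT). Reversing it gives E = -kT ln(p) up to a constant. The sketch below fixes that constant by dividing by a reference probability, which is an assumption (the slide does not say which reference state is used); the function name and the toy probabilities are hypothetical.

```python
import math

K_B = 0.0019872  # Boltzmann constant in kcal/(mol K)

def energy_from_probability(p_observed, p_reference, temperature=300.0):
    """Pseudo-energy of an observation relative to a reference state.

    Observations more frequent than the reference get a negative (favourable)
    energy, rarer ones a positive (unfavourable) energy.
    """
    return -K_B * temperature * math.log(p_observed / p_reference)

# Toy probabilities for one residue pair at one distance (not measured data).
favourable = energy_from_probability(0.4, 0.2)
unfavourable = energy_from_probability(0.1, 0.2)
```

In a threading score, such terms would be summed over all residue pairs placed by the alignment.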
Threading the sequence
template
Target
Slides from Hanekamp, University of Wyoming, www.uwyo.edu
“Threaded” sequence
Yellow = adrenergic receptor sequence
Blue = template structure (PDB 1F88, bovine rhodopsin)
Modeled structure
Gaps
Corrected Model
Ab initio Structure Prediction
Molecular dynamics
Structure prediction = place atoms so that
interactions between them create a unique state of
maximum stability
Problem:
Model of inter-atomic distances is not complete
Computational scale:
Large number of variables and massive search space
Non-linearities
Rough energy surface with many local minima
Conformational energy calculations
Bond stretching
Bond angle bend
Torsion angle (e.g. φ, ψ, ω)
Van der Waals interactions
Short-range repulsion ~R^-12 and long-range attraction ~R^-6, where
R is the inter-atom distance
Hydrogen bond
Weak chemical/electrostatic interaction, ~R^-12 and ~R^-10
Electrostatics
Charges on atoms
Solvent
Interactions with water, salt, sugar, etc.
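The van der Waals term with R^-12 repulsion and R^-6 attraction is commonly written as a Lennard-Jones 12-6 potential. The sketch below uses that standard form; the parameter names `epsilon` (well depth) and `sigma` (distance at which the energy is zero) and their default values are assumptions, not values from the lecture.

```python
def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """12-6 potential: 4 * epsilon * ((sigma / r)**12 - (sigma / r)**6)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

r_min = 2.0 ** (1.0 / 6.0)            # distance of minimum energy for sigma = 1
well_depth = lennard_jones(r_min)     # close to -epsilon
repulsive = lennard_jones(0.9)        # atoms too close: positive energy
attractive = lennard_jones(2.0)       # atoms apart: weakly negative energy
```

A full conformational energy would add the other terms listed above (bond stretching, angle bending, torsion, hydrogen bonds, electrostatics, solvent).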
Rosetta
Predicts structure by first generating structures of
fragments using known structures (3-9 residues)
Combine fragments using Monte Carlo simulation
using an energy function with terms for
Paired beta-sheets
Burial of hydrophobic residues
Carries out 1000 simulations
Results are clustered and the centre of the largest
cluster is presented as prediction
Demo
ROSETTA
The program ROSETTA, by D. Baker and colleagues,
can predict the structures of proteins for which no
complete domain of similar folding pattern appears in
the database. Prediction by ROSETTA of H. influenzae,
hypothetical protein. Black lines, experimental
structure; red lines, prediction
Rosetta
Prediction by ROSETTA of The N-terminal half of
domain 1 of human DNA repair protein Xrcc4. This
figures shows a selected substructure of Xrcc4
containing the N-terminal 55 out of 116 residues. Black
lines, experimental structure; red lines, prediction
LINUS
Another programme with a similar idea
Prediction by LINUS (program by G.D. Rose and R. Srinivasan) of the C-terminal domain of rat endoplasmic reticulum protein ERp29. Black
lines, experimental structure; red lines, prediction
Monte Carlo Simulation
Objective: Find conformation with minimal energy
Problem: Avoid local minima
Algorithm:
1. Generate a random initial conformation x
2. Perturb conformation x to generate a neighbouring conformation x’
3. Calculate the energies E(x) and E(x’), resp., for conformations x and x’
4. If E(x)>E(x’) (i.e. x’ is an improvement, we go down hill from x to x’) then accept
x’ as new conformation and go to 2.
5. If E(x)<E(x’) (i.e. x’ is no improvement, we go uphill from x to x’) then accept x’
as new conformation with probability p
6. The probability p to accept uphill moves is reduced with every step
7. Go to step 2.
Steps 1 to 4 make sure that we “walk” downhill towards a minimum
Steps 5 to 7 make sure that if we are in a local minimum there is a chance to get out
of it by accepting an uphill move. It is important that this probability decreases, so
that we become less and less likely to walk uphill
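The steps above can be sketched for a one-dimensional toy problem. This is a minimal illustration, not a protein simulation: the conformation is a single number, the energy function `double_well` is made up, and the parameter names and defaults are assumptions. The start value is passed in rather than drawn at random so that the run is reproducible, and the best conformation seen so far is returned.

```python
import random

def monte_carlo_minimise(energy, start, steps=5000, step_size=0.5,
                         p_uphill=0.5, decay=0.999, seed=0):
    """Random walk that always accepts downhill moves and sometimes uphill ones."""
    rng = random.Random(seed)
    x, e_x = start, energy(start)                            # step 1
    best_x, best_e = x, e_x
    for _ in range(steps):
        candidate = x + rng.uniform(-step_size, step_size)   # step 2: perturb
        e_candidate = energy(candidate)                      # step 3: energies
        # Step 4: accept improvements. Step 5: accept uphill with probability p.
        if e_candidate < e_x or rng.random() < p_uphill:
            x, e_x = candidate, e_candidate
            if e_x < best_e:
                best_x, best_e = x, e_x
        p_uphill *= decay                                    # step 6: reduce p
    return best_x, best_e

def double_well(x):
    """Toy energy with a local minimum near x = 1 and a global one near x = -1."""
    return (x * x - 1.0) ** 2 + 0.3 * x

best_x, best_e = monte_carlo_minimise(double_well, start=2.0)
```

A common variant makes the uphill acceptance probability depend on the size of the energy increase as well, but the sketch follows the simpler rule stated on the slide.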
Summary
You should know now
What helices, strands, sheets are
What a Ramachandran plot is
How to score a structural alignment (rmsd)
How to compute a structural alignment
How a domain can be characterised
Why structure classification is useful
What the main structure classes are
How classifications can be generated automatically
What the problems are
What secondary structure prediction, homology modelling, threading,
ab-initio and knowledge-based structure prediction of novel folds are
Visit PDB, SCOP and CATH websites and
Read chapter 5