Comparative modeling - CBS

Comparative modeling
Ole Lund,
Associate Professor,
CBS, BioCentrum, DTU
Comparative modeling
Also known as homology modeling
Uses a template from a related protein to build a model
Based on the finding that
– protein structure tends to remain approximately the same even when many amino acids have changed during evolution! (Selection for conservation of structure?)
– proteins with similar sequences often have similar structures
OL
Why make structural models?
Fast and cheap alternative to experimental determination of structures (X-ray & NMR)
– Not as accurate as experimental methods
– Not all proteins can be modeled with current methods
Applications
– Drug discovery (requires an accurate model)
– Planning new experiments (mutations)
– Understanding of function
Steps in comparative modeling
1. Find template
2. Make alignment
3. Build loops
4. Model side chains
5. Refinement
6. Evaluate model
Recovery from errors
An error in an earlier step is normally unrecoverable in a later step
– The alignment cannot make up for a wrong choice of template
– Loop modeling cannot make up for a wrong alignment
Errors may be discovered in a later step and corrected by going back to the step where they were made
– e.g. by selecting a new (and better) template
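The step order and the "go back" rule can be sketched as a loop over candidate templates. This is a minimal Python sketch under stated assumptions: `align`, `build` and `acceptable` are hypothetical callables standing in for whatever alignment, loop/side-chain modeling and evaluation tools are used, and the templates are assumed to arrive ranked best first. It only shows the control flow, where a failure in a later step sends us back to step 1.

```python
from typing import Callable, Iterable, Optional, Tuple

def build_model(
    templates: Iterable[str],
    align: Callable[[str], Optional[str]],       # hypothetical step 2; None if no usable alignment
    build: Callable[[str, str], Optional[str]],  # hypothetical steps 3-5: loops, side chains, refinement
    acceptable: Callable[[str], bool],           # hypothetical step 6: model evaluation
) -> Optional[Tuple[str, str, str]]:
    """Try templates in ranked order; a failure later on sends us back to step 1."""
    for template in templates:                   # step 1: candidate templates, best first
        alignment = align(template)
        if alignment is None:
            continue                             # no usable alignment: try the next template
        model = build(template, alignment)
        if model is not None and acceptable(model):
            return template, alignment, model
        # the model failed evaluation: later steps cannot fix it, so go back to step 1
    return None
```

Going back replaces the faulty input rather than patching it downstream, which is the point of the slide: loop modeling or refinement is never asked to compensate for a wrong template.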
Template identification
Search with sequence
– Blast
– Psi-Blast
– Fold recognition methods
Use significance levels (P or E values), not %ID
BLAST reports E-values:
– the number of random hits expected to be found with a given score or better
Rather than P-values:
– the probability of finding at least one random hit with a given score or better
– $P = 1 - e^{-E}$, so $E = -\ln(1 - P)$
– http://www.ncbi.nlm.nih.gov/BLAST/tutorial/Altschul-1.html
Use biological information
– Functional annotation in databases
– Active site/motifs
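The E-to-P conversion, and the advice to rank candidate templates by significance rather than by %ID, can be sketched in Python. The parser assumes the hits were exported in BLAST+ tabular format (`-outfmt 6`) with its default 12 columns, where column 3 is percent identity and column 11 is the E-value; the function names are illustrative, not part of any BLAST API.

```python
import math
from typing import List, Tuple

def p_from_e(e: float) -> float:
    """P = 1 - exp(-E): probability of at least one random hit this good."""
    return -math.expm1(-e)          # expm1 keeps precision when E is tiny

def e_from_p(p: float) -> float:
    """Inverse relation: E = -ln(1 - P)."""
    return -math.log1p(-p)

def rank_hits(tabular: str) -> List[Tuple[float, str, float]]:
    """Return (E-value, subject id, %ID) tuples, most significant first."""
    hits = []
    for line in tabular.strip().splitlines():
        cols = line.split("\t")
        subject, pident, evalue = cols[1], float(cols[2]), float(cols[10])
        hits.append((evalue, subject, pident))
    return sorted(hits)             # sort on E-value, not on %ID
```

For small E the two measures are practically equal (E = 7e-30 gives P of about 7e-30); they only diverge for weak hits, where P approaches 1 while E keeps growing. Given a hypothetical short hit at 90% identity with E = 1.0 and a long hit at 28% identity with E = 7e-30, `rank_hits` puts the long, low-identity hit first.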
Example: Query sequence
>gi|2065035|emb|CAA65601.1| beta-lactamase [Chryseobacterium meningosepticum
MLKKIKISLILALGLTSLQAFGQENPDVKIEKLKDNLYVYTTYNTFNGTKYAANAVYLVTDKGVVVIDCP
WGEDKFKSFTDEIYKKHGKKVIMNIATHSHDDRAGGLEYFGKIGAKTYSTKMTDSILAKENKPRAQYTFD
NNKSFKVGKSEFQVYYPGKGHTADNVVVWFPKEKVLVGGCIIKSADSKDLGYIGEAYVNDWTQSVHNIQQ
KFSGAQYVVAGHDDWKDQRSIQHTLDLINEYQQKQKASN
Since the discovery of penicillin, bacteria have developed defense mechanisms
against these drugs. This has become a particular problem in recent decades,
as certain pathogenic bacteria have become resistant to antibiotics. The primary
defense mechanism is the production of beta-lactamases, enzymes that cleave
beta-lactam antibiotics.
http://www.matfys.kvl.dk/~antony/
http://www.ncbi.nlm.nih.gov/blast/
Blast search vs. pdb
>gi|3318914|pdb|1A7T|A Chain A, Metallo-Beta-Lactamase With Mes
 gi|3318915|pdb|1A7T|B Chain B, Metallo-Beta-Lactamase With Mes
 gi|3891997|pdb|1A8T|A Chain A, Metallo-Beta-Lactamase In Complex With L-159,061
 gi|3891998|pdb|1A8T|B Chain B, Metallo-Beta-Lactamase In Complex With L-159,061
          Length = 232

 Score = 126 bits (317), Expect = 7e-30
 Identities = 62/216 (28%), Positives = 111/216 (51%), Gaps = 1/216 (0%)

Query: 27  DVKIEKLKDNLYVYTTYNTFNG-TKYAANAVYLVTDKGVVVIDCPWGEDKFKSFTDEIYK 85
           D+ I +L D +Y Y +     G     +N + ++ +    ++D P  + + +   + +
Sbjct: 10  DISITQLSDKVYTYVSLAEIEGWGMVPSNGMIVINNHQAALLDTPINDAQTEMLVNWVTD 69

Query: 86  KHGKKVIMNIATHSHDDRAGGLEYFGKIGAKTYSTKMTDSILAKENKPRAQYTFDNNKSF 145
               KV   I  H H D  GGL Y  + G ++Y+ +MT  +  ++  P  ++ F ++ +
Sbjct: 70  SLHAKVTTFIPNHWHGDCIGGLGYLQRKGVQSYANQMTIDLAKEKGLPVPEHGFTDSLTV 129

Query: 146 KVGKSEFQVYYPGKGHTADNVVVWFPKEKVLVGGCIIKSADSKDLGYIGEAYVNDWTQSV 205
            +     Q YY G GH  DN+VVW P E +L GGC++K   +  +G I +A V  W +++
Sbjct: 130 SLDGMPLQCYYLGGGHATDNIVVWLPTENILFGGCMLKDNQTTSIGNISDADVTAWPKTL 189

Query: 206 HNIQQKFSGAQYVVAGHDDWKDQRSIQHTLDLINEY 241
             ++ KF  A+YVV GH ++     I+HT  ++N+Y
Sbjct: 190 DKVKAKFPSARYVVPGHGNYGGTELIEHTKQIVNQY 225
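The bit score and the Expect value in this output are linked by Karlin-Altschul statistics: for a bit score $S'$, a query of length $m$ and a database of total length $n$, $E = m \cdot n \cdot 2^{-S'}$. A minimal Python sketch follows. It treats `query_len` and `db_len` as the effective (edge-corrected) lengths, which BLAST computes internally and which are not shown here, so it is not meant to reproduce Expect = 7e-30 exactly.

```python
import math

def evalue_from_bits(bit_score: float, query_len: int, db_len: int) -> float:
    """Expected chance hits with at least this bit score: E = m * n * 2**(-S')."""
    return query_len * db_len * 2.0 ** (-bit_score)

def bits_for_evalue(e_value: float, query_len: int, db_len: int) -> float:
    """Bit score needed to reach a given E-value: S' = log2(m * n / E)."""
    return math.log2(query_len * db_len / e_value)
```

Each extra bit halves E, while doubling the database doubles it. The same alignment therefore gets a worse E-value against a larger database, which is one reason E-values are only comparable within one search.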
Template sequence
1A8TB. Chain B, Metallo-...[gi:3891998]
LOCUS       1A8T_B                   232 aa            linear   BCT 23-MAR-1998
DEFINITION  Chain B, Metallo-Beta-Lactamase In Complex With L-159,061.
ACCESSION   1A8T_B
VERSION     1A8T_B  GI:3891998
DBSOURCE    pdb: molecule 1A8T, chain 66, release Mar 23, 1998;
            deposition: Mar 23, 1998;
            class: Hydrolase;
            source: Mol_id: 1; Organism_scientific: Bacteroides Fragilis;
            Strain: Tal3636; Variant: Clinical Isolate; Gene: Ccra;
            Expression_system: Escherichia Coli;
            Exp. method: X-Ray Diffraction.
KEYWORDS    .
SOURCE      Bacteroides fragilis
  ORGANISM  Bacteroides fragilis
            Bacteria; Bacteroidetes; Bacteroides (class); Bacteroidales;
            Bacteroidaceae; Bacteroides.
……………
ORIGIN
        1 aqksvkisdd isitqlsdkv ytyvslaeie gwgmvpsngm ivinnhqaal ldtpindaqt
       61 emlvnwvtds lhakvttfip nhwhgdcigg lgylqrkgvq syanqmtidl akekglpvpe
      121 hgftdsltvs ldgmplqcyy lggghatdni vvwlptenil fggcmlkdnq ttsignisda
      181 dvtawpktld kvkakfpsar yvvpghgnyg gteliehtkq ivnqyiests kp
//
Template recognition
Figure: BlaB (beta-lactamase) and its template, 1A8T chain A
Alignment of query and template
Look at the alignment used to find the template
– Are secondary structure elements, active sites and other motifs aligned?
– Can gaps be closed?
– Is there room for the insertions?
Change the alignment manually or with a different alignment program/alignment parameters
– Take care not to change it for the worse
– On average I only make things slightly worse by manual intervention!
Alignment
BlaB – Beta lactamase
BLAB    EKLKDNLYVYTTYNTFNGTKY-AANAVYLVTDKGVVVIDCPWGEDKFKSFTDEIYKKHGKKVIMNIATHS
1A8T.A  TQLSDKVYTYVSLAEIEGWGMVPSNGMIVINNHQAALLDTPINDAQTEMLVNWVTDSLHAKVTTFIPNHW

BLAB    HDDRAGGLEYFGKIGAKTYSTKMTDSILAKENKPRAQYTFDNNKSFKVGKSEFQVYYPGKGHTADNVVVW
1A8T.A  HGDCIGGLGYLQRKGVQSYANQMTIDLAKEKGLPVPEHGFTDSLTVSLDGMPLQCYYLGGGHATDNIVVW

BLAB    FPKEKVLVGGCIIKSADSKDLGYIGEAYVNDWTQSVHNIQQKFSGAQYVVAGHDDWKDQRSIQHTLDLIN
1A8T.A  LPTENILFGGCMLKDNQTTSIGNISDADVTAWPKTLDKVKAKFPSARYVVPGHGNYGGTELIEHTKQIVN

BLAB    EYQQKQK
1A8T.A  QYIESTS
Sequence identity 27%
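Once the alignment is fixed, it defines which template residue each query residue is modeled on, and the identity quoted under it is a count over the aligned columns. The Python sketch below shows both under two assumptions: gaps are written as `-`, and identity is taken over gap-free columns only. Other tools divide by the full alignment length or by the length of the shorter sequence, so identity figures from different programs need not agree.

```python
from typing import Dict

GAP = "-"

def residue_mapping(query_aln: str, templ_aln: str,
                    q_start: int = 1, t_start: int = 1) -> Dict[int, int]:
    """Map query residue numbers to the template residue numbers they are aligned with."""
    if len(query_aln) != len(templ_aln):
        raise ValueError("aligned sequences must have the same length")
    mapping: Dict[int, int] = {}
    qi, ti = q_start, t_start
    for q, t in zip(query_aln, templ_aln):
        if q != GAP and t != GAP:
            mapping[qi] = ti          # query residue qi is modeled on template residue ti
        if q != GAP:
            qi += 1
        if t != GAP:
            ti += 1
    return mapping

def percent_identity(query_aln: str, templ_aln: str) -> float:
    """Identical residues as a percentage of gap-free aligned columns."""
    pairs = [(q, t) for q, t in zip(query_aln, templ_aln) if q != GAP and t != GAP]
    if not pairs:
        raise ValueError("no aligned columns")
    return 100.0 * sum(q == t for q, t in pairs) / len(pairs)
```

Query residues missing from the mapping sit in insertions relative to the template; these are the positions that have to be built from scratch in the loop-modeling step.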
Template vs alignment identification
If the template was hard to find, the correct alignment will be tough to make
If the template is correct, part of the model will normally be correct
Build loops
Fragment based methods
– Many implementations (M Levitt, L Holm, D Baker etc.)
– Fast
Energy based methods
– Avoid stereochemically infeasible solutions
– Can see what is bad but not what is good!
Combination of methods is often used
No method can move the model (very much) towards the native conformation, i.e. reduce the root mean square deviation (RMSD), which measures how many ångströms (Å) you are off
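The RMSD is $\mathrm{RMSD} = \sqrt{\tfrac{1}{N}\sum_{i=1}^{N} \lVert \mathbf{x}_i - \mathbf{y}_i \rVert^2}$ over $N$ matched atoms of the model ($\mathbf{x}$) and the native structure ($\mathbf{y}$). The Python sketch below assumes the two coordinate lists are already paired atom by atom and optimally superimposed; the superposition step itself (for example the Kabsch algorithm) is left out.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]

def rmsd(model: Sequence[Coord], native: Sequence[Coord]) -> float:
    """Root mean square deviation in the units of the input (e.g. Å).

    Assumes the atoms are already paired up and the structures superimposed.
    """
    if len(model) != len(native) or len(model) == 0:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum((a - b) ** 2
                  for p, q in zip(model, native)
                  for a, b in zip(p, q))
    return math.sqrt(squared / len(model))
```

Shifting every atom by 1 Å in the same direction gives an RMSD of 1 Å even though the shape is unchanged, which is why superposition has to come first.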
http://www.bioinfo.rpi.edu/~bystrc/hmmstr/server.php
Loops: The rosetta method
Find fragments (10 per amino acid) with the same sequence and secondary structure profile as the query sequence
Combine them using a Monte Carlo scheme to build the loop
Baker et al.
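The fragment-plus-Monte-Carlo idea can be illustrated with a toy Python sketch. Everything here is an assumption for illustration, not the Rosetta implementation: the loop is represented only by its (phi, psi) torsion angles, `fragments` is an assumed library mapping a start position to candidate torsion windows, and `score` is a hypothetical caller-supplied energy (lower is better). Each move inserts one random fragment and is accepted by the Metropolis criterion.

```python
import math
import random
from typing import Callable, Dict, List, Tuple

Torsions = List[Tuple[float, float]]   # (phi, psi) per loop residue

def fragment_assembly(
    start: Torsions,
    fragments: Dict[int, List[Torsions]],   # assumed library: window start -> candidate fragments
    score: Callable[[Torsions], float],      # hypothetical energy function, lower is better
    steps: int = 1000,
    temperature: float = 1.0,
    seed: int = 0,
) -> Tuple[Torsions, float]:
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if not fragments:
        raise ValueError("empty fragment library")
    for pos, frags in fragments.items():
        if any(pos < 0 or pos + len(f) > len(start) for f in frags):
            raise ValueError(f"fragment at position {pos} does not fit in the loop")
    rng = random.Random(seed)
    positions = sorted(fragments)
    current = list(start)
    current_score = score(current)
    best, best_score = list(current), current_score
    for _ in range(steps):
        pos = rng.choice(positions)
        frag = rng.choice(fragments[pos])
        trial = current[:pos] + list(frag) + current[pos + len(frag):]   # fragment insertion
        trial_score = score(trial)
        delta = trial_score - current_score
        # Metropolis criterion: accept downhill moves, uphill ones with prob. exp(-delta / T)
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_score = trial, trial_score
            if current_score < best_score:
                best, best_score = list(current), current_score
    return best, best_score
```

Raising `temperature` makes the search accept more uphill moves and explore more widely; the best conformation seen is returned rather than the last one.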
Model side chains
Knowledge based methods
– SCWRL performed well in CASP4
– (http://dunbrack.fccc.edu/SCWRL3.php, http://dunbrack.fccc.edu/scwrl3protsci.pdf)
Energy calculations
– Slow
SCWRL (Bower, Cohen & Dunbrack)
Sidechain placement With a Rotamer Library
Assumes constant bond angles and bond lengths
1. Each residue begins in its most favored rotamer
2. Rotamer search to remove steric clashes between side chains and backbone
3. Rotamer search to remove steric clashes between side chains
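The three steps can be sketched as a greedy search in Python. This is a simplified illustration under stated assumptions, not SCWRL's code: rotamers are identified by an index per residue, sorted so that index 0 is the most favored, and `backbone_clash` and `pair_clash` are hypothetical callables that in practice would test atomic overlaps. The sketch changes one residue at a time, whereas SCWRL searches rotamer combinations more thoroughly, so this version can get stuck in a local optimum.

```python
from typing import Callable, List

def place_side_chains(
    n_rotamers: List[int],                             # rotamers per residue; index 0 = most favored
    backbone_clash: Callable[[int, int], bool],        # hypothetical: (residue, rotamer) -> clash?
    pair_clash: Callable[[int, int, int, int], bool],  # hypothetical: (i, rot_i, j, rot_j) -> clash?
    max_rounds: int = 10,
) -> List[int]:
    n = len(n_rotamers)
    # Step 1: every residue starts in its most favored rotamer.
    choice = [0] * n
    # Step 2: replace rotamers that hit the backbone by the most favored one that does not
    # (if every rotamer clashes, the residue keeps rotamer 0).
    for i in range(n):
        for r in range(n_rotamers[i]):
            if not backbone_clash(i, r):
                choice[i] = r
                break

    def clash_count(i: int, r: int) -> int:
        return sum(pair_clash(i, r, j, choice[j]) for j in range(n) if j != i)

    # Step 3: greedily remove side-chain/side-chain clashes, one residue at a time.
    for _ in range(max_rounds):
        changed = False
        for i in range(n):
            best_r, best_c = choice[i], clash_count(i, choice[i])
            for r in range(n_rotamers[i]):       # tried in order of preference
                if backbone_clash(i, r):
                    continue
                c = clash_count(i, r)
                if c < best_c:                   # switch only for strictly fewer clashes
                    best_r, best_c = r, c
            if best_r != choice[i]:
                choice[i], changed = best_r, True
        if not changed:
            break
    return choice
```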
Model (red) vs template (blue)
Model evaluation
Is the structure unlikely?
Distributions of
– Dihedral angles (fraction in most favored regions)
– Bond lengths and angles
Procheck
– www.biochem.ucl.ac.uk/~roman/procheck/procheck.html
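The dihedral-angle check can be sketched in Python: compute phi/psi from backbone coordinates, then count how many residues fall in a favored region. The dihedral uses the standard atan2 formula with the IUPAC sign convention. The rectangular `FAVORED` boxes are crude placeholders chosen for illustration; they are not Procheck's region definitions, which are derived from observed distributions in well-resolved structures.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]

def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed dihedral angle in degrees about the p1-p2 bond (e.g. C-N-CA-C gives phi)."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

# Illustrative (phi_min, phi_max, psi_min, psi_max) boxes, NOT Procheck's regions.
FAVORED = [(-100.0, -30.0, -80.0, -10.0),    # rough right-handed helix area
           (-170.0, -50.0, 100.0, 180.0)]    # rough beta-sheet area

def fraction_favored(phi_psi: Sequence[Tuple[float, float]]) -> float:
    """Fraction of (phi, psi) pairs inside any of the FAVORED boxes."""
    if not phi_psi:
        raise ValueError("no residues")
    inside = sum(any(lo_phi <= phi <= hi_phi and lo_psi <= psi <= hi_psi
                     for lo_phi, hi_phi, lo_psi, hi_psi in FAVORED)
                 for phi, psi in phi_psi)
    return inside / len(phi_psi)
```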
Example of Procheck output
Benchmarking comparative modeling
CASP
– Critical Assessment of protein Structure Prediction
– Sequences from about-to-be-solved structures are given to groups, who submit their predictions before the structure is published
EVA
– Newly solved structures are sent to prediction servers
– Evaluates automatic servers
CASP4: Best overall fold
1. Venclovas, C
2. Baker, D
3. Sternberg, M
4. Rychlewski, L (Bioinfo.PL)
5. SBI-AT
Tramontano et al., 2001
CASP4: Best details of models
1. Venclovas, C
2. Sternberg, M
3. Honig, B
4. Baker, D
5. SBI-AT
Tramontano et al., 2001
Accuracy of SwissModel
http://cubic.bioc.columbia.edu/eva/cm/res/rank.html
EVA
Analysis of Fold accuracy (% Equivalent Positions):
Ranking of the methods:
1. sdsc1
2. 3djigsaw
3. SwissModel
4. cphmodels
5. esypred
Links to modeling servers
Database of links
– http://mmtsb.scripps.edu/cgi-bin/renderrelres?protmodel
SwissModel
– www.expasy.ch/swissmod/SM_FIRST.html
ESyPred3D
– http://www.fundp.ac.be/urbm/bioinfo/esypred/
SDSC1
– http://cl.sdsc.edu/hm.html
3D-Jigsaw
– www.bmm.icnet.uk/servers/3djigsaw/
CPHmodels
– www.cbs.dtu.dk/services/CPHmodels-2.0
Practical conclusions
Several servers exist in the public domain
Template and alignment must be correct
Loops are difficult to model
More info on comparative modeling
– http://speedy.embl-heidelberg.de/gtsp/
– http://www.cmbi.kun.nl/gv/course/index.html
– http://www.umass.edu/microbio/chime/explorer/homolmod.htm