Shoba Ranganathan - BioInformatics Center, NUS
Biomolecular Modeling: building a 3D protein structure from its sequence
Prof Shoba Ranganathan
Dept. of Chemistry and Biomolecular Sciences,
Macquarie University, Sydney, Australia &
Dept of Biochemistry, Yong Loo Lin School of Medicine
National University of Singapore
Why protein structure?
In the factory of the living cell, proteins are the workers, performing a variety of tasks
Each protein adopts a particular folding pattern that determines its function
The 3D structure of a protein brings into close proximity residues that are far apart in the amino acid sequence
How does a protein fold?
Most newly synthesized proteins fold without assistance!
Ribonuclease A: the denatured protein refolds and recovers its activity (C. Anfinsen, 1966)
“Structure implies function”
The amino acid sequence encodes the protein’s structural information
1. Understanding Protein Structure
2. A Quick Overview of Sequence Analysis
3. Finding a Structural Homologue
4. Template Selection
5. Aligning the Query Sequence to Template Structure(s)
6. Building the Model
The basics
Proteins are linear heteropolymers made of one or more polypeptide chains
Repeat units: 20 types of amino acid residue
Chain length ranges from a few tens to thousands of residues
The three-dimensional shapes (“folds”) they adopt vary enormously
Experimental methods: X-ray crystallography, electron microscopy and NMR (nuclear magnetic resonance)
The (L-)amino acid
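[Figure: an L-amino acid: amino group (N, positively charged), Cα atom carrying the side chain R (= H, CH3, …), and carboxylate group (C, O, O, negatively charged); N, Cα and C form the backbone]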
The peptide bond
[Figure: the peptide bond, with its coplanar atoms highlighted]
Levels of protein structure
Zeroth: amino acid composition
Primary
This is simply the order of covalent linkages
along the polypeptide chain, i.e. the sequence
itself
Levels of protein structure
Secondary
Local organization of the protein backbone: α-helix, β-strand (strands assemble into β-sheets), turn and interconnecting loop
[Figure: Ramachandran (φ-ψ) plot, with φ on the horizontal axis and ψ on the vertical axis, showing the β-sheet, right-handed α-helix and left-handed α-helix regions]
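The φ and ψ angles on a Ramachandran plot are backbone dihedral (torsion) angles: φ is defined by the atoms C(i-1), N(i), Cα(i), C(i), and ψ by N(i), Cα(i), C(i), N(i+1). A minimal sketch of the standard dihedral-angle formula, assuming each atom is given as an (x, y, z) tuple in Å; the helper names are illustrative, not from any particular package.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees (-180..180) about the p1-p2 bond."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)                 # normal to the plane p0-p1-p2
    n2 = _cross(b2, b3)                 # normal to the plane p1-p2-p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

def phi(c_prev, n, ca, c):
    """Backbone phi of residue i from C(i-1), N(i), CA(i), C(i)."""
    return dihedral(c_prev, n, ca, c)

def psi(n, ca, c, n_next):
    """Backbone psi of residue i from N(i), CA(i), C(i), N(i+1)."""
    return dihedral(n, ca, c, n_next)
```

For example, dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 0, 1)) gives 90 degrees (positive, i.e. clockwise under the IUPAC convention), while moving the last point to (1, -1, 0) gives the trans value of ±180.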
Levels of protein structure
Tertiary
packing of secondary
structure elements into
a compact spatial unit
“Fold” or domain – this
is the level to which
structure prediction is
currently possible
Levels of protein structure
Quaternary
Assembly of homo- or
heteromeric protein
chains
Usually the functional
unit of a protein,
especially for enzymes
Structural classes
All-α (helical)
All-β (sheet)
α/β (parallel β-sheet)
α+β (antiparallel β-sheet)
Structural information
Protein Data Bank: maintained by the
Research Collaboratory for Structural
Bioinformatics
http://www.rcsb.org/pdb
> 45,744 structures of proteins
Also contains structures of DNA,
carbohydrates, protein-DNA complexes
and numerous small ligand molecules.
The PDB data
Text files
Each entry is identified by a unique 4-character code, e.g. 1emg
1emg entry:
Header information
Atomic coordinates in Å (1 ångström = 1.0 × 10⁻¹⁰ m)
PDB Header details
The HEADER, TITLE and COMPND records identify the molecule, any modifications, and the deposition date of the PDB entry:
HEADER    GREEN FLUORESCENT PROTEIN               12-NOV-98   1EMG
TITLE     GREEN FLUORESCENT PROTEIN (65-67 REPLACED BY CRO, S65T
TITLE    2 SUBSTITUTION, Q80R)
COMPND    MOL_ID: 1;
COMPND   2 MOLECULE: GREEN FLUORESCENT PROTEIN;
COMPND   3 CHAIN: A;
COMPND   4 ENGINEERED: YES;
COMPND   5 MUTATION: 65 - 67 REPLACED BY CRO, S65T SUBSTITUTION, Q80R
COMPND   6 SUBSTITUTION;
COMPND   7 BIOLOGICAL_UNIT: MONOMER
Further header records give the organism, keywords and experimental method; the authors, reference and resolution (for X-ray structures); and the sequence, with cross-references to sequence databases.
The data itself
Coordinates for each heavy (non-hydrogen) atom
from the first residue to the last
Columns: record type, atom serial number, atom name, residue name, chain, residue number, x, y, z (Å), occupancy, temperature (B) factor
ATOM      1  N   SER A   2      29.089   9.397  51.904  1.00 81.75
ATOM      2  CA  SER A   2      27.883  10.162  52.185  1.00 79.71
ATOM      3  C   SER A   2      26.659   9.634  51.463  1.00 82.64
ATOM      4  O   SER A   2      26.718   8.686  50.686  1.00 81.02
ATOM      5  CB  SER A   2      28.039  11.660  51.932  1.00 75.59
ATOM      6  OG  SER A   2      27.582  12.038  50.639  1.00 43.28
...
ATOM   1737  CD1 ILE A 229      39.535  21.584  52.346  1.00 41.62
TER    1738      ILE A 229
Any ligands (HETATM records) follow the macromolecule
Oxygen atoms of water molecules (also HETATM records) come at the end
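PDB coordinate records use fixed columns rather than separators, so reading them means slicing by column position. A minimal sketch, assuming the legacy PDB format layout (record name in columns 1-6, serial 7-11, atom name 13-16, residue name 18-20, chain 22, residue number 23-26, x/y/z 31-54, occupancy 55-60, B-factor 61-66); the `Atom` class and function names are illustrative, and real files may need extra care (alternate locations, insertion codes, multiple models).

```python
from dataclasses import dataclass

@dataclass
class Atom:
    record: str       # "ATOM" or "HETATM"
    serial: int
    name: str         # atom name, e.g. "CA"
    res_name: str     # residue name, e.g. "SER"
    chain: str
    res_seq: int
    x: float
    y: float
    z: float
    occupancy: float
    b_factor: float

def parse_atom_line(line: str) -> Atom:
    """Parse one ATOM/HETATM record by fixed columns (0-based slices)."""
    return Atom(
        record=line[0:6].strip(),
        serial=int(line[6:11]),
        name=line[12:16].strip(),
        res_name=line[17:20].strip(),
        chain=line[21],
        res_seq=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
        occupancy=float(line[54:60]),
        b_factor=float(line[60:66]),
    )

def read_atoms(lines, include_water=False):
    """Yield Atom objects; water (HOH) HETATM records are skipped unless requested."""
    for line in lines:
        if line.startswith(("ATOM  ", "HETATM")):
            atom = parse_atom_line(line)
            if atom.res_name == "HOH" and not include_water:
                continue
            yield atom

# First two atoms of residue SER 2 in 1EMG, as listed above
sample = [
    "ATOM      1  N   SER A   2      29.089   9.397  51.904  1.00 81.75",
    "ATOM      2  CA  SER A   2      27.883  10.162  52.185  1.00 79.71",
]
for atom in read_atoms(sample):
    print(atom.name, atom.res_name, atom.res_seq, atom.x, atom.y, atom.z)
```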
Structural Families
SCOP - Structural Classification Of Proteins: http://scop.mrc-lmb.cam.ac.uk/scop
FSSP - Family of Structurally Similar Proteins: http://www.ebi.ac.uk/dali/fssp/
CATH - Class, Architecture, Topology, Homology: http://www.biochem.ucl.ac.uk/bsm/cath
Structure comparison facts
Proteins adopt a limited number of
topologies.
Homologous sequences have very similar structures, with strong conservation of secondary structure elements; variations occur mainly in non-conserved regions.
In the absence of sequence
homology, some folds are preferred
by vastly different sequences.
Structure comparison facts
The “active site” (a collection of functionally
critical residues) is remarkably conserved,
even when the protein fold is different.
Structural models (especially those based
on homology) provide insights into
possible function for new proteins.
Implications for:
protein engineering
ligand/drug design
function assignment of genomic data
Visualizing PDB information
RASMOL: most popular, available for all platforms (Sayle et al, 1995)
http://www.bernstein-plus-sons.com/software/rasmol
DeepView Swiss-PDBViewer: from Swiss-Prot
(Guex & Peitsch, 1997)
http://tw.expasy.org/spdbv/
Chemscape Chime Plug-in: for PC and Mac
http://www.mdli.com/products/framework/chemscape
PyMOL: Very good, available for all platforms
(DeLano, W.L. The PyMOL Molecular Graphics System, 2002)
http://pymol.sourceforge.net
RASMOL views - SH2 domain
[Figure: all-atom and space-filling models of an SH2 domain, with atoms colored by element (N, O, C, S)]
RASMOL views – 1sha
[Figure: Cα trace and ribbon views of 1sha, with rainbow coloring from N- to C-terminus and coloring by structural units]
Homologous folds
Hemoglobin and
erythrocruorin: 31%
sequence identity
Analogous folds
Hemoglobin and
phycocyanin: 9%
sequence identity
Surface Properties
Cro repressor –
DNA complex
Basic residues
in blue
Acidic residues
in red
Mapping Functional Regions
Immunoglobulin λ light chain dimer
Hydrophobic residues in magenta
Hydrophilic and charged residues in cyan
2. A Quick Overview of Sequence Analysis
Siblings and Cousins
Siblings or homologues: sequences with at
least 30% sequence identity over an
alignment length of at least 125 residues and
conservation of function.
Cousins or paralogues: < 30% identity but
with conservation of function
Both show structural conservation
Homologues located using a database search
tool such as BLAST (free webserver):
http://www.ncbi.nlm.nih.gov/BLAST
Paralogues require a more sensitive method
such as PSI-BLAST
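The sibling/cousin distinction above rests on percent identity over an alignment. A minimal sketch, assuming identity is counted over aligned columns where neither sequence has a gap (other conventions divide by the shorter sequence or by the full alignment length, which gives different numbers); the 30% and 125-residue thresholds are the rule of thumb stated above, and the function names are illustrative.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identity over aligned columns in which neither sequence has a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must be the same length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    identical = sum(1 for a, b in pairs if a == b)
    return 100.0 * identical / len(pairs)

def relationship(identity: float, aligned_length: int, function_conserved: bool) -> str:
    """Apply the sibling/cousin rule of thumb described above."""
    if not function_conserved:
        return "unclassified"
    if identity >= 30.0 and aligned_length >= 125:
        return "sibling (homologue)"
    if identity < 30.0:
        return "cousin (paralogue)"
    return "unclassified"  # >= 30% identity, but over fewer than 125 residues

print(percent_identity("ACDE-", "ACDFG"))   # 75.0
print(relationship(45.0, 200, True))
```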
Multiple Sequence Alignment
Finding the best way to match the residues of
related sequences
Identical residues must be lined up
The rest should be arranged, based on
observed substitution in protein families
chemical similarity
charge similarity
Where the residues cannot be lined up, the biological concept of insertion/deletion is invoked: the ‘gap’ in alignments
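Lining up residues and inserting gaps where they cannot be matched is what dynamic-programming alignment formalizes. A minimal global (Needleman-Wunsch) sketch, assuming a toy scoring scheme (match +2, mismatch -1, linear gap -2); real tools such as CLUSTALW use substitution matrices and separate gap-opening and gap-extension penalties, and build multiple alignments progressively from pairwise comparisons.

```python
def needleman_wunsch(a: str, b: str, match=2, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for an optimal global alignment."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap            # leading gaps in b
    for j in range(1, m + 1):
        score[0][j] = j * gap            # leading gaps in a

    def pair(x, y):
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,     # gap in b
                              score[i][j - 1] + gap)     # gap in a
    # Trace back from the bottom-right corner to recover one optimal alignment
    out_a, out_b, i, j = [], [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))

print(needleman_wunsch("ACDEFG", "ACDFG"))   # (8, 'ACDEFG', 'ACD-FG')
```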
MSA Methods
CLUSTALW / CLUSTALX (Thompson et al, 1997):
freely available for all platforms and one of the best
alignment programs
http://www-igbmc.u-strasbg.fr/BioInfo/ClustalX/Top.html
MAXHOM (Sander & Schneider, 1991): alignment
based on maximum homology; available via the
PredictProtein webserver, free for academics
http://cubic.bioc.columbia.edu/predictprotein/
MALIGN (Johnson et al, 1994): freely available UNIX
program, based on the structural alignment of
protein families
http://www.abo.fi/fak/mnf/bkf/research/johnson/software.html
Alignment Checks
Conservation of functionally important residues: e.g. the catalytic triad (Asp-Ser-His) that is essential for serine proteinase activity
Alignment of structurally important residues: e.g. cysteines forming disulfide bonds
Overall, maximizing the alignment of “like” residues
Completely conserved residues usually indicate a conserved structural or functional role, especially buried charges
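Spotting completely conserved columns, and checking that cysteines line up, is easy to automate. A minimal sketch using the second block of the factor H alignment shown later in this lecture; it marks only fully identical, gap-free columns with '*' (ClustalW also uses ':' and '.' for conservative substitutions, which this sketch omits), and the function names are illustrative.

```python
FH_ALIGNMENT = [
    "CHPGYALPKA-QTTVTCMENGWSPTPRCIR",  # hfH   416-444
    "CHPGYGLPKVRQTTVTCTENGWSPTPRCIR",  # fHR-3 114-143
    "CYEGYSLQND-QNTMTCTESGWSPPPRCIR",  # bfH   323-351
    "CYNGYSLQNG-QDTMTCTENGWSPPPKCIR",  # mfH   416-444
]

def conserved_columns(aligned_seqs):
    """0-based indices of columns in which every sequence has the same residue (no gaps)."""
    if len({len(s) for s in aligned_seqs}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    return [i for i, column in enumerate(zip(*aligned_seqs))
            if column[0] != "-" and len(set(column)) == 1]

def conservation_line(aligned_seqs):
    """ClustalW-style line, but marking only fully conserved columns with '*'."""
    marks = set(conserved_columns(aligned_seqs))
    return "".join("*" if i in marks else " " for i in range(len(aligned_seqs[0])))

cols = conserved_columns(FH_ALIGNMENT)
conserved_cys = [i + 1 for i in cols if FH_ALIGNMENT[0][i] == "C"]  # 1-based columns
print(FH_ALIGNMENT[0])
print(conservation_line(FH_ALIGNMENT))
print("fully conserved Cys at columns:", conserved_cys)
```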
Sequence Motifs & Patterns
From the analysis of the alignment of
protein families
Conserved sequence features, usually
associated with a specific function
PROSITE (Hulo et al, 2006) database for
protein “signature” patterns:
http://www.expasy.ch/prosite
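PROSITE patterns map almost one-to-one onto regular expressions: x is any residue, [..] lists allowed residues, {..} lists forbidden ones, (n) or (n,m) gives a repeat count, and < or > anchors the N- or C-terminus. A minimal converter for that common subset, exercised on the C2H2 zinc finger pattern, assumed here to be PROSITE PS00028, C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H (check it against the current PROSITE entry). It does not handle anchors written inside brackets such as [G>].

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Convert a simple PROSITE pattern to a Python regular expression."""
    regex = []
    for element in pattern.rstrip(".").split("-"):
        prefix = "^" if element.startswith("<") else ""
        suffix = "$" if element.endswith(">") else ""
        element = element.strip("<>")
        # Split an element such as x(2,4) into its core and optional repeat counts
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core.lower() == "x":
            core = "."                          # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"      # forbidden residues
        # [..] and single residues already have the same meaning in regex syntax
        if low and high:
            core += "{%s,%s}" % (low, high)
        elif low:
            core += "{%s}" % low
        regex.append(prefix + core + suffix)
    return "".join(regex)

zinc_finger = prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H.")
print(zinc_finger)  # C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H
```

The resulting string can be passed to re.search to scan a query sequence for the motif.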
Aligned Sequence Families
From alignments of homologous
sequences:
PRINTS
PRODOM:
http://www.toulouse.inra.fr/prodom.html
From Hidden Markov Model based
methods:
PFAM: http://www.sanger.ac.uk/Pfam
Protein Domains
Most proteins are composed of structural
subunits called domains
A domain is a compact unit of protein
structure, usually associated with a function.
It is usually a “fold” - in the case of
monomeric soluble proteins.
A domain normally comprises only one protein chain; rare examples involving 2 chains are known.
Domains can be shared between different
proteins: like a LEGO block
Protein Architectures
Beads-on-a-string: sequential location:
tyrosine-protein kinase receptor TIE-1
(immunoglobulin, EGF, fibronectin type-3 and
protein kinase).
Domain insertions: “plugged-in” - pyruvate
kinase (1pyk)
SMART: smart.embl-heidelberg.de
Simple Modular Architecture Retrieval Tool
Dissection into Domains
A sequence longer than about 125 residues should be routinely checked to see how many domains are present.
Conserved Domain Architecture Retrieval
Tool (CDART) uses information in Pfam and
SMART to assign domains along a sequence
E.g. NP_002917 shows similarity to G-protein
regulators:
3. Finding a Structural Homologue
Structural Homologues
BLASTP vs. PDB database or PSI-BLAST:
look for 4-character PDB ID
E < 0.005
Domain coverage: at least 60% coverage
is recommended
Gaps: we don’t want them. Choose
between:
few gaps and reasonable similarity scores or
lots of gaps and high similarity scores?
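The first two criteria above (E < 0.005 and at least 60% coverage of the query domain) can be applied as a simple filter over search hits. A minimal sketch, assuming the hits have already been parsed from BLASTP or PSI-BLAST output into the hypothetical `Hit` structure below; the subject names in the usage are made up, not real PDB entries.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    subject: str     # subject identifier, e.g. a 4-character PDB code
    evalue: float
    q_start: int     # first query residue covered by the alignment (1-based)
    q_end: int       # last query residue covered

def coverage(hit: Hit, dom_start: int, dom_end: int) -> float:
    """Fraction of the query domain [dom_start, dom_end] covered by the hit."""
    overlap = min(hit.q_end, dom_end) - max(hit.q_start, dom_start) + 1
    return max(0, overlap) / (dom_end - dom_start + 1)

def candidate_templates(hits, dom_start, dom_end, max_evalue=0.005, min_coverage=0.60):
    """Keep hits with E < max_evalue and enough domain coverage, best E-value first."""
    keep = [h for h in hits
            if h.evalue < max_evalue and coverage(h, dom_start, dom_end) >= min_coverage]
    return sorted(keep, key=lambda h: h.evalue)

hits = [Hit("hitA", 1e-30, 1, 100), Hit("hitB", 0.01, 1, 100),
        Hit("hitC", 1e-10, 1, 40), Hit("hitD", 1e-4, 20, 100)]
print([h.subject for h in candidate_templates(hits, 1, 100)])  # ['hitA', 'hitD']
```

Gap content is deliberately left out of the filter: choosing between fewer gaps and higher similarity is the judgment call described above.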
Small Proteins: Disulfide bonds
BLAST-type methods may not locate homologues if the Conserved Domain search is not turned on:
gnl|Pfam|pfam00095, wap, WAP-type (Whey Acidic Protein) 'four-disulfide core'.
CD-Length = 46 residues, 100.0% aligned
Score = 43.9 bits (102), Expect = 1e-06
Q: 49 KAGFCPWNLLQMISSTGPCPMKIECSSDRECSGNMKCCNVDCVMTCTPP 97
D:  1 KPGVCPWVSISE---AGQCLELNPCQSDEECPGNKKCCPGSCGMSCLTP 46
Are the Cys residues conserved?
Gaps: where are they on the structure?
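Both questions above can be answered directly from the query (Q) and domain (D) lines of the WAP alignment. A minimal sketch that numbers the query residues from 49, lists every column containing a cysteine, and reports which query residues sit opposite gaps in the domain; the function names are illustrative.

```python
QUERY = "KAGFCPWNLLQMISSTGPCPMKIECSSDRECSGNMKCCNVDCVMTCTPP"   # Q: 49-97
DOMAIN = "KPGVCPWVSISE---AGQCLELNPCQSDEECPGNKKCCPGSCGMSCLTP"  # D: 1-46 (pfam00095)
QUERY_START = 49

def column_residue_numbers(aligned, start):
    """Residue number for each alignment column (None where the sequence has a gap)."""
    numbers, current = [], start - 1
    for aa in aligned:
        if aa == "-":
            numbers.append(None)
        else:
            current += 1
            numbers.append(current)
    return numbers

def cysteine_check(query, domain, query_start):
    """(query residue number, query aa, domain aa) for every column containing a Cys."""
    q_numbers = column_residue_numbers(query, query_start)
    return [(n, q, d) for n, q, d in zip(q_numbers, query, domain) if "C" in (q, d)]

def query_residues_opposite_gaps(query, domain, query_start):
    """Query residue numbers aligned to gaps in the domain sequence."""
    q_numbers = column_residue_numbers(query, query_start)
    return [n for n, d in zip(q_numbers, domain) if d == "-" and n is not None]

cys = cysteine_check(QUERY, DOMAIN, QUERY_START)
print("all Cys conserved:", all(q == d == "C" for _, q, d in cys))
print("query Cys:", [n for n, _, _ in cys])
print("query residues opposite domain gaps:",
      query_residues_opposite_gaps(QUERY, DOMAIN, QUERY_START))
```

All eight cysteines of the four-disulfide core line up, and the single 3-residue gap falls at query residues 61-63, whose location on the structure can then be inspected.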
Metal-binding domains
C2H2 Zinc Finger: 2 Cys & 2 His binding to zinc
Not detected even by CD-search in BLAST
Detected by Pfam & SMART
Sequence pattern: #-X-C-X(1-5)-C-X3-#-X5-#-X2-H-X(3-6)-[H/C]
Structure Prediction Methods
Secondary Structure Prediction: identify local
structural elements such as helices, strands
and loops.
> 75% accuracy achievable
PredictProtein or PHD
http://cubic.bioc.columbia.edu/pp/
PSIPRED
http://bioinf.cs.ucl.ac.uk/psipred/
SSPro
http://promoter.ics.uci.edu/BRNN-PRED/
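Secondary structure prediction accuracy, such as the > 75% figure above, is usually quoted as Q3: the percentage of residues whose three-state assignment (helix H, strand E, coil C) matches the observed state. A minimal sketch, assuming prediction and observation are already equal-length three-state strings; reducing DSSP's eight states to three is a separate step not shown here.

```python
def q3(predicted: str, observed: str) -> float:
    """Percentage of residues whose 3-state prediction (H, E or C) matches the observed state."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty strings of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)

print(q3("HHHHEEEECC", "HHHCEEEECC"))  # 90.0
```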
Folds from Secondary
Structure Predictions
Assembling SSEs into folds is a combinatorial
problem
Current methods depend on available
structural data for mapping predictions:
FORESST
http://abs.cit.nih.gov/foresst/foresst.html
TOPITS from the PHD server
http://cubic.bioc.columbia.edu/pp
Tertiary Structure Prediction
Fold recognition/threading: typically used when sequence identity is < 20%
Best results obtained by combining several
database search and knowledge-based tools:
3D-PSSM
http://www.sbg.bio.ic.ac.uk/~3dpssm/
FUGUE
http://www-cryst.bioc.cam.ac.uk/fugue/
4. Template Selection
One or many templates?
Sequence similarity: extract template sequences and align them with the query; select the most similar structure
Completeness: missing data? Check REMARK 465 (missing residues) and REMARK 470 (missing atoms), e.g.:
REMARK 465 MISSING RESIDUES
REMARK 465 THE FOLLOWING RESIDUES WERE NOT LOCATED IN THE
REMARK 465 EXPERIMENT. (M=MODEL NUMBER; RES=RESIDUE NAME; C=CHAIN
REMARK 465 IDENTIFIER; SSSEQ=SEQUENCE NUMBER; I=INSERTION CODE.)
REMARK 465
REMARK 465   M RES C SSSEQI
REMARK 465     MET A     1
REMARK 465     THR A   230
REMARK 470   M RES CSSEQI  ATOMS
REMARK 470     GLU A   5    OE2
REMARK 470     GLU A   6    CG   CD   OE1  OE2
REMARK 470     GLU A  17    OE1
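Checking template completeness can start by listing the residues reported in REMARK 465. A minimal sketch, assuming the record layout shown above (a column header "M RES C SSSEQI" followed by one residue per line); the residue number is kept as a string because it may carry an insertion code, and the function name is illustrative.

```python
def missing_residues(pdb_lines):
    """List (residue name, chain, residue number) tuples from REMARK 465 records."""
    missing, in_table = [], False
    for line in pdb_lines:
        if not line.startswith("REMARK 465"):
            continue
        fields = line[10:].split()
        if fields[:4] == ["M", "RES", "C", "SSSEQI"]:
            in_table = True              # the column header precedes the data rows
        elif in_table and len(fields) >= 3:
            res_name, chain, seq_id = fields[-3], fields[-2], fields[-1]
            missing.append((res_name, chain, seq_id))
    return missing

SAMPLE = [
    "REMARK 465 MISSING RESIDUES",
    "REMARK 465 THE FOLLOWING RESIDUES WERE NOT LOCATED IN THE",
    "REMARK 465 EXPERIMENT. (M=MODEL NUMBER; RES=RESIDUE NAME; C=CHAIN",
    "REMARK 465 IDENTIFIER; SSSEQ=SEQUENCE NUMBER; I=INSERTION CODE.)",
    "REMARK 465",
    "REMARK 465   M RES C SSSEQI",
    "REMARK 465     MET A     1",
    "REMARK 465     THR A   230",
]
print(missing_residues(SAMPLE))  # [('MET', 'A', '1'), ('THR', 'A', '230')]
```

Here only the terminal residues are missing, which matters less for modelling than a gap inside a region the query needs.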
One or many templates?
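X-ray or NMR?:
Prefer the X-ray structure with the best resolution (lowest value in Å)
Prefer X-ray structures over NMR structures
For NMR, use an average over the ensemble of models
One or many?:
Structurally align the Cα atoms of the candidate templates
If 2 templates are very close, keep only one
Keep templates that provide new information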
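Deciding whether two templates are "very close" usually comes down to the Cα RMSD after structural superposition. A minimal sketch, assuming the coordinates are already superposed and restricted to structurally equivalent positions (the superposition itself, e.g. by the Kabsch method or a structure alignment program, is not shown); the 1.0 Å redundancy threshold is a hypothetical choice.

```python
import math

def ca_rmsd(coords_a, coords_b):
    """RMSD (Å) between two equal-length lists of already superposed Cα coordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))

def drop_redundant(templates, threshold=1.0):
    """Keep a template only if it differs by > threshold Å from every template kept so far.

    templates: dict name -> Cα coordinates, in order of preference
    (e.g. best resolution first).
    """
    kept = []
    for name, coords in templates.items():
        if all(ca_rmsd(coords, templates[k]) > threshold for k in kept):
            kept.append(name)
    return kept

templates = {
    "t1": [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)],
    "t2": [(0.0, 0.0, 0.1), (3.8, 0.0, 0.1)],   # almost identical to t1
    "t3": [(0.0, 0.0, 3.0), (3.8, 0.0, 3.0)],
}
print(drop_redundant(templates))  # ['t1', 't3']
```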
Many templates
Sequence alignment from structure
comparison of templates (SSA) can be
different from a simple sequence
alignment (SA).
For model building,
1. align templates structurally
2. extract the corresponding SSA
5. Aligning the Query Sequence to the Template Structure(s)
Query - Template Alignment
>40% identity: any alignment method is OK
Below this, checks are essential.
Collect close sequence homologues (about 10)
and align to query to get MSA (multiple sequence
alignment)
Collect several structural templates (at least 5)
and align them using structure comparison
methods: extract the SSA (structural sequence
alignment)
Align MSA to SSA using profile alignment
Extract query and selected template(s) from the
final alignment – QTA.
QTA Checks
Residue conservation checks
Indels
Functional regions
Patterns/motifs conserved?
Combine gaps separated by few residues
Editing the alignment
Move gaps from secondary structures to loops
Within loops, move gaps to the loop ends, i.e. the turnaround point of the backbone
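Finding gaps that need to be moved out of secondary structures can be automated if the template's secondary structure is known (e.g. from DSSP). A minimal sketch, assuming one secondary structure letter per template residue ('H' helix, 'E' strand, anything else loop); a query insertion is flagged when both flanking template residues are in the same kind of element, which can misfire in the rare case of two adjacent but distinct helices. The sequences in the usage are hypothetical.

```python
def gaps_in_sse(query_aln: str, template_aln: str, template_ss: str) -> list:
    """Return alignment columns (0-based) whose gap falls inside a template helix or strand."""
    if len(query_aln) != len(template_aln):
        raise ValueError("aligned sequences must be the same length")
    flagged = []
    t = 0  # number of template residues passed so far
    for col, (q_res, t_res) in enumerate(zip(query_aln, template_aln)):
        if t_res != "-":
            # deletion in the query: this template residue has no partner
            if q_res == "-" and template_ss[t] in "HE":
                flagged.append(col)
            t += 1
        elif q_res != "-":
            # insertion in the query: look at the template residues on either side
            before = template_ss[t - 1] if t > 0 else "-"
            after = template_ss[t] if t < len(template_ss) else "-"
            if before in "HE" and before == after:
                flagged.append(col)
    return flagged

# Template helix at residues 3-6; the query deletes a helical residue
print(gaps_in_sse("ACD-FGHIK", "ACDEFGHIK", "--HHHH---"))  # [3]
```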
Visual Inspection of Indels
[Figure: a 2-residue deletion placed as in the sequence alignment, compared with the same 2-residue deletion moved to the end of the loop]
6. Building the Model
Input for Model Building
Query sequence
Template structure
Template sequence
Query-template sequence alignment
Methods Available
1. WHATIF (Vriend G, 1990):
High quality models where template is
available
Indels not modelled
Side chain rotamers
In silico mutations
In silico disulfide bond creation
Methods Available
2. SWISS-MODEL (Schwede et al, 2003):
Automatic modeling mode with multiple
templates
Query + template input
High Homology situations
DeepView for input file creation
Methods Available
3. MODELLER (Sali & Blundell, 1993):
High quality models
Sequence alignment
Structure analysis/alignment
Multiple templates
Multiple chains
Ligand/cofactor present
4. ESyPred3D (uses MODELLER):
QTAs from several methods & neural networks
http://www.fundp.ac.be/urbm/bioinfo/esypred/
Methods Available
5. ICM (Abagyan et al, 1994):
High quality models
Loop modelling
Multiple templates not possible
Sequence/Structure alignment/analysis
Ab initio peptide modeling
Secondary structure prediction
6. Geno3D (Combet et al, 2002):
Automated modelling
Distance geometry used for loops
http://geno
Methods Available
7. 3D-JIGSAW (Bates et al, 2001):
Automatic modeling mode
Interactive user mode to select templates
Multiple templates
Multidomain protein modeling
Methods Available
8. CPH-MODELS (Lund et al, 1997):
Fully automated
FASTA search for templates
Not validated
Automatic or Manual Mode?
Automatic: High homology
Manual
Medium/Low homology
Template from structure prediction
Multiple templates
Multiple chains
Ligand present
How good is the model?
Structural Quality Analysis
PROCHECK (Laskowski et al, 1993)
WHATIF (Vriend G, 1990)
ERRAT (Colovos & Yeates, 1993)
Improving ill-defined regions
Iterative model building
Rebuild or anneal bad regions
Check/edit alignment and rebuild
Molecular dynamics and/or Monte Carlo
simulations
Compute intensive
Input files need to be set up
Optional
Molecular Modeling Protocol
Resources required
The query sequence
Personal computer with internet connectivity
RASMOL/DeepView
for PDB structure
visualization
CLUSTALX sequence alignment software
Access to a UNIX workstation
MODELLER/ICM – UNIX software
WHATIF – UNIX/PC software
PROCHECK – UNIX or Windows software
MM Protocol – Input Files
Minimum requirement
Query sequence
Template structure
Template sequence
Query-Template alignment
Ex 1. High Homology Case
Human SOX9 WT - homologous to
SRY (PDB: 1HRY) - 49% identity
S9WT: ..AGAACAATGG.. (highest)
SOXCORE: ..GCAACAATCT.. (least)
Mutants (campomelic dysplasia):
F12L: No DNA binding
H65Y: Minimal binding
P70R: altered specificity; no SOXCORE
A19V: near WT but normal binding
1. SOX9 Models
WT & P70R models built, based on SRY
Cα overlay: WT-SRY ~ 0.72 Å
[Figure: SOX9 & P70R models]
J. Biol. Chem. 274 (1999) 24023
Ex 1. SOX9
+ DNA Models
[Figure: SOX9-WT, SRY and SOX9-P70R models with DNA]
Observed disease-linked mutations mapped
Other residues in the DNA-binding groove determined
Ex 2. Low Homology Situation
Pigments from reef-building corals:
similar to Pocilloporin
fluoresce under UV and visible
radiation
similar to the Green Fluorescent
Protein - GFP (19.6% identity)
contain ‘QYG’ instead of GFP’s ‘SYG’ as the proposed fluorophore
2. Alignment of POC4 & GFP
2. POC4 Model
Barrel ends open
C-ter not included
β-sheet OK
‘QYG’ fits the site!
26 residues within 5 Å of QYG (only 19 in GFP)
Increased thermal stability
UV protection
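A count such as "26 residues within 5 Å of QYG" is a simple distance query on the model. A minimal sketch, assuming a residue counts as near if any of its atoms lies within the cutoff of any chromophore atom (other definitions, e.g. side-chain atoms only, give different counts); the coordinates below are toy values, not taken from the POC4 model.

```python
import math

def residues_within(ligand_atoms, protein_atoms, cutoff=5.0):
    """Residue IDs with at least one atom within `cutoff` Å of any ligand atom.

    ligand_atoms:  list of (x, y, z) tuples
    protein_atoms: list of (residue_id, (x, y, z)) tuples
    """
    near = set()
    for res_id, xyz in protein_atoms:
        if any(math.dist(xyz, lig) <= cutoff for lig in ligand_atoms):
            near.add(res_id)
    return sorted(near)

ligand = [(0.0, 0.0, 0.0)]
protein = [("A1", (3.0, 0.0, 0.0)), ("A1", (10.0, 0.0, 0.0)),
           ("A2", (0.0, 6.0, 0.0)), ("A3", (0.0, 0.0, 4.9))]
print(residues_within(ligand, protein))  # ['A1', 'A3']
```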
Ex 3. Small Disulfide-bonded
Protein: Complement Factor H
20 tandem homologous units: SCRs (short consensus repeats) or “sushi” regions
Each SCR is ~60 aa, with conserved Y, P and G
2 disulfide bridges per SCR: Cys1-Cys3 and Cys2-Cys4
Linkers of 3-8 aa between SCRs
Heparin-binding SCRs: 7 (high affinity) and 20
The preceding SCRs are also required for activity: the minimum constructs are fH67 and fH18-20
[Figure: SCR module showing the N- and C-termini, cysteines C1-C4 and the hypervariable (HV) loop]
3. Sequence Alignment of Close Functional Homologues
[Alignment annotated with Sites A, B, c and d]
hfH       385 CLRKCYFPYLENGYNQNHGRKFVQGKSIDVA
fHR-3      83
bfH       292 CLRQCIFNYLENGHNQHREEKYLQGETVRVH
mfH       385 CVRKCVFHYVENGDSAYWEKVYVQGQSLKVQ
Consensus     CLRKCYFPYLENGYNQNYGRKFVQGNSTEVA

hfH       416 CHPGYALPKA-QTTVTCMENGWSPTPRCIR 444
fHR-3     114 CHPGYGLPKVRQTTVTCTENGWSPTPRCIR 143
bfH       323 CYEGYSLQND-QNTMTCTESGWSPPPRCIR 351
mfH       416 CYNGYSLQNG-QDTMTCTENGWSPPPKCIR 444
3. Templates for fH SCRs 6-7
hfH SCRs 15&16 (fH1516; PDB ID: 1HFH)
Vaccinia virus complement control protein domains 3&4 (vcp34; PDB ID: 1VVC)
[Figure: hfH15 and vcp3 modules of the two templates compared]
Orientations differ considerably
vcp34 is 28% identical to hfH67, compared with 25% for hfH1516!
3. Query-Templates alignment
3. hfH67 model
[Figure: hfH67 model compared with hfH1516, with sialic acid and a heparin disaccharide repeat]
3. Locating residues for
mutation from model
[Figure: candidate residues Arg-387, Lys-388, His-402, Arg-404, Lys-405 and Lys-410 on the model of SCRs 6&7, compared with SCRs 15&16]
Pacific Symposium on Biocomputing 2000, 5:155
Ex 4: Protein Engineering
Thermolysin-like protease unstable at high temperatures (> 40 °C, unlike trypsin)
Homology model built
G8 & N60 identified as suitable positions for a disulfide bond
Double mutant functional at 92.5 °C
J. Mansfeld et al. “Extreme Stabilization of a Thermolysin-like Protease by an Engineered Disulfide Bond” J. Biol. Chem. 1997, 272: 11152-11156.
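Picking positions such as G8 and N60 for an engineered disulfide typically starts by screening the homology model for residue pairs whose Cα atoms are at a suitable distance. A minimal sketch, assuming a Cα-Cα window of roughly 4.4 to 6.8 Å, a commonly used screening range treated here as an assumption rather than a validated criterion; dedicated tools also check Cβ geometry and side-chain dihedrals. The coordinates below are toy values, not taken from the thermolysin-like protease model.

```python
import math

def disulfide_candidates(ca_coords, min_dist=4.4, max_dist=6.8, min_separation=3):
    """Pairs of residues whose Cα-Cα distance falls within [min_dist, max_dist] Å.

    ca_coords: dict mapping residue number -> (x, y, z) of its Cα atom.
    The distance window is an assumed screening range, not a validated criterion.
    """
    numbers = sorted(ca_coords)
    pairs = []
    for i, a in enumerate(numbers):
        for b in numbers[i + 1:]:
            if b - a < min_separation:
                continue  # close sequence neighbours are not useful disulfide partners
            d = math.dist(ca_coords[a], ca_coords[b])
            if min_dist <= d <= max_dist:
                pairs.append((a, b, round(d, 2)))
    return pairs

toy_model = {8: (0.0, 0.0, 0.0), 9: (3.8, 0.0, 0.0),
             60: (5.0, 0.0, 0.0), 100: (20.0, 0.0, 0.0)}
print(disulfide_candidates(toy_model))  # [(8, 60, 5.0)]
```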
Ex 5. Multiple chains: Human Hand, Foot & Mouth Disease Virus capsid
2000 outbreak of HFMD in Singapore: thousands of children affected, 4 deaths (The Lancet, 2000, 356, 1338)
Major etiological agent: EV71 (enterovirus group)
Neurological complications
5. EV71 genome structure
[Figure: EV71 genome: 5′ VPg and AUG start; capsid region (VP4, VP2, VP3, VP1); replication region (2A, 2B, 2C, 3A, 3B, 3C, 3D); UAG stop and 3′ poly(A) tail; positions of pan-enterovirus and EV71-specific primers]
95/94% (RNA) homology with coxsackievirus A16/B3
Only 1% difference between neurovirulent and
non-neuro virulent isolates
Most variations in non-capsid regions
Within capsid regions, VP1 shows the greatest variability relative to other EVs
Differences in capsid region 1: VP1 & VP2, 2: VP3
5. Picornaviridae
Icosahedral
Capsid
5. Template hunt
BLASTP against PDB sequences
VP1: 3 templates
1BEV 38.7% (bovine enterovirus)
1EAH 36.5% (poliovirus type 2 strain Lansing)
1FPN 38.0% (human rhinovirus serotype 2)
VP2: 1BEV 56.7%
VP3: 1BEV 54.9%
VP4: 1BEV* 50.0%
5. Fixing the VP1 Alignment
Structural alignment of templates: using
VAST (Gibrat, Madej, & Bryant, 1996)
Extract corresponding sequence
alignment
Match HFMDV VP1 to aligned
templates using profile alignment in
CLUSTALW
5. VP1 alignment to templates
[Figure: structure-based alignment of VP1 from 1BEV, 1EAH and 1FPN with EV71 VP1, annotated with 3₁₀-helices, α-helices, β-strands and pocket-factor binding residues]
5. Model building steps
Build all 4 capsid proteins (VP1-VP4)
together to ensure 3D fit
Use 1BEV alone for VP2-VP4
For VP1: use aligned 1BEV, 1EAH,
1FPN
Check model
5. Round 1: VP1,VP2,VP3,VP4
Clip hanging ends
Re-position problem loops: adjust gaps in alignment
Build again
5. Round 2: Pentamer Check
Loops look OK
Build pentamer
Publish…
Oops: clash in pentamer assembly. Go back
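A clash like the one found in the pentamer assembly can be caught automatically by listing atom pairs from different chains that are implausibly close. A minimal sketch, assuming a hypothetical 2.5 Å clash cutoff and atoms supplied as (chain, label, coordinates) tuples; the plain double loop is fine for a sketch but would need a spatial grid or neighbour search for a full capsid.

```python
import math

def interchain_clashes(atoms, cutoff=2.5):
    """Atom pairs from different chains closer than `cutoff` Å.

    atoms: list of (chain_id, label, (x, y, z)) tuples.
    """
    clashes = []
    for i, (chain_i, label_i, xyz_i) in enumerate(atoms):
        for chain_j, label_j, xyz_j in atoms[i + 1:]:
            if chain_i == chain_j:
                continue  # only contacts between different chains are checked
            d = math.dist(xyz_i, xyz_j)
            if d < cutoff:
                clashes.append((label_i, label_j, round(d, 2)))
    return clashes

atoms = [("A", "A:1:N", (0.0, 0.0, 0.0)), ("A", "A:1:CA", (1.0, 0.0, 0.0)),
         ("B", "B:1:N", (1.5, 0.0, 0.0)), ("B", "B:9:O", (10.0, 0.0, 0.0))]
print(interchain_clashes(atoms))  # [('A:1:N', 'B:1:N', 1.5), ('A:1:CA', 'B:1:N', 0.5)]
```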
5. Close encounters of the 3rd Kind
Build only the VP3 pentamer
N-terminus of each VP3 is hydrogen-bonded
Also, in BEV, an Asp-Lys ion pair
First 25 aa overlay very well
5. Fourth foray: build with 5 VP3s
Only the first 50 aa of the other 4 VP3s included
Model resulted in knots due to insufficient refinement cycles
However, VP3 pentameric region OK
5. Fifth and final attempt
5. Canyon Pit and Antigenic Sites
[Figure: canyon pit and antigenic sites, comparing poliovirus sites, neurovirulent polio (mouse) sites and cardiovirus sites]
5. Putative antigenic sites
[Figure: putative antigenic sites on VP1 and VP2]
5. HFMDV Conclusions
Unique surface loops identified for immunodiagnostic assays and vaccine design
Antibodies being generated
Canyon pit: depth is similar to BEV
Mapping the antigenic regions of other related enteroviruses onto the HFMDV surface: specific VP1 and VP2 sites are buried
Sunita Singh, Vincent T. K. Chow, C. L. Poh, M. C.
Phoon: Dept. of Microbiology, NUS
Applied Bioinformatics, Vol 1, issue 1, 43-52: invited
research article