Lund_Apr04 - CBS - Technical University of Denmark


Immunological bioinformatics
Ole Lund,
Center for Biological
Sequence Analysis
(CBS)
Denmark.
World-wide Spread of SARS
Status as of July 11, 2003: 8437 Infected, 813 Dead
SARS



First severe infectious disease to emerge in the post-genomic era
Modern societies are vulnerable to epidemics
Classical containment strategies have been successful in controlling the epidemic, but
– SARS may resurface (e.g. be seasonal)
– The suggested existence of an animal reservoir could compromise the containment strategy
Need to develop a vaccine strategy
Biotechnology has provided new tools to analyze
genome/proteome information and guide vaccine development.
The causative virus, the SARS corona virus (SARS CoV), has
been isolated and full-length sequenced.
Main scientific achievements




Discovery of causative agent
Genome(s)
3D structure of main proteinase
Origin
– Similar virus found in Himalayan palm civets and other animals, including a raccoon dog, and in humans working at an animal market in Guangdong, China (Guan et al., Sep 4, 2003).
Himalayan (Masked) palm civet
Ferret-Badger
Raccoon-dog
http://biobase.dk/~david-c/uk-dk-mammmal-list.htm
Source: Michael Buchmeier, Beijing June, 2003
New corona viruses
1978 Porcine Epidemic diarrhea virus (PEDV)
Probably from humans
1984 Porcine Respiratory Coronavirus
1987 Porcine Reproductive and Respiratory Syndrome
(PRRS)
1993 Bovine corona virus
2003 SARS
Will it be back?

When?
– Every year? Like the flu.
– Every few years? Like measles used to.
– Sporadic? Like Ebola.
– Never?
Lab safety: The patient, a 27-year-old virologist,
worked on the West Nile virus in a biosafety level
3 lab at the Environmental Health Institute,
where the SARS coronavirus was also studied
(Enserink, 2003)
How does the immune system “see” a virus?
The immune system

The innate immune system
– Found in animals and plants
– Fast response
– Complement, Toll-like receptors
The adaptive immune system
– Found in vertebrates
– Stronger response 2nd time
B lymphocytes
– Produce antibodies (Abs), which recognize 3D shapes
– Neutralize virus/bacteria outside cells
T lymphocytes
Cytotoxic T lymphocytes (CTLs) - MHC class I
– Recognize foreign protein sequences in infected cells
– Kill infected cells
Helper T lymphocytes (HTLs) - MHC class II
– Recognize foreign protein sequences presented by immune cells
– Activate cells
Weight matrices
(Hidden Markov models)
YMNGTMSQV
GILGFVFTL
ALWGFFPVV
ILKEPVHGV
ILGFVFTLT
LLFGYPVYV
GLSPTVWLS
WLSLLVPFV
FLPSDFFPS
CVGGLLTMV
FIAGNSAYE
A2 Logo
Protein sequence information content
Entropy
– Average uncertainty in the random variable
– $H = -\sum_i p_i \log_2 p_i$
– Range: 0 to $\log_2(20) \approx 4.3$
– Logo height: $I = \log_2(20) - H$
Relative entropy (Kullback-Leibler distance)
– $D = \sum_i p_i \log_2(p_i / q_i)$
– Range: 0 to infinity
Mutual information
– Reduction in uncertainty due to knowledge of another random variable (corresponds to correlation)
– $M = \sum_i \sum_j p_{ij} \log_2\left(p_{ij} / (p_i p_j)\right)$
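The three quantities on this slide can be computed directly from aligned peptides. The following is a minimal sketch, assuming raw observed frequencies with no pseudocounts or small-sample correction; the function names are illustrative, and the peptides are the ones listed on the "Weight matrices" slide.

```python
import math
from collections import Counter

# Peptides listed on the "Weight matrices" slide.
PEPTIDES = [
    "YMNGTMSQV", "GILGFVFTL", "ALWGFFPVV", "ILKEPVHGV", "ILGFVFTLT",
    "LLFGYPVYV", "GLSPTVWLS", "WLSLLVPFV", "FLPSDFFPS", "CVGGLLTMV",
    "FIAGNSAYE",
]


def frequencies(symbols):
    """Observed relative frequency of each distinct symbol."""
    symbols = list(symbols)
    n = len(symbols)
    return {s: c / n for s, c in Counter(symbols).items()}


def entropy(column):
    """H = -sum_i p_i log2 p_i, in bits."""
    return -sum(p * math.log2(p) for p in frequencies(column).values())


def logo_height(column, alphabet_size=20):
    """I = log2(20) - H: total height of one logo column."""
    return math.log2(alphabet_size) - entropy(column)


def relative_entropy(column, background):
    """D = sum_i p_i log2(p_i / q_i) against a background distribution q."""
    return sum(p * math.log2(p / background[a])
               for a, p in frequencies(column).items())


def mutual_information(col_a, col_b):
    """M = sum_ij p_ij log2(p_ij / (p_i p_j)) between two columns."""
    p_a, p_b = frequencies(col_a), frequencies(col_b)
    p_ab = frequencies(zip(col_a, col_b))
    return sum(p * math.log2(p / (p_a[a] * p_b[b]))
               for (a, b), p in p_ab.items())


# One logo height per motif position (9 positions for these 9-mers).
heights = [logo_height(column) for column in zip(*PEPTIDES)]
```

With only 11 sequences the observed entropy is biased downwards, so in practice the heights would need a correction or pseudocounts before being drawn as a logo.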
Prediction of MHC binding specificity
Simple motifs
– Allowed (non-allowed) amino acids
Extended motifs
– Amino acid preferences
Structural models
– Limitations: precision of force field, and speed of calculations
Neural networks
– Can take correlations into account
Log odds ratios
Used for scoring alignments (BLAST), HMMs, matrix methods
Odds ratio of observing given amino acids
– Relative probability of observing amino acid i in motif position j
– $O_j = p(aa_i \text{ at pos } j) / p(aa_i)$
Assumption of independence =>
– Odds for observing sequence = $O_1 O_2 \ldots O_n$
Log odds ratio
– $LO = \log(O_1 O_2 \ldots O_n) = \log(O_1) + \log(O_2) + \ldots + \log(O_n)$
– LO in half bits = $2\,LO / \log(2)$
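The log odds scoring described above can be sketched as a small weight matrix built from the peptides on the "Weight matrices" slide. Two choices here are assumptions and not taken from the slides: the background p(aa_i) is uniform (1/20), and a pseudocount of 1 is added to every amino acid count so that unseen residues do not give log(0). The function names are illustrative.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"

# Peptides listed on the "Weight matrices" slide.
PEPTIDES = [
    "YMNGTMSQV", "GILGFVFTL", "ALWGFFPVV", "ILKEPVHGV", "ILGFVFTLT",
    "LLFGYPVYV", "GLSPTVWLS", "WLSLLVPFV", "FLPSDFFPS", "CVGGLLTMV",
    "FIAGNSAYE",
]


def build_log_odds_matrix(peptides, pseudocount=1.0):
    """Return one dict per motif position: amino acid -> log odds in half bits."""
    background = 1.0 / len(ALPHABET)  # assumption: uniform background p(aa_i)
    matrix = []
    for column in zip(*peptides):
        counts = Counter(column)
        total = len(column) + pseudocount * len(ALPHABET)
        matrix.append({
            # 2 * log2(O) is the same as 2 * log(O) / log(2), i.e. half bits.
            aa: 2.0 * math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in ALPHABET
        })
    return matrix


def score(peptide, matrix):
    """Sum the per-position log odds (the independence assumption)."""
    return sum(position[aa] for aa, position in zip(peptide, matrix))


MATRIX = build_log_odds_matrix(PEPTIDES)
```

Calling `score("GILGFVFTL", MATRIX)` gives a positive value because every residue of that peptide is over-represented at its position, while a peptide of residues rarely seen in the training set scores negative.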
Evaluation of prediction accuracy
Coverage = TP/actual_positive
Reliability = TP/predicted_positive
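As a minimal sketch of the two measures above, assuming each peptide has a prediction score and a measured binder/non-binder label, and that a peptide is predicted positive when its score reaches a threshold. The scores and labels below are made-up toy values, not data from the talk.

```python
def confusion_counts(scores, is_binder, threshold):
    """Return (TP, predicted_positive, actual_positive) at a score threshold."""
    tp = sum(1 for s, b in zip(scores, is_binder) if s >= threshold and b)
    predicted_positive = sum(1 for s in scores if s >= threshold)
    actual_positive = sum(1 for b in is_binder if b)
    return tp, predicted_positive, actual_positive


def coverage(scores, is_binder, threshold):
    """Coverage = TP / actual_positive (0.0 if there are no binders)."""
    tp, _, actual_positive = confusion_counts(scores, is_binder, threshold)
    return tp / actual_positive if actual_positive else 0.0


def reliability(scores, is_binder, threshold):
    """Reliability = TP / predicted_positive (0.0 if nothing is predicted)."""
    tp, predicted_positive, _ = confusion_counts(scores, is_binder, threshold)
    return tp / predicted_positive if predicted_positive else 0.0


# Toy example (hypothetical values).
SCORES = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
IS_BINDER = [True, True, False, True, False, False]
```

Raising the threshold typically trades coverage for reliability, which is why the comparison reports coverage at a fixed reliability and reliability at a fixed coverage.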
A*1101 performance
154 peptides, 9 binders
[Figure: three bar charts comparing the prediction methods SYFPEITHI, Bimas, HMM and NN. Panel titles: 95% Reliability, Correlation, 50% Coverage. Axis labels: Coverage, Pearson correlation coefficient, True positive ratio.]
The MHC gene region
From Bill Paul, ”Fundamental Immunology”, 4th Ed
Human leukocyte antigen (HLA = MHC in humans) polymorphism - alleles
A total of 229 HLA-A, 464 HLA-B and 111 HLA-C class I alleles have been named.
A total of 2 HLA-DRA, 364 HLA-DRB, 22 HLA-DQA1, 48 HLA-DQB1, 20 HLA-DPA1 and 96 HLA-DPB1 class II sequences have also been assigned.
As of October 2001 (http://www.anthonynolan.com/HIG/index.html)
HLA polymorphism - supertypes
• Each HLA molecule within a supertype essentially binds the same peptides
• Nine major HLA class I supertypes have been defined
• HLA-A1, A2, A3, A24, B7, B27, B44, B58, B62
Sette et al, Immunogenetics (1999) 50:201-212
HLA polymorphism - frequencies
Phenotype frequencies

Supertypes       Caucasian  Black  Japanese  Chinese  Hispanic  Average
A2, A3, B27      83 %       86 %   88 %      88 %     86 %      86 %
+A1, A24, B44    100 %      98 %   100 %     100 %    99 %      99 %
+B7, B58, B62    100 %      100 %  100 %     100 %    100 %     100 %
Sette et al, Immunogenetics (1999) 50:201-212
Conclusions
We suggest
– splitting some of the alleles in the A1 supertype into a new A26 supertype
– splitting some of the alleles in the B27 supertype into a new B39 supertype
– that the B8 alleles may define their own supertype
– that the specificities of the class II molecules can be clustered into nine classes, which only partly correspond to the serological classification
Lund O, Nielsen M, Kesmir C, Petersen AG, Lundegaard C, Worning P, Sylvester-Hvid C, Lamberth K, Roder G,
Justesen S, Buus S, Brunak S. Definition of supertypes for HLA molecules using clustering of specificity
matrices. Immunogenetics. 2004 Feb 13 [Epub ahead of print]
MHC class I binding of SARS peptides
Predictions for all supertypes
– Broad population coverage
Allele specific neural networks
– Peptides with associated measured binding affinity
– A1 (A0101), A2 (A0204), A3 (A1101+A0301), B7 (B0702)
Weight matrices
– Peptides from public databases (SYFPEITHI, MHCpep)
– A24, B27, B44, B58 and B62
Supertype weight matrices
[Figure: weight matrices for the B27, B44, B58 and B62 supertypes]
Proteasomal cleavage
Epitope predictions
– Binding to MHC class I
– High probability for C-terminal proteasomal cleavage
– No sequence variation
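The three criteria above can be combined into a simple filter. This is only a sketch of the idea, not the pipeline used in the talk: the `Candidate` fields, the 0.5 cleavage cut-off and the example values are hypothetical, and the 500 nM affinity cut-off is borrowed from the binding assay summary later in the talk.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    peptide: str
    start: int             # 0-based position of the first residue in the protein
    affinity_nm: float     # predicted MHC class I binding affinity (nM)
    cterm_cleavage: float  # predicted probability of cleavage after the last residue


def select_epitopes(candidates, variable_positions,
                    max_affinity_nm=500.0, min_cleavage=0.5):
    """Keep candidates that bind, are cleaved at the C-terminus and are conserved."""
    selected = []
    for c in candidates:
        covered = range(c.start, c.start + len(c.peptide))
        conserved = not any(pos in variable_positions for pos in covered)
        if (c.affinity_nm < max_affinity_nm
                and c.cterm_cleavage >= min_cleavage
                and conserved):
            selected.append(c)
    return selected


# Hypothetical candidates; positions and predictions are invented for illustration.
CANDIDATES = [
    Candidate("GILGFVFTL", start=10, affinity_nm=20.0, cterm_cleavage=0.9),
    Candidate("FLPSDFFPS", start=40, affinity_nm=30.0, cterm_cleavage=0.2),
    Candidate("LLFGYPVYV", start=70, affinity_nm=15.0, cterm_cleavage=0.8),
    Candidate("CVGGLLTMV", start=90, affinity_nm=900.0, cterm_cleavage=0.95),
]
VARIABLE_POSITIONS = {72}  # hypothetical observed variable spot
```

Here only the first candidate survives: the second fails the cleavage criterion, the third overlaps a variable position, and the fourth binds too weakly.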
Inside out:
1. Position in RNA
2. Translated regions (blue)
3. Observed variable spots
4. Predicted proteasomal cleavage
5. Predicted A1 epitopes
6. Predicted A*0204 epitopes
7. Predicted A*1101 epitopes
8. Predicted A24 epitopes
9. Predicted B7 epitopes
10. Predicted B27 epitopes
11. Predicted B44 epitopes
12. Predicted B58 epitopes
13. Predicted B62 epitopes
Strategy for the quantitative ELISA assay
C. Sylvester-Hvid et al., Tissue Antigens, 2002; 59:251
• Step I: Folding of MHC class I molecules in solution (b2m, heavy chain and peptide are incubated to form the peptide-MHC complex)
• Step II: Detection of de novo folded MHC class I molecules by ELISA (development)
Christina Sylvester-Hvid, University of Copenhagen, July 2003
Summary of peptide binding assays

Supertype        A1  A2  A3  A24  B7  B27  B44  B58  B62
#tested          15  15  15  0    15  13   0    15   14
#binding <500nM  13  12  14  -    10  2    -    13   12
Initial polytope (19 HIV epitopes)
• New epitopes: 12
• Poor C-term cleavage: 8
• Cleavage within: 31
• Linker length: 12
Optimized polytope
• New epitopes: 1
• Weak C-term cleavage: 3
• Cleavage within: 7
• Linker length: 37
MHC class II
Molecule
Virtual matrices

HLA-DR molecules sharing the same pocket amino acid pattern are assumed to have identical amino acid binding preferences.
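The virtual matrix assumption can be illustrated with a lookup table: each pocket's preferences are stored once per pocket amino acid pattern, and the matrix for an allele is assembled from the patterns it carries. Everything in this sketch is hypothetical (pocket names, patterns, allele names and all numbers are toy values); it shows the idea only, not TEPITOPE's actual profiles.

```python
# Toy pocket profile library: (pocket, pocket residue pattern) -> preferences.
# All keys and values are invented for illustration.
POCKET_PROFILES = {
    ("P1", "GV"): {"F": 1.0, "Y": 0.8, "L": 0.2},
    ("P1", "GG"): {"F": 0.9, "W": 1.0, "L": 0.0},
    ("P4", "DE"): {"K": 1.2, "R": 1.0, "D": -1.5},
    ("P4", "KA"): {"D": 1.1, "E": 0.9, "K": -1.2},
}

# Position within a 9-mer core that each toy pocket accommodates (0-based).
POCKET_POSITION = {"P1": 0, "P4": 3}

# Hypothetical alleles described only by their pocket patterns.
ALLELE_X = {"P1": "GV", "P4": "DE"}
ALLELE_Y = {"P1": "GV", "P4": "KA"}


def virtual_matrix(allele_pockets):
    """Assemble a matrix by looking up each pocket's profile by its pattern."""
    return {pocket: POCKET_PROFILES[(pocket, pattern)]
            for pocket, pattern in allele_pockets.items()}


def score_core(core, matrix):
    """Sum the preferences of the residues sitting in each pocket."""
    return sum(profile.get(core[POCKET_POSITION[pocket]], 0.0)
               for pocket, profile in matrix.items())
```

Because `ALLELE_X` and `ALLELE_Y` share the P1 pattern, their matrices share the P1 profile, and they differ only in the P4 preferences. This is how a matrix can be derived for an allele with no binding data of its own.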
MHC Class II binding

Virtual matrices
– TEPITOPE: Hammer, J., Current Opinion in Immunology 7, 263-269, 1995
– PROPRED: Singh H, Raghava GP, Bioinformatics 2001 Dec;17(12):1236-7
Web interface
– http://www.imtech.res.in/raghava/propred
Prediction Results
MHC class II prediction
Complexity of problem
– Peptides of different length
– Weak motif signal
Alignment crucial
Gibbs Monte Carlo sampler
RFFGGDRGAPKRG
YLDPLIRGLLARPAKLQV
KPGQPPRLLIYDASNRATGIPA
GSLFVYNITTNKYKAFLDKQ
SALLSSDITASVNCAK
PKYVHQNTLKLAT
GFKGEQGPKGEP
DVFKELKVHHANENI
SRYWAIRTRSGGI
TYSTNEIDLQLSQEDGQTIE
Class II binding motif
Random
ClustalW
Alignment by Gibbs sampler
RFFGGDRGAPKRG
YLDPLIRGLLARPAKLQV
KPGQPPRLLIYDASNRATGIPA
GSLFVYNITTNKYKAFLDKQ
SALLSSDITASVNCAK
PKYVHQNTLKLAT
GFKGEQGPKGEP
DVFKELKVHHANENI
SRYWAIRTRSGGI
TYSTNEIDLQLSQEDGQTI
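The alignment problem above, choosing one 9-residue core per peptide so that the cores share a motif, can be sketched with a small Gibbs-style sampler. This is a simplified illustration, not the sampler used for the results in this talk: it scores an alignment by summed logo information without pseudocounts or sequence weighting, resamples one peptide's offset at a time from Boltzmann weights at a fixed temperature, and uses a core length of 9. All parameter names and defaults are assumptions.

```python
import math
import random
from collections import Counter

# Peptides shown on the slide (the final peptide is taken in its full form).
PEPTIDES = [
    "RFFGGDRGAPKRG", "YLDPLIRGLLARPAKLQV", "KPGQPPRLLIYDASNRATGIPA",
    "GSLFVYNITTNKYKAFLDKQ", "SALLSSDITASVNCAK", "PKYVHQNTLKLAT",
    "GFKGEQGPKGEP", "DVFKELKVHHANENI", "SRYWAIRTRSGGI",
    "TYSTNEIDLQLSQEDGQTIE",
]


def information(cores):
    """Sum over columns of log2(20) - H for a set of equal-length cores."""
    total = 0.0
    for column in zip(*cores):
        n = len(column)
        h = -sum(c / n * math.log2(c / n) for c in Counter(column).values())
        total += math.log2(20) - h
    return total


def gibbs_align(peptides, core_len=9, sweeps=50, temperature=0.1, seed=0):
    """Return (best information, best offsets) found while sampling."""
    rng = random.Random(seed)
    offsets = [rng.randrange(len(p) - core_len + 1) for p in peptides]

    def cores():
        return [p[o:o + core_len] for p, o in zip(peptides, offsets)]

    best_score, best_offsets = information(cores()), list(offsets)
    for _ in range(sweeps):
        for k, peptide in enumerate(peptides):
            # Score every offset of peptide k with all other offsets fixed.
            candidates = list(range(len(peptide) - core_len + 1))
            scores = []
            for o in candidates:
                offsets[k] = o
                scores.append(information(cores()))
            # Sample the new offset with probability proportional to exp(I / T).
            top = max(scores)
            weights = [math.exp((s - top) / temperature) for s in scores]
            offsets[k] = rng.choices(candidates, weights=weights)[0]
        score = information(cores())
        if score > best_score:
            best_score, best_offsets = score, list(offsets)
    return best_score, best_offsets
```

Calling `gibbs_align(PEPTIDES)` returns the highest-information alignment seen and the start position of each peptide's core. With so few peptides the result mainly illustrates the mechanics, since a weak motif needs many more sequences to stand out.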
MHC class II predictions
Allele DRB1_0401
[Figure: bar chart of accuracy (axis 0 to 0.9) for Tepitope and the Gibbs sampler on several data sets; the axis labels appear to read Southwood, Geluk and MHCbench 1 to 8.]
Polytope construction
[Figure: polytope drawn from the NH2 terminus (M) to the COOH terminus as epitopes joined by linkers, annotated with C-terminal cleavage, cleavage within epitopes, and new epitopes.]
Prediction of antibody epitopes
Linear
– Hydrophilicity scales (average in ~7 window)
  Hopp and Woods (1981)
  Kyte and Doolittle (1982)
  Parker et al. (1986)
– Other scales & combinations
  Pellequer and van Regenmortel
  Alix
Discontinuous
– Protrusion (Novotny, Thornton, 1986)
Neural networks (In preparation)
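The scale-based linear methods above all reduce to averaging a per-residue value over a sliding window of about 7 residues. A minimal sketch follows, using the Kyte and Doolittle (1982) hydropathy values; on that scale higher values are more hydrophobic, so the most hydrophilic window is the one with the lowest average. The function names are illustrative.

```python
# Kyte and Doolittle (1982) hydropathy index; positive values are hydrophobic.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_average(sequence, scale=KYTE_DOOLITTLE, window=7):
    """Average the scale over every full window; one value per window start."""
    values = [scale[aa] for aa in sequence]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def most_hydrophilic_window(sequence, scale=KYTE_DOOLITTLE, window=7):
    """Return (start, subsequence) of the window with the lowest average."""
    averages = window_average(sequence, scale, window)
    start = min(range(len(averages)), key=averages.__getitem__)
    return start, sequence[start:start + window]
```

For example, `most_hydrophilic_window("KPGQPPRLLIYDASNRATGIPA")` picks the 7-residue stretch of that peptide with the lowest mean hydropathy. Swapping in another scale only requires passing a different dictionary.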
Secondary structure in epitopes

Sec struct  Log odds ratio
H           -0.19
T            0.30
B            0.21
E           -0.27
S            0.24
G           -0.04
I            0.00
.            0.17

H: Alpha-helix (hydrogen bond from residue i to residue i+4)
G: 3-10 helix (hydrogen bond from residue i to residue i+3)
I: Pi helix (hydrogen bond from residue i to residue i+5)
E: Extended strand
B: Beta bridge (one residue short strand)
S: Bend (five-residue bend centered at residue i)
T: H-bonded turn (3-turn, 4-turn or 5-turn)
.: Coil
Amino acids in epitopes
Frequency

Amino acid  e/E   .
G           0.09  0.07
A           0.07  0.08
V           0.05  0.07
L           0.08  0.10
I           0.04  0.06
M           0.02  0.03
P           0.06  0.05
F           0.03  0.05
W           0.01  0.02
S           0.08  0.07
C           0.03  0.03
T           0.08  0.06
Q           0.04  0.04
N           0.04  0.05
H           0.02  0.02
Y           0.04  0.03
E           0.06  0.04
D           0.07  0.04
K           0.07  0.05
R           0.04  0.04
Dihedral angles in epitopes
Z-scores for number of dihedral angle combinations in epitopes vs. non epitopes
Phi\Psi      1      2      3      4      5      6      7      8      9     10     11     12
1        -0.47   0.44  -0.58   0.45   0.46   0.00   0.00  -0.73  -0.79   0.00  -0.83   1.42
2        -0.01  -0.12  -1.82   0.52   1.75   0.00   0.00   0.00   1.42  -0.82   0.00   0.00
3         1.82  -2.26  -1.57   0.48   0.10   0.00  -0.77   0.45   1.77   0.00  -0.82   0.99
4         1.76   1.15  -0.34   0.75   0.00   0.00   0.97   0.16   0.38   1.03   0.00   0.00
5        -0.85   0.45  -1.09   0.57   0.00   0.00   0.00   0.13   1.52   0.00   1.02  -0.79
6         0.60   1.28   1.30   1.73   0.00   0.00   0.00   0.00   1.32  -0.89  -0.76   0.00
7         0.27  -0.91   1.67  -0.51   0.00   0.00   0.00   0.00  -1.02  -1.09   0.00   0.00
8         0.93   1.21  -0.23  -3.63   0.49   0.00   0.00   0.00   0.00  -0.19   0.31  -0.82
9         0.00   0.28  -0.67   0.33   0.01  -0.83   0.00   0.00   0.87   0.23   0.00   0.00
10        0.00   0.95   1.71  -0.70   0.00   0.00   0.00   1.29   1.08   0.00   1.00   0.00
11        0.00   0.00   1.02   0.00   0.00   0.00   0.00   0.86  -0.75   0.00   0.00   0.00
12        0.42   0.83   0.28   1.68   0.00   0.00   0.00   0.00   1.03  -0.21  -0.79   0.93
Immunological bioinformatics

Classical experimental research
– Few data points
– Data recorded by pencil and paper/spreadsheet
New experimental methods
– Sequencing
– DNA arrays
– Proteomics
Need to develop new methods for handling these large data sets
Immunological Bioinformatics/Immunoinformatics
Acknowledgements
CBS, Technical University of Denmark
Søren Brunak (Director of CBS)
Morten Nielsen (Epitope prediction)
Peder Worning (Genome atlases)
Claus Lundegaard (Data bases)
Mette Børgesen (CTL prediction)
Jesper Schantz (Polytope optimization)
IMMI, University of Copenhagen
Søren Buus (Professor)
Christina Sylvester-Hvid (Experimental coordinator)
Kasper Lamberth (Peptide bank, Quality control)
Erland Johansson, Jeanette Nielsen (Preparations of peptides)
Hanne Møller (ELISA binding assay)