Transcript Peptibase
1
Motivation and Goals
HIV:
Many failed attempts have been made to develop
vaccines against HIV.
This is due to the virus’s rapid mutation rate, which
enables it to evade immune recognition.
HBV:
The HBV genome has overlapping CDSs.
Analyzing the implications of mutations affecting
overlapping genes (with regard to HBV’s evolution and
its interaction with the immune system) may also help
us learn about similar viruses with overlapping genes
(e.g. HPV).
Our goal is to characterize changes and viral
trade-off preferences in the epitope distribution
along the CDS of different proteins.
2
Background – Antigen Presentation
An epitope is the part of the
antigen that is recognized by
the immune system.
Proteasome – degrades
proteins within the cell into
peptides about 9 AA long.
TAP – delivers cytosolic
peptides into the ER, where
they bind to MHC class I
molecules.
MHC-I – found on every
nucleated cell; its function is to
display fragments of proteins
from within the cell to T cells.
3
Background - HIV
HIV attacks CD4+ cells. The spikes on the surface of the
virus particle stick to the CD4 and allow the viral
envelope to fuse with the cell membrane.
After fusion, the envelope is left behind and the viral RT
converts the RNA genome into DNA. The DNA is then
transported to the cell nucleus, where it is spliced and
integrated into the human genome by the viral
integrase.
HIV provirus may lie dormant within a cell for a long
time. When the cell becomes activated, human as well
as HIV genes are transcribed using human enzymes.
Then the messenger RNA is transported outside the
nucleus, and is used as a blueprint for translation and
replication.
4
Background – HIV Genes
Regulatory Proteins
Tat – controls transactivation of all HIV proteins.
Rev – the differential regulator of expression of virus protein
genes.
Nef – negative regulatory factor, retards HIV replication.
Accessory Proteins
Vif – infectivity factor gene.
Vpr – undetermined function.
Vpu – required for efficient viral replication and release.
Structural Proteins
GAG – codes for various proteins necessary to protect the virus.
Has 3 parts: MA (matrix), CA (capsid), and NC (nucleocapsid).
POL – codes for the enzymes necessary for virus replication.
Has 3 parts: PR (protease), IN (integrase), and RT (reverse
transcriptase).
ENV – codes for the envelope of the virus. Has two parts: SU (surface
envelope, gp120) and TM (transmembrane envelope, gp41).
5
Background - HBV
HBV is a small enveloped virus with a partially double-stranded
circular DNA genome. It is the only member of the Hepadnaviridae
family that infects humans.
The HBV genome contains 4 main genes:
Core – encodes the capsid protein.
Pol – encodes a polymerase with reverse transcriptase activity.
Surface – encodes the small, medium and large ER intermembrane proteins.
X – thought to have transcription regulation activity.
The HBV genome has 4 ORFs: the entire Surface protein,
the C-terminus of Core and the N-terminus of X
overlap with Polymerase.
6
Background – Viral Epitopes
Previous work has shown that HIV tends
to decrease the number of epitopes in
regulatory proteins, which predominate in
the initial stages of replication.
In HBV, on the other hand, the protein
copy number, more than the expression
time, seems to affect the epitope density.
7
HLA Polymorphism
The advantage of a mutation that removes an
epitope is usually lost when the virus transfers to
a new host with different HLA alleles.
Therefore, we expect a high turnover of
mutations in potential epitopes in the new host
after the transfer.
Mutations affecting the cleavage sites (flanking
regions) do not depend on the HLA allele and
therefore keep their advantage for the virus
in the new host as well.
8
Algorithm
Multiple Sequence Alignment
Phylogenetic Tree
PreProcessing of Sequences
DNA-based Mutation Positioning Within the AA Sequences
Translation of DNA Sequences
Peptibase
Mutation Characterization for all Alleles
9
MSA and Phylogenetic Tree
The input DNA sequences are
aligned using MUSCLE 3.6.
The sequences were retrieved
from the LANL HIV Database.
A genetically distant ‘Outgroup’ sequence is added
to properly position the root of the tree and
reconstruct the ancestral sequences.
The ‘Outgroup’ sequence for the HIV dataset was
selected from SIV.
10
MSA and Phylogenetic Tree
The alignment is used to
build a phylogenetic tree
using the Maximum
Parsimony method (Phylip
3.69).
The intermediate sequences
built by the program reflect
the changes that occurred
within the coding sequence
of the viral protein.
The phylogenetic tree shows
the epitope development of
the virus.
11
PreProcessing of Sequences
The sequences reconstructed by the Phylip
program may contain ambiguous nucleotides.
These nucleotides are resolved from the bottom of
the tree upwards, so that each resolution relies on the
original input sequences.
Reconstructed sequences containing an early
stop codon remained in the tree, but were not
taken into account in the analysis.
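The bottom-up fix of ambiguous nucleotides could be sketched roughly as follows. This is only an illustration under stated assumptions: the tree layout (a dict from node to children), the rule of taking the first compatible child nucleotide, and all function names are hypothetical and are not taken from the actual pipeline.

```python
# IUPAC ambiguity codes and the nucleotides each one may stand for.
IUPAC = {
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def resolve_ambiguous(tree, seqs, root):
    """Resolve ambiguity codes in internal nodes, from the leaves upwards.

    tree: dict node -> list of children (leaves map to []); assumed layout.
    seqs: dict node -> aligned DNA string; leaves are assumed unambiguous.
    Returns a new dict; the input dict is left untouched.
    """
    resolved = dict(seqs)

    def visit(node):
        children = tree.get(node, [])
        for child in children:
            visit(child)  # children are fixed before their parent
        if not children:
            return
        fixed = []
        for i, nt in enumerate(resolved[node]):
            if nt in IUPAC:
                options = IUPAC[nt]
                # Assumed rule: take the first child nucleotide that is
                # compatible with the code, else the first allowed option.
                pick = next(
                    (resolved[c][i] for c in children if resolved[c][i] in options),
                    options[0],
                )
                fixed.append(pick)
            else:
                fixed.append(nt)
        resolved[node] = "".join(fixed)

    visit(root)
    return resolved


tree = {"root": ["anc", "C"], "anc": ["A", "B"], "A": [], "B": [], "C": []}
seqs = {"A": "ATGA", "B": "ATGG", "C": "ATGG", "anc": "ATGR", "root": "ATGN"}
fixed = resolve_ambiguous(tree, seqs, "root")
```

In this toy tree the `R` in `anc` is resolved from leaf `A`, and the `N` in `root` is then resolved from the already-fixed `anc`, so every choice traces back to an input sequence.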
12
DNA-based Mutation Positioning
Within the AA Sequences
Mutations between each sequence and its direct
descendant were recorded at the DNA level.
Each such mutation was then associated with the
matching amino acids in the translated
sequences.
[Figure: two example mutations mapped to amino-acid positions. A C to A mutation lies between AA1 in the father and AA1 in the son; a second mutation involving G lies between AA2 in the father and AA1 in the son.]
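One possible way to associate a DNA-level mutation with amino-acid positions is shown below. It is a minimal sketch that assumes the father and son sequences are aligned to the same length, that gaps are written as `-`, and that each sequence is read in frame from its first nucleotide; the function name and output layout are hypothetical.

```python
def position_mutations(father_dna, son_dna):
    """List differences between two aligned DNA sequences with AA indices.

    Returns tuples (alignment_pos, father_nt, son_nt, father_aa, son_aa).
    The AA indices are 1-based codon numbers counted on each ungapped
    sequence, or None where that sequence has a gap.
    """
    if len(father_dna) != len(son_dna):
        raise ValueError("sequences must be aligned to the same length")
    mutations = []
    f_count = s_count = 0  # ungapped nucleotides seen so far
    for pos, (f_nt, s_nt) in enumerate(zip(father_dna, son_dna)):
        f_aa = f_count // 3 + 1 if f_nt != "-" else None
        s_aa = s_count // 3 + 1 if s_nt != "-" else None
        if f_nt != s_nt:
            mutations.append((pos, f_nt, s_nt, f_aa, s_aa))
        if f_nt != "-":
            f_count += 1
        if s_nt != "-":
            s_count += 1
    return mutations


muts = position_mutations("ATGCCGTAA", "ATGACGTAA")
shifted = position_mutations("ATGGCA", "---GAA")
```

Because the codon number is counted separately on each ungapped sequence, a gap in one sequence can place the same alignment column in different amino acids of the father and the son, as in the `shifted` example.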
13
Translation of DNA Sequences
and Upload to Peptibase
All DNA sequences (input and
intermediate) were translated to AAs
and uploaded to the Peptibase server.
The Peptibase server was developed by
our lab and is used to predict epitopes
within AA sequences.
The analysis performed in Peptibase is
conducted on the 31 most frequent HLA
alleles, taking into account the allele
frequency in the human population.
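The translation step, together with the check for an early stop codon mentioned earlier, can be illustrated with the standard genetic code. This is a sketch only: it assumes in-frame sequences, gaps written as `-`, and `X` for codons that cannot be translated; the function names are hypothetical and the upload to Peptibase is not shown.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate an in-frame DNA string; '*' marks a stop codon."""
    dna = dna.replace("-", "").upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")  # 'X' for ambiguous codons
        for i in range(0, usable, 3)
    )


def has_early_stop(protein):
    """True if a stop codon appears before the last position."""
    return "*" in protein[:-1]


normal = translate("ATGGCCTAA")
truncated = translate("ATGTAAGCC")
```

A reconstructed sequence for which `has_early_stop` returns `True` would be kept in the tree but skipped in the analysis.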
14
Peptibase
Given an AA sequence, Peptibase uses 3 cut-offs on a
9-mer AA sliding window to predict its epitopes:
Cleavage by the Proteasome
Binding to TAP
Binding to MHC-I
For each 9-mer, cleavage,
TAP and MHC-I binding
scores are computed.
9-mers passing all three
stages are defined as
epitopes.
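The three-stage filter over a 9-mer sliding window can be sketched as below. The scoring functions and cut-off values here are toy placeholders with no biological meaning; Peptibase's real scores and thresholds are not described in this transcript, so everything named `toy_*` is an assumption made only to keep the example runnable.

```python
def predict_epitopes(protein, cleavage_score, tap_score, mhc_score,
                     cutoffs=(0.5, 0.5, 0.5), k=9):
    """Return (start, peptide) for every k-mer passing all three cut-offs."""
    epitopes = []
    for start in range(len(protein) - k + 1):
        peptide = protein[start:start + k]
        scores = (
            cleavage_score(protein, start, k),  # may look at flanking residues
            tap_score(peptide),
            mhc_score(peptide),
        )
        if all(score >= cutoff for score, cutoff in zip(scores, cutoffs)):
            epitopes.append((start, peptide))
    return epitopes


# Toy placeholder scorers (hypothetical, for illustration only).
def toy_cleavage(protein, start, k):
    return 1.0 if protein[start + k - 1] in "ILVFM" else 0.0


def toy_tap(peptide):
    return 0.0 if peptide[0] == "P" else 1.0


def toy_mhc(peptide):
    return 1.0 if peptide[1] in "AL" else 0.0


hits = predict_epitopes("PGRAFYATGEITGDIR", toy_cleavage, toy_tap, toy_mhc)
```

Passing the scorers in as parameters keeps the window logic separate from the scoring models, so the same loop could be run once per HLA allele with a different `mhc_score`.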
15
Mutation Characterization
Mutations at the nucleotide level
may either change the resulting amino acid
(replacement) or not (silent).
We defined 9 types of replacement
mutations:
E2N, F2N, N2N
E2F, F2F, N2F
E2E, F2E, N2E
[Figure: the sequence PGRAFYATGEITGDIR annotated by region, in order: N (non-epitope), F (flanking), E (epitope), F (flanking), N (non-epitope).]
16
Mutation Characterization
The mutation type is based on the original affiliation
of the amino acid in the father sequence and its new
affiliation in the son sequence (whether it
belongs to an epitope, a flanking region or a non-epitope region).
For example, an E2N mutation is one that occurred in a
nucleotide belonging to an epitope in the father
sequence and resulted in the loss of this epitope in
the son sequence.
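The naming scheme above can be expressed as a small labelling step followed by a lookup. The sketch below makes several assumptions that are not stated in the slides: father and son share the same amino-acid indexing (no indels), epitopes are 9-mers, and the flanking region is a fixed number of residues on each side (`flank=2` is an arbitrary choice).

```python
from collections import Counter


def label_residues(length, epitope_starts, k=9, flank=2):
    """Label each residue E (epitope), F (flanking) or N (non-epitope)."""
    labels = ["N"] * length
    for start in epitope_starts:
        # Mark the window plus its flanks as F first ...
        for i in range(max(0, start - flank), min(length, start + k + flank)):
            labels[i] = "F"
    for start in epitope_starts:
        # ... then overwrite the epitope itself with E.
        for i in range(start, min(length, start + k)):
            labels[i] = "E"
    return "".join(labels)


def mutation_type(father_labels, son_labels, aa_index):
    """Name a mutation by its affiliation in father and son, e.g. 'E2N'."""
    return f"{father_labels[aa_index]}2{son_labels[aa_index]}"


def count_types(father_labels, son_labels, mutated_positions):
    """Count mutation types over a list of mutated amino-acid indices."""
    return Counter(
        mutation_type(father_labels, son_labels, i) for i in mutated_positions
    )


father = label_residues(16, [3])  # one epitope starting at index 3
son = label_residues(16, [])      # the epitope is lost in the son
counts = count_types(father, son, [0, 1, 5])
```

Here the mutation at index 5 is counted as E2N, the one at index 1 as F2N, and the one at index 0 as N2N.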
17
Results – HIV (Full Balance)
[Figure: Full Balance Per Nucleotide Affiliation. Y-axis: full balance; x-axis: HIV proteins (env, gag, nef, pol, rev, tat, vif, vpr, vpu); series: Epitope, Flanking, Non-epitope.]
Full Balance Calculation
Epitope: E2N + E2F – N2E – F2E
Flanking: F2N + F2E – N2F – E2F
Non-Epitope: N2E + N2F – E2N – F2N
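The three balance formulas can be computed directly from the mutation-type counts. In the sketch below the counts are made-up numbers, and the `length` argument stands in for the normalization by protein length; both are assumptions for illustration.

```python
def full_balance(counts, length):
    """Compute the epitope, flanking and non-epitope balances.

    counts: dict of mutation-type counts, e.g. {"E2N": 4, ...}.
    length: normalization length (assumed to be the protein length).
    """
    def c(key):
        return counts.get(key, 0)

    return {
        "epitope": (c("E2N") + c("E2F") - c("N2E") - c("F2E")) / length,
        "flanking": (c("F2N") + c("F2E") - c("N2F") - c("E2F")) / length,
        "non_epitope": (c("N2E") + c("N2F") - c("E2N") - c("F2N")) / length,
    }


# Made-up counts, only to exercise the formulas.
example_counts = {"E2N": 4, "E2F": 1, "N2E": 2, "F2E": 5, "F2N": 6, "N2F": 1}
raw = full_balance(example_counts, 1)
```

By construction each transition enters one balance with a plus sign and another with a minus sign, so the three balances always sum to zero; with these definitions a positive value means that the category lost more positions than it gained.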
18
The results were normalized by the average length of the proteins.
Results – HIV (Full Balance)
Consistent with HLA polymorphism, all HIV
proteins clearly tend to eliminate flanking
regions.
For most proteins, the non-epitope balance is
approximately 0, except for Nef and Vpu, which
accumulate epitopes more than the others, and Rev
and Vpr, which remove epitopes.
In the epitope balance, most proteins (again,
except for Rev and Vpr) create new epitopes
instead of removing them.
An interesting point is the total balance
within epitope and flanking regions, where there
is a tendency to remove cleavage sites by adding
epitopes.
19
Results – HIV (Transition Balance)
[Figure: Transition Balance for HIV Proteins. Y-axis: mutation frequency difference; x-axis: HIV proteins (env, gag, nef, pol, rev, tat, vif, vpr, vpu); series: E2N-N2E, E2F-F2E, F2N-N2F.]
20
The results were normalized by the average length of the proteins.
Results – HIV (Full Balance)
All HIV proteins tend to remove flanking
regions, either completely or by creating a
new epitope.
Rev and Vpr prefer to eliminate existing
epitopes without creating new epitopes.
21
Results – HBV (R/S Ratio)
HBV proteins with multiple copies undergo selection
against epitope presentation.
Pol is expressed at low levels and does not go through
the same selection.
Epitope-reducing mutations in other proteins come at
the expense of causing replacement mutations in the
overlapping regions of Pol.
22
Results – HBV (R/S Ratio)
R/S is the ratio between the number of
replacement and silent mutations.
The R/S ratio is significantly higher in regions
with two reading frames, since there are few
mutations that are simultaneously silent in the
two reading frames.
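The reason overlapping regions have few silent mutations can be illustrated by enumerating every possible single-nucleotide substitution and asking whether it is silent in one frame or in two frames at once. This counts possible substitutions rather than observed ones, so it is not the R/S ratio reported here. The sequence is arbitrary, and treating positions outside a complete codon as unconstrained is an assumption of the sketch.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def is_silent(dna, pos, new_nt, frame):
    """True if the substitution keeps the amino acid in the given frame."""
    start = pos - (pos - frame) % 3  # start of the codon containing pos
    if start < 0 or start + 3 > len(dna):
        return True  # no complete codon here: treated as unconstrained
    old = dna[start:start + 3]
    offset = pos - start
    new = old[:offset] + new_nt + old[offset + 1:]
    return CODON_TABLE[old] == CODON_TABLE[new]


def replacement_silent_counts(dna, frames):
    """Count possible substitutions that are replacement vs silent in all frames."""
    replacement = silent = 0
    for pos, old_nt in enumerate(dna):
        for new_nt in BASES:
            if new_nt == old_nt:
                continue
            if all(is_silent(dna, pos, new_nt, f) for f in frames):
                silent += 1
            else:
                replacement += 1
    return replacement, silent


dna = "ATGGCTCTGAAAGGT"  # arbitrary example sequence
r_one, s_one = replacement_silent_counts(dna, [0])
r_two, s_two = replacement_silent_counts(dna, [0, 1])
```

A substitution counted as silent with two frames must be silent in each of them, so `s_two` can never exceed `s_one`, which pushes the possible R/S ratio up in overlapping regions.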
23
Results – HBV (Epitope turnover)
[Figure: Epitope Turnover. Y-axis: no. of mutations affecting epitopes per 1000 bp in each father-son pair; x-axis: Pol-I/Pol-II, c1/c2, x1/x2; series: Overlapping (2 RFs), Non-overlapping (1 RF).]
Turnover Calculation: (E2N + N2E + N2F + F2N) / s
24
Results – HBV (Epitope turnover)
The epitope turnover is the number of mutations per 1,000
nucleotides either adding or removing an epitope between a
father sequence and its son in the phylogenetic tree.
In the non-overlapping regions of proteins C and X (one
reading frame), there is a higher turnover than in
overlapping regions.
In their overlapping regions (two reading frames), most
mutations are not allowed due to functional constraints.
Pol, which is expressed at low levels and does not tend to
remove epitopes, has a lower turnover in its non-overlapping
region. The higher turnover is seen in its
overlapping region, due to mutations meant to affect the
other genes.
25
Results – HBV
The number of mutations affecting the cleavage sites
was observed (epitope-removing mutations per 1000
nucleotides in each father-son pair in the phylogenetic tree).
The difference is significantly positive in practically all
regions.
Net Decrease in the Number of Cleavage Sites: F2N – N2F
26
Conclusions
In order for a virus to survive in the presence of a CTL
immune response, it must minimize the total number of
exposed epitopes.
In HIV and HBV, there is a clear tendency to remove
epitopes by eliminating cleavage sites. This may be the viral
solution against the HLA polymorphism.
In HBV, there is strong selection on Core, Surface and X
to remove epitopes.
Core and X have an easier time mutating their non-overlapping
regions, since in the overlapping regions Pol is
also affected.
Pol, having a low copy number, doesn’t try to remove
epitopes and is therefore mainly affected in overlapping
regions.
27
Conclusions
HIV removes cleavage sites by creating new epitopes.
A possible explanation:
The selection occurs only on the patient’s HLA alleles.
The other alleles not present in the host do not go
through the same selection.
A mutation that eliminates a cleavage site to avoid
epitope presentation by a specific HLA allele may
create a new epitope for a different allele.
In HIV, Rev and Vpr remove epitopes while other
proteins actually accumulate them.
28
Open Questions & Future Goals
Research further the phenomenon of
cleavage site destruction producing new
epitopes rather than non-epitope
nucleotides.
Characterize the changes in the epitope
density of a single HIV patient with known
HLA serotyping.
…
29
Acknowledgements
Thank you to:
Prof. Yoram Louzoun, for the dedicated
guidance…
Kobi Maman and the whole lab, for all the
help…
Prof. Ron Unger
Dr. Rachel Levy Drummer
Ariel Azia Amitai
30
Bibliography
Yewdell, J. W., E. Reits, and J. Neefjes. 2003. Making sense of
mass destruction: quantitating MHC class I antigen presentation. Nature
Reviews Immunology 3:952-961.
Vider-Shalit, T., M. Almani, R. Sarid, and Y. Louzoun. 2009. The HIV hide
and seek game: an immunogenomic analysis of the HIV epitope repertoire.
AIDS 23:1311-8.
http://www.righto.com/theories/hiv_genes.html
http://www.avert.org/hiv-virus.htm
http://peptibase.cs.biu.ac.il/peptibase/
http://www.hiv.lanl.gov/content/index
31