Dating the Origin of the CCR5-Δ32
AIDS-Resistance Allele by the
Coalescence of Haplotypes
J. Claiborne Stephens, David E. Reich, David B. Goldstein, Hyoung Doo Shin,
Michael W. Smith, Mary Carrington, Cheryl Winkler, Gavin A. Huttley, Rando Allikmets,
Lynn Schriml, Bernard Gerrard, Michael Malasky, Maria D. Ramos, Susanne Morlot,
Maria Tzetis, Carole Oddoux, Francesco S. di Giovine, Georgios Nasioulas, David Chandler,
Michael Aseev, Matthew Hanson, Luba Kalaydjieva, Damjan Glavac, Paolo Gasparini,
E. Kanavakis, Mireille Claustres, Marios Kambouris, Harry Ostrer, Gordon Duff,
Vladislav Baranov, Hiljar Sibul, Andres Metspalu, David Goldman, Nick Martin,
David Duffy, Jorg Schmidtke, Xavier Estivill, Stephen J. O’Brien, and Michael Dean
American Journal of Human Genetics 62:1507–1515, 1998
Presented by: Chad Brock, Lisa Ellison, and Travis Hagey
Cell Communication
One way cells communicate is through receptors.
A chemokine receptor is a particular type of protein found in
the cell membrane, used by the cell to send and receive
chemical messages to/from other cells.
What HIV Does
The CCR5 gene produces the
CCR5 chemokine receptor that,
with CD4, serves as an entry port
for HIV-1 strains that infect white
blood cells.
HIV attaches to the CCR5 and
CD4 proteins of the macrophage
cell membrane, inserting its viral
RNA into the cell.
CCR5-Δ32
The CCR5-Δ32 mutation leads to
truncation and loss of the receptor on
lymphoid cells.
Homozygous individuals have nearly
complete resistance to HIV-1
infection despite repeated exposure.
In heterozygous individuals, the
onset of AIDS is delayed by two to
three years relative to CCR5-+/+
individuals.
http://www.hivmirror.com/what_we_do.php
Frequency of the CCR5-Δ32
Allele in Defined Populations
38 ethnic populations including 4,166 individuals were tested for the CCR5-Δ32 allele (table 1).
CCR5-Δ32 deletion
High allele frequency among several
Caucasian populations
Rarity or absence in non-Caucasian
populations
This led to the theory that the mutation occurred
only once, in the ancestry of Caucasians, after
they migrated out of Africa
European Distribution of the
CCR5-Δ32 Variant
A north-to-south cline of
allele frequency is affirmed
as well as the absence of
CCR5-Δ32 among East
Asian, Middle Eastern, and
American Indian
populations.
http://biology.plosjournals.org/perlserv/?request=getdocument&doi=10.1371/journal.pbio.0030397
CCR5-Δ32 Loci on Chromosome 3
The time of origin of the CCR5-Δ32
mutation was estimated on the basis
of the persistence of a common
ancestral three-locus haplotype
among modern CCR5-Δ32-bearing
chromosomes.
This haplotype includes:
•CCR5-Δ32 (gene of interest)
•GAAT (microsatellite)
•AFMB (microsatellite)
CCR5 Haplotypes Observed in
Modern Caucasians
Of the CCR5-Δ32-bearing haplotypes
observed, 85% are Δ32-197-215.
Thus, the authors suggest that this is
the ancestral haplotype.
The authors suspect that the frequency
of this haplotype was raised by
natural selection.
The time to a common ancestor
(the time of origin of the Δ32
mutation) was estimated using
coalescent methods based on the
modern distribution of derivative Δ32
haplotypes.
Microsatellites
Microsatellites are repeating sequences in the ‘junk’ DNA areas.
Non-coding (don’t code for proteins)
Not under selection
High rate of mutation
The authors evaluated seven
microsatellites on chromosome 3 for
linkage with the CCR5-Δ32 allele.
GAAT and AFMB were found to show
significant linkage with the CCR5-Δ32
allele.
GAAT has 3 possible alleles:
197 base pairs
193 base pairs
191 base pairs
AFMB has 4 possible alleles:
215 base pairs
217 base pairs
219 base pairs
213 base pairs
Age of mutation under drift
Ne = 5000
25 years/generation
p = 0.1 (current allele frequency)
If the allele started out either fixed (p = 1) or very
rare (p = 0):
$-4N_e\,[\,p \ln p + (1-p)\ln(1-p)\,]$
yields 6500 generations
162,500 years ago
Since CCR5-Δ32 isn’t present in non-Caucasian populations, we can assume the starting p
was close to 0, so we can use:
$-4N_e\,\dfrac{p \ln p}{1-p}$
yields 5100 generations
127,500 years ago
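The two drift-based ages above can be checked numerically. The following is a minimal Python sketch, assuming the slide's values (Ne = 5000, p = 0.1, 25 years per generation); the function names are illustrative and do not come from the paper.

```python
import math

NE = 5000              # effective population size (from the slide)
P = 0.1                # current allele frequency (from the slide)
YEARS_PER_GEN = 25     # generation time (from the slide)


def age_unknown_start(ne, p):
    """Expected age in generations when the allele began either fixed or very rare."""
    return -4 * ne * (p * math.log(p) + (1 - p) * math.log(1 - p))


def age_rare_start(ne, p):
    """Expected age in generations when the allele began at a frequency near 0."""
    return -4 * ne * p * math.log(p) / (1 - p)


for label, age_fn in [("fixed or rare start", age_unknown_start),
                      ("rare start", age_rare_start)]:
    generations = age_fn(NE, P)
    print(f"{label}: {generations:.0f} generations, "
          f"{generations * YEARS_PER_GEN:.0f} years")
```

Both results land close to the rounded figures on the slide (about 6500 and 5100 generations).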
Age estimation based on haplotype variation
By looking at which alleles are
most commonly found with
CCR5-Δ32, the authors concluded
that the ancestral haplotype for
CCR5-Δ32 was GAAT-197 and
AFMB-215 (85%)
In order to estimate the age of the
CCR5-Δ32 mutation using the
frequency of the ancestral haplotype,
the authors first needed estimates for
the rates of mutation and
recombination.
Age estimation based on haplotype variation
Estimation of r
r = total rate of change from ancestral haplotype
= μ + c (μ = mutation rate) (c = recombination rate)
Used previous microsatellite mutation rate
estimations from Weber and Wong (1993)
μ = .001 as an upper limit at GAAT and AFMB
Age estimation based on haplotype variation
Estimation of c
Based on wild-type haplotype frequencies,
the authors were able to estimate that 1 cM
(centimorgan, a recombination distance) is equal
to 3.76 cR (centiray, a physical distance)
Used radiation-hybrid analysis to estimate
physical distances for CCR5, GAAT, and
AFMB.
CCR5 is 0.8 cR from GAAT (0.21 cM)
GAAT is 2.7 cR from AFMB (0.72 cM)
This means there is a 0.21% recombination rate between CCR5
and GAAT and a 0.72% recombination rate between GAAT and
AFMB
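The conversion from physical to genetic distance can be reproduced in a few lines. This is a minimal sketch using only the figures quoted on the slides; the variable names are illustrative, and treating 1 cM as roughly a 1% recombination chance per generation is an approximation that holds for short distances like these.

```python
CR_PER_CM = 3.76   # centirays per centimorgan (conversion quoted on the slide)

# Physical distances from the radiation-hybrid analysis, in centirays.
distances_cr = {
    "CCR5-GAAT": 0.8,
    "GAAT-AFMB": 2.7,
}

# Convert each physical distance to a genetic distance in centimorgans.
distances_cm = {pair: cr / CR_PER_CM for pair, cr in distances_cr.items()}

for pair, cm in distances_cm.items():
    # Over short distances, 1 cM is about a 1% recombination chance per generation.
    print(f"{pair}: {cm:.2f} cM, recombination fraction ~{cm / 100:.4f}")
```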
Age estimation based on haplotype variation
Estimation of c
Looking at CCR5-+ (wild-type) chromosomes, if a recombination
were to occur between CCR5 and GAAT, 36% of the time it would be
with a chromosome carrying the same microsatellite haplotype,
leaving the haplotype unchanged. The remaining 64% of
recombinations between CCR5 and GAAT would place CCR5
next to a different microsatellite allele.
Also, 48% of the wild-type haplotypes do not have the 215 bp
AFMB allele (30.8% + 1.4% + 14.4% + 1.4% = 48%), so 48% of
recombinations between GAAT and AFMB would bring in a
different AFMB allele.
Age estimation based on haplotype variation
Estimation of r
Combining these values:
c = 0.64 × 0.0021 + 0.48 × 0.0072 ≈ 0.005
This is the rate of recombination events that would
move CCR5-Δ32 from the Δ32-197-215 haplotype
onto a different haplotype.
r = μ + c = 0.001 + 0.005 = 0.006
This ignores events that change a haplotype
back to the original haplotype
(very rare).
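The arithmetic behind c and r can be written out as a short Python sketch. All numbers are the ones quoted on the slides; the constant names are illustrative only.

```python
# Recombination fractions per generation (from the slides).
REC_CCR5_GAAT = 0.0021   # 0.21% between CCR5 and GAAT
REC_GAAT_AFMB = 0.0072   # 0.72% between GAAT and AFMB

# Fraction of recombination events that actually change the haplotype (from the slides).
FRAC_CHANGE_GAAT = 0.64
FRAC_CHANGE_AFMB = 0.48

MU = 0.001   # upper-limit microsatellite mutation rate (from the slides)

# Effective recombination rate away from the ancestral haplotype.
c = FRAC_CHANGE_GAAT * REC_CCR5_GAAT + FRAC_CHANGE_AFMB * REC_GAAT_AFMB
# Total rate of change away from the ancestral haplotype.
r = MU + c

print(f"c = {c:.4f}")
print(f"r = {r:.4f}")
```

Before rounding, c is 0.0048 and r is 0.0058, which the slides report as 0.005 and 0.006.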
Estimation of selection coefficient
To estimate the selection pressure needed to
change CCR5-Δ32’s frequency from 0 to 0.1
in G generations:
$p' = \dfrac{p\,(p\,w_{11} + q\,w_{12})}{\bar{w}}$, where $q = 1 - p$
$w_{11}$, $w_{12}$, and $\bar{w}$ depend on whether CCR5-Δ32 is dominant, codominant, or recessive.
If dominant, $w_{11} = w_{12} = 1$
$w_{22} = 1 - s$
$\bar{w} = 1 - s q^2$
Trial values of s were used until p’ = 0.1
after G generations of selection.
Initial p = 0.0005 and 0.0001
(1/(2Ne) if Ne = 1,000 and 5,000)
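The trial-and-error search for s can be sketched in Python for the dominant case. This is an illustration under stated assumptions, not the authors' procedure: bisection stands in for manual trial values, G = 28 is borrowed from the haplotype-based estimate elsewhere in the slides, and all function names are hypothetical.

```python
def next_frequency_dominant(p, s):
    """One generation of selection when the delta-32 allele is dominant.

    Fitnesses: w11 = w12 = 1, w22 = 1 - s, so mean fitness is 1 - s * q**2.
    """
    q = 1.0 - p
    mean_fitness = 1.0 - s * q * q
    return p * (p * 1.0 + q * 1.0) / mean_fitness


def frequency_after(p0, s, generations):
    """Iterate the recursion for a fixed number of generations."""
    p = p0
    for _ in range(generations):
        p = next_frequency_dominant(p, s)
    return p


def solve_s(p0, target, generations, tol=1e-10):
    """Bisect on s in (0, 1) until the final frequency matches the target."""
    lo, hi = 0.0, 1.0 - 1e-9
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if frequency_after(p0, mid, generations) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


G = 28   # assumed number of generations, for illustration only
for p0 in (0.0005, 0.0001):
    s = solve_s(p0, target=0.1, generations=G)
    print(f"p0 = {p0}: s ~ {s:.3f}")
```

Bisection works here because the final frequency rises steadily as s increases, so there is exactly one s that hits the target.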
Estimating the age of CCR5-Δ32
• Stephens et al. present an equation to calculate the age of the CCR5-Δ32
mutation based on its level of LD
• Assuming the mutation was unique, at time zero it will be in complete
LD with the alleles at the neighboring loci
• With an estimate of the rate of recombination between the locus of
interest and nearby loci, the age of the mutation may be gauged by the
degree of decay of LD
• In order to do this, the ancestral haplotype has to be identified
Estimating the ancestral haplotype
• As alluded to earlier, the authors use the relative frequencies of the
different CCR5-Δ32-bearing haplotypes to estimate the ancestral
haplotype
• Δ32-197-215 is the most common CCR5-Δ32-bearing haplotype
(84.8%) and is one mutational step away from the most common nonmutant haplotype, +-197-215
• Thus, Δ32-197-215 was identified as the most likely ancestral
haplotype for the CCR5-Δ32 mutation and all other CCR5-Δ32-bearing
haplotypes were considered derived
Calculating the number of generations since
the mutation
• The probability that a given haplotype does not change from its
ancestor G generations ago is the following:
– $P = (1-r)^G \approx e^{-rG}$
• Solving for G, we get the following:
– $G = -\ln(P)/r$
• To estimate P, they use the proportion of observed haplotypes that are
ancestral
• Note: This estimate was originally derived for a dramatically
expanding population but also holds for a constant-sized population in
which many lineages are highly correlated (extensive periods of
coancestry)
• Variance in estimates of T, however, does depend on tree topology. Why?
Tree topology and variance in estimates of T
Calculating G and T
• Substituting the present frequency of the ancestral haplotype
(0.848) for P and the authors’ estimate of r (0.006) into Equation
2 gives G = 27.5 generations
• Assuming a 25-year human-generation time:
– T = 25*27.5 = 688 Years
• This calculation, however, assumes the present frequency and r
are known without error
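The point estimate can be reproduced directly from G = -ln(P)/r. Below is a minimal Python sketch using the slide's P = 0.848 and r = 0.006; the lower r values (0.004 and 0.002) are the alternatives the authors also considered, and the function name is illustrative.

```python
import math

YEARS_PER_GEN = 25    # generation time assumed on the slides
P_ANCESTRAL = 0.848   # present frequency of the ancestral haplotype


def generations_since_origin(p_ancestral, r):
    """Solve P = exp(-r * G) for G."""
    return -math.log(p_ancestral) / r


for r in (0.006, 0.004, 0.002):
    g = generations_since_origin(P_ANCESTRAL, r)
    print(f"r = {r}: G = {g:.1f} generations, T = {g * YEARS_PER_GEN:.0f} years")
```

The outputs agree with the reported values up to rounding of the inputs, and they show how strongly the age estimate depends on r: halving r doubles G.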
Accounting for uncertainty in parameter
estimates
• The authors note two potential sources of error, in their estimates of r and P
• For r, the regression is consistent with the possibility of a 10%-20% reduction
in recombination from their estimate of r in the region where the haplotype
resides
• When they considered lower values of r (i.e. 0.004 or 0.002), the G (and thus
T) estimates were still well within the range of recorded human history (G =
41.3 and 82.5, and T = 1,032 and 2,064 years, respectively)
• To check their estimates of P, ancestral haplotype frequencies were estimated
from a larger sample of 1,400 chromosomes
• Frequencies similar to those estimated from the smaller original sample were
found and gave an estimated G = 16-31 and T = 402-766 years
Coalescent simulations
• To further evaluate their estimate of G (= 28), the authors conducted
coalescent simulations incorporating a complete Markov transition
matrix (Reich and Goldstein, 1999)
• This approach considers regeneration of the ancestral haplotype and
incorporates a number of different population growth models (see
Hudson, 1990)
• From these simulations, 95% CI were derived
• They performed 1,000 simulations for each combination of
demographic parameters (population sizes ≤ 100,000, exp. growth
rates from zero to rapid)
Markov transition matrix
• The Markov transition matrix was calculated as follows:
– $K = cR + \mu M + (1 - c - \mu)I$
Where:
– K = Markov transition matrix
– R = Recombination matrix
– M = Mutation matrix
– I = No event occurring matrix (Identity Matrix)
– c = recombination rate
– μ = mutation rate
Recombination Matrix
a = Pr(AH|RE), the probability of the ancestral haplotype (AH) given a recombination event (RE)
Mutation Matrix
b = proportion of alleles one mutation step away from AH
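To show how a transition matrix of the form K = cR + μM + (1 - c - μ)I behaves, here is a toy two-state Python sketch. The matrices R and M below are hypothetical stand-ins (state 0 is the ancestral haplotype, state 1 is anything else) and are not the authors' matrices; `a` follows the slide's Pr(AH|RE), while `back_mutation` is an invented parameter that is not the slide's b. The values 0.3 and 0.2 are arbitrary.

```python
def transition_matrix(c, mu, R, M):
    """K = c*R + mu*M + (1 - c - mu)*I for square matrices given as nested lists."""
    n = len(R)
    return [[c * R[i][j] + mu * M[i][j] + (1 - c - mu) * (1.0 if i == j else 0.0)
             for j in range(n)] for i in range(n)]


def matmul(A, B):
    """Plain nested-list matrix product."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def matrix_power(K, g):
    """Multiply K by itself g times, starting from the identity."""
    n = len(K)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(g):
        result = matmul(result, K)
    return result


def prob_still_ancestral(c, mu, a, back_mutation, g):
    """Probability that a lineage starting as ancestral is ancestral after g generations."""
    # Hypothetical R: any recombination event yields the ancestral haplotype with probability a.
    R = [[a, 1 - a], [a, 1 - a]]
    # Hypothetical M: a mutation always changes an ancestral haplotype and
    # restores a derived one with probability back_mutation.
    M = [[0.0, 1.0], [back_mutation, 1 - back_mutation]]
    return matrix_power(transition_matrix(c, mu, R, M), g)[0][0]


C, MU, G = 0.005, 0.001, 28
no_regeneration = prob_still_ancestral(C, MU, a=0.0, back_mutation=0.0, g=G)
with_regeneration = prob_still_ancestral(C, MU, a=0.3, back_mutation=0.2, g=G)
print(f"no regeneration:   {no_regeneration:.4f}")
print(f"(1 - r)^G:         {(1 - C - MU) ** G:.4f}")
print(f"with regeneration: {with_regeneration:.4f}")
```

With no regeneration the chain reduces to the simple (1 - r)^G formula. Allowing the ancestral haplotype to be regenerated raises the probability of observing it, which is why the simulations account for it.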
Variance estimates for G (and T)
• The simulations considered each combination of demographic
parameters separately
• Of all the simulated genealogies, only those that resulted in the
observed number of ancestral haplotypes were considered
further
• From this subset of trees, they produced a 95% confidence
interval around T
• Adding a further constraint on recent European population size
(N > 5,000) a 95% CI for T was estimated to be 275-1,875 Years
Conclusion
•The age of the CCR5-Δ32 mutation is estimated
to be approx. 700 years, via coalescent-based
estimates
•Because the mutation probably arose so recently
and gives strong resistance to AIDS,
it is suspected that a strong selective pressure
(possibly an ancient plague) was responsible for
its high frequency in Eurasia.
•The bubonic plague bacillus
(Yersinia pestis), Shigella,
Salmonella, and Mycobacterium
tuberculosis are all candidate
sources for the selective pressure
since they all attack the immune
system.
•Additional possible diseases are
syphilis, smallpox, and influenza.
The bubonic plague, which claimed the
lives of 25%-33% of Europeans ~650
years ago, is an obvious candidate.
Other deleterious mutations with
positive side effects.
Sickle-cell anemia, thalassemia, and Duffy mutations all
impart resistance to malaria.
Similar hypotheses may also apply to Tay-Sachs disease
and cystic fibrosis.
Red Blood Cell showing sickle shape
due to sickle-cell anemia.
Key Words and Terms
•Allele – A particular ‘flavor’ of gene. Ex. Gene-hair color. Allele-blond, or brown, or red…
•Haplotype – A particular set of alleles. Ex. Blond hair, fair skin, and blue eyes is one haplotype. Brown hair, brown
eyes, and olive complexion is another haplotype.
•Genotype – All the alleles an organism possesses. Can be expressed or unexpressed.
•Wild Type Haplotype – The ancestral, non-mutated haplotype. Signified by a + symbol.
Ex. CCR5-+ (wild type) vs. CCR5-Δ32 (mutated type).
•Microsatellite – A non-coding (isn’t used to make proteins) region of DNA which is inheritable, not under selection,
and has a high rate of mutation. Because of these traits, they’re often used to establish relationships among individuals
and/or populations. Different microsatellite alleles contain different numbers of base pairs. Ex. 215 base pairs vs. 217
base pairs.
•Locus (plural: loci) – A gene’s location on a chromosome.
•Recombination – “Crossing over.” During meiosis (the formation of the gametes, i.e. sperm & egg), the chromosomes
from the mother and father cross over and exchange parts of the DNA strand, mixing up the alleles from each parent
onto one chromosome, affecting gene linkage.
http://en.wikipedia.org/wiki/Image:Morgan_crossover_1.jpg
•Gene Linkage – When genes are close to one another on the chromosome, the alleles tend to be inherited together.
Crossover events move alleles to different chromosomes so they are no longer inherited together.
•Macrophages & Monocytes – Both are part of the non-specific immune system.
Common name: White blood cells. Located in the blood and in tissues. When in the vicinity of a foreign object
(invading virus or bacteria), they engulf and digest it.
http://en.wikipedia.org/wiki/Monocytes
•Chemokine Receptor – A particular type of protein found in the cell membrane, used by the cell to send and receive
chemical messages to/from other cells.
•Explain what r, μ, and c represent in the context of this paper. r = μ + c
r – The combined mutation and recombination rate. μ – The estimated microsatellite mutation rate. c – The estimated
recombination rate.