Medical Genetics:
Complex disorders
Lecturer: David Saffen. Ph.D.
Laboratory for Molecular Neuropsychiatric Genetics
Department of Cellular and Genetic Medicine
School of Medicine, Fudan University
Outline
A. Historical background
B. Phenotypes in populations
C. Genes in populations
D. Mapping disease genes
E. Complex disorders
A. Historical background
• Francis Galton:
normal distributions of quantitative traits
• Ronald A. Fisher:
polygenic models for quantitative traits
Biometrics and Mendelian genetics
Francis Galton was a pioneer in using statistical
methods to quantify human traits and behaviors.
For example, he recognized that the distributions of
many traits, such as height, weight, and intelligence, closely
approximate the “normal” (aka “Gaussian”)
distribution. He also recognized that inherited traits
tend to move toward average values, a phenomenon
he termed “regression to the mean.”
Sir Francis Galton 1822-1911
(English Victorian polymath;
Cousin of Charles Darwin;
biometrician; eugenicist)
Most of Galton’s work on inheritance was carried
out before the re-discovery of Mendel’s experiments.
The paradigm under which Galton and other
“biometricians” worked was that inheritance of human
traits involved the mixing or blending of factors present
in the parents. This picture is very different from that
obtained from Mendel’s experiments, which implied that
inherited traits are determined by discrete factors that
remain unchanged from generation to generation.
Normal distributions of quantitative traits
and “regression to the mean”
(Figure labels: 45° slope; mean height; SD = standard deviation)
Unification of biometrics and Mendelian genetics
RA Fisher was a 20th century genius who made
fundamental contributions to the fields of statistics
and biology. In statistics, he developed analysis
of variance (ANOVA), the maximum likelihood
method for estimating the values of parameters based
on experimental data, permutation testing to estimate
statistical significance (P-values), and exact tests for
estimating statistical significance in small samples.
Sir Ronald A. Fisher (1890-1962)
English statistician, evolutionary
biologist, geneticist, eugenicist;
Published “The Correlation between Relatives on the
Supposition of Mendelian Inheritance” in 1918.
In biology, Fisher (together with Sewall Wright and
J.B.S. Haldane) is considered one of the founders
of the “Modern Evolutionary Synthesis,” which unified
Darwin’s theory of natural selection with Mendel-inspired
concepts of modern genetics. Among many
contributions, Fisher was the first to propose the idea
of heterozygote advantage to explain the persistence
of harmful genetic variants in certain populations.
The combined effects of multiple genes can
produce normal distributions for quantitative traits
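The statement above can be illustrated with a minimal sketch. It assumes n independent loci with two alleles each, equal additive effects, and the same frequency p for the trait-increasing allele at every locus. These simplifications, and the function name, are assumptions made for illustration and are not taken from the lecture. Under them, the number of trait-increasing alleles carried by an individual follows a binomial distribution, which looks increasingly bell-shaped as the number of loci grows.

```python
from math import comb


def increasing_allele_distribution(n_loci, p=0.5):
    """Probability of carrying k = 0..2n trait-increasing alleles.

    Assumes n_loci independent biallelic loci, each with
    trait-increasing allele frequency p (hypothetical model).
    """
    n_alleles = 2 * n_loci  # two alleles per locus in a diploid individual
    return [
        comb(n_alleles, k) * p**k * (1 - p) ** (n_alleles - k)
        for k in range(n_alleles + 1)
    ]


# One locus gives three classes; more loci give a smoother, bell-like shape.
for n in (1, 3, 10):
    dist = increasing_allele_distribution(n)
    print(n, [round(x, 3) for x in dist])
```

With one locus the three classes have probabilities 0.25, 0.5 and 0.25; with ten loci there are 21 classes and the histogram is close to a normal curve.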
B. Phenotypes in populations
• Familial aggregation of disease
• Disease risk as a quantitative trait
• Factors that obscure patterns of inheritance
Familial aggregation of disease
• Relative risk
λr = (prevalence of the disease among the relatives of an affected person)
/ (prevalence of the disease in the general population)
• Concordance and allele sharing among relatives
0.25 (2 shared alleles) + 0.5 (1 shared allele)
+ 0.25 (0 shared alleles) = 1 shared allele (average).
One-of-two alleles shared = 50% shared alleles.
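The two quantities above can be worked through in a short sketch. The sharing probabilities 0.25 / 0.5 / 0.25 are the ones given in the slide, read here as the case of two full siblings (an assumption about which relatives the slide refers to). The prevalence values passed to `relative_risk` are hypothetical numbers chosen only for illustration.

```python
from fractions import Fraction

# Probability that two full siblings share 2, 1 or 0 alleles at a locus
SIB_SHARING = {2: Fraction(1, 4), 1: Fraction(1, 2), 0: Fraction(1, 4)}


def expected_shared_alleles(sharing):
    """Average number of shared alleles, weighted by probability."""
    return sum(n_shared * prob for n_shared, prob in sharing.items())


def relative_risk(prevalence_in_relatives, prevalence_in_population):
    """Ratio of disease prevalence in relatives to population prevalence."""
    if prevalence_in_population <= 0:
        raise ValueError("population prevalence must be positive")
    return prevalence_in_relatives / prevalence_in_population


mean_shared = expected_shared_alleles(SIB_SHARING)  # 1 shared allele on average
fraction_shared = mean_shared / 2                   # 1 of 2 alleles = 1/2
print(mean_shared, fraction_shared)

# Hypothetical prevalences: 9% among siblings vs 1% in the population
print(round(relative_risk(0.09, 0.01), 2))
```

Exact fractions are used for the sharing calculation so that the average comes out as exactly 1 rather than a rounded float.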
Correlations between risk of developing schizophrenia
and the degree of relatedness among relatives
(Figure: relatives grouped by fraction of shared alleles: 1/8, 1/4, 1/2)
Disease risk as a quantitative trait:
Susceptibility and protective genes influence
the liability (risk) of developing a polygenic disease
Model for schizophrenia liability
Factors that obscure patterns
of disease inheritance
• Phenocopy
• Variable penetrance and expressivity
• Locus heterogeneity
• Allelic heterogeneity
• Environmental influences
C. Genes in populations
• DNA polymorphisms
• Allele and genotype frequencies
• Ethnic differences in allele frequencies and
susceptibility to disease
• Out of Africa
DNA polymorphisms
Single nucleotide polymorphisms (SNPs)
Microsatellites
Minisatellites
Technically, a polymorphism is a genetic variant that is
present in a population at a frequency of > 1%.
SNPs and haplotypes
rs384756 (A > C) or (T > G) [0.67/0.33]
Chromosome 1
AATGCCGATTCAGGGCTTAACG
TTACGGCTAAGTCCCGAATTCG
Chromosome 2
AATGCCGATTCCGGGCTTAACG
TTACGGCTAAGGCCCGAATTCG
Haplotypes
Chromosome   SNP1 (A/C)   SNP2 (G/T)   SNP3 (C/A)   SNP4 (T/C)
H1           A            G            C            T
H2           A            T            C            T
H3           C            T            A            C
H4           C            G            A            T
Allele and genotype frequencies in populations:
the Hardy-Weinberg equilibrium (HWE)
Let:
PA = population frequency of A-allele = p
Pa = population frequency of a-allele = q
PA + Pa = p + q = 1
                     Sperm A-allele (p)   Sperm a-allele (q)
Egg A-allele (p)     AA (p²)              aA (qp)
Egg a-allele (q)     Aa (pq)              aa (q²)
Genotypes:     AA + 2Aa + aa
Frequencies:   p² + 2pq + q² = 1
Assumptions: large population; no new mutations, selection, or migration
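As a worked example of the Hardy-Weinberg relation p² + 2pq + q² = 1, the sketch below computes expected genotype frequencies from an allele frequency. The function name is hypothetical, and the value p = 0.67 is borrowed from the SNP allele frequencies [0.67/0.33] shown on the earlier slide purely as an illustration. The result only holds under the assumptions listed above.

```python
def hardy_weinberg(p):
    """Expected genotype frequencies for A-allele frequency p under HWE."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must be between 0 and 1")
    q = 1.0 - p  # frequency of the a-allele
    return {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}


freqs = hardy_weinberg(0.67)
for genotype, freq in freqs.items():
    # roughly 0.449 (AA), 0.442 (Aa), 0.109 (aa)
    print(genotype, round(freq, 3))

# The three genotype frequencies always sum to 1
print(round(sum(freqs.values()), 6))
```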
Frequency of ΔCCR5 alleles in
Europe, the Middle East and India
ΔCCR5 is a 32 bp deletion within the chemokine receptor, subtype 5 (CCR5) gene.
(Individuals homozygous for ΔCCR5 are resistant to some strains of HIV.)
Out of Africa
D. Mapping disorder genes
• Linkage
• Linkage analysis within pedigrees
• Linkage analysis within populations
• Whole exome or genome sequencing
• The “architecture” of complex disorders
Linkage
(Figure: Chromosome 1 and Chromosome 2, each carrying a “marker” locus
(alleles M/m) and a disease risk locus (alleles D/d; a = liability allele);
recombination between the two loci; and the resulting gametes.)
The alleles of chromosomal markers located in close proximity to a
disease risk locus tend not to be separated by recombination and therefore
co-assort with the risk and non-risk alleles. In the case above,
the “m” marker allele co-assorts with the “a” risk allele.
For this reason, the marker tends to be co-inherited with the disease.
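The co-assortment described above can be made quantitative with a minimal sketch. It assumes a parent heterozygous at both loci, with marker allele M and risk-locus allele D on one chromosome and m and d on the other. This phase, the allele labels, and the function name are assumptions for illustration. The recombination fraction θ is the probability that a gamete is recombinant; it is near 0 for tightly linked loci and 0.5 for unlinked loci.

```python
def gamete_frequencies(theta):
    """Gamete frequencies from a parent with haplotypes M-D and m-d.

    theta is the recombination fraction between the marker locus
    and the disease risk locus (0 = complete linkage, 0.5 = unlinked).
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must be between 0 and 0.5")
    parental = (1.0 - theta) / 2.0   # each non-recombinant gamete type
    recombinant = theta / 2.0        # each recombinant gamete type
    return {"MD": parental, "md": parental, "Md": recombinant, "mD": recombinant}


for theta in (0.0, 0.05, 0.5):
    print(theta, gamete_frequencies(theta))
```

At θ = 0 only the parental gametes MD and md appear, so the marker is always co-inherited with the same risk-locus allele; at θ = 0.5 all four gamete types are equally frequent and the marker carries no information about the risk locus.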
Linkage analysis within pedigrees
• Examines the coinheritance of a disease with
chromosomal markers within multiple extended families
• SNPs, minisatellites, or microsatellites are often used as markers
• Usually allows localization to only a general region of a
particular chromosome (e.g. within several million bp)
• Additional mapping is required to identify the disease gene
• Relatively insensitive to allelic heterogeneity
Example
Linkage-analysis within populations:
case-control association studies
• Depends upon “linkage-disequilibrium” between genetic
markers and disease risk variants
• Genetic markers are usually SNPs or CNVs.
Commercially available genotyping platforms allow over
1 million SNPs to be examined for association with
disease in each individual tested.
• Sensitive to allelic heterogeneity, but allows localization
of susceptibility alleles to within 10 – 100 kb.
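A case-control comparison at a single SNP can be sketched as a 2x2 table of allele counts. The example below computes the allelic odds ratio and a Pearson chi-square statistic with 1 degree of freedom. The counts are hypothetical, the function names are illustrative, and this simple allelic test is only one of several ways such data are analyzed; real studies also correct for testing very many SNPs.

```python
from math import erfc, sqrt


def allelic_association(case_risk, case_other, control_risk, control_other):
    """Odds ratio and Pearson chi-square (1 df) for a 2x2 allele-count table."""
    n = case_risk + case_other + control_risk + control_other
    odds_ratio = (case_risk * control_other) / (case_other * control_risk)
    cross_diff = case_risk * control_other - case_other * control_risk
    chi2 = n * cross_diff**2 / (
        (case_risk + case_other)
        * (control_risk + control_other)
        * (case_risk + control_risk)
        * (case_other + control_other)
    )
    return odds_ratio, chi2


def chi2_p_value_1df(chi2):
    """Upper-tail P-value of a chi-square statistic with 1 degree of freedom."""
    return erfc(sqrt(chi2 / 2.0))


# Hypothetical allele counts: cases 120 risk / 80 other, controls 90 / 110
odds_ratio, chi2 = allelic_association(120, 80, 90, 110)
print(round(odds_ratio, 2), round(chi2, 2), chi2_p_value_1df(chi2))
```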
Linkage-disequilibrium (LD)
SNP1 (A/a)
SNP2 (B/b)
Possible haplotypes: AB, Ab, aB, ab
D’ = (P_AB − P_A·P_B) / D_MAX
P_A = frequency of allele A in the population
P_B = frequency of allele B in the population
P_AB = frequency of association of alleles A and B
D_MAX = maximal value of |P_AB − P_A·P_B|
D’ = 0 when there is no LD;
±1 for complete association or dissociation.
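The D’ calculation can be sketched as follows. The slide defines D_MAX only as the maximal value of |P_AB − P_A·P_B|; the sketch assumes the commonly used normalization in which that maximum depends on the allele frequencies and on the sign of the numerator. The function name and the example frequencies are hypothetical.

```python
def d_prime(p_ab, p_a, p_b):
    """Signed D' for haplotype frequency p_ab and allele frequencies p_a, p_b."""
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    if d_max == 0:
        return 0.0  # a monomorphic locus carries no LD information
    return d / d_max


print(d_prime(0.25, 0.5, 0.5))  # alleles independent: no LD
print(d_prime(0.50, 0.5, 0.5))  # A always found with B: complete association
print(d_prime(0.00, 0.5, 0.5))  # A never found with B: complete dissociation
```

The three calls reproduce the limiting cases named above: 0 for no LD, +1 for complete association, and −1 for complete dissociation.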
TPH2 haplotypes
Linkage-disequilibrium (D’) map of the
tryptophan hydroxylase 2 (TPH2) gene
Case-control association analysis
(population-based method)
Whole exome sequencing and
whole genome sequencing
Lupski JR et al, Whole genome sequencing in a patient with Charcot-Marie-Tooth neuropathy,
The New England Journal of Medicine 362, 1181-1191, 2010
The “architecture” of complex diseases
Common disease-common variant (CDCV) model
vs
Common disease-rare variant (CDRV) model
Case-control association studies are an effective tool for
identifying common susceptibility alleles of moderate effect.
By contrast, large-scale DNA sequencing is more efficient
for detecting rare genetic variants of large effect.
Sequenced genetic variants in an individual
(Ref: Lupski JR et al, New England J. Medicine 362, 2010)
*In this table, “SNPs” also includes small indels and
other possible duplications or deletions.
Total “SNPs” = 2,858,587 known + 561,719 novel.
Note: this study also identified 234 CNVs ranging
in size from 1690 to 1,627,813 bp; 220 of these
overlap with known CNVs.
E. Selected complex disorders
• Digenic retinitis pigmentosa
• Venous thrombosis
• Hirschsprung disease
• Coronary artery disease
• Alzheimer’s disease
Digenic retinitis pigmentosa
Ref: Kajiwara K et al, Science 264, 1994
Venous thrombosis
(Risk influenced by two genes + “environmental” factor)
Factor V Arg506Gln: WT (Arg); Leiden (Gln)
Factor V Leiden (FVL) is more stable than wild type Factor V.
Risk for thrombosis is increased 7-fold for heterozygotes
(80-fold for homozygotes).
~5% of Caucasians are heterozygous for FVL.
3’-UTR SNP G20210A (rs1799963): the A-allele increases
levels of prothrombin mRNA.
~2.9% of Caucasians are heterozygous at this locus.
OC = oral contraceptives; these increase expression levels
of Factor X and prothrombin
a = activated form
Note: OC use + rs1799963-A increases risk of cerebral vein
thrombosis 30- to 150-fold!
Hirschsprung disease [HSCR]
(Congenital aganglionic megacolon)
Incidence ~1/5000 children; males are affected
2-4 times as frequently as females. Cause:
incomplete development of the enteric nervous
system (myenteric plexus) in one or more segments of the
colon. Lack of these nerves prevents the colon from
relaxing, resulting in intestinal blockage. To date, at
least ten genes have been implicated in HSCR. Among
these, the receptor tyrosine kinase RET has been
identified as the major disease-causing gene.
Proportion of cases:
Syndromic: 18%
Associated with abnormal chromosomes: 12%
Sporadic: 70%
Inheritance:
Short segment (S-HSCR): recessive or multigenic
Long segment (L-HSCR): dominant; low penetrance
Common SNPs that disrupt the binding of transcription factors to an
enhancer element located within the first intron of RET reduce RET
mRNA expression and are highly associated with sporadic HSCR.
Coronary artery disease [CAD]
Coronary artery disease kills about 450,000 people
every year in the US.
Cast of coronary arteries
(yellow = right; red = left arterial trees)
Steps leading to CAD
Genetics of CAD
Familial CAD
Familial hypercholesterolemia: autosomal
dominant disorder caused by inactivating
mutations of the low-density lipoprotein
receptor (LDLR) gene located at 19p13.2.
Familial aggregation
Proband with CAD        Recurrence risk*
Sister                  7-fold for brothers
Brother                 2.5-fold for sisters
< 55 years old          11.5-fold for siblings
Male < 55 with MI       6- to 8-fold for MZ twin**; 3-fold for male DZ twin**
Female < 55 with MI     15-fold for MZ twin**; 2.6-fold for DZ twin**
*Compared to general population; **After controlling for
risk factors including diabetes, hypertension and smoking
Idiopathic CAD
Genetic risk factors:
hypertension, obesity, diabetes mellitus
(each a disease with complex genetic components)
Non-genetic risk factors:
Age, sex (male>female), smoking, physical inactivity, stress
GWA studies have identified candidate CAD risk genes that function within
biological pathways related to serum lipid transport and metabolism,
vasoactivity, blood coagulation, inflammatory and immune pathways,
and arterial wall components.
Alzheimer’s disease [AD]
• Progressive, incurable neurodegenerative disease
that leads to dementia and death
• Symptoms usually appear after the age of 65;
early-onset forms are also known
• Familial AD (FAD) accounts for ~5% of cases;
sporadic AD (late-onset AD: LOAD) accounts for ~95%
• Currently thought to affect 5 M individuals in the US
and 35 M individuals worldwide; this number may
increase to > 115 M worldwide by 2050!
Brain pathology in AD (1)
AD brain
normal brain
Brain pathology in AD (2)
AD brain
normal brain
Brain pathology in AD (3)
amyloid plaques and neurofibrillary tangles
(A) Low-power: amyloid plaques,
(B) high-power: amyloid plaque,
(C) neurofibrillary tangles (NFT), silver stained
(D) electron micrograph of neurofibrillary tangles
composed of hyperphosphorylated tau
Co-staining of amyloid plaques & NFT
Pathways of amyloid protein
precursor (APP) proteolysis
Li H, Wolfe MS and Selkoe DJ, Structure 17, 2009
Genetics of AD
                         Familial AD (FAD)        Sporadic AD (SAD)
                         Early onset AD (EOAD)    Late onset AD (LOAD)
Proportion of AD cases   ~5%                      ~95%
Age of onset             < 65                     > 65
Liability genes          APP, PSEN1, PSEN2        ApoE4 + many additional genes
The effects of ApoE genotypes on AD risk
(Figure: genes implicated in pathways leading to AD)
APP proteolysis: α-secretase (C83 + sAPPα); β-secretase (C99 + sAPPβ);
γ-secretase (p3 + AICD; Aβ40/42 + AICD)
Aβ (extracellular and intracellular): secretion, uptake, proteolysis,
clearance (serum; blood vessel (BBB); degraded Aβ in liver), deposition
as oligomers and plaques
Downstream processes: Aβ burden (monomers, oligomers, plaques);
lipid homeostasis; blood pressure; blood vessel pathology;
neuroinflammation; activation of microglia; cytokine secretion;
phagocytosis & complement-mediated clearance (in liver);
free-radical production; synapse dysfunction; cholinergic
neurotransmission; neurofibrillary tangles (hyper-phosphorylated tau);
mitochondria dysfunction; neuronal cell death; AD
Genes and RNAs labeled in the figure: CHRM1, PRKCA, APP, ADAM9, ADAM10,
ADAM17, miR-107, miR-9, mir29a/b-1, BACE1-AS RNA, CALHM1, ATXN1, IL33,
BACE1, BACE2, PION, TNFRSF1, 2, TNF, GSK3A, GSK3B, PSEN1, PSEN2, PSENEN,
NCSTN, APH1A, APH1B, C3, CR1*, LRP1, LRP1B, LRP2, LDLR, CLU, ABCA1,
APOE (ε2, ε3, ε4), IDE, NEP, PLAU*, RAGE, sRAGE, CST3, SORL1, SORC1,
PICALM*, FPRL1, CHRNA7, CHRNB2, UCHL1, PARK2, CETP, ACE, APBA1, APAB2,
APBB1, VEGF, PIN1, IL1, 6, 8, TNK1, MAPK1, 3, 14, PTPRC, MAPT, CDK5,
CDK5R1, GAB2, DAPK11, NGF, BDNF, WWC1, TFAM
References and further reading
RL Nussbaum, RR McInnes and HF Willard,
“Thompson & Thompson Genetics in Medicine,
7th Edition,” 2007, Saunders Elsevier, Philadelphia, PA;
ISBN: 978-1-4160-3080-5 (Chapters 8-10)
T Strachan and A Read, “Human Molecular Genetics,
4th Edition,” 2011, Garland Science, New York, NY;
ISBN: 978-0-815-34149-9 (Chapters 3, 14 & 15)
Appendix
• Normal (Gaussian) distributions
• Punnett squares for multiple loci (genes)
Normal (Gaussian) distributions
(Figure: plot of the normal density f(x), vertical axis from 0.0 to 0.4;
marked region: ~95% of total area under curve)
The standard deviation, σ, is a measure of the
“dispersion” of the measured quantity
with respect to the average or “mean” value.
Technically, σ = √v, where v is the variance.
Carl Friedrich Gauss (1777-1855); German
mathematician and physical scientist;
Professor: University of Göttingen
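The relation between the standard deviation and the area under the normal curve can be checked numerically. The sketch below uses the standard closed forms for the normal density and for the area within k standard deviations of the mean; the function names are illustrative. It relies on the well-known result that about 95% of the area lies within roughly 2 (more precisely 1.96) standard deviations of the mean.

```python
from math import erf, exp, pi, sqrt


def normal_pdf(x, mean=0.0, sd=1.0):
    """Density f(x) of a normal distribution with the given mean and SD."""
    z = (x - mean) / sd
    return exp(-0.5 * z * z) / (sd * sqrt(2.0 * pi))


def area_within(k):
    """Fraction of the total area lying within k SDs of the mean."""
    return erf(k / sqrt(2.0))


print(round(normal_pdf(0.0), 3))   # peak height of the standard normal, ~0.399
for k in (1.0, 1.96, 2.0, 3.0):
    print(k, round(area_within(k), 4))
```

The peak value of about 0.4 matches the top of the vertical axis in the plot of the standard normal density.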
Punnett squares for multiple loci (genes)
One locus (rows: eggs; columns: sperm)
        A     a
A       AA    aA
a       Aa    aa
(a + A)² = aa + 2Aa + AA (3 terms)
Two loci (rows: eggs; columns: sperm)
        AB      Ab      aB      ab
AB      ABAB    AbAB    aBAB    abAB
Ab      ABAb    AbAb    aBAb    abAb
aB      ABaB    AbaB    aBaB    abaB
ab      ABab    Abab    aBab    abab
(a + A)² (b + B)² = aabb + aaBB + 2aaBb + 2Aabb +
4AaBb + 2AaBB + 2AABb + AAbb + AABB (9 terms)
Three loci
        ABC      AbC      aBC      abC      ABc      Abc      aBc      abc
ABC     ABCABC   AbCABC   aBCABC   abCABC   ABcABC   AbcABC   aBcABC   abcABC
AbC     ABCAbC   AbCAbC   aBCAbC   abCAbC   ABcAbC   AbcAbC   aBcAbC   abcAbC
aBC     ABCaBC   AbCaBC   aBCaBC   abCaBC   ABcaBC   AbcaBC   aBcaBC   abcaBC
abC     ABCabC   AbCabC   aBCabC   abCabC   ABcabC   AbcabC   aBcabC   abcabC
ABc     ABCABc   AbCABc   aBCABc   abCABc   ABcABc   AbcABc   aBcABc   abcABc
Abc     ABCAbc   AbCAbc   aBCAbc   abCAbc   ABcAbc   AbcAbc   aBcAbc   abcAbc
aBc     ABCaBc   AbCaBc   aBCaBc   abCaBc   ABcaBc   AbcaBc   aBcaBc   abcaBc
abc     ABCabc   AbCabc   aBCabc   abCabc   ABcabc   Abcabc   aBcabc   abcabc
(a + A)² (b + B)² (c + C)² = [aabb + aaBB + 2aaBb + 2Aabb + 4AaBb + 2AaBB + 2AABb + AAbb + AABB][cc + 2Cc + CC] =
[aabbcc + aaBBcc + 2aaBbcc + 2Aabbcc + 4AaBbcc + 2AaBBcc + 2AABbcc + AAbbcc + AABBcc +
2aabbCc + 2aaBBCc + 4aaBbCc + 4AabbCc + 8AaBbCc + 4AaBBCc + 4AABbCc + 2AAbbCc + 2AABBCc +
aabbCC + aaBBCC + 2aaBbCC + 2AabbCC + 4AaBbCC + 2AaBBCC + 2AABbCC + AAbbCC + AABBCC] (27 terms)
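The term counts in these expansions (3, 9 and 27) can be reproduced by enumerating the Punnett square directly. The sketch below assumes a cross between two parents heterozygous at every locus, builds all egg and sperm gametes, and tallies how many cells fall into each genotype class. The function name and the single-letter locus labels are illustrative.

```python
from collections import Counter
from itertools import product


def genotype_counts(loci):
    """Tally Punnett-square cells per genotype for a cross of heterozygotes.

    loci is a string of locus letters, e.g. "abc" for three loci.
    """
    # Every gamete carries one allele (upper or lower case) per locus
    gametes = ["".join(g) for g in product(*[(l.upper(), l.lower()) for l in loci])]
    counts = Counter()
    for egg, sperm in product(gametes, repeat=2):
        # Sort the two alleles at each locus so that "aA" and "Aa" are one class
        genotype = "".join("".join(sorted(pair)) for pair in zip(egg, sperm))
        counts[genotype] += 1
    return counts


for loci in ("a", "ab", "abc"):
    counts = genotype_counts(loci)
    print(loci, sum(counts.values()), "cells,", len(counts), "genotype classes")

print(genotype_counts("ab")["AaBb"])     # coefficient of AaBb in the 2-locus expansion
print(genotype_counts("abc")["AaBbCc"])  # coefficient of AaBbCc in the 3-locus expansion
```

Each additional locus multiplies the number of cells by 4 and the number of genotype classes by 3, which is why the class counts follow 3ⁿ.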