Genetics for Imagers: How Geneticists Model Quantitative Phenotypes
Nelson Freimer
UCLA Center for
Neurobehavioral Genetics
What makes a genetic
association significant?
Outline
• The problem of achieving validated
findings in psychiatric genetics
• Approaches to genetic mapping and
statistical significance
- linkage analysis (+ examples)
- association analysis (+ examples)
Psychiatric genetics: The
brains of the family
10 July 2008 | Nature 454, 154-157 (2008)
Does the difficulty in finding the genes
responsible for mental illness reflect the
complexity of the genetics or the poor
definitions of psychiatric disorders?
“The studies so
far are statistically
underpowered.
We need bigger
studies.”
— Jonathan Flint
“Geneticists know
nothing about
psychiatric disease.”
— Daniel Weinberger
WHAT IS THE PROBLEM?
• Psychiatric disorders are highly heritable
• No psychiatric susceptibility genes known
• Studies so far are underpowered
– Phenotypes are of uncertain validity
– Samples are too small and markers too few
– Signal to noise ratio is too low
(etiological heterogeneity: genetic and non-genetic)
“We are just too ignorant
of the underlying
neurobiology to make
guesses about candidate
genes.”
- Steven Hyman
This is why geneticists have
turned to genome wide
mapping
Genome-wide mapping and
allelic architecture
[Figure: Allelic architecture and genetic mapping approaches. Axes: effect size (small to large) and disease gene allele frequency (rare, <1%, to common, >5%). Labels in the plot: LINKAGE; ASSOCIATION; COPY NUMBER VARIANTS; family-based or case-control; NOT FOUND TO DATE.]
[Figure: a founder chromosome carries the disease gene within an IBD region; present-day affected individuals share an IBD region. IBD = identical by descent.]
The Principle of Genetic Linkage
If genes are located on different chromosomes they
show independent assortment.
However, genes on the same chromosome, especially if
they are close to each other, tend to be passed on to
offspring in the same configuration as on the
parental chromosomes.
Genetic markers: SNPs
Detecting Genetic Linkage:
Linkage Analysis vs Association
Analysis
• Linkage Analysis
– Using pedigree samples, search for regions of
the genome where affected individuals share
alleles more than you would expect
• Association Analysis
– Compare allele frequency distributions in
cases and controls
• For quantitative traits can apply similar
principles
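The case-control comparison of allele frequencies can be sketched as a 2x2 chi-square test on allele counts. This is a minimal illustration rather than the method used in any study cited here: the function name `allelic_chi_square` and the allele counts are hypothetical, and the test omits a continuity correction.

```python
import math

def allelic_chi_square(case_counts, control_counts):
    """2x2 chi-square test on allele counts (no continuity correction).

    Each argument is (count_of_allele_G, count_of_allele_T).
    Returns (chi2, p_value) for 1 degree of freedom.
    """
    a, b = case_counts
    c, d = control_counts
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical allele counts: 200 cases and 200 controls, two alleles each
cases = (240, 160)      # G allele frequency 0.60 in cases
controls = (200, 200)   # G allele frequency 0.50 in controls
chi2, p = allelic_chi_square(cases, controls)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

A p value of this size would be nominally significant for a single test, which is why the thresholds discussed later in the talk matter.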
[Figure: Linkage analysis vs. association analysis, illustrated with individuals of genotype G,T or T,T.]
When are two genetic loci
significantly linked?
Stringent significance thresholds based on…
• Low prior probability of linkage between
any two loci
– Considered when there were few markers
• Multiple tests involved in genotyping
studies
– Considered after there were many markers
• Both considerations yielded ~ same
threshold:
LOD score (log base 10 of the likelihood ratio) > ~3
(i.e. p < 10^-4)
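As a rough illustration of the LOD score defined above, the sketch below computes a two-point LOD for fully informative, phase-known meioses. That setting is an assumption made for simplicity, and the function name `two_point_lod` is hypothetical. The likelihood under linkage at recombination fraction theta is compared with the likelihood under no linkage (theta = 0.5).

```python
import math

def two_point_lod(recombinants, meioses, theta):
    """LOD score for recombination fraction theta versus 0.5 (no linkage).

    Assumes phase-known, fully informative meioses, and theta chosen so the
    linked likelihood is non-zero.
    """
    non_recombinants = meioses - recombinants
    # Likelihood under linkage at theta (Python evaluates 0.0 ** 0 as 1.0)
    like_linked = (theta ** recombinants) * ((1 - theta) ** non_recombinants)
    like_unlinked = 0.5 ** meioses
    return math.log10(like_linked / like_unlinked)

# Ten informative meioses with no recombinants reach the LOD ~3 threshold
lod_no_recombinants = two_point_lod(recombinants=0, meioses=10, theta=0.0)
# One recombinant in ten meioses, evaluated at theta = 0.1
lod_one_recombinant = two_point_lod(recombinants=1, meioses=10, theta=0.1)
print(round(lod_no_recombinants, 2), round(lod_one_recombinant, 2))
```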
• Prior probability of linkage between a given
locus and a random genome location: 0.02
• To obtain posterior probability of linkage of
>0.95 (i.e. <0.05 false positive linkages), apply
Bayes theorem:
• Solving for the likelihood ratio Pr(Data |
Linkage) / Pr(Data | NoLinkage)…
– ratio must be >1,000, i.e. LOD >3
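The Bayes calculation can be checked numerically in odds form: posterior odds = prior odds x likelihood ratio. The sketch below uses the prior (0.02) and target posterior (0.95) from the slide; the function name `min_likelihood_ratio` is hypothetical.

```python
import math

def min_likelihood_ratio(prior, posterior):
    """Likelihood ratio needed to move a prior probability to a posterior."""
    prior_odds = prior / (1 - prior)
    posterior_odds = posterior / (1 - posterior)
    return posterior_odds / prior_odds

lr = min_likelihood_ratio(prior=0.02, posterior=0.95)
lod = math.log10(lr)
print(f"likelihood ratio ~ {lr:.0f}, LOD ~ {lod:.2f}")
```

The exact ratio comes out slightly under 1,000, which the slide rounds up to LOD > 3.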
Controlling for multiple testing in
linkage
• With complete genome marker sets, prior
probability that some marker is linked is 1
• ~500 fully informative, independent
markers cover linkage in all regions of the
genome
• To control, at the 0.05 level, the global
hypothesis of no linkage anywhere in the
genome:
0.05/500 = 10^-4 for each test, i.e. LOD >3
Significance thresholds for linkage
Lander and Kruglyak, 1996
• Suggestive linkage: a lod score or p value expected to occur
once by chance in a whole genome scan.
LOD > 2.2, p < 7.4 x 10^-4
• Significant linkage: a lod score or p value expected to occur
by chance 0.05 times in a whole genome scan.
LOD > 3.6, p < 2.2 x 10^-5
• Highly significant linkage: a lod score or p value expected
to occur by chance 0.001 times in a whole genome scan.
LOD > 5.4, p < 3 x 10^-7
• Confirmed linkage: a significant linkage observed in one
study is confirmed by finding a lod score or p value expected
to occur 0.01 times by chance in a specific search of the
candidate region.
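The pairing of LOD scores with pointwise p values can be approximated as follows. This sketch assumes that 2 ln(10) x LOD is asymptotically chi-square distributed with 1 degree of freedom and that the test is one-sided. The function name `lod_to_pointwise_p` is hypothetical.

```python
import math

def lod_to_pointwise_p(lod):
    """Approximate one-sided pointwise p value for a LOD score.

    Assumes 2*ln(10)*LOD follows a chi-square distribution with 1 df.
    """
    chi2 = 2 * math.log(10) * lod
    # Upper tail of chi-square (1 df) is erfc(sqrt(x / 2)); halve for one-sided
    return 0.5 * math.erfc(math.sqrt(chi2 / 2))

thresholds = {"suggestive": 2.2, "significant": 3.6, "highly significant": 5.4}
p_values = {name: lod_to_pointwise_p(lod) for name, lod in thresholds.items()}
for name, p in p_values.items():
    print(f"{name}: LOD {thresholds[name]} -> p ~ {p:.1e}")
```

Under these assumptions the values come out close to the thresholds listed above, and LOD 3 maps to roughly 10^-4.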
An example of linkage to a
quantitative neurobehavioral
trait
Monoamine Neurotransmitters
• Norepinephrine and epinephrine: attention, blood pressure
• Dopamine: reward
• Serotonin: appetite, mood, gastrointestinal motility
• Histamine: gastric acid release, immune response
From David Krantz
Catecholamine Synthesis and
Degradation
Genome wide linkage analysis
of HVA in a vervet monkey
pedigree
Vervet research colony pedigree
Heritability of Monoamine
Metabolites in vervet monkeys
[Figure: proportion of variance (0 to 0.8) in the monoamine metabolites 5-HIAA, HVA, and MHPG attributed to genetic (h2) and maternal (c2) effects.]
HVA level in Vervets on Chromosome 10
Linkage analysis in extended
pedigrees may be powerful for
structural MRI phenotypes
Brain MRIs in the
VRC
357 Vervets scanned
Mobile Siemens Symphony
1.5 Tesla scanner
Genetic association analysis
Linkage analysis is not very
powerful for mapping complex
traits
(with many alleles of small
effect)
[Figure: Disease gene discovery methods. Axes: effect size (small to large) and disease gene allele frequency (rare, <1%, to common, >5%). Labels in the plot: LINKAGE; ASSOCIATION; COPY NUMBER VARIANTS; family-based or case-control; NOT FOUND TO DATE.]
Significance thresholds for
association
Consider a simple Bayesian argument:
- Prior probability that a random gene is
associated with the trait: ~1/30,000, assuming
30,000 genes/genome
- Likelihood ratio should be > 550,000 for
association to be significant (posterior
probability >0.95)
- With χ² test, p < 2.6 x 10^-7
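The same odds-form reasoning can be applied to association. The sketch below starts from the prior of 1 in 30,000 and the target posterior of 0.95 given on the slide, then converts the required likelihood ratio to a p value by treating 2 ln(LR) as a chi-square statistic with 1 degree of freedom. That conversion is an assumption of this sketch, not something the slide states.

```python
import math

n_genes = 30_000                 # genes per genome, as assumed on the slide
prior = 1 / n_genes
posterior = 0.95

prior_odds = prior / (1 - prior)
posterior_odds = posterior / (1 - posterior)
min_lr = posterior_odds / prior_odds

# Assumption: 2*ln(LR) behaves like a chi-square statistic with 1 df
chi2 = 2 * math.log(min_lr)
p_value = math.erfc(math.sqrt(chi2 / 2))
print(f"likelihood ratio ~ {min_lr:,.0f}, p ~ {p_value:.1e}")
```

The exact ratio is a little above the slide's rounded figure, and the resulting p value is close to the 2.6 x 10^-7 quoted there.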
A more complete evaluation of
significance
Posterior odds (for true association) = (Prior odds x Power) / Significance
• Strength of evidence depends on likely number
of true associations and power to detect them
• These depend on effect sizes and sample sizes
• Less well-powered studies need more stringent
thresholds to control false-positive rate
See Wacholder et al.,
J. National Cancer Institute 2004
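To see why less well-powered studies need stricter thresholds, the posterior-odds relation can be turned into the probability that a reported hit is false. The prior, power, and significance values below are hypothetical, chosen only for illustration, and the function names are not taken from the cited paper.

```python
def posterior_odds(prior_prob, power, alpha):
    """Posterior odds of a true association given a significant result."""
    prior_odds = prior_prob / (1 - prior_prob)
    return prior_odds * power / alpha

def false_positive_report_probability(prior_prob, power, alpha):
    """Probability that a significant finding is a false positive."""
    odds = posterior_odds(prior_prob, power, alpha)
    return 1 / (1 + odds)

# Hypothetical prior of 1 in 1,000 that the tested variant is truly associated
well_powered = false_positive_report_probability(0.001, power=0.8, alpha=0.05)
under_powered = false_positive_report_probability(0.001, power=0.2, alpha=0.05)
stringent = false_positive_report_probability(0.001, power=0.8, alpha=1e-6)
print(f"{well_powered:.3f} {under_powered:.3f} {stringent:.4f}")
```

With these inputs, a hit at p < 0.05 is almost certainly false even at 80% power, while the stringent threshold reverses the picture.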
Genome wide association thresholds
• Controlling for multiple testing
E.g. Bonferroni: 0.05 / (No. of SNPs x No. of traits)
E.g. for a single trait with 10^6 SNPs, p < 5 x 10^-8
• However, more complicated…
– SNPs are not all independent (LD)
– LD varies across genome and populations
– traits are not all independent
• False discovery rate (FDR) increasingly used
(proportion of false positives among all
positives)
…if 1 out of 20 hits is false, that is not so bad
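The two kinds of correction mentioned above can be compared in a short sketch. Bonferroni divides the significance level by the number of tests. For FDR the sketch uses the Benjamini-Hochberg step-up procedure, which is one common choice and is assumed here, since the slide does not name a method. The p values in `toy_p` are made up for illustration.

```python
def bonferroni_threshold(alpha, n_snps, n_traits=1):
    """Per-test p value threshold under Bonferroni correction."""
    return alpha / (n_snps * n_traits)

def benjamini_hochberg(p_values, q=0.05):
    """Indices of tests declared significant at FDR level q (step-up rule)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose p value is under its BH threshold
        if p_values[idx] <= q * rank / m:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])

gwas_threshold = bonferroni_threshold(0.05, 10**6)    # single trait, 10^6 SNPs
linkage_threshold = bonferroni_threshold(0.05, 500)   # ~500 independent markers

toy_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
bh_hits = benjamini_hochberg(toy_p, q=0.05)
bonf_cut = bonferroni_threshold(0.05, len(toy_p))
bonf_hits = [i for i, p in enumerate(toy_p) if p <= bonf_cut]
print(gwas_threshold, linkage_threshold, bh_hits, bonf_hits)
```

On the toy data the FDR procedure keeps one more test than Bonferroni, the kind of trade-off the slide alludes to.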
Evaluating association in
neurobehavioral genetics
studies
Serotonin Transporter Promoter Polymorphism
Association Studies as of 2002

Phenotype                  P<.05   P>.05
Schizophrenia                2       7
BP/mood disorder             8      13
OCD                          2       2
Personality traits          12      10
Drug response                3       0
Suicide                      4       1
Anorexia                     0       2
Late Onset Alzheimer’s       2       2
Smoking related              4       1
Alcohol related              5       2
Autism                       2       2
Fibromyalgia                 1       0
Panic disorder               0       3
Association of Anxiety-Related Traits with
Polymorphism in the Serotonin Transporter Gene
Regulatory Region
Lesch et al. Science. 1996;274(5292):1527-31.
• Two samples (N = 221, N = 284)
• Association with P ~ 0.02
A more complete evaluation of
significance
Posterior odds (for true association) = (Prior odds x Power) / Significance
• Strength of evidence depends on likely number
of true associations and power to detect them
• These depend on effect sizes and sample sizes
• Less well-powered studies need more stringent
thresholds to control false-positive rate
See Wacholder et al.,
J. National Cancer Institute 2004
In large samples: No association of
5HTTLPR with temperament
Example from Northern Finland Birth Cohort, N ~ 4000
Influence of Life Stress on
Depression: Moderation by a
Polymorphism in the 5-HTT
Gene
Caspi et al.
Science 301: 386 – 389 2003
Interaction Between the Serotonin
Transporter Gene (5-HTTLPR),
Stressful Life Events, and Risk of
Depression: A Meta-analysis
Risch et al.
JAMA. 2009;301(23):2462-2471.
Logistic Regression Analyses of Risk of
Depression for 14 Studies
Genomewide association
analysis
Progress in identifying gene
variants for common traits
[Figure: timeline, 2000 to 2007, of loci identified for common traits. Traits shown: cholesterol, obesity, myocardial infarction, QT interval, atrial fibrillation, type 2 diabetes, prostate cancer, breast cancer, colon cancer, height, age-related macular degeneration, Crohn's disease, type 1 diabetes, systemic lupus erythematosus, asthma, restless leg syndrome, gallstone disease, multiple sclerosis, rheumatoid arthritis, glaucoma. Loci shown include NOD2, IBD5, CTLA4, PTPN22, CFH, PCSK9, TCF7L2, IL23R, FTO, and 8q24. Slide from David Altshuler.]
HDL Association at 16q22.1
HDL Association near LIPC
A success story in
neuropsychiatry
Genome Wide association in narcolepsy
in Japan (222 cases vs 389 controls)
[Figure: genome-wide plot of -log10(P value), axis marked 2 to 8, across chromosomes 1 to 22; a signal is labeled HLA.]
From Emmanuel Mignot
J. Hallmayer et al.
Nature Genetics 41, 708 - 711 (2009)
Narcolepsy is strongly associated
with the T-cell receptor alpha locus
~2000 cases in GWAS +
~2000 cases in replication
Strong genome-wide evidence
Known genes and environment explain little of
trait variance
Sequencing: the currently unexplored
middle of the allelic spectrum
Whole genome sequencing is
coming soon…
But we don’t have very good
models for it yet
Summary
• The allelic spectrum of complex traits
determines the appropriate genetic mapping
approach
• Genetic linkage and association studies require
stringent statistical thresholds
• Single candidate gene studies have very low
probability of being true positives
• Genome-wide linkage and association studies
are beginning to bear fruit for neurobehavioral
traits
• Whole-genome sequencing is just around the
corner