Transcript ppt
Lecture 27: Association Genetics
April 21, 2014
Announcements
Final exam April 29 at 3 pm, 3306 LSB (computer lab)
Review session on Friday
Bring questions
Final lab on Wednesday
Course evaluations
Extra credit opportunity: earn up to 10 points for lab report
Due at final exam
Last Time
Sequence data and quantification of variation
Sequence-based tests of neutrality
Ewens-Watterson Test
Tajima’s D
Hudson-Kreitman-Aguadé Test
Synonymous versus Nonsynonymous substitutions
McDonald-Kreitman
Today
Quantitative traits
Genetic basis
Heritability
Linking phenotype to genotype
QTL analysis introduction
Limitations of QTL
Association genetics
Mendelian trait vs. quantitative trait
[Figure: Mendelian trait: individuals 1 to 10 carry alleles A1 and A2, with genotypes 12 11 22 22 11 22 12 11 22 12. Quantitative trait: distribution of height, axis 16 to 88.]
Courtesy of Glenn Howe
Quantitative traits are polygenic
[Figure: height distribution of students at Connecticut Agricultural College, 1914; height axis 55 to 85 in.]
As the number of loci controlling a trait increases, the distribution of trait values in a population becomes bell-shaped
Influence of Environment on Human Height
[Figure: student heights in 1914 (mean = 67 ± 2.7 in.) and 1996 (mean = 70 ± 3 in.); height vs. GDP by country, 1925-1949 (Baten 2006)]
Schilling et al. 2002. Amer. Stat. 56: 223-229
Hartl and Clark 2007
Hartl, D. 1987. A Primer of Population Genetics.
3 loci, 2 additive alleles per locus
Uppercase alleles contribute 1 unit to phenotype (e.g., shade of color)
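A minimal Python sketch of the additive model above, assuming each locus has two alleles at frequency 0.5 (as in an F2 from an AABBCC × aabbcc cross) and that every uppercase allele adds 1 unit. It counts how many of the equally likely allele combinations fall into each phenotype class; with 3 loci this gives the 1:6:15:20:15:6:1 pattern, and adding loci pushes the histogram toward a bell shape. The function name is illustrative, not from the lecture.

```python
from collections import Counter
from itertools import product


def phenotype_distribution(n_loci):
    """Number of allele combinations giving each phenotype value (0 .. 2 * n_loci).

    Assumes additive, biallelic loci with each allele at frequency 0.5, so all
    4 ** n_loci ordered combinations of 2 * n_loci allele copies are equally
    likely; an uppercase allele (coded 1) adds 1 unit, a lowercase one (0) adds 0.
    """
    counts = Counter(sum(alleles) for alleles in product((0, 1), repeat=2 * n_loci))
    return [counts[k] for k in range(2 * n_loci + 1)]


for n in (1, 2, 3):
    print(n, phenotype_distribution(n))
```

Printing n = 1, 2, 3 gives [1, 2, 1], [1, 4, 6, 4, 1], and [1, 6, 15, 20, 15, 6, 1].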
The phenotype is the outward manifestation of the genotype
Phenotype = Genotype + Environment
σ²P = σ²G + σ²E
Courtesy of Glenn Howe
Types of genetic variance (σ²G)
Additive (σ²A): effects of individual alleles; main cause of resemblance between relatives
Dominance (σ²D): effects of allele interactions within a locus
Interaction, or epistasis (σ²I): effects of interactions among loci
Dominance and interaction variance are non-additive
σ²G = σ²A + σ²D + σ²I
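To make the additive/dominance split concrete, here is a sketch of the standard single-locus model (Falconer's parameterization, an assumption here since the slide gives no formulas): genotypic values +a, d, −a for A1A1, A1A2, A2A2, Hardy-Weinberg proportions, and no epistasis (σ²I = 0 with one locus). It checks that σ²G = σ²A + σ²D, with σ²A = 2pqα² and σ²D = (2pqd)², where α = a + d(q − p) is the average effect of an allele substitution.

```python
def variance_components(p, a, d):
    """Return (var_G, var_A, var_D) for one biallelic locus.

    Genotypic values: A1A1 = +a, A1A2 = d, A2A2 = -a; p is the frequency of A1.
    Assumes Hardy-Weinberg proportions and a single locus (no epistasis).
    """
    q = 1.0 - p
    freqs = (p * p, 2 * p * q, q * q)
    values = (a, d, -a)
    mean = sum(f * v for f, v in zip(freqs, values))
    # Total genotypic variance, computed directly from the genotype classes.
    var_g = sum(f * (v - mean) ** 2 for f, v in zip(freqs, values))
    alpha = a + d * (q - p)          # average effect of an allele substitution
    var_a = 2 * p * q * alpha ** 2   # additive variance
    var_d = (2 * p * q * d) ** 2     # dominance variance
    return var_g, var_a, var_d


print(variance_components(0.5, 1.0, 0.5))  # (0.5625, 0.5, 0.0625)
```

With partial dominance (d = 0.5) and p = 0.5, most of the genotypic variance is additive; setting d = 0 makes σ²D vanish.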
Heritability
Phenotype vs Genotype
Var(phenotype) = Var(genotype) + Var(environment)
Heritability: Var(genotype) / Var(phenotype)
Two types of heritability
Broad-sense heritability (H²) includes all genetic effects: additive, dominance, and epistatic
− For example, the degree to which clones or monozygotic twins have the same phenotype
Narrow-sense heritability (h²) includes only additive effects
− For example, the degree to which offspring resemble their parents
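Narrow-sense heritability is commonly estimated as the slope of the regression of offspring phenotype on midparent phenotype. The simulation below is a sketch under loud assumptions: purely additive genetic variance, phenotypic variance scaled to 1, random mating, and no environment shared between parents and offspring. Under those assumptions the slope should come out close to the h² used to generate the data. The function name and default parameters are hypothetical.

```python
import random


def midparent_regression_slope(h2, n_families=20000, seed=1):
    """Simulate an additive trait; regress offspring on midparent phenotype.

    Assumptions: additive variance = h2, environmental variance = 1 - h2
    (phenotypic variance 1), random mating, no shared family environment.
    """
    rng = random.Random(seed)
    sd_a, sd_e = h2 ** 0.5, (1 - h2) ** 0.5
    midparents, offspring = [], []
    for _ in range(n_families):
        a_mom, a_dad = rng.gauss(0, sd_a), rng.gauss(0, sd_a)
        p_mom = a_mom + rng.gauss(0, sd_e)
        p_dad = a_dad + rng.gauss(0, sd_e)
        # Offspring breeding value = parental mean + Mendelian sampling (variance h2 / 2).
        a_kid = (a_mom + a_dad) / 2 + rng.gauss(0, (h2 / 2) ** 0.5)
        p_kid = a_kid + rng.gauss(0, sd_e)
        midparents.append((p_mom + p_dad) / 2)
        offspring.append(p_kid)
    mx = sum(midparents) / n_families
    my = sum(offspring) / n_families
    sxy = sum((x - mx) * (y - my) for x, y in zip(midparents, offspring))
    sxx = sum((x - mx) ** 2 for x in midparents)
    return sxy / sxx  # least-squares slope, expected to be close to h2


print(round(midparent_regression_slope(0.4), 2))
```

The slope equals h² because Cov(offspring, midparent) = σ²A/2 while Var(midparent) = σ²P/2.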
Heritability (continued)
Characteristic of a trait measured in a particular population in a particular environment
Best estimated in experiments (controlled environments)
Estimated from resemblance between relatives
The higher the heritability, the better the prediction of genotype from phenotype (and vice versa)
[Figure: genotypic value (G) vs. phenotype (P) for h² = 0.1, 0.5, and 0.9]
http://psych.colorado.edu/~carey/hgss/hgssapplets/heritability/heritability1/heritability1.html
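The link between heritability and prediction can be checked with a small simulation, assuming P = G + E with G and E independent, Var(G) = h², and Var(E) = 1 − h². Under that model the squared correlation between genotypic value and phenotype equals h², so phenotype is a noisy predictor of genotype when h² = 0.1 and a good one when h² = 0.9. This is an illustrative sketch with a hypothetical function name.

```python
import random


def genotype_phenotype_r2(h2, n=20000, seed=2):
    """Squared correlation between genotypic value G and phenotype P = G + E.

    Assumes Var(G) = h2, Var(E) = 1 - h2, G independent of E; the result
    should be close to h2.
    """
    rng = random.Random(seed)
    g = [rng.gauss(0, h2 ** 0.5) for _ in range(n)]
    p = [gi + rng.gauss(0, (1 - h2) ** 0.5) for gi in g]
    mg, mp = sum(g) / n, sum(p) / n
    cov = sum((a - mg) * (b - mp) for a, b in zip(g, p))
    var_g = sum((a - mg) ** 2 for a in g)
    var_p = sum((b - mp) ** 2 for b in p)
    return cov ** 2 / (var_g * var_p)


for h2 in (0.1, 0.5, 0.9):
    print(h2, round(genotype_phenotype_r2(h2), 2))
```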
Identifying Genes Underlying Quantitative Traits
Many individual loci are responsible for quantitative traits, even those with high heritability
Identification of these loci is a major goal of breeding programs
Allows mechanistic understanding of adaptive variation
Methods usually rely on correlations between molecular marker polymorphisms and phenotypes
Quantitative Trait Locus Mapping
[Figure: Parent 1 and Parent 2, differing at marker loci A/a, B/b, and C/c, are crossed (X) to produce F1s, which are crossed (F1 X F1) to produce progeny with recombinant marker genotypes; progeny GENOTYPE at each marker (e.g., BB, Bb, bb) is compared with HEIGHT]
modified from D. Neale
Quantitative Trait Locus Analysis
Step 1: Make a controlled cross to create a large family (or a collection of families)
Parents should differ for phenotypes of interest
Segregation of trait in the progeny
Step 2: Create a genetic map
Large number of markers genotyped for all progeny
Step 3: Measure phenotypes
Need phenotypes with high heritability
Step 1: Construct Pedigree
Cross two individuals with contrasting characteristics
Create population with segregating traits
Ideally: inbred parents crossed to produce F1s, which are intercrossed to produce F2s
Recombinant Inbred Lines created by repeated inbreeding (selfing or brother-sister mating) of F2 progeny
Allows precise phenotyping, isolation of allelic effects
Grisel 2000 Alcohol Research & Health 24: 169
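The "inbred" part of recombinant inbred lines can be illustrated with a sketch that assumes lines are advanced by self-fertilization (single-seed descent); brother-sister mating reduces heterozygosity more slowly. Each round of selfing halves the expected heterozygosity, so after t generations starting from an F1 it should be about 0.5**t. Loci are simulated independently, ignoring linkage, and the function name is hypothetical.

```python
import random


def heterozygosity_after_selfing(generations, n_loci=10000, seed=3):
    """Fraction of heterozygous loci after `generations` rounds of selfing from an F1.

    Assumes single-seed descent by self-fertilization and simulates each
    locus independently (linkage ignored). Expected value: 0.5 ** generations.
    """
    rng = random.Random(seed)
    loci = [("A", "a")] * n_loci  # the F1 is heterozygous at every locus
    for _ in range(generations):
        # Selfing: both gametes come from the same parent, each carrying one
        # of its two alleles at random.
        loci = [(rng.choice(pair), rng.choice(pair)) for pair in loci]
    return sum(1 for x, y in loci if x != y) / n_loci


for t in (1, 2, 4, 6):
    print(t, heterozygosity_after_selfing(t), 0.5 ** t)
```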
Step 2: Construct Genetic Map
Number of recombinations between markers is a function of map distance
Gives overview of structure of entire genome
Anonymous markers are cheap and efficient: AFLP, Genotyping by Sequencing
Codominant markers much more informative: SSR, SNP
Genotyping by Sequencing gives best of both worlds: cheap, abundant, codominant markers!
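The link between recombination counts and map distance can be sketched with a backcross and a map function. The lecture does not name a map function; Haldane's, used here, assumes no crossover interference (Kosambi's is a common alternative). In a backcross (AB/ab × ab/ab) each progeny reveals directly whether it received a recombinant gamete, so the recombination fraction is estimated as the recombinant proportion. Function names and defaults are hypothetical.

```python
import math
import random


def haldane_distance_cm(r):
    """Haldane map function: map distance in cM from recombination fraction r (0 <= r < 0.5)."""
    return -50.0 * math.log(1.0 - 2.0 * r)


def haldane_r(d_cm):
    """Inverse Haldane map function: recombination fraction from map distance in cM."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))


def estimate_r_backcross(true_r, n_progeny=1000, seed=4):
    """Simulate a backcross (AB/ab x ab/ab) and return the observed recombinant fraction."""
    rng = random.Random(seed)
    # Each progeny's genotype shows whether its gamete from the F1 was recombinant.
    recombinants = sum(rng.random() < true_r for _ in range(n_progeny))
    return recombinants / n_progeny


r_hat = estimate_r_backcross(true_r=0.1)
print(r_hat, round(haldane_distance_cm(r_hat), 1))
```

For example, a recombination fraction of 0.1 corresponds to about 11.2 cM under Haldane's function, slightly more than 10 cM because double crossovers go unobserved.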
Step 3: Determine Phenotypes of Offspring
Phenotype must be segregating in pedigree
Must differentiate genotype and environment effects
How?
Works best with phenotypes with high heritability
[Figure: h² = 0.1, 0.5, 0.9]
Step 4: Detect Associations between Markers and Phenotypes
Single-marker associations are simplest
Simple ANOVA, correcting for multiple comparisons
Log-likelihood ratio: LOD (log10 of odds)
$\mathrm{LOD} = \log_{10}\left[\frac{\Pr(\text{Data} \mid \text{QTL})}{\Pr(\text{Data} \mid \text{no QTL})}\right]$
If QTL is between two markers, situation more complex
Recombination between QTL and markers (genotype doesn't predict phenotype)
'Ghost' QTL due to adjacent QTL
Use interval mapping or composite interval mapping
Simultaneously consider pairs of loci across the genome
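A sketch of a single-marker test in a hypothetical backcross, assuming normally distributed residuals: a one-way ANOVA on marker genotype, expressed as a LOD score. With maximum-likelihood variance estimates the likelihood ratio reduces to LOD = (n/2) log10(RSS_noQTL / RSS_QTL). A threshold of about LOD 3 is a common convention; with many markers a genome-wide threshold (e.g., from permutations or a Bonferroni correction) is needed. The simulation helper and its parameters are illustrative only.

```python
import math
import random


def single_marker_lod(genotypes, phenotypes):
    """LOD score for one marker: one-way ANOVA on marker genotype, normal errors.

    H1 (QTL linked to marker): a separate phenotype mean per genotype class.
    H0 (no QTL): one common mean. With maximum-likelihood variance estimates,
    LOD = (n / 2) * log10(RSS_H0 / RSS_H1).
    """
    n = len(phenotypes)
    grand_mean = sum(phenotypes) / n
    rss0 = sum((y - grand_mean) ** 2 for y in phenotypes)
    groups = {}
    for g, y in zip(genotypes, phenotypes):
        groups.setdefault(g, []).append(y)
    rss1 = 0.0
    for ys in groups.values():
        m = sum(ys) / len(ys)
        rss1 += sum((y - m) ** 2 for y in ys)
    return (n / 2) * math.log10(rss0 / rss1)


def simulate_backcross(n, effect, seed=5):
    """Hypothetical backcross at a marker sitting on the QTL: 'Aa' adds `effect` (noise SD 1)."""
    rng = random.Random(seed)
    genotypes = [rng.choice(("Aa", "aa")) for _ in range(n)]
    phenotypes = [(effect if g == "Aa" else 0.0) + rng.gauss(0, 1) for g in genotypes]
    return genotypes, phenotypes


g, y = simulate_backcross(200, effect=1.0)
print(round(single_marker_lod(g, y), 1))
```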
Step 5: Identify underlying molecular mechanisms
[Figure: a genetic marker and a QTL on a chromosome, narrowed down to the QTG (Quantitative Trait Gene) and the QTN (Quantitative Trait Nucleotide)]
Adapted from Richard Mott, Wellcome Trust Centre for Human Genetics
QTL Limitations
Huge regions of genome underlie QTL, usually hundreds of genes
How to distinguish among candidates?
Biased toward detection of large-effect loci
Need very large pedigrees to detect small-effect loci properly
Limited genetic base: QTL may only apply to the two individuals in the cross!
Genotype × environment interactions rampant: some QTL only appear in certain environments
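The bias toward large-effect loci follows from statistical power. The sketch below approximates power for a single-marker test in a backcross using a normal approximation, under hypothetical assumptions: the marker sits exactly on the QTL, progeny split evenly between the two genotype classes, residual SD of 1, and a Bonferroni correction over 100 markers. Small effects need far larger families to reach useful power.

```python
from statistics import NormalDist


def detection_power(effect_sd, n_progeny, n_tests=100, alpha=0.05):
    """Approximate power of a single-marker test in a backcross (normal approximation).

    Hypothetical assumptions: marker sits on the QTL, progeny split evenly
    between two genotype classes, residual SD = 1, Bonferroni correction over
    `n_tests` markers. `effect_sd` is the difference between class means in SD units.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / (2 * n_tests))  # two-sided, Bonferroni-corrected
    se = (4.0 / n_progeny) ** 0.5  # SE of a difference between two means of n/2 each
    shift = effect_sd / se
    # Probability that the z statistic falls beyond either critical value.
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


for effect in (0.2, 0.5, 1.0):
    print(effect, [round(detection_power(effect, n), 2) for n in (100, 500, 2000)])
```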
Linkage Disequilibrium and Quantitative Trait Mapping
Linkage and quantitative trait locus (QTL) analysis
Need a pedigree and moderate number of molecular markers
Very large regions of chromosomes represented by markers
Association Studies with Natural Populations
No pedigree required
Need large numbers of genetic markers
Small chromosomal segments can be localized
Many more markers are required than in traditional QTL analysis
Cardon and Bell 2001, Nat. Rev. Genet. 2: 91-99
Association Mapping
[Figure: ancestral chromosomes carrying a mutation (*) undergo recombination through evolutionary history, producing present-day chromosomes in a natural population; HEIGHT is compared among marker GENOTYPE classes TT, TC, and CC]
Slide courtesy of Dave Neale
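Association mapping relies on linkage disequilibrium (LD) that survives recombination over many generations. A sketch of two standard quantities, assuming two biallelic loci and random mating: D = p_AB − p_A·p_B, r² = D² / [p_A(1 − p_A)·p_B(1 − p_B)], and the expected decay D_t = D_0(1 − r)^t for recombination fraction r. In the hypothetical example, a variant found only on haplotypes carrying B, both at frequency 0.1, gives r² = 1; LD with a tightly linked marker decays far more slowly than with a distant one, which is why association mapping can localize small chromosomal segments.

```python
def linkage_disequilibrium(p_ab, p_a, p_b):
    """Return (D, r^2) for two biallelic loci.

    p_ab: frequency of haplotypes carrying allele A at locus 1 and B at locus 2;
    p_a, p_b: allele frequencies of A and B (both strictly between 0 and 1).
    """
    d = p_ab - p_a * p_b
    r2 = d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, r2


def ld_decay(d0, r, generations):
    """Expected D after `generations` of random mating with recombination fraction r."""
    return d0 * (1 - r) ** generations


# Hypothetical example: variant A occurs only on haplotypes carrying B, both at 10%.
d0, r2 = linkage_disequilibrium(p_ab=0.1, p_a=0.1, p_b=0.1)
for r in (0.001, 0.01, 0.5):  # tightly linked, loosely linked, unlinked
    print(r, round(ld_decay(d0, r, generations=100) / d0, 3))
```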
Next-Generation Sequencing and Whole Genome Scans
The $1000 genome is on the horizon
Current cost with Illumina HiSeq 2000 is about $2000 for 10X depth
Thousands of human genomes have now been sequenced at low depth
Can detect most polymorphisms with frequency >0.01
True whole genome association studies now possible at a very large scale
Direct-to-consumer genomics: 23andMe and other genotyping services
http://www.1000genomes.org/
Commercial Services for Human Genome-Wide SNP Characterization
Nature, Vol. 437, 27 October 2005
Assay 1.2 million “tag SNPs” scattered across genome using Illumina BeadArray technology
Ancestry analyses and disease/behavioral susceptibility
Identifying genetic mechanisms of simple vs. complex diseases
Simple (Mendelian) diseases: Caused by a single major gene
High heritability; often can be recognized in pedigrees
Examples: Huntington’s, Achondroplasia, Cystic fibrosis, Sickle Cell Anemia
Tools: Linkage analysis, positional cloning
Over 2900 disease-causing genes have been identified thus far: Human Gene Mutation Database (www.hgmd.cf.ac.uk)
Complex (non-Mendelian) diseases: Caused by the interaction between environmental factors and multiple genes with minor effects
Interactions between genes; low heritability
Examples: Heart disease, Type II diabetes, Cancer, Asthma
Tools: Association mapping, SNPs!
Over 35,000 SNP associations have been identified thus far: http://www.snpedia.com
Slide adapted from Kermit Ritland
Complicating factor: Trait Heterogeneity
Same phenotype has multiple genetic mechanisms underlying it
Slide adapted from Kermit Ritland
Case-Control Example: Diabetes
Knowler et al. (1988) collected data on 4920 members of the Pima and Papago Native American populations in the southwestern United States
High rate of Type II diabetes in these populations
Found significant associations with the Immunoglobulin G marker (Gm)
Does this indicate underlying mechanisms of disease?
Knowler et al. (1988) Am. J. Hum. Genet. 43: 520
Case-control test for association (case = diabetic, control = not diabetic)
                       Type 2 diabetes
Gm haplotype      present   absent   Total
present               8       29        37
absent               92       71       163
Total               100      100       200
Question: Is the Gm haplotype associated with risk of Type 2 diabetes?
(1) Test for an association, with a = 8, b = 29, c = 92, d = 71, N = 200:
$\chi^2_1 = \frac{(ad - bc)^2 N}{(a+c)(b+d)(a+b)(c+d)} = \frac{[(8 \times 71) - (29 \times 92)]^2 (200)}{(100)(100)(37)(163)} = 14.62$
(2) Chi-square is significant (14.62 > 3.84, the 5% critical value with 1 df). Therefore presence of the Gm haplotype seems to confer reduced occurrence of diabetes
Slide adapted from Kermit Ritland
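The chi-square arithmetic on this slide can be checked with a few lines of Python. This is a sketch of the Pearson 2 × 2 test without continuity correction (matching the slide's formula); the p-value uses the fact that a 1-df chi-square is the square of a standard normal, so P(X > x) = erfc(√(x/2)). The function name is hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) and p-value for a 2 x 2 table.

    Layout as on the slide: rows = Gm haplotype present/absent,
    columns = Type 2 diabetes present/absent.
    """
    n = a + b + c + d
    chi2 = (a * d - b * c) ** 2 * n / ((a + c) * (b + d) * (a + b) * (c + d))
    # A 1-df chi-square is a squared standard normal: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


chi2, p = chi_square_2x2(8, 29, 92, 71)
print(round(chi2, 2), p < 0.001)  # 14.62 True
```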
Case-control test for association (continued)
Question: Is the Gm haplotype actually associated with risk of Type 2 diabetes?
The real story: Stratify by American Indian heritage
Index of Indian heritage: 0 = little or no Indian heritage; 8 = complete Indian heritage
Index of Indian heritage   Gm haplotype   Percent with diabetes
0                          Present        17.8
0                          Absent         19.9
4                          Present        28.3
4                          Absent         28.8
8                          Present        35.9
8                          Absent         39.3
Within each heritage stratum, the percent with diabetes is similar whether or not the Gm haplotype is present
Conclusion: The Gm haplotype is NOT a risk factor for Type 2 diabetes, but is a marker of American Indian heritage
Slide adapted from Kermit Ritland
Population structure and spurious association
Assume populations are historically isolated
One has higher disease frequency by chance
Unlinked loci are also differentiated between populations
Unlinked loci show disease association when populations are lumped together
[Figure: a population with high disease frequency and a population with low disease frequency separated by a gene flow barrier; alleles at a neutral locus and alleles causing susceptibility to disease differ in frequency between them]
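The spurious-association scenario can be simulated, assuming two isolated populations that differ by chance in disease prevalence (0.30 vs 0.05) and in allele frequency at an unlinked neutral locus (0.8 vs 0.2). Within each population disease status is generated independently of genotype, yet lumping the populations makes cases carry the allele much more often than controls. All numbers and function names are hypothetical.

```python
import random


def simulate_population(n, disease_prev, allele_freq, rng):
    """Return (is_case, allele_count) pairs; disease is independent of the neutral locus."""
    people = []
    for _ in range(n):
        is_case = rng.random() < disease_prev
        # Two allele copies drawn independently (Hardy-Weinberg), unlinked to disease.
        count = (rng.random() < allele_freq) + (rng.random() < allele_freq)
        people.append((is_case, count))
    return people


def allele_freq_cases_controls(people):
    """Frequency of the neutral allele among cases and among controls."""
    cases = [c for is_case, c in people if is_case]
    controls = [c for is_case, c in people if not is_case]
    return sum(cases) / (2 * len(cases)), sum(controls) / (2 * len(controls))


rng = random.Random(6)
pop_high = simulate_population(5000, disease_prev=0.30, allele_freq=0.8, rng=rng)
pop_low = simulate_population(5000, disease_prev=0.05, allele_freq=0.2, rng=rng)
for label, people in (("high", pop_high), ("low", pop_low), ("pooled", pop_high + pop_low)):
    f_case, f_ctrl = allele_freq_cases_controls(people)
    print(label, round(f_case, 3), round(f_ctrl, 3))
```

Within each population the case and control frequencies should be close; in the pooled sample they should differ sharply (roughly 0.71 vs 0.45 in expectation), the same pattern as the Gm example.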
Association Study Limitations
Population structure: differences between cases and controls
Genetic heterogeneity underlying trait
Random error/false positives
Inadequate genome coverage
Poorly estimated linkage disequilibrium
Association Analysis with a Mixed Model
Terms in the model for the phenotype (response variable) of individual i:
Effect of target SNP
Effects of background SNPs
Population effect (e.g., admixture coefficient from Structure or values of principal components)
Family effect (kinship coefficient)
Implemented in the TASSEL program (Wednesday in lab)
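TASSEL's mixed model is beyond a short example, but its population term can be illustrated. The sketch below fits only a fixed population effect (by centering genotype and phenotype within each subpopulation, which gives the same SNP coefficient as including population as a covariate in least squares) and omits the kinship random effect and background SNPs. It is a simplified stand-in, not TASSEL's method. In the hypothetical data the SNP has no effect, but its frequency and the mean phenotype both differ between subpopulations.

```python
import random


def ols_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def center_within(values, groups):
    """Subtract each group's mean (equivalent to fitting a fixed group effect)."""
    sums, counts = {}, {}
    for v, g in zip(values, groups):
        sums[g] = sums.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


# Hypothetical data: subpopulations P1 and P2 differ in mean phenotype and in the
# frequency of a target SNP that has NO effect on the phenotype.
rng = random.Random(7)
pops, snps, phenos = [], [], []
for pop, freq, mean in (("P1", 0.8, 2.0), ("P2", 0.2, 0.0)):
    for _ in range(3000):
        count = (rng.random() < freq) + (rng.random() < freq)  # allele count 0, 1, or 2
        pops.append(pop)
        snps.append(count)
        phenos.append(mean + rng.gauss(0, 1))

naive = ols_slope(snps, phenos)  # confounded by population structure
adjusted = ols_slope(center_within(snps, pops), center_within(phenos, pops))
print(round(naive, 2), round(adjusted, 2))
```

The naive slope picks up the difference between subpopulations; the population-adjusted slope should be near zero.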