GLYPHOSATE RESISTANCE Background / Problem
Lecture 25: Association
Genetics
November 30, 2012
Announcements
Final exam on Monday, Dec 10 at 11 am, in
3306 LSB
2010 exam and study sheets posted on
website
Exam is mostly non-cumulative
Review session on Friday, Dec. 7
Extra credit lab next Wednesday: up to 10
points
Extra credit report due at final exam
Last Time
Quantitative traits
Genetic basis
Heritability
Linking phenotype to genotype
QTL analysis introduction
Limitations of QTL
Today
Association genetics
Effects of population structure
Transmission Disequilibrium Tests
Quantitative Trait Locus Mapping
[Figure: QTL mapping cross. Parent 1 x Parent 2 produce the F1; F1 x F1 produce progeny with recombinant genotypes at markers A, B, and C; HEIGHT is plotted against GENOTYPE at marker B (BB, Bb, bb)]
modified from D. Neale
QTL for aggressive behavior in mice
http://people.bu.edu/jcherry/webpage/pheromone.htm
[Figure: QTL for aggressive behavior on the X chromosome; Monoamine Oxidase A (MAOA) lies near the QTL]
Brodkin et al. 2002
Monoamine Oxidase A (MAOA)
Selectively degrades serotonin, norepinephrine, and
dopamine
Located near QTL for aggressive behavior on the X
chromosome
Levels of expression affected by a VNTR
(minisatellite) locus in the promoter region
Sabol et al. 1998
MAOA and childhood maltreatment
Genotype-by-Environment interaction
Caspi et al. 2002
QTL Limitations
Biased toward detection of large-effect loci
Need very large pedigrees to do this properly
Limited genetic base: QTL may only apply to
the two individuals in the cross!
Genotype x Environment interactions rampant:
some QTL only appear in certain environments
Huge regions of genome underlie QTL, usually
containing hundreds of genes
How to distinguish among candidates?
Linkage Disequilibrium and Quantitative Trait
Mapping
Linkage and quantitative trait locus
(QTL) analysis
Need a pedigree and moderate number
of molecular markers
Very large regions of chromosomes
represented by markers
Association Studies with Natural
Populations
No pedigree required
Need large numbers of genetic markers
Small chromosomal segments can be
localized
Cardon and Bell 2001, Nat. Rev.
Genet. 2: 91-99
Many more markers are required than in
traditional QTL analysis
Association Mapping
[Figure: ancestral chromosomes with marker alleles (G/A, T/C) and a causal variant (*); recombination through evolutionary history; present-day chromosomes in natural population. Inset: HEIGHT plotted against GENOTYPE (TT, TC, CC)]
Slide courtesy of Dave Neale
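The strength of the marker-trait association that survives recombination is usually summarized as linkage disequilibrium between two loci. A minimal sketch of the standard D and r-squared statistics follows; the haplotype counts are hypothetical and chosen only for illustration, and the function name `ld_stats` is not from the lecture.

```python
def ld_stats(hap_counts):
    """Return (D, r_squared) for two biallelic loci.

    hap_counts maps two-letter haplotypes (allele at locus 1 followed by
    allele at locus 2) to counts. Both loci must be polymorphic,
    otherwise r_squared is undefined (division by zero).
    """
    total = sum(hap_counts.values())
    freqs = {hap: n / total for hap, n in hap_counts.items()}

    # Pick one reference allele per locus (alphabetically first).
    ref1 = sorted({hap[0] for hap in freqs})[0]
    ref2 = sorted({hap[1] for hap in freqs})[0]

    p1 = sum(f for hap, f in freqs.items() if hap[0] == ref1)
    p2 = sum(f for hap, f in freqs.items() if hap[1] == ref2)
    p12 = freqs.get(ref1 + ref2, 0.0)

    d = p12 - p1 * p2                        # deviation from independence
    r2 = d * d / (p1 * (1 - p1) * p2 * (1 - p2))
    return d, r2


# Hypothetical counts: G mostly travels with T, A mostly with C.
strong_ld = {"GT": 40, "GC": 10, "AT": 10, "AC": 40}
# Hypothetical counts after extensive recombination: no association.
no_ld = {"GT": 25, "GC": 25, "AT": 25, "AC": 25}

d_strong, r2_strong = ld_stats(strong_ld)
d_none, r2_none = ld_stats(no_ld)
```

Low r-squared between neighboring markers is the reason association studies need many more markers than QTL analysis: a marker only reports on the trait locus if the two remain in LD.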
Candidate Gene Associations vs. Whole
Genome Scans
If LD is low, candidate genes are usually identified a priori, and a limited number are scanned for associations
Biased by existing knowledge
If LD is high and haplotype blocks are conserved, entire genome can be efficiently scanned for associations with phenotypes
Simplest for case-control studies (e.g., disease, gender)
Use "Candidate Regions" from high LD populations, assess candidate genes in low LD populations
[Figure: linkage map with marker names and map positions from 0.0 to 262.9, showing QTL for COARSE ROOT and ABOVE:BELOW, a candidate region, and candidate gene identification]
Human HapMap Project and Whole Genome Scans
NATURE|Vol 437|27 October 2005
LD structure of human Chromosome 19 (www.hapmap.org)
1 common SNP genotyped every 5 kb for 269 individuals
9.2 million SNPs in total
Take advantage of haplotype blocks to efficiently scan
genome
Next-Generation Sequencing and Whole
Genome Scans
The $1000 genome is on the
horizon
Current cost with Illumina HiSeq
2000 is about $2000 for 10X
depth
The 1000 genomes project has
sequenced thousands of human
genomes at low depth
Can detect most polymorphisms
with frequency >0.01
True whole genome association
studies now possible at a very
large scale
http://www.1000genomes.org/
Identifying genetic mechanisms of simple vs.
complex diseases
Simple (Mendelian) diseases: Caused by a single major gene
High heritability; often can be recognized in pedigrees
Example: Huntington’s, Achondroplasia, Cystic fibrosis, Sickle Cell
Anemia
Tools: Linkage analysis, positional cloning
Over 2900 disease-causing genes have been identified thus far: Human
Gene Mutation Database: www.hgmd.cf.ac.uk
Complex (non-Mendelian) diseases: Caused by the interaction
between environmental factors and multiple genes with minor effects
Interactions between genes, Low heritability
Example: Heart disease, Type II diabetes, Cancer, Asthma
Tools: Association mapping, SNPs !!
Over 35,000 SNP associations have been identified thus far:
http://www.snpedia.com
Slide adapted from Kermit Ritland
Complicating factor: Trait Heterogeneity
Same phenotype has multiple genetic mechanisms underlying it
Slide adapted from Kermit Ritland
Case-Control Example: Diabetes
Knowler et al. (1988) collected data on 4920 individuals from Pima and Papago Native American populations in the southwestern United States
High rate of Type II diabetes in these populations
Found significant associations with Immunoglobulin G marker (Gm)
Does this indicate underlying mechanisms
of disease?
Knowler et al. (1988) Am. J. Hum. Genet. 43: 520
Case-control test for association (case=diabetic, control=not diabetic)
                     Type 2 Diabetes
Gm Haplotype     present    absent    Total
present                8        29       37
absent                92        71      163
Total                100       100      200
Question: Is the Gm haplotype associated with risk of Type 2 diabetes???
(1) Test for an association
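$$\chi^2_1 = \frac{(ad - bc)^2 N}{(a+c)(b+d)(a+b)(c+d)} = \frac{[(8)(71) - (29)(92)]^2 (200)}{(100)(100)(37)(163)} = 14.62$$
where a = 8, b = 29, c = 92, d = 71 are the cell counts of the 2x2 table and N = 200 is the total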
(2) Chi-square is significant. Therefore presence of Gm haplotype
seems to confer reduced occurrence of diabetes
Slide adapted from Kermit Ritland
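The chi-square calculation for the 2x2 table above can be checked with a short script. This is a minimal sketch using only the standard library; the function names are illustrative, and the p-value uses the identity that the upper tail of a chi-square with 1 degree of freedom equals erfc(sqrt(x/2)).

```python
import math


def chi_square_2x2(a, b, c, d):
    """Chi-square statistic (1 df) for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = (a * d - b * c) ** 2 * n
    denominator = (a + c) * (b + d) * (a + b) * (c + d)
    return numerator / denominator


def p_value_1df(chi2):
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# Rows: Gm haplotype present / absent; columns: diabetes present / absent.
a, b, c, d = 8, 29, 92, 71

chi2 = chi_square_2x2(a, b, c, d)
p = p_value_1df(chi2)
odds_ratio = (a * d) / (b * c)   # below 1: haplotype carriers are less often diabetic

print(f"chi-square = {chi2:.2f}, p = {p:.2g}, odds ratio = {odds_ratio:.2f}")
```

The odds ratio below 1 is what the slide summarizes as "reduced occurrence of diabetes" among carriers of the haplotype.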
Case-control test for association (continued)
Question: Is the Gm haplotype actually associated with risk of Type 2 diabetes???
The real story: Stratify by American Indian heritage
0 = little or no Indian heritage;
8 = complete Indian heritage

Index of Indian heritage   Gm haplotype   Percent with diabetes
0                          Present        17.8
0                          Absent         19.9
4                          Present        28.3
4                          Absent         28.8
8                          Present        35.9
8                          Absent         39.3

Conclusion:
The Gm haplotype is NOT a risk factor for Type 2
diabetes, but is a marker of American Indian heritage
Slide adapted from Kermit Ritland
Population structure and spurious association
Assume populations are historically isolated
One has higher disease frequency by chance
Unlinked loci are differentiated between populations also
Unlinked loci show disease association when populations are
lumped together
[Figure: two populations separated by a gene flow barrier, one with high and one with low disease frequency, showing alleles at a neutral locus and alleles causing susceptibility to disease]
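The lumping effect described above can be reproduced with arithmetic alone. The sketch below uses hypothetical counts, built so that the marker and the disease are exactly independent within each population; only the allele and disease frequencies differ between populations. All numbers and names are assumptions for illustration, not data from the lecture.

```python
def chi_square_2x2(a, b, c, d):
    """Chi-square statistic (1 df) for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return (a * d - b * c) ** 2 * n / ((a + c) * (b + d) * (a + b) * (c + d))


# Cell order: (marker+ diseased, marker+ healthy, marker- diseased, marker- healthy)
# Hypothetical population 1: marker frequency 0.8, disease frequency 0.4, n = 2000
pop_high = (640, 960, 160, 240)
# Hypothetical population 2: marker frequency 0.2, disease frequency 0.1, n = 2000
pop_low = (40, 360, 160, 1440)

# Lumping the two populations together adds the tables cell by cell.
pooled = tuple(x + y for x, y in zip(pop_high, pop_low))

chi2_high = chi_square_2x2(*pop_high)   # no association within population 1
chi2_low = chi_square_2x2(*pop_low)     # no association within population 2
chi2_pooled = chi_square_2x2(*pooled)   # strong spurious association

print(chi2_high, chi2_low, round(chi2_pooled, 1))
```

The pooled statistic is large even though the marker is unlinked to disease in both populations, which is the same pattern as the Gm haplotype example.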
Association Study Limitations
Population structure: differences between
cases and controls
Genetic heterogeneity underlying trait
Random error/false positives
Inadequate genome coverage
Poorly-estimated linkage disequilibrium
Transmission Disequilibrium Test (TDT)
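(Spielman et al. 1993)
Compare diseased offspring genotypes to parental genotypes to test if loci violate Mendelian expectations
Controls for population structure
[Figure: pedigrees showing Mm and mm genotypes of parents and affected offspring]
a = # times M transmitted by heterozygous (Mm) parents
b = # times M not transmitted by heterozygous (Mm) parents
$$\chi^2 = \frac{(a-b)^2}{(a+b)}$$
Approximately distributed as $\chi^2$ with 1 degree of freedom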
Slide adapted from Kermit Ritland
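The TDT statistic is simple enough to compute by hand, but a short sketch makes the counting rule explicit: only heterozygous parents are informative. The trio records and the counts a = 60, b = 40 below are hypothetical, and the function names are illustrative.

```python
import math


def count_transmissions(records):
    """Count (a, b) from (parent_genotype, allele_transmitted) records.

    a = times a heterozygous parent transmitted M to an affected child,
    b = times a heterozygous parent transmitted m instead.
    Homozygous parents are skipped because they carry no information.
    """
    a = b = 0
    for parent_genotype, allele in records:
        if set(parent_genotype) != {"M", "m"}:
            continue
        if allele == "M":
            a += 1
        else:
            b += 1
    return a, b


def tdt(a, b):
    """Return the TDT chi-square (1 df) and its p-value."""
    if a + b == 0:
        raise ValueError("no informative (heterozygous) parents")
    chi2 = (a - b) ** 2 / (a + b)
    p = math.erfc(math.sqrt(chi2 / 2.0))   # upper tail of chi-square, 1 df
    return chi2, p


# Hypothetical records: the mm parent is ignored.
small = [("Mm", "M"), ("mm", "m"), ("Mm", "m"), ("Mm", "M")]
a_small, b_small = count_transmissions(small)

# Hypothetical totals from a larger study.
chi2, p = tdt(60, 40)
```

Under Mendelian transmission a and b are expected to be equal, so the statistic measures excess transmission of M to affected offspring regardless of allele frequencies in the source population.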
Transmission Disequilibrium Test (TDT)
Compared with “standard” association tests:
Still need to have tight LD, so need many markers
Is not affected by population stratification
Uses only affected progeny (and parental
genotypes), so method is efficient
Only detects signal if there is both linkage
and association
Does not depend on mode of inheritance
Association Tests and Population Structure
Transmission disequilibrium
tests have limited power and
range of application
sample size limitations
restricted allelic diversity
“Genomic Control” uses
random markers throughout
genome to control for false
associations
“Mixed Model” approach
allows incorporation of known
relatedness and population
structure simultaneously
Cardon and Bell 2001 Nature Reviews Genetics 2:91
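One common form of genomic control estimates an inflation factor from the test statistics of random markers and divides the statistic of the candidate marker by it. The sketch below assumes the median-based estimator (median of the null statistics divided by about 0.455, the median of a chi-square with 1 degree of freedom); the list of null statistics is hypothetical.

```python
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
MEDIAN_CHI2_1DF = 0.4549


def inflation_factor(null_chi2):
    """Estimate lambda from chi-square statistics of random markers."""
    lam = statistics.median(null_chi2) / MEDIAN_CHI2_1DF
    return max(lam, 1.0)   # never inflate a statistic


def genomic_control(chi2, null_chi2):
    """Deflate a candidate chi-square by the estimated lambda."""
    return chi2 / inflation_factor(null_chi2)


# Hypothetical statistics at random, presumably unlinked markers.
random_markers = [0.2, 0.5, 0.91, 1.3, 3.0]

lam = inflation_factor(random_markers)
corrected = genomic_control(14.62, random_markers)
```

If population structure inflates all tests roughly equally, lambda exceeds 1 and the corrected statistic shrinks; with no structure lambda is close to 1 and the statistic is unchanged.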
ANOVA/Regression Model
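$$f(y_i) = \beta x_i + \varepsilon_i$$
$y_i$: phenotype (response variable) of individual i
$f$: (monotonic) transformation
$x_i$: coded genotype (feature) of individual i
$\beta$: effect size (regression coefficient)
$\varepsilon_i$: error (residual)
$p(\beta=0)$: p-value for the null hypothesis of no effect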
Goal: Find effect size that explains best all (potentially
transformed) phenotypes as a linear function of the genotypes
and estimate the probability (p-value) for the data being
consistent with the null hypothesis (i.e. no effect)
http://www2.unil.ch/cbg/index.php?title=Genome_Wide_Association_Studies
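A minimal sketch of the regression idea: genotypes are coded as the number of copies of one allele (0, 1, 2), the effect size is the least-squares slope, and the p-value is approximated here by a permutation test rather than the parametric test a GWAS package would use. The data, the coding, and the function names are assumptions for illustration.

```python
import random


def regression_slope(genotypes, phenotypes):
    """Least-squares slope of phenotype on coded genotype."""
    n = len(genotypes)
    mean_x = sum(genotypes) / n
    mean_y = sum(phenotypes) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(genotypes, phenotypes))
    sxx = sum((x - mean_x) ** 2 for x in genotypes)
    return sxy / sxx


def permutation_p_value(genotypes, phenotypes, n_perm=2000, seed=1):
    """Two-sided p-value for slope = 0, from shuffling phenotypes."""
    rng = random.Random(seed)
    observed = abs(regression_slope(genotypes, phenotypes))
    shuffled = list(phenotypes)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Small tolerance so ties are not lost to floating-point rounding.
        if abs(regression_slope(genotypes, shuffled)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical data: copies of the T allele and plant height.
genotypes = [0, 0, 1, 1, 2, 2]
heights = [1.0, 1.2, 2.1, 1.9, 3.0, 3.2]

beta = regression_slope(genotypes, heights)
p = permutation_p_value(genotypes, heights)
```

Shuffling phenotypes breaks any genotype-phenotype link while keeping both distributions intact, so the fraction of shuffles with a slope at least as extreme as the observed one estimates the probability of the data under the null hypothesis.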
Mixed Model
phenotype (response variable) of individual i =
effect of target SNP
+ effects of background SNPs
+ Population Effect (e.g., Admixture coefficient from Structure or values of Principal Components)
+ Family effect (Kinship coefficient)
+ residual error
Implemented in the TASSEL program (Wednesday in lab)
Commercial Services for Human Genome-Wide SNP Characterization
NATURE|Vol 437|27 October 2005
Assay 1.2 million “tag SNPs” scattered across genome using
Illumina BeadArray technology
Ancestry analyses and disease/behavioral susceptibility