GLYPHOSATE RESISTANCE Background / Problem

Lecture 25: Association Genetics
November 30, 2012
Announcements
 Final exam on Monday, Dec 10 at 11 am, in 3306 LSB
 2010 exam and study sheets posted on website
 Exam is mostly non-cumulative
 Review session on Friday, Dec. 7
 Extra credit lab next Wednesday: up to 10 points
 Extra credit report due at final exam
Last Time
 Quantitative traits
Genetic basis
Heritability
 Linking phenotype to genotype
QTL analysis introduction
Limitations of QTL
Today
 Association genetics
 Effects of population structure
 Transmission Disequilibrium Tests
Quantitative Trait Locus Mapping
[Figure: cross of Parent 1 x Parent 2 with markers A/a, B/b, and C/c; F1 x F1 progeny genotypes; plot of HEIGHT against GENOTYPE (BB, Bb, bb) at marker B. Modified from D. Neale.]
QTL for aggressive behavior in mice
http://people.bu.edu/jcherry/webpage/pheromone.htm
[Figure: crossing scheme (parents, F1, and progeny genotypes at markers A/a, B/b, and C/c) and a map of the X chromosome with Monoamine Oxidase A (MAOA) labeled. Brodkin et al. 2002]
Monoamine Oxidase A (MAOA)
 Selectively degrades serotonin, norepinephrine, and dopamine
 Located near QTL for aggressive behavior on the X chromosome
 Levels of expression affected by a VNTR (minisatellite) locus in the promoter region
Sabol et al. 1998
MAOA and childhood maltreatment
Genotype-by-Environment interaction
Caspi et al. 2002
QTL Limitations
 Biased toward detection of large-effect loci
 Need very large pedigrees to do this properly
 Limited genetic base: QTL may only apply to the two individuals in the cross!
 Genotype x Environment interactions rampant: some QTL only appear in certain environments
 Huge regions of genome underlie QTL, usually hundreds of genes
 How to distinguish among candidates?
Linkage Disequilibrium and Quantitative Trait Mapping
Linkage and quantitative trait locus (QTL) analysis
 Need a pedigree and moderate number of molecular markers
 Very large regions of chromosomes represented by markers
Association Studies with Natural Populations
 No pedigree required
 Need large numbers of genetic markers
 Many more markers are required than in traditional QTL analysis
 Small chromosomal segments can be localized
Cardon and Bell 2001, Nat. Rev. Genet. 2: 91-99
Association Mapping
[Figure: ancestral chromosomes with marker alleles (G/A, T/C) and a site marked *; recombination through evolutionary history; present-day chromosomes in natural population; plot of HEIGHT against GENOTYPE (TT, TC, CC). Slide courtesy of Dave Neale.]
Candidate Gene Associations vs. Whole Genome Scans
 If LD is low, candidate genes are usually identified a priori, and a limited number are scanned for associations
 Biased by existing knowledge
 If LD is high and haplotype blocks are conserved, entire genome can be efficiently scanned for associations with phenotypes
 Simplest for case-control studies (e.g., disease, gender)
 Use "Candidate Regions" from high LD populations, assess candidate genes in low LD populations
[Figure: linkage map (marker positions 0.0 to 262.9) showing QTL for the traits COARSE ROOT and ABOVE:BELOW, a Candidate Region, and Candidate Gene Identification]
Human HapMap Project and Whole Genome Scans
NATURE|Vol 437|27 October 2005
LD structure of human Chromosome 19 (www.hapmap.org)
 1 common SNP genotyped every 5 kb for 269 individuals
 9.2 million SNPs in total
 Take advantage of haplotype blocks to efficiently scan genome
Next-Generation Sequencing and Whole Genome Scans
 The $1000 genome is on the horizon
 Current cost with Illumina HiSeq 2000 is about $2000 for 10X depth
 The 1000 Genomes Project has sequenced thousands of human genomes at low depth
 Can detect most polymorphisms with frequency >0.01
 True whole genome association studies now possible at a very large scale
http://www.1000genomes.org/
Identifying genetic mechanisms of simple vs. complex diseases
Simple (Mendelian) diseases: Caused by a single major gene
 High heritability; often can be recognized in pedigrees
 Examples: Huntington’s, achondroplasia, cystic fibrosis, sickle cell anemia
 Tools: Linkage analysis, positional cloning
 Over 2900 disease-causing genes have been identified thus far: Human Gene Mutation Database: www.hgmd.cf.ac.uk
Complex (non-Mendelian) diseases: Caused by the interaction between environmental factors and multiple genes with minor effects
 Interactions between genes; low heritability
 Examples: Heart disease, Type II diabetes, cancer, asthma
 Tools: Association mapping, SNPs
 Over 35,000 SNP associations have been identified thus far: http://www.snpedia.com
Slide adapted from Kermit Ritland
Complicating factor: Trait Heterogeneity
Same phenotype has multiple genetic mechanisms underlying it
Slide adapted from Kermit Ritland
Case-Control Example: Diabetes
 Knowler et al. (1988) collected data on 4920 individuals from the Pima and Papago Native American populations in the Southwestern United States
 High rate of Type II diabetes in these populations
 Found significant associations with Immunoglobulin G marker (Gm)
 Does this indicate underlying mechanisms of disease?
Knowler et al. (1988) Am. J. Hum. Genet. 43: 520
Case-control test for association (case=diabetic, control=not diabetic)
                     Type 2 Diabetes
Gm Haplotype    present    absent    Total
present         8          29        37
absent          92         71        163
Total           100        100       200
Question: Is the Gm haplotype associated with risk of Type 2 diabetes?
(1) Test for an association, with a = 8, b = 29, c = 92, d = 71, and N = 200:
\[ \chi^2_1 = \frac{(ad - bc)^2 N}{(a+c)(b+d)(a+b)(c+d)} = \frac{[(8 \times 71) - (29 \times 92)]^2 (200)}{(100)(100)(37)(163)} = 14.62 \]
(2) Chi-square is significant. Therefore presence of the Gm haplotype seems to confer reduced occurrence of diabetes
Slide adapted from Kermit Ritland
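The chi-square calculation on this slide can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the function name `chi_square_2x2` and the added odds ratio are illustrative and not part of the original slide. The p-value uses the identity that the upper tail of a chi-square distribution with 1 degree of freedom equals erfc(sqrt(x/2)).

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    stat = (a * d - b * c) ** 2 * n / ((a + c) * (b + d) * (a + b) * (c + d))
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Counts from the slide: rows = Gm present/absent, columns = diabetes present/absent.
stat, p_value = chi_square_2x2(8, 29, 92, 71)

# Odds ratio below 1 means diabetes is less common among Gm carriers in the pooled sample.
odds_ratio = (8 * 71) / (29 * 92)

print(f"chi-square = {stat:.2f}, p = {p_value:.2g}, odds ratio = {odds_ratio:.2f}")
```

The statistic reproduces the 14.62 on the slide, and the odds ratio below 1 matches the direction of the apparent protective effect.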
Case-control test for association (continued)
Question: Is the Gm haplotype actually associated with risk of Type 2 diabetes?
The real story: Stratify by American Indian heritage
0 = little or no Indian heritage; 8 = complete Indian heritage
Index of Indian heritage    Gm haplotype    Percent with diabetes
0                           Present         17.8
0                           Absent          19.9
4                           Present         28.3
4                           Absent          28.8
8                           Present         35.9
8                           Absent          39.3
Conclusion:
The Gm haplotype is NOT a risk factor for Type 2
diabetes, but is a marker of American Indian heritage
Slide adapted from Kermit Ritland
Population structure and spurious association
 Assume populations are historically isolated
 One has higher disease frequency by chance
 Unlinked loci are also differentiated between populations
 Unlinked loci show disease association when populations are lumped together
[Figure: a population with high disease frequency and a population with low disease frequency, separated by a gene flow barrier; symbols mark alleles at a neutral locus and alleles causing susceptibility to disease]
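The lumping effect described on this slide can be illustrated numerically. The sketch below is not from the lecture: the population sizes, marker frequencies, and disease frequencies are hypothetical values chosen for illustration. Within each population the marker and the disease are independent by construction, yet the pooled table shows a strong association.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    return (a * d - b * c) ** 2 * n / ((a + c) * (b + d) * (a + b) * (c + d))

def expected_table(n, marker_freq, disease_freq):
    """Expected counts when marker and disease are independent within a population.

    Order: (marker+ disease+, marker+ disease-, marker- disease+, marker- disease-).
    """
    return (
        n * marker_freq * disease_freq,
        n * marker_freq * (1 - disease_freq),
        n * (1 - marker_freq) * disease_freq,
        n * (1 - marker_freq) * (1 - disease_freq),
    )

# Hypothetical populations: one with high disease frequency and a rare marker,
# one with low disease frequency and a common marker.
pop_high = expected_table(1000, marker_freq=0.1, disease_freq=0.4)
pop_low = expected_table(1000, marker_freq=0.6, disease_freq=0.1)

# Lump the two populations together, as in an unstratified case-control study.
pooled = tuple(x + y for x, y in zip(pop_high, pop_low))

chi_high = chi_square_2x2(*pop_high)
chi_low = chi_square_2x2(*pop_low)
chi_pooled = chi_square_2x2(*pooled)

print(f"within populations: {chi_high:.3f}, {chi_low:.3f}; pooled: {chi_pooled:.1f}")
```

The within-population statistics are zero up to rounding, while the pooled statistic is far above the 3.84 threshold for significance at the 0.05 level with 1 degree of freedom.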
Association Study Limitations
 Population structure: differences between
cases and controls
 Genetic heterogeneity underlying trait
 Random error/false positives
 Inadequate genome coverage
 Poorly-estimated linkage disequilibrium
Transmission Disequilibrium Test (TDT) (Spielman et al. 1993)
 Compare diseased offspring genotypes to parental genotypes to test if loci violate Mendelian expectations
 Controls for population structure
[Figure: pedigree with Mm and mm genotypes]
a = # times M transmitted by heterozygous (Mm) parents
b = # times M not transmitted by heterozygous (Mm) parents
\[ \chi^2 = \frac{(a-b)^2}{a+b} \]
Approximately distributed as \chi^2 with 1 degree of freedom
Slide adapted from Kermit Ritland
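The TDT statistic above is simple enough to compute directly. A minimal sketch, assuming the transmission counts from heterozygous parents have already been tallied; the counts 62 and 38 are hypothetical and the function name `tdt` is illustrative.

```python
import math

def tdt(transmitted, not_transmitted):
    """TDT statistic (a - b)^2 / (a + b) and its approximate 1-df chi-square p-value."""
    a, b = transmitted, not_transmitted
    stat = (a - b) ** 2 / (a + b)
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical tallies: allele M passed to affected offspring 62 times and not passed 38 times.
stat, p_value = tdt(transmitted=62, not_transmitted=38)

print(f"TDT chi-square = {stat:.2f}, p = {p_value:.3f}")
```

Under Mendelian expectations a and b should be about equal, so the statistic grows only when M is transmitted to affected offspring more or less often than half the time.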
Transmission Disequilibrium Test (TDT)
 Compared with “standard” association tests:
 Still need to have tight LD, so need many markers
 Is not affected by population stratification
 Uses only affected progeny (and parental genotypes), so method is efficient
 Only detects signal if there is both linkage and association; does not depend on mode of inheritance
Association Tests and Population Structure
 Transmission disequilibrium tests have limited power and range of application
 sample size limitations
 restricted allelic diversity
 “Genomic Control” uses random markers throughout genome to control for false associations
 “Mixed Model” approach allows incorporation of known relatedness and population structure simultaneously
Cardon and Bell 2001 Nature Reviews Genetics 2:91
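One common formulation of genomic control estimates an inflation factor from the chi-square statistics of random markers and divides the candidate statistic by it. The slide does not give a formula, so the sketch below is an assumption about the intended method: the inflation factor is the median of the random-marker statistics divided by 0.4549, the median of a chi-square distribution with 1 degree of freedom. The marker statistics are hypothetical.

```python
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549

def genomic_control(null_marker_stats, candidate_stat):
    """Return the inflation factor and the candidate statistic corrected by it."""
    inflation = statistics.median(null_marker_stats) / CHI2_1DF_MEDIAN
    # By convention the factor is not allowed to deflate the statistic.
    inflation = max(inflation, 1.0)
    return inflation, candidate_stat / inflation

# Hypothetical 1-df chi-square values from random markers across the genome.
null_stats = [0.2, 0.5, 0.91, 1.3, 2.4, 0.7, 1.1, 3.0, 0.05, 1.6, 0.9]

inflation, corrected = genomic_control(null_stats, candidate_stat=14.62)

print(f"inflation factor = {inflation:.2f}, corrected chi-square = {corrected:.2f}")
```

An inflation factor well above 1 suggests that population structure or relatedness is inflating test statistics genome-wide, so the candidate association is discounted accordingly.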
ANOVA/Regression Model
\[ f(y_i) = \beta x_i + \varepsilon_i \]
y_i: phenotype (response variable) of individual i
f: (monotonic) transformation
x_i: coded genotype (feature) of individual i
\beta: effect size (regression coefficient)
\varepsilon_i: error (residual)
p(\beta = 0): p-value for the null hypothesis of no effect
Goal: Find the effect size that best explains all (potentially transformed) phenotypes as a linear function of the genotypes, and estimate the probability (p-value) of the data being consistent with the null hypothesis (i.e., no effect)
http://www2.unil.ch/cbg/index.php?title=Genome_Wide_Association_Studies
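The regression idea can be sketched for a single SNP as follows. The data are hypothetical, genotypes are coded as the number of copies of one allele (0, 1, 2), and no transformation is applied to the phenotype. The p-value here comes from a permutation test rather than the parametric test usually reported by regression software; that substitution keeps the example within the standard library.

```python
import random

def slope(genotypes, phenotypes):
    """Least-squares slope of phenotype on coded genotype (the effect size)."""
    n = len(genotypes)
    mean_x = sum(genotypes) / n
    mean_y = sum(phenotypes) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(genotypes, phenotypes))
    sxx = sum((x - mean_x) ** 2 for x in genotypes)
    return sxy / sxx

def permutation_p_value(genotypes, phenotypes, n_perm=2000, seed=1):
    """Share of phenotype shufflings with a slope at least as extreme as observed."""
    rng = random.Random(seed)
    observed = abs(slope(genotypes, phenotypes))
    shuffled = list(phenotypes)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(slope(genotypes, shuffled)) >= observed:
            hits += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)

# Hypothetical data: 12 individuals, genotype coded 0/1/2, phenotype in arbitrary units.
genotypes = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
phenotypes = [10.1, 9.8, 10.4, 9.9, 11.2, 10.9, 11.5, 11.0, 12.1, 12.4, 11.8, 12.2]

beta = slope(genotypes, phenotypes)
p_value = permutation_p_value(genotypes, phenotypes)

print(f"effect size = {beta:.4f}, permutation p = {p_value:.4f}")
```

In a genome-wide scan the same test is repeated for every SNP, which is why the significance threshold has to be adjusted for multiple testing.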
Mixed Model
[Equation: phenotype (response variable) of individual i modeled as the sum of the following terms]
 effect of target SNP
 effects of background SNPs
 Population effect (e.g., admixture coefficient from Structure or values of principal components)
 Family effect (kinship coefficient)
Implemented in the Tassel program (Wednesday in lab)
Commercial Services for Human Genome-Wide SNP Characterization
NATURE|Vol 437|27 October 2005
 Assay 1.2 million “tag SNPs” scattered across genome using Illumina BeadArray technology
 Ancestry analyses and disease/behavioral susceptibility