Association Mapping in Wheat Flavio Breseghello Advisor

Association Mapping as a Breeding Strategy
Mark E. Sorrells and Flavio Breseghello
Department of Plant Breeding & Genetics
Cornell University
Presentation Overview
• A Genetic Model for Association Mapping in Plant Breeding Populations
• Comparison of Different Plant Breeding Materials for Association Mapping
• Association Mapping of Kernel Size and Milling Quality in Soft Winter Wheat Cultivars
A Genetic Model for AM in Plant Breeding Populations:
Association as Conditional Probabilities
Population genetics theory (Hedrick 2005)
• Gene and Marker, separated by Recombination (c)
• Breeding Pool: Gene = {a}, Marker = {m, M}
• New Parent: (A, M)
• Haplotype frequencies:
  Pr(A,M) = φ
  Pr(a,M) = θ
  Pr(a,m) = 1 - φ - θ
  Pr(A,m) = 0
• After t generations: Pr(A|M, c, t, φ, θ, w)
  “Probability of a plant with marker allele M to have gene allele A, t generations after the introduction of A”
Recombination x initial frequency of M in the breeding pool
[Figure: Pr(A|M) over t generations (0 to 20) for recombination frequencies c = 0.01, 0.05 and 0.10 and frequencies of M in the original population θ = 0, 0.05 and 0.25. Freq. new parent: φ = 0.05; relative fitness: w = 1. Generations ~8 and ~18 are marked on the axis.]
A novel marker allele at 10 cM distance can be more predictive of the QTL allele than an allele 1 cM away if the latter was present in the original pop at a freq of 0.05
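The comparison on this slide can be sketched numerically. The slides do not give a closed form for Pr(A|M), so the function below is an assumption: it takes random mating, no selection (w = 1), and the standard decay of linkage disequilibrium, D_t = D_0 (1 - c)^t, starting from the haplotype frequencies Pr(A,M) = φ, Pr(a,M) = θ, Pr(A,m) = 0. The function name is hypothetical, and map distances of 1 cM and 10 cM are treated as c = 0.01 and c = 0.10.

```python
def pr_a_given_m(t, c, phi, theta):
    """Pr(A|M) after t generations, assuming random mating and w = 1.

    phi   : initial frequency of the new-parent haplotype (A, M)
    theta : initial frequency of haplotype (a, M) in the breeding pool
    c     : recombination frequency between gene and marker
    """
    p_a = phi                  # frequency of gene allele A (constant without selection)
    p_m = phi + theta          # frequency of marker allele M (constant without selection)
    d0 = phi - p_a * p_m       # initial disequilibrium: Pr(A,M) - Pr(A) * Pr(M)
    freq_am_haplotype = p_a * p_m + d0 * (1.0 - c) ** t
    return freq_am_haplotype / p_m


# Novel marker allele 10 cM away versus a marker 1 cM away whose allele M
# was already present in the original population at frequency 0.05.
for t in (0, 5, 8, 10, 20):
    novel = pr_a_given_m(t, c=0.10, phi=0.05, theta=0.0)
    preexisting = pr_a_given_m(t, c=0.01, phi=0.05, theta=0.05)
    print(t, round(novel, 3), round(preexisting, 3))
```

Under these assumptions the novel 10 cM allele starts out fully predictive and stays ahead of the pre-existing 1 cM allele for the first several generations, which is the behaviour the slide describes.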
Recombination x selection for M
• The generation at which the marker is depleted [Pr(A|M) = Pr(A)] depends on the selection intensity applied;
• The final frequency of A depends on selection and tightness of linkage between marker and gene.
[Figure: Pr(A|M) and Pr(A) over generations. Freq. new parent: φ = 0.05; relative fitness: w = 4 (red), 2 (green), 1.25 (blue); freq. M in original pop: 0; freq. recombination: c = 0.01, 0.05, 0.10.]
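The two statements on this slide can be explored with a small deterministic simulation. The slides do not say how selection enters the model, so everything below is an assumption made for illustration: haplotypes carrying M get relative fitness w and all others fitness 1, selection acts on haplotype frequencies each generation, and recombination then follows under random mating. The function name is hypothetical.

```python
def simulate_selection_on_marker(phi, theta, c, w, generations):
    """Track (Pr(A|M), Pr(A)) per generation under selection for marker allele M.

    Assumed model: haplotype-level selection (fitness w for M carriers,
    1 otherwise) followed by recombination at frequency c.
    """
    x_AM, x_aM, x_Am, x_am = phi, theta, 0.0, 1.0 - phi - theta
    history = []
    for _ in range(generations + 1):
        p_m = x_AM + x_aM
        p_a = x_AM + x_Am
        history.append((x_AM / p_m, p_a))
        # Selection: M-bearing haplotypes are favoured by a factor w.
        mean_w = w * p_m + (1.0 - p_m)
        x_AM, x_aM = w * x_AM / mean_w, w * x_aM / mean_w
        x_Am, x_am = x_Am / mean_w, x_am / mean_w
        # Recombination: disequilibrium D shrinks by a fraction c.
        d = x_AM * x_am - x_Am * x_aM
        x_AM, x_am = x_AM - c * d, x_am - c * d
        x_Am, x_aM = x_Am + c * d, x_aM + c * d
    return history


tight = simulate_selection_on_marker(phi=0.05, theta=0.0, c=0.01, w=2.0, generations=200)
loose = simulate_selection_on_marker(phi=0.05, theta=0.0, c=0.10, w=2.0, generations=200)
print("final Pr(A), c = 0.01:", round(tight[-1][1], 3))
print("final Pr(A), c = 0.10:", round(loose[-1][1], 3))
```

In this sketch the marker stops being informative once M approaches fixation, because Pr(A|M) and Pr(A) then coincide, and the frequency of A at that point is higher when linkage is tighter.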
Summary
• In plant breeding populations, the locus most associated
with the trait is not necessarily the closest locus;
• Loosely linked markers can still be useful for MAS if
high intensity of selection is applied.
MAS for Complex Traits: Issues
• Accurate detection and estimation of QTL effects
• Pre-existing marker alleles in a breeding population can be
linked to non-target QTL alleles
• Multiple QTL alleles can have different relative values
• Gene x gene and gene x environment interactions
Association Analysis as a Breeding Strategy
• Most association studies have focused on estimating
linkage disequilibrium and fine mapping.
• Breeding programs are dynamic, complex genetic
entities that require frequent evaluation of marker /
phenotype relationships.
Breseghello, F., and M.E. Sorrells. 2006. Association mapping of kernel size
and milling quality in wheat (Triticum aestivum L.) cultivars. Genetics
172:1165-1177.
Breseghello, F., and M.E. Sorrells. 2006. Association analysis as a strategy
for improvement of quantitative traits in plants. Crop Sci. In press.
Association Mapping versus QTL Mapping
• Association Mapping can be conducted directly on the breeding material, therefore:
  • Direct inference from data analysis to breeding is possible
  • Phenotypic variation is observed for most traits of interest
  • Marker polymorphism is higher than in biparental populations
  • Routine variety trial evaluations provide phenotypic data
• Association Mapping provides other useful information about:
  • Organization of genetic variation in relevant breeding populations
  • Novel alleles, which can be identified and their relative value assessed as often as necessary
Association Mapping versus QTL Mapping
• Type I error (false positives) can be higher because of:
  • Unaccounted population structure
  • Simultaneous selection of combinations of alleles at different loci
  • High sampling variance of rare alleles
• Type II error can be higher (low power) because of:
  • Lower LD than in biparental mapping populations
  • Unbalanced design due to differences in allele frequencies
  • A larger multiple-testing problem because of lower LD
Integration of Association Analysis in a Breeding Program
[Flow diagram with the following elements: Germplasm; Parental Selection; Hybridization; New Populations; Marker Assisted Selection; Novel & Validated QTL/Marker Associations; Selection (Intermating); New Synthetics, Lines, Varieties; Evaluation Trials; Genotypic & Phenotypic data; Elite Synthetics, Lines, Varieties. Elite germplasm feeds back into hybridization nursery.]
Types of Populations
• Germplasm Bank Collection
  • A collection of genetic resources including landraces, exotic material and wild relatives.
• Synthetic Populations
  • Outcrossing populations (either male-sterile or manually crossed) synthesized from inbred lines. May be used for recurrent selection.
• Elite Lines
  • Inbred lines (and checks) manipulated with the objective of releasing new varieties in the short term.
Characteristics Related to Association Mapping:
Practical aspects
Aspects of AM | Germplasm bank | Synthetic Populations | Elite Germplasm
Sample | Core-collection | Segregating progenies | Elite lines and checks
Sample turnover | Static | Ephemeral | Gradually substituted
Source of phenotypic data | Screenings | Progeny tests | Yield trials
Type of traits | High heritability traits; domestication traits | Depends on the evaluation scheme | Low heritability traits: yield, resistance to abiotic stresses
Characteristics Related to Association Mapping:
Genetic Expectations
Aspects of AM | Germplasm bank | Synthetic Populations | Elite Germplasm
Linkage Disequilibrium | Low | Intermediate and fast-decaying | High
Population structure | Medium | Low | High
Allele diversity among samples | High | Intermediate | Low
Allele diversity within samples | Variable | 1 or 2 alleles (diploid species) | 1 allele (inbred lines)
Characteristics Related to Association Mapping:
Potential Applications
Aspects | Germplasm bank | Synthetic Populations | Elite Germplasm
Power | Low | Intermediate and decreasing | High; could allow genome scan
Resolution | High; could allow fine mapping | Intermediate and increasing | Low
Use of significant markers | Transfer of new alleles by marker-assisted backcross | Incorporation in selection index | Forward Breeding MAS in progenies (requires validation)
Previous QTL information
• Doubled-Haploid Population AC Reed x Grandin
  • QTL for kernel size (width) near Xwmc18-2D
• Recombinant Inbred Population Synthetic W7984 x Opata (ITMI population)
  • QTL for kernel size (length) on 5A and 5B
[Figure panels: Width, 2D; Length, 5B]
Association Analysis
Materials
• 95 of 149 soft winter wheat cultivars from the Northeastern US: mostly recent releases, representing 35 seed companies / institutions
• 93 SSR loci: 33 on 2D, 20 on 5A, 9 on 5B, 31 on 16 other chromosomes
• Rare alleles (freq < 5%): considered as missing for LD and population structure analysis; considered as alleles for AM analysis
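The rare-allele rule in the Materials list can be illustrated as follows. This is a minimal sketch, assuming one allele call per inbred line at a single SSR locus; the function name, the use of `None` for missing data and the example allele sizes are all hypothetical and not taken from the study.

```python
from collections import Counter


def mask_rare_alleles(calls, min_freq=0.05, missing=None):
    """Return a copy of `calls` with alleles below `min_freq` set to missing.

    calls : list with one allele call per line at a single locus;
            entries equal to `missing` are ignored when counting.
    """
    observed = [allele for allele in calls if allele != missing]
    counts = Counter(observed)
    n_observed = len(observed)
    return [
        allele if allele != missing and counts[allele] / n_observed >= min_freq
        else missing
        for allele in calls
    ]


# Hypothetical SSR fragment sizes for 40 lines: allele "160" occurs once (2.5%).
locus_calls = ["152"] * 30 + ["158"] * 9 + ["160"]
for_ld_and_structure = mask_rare_alleles(locus_calls)  # rare allele becomes missing
for_association = locus_calls                          # rare allele kept as an allele
print(for_ld_and_structure.count(None))
```

Keeping the two versions of the data side by side mirrors the slide: the masked calls would feed the LD and population structure steps, and the unmasked calls the association analysis.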
Methods
• Population Structure: 36 “unlinked” SSR markers; Structure without admixture; SPAGeDi program (Hardy & Vekemans) for kinship; visualization: Factorial (Multiple) Correspondence Analysis (Benzécri, 1973, L'Analyse des correspondances. Dunod)
• Linkage Disequilibrium: Tassel (maizegenetics.net) used to compute r², with p-values from 1000 permutations
• Association Analysis: the lme function in R (nlme package) used to fit a linear mixed-effects model with markers as fixed effects (selected from previously identified QTL regions) and subpopulations or kinship as random effects (no obvious differentiating characteristics); two-marker models tested by likelihood ratio test
• Jianming Yu, Gael Pressoir, et al. (2006) A Unified Mixed-Model Method for Association Mapping Accounting for Multiple Levels of Relatedness. Nature Genetics 38:203-208
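The likelihood ratio test mentioned for two-marker models can be sketched in a few lines. This is only an illustration of the general test, not the analysis run for the study: the log-likelihood values and degrees of freedom below are made up, the function names are hypothetical, and the chi-square tail probability is computed with the standard finite-series formulas for integer degrees of freedom. When the models differ in their fixed effects, as here, both would normally be fitted by maximum likelihood rather than REML.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over i < df/2 of (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc term plus a finite series in odd powers of sqrt(x).
    term, total = math.sqrt(x), 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2.0 / math.pi) * math.exp(-half) * total


def likelihood_ratio_test(loglik_reduced, loglik_full, df_diff):
    """Return (statistic, p-value) for nested models differing by df_diff parameters."""
    statistic = 2.0 * (loglik_full - loglik_reduced)
    return statistic, chi2_sf(statistic, df_diff)


# Made-up example: adding a second marker with 3 extra allele effects.
stat, p_value = likelihood_ratio_test(loglik_reduced=-210.4, loglik_full=-204.1, df_diff=3)
print(round(stat, 2), p_value)
```

The degrees of freedom equal the number of extra fixed-effect parameters in the larger model, which for a multi-allelic SSR marker is the number of alleles minus one.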
Estimating Relatedness
The K Matrix
$F_{ij} = (Q_{ij} - Q_m) / (1 - Q_m)$ (Ritland, Loiselle)
If $F_{ij}$ is negative, then it is set to zero.
Relatedness (K): the matrix of pairwise coefficients $F_{11}, \ldots, F_{nn}$ for lines $i$ and $j$, with $\Theta_{ij} \cong F_{ij}$
In cattle studies the analogous matrix is estimated from pedigrees, and it controls for the polygene effect
Jianming Yu, Gael Pressoir, et al. (2006) A Unified Mixed-Model Method for Association Mapping Accounting for Multiple Levels of Relatedness. Nature Genetics 38:203-208
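The kinship formula on this slide can be illustrated with a simplified calculation. The slide does not define Q_ij and Q_m, so the sketch assumes that Q_ij is the proportion of loci at which inbred lines i and j carry the same allele (identity in state) and that Q_m is the mean of Q_ij over all pairs of distinct lines. SPAGeDi's estimators are more elaborate than this, so treat the function (whose name is hypothetical) as a teaching aid only.

```python
def kinship_matrix(genotypes):
    """Build K with entries max(0, (Q_ij - Q_m) / (1 - Q_m)).

    genotypes : list of lines, each a list with one allele per locus
                (inbred lines, no missing data in this sketch).
    """
    n_lines = len(genotypes)
    n_loci = len(genotypes[0])
    # Q_ij: proportion of loci where lines i and j share the same allele.
    q = [
        [sum(a == b for a, b in zip(genotypes[i], genotypes[j])) / n_loci
         for j in range(n_lines)]
        for i in range(n_lines)
    ]
    # Q_m: average identity in state over all pairs of distinct lines.
    pairs = [q[i][j] for i in range(n_lines) for j in range(i + 1, n_lines)]
    q_m = sum(pairs) / len(pairs)
    if q_m >= 1.0:
        raise ValueError("all lines are identical; F_ij is undefined")
    # Negative estimates are set to zero, as stated on the slide.
    return [
        [max(0.0, (q[i][j] - q_m) / (1.0 - q_m)) for j in range(n_lines)]
        for i in range(n_lines)
    ]


# Three hypothetical lines scored at four loci.
lines = [["a", "b", "c", "d"], ["a", "b", "c", "e"], ["x", "y", "z", "w"]]
for row in kinship_matrix(lines):
    print([round(value, 3) for value in row])
```

In the example the first two lines share three of four alleles and get a positive coefficient, while pairs involving the third line fall below the sample average and are truncated to zero.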
Population Structure:
Sample Subdivisions
Moderate Population Subdivision
[Figure: subpopulations S1, S2, S3 and S4]
Subpopulation | No. of Varieties | Fst
1 | 19 | 0.337
2 | 32 | 0.111
3 | 13 | 0.295
4 | 31 | 0.064
Total | 95 | 0.188
Population Structure:
Factorial Correspondence Analysis
[Figure: orthogonal views of 4 soft winter wheat subpopulations (S1, S2, S3, S4)]
Linkage Disequilibrium:
Germplasm Sample Selection
• 149 lines genotyped with 18 unlinked SSR markers
• Most similar lines were excluded
• "Normalizing" the sample drastically reduced LD among unlinked markers
[Figure: r² probability for unlinked SSR markers, shown for the 149 lines and for the 95 lines; legend: p<.0001, p<.001, p<.01]
Definition of a baseline-LD specific for our sample
Defined as the 95th percentile of the distribution of r² among unlinked loci
r² estimates above this value are probably due to genetic linkage
Baseline LD for this sample: r² = 0.0654
[Figure: density of the correlation coefficient r among unlinked loci, with a fitted normal curve; the 95th percentile of the normal distribution marks the LD baseline]
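A baseline of this kind can be computed as sketched below. The slide defines the baseline as the 95th percentile of r² among unlinked loci, and its figure refers to a normal curve fitted to the correlation coefficient r. Reading those two together, the sketch assumes that r² values are square-root transformed, a normal distribution is fitted, and its 95th percentile is squared back to the r² scale. That reading is an assumption, as are the function name and the example values.

```python
import math
from statistics import NormalDist, mean, stdev


def baseline_ld(r2_unlinked, percentile=0.95):
    """Sample-specific LD baseline from r^2 values between unlinked loci.

    Assumed procedure: sqrt-transform r^2 to r, fit a normal distribution,
    take its `percentile` quantile, and square it back to the r^2 scale.
    """
    r_values = [math.sqrt(value) for value in r2_unlinked]
    fitted = NormalDist(mean(r_values), stdev(r_values))
    return fitted.inv_cdf(percentile) ** 2


# Made-up r^2 values between pairs of unlinked markers.
example_r2 = [0.001, 0.004, 0.010, 0.015, 0.022, 0.030, 0.041, 0.055]
threshold = baseline_ld(example_r2)
linked_candidates = [value for value in (0.02, 0.12, 0.40) if value > threshold]
print(round(threshold, 4), linked_candidates)
```

An empirical alternative would be to sort the unlinked r² values and read off the 95th percentile directly; the parametric version is shown only because the figure mentions a normal curve.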
Linkage Disequilibrium: Chromosome 2D
Consistent LD was below 1 cM, localized LD 1-5 cM
[Figure: r² against map distance (cM, 0 to 100) on chromosome 2D, with the baseline LD marked]
Linkage Disequilibrium: Chromosome 5A
Significant LD extended for ~5 cM in pericentromeric region
[Figure: r² against map distance (cM, 0 to 50) on chromosome 5A, with the baseline LD marked]
Loci Associated with Kernel Size (p-values)
Chromosome 2D
Annotations: Agreed with QTL in Reed x Grandin; Likelihood Ratio Test: **
Kernel Size
cM | Locus name | Weight NY | Weight OH | Area NY | Area OH | Length NY | Length OH | Width NY | Width OH
7 | Xcfd56 | 0.069 | 0.160 | 0.012 | 0.119 | 0.076 | 0.031 | 0.000* | 0.252
11 | Xwmc111 | 0.005 | 0.020 | 0.005 | 0.108 | 0.003’ | 0.107 | 0.000* | 0.000**
23 | Xgwm261 | 0.145 | 0.016 | 0.019 | 0.009 | 0.027 | 0.009 | 0.058 | 0.001*
28 | Xwmc112 | 0.012 | 0.057 | 0.047 | 0.120 | 0.480 | 0.367 | 0.001* | 0.024
64 | Xgwm30 | 0.081 | 0.862 | 0.053 | 0.848 | 0.312 | 0.820 | 0.000** | 0.212
91 | Xgwm539 | 0.042 | 0.038 | 0.030 | 0.039 | 0.001* | 0.005 | 0.290 | 0.334
Milling Quality
None of the loci on 2D were significant after multiple testing correction
Loci Associated with Kernel Size (p-values)
Chromosome 5A
Annotations: Agreed with QTL in M6 x Opata; Likelihood Ratio Test: n.s., **
Kernel Size
cM | Locus name | Weight NY | Weight OH | Area NY | Area OH | Length NY | Length OH | Width NY | Width OH
55 | Xcfa2250 | 0.021 | 0.007 | 0.044 | 0.014 | 0.014 | 0.002* | 0.637 | 0.649
55 | Xwmc150b | 0.002* | 0.003 | 0.003 | 0.005 | 0.009 | 0.002* | 0.093 | 0.429
56 | Xbarc117 | 0.009 | 0.002* | 0.021 | 0.005 | 0.118 | 0.022 | 0.044 | 0.039
60 | Xbarc141 | 0.631 | 0.037 | 0.232 | 0.024 | 0.038 | 0.002* | 0.852 | 0.863
Milling Quality
cM | Locus | Milling Score | Flour Yield | ESI | Friability | Break-Flour Yield
55 | Xcfa2250 | 0.010 | 0.029 | 0.047 | 0.002* | 0.081
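B.L.U.E. of allele effects
Kernel Length
[Figure: allele-effect estimates. N. of Cultivars: 9, 5, 18, 37, 9, 9, 41, 45, 43, 49]
B.L.U.E. of allele effects
Kernel Width
[Figure: allele-effect estimates. N. of Cultivars: 41, 14, 8, 15, 18, 24, 5, 10, 19]
B.L.U.E. of allele effects
Kernel Weight
[Figure: allele-effect estimates. N. of Cultivars: 41, 45, 43, 49]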
Conclusions
• Linkage Disequilibrium
  • Variation in LD across the genome can be characterized in relevant germplasm
  • Markers closely linked to QTL of interest can be identified and allelic effects quantified
• Association Mapping as a Breeding Strategy
  • For recurrent selection, markers could be used to carry information from a “good year” to a “bad year”
  • In pedigree breeding, markers could carry information about traits of interest from replicated field trials to single row or single plant selection
  • Allelic values of previously identified alleles can be updated annually based on advanced trial data combined with genotypic data
  • New alleles can be identified and characterized to determine their relative value
  • A selection index can be used to incorporate both phenotypic and molecular data
Acknowledgements
• USDA Soft Wheat Quality Lab, Wooster, OH
• Embrapa Arroz e Feijão
Technical Support:
• David Benscher
• James Tanaka
• Gretchen Salm
Kangaroo Island
Wayne Powell