Marker Assisted Selection in Tomato
Pathway approach for candidate gene identification and
introduction to metabolic pathway databases.
Identification of polymorphisms in data-based sequences
MAS – forward selection, background selection, combining traits,
relative efficiency of selection
Why (population) size matters
Example: QTL for color
uniformity in elite crosses
[Figure: linkage maps of chromosomes 1 to 5 (Chr 1 to Chr 5) showing marker names (TG, CT, CD, and LEOH series; LEOH15* and LEOH17* appear at several positions), distances between markers in cM, and introgression-line (IL) segments IL1-1 through IL5-2.]
QTL   Trait    Origin
2     L, YSD   S. lyc.
4     YSD      S. lyc.
6     L, Hue   ogc
7     L, Hue   S. hab.
11    L, Hue   S. lyc.
Audrey Darrigues, Eileen Kabelka
Carotenoid Biosynthesis:
Candidate pathway for
genes that affect color and
color uniformity.
Disclaimer: this is not the only
candidate pathway…
Databases that link pathways to genes
http://www.arabidopsis.org/help/tutorials/aracyc_intro.jsp
http://metacyc.org/
http://www.plantcyc.org/
http://sgn.cornell.edu/tools/solcyc/
http://www.arabidopsis.org/biocyc/index.jsp
External Plant Metabolic databases
CapCyc (Pepper) (C. annuum)
CoffeaCyc (Coffee) (C. canephora)
SolCyc (Tomato) (S. lycopersicum)
NicotianaCyc (Tobacco) (N. tabacum)
PetuniaCyc (Petunia) (P. hybrida)
PotatoCyc (Potato) (S. tuberosum)
SolaCyc (Eggplant) (S. melongena)
http://www.plantcyc.org:1555/
Note: missing step (lycopene isomerase, tangerine)
Check boxes (Note: MetaCyc has many more choices, but no plants)
Capsicum annuum sequence retrieved
Scroll down page
http://www.ncbi.nlm.nih.gov/
Select database
Query  CCACCACCATCCTCACTTTAACCCACAAATCCCACTTTCTTTGGCCTAATTAACAATTTT
       |||||||||||||||||||||||||||||||||| |||||||||||||||||||||||||
Sbjct  CCACCACCATCCTCACTTTAACCCACAAATCCCATTTTCTTTGGCCTAATTAACAATTTT
Zeaxanthin epoxidase
Probable location on Chromosome 2
Alignment of Z83835 and EF581828 reveals 5 SNPs over ~2000 bp
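The mismatch in the alignment excerpt can also be located programmatically. The following is a minimal sketch, assuming the two sequences are already aligned to equal length (as in the BLAST excerpt); the function name `find_snps` is hypothetical, and positions are counted within the 60-base excerpt, not within the full accessions.

```python
def find_snps(query: str, subject: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, query base, subject base) for each substitution."""
    if len(query) != len(subject):
        raise ValueError("sequences must be pre-aligned to equal length")
    return [
        (pos, q, s)
        for pos, (q, s) in enumerate(zip(query, subject), start=1)
        if q != s and "-" not in (q, s)  # skip gap columns; report substitutions only
    ]


# The 60-base excerpt shown on the slide.
QUERY = "CCACCACCATCCTCACTTTAACCCACAAATCCCACTTTCTTTGGCCTAATTAACAATTTT"
SBJCT = "CCACCACCATCCTCACTTTAACCCACAAATCCCATTTTCTTTGGCCTAATTAACAATTTT"

snps = find_snps(QUERY, SBJCT)
print(snps)  # [(35, 'C', 'T')]
```

A mismatch found this way is only a candidate (an eSNP); it still needs validation before use as a marker.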
51 annotated loci
Candidates identified in other databases are here
Information missing from other databases is here…
Comment on the databases:
Information is not always complete/up to date.
Display is not always optimal, and several steps may be needed to
go from pathway > gene > potential marker.
Sequence data has error associated with it. eSNPs are not the same
as validated markers.
There is a wealth of information organized and available.
We will be asking for feedback on how best to improve the SGN
database and access via the Breeders Portal.
The previous example detailed how we might identify sequence-based
markers for trait selection.
Query  CCACCACCATCCTCACTTTAACCCACAAATCCCACTTTCTTTGGCCTAATTAACAATTTT
       |||||||||||||||||||||||||||||||||| |||||||||||||||||||||||||
Sbjct  CCACCACCATCCTCACTTTAACCCACAAATCCCATTTTCTTTGGCCTAATTAACAATTTT
Improving efficiency of selection in terms of 1) relative efficiency
of selection, 2) time, 3) gain under selection and 4) cost will benefit
from markers for both forward and background selection.
Remainder of Presentation will focus on
Where to apply markers in a program
Forward and background selection
Marker resources
Alternative population structures and size
Comparison of direct selection with indirect
selection (MAS).
Line-mean heritability (H) for color
Trait              H (plant)   SE     H (line)   SE
BRIX               0.14        0.09   0.40       0.26
Color-L            0.11        0.04   0.57       0.22
Color-Hue          0.07        0.04   0.39       0.23
Color unif. -L     0.14        0.11   0.63       0.25
Color unif. -Hue   0.13        0.10   0.64       0.23
Indirect Selection
             Prop Vp   SE
L-MAS        0.25      0.15
Hue-MAS      0.15      0.06
Ldiff-MAS    0.28      0.16
Hdiff-MAS    0.32      0.14
Relative efficiency of selection: r(gen) x {Hi/Hd}
Line performance over locations > MAS > Single plant
Accelerating Backcross Selection
Expected proportion of Recurrent Parent (RP) genome in BC progeny (RP:donor)
F1 50:50
BC1 75:25
BC2 87.5:12.5
BC3 93.75:6.25
BC4 96.875:3.125
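The ratios above follow from halving the donor share with every backcross. This minimal sketch reproduces them, assuming no selection and unlinked loci; the function name `expected_rp_fraction` is hypothetical.

```python
def expected_rp_fraction(n_backcrosses: int) -> float:
    """Expected recurrent-parent share after n backcrosses (n = 0 is the F1)."""
    return 1 - 0.5 ** (n_backcrosses + 1)


for n in range(5):
    label = "F1" if n == 0 else f"BC{n}"
    rp = 100 * expected_rp_fraction(n)
    print(f"{label} {rp:g}:{100 - rp:g}")
```

These are averages over progeny. Individual plants vary around the mean, which is why background selection with markers can recover the RP genome faster.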
Two-stage selection
1) Select for target allele
2) Select for RP genome at unlinked markers
Three-stage selection
1) Select for target allele
2) Select for RP recombinants at flanking markers
3) Select for RP genome at unlinked markers
Four-stage selection
1) Select for target allele
2) Select for RP recombinants at flanking markers
3) Select for RP genome on carrier chromosome
4) Select for RP genome at unlinked markers
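As an illustration of how the stages filter a backcross family, here is a minimal sketch of two-stage and three-stage selection on a toy data set. The genotype coding ("R" = homozygous for the recurrent parent, "H" = heterozygous, donor allele present), the four plants, and all function names are hypothetical. The three-stage rule used here (keep carriers with the most RP calls at the flanking markers, then rank by unlinked markers) is a simplified reading of the scheme, not a reproduction of Frisch et al.'s simulations.

```python
def rp_fraction(calls):
    """Share of marker calls that are homozygous for the recurrent parent ("R")."""
    return calls.count("R") / len(calls)


def carriers(progeny):
    """Stage 1: keep plants that carry the donor target allele ("H" at the target)."""
    return {name: g for name, g in progeny.items() if g["target"] == "H"}


def two_stage(progeny):
    """Target allele, then the highest RP share at unlinked markers."""
    pool = carriers(progeny)
    return max(pool, key=lambda name: rp_fraction(pool[name]["unlinked"]))


def three_stage(progeny):
    """Target allele, then best RP recovery at flanking markers, then unlinked markers."""
    pool = carriers(progeny)
    best = max(g["flanking"].count("R") for g in pool.values())
    shortlist = {n: g for n, g in pool.items() if g["flanking"].count("R") == best}
    return max(shortlist, key=lambda name: rp_fraction(shortlist[name]["unlinked"]))


# Hypothetical BC1 plants (illustrative data only).
progeny = {
    "plant1": {"target": "H", "flanking": ["R", "H"], "unlinked": ["R", "R", "R", "R"]},
    "plant2": {"target": "H", "flanking": ["R", "R"], "unlinked": ["R", "R", "H", "R"]},
    "plant3": {"target": "R", "flanking": ["R", "R"], "unlinked": ["R", "R", "R", "R"]},
    "plant4": {"target": "H", "flanking": ["H", "H"], "unlinked": ["R", "H", "H", "R"]},
}

print(two_stage(progeny))    # plant1
print(three_stage(progeny))  # plant2
```

In this toy family the two strategies pick different plants: three-stage selection gives up some genome-wide RP recovery to reduce donor segments next to the target.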
References:
Frisch, M., M. Bohn, and A.E. Melchinger. 1999.
Comparison of Selection Strategies for Marker-Assisted
Backcrossing of a Gene. Crop Science 39: 1295-1301.
Progeny needed for Background Selection During MAS
Q10 of RP genome in percent, by population size
Strategy      Gen   20     40     60     80     100    125    150    200
Two-Stage     BC1   76.7   78.7   79.7   80.3   80.7   81.3   81.7   82.2
              BC2   90.3   91.9   92.8   93.3   93.6   93.9   94.0   94.6
              BC3   95.8   96.2   97.1   97.3   97.4   97.5   97.6   97.8
Three-Stage   BC1   71.2   72.7   73.4   73.6   73.3   73.2   72.8   72.2
              BC2   86.1   87.2   88.5   89.3   90.2   90.7   91.3   91.8
              BC3   94.4   95.7   96.5   96.9   97.2   97.3   97.5   97.6
Q10 indicates a 90% probability of success
From Frisch et al., 1999.
Marker Data Points required (Modified from Frisch et al., 1999;
based on assumption of 12 chromosomes; initial selection with 4
markers/chromosome)
Two-Stage Selection
Population Size        60       80       100      125
BC1                    2880     3840     4800     6000
BC2                    900      1164     1416     1716
BC3                    228      264      300      348
Total Marker points    4008     5268     6516     8064
Cost at 0.15/point     601.2    790.2    977.4    1209.6
Cost at 0.20/point     801.6    1053.6   1303.2   1612.8
Cost at 0.25/point     1002.0   1317.0   1629.0   2016.0

Three-Stage Selection
Population Size        60       80       100      125
BC1                    2880     3840     4800     6000
BC2                    492      708      960      1308
BC3                    250      444      504      576
Total Marker points    3622     4992     6264     7884
Cost at 0.15/point     543.3    748.8    939.6    1182.6
Cost at 0.20/point     724.4    998.4    1252.8   1576.8
Cost at 0.25/point     905.5    1248.0   1566.0   1971.0
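The totals and costs in the table are simple arithmetic on the per-generation marker points, which the sketch below reproduces. The per-generation counts are copied from the slide. The variable and function names are hypothetical, and reading the 0.15 to 0.25 figures as a cost per marker data point is an assumption that matches the printed totals.

```python
CHROMOSOMES = 12
MARKERS_PER_CHROMOSOME = 4  # initial selection density stated on the slide

# Marker data points per generation (BC1, BC2, BC3), keyed by population size.
MARKER_POINTS = {
    "two-stage": {60: (2880, 900, 228), 80: (3840, 1164, 264),
                  100: (4800, 1416, 300), 125: (6000, 1716, 348)},
    "three-stage": {60: (2880, 492, 250), 80: (3840, 708, 444),
                    100: (4800, 960, 504), 125: (6000, 1308, 576)},
}


def genotyping_cost(strategy, pop_size, cost_per_point):
    """Return (total marker points, total cost) for one strategy and population size."""
    total = sum(MARKER_POINTS[strategy][pop_size])
    return total, round(total * cost_per_point, 1)


# BC1 genotypes every plant at every marker: population size x 12 x 4.
for by_size in MARKER_POINTS.values():
    for pop_size, (bc1, _, _) in by_size.items():
        assert bc1 == pop_size * CHROMOSOMES * MARKERS_PER_CHROMOSOME

print(genotyping_cost("two-stage", 60, 0.15))    # (4008, 601.2)
print(genotyping_cost("three-stage", 60, 0.15))  # (3622, 543.3)
```

Most of the data points fall in BC1, so the marker density used for the first screen drives the overall cost.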
For effective background selection we need:
Markers for our target locus (C > T SNP for Zep)
Markers on the target chromosome (Chrom. 2)
Markers unlinked to the target chromosome
http://www.tomatomap.net
http://sgn.cornell.edu/
Ovate
HBa0104A12
44 polymorphic markers
55 polymorphic markers
Missing data in SGN
Limited ability to generate tables; PCR conditions sometimes
incomplete; enzyme sometimes missing; SNP not described.
Missing data in Tomatomap.net
SNP and sequence context require the BMC Genomics
supplemental table, ASPE primers, and GoldenGate primers.
Van Deynze et al., 2007. BMC Genomics 8:465
www.biomedcentral.com/content/pdf/1471-2164-8-465.pdf
Where can we expect to be?
EST coverage   TA496 ESTs with SNPs      Where EST coverage    Proportion
               vs H1706 BAC sequences    = allele coverage
n=1            596                       not tested
n=2            106                       64                    0.60
n=3            34                        22                    0.65
n=4            22                        11                    0.50
n = 5-10       38                        23                    0.61
n > 10         10                        7                     0.70
Total          806                       127                   0.16
analysis by Buell et al., unpublished
Data based on estimated ~42% of sequence, therefore expect as
many as 300 markers for a cross like E6203 x H1706
QTLs mapped in a bi-parental cross may not be
appropriate for MAS in all populations…
Marker allele and trait may not be linked in all populations.
Genetic background effects may be population specific.
Original association may be spurious.
QTL detection is dependent on magnitude of the difference
between alleles and the variance within marker classes.
What about mapping and MAS in unstructured
populations?
A brief introduction to “Association Mapping” follows.
“Association Mapping” statistical model – designed to account
for population structure (Q), correct for genetic background
effects (Z), and identify marker-trait linkage (Marker)
Y = μ + REPy + Qw + Markerα + Zv + Error
[Figure: LD decay in two panels, Processing and Fresh market. Each panel plots the LD measure (R2, 0.0 to 1.0) against distance between loci (cM, 0 to 140), with a legend numbered 1 to 12. Fitted trend lines: y = -0.054ln(x) + 0.2583 and y = -0.037ln(x) + 0.1713.]
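The two fitted trend lines can be used to estimate how quickly LD decays with map distance. The sketch below evaluates each curve and inverts it to find the distance at which predicted R2 drops to a chosen threshold. The transcript does not show which equation belongs to the Processing panel and which to the Fresh market panel, so the curves are labeled neutrally. The labels, the function names, and the 0.1 threshold are illustrative assumptions.

```python
import math

# (slope, intercept) of y = slope * ln(x) + intercept, as printed on the slide.
LD_FITS = {
    "fit_1": (-0.054, 0.2583),
    "fit_2": (-0.037, 0.1713),
}


def predicted_r2(distance_cm, slope, intercept):
    """Predicted LD (R2) at a given distance between loci in cM."""
    return slope * math.log(distance_cm) + intercept


def distance_at_r2(threshold, slope, intercept):
    """Distance in cM at which the fitted curve reaches the given R2."""
    return math.exp((threshold - intercept) / slope)


for name, (slope, intercept) in LD_FITS.items():
    d = distance_at_r2(0.1, slope, intercept)  # 0.1 is an arbitrary example threshold
    print(f"{name}: R2 falls to 0.1 at about {d:.1f} cM")
```

A slower decay means fewer markers are needed to cover the genome, at the price of lower mapping resolution.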
Tomato populations will have sub-structure
K=4: 1) Fresh Market (FM); 2) Landrace; 3) Heirloom; 4) Processing
K=8: 1,6,7) Processing; 2) Landrace; 3,5) FM; 4) FM & Processing; 8) Heirloom
Output from Pritchard’s “STRUCTURE”
Association mapping
Incorporates population structure and coefficient of relatedness
The number of markers needed depends on the rate of LD
decay (reflects recombination history)
Highly specific to “inference population”
wild species vs breeding program
Sensitive to marker coverage
LD decay and number of alleles (Nor, gf, and others all
have multiple alleles within populations used by breeders)
Will not be able to “map” traits where trait variation overlaps
with population structure.
Even without sequence or marker data, there are
lessons for practical breeding:
Use pedigree data, knowledge of population
structure, and objective data to increase
precision of estimates of breeding value.
Take home messages:
Marker resources exist for forward and background selection
in elite x elite crosses in tomato.
Marker resources are currently not sufficient for QTL
discovery in bi-parental or AM populations; they will soon be.
The best time to use genetic markers: early-generation
selection.
Restructuring of breeding program to integrate markers may
include:
1) Increasing genotypic replication (population size) at the
expense of replication (consider augmented designs).
2) Collecting objective data.
Further discussion of AM approach in session VI
“Unstructured mapping of bacterial spot resistance”
References:
Kaeppler, 1997. TAG 95:618-621.
Frisch, et al., 1999. Crop Science 39: 1295-1301.
Knapp and Bridges, 1990. Genetics 126: 769-777.
Yu et al., 2006. Nature Genetics 38:203-208.
Van Deynze et al., 2007. BMC Genomics 8:465
www.biomedcentral.com/content/pdf/1471-2164-8-465.pdf