Standard land plant barcoding requires a
multi-locus approach?
Robyn Cowan
Sujeevan Ratnasingham
Peter Gasson
Mitochondrial DNA in land plants:
•undergoes rearrangements
•transfer of genes to nucleus
•incorporation of foreign genes
•substitution rates are VERY slow
(with a few notable exceptions e.g. Plantago, Cho & al.)
Figure: Resolving Species Through DNA Barcoding: COI Divergence in Eukaryotes (scale: 20 % aa).
Groups shown: Eubacteria, Land Plants, Chlorophyceae, Rhodophyta, Prymnesiophyta, Eustigmatophyceae, Bacillariophyceae, Cryptophyta, Phaeophyceae, Xanthophyceae, Fungi, Animalia, Chlorarachniophyta, Dinoflagellata, Euglenophyta, Kinetoplastida, Ciliata, Apicomplexa
Partners
Instituto de Biologia UNAM, Mexico - Gerardo Salazar
Imperial College, UK - Timothy Barraclough
Natural History Museum, Denmark - Gitte Petersen
Natural History Museum (London), UK - Mark Carine
New York Botanical Garden, USA - Kenneth Cameron
Royal Botanic Garden Edinburgh, UK - Peter Hollingsworth
Royal Botanic Gardens, Kew, UK - Mark Chase
South African National Biodiversity Institute - Ferozah Conrad
University of Cape Town, South Africa - Terry Hedderson
U. Estadual de Feira de Santana, Brazil - Cássio van den Berg
Universidad de los Andes - Santiago Madriñán
U. of Wales Aberystwyth, UK (previously University of Reading, UK) - Mike Wilkinson
Alfred P. Sloan Foundation
Gordon and Betty Moore Foundation
To develop a universal approach to
barcoding of all land plants
• Phase 1: primer development (protein motifs); complete
genome sequences; problems: ferns; 46 pairs of sister taxa
from mosses, liverworts, hornworts, lycopods, ferns/fern
allies, gymnosperms, angiosperms – percent PCR success
& percent polymorphisms
• Phase 2: in-depth trials of six markers identified in Phase 1
on a range of well-sampled taxa from across land plants
So what are the characteristics of a good barcode?
•High inter-specific, low intra-specific sequence divergence
•Universal amplification/sequencing with standard primers
•Technically simple to sequence
•Short enough to sequence in one reaction
•Easily alignable (few insertions/deletions)
•Readily recoverable from museum or herbarium samples
and other degraded samples
**Universal + Variable**
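The first criterion (high inter-specific, low intra-specific divergence) can be illustrated with a minimal sketch. It assumes pre-aligned sequences of equal length and uses uncorrected p-distance, which is one common choice and not necessarily the measure used in this project. The function names and the toy sequences are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def divergence_summary(seqs_by_species):
    """Return (max intra-specific, min inter-specific) p-distance."""
    labelled = [(sp, s) for sp, seqs in seqs_by_species.items() for s in seqs]
    intra, inter = [], []
    for (sp1, s1), (sp2, s2) in combinations(labelled, 2):
        (intra if sp1 == sp2 else inter).append(p_distance(s1, s2))
    return max(intra, default=0.0), min(inter, default=0.0)


# Hypothetical toy alignment, not data from the talk.
toy = {
    "species_A": ["ACGTACGTAC", "ACGTACGTAT"],
    "species_B": ["ACGTTCGAAC", "ACGTTCGAAC"],
}
max_intra, min_inter = divergence_summary(toy)
print(max_intra, min_inter)
```

A marker separates these toy species cleanly when the smallest between-species distance exceeds the largest within-species distance.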
What sort of marker should we use?
•Mitochondrial DNA
•Plastid
•Ribosomal DNA (ITS)
•Low-copy nuclear DNA (protein coding)
•Length variable ?
•Single locus
•Multiple loci (one genomic compartment) ?
•Multiple loci (two genomic compartments) ?
Advantages of plastid DNA (hence its use in
phylogenetics)
•Monomorphic (separation of different copies not required
in hybrids)
•High copy number (can even be amplified from highly
degraded DNA)
•Potentially highly diagnostic (in spite of its reputation to
the contrary)
However, will not detect hybrids, introgression, paralogy
Coding or non-coding?
Non-coding regions:
sometimes more variable
microsatellites difficult to sequence through
numerous indels: impossible to align, length variable
cannot translate to check for pseudoproteins and to
aid alignment
sometimes contain rearrangements and coding
insertions
(character based identification)
trnH-psbA spacer region
Criteria for locus selection
1. Species-level sequence divergence
2. Appropriate length (200-800 bp)
3. Presence of conserved primer target sites
4. At least 200 bp exon sequence
Our Strategy
1. Identify suitable loci on the basis of in silico screens using Nicotiana cp sequence
2. Design universal primers (sets of 4 primers/locus) using amino acid and nucleic acid sequence data
3. Perform initial screen for universality (1 primer pair)
4. Screen for sequence variation using diverse species pairs
5. Improve universality (e.g. use all primer combinations)
6. Use statistical modelling approaches to identify optimal primer sets
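As a small sketch of "use all primer combinations": with a set of 4 primers per locus, every forward primer can be paired with every reverse primer. The assumption here, inferred from labels such as 1+3 and 2+4 elsewhere in the talk and not stated on this slide, is that primers 1 and 2 are forward and primers 3 and 4 are reverse. The function name is hypothetical.

```python
from itertools import product


def primer_pairs(forward, reverse):
    """All forward+reverse pairings, labelled in the '1+3' style."""
    return [f"{f}+{r}" for f, r in product(forward, reverse)]


# Assumption: primers 1 and 2 are forward, 3 and 4 are reverse.
pairs = primer_pairs(["1", "2"], ["3", "4"])
print(pairs)  # ['1+3', '1+4', '2+3', '2+4']
```

Two forward and two reverse primers give four pairings to try when the initial pair fails on a taxon.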
Standard PCR Recipe
• NH4 x1
• Mg2+ 1.5 mM
• dNTPs 0.2 mM
• FW test primer 1 µM
• RE test primer 1 µM
• Taq DNA polymerase 2 units
• BSA 0.1 mg/ml
• Template 40 ng
• Water to 20 µl
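A rough sketch of turning the recipe into pipetting volumes, using final x total / stock for each component. It reads the primer concentration as 1 µM and the final volume as 20 µl. Every stock concentration below is a hypothetical placeholder, since the slide gives only final concentrations; substitute the values of the reagents actually used.

```python
TOTAL_UL = 20.0  # final reaction volume, as on the slide

# name: (final concentration from the slide, HYPOTHETICAL stock concentration)
BY_CONCENTRATION = {
    "NH4 (x)": (1.0, 10.0),
    "Mg2+ (mM)": (1.5, 25.0),
    "dNTPs (mM)": (0.2, 10.0),
    "FW primer (uM)": (1.0, 10.0),
    "RE primer (uM)": (1.0, 10.0),
    "BSA (mg/ml)": (0.1, 10.0),
}
# name: (amount per reaction from the slide, HYPOTHETICAL amount per µl of stock)
BY_AMOUNT = {
    "Taq (units)": (2.0, 5.0),
    "Template (ng)": (40.0, 20.0),
}


def reaction_volumes(total_ul=TOTAL_UL):
    """Volume of each component in µl, with water making up the remainder."""
    volumes = {n: final * total_ul / stock
               for n, (final, stock) in BY_CONCENTRATION.items()}
    volumes.update({n: amount / per_ul
                    for n, (amount, per_ul) in BY_AMOUNT.items()})
    volumes["Water"] = total_ul - sum(volumes.values())
    return {n: round(v, 2) for n, v in volumes.items()}


volumes = reaction_volumes()
for name, ul in volumes.items():
    print(f"{name:<16}{ul:>6.2f} µl")
```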
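Results of First PCR
Genes screened: accD, rpoC2, ndhA, YCF9, YCF5, matK, rpl22, YCF2, rpoB, rpoC1, ndhJ, ndhK
Total success: 99, 88, 88, 84, 75, 73, 71, 70, 67, 57, 45, 23
% success: 90, 80, 80, 76, 68, 66, 65, 64, 61, 52, 41, 28
(Values are listed in descending order; the slide text does not preserve which value belongs to which gene.)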
Number of Variable Sites
Genes (number in parentheses as shown on the slide): rpoC2 (7), rpoB (4), rpoC1 (3), ndhJ (2), ndhA (8), YCF2 (5), accD (6), ndhK (1), YCF5 (10), YCF9 (9), matK (11)
Variable sites    514  256  210  300  188  226   61  125  102   51   95
Length            814  414  394  700  475  578  163  366  328  185  423
% sites variable   63   62   53   43   40   39   37   34   31   28   22
(The slide text does not preserve which column belongs to which gene.)
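The "% sites variable" figures on this slide can be cross-checked as variable sites divided by length. The sketch below assumes that the counts, lengths and percentages pair up in the order they appear in the slide text; that pairing is an inference from the numbers, not something the slide states.

```python
# Values taken from the slide; their column pairing is an assumption.
variable_sites = [514, 256, 210, 300, 188, 226, 61, 125, 102, 51, 95]
lengths = [814, 414, 394, 700, 475, 578, 163, 366, 328, 185, 423]
reported_percent = [63, 62, 53, 43, 40, 39, 37, 34, 31, 28, 22]


def percent_variable(n_variable: int, length: int) -> float:
    """Percentage of sites in a region that are variable."""
    return 100.0 * n_variable / length


computed = [round(percent_variable(v, n)) for v, n in zip(variable_sites, lengths)]
for v, n, c, r in zip(variable_sites, lengths, computed, reported_percent):
    print(f"{v:>4} / {n:>4} = {c:>2}%  (slide: {r}%)")
```

Under that pairing every computed percentage rounds to the value reported on the slide.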
Trial regions
Selected seven genes that represent the different
levels of universality and variability. Blue = high,
green = medium, yellow = low.
Genes (each colour-coded for Universality and Variability): matK, YCF5, rpoC1, ndhJ, rpoB, YCF2, ndhK, accD, rpoC2, YCF9, ndhA
Trial groups
Asterella, Anastrophyllum-Barbilophozia, Tortella, Bryum, Triquetrella,
Homalothecium, Tortella, Elaphoglossum, Asplenium, Equisetum, Cupressus,
Pinus, Araucaria, Labordia, Conostylis, Dactylorhiza maculata/incarnata,
Mimetes, Inga, Hordeum, Scalesia, Crocus, Laelia, Cattleya, Mormodes,
Deiregyne, Lauraceae
Group             Family           Primary genera  accD     matK     ndhJ         rpoB         rpoC1
Angio asterids    Asteraceae       Scalesia        1+3      2.1+5    1+3          1+3          1+3
Angio asterids    Loganiaceae      Labordia        2+4      X+5      1+4          1+3          2+4
Angio eudicots    Proteaceae       Mimetes         1+4      *        *            *            1+4
Angio magnoliids  Lauraceae        Lauraceae       2+4      X+5      2+4          2+3          2+4
Angio monocot     Agavaceae        Agave           1+4      2.1+3.2  1+4          1+4          1+4
Angio monocot     Haemodoraceae    Conostylis      2+4      X+5      1+3          2+3          2+4
Angio monocot     Iridaceae        Crocus          2+4      2.1a+5   1+3          2+3          1+3
Angio monocot     Orchidaceae      Aulosepalum     1+4      2.1+3.2  1+4          1+4          1+4
Angio monocot     Orchidaceae      Cattleya        2+4      2.1a+5   *            2+3          2+4
Angio monocot     Orchidaceae      Dactylorhiza    2+4      X+5      1+3          2+3          2+4
Angio monocot     Orchidaceae      Sophronitis     2+4      2.1a+5   1+3          1+3          2+4
Angio monocot     Poaceae          Hordeum         Missing  2.1a+5   1+3          2+3          2+4
Angio rosids      Fabaceae         Inga            2+4      X+3.2    1+3          1+3          2+4
Fern              Aspleniaceae     Asplenium       *        *        LP1+LP5      *            *
Fern              Dryopteridaceae  Elaphoglossum   LP1+LP4  *        *            *            LP1+LP5
Fern ally         Equisetaceae     Equisetum       1+LP3    FE+RE    LP1+LP4      LP1.1+LP4.3  LP1+LP5
Gymnosperm        Araucariaceae    Araucaria       2+4      FE+RE?   1+3          2+LP3        2+4
Gymnosperm        Cupressaceae     Cupressus       1+4      *        1+4          *            2+4
Gymnosperm        Pinaceae         Pinus           2+4      FE+RE    Missing      2+LP3        2+4
Gymnosperm        Zamiaceae        Encephalartos   1+4      FE+RE    *            2+3          1+4
Liverwort         Aytoniaceae      Asterella       2+4      *        1+3          *            2+4
Liverwort         Lophoziaceae     Anastrophyllum  *        *        LP1+LP4      *            2+4
Moss              Bryaceae         Bryum           *        *        LP1+LP4      LP1.1+LP5.2  2+4
Moss              Pottiaceae       Tortella        *        *        *            LP1.1+LP3.2  LP1+4
Moss              Pottiaceae       Triquetrella    *        *        LP1+LP4      LP1.1+LP5.2  1+3
Moss              Ptychomniaceae   *               1+4      *        LP1.1+LP5.3  LP1.1+LP5.3  1+4
Summary
rpoC1   accD    ndhJ    rpoB    matK
5/25    5/20    5/19    8/20    6/16
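One possible reading of the Summary fractions is "distinct primer combinations / taxa amplified". That interpretation is an assumption, not something the slide states. The sketch below tallies the matK column of the primer table under two further assumptions: that "*" and "Missing" mean no usable result, and that "FE+RE?" counts as FE+RE. The function name is hypothetical.

```python
def tally(column):
    """Return (distinct primer combinations, taxa with a result) for one gene column."""
    amplified = [cell.rstrip("?") for cell in column if cell not in {"*", "Missing"}]
    return len(set(amplified)), len(amplified)


# matK column of the primer table, top to bottom (26 rows).
matk = [
    "2.1+5", "X+5", "*", "X+5", "2.1+3.2", "X+5", "2.1a+5", "2.1+3.2",
    "2.1a+5", "X+5", "2.1a+5", "2.1a+5", "X+3.2", "*", "*", "FE+RE",
    "FE+RE?", "*", "FE+RE", "FE+RE", "*", "*", "*", "*", "*", "*",
]
combinations_used, taxa_amplified = tally(matk)
print(f"{combinations_used}/{taxa_amplified}")  # 6/16
```

Under these assumptions the tally for matK agrees with the 6/16 shown in the Summary.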
Agavaceae X 22 sp.
Crocus X 9 sp.
Aulosepalum X 8 sp.(?all)
Cattleya X 30 sp. (2 clades approx. 43 sp.)
Dactylorhiza 15 sp. (species complex)
Sophronitis 27 sp. (approx. 37 sp.)
Scalesia X 4 (species complex)
Conostylis X 42 (?all)
Equisetum X 14
Pinus X 66
Hordeum X 10
Lauraceae
Samples with unique ‘barcode’
                   Gaps as a 5th State    Gaps = missing data    With duplicates removed
Locus              Haplotypes  %          Haplotypes  %          Haplotypes  %
matK               201         69.55%     200         69.20%     186         64.36%
rpoB               129         44.64%     129         44.64%     122         42.21%
rpoC1              124         42.91%     124         42.91%     120         41.52%
rpoB+matK          234         80.97%     234         80.97%     214         74.05%
rpoB+rpoC1         184         63.67%     184         63.67%     175         60.55%
rpoC1+matK         235         81.31%     234         80.97%     214         74.05%
rpoC1+rpoB+matK    251         86.85%     251         86.85%     229         79.24%
Individuals: 366, 366, 366, 289
Species: 289, 289, 289, 289
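The percentages on this slide can be reproduced by dividing each haplotype count by the 289 species. The sketch below assumes that the three count columns correspond, in order, to "Gaps as a 5th State", "Gaps = missing data" and "With duplicates removed", and it reads the locus written as rpoC1+mak as rpoC1+matK.

```python
SPECIES = 289  # number of species given on the slide

# locus: (gaps as 5th state, gaps = missing data, duplicates removed)
unique_barcodes = {
    "matK": (201, 200, 186),
    "rpoB": (129, 129, 122),
    "rpoC1": (124, 124, 120),
    "rpoB+matK": (234, 234, 214),
    "rpoB+rpoC1": (184, 184, 175),
    "rpoC1+matK": (235, 234, 214),
    "rpoC1+rpoB+matK": (251, 251, 229),
}


def percent(count: int, total: int = SPECIES) -> float:
    """Share of species with a unique barcode, as a percentage to 2 decimals."""
    return round(100.0 * count / total, 2)


table = {locus: tuple(percent(c) for c in counts)
         for locus, counts in unique_barcodes.items()}
for locus, row in table.items():
    print(f"{locus:<18}" + "  ".join(f"{p:6.2f}%" for p in row))
```

On these figures each two-locus combination that includes matK resolves more species than matK alone, and the three-locus combination resolves the most.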
Users of DNA Barcoding:
‘The Traffic Light approach’
Green - non-problematic taxa (current
markers appropriate, silver standard)
Orange - need for gold standard
(polyploidy, introgression, paralogy)
Red - barcoding needs investigation,
species complex, etc