Systematic Implications of DNA variation in subfamily
Plant Molecular Systematics
Spring 2014
“Problems” with morphological data…
• Convergence and parallelisms
• Reduction and character loss
• Phenotypic vs. genotypic differences
• Evaluation of homology
• Misinterpretation of change or polarity
• Limitation on number of characters
• Phenotypic plasticity
Always searching for new types of characters…
Are molecular data intrinsically better than morphological data?
Central Dogma
Secondary Metabolites
Lipid pigments: chlorophyll, lycopenes, xanthophylls, carotene
Phenolics: flavonols, flavones, tannins, anthocyanins
Iridoid compounds
Terpenes
Alkaloids (N-containing): e.g. nicotine, caffeine, morphine, betalains
Development of Molecular (Chemical) Systematic Methods
“Chemosystematics”
• Early methods relied on chromatography to separate complex mixtures of secondary metabolites, detect them, and then compare them between taxa (“spot botanists”); very phenetic
• Better separation and identification methods developed; pathway stages were used as cladistic characters (phytochemistry)
• Move away from secondary metabolites to proteins
• Early protein studies used immunological reactions
• Development of improved electrophoretic methods permitted direct protein comparisons between taxa
• Comparison of seed storage proteins
• Development of direct estimates of genetic relationships based on allele frequency of enzyme variants
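As an illustration of the last point, allele frequencies of enzyme variants can be turned into a single distance between two taxa. The slides do not name a particular measure; the sketch below assumes Nei's standard genetic distance, D = -ln(I), and the locus data and allele labels ("fast", "slow") are invented for the example.

```python
import math

def nei_distance(pop_x, pop_y):
    """Nei's standard genetic distance between two populations.

    pop_x, pop_y: lists with one dict per locus, mapping allele -> frequency.
    Both lists must describe the same loci in the same order.
    """
    jx = jy = jxy = 0.0
    for freq_x, freq_y in zip(pop_x, pop_y):
        alleles = set(freq_x) | set(freq_y)
        # Sum of squared frequencies within each population (homozygosity)
        jx += sum(freq_x.get(a, 0.0) ** 2 for a in alleles)
        jy += sum(freq_y.get(a, 0.0) ** 2 for a in alleles)
        # Sum of cross products between the populations
        jxy += sum(freq_x.get(a, 0.0) * freq_y.get(a, 0.0) for a in alleles)
    identity = jxy / math.sqrt(jx * jy)  # Nei's genetic identity I
    return -math.log(identity)           # undefined if no alleles are shared

# Hypothetical single-locus data: two electrophoretic variants of one enzyme
taxon_a = [{"fast": 0.5, "slow": 0.5}]
taxon_b = [{"fast": 1.0}]
print(round(nei_distance(taxon_a, taxon_b), 4))
```

Identical allele frequencies give a distance of 0; the distance grows as the taxa share fewer alleles.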
Molecular (DNA) Systematics
• Next step was to examine DNA directly
through examination and comparison of
restriction fragments (RFLP bands)
• Technology evolved to make it feasible to
sequence DNA directly
• Initially limited to single genes or noncoding regions
• Now feasible to sequence large numbers
of genes or regions or increasingly even
whole genomes relatively quickly
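The restriction-fragment comparison mentioned in the first bullet can be sketched in a few lines. This is a simplified in-silico digest of a linear sequence, assuming the enzyme EcoRI (recognition site GAATTC, cut after the G); the two sequences are made up, and the second differs from the first by one substitution that destroys a site.

```python
def digest(sequence, site="GAATTC", cut_offset=1):
    """Return fragment lengths after cutting a linear sequence at every site.

    cut_offset is the position inside the site where the cut falls
    (EcoRI cuts G^AATTC, so the offset is 1).
    """
    sequence = sequence.upper()
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    bounds = [0] + cuts + [len(sequence)]
    return [right - left for left, right in zip(bounds, bounds[1:])]

# Hypothetical sequences from two taxa
taxon_1 = "AAAA" + "GAATTC" + "CCCCCCCC" + "GAATTC" + "TT"
taxon_2 = "AAAA" + "GAATTC" + "CCCCCCCC" + "GAGTTC" + "TT"  # second site lost

print(digest(taxon_1))  # [5, 14, 7]
print(digest(taxon_2))  # [5, 21]
```

A single point mutation in a recognition site changes the fragment pattern, which is what appears as a band difference on an RFLP gel.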
Molecular Systematics
- Can obtain phylogenetically informative
characters from any genome of the organism
- Assumes that genomes accumulate molecular
changes by lineage, as morphological
characters do
- Possibly greater assurance of homology with
molecular data (less likely to misinterpret
characters) but homoplasy happens!
- Principal advantages are the much greater
number of molecular characters available &
greater comparability across lineages
How big are genomes of organisms?
Genomes of the Plant Cell
Plastid
Nuclear
Mitochondrial
Three genomes in plant cells
Chloroplast: 135,000-160,000 bp; generally maternally inherited (seed parent)
Mitochondrion: 200,000-2,500,000 bp; generally maternally inherited (seed parent)
Nucleus: 1.1 x 10^6 to 1.1 x 10^11 kilobase pairs; biparentally inherited
Selection of DNA region to compare:
• Should be present in all taxa to be compared
• Must have some knowledge of the gene or other genomic region to develop primers, etc.
• Evolutionary rate of sequence changes must be appropriate to the taxonomic level(s) being investigated; “slow” genes versus “fast” genes
• Sequences should be readily alignable
• The biology of the gene (or other DNA sequence) must be understood to assure homology
Genes frequently used for phylogenetic studies of plants:
• Mitochondrial genome – uniparentally (maternally) inherited, but genes evolve very slowly and structural rearrangements happen very frequently, so generally not useful in studying relationships, although there are some exceptions
• Plastid genome – uniparentally (maternally) inherited
- rbcL – ribulose-1,5-bisphosphate carboxylase/oxygenase (Rubisco) large subunit
- ndhF – NADH dehydrogenase subunit F
- atpB – ATP synthase beta subunit
- matK – maturase K
- rpl16 intron – ribosomal protein L16 intron
• Nuclear genome – biparentally inherited
- ITS region – internal transcribed spacers ITS1 and ITS2
- 18S, 26S nuclear ribosomal DNA repeat
- adh – alcohol dehydrogenase
- many other genes now available through next-generation sequencing
Plastid Genome
- Circular, derived from endosymbiosis of cyanobacteria
- Three zones:
  LSC (large single copy region)
  SSC (small single copy region)
  IR (inverted repeats)
- Genes related to photosynthesis and protein synthesis
Fig. 14.4
The Polymerase Chain Reaction (PCR) (Fig. 14.2)
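To make the idea of PCR concrete, here is a minimal in-silico sketch: it finds the forward primer and the reverse complement of the reverse primer in a template strand and returns the region between them. The template and primer sequences are invented, exact primer matching is assumed, and the doubling per cycle is the idealized case (real reactions plateau).

```python
def reverse_complement(seq):
    """Reverse complement of a DNA string containing only A, C, G, T."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq.upper()))

def amplify(template, forward_primer, reverse_primer):
    """Return the amplicon (primer to primer, inclusive) or None if a primer fails to bind."""
    template = template.upper()
    start = template.find(forward_primer.upper())
    if start == -1:
        return None
    # The reverse primer anneals to the opposite strand, so search for its
    # reverse complement downstream of the forward primer.
    target = reverse_complement(reverse_primer)
    end = template.find(target, start + len(forward_primer))
    if end == -1:
        return None
    return template[start:end + len(target)]

def copies_after(cycles, starting_copies=1):
    """Idealized copy number after a given number of cycles (doubling each cycle)."""
    return starting_copies * 2 ** cycles

# Hypothetical template and primers
template = "TTTT" + "ACGTAC" + "GGGGCCCC" + "TTGACA" + "AAAA"
product = amplify(template, "ACGTAC", "TGTCAA")
print(product, len(product))
print(copies_after(30))
```

Thirty idealized cycles starting from one template molecule give 2^30, roughly a billion copies, which is why a very small DNA sample is enough for sequencing.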
Automated Sequencing
Scanning of gel to detect
fluorescently-labeled
DNAs; data fed directly to
computer.
Fig. 14.3
How do we analyze molecular
variation?
- DNA nucleotide sequences (point
mutations)
- Structural rearrangements
-insertions and deletions (indels)
-inversions
Aligned DNA sequences showing substitutions
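Once sequences are aligned, substitutions can be read off column by column. The sketch below lists the differing sites between two aligned sequences and computes the uncorrected proportion of differing sites (p-distance); the two short sequences are invented, and alignment gaps ("-") are simply skipped.

```python
def substitutions(seq_a, seq_b):
    """Return (position, base_a, base_b) for each differing, ungapped column (1-based)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i + 1, a, b)
        for i, (a, b) in enumerate(zip(seq_a, seq_b))
        if a != b and "-" not in (a, b)
    ]

def p_distance(seq_a, seq_b):
    """Proportion of ungapped columns at which the two sequences differ."""
    compared = sum(1 for a, b in zip(seq_a, seq_b) if "-" not in (a, b))
    return len(substitutions(seq_a, seq_b)) / compared

# Hypothetical aligned sequences
taxon_a = "ATGGCTACGT"
taxon_b = "ATGACTACCT"
print(substitutions(taxon_a, taxon_b))  # [(4, 'G', 'A'), (9, 'G', 'C')]
print(p_distance(taxon_a, taxon_b))     # 0.2
```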
Insertion-Deletion Events
- Can occur as single nucleotide gains or losses or as stretches of two to many base pairs
- Can also be “chunks” of DNA (e.g., losses of introns)
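In an alignment, indels show up as runs of gap characters. A minimal sketch, assuming "-" marks a gap and using an invented aligned sequence, that reports where each gap run starts and how long it is:

```python
import re

def gap_runs(aligned_seq):
    """Return (start, length) for each run of '-' in an aligned sequence (1-based start)."""
    return [
        (match.start() + 1, match.end() - match.start())
        for match in re.finditer(r"-+", aligned_seq)
    ]

# Hypothetical aligned sequence with a 3-bp gap and a 1-bp gap
print(gap_runs("ATG---CCA-T"))  # [(4, 3), (10, 1)]
```

Whether a gap reflects an insertion in one lineage or a deletion in another cannot be read from the alignment alone; that requires comparison with an outgroup.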
A molecular synapomorphy for Subfamily Cactoideae
(Cactaceae) – deletion of the plastid rpoC1 intron…
ancestral
derived
(Wallace & Cota, Current Genetics, 1995)
Cactaceae: trnL intron deletions in columnar cacti
[Figure: cladogram of columnar cacti marking trnL intron deletions; labels include "- 268 bp" and "Shared Deletion 2". Taxa shown: Pachycereeae, Leptocereeae, Hylocereeae, Corryocactus, “Browningieae I”*, “Browningieae II”*, Cereeae, Trichocereeae. Groups labeled: North American Clades, South American Clades.]
(*Tribe Browningieae polyphyletic)
Chloroplast DNA Inversion
23 kb inversion in all Asteraceae except for members of
Tribe Barnadesieae (now Subfamily Barnadesioideae)
Fig. 14.6
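An inversion reverses the order of a block of genes and flips their strand. A common way to model this is a signed gene order, sketched below; the gene numbers are placeholders and do not represent the actual genes inside the Asteraceae inversion.

```python
def invert(gene_order, start, end):
    """Invert genes start..end (0-based, inclusive): reverse the order and flip the strand sign."""
    segment = [-gene for gene in reversed(gene_order[start:end + 1])]
    return gene_order[:start] + segment + gene_order[end + 1:]

# Hypothetical ancestral order of five genes, all on the same strand
ancestral = [1, 2, 3, 4, 5]
derived = invert(ancestral, 1, 3)
print(derived)  # [1, -4, -3, -2, 5]
```

Applying the same inversion to the derived order restores the ancestral one, so the ancestral state has to be inferred by outgroup comparison rather than from the rearrangement itself.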
Comparative DNA Sequencing
• Obtain DNA samples from representative organisms (try to represent morphological diversity) and outgroups
• Identify DNA region(s) for comparison
• Extract DNA and use PCR to amplify targeted region
• Carry out sequencing reactions
• Run sequencing procedures (automated)
• Align sequences
• Use aligned sequences for phylogenetic analysis (various programs using various algorithms)
• Evaluate data in context of taxonomy and morphology
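As one small example of what happens between "Align sequences" and "phylogenetic analysis", the sketch below picks out parsimony-informative columns: sites where at least two character states each occur in at least two taxa. The slides do not commit to any particular program or optimality criterion; parsimony is assumed here for illustration, and the taxon names and sequences are invented.

```python
from collections import Counter

def parsimony_informative_sites(alignment):
    """Return 1-based columns where at least two bases each occur in at least two taxa.

    alignment: dict mapping taxon name -> aligned sequence (equal lengths).
    Gaps and ambiguity codes are ignored when counting states.
    """
    sequences = list(alignment.values())
    informative = []
    for index, column in enumerate(zip(*sequences)):
        counts = Counter(base for base in column if base in "ACGT")
        shared_states = sum(1 for n in counts.values() if n >= 2)
        if shared_states >= 2:
            informative.append(index + 1)
    return informative

# Hypothetical four-taxon alignment
alignment = {
    "taxon1": "ACGTA",
    "taxon2": "ACGTC",
    "taxon3": "ATGAC",
    "taxon4": "ATGAG",
}
print(parsimony_informative_sites(alignment))  # [2, 4]
```

Columns 2 and 4 both group taxon1 with taxon2 and taxon3 with taxon4; column 5 varies but no second state is shared, so it cannot favor one tree over another under parsimony.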
Partial sequence of rbcL (plastid gene coding for Rubisco) in Poaceae
Poaceae subfamilies:
Anomochlooideae
Pharoideae
Puelioideae
BEP Clade
- Bambusoideae (bamboos)
- Pooideae (bluegrasses, wheat)
- Ehrhartoideae (rices and allies)
PACMAD Clade
- Aristidoideae (wiregrasses)
- Panicoideae (maize, panicgrasses)
- Chloridoideae (love grasses)
- Danthonioideae (pampas grasses)
- Micrairoideae
- Arundinoideae (reeds)
Figure annotations: Stamens reduced to 3; + 55 mya; Crepet & Feldman 1991
Genetic Databases
International Nucleotide Sequence Database Collaboration
GenBank: National Institutes of Health (NIH) Genetic Sequence Database
http://www.ncbi.nlm.nih.gov/genbank/
EMBL: European Bioinformatics Institute Nucleotide Sequence Database
DDBJ: DNA Data Bank of Japan
Data mining
Climatic Data
- Global Biodiversity Information Facility (GBIF)
- 1,584,351 independent collection sites
- 10,469 taxa
Edwards et al., Science 2010, Fig. 4
Genetic Data
- 2,684 taxa
- 8 regions (plastid and nuclear)
- phylogenetic analysis
Edwards & Smith, PNAS 2010, Fig. 1