THE CYTOCHROME B (CYT-B) GENE SEQUENCES ANALYSIS …


ANALYSIS OF SEQUENCE DIVERGENCE AT MITOCHONDRIAL GENES ON FIVE DIFFERENT TAXONOMIC LEVELS.
APPLICABILITY OF AN mtDNA DISTANCE DATABASE IN THE GENETICS OF SPECIATION AND PHYLOGENETICS
Y.Ph. Kartavtsev
A.V. Zhirmunsky Institute of Marine Biology, Far Eastern Branch of the Russian Academy of Sciences,
Vladivostok 690041, Russia, e-mail: [email protected]
MAIN GOALS
1. What Is the Database?
2. Review of Literature Data on p-Distances.
3. Species Concept. Speciation Modes (SM): Population Genetic View.
INTRODUCTION (1)
• Mitochondrial DNA (mtDNA) is a circular molecule 16-18 kilobase pairs (kbp) in length. Literature data show that the mtDNA of all fishes has a similar organization (Lee et al., 2001; Kim et al., 2004; Kim et al., 2005; Nagase et al., 2005; Nohara et al., 2005) and differs only slightly among all vertebrate animals, including humans (Anderson et al., 1981; Bibb et al., 1981; Wallace, 1992; Kogelnik et al., 2005).
• The complete mitochondrial genome (mitogenome) comprises the control region (CR, or D-loop), where the replication initiation site and the promoters are located; the large (16S) and small (12S) rRNA subunit genes; 22 tRNA genes; and 13 polypeptide-coding genes.
INTRODUCTION (2)
• In phylogenetic research, single-gene sequences are usually used for both mtDNA and the nuclear genome. Recently, however, complete mitogenomes have been used more and more often; Japanese researchers lead this work for aquatic organisms.
• The sequences most popular in phylogenetics are those of the cytochrome b (Cyt-b) and cytochrome oxidase 1 (Co-1) genes, which are used to compare taxa from the species to the family level (Johns, Avise, 1998; Hebert et al., 2004; Kartavtsev, Lee, 2006). Many sequences carrying a phylogenetic signal have also been obtained for different taxa at the 16S rRNA gene.
• Separate genes can carry different phylogenetic signals because their substitution rates differ; the same is true for different regions within a gene. Comparisons of higher taxa may also be affected by homoplasy. When many taxa are analyzed, the sequences may lack the information capacity to cover a large species diversity, so adequate taxon sampling is important (Hillis et al., 1996). Nevertheless, for species identification, except in rare cases, good results are obtained even with short sequences, such as a 650 bp fragment of Co-1.
[Figure: Applicability of different DNA types in phylogenetics and taxonomy. Marker types: spacers (ITS-1, 2), mtDNA, nDNA and rDNA; taxonomic levels: species, genus, family, order, class, phylum. Legend: most statistically substantiated results; statistically significant results.]
1. WHAT IS THE DATABASE?
1.1. USING P-DISTANCES. SUMMARY
• To estimate the actual number of substitutions between sequences X and Y, a mathematical model must be introduced.
• At least 8 major models (Nei, Kumar, 2000; Felsenstein, 2004), and 56 models in total, are referred to in current sources (Posada, 2005; http://darwin.uvigo.es/software/modeltest.html).
• Among the simplest and best known are the Jukes-Cantor (1968; JC) and Kimura (1980) two-parameter (K2P) models. The latter is the default in some packages (e.g., PAUP). These models assume, respectively, equal rates for all kinds of substitutions and unequal rates for transitions (α) and transversions (β).
• Names of some other models: Equal-input, Tamura, HKY (Hasegawa-Kishino-Yano), Tamura-Nei (TrN), General time reversible (GTR), Unrestricted.
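A minimal sketch of the two simplest models named above: the Python functions below compute the Jukes-Cantor distance from p and the K2P distance from the proportions of transitions (P) and transversions (Q). The function names, and the choice to skip gaps and ambiguity codes, are our own assumptions and do not come from any particular package. Real analyses would normally rely on established programs such as those cited here.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_differences(x: str, y: str):
    """Return (n, transitions, transversions) over aligned sites.

    Assumption: sites with gaps or ambiguity codes are skipped.
    """
    if len(x) != len(y):
        raise ValueError("sequences must be aligned to equal length")
    n = transitions = transversions = 0
    for a, b in zip(x.upper(), y.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        n += 1
        if a == b:
            continue
        pair = {a, b}
        # A<->G and C<->T are transitions; purine<->pyrimidine changes are transversions
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    return n, transitions, transversions


def jc_distance(p: float) -> float:
    """Jukes-Cantor (1968) distance: d = -(3/4) ln(1 - 4p/3)."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def k2p_distance(P: float, Q: float) -> float:
    """Kimura (1980) two-parameter distance from transition (P) and transversion (Q) proportions."""
    return -0.5 * math.log(1.0 - 2.0 * P - Q) - 0.25 * math.log(1.0 - 2.0 * Q)


# Usage with two hypothetical 10-bp sequences (one transition, one transversion).
n, ts, tv = count_differences("ACGTACGTAC", "GCGTACGTTC")
p = (ts + tv) / n
print(f"p = {p:.3f}, JC = {jc_distance(p):.4f}, K2P = {k2p_distance(ts / n, tv / n):.4f}")
```

When transitions make up one third of all differences (P = p/3, Q = 2p/3), the K2P formula reduces to the JC formula. This is one way to see that JC is the special case of K2P with α = β.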
1.2. USING P-DISTANCES. SUMMARY
• In the K2P model, the equilibrium frequencies of all 4 nucleotides are 0.25. However, the algorithms suggested for calculating the expected p^ and its variance under this model and under the Jukes-Cantor model remain applicable regardless of deviations in nucleotide frequencies (Rzhetsky, Nei, 1995). Thus, both models are suitable for a wider range of conditions, where the real parameters remain unknown.
• To avoid confusion, remember that in Kimura's model the ratio of transitions to transversions is R = α/2β, whereas many authors and many programs use a different ratio, k = α/β.
• According to our estimates (Kartavtsev, Lee, 2006), the largest share of authors (29%) use K2P, and many use the simple p^ or measures such as HKY, TrN, etc. A popular program for choosing an appropriate model is MODELTEST (Posada, Crandall, 1998). Very useful information on model properties and their applicability to a wide range of specific data sets can be found in the literature (Nei, Kumar, 2000; Hall, 2001; Sanderson, Shaffer, 2002; Felsenstein, 2004).
1.3. USING P-DISTANCES. SUMMARY
• Numerical simulations have shown that when p-distances are small (<20%), all models give similar values (Fig. 1.1).
• Because substitution rates are heterogeneous along sequences and among different parts of genes, an important correction of the p-distance is the gamma correction (e.g., Nei, Kumar, 2000; Felsenstein, 2004).
Fig. 1.1. Estimates of the number of nucleotide substitutions obtained by different distance measures when the actual numbers follow the TrN model (from Nei, Kumar, 2000).
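To illustrate both points, the sketch below compares the uncorrected p with the Jukes-Cantor distance and its gamma-corrected form, d = (3/4) a [(1 - 4p/3)^(-1/a) - 1], where a is the gamma shape parameter (Nei, Kumar, 2000). The shape value a = 0.5 is an arbitrary illustrative choice; in practice a is estimated from the data. The three measures stay close at low p and separate quickly as p grows.

```python
import math


def jc_distance(p: float) -> float:
    """Jukes-Cantor distance, assuming equal rates at all sites."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def jc_gamma_distance(p: float, a: float) -> float:
    """Gamma-corrected Jukes-Cantor distance; a is the gamma shape parameter."""
    return 0.75 * a * ((1.0 - 4.0 * p / 3.0) ** (-1.0 / a) - 1.0)


A = 0.5  # illustrative shape parameter (assumption); smaller a = stronger rate heterogeneity
print(" p      JC      JC+G(a=0.5)")
for p in (0.05, 0.10, 0.20, 0.40):
    print(f"{p:.2f}  {jc_distance(p):.4f}  {jc_gamma_distance(p, A):.4f}")
```

As a grows very large, the gamma-corrected distance approaches the plain JC distance, which corresponds to equal rates across sites.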
2. REVIEW OF LITERATURE DATA ON P-DISTANCES
2.1. DIVERSITY AT DNA MARKERS WITHIN SPECIES AND IN TAXA OF DIFFERENT RANK. AN ANALYSIS OF EMPIRICAL DATA
RESULTS (1)
Fig. 2.1. Rooted 50% consensus tree showing phylogenetic interrelationships of the analyzed flatfish species (Pleuronectiformes) based on Cyt-b sequence data. Bayesian tree; node values are repetition frequencies (%) over n = 10^6 simulated generations. The tree was built under the TrN+I+G model and rooted with the sequences of three outgroup species (Perciformes). The scale in the bottom left corner indicates relative branch lengths.
RESULTS (2)
Fig. 2.2. Rooted neighbor-joining (NJ) tree showing phylogenetic interrelationships based on sequence diversity at the Co-1 gene for 13 flatfish species (Pleuronectiformes) and two outgroup taxa (Perciformes), 21 sequences in total. Node values show bootstrap support (n = 1000). The Kimura two-parameter model is used. The bar at the bottom shows the branch-length scale.
Intraspecies diversity
Estimates based on different markers are numerous and variable. For instance, in two copepod species, nucleotide diversity (π) at an mtDNA rRNA gene depended on latitude: the subarctic species Calanus finmarchicus (π = 0.37%, SD = 0.26) was less variable than the temperate-water species Nannocalanus minor (π = 0.50%, SD = 0.32) (Bucklin, Wiebe, 1998). Turning to Cyt-b and Co-1 sequence diversity, K2P values for a Co-1 sequence of about 600 bp were small for 107 intraspecies groups of different species from five families of Lepidoptera (Arctiidae, Geometridae, Noctuidae, Notodontidae, Sphingidae) (Hebert et al., 2002). The average values range from 0.17 to 0.36%. Our recalculation gives an average for these groups of K2P = 0.25 ± 0.04%.
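Nucleotide diversity (π) can be read as the mean p-distance over all pairs of sequences sampled within a population. The sketch below assumes equal-length aligned sequences and gives each distinct pair equal weight, which is the usual sample estimator. The sequences are hypothetical.

```python
from itertools import combinations
from typing import Sequence


def p_distance(x: str, y: str) -> float:
    """Proportion of differing sites between two aligned sequences of equal length."""
    if len(x) != len(y):
        raise ValueError("sequences must be aligned to equal length")
    return sum(a != b for a, b in zip(x, y)) / len(x)


def nucleotide_diversity(seqs: Sequence[str]) -> float:
    """pi: average pairwise p-distance over all distinct pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("at least two sequences are required")
    return sum(p_distance(x, y) for x, y in pairs) / len(pairs)


sample = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAT"]  # hypothetical haplotypes
print(f"pi = {100 * nucleotide_diversity(sample):.2f}%")
```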
• In our database, averaged over a hundred intraspecies p-distance values, the means at Cyt-b and Co-1 are M = 1.55 ± 0.56% and M = 0.55 ± 0.19%, respectively (Kartavtsev, Lee, 2005).
• The most important point I would like to stress here is that stable, geographically restricted intraspecies groupings have been detected in many species. They are marked by mtDNA genes and are evidently isolated intraspecies phylogroups that have persisted for many generations; they are as real as the local stocks defined by other biological methods. A number of such examples are summarized by Avise & Walker (1999), and many are also presented in our review (Kartavtsev, Lee, 2006). Among others are the bottlenose dolphin, Tursiops truncatus (Dowling, Brown, 1993), the Canada goose, Branta canadensis (Van Wagner, Baker, 1990), and the fishes Fundulus heteroclitus and Stizostedion vitreum (Gonzalez-Villasenor, Powers, 1990; Billington, Strange, 1990), etc. (Stepien, Faber, 1998).
p-DISTANCES IN GROUPS OF COMPARISON, Flatfish
Fig. 2.4. Results of a one-factor ANOVA and mean p-distance values at four levels of differentiation in flatfish species (Pleuronectiformes) for the Cyt-b gene. Groups: 1. Intraspecies, among individuals of the same species; 2. Intragenus, among species of the same genus; 3. Intrafamily, among genera of the same family; 4. Intraorder, among families of the order Pleuronectiformes. Statistically significant differences are shown at the top of the graph. SE: standard error of the mean (from Kartavtsev et al., 2007, Marine Biology).
p-DISTANCES IN GROUPS OF COMPARISON, Review
[Plot: p-distance (y-axis) against group of comparison 1-4 (x-axis) for Cyt-b and Co-1; mean with ±1.00*SE and ±1.96*SE intervals.]
Fig. 2.6. Categorized plot of the distribution of weighted mean p-distances among four groups of comparison at the Cyt-b and Co-1 genes. Groups: 1. Intraspecies, among individuals of the same species; 2. Intra-sibling species; 3. Intragenus, among species of the same genus; 4. Intrafamily, among genera of the same family (from Kartavtsev, Lee, 2006).
p-DISTANCES IN GROUPS OF COMPARISON, Review
Interaction effect: F = 268.63, d.f. = 4, 18295, P = 0.0001
[Plot: distance score (y-axis) against comparison group 1-5 (x-axis) for Co-1 and Cyt-b.]
Fig. 2.7. Plot of the distribution of weighted mean p-distances among five groups of comparison at the Cyt-b and Co-1 genes. Groups: 1. Intraspecies, among individuals of the same species; 2. Intra-sibling species (semispecies + subspecies); 3. Intragenus, among species of the same genus; 4. Intrafamily, among genera of the same family; 5. Intraorder, among families of the same order (from Kartavtsev, 2009, NOVA Publ., NY).
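Figs. 2.6 and 2.7 plot weighted mean p-distances, whereas the Summary reports unweighted means. The weights are not specified here. For illustration only, the sketch below assumes that each literature estimate is weighted by the number of pairwise comparisons behind it. All numbers are hypothetical.

```python
from typing import Sequence


def unweighted_mean(values: Sequence[float]) -> float:
    """Plain average: every estimate counts equally."""
    return sum(values) / len(values)


def weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted mean; weights are assumed here to be numbers of pairwise comparisons."""
    if len(values) != len(weights) or not values:
        raise ValueError("values and weights must be non-empty and of equal length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total


# Hypothetical intragenus p-distances (%) from three studies and their comparison counts.
estimates = [8.0, 12.0, 10.0]
n_comparisons = [50, 10, 40]
print(unweighted_mean(estimates), weighted_mean(estimates, n_comparisons))
```

A study with many comparisons pulls the weighted mean toward its own value, so weighted and unweighted averages of the same estimates can differ noticeably.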
Thus, the available data suggest that, in general, phyletic evolution prevails in the animal world and that geographic speciation events (Type 1a) have so far predominated in nature.
Do the data presented imply that speciation always follows the Type 1a mode? I think not. The few examples below support this answer.
GENETIC DISTANCES AMONG SPECIES IN SEPARATE ANIMAL GENERA (After Avise, Aquadro, 1982)
This plot illustrates the idea that different animal groups of the same rank are unequal in structural gene divergence; i.e., the rate of evolution differs in genes, in morphology, or in both.
EXAMPLES OF REGULATORY DIVERGENCE AMONG FISH TAXA
Comparison of chars Salvelinus malma & S. taranetzi (Kartavtsev et al., 1983)
Table 2.1. COMPARISON OF ISOZYME ACTIVITY IN THREE WHITEFISH FORMS (COREGONIDAE) AND TWO GRAYLING FORMS (THYMALLIDAE)
Levels of differences in activity (expression) by locus, for four pairs of forms:
(1) C. autumnalis & C. lavaretus pidschian
(2) C. autumnalis & C. lavaretus baikalensis
(3) C. lavaretus pidschian & C. lavaretus baikalensis
(4) T. arcticus (Black form) & T. arcticus (White form)

LOCUS / FORM        (1)        (2)        (3)        (4)
GPDH-1*             -          -          -          ++
GPDH-2*             -          +          ++         -
MDH-2*              -          -          -          -
ME*                 -          -          +          +
6PGD*               ++         -          +          +
IDH-1*              -          -          +          -
IDH-2*              +          -          -          -
PGM-2*              ++         +++        ++         -
FUM*                +          -          +          -
ACPH-1*             -          -          -          +++
Proportion E, %
("+"/N)             18.2±8.2   9.1±6.1    27.3±9.5   17.4±7.9

Note. N is the total number of loci analyzed: whitefish – 22, grayling – 23. "-" – activity does not differ significantly; "+" – activity differs; "++" – two-fold difference; "+++" – three-fold or greater difference (after Kartavtsev, Mamontov, 1983).
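The E row appears to follow E = n_plus/N, where n_plus is the number of loci marked "+", "++" or "+++". Its ± term appears to be the binomial standard error sqrt(E(1 - E)/N): for example, 4 of 22 loci gives 18.2 ± 8.2%. This reading is our inference from the printed numbers and is not stated in the source. The counts used below (4, 2, 6, 4) were back-calculated from the E row and N. A minimal check:

```python
import math


def proportion_with_se(n_differing: int, n_loci: int):
    """Return (E, SE) in percent, assuming a binomial SE = sqrt(E(1 - E) / N)."""
    if n_loci <= 0 or not 0 <= n_differing <= n_loci:
        raise ValueError("need n_loci > 0 and 0 <= n_differing <= n_loci")
    e = n_differing / n_loci
    se = math.sqrt(e * (1.0 - e) / n_loci)
    return 100.0 * e, 100.0 * se


# Counts back-calculated from the E row; N = 22 (whitefish) or 23 (grayling).
for label, k, n in [("(1)", 4, 22), ("(2)", 2, 22), ("(3)", 6, 22), ("(4)", 4, 23)]:
    e, se = proportion_with_se(k, n)
    print(f"{label}: E = {e:.1f} ± {se:.1f}%")
```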
WHAT IS THE MAIN OUTCOME
• A distance measure alone is not a satisfactory descriptor.
• Data on intraspecies diversity (heterozygosity) at structural genes are necessary.
• Measures of regulatory genome changes are needed to describe transformative modes of speciation.
• Other descriptors of genomic change are required (e.g., chromosome number, NF, etc.).
3. SPECIES CONCEPT. OLD IDEAS AND NEW DEVELOPMENTS
WHAT IS A SPECIES?
A species is a biological unit that is reproductively isolated from other such units and consists of one to several more or less stable populations of sexually reproducing individuals occupying a certain area in nature (my definition). In its principal points, this is the definition of the BSC (Biological Species Concept). One of the original BSC definitions reads: “A species is a reproductive community of populations (reproductively isolated from others) that occupies a specific niche in nature” (Mayr, 1982, p. 273). We will accept the BSC for further discussion, while keeping in mind that it is restricted mainly to bisexual organisms (Mayr, 1963; Timofeev-Resovsky et al., 1977; Templeton, 1998).
• The Linnaean Species
• The Biological Species Concept (BSC) (Mayr, 1942, 1963)
• BSC Modification II (Mayr, 1982)
• The Recognition Species Concept (Paterson, 1978, 1985)
• The Cohesion Species Concept (Templeton, 1989)
• Evolutionary Species Concept: Simpson's (1961) Evolutionary Species Concept; Wiley's (1978) Evolutionary Species Concept
• The Ecological Species Concept (Van Valen, 1976)
• The Phylogenetic Species Concept (Cracraft, 1983)
SCHEMATIC REPRESENTATION OF SPECIES DIVERGENCE AND ORIGIN (After Dobzhansky, 1955)
The keystone of the STE (Synthetic Theory of Evolution) may be represented by Dobzhansky's scheme (Fig. 3.1), in which gene pool separation is the key to speciation. If it were shown that evolution is possible without genetic change in lineages, then the evolutionary genetic paradigm, and the STE in particular, could be rejected.
Fig. 3.1. Dobzhansky's (1955) scheme of divergence in time.
A – Single species population.
B – Initial phase of divergence (subspecies).
C – Different species.
FIG. 3.2. DIAGRAMMATIC REPRESENTATION OF BASIC MODES OF SPECIATION (After Bush, 1975)
Breaks in gene flow can create reproductive isolating barriers (RIB), or reproductive isolation mechanisms (RIM), which in turn lead to the origin of new species. Under different situations in nature, different modes of speciation operate (Fig. 3.2). Neither the scheme above nor the paper itself (Bush, 1975) answers many fundamental questions of speciation. For instance, it is unclear which mode is most frequent, and whether gene flow is the sole primary factor that alters gene pools or whether there are others.
In other words, we have to conclude that there is no theory of speciation in the strict scientific sense at all.
SPECIATION MODES (SM): POPULATION GENETIC VIEW
• ABSENCE OF A QUANTITATIVE THEORY OF SPECIATION (QTS)
We mentioned in the preceding section that evolutionary genetics lacks a theory of speciation in the exact scientific sense, which requires the ability to predict the future. Here, that means predicting species origin, or at least discriminating among several speciation modes on the basis of some quantitative parameters or their empirical estimates. Attempts made in this direction (Avise, Wollenberg, 1997; Templeton, 1998) do not meet these criteria. That is why we attempted to take a step toward discriminating among speciation modes on the basis of the main population genetic measurements available in the literature, which can be placed within the framework of a genetic speciation concept.
• BASIS FOR THE QTS
As the basis for the set of evolutionary genetic concepts we used the descriptions made by Templeton (1981). As a result, a classification scheme for 7 different modes of speciation was created (Fig. 3.3). This approach leads to a fairly simple experimental scheme that permits us (i) to arrange further investigation of speciation in different groups of organisms, and (ii) to derive analytical relations for each speciation mode (Fig. 3.4).
• EMPIRICAL QTS TESTING
The scheme was tested for cyprinids (Kartavtsev et al., 2002) and explains well our own earlier data on salmonids (Kartavtsev, Mamontov, 1983; Kartavtsev et al., 1983). Certainly, both the testing of the scheme presented and its theoretical background must be developed further.
Fig. 3.3. SPECIATION MODES (SM): POPULATION GENETIC VIEW (After Kartavtsev et al., 2002)
DIVERGENCE SM: D1. ADAPTIVE; D2. CLINAL; D3. HABITAT
Necessary conditions for speciation:
D1. a) Erection of extrinsic isolating barriers followed by a break in gene flow; b) pleiotropic origin of RIB (reproductive isolation barriers) over a long time.
D2. a) Selection on a cline with isolation by distance; b) pleiotropic origin of RIB.
D3. a) Selection over multiple habitats with no isolation by distance; b) origin of RIB by disruptive selection at genes determining behavior.
Sufficient conditions for speciation:
D1, 1(S): lack of efficient hybridization in the zone of contact; 1. DT > DS; 2. ED = EP; 3. HD = HP; 4. TM-.
D2, 2(S): lack of efficient hybridization outside the zone of contact; 1. DT > DS; 2. ED ≠ EP; 3. HD = HP; 4. TM-.
D3, 3(S): lack of efficient hybridization inside and outside the zone of contact; 1. DT = DS; 2. ED ≠ EP; 3. HD ≤ HP; 4. TM-.
Experimentally measurable features and possible descriptors for the model (theory) (S)
DESCRIPTORS:
D – genetic distance at structural genes: DT – in suggested parent taxa; DS – among conspecific demes; DD – among subspecies or sibling species;
HD – mean heterozygosity in the suggested daughter population;
HP – mean heterozygosity in the suggested parent population;
EP – divergence in regulatory genes among suggested parent taxa;
ED – divergence in regulatory genes among suggested daughter taxa;
TM+ – test for modification (positive);
TM- – test for modification (negative);
RIB – reproductive isolation barriers.
Fig. 3.4. ANALYTICAL DESCRIPTION OF SEVEN TYPES OF SPECIATION MODES
1(S): $\{(D_T > D_S) \wedge (E_D = E_P) \wedge (H_D = H_P) \wedge TM^{-}\}$ (D1)
2(S): $\{(D_T = D_S) \wedge (E_D \neq E_P) \wedge (H_D = H_P) \wedge TM^{-}\}$ (D2)
3(S): $\{(D_T = D_S) \wedge (E_D \neq E_P) \wedge (H_D \le H_P) \wedge TM^{+}\}$ (D3)
4(S): $\{(D_T > D_D) \wedge (E_D \neq E_P) \wedge (H_D < H_P) \wedge TM^{-}\}$ (T1)
5(S): $\{(D_T = D_D) \wedge (E_D = E_P) \wedge (H_D < H_P) \wedge TM^{-}\}$ (T2)
6(S): $\{(D_T > D_D) \wedge (E_D \neq E_P) \wedge (H_D > H_P) \wedge TM^{-}\}$ (T3)
7(S): $\{(D_T > D_S) \wedge (E_D \neq E_P) \wedge (H_D < H_P) \wedge TM^{-}\}$ (T4)
Note. Descriptors are explained in the previous figure.
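The seven analytical descriptions can be read as a lookup: given which relations hold between the descriptors, list the modes whose criteria are all met. The sketch below transcribes Fig. 3.4 under stated assumptions. The relations ">", "<" and "=" are taken as inputs that would in practice come from statistical tests on the estimates. "!=" stands for the symbol lost in extraction between E_D and E_P, which is our reading. The dictionary keys are hypothetical names. Note that Fig. 3.3 lists DT > DS for D2 and TM- for D3, whereas Fig. 3.4 gives DT = DS and TM+; the sketch follows Fig. 3.4.

```python
from typing import Dict, List

# Observed relations that satisfy each required symbol in Fig. 3.4.
# "!=" is our reading of the symbol lost in extraction (assumption).
ALLOWED = {
    ">": {">"}, "<": {"<"}, "=": {"="},
    "!=": {">", "<"}, "<=": {"<", "="},
    "+": {"+"}, "-": {"-"},
}

# Criteria transcribed from Fig. 3.4 (keys are hypothetical names for the comparisons).
MODES: Dict[str, Dict[str, str]] = {
    "D1": {"DT_vs_DS": ">", "ED_vs_EP": "=", "HD_vs_HP": "=", "TM": "-"},
    "D2": {"DT_vs_DS": "=", "ED_vs_EP": "!=", "HD_vs_HP": "=", "TM": "-"},
    "D3": {"DT_vs_DS": "=", "ED_vs_EP": "!=", "HD_vs_HP": "<=", "TM": "+"},
    "T1": {"DT_vs_DD": ">", "ED_vs_EP": "!=", "HD_vs_HP": "<", "TM": "-"},
    "T2": {"DT_vs_DD": "=", "ED_vs_EP": "=", "HD_vs_HP": "<", "TM": "-"},
    "T3": {"DT_vs_DD": ">", "ED_vs_EP": "!=", "HD_vs_HP": ">", "TM": "-"},
    "T4": {"DT_vs_DS": ">", "ED_vs_EP": "!=", "HD_vs_HP": "<", "TM": "-"},
}


def compatible_modes(observed: Dict[str, str]) -> List[str]:
    """Return the modes whose every criterion is met by the observed relations.

    observed maps each comparison to ">", "<" or "=" and "TM" to "+" or "-";
    a missing comparison never satisfies a criterion.
    """
    return [
        mode
        for mode, criteria in MODES.items()
        if all(observed.get(key) in ALLOWED[rel] for key, rel in criteria.items())
    ]


obs = {"DT_vs_DS": ">", "DT_vs_DD": ">", "ED_vs_EP": "=", "HD_vs_HP": "=", "TM": "-"}
print(compatible_modes(obs))
```

T1 and T4 differ only in whether DT is compared with DD or with DS. When DT exceeds both, regulatory divergence is present and heterozygosity is reduced, both modes fit the same observations.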
DISTANCE VS TAXA SPLITTING
Does punctuation have an impact on species origin at the molecular level?
• Avise, Ayala, 1976; Kartavtsev et al., 1980; the present work – No.
• Pagel et al., 2006 – Yes.
[Plot: number of splittings (PV-comp) against transformed p-distance (p-dis-tr); mean ± 0.95 confidence interval; rs = 0.22, p < 0.05.]
Fig. 3.6. Plot of p-distance versus the number of splittings for Cyt-b sequence data in catfishes and flatfishes.
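The rs value reported in the Fig. 3.6 plot is presumably Spearman's rank correlation between the number of splittings and the p-distance. That reading, and the data below, are assumptions; the p-distance transformation used in the figure is left aside. The sketch below computes rs as the Pearson correlation of ranks, with tied values given their average rank. It does not include a significance test (the p < 0.05 in the figure).

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rs(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation as the Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples of size >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        raise ValueError("rs is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical p-distances and numbers of splittings (nodes) per species.
p_dist = [0.12, 0.15, 0.15, 0.20, 0.22, 0.30]
n_splits = [3, 5, 4, 4, 6, 7]
print(f"rs = {spearman_rs(p_dist, n_splits):.2f}")
```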
FEW WORDS ON INSERTION SEQUENCES (IS)
[Plot: CV2 (alignment scores) against CV1 (alignment scores) for groups Aves 1, Aves 2, Fish 1 and Fish 2.]
Fig. 3.7. Bivariate plot of the distribution of canonical variable (CV) scores among four groups of comparison for IS sequences obtained from 60 complete mitogenomes of birds and fishes. Groups: 1. Aves-1, aquatic species that feed on fish; 2. Aves-2, land species, mostly plant and grain eaters; 3. Fish-1, species that are abundant as food for birds; 4. Fish-2, species unlikely to be abundant as food for birds (from Kartavtsev, 2010). CV score variation is statistically significant: R = 0.82, χ² = 104.48, P = 0.0003. Mean classification precision for groups 1 and 2 is 85.1%.
Summary
• Algorithms for estimating nucleotide diversity and other measures of genetic divergence are analyzed for two genes, Cyt-b (cytochrome b) and Co-1 (cytochrome oxidase 1). Based on the theory and algorithms of distance estimation for DNA sequences, and on observed distance values retrieved from the literature, we recommend that realistic tree building use a specific nucleotide substitution model. The model should be chosen from the at least 56 available in Modeltest 3.7 or other software, depending on the specific set of nucleotide sequences. Using a database of p-distances and similar measures gathered from published sources and GenBank (http://www.ncbi.nlm.nih.gov) sequences, we compared the genetic divergence of populations (1) and of taxa of different rank: subspecies, semispecies and/or sibling species (2), species within a genus (3), species from different genera within a family (4), and species from separate families within an order (5).
• Empirical data for 18,192 vertebrate and invertebrate animal species demonstrate that the data series are realistic and interpretable when the p-distance and its various derivatives are used. The focus was on vertebrates, fish species in particular, and on the newest dataset obtained within the framework of FishBOL (http://www.fishbol.org). Distance data revealed different and increasing levels of genetic divergence of the Cyt-b and Co-1 sequences across the five groups compared. Mean unweighted p-distances (%) for the five groups are: Cyt-b (1) 1.46±0.34, (2) 5.35±0.95, (3) 10.46±0.96, (4) 17.99±1.33, (5) 26.36±3.88; Co-1 (1) 0.72±0.16, (2) 3.78±1.18, (3) 10.87±0.66, (4) 15.00±0.90, (5) 19.97±0.80. The estimates correspond well with earlier analyses. This supports the applicability of the p-distance for most intraspecies and interspecies comparisons of genetic divergence up to the order level for the two genes compared. As the numbers above and a regression analysis show, there is no sign of the saturation usually expected from a homoplasy effect.
• Differences in divergence between the two genes at the five hierarchical levels were also found. This conforms to ample evidence of different and nonuniform evolution rates among these and other genes and their regions. The results of the analysis of nucleotide and allozyme divergence within species and higher taxa of animals, first, agree well with previous results and show the stability of the general trend. Second, they suggest that in animals, phyletic evolution is likely to prevail at the molecular level, and that speciation mainly corresponds to the geographic model (type D1). The prevalence of the D1 speciation mode does not mean that other modes are absent. There are at least seven possible modes of speciation. How to recognize them formally with operational genetic criteria is a key question for establishing a quantifiable genetic model (theory) of speciation. An approach is suggested that allows a step forward in this direction.
THANK YOU FOR YOUR ATTENTION!
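FEW FORMULAE
• MEAN HETEROZYGOSITY (PER LOCUS PER INDIVIDUAL)
$H = \frac{1}{L}\sum_{k=1}^{L} h_k$, $h_k = 1 - \sum_{i=1}^{m} p_i^2$, where $p_i$ is the frequency of the $i$-th allele, $m$ is the number of alleles at locus $k$, and $L$ is the number of loci.
• p-DISTANCE
$\hat{p} = n_d / n$, where $n_d$ is the number of nucleotides that differ between sequences X and Y, and $n$ is the total number of nucleotides analysed.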
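The two formulas above translate directly into code. A minimal sketch, assuming allele frequencies are supplied per locus and that alignment gaps are excluded from n; the source does not say how gaps are treated. The function names and data are hypothetical.

```python
from typing import Sequence


def locus_heterozygosity(freqs: Sequence[float]) -> float:
    """h_k = 1 - sum_i p_i^2 for one locus with allele frequencies p_i."""
    if abs(sum(freqs) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in freqs)


def mean_heterozygosity(loci: Sequence[Sequence[float]]) -> float:
    """H = (1/L) * sum_k h_k over L loci; monomorphic loci contribute h_k = 0."""
    if not loci:
        raise ValueError("at least one locus is required")
    return sum(locus_heterozygosity(f) for f in loci) / len(loci)


def p_distance(x: str, y: str, gap: str = "-") -> float:
    """p^ = n_d / n; sites with a gap in either sequence are excluded (assumption)."""
    if len(x) != len(y):
        raise ValueError("sequences must be aligned to equal length")
    sites = [(a, b) for a, b in zip(x, y) if a != gap and b != gap]
    if not sites:
        raise ValueError("no comparable sites")
    n_d = sum(a != b for a, b in sites)
    return n_d / len(sites)


# Hypothetical data: two loci (one polymorphic, one monomorphic) and two short sequences.
print(mean_heterozygosity([[0.5, 0.5], [1.0]]))  # (0.5 + 0.0) / 2
print(p_distance("ACGT", "ACGA"))
```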