Current Approaches to Whole Genome Phylogenetic Analysis
Hongli Li
Content
Background
Genome Evolution
Phylogenetic Analysis
Performing Statistical Tests
Phylogenetic Networks
Conclusion
Phylogenetic Analysis Background
Early attempts: based on morphological characters
Directly comparing genes makes more sense
Modern attempts: using sequences from individual homologous genes
A gene's evolutionary history might not be the same as the evolutionary history of its organism
Genes that are sufficiently conserved across all species of interest might not be identified
Genome Evolution
Prokaryotes
Relatively simple
Prokaryote evolutionary history cannot properly be represented by a tree
Eukaryotes
More complicated
Frequent inversions of small segments, gene duplication and loss, and polyploidy events
Organellar Genomes
Contain a smaller and simpler mitochondrial genome
Plant species also have a chloroplast genome
Genome Evolution (cont.)
Model of Genome Evolution
Nadeau – Taylor Model
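Figure: example rearrangements of the gene order 1,2,3,4,5,6,7,8,9,10
Inversion: 1,2,3,4,5,6,7,8,9,10 -> 1,2,3,4,-8,-7,-6,-5,9,10
Chromosome fission: 1,2,3,4,-8,-7,-6,-5,9,10 -> 1,2,3,4,-8,-7 and -6,-5,9,10
Inversion: 1,2,3,4,5,6,7,8,9,10 -> 1,2,-6,-5,-4,-3,7,8,9,10
Inversion: 1,2,-6,-5,-4,-3,7,8,9,10 -> 1,2,-6,4,5,-3,7,8,9,10
Deletion: 1,2,-6,-5,-4,-3,7,8,9,10 -> 1,2,-6,-5,-4,8,9,10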
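The rearrangement events named on this slide (inversion, chromosome fission, deletion) can be sketched as operations on signed gene orders. This is a minimal illustration only: the function names and the list-of-signed-integers representation are assumptions made here, and the code applies events at chosen positions rather than sampling them as a stochastic model would.

```python
# Hypothetical helpers: a genome is a list of signed integers,
# where the sign records the orientation (strand) of each gene.

def inversion(genome, i, j):
    """Reverse the segment genome[i:j] and flip the sign of each gene in it."""
    return genome[:i] + [-g for g in reversed(genome[i:j])] + genome[j:]

def deletion(genome, i, j):
    """Remove the segment genome[i:j]."""
    return genome[:i] + genome[j:]

def fission(genome, i):
    """Split one chromosome into two at position i."""
    return genome[:i], genome[i:]

start = list(range(1, 11))            # 1,2,...,10
a = inversion(start, 4, 8)            # invert genes 5..8
b = inversion(start, 2, 6)            # invert genes 3..6
print(a)
print(fission(a, 6))
print(inversion(b, 3, 5))
print(deletion(b, 5, 7))
```

Each call returns a new list, so intermediate genomes can be kept and compared.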
Phylogenetic Analysis
– Binary Character Encoding
Binary Character Encoding
Encoding the presence or absence of particular genes or protein families is straightforward, whereas encoding gene order is not
Many different approaches exist
Natural restrictions
A gene cannot be adjacent to more than two others
An evolutionary event such as an inversion creates two adjacencies and breaks two
Phylogenetic Analysis
– Distance Methods
Distance Methods
Distance: the smallest number of evolutionary events separating two genomes
Breakpoint distance
Defining the distance between two genomes with unequal gene content is a problem
Several software packages are available for distance analysis
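As an illustration of the breakpoint distance, the sketch below counts the adjacencies of one genome that are absent from the other. It assumes two single linear chromosomes with identical gene content, written as signed nonzero integers; the function name and the cap values used to mark chromosome ends are assumptions made for this example.

```python
def breakpoint_distance(a, b):
    """Number of adjacencies in genome a that do not occur in genome b.

    Both genomes must contain the same genes (signed nonzero integers).
    """
    if sorted(map(abs, a)) != sorted(map(abs, b)):
        raise ValueError("genomes must have equal gene content")
    end_cap = max(map(abs, a)) + 1      # artificial marker for the right end

    def capped(g):
        return [0] + list(g) + [end_cap]  # 0 marks the left end

    adj_b = set()
    for x, y in zip(capped(b), capped(b)[1:]):
        adj_b.add((x, y))
        adj_b.add((-y, -x))             # same adjacency read on the other strand
    return sum((x, y) not in adj_b for x, y in zip(capped(a), capped(a)[1:]))

identity = list(range(1, 11))
print(breakpoint_distance(identity, [1, 2, 3, 4, -8, -7, -6, -5, 9, 10]))
print(breakpoint_distance(identity, [1, 2, -6, 4, 5, -3, 7, 8, 9, 10]))
```

The equal-content check makes the problem noted above explicit: with unequal gene content this simple count is no longer well defined.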
Phylogenetic Analysis
– Maximum Parsimony
Finding the minimum tree is NP-hard
Several attempts
Find the "breakpoint phylogeny": easier than finding the maximum parsimony tree, but still NP-hard
Find the true maximum parsimony tree using improved algorithms and more computing power
Parsimony methods have advantages over distance methods
But the accuracy of their solutions is difficult to measure
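To make the parsimony criterion concrete, the sketch below scores a fixed tree on binary characters with the standard Fitch algorithm. This is a swapped-in, simpler setting: it evaluates one given topology on presence or absence characters and does not search tree space or handle gene-order parsimony, which are the hard problems. The tree representation (nested tuples of taxon names), the function names, and the example matrix are assumptions.

```python
def fitch(tree, states):
    """Return (possible ancestral states, changes) for one character.

    tree is a taxon name (leaf) or a pair (left, right) of subtrees.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:                          # children agree: no extra change
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1

def parsimony_score(tree, matrix):
    """Sum of Fitch costs over all characters in a taxon -> row matrix."""
    n_chars = len(next(iter(matrix.values())))
    return sum(
        fitch(tree, {taxon: row[i] for taxon, row in matrix.items()})[1]
        for i in range(n_chars)
    )

matrix = {"A": [1, 0, 1], "B": [1, 0, 0], "C": [0, 1, 1], "D": [0, 1, 0]}
print(parsimony_score((("A", "B"), ("C", "D")), matrix))
print(parsimony_score((("A", "C"), ("B", "D")), matrix))
```

The tree with the lower score is preferred under parsimony; the NP-hardness comes from having to compare scores across all possible topologies.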
Phylogenetic Analysis
– Other Methods
Maximum Likelihood
Computationally prohibitive
Method of Invariants
Relies on having good estimates for the invariant function, which requires a large dataset
Bayesian Analysis
The probability distributions involved can become extremely complicated
Performing Statistical Tests
Performing statistical tests for phylogenetic features is not straightforward in any situation
Re-sampling methods should preserve the gene order and should be used with caution, since new errors might be introduced
Phylogenetic Networks
When dealing with whole genomes, and in particular prokaryotic genomes, we need phylogenetic networks
Split graphs
Can express uncertainty in a tree or a lack of faith in the tree model of evolution
Not suitable for representing phenomena such as horizontal transfer or allopolyploid events
Reticulograms
Conclusion
Comparisons of gene content are becoming commonplace, but comparisons of gene order present a wider range of problems
It is important to focus on the data we already have or will have
Methods for whole genome phylogenetic
analysis need to be robust against missing
or inaccurate information