Current Approaches to Whole Genome Phylogenetic Analysis

Hongli Li
Content

- Background
- Genome Evolution
- Phylogenetic Analysis
- Performing Statistical Tests
- Phylogenetic Networks
- Conclusion

Phylogenetic Analysis Background

- Early attempts: based on morphological characters
  - Comparing genes directly makes more sense
- Modern attempts: using sequences from individual homologous genes
  - A gene's evolutionary history might not be the same as the evolutionary history of its organism
  - Some genes that are sufficiently conserved across all species of interest might not be identified

Genome Evolution

- Prokaryotes
  - Relatively simple
  - Prokaryote evolutionary history cannot properly be represented by a tree
- Eukaryotes
  - More complicated
  - Frequent inversions of small segments, gene duplication and loss, and polyploidy events
- Organellar genomes
  - Contain a smaller and simpler mitochondrial genome
  - Plant species also have a chloroplast genome

Genome Evolution (cont.)

Model of genome evolution: Nadeau-Taylor model

[Figure: example rearrangements starting from the gene order 1,2,3,4,5,6,7,8,9,10]
- Inversion: 1,2,3,4,-8,-7,-6,-5,9,10
- Inversion: 1,2,-6,-5,-4,-3,7,8,9,10
- Chromosome fission: 1,2,3,4,-8,-7 and -6,-5,9,10
- Inversion: 1,2,-6,4,5,-3,7,8,9,10
- Deletion: 1,2,-6,-5,-4,8,9,10

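The rearrangement events named on this slide can be reproduced on a signed gene order in a few lines of Python. This is a minimal sketch, not part of the original talk: the function names `invert`, `delete` and `fission` are hypothetical, genomes are assumed to be plain lists of signed integers, and positions are 0-based, half-open slice indices.

```python
def invert(genome, start, end):
    """Reverse genome[start:end] and flip the sign (strand) of each gene."""
    segment = [-g for g in reversed(genome[start:end])]
    return genome[:start] + segment + genome[end:]


def delete(genome, start, end):
    """Remove the genes in genome[start:end]."""
    return genome[:start] + genome[end:]


def fission(genome, pos):
    """Split one chromosome into two at position pos."""
    return genome[:pos], genome[pos:]


ancestor = list(range(1, 11))

a = invert(ancestor, 4, 8)
print(a)              # [1, 2, 3, 4, -8, -7, -6, -5, 9, 10]
print(fission(a, 6))  # ([1, 2, 3, 4, -8, -7], [-6, -5, 9, 10])

b = invert(ancestor, 2, 6)
print(b)                # [1, 2, -6, -5, -4, -3, 7, 8, 9, 10]
print(invert(b, 3, 5))  # [1, 2, -6, 4, 5, -3, 7, 8, 9, 10]
print(delete(b, 5, 7))  # [1, 2, -6, -5, -4, 8, 9, 10]
```

Each printed gene order matches one of the orders shown on the slide, which suggests the figure is a small tree of successive events applied to the starting order.
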
Phylogenetic Analysis: Binary Character Encoding

- Binary character encoding
  - Encoding the presence or absence of particular genes or protein families is straightforward, whereas encoding gene order is not
- Many different approaches exist
- Natural restrictions
  - A gene cannot be adjacent to more than two others
  - An evolutionary event creates two adjacencies and breaks two

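One way to picture a binary encoding is a matrix with a column for each gene (present or absent) and a column for each gene adjacency (present or absent). The sketch below is only one of the many possible approaches, written under stated assumptions: genomes are linear lists of signed integers without duplicated genes, and the names `adjacencies` and `encode` are hypothetical.

```python
def adjacencies(genome):
    """Set of signed adjacencies in a linear signed gene order."""
    adj = set()
    for a, b in zip(genome, genome[1:]):
        # (a, b) read on one strand is the same adjacency as (-b, -a)
        # read on the other; store the smaller tuple as canonical form.
        adj.add(min((a, b), (-b, -a)))
    return adj


def encode(genomes):
    """Binary rows: gene presence columns followed by adjacency columns."""
    genes = sorted({abs(g) for gen in genomes.values() for g in gen})
    adjs = sorted(set().union(*(adjacencies(g) for g in genomes.values())))
    rows = {}
    for name, gen in genomes.items():
        present = {abs(g) for g in gen}
        own = adjacencies(gen)
        rows[name] = ([int(g in present) for g in genes]
                      + [int(a in own) for a in adjs])
    return genes, adjs, rows


genomes = {"A": [1, 2, 3, 4], "B": [1, -3, -2, 4]}  # B = A with genes 2..3 inverted
genes, adjs, rows = encode(genomes)
for name, row in rows.items():
    print(name, row)
```

In this toy input the inversion breaks two adjacencies of A and creates two new ones in B, which illustrates why adjacency characters are not independent of one another.
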
Phylogenetic Analysis: Distance Methods

- Distance methods
  - The smallest number of evolutionary events between two genomes
- Breakpoint distance
  - The distance between two genomes with unequal gene content is a problem
- Several software packages are available for distance analysis
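As a rough illustration of the breakpoint distance, the sketch below counts the adjacencies of one genome that are not conserved in the other. It assumes linear signed gene orders over exactly the same gene set with no duplicates, and it ignores chromosome ends (many formulations add caps at both ends); the function name `breakpoint_distance` is hypothetical. Genomes with unequal content are rejected rather than handled, which mirrors the problem noted on the slide.

```python
def breakpoint_distance(g1, g2):
    """Count adjacencies of g1 that are not conserved in g2."""
    if sorted(map(abs, g1)) != sorted(map(abs, g2)):
        raise ValueError("genomes must contain the same genes")
    conserved = set()
    for a, b in zip(g2, g2[1:]):
        conserved.add((a, b))
        conserved.add((-b, -a))  # same adjacency read from the other strand
    return sum((a, b) not in conserved for a, b in zip(g1, g1[1:]))


reference = list(range(1, 11))
inverted = [1, 2, 3, 4, -8, -7, -6, -5, 9, 10]
print(breakpoint_distance(reference, inverted))  # 2
```
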

Phylogenetic Analysis: Maximum Parsimony

- Finding the minimum tree is NP-hard
- Several attempts
  - Find the “breakpoint phylogeny”: easier than finding the maximum parsimony tree, but still NP-hard
  - Try to find the true maximum parsimony tree with improved algorithms and computing power
- Parsimony methods have more advantages than distance methods
  - But it is difficult to measure the accuracy of solutions

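To make the parsimony criterion concrete, the sketch below scores one fixed tree for one binary character (for example, presence or absence of a gene) with Fitch's small-parsimony algorithm. This is a stand-in chosen for brevity and is not the rearrangement-based parsimony discussed on the slide: for gene orders, even reconstructing ancestors on a fixed tree is hard. The tree format (nested 2-tuples of leaf names) and the name `fitch_score` are assumptions of this example.

```python
def fitch_score(tree, states):
    """Minimum number of state changes for one character on a rooted binary tree.

    tree:   nested 2-tuples with leaf names as strings,
            e.g. (("A", "B"), ("C", "D"))
    states: dict mapping each leaf name to its character state (e.g. 0 or 1)
    """
    def visit(node):
        if isinstance(node, str):  # leaf: its own state, no changes yet
            return {states[node]}, 0
        (left_set, left_cost), (right_set, right_cost) = visit(node[0]), visit(node[1])
        common = left_set & right_set
        if common:  # children agree on at least one state
            return common, left_cost + right_cost
        # children disagree: one change is needed on this pair of branches
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]


tree = (("A", "B"), ("C", "D"))
print(fitch_score(tree, {"A": 1, "B": 1, "C": 0, "D": 0}))  # 1
print(fitch_score(tree, {"A": 1, "B": 0, "C": 1, "D": 0}))  # 2
```

Summing this score over all characters gives the parsimony score of one tree; the expensive part is the search over the space of possible trees.
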
Phylogenetic Analysis: Other Methods

- Maximum likelihood
  - Computationally prohibitive
- Method of invariants
  - Relies on having good estimates for the invariant function, which requires a large dataset
- Bayesian analysis
  - The probability distributions involved can become extremely complicated

Performing Statistical Tests

- Performing statistical tests for phylogenetic features is not straightforward in any situation
- Re-sampling methods should preserve the gene order and should be used with caution, since new errors might be introduced
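One possible reading of "re-sampling that preserves gene order" is a jackknife over genes: drop a random subset of genes from every genome and keep the survivors in their original relative order. The sketch below is an assumption-laden illustration of that idea, not a method taken from the talk; the name `jackknife_genomes` and the `keep_fraction` parameter are hypothetical.

```python
import random


def jackknife_genomes(genomes, keep_fraction=0.8, rng=None):
    """Drop a random subset of genes from every genome, keeping relative order."""
    rng = rng or random.Random()
    genes = sorted({abs(g) for gen in genomes.values() for g in gen})
    n_keep = max(1, round(keep_fraction * len(genes)))
    kept = set(rng.sample(genes, n_keep))
    # The same genes are removed from every genome, so replicates stay comparable.
    return {name: [g for g in gen if abs(g) in kept]
            for name, gen in genomes.items()}


genomes = {
    "A": list(range(1, 11)),
    "B": [1, 2, 3, 4, -8, -7, -6, -5, 9, 10],
}
replicate = jackknife_genomes(genomes, keep_fraction=0.8, rng=random.Random(0))
for name, gen in replicate.items():
    print(name, gen)
```

The caution on the slide applies here as well: removing a gene makes its two neighbours adjacent in the replicate, so each replicate contains adjacencies that never existed in the real genome.
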

Phylogenetic Networks

- When dealing with whole genomes, and prokaryotic genomes in particular, we need phylogenetic networks
- Split graphs
  - Can express uncertainty in a tree or a lack of faith in the tree model of evolution
  - Not suitable for representing phenomena such as horizontal transfer or allopolyploid events
- Reticulograms

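A split graph displays a collection of splits, that is, bipartitions of the taxa. A tree can only show splits that are pairwise compatible, so conflicting signal shows up as incompatible splits. The sketch below checks compatibility of two splits using the standard criterion that at least one of the four intersections of their sides is empty; the function name `compatible` and the representation of a split by one of its sides are assumptions of this example.

```python
def compatible(split_a, split_b, taxa):
    """True if two splits of the same taxon set can appear on one tree.

    Each split is given by one of its two sides; the other side is the
    complement within taxa.
    """
    a1, b1 = set(split_a), set(split_b)
    a2, b2 = set(taxa) - a1, set(taxa) - b1
    # Compatible when at least one of the four side intersections is empty.
    return any(not (x & y) for x in (a1, a2) for y in (b1, b2))


taxa = {"A", "B", "C", "D"}
print(compatible({"A", "B"}, {"A"}, taxa))       # True
print(compatible({"A", "B"}, {"A", "C"}, taxa))  # False
```
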
Conclusion

- Comparisons of gene content are becoming commonplace, but comparisons of gene order present a wider range of problems
- It is important to focus on the data we already have or will have
- Methods for whole genome phylogenetic analysis need to be robust against missing or inaccurate information
