Transcript Slide

Inferring Phylogeny using
Permutation Patterns on
Genomic Data
1Md
Enamul Karim
2Laxmi Parida
1Arun Lakhotia
1University
of Louisiana at Lafayette
2IBM T. J. Watson Research Center
Phylogeny
Reconstruction of the evolutionary
relationship of a collection of
organisms, usually in the form of a
tree.
Phylogenetic data


Behavioral, morphological, metabolic,
etc.
Molecular data: sequence data,
gene-order data etc.
Why gene order data?


Low error rate.
Rare evolutionary events unlikely to
cause “silent" changes; can help
inferring millions of years.
Genomes rearrangements
1
2
3
4
5
6
7
8
9
10
1
2
3 –8
9 –7
-8
4 –6
–7
5 –5
–6
6 -4
–5
7 –4
8
9
10
• Transposition
Inverted Transposition
Inversion
Breakpoint distance
For some datasets, a close-to-linear

Breakpoints
are
number
of
relationship between the breakpoints and
adjacencies
present
evolutionary
events may
exist.in one

genome, but not in the other.
Can be used for building phylogeny
(Blanchette
1 2 et
3 al.4). 5 6 7 8 9 10

1 –3 –2
4
5
9
6
7
8
10
Limitations of breakpoint



The number of breakpoints created by a
certain number of inversions may vary.
Also, transpositions generally create
more breakpoints than inversions.
Computing the breakpoint phylogeny is
NP-hard.
MPBE (Maximum Parsimony
on Binary Encoding)
A heuristic for the breakpoint phylogeny
(Cosner et al.).
 All ordered pairs of signed genes
appearing consecutively are coded as
binary features.
 Exponential time complexity, however,
much faster than BPAnalysis.

Limitations

May fail to find feasible solutions to the
breakpoint phylogeny problem.
Observation: The closer is the evolution
history, the more permutations (of
different granularity) are in common
1
2
3
4
5
6
7
8
9
10
1
2
3 –8 –7 –6 –5 –4
9
10
1
8 –3 –2 –7 –6 –5 –4
9
10
Maximal pi-pattern (Eres et al.)


Matches permutations at different
granularity.
Polynomial time complexity.
pi-pattern
Pattern with minimum k permutations

Example :
abc
For S = acbcab and k=2
All pi-patterns are: ac, bc, abc, abcc
Cover
P1 covers P2=>
 Every P1 has a P2
 Every P2 is within a P1
 Example
In S = acbcab
abc covers ac
Maximal pi-pattern

pi-pattern which is not covered
 Example not covered by abcc
In S = acbcab
pi-patterns: ac, bc, abc, abcc
Maximal pi-patterns: abc, abcc
Results
Phylogeny for simulated
evolution on synthetic data
a
b
c
d
e
f
g
h
i
j
k
l
m
n
o
p
12 genera of Campanulaceae
and the outgroup tobacco
Tree1: MPBE tree
Tree2: Neighbor joining tree
(using few different distances)
Tra
Sym
Cam
Ade
Wah
Mer
Leg
Asy
Tri
Cod
Cya
Pla
Tob
Tree3: Neighbor joining tree
using permutation patterns
Tra
Sym
 167 Maximal pi-patterns(from
10769 pi-patterns) used as binary
feature
Cam
Ade
Wah
Mer
Asy
Leg
 XOR Distance measure
Tri
Cod
Cya
Pla
 Distance/Similarity matrix is
created to find neighbor joining
tree
Tob
Tree3 vs Tree2
Conclusion



Permutation patterns may preserve more
evolutionary information.
Evolutionary events could be counted within
permuted segments to develop a hybrid
scheme.
Current approaches remain unable to handle
unequal gene content, which could be solved
using maximal pi-patterns.