presentation

Download Report

Transcript presentation

Efficient Data Structures and a
New Randomized Approach
for Sorting Signed
Permutations by Reversals
Haim Kaplan and Elad Verbin
Unsigned Sorting by reversals
Given a permutation, find a
shortest sequence of reversals
that transforms it to
(1 2 … n)
(3 4 2 5 1) (3 4 2 5 1)
Signed Sorting by reversals
Given a signed permutation, find
a shortest sequence of reversals
that transforms it to
(+1 +2 … +n)
(3 4 -2 -5 1) (3 4 -2 -5 1)
Example
•
•
•
•
(3 4 -2 -5 1)
(-4 -3 -2 -5 1)
(-4 -3 -2 -1 5)
( 1 2 3 4 5)
Motivation
• Biology:
computing large-scale
evolutionary distance
(most parsimonious scenario)
• Heuristics for TSP
History of Unsigned SBR
• 1993: Conjectured to be NP-Hard and 2approx. algorithm, Kececioglu and Sankoff
• 1997: Proven NP-Hard, Caprara
• 1999: Proven MAX-SNP Hard, Berman
and Karpinski
• 2001: 1.375-approximation by Berman,
Hannenhalli and Karpinski
History of Signed SBR
• 1993: Conjectured NP-Hard, Kececioglu and
Sankoff
• 1995: Polynomial Algorithm, Pevzner and
Hannenhalli
• 1996: O(n2(n)) Algorithm, Berman and
Hannenhalli
• 1997: Simpler O(n2) Algorithm, Kaplan, Shamir
and Tarjan - Best to date.
• 2001: Very Simple cubic solution, Bergeron
Variants
• Computing just the distance (signed
version) – linear (Bader et. al., 2001)
• Many, many, more
Sorting Signed
Permutations
The Breakpoint Graph
+3
0
3a
+4
3b
4a 4b
-2
2b
-5
2a
5b
5a
• Transform Permutation
• Blue Edges Between Adjacent Vertices
• Red Edges Between Consecutive elements
+1
1a
1b
6
x
-x
xa,xb
xb,xa
Calculating the distance
dist=n+1-c+h+f
h,f are small
parameters of the
breakpoint graph
(usually h+f=0)
Eliminated in
linear time
dist=n+1-c
0
Goal: Find a reversal that creates a cycle
3a keeps
3b
4h+f=0
2b safe
2a reversal)
5b
5a
1a 1b
a 4b
and
(a
Note: single decomposition to alternating cycles
6
Oriented Edges
We consider only reversals that reverse between
the endpoints of a red edge
Red edges can be :
Right-to-Right
{
Unoriented{
Oriented
Left-to-Left
Left-to-Right
Right-to-Left
Def:This reversal acts
on the red edge
Oriented Edges
Red edges can be :
Unoriented
c=0
Oriented
c=1
Thm: A reversal acting on a red edge creates a
new cycle
the edge is oriented
Safe reversals
• Safe oriented reversal: does not increase h+f
• Theorem(H-P): a permutation with h+f=0 always
has a safe oriented reversal (or is id)
• SBR algorithms iteratively find a safe reversal.
KST does this in linear time (total running time –
O(n2))
• H-P characterized safe oriented reversals using
the overlap graph
What happens if
we disregard
safety?
perms with h+f>0 are rare
RandomWalk
’s oriented
reversals
=(1 -4 2 -3)
H-P: all nodes with no
children are either id or red
Failed but don’t
know up
it yetat
ended
Each
path
is
<n+2
steps
Yey!
id
Prop: oriented reversals
never decrease h+f (i.e.
red points only to red)
Prop: if we
id, then the sequence
wash+f>0
a minimal sorting
sequence, otherwise we
can retry
Whoops..
NO CHILDREN
After how many trials will
RandomWalk succeed?
•Worst permutation – many
•Average
permutation
–
very
few
50%
chance of
pi =(2 4 6 -1 -3 -5 7)
failure
7
(permutation of Michal
Ozery and Ron Shamir)
Probability of success
50% chance of
failure
1
2
n
50% chance of
failure
2
What is the average over all
permutations of the expected
number of trials until success?
•Theoretically – we don’t know (but we
do know that red permutations are
polynomially rare)
•Experimentally – 1.6, regardless of n
Behavior of RandomWalk on the
average
Empirical testing of RandomWalk on random permutations
(selected uniformly without unoriented components)
Number of
permutations tested
% success at first trial
Average number of trials until
success
10
300M
66.20%
1.642
20
100M
63.86%
1.607
50
30M
63.01%
1.594
100
10M
62.86%
1.593
200
3M
62.76%
1.594
500
600K
62.69%
1.596
1000
200K
62.62%
1.596
2000
50K
63.35%
1.586
5000
6000
63.7%
1.58
10000
2000
63%
1.59
n
Implementing the
Random Walk
Basic structure
Our DS is based on a simple data structures of
Fredman, Johnson, McGeoch & Hostheimer `95
(which are based on those by Chrobak, Szymacha
& Krawczyk, `90). These structures were invented
for implementing TSP heuristics
These data structures allow us to maintain
permutation under:
• Reversals
1
• Queries:  i ,  i
All operations taking logarithmic time.
Random Walk
• Repeatedly:
• Select a random oriented reversal
• Perform it and update the permutation
How fast can we run one
RandomWalk iteration?
Repeatedly:
• Select a random oriented reversal
• Perform it and update the permutation
• Can be done in O(n) time
• In the paper we show how to do it
in O( n log n ) so one Random Walk
3/ 2
takes time O(n
log n )
Further Questions
• Why does RandomWalk work (so well)?
• Are there variants of RandomWalk that
work better (i.e. have good worst-case
behavior – no bad cases)?
• Can RandomWalk be easily and efficiently
implemented?
Further Questions
• Are there variants to RandomWalk that
can be easily and efficiently implemented
but maintain good average-case behavior
(i.e. by waving the demand for a uniform
selection of an oriented reversal)?
• Can we maintain safety or hurdelity too?
General Further Research
• Is there a subquadratic algorithm for SBR?
• What is the structure of the space of all
sorting reversal sequences? (i.e. the
permutation graph we saw before)
Fin.
Fredman et. al.’s DS
• Splay tree with the permutation in the
leaves – inorder scan gives the
permutation.
• Reverse bits at the nodes. If on means
that the order of the subtree should be
reverse, as should the signs of the
elements.
Reverse(i,j):
Splay operations
should keep the
invariant that the
tree indeed
represents the
perumutation
i
j
splay(j)
Reverse(i,j):
j
i
splay(i)
Reverse(i,j):
i
i
j
T1
k
T1
j
T2
T3
T2
-j
T3
-j
-k
T1
-i
T1
-i
T3R
T2
R
T3
T2R
T4
T4