Genome rearrangement (Part I)

Download Report

Transcript Genome rearrangement (Part I)

Genome Rearrangement
By
Ghada Badr
Part I
Genome, chromosome, gene, gene order
QuickTime™ and a
TIFF (Uncompressed) decompressor
are needed to see this picture.
• The entire complement of genetic material carried by an individual is
called the genome.
• Each genome contains one or more DNA molecules, one per
chromosome
2
QuickTime™ and a
TIFF (Uncompressed) decompressor
are needed to see this picture.
3
Genome, chromosome, gene, gene order
QuickTime™ and a
TIFF (Unc ompressed) decompres sor
are needed to see this picture.
• A gene is a segment of DNA sequence with a specific function
4
Genome, chromosome, gene, gene order
5’
3’
A
C
B
Gene order: A -B C
D
F
E
D
-E F
3’
5’
QuickTime™ and a
TIFF (Uncompressed) decompressor
are needed to see this pic ture.
• Genes can be ordered by their DNA sequence location.
• DNA consists of two complementary strands twisted around each
other to form a right-handed double helix.
• A sign (+/-) is usually used to indicate on which strand a gene is
located.
5
Genome, chromosome, gene, gene order
A B C
H
D
E F
I
K
J
The DNA molecule (chromosome) may be circular or linear
6
Genome Rearrangement
• The genome is structurally specific to each species, and it
changes only slowly over time. Therefore genome
comparison among different species can provide us with
much evidence about evolution.
• Genome rearrangements are an important aspect of the
evolution of species. Even when the gene content of two
genomes is almost identical, gene order can be quite
different.
Genome 1
Genome 2
A -B C D
-E F
B -E
A C
F-D
7
Genome Rearrangement
Gene order analysis on a set of organisms is a powerful
technique for genomic comparison phylogenetic
inference.
8
Genome Rearrangement

General Definition for the problem:
Given a set of genomes and a set of possible
evolutionary events (operations), find a shortest set of
events transforming (sorting) those genomes into one
another.
What genome means and what events are, makes the
diversity of the problem.
Since these events are rare, scenarios minimizing their
number are more likely close to reality.
Many models have been proposed.
9
Genome Models

Genes (or blocks of contiguous genes) are a good
example of homologous markers, segments of genomes,
that can be found in several species.

The simplest possible model is:

The order of genes in each genome is known,

All the genomes share the same set of genes,

All genomes contain a single copy of each gene, and

All genomes consist of a single chromosome.
10
Genome Models

Genomes can be modeled by permutations:
each gene
can be assigned a unique number and is exactly found
once in the genome.
Signed
Permutation: Each gene may be assigned + or sign to indicate the strand it resides on.
Unsigned
Permutation: If the corresponding strand is
unknown.
11
Permutaions

Genes (markers) are represented by integers:
1, 2, . . . . , n,
with +,- sign to indicate the strand they lie on.

The order and orientation of genes of one genome in
relation to the other is represented by a signed
permutation .

 = ( 2 n-1 n) of size n over {-n, ... , -1, 1, ... ,
n}, such that for each i from 1 to n, either i or -i is
mandatory represented, but not both.
12
Permutaions
Identity permutation:

The identity permutation n = (1, 2, 3, . . . . , n).

When multiple genomes with the same gene content are
compared, one of them is chosen as a base (reference),
i.e, represented as n, and all other identical genes are
given the same integer values.
13
Permutaions
Sorted/unsorted permutation:

In order to sort a permutation this means that we
want to apply some operations on to change it to
 n.

If (1 = 2) We say that is sorted with respect to
.

If (1  2) We say that is unsorted with respect
to .
14
Permutaions
Example: Mitochondrial Genomes
of 6 Arthropoda
1= (1 , 2 , 3 , 4 , 5 , 6 , 7 , 8 , 9 , 10 , 11 , 12 , 13 , 14 , 15 , 16 , 17)
Fruit Fly
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
2= (1 , 2 , 3 , 4 , 5 , 6 , 8 , 7 , 9 ,-10 , 11 , 12 , 13 , 14 , 15 , 16 , 17)
Mosquito
1 2 3 4 5 6 8 7 9 -10 11 12 13 14 15 16 17
3= (1 , 2 , 3 , 4 , 5 , 6 , 7 , 8 , 9 , 10 , 11 , 12 , 14 , 13 , 15 , 16 , 17)
Silkworm
1 2 3 4 5 6 7 8 9 10 11 12 14 13 15 16 17
4= (1 , 2 , 3 , 5 , 4 , 6 , 7 , 8 , 9 , 10 , 11 , 12 , 13 , 14 , 15 , 16 , 17)
Locust
1 2 3 5 4 6 7 8 9 10 11 12 13 14 15 16 17
5= (1 , 3 , 4 , 5 , 6 , 7 , 8 , 9 , 10 , 11 , -2 , 12 , 13 , 14 , 15 , 16 , 17)
Tick
1 3 4 5 6 7 8 9 10 11 -2 12 13 14 15 16 17
6= (1 , 3 , 4 , 5 , 6 , 7 , 8 , 9 , 10 , 11 , -2 , 12 , 16 , 13 , 14 , 15 , 17)
Centipede 1 3 4 5
6 7 8 9 10 11 -2 12 16 13 14 15 17
15
Permutaions
Example: Mitochondrial Genomes
of 6 Arthropoda
1= (1 , 2 , 3 , 4 , 5 , 6 , 7 , 8 , 9 , 10 , 11 , 12 , 13 , 14 , 15 , 16 , 17)
Fruit Fly
2= (1 , 2 , 3 , 4 , 5 , 6 , 8 , 7 , 9 ,-10 , 11 , 12 , 13 , 14 , 15 , 16 , 17)
Mosquito
3= (1 , 2 , 3 , 4 , 5 , 6 , 7 , 8 , 9 , 10 , 11 , 12 , 14 , 13 , 15 , 16 , 17)
Silkworm
Locust
Tick
4= (1 , 2 , 3 , 5 , 4 , 6 , 7 , 8 , 9 , 10 , 11 , 12 , 13 , 14 , 15 , 16 , 17)
5= (1 , 3 , 4 , 5 , 6 , 7 , 8 , 9 , 10 , 11 , -2 , 12 , 13 , 14 , 15 , 16 , 17)
6= (1 , 3 , 4 , 5 , 6 , 7 , 8 , 9 , 10 , 11 , -2 , 12 , 16 , 13 , 14 , 15 , 17)
Centipede
16
Permutaions
Linear and circular permutation:

 is linear when it represents a linear chromosome, or
circular when it represents a circular chromosome.

When  = ( 2 n-1 n) is circular:
’ = (-n n-1 2 1)
all permutations obtained by shifts on or ’
shift( , i) = (n-i+1 n-i+2n-1 n1 n-i
are all equivalent.
Example: (-3,2,1,-4) & (-1,-2,3,4)
17
Permutaions
Points in permutations

For a given permutation  = ( 2 n-1 n), there
is a point between each pair of consecutive values i and
i+1 in .

If  is linear: there are two additional points, one before
 and one after n.

If  is circular: there is one additional point between
n and 1.

Pts() = n+1 if linear, and pts() = n if circular.
18
Permutaions
Linear extension of a permutation:

For a given  = ( 2 n-1 n)

If  is linear: a linear extension of  is
’= (0,  2 n-1 n, n+1)

If  is circular: a linear extension of  is
’= (0,  2 n-1 n-1, n)
19
Permutaions

Example:
 = (4,8,9,7,6,5,1,3,2)
’= (0,4,8,9,7,6,5,1,3,2,10)
’= (0.4.8.9.7.6.5.1.3.2.10)
Then Pts() = 10

Now: we want to compare our genomes.
20
Permutations - similarity/distance
Problem: Given two genomes, How do we
measure their similarity and/or distance?

A Related Problem: Given two permutations,
How do we measure their similarity and/or
distance?
21
Permutations - similarity/distance

A distance measure should be a metric on the set of
genomes.

A Metric d on a set S (d: S  S  R) satisfies the
following three axioms:
1.
2.
3.
Positivity: for all s, t in S, d(s,t)  0, and d(s,t)=0
iff s = t.
Symmetry: for all s, t in S, d(s,t) = d(t,s).
Triangular inequality: for all s, t, u in S,
d(s,u)  d(s,t) + d(t,u).
22
Permutations - similarity/distance

Measures of similarity between permutations that are
used in computational biology are numerous in literature.

First measures used are (will be useful later on):

Breakpoints (Introduced by Sankoff and Blanchette (1997))

Common intervals
23
Permutations-distance - Breakpoints

When analyze  with respect to , each point in  can be
an adjacency or a breakpoint.

A point (pair of consecutive values) (i, i+1) in  is an
adjacency between  and : when either (i, i+1) or (I+1, -i) are consecutive in .

If  is linear: we have adjacency before  if  is also the
first value in , and an adjacency after n, if n is also last
value in .

If  is circular: we assume that n is also last value in 
and (n, 1) is an adjacency if  is also the first value in
.
24
Permutations-distance - Breakpoints

Breakpoint distance counts the lost adjacencies
between genomes.

The breakpoint distance between  and  is:
brp() = pts() - adj()
where:
pts() is the number of points in .
adj() is the number of adjacencies.
•
If  is sorted ( = ):  has only adjacencies and no
breakpoints (brp() = 0).
•
If  is unsorted (  ):  has at least one breakpoint
(brp()  0).
25
Permutations-distance - Breakpoints

Back to our Example:
 = (4,8,9,7,6,5,1,3,2)
’= (0,4,8,9,7,6,5,1,3,2,10)
’= (0.4.8.9.7.6.5.1.3.2.10)
Then Pts() = 10, brp()?
Adjacencies?
n= (0.1.2.3.4.5.6.7.8.9.10)
(8,9) (7,6) (6,5) (3,2)  adj() = 4
 brp() = pts() - adj()
= 10 - 4 = 6
26
Permutations-distance - Breakpoints

Breakpoint distance is based on the notion of conserved
adjacencies and can be defined on a set of more than two
genomes.

It is easy to compute.

It always fails to capture more global relations between
genomes.

The first generalization of adjacencies is the notion of
common intervals.
27
Permutations-distance - Common Intervals

Common intervals: subsets of genes that appear
consecutively together in two or more genomes, where
genes are the same in each interval but may be not in the
same order or orientation.
Example (circular chromosomes)
1= (1
2= (1
3= (1
4= (1
5= (1
6= (1
,2
,2
,2
,2
,3
,3
,
,
,
,
,
,
3
3
3
3
4
4
,4
,4
,4
,5
,5
,5
,
,
,
,
,
,
5
5
5
4
6
6
,
,
,
,
,
,
6
6
6
6
7
7
,
,
,
,
,
,
7
8
7
7
8
8
,
,
,
,
,
,
8
7
8
8
9
9
, 9 , 10 , 11 , 12 , 13 , 14 , 15 , 16 , 17)
, 9 ,-10 , 11 , 12 , 13 , 14 , 15 , 16 , 17)
, 9 , 10 , 11 , 12 , 14 , 13 , 15 , 16 , 17)
, 9 , 10 , 11 , 12 , 13 , 14 , 15 , 16 , 17)
, 10 , 11 , -2 , 12 , 13 , 14 , 15 , 16 , 17)
, 10 , 11 , -2 , 12 , 16 , 13 , 14 , 15 , 17)
If compare the first 4 species: they share 6 adjacencies
{1,2}, {2,3},{11.12},{15,16},{16,17},{17,1}
If compare all 6 species: they share only 1 adjacency
{17,1}
28
Permutations-distance - Common Intervals

Common intervals: subsets of genes that appear
consecutively together in two or more genomes, where
genes are the same in each interval but may be not in the
same order or orientation.
Example (circular chromosomes)
1= (1
2= (1
3= (1
4= (1
5= (1
6= (1
,2
,2
,2
,2
,3
,3
,
,
,
,
,
,
3
3
3
3
4
4
,4
,4
,4
,5
,5
,5
,
,
,
,
,
,
5
5
5
4
6
6
,
,
,
,
,
,
6
6
6
6
7
7
,
,
,
,
,
,
7
8
7
7
8
8
,
,
,
,
,
,
8
7
8
8
9
9
, 9 , 10 , 11 , 12 , 13 , 14 , 15 , 16 , 17)
, 9 ,-10 , 11 , 12 , 13 , 14 , 15 , 16 , 17)
, 9 , 10 , 11 , 12 , 14 , 13 , 15 , 16 , 17)
, 9 , 10 , 11 , 12 , 13 , 14 , 15 , 16 , 17)
, 10 , 11 , -2 , 12 , 13 , 14 , 15 , 16 , 17)
, 10 , 11 , -2 , 12 , 16 , 13 , 14 , 15 , 17)
The six permutations are very similar.
The genes in the interval [1,12] are all the same, as genes in
the intervals [3,6], [6,9],[9,11], and [12,17].
29
Permutations-distance - Common Intervals

We can use common intervals as a measure of similarity
between species.
Disadvantage: All these measures do not reflect
rearrangement operations or explain what happened to
the genome over time.
30
Rearrangement operations (events)
Back to our original problem:
Given a set of genomes and a set of possible
evolutionary events (operations), find a shortest set
of events transforming those genomes into one
another.
What are the Rearrangement events (Operation)?
These events (Operation) could be applied to a single
gene or to a group of genes, intervals.
31
Rearrangement operations
Example: Mitochondrial Genomes
of 6 Arthropoda
Fruit Fly
1
2
3
4
5
6 7
8
9
10 11 12 13 14 15 16 17
Mosquito
Silkworm
Locust
Tick
Centipede
32
Rearrangement Operations
Rearrangement operations affect gene order
and gene content. There are various types:
In case of single-chromosome genome:
• Inversions
• Transpositions
• Reverse transpositions
• Gene Duplications
• Gene loss
In case of multiple-chromosomes genomes we add:
• Translocations
• fusions
• fissions
33
Rearrangement Operations - Single Chro.
Inversion
34
Rearrangement Operations - Single Chro.
Inversion
35
Rearrangement Operations - Single Chro.
Inversion
36
Rearrangement Operations - Single Chro.
Example: Mitochondrial Genomes
of 6 Arthropoda
An inversion.
Fruit Fly
Mosquito
Silkworm
Locust
Tick
Centipede
37
Rearrangement Operations - Single Chro.
Transposition
38
Rearrangement Operations - Single Chro.
Transposition
39
Rearrangement Operations - Single Chro.
Transposition
40
Rearrangement Operations - Single Chro.
Example: Mitochondrial Genomes
of 6 Arthropoda
Fruit Fly
Mosquito
Silkworm
A transposition
Locust
Tick
Centipede
41
Rearrangement Operations - Single Chro.
Reverse Transposition
42
Rearrangement Operations - Single Chro.
Reverse Transposition
43
Rearrangement Operations - Single Chro.
Reverse Transposition
44
Rearrangement Operations - Single Chro.
Example: Mitochondrial Genomes
of 6 Arthropoda
Fruit Fly
Mosquito
A reverse transposition
Silkworm
Locust
Tick
Centipede
45
Rearrangement Operations - Multiple Chro.
Translocation
46
Rearrangement Operations - Multiple Chro.
Translocation
47
Rearrangement Operations - Multiple Chro.
Translocation
48
Rearrangement Operations - Multiple Chro.
Translocation
49
Rearrangement Operations - Multiple Chro.
Translocation
50
Rearrangement Operations - Multiple Chro.
Translocation
51
Rearrangement Operations - Multiple Chro.
Fusion
Fission
52
Rearrangement Operations - Multiple Chro.
Fusion
Fission
53
Rearrangement Operations - Multiple Chro.
Fusion
Fission
54
Rearrangement Operations - Multiple Chro.
Fusion
Fission
55
Rearrangement Operations - Multiple Chro.
Fusion
Fission
56
Rearrangement Operations - Multiple Chro.
Fusion
Fission
57
Rearrangement Operations - Multiple Chro.
From 24
chromosomes
To 21
chromosomes
[Source: Linda Ashworth, LLNL]
58
DOE Human Genome Program Report
Rearrangement Problems
Back to our original problem:
Given a set of genomes and a set of possible
evolutionary events (operations), find a shortest set
of events transforming those genomes into one
another.
Any set of operations yields a distance between
genomes, by counting the minimum number of
operations needed to transform one genome
into the other.
59
Rearrangement Problems
Back to our original problem:
Given a set of genomes and a set of possible
evolutionary events (operations), find a shortest set
of events transforming those genomes into one
another.
Two classical problems
• Computing the distance d()
• Computing one optimal sorting sequence of
events.
60
Reversal Distance - Sorting by Reversals
•
Given a permutation , calculate reversal distance
d() and find one optimal sequence of reversals
sorting .
Assumption:

Only reversals are allowed.

No duplication in genes.

Genomes are unichromosomal.
61
Reversal Distance - Sorting by Reversals
A reversal is represented as a set of genes appearing together
in the given genome.
QuickTime™ and a
TIFF (LZW) decompressor
are needed to see this picture.
62
Reversal Distance - Sorting by Reversals
QuickTime™ and a
TIFF (LZW) decompressor
are needed to see this picture.
63
Reversal Distance - Sorting by Reversals
QuickTime™ and a
TIFF (LZW) decompressor
are needed to see this picture.
64
Reversal Distance - Sorting by Reversals
QuickTime™ and a
TIFF (LZW) decompressor
are needed to see this picture.
65
Reversal Distance - Sorting by Reversals
QuickTime™ and a
TIFF (LZW) decompressor
are needed to see this picture.
66
Reversal Distance - Sorting by Reversals
QuickTime™ and a
TIFF (LZW) decompressor
are needed to see this picture.
67
Reversal Distance - Sorting by Reversals
QuickTime™ and a
TIFF (LZW) decompressor
are needed to see this picture.
This approach is symmetric
68
Reversal Distance - Sorting by Reversals
Reversal graph for n = 3
Vertices: all permutations of n = 3.
Edges: connect an edge between 1 and 2 if reversal
distance d(1, 2) = 1.
69
Reversal Distance - Sorting by Reversals
Reversal graph for n = 3

Reversal distance d(i, k) = length of shortest path
between vi and vk.
70
Reversal Distance - Sorting by Reversals
Reversal graph for n = 3


The graph is huge |V| = n!.2n
A feasible graph-search algorithm is not possible!
71
Reversal Distance - Sorting by Reversals

The classical approach for solving these two problems
in polynomial time was developed by Hannenhalli and
Pevzner. (1995)

The reversal distance can be computed in O(n) time
by Bader et. al. (2000)

The fastest algorithm to find an optimal sorting
sequence is < O(n2) by Tannier et. al. (2007)

Most approaches are based on a special structure
called the breakpoint graph.
72
Reversal Distance - Sorting by Reversals

Breakpoint Graph: edges are black or gray.

Given  = (n-1n)

If  is linear: we add the values 0, and n+1, the
represents the extremities of the chromosome obtaining:
 = (0, n-1n, n+1)

If  is circular: assume n = n and add only the value 0,
obtaining:
 = (0, n-1n-1, n)
73
Reversal Distance - Sorting by Reversals

Black edge: Links each pair of consecutive value in  by a
horizontal (a point in ).

Gray edges: Link the extremities of black edges such that
the values will be in order.

Graph: collection of cycles, where black and gray edges
alternate.

Trivial cycle: one black and one gray edge (adjacency)

Long Cycle: four or more edges ( 2 breakpoints)
74
Reversal Distance - Sorting by Reversals
0
+5
QuickTime™ and a
TIFF (LZW) decompressor
are needed to see this picture.
When sorted
QuickTime™ and a
TIFF (LZW) decompressor
are needed to see this picture.
75
Reversal Distance - Sorting by Reversals
0
+5
QuickTime™ and a
TIFF (LZW) decompressor
are needed to see this picture.
When sorted
QuickTime™ and a
TIFF (LZW) decompressor
are needed to see this picture.
76
Reversal Distance - Sorting by Reversals
0
+5
QuickTime™ and a
TIFF (LZW) decompressor
are needed to see this picture.
When sorted
QuickTime™ and a
TIFF (LZW) decompressor
are needed to see this picture.
77
Reversal Distance - Sorting by Reversals
0
+5
QuickTime™ and a
TIFF (LZW) decompressor
are needed to see this picture.
When sorted
QuickTime™ and a
TIFF (LZW) decompressor
are needed to see this picture.
78
Reversal Distance - Sorting by Reversals
0
+5
QuickTime™ and a
TIFF (LZW) decompressor
are needed to see this picture.
When sorted
QuickTime™ and a
TIFF (LZW) decompressor
are needed to see this picture.
79
Reversal Distance - Sorting by Reversals
0
+5
QuickTime™ and a
TIFF (LZW) decompressor
are needed to see this picture.
When sorted
QuickTime™ and a
TIFF (LZW) decompressor
are needed to see this picture.
80
Reversal Distance - Sorting by Reversals
0
+5
QuickTime™ and a
TIFF (LZW) decompressor
are needed to see this picture.
When sorted
QuickTime™ and a
TIFF (LZW) decompressor
are needed to see this picture.
81
Reversal Distance - Sorting by Reversals
0
+5
QuickTime™ and a
TIFF (LZW) decompressor
are needed to see this picture.
When sorted
QuickTime™ and a
TIFF (LZW) decompressor
are needed to see this picture.
82
Reversal Distance - Sorting by Reversals
Linear
 = (-3 , 2 , 1 , -4)
QuickTime™ and a
TIFF (LZW) decompressor
are needed to see this picture.
Circular
QuickTime™ and a
TIFF (LZW) decompressor
are needed to see this picture.

Linear and circular permutations are different in
breakpoint graph construction.

Same analyses.
83
Reversal Distance - Sorting by Reversals
0
+5
QuickTime™ and a
TIFF (LZW) decompressor
are needed to see this picture.
When sorted
QuickTime™ and a
TIFF (LZW) decompressor
are needed to see this picture.
84
Reversal Distance - Sorting by Reversals
0
+5
QuickTime™ and a
TIFF (LZW) decompressor
are needed to see this picture.
When sorted
QuickTime™ and a
TIFF (LZW) decompressor
are needed to see this picture.
85
Reversal Distance - Sorting by Reversals
 = (-3 , 2 , 1 , -4)
QuickTime™ and a
TIFF (LZW) decompressor
are needed to see this picture.
sorted
QuickTime™ and a
TIFF (LZW) decompressor
are needed to see this picture.
If is sorted:
•
Only adjacencies, no breakpoints.
•
Breakpoint graph is a collection of trivial cycles.
•
# cycles in sorted graph cyc() = pts()
86
Reversal Distance - Sorting by Reversals
 = (-3 , 2 , 1 , -4)
QuickTime™ and a
TIFF (LZW) decompressor
are needed to see this picture.
sorted
QuickTime™ and a
TIFF (LZW) decompressor
are needed to see this picture.
If is unsorted:
•
At least one breakpoint, at least one long cycle.
•
# cycles cyc() is at most = pts() - 1
Observation: To sort a permutation , we would
like to increase the number of cycles in its
breakpoint graph.
87
Reversal Distance - Sorting by Reversals
The effects of a reversal  over a breakpoint graph .
Split reversal
Joint reversal
QuickTime™ and a
TIFF (LZW) decompressor
are needed to see this picture.
QuickTime™ and a
TIFF (LZW) decompressor
are needed to see this picture.
cyc(  cyc(
cyc(  cyc(
Neutral reversal
QuickTime™ and a
TIFF (LZW) decompressor
are needed to see this picture.
cyc(  cyc(
88
Reversal Distance - Sorting by Reversals
The effects of a reversal  over a breakpoint graph .
QuickTi me™ and a
T IFF (LZW ) decompressor
are needed to see t his pict ure.
Quic k Ti me™ and a
T IFF (LZW ) dec ompres s or
are needed to s ee t his pic t ure.
89
Reversal Distance - Sorting by Reversals
Observation: To sort , we must maximize the number
of split reversals in the sorting sequence s.
If s has only split reversals: what will be the reversal
distance d(? (Hint: in terms of pts() and cyc())
d(pts() - cyc()
Are we done?
90
Reversal Distance - Sorting by Reversals
A split reversal does not always exist.
For example, if all black edges in the graph have the
same direction.
In this case, we need to add some joint and/or neutral
reversals in the sorting sequence s.
d(pts() - cyc()
91
Reversal Distance - Sorting by Reversals
•
It is always possible to calculate the number of non-split
reversals in a sorting sequence.
•
It will be the number of non-split reversals to sort some
hard components in the graph with no orientation,
unoriented components.
•
Unoriented components can be a hurdle hrd( or more
hardly a fortress frt() in the breakpoint graph.
•
Hardles are very rare, and fortresses are even more rare
in permutations that represent real genomes.
•
In practice, split reversals are sufficient to sort the
permutation.
92
Reversal Distance - Sorting by Reversals
Can we choose any split reversal? only safe reversals.
Safe reversal: a split reversal not producing hurdles.
Unsafe reversal
Safe reversal
QuickTime™ and a
TIFF (LZW) decompressor
are needed to see this picture.
QuickTime™ and a
TIFF (LZW) decompressor
are needed to see this picture.
There is always a safe reversal for any oriented .
93
Reversal Distance - Sorting by Reversals
The final formula for the reversal distance d( is:
d(pts() - cyc() + hrd() + frt()
Where:
•
frt() = 1, if  is a fortress, and 0 otherwise.
•
pts() = n+1, if is linear, and n if is circular.
94
Reversal Distance - Sorting by Reversals
Algorithm: Get optimal sorting sequence s that sorts 
Input: A signed permutation .
Output: An optimal sequence of reversals sorting .
1.
2.
3.
4.
5.
6
7.
8.
9.
10.

12.
13.
14.
15.
6
17.
18.
Construct the breakpoint graph of .
S   [empty]
If frt() = 1 then
choose a reversal  to eliminate the fortress
  
ss.
[concatenate the reversal  to s]
End if
While there is hurdles in  do
choose a reversal  to eliminate the hurdle
  
ss.
[concatenate the reversal  to s]
End while
While  is not sorted do
choose a safe split reversal  to 
  
ss.
[concatenate the reversal  to s]
End while
return s
95
Reversal Distance - Sorting by Reversals
Algorithm: Get optimal sorting sequence s that sorts 
Input: A signed permutation .
Output: An optimal sequence of reversals sorting .
1.
2.
3.
4.
5.
6
7.
8.
9.
10.

12.
13.
14.
15.
6
17.
18.
Construct the breakpoint graph of .
S   [empty]
If frt() = 1 then
choose a reversal  to eliminate the fortress
  
ss.
[concatenate the reversal  to s]
End if
While there is hurdles in  do
choose a reversal  to eliminate the hurdle
  
ss.
[concatenate the reversal  to s]
End while
While  is not sorted do
choose a safe split reversal  to 
  
ss.
[concatenate the reversal  to s]
End while
return s
96
Reversal Distance - Sorting by Reversals
Algorithm: Get optimal sorting sequence s that sorts 
Input: A signed permutation .
Output: An optimal sequence of reversals sorting .
1.
2.
3.
4.
5.
6
7.
8.
9.
10.

12.
13.
14.
15.
6
17.
18.
Construct the breakpoint graph of .
S   [empty]
If frt() = 1 then
choose a reversal  to eliminate the fortress
  
ss.
[concatenate the reversal  to s]
End if
While there is hurdles in  do
choose a reversal  to eliminate the hurdle
  
ss.
[concatenate the reversal  to s]
End while
While  is not sorted do
choose a safe split reversal  to 
  
ss.
[concatenate the reversal  to s]
End while
return s
97
Reversal Distance - Sorting by Reversals
Algorithm: Get optimal sorting sequence s that sorts 
Input: A signed permutation .
Output: An optimal sequence of reversals sorting .
1.
2.
3.
4.
5.
6
7.
8.
9.
10.

12.
13.
14.
15.
6
17.
18.
Construct the breakpoint graph of .
S   [empty]
If frt() = 1 then
choose a reversal  to eliminate the fortress
  
ss.
[concatenate the reversal  to s]
End if
While there is hurdles in  do
choose a reversal  to eliminate the hurdle
  
ss.
[concatenate the reversal  to s]
End while
While  is not sorted do
choose a safe split reversal  to 
  
ss.
[concatenate the reversal  to s]
End while
return s
Complexity O(n5)
Tools: GRIMM & GRAPPA
98
Reversal Distance - Sorting by Reversals
QuickTime™ and a
TIFF (LZW) decompressor
are needed to see this picture.
We can have more than one optimal solution
99
conclusions
•
Represented linear and circular genomes as
permutations in our simple model.
•
Described first measures for similarity between
permutation were breakpoint and common intervals -->
has no biological interpretation.
•
Used genome rearrangement events to describe
similarity/distances between genomes --> has more
biological meaning.
•
Described in details one distance measure (reversal
distance) and events (reversals) to sort genomes.
10
0
Thank you
Questions?
Next Lecture?
10
1