Transcript Abstract

A rapid algorithm for generating
minimal pathway distances:
Pathway distance correlates with genome
distance but not enzyme function
Stuart Rison1*, Evangelos Simeonidis2, Janet Thornton1,3,
David Bogle2, Lazaros Papageorgiou2#
Department of Biochemistry and Molecular Biology and
2 Department of Chemical Engineering,
University College London, London, WC1E 6BT, UK
3 Department of Crystallography,
Birkbeck College, Malet Street, London, WC1E 7HX, UK
1
Corresponding author (biology): [email protected]
# Corresponding author (algorithm): [email protected]
*
Outline
•
•
•
•
What is pathway distance?
Why calculate pathway distance?
Original method
Novel method - mathematical
programming
• Application:
– Genomic distance
– Enzyme function
Glycolysis
+
TCA
This step is
reversible
Each metabolic transition represents a pathway distance unit (step)
Pathway Distance
Pathway distance considers distance between
metabolic enzymes
Should take into account:
• directionality
• circularity
The pathway distance between GapA and GltA is 7 steps
The shortest pathway distance between GltA and Mdh is 8 steps
(considering directionality) or 2 steps (without directionality)
This step is irreversible
(pathway from EcoCyc: http://ecocyc.pangeasystems.com/)
Pathway Distance
• Reverses the “usual” pathway
representation (substrates as nodes,
enzymes as edges)
• Pathway distance is inclusive; the
source enzyme has a distance of 1 step
Why calculate pathway distance?
• Metabolic pathways are complex networks of
interaction enzymes, substrates and cofactors
• Relatively well characterised for certain
organisms (e.g. E. coli )
• Much work done on modelling metabolism
but now also much interest in pathways as an
indicator of “connectivity” between genes
• Pathway distance (Dp) is an extension of this
connectivity
Original Method
• Represent pathways as directed acyclic
graphs
• Use arbitrary direction for pathways
• “Snip” open any cycle
• Perform DFT of resulting graphs
• Collect set of genes at distances
2,3,…,n along resulting traversals
Glycolysis
+
TCA
Original Method
Original EcoCyc pathways include:
• Directionality
• Cycles
gltA
Dictate directionality:
• Arbitrarily set direction (top to bottom,
clockwise)
“Snip” cycles
mdh
(pathway from EcoCyc: http://ecocyc.pangeasystems.com/)
Pathway Distance Algorithm
• For each metabolic pathway
– For each enzyme in the pathway
• Find the minimal distances from the source enzyme to all
other enzymes by solving linear programming problems
of the type:
Maximise Summation_of_Enzyme_Distances
subject to
Enzyme_Connectivity_Constraints
• Post processing “calculations” are integrated
in the algorithm (e.g. genome distance or
enzyme function conservation)
Algorithm - objective function and constraints
For each node i* (source)
Maximise
SD
i
i
i
subject to:
Dj  Di + 1,
0  Di  T,
Di* = 1
 (i,j): Lij = 1
i
SETS
– i,j: nodes
PARAMETERS
– Lij:1 if there is a link from i to j, 0 otherwise
– T: large number
CONTINUOUS VARIABLES
– Di: Distance of node i from source node
j
Algorithm - Inequalities
i*  A
Max DA+DB+DC+DD
A
s.t.
B
C
D
DA = 1
DA  DB+1
DB  DA+1
DC  DB+1
DC  DD+1
DD  DC+1
DD  DB+1
1 A
2 B
C 3
3 D
Key Features of Algorithm
• Hierarchical solution procedure
• Based on linear programming
techniques
• Using an enzyme-node network
representation
Advantages of Algorithm
• Efficiency in tackling
– pathway circularity
– reaction directionality
• Modest computational times
• Implementation within GAMS software
system
Metabolic pathways
• We encoded 68 E. coli small molecule
metabolism (SMM) pathways, these
pathways were derived from EcoCyc
• This represents a set of 594 enzymes
• Pathway distances ranged from 2 to 15
Pathway Distance and
Genome Distance
• Calculate minimal pathway distances for
all gene pairs in each pathway
• For the same pairs, calculate the base
pair separation of the genes encoding
the enzymes in the E. coli genome (Dg)
• Plot percentage of gene pairs within a
certain genome distance against
pathway distance
Shorter genomic distances are more likely at smaller pathway
distances
20.00%
Cumulative percentages
18.00%
16.00%
14.00%
12.00%
10.00%
8.00%
6.00%
4.00%
2.00%
0.00%
2
3
4
5
6
7
8
9
10
11
12
13
14
Pathw ay Distance
<100bp
<1000bp
<10000bp
<100000bp
15
Genome Distance - Conclusions
• Strong correlation between Dp and Dg
• Genes with small Dp tend to have
shorter Dg
• Genes involved in nearby metabolic
reactions are genomically clustered
Pathway Distance and Function
• Calculate minimal pathway distances for
all gene pairs in each pathway
• Compare the EC numbers assigned to
the genes in each pair
e.g. G-3-P dehydrogenase
12. enzyme
1.2.1.12 specific
1. NAD/NADP as
1. oxidoreductase
acceptor
2. acts on aldehyde
or oxo group
1.2.1.12
1.2.1.20
L3 cons
1.2.1.12
2.2.1.20
No cons
Percentage of pairs at
pathway distance
Pathway distance and EC number conservation
100.00
90.00
80.00
70.00
60.00
50.00
40.00
30.00
20.00
10.00
0.00
0
2
4
6
8
10
12
14
Pathway Distance
None
Level 1
Levels 1+2
Level 1+2+3
All levels
16
Function - Conclusions
• No observable correlation between
pathway distance and function (as
represented by EC number)
• Enzymatic chemistries are varied along
the conversion from one substrate to
the next and aren’t performed in
‘blocks’ of similar catalysis
Conclusions - Algorithm
• We have an effective, correct and rapid
algorithm to calculate metabolic
distance
• The Dp metric can be usefully used as a
measure protein functional relation
Conclusions - Biology
• As expect pathway distance correlates
with genome distance
• Pathway distance does not correlate
with function as determined by EC
number
Acknowledgements
• Sarah Teichmann, University College
London
• Peter Karp, SRI international, Melno
Park, CA
• Monica Riley, Alida PellegriniToole, Marine Biological Laboratory,
Woods Hole, MA
A rapid algorithm for generating
minimal pathway distances:
Pathway distance correlates with genome
distance but not enzyme function
Stuart Rison1*, Evangelos Simeonidis2, Janet Thornton1,3,
David Bogle2, Lazaros Papageorgiou2#
Department of Biochemistry and Molecular Biology and
2 Department of Chemical Engineering,
University College London, London, WC1E 6BT, UK
3 Department of Crystallography,
Birkbeck College, Malet Street, London, WC1E 7HX, UK
1
Corresponding author (biology): [email protected]
# Corresponding author (algorithm): [email protected]
*
All distances
Cumulative percentages
100%
80%
60%
40%
20%
0%
2
3
4
5
6
7
8
9
10 11 12 13 14 15
Pathw ay distance
100
1000
10000
100000
1000000
10000000
Cumulative percentage of
pairs
Pathway distance and EC number conservation
100%
90%
80%
70%
60%
50%
40%
30%
20%
10%
0%
2
3
4
5
6
7
8
9
10
11
12
13
14
Pathway distance
None
Level 1
Levels 1+2
Levels 1+2+3
All levels
15
• i*  A
A
B
C
D
F
E
DA = 1
DA  DB+1
DB  DA+1
DC  DB+1
DC  DD+1
DD  DC+1
DD  DB+1
DE  DD+1
DE  DF+1
DF  DC+1
DF  DE+1
1 A
2 B
C 3
3 D
F
4 E
4