ppt - Chair of Computational Biology

V21 Metabolic Pathway Analysis (MPA)
Metabolic Pathway Analysis searches for meaningful structural and functional units
in metabolic networks. The most promising approaches, which are very similar to each
other, are based on convex analysis and use the sets of elementary flux modes
(Schuster et al. 1999, 2000) and extreme pathways (Schilling et al. 2000).
Both sets span the space of feasible steady-state flux distributions by non-decomposable
routes, i.e. no proper subset of the reactions involved in an EFM or EP can hold
the network balanced using non-trivial fluxes.
MPA can be used to study e.g.
- routing + flexibility/redundancy of networks
- functionality of networks
- identification of futile cycles
- all (sub)optimal pathways with respect to product/biomass yield
- calculability in metabolic flux analysis (MFA)
Klamt et al. Bioinformatics 19, 261 (2003)
21. Lecture WS 2004/05
Bioinformatics III
Metabolic Pathway Analysis: Elementary Flux Modes
The technique of Elementary Flux Modes (EFM) was developed prior to extreme
pathways (EP) by Stephan Schuster, Thomas Dandekar and co-workers:
Pfeiffer et al. Bioinformatics, 15, 251 (1999)
Schuster et al. Nature Biotech. 18, 326 (2000)
The method is very similar to the "extreme pathway" method: both construct a basis
for metabolic flux states using methods from convex analysis.
Extreme pathways are a subset of elementary modes, and for many systems, both
methods coincide.
Are the subtle differences important?
Elementary Flux Modes
Start from list of reaction equations and a declaration of reversible and irreversible
reactions and of internal and external metabolites.
E.g. the reaction scheme of monosaccharide metabolism (Fig. 1). It includes 15 internal
metabolites and 19 reactions.
→ S has dimension 15 × 19.
It is convenient to reduce this matrix by lumping those reactions that
necessarily operate together:
- {Gap,Pgk,Gpm,Eno,Pyk}
- {Zwf,Pgl,Gnd}
Such groups of enzymes can be detected automatically.
This reveals another two sequences, {Fba,TpiA} and {2 Rpe,TktI,Tal,TktII}.
Schuster et al. Nature Biotech 18, 326 (2000)
Elementary Flux Modes
Lumping the reactions in any one sequence gives the following reduced system:
Construct the initial tableau T(0) by combining S with the identity matrix.
[Tableau T(0): one row per (lumped) reaction, grouped into a reversible and an irreversible part. Row labels: Pgi, {Fba,TpiA}, Rpi, {2Rpe,TktI,Tal,TktII}, {Gap,Pgk,Gpm,Eno,Pyk}, {Zwf,Pgl,Gnd}, Pfk, Fbp, Prs_DeoB. Metabolite labels: FP2, F6P, GAP, R5P, Ru5P. Numeric entries omitted.]
Schuster et al. Nature Biotech 18, 326 (2000)
Elementary Flux Modes
Aim again: bring all entries of the right part of the tableau to 0.
[Tableaux T(0) and T(1); numeric entries omitted.]
E.g. 2*row3 - row4 gives a "reversible" row with 0 in column 10.
New "irreversible" rows with a 0 entry in column 10 are obtained by row3 + row6 and
by row4 + row7.
In general, linear combinations of 2 rows corresponding to the same type of
directionality go into the part of the respective type in the tableau. Combinations
of rows of different types go into the "irreversible" part of the tableau because at
least 1 reaction is irreversible. Irreversible reactions can only be combined using
positive coefficients.
Schuster et al. Nature Biotech 18, 326 (2000)
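The row-combination rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation of Schuster et al.: the helper `combine`, the representation of a row as a pair (entries, is_reversible) and the toy numbers are all hypothetical.

```python
def combine(row_a, row_b, col):
    """Combine two tableau rows so that the entry in column `col` becomes 0.

    Each row is a pair (entries, is_reversible). Returns the new row, or
    None if the directionality rules do not allow any combination.
    """
    (va, rev_a), (vb, rev_b) = row_a, row_b
    x, y = va[col], vb[col]
    if x == 0 or y == 0:
        return None  # nothing to eliminate in this column
    a, b = abs(y), abs(x)
    if x * y > 0:
        # Same sign: one coefficient has to be negative, which is only
        # allowed for a row from the reversible part of the tableau.
        if rev_b:
            b = -b
        elif rev_a:
            a = -a
        else:
            return None
    entries = [a * p + b * q for p, q in zip(va, vb)]
    # The new row is reversible only if both parent rows are reversible.
    return entries, rev_a and rev_b


# Hypothetical toy rows: two "reaction" columns followed by one metabolite column.
rev_rev = combine(([1, 0, 2], True), ([0, 1, 1], True), 2)
rev_irr = combine(([1, 0, 2], True), ([0, 1, 1], False), 2)
irr_irr_same = combine(([1, 0, 1], False), ([0, 1, 1], False), 2)
irr_irr_opp = combine(([1, 0, 1], False), ([0, 1, -1], False), 2)
```

In the mixed case the negative coefficient is placed on the reversible row, so the irreversible reaction is still used with a positive coefficient and the result is stored in the irreversible part.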
Elementary Flux Modes
Aim: zero column 11.
Include all possible (direction-wise allowed) linear combinations of rows.
[Tableaux T(1) and T(2); numeric entries omitted.]
Continue with columns 12 to 14.
Schuster et al. Nature Biotech 18, 326 (2000)
Elementary Flux Modes
In the course of the algorithm, one must avoid
- calculation of non-elementary modes (rows that contain fewer zeros than a row
already present)
- duplicate modes (a pair of rows is only combined if it fulfills the condition
$S(m_i^{(j)}) \cap S(m_k^{(j)}) \not\subseteq S(m_l^{(j+1)})$, where $S(m_l^{(j+1)})$ is the set of positions of 0 in this row)
- flux modes violating the sign restriction for the irreversible reactions.
[Final tableau T(5); numeric entries omitted.]
This shows that the number of rows may decrease or increase in the course of the
algorithm. All constructed elementary modes are irreversible.
Schuster et al. Nature Biotech 18, 326 (2000)
Elementary Flux Modes
Graphical representation of the elementary flux modes of the monosaccharide
metabolism. The numbers indicate the relative flux carried by the enzymes.
Fig. 2
Schuster et al. Nature Biotech 18, 326 (2000)
Two approaches for Metabolic Pathway Analysis?
The pathway P(e) is an elementary flux mode if it fulfills conditions C1 – C3.
(C1) Pseudo steady-state: S · e = 0. This ensures that none of the metabolites is
consumed or produced in the overall stoichiometry.
(C2) Feasibility: rate e_i ≥ 0 if reaction i is irreversible. This demands that only
thermodynamically realizable fluxes are contained in e.
(C3) Non-decomposability: there is no vector v (unequal to the zero vector and to
e) fulfilling C1 and C2 such that P(v) is a proper subset of P(e). This is the core
characteristic of EFMs and EPs and supplies the decomposition of the network
into smallest units (able to hold the network in steady state).
C3 is often called "genetic independence" because it implies that the enzymes in
one EFM or EP are not a subset of the enzymes from another EFM or EP.
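Conditions C1 to C3 can be checked mechanically for a given flux vector. The following is a minimal sketch on a hypothetical toy network (two internal metabolites A and B, four irreversible reactions); the function names are illustrative, and C3 is only tested against an explicit list of candidate vectors rather than against all possible fluxes.

```python
def is_steady_state(S, e):
    """C1: S . e = 0 for every internal metabolite (row of S)."""
    return all(sum(s * x for s, x in zip(row, e)) == 0 for row in S)


def is_feasible(e, reversible):
    """C2: irreversible reactions must carry a non-negative rate."""
    return all(rev or x >= 0 for x, rev in zip(e, reversible))


def support(e):
    """P(e): indices of the reactions that carry flux."""
    return frozenset(i for i, x in enumerate(e) if x != 0)


def is_nondecomposable(e, candidates):
    """C3, restricted to `candidates`: no non-zero candidate uses a proper
    subset of the reactions used by e."""
    return not any(support(v) < support(e) for v in candidates if any(v))


# Hypothetical network: R1: -> A, R2: A -> B, R3: A -> B (parallel route), R4: B ->
S = [[1, -1, -1, 0],   # balance of A
     [0, 1, 1, -1]]    # balance of B
reversible = [False, False, False, False]

v1 = [1, 1, 0, 1]      # route via R2
v2 = [1, 0, 1, 1]      # route via R3
v3 = [2, 1, 1, 2]      # v1 + v2: balanced and feasible, but decomposable
modes = [v1, v2, v3]
```

Here `v3` passes C1 and C2 but fails C3, because `v1` already balances the network with a proper subset of its reactions.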
Klamt & Stelling Trends Biotech 21, 64 (2003)
Two approaches for Metabolic Pathway Analysis?
The pathway P(e) is an extreme pathway if it fulfills conditions C1 – C3 AND
conditions C4 – C5.
(C4) Network reconfiguration: Each reaction must be classified either as exchange
flux or as internal reaction. All reversible internal reactions must be split up into
two separate, irreversible reactions (forward and backward reaction).
(C5) Systemic independence: the set of EPs in a network is the minimal set of
EFMs that can describe all feasible steady-state flux distributions.
Klamt & Stelling Trends Biotech 21, 64 (2003)
Two approaches for Metabolic Pathway Analysis?
[Figure: example network N1 with external metabolites A(ext), B(ext), C(ext), metabolites A, B, C, D, P and reactions R1 to R9.]
Klamt & Stelling Trends Biotech 21, 64 (2003)
Reconfigured Network
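[Figure: reconfigured network with external metabolites A(ext), B(ext), C(ext), metabolites A, B, C, D, P and reactions R1 to R9; the reversible reaction R7 is split into R7f and R7b.]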
3 EFMs are not systemically independent:
EFM1 = EP4 + EP5
EFM2 = EP3 + EP5
EFM4 = EP2 + EP3
Klamt & Stelling Trends Biotech 21, 64 (2003)
Property 1 of EFMs
The only difference in the set of EFMs emerging upon reconfiguration consists in
the two-cycles that result from splitting up reversible reactions. However, two-cycles
are not considered as meaningful pathways.
Valid for any network: Property 1
Reconfiguring a network by splitting up reversible reactions leads to the same set of
meaningful EFMs.
Klamt & Stelling Trends Biotech 21, 64 (2003)
Software: FluxAnalyzer
What is the consequence when all exchange fluxes (and hence all
reactions in the network) are irreversible?
EFMs and EPs always coincide!
Klamt & Stelling Trends Biotech 21, 64 (2003)
Property 2 of EFMs
Property 2
If all exchange reactions in a network are irreversible then the sets of meaningful
EFMs (both in the original and in the reconfigured network) and EPs coincide.
Klamt & Stelling Trends Biotech 21, 64 (2003)
Comparison of EFMs and EPs
Problem: Recognition of operational modes: routes for converting exclusively A to P.
EFM (network N1): 4 genetically independent routes (EFM1-EFM4).
EP (network N2): The set of EPs does not contain all genetically independent routes. Searching for EPs leading from A to P via B, no pathway would be found.
Klamt & Stelling Trends Biotech 21, 64 (2003)
Comparison of EFMs and EPs
Problem: Finding all the optimal routes: optimal pathways for synthesizing P during growth on A alone.
EFM (network N1): EFM1 and EFM2 are optimal because they yield one mole P per mole substrate A (i.e. R3/R1 = 1), whereas EFM3 and EFM4 are only suboptimal (R3/R1 = 0.5).
EP (network N2): One would only find the suboptimal EP1, not the optimal routes EFM1 and EFM2.
Klamt & Stelling Trends Biotech 21, 64 (2003)
Comparison of EFMs and EPs
Problem: Analysis of network flexibility (structural robustness, redundancy): relative robustness of exclusive growth on A or B.
EFM (network N1): 4 pathways convert A to P (EFM1-EFM4), whereas for B only one route (EFM8) exists. When one of the internal reactions (R4-R9) fails, 2 pathways for production of P from A will always "survive". By contrast, removing reaction R8 already stops the production of P from B alone.
EP (network N2): Only 1 EP exists for producing P from substrate A alone, and 1 EP for synthesizing P from (only) substrate B. One might suggest that both substrates possess the same redundancy of pathways, but as shown by EFM analysis, growth on substrate A is much more flexible than on B.
Klamt & Stelling Trends Biotech 21, 64 (2003)
Comparison of EFMs and EPs
Problem: Relative importance of single reactions: relative importance of reaction R8.
EFM (network N1): R8 is essential for producing P from substrate B, whereas for A there is no structurally "favored" reaction (R4-R9 all occur twice in EFM1-EFM4). However, considering the optimal modes EFM1, EFM2, one recognizes the importance of R8 also for growth on A.
EP (network N2): Consider again biosynthesis of P from substrate A (EP1 only). Because R8 is not involved in EP1 one might think that this reaction is not important for synthesizing P from A. However, without this reaction, it is impossible to obtain optimal yields (1 P per A; EFM1 and EFM2).
Klamt & Stelling Trends Biotech 21, 64 (2003)
Comparison of EFMs and EPs
Problem: Enzyme subsets and excluding reaction pairs: suggest regulatory structures or rules.
EFM (network N1): R6 and R9 are an enzyme subset. By contrast, R6 and R9 never occur together with R8 in an EFM. Thus (R6,R8) and (R8,R9) are excluding reaction pairs. (In an arbitrary composable steady-state flux distribution they might occur together.)
EP (network N2): The EPs suggest that R4 and R8 are an excluding reaction pair, but they are not (EFM2). The enzyme subsets would be correctly identified. However, one can construct simple examples where the EPs would also suggest wrong enzyme subsets (not shown).
Klamt & Stelling Trends Biotech 21, 64 (2003)
Comparison of EFMs and EPs
Problem: Pathway length: shortest/longest pathway for production of P from A.
EFM (network N1): The shortest pathway from A to P needs 2 internal reactions (EFM2), the longest 4 (EFM4).
EP (network N2): Both the shortest (EFM2) and the longest (EFM4) pathway from A to P are not contained in the set of EPs.
Klamt & Stelling Trends Biotech 21, 64 (2003)
Comparison of EFMs and EPs
Problem: Removing a reaction and mutation studies: effect of deleting R7.
EFM (network N1): All EFMs not involving the specific reactions build up the complete set of EFMs in the new (smaller) subnetwork. If R7 is deleted, EFMs 2,3,6,8 "survive". Hence the mutant is viable.
EP (network N2): Analyzing a subnetwork implies that the EPs must be newly computed. E.g. when deleting R2, EFM2 would become an EP. For this reason, mutation studies cannot be performed easily.
Klamt & Stelling Trends Biotech 21, 64 (2003)
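The deletion rule for EFMs amounts to a simple filter: a mode survives if it carries no flux through any deleted reaction. A minimal sketch follows; the mode names, reaction names and flux values are hypothetical and are not the EFMs of network N1.

```python
def surviving_modes(efms, deleted):
    """Return the modes that do not use any of the deleted reactions.

    `efms` maps a mode name to a dict {reaction: relative flux};
    reactions missing from the dict carry zero flux.
    """
    return {
        name: fluxes
        for name, fluxes in efms.items()
        if all(fluxes.get(r, 0) == 0 for r in deleted)
    }


def is_viable(efms, deleted, product_reaction):
    """A mutant is viable (in this sketch) if a surviving mode still
    drives the product-forming reaction."""
    return any(
        fluxes.get(product_reaction, 0) > 0
        for fluxes in surviving_modes(efms, deleted).values()
    )


# Hypothetical modes of a toy network; "Rout" forms the product.
toy_efms = {
    "M1": {"Rin": 1, "Ra": 1, "Rout": 1},
    "M2": {"Rin": 1, "Rb": 1, "Rc": 1, "Rout": 1},
    "M3": {"Rb": 2, "Rd": 1},
}
after_deleting_Rb = surviving_modes(toy_efms, {"Rb"})
```

No recomputation is needed because the surviving modes are already the complete set of EFMs of the smaller network; as the comparison states, the same shortcut does not hold for EPs.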
Comparison of EFMs and EPs
Problem: Constraining reaction reversibility: effect of R7 limited to B → C.
EFM (network N1): For the case of R7, all EFMs but EFM1 and EFM7 "survive" because the latter ones utilize R7 with negative rate.
EP (network N2): In general, the set of EPs must be recalculated: compare the EPs in network N2 (R2 reversible) and N4 (R2 irreversible).
Klamt & Stelling Trends Biotech 21, 64 (2003)
Software: FluxAnalyzer
FluxAnalyzer has both EPs and EFMs implemented.
It allows convenient studies of metabolic systems.
Klamt et al. Bioinformatics 19, 261 (2003)
Summary
EFMs are a robust method that offers great opportunities for studying functional and
structural properties in metabolic networks.
Klamt & Stelling suggest that the term "elementary flux modes" should be used
whenever the sets of EFMs and EPs are identical.
In cases where they are not identical, EPs are a subset of EFMs.
It remains to be understood more thoroughly how much valuable information about
the pathway structure is lost by using EPs.
Ongoing Challenges:
- study really large metabolic systems by subdividing them
- combine metabolic model with model of cellular regulation.
Klamt & Stelling Trends Biotech 21, 64 (2003)
Integrated Analysis of Metabolic and Regulatory Networks
So far, studies of large-scale cellular networks have focused on their connectivities.
The emerging picture shows a densely woven web where almost everything is
connected to everything.
In the cell's metabolic network, hundreds of substrates are interconnected through
biochemical reactions.
Although this could in principle lead to the simultaneous flow of substrates in
numerous directions, in practice metabolic fluxes pass through specific pathways
(→ high-flux backbone, V20).
Topological studies so far did not consider how the modulation of this connectivity
might also determine network properties.
Therefore it is important to correlate the network topology (picture derived from
EFMs and EPs) with the expression of enzymes in the cell.
Analyze transcriptional control in metabolic networks
Regulatory and metabolic functions of cells are mediated by networks of interacting
biochemical components.
Metabolic flux is optimized to maximize metabolic efficiency under different
conditions.
Control of metabolic flow:
- allosteric interactions
- covalent modifications involving enzymatic activity
- transcription (revealed by genome-wide expression studies)
Here: N. Barkai and colleagues analyzed published experimental expression data of
Saccharomyces cerevisiae.
Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)
Recurrence signature algorithm
Availability of DNA microarray data → study the transcriptional response of a complete
genome to different experimental conditions.
An essential task in studying the global structure of transcriptional networks is
gene classification.
Commonly used clustering algorithms classify genes successfully when applied to
relatively small data sets, but their application to large-scale expression data is
limited by 2 well-recognized drawbacks:
- commonly used algorithms assign each gene to a single cluster, whereas in fact
genes may participate in several functions and should thus be included in several
clusters
- these algorithms classify genes on the basis of their expression under all
experimental conditions, whereas cellular processes are generally affected only by
a small subset of these conditions.
Ihmels et al. Nat Genetics 31, 370 (2002)
Recurrence signature algorithm
Aim: identify transcription "modules" (TMs).
→ A set of randomly selected genes is unlikely to be identical to the genes of any
TM. Yet many such sets do have some overlap with a specific TM.
In particular, sets of genes that are compiled according to existing knowledge of
their functional (or regulatory) sequence similarity may have a significant overlap
with a transcription module.
The algorithm receives a gene set that partially overlaps a TM and then provides the
complete module as output. Therefore this algorithm is referred to as the "signature
algorithm".
Ihmels et al. Nat Genetics 31, 370 (2002)
Recurrence signature algorithm
[Figure, panels a-c. Labels: normalization of data; identify modules; classify genes into modules.]
a, The signature algorithm.
b, Recurrence as a reliability measure. The signature algorithm is applied to distinct input
sets containing different subsets of the postulated transcription module. If the different input
sets give rise to the same module, it is considered reliable.
c, General application of the recurrent signature method.
Ihmels et al. Nat Genetics 31, 370 (2002)
Normalize expression matrices
Collect from literature expression dataset composed of over 1000 conditions,
including environmental stresses, profiles of deletion mutants and natural
processes such as cell cycle.
Element $E_{gc}$ of the gene expression matrix contains the log-expression change of
gene $g \in \{1, ..., N_G\}$ under the experimental condition $c \in \{1, ..., N_C\}$, where $N_G$ and
$N_C$ denote the total number of genes and conditions, respectively.
Introduce 2 normalized expression matrices $E^G_{gc}$ and $E^C_{gc}$ with zero mean and unit
variance with respect to genes and conditions, respectively:
$$\langle E^G_{gc} \rangle_{g \in G} = 0, \qquad \langle (E^G_{gc})^2 \rangle_{g \in G} = 1$$
$$\langle E^C_{gc} \rangle_{c \in C} = 0, \qquad \langle (E^C_{gc})^2 \rangle_{c \in C} = 1$$
where $\langle \ldots \rangle_x$ denotes the average with respect to x.
Ihmels et al. Nat Genetics 31, 370 (2002)
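The two normalizations can be sketched with the Python standard library. This is an illustration, not the authors' code: the matrix is stored as a list of rows (one row per gene, one column per condition), the function names are hypothetical, and a row or column with zero variance would raise a ZeroDivisionError.

```python
from statistics import fmean, pstdev


def zscore(values):
    """Shift and scale a sequence to zero mean and unit (population) variance."""
    mu, sd = fmean(values), pstdev(values)
    return [(x - mu) / sd for x in values]


def normalize_over_conditions(E):
    """E^C: every gene (row) has zero mean and unit variance over conditions."""
    return [zscore(row) for row in E]


def normalize_over_genes(E):
    """E^G: every condition (column) has zero mean and unit variance over genes."""
    columns = [zscore(col) for col in zip(*E)]
    return [list(row) for row in zip(*columns)]


# Hypothetical log-expression changes: 3 genes x 3 conditions.
E = [[1.0, 2.0, 3.0],
     [2.0, 0.0, 4.0],
     [0.0, 1.0, 5.0]]
EC = normalize_over_conditions(E)
EG = normalize_over_genes(E)
```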
Experiment signature SC
The input set consists of $N_I$ genes:
$$G_I = \{ g_1, \ldots, g_{N_I} \} \subset G$$
Score each experimental condition by the average expression change over the
genes of the input set. The condition score is:
$$s_c = \langle E^G_{gc} \rangle_{g \in G_I}$$
The experiment signature $S_C$ contains those conditions whose absolute score is
statistically significant:
$$S_C = \{ c \in C : | s_c - \langle s_c \rangle_{c \in C} | > t_C \sigma_C \}$$
Here use $t_C = 2.0$ as the condition threshold level and the standard deviation
expected for random fluctuations, $\sigma_C = 1 / \sqrt{N_I}$.
Ihmels et al. Nat Genetics 31, 370 (2002)
Gene Signature SG
In the next step, score all genes by the weighted average change in the expression
within the experiment signature. The gene score is:
$$s_g = \langle s_c E^C_{gc} \rangle_{c \in S_C}$$
The gene signature $S_G$ contains those genes whose absolute score is statistically
significant:
$$S_G = \{ g \in G : | s_g - \langle s_g \rangle_{g \in G} | > t_G \sigma_G \}$$
Here use $t_G = 3.0$ as the gene threshold level and the measured standard deviation
$\sigma_G$.
Ihmels et al. Nat Genetics 31, 370 (2002)
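The two scoring steps (experiment signature, then gene signature) can be sketched as one function. This is a minimal illustration under assumptions: matrices are lists of rows (genes) of columns (conditions), `signature` is a hypothetical name, the "measured" standard deviation is taken as the population standard deviation of the gene scores, and the toy data are invented. With only 12 genes no score can deviate by 3 standard deviations, so the example lowers the gene threshold to 1.5.

```python
from math import sqrt
from statistics import fmean, pstdev


def signature(EG, EC, input_genes, t_C=2.0, t_G=3.0):
    """Return (S_C, S_G): indices of significant conditions and genes."""
    n_genes, n_conds = len(EG), len(EG[0])

    # Condition score: average of E^G over the genes of the input set.
    s_c = [fmean([EG[g][c] for g in input_genes]) for c in range(n_conds)]
    sigma_C = 1.0 / sqrt(len(input_genes))  # expected for random fluctuations
    mean_sc = fmean(s_c)
    S_C = [c for c in range(n_conds) if abs(s_c[c] - mean_sc) > t_C * sigma_C]
    if not S_C:
        return [], []

    # Gene score: average of s_c * E^C over the conditions in S_C.
    s_g = [fmean([s_c[c] * EC[g][c] for c in S_C]) for g in range(n_genes)]
    mean_sg, sigma_G = fmean(s_g), pstdev(s_g)
    S_G = [g for g in range(n_genes) if abs(s_g[g] - mean_sg) > t_G * sigma_G]
    return S_C, S_G


# Invented toy data: genes 0-2 respond to condition 0, genes 3-11 do not.
toy = [[2.0, 0.0, 0.0, 0.0] for _ in range(3)] + [[0.0] * 4 for _ in range(9)]
# Only genes 0 and 1 are given as input; gene 2 is recovered by the scoring.
conds, genes = signature(toy, toy, input_genes=[0, 1], t_G=1.5)
```

The example passes the same toy matrix for both arguments only to keep it short; in the method described above, E^G and E^C are two differently normalized versions of the expression matrix.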
Fusion of signatures
Apply the signature algorithm to a reference input set $G_I^{ref}$ and to a set of input sets $\{G_I^{(i)}\}$
that are obtained from $G_I^{ref}$ (→ identify robust modules!)
Each set $G_I^{(i)}$ contains a fraction of the "wanted" genes and some unrelated
genes that were selected at random.
The result is a reference signature $S_{ref}$ and a collection of modified signatures $\{S_i\}$.
The overlap between any of these signatures and the reference signature is defined
as
$$OL_i^{ref} = \frac{|S_i \cap S_{ref}|}{\sqrt{|S_i| \cdot |S_{ref}|}}$$
where |...| refers to the size of a set and $\cap$ denotes intersection.
Ihmels et al. Nat Genetics 31, 370 (2002)
Fusion of signatures
All signatures $S_i$ whose overlap with the reference signature exceeds a certain
threshold are included in the set of recurrent signatures
$$R = \{ S_i : OL_i^{ref} > t_R \}$$
The threshold tR must be chosen to be large enough to discriminate against random
fluctuations, but small enough to include a significant fraction of signatures.
Here, tR = 70%.
A module is obtained by selecting only those genes that appear in at least 80% of
all signatures in R.
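The recurrence filter and the 80% rule can be sketched as follows. This is an illustration, not the authors' code: signatures are represented as Python sets of gene names, the function names are hypothetical, and the overlap is assumed to be normalized by the geometric mean of the two set sizes.

```python
from collections import Counter
from math import sqrt


def overlap(a, b):
    """Overlap of two signatures.

    ASSUMPTION: |a & b| / sqrt(|a| * |b|); the normalization is an assumption
    of this sketch.
    """
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def recurrent_module(reference, signatures, t_R=0.7, min_fraction=0.8):
    """Return (R, module): the recurrent signatures and the genes that
    appear in at least `min_fraction` of them."""
    R = [s for s in signatures if overlap(s, reference) > t_R]
    if not R:
        return R, set()
    counts = Counter(g for s in R for g in s)
    module = {g for g, n in counts.items() if n >= min_fraction * len(R)}
    return R, module


# Invented gene names, for illustration only.
reference = {"a", "b", "c", "d"}
perturbed = [
    {"a", "b", "c", "d"},
    {"a", "b", "c", "x"},
    {"a", "b", "c", "d", "y"},
    {"x", "y", "z"},
]
R, module = recurrent_module(reference, perturbed)
```

In this toy run the last signature is rejected, and gene "d" is dropped from the module because it appears in only 2 of the 3 recurrent signatures.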
Ihmels et al. Nat Genetics 31, 370 (2002)
Fusion of signatures
Generate modules from recurrent signatures:
To fuse pairs of recurrent signatures $\{S_i, S_j\}$ into transcription modules:
For each pair, compute the intersection $P_{ij} = S_i \cap S_j$ of genes appearing in both
signatures as well as the overlap
$$OL_{ij} = \frac{|P_{ij}|}{\sqrt{|S_i| \cdot |S_j|}}$$
Select the pair signature $P_{ref}$ with the largest associated overlap $OL_{ref}$ as the "seed"
of a new module.
Assign all pair signatures $P_{ij}$ whose overlap with $P_{ref}$ exceeds a certain fraction $t_R$
of $OL_{ref}$ to the set of recurrent signatures R:
$$R = \{ P_{ij} : OL(P_{ij}, P_{ref}) > t_R \, OL_{ref} \}$$
Ihmels et al. Nat Genetics 31, 370 (2002)
Fusion of signatures
Obtain gene content and scores of the associated module from R.
Remove the pairs that were assigned to R from the total „pool“ of pair signatures
{Pij}.
To avoid identification of more, less-coherent realizations of the same module,
remove also those pairs from R that would have been assigned to R for a
somewhat lower value of threshold tR unless they had a significant overlap (~75%)
with any other pair signature.
This process is iterated until all sets are assigned.
Ihmels et al. Nat Genetics 31, 370 (2002)
Numerical test
Apply algorithm to set of Ncore genes that
are known to be co-regulated.
Then add Nrand randomly selected genes.
The addition of many random genes leaves
the output of the signature algorithm
essentially unchanged.
In detail: A reference set of Ncore co-regulated genes was composed of genes encoding either ribosomal
proteins (dashed lines) or proteins involved in amino-acid biosynthesis (dashed/dotted line).
The recurrent signature method was applied to this set as follows. First, a collection of input sets was
derived by randomly adding genes to the reference set. Second, the signature algorithm was applied to
the reference set and to the derived sets; this generates a reference signature and a collection of
perturbed signatures, respectively. Last, the overlaps between the reference signature and the perturbed
signatures were calculated. Shown is the average overlap as a function of the number of genes added to
the reference set. The different lines correspond to different choices of Ncore, shown in parentheses.
Ihmels et al. Nat Genetics 31, 370 (2002)
Correlation between genes of the same metabolic pathway
Distribution of the average correlation
between genes assigned to the same
metabolic pathway in the KEGG database.
The distribution corresponding to random
assignment of genes to metabolic
pathways of the same size is shown for
comparison. Importantly, only genes
coding for enzymes were used in the
random control.
Interpretation: pairs of genes associated
with the same metabolic pathway show a
similar expression pattern.
However, typically only a subset of the
genes assigned to a given
pathway is coregulated.
Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)
Correlation between genes of the same metabolic pathway
Genes of the glycolysis pathway
(according to KEGG) were clustered
and ordered based on the correlation
in their expression profiles.
Shown here is the matrix of their
pair-wise correlations.
The cluster of highly correlated
genes (orange frame) corresponds
to genes that encode the central
glycolysis enzymes.
The linear arrangement of these
genes along the pathway is shown at
right.
Of the 46 genes assigned to the
glycolysis pathway in the KEGG
database, only 24 show a correlated
expression pattern.
In general, the coregulated genes
belong to the central pieces of
pathways.
Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)
Coexpressed enzymes often catalyze linear chain of reactions
Coregulation between enzymes
associated with central metabolic
pathways. Each branch
corresponds to several enzymes.
In the cases shown, only one of the
branches downstream of the
junction point is coregulated with
upstream genes.
Interpretation: coexpressed
enzymes are often arranged in a
linear order, corresponding to a
metabolic flow that is directed in a
particular direction.
Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)
Co-regulation at branch points
To examine more systematically whether coregulation enhances the linearity of
metabolic flow, analyze the coregulation of enzymes at metabolic branch-points.
Search KEGG for metabolic compounds that are involved in exactly 3 reactions.
Only consider reactions that exist in S. cerevisiae.
3-junctions can integrate metabolic flow (convergent junction)
or allow the flow to diverge in 2 directions (divergent junction).
In the cases where several reactions are catalyzed by the same enzymes, choose
one representative so that all junctions considered are composed of precisely 3
reactions catalyzed by distinct enzymes.
Each 3-junction is categorized according to the correlation pattern found between
enzymes catalyzing its branches. Correlation coefficients > 0.25 are considered
significant.
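Finding and typing 3-junctions can be sketched as follows. This is a simplified illustration, not the procedure applied to KEGG: reactions are given as a hypothetical dict of (substrates, products), every reaction is treated as irreversible, and the selection of one representative per enzyme and the correlation analysis are left out.

```python
def classify_three_junctions(reactions):
    """Map each metabolite that takes part in exactly 3 reactions to
    'divergent' (1 in, 2 out), 'convergent' (2 in, 1 out) or 'conflicting'."""
    producing, consuming = {}, {}
    for name, (substrates, products) in reactions.items():
        for m in substrates:
            consuming.setdefault(m, set()).add(name)
        for m in products:
            producing.setdefault(m, set()).add(name)

    junctions = {}
    for m in set(producing) | set(consuming):
        n_in = len(producing.get(m, ()))
        n_out = len(consuming.get(m, ()))
        if n_in + n_out != 3:
            continue
        if (n_in, n_out) == (1, 2):
            junctions[m] = "divergent"
        elif (n_in, n_out) == (2, 1):
            junctions[m] = "convergent"
        else:
            junctions[m] = "conflicting"  # 3 in or 3 out: no through-flow
    return junctions


# Invented toy network: M splits the flow, W merges it again.
toy_reactions = {
    "r1": (["X"], ["M"]),
    "r2": (["M"], ["Y"]),
    "r3": (["M"], ["Z"]),
    "r4": (["Y"], ["W"]),
    "r5": (["Z"], ["W"]),
    "r6": (["W"], ["V"]),
}
toy_junctions = classify_three_junctions(toy_reactions)
```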
Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)
Coregulation pattern in three-point junctions
All junctions corresponding to metabolites that participate in exactly 3
reactions (according to KEGG) were identified and the correlations
between the genes associated with each such junction were calculated.
The junctions were grouped according to the directionality of the
reactions, as shown.
Divergent junctions, which allow the flow of metabolites in two
alternative directions, predominantly show a linear coregulation pattern,
where one of the emanating reactions is correlated with the incoming
reaction (linear regulatory pattern) or the two alternative outgoing
reactions are correlated in a context-dependent manner with a distinct
isozyme catalyzing the incoming reaction (linear switch).
By contrast, the linear regulatory pattern is significantly less abundant in
convergent junctions, where the outgoing flow follows a unique
direction, and in conflicting junctions that do not support metabolic flow.
Most of the reversible junctions comply with linear regulatory patterns.
Indeed, similar to divergent junctions, reversible junctions allow
metabolites to flow in two alternative directions. Reactions were
counted as coexpressed if at least two of the associated genes were
significantly correlated (correlation coefficient >0.25). As a random
control, we randomized the identity of all metabolic genes and repeated
the analysis.
Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)
In the majority of divergent
junctions, only one of the
emanating branches is significantly
coregulated with the incoming
reaction that synthesizes the
metabolite.
Co-regulation at branch points: conclusions
The observed co-regulation patterns correspond to a linear metabolic flow, whose
directionality can be switched in a condition-specific manner.
When analyzing junctions that allow metabolic flow in a larger number of
directions, again only a few important branches are coregulated with the
incoming branch.
Therefore: transcription regulation is used to enhance the linearity of metabolic
flow, by biasing the flow toward only a few of the possible routes.
Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)
Connectivity of metabolites
The connectivity of a given metabolite
is defined as the number of reactions
connecting it to other metabolites.
Shown are the distributions of
connectivity between metabolites in an
unrestricted network and in a
network where only correlated
reactions are considered.
In accordance with previous results
(Jeong et al. 2000) , the connectivity
distribution between metabolites
follows a power law (log-log plot).
In contrast, when coexpression is
used as a criterion to distinguish
functional links, the connectivity
distribution becomes exponential
(log-linear plot).
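The connectivity count and its distribution can be sketched with a Counter. This is an illustration on an invented toy network: the reaction names, the dict layout and the set of "coexpressed" reactions are hypothetical, and connectivity is counted as the number of reactions a metabolite takes part in.

```python
from collections import Counter


def connectivity(reactions):
    """Number of reactions each metabolite takes part in."""
    k = Counter()
    for substrates, products in reactions.values():
        for m in set(substrates) | set(products):
            k[m] += 1
    return k


def connectivity_distribution(k):
    """Map each connectivity value to the number of metabolites having it."""
    return Counter(k.values())


toy_network = {
    "r1": (["X"], ["M"]),
    "r2": (["M"], ["Y"]),
    "r3": (["M"], ["Z"]),
    "r4": (["Y"], ["W"]),
    "r5": (["Z"], ["W"]),
    "r6": (["W"], ["V"]),
}
# Hypothetical subset of reactions whose enzymes are coexpressed.
coexpressed = {"r1", "r2", "r4", "r6"}
restricted = {n: r for n, r in toy_network.items() if n in coexpressed}

k_all = connectivity(toy_network)
k_restricted = connectivity(restricted)
```

Plotting `connectivity_distribution(k_all)` on log-log axes and the restricted one on log-linear axes would reproduce the kind of comparison described above; a toy network of this size is of course too small to show either law.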
Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)
Differential regulation of isozymes
Observe that isozymes at junction points are often preferentially
coexpressed with alternative reactions.
→ Investigate their role in the metabolic network more systematically.
Two possible functions of isozymes
associated with the same metabolic
reaction.
An isozyme pair could provide redundancy which may be needed for buffering
genetic mutations or for amplifying metabolite production. Redundant isozymes are
expected to be coregulated.
Alternatively, distinct isozymes could be dedicated to separate biochemical
pathways using the associated reaction. Such isozymes are expected to be
differentially expressed with the two alternative processes.
Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)
Differential regulation of isozymes in central metabolic PW
Arrows represent metabolic
pathways composed of a sequence
of enzymes.
Coregulation is indicated with the
same color (e.g., the isozyme
represented by the green arrow is
coregulated with the metabolic
pathway represented by the green
arrow).
→ Most members of isozyme pairs
are separately coregulated with
alternative processes.
Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)
Differential regulation of isozymes
Regulatory pattern of all gene pairs
associated with a common metabolic
reaction (according to KEGG).
All such pairs were classified into several
classes:
(1) parallel, where each gene is
correlated with a distinct connected
reaction (a reaction that shares a
metabolite with the reaction catalyzed by
the respective gene pair);
(2) selective, where only one of the
enzymes shows a significant correlation
with a connected reaction; and
(3) converging, where both enzymes
were correlated with the same reaction.
Correlation coefficients >0.25 were
considered significant. For a pair to be
counted as parallel, rather than
converging, the correlation with the
alternative reaction had to be <80% of
the correlation
with the preferred reaction.
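One possible reading of these classification rules is sketched below. It is an interpretation, not the authors' procedure: each isozyme is described by a hypothetical dict mapping connected reactions to correlation coefficients, the "preferred" reaction is taken to be the one with the highest correlation, and pairs without any significant correlation get an extra label that is not one of the three classes named above.

```python
def classify_isozyme_pair(corr_a, corr_b, threshold=0.25, ratio=0.8):
    """Classify a pair of isozymes as parallel, selective or converging."""
    best_a = max(corr_a, key=corr_a.get)  # preferred reaction of isozyme a
    best_b = max(corr_b, key=corr_b.get)  # preferred reaction of isozyme b
    sig_a = corr_a[best_a] > threshold
    sig_b = corr_b[best_b] > threshold

    if not (sig_a or sig_b):
        return "uncorrelated"  # label added for this sketch only
    if sig_a != sig_b:
        return "selective"     # only one enzyme is significantly correlated
    if best_a == best_b:
        return "converging"    # both prefer the same connected reaction
    # Different preferred reactions: parallel only if each isozyme is clearly
    # less correlated with the other's reaction than with its own.
    a_distinct = corr_a.get(best_b, 0.0) < ratio * corr_a[best_a]
    b_distinct = corr_b.get(best_a, 0.0) < ratio * corr_b[best_b]
    return "parallel" if a_distinct and b_distinct else "converging"


# Invented correlation values for two connected reactions r1 and r2.
parallel = classify_isozyme_pair({"r1": 0.8, "r2": 0.1}, {"r1": 0.2, "r2": 0.7})
selective = classify_isozyme_pair({"r1": 0.6, "r2": 0.1}, {"r1": 0.1, "r2": 0.2})
converging = classify_isozyme_pair({"r1": 0.6, "r2": 0.3}, {"r1": 0.7, "r2": 0.2})
```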
Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)
Differential regulation of isozymes: interpretation
The primary role of isozyme multiplicity is to allow for differential regulation of
reactions that are shared by separated processes.
Dedicating a specific enzyme to each pathway may offer a way of independently
controlling the associated reaction in response to pathway-specific requirements,
at both the transcriptional and the post-transcriptional levels.
Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)
Genes coexpressed with metabolic pathways
Identify the coregulated subparts of each metabolic pathway and identify relevant
experimental conditions that induce or repress the expression of the pathway
genes.
Also associate additional genes showing similar expression profiles with each
pathway using the signature algorithm.
Input: set of genes, some of which are expected to be coregulated.
Output: coregulated part of the input and additional coregulated genes together
with the set of conditions where the coregulation is realized.
Numerous genes were found that are not directly involved in enzymatic steps:
- transporters
- transcription factors
Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)
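The input/output behaviour of the signature algorithm described above can be sketched as follows. This is a simplified illustration, not the published method: it assumes the expression values are already normalized, uses fixed illustrative thresholds (`cond_threshold`, `gene_threshold`) rather than thresholds in units of standard deviations, and all gene names are hypothetical.

```python
def signature_algorithm(expr, input_genes, cond_threshold=1.0, gene_threshold=1.0):
    """Simplified two-step sketch (assumed form, see lead-in).

    expr: dict gene -> list of normalized expression values, one per condition.
    input_genes: genes of which only some are expected to be coregulated.
    Returns (module, conditions): the coregulated gene set and the indices of
    the conditions in which the coregulation is seen.
    """
    n_cond = len(next(iter(expr.values())))

    # Step 1: score each condition by the mean expression of the input genes
    # and keep the conditions with a strong average response.
    cond_scores = [
        sum(expr[g][c] for g in input_genes) / len(input_genes)
        for c in range(n_cond)
    ]
    conditions = [c for c, s in enumerate(cond_scores) if abs(s) > cond_threshold]
    if not conditions:
        return set(), []

    # Step 2: score every gene (not only the input genes) by its expression
    # over the selected conditions, weighted by the condition scores.
    gene_scores = {
        g: sum(cond_scores[c] * values[c] for c in conditions) / len(conditions)
        for g, values in expr.items()
    }
    module = {g for g, s in gene_scores.items() if s > gene_threshold}
    return module, conditions


if __name__ == "__main__":
    expr = {
        "g1": [2.0, 2.0, 0.0, 0.0],
        "g2": [2.0, 1.8, 0.1, -0.1],
        "g3": [0.0, 0.1, 2.0, -2.0],  # input gene that is not coregulated
        "g4": [1.9, 2.1, 0.0, 0.0],   # coregulated gene missing from the input
        "g5": [0.0, 0.0, 0.1, 1.0],
    }
    print(signature_algorithm(expr, ["g1", "g2", "g3"]))
```

In this toy example the input gene `g3` is dropped and the additional gene `g4` is recruited, which mirrors the two parts of the output listed above.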
Co-expression of transporters
Transporter genes are
co-expressed with the relevant
metabolic pathways providing
the pathways with their metabolites.
Co-expression is marked in green.
Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)
Co-regulation of transcription factors
Transcription factors are often co-regulated with their regulated pathways. Shown
here are transcription factors which were found to be co-regulated in the analysis.
Co-regulation is shown by color-coding such that the transcription factor and the
associated pathways are of the same color.
Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)
Hierarchical modularity in the metabolic network
So far: co-expression analysis revealed a strong tendency toward coordinated
regulation of genes involved in individual metabolic pathways.
Does transcription regulation also define a higher-order metabolic organization, by
coordinated expression of distinct metabolic pathways?
The question is motivated by the observation that feeder pathways (which synthesize metabolites) are
frequently coexpressed with pathways using the synthesized metabolites.
Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)
Feeder-pathways/enzymes
Feeder pathways or genes are
co-expressed with the
pathways they fuel. The
feeder pathways (light blue)
provide the main pathway
(dark blue) with metabolites
in order to assist the main
pathway, indicating that coexpression extends beyond
the level of individual
pathways.
These results can be
interpreted in the following
way: the organism will
produce those enzymes that
are needed.
Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)
Hierarchical modularity in the metabolic network
Derive hierarchy by applying an iterative
signature algorithm to the metabolic pathways,
and decreasing the resolution parameter
(coregulation stringency) in small steps.
Each box contains a group of coregulated genes
(transcription module). Strongly associated
genes (left) can be associated with a specific
function, whereas moderately correlated
modules (right) are larger and their function is
less coherent.
The merging of 2 branches indicates that the
associated modules are induced by similar
conditions.
All pathways converge to one of 3 low-resolution
modules: amino acid biosynthesis, protein
synthesis, and stress.
Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)
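How modules merge as the coregulation stringency is lowered can be illustrated with a deliberately simpler stand-in. The sketch below does not implement the iterative signature algorithm; it thresholds pairwise correlations and takes connected components at each stringency level. The function names, the gene names, and the correlation values are all hypothetical.

```python
def modules_at(corr, threshold):
    """Connected components of the graph that links gene pairs whose
    correlation is >= threshold (stand-in for a transcription module)."""
    genes = sorted({g for pair in corr for g in pair})
    parent = {g: g for g in genes}

    def find(g):
        # Union-find lookup with path halving.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for (a, b), r in corr.items():
        if r >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for g in genes:
        groups.setdefault(find(g), []).append(g)
    return sorted(groups.values())


def hierarchy(corr, thresholds):
    """Modules at each stringency level, from most to least stringent."""
    return [(t, modules_at(corr, t)) for t in sorted(thresholds, reverse=True)]


if __name__ == "__main__":
    corr = {("a", "b"): 0.9, ("c", "d"): 0.8, ("b", "c"): 0.5, ("d", "e"): 0.2}
    for t, modules in hierarchy(corr, [0.85, 0.6, 0.4]):
        print(t, modules)
```

At high stringency the modules are small and tight. Lowering the threshold merges them into larger groups, analogous to the merging branches in the figure.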
Hierarchical modularity in the metabolic network
Although amino acids serve as building blocks for proteins, the expression of genes
mediating these 2 processes is clearly uncoupled!
This may reflect the association of rapid cell growth (which triggers enhanced
protein synthesis) with rich growth conditions, where amino acids are readily
available and do not need to be synthesized.
Amino acid biosynthesis genes are only required when external amino acids are
scarce.
In support of this view, a group of amino acid transporters converged to the protein
synthesis module, together with other pathways required for rapid cell growth
(glucose fermentation, nucleotide synthesis and fatty acid synthesis).
Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)
Global network properties
Jeong et al. showed that the structural connectivity between metabolites imposes a
hierarchical organization of the metabolic network. That analysis was based on
connectivity between substrates, considering all potential connections.
Here, analysis is based on coexpression of enzymes.
In both approaches, related metabolic pathways were clustered together!
There are, however, some differences in the particular groupings (not discussed
here),
and importantly, when including expression data the connectivity pattern of
metabolites changes from a power-law dependence to an exponential one
corresponding to a network structure with a defined scale of connectivity.
This reflects the reduction in the complexity of the network.
Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)
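The difference between a power-law and an exponential connectivity distribution can be checked roughly on a degree histogram: a power law is a straight line in a log-log plot, an exponential is a straight line in a semi-log plot. The sketch below compares the two linear fits by their R² values. It is a crude diagnostic, not a rigorous statistical test, and the function names and synthetic data are assumptions for illustration.

```python
import math


def r_squared(xs, ys):
    """Squared Pearson correlation, i.e. R^2 of a simple linear fit."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return (sxy * sxy) / (sxx * syy)


def compare_fits(degree_counts):
    """degree_counts: dict degree k -> number of metabolites with that degree.

    Returns R^2 of log P(k) against log k (power law) and against k
    (exponential); the higher value indicates the better straight-line fit.
    """
    ks = sorted(k for k, c in degree_counts.items() if k > 0 and c > 0)
    log_counts = [math.log(degree_counts[k]) for k in ks]
    return {
        "power_law": r_squared([math.log(k) for k in ks], log_counts),
        "exponential": r_squared([float(k) for k in ks], log_counts),
    }


if __name__ == "__main__":
    # Synthetic histograms, one of each type.
    exponential = {k: 1000.0 * math.exp(-0.5 * k) for k in range(1, 11)}
    power_law = {k: 1000.0 * k ** -2.5 for k in range(1, 11)}
    print(compare_fits(exponential))
    print(compare_fits(power_law))
```

An exponential distribution has a characteristic scale of connectivity (the decay constant), whereas a power law has none.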
Summary
Transcription regulation is prominently involved in shaping the metabolic network of
S. cerevisiae.
1. Transcription regulation leads the metabolic flow toward linearity.
2. Individual isozymes are often separately coregulated with distinct processes,
providing a means of reducing crosstalk between pathways using a common
reaction.
3. Transcription regulation entails a higher-order structure of the metabolic
network: there is a hierarchical organization of metabolic pathways into groups of
decreasing expression coherence.
Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)