Gene Expression Analysis, DNA Chips and Genetic Networks

Biological Network Analysis:
Introduction to Metabolic
Networks
Tomer Shlomi
Winter 2008
Lecture Outline
1. Cellular metabolism
2. Metabolic network models
3. Kinetic modeling
4. Constraint-based modeling
5. Optimization methods
1. Cellular metabolism
Metabolism (I)
Metabolism is the totality of all the chemical
reactions that operate in a living organism.
Catabolic reactions
Break down molecules and produce energy
Anabolic reactions
Use energy and build up essential
cell components
Metabolism (II)
“Metabolism is the process involved in
the maintenance of life. It is
comprised of a vast repertoire of
enzymatic reactions and transport
processes used to convert thousands
of organic compounds into the
various molecules necessary to
support cellular life” Kenneth et al.
2003
Why study metabolism? (I)
1. Basic science: it's the essence of life
2. Tremendous importance in medicine
a. Inborn errors of metabolism cause acute
symptoms and even death at an early age
b. Metabolic diseases (obesity, diabetes) are
major sources of morbidity and mortality
c. Metabolic enzymes and their regulators are
gradually becoming viable drug targets
Why study metabolism? (II)
3. Bioengineering applications
a. Design strains for production of biological
products of interest
b. Generation of biofuels
4. Metabolic networks are probably the best understood of all
cellular networks (metabolic, PPI, regulatory, signaling)
Metabolites and Biochemical
Reactions
• Metabolite - an organic substance:
– Sugars – glucose, galactose, lactose, etc.
– Carbohydrates – glycogen, glucan, etc.
– Amino acids – histidine, proline, methionine, etc.
– Nucleotides – cytosine, guanine, etc.
– Lipids
– Chemical energy carriers – ATP, NADH, etc.
– Atoms – oxygen, hydrogen
• Biochemical reaction: the process in which one or more substrate
molecules are converted (usually with the help of an enzyme) into product
molecules, e.g.:
Glucose + ATP -> Glucose-6-Phosphate + ADP (catalyzed by Glucokinase)
Metabolic Networks
• A set of reactions and the corresponding metabolites
• A directed hyper-graph representation
– Nodes represent metabolites
– (Hyper)edges represent biochemical reactions, each linking a set of substrates to a set of products
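A minimal sketch of the hyper-graph view, assuming a plain Python representation (the class and function names are hypothetical). Each reaction is a hyperedge because it joins several substrates to several products at once, which an ordinary edge between two nodes cannot express.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reaction:
    """One hyperedge: it links a *set* of substrate nodes to a *set* of product nodes."""
    name: str
    substrates: frozenset
    products: frozenset


def metabolite_nodes(reactions):
    """Return the node set of the hypergraph (every metabolite that appears anywhere)."""
    nodes = set()
    for r in reactions:
        nodes |= r.substrates | r.products
    return nodes


def reactions_consuming(reactions, metabolite):
    """Names of the hyperedges that use `metabolite` as an input."""
    return [r.name for r in reactions if metabolite in r.substrates]


glucokinase = Reaction(
    name="glucokinase",
    substrates=frozenset({"Glucose", "ATP"}),
    products=frozenset({"Glucose-6-Phosphate", "ADP"}),
)
network = [glucokinase]
print(sorted(metabolite_nodes(network)))
print(reactions_consuming(network, "ATP"))  # ['glucokinase']
```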
Metabolites (I)
The 744 reactions of E. coli small-molecule metabolism involve a total
of 791 different substrates.
On average, each reaction contains 4.0 substrates.
Figure: number of reactions containing varying numbers of substrates (reactants plus products).
18. Lecture WS
Metabolites (II)
Each distinct substrate occurs in an average of 2.1 reactions.
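Both averages can be computed from the same reaction-metabolite incidences. A minimal sketch, assuming each reaction is given as the set of metabolites it involves; the toy network and function name are hypothetical, and the E. coli figures above may rest on counting conventions not stated here.

```python
def participation_stats(reactions):
    """Average hyperedge size and average node degree of a metabolic network.

    reactions: dict mapping a reaction name to the set of metabolites it
    involves (reactants plus products).
    """
    # Each (reaction, metabolite) pair is one incidence; both averages share this total.
    incidences = sum(len(mets) for mets in reactions.values())
    distinct_metabolites = set().union(*reactions.values())
    per_reaction = incidences / len(reactions)
    per_metabolite = incidences / len(distinct_metabolites)
    return per_reaction, per_metabolite


# Hypothetical two-reaction toy network
toy = {
    "glucokinase": {"Glucose", "ATP", "Glucose-6-Phosphate", "ADP"},
    "phosphoglucose_isomerase": {"Glucose-6-Phosphate", "Fructose-6-Phosphate"},
}
print(participation_stats(toy))  # (3.0, 1.2)
```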
Bioinformatics III
Reactions Catalyzed by More Than one Enzyme
Diagram showing the number of reactions
that are catalyzed by one or more enzymes.
Most reactions are catalyzed by one enzyme,
some by two, and very few by more than two
enzymes.
For 84 reactions, the corresponding enzyme is not yet encoded in EcoCyc.
What may be the reasons for isozyme redundancy?
(1) the enzymes that catalyze the same reaction are homologs that arose by gene
duplication (or were obtained by horizontal gene transfer),
acquiring some specificity but retaining the same mechanism (divergence);
(2) the reaction is easily "invented", so more than one protein
family is independently able to perform the catalysis (convergence).
Enzymes that catalyze more than one reaction
Genome predictions usually assign a single enzymatic function.
However, E. coli is known to contain many multifunctional enzymes.
Of the 607 E. coli enzymes, 100 are multifunctional, having either the same
active site with different substrate specificities, or different active sites.
Number of enzymes that catalyze one or
more reactions. Most enzymes catalyze
one reaction; some are multifunctional.
The enzymes that catalyze 7 and 9 reactions are purine nucleoside
phosphorylase and nucleoside diphosphate kinase, respectively.
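Isozyme redundancy and multifunctionality are two views of the same enzyme-reaction association list. A minimal sketch, assuming the associations are available as (enzyme, reaction) pairs; the pairs and the function name are hypothetical.

```python
from collections import defaultdict


def catalysis_redundancy(pairs):
    """pairs: iterable of (enzyme, reaction) catalysis associations.

    Returns (reactions with isozymes, multifunctional enzymes), both sorted.
    """
    enzymes_per_reaction = defaultdict(set)
    reactions_per_enzyme = defaultdict(set)
    for enzyme, reaction in pairs:
        enzymes_per_reaction[reaction].add(enzyme)
        reactions_per_enzyme[enzyme].add(reaction)
    isozyme_reactions = sorted(r for r, es in enzymes_per_reaction.items() if len(es) > 1)
    multifunctional = sorted(e for e, rs in reactions_per_enzyme.items() if len(rs) > 1)
    return isozyme_reactions, multifunctional


# Hypothetical associations: E1 and E2 are isozymes for R1; E3 catalyzes R2 and R3
pairs = [("E1", "R1"), ("E2", "R1"), ("E3", "R2"), ("E3", "R3")]
print(catalysis_redundancy(pairs))  # (['R1'], ['E3'])
```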
Pathways (I)
EcoCyc describes 131 pathways:
energy metabolism
nucleotide and amino acid biosynthesis
secondary metabolism
Length distribution of
EcoCyc pathways
Pathways vary in length from a
single reaction step to 16 steps
with an average of 5.4 steps.
Ouzounis, Karp, Genome Res. 10, 568 (2000)
Pathways (II)
However, there is no precise
biological definition of a
pathway.
The partitioning of the
metabolic network into
pathways (including the well-known examples of
biochemical pathways) is
somewhat arbitrary.
These decisions of course
also affect the distribution
of pathway lengths.
Pathway in the Context of a System
Reactions participating in more than one pathway
The 99 reactions belonging to multiple
pathways appear to be the intersection
points in the complex network of chemical
processes in the cell.
E.g. the reaction present in 6 pathways corresponds to
the reaction catalyzed by malate dehydrogenase, a
central enzyme in cellular metabolism.
2. Metabolic Network Models
Metabolic Network Models
• The application of computational methods to predict
network behavior usually requires data beyond
the network topology
• A genome-scale (GS) metabolic network model is a collection of
such data:
– Reaction stoichiometry
– Reaction directionality
– Cellular localization
– Transport and exchange reactions
– Gene-protein-reaction association
Metabolic Network Model:
Reaction Stoichiometry
• Stoichiometry - the quantitative relationships of the
reactants and products in reactions
1 Glucose + 1 ATP <-> 1 Glucose-6-Phosphate + 1 ADP
Metabolic Network Model:
Reaction Directionality
• Biochemical studies may test the reversibility of enzymatic
reactions
• But the directionality can differ between in vitro and in vivo
due to different temperature, pH, ionic strength, and
metabolite concentrations.
1 Glucose + 1 ATP -> 1 Glucose-6-Phosphate + 1 ADP
• A subset of the reactions in a model is unidirectional (irreversible) and
the remaining reactions are bidirectional (reversible)
Metabolic Network Model:
Cellular Localization (I)
Metabolic Network Model:
Cellular Localization (II)
• Algorithms: PSORT and SubLoc predict the cellular
localization of proteins from their nucleotide or amino acid
sequences
• High-throughput experimental approaches such as
immunofluorescence and GFP tagging of individual proteins.
Cytoplasm: 1 Glucose + 1 ATP -> 1 Glucose-6-Phosphate + 1 ADP
Metabolic Network Model:
Transport and Exchange Reactions
• An extra-cellular compartment is also included in the model
• Transport reactions move metabolites between compartments
(across membrane boundaries)
– Glucose[c] <-> Glucose[e]
• Exchange reactions move metabolites across the model
boundary
– Glucose[e] <->
• Uptake = in
• Secretion = out
Gene-Protein-Reaction (GPR)
Association (I)
• Formulated via Boolean logic
• The Sdh protein is made up of 4 peptides and catalyzes 2 reactions
Gene-Protein-Reaction (GPR)
Association (II)
• A protein complex made up of 3 proteins catalyzes a single
reaction
Gene-Protein-Reaction (GPR)
Association (III)
• Isozymes – alternative enzymes that catalyze the same
reaction
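The Boolean GPR rules decide whether a reaction can still be catalyzed after gene deletions: "and" for the subunits of a complex, "or" for isozymes. A minimal sketch, assuming rules are stored as nested tuples (a hypothetical encoding); the Sdh subunit genes are written as sdhA to sdhD, and the isozyme genes are hypothetical.

```python
def gpr_active(rule, deleted_genes):
    """Evaluate a gene-protein-reaction rule after deleting some genes.

    rule: either a gene name (str) or a tuple (op, sub_rule, sub_rule, ...)
    with op "and" (protein complex) or "or" (isozymes).
    """
    if isinstance(rule, str):
        return rule not in deleted_genes
    op, *sub_rules = rule
    values = [gpr_active(sub, deleted_genes) for sub in sub_rules]
    if op == "and":  # complex: every subunit must be present
        return all(values)
    if op == "or":  # isozymes: any one alternative enzyme suffices
        return any(values)
    raise ValueError(f"unknown operator: {op!r}")


# Sdh as a complex of four peptides
sdh = ("and", "sdhA", "sdhB", "sdhC", "sdhD")
# Hypothetical pair of isozymes
isozymes = ("or", "geneX", "geneY")

print(gpr_active(sdh, {"sdhB"}))        # False
print(gpr_active(isozymes, {"geneX"}))  # True
```

Nesting covers mixed cases, e.g. `("or", ("and", "a", "b"), "c")` for a complex with an alternative single-protein isozyme.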
Metabolic Network Models
• A ‘GS metabolic network model’ is a collection of:
– A metabolic network
– Reaction stoichiometry
– Reaction directionality
– Cellular localization
– Transport and exchange reactions
– Gene-protein-reaction association
Model Reconstruction Process (I)
Model Reconstruction Process (II)
• Performed mainly in Bernhard Palsson’s lab at UCSD.
• Model naming convention:
Reconstruction of E. coli models
Available Metabolic Models
3. Kinetic modeling
Stoichiometric Matrix (I)
• Stoichiometric matrix – network topology with stoichiometry
of biochemical reactions (denoted S)
• A metabolite that exists in multiple compartments is
represented by multiple rows in the matrix
• How would transport and exchange reactions be represented?
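One way to answer the question, as a minimal sketch with NumPy (the helper and reaction names are hypothetical): a transport reaction is an ordinary column touching two rows of the same metabolite in different compartments, while an exchange reaction has a single entry because its other side lies outside the model.

```python
import numpy as np


def build_stoichiometric_matrix(reactions):
    """reactions: dict reaction name -> {metabolite: stoichiometric coefficient}.

    Negative coefficients mark consumed metabolites, positive ones produced metabolites.
    Returns S (metabolites x reactions) plus the row and column labels.
    """
    metabolites = sorted({m for coeffs in reactions.values() for m in coeffs})
    names = list(reactions)
    row = {m: i for i, m in enumerate(metabolites)}
    S = np.zeros((len(metabolites), len(names)))
    for j, name in enumerate(names):
        for met, coeff in reactions[name].items():
            S[row[met], j] = coeff
    return S, metabolites, names


reactions = {
    # Cytoplasm: 1 Glucose + 1 ATP -> 1 Glucose-6-Phosphate + 1 ADP
    "glucokinase": {"Glucose[c]": -1, "ATP[c]": -1,
                    "Glucose-6-Phosphate[c]": 1, "ADP[c]": 1},
    # Transport Glucose[c] <-> Glucose[e]: glucose gets one row per compartment
    "glc_transport": {"Glucose[c]": -1, "Glucose[e]": 1},
    # Exchange Glucose[e] <-> : the other side lies outside the model, so one entry only.
    # With this orientation, negative flux = uptake (a common convention, assumed here).
    "EX_glc": {"Glucose[e]": -1},
}
S, metabolites, names = build_stoichiometric_matrix(reactions)
print(metabolites)
print(S)
```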
Stoichiometric Matrix (II)
Kinetic Modeling: Definition
• Predict changes in metabolite concentrations by solving a set of ordinary differential equations (ODEs):
$$\frac{dm}{dt} = S \cdot v = S \cdot f(m, k)$$
• m – metabolite concentrations vector (mol/mg)
• S – stoichiometric matrix
• v – reaction rates vector (mol/(mg·h)), given by the reaction rate equation f with kinetic parameters k
• Requires knowledge of m, f and k!
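To make the ODE system concrete, here is a minimal sketch that integrates dm/dt = S·f(m, k) with SciPy's `solve_ivp`. The A -> B -> C chain, the rate constants, and the choice of mass-action kinetics for f are all hypothetical illustrations, not rate laws from the lecture.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical linear chain A -> B -> C.
# Rows: metabolites A, B, C; columns: reactions r1 (A -> B), r2 (B -> C).
S = np.array([[-1,  0],
              [ 1, -1],
              [ 0,  1]], dtype=float)
k = np.array([1.0, 0.5])  # hypothetical kinetic parameters (rate constants)


def f(m, k):
    """Reaction rate equations v = f(m, k); mass-action kinetics assumed for illustration."""
    a, b, _ = m
    return np.array([k[0] * a, k[1] * b])


def dmdt(t, m):
    """Right-hand side of the ODE system dm/dt = S * f(m, k)."""
    return S @ f(m, k)


m0 = [1.0, 0.0, 0.0]  # initial concentrations of A, B, C
sol = solve_ivp(dmdt, (0.0, 10.0), m0)
m_final = sol.y[:, -1]
print(m_final)  # most of the material has reached C by t = 10
```

Even this toy needs every entry of m0, the form of f, and every value in k, which is exactly the data requirement the slide points out.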
Kinetic Modeling: Reaction Rate
Equations (I)
• Consider the reaction:
S->P
• A simple rate equation (Michaelis-Menten) is:
$$v = \frac{V_{max}\,[S]}{K_M + [S]}$$
• In this case, we have only 2 kinetic parameters – Vmax and KM
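A direct transcription of the Michaelis-Menten rate as a minimal sketch; the parameter values are hypothetical. It shows the two characteristic behaviors: half-maximal rate at [S] = KM and saturation towards Vmax at high [S].

```python
def michaelis_menten(s, vmax, km):
    """Irreversible Michaelis-Menten rate v = Vmax*[S] / (K_M + [S])."""
    return vmax * s / (km + s)


vmax, km = 10.0, 2.0  # hypothetical parameter values
print(michaelis_menten(km, vmax, km))      # 5.0: half-maximal rate at [S] = K_M
print(michaelis_menten(1000.0, vmax, km))  # close to Vmax (saturation)
```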
Kinetic Modeling: Reaction Rate
Equations (II)
• Consider the reaction:
S + E <-> P + E
• A more complex Michaelis-Menten equation:
• In this case, there are already 4 kinetic parameters – Vmax+, Vmax-,
KmS and KmP
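The rate equation for the reversible case did not survive extraction. One common form that uses exactly these four parameters is v = (Vmax+·[S]/KmS − Vmax−·[P]/KmP) / (1 + [S]/KmS + [P]/KmP); the sketch below assumes that form, which may differ from the one on the original slide. The function and parameter names are hypothetical.

```python
def reversible_mm(s, p, vmax_f, vmax_r, km_s, km_p):
    """Reversible Michaelis-Menten rate for S + E <-> P + E (assumed form):

    v = (vmax_f*[S]/km_s - vmax_r*[P]/km_p) / (1 + [S]/km_s + [P]/km_p)

    v > 0 means net conversion S -> P, v < 0 means net P -> S.
    """
    forward = vmax_f * s / km_s
    backward = vmax_r * p / km_p
    return (forward - backward) / (1.0 + s / km_s + p / km_p)


# Hypothetical parameters: vmax+ = 10, vmax- = 5, KmS = 2, KmP = 1
print(reversible_mm(2.0, 0.0, 10.0, 5.0, 2.0, 1.0))  # 5.0: no product, reduces to the irreversible form
print(reversible_mm(0.0, 1.0, 10.0, 5.0, 2.0, 1.0))  # -2.5: only product present, net reverse flux
```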
Kinetic Modeling: Reaction Rate
Equations (III)
• Reaction rate equations also depend (via k) on:
– Regulation: effectors, inhibitors
– Enzyme concentration
– Surrounding reactions and molecules
– pH, ion balance, molecule gradients, energy potentials
• Kinetics are problematic:
– They are obtained from test-tube measurements of purified enzymes
– These measurements may not apply to the cellular environment
• Most of these parameters are unknown!
4. Constraint-based modeling
Constraint-based modeling (CBM) (I)
• Assumes a quasi steady-state
– No changes in metabolite concentrations (within the system)
– Metabolite production and consumption rates are equal
• Representing the ‘average’ flow in the network over a long
enough period of time
$$\frac{dm}{dt} = S \cdot v = 0$$
• The reaction rate vector v is referred to as a ‘steady-state
flux distribution’
• No need for information on metabolite concentrations,
reaction rate equations, and kinetic parameters
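Checking the steady-state condition only needs S and v, with no concentrations or kinetic parameters. A minimal sketch on a hypothetical toy network (uptake b1 of A, two alternative A -> B reactions v1 and v2, and a biomass drain b2); the function name is also hypothetical.

```python
import numpy as np

# Hypothetical toy network. Rows: metabolites A, B.
# Columns: b1 (uptake of A), v1 (A -> B), v2 (A -> B via an alternative route), b2 (B -> biomass)
S = np.array([[1, -1, -1,  0],
              [0,  1,  1, -1]], dtype=float)


def is_steady_state(S, v, tol=1e-9):
    """True if every metabolite is produced exactly as fast as it is consumed (S v = 0)."""
    return bool(np.all(np.abs(S @ v) <= tol))


print(is_steady_state(S, np.array([5.0, 3.0, 2.0, 5.0])))  # True
print(is_steady_state(S, np.array([5.0, 5.0, 0.0, 4.0])))  # False: B accumulates
```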
CBM (II)
• In most cases the system S·v = 0 is underdetermined (typically there are more reactions than
independent metabolite balances), so a whole space of flux distributions v satisfies it
• The idea in CBM is to employ a set of constraints to limit the
space of possible solutions to those more likely/correct
– Mass balance is enforced by the above equation
– Thermodynamic: irreversibility of reactions
– Enzymatic capacity: bounds on enzyme rates
– Availability of nutrients
Figure: the solution space, containing the subset of correct solutions
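The size of that space can be measured directly: mass balance alone leaves the null space of S. A minimal sketch with `scipy.linalg.null_space`, reusing a hypothetical toy network (uptake b1 of A, two alternative A -> B reactions v1 and v2, biomass drain b2).

```python
import numpy as np
from scipy.linalg import null_space

# Hypothetical toy network: rows A, B; columns b1, v1, v2, b2
S = np.array([[1, -1, -1,  0],
              [0,  1,  1, -1]], dtype=float)

N = null_space(S)  # orthonormal basis of {v : S v = 0}
print(N.shape)     # (4, 2): 4 reactions, but a 2-dimensional space of steady states

# Any steady-state flux is a combination of the basis columns, e.g. v = [5, 3, 2, 5]
v = np.array([5.0, 3.0, 2.0, 5.0])
coeffs, *_ = np.linalg.lstsq(N, v, rcond=None)
print(np.allclose(N @ coeffs, v))  # True
```

The additional constraints (irreversibility, enzyme capacity, nutrient availability) then cut this 2-dimensional space down to the plausible part.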
CBM (III)
• The solution space shrinks as more constraints are added:
– Mass balance (S·v = 0): a subspace of $\mathbb{R}^n$
– Thermodynamic (vi ≥ 0 for irreversible reactions): a convex cone
– Capacity (vi ≤ vmax): a bounded convex cone
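The three constraint sets are nested, so a flux vector can be tested against them in order. A minimal sketch on a hypothetical toy network (uptake b1, two A -> B reactions v1 and v2, biomass drain b2); the irreversibility flags and capacity bounds are hypothetical values chosen for illustration.

```python
import numpy as np

# Hypothetical toy network: rows A, B; columns b1, v1, v2, b2 (all irreversible here)
S = np.array([[1, -1, -1,  0],
              [0,  1,  1, -1]], dtype=float)
irreversible = np.array([True, True, True, True])
vmax = np.array([10.0, 10.0, 10.0, 10.0])  # hypothetical capacity bounds


def constraint_levels(v, tol=1e-9):
    """Report which of the nested constraint sets the flux vector v belongs to."""
    mass_balance = bool(np.all(np.abs(S @ v) <= tol))
    thermodynamic = mass_balance and bool(np.all(v[irreversible] >= -tol))
    capacity = thermodynamic and bool(np.all(v <= vmax + tol))
    return mass_balance, thermodynamic, capacity


print(constraint_levels(np.array([5.0, 3.0, 2.0, 5.0])))      # (True, True, True)
print(constraint_levels(np.array([-5.0, -3.0, -2.0, -5.0])))  # (True, False, False)
print(constraint_levels(np.array([20.0, 10.0, 10.0, 20.0])))  # (True, True, False)
```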
CBM Example (I)
CBM Example (II)
CBM Example (III)
Determination of Likely Flux
Distributions
• In most cases the constraints are not sufficient to single out one solution, leaving a space of
feasible solutions
• How to identify plausible solutions within this space?
• Optimization methods (next lesson)
– Maximal biomass production rate
– Minimal ATP production rate
– Minimal nutrient uptake rate
• Exploring the solution space (the following lesson)
– Extreme pathways
– Elementary modes
5. Optimization methods
Flux Balance Analysis (I)
• An optimization method for finding a feasible flux distribution
that enables maximal growth rate of the organism
• Based on the assumption that evolution optimizes microbial
growth rate
• To enable maximal growth rate, the essential biomass
precursors (metabolites) should be synthesized
at the maximal rate
• Add to the model a pseudo ‘growth reaction’
representing the metabolites required for
producing 1g of the organism’s biomass
• These precursors are removed from the
metabolic network in the corresponding ratios:
41.1 ATP + 18.2 NADH + 0.2 G6P… -> biomass
For example: Biomass reaction of E. coli
Other Possible Objective Functions
Flux Balance Analysis (II)
• Searches for a steady-state flux distribution v:
S∙v=0
• Satisfying thermodynamic and capacity constraints:
vmin≤v≤vmax
• With maximal growth rate:
Max vbiomass
• How do we find this flux distribution v? Linear Programming
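FBA maps directly onto a linear program. A minimal sketch with SciPy's `linprog`, assuming a hypothetical toy network (uptake b1 of A bounded by 5, two alternative A -> B reactions v1 and v2, and the biomass reaction b2); since `linprog` minimizes, the biomass flux is maximized by minimizing its negative.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network: rows A, B; columns b1 (uptake of A), v1, v2 (A -> B), b2 (biomass)
S = np.array([[1, -1, -1,  0],
              [0,  1,  1, -1]], dtype=float)
names = ["b1", "v1", "v2", "b2"]
bounds = [(0, 5), (0, None), (0, None), (0, None)]  # vmin <= v <= vmax; uptake b1 <= 5

# linprog minimizes c @ v, so maximizing v_biomass means minimizing -v_biomass
c = np.zeros(len(names))
c[names.index("b2")] = -1.0

res = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]), bounds=bounds, method="highs")
print(res.status)  # 0: an optimal solution was found
print(-res.fun)    # maximal biomass flux, equal to the uptake bound on b1
print(res.x)       # one optimal flux distribution (the v1/v2 split is not unique)
```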
Linear Programming Basics (I)
Linear Programming Basics (III)
Linear Programming: Types of
Solutions (I)
Linear Programming: Types of
Solutions (II)
FBA and LP: Single solution
• Assume that b2 is the ‘biomass’ reaction which we maximize
• Let b1≤5 (i.e. the maximal uptake rate of A is bounded by 5)
• One optimal solution exists, in which b2=5
FBA and LP: Unbounded
• Assume that b2 is the ‘biomass’ reaction which we maximize
• Let b1≤∞ (i.e. the maximal uptake rate of A is unbounded)
• No optimal solution exists
• b2 can be made arbitrarily large
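The unbounded case shows up as a failed solve rather than an optimum. A minimal sketch on the same hypothetical toy network with no upper bound on the uptake b1; it relies only on `linprog` reporting that no optimum exists, since the exact status code may depend on the solver and its presolve step.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network (rows A, B; columns b1, v1, v2, b2); uptake b1 has no upper bound
S = np.array([[1, -1, -1,  0],
              [0,  1,  1, -1]], dtype=float)
bounds = [(0, None)] * 4
c = np.array([0.0, 0.0, 0.0, -1.0])  # maximize b2

res = linprog(c, A_eq=S, b_eq=np.zeros(2), bounds=bounds, method="highs")
# No optimum is reported; SciPy documents status 3 as "problem appears to be unbounded"
print(res.success, res.status, res.message)
```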
FBA and LP: Solution space (I)
• Assume that b2 is the ‘biomass’ reaction which we maximize
• Let b1≤5
• There are many possible optimal solutions in which b2=5
• Different solutions reflect the activity of alternative
pathways:
v1+v2=b1≤5
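The range of alternative optimal pathways can be probed by fixing the biomass flux at its optimum and then minimizing and maximizing an individual flux. This is the idea behind flux variability analysis, a technique not covered on these slides; the sketch below applies it to the same hypothetical toy network.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network: rows A, B; columns b1, v1, v2, b2
S = np.array([[1, -1, -1,  0],
              [0,  1,  1, -1]], dtype=float)
names = ["b1", "v1", "v2", "b2"]
bounds = [(0, 5), (0, None), (0, None), (0, None)]

# Keep only optimal solutions by adding the equality b2 = 5 (the FBA optimum)
A_eq = np.vstack([S, [0.0, 0.0, 0.0, 1.0]])
b_eq = np.array([0.0, 0.0, 5.0])

# Minimize and maximize v1 over the optimal solutions
c = np.zeros(len(names))
c[names.index("v1")] = 1.0
v1_min = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=bounds, method="highs").fun
v1_max = -linprog(-c, A_eq=A_eq, b_eq=b_eq, bounds=bounds, method="highs").fun
print(v1_min, v1_max)  # v1 can take any value between 0 and 5; v2 = 5 - v1 covers the rest
```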
FBA and LP: Solution space (II)
• Let c be the optimal value of:
Max vbiomass
S∙v=0
vmin≤v≤vmax
• The space of optimal LP solutions is convex! (It lies within the original
feasible solution space):
S∙v=0
vmin≤v≤vmax
vbiomass=c
FBA and LP: Solution space (III)
• The convex solution space can be further analyzed
• For example, finding the optimal growth solution with
minimal nutrient uptake
Min vmet_uptake
S∙v=0
vmin≤v ≤vmax
vbiomass=c
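The two-stage procedure above can be sketched with two `linprog` calls. The toy network here is hypothetical and slightly richer than the earlier one: capacity bounds on v1 and v2 limit growth, and a secretion reaction v3 lets excess A leave, so several uptake rates are compatible with maximal growth and the second stage has something to minimize.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network. Rows: A, B.
# Columns: b1 (uptake of A), v1 and v2 (A -> B), v3 (secretion of A), b2 (biomass)
S = np.array([[1, -1, -1, -1,  0],
              [0,  1,  1,  0, -1]], dtype=float)
names = ["b1", "v1", "v2", "v3", "b2"]
bounds = [(0, 10), (0, 3), (0, 2), (0, None), (0, None)]  # hypothetical capacities
zeros = np.zeros(S.shape[0])


def unit_cost(name, weight):
    """Cost vector for linprog (which minimizes) that weights a single reaction."""
    c = np.zeros(len(names))
    c[names.index(name)] = weight
    return c


# Stage 1 (FBA): maximal biomass flux c_opt
stage1 = linprog(unit_cost("b2", -1.0), A_eq=S, b_eq=zeros, bounds=bounds, method="highs")
c_opt = -stage1.fun

# Stage 2: fix v_biomass = c_opt, then minimize the nutrient uptake b1
A_eq = np.vstack([S, unit_cost("b2", 1.0)])
b_eq = np.append(zeros, c_opt)
stage2 = linprog(unit_cost("b1", 1.0), A_eq=A_eq, b_eq=b_eq, bounds=bounds, method="highs")
print(c_opt, stage2.x)  # uptake drops to the amount needed for growth; v3 (waste) is 0
```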
