Transcript General

Computational Exploration
of Metabolic Networks with
Pathway Tools
Part 2: APIs & Examples
Randy Gobbel, Ph.D.
Bioinformatics Research Group
SRI International
[email protected]
http://BioCyc.org/
Computing with Pathway Tools: APIs
 Generic functions with a consistent naming scheme
 Basic frame access functions
 Built-in functions for analysis and global statistics
 Simultaneous access to multiple KBs
 Cross-species comparisons
 Specialized KBs: MetaCyc, SchemaBase
Computing with Pathway Tools: APIs
 PerlCyc interface
 Library of Perl functions for querying PGDBs via socket connection
 Database access functions: Select_Organism, All_Pathways
 Functions for performing inference / hardwired queries: Genes_Of_Reaction, Genes_Of_Pathway, Transcription_Unit_Transcription_Factors, Enzyme_P
 JavaCyc interface also in progress
 http://aracyc.stanford.edu/~mueller/perlcyc/
 Lisp API
 http://bioinformatics.ai.sri.com/ptools/ptools-resources.html
Perlcyc and Javacyc
 Interface to running Pathway Tools image through TCP
 Names are translated to Perl and Java conventions
 Object references are supported by means of unique frame names
Pathway Tools API Functions
 get_class_all_instances(Class)
 Returns the instances of Class
 Key Pathway Tools classes:
 Genetic-Elements
 Genes
 Proteins
 Polypeptides
 Protein-Complexes
 Pathways
 Reactions
 Compounds-And-Elements
 Enzymatic-Reactions
 Transcription-Units
 Promoters
 DNA-Binding-Sites
Pathway Tools API Functions
 Notation: Frame.Slot means a specified slot of a specified frame
 get_slot_value(Frame Slot)
 Returns first value of Frame.Slot
 get_slot_values(Frame Slot)
 Returns all values of Frame.Slot
 slot_has_value_p(Frame Slot)
 Returns true if Frame.Slot has at least one value
 member_slot_value_p(Frame Slot Value)
 Returns true if Value is one of the values of Frame.Slot
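To make the semantics of these four functions concrete, here is a minimal Python sketch over an in-memory dictionary. It is not the Pathway Tools implementation and does not talk to a running image; the `KB` dictionary is an assumed stand-in for a PGDB, populated with the R84-RXN slot values that appear in the flat-file example elsewhere in this talk.

```python
# Toy stand-in for a PGDB: frame -> slot -> list of values.
# Assumption: this dictionary only mimics the slot semantics described
# on the slide; it is not how Pathway Tools stores frames.
KB = {
    "R84-RXN": {
        "IN-PATHWAY": ["P122-PWY", "P107-PWY"],
        "LEFT": ["GLC-6-P", "NAD"],
        "RIGHT": ["6-P-GLUCONATE", "NADH", "PROTON"],
    },
}


def get_slot_values(frame, slot):
    """Return all values of frame.slot (an empty list if there are none)."""
    return list(KB.get(frame, {}).get(slot, []))


def get_slot_value(frame, slot):
    """Return the first value of frame.slot, or None if the slot is empty."""
    values = get_slot_values(frame, slot)
    return values[0] if values else None


def slot_has_value_p(frame, slot):
    """Return True if frame.slot has at least one value."""
    return len(get_slot_values(frame, slot)) > 0


def member_slot_value_p(frame, slot, value):
    """Return True if value is one of the values of frame.slot."""
    return value in get_slot_values(frame, slot)
```

For example, `get_slot_value("R84-RXN", "LEFT")` returns the first substrate, while `get_slot_values` returns both.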
Additional Pathway Tools Functions – Semantic Inference Layer
 Built-in functions encode commonly used queries that compute indirect DB relationships
 genes_of_pathway, substrates_of_pathway
 all_transcription_factors, regulon_of_protein
 See http://bioinformatics.ai.sri.com/ptools/ptoolsfns.html for more information
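As an illustration of what "indirect DB relationship" means, the sketch below computes the genes of a pathway by chaining slot lookups from pathway to reactions to enzymatic reactions to enzymes to genes. This is a guess at the shape of such a query, not the built-in genes_of_pathway itself. The frame names are invented, and the `enzyme` slot name is an assumption; the `reaction-list`, `enzymatic-reaction` and `gene` slot names follow the ones mentioned in this talk.

```python
# Hypothetical miniature KB: frame -> slot -> list of values.
# All frame IDs here are invented for illustration.
KB = {
    "PWY-X": {"reaction-list": ["RXN-1", "RXN-2"]},
    "RXN-1": {"enzymatic-reaction": ["ENZRXN-1"]},
    "RXN-2": {"enzymatic-reaction": ["ENZRXN-2", "ENZRXN-3"]},
    "ENZRXN-1": {"enzyme": ["PROT-A"]},  # 'enzyme' slot name is assumed
    "ENZRXN-2": {"enzyme": ["PROT-B"]},
    "ENZRXN-3": {"enzyme": ["PROT-A"]},
    "PROT-A": {"gene": ["geneA"]},
    "PROT-B": {"gene": ["geneB"]},
}


def values(frame, slot):
    """All values of frame.slot in the toy KB."""
    return KB.get(frame, {}).get(slot, [])


def genes_of_pathway_sketch(pathway):
    """Follow pathway -> reaction -> enzymatic reaction -> enzyme -> gene."""
    genes = []
    for rxn in values(pathway, "reaction-list"):
        for enzrxn in values(rxn, "enzymatic-reaction"):
            for enzyme in values(enzrxn, "enzyme"):
                for gene in values(enzyme, "gene"):
                    if gene not in genes:  # keep each gene once, in order found
                        genes.append(gene)
    return genes
```

The point of the built-in layer is that callers do not have to write this traversal themselves for each commonly used query.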
Computing with Pathway Tools: Flat Files
 Two file formats: tab-delimited, attribute-value
 One file for each format, each datatype
 Specification: http://bioinformatics.ai.sri.com/ptools/flatfile-format.html
 Examples:
 Pathways.col – Pathways and genes encoding enzymes
 Enzymes.col – Enzymes and reactions they catalyze
 Pathways.dat – Full data on each pathway
 Reactions.dat – Full data on each reaction

Example Flat File – Pathways.dat
UNIQUE-ID - P107-PWY
TYPES - Energy-Metabolism
COMMON-NAME - RuMP cycle and formaldehyde assimilation
REACTION-LIST - FORMATEDEHYDROG-RXN
REACTION-LIST - FORMALDEHYDE-DEHYDROGENASE-RXN
REACTION-LIST - 6PGLUCONDEHYDROG-RXN
REACTION-LIST - R84-RXN
REACTION-LIST - PGLUCISOM-RXN
REACTION-LIST - R12-RXN
REACTION-LIST - R10-RXN
SYNONYMS - ribulose-monophosphate cycle
SYNONYMS - formaldehyde oxidation
//
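A record like the one above can be read with a few lines of code. The following Python sketch assumes only the two conventions visible in the example: each line is `ATTRIBUTE - value`, and a line containing `//` ends a record. The full format specification may define more than this (the sketch simply skips lines it does not recognize), and the function name `parse_dat` is made up for illustration.

```python
def parse_dat(text):
    """Parse attribute-value records into a list of dicts.

    Each dict maps an attribute name to the list of its values,
    because attributes such as REACTION-LIST repeat within a record.
    """
    records = []
    current = {}
    for line in text.splitlines():
        line = line.rstrip()
        if line == "//":  # end-of-record marker
            if current:
                records.append(current)
            current = {}
            continue
        attr, sep, value = line.partition(" - ")
        if not sep:  # not an "ATTRIBUTE - value" line; skip it
            continue
        current.setdefault(attr.strip(), []).append(value.strip())
    if current:  # tolerate a final record with no closing //
        records.append(current)
    return records


sample = """UNIQUE-ID - P107-PWY
TYPES - Energy-Metabolism
COMMON-NAME - RuMP cycle and formaldehyde assimilation
REACTION-LIST - R84-RXN
REACTION-LIST - PGLUCISOM-RXN
SYNONYMS - ribulose-monophosphate cycle
//
"""
pathways = parse_dat(sample)
```

Storing every attribute as a list keeps multi-valued slots and single-valued slots uniform, which mirrors the get_slot_values style of access.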
Example Flat File – Reactions.dat
UNIQUE-ID - R84-RXN
TYPES - EC-1.1.1
EC-NUMBER - 1.1.1.
IN-PATHWAY - P122-PWY
IN-PATHWAY - P107-PWY
LEFT - GLC-6-P
LEFT - NAD
OFFICIAL-EC? - NO
RIGHT - 6-P-GLUCONATE
RIGHT - NADH
RIGHT - PROTON
//
Example Flat File – Compounds.dat
UNIQUE-ID - GLC-6-P
TYPES - Carbohydrate-Derivatives
COMMON-NAME - glucose-6-phosphate
CAS-REGISTRY-NUMBERS - 56-73-5
CHEMICAL-FORMULA - (C 6)
CHEMICAL-FORMULA - (H 13)
CHEMICAL-FORMULA - (O 9)
CHEMICAL-FORMULA - (P 1)
MOLECULAR-WEIGHT - 260.137
SYNONYMS - D-glucose-6-P
SYNONYMS - glucose-6-P
SYNONYMS - α-D-glucose-6-phosphate
SYNONYMS - α-D-glucose-6-P
SYNONYMS - D-glucose-6-phosphate
//
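The CHEMICAL-FORMULA entries in this record are enough to recompute the molecular weight as a sanity check. The sketch below is an illustration only: the `(ELEMENT COUNT)` pattern is inferred from the four entries shown, the atomic weights are standard values rounded to three decimals, and the function name is invented.

```python
import re

# Standard atomic weights, rounded; only the elements needed here.
ATOMIC_WEIGHT = {"C": 12.011, "H": 1.008, "O": 15.999, "P": 30.974}


def formula_weight(entries):
    """Sum atomic weights over CHEMICAL-FORMULA entries like '(C 6)'."""
    total = 0.0
    for entry in entries:
        match = re.fullmatch(r"\((\w+) (\d+)\)", entry.strip())
        if match is None:
            raise ValueError(f"unrecognized formula entry: {entry!r}")
        element, count = match.group(1), int(match.group(2))
        total += ATOMIC_WEIGHT[element] * count
    return total


# The four CHEMICAL-FORMULA values from the GLC-6-P record.
glc6p_formula = ["(C 6)", "(H 13)", "(O 9)", "(P 1)"]
glc6p_weight = formula_weight(glc6p_formula)
```

The result agrees with the MOLECULAR-WEIGHT value in the record to within rounding of the atomic weights.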
Bioinformatics Results: Algorithms
 Query and visualization environment for genome and pathway information
 PathoLogic algorithm predicts the metabolic network of an organism from its genome
 Algorithm for global characterization of a metabolic network
 Algorithms under development for qualitative modeling of the cell
The Pathway Tools KB as a "virtual cell"
 Detailed representation of proteins, including subunits
 Protein complexes and modifications
 Links from genome, through proteins, to pathways and superpathways
Computing with the Metabolic Network
 Comparative analysis of metabolic networks
 Visualization of expression data
 Correlation of metabolism and transport
 Connectivity analysis of metabolic network
 Forward propagation of metabolites
 Verification of known growth media with metabolic network
Computational Exploration of PGDBs
 Infer metabolic network from genome
 Bioinformatics 18:705 2002
 Global properties of the metabolic network
 Genome Research 10:568 2000
 Global properties of the genetic network
 Comparison of whole metabolic networks
 Consistency of a PGDB with respect to known growth-media requirements
 Search for gaps in metabolic network
 Pacific Symp Biocomputing 2001:471
Example Studies
 Relationship of protein subunits to gene positions
 Global properties of the E. coli metabolic network
 Reactions catalyzed by more than one enzyme
 Enzymes that catalyze more than one reaction
 Reactions participating in more than one pathway
 Automatic detection of intersection points in the metabolic network
 Nutrient analyses
 Forward propagation: Given a set of nutrients, what compounds will be produced by the metabolic network?
 Backtracking: Given a forward propagation result, and a set of essential compounds that are not included in that result, what precursors must be supplied to produce those compounds?
 Operon prediction
Protein subunits and linked genes
 Question: are protein subunits coded by neighboring genes?
 Proteins are linked to genes, gene positions are recorded in the KB
 Procedure
 Fetch all protein complexes
 Subunits are stored in the ‘components’ slot
 Each component has a ‘gene’ slot
 Genes have ‘left-end-position’ and ‘right-end-position’ slots
 Results
 Protein subunits of >90% of heteromeric enzymes are encoded by neighboring genes
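The procedure above can be sketched in Python over a toy KB. Everything in the data is invented (frame names, coordinates), and "neighboring" is given one possible reading as an assumption: the subunit genes occupy consecutive positions when all genes are ordered by left-end-position. The talk does not state the exact criterion that was used.

```python
# Invented miniature KB using the slot names from the procedure.
KB = {
    "CPLX-1": {"components": ["protA", "protB"]},
    "CPLX-2": {"components": ["protA", "protD"]},
    "protA": {"gene": ["gA"]},
    "protB": {"gene": ["gB"]},
    "protD": {"gene": ["gD"]},
    "gA": {"left-end-position": [100], "right-end-position": [900]},
    "gB": {"left-end-position": [950], "right-end-position": [1800]},
    "gC": {"left-end-position": [2000], "right-end-position": [2600]},
    "gD": {"left-end-position": [5000], "right-end-position": [5900]},
}
COMPLEXES = ["CPLX-1", "CPLX-2"]
GENES = ["gA", "gB", "gC", "gD"]


def genes_of_complex(cplx):
    """Collect the genes of all components of a protein complex."""
    found = set()
    for component in KB[cplx]["components"]:
        found.update(KB[component]["gene"])
    return found


def encoded_by_neighbors(cplx):
    """True if the complex has >1 gene and they are consecutive on the genome.

    Assumption: 'neighboring' means no other gene lies between them in the
    ordering by left-end-position.
    """
    order = sorted(GENES, key=lambda g: KB[g]["left-end-position"][0])
    ranks = sorted(order.index(g) for g in genes_of_complex(cplx))
    return len(ranks) > 1 and ranks[-1] - ranks[0] == len(ranks) - 1


neighbor_fraction = sum(encoded_by_neighbors(c) for c in COMPLEXES) / len(COMPLEXES)
```

In this toy data CPLX-1 is encoded by adjacent genes and CPLX-2 is not, so half of the complexes qualify.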
Global properties: How many reactions are catalyzed by more than one enzyme?
 Procedure
 get_class_all_instances(‘Reactions’)
 We are interested only in reactions with at least one value in their ‘enzymatic-reaction’ slot
 result = reactions with more than one value for their ‘enzymatic-reaction’ slot
 Results
 About 10% of reactions are catalyzed by more than one enzyme
 Two classes of multi-enzyme reactions
 Homologous enzymes
 “Easy” reactions
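The procedure on this slide amounts to two filters over the Reactions class. A minimal Python sketch follows, with toy stand-ins for get_class_all_instances and get_slot_values; the frame names and data are invented, and the real functions run against a live PGDB rather than dictionaries.

```python
# Invented toy data: class -> instances, and frame -> slot -> values.
INSTANCES = {"Reactions": ["RXN-A", "RXN-B", "RXN-C"]}
KB = {
    "RXN-A": {"enzymatic-reaction": ["ENZRXN-1"]},
    "RXN-B": {"enzymatic-reaction": ["ENZRXN-2", "ENZRXN-3"]},
    "RXN-C": {},  # no known enzyme; excluded from the denominator
}


def get_class_all_instances(cls):
    """Toy stand-in: return the instances of a class."""
    return list(INSTANCES.get(cls, []))


def get_slot_values(frame, slot):
    """Toy stand-in: return all values of frame.slot."""
    return list(KB.get(frame, {}).get(slot, []))


reactions = get_class_all_instances("Reactions")
# Step 1: keep only reactions with at least one enzymatic-reaction value.
with_enzyme = [r for r in reactions if get_slot_values(r, "enzymatic-reaction")]
# Step 2: of those, keep reactions with more than one value.
multi_enzyme = [r for r in with_enzyme
                if len(get_slot_values(r, "enzymatic-reaction")) > 1]
multi_fraction = len(multi_enzyme) / len(with_enzyme)
```

The same pattern, applied to another class and slot, covers the other global-property queries in this part of the talk.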
Global properties: Multifunctional enzymes (how many enzymes catalyze more than one reaction?)
 Procedure
 get_class_all_instances(‘Proteins’)
 result = proteins with more than one value in the ‘catalyzes’ slot
 Results
 100 out of 607 enzymes catalyze multiple reactions
 This is significantly more than predicted by genome sequencing projects

Global properties: Reactions in multiple pathways
 Procedure
 get_class_all_instances(‘Reactions’)
 result = reactions with more than one value in the ‘in-pathway’ slot
 Significance
 Reactions that appear in multiple pathways correspond to intersection points in the metabolic network
 Could be used to identify candidate reactions for drug targets
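To show how intersection points fall out of the in-pathway data, here is a small Python sketch. The R84-RXN entry uses the two IN-PATHWAY values from the Reactions.dat example in this talk; the second reaction is invented, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# reaction -> pathways it participates in.
IN_PATHWAY = {
    "R84-RXN": ["P122-PWY", "P107-PWY"],  # from the flat-file example
    "RXN-SOLO": ["P107-PWY"],             # invented single-pathway reaction
}


def intersection_points(in_pathway):
    """Reactions that belong to more than one pathway."""
    return {rxn: pws for rxn, pws in in_pathway.items() if len(set(pws)) > 1}


def pathway_links(in_pathway):
    """Map each pair of pathways to the reactions they share."""
    links = defaultdict(list)
    for rxn, pws in intersection_points(in_pathway).items():
        for pair in combinations(sorted(set(pws)), 2):
            links[pair].append(rxn)
    return dict(links)


shared = pathway_links(IN_PATHWAY)
```

Grouping by pathway pair rather than by reaction makes it easy to see which pathways are connected and through which reactions.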
Metabolic Overview Queries
 Species comparison
 Highlight reactions that are
 Shared/not-shared with
 Any-one/All-of
 A specified set of species
 Overlay expression data
 Absolute or relative expression levels
 Reaction colors reflect expression level
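The species-comparison options combine into four set queries. The Python sketch below shows one way to read them; the interpretation is an assumption (for example, "not-shared with all-of" is taken to mean absent from every species in the set), the species names and reaction IDs are invented, and `highlight` is a hypothetical function, not part of Pathway Tools.

```python
def highlight(reference, others, shared=True, quantifier="any"):
    """Select reactions of the reference organism to highlight.

    reference:  set of reaction IDs in the organism being displayed
    others:     dict mapping species name -> set of its reaction IDs
    shared:     True for 'shared', False for 'not-shared'
    quantifier: 'any' for any-one of the species, 'all' for all-of them
    """
    combine = any if quantifier == "any" else all
    selected = set()
    for rxn in reference:
        # For each comparison species, does the reaction match the
        # requested shared / not-shared condition?
        matches = [(rxn in rxns) == shared for rxns in others.values()]
        if combine(matches):
            selected.add(rxn)
    return selected


reference = {"R1", "R2", "R3"}
others = {"species-1": {"R1", "R2"}, "species-2": {"R1"}}
```

With this data, R1 is shared with all species, R2 with only one, and R3 with none.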
C. crescentus Cell Cycle Gene Expression
Global Consistency Checking of Biochemical Network
 Given:
 A PGDB for an organism
 A set of initial metabolites
 Infer:
 What set of products can be synthesized by the small-molecule metabolism of the organism
 Can known growth medium yield known essential compounds?
 Pacific Symposium on Biocomputing p471 2001
Algorithm: Forward Propagation
[Figure: forward-propagation loop. Labels: Nutrient set, Metabolite set, PGDB reaction pool, Reactants, “Fire” reactions, Products]
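The loop in the diagram can be sketched as a fixed-point computation: start from the nutrient set, fire every reaction whose reactants are all available, add its products to the metabolite set, and repeat until nothing new fires. The Python below is a minimal sketch under stated assumptions, not the published algorithm: reactions are treated as running left to right only, and all reactions other than R84-RXN (taken from the flat-file example) are invented.

```python
def forward_propagate(nutrients, reactions):
    """Return (metabolites reachable from nutrients, names of fired reactions).

    reactions maps a reaction name to a (reactants, products) pair.
    Assumption: each reaction is used in the left-to-right direction only.
    """
    metabolites = set(nutrients)
    fired = set()
    changed = True
    while changed:  # repeat until a full pass fires nothing new
        changed = False
        for name, (reactants, products) in reactions.items():
            if name not in fired and set(reactants) <= metabolites:
                fired.add(name)
                metabolites |= set(products)
                changed = True
    return metabolites, fired


REACTION_POOL = {
    # LEFT / RIGHT values of R84-RXN from the Reactions.dat example.
    "R84-RXN": (["GLC-6-P", "NAD"], ["6-P-GLUCONATE", "NADH", "PROTON"]),
    # Invented reactions to show chaining and a blocked reaction.
    "RXN-NEXT": (["6-P-GLUCONATE"], ["CPD-Y"]),
    "RXN-BLOCKED": (["CPD-Z"], ["CPD-W"]),
}
produced, fired = forward_propagate({"GLC-6-P", "NAD"}, REACTION_POOL)
```

Comparing `produced` against a list of essential compounds gives the produced / not-produced split, and `fired` against the reaction pool gives the fraction of reactions fired.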
Results
 Phase I: Forward propagation
 21 initial compounds yielded only half of 38 essential compounds for E. coli
 Phase II: Manually identify
 Bugs in EcoCyc (e.g., two objects for tryptophan)
 Missing initial protein substrates (e.g., ACP)
 Missing pathways in EcoCyc
 Phase III: Forward propagation with 11 more initial metabolites
 Yielded all 38 essential compounds
Initial Metabolites (Total: 21 compounds)
Nutrients (8) (M61 Minimal growth medium): H+, Fe2+, Mg2+, K+, NH3, SO4 2-, PO4 2-, Glucose
Nutrients (10) (Growth conditions): Water, Oxygen, Trace elements (Mn2+, Co2+, Mo2+, Ca2+, Zn2+, Cd2+, Ni2+, Cu2+)
Bootstrap Compounds (3): ATP, NADP, CoA
Nutrient-Related Analysis: Validation of the EcoCyc Database
Results on EcoCyc:
Phase I:
• Essential compounds
• produced: 19
• not produced: 19
• Total compounds
• produced: (28%)
• Reactions
• Fired: (31%)
Missing Essential Compounds Due To
 Bugs in EcoCyc
 Narrow conceptualization of the problem
 Protein substrates
 Incomplete biochemical knowledge
Nutrient-Related Analysis: Validation of the EcoCyc Database
Results on EcoCyc:
Phase II (After adding 11 extra metabolites):
• Essential compounds
• produced: 38
• not produced: 0
• Total compounds
• produced: (49%)
• not produced: (51%)
• Reactions
• Fired: (58%)
• Not fired: (42%)
Operon Prediction
 Based on the method of Moreno-Hagelsieb et al., Bioinformatics 18 Suppl. 1 (2002)
 Distance between genes
 Functional classification
 Correctly predicts 75% of transcription units, 65% of operons
 Additional information available in PGDB
 Pathways
 Protein complexes
 Transporters
 Improved prediction performance: 80% of transcription units, 69% of operons
 Detailed paper in preparation
Visualization of Genetic Network
 Operon display window
 Transcription factor display window
 Highlight regulon on Overview diagram
 Paint expression data onto Overview diagram
 Database adapter mechanism: MAGE-ML intermediate form
 Adapter defined for SMD
 Animation
 User-specified mapping of color ranges
 Import of SAM files (next release)
 List of significantly +/- genes
 Display full genetic network (later release)
Acknowledgements
SRI
 Peter Karp, Suzanne Paley, Pedro Romero, John Pick, Randy Gobbel, Cindy Krieger, Martha Arnaud
EcoCyc Project
 Julio Collado-Vides, Ian Paulsen, Monica Riley, Milton Saier
MetaCyc Project
 Sue Rhee, Lukas Mueller, Peifen Zhang, Chris Somerville
Stanford
 Gary Schoolnik, Harley McAdams, Lucy Shapiro, Russ Altman, Iwei Yeh
Funding sources:
 NIH National Center for Research Resources
 NIH National Institute of General Medical Sciences
 NIH National Human Genome Research Institute
 Department of Energy Microbial Cell Project
 DARPA BioSpice, UPC
BioCyc.org