Computational Exploration of Metabolic Networks with Pathway Tools
Part 2: APIs & Examples
Randy Gobbel, Ph.D.
Bioinformatics Research Group
SRI International
http://BioCyc.org/
Computing with Pathway Tools: APIs
Generic functions with a consistent naming scheme
Basic frame access functions
Built-in functions for analysis and global statistics
Simultaneous access to multiple KBs
Cross-species comparisons
Specialized KBs
MetaCyc
SchemaBase
Computing with Pathway Tools: APIs
PerlCyc interface
Library of Perl functions for querying PGDBs via socket connection
Database access functions
Select_Organism, All_Pathways
Functions for performing inference / hardwired queries
Genes_Of_Reaction, Genes_Of_Pathway
Transcription_Unit_Transcription_Factors
Enzyme_P
JavaCyc interface also in progress
http://aracyc.stanford.edu/~mueller/perlcyc/
Lisp API
http://bioinformatics.ai.sri.com/ptools/ptools-resources.html
PerlCyc and JavaCyc
Interface to a running Pathway Tools image through TCP
Names are translated to Perl and Java conventions
Object references are supported by means of unique frame names
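The slide does not list the exact translation rules. The following minimal sketch makes three assumptions. First, Lisp hyphens become underscores for Perl, matching names such as get_slot_value. Second, Java uses lowerCamelCase. Third, a trailing ? predicate marker is treated like -p, which the name Enzyme_P suggests. lisp_to_perl and lisp_to_java are hypothetical helpers, not functions exported by PerlCyc or JavaCyc.

```python
def _normalize_predicate(name: str) -> str:
    # Assumption: a trailing "?" predicate marker is treated like "-p".
    return name[:-1] + "-p" if name.endswith("?") else name


def lisp_to_perl(name: str) -> str:
    """Hyphenated Lisp name -> Perl-style snake_case (get-slot-value -> get_slot_value)."""
    return _normalize_predicate(name).replace("-", "_")


def lisp_to_java(name: str) -> str:
    """Hyphenated Lisp name -> Java-style lowerCamelCase (get-slot-value -> getSlotValue)."""
    first, *rest = _normalize_predicate(name).split("-")
    return first + "".join(part.capitalize() for part in rest)


# Hypothetical inputs, printed as: Lisp name, Perl name, Java name.
for lisp_name in ["get-slot-value", "genes-of-pathway", "enzyme?"]:
    print(lisp_name, lisp_to_perl(lisp_name), lisp_to_java(lisp_name))
```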
Pathway Tools API Functions
get_class_all_instances(Class)
Returns the instances of Class
Key Pathway Tools classes:
Genetic-Elements
Genes
Proteins
Polypeptides
Protein-Complexes
Pathways
Reactions
Compounds-And-Elements
Enzymatic-Reactions
Transcription-Units
Promoters
DNA-Binding-Sites
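This sketch assumes that get_class_all_instances returns instances of a class's subclasses as well as its direct instances. It also assumes that Polypeptides and Protein-Complexes sit below Proteins in the class hierarchy. The schema fragment, SUBCLASSES, DIRECT_INSTANCES, and the frame names are hypothetical. This is a toy in-memory model, not the Pathway Tools implementation.

```python
# Hypothetical schema fragment: class -> direct subclasses.
SUBCLASSES = {
    "Proteins": ["Polypeptides", "Protein-Complexes"],
    "Polypeptides": [],
    "Protein-Complexes": [],
}

# Hypothetical direct instances of each class.
DIRECT_INSTANCES = {
    "Proteins": [],
    "Polypeptides": ["POLY-A", "POLY-B"],
    "Protein-Complexes": ["CPLX-AB"],
}


def get_class_all_instances(cls):
    """Return direct instances of cls plus, recursively, instances of its subclasses."""
    result = list(DIRECT_INSTANCES.get(cls, []))
    for sub in SUBCLASSES.get(cls, []):
        result.extend(get_class_all_instances(sub))
    return result


print(get_class_all_instances("Proteins"))  # ['POLY-A', 'POLY-B', 'CPLX-AB']
```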
Pathway Tools API Functions
Notation
Frame.Slot means a specified slot of a specified frame
get_slot_value(Frame Slot)
Returns first value of Frame.Slot
get_slot_values(Frame Slot)
Returns all values of Frame.Slot
slot_has_value_p(Frame Slot)
Returns true if Frame.Slot has at least one value
member_slot_value_p(Frame Slot Value)
Returns true if Value is one of the values of Frame.Slot
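The following sketch makes the notation concrete. It models each frame as a dictionary that maps slot names to lists of values, populated with the R84-RXN slots from the Reactions.dat example in this section. The four functions mirror the semantics listed above. Returning None for an empty slot is an assumption of this sketch.

```python
# Each frame maps slot names to lists of values (data from the R84-RXN example).
KB = {
    "R84-RXN": {
        "LEFT": ["GLC-6-P", "NAD"],
        "RIGHT": ["6-P-GLUCONATE", "NADH", "PROTON"],
        "IN-PATHWAY": ["P122-PWY", "P107-PWY"],
    },
}


def get_slot_values(frame, slot):
    """All values of Frame.Slot (empty list if the slot has no values)."""
    return list(KB.get(frame, {}).get(slot, []))


def get_slot_value(frame, slot):
    """First value of Frame.Slot, or None if it has no values (assumed convention)."""
    values = get_slot_values(frame, slot)
    return values[0] if values else None


def slot_has_value_p(frame, slot):
    """True if Frame.Slot has at least one value."""
    return len(get_slot_values(frame, slot)) > 0


def member_slot_value_p(frame, slot, value):
    """True if value is one of the values of Frame.Slot."""
    return value in get_slot_values(frame, slot)


print(get_slot_value("R84-RXN", "LEFT"))  # GLC-6-P
```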
Additional Pathway Tools Functions: Semantic Inference Layer
Built-in functions encode commonly used queries that compute indirect DB relationships
genes_of_pathway, substrates_of_pathway
all_transcription_factors, regulon_of_protein
See http://bioinformatics.ai.sri.com/ptools/ptoolsfns.html for more information
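genes_of_pathway is an example of an indirect relationship, because no single slot links a pathway to its genes. The sketch below follows one plausible chain of slots on a hypothetical mini-KB: pathway reaction list, then enzymatic reactions, then enzymes, then protein components, then genes. REACTION-LIST, ENZYMATIC-REACTION, COMPONENTS and GENE appear elsewhere in these slides. The ENZYME slot is an assumption. The real built-in may handle cases this sketch ignores, such as modified proteins.

```python
# Hypothetical mini-KB: frame -> slot -> list of values.
KB = {
    "PWY-1": {"REACTION-LIST": ["RXN-1", "RXN-2"]},
    "RXN-1": {"ENZYMATIC-REACTION": ["ENZRXN-1"]},
    "RXN-2": {"ENZYMATIC-REACTION": ["ENZRXN-2"]},
    "ENZRXN-1": {"ENZYME": ["CPLX-1"]},
    "ENZRXN-2": {"ENZYME": ["MONO-3"]},
    "CPLX-1": {"COMPONENTS": ["MONO-1", "MONO-2"]},
    "MONO-1": {"GENE": ["geneA"]},
    "MONO-2": {"GENE": ["geneB"]},
    "MONO-3": {"GENE": ["geneC"]},
}


def values(frame, slot):
    return KB.get(frame, {}).get(slot, [])


def genes_of_protein(protein):
    """Genes in the protein's own GENE slot, plus those of its components (recursively)."""
    genes = list(values(protein, "GENE"))
    for sub in values(protein, "COMPONENTS"):
        genes.extend(genes_of_protein(sub))
    return genes


def genes_of_pathway(pathway):
    """Pathway -> reactions -> enzymatic reactions -> enzymes -> genes, deduplicated in order."""
    result = []
    for rxn in values(pathway, "REACTION-LIST"):
        for enzrxn in values(rxn, "ENZYMATIC-REACTION"):
            for enzyme in values(enzrxn, "ENZYME"):
                for gene in genes_of_protein(enzyme):
                    if gene not in result:
                        result.append(gene)
    return result


print(genes_of_pathway("PWY-1"))  # ['geneA', 'geneB', 'geneC']
```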
Computing with Pathway Tools: Flat Files
Two file formats: tab-delimited, attribute-value
One file per datatype in each format
Specification:
http://bioinformatics.ai.sri.com/ptools/flatfile-format.html
Examples:
Pathways.col – Pathways and genes encoding enzymes
Enzymes.col – Enzymes and reactions they catalyze
Pathways.dat – Full data on each pathway
Reactions.dat – Full data on each reaction
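The tab-delimited .col files can be read with Python's standard csv module. This minimal sketch makes three assumptions. Lines starting with # are comments. The first remaining line holds the tab-separated column names. Column names may repeat, so each column maps to a list of its non-empty values. The sample text is invented and not copied from a real Pathways.col.

```python
import csv


def parse_col(text):
    """Parse tab-delimited .col text into one dict (column -> list of values) per row."""
    lines = [ln for ln in text.splitlines() if ln.strip() and not ln.startswith("#")]
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)  # assumed: first non-comment line names the columns
    records = []
    for row in reader:
        record = {}
        for name, value in zip(header, row):
            if value:
                # Repeated column names accumulate into one list.
                record.setdefault(name, []).append(value)
        records.append(record)
    return records


sample = (
    "# hypothetical comment line\n"
    "UNIQUE-ID\tNAME\tGENE-NAME\tGENE-NAME\n"
    "PWY-1\tdemo pathway\tgeneA\tgeneB\n"
)
print(parse_col(sample))
```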
Example Flat File: Pathways.dat
UNIQUE-ID - P107-PWY
TYPES - Energy-Metabolism
COMMON-NAME - RuMP cycle and formaldehyde assimilation
REACTION-LIST - FORMATEDEHYDROG-RXN
REACTION-LIST - FORMALDEHYDE-DEHYDROGENASE-RXN
REACTION-LIST - 6PGLUCONDEHYDROG-RXN
REACTION-LIST - R84-RXN
REACTION-LIST - PGLUCISOM-RXN
REACTION-LIST - R12-RXN
REACTION-LIST - R10-RXN
SYNONYMS - ribulose-monophosphate cycle
SYNONYMS - formaldehyde oxidation
//
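Below is a minimal reader for the attribute-value format shown above. It assumes each line has the form ATTRIBUTE - value and that a record ends with //. It further assumes that lines beginning with # are comments and that a line beginning with a single / continues the previous value. The comment and continuation conventions are assumptions here; check the flat-file specification linked earlier before relying on them.

```python
def parse_dat(text):
    """Parse attribute-value text into records (dict: attribute -> list of values)."""
    records, current, last_attr = [], {}, None
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line or line.startswith("#"):
            continue  # skip blank and (assumed) comment lines
        if line == "//":
            # End of record.
            if current:
                records.append(current)
            current, last_attr = {}, None
        elif line.startswith("/") and last_attr:
            # Assumed continuation line: extend the previous value.
            current[last_attr][-1] += " " + line[1:]
        else:
            attr, sep, value = line.partition(" - ")
            if sep:
                current.setdefault(attr, []).append(value)
                last_attr = attr
    if current:  # tolerate a missing final "//"
        records.append(current)
    return records


# An excerpt of the P107-PWY record above.
sample = "\n".join([
    "UNIQUE-ID - P107-PWY",
    "TYPES - Energy-Metabolism",
    "COMMON-NAME - RuMP cycle and formaldehyde assimilation",
    "REACTION-LIST - FORMATEDEHYDROG-RXN",
    "REACTION-LIST - R84-RXN",
    "SYNONYMS - ribulose-monophosphate cycle",
    "//",
])
print(parse_dat(sample)[0]["REACTION-LIST"])
```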
Example Flat File: Reactions.dat
UNIQUE-ID - R84-RXN
TYPES - EC-1.1.1
EC-NUMBER - 1.1.1.
IN-PATHWAY - P122-PWY
IN-PATHWAY - P107-PWY
LEFT - GLC-6-P
LEFT - NAD
OFFICIAL-EC? - NO
RIGHT - 6-P-GLUCONATE
RIGHT - NADH
RIGHT - PROTON
//
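Once a Reactions.dat record is parsed into attribute lists, its equation can be rebuilt from the LEFT and RIGHT values. The record excerpt above shows no stoichiometry or direction, so this sketch treats every coefficient as 1 and writes =. reaction_equation is a hypothetical helper.

```python
def reaction_equation(record):
    """Render the LEFT and RIGHT values of a parsed record as an equation string."""
    left = " + ".join(record.get("LEFT", []))
    right = " + ".join(record.get("RIGHT", []))
    return f"{left} = {right}"


r84 = {
    "UNIQUE-ID": ["R84-RXN"],
    "LEFT": ["GLC-6-P", "NAD"],
    "RIGHT": ["6-P-GLUCONATE", "NADH", "PROTON"],
}
print(reaction_equation(r84))  # GLC-6-P + NAD = 6-P-GLUCONATE + NADH + PROTON
```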
Example Flat File: Compounds.dat
UNIQUE-ID - GLC-6-P
TYPES - Carbohydrate-Derivatives
COMMON-NAME - glucose-6-phosphate
CAS-REGISTRY-NUMBERS - 56-73-5
CHEMICAL-FORMULA - (C 6)
CHEMICAL-FORMULA - (H 13)
CHEMICAL-FORMULA - (O 9)
CHEMICAL-FORMULA - (P 1)
MOLECULAR-WEIGHT - 260.137
SYNONYMS - D-glucose-6-P
SYNONYMS - glucose-6-P
SYNONYMS - α-D-glucose-6-phosphate
SYNONYMS - α-D-glucose-6-P
SYNONYMS - D-glucose-6-phosphate
//
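The CHEMICAL-FORMULA values can be checked against MOLECULAR-WEIGHT with a short calculation. This sketch uses IUPAC conventional atomic weights rounded to three decimals (C 12.011, H 1.008, O 15.999, P 30.974). The sum 6(12.011) + 13(1.008) + 9(15.999) + 30.974 ≈ 260.135 falls within about 0.002 of the stored 260.137. The residual presumably reflects a different atomic-weight table.

```python
# IUPAC conventional atomic weights, rounded (only the elements needed here).
ATOMIC_WEIGHTS = {"C": 12.011, "H": 1.008, "O": 15.999, "P": 30.974}


def parse_formula_value(value):
    """Turn a CHEMICAL-FORMULA value such as '(C 6)' into ('C', 6)."""
    element, count = value.strip("()").split()
    return element, int(count)


def molecular_weight(formula_values):
    """Sum atomic weight times atom count over all CHEMICAL-FORMULA values."""
    total = 0.0
    for value in formula_values:
        element, count = parse_formula_value(value)
        total += ATOMIC_WEIGHTS[element] * count
    return total


# CHEMICAL-FORMULA values of GLC-6-P from the record above.
glc6p_formula = ["(C 6)", "(H 13)", "(O 9)", "(P 1)"]
print(round(molecular_weight(glc6p_formula), 3))
```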
Bioinformatics Results: Algorithms
Query and visualization environment for genome and pathway information
PathoLogic algorithm predicts the metabolic network of an organism from its genome
Algorithm for global characterization of a metabolic network
Algorithms under development for qualitative modeling of the cell
The Pathway Tools KB as a "virtual cell"
Detailed representation of proteins, including subunits
Protein complexes and modifications
Links from genome, through proteins, to pathways and superpathways
Computing with the Metabolic Network
Comparative analysis of metabolic networks
Visualization of expression data
Correlation of metabolism and transport
Connectivity analysis of the metabolic network
Forward propagation of metabolites
Verification of known growth media against the metabolic network
Computational Exploration of PGDBs
Infer metabolic network from genome
Bioinformatics 18:705 2002
Global properties of the metabolic network
Genome Research 10:568 2000
Global properties of the genetic network
Comparison of whole metabolic networks
Consistency of a PGDB with respect to known growth-media requirements
Search for gaps in metabolic network
Pacific Symp Biocomputing 2001:471
Example Studies
Relationship of protein subunits to gene positions
Global properties of the E. coli metabolic network
Reactions catalyzed by more than one enzyme
Enzymes that catalyze more than one reaction
Reactions participating in more than one pathway
Automatic detection of intersection points in the metabolic network
Nutrient analyses
Forward propagation: Given a set of nutrients, what compounds will be
produced by the metabolic network?
Backtracking: Given a forward propagation result, and a set of essential
compounds that are not included in that result, what precursors must be
supplied to produce those compounds?
Operon prediction
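The backtracking analysis in the list above can be approximated with a backward search. The sketch below is a simple illustrative heuristic, not the method described in these slides. Starting from a missing essential compound, it walks back through every reaction that produces it. It collects upstream compounds that forward propagation did not produce and that no reaction in the pool makes. Because it explores all producing reactions, it may over-report candidate precursors.

```python
def candidate_precursors(target, produced, reactions):
    """Backward search from an unproduced target compound.

    reactions: list of (reactants, products) pairs of sets.
    Returns upstream compounds that are not in `produced` and are not made by
    any reaction in the pool: candidates to supply as extra nutrients.
    """
    producers = {}
    for reactants, products in reactions:
        for p in products:
            producers.setdefault(p, []).append(reactants)
    candidates, seen, stack = set(), set(), [target]
    while stack:
        compound = stack.pop()
        if compound in seen or compound in produced:
            continue
        seen.add(compound)
        if compound not in producers:
            candidates.add(compound)  # nothing in the pool makes it
            continue
        for reactants in producers[compound]:
            stack.extend(reactants)  # explore every producing reaction
    return candidates


# Hypothetical pool: A + X -> B, B + Y -> T; forward propagation produced only A.
reactions = [({"A", "X"}, {"B"}), ({"B", "Y"}, {"T"})]
print(sorted(candidate_precursors("T", {"A"}, reactions)))  # ['X', 'Y']
```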
Protein subunits and linked genes
Question: are protein subunits coded by neighboring genes?
Proteins are linked to genes, and gene positions are recorded in the KB
Procedure
Fetch all protein complexes
Subunits are stored in the ‘components’ slot
Each component has a ‘gene’ slot
Genes have ‘left-end-position’ and ‘right-end-position’ slots
Results
Protein subunits of >90% of heteromeric enzymes are encoded by neighboring genes
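The procedure above can be sketched in a few lines. The slide does not define the criterion, so "neighboring" is taken here to mean that no other gene lies between the subunit genes in left-end-position order. The sketch ignores circular-genome wraparound and complexes nested inside complexes. All data are invented.

```python
def genes_are_neighbors(genes, positions):
    """True if the genes occupy consecutive ranks in left-end-position order.

    positions: gene -> (left_end_position, right_end_position) for all genes.
    """
    order = sorted(positions, key=lambda g: positions[g][0])
    rank = {g: i for i, g in enumerate(order)}
    ranks = sorted(rank[g] for g in set(genes))
    return all(b - a == 1 for a, b in zip(ranks, ranks[1:]))


def fraction_with_neighboring_subunits(complexes, gene_of, positions):
    """Fraction of heteromeric complexes (>= 2 distinct genes) with neighboring subunit genes."""
    gene_sets = {c: {gene_of[s] for s in subs} for c, subs in complexes.items()}
    hetero = {c: genes for c, genes in gene_sets.items() if len(genes) > 1}
    hits = sum(genes_are_neighbors(genes, positions) for genes in hetero.values())
    return hits / len(hetero)


# Hypothetical data: 'components' of each complex, 'gene' of each subunit, gene positions.
complexes = {"CPLX-1": ["MONO-1", "MONO-2"], "CPLX-2": ["MONO-1", "MONO-3"], "CPLX-3": ["MONO-2", "MONO-2"]}
gene_of = {"MONO-1": "geneA", "MONO-2": "geneB", "MONO-3": "geneD"}
positions = {"geneA": (100, 900), "geneB": (950, 1800), "geneC": (2000, 2600), "geneD": (3000, 3500)}
print(fraction_with_neighboring_subunits(complexes, gene_of, positions))  # 0.5
```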
Global properties: How many reactions are catalyzed by more than one enzyme?
Procedure
get_class_all_instances(‘Reactions’)
We are interested only in reactions with at least one value in their ‘enzymatic-reaction’ slot
result = reactions with more than one value for their ‘enzymatic-reaction’ slot
Results
About 10% of reactions are catalyzed by more than one enzyme
Two classes of multi-enzyme reactions
Homologous enzymes
“Easy” reactions
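The sketch below applies the same procedure to a toy mapping from reaction frames to their ‘enzymatic-reaction’ values. It first keeps reactions with at least one value, then selects those with more than one. The data are hypothetical. In a real PGDB the mapping would come from get_class_all_instances('Reactions') and get_slot_values.

```python
def multi_enzyme_reactions(reactions):
    """reactions: frame -> list of 'enzymatic-reaction' values.

    Returns (sorted reactions with > 1 value, their fraction of catalyzed reactions).
    """
    catalyzed = {r: v for r, v in reactions.items() if v}  # at least one value
    multi = sorted(r for r, v in catalyzed.items() if len(v) > 1)
    return multi, len(multi) / len(catalyzed)


reactions = {
    "RXN-1": ["ENZRXN-1"],
    "RXN-2": ["ENZRXN-2A", "ENZRXN-2B"],
    "RXN-3": [],
    "RXN-4": ["ENZRXN-4"],
}
multi, fraction = multi_enzyme_reactions(reactions)
print(multi, round(fraction, 2))  # ['RXN-2'] 0.33
```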
Global properties: Multifunctional enzymes (how many enzymes catalyze more than one reaction?)
Procedure
get_class_all_instances(‘Proteins’)
result = proteins with more than one value in the ‘catalyzes’ slot
Results
100 out of 607 enzymes catalyze multiple reactions
This is significantly more than predicted by genome sequencing projects
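Below is a sketch of the multifunctional-enzyme query. It assumes a protein counts as an enzyme when its ‘catalyzes’ slot is non-empty. The data are hypothetical. The last line expresses the slide's 100 out of 607 as a percentage, about 16.5%.

```python
def multifunctional_enzymes(proteins):
    """proteins: frame -> list of 'catalyzes' values.

    Returns (sorted enzymes with more than one value, number of enzymes).
    """
    enzymes = {p: v for p, v in proteins.items() if v}  # assumption: enzyme = non-empty 'catalyzes'
    multi = sorted(p for p, v in enzymes.items() if len(v) > 1)
    return multi, len(enzymes)


proteins = {"PROT-1": ["ENZRXN-1", "ENZRXN-2"], "PROT-2": ["ENZRXN-3"], "PROT-3": []}
print(multifunctional_enzymes(proteins))  # (['PROT-1'], 2)

# The slide's result as a percentage: 100 of 607 enzymes.
print(f"{100 / 607:.1%}")  # 16.5%
```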
Global properties: Reactions in multiple pathways
Procedure
get_class_all_instances(‘Reactions’)
result = reactions with more than one value in the ‘in-pathway’ slot
Significance
Reactions that appear in multiple pathways correspond to
intersection points in the metabolic network
Could be used to identify candidate reactions for drug targets
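Detecting these intersection points needs only the ‘in-pathway’ values of each reaction. In this sketch R84-RXN carries the two IN-PATHWAY values from the Reactions.dat example. RXN-A is a hypothetical single-pathway reaction.

```python
def reactions_in_multiple_pathways(reactions):
    """reactions: frame -> list of 'in-pathway' values. Returns sorted intersection points."""
    return sorted(r for r, pwys in reactions.items() if len(set(pwys)) > 1)


reactions = {
    "R84-RXN": ["P122-PWY", "P107-PWY"],  # from the Reactions.dat example
    "RXN-A": ["P107-PWY"],                # hypothetical single-pathway reaction
}
print(reactions_in_multiple_pathways(reactions))  # ['R84-RXN']
```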
Metabolic Overview Queries
Species comparison
Highlight reactions that are
Shared/not-shared with
Any-one/All-of
A specified set of species
Overlay expression data
Absolute or relative expression levels
Reaction colors reflect expression level
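The species-comparison query reduces to set operations on reaction IDs. How any-one versus all-of applies to the not-shared case is an assumption in this minimal sketch. With mode "any", shared=False selects reactions absent from every other species. With mode "all", it selects reactions missing from at least one. compare_reactions and the data are hypothetical.

```python
def compare_reactions(own, others, shared=True, mode="any"):
    """Select reactions of `own` that are shared / not shared with any one / all of `others`.

    own: set of reaction IDs; others: dict species -> set of reaction IDs.
    """
    def in_other(rxn):
        hits = [rxn in rxns for rxns in others.values()]
        return any(hits) if mode == "any" else all(hits)

    return {r for r in own if in_other(r) == shared}


own = {"R1", "R2", "R3"}
others = {"sp1": {"R1", "R2"}, "sp2": {"R1"}}
print(sorted(compare_reactions(own, others, shared=True, mode="all")))  # ['R1']
```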
Figure: C. crescentus Cell Cycle Gene Expression
Global Consistency Checking of Biochemical Network
Given:
A PGDB for an organism
A set of initial metabolites
Infer:
What set of products can be synthesized by the small-molecule metabolism of the organism
Can a known growth medium yield the known essential compounds?
Pacific Symposium on Biocomputing p471 2001
Algorithm: Forward Propagation
Figure (diagram labels): Nutrient set, Metabolite set, PGDB reaction pool, Reactants, “Fire” reactions, Products
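The forward-propagation loop can be written as a fixpoint computation. It fires every reaction whose reactants are all in the metabolite set, adds that reaction's products, and repeats until nothing new appears. This sketch assumes each reaction proceeds left to right only; the slide does not say how the original program handles reaction direction. Compound names are placeholders.

```python
def forward_propagate(nutrients, reactions):
    """Fire reactions whose reactants are all available until no new compound appears.

    reactions: list of (reactants, products) pairs of sets.
    Returns (final metabolite set, indices of fired reactions).
    """
    metabolites = set(nutrients)
    fired = set()
    changed = True
    while changed:
        changed = False
        for i, (reactants, products) in enumerate(reactions):
            if i not in fired and reactants <= metabolites:
                fired.add(i)
                new = products - metabolites
                if new:
                    metabolites |= new
                    changed = True
    return metabolites, fired


reactions = [({"glc", "atp"}, {"g6p", "adp"}), ({"g6p"}, {"f6p"}), ({"x"}, {"y"})]
metabolites, fired = forward_propagate({"glc", "atp"}, reactions)

# Consistency check: which essential compounds were not produced?
essential = {"f6p", "y"}
print(sorted(essential - metabolites))  # ['y']
```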
Results
Phase I: Forward propagation
21 initial compounds yielded only half of the 38 essential compounds for E. coli
Phase II: Manually identify causes
Bugs in EcoCyc (e.g., two objects for tryptophan)
Missing initial protein substrates (e.g., ACP)
Missing pathways in EcoCyc
Phase III: Forward propagation with 11 more initial metabolites
Yielded all 38 essential compounds
Initial Metabolites (Total: 21 compounds)
Nutrients (8) (M61 minimal growth medium): H+, Fe2+, Mg2+, K+, NH3, SO4, PO4, Glucose
Nutrients (10) (Growth conditions): Water, Oxygen, Trace elements (Mn2+, Co2+, Mo2+, Ca2+, Zn2+, Cd2+, Ni2+, Cu2+)
Bootstrap Compounds (3): ATP, NADP, CoA
Nutrient-Related Analysis: Validation of the EcoCyc Database
Results on EcoCyc:
Phase I:
• Essential compounds produced: 19
• Essential compounds not produced: 19
• Total compounds produced: 28%
• Reactions fired: 31%
Missing Essential Compounds Due To
Bugs in EcoCyc
Narrow conceptualization of the problem
Protein substrates
Incomplete biochemical knowledge
Nutrient-Related Analysis: Validation of the EcoCyc Database
Results on EcoCyc:
Phase II (after adding 11 extra metabolites):
• Essential compounds produced: 38
• Essential compounds not produced: 0
• Total compounds produced: 49%
• Total compounds not produced: 51%
• Reactions fired: 58%
• Reactions not fired: 42%
Operon Prediction
Based on the method of Moreno-Hagelsieb et al.
Bioinformatics 18 Suppl. 1 (2002)
Distance between genes
Functional classification
Correctly predicts 75% of transcription units, 65% of operons
Additional information available in PGDB
Pathways
Protein complexes
Transporters
Improved prediction performance: 80% of transcription units, 69% of operons
Detailed paper in preparation
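The following is a simplified, distance-only version of the idea, for illustration. Adjacent genes on the same strand join the same predicted transcription unit when the intergenic gap is small. A looser threshold applies to pairs that PGDB data relate, for example through a shared pathway or protein complex. This is a swapped-in threshold rule, not Moreno-Hagelsieb et al.'s log-likelihood scoring. max_gap, bonus_gap and related are hypothetical parameters.

```python
def predict_transcription_units(genes, max_gap=50, bonus_gap=200, related=frozenset()):
    """Group adjacent same-strand genes into predicted transcription units.

    genes: list of (name, strand, left, right), sorted by left position.
    related: set of frozenset({gene1, gene2}) pairs linked by PGDB evidence.
    """
    units = []
    for gene in genes:
        name, strand, left, _right = gene
        if units:
            prev_name, prev_strand, _, prev_right = units[-1][-1]
            gap = left - prev_right - 1  # bases between the two genes
            limit = bonus_gap if frozenset((prev_name, name)) in related else max_gap
            if strand == prev_strand and gap <= limit:
                units[-1].append(gene)
                continue
        units.append([gene])
    return [[g[0] for g in unit] for unit in units]


genes = [("a", "+", 1, 900), ("b", "+", 920, 1500), ("c", "+", 1650, 2400), ("d", "-", 2420, 3000)]
print(predict_transcription_units(genes))  # [['a', 'b'], ['c'], ['d']]
```

With related={frozenset({"b", "c"})}, the 149-base gap between b and c falls under bonus_gap, so a, b and c form one unit.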
Visualization of Genetic Network
Operon display window
Transcription factor display window
Highlight regulon on Overview diagram
Paint expression data onto Overview diagram
Database adapter mechanism: MAGE-ML intermediate form
Adapter defined for SMD
Animation
User-specified mapping of color ranges
Import of SAM files (next release)
List of significantly up-/down-regulated genes
Display full genetic network (later release)
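A user-specified mapping of color ranges for painting expression data can be represented as an ordered list of half-open intervals. In this minimal sketch, the log2-ratio thresholds and colors are hypothetical examples, not Pathway Tools defaults. Values that fall outside every range, including NaN, get a default color.

```python
def color_for(value, ranges, default="gray"):
    """Return the color of the first (low, high, color) range with low <= value < high."""
    for low, high, color in ranges:
        if low <= value < high:
            return color
    return default


# Hypothetical user mapping for log2 expression ratios.
RANGES = [
    (float("-inf"), -1.0, "blue"),   # strongly down
    (-1.0, 1.0, "yellow"),           # roughly unchanged
    (1.0, float("inf"), "red"),      # strongly up
]
print([color_for(v, RANGES) for v in (-2.5, 0.3, 1.0)])  # ['blue', 'yellow', 'red']
```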
Acknowledgements
SRI
Peter Karp, Suzanne Paley, Pedro Romero, John Pick, Randy Gobbel, Cindy Krieger, Martha Arnaud
EcoCyc Project
Julio Collado-Vides, Ian Paulsen, Monica Riley, Milton Saier
MetaCyc Project
Sue Rhee, Lukas Mueller, Peifen Zhang, Chris Somerville
Stanford
Gary Schoolnik, Harley McAdams, Lucy Shapiro, Russ Altman, Iwei Yeh
Funding sources:
NIH National Center for Research Resources
NIH National Institute of General Medical Sciences
NIH National Human Genome Research Institute
Department of Energy Microbial Cell Project
DARPA BioSpice, UPC
BioCyc.org