2011-06-Stanford-Syn-Biox
EcoCyc, MetaCyc, and the Pathway Tools Software
Peter D. Karp, Ph.D.
Bioinformatics Research Group
SRI International
[email protected]
http://www.ai.sri.com/pkarp/talks/
BioCyc.org
EcoCyc.org, MetaCyc.org
SRI International Bioinformatics
MetaCyc Family of Pathway/Genome Databases
1,700+ databases from multiple institutions
Cover all domains of life with microbial emphasis
All DBs derived from MetaCyc via computational pathway prediction
Common schema
Common controlled vocabularies
Common methodologies
Archives of Toxicology 2011
Curated Databases Within the MetaCyc Family
Database   Organism       Organization      Curated From
MetaCyc    Multiorganism  SRI               26,000
EcoCyc     E. coli        SRI               21,000
HumanCyc   H. sapiens     SRI
AraCyc     A. thaliana    Carnegie Instit.  2,282
YeastCyc   S. cerevisiae  Stanford Univ     565
MouseCyc   M. musculus    Jackson Labs
BioCyc Collection of 1,100 Pathway/Genome Databases
Pathway/Genome Database (PGDB): combines information about
  Pathways, reactions, substrates
  Enzymes, transporters
  Genes, replicons
  Transcription factors/sites, promoters, operons
Tier 1: Literature-Derived PGDBs
  MetaCyc
  EcoCyc -- Escherichia coli K-12
Tier 2: Computationally-derived DBs, Some Curation -- 28 PGDBs
  HumanCyc, BsubCyc
  Mycobacterium tuberculosis
Tier 3: Computationally-derived DBs, No Curation -- The remainder
EcoCyc Project – EcoCyc.org
E. coli Encyclopedia
Review-level Model-Organism Database for E. coli
Tracks evolving annotation of the E. coli genome and cellular networks
The two paradigms of EcoCyc
“Multi-dimensional annotation of the E. coli K-12 genome”
Positions of genes; functions of gene products – 76% / 66% exp
Gene Ontology terms; MultiFun terms
Gene product summaries and literature citations
Evidence codes
Multimeric complexes
Metabolic pathways
Regulation of gene expression and of protein activity
Karp, Gunsalus, Collado-Vides, Paulsen
Nuc. Acids Res. 35:7577 2007
ASM News 70:25 2004
Science 293:2040
URL: EcoCyc.org
EcoCyc = E. coli Dataset + Pathway/Genome Navigator
EcoCyc v15.0
Pathways: 260
Reactions:
  Metabolic: 1,446
  Transport: 287
Compounds: 1,830
Citations: 21,000
Proteins: 4,479
Complexes: 895
RNAs: 285
Genes: 4,489
Regulation:
  Operons: 3,409
  Trans Factors: 206
  Promoters: 1,878
  TF Binding Sites: 2,394
  Reg Interactions: 5,345
EcoCyc on the iPhone
PortEco.org
EcoCyc + PortEco = E. coli model-organism database
Query multiple E. coli databases simultaneously
E. coli gene expression archive
E. coli Wiki
~40 E. coli and Shigella databases available at BioCyc.org
MetaCyc: Metabolic Encyclopedia
Describe a representative sample of every experimentally determined metabolic pathway
Describe properties of metabolic enzymes
Literature-based DB with extensive references and commentary
Pathways, reactions, enzymes, substrates
MetaCyc vs BioCyc: Experimentally elucidated pathways
Jointly developed by
P. Karp, R. Caspi, C. Fulcher, SRI International
L. Mueller, A. Pujar, Boyce Thompson Institute
S. Rhee, P. Zhang, Carnegie Institution
Nucleic Acids Research 2010
Applications of MetaCyc
Reference source on metabolic pathways and enzymes
Predict pathways from genomes
Metabolic engineering
  Find desired metabolic pathways and reactions
  Find enzymes with desired activities, regulatory properties
  Determine cofactor requirements
MetaCyc Data -- Version 15.4
Pathways: 1,747
Reactions: 9,460
Enzymes: 7,424
Small Molecules: 9,188
Organisms: 2,170
Citations: 29,900
Pathway Tools Software
[Figure: components shown are Annotated Genome, PathoLogic, Pathway/Genome Database, Pathway/Genome Navigator, Pathway/Genome Editors, and Genome-Scale Flux Model]
Briefings in Bioinformatics 11:40-79 2010
Pathway Tools Software: PathoLogic
Computational creation of new Pathway/Genome Databases
Transforms genome into Pathway Tools schema and layers inferred information above the genome
Predicts operons
Predicts metabolic network
Predicts which genes code for missing enzymes in metabolic pathways
Infers transport reactions from transporter names
Pathway Tools Software: Pathway/Genome Editors
Interactively update PGDBs with graphical editors
Support geographically distributed teams of curators with object database system
Gene and protein editor
Reaction editor
Compound editor
Pathway editor
Operon editor
Publication editor
Pathway Tools Software: Pathway/Genome Navigator
Querying and visualization of:
  Pathways
  Reactions
  Metabolites
  Genes/Proteins/RNA
  Regulatory interactions
  Chromosomes
Two modes of operation:
  Web mode
  Desktop mode
  Most functionality shared, but each has unique functionality
Cellular Overview Diagram
Combines metabolic map and transporters
Automatically generated for each organism
Zoomable, queryable
Web-based and desktop
BioCyc.org
  Tools -> Cellular Overview
  Tools -> Regulatory Overview
Fastest with Safari, Chrome, Firefox
Omics Data Graphing on Cellular Overview
Genome Overview
Genome Poster
Regulatory Overview and Omics Viewer
Show regulatory relationships among gene groups
Genome Browser
ChIP-Chip Data Shown in Graph Track
Enrichment Analysis
“My experiments yielded a set of genes/metabolites. What do they have in common?”
Given a set of genes:
  What GO terms are statistically over-represented in that set?
  What metabolic pathways are over-represented?
  What transcriptional regulators are over-represented?
Given a set of metabolites:
  What metabolic pathways are statistically over-represented in that set?
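The over-representation questions above can be illustrated with a one-sided hypergeometric test. This is a minimal sketch under stated assumptions: the slides do not say which statistic Pathway Tools uses, the function names `hypergeom_pvalue` and `enrich` are hypothetical, the gene and pathway names are toy data, and no multiple-testing correction is applied.

```python
from math import comb


def hypergeom_pvalue(population: int, annotated: int, sample: int, hits: int) -> float:
    """P(X >= hits) when `sample` genes are drawn without replacement from
    `population` genes, of which `annotated` carry the term of interest."""
    max_hits = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, max_hits + 1)
    )
    return tail / total


def enrich(gene_set, annotations, population):
    """Return (term, hits, p_value) tuples sorted by ascending p-value.

    annotations maps a term (GO term, pathway, regulator) to its member genes.
    """
    population = set(population)
    gene_set = set(gene_set) & population
    results = []
    for term, members in annotations.items():
        members = set(members) & population
        hits = len(gene_set & members)
        if hits == 0:
            continue  # a term with no hits cannot be over-represented
        p = hypergeom_pvalue(len(population), len(members), len(gene_set), hits)
        results.append((term, hits, p))
    return sorted(results, key=lambda r: r[2])


# Toy data (hypothetical): 20 genes, two pathways.
genes = [f"g{i}" for i in range(20)]
pathways = {
    "pathway A": genes[0:5],    # 5 member genes
    "pathway B": genes[5:15],   # 10 member genes
}
experiment = ["g0", "g1", "g2", "g10"]

for term, hits, p in enrich(experiment, pathways, genes):
    print(f"{term}: {hits} hits, p = {p:.4f}")
```

In practice the p-values would be corrected for the number of terms tested before anything is called over-represented.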
Automated Generation of Metabolic Flux Models from PGDBs
Joint work with Mario Latendresse
Goals
Decrease the time required to construct FBA models from 9-12 months to several weeks
Create richer FBA models that are tightly coupled to genome and regulatory information
Make FBA models and results more transparent
Approach: Derive FBA Models from PGDBs
Store and update metabolic model within Pathway Tools
Export to constraint solver for model execution/solving
Fast generation of metabolic model from annotated genome
Pathway Tools schema
Associate a wealth of information with each metabolic model
Unique identifiers and controlled vocabulary for model components
Tools for querying and visualization of metabolic models
Tools for model debugging and analysis
Reaction balance checking
Dead-end metabolite analysis
Visualize reaction flux using cellular overview
Multiple gap filling
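Two of the debugging checks listed above, reaction balance checking and dead-end metabolite analysis, can be sketched in a few lines. This is an illustrative sketch only, not the Pathway Tools implementation: the function names are hypothetical, reactions are assumed irreversible and given as metabolite-to-coefficient maps (negative means consumed), and formulas are for the neutral species.

```python
from collections import defaultdict


def is_balanced(stoich, formulas):
    """True if every element sums to zero across the reaction.

    stoich:   metabolite -> coefficient (negative = consumed, positive = produced)
    formulas: metabolite -> {element: atom count}
    """
    totals = defaultdict(int)
    for metabolite, coeff in stoich.items():
        for element, count in formulas[metabolite].items():
            totals[element] += coeff * count
    return all(total == 0 for total in totals.values())


def dead_end_metabolites(reactions, boundary=()):
    """Metabolites that are only produced or only consumed in the network.

    Metabolites listed in `boundary` (nutrients, secretions) are exempt.
    """
    produced, consumed = set(), set()
    for stoich in reactions.values():
        for metabolite, coeff in stoich.items():
            if coeff > 0:
                produced.add(metabolite)
            elif coeff < 0:
                consumed.add(metabolite)
    candidates = (produced | consumed) - set(boundary)
    return {m for m in candidates if not (m in produced and m in consumed)}


# Hexokinase: glucose + ATP -> glucose 6-phosphate + ADP
FORMULAS = {
    "glucose": {"C": 6, "H": 12, "O": 6},
    "ATP": {"C": 10, "H": 16, "N": 5, "O": 13, "P": 3},
    "G6P": {"C": 6, "H": 13, "O": 9, "P": 1},
    "ADP": {"C": 10, "H": 15, "N": 5, "O": 10, "P": 2},
}
hexokinase = {"glucose": -1, "ATP": -1, "G6P": 1, "ADP": 1}
missing_adp = {"glucose": -1, "ATP": -1, "G6P": 1}  # deliberately unbalanced

print(is_balanced(hexokinase, FORMULAS))   # True
print(is_balanced(missing_adp, FORMULAS))  # False

# Toy network (hypothetical): A -> B -> C, with A supplied as a nutrient.
network = {"R1": {"A": -1, "B": 1}, "R2": {"B": -1, "C": 1}}
print(dead_end_metabolites(network, boundary={"A"}))  # {'C'}
```

A dead end such as C blocks steady-state flux through R2 unless a consuming reaction or a secretion is added, which is the kind of repair gap filling postulates.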
FBA Model Execution
Runs SCIP solver on .lp file
  Konrad-Zuse-Zentrum für Informationstechnik Berlin
Interpret SCIP output
  Determine if SCIP found a solution
  Map fluxes to PGDB reactions
Display resulting fluxes on the Cellular Overview
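To make the ".lp file" step concrete, the sketch below writes a toy FBA problem (maximize one flux subject to steady-state mass balance and flux bounds) in the CPLEX LP text format, which LP/MILP solvers such as SCIP can read. The helper `write_lp`, the reaction names, and the constraint naming are assumptions for illustration; the slides do not show the file layout Pathway Tools actually generates.

```python
def write_lp(reactions, bounds, objective):
    """Return LP-format text: maximize `objective` flux at steady state.

    reactions: reaction -> {metabolite: coefficient}
    bounds:    reaction -> (lower, upper) flux bounds
    """
    # Group stoichiometric coefficients by metabolite: one mass-balance row each.
    rows = {}
    for rxn, stoich in reactions.items():
        for metabolite, coeff in stoich.items():
            rows.setdefault(metabolite, []).append((coeff, rxn))

    lines = ["Maximize", f" obj: {objective}", "Subject To"]
    for metabolite in sorted(rows):
        terms = " ".join(
            f"{'+' if coeff >= 0 else '-'} {abs(coeff)} {rxn}"
            for coeff, rxn in rows[metabolite]
        )
        lines.append(f" mass_{metabolite}: {terms} = 0")
    lines.append("Bounds")
    for rxn in reactions:
        lower, upper = bounds[rxn]
        lines.append(f" {lower} <= {rxn} <= {upper}")
    lines.append("End")
    return "\n".join(lines) + "\n"


# Toy model (hypothetical): nutrient uptake -> A -> B -> biomass drain.
toy_reactions = {
    "v_uptake": {"A": 1},
    "v_conv": {"A": -1, "B": 1},
    "v_biomass": {"B": -1},
}
toy_bounds = {"v_uptake": (0, 10), "v_conv": (0, 1000), "v_biomass": (0, 1000)}

lp_text = write_lp(toy_reactions, toy_bounds, objective="v_biomass")
print(lp_text)
```

For this toy model the optimum is limited by the uptake bound, so a solver should report a biomass flux of 10; mapping the reported fluxes back to reaction identifiers is then a lookup by variable name.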
Model Debugging via Multiple Gap Filling
Most FBA models are not initially solvable because of incomplete or incorrect information
Use meta-optimization to postulate alterations to a model to render it solvable
Each alteration has an associated cost; minimize cost of alterations
Formulate as MILP and submit to SCIP
Multiple Gap Filling of FBA Models
Reaction gap filling (Kumar et al., BMC Bioinf 2007 8:212):
  Reverse directionality of selected reactions
  Add a minimal number of reactions from MetaCyc to the model to enable a solution
  Reaction cost is a function of reaction taxonomic range
Metabolite gap filling: Postulate additional nutrients and secretions
Partial solutions: Identify maximal subset of biomass components for which model can yield positive production rates
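The cost-minimizing search described above can be written as a generic MILP. The formulation below is a textbook-style sketch, not necessarily the one Pathway Tools submits to SCIP. All symbols are assumptions introduced here: $M$ is the set of reactions already in the model, $C$ the set of candidate alterations (added MetaCyc reactions, reversed reactions, extra nutrient or secretion exchanges), $S_{ij}$ the stoichiometric coefficient of metabolite $i$ in reaction $j$, $v_j$ the flux with bounds $l_j, u_j$, $y_j$ a binary switch, $c_j$ the cost of alteration $j$, and $\varepsilon > 0$ a minimum biomass flux.

```latex
\begin{aligned}
\min_{v,\,y}\quad & \sum_{j \in C} c_j\, y_j \\
\text{s.t.}\quad  & \sum_{j \in M \cup C} S_{ij}\, v_j = 0 && \forall i \\
                  & l_j \le v_j \le u_j && \forall j \in M \\
                  & l_j\, y_j \le v_j \le u_j\, y_j && \forall j \in C \\
                  & v_{\mathrm{biomass}} \ge \varepsilon \\
                  & y_j \in \{0, 1\} && \forall j \in C
\end{aligned}
```

A candidate with $y_j = 0$ is forced to zero flux, so only alterations that are switched on contribute both flux and cost. Making $c_j$ depend on taxonomic range, as the slide describes, would make reactions known from related organisms cheaper to add.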
Comparative Analysis
Via Cellular Overview
Comparative genome browser
Comparative pathway table
Comparative analysis reports
Compare reaction complements
Compare pathway complements
Compare transporter complements
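Comparing complements across organisms amounts to set operations over reaction, pathway, or transporter identifiers. The sketch below is a hypothetical illustration: the function names, organism labels, and identifiers are invented, and it does not reflect how the comparative reports are implemented.

```python
def presence_table(complements):
    """Map each identifier to {organism: present?} across all organisms.

    complements: organism -> set of reaction (or pathway, transporter) IDs
    """
    all_ids = sorted(set().union(*complements.values()))
    return {
        item: {org: item in ids for org, ids in complements.items()}
        for item in all_ids
    }


def unique_to(complements, organism):
    """Identifiers found in `organism` and in no other organism."""
    others = set().union(
        *(ids for org, ids in complements.items() if org != organism)
    )
    return complements[organism] - others


# Hypothetical reaction complements for two organisms.
complements = {
    "organism A": {"RXN-1", "RXN-2"},
    "organism B": {"RXN-2", "RXN-3"},
}

for rxn, row in presence_table(complements).items():
    print(rxn, row)
print(unique_to(complements, "organism A"))  # {'RXN-1'}
```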
Advanced Query Form
Intuitive construction of complex database queries of SQL power
Work in Progress
Computation of reaction atom mappings
Program to generate metabolic pathways that synthesize target compound from feedstock compound
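As a rough illustration of the feedstock-to-target idea, the sketch below runs a breadth-first search over a reaction network and returns a shortest chain of reactions. The slides give only the goal, so everything here is an assumption: the function `find_route` is hypothetical, each reaction is reduced to substrate and product sets, and cofactors, side substrates, and atom mappings are ignored.

```python
from collections import deque


def find_route(reactions, feedstock, target):
    """Return a shortest list of reaction IDs leading from feedstock to target.

    reactions: reaction ID -> (set of substrates, set of products)
    Returns None if the target cannot be reached.
    """
    came_from = {feedstock: None}  # compound -> (previous compound, reaction ID)
    queue = deque([feedstock])
    while queue:
        compound = queue.popleft()
        if compound == target:
            break
        for rxn_id, (substrates, products) in reactions.items():
            if compound not in substrates:
                continue
            for product in products:
                if product not in came_from:
                    came_from[product] = (compound, rxn_id)
                    queue.append(product)
    if target not in came_from:
        return None

    # Walk back from the target to recover the reaction sequence.
    route = []
    node = target
    while came_from[node] is not None:
        node, rxn_id = came_from[node]
        route.append(rxn_id)
    return route[::-1]


# Hypothetical toy network.
toy_network = {
    "R1": ({"glucose"}, {"G6P"}),
    "R2": ({"G6P"}, {"F6P"}),
    "R3": ({"F6P"}, {"FBP"}),
    "R4": ({"G6P"}, {"6PG"}),
}
print(find_route(toy_network, "glucose", "FBP"))  # ['R1', 'R2', 'R3']
```

A realistic search would also need to account for every substrate of each reaction and rank alternative routes, which this sketch leaves out.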
How to Learn More
BioCyc.org Help menu
BioCyc Webinars
  Biocyc.org/webinar.shtml
Publications page
  Biocyc.org/publications.shtml
Tutorials held at SRI
  Next week: FBA