2011-06-Stanford-Syn-Biox


EcoCyc, MetaCyc, and the Pathway Tools Software
Peter D. Karp, Ph.D.
Bioinformatics Research Group
SRI International
[email protected]
http://www.ai.sri.com/pkarp/talks/
BioCyc.org
EcoCyc.org, MetaCyc.org
SRI International Bioinformatics

MetaCyc Family of Pathway/Genome Databases
- 1,700+ databases from multiple institutions
- Cover all domains of life with microbial emphasis
- All DBs derived from MetaCyc via computational pathway prediction
- Common schema
- Common controlled vocabularies
- Common methodologies
Archives of Toxicology 2011
Curated Databases Within the MetaCyc Family

Database   Organism        Organization       Curated From (publications)
MetaCyc    Multiorganism   SRI                26,000
EcoCyc     E. coli         SRI                21,000
HumanCyc   H. sapiens      SRI
AraCyc     A. thaliana     Carnegie Instit.   2,282
YeastCyc   S. cerevisiae   Stanford Univ      565
MouseCyc   M. musculus     Jackson Labs
BioCyc Collection of 1,100 Pathway/Genome Databases
Pathway/Genome Database (PGDB): combines information about
- Pathways, reactions, substrates
- Enzymes, transporters
- Genes, replicons
- Transcription factors/sites, promoters, operons
Tier 1: Literature-Derived PGDBs
- MetaCyc
- EcoCyc -- Escherichia coli K-12
Tier 2: Computationally-derived DBs, Some Curation -- 28 PGDBs
- HumanCyc, BsubCyc
- Mycobacterium tuberculosis
Tier 3: Computationally-derived DBs, No Curation -- The remainder
EcoCyc Project – EcoCyc.org
- E. coli Encyclopedia
  - Review-level Model-Organism Database for E. coli
  - Tracks evolving annotation of the E. coli genome and cellular networks
  - The two paradigms of EcoCyc
- “Multi-dimensional annotation of the E. coli K-12 genome”
  - Positions of genes; functions of gene products – 76% / 66% exp
  - Gene Ontology terms; MultiFun terms
  - Gene product summaries and literature citations
  - Evidence codes
  - Multimeric complexes
  - Metabolic pathways
  - Regulation of gene expression and of protein activity
Karp, Gunsalus, Collado-Vides, Paulsen, Nucleic Acids Res. 35:7577, 2007
ASM News 70:25, 2004
Science 293:2040
URL: EcoCyc.org
EcoCyc = E. coli Dataset + Pathway/Genome Navigator
EcoCyc v15.0
- Pathways: 260
- Reactions:
  - Metabolic: 1,446
  - Transport: 287
- Compounds: 1,830
- Citations: 21,000
- Proteins: 4,479
- Complexes: 895
- RNAs: 285
- Genes: 4,489
- Regulation:
  - Operons: 3,409
  - Transcription Factors: 206
  - Promoters: 1,878
  - TF Binding Sites: 2,394
  - Regulatory Interactions: 5,345
EcoCyc on the iPhone
PortEco.org
- EcoCyc + PortEco = E. coli model-organism database
- Query multiple E. coli databases simultaneously
- E. coli gene expression archive
- E. coli Wiki
- ~40 E. coli and Shigella databases available at BioCyc.org
MetaCyc: Metabolic Encyclopedia
- Describe a representative sample of every experimentally determined metabolic pathway
- Describe properties of metabolic enzymes
- Literature-based DB with extensive references and commentary
  - Pathways, reactions, enzymes, substrates
- MetaCyc vs BioCyc: MetaCyc contains only experimentally elucidated pathways, whereas most BioCyc PGDBs contain computationally predicted pathways
- Jointly developed by
  - P. Karp, R. Caspi, C. Fulcher, SRI International
  - L. Mueller, A. Pujar, Boyce Thompson Institute
  - S. Rhee, P. Zhang, Carnegie Institution
Nucleic Acids Research 2010
Applications of MetaCyc
- Reference source on metabolic pathways and enzymes
- Predict pathways from genomes
- Metabolic engineering
  - Find desired metabolic pathways and reactions
  - Find enzymes with desired activities, regulatory properties
  - Determine cofactor requirements
MetaCyc Data -- Version 15.4
- Pathways: 1,747
- Reactions: 9,460
- Enzymes: 7,424
- Small Molecules: 9,188
- Organisms: 2,170
- Citations: 29,900
Pathway Tools Software
[Diagram of Pathway Tools components: Annotated Genome, PathoLogic, Pathway/Genome Database, Pathway/Genome Navigator, Pathway/Genome Editors, Genome-Scale Flux Model]
Briefings in Bioinformatics 11:40-79 2010
Pathway Tools Software: PathoLogic
- Computational creation of new Pathway/Genome Databases
- Transforms genome into Pathway Tools schema and layers inferred information above the genome
- Predicts operons
- Predicts metabolic network
- Predicts which genes code for missing enzymes in metabolic pathways
- Infers transport reactions from transporter names
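PathoLogic's actual pathway-prediction rules are more involved than a single score, but the basic idea of predicting the metabolic network can be illustrated as follows: for each reference pathway, check how many of its reactions have an enzyme annotated in the genome. This is a minimal sketch, assuming a simple fraction-of-reactions-present score; the `Pathway` class, `score_pathways` function, reaction IDs, and the `threshold` of 0.5 are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Pathway:
    """A reference pathway (e.g. from MetaCyc); IDs here are hypothetical."""
    name: str
    reactions: list = field(default_factory=list)


def score_pathways(pathways, genome_reactions, threshold=0.5):
    """Keep each reference pathway whose fraction of reactions with an
    annotated enzyme in the genome is at least `threshold` (hypothetical cutoff).

    Returns {pathway name: fraction of its reactions present}.
    """
    predicted = {}
    for pathway in pathways:
        if not pathway.reactions:
            continue  # nothing to score
        present = sum(1 for rxn in pathway.reactions if rxn in genome_reactions)
        fraction = present / len(pathway.reactions)
        if fraction >= threshold:
            predicted[pathway.name] = fraction
    return predicted


# Usage with made-up reaction IDs:
reference = [
    Pathway("glycolysis", ["R1", "R2", "R3", "R4"]),
    Pathway("toy-degradation", ["R5", "R6"]),
]
print(score_pathways(reference, {"R1", "R2", "R3"}))  # {'glycolysis': 0.75}
```

A fixed threshold is the crudest possible rule; it is used here only to keep the sketch short.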
Pathway Tools Software: Pathway/Genome Editors
- Interactively update PGDBs with graphical editors
- Support geographically distributed teams of curators with object database system
- Gene and protein editor
- Reaction editor
- Compound editor
- Pathway editor
- Operon editor
- Publication editor
Pathway Tools Software: Pathway/Genome Navigator
- Querying and visualization of:
  - Pathways
  - Reactions
  - Metabolites
  - Genes/Proteins/RNA
  - Regulatory interactions
  - Chromosomes
- Two modes of operation:
  - Web mode
  - Desktop mode
  - Most functionality shared, but each has unique functionality
Cellular Overview Diagram
- Combines metabolic map and transporters
- Automatically generated for each organism
- Zoomable, queryable
- Web-based and desktop
- BioCyc.org
  - Tools → Cellular Overview
  - Tools → Regulatory Overview
- Fastest with Safari, Chrome, Firefox

Omics Data Graphing on Cellular Overview
Genome Overview
Genome Poster
Regulatory Overview and Omics Viewer
- Show regulatory relationships among gene groups
Genome Browser
ChIP-Chip Data Shown in Graph Track
Enrichment Analysis
“My experiments yielded a set of genes/metabolites. What do they have in common?”
- Given a set of genes:
  - What GO terms are statistically over-represented in that set?
  - What metabolic pathways are over-represented?
  - What transcriptional regulators are over-represented?
- Given a set of metabolites:
  - What metabolic pathways are statistically over-represented in that set?
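Over-representation questions like these are commonly answered with a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test). The sketch below assumes that test; whether Pathway Tools uses exactly this statistic, and which multiple-testing correction it applies, is not stated on the slide. The function names and the toy gene/pathway IDs are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric: N items in the universe, K of them
    annotated to the term, n items drawn (the query set)."""
    upper = min(n, K)
    numerator = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return numerator / comb(N, n)


def enrichment(query, annotations, universe):
    """One-sided over-representation p-value for every term.

    annotations maps a term (GO term, pathway, regulator) to its genes.
    No multiple-testing correction is applied.
    """
    universe = set(universe)
    query = set(query) & universe
    N, n = len(universe), len(query)
    pvalues = {}
    for term, genes in annotations.items():
        term_genes = set(genes) & universe
        k = len(query & term_genes)  # query members annotated to this term
        pvalues[term] = hypergeom_upper_tail(k, n, len(term_genes), N)
    return pvalues


universe = {f"g{i}" for i in range(10)}
annotations = {"pathway-A": ["g0", "g1", "g2"], "pathway-B": ["g3", "g4", "g5"]}
print(enrichment(["g0", "g1", "g2"], annotations, universe))
# pathway-A: 1/120 (about 0.0083); pathway-B: 1.0
```

The same function works for a metabolite set: pass metabolites as the query and a pathway-to-metabolites map as `annotations`.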
Automated Generation of Metabolic Flux Models from PGDBs
Joint work with Mario Latendresse
Goals
- Decrease the time required to construct FBA models from 9-12 months to several weeks
- Create richer FBA models that are tightly coupled to genome and regulatory information
- Make FBA models and results more transparent
Approach: Derive FBA Models from PGDBs
- Store and update metabolic model within Pathway Tools
- Export to constraint solver for model execution/solving
- Fast generation of metabolic model from annotated genome
- Pathway Tools schema
  - Associate a wealth of information with each metabolic model
  - Unique identifiers and controlled vocabulary for model components
- Tools for querying and visualization of metabolic models
- Tools for model debugging and analysis
  - Reaction balance checking
  - Dead-end metabolite analysis
  - Visualize reaction flux using cellular overview
  - Multiple gap filling
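Dead-end metabolite analysis looks for compounds that the network can only produce or only consume; such compounds cannot carry steady-state flux unless they are exchanged with the environment. A minimal one-pass sketch, assuming a reaction representation invented for this example (`dead_end_metabolites` and the toy reaction IDs are hypothetical, and Pathway Tools' own analysis may differ in detail):

```python
def dead_end_metabolites(reactions, boundary=frozenset()):
    """Return metabolites that the network can only produce or only consume.

    reactions maps a reaction ID to (stoichiometry, reversible), where
    stoichiometry maps metabolite -> coefficient (negative = consumed,
    positive = produced). Metabolites in `boundary` (nutrients/secretions
    exchanged with the environment) are not reported.
    """
    produced, consumed = set(), set()
    for stoichiometry, reversible in reactions.values():
        for metabolite, coef in stoichiometry.items():
            if reversible:
                # a reversible reaction can both make and use each participant
                produced.add(metabolite)
                consumed.add(metabolite)
            elif coef > 0:
                produced.add(metabolite)
            elif coef < 0:
                consumed.add(metabolite)
    candidates = (produced | consumed) - set(boundary)
    # dead end = appears on only one side of the network
    return {m for m in candidates if (m in produced) != (m in consumed)}


toy = {
    "R1": ({"A": -1, "B": 1}, False),
    "R2": ({"B": -1, "C": 1}, False),
    "R3": ({"C": -1, "D": 1}, True),
    "R4": ({"B": -1, "E": 1}, False),
}
print(dead_end_metabolites(toy, boundary={"A"}))  # {'E'}
```

Declaring A a nutrient removes it from the report, while E (made by R4 but used by nothing) remains a dead end.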
FBA Model Execution
- Runs SCIP solver on .lp file
  - Konrad-Zuse-Zentrum für Informationstechnik Berlin
- Interpret SCIP output
  - Determine if SCIP found a solution
  - Map fluxes to PGDB reactions
- Display resulting fluxes on the Cellular Overview
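SCIP can read problems written in the CPLEX LP file format. The sketch below writes a steady-state FBA model (maximize one reaction's flux subject to S v = 0 and flux bounds) in that format, assuming that this is roughly what the exported .lp file contains; the function names, the `mb_` row prefix, and the default bounds of (0, 1000) are hypothetical, and the file Pathway Tools actually writes may differ.

```python
import re


def lp_name(raw):
    """Make an identifier safe for the CPLEX LP format (letters, digits, _),
    prefixing names that do not start with a letter."""
    name = re.sub(r"[^A-Za-z0-9_]", "_", raw)
    return name if name and name[0].isalpha() else "x_" + name


def format_terms(terms):
    """Render [(coef, var), ...] as '1 v1 - 2 v2 + 1 v3'."""
    parts = []
    for i, (coef, var) in enumerate(terms):
        sign = "-" if coef < 0 else ("+" if i > 0 else "")
        parts.append(f"{sign} {abs(coef):g} {var}".strip())
    return " ".join(parts)


def to_lp(reactions, bounds, objective, default_bounds=(0, 1000)):
    """Write a steady-state FBA model (maximize `objective` subject to S v = 0)
    in CPLEX LP format.

    reactions: reaction ID -> {metabolite: stoichiometric coefficient}
    bounds:    reaction ID -> (lower, upper); missing reactions use default_bounds
    """
    lines = ["Maximize", f" obj: {lp_name(objective)}", "Subject To"]
    rows = {}  # metabolite -> [(coef, flux variable)], i.e. one row of S
    for rxn, stoichiometry in reactions.items():
        for metabolite, coef in stoichiometry.items():
            rows.setdefault(metabolite, []).append((coef, lp_name(rxn)))
    for metabolite in sorted(rows):
        lines.append(f" mb_{lp_name(metabolite)}: {format_terms(rows[metabolite])} = 0")
    lines.append("Bounds")
    for rxn in reactions:
        lower, upper = bounds.get(rxn, default_bounds)
        lines.append(f" {lower:g} <= {lp_name(rxn)} <= {upper:g}")
    lines.append("End")
    return "\n".join(lines) + "\n"


model = {"EX_A": {"A": 1}, "R1": {"A": -1, "B": 1}, "BIOMASS": {"B": -1}}
print(to_lp(model, {"EX_A": (0, 10)}, "BIOMASS"))
```

Here the uptake reaction EX_A is capped at 10, so the optimal BIOMASS flux of this toy chain is also 10. Mapping the solver's variable values back to PGDB reactions needs the inverse of `lp_name`, which is one reason to keep reaction IDs simple.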
Model Debugging via Multiple Gap Filling
- Most FBA models are not initially solvable because of incomplete or incorrect information
- Use meta-optimization to postulate alterations to a model to render it solvable
- Each alteration has an associated cost; minimize cost of alterations
- Formulate as MILP and submit to SCIP
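One generic way to write such a minimum-cost alteration problem as a MILP (in the spirit of the reaction gap filling of Kumar et al. 2007, but not necessarily the exact formulation used in Pathway Tools) is sketched below. The symbols are assumptions introduced here: S is the stoichiometric matrix, v the flux vector, M the reactions already in the model, C the candidate alterations (for example, MetaCyc reactions to add or reversed copies of existing reactions), c_j the cost of alteration j, y_j a 0/1 indicator that alteration j is used, l_j and u_j flux bounds, and ε a small positive minimum biomass flux.

```latex
\begin{aligned}
\min_{v,\,y}\quad & \sum_{j \in C} c_j\, y_j \\
\text{s.t.}\quad & S\,v = 0 \\
& v_{\mathrm{biomass}} \ge \varepsilon \\
& l_j \le v_j \le u_j, && j \in M \\
& l_j\, y_j \le v_j \le u_j\, y_j, && j \in C \\
& y_j \in \{0, 1\}, && j \in C
\end{aligned}
```

Setting y_j = 0 forces v_j = 0, so only the selected alterations can carry flux, and the objective charges exactly their costs.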
Multiple Gap Filling of FBA Models
- Reaction gap filling (Kumar et al., BMC Bioinformatics 8:212, 2007):
  - Reverse directionality of selected reactions
  - Add a minimal number of reactions from MetaCyc to the model to enable a solution
  - Reaction cost is a function of reaction taxonomic range
- Metabolite gap filling: Postulate additional nutrients and secretions
- Partial solutions: Identify maximal subset of biomass components for which model can yield positive production rates
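The cost-minimizing idea behind reaction gap filling can be illustrated on a toy network. This sketch swaps in a much simpler technique than the MILP on the slides: exhaustive search over small sets of candidate reactions, with feasibility judged by network expansion (topological reachability) instead of FBA, so stoichiometry and mass balance are ignored. The functions, compound names, reaction IDs, and costs are all hypothetical; a reversed reaction can be offered simply as another candidate with substrates and products swapped.

```python
from itertools import combinations


def reachable(nutrients, reactions):
    """Network expansion: fire every reaction whose substrates are all
    available until no new compounds appear. Stoichiometry and mass balance
    are ignored, so this is only a topological approximation of FBA."""
    available = set(nutrients)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions:
            if set(substrates) <= available and not set(products) <= available:
                available |= set(products)
                changed = True
    return available


def gap_fill(model, candidates, nutrients, targets, max_added=3):
    """Find the cheapest set of candidate reactions (at most `max_added`)
    whose addition makes every target compound reachable.

    model:      list of (substrates, products) already in the model
    candidates: candidate ID -> ((substrates, products), cost)
    Returns (total cost, tuple of candidate IDs), or None if no set works.
    """
    best = None
    ids = sorted(candidates)
    for size in range(max_added + 1):
        for combo in combinations(ids, size):
            network = list(model) + [candidates[c][0] for c in combo]
            if set(targets) <= reachable(nutrients, network):
                cost = sum(candidates[c][1] for c in combo)
                if best is None or cost < best[0]:
                    best = (cost, combo)
    return best


model = [(("glc",), ("g6p",))]
candidates = {
    "R-cheap": ((("g6p",), ("pyr",)), 1.0),
    "R-two-a": ((("glc",), ("x",)), 2.0),
    "R-two-b": ((("x",), ("pyr",)), 2.0),
}
print(gap_fill(model, candidates, nutrients=["glc"], targets=["pyr"]))  # (1.0, ('R-cheap',))
```

Exhaustive search grows exponentially with the number of candidates, which is why a solver-based MILP formulation is the practical choice for genome-scale models with thousands of MetaCyc candidate reactions.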
Comparative Analysis
- Via Cellular Overview
- Comparative genome browser
- Comparative pathway table
- Comparative analysis reports
  - Compare reaction complements
  - Compare pathway complements
  - Compare transporter complements
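At its core, comparing reaction, pathway, or transporter complements is set arithmetic over the items each organism's PGDB contains. A minimal sketch, assuming each complement is available as a set of IDs; `compare_complements` and the pathway IDs are hypothetical, and the real comparative reports present considerably more detail.

```python
def compare_complements(complements):
    """complements: organism -> iterable of pathway (or reaction, or
    transporter) IDs. Returns (IDs shared by all organisms,
    {organism: IDs found only in that organism})."""
    sets = {org: set(ids) for org, ids in complements.items()}
    shared = set.intersection(*sets.values()) if sets else set()
    unique = {}
    for org, ids in sets.items():
        # union of everything the other organisms have
        others = set().union(*(s for o, s in sets.items() if o != org))
        unique[org] = ids - others
    return shared, unique


shared, unique = compare_complements({
    "E. coli K-12": {"PWY-A", "PWY-B", "PWY-C"},
    "B. subtilis 168": {"PWY-B", "PWY-C", "PWY-D"},
})
print(sorted(shared))  # ['PWY-B', 'PWY-C']
print(unique)
```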
Advanced Query Form
- Intuitive construction of complex database queries with the power of SQL
Work in Progress
- Computation of reaction atom mappings
- Program to generate metabolic pathways that synthesize a target compound from a feedstock compound
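A route-search program of this kind has to find chains of reactions leading from a feedstock to a target. The sketch below is not the program on the slide: it assumes each reaction is reduced to a single main-substrate → main-product edge and uses breadth-first search to return the shortest chain, ignoring co-substrates and atom mappings entirely. `find_route`, `max_steps`, and the compound and reaction names are hypothetical.

```python
from collections import deque


def find_route(reactions, feedstock, target, max_steps=10):
    """reactions: list of (reaction ID, main substrate, main product).
    Breadth-first search returns the shortest list of reaction IDs leading
    from feedstock to target, or None if none exists within max_steps."""
    graph = {}
    for rxn, substrate, product in reactions:
        graph.setdefault(substrate, []).append((rxn, product))
    queue = deque([(feedstock, [])])
    seen = {feedstock}
    while queue:
        compound, path = queue.popleft()
        if compound == target:
            return path
        if len(path) >= max_steps:
            continue
        for rxn, product in graph.get(compound, []):
            if product not in seen:
                seen.add(product)
                queue.append((product, path + [rxn]))
    return None


toy = [
    ("R1", "glucose", "g6p"),
    ("R2", "g6p", "f6p"),
    ("R3", "f6p", "target"),
    ("R4", "glucose", "shortcut"),
    ("R5", "shortcut", "target"),
]
print(find_route(toy, "glucose", "target"))  # ['R4', 'R5']
```

Shortest is not necessarily best: a realistic tool would also rank routes by criteria such as how many reactions are new to the host organism, which this sketch does not attempt.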
How to Learn More
- BioCyc.org Help menu
- BioCyc Webinars
  - Biocyc.org/webinar.shtml
- Publications page
  - Biocyc.org/publications.shtml
- Tutorials held at SRI
  - Next week: FBA