get-slot-values(Frame Slot) - Bioinformatics Research Group at SRI
The Pathway Tools Schema
Motivations for Understanding Schema
SRI International
Bioinformatics
Pathway Tools visualizations and analyses depend upon the software being able to find precise information in precise places within a Pathway/Genome DB
When writing complex queries to PGDBs, those queries must name classes and slots within the schema
A Pathway/Genome Database is a web of interconnected objects; each object represents a biological entity
Reference
Pathway Tools User’s Guide, Volume I
Appendix A: Guide to the Pathway Tools Schema
Web of Relationships for One Enzyme
[Diagram: TCA Cycle; reaction Succinate + FAD = fumarate + FADH2; Enzymatic-reaction; Succinate dehydrogenase; subunits Sdh-flavo, Sdh-Fe-S, Sdh-membrane-1, Sdh-membrane-2; genes sdhA, sdhB, sdhC, sdhD]
Frame Data Model
Frame Data Model -- organizational structure for a PGDB
Knowledge base (KB, Database, DB)
Frames
Slots
Facets
Annotations
Knowledge Base
Collection of frames and their associated slots, values, facets, and annotations
AKA: Database, PGDB
Can be stored within
An Oracle DB
A disk file
A Pathway Tools binary program
Frames
Entities with which facts are associated
Kinds of frames:
Classes: Genes, Pathways, Biosynthetic Pathways
Instances (objects): trpA, TCA cycle
Classes:
Superclass(es)
Subclass(es)
Instance(s)
A symbolic frame name (id, key) uniquely identifies each frame
Frame IDs
Naming conventions for frame IDs
Uniqueness of frame IDs
Frame IDs must be unique within a PGDB
Goal: Same frame ID within different PGDBs should refer to the same biological entity
Because many frames are imported from MetaCyc, this helps ensure consistency of frame names
Frame IDs for newly created frames (not imported) are generated by Pathway Tools
Those frame IDs contain a PGDB-specific identifier
Example: pattern CPLXzz-nnnn, e.g. CPLXB3-0035
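The ID pattern above can be illustrated with a small sketch. It assumes the ID is a class prefix, a PGDB-specific tag, a dash, and a zero-padded counter, and that a new ID simply skips counters already in use. The function name and the counter policy are hypothetical; the slides do not say how Pathway Tools actually picks the number.

```python
import itertools

def make_frame_id_generator(prefix, pgdb_tag, existing_ids, width=4):
    """Yield IDs shaped like CPLXB3-0035 that are not already used in this PGDB.

    Assumption: counters start at 1 and unused numbers are taken in order.
    """
    used = set(existing_ids)
    for n in itertools.count(1):
        candidate = f"{prefix}{pgdb_tag}-{n:0{width}d}"
        if candidate not in used:
            used.add(candidate)  # keep IDs unique within the PGDB
            yield candidate

existing = {"CPLXB3-0001", "CPLXB3-0002"}
gen = make_frame_id_generator("CPLX", "B3", existing)
first_new_id = next(gen)
second_new_id = next(gen)
print(first_new_id, second_new_id)
```

Because the tag is specific to one PGDB, IDs minted in two different PGDBs cannot collide, which is consistent with the goal that a shared ID always means the same biological entity.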
Slots
Encode attributes/properties of a frame
Integer, real number, string, symbols
Represent relationships between frames
The value of a slot is the identifier of another frame
Every slot is described by a “slot frame” in a KB that defines meta information about that slot
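Slot Links
[Diagram: the web of frames for succinate dehydrogenase, with each link labeled by a slot name]
in-pathway: Succinate + FAD = fumarate + FADH2 -> TCA Cycle
reaction: Enzymatic-reaction -> Succinate + FAD = fumarate + FADH2
catalyzes: Succinate dehydrogenase -> Enzymatic-reaction
component-of: Sdh-flavo, Sdh-Fe-S, Sdh-membrane-1, Sdh-membrane-2 -> Succinate dehydrogenase
product: sdhA, sdhB, sdhC, sdhD -> the four Sdh subunits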
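As a rough illustration of how slot links chain frames together, the sketch below stores a few frames from the diagram in a plain Python dictionary and walks from a gene to its pathway. The dictionary layout and the `follow` helper are hypothetical stand-ins for a real PGDB, and the direction of each link is an assumption read off the diagram labels.

```python
# Toy KB: frame name -> {slot name -> list of values (names of other frames)}
kb = {
    "sdhA": {"product": ["Sdh-flavo"]},
    "Sdh-flavo": {"component-of": ["Succinate dehydrogenase"]},
    "Succinate dehydrogenase": {"catalyzes": ["Enzymatic-reaction"]},
    "Enzymatic-reaction": {"reaction": ["Succinate + FAD = fumarate + FADH2"]},
    "Succinate + FAD = fumarate + FADH2": {"in-pathway": ["TCA Cycle"]},
    "TCA Cycle": {},
}

def follow(kb, frame, slots):
    """Follow a chain of slots starting at one frame; return every frame reached."""
    current = [frame]
    for slot in slots:
        reached = []
        for f in current:
            reached.extend(kb.get(f, {}).get(slot, []))
        current = reached
    return current

pathways = follow(
    kb, "sdhA", ["product", "component-of", "catalyzes", "reaction", "in-pathway"]
)
print(pathways)
```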
Slots
Number of values
Single valued
Multivalued: sets, bags
Slot values
Any LISP object: Integer, real, string, symbol (frame name)
Slotunits define properties of slots: datatypes, classes, constraints
Two slots are inverses if they encode opposite relationships
Slot Product in class Genes
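A minimal sketch of inverse slots, assuming that Product (on genes) and Gene (on polypeptides) form an inverse pair. That pairing is inferred from the slot lists elsewhere in this deck, not stated on this slide, and the `ToyKB` class is hypothetical.

```python
from collections import defaultdict

# Assumed inverse pair; a real schema declares inverses in its slot frames.
INVERSES = {"Product": "Gene", "Gene": "Product"}

class ToyKB:
    def __init__(self):
        # frame name -> slot name -> list of values
        self.frames = defaultdict(lambda: defaultdict(list))

    def add_slot_value(self, frame, slot, value):
        """Add a value and keep the inverse slot on the target frame in sync."""
        values = self.frames[frame][slot]
        if value not in values:
            values.append(value)
        inverse = INVERSES.get(slot)
        if inverse is not None:
            back = self.frames[value][inverse]
            if frame not in back:
                back.append(frame)

kb = ToyKB()
kb.add_slot_value("sdhA", "Product", "Sdh-flavo")
print(kb.frames["Sdh-flavo"]["Gene"])
```

Storing the relationship once and deriving the opposite direction keeps the two slots from drifting apart when one side is edited.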
Representation of Function
[Diagram: the same web of frames, annotated with attribute slots: EC# and Keq on the reaction; Cofactors and Inhibitors on the Enzymatic-reaction; Molecular wt and pI on the proteins; Left-end-position on the genes]
Monofunctional Monomer
[Diagram: Pathway; Reaction; Enzymatic-reaction; Monomer; Gene]
Bifunctional Monomer
[Diagram: Pathway; two Reactions; two Enzymatic-reactions; one Monomer; one Gene]
Monofunctional Multimer
[Diagram: Pathway; Reaction; Enzymatic-reaction; Multimer; four Monomers; four Genes]
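The three diagrams differ only in how many enzymatic-reactions an enzyme is linked to and whether it has component monomers. The sketch below makes that distinction explicit on a toy dictionary. All frame names are hypothetical, and the use of `catalyzes` and `components` as the deciding slots is an assumption based on the slot names shown elsewhere in this deck.

```python
# Toy KB with hypothetical frame names; one enzyme per diagram.
kb = {
    "enzyme-1": {"catalyzes": ["enzrxn-1"]},
    "enzyme-2": {"catalyzes": ["enzrxn-2", "enzrxn-3"]},
    "complex-1": {
        "catalyzes": ["enzrxn-4"],
        "components": ["mono-a", "mono-b", "mono-c", "mono-d"],
    },
}

def describe_enzyme(kb, enzyme):
    """Label an enzyme by its number of functions and its subunit structure."""
    n_functions = len(kb[enzyme].get("catalyzes", []))
    if n_functions == 0:
        functions = "non-catalytic"
    elif n_functions == 1:
        functions = "monofunctional"
    elif n_functions == 2:
        functions = "bifunctional"
    else:
        functions = "multifunctional"
    form = "multimer" if kb[enzyme].get("components") else "monomer"
    return f"{functions} {form}"

for name in kb:
    print(name, "->", describe_enzyme(kb, name))
```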
Pathway and Substrates
[Diagram: a Pathway linked to four Reactions by in-pathway; one Reaction linked by left to Reactant-1 and Reactant-2 and by right to Product-1 and Product-2]
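A short sketch of collecting the substrates of a pathway by visiting each reaction and reading its Left and Right slots. It assumes the pathway lists its reactions in a Reaction-List slot (a Pathways slot named later in this deck); the frame names and the helper function are hypothetical.

```python
# Toy KB with hypothetical frames; slot names follow the deck's slot lists.
kb = {
    "Pathway-1": {"Reaction-List": ["Rxn-1", "Rxn-2"]},
    "Rxn-1": {"Left": ["Reactant-1", "Reactant-2"], "Right": ["Product-1"]},
    "Rxn-2": {"Left": ["Product-1"], "Right": ["Product-2"]},
}

def pathway_substrates(kb, pathway):
    """Return every compound on either side of any reaction, in first-seen order."""
    seen = []
    for rxn in kb[pathway].get("Reaction-List", []):
        for side in ("Left", "Right"):
            for compound in kb[rxn].get(side, []):
                if compound not in seen:
                    seen.append(compound)
    return seen

substrates = pathway_substrates(kb, "Pathway-1")
print(substrates)
```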
Transcriptional Regulation
[Diagram labels: trp, apoTrpR, TrpR*trp, RpoSig70, trpLEDCBA, pro001, site001, Int001, Int003, Int005, and genes trpL, trpE, trpD, trpC, trpB, trpA]
Principal Classes
Class names are capitalized, plural, separated by dashes
Genetic-Elements, with subclasses:
Chromosomes
Plasmids
Genes
Transcription-Units
RNAs
rRNAs, snRNAs, tRNAs, Charged-tRNAs
Proteins, with subclasses:
Polypeptides
Protein-Complexes
Principal Classes
Reactions, with subclasses:
Transport-Reactions
Enzymatic-Reactions
Pathways
Compounds-And-Elements
Slots in Multiple Classes
Common-Name
Synonyms
Comment
Citations
DB-Links
Genes Slots
Component-Of (links to replicon, transcription unit)
Left-End-Position
Right-End-Position
Centisome-Position
Transcription-Direction
Product
Proteins Slots
Molecular-Weight-Seq
Molecular-Weight-Exp
pI
Locations
Modified-Form
Unmodified-Form
Component-Of
Polypeptides Slots
Gene
Protein-Complexes Slots
Components
Reactions Slots
EC-Number
Left, Right
DeltaG0
Keq
Spontaneous?
Enzymatic-Reactions Slots
Enzyme
Reaction
Activators
Inhibitors
Physiologically-Relevant
Cofactors
Prosthetic-Groups
Alternative-Substrates
Alternative-Cofactors
Pathways Slots
Reaction-List
Predecessors
Primaries
GKB Editor
Browse class hierarchy and slot definitions
Tools -> Ontology Browser
GKB Editor described at http://www.ai.sri.com/~gkb/user-man.html
Pathway Tools Data Access Mechanisms
Introduction
Many ways to access and update PGDBs
APIs in Java, Perl, and Lisp
Import/export of files in many formats
Registry of Pathway/Genome Databases
Import PGDB data into BioWarehouse
Updating a PGDB from an external genome DB
Pathway Tools APIs
Support programmatic queries and updates to PGDBs
APIs in Java, Perl, and Lisp all provide access to a common set of procedures:
Generic Frame Protocol -- Ocelot object database API
Additional Pathway Tools functions
For more information see http://bioinformatics.ai.sri.com/ptools/ptools-resources.html
Generic Frame Protocol (GFP)
A library of procedures for accessing Ocelot DBs
GFP specification: http://www.ai.sri.com/~gfp/spec/paper/paper.html
A small number of GFP functions are sufficient for most complex queries
Knowledge of Pathway Tools schema is critical for using the APIs:
Appendix I of Pathway Tools User’s Guide, Vol I
Generic Frame Protocol
get-class-all-instances (Class)
Returns the instances of Class
Key Pathway Tools classes:
Genetic-Elements
Genes
Proteins
Polypeptides (a subclass of Proteins)
Protein-Complexes (a subclass of Proteins)
Pathways
Reactions
Compounds-And-Elements
Enzymatic-Reactions
Transcription-Units
Promoters
DNA-Binding-Sites
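To show what "all instances" means for a class with subclasses, here is a toy model in which a call on Proteins also returns instances of Polypeptides and Protein-Complexes. The two dictionaries and the Python function are illustrative stand-ins under that assumption, not the actual API.

```python
# Toy class hierarchy and direct instances, using names from this deck.
SUBCLASSES = {
    "Proteins": ["Polypeptides", "Protein-Complexes"],
    "Polypeptides": [],
    "Protein-Complexes": [],
}
DIRECT_INSTANCES = {
    "Proteins": [],
    "Polypeptides": ["Sdh-flavo", "Sdh-Fe-S"],
    "Protein-Complexes": ["Succinate dehydrogenase"],
}

def get_class_all_instances(cls):
    """Return direct instances of cls plus instances of all its subclasses."""
    instances = list(DIRECT_INSTANCES.get(cls, []))
    for sub in SUBCLASSES.get(cls, []):
        instances.extend(get_class_all_instances(sub))
    return instances

all_proteins = get_class_all_instances("Proteins")
print(all_proteins)
```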
Generic Frame Protocol
Notation Frame.Slot means a specified slot of a specified frame
get-slot-value(Frame Slot)
Returns first value of Frame.Slot
get-slot-values(Frame Slot)
Returns all values of Frame.Slot as a list
slot-has-value-p(Frame Slot)
Returns T if Frame.Slot has at least one value
member-slot-value-p(Frame Slot Value)
Returns T if Value is one of the values of Frame.Slot
print-frame(Frame)
Prints the contents of Frame
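The read operations above can be mimicked on a plain dictionary to make their return values concrete. This is a toy Python model with a hypothetical frame and slot values; it is not the Lisp, Java, or Perl API itself, and returning None for an empty slot is an assumption of the sketch.

```python
# Toy KB: frame name -> {slot name -> list of values}. Contents are hypothetical.
KB = {
    "gene-1": {"Common-Name": ["sdhA"], "Synonyms": ["syn-a", "syn-b"]},
}

def get_slot_values(frame, slot):
    """Return all values of Frame.Slot as a list (empty if none)."""
    return list(KB.get(frame, {}).get(slot, []))

def get_slot_value(frame, slot):
    """Return the first value of Frame.Slot, or None if the slot is empty."""
    values = get_slot_values(frame, slot)
    return values[0] if values else None

def slot_has_value_p(frame, slot):
    """True if Frame.Slot has at least one value."""
    return bool(get_slot_values(frame, slot))

def member_slot_value_p(frame, slot, value):
    """True if value is one of the values of Frame.Slot."""
    return value in get_slot_values(frame, slot)

print(get_slot_value("gene-1", "Synonyms"))
print(get_slot_values("gene-1", "Synonyms"))
print(slot_has_value_p("gene-1", "Comment"))
print(member_slot_value_p("gene-1", "Synonyms", "syn-b"))
```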
Generic Frame Protocol
coercible-to-frame-p(Thing)
Returns T if Thing is the name of a frame, or a frame object
save-kb
Saves the current KB
Generic Frame Protocol – Update Operations
put-slot-value(Frame Slot Value)
Replace the current value(s) of Frame.Slot with Value
put-slot-values(Frame Slot Value-List)
Replace the current value(s) of Frame.Slot with Value-List, which must be a list of values
add-slot-value(Frame Slot Value)
Add Value to the current value(s) of Frame.Slot, if any
remove-slot-value(Frame Slot Value)
Remove Value from the current value(s) of Frame.Slot
replace-slot-value(Frame Slot Old-Value New-Value)
In Frame.Slot, replace Old-Value with New-Value
remove-local-slot-values(Frame Slot)
Remove all of the values of Frame.Slot
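The update operations can be modeled the same way. The sketch below applies each one to a toy dictionary so the effect on the value list is visible. The frame, slot values, and Python function names are hypothetical, and the model ignores set-versus-bag semantics and inverse-slot maintenance.

```python
# Toy KB: frame name -> {slot name -> list of values}. Contents are hypothetical.
KB = {"gene-1": {"Synonyms": ["syn-a"]}}

def _values(frame, slot):
    """Return the stored value list for Frame.Slot, creating it if missing."""
    return KB.setdefault(frame, {}).setdefault(slot, [])

def put_slot_values(frame, slot, value_list):
    KB.setdefault(frame, {})[slot] = list(value_list)

def put_slot_value(frame, slot, value):
    put_slot_values(frame, slot, [value])

def add_slot_value(frame, slot, value):
    _values(frame, slot).append(value)

def remove_slot_value(frame, slot, value):
    values = _values(frame, slot)
    if value in values:
        values.remove(value)

def replace_slot_value(frame, slot, old_value, new_value):
    values = _values(frame, slot)
    KB[frame][slot] = [new_value if v == old_value else v for v in values]

def remove_local_slot_values(frame, slot):
    KB.setdefault(frame, {})[slot] = []

add_slot_value("gene-1", "Synonyms", "syn-b")
replace_slot_value("gene-1", "Synonyms", "syn-a", "syn-c")
remove_slot_value("gene-1", "Synonyms", "syn-b")
after_edits = list(KB["gene-1"]["Synonyms"])
put_slot_value("gene-1", "Common-Name", "sdhA")
remove_local_slot_values("gene-1", "Synonyms")
print(after_edits, KB["gene-1"])
```

In a real PGDB, changes made this way exist only in memory until the KB is saved.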
Additional Pathway Tools Functions – Semantic Inference Layer
Semantic inference layer defines built-in functions to compute commonly required relationships in a PGDB
http://bioinformatics.ai.sri.com/ptools/ptoolsfns.html
Internal note
Note: Refer to local copy of ptools-fns.html to go through the semantic inference layer fns
File Import/Export Capabilities
PGDBs can be exported in whole or part to:
SBML – Systems Biology Markup Language – sbml.org
Import supported by many simulation packages
File -> Export -> Selected Reactions to SBML File
Pathway Tools Attribute-Value format and column-delimited format files
http://brg.ai.sri.com/ptools/flatfile-format.shtml
Dump entire PGDB to a suite of files: File -> Export -> Entire DB to Flat Files
Dump selected frames to a single file: File -> Export -> Selected Frames to File
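For programmatic processing of an exported attribute-value file, a reader along the following lines may be enough for simple cases. It assumes a layout of one `ATTRIBUTE - value` pair per line, records terminated by a line containing only `//`, and comment lines starting with `#`. Those layout details and the sample record are assumptions of this sketch; the flat-file format page linked above is the authoritative description.

```python
def parse_attribute_value(text):
    """Parse attribute-value text into a list of {attribute: [values]} records.

    Assumed layout: 'ATTRIBUTE - value' lines, '//' ends a record, '#' starts
    a comment line. Lines that do not fit this layout are skipped.
    """
    records, current = [], {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        if line.strip() == "//":
            if current:
                records.append(current)
            current = {}
            continue
        attr, sep, value = line.partition(" - ")
        if not sep:
            continue  # not understood by this sketch
        current.setdefault(attr.strip(), []).append(value.strip())
    if current:
        records.append(current)
    return records

# Hypothetical sample data, not taken from a real PGDB.
sample = """# hypothetical sample
UNIQUE-ID - GENE-1
COMMON-NAME - sdhA
SYNONYMS - syn-a
SYNONYMS - syn-b
//
UNIQUE-ID - GENE-2
COMMON-NAME - sdhB
//
"""
records = parse_attribute_value(sample)
print(len(records), records[0]["SYNONYMS"])
```

Repeated attributes are collected into a list, which matches the multivalued slots of the frame data model.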
Import/Export
Import from attribute-value or column-delimited files
File -> Import -> Frames From File
Import/Export to/from internal Pathway Tools format that allows pathways, reactions, enzymes, and compounds to be easily moved between Pathway Tools installations
Edit -> Add Pathway to File Export List
File -> Export -> Selected Pathways to File
File -> Import -> Pathways from File
Import/Export to/from MDL molfile format
Edit -> Import compound structure from molfile
Edit -> Export compound structure to molfile
Miscellaneous Exports
Overview -> Highlight -> Save to File
Overview -> Highlight -> Load from File
Gene / Protein Sequence / Save to file
Chromosome -> Show Sequence of a Segment of Replicon
Napster Comes to Bioinformatics
Public sharing of Pathway/Genome Databases
PGDB registry maintained by SRI at URL http://biocyc.org/registry.html
Registry operations
List contents of registry
Download PGDBs listed in the registry
Register PGDBs you have created
Registry Details
Why register your PGDB?
Declare existence of your PGDB in a central location
Facilitate download by other scientists
Why download a PGDB?
Desktop Navigator provides more functionality than Web
Comparative operations
Programmatic querying and processing of PGDB
Registration process
Registered PGDBs have open availability by default
Authors can provide their own license agreements
Registered PGDBs reside on authors’ FTP site
BioWarehouse
Biospice.org
New Import/Export Tools
Suggestions?
Volunteers?
Updating a PGDB From an External Genome DB
Example: AraCyc forms a pathway module to the TAIR DB
TAIR is the authoritative source for gene and gene-product information
Update AraCyc to reflect updates in TAIR
Proposed Approach
Export TAIR to PathoLogic files
Build AraCyc2 from those PathoLogic files – automated PathoLogic only
Compare AraCyc1 (A1) to AraCyc2 (A2)
A. Import new genes/proteins from A2 to A1
B. Delete from A1 genes/proteins not found in A2
C. Rename genes/proteins in A1 whose names differ in A2
Run name matcher on A1’
Check for pathways with no enzymes and report them so the user can keep any that PathoLogic would otherwise delete
What about enzymes that were assigned to a pathway by the hole filler?
Re-run pathway predictor
Remember what pathways user deletes so they are not re-predicted by PathoLogic
Consider movement of genes from contig to chromosome
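Steps A to C of the comparison amount to set operations on the two gene inventories. The sketch below assumes each PGDB can be summarized as a mapping from a stable gene identifier (shared between A1 and A2) to the current gene name. That assumption, the function name, and the sample data are all hypothetical.

```python
def plan_update(a1, a2):
    """Compare two {stable gene ID: name} mappings and list the needed changes.

    a1 is the curated PGDB (AraCyc1); a2 is the freshly built one (AraCyc2).
    """
    ids1, ids2 = set(a1), set(a2)
    return {
        "import": sorted(ids2 - ids1),   # A. in A2 but not yet in A1
        "delete": sorted(ids1 - ids2),   # B. in A1 but no longer in A2
        "rename": sorted(                # C. same ID, different name
            (g, a1[g], a2[g]) for g in ids1 & ids2 if a1[g] != a2[g]
        ),
    }

# Hypothetical inventories
a1 = {"G1": "abc1", "G2": "xyz2", "G3": "old3"}
a2 = {"G1": "abc1", "G3": "new3", "G4": "def4"}
plan = plan_update(a1, a2)
print(plan)
```

Producing a plan first, instead of editing A1 directly, leaves room for the reporting and user-confirmation steps listed above.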