get-slot-values(Frame Slot) - Bioinformatics Research Group at SRI


The Pathway Tools Schema

SRI International
Bioinformatics

Motivations for Understanding the Schema
- Pathway Tools visualizations and analyses depend upon the software being able to find precise information in precise places within a Pathway/Genome Database (PGDB)
- When writing complex queries to PGDBs, those queries must name classes and slots within the schema
- A PGDB is a web of interconnected objects; each object represents a biological entity

Reference
- Pathway Tools User’s Guide, Volume I
  - Appendix A: Guide to the Pathway Tools Schema

Web of Relationships for One Enzyme
[Figure: TCA Cycle -> reaction Succinate + FAD = fumarate + FADH2 -> Enzymatic-reaction -> enzyme Succinate dehydrogenase -> subunits Sdh-flavo, Sdh-Fe-S, Sdh-membrane-1, Sdh-membrane-2 -> genes sdhA, sdhB, sdhC, sdhD]

Frame Data Model
- Frame Data Model: organizational structure for a PGDB
  - Knowledge base (KB, Database, DB)
  - Frames
  - Slots
  - Facets
  - Annotations

Knowledge Base
- Collection of frames and their associated slots, values, facets, and annotations
- AKA: Database, PGDB
- Can be stored within
  - An Oracle DB
  - A disk file
  - A Pathway Tools binary program

Frames
- Entities with which facts are associated
- Kinds of frames:
  - Classes: Genes, Pathways, Biosynthetic Pathways
  - Instances (objects): trpA, TCA cycle
- Classes have:
  - Superclass(es)
  - Subclass(es)
  - Instance(s)
- A symbolic frame name (ID, key) uniquely identifies each frame
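
To make the frame model concrete, here is a minimal Python sketch under stated assumptions: frames live in an in-memory dictionary, each frame is either a class or an instance, `parents` holds a class's superclasses (or an instance's classes), and slot values are stored as lists. The `Frame` and `KnowledgeBase` names, the class `Biosynthetic-Pathways`, and the frame ID `TCA` are hypothetical; this is not how Ocelot stores frames.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Hypothetical in-memory frame; not how Ocelot actually stores frames.
@dataclass
class Frame:
    frame_id: str                  # symbolic frame name: unique key within the KB
    is_class: bool = False         # True for a class frame, False for an instance
    parents: list[str] = field(default_factory=list)      # superclasses, or classes of an instance
    slots: dict[str, list] = field(default_factory=dict)  # slot name -> list of values


class KnowledgeBase:
    """A collection of frames, indexed by frame ID."""

    def __init__(self) -> None:
        self.frames: dict[str, Frame] = {}

    def add(self, frame: Frame) -> Frame:
        # Frame IDs must be unique within one KB.
        if frame.frame_id in self.frames:
            raise ValueError(f"duplicate frame ID: {frame.frame_id}")
        self.frames[frame.frame_id] = frame
        return frame

    def subclasses(self, class_id: str) -> list[str]:
        # Direct subclasses: class frames that list class_id as a parent.
        return [f.frame_id for f in self.frames.values()
                if f.is_class and class_id in f.parents]

    def instances(self, class_id: str) -> list[str]:
        # Direct instances: instance frames that list class_id as a parent.
        return [f.frame_id for f in self.frames.values()
                if not f.is_class and class_id in f.parents]


kb = KnowledgeBase()
kb.add(Frame("Pathways", is_class=True))
kb.add(Frame("Biosynthetic-Pathways", is_class=True, parents=["Pathways"]))
kb.add(Frame("Genes", is_class=True))
kb.add(Frame("trpA", parents=["Genes"]))
kb.add(Frame("TCA", parents=["Pathways"], slots={"Common-Name": ["TCA cycle"]}))
print(kb.subclasses("Pathways"), kb.instances("Genes"))  # ['Biosynthetic-Pathways'] ['trpA']
```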

Frame IDs
- Naming conventions for frame IDs
- Uniqueness of frame IDs
  - Frame IDs must be unique within a PGDB
  - Goal: the same frame ID within different PGDBs should refer to the same biological entity
  - Because many frames are imported from MetaCyc, this helps ensure consistency of frame names
- Frame IDs for newly created frames (not imported) are generated by Pathway Tools
  - Those frame IDs contain a PGDB-specific identifier
  - Example: the pattern CPLXzz-nnnn (zz = PGDB-specific identifier, nnnn = number), as in CPLXB3-0035
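
The ID scheme can be sketched as prefix + PGDB-specific code + zero-padded serial number. The Python below is a hypothetical illustration of the CPLXzz-nnnn pattern only; `FrameIdGenerator`, the per-prefix counters, and the skip-if-taken rule are assumptions, not Pathway Tools' actual generator.

```python
from itertools import count
from typing import Iterable, Iterator


class FrameIdGenerator:
    """Hypothetical generator for IDs shaped like CPLXzz-nnnn.

    prefix    -- class-specific prefix such as "CPLX"
    pgdb_code -- PGDB-specific identifier (the "zz" part), e.g. "B3"
    nnnn      -- zero-padded serial number
    Not Pathway Tools' actual algorithm.
    """

    def __init__(self, pgdb_code: str, existing_ids: Iterable[str] = ()) -> None:
        self.pgdb_code = pgdb_code
        self.used = set(existing_ids)  # IDs already present in this PGDB
        self.counters: dict[str, Iterator[int]] = {}

    def new_id(self, prefix: str) -> str:
        counter = self.counters.setdefault(prefix, count(1))
        # Skip serial numbers that are already taken so every ID stays unique.
        while True:
            candidate = f"{prefix}{self.pgdb_code}-{next(counter):04d}"
            if candidate not in self.used:
                self.used.add(candidate)
                return candidate


gen = FrameIdGenerator("B3", existing_ids={"CPLXB3-0001"})
print(gen.new_id("CPLX"))  # CPLXB3-0002, because 0001 is already taken
print(gen.new_id("CPLX"))  # CPLXB3-0003
```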

Slots
- Encode attributes/properties of a frame
  - Integer, real number, string, symbols
- Represent relationships between frames
  - The value of a slot is the identifier of another frame
- Every slot is described by a “slot frame” in a KB that defines meta information about that slot

Slot Links
[Figure: the same web, with each link labeled by its slot name: TCA Cycle <-(in-pathway)- Succinate + FAD = fumarate + FADH2 <-(reaction)- Enzymatic-reaction <-(catalyzes)- Succinate dehydrogenase <-(component-of)- Sdh-flavo, Sdh-Fe-S, Sdh-membrane-1, Sdh-membrane-2 <-(product)- sdhA, sdhB, sdhC, sdhD]

Slots
- Number of values
  - Single-valued
  - Multivalued: sets, bags
- Slot values
  - Any Lisp object: integer, real, string, symbol (frame name)
- Slot units define properties of slots: datatypes, classes, constraints
- Two slots are inverses if they encode opposite relationships
  - Example: slot Product in class Genes
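
Keeping inverse slots consistent means that writing one direction of a relationship also writes the other. The Python sketch below assumes, purely for illustration, that Product (in Genes) and Gene (in Polypeptides) form an inverse pair, and that multivalued slots behave as sets; `FrameStore` and `INVERSES` are hypothetical names.

```python
from collections import defaultdict

# Illustrative assumption: Product (in Genes) and Gene (in Polypeptides)
# are treated as an inverse pair. Not read from a real PGDB schema.
INVERSES = {"Product": "Gene", "Gene": "Product"}


class FrameStore:
    """Hypothetical frame store that keeps inverse slots consistent."""

    def __init__(self) -> None:
        # frame ID -> slot name -> list of values
        self.data = defaultdict(lambda: defaultdict(list))

    def get_slot_values(self, frame: str, slot: str) -> list:
        return list(self.data[frame][slot])

    def add_slot_value(self, frame: str, slot: str, value: str) -> None:
        self._add(frame, slot, value)
        inverse = INVERSES.get(slot)
        if inverse is not None:
            # The value is a frame ID, so record the opposite relationship on it.
            self._add(value, inverse, frame)

    def _add(self, frame: str, slot: str, value: str) -> None:
        values = self.data[frame][slot]
        if value not in values:  # treat multivalued slots as sets here
            values.append(value)


store = FrameStore()
store.add_slot_value("sdhA", "Product", "Sdh-flavo")
print(store.get_slot_values("Sdh-flavo", "Gene"))  # ['sdhA']
```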

Representation of Function
[Figure: the succinate dehydrogenase web annotated with the attributes stored at each level: EC# and Keq on the reaction Succinate + FAD = fumarate + FADH2; Cofactors and Inhibitors on the Enzymatic-reaction; Molecular wt and pI on the proteins (Succinate dehydrogenase and subunits Sdh-flavo, Sdh-Fe-S, Sdh-membrane-1, Sdh-membrane-2); Left-end-position on the genes sdhA, sdhB, sdhC, sdhD]

Monofunctional Monomer
[Figure: Pathway -> Reaction -> Enzymatic-reaction -> Monomer -> Gene]

Bifunctional Monomer
[Figure: Pathway -> two Reactions -> two Enzymatic-reactions -> one Monomer -> one Gene]

Monofunctional Multimer
[Figure: Pathway -> Reaction -> Enzymatic-reaction -> Multimer -> four Monomers -> four Genes]

Pathway and Substrates
[Figure: a Pathway linked to several Reactions via in-pathway; a Reaction links to its substrates Reactant-1 and Reactant-2 via left and to Product-1 and Product-2 via right]

Transcriptional Regulation
[Figure: regulation of the trpLEDCBA transcription unit (genes trpL, trpE, trpD, trpC, trpB, trpA); other frames shown: trp, apoTrpR, TrpR*trp, RpoSig70, site001, pro001, Int001, Int003, Int005]

Principal Classes
- Class names are capitalized, plural, separated by dashes
- Genetic-Elements, with subclasses:
  - Chromosomes
  - Plasmids
- Genes
- Transcription-Units
- RNAs
  - rRNAs, snRNAs, tRNAs, Charged-tRNAs
- Proteins, with subclasses:
  - Polypeptides
  - Protein-Complexes

Principal Classes (continued)
- Reactions, with subclasses:
  - Transport-Reactions
- Enzymatic-Reactions
- Pathways
- Compounds-And-Elements

Slots in Multiple Classes
- Common-Name
- Synonyms
- Comment
- Citations
- DB-Links

Genes Slots
- Component-Of (links to replicon, transcription unit)
- Left-End-Position
- Right-End-Position
- Centisome-Position
- Transcription-Direction
- Product

Proteins Slots
- Molecular-Weight-Seq
- Molecular-Weight-Exp
- pI
- Locations
- Modified-Form
- Unmodified-Form
- Component-Of

Polypeptides Slots
- Gene

Protein-Complexes Slots
- Components

Reactions Slots
- EC-Number
- Left, Right
- DeltaG0
- Keq
- Spontaneous?

Enzymatic-Reactions Slots
- Enzyme
- Reaction
- Activators
- Inhibitors
- Physiologically-Relevant
- Cofactors
- Prosthetic-Groups
- Alternative-Substrates
- Alternative-Cofactors

Pathways Slots
- Reaction-List
- Predecessors
- Primaries

GKB Editor
- Browse class hierarchy and slot definitions
  - Tools -> Ontology Browser
- GKB Editor described at
  - http://www.ai.sri.com/~gkb/user-man.html

Pathway Tools Data Access Mechanisms

Introduction
- Many ways to access and update PGDBs
- APIs in Java, Perl, and Lisp
- Import/export of files in many formats
- Registry of Pathway/Genome Databases
- Import of PGDB data into BioWarehouse
- Updating a PGDB from an external genome DB

Pathway Tools APIs
- Support programmatic queries and updates to PGDBs
- APIs in Java, Perl, and Lisp all provide access to a common set of procedures:
  - Generic Frame Protocol: Ocelot object database API
  - Additional Pathway Tools functions
- For more information see
  - http://bioinformatics.ai.sri.com/ptools/ptools-resources.html

Generic Frame Protocol (GFP)
- A library of procedures for accessing Ocelot DBs
- GFP specification:
  - http://www.ai.sri.com/~gfp/spec/paper/paper.html
- A small number of GFP functions are sufficient for most complex queries
- Knowledge of the Pathway Tools schema is critical for using the APIs:
  - Appendix I of Pathway Tools User’s Guide, Vol I

Generic Frame Protocol
- get-class-all-instances(Class)
  - Returns the instances of Class
- Key Pathway Tools classes:
  - Genetic-Elements
  - Genes
  - Proteins
  - Polypeptides (a subclass of Proteins)
  - Protein-Complexes (a subclass of Proteins)
  - Pathways
  - Reactions
  - Compounds-And-Elements
  - Enzymatic-Reactions
  - Transcription-Units
  - Promoters
  - DNA-Binding-Sites
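
The sketch below assumes, as the "all" in the name suggests, that get-class-all-instances returns the instances of the class and of all its subclasses, so that Proteins covers both Polypeptides and Protein-Complexes. The hierarchy excerpt and frame IDs (for example `SDH-CPLX`) are hypothetical, and the Python function only mirrors the GFP name; it does not call GFP.

```python
# Hypothetical miniature class hierarchy (class -> direct subclasses).
SUBCLASSES = {
    "Proteins": ["Polypeptides", "Protein-Complexes"],
}
# Hypothetical direct instances of each class (frame IDs are illustrative).
DIRECT_INSTANCES = {
    "Polypeptides": ["Sdh-flavo", "Sdh-Fe-S"],
    "Protein-Complexes": ["SDH-CPLX"],
}


def get_class_all_instances(cls: str) -> list[str]:
    """Collect direct instances of cls and of every class below it."""
    result: list[str] = []
    seen: set[str] = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        if current in seen:  # guard against visiting a class twice
            continue
        seen.add(current)
        for inst in DIRECT_INSTANCES.get(current, []):
            if inst not in result:  # an instance may belong to several classes
                result.append(inst)
        stack.extend(SUBCLASSES.get(current, []))
    return result


print(sorted(get_class_all_instances("Proteins")))
```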

Generic Frame Protocol
- Notation Frame.Slot means a specified slot of a specified frame
- get-slot-value(Frame Slot)
  - Returns the first value of Frame.Slot
- get-slot-values(Frame Slot)
  - Returns all values of Frame.Slot as a list
- slot-has-value-p(Frame Slot)
  - Returns T if Frame.Slot has at least one value
- member-slot-value-p(Frame Slot Value)
  - Returns T if Value is one of the values of Frame.Slot
- print-frame(Frame)
  - Prints the contents of Frame
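
A hypothetical Python stand-in for the read operations above, using a dictionary as the PGDB. Python `None`, `True`, and `False` play the roles of Lisp NIL and T; the frame IDs are illustrative, and none of this is the GFP library itself.

```python
# Hypothetical in-memory stand-in for a PGDB: frame ID -> slot name -> list of values.
# Function names mirror the GFP operations above; this is not the GFP library.
KB = {
    "SDH-CPLX": {"Components": ["Sdh-flavo", "Sdh-Fe-S", "Sdh-membrane-1", "Sdh-membrane-2"]},
    "Sdh-flavo": {"Gene": ["sdhA"], "Component-Of": ["SDH-CPLX"]},
}


def get_slot_values(frame, slot):
    """All values of Frame.Slot as a list (empty if none)."""
    return list(KB.get(frame, {}).get(slot, []))


def get_slot_value(frame, slot):
    """First value of Frame.Slot, or None if the slot is empty."""
    values = get_slot_values(frame, slot)
    return values[0] if values else None


def slot_has_value_p(frame, slot):
    """True if Frame.Slot has at least one value."""
    return bool(get_slot_values(frame, slot))


def member_slot_value_p(frame, slot, value):
    """True if value is one of the values of Frame.Slot."""
    return value in get_slot_values(frame, slot)


def print_frame(frame):
    """Print each slot of frame with its values."""
    print(frame)
    for slot, values in KB.get(frame, {}).items():
        print(f"  {slot}: {values}")


print_frame("Sdh-flavo")
```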

Generic Frame Protocol
- coercible-to-frame-p(Thing)
  - Returns T if Thing is the name of a frame, or a frame object
- save-kb
  - Saves the current KB
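
The same idea for coercible-to-frame-p and save-kb, as a hypothetical Python sketch: `Frame` stands in for a frame object, and saving to JSON is an assumed format chosen only for illustration, not how Pathway Tools stores a KB.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Frame:
    """Hypothetical frame object (stands in for an Ocelot frame)."""
    frame_id: str
    slots: dict = field(default_factory=dict)


# Hypothetical KB: frame ID -> Frame object.
KB = {"sdhA": Frame("sdhA", {"Product": ["Sdh-flavo"]})}


def coercible_to_frame_p(thing):
    """True if thing is the name of a frame in KB, or a frame object."""
    if isinstance(thing, Frame):
        return True
    return isinstance(thing, str) and thing in KB


def save_kb(path):
    """Write the current KB to a JSON file (an assumed format, for illustration)."""
    with open(path, "w", encoding="utf-8") as out:
        json.dump({fid: f.slots for fid, f in KB.items()}, out, indent=2)


print(coercible_to_frame_p("sdhA"), coercible_to_frame_p("no-such-frame"))  # True False
```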

Generic Frame Protocol – Update Operations
- put-slot-value(Frame Slot Value)
  - Replace the current value(s) of Frame.Slot with Value
- put-slot-values(Frame Slot Value-List)
  - Replace the current value(s) of Frame.Slot with Value-List, which must be a list of values
- add-slot-value(Frame Slot Value)
  - Add Value to the current value(s) of Frame.Slot, if any
- remove-slot-value(Frame Slot Value)
  - Remove Value from the current value(s) of Frame.Slot
- replace-slot-value(Frame Slot Old-Value New-Value)
  - In Frame.Slot, replace Old-Value with New-Value
- remove-local-slot-values(Frame Slot)
  - Remove all of the values of Frame.Slot
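
The update operations can be sketched the same way over a hypothetical in-memory KB. Two choices here are assumptions: add-slot-value skips duplicates (set semantics, although the schema also allows bags), and put-slot-values rejects a non-list argument to mirror the "must be a list" rule. The Python names only mirror the GFP names; this is not the GFP library.

```python
# Hypothetical in-memory KB: frame ID -> slot name -> list of values.
KB: dict[str, dict[str, list]] = {}


def _values(frame, slot):
    """Return the (mutable) value list for Frame.Slot, creating it if needed."""
    return KB.setdefault(frame, {}).setdefault(slot, [])


def put_slot_value(frame, slot, value):
    """Replace all current values of Frame.Slot with the single value."""
    KB.setdefault(frame, {})[slot] = [value]


def put_slot_values(frame, slot, value_list):
    """Replace all current values of Frame.Slot with value_list (must be a list)."""
    if not isinstance(value_list, list):
        raise TypeError("value_list must be a list of values")
    KB.setdefault(frame, {})[slot] = list(value_list)


def add_slot_value(frame, slot, value):
    """Add value to Frame.Slot; duplicates are skipped (set semantics assumed)."""
    values = _values(frame, slot)
    if value not in values:
        values.append(value)


def remove_slot_value(frame, slot, value):
    """Remove value from Frame.Slot if present."""
    values = _values(frame, slot)
    if value in values:
        values.remove(value)


def replace_slot_value(frame, slot, old_value, new_value):
    """In Frame.Slot, replace old_value with new_value, keeping its position."""
    values = _values(frame, slot)
    if old_value in values:
        values[values.index(old_value)] = new_value


def remove_local_slot_values(frame, slot):
    """Remove all values of Frame.Slot."""
    KB.get(frame, {}).pop(slot, None)


add_slot_value("SDH-CPLX", "Components", "Sdh-flavo")
add_slot_value("SDH-CPLX", "Components", "Sdh-Fe-S")
replace_slot_value("SDH-CPLX", "Components", "Sdh-Fe-S", "Sdh-membrane-1")
print(KB["SDH-CPLX"]["Components"])  # ['Sdh-flavo', 'Sdh-membrane-1']
```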

Additional Pathway Tools Functions – Semantic Inference Layer
- The semantic inference layer defines built-in functions to compute commonly required relationships in a PGDB
  - http://bioinformatics.ai.sri.com/ptools/ptoolsfns.html
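
As a hypothetical example of the kind of relationship such a layer computes, the Python sketch below finds the genes that encode the enzymes of a pathway by chaining slots from the schema slides: Pathways.Reaction-List, the Reaction and Enzyme slots of Enzymatic-Reactions, Protein-Complexes.Components, and Polypeptides.Gene. It is not one of the built-in functions listed at the URL above; the frame IDs and the subunit-to-gene assignment are illustrative.

```python
# Hypothetical toy PGDB: frame ID -> slot name -> list of values.
# Slot names come from the schema slides; all frame IDs are illustrative only.
KB = {
    "PWY-EXAMPLE": {"Reaction-List": ["RXN-EXAMPLE"]},
    "RXN-EXAMPLE": {},
    "ENZRXN-EXAMPLE": {"Enzyme": ["SDH-CPLX"], "Reaction": ["RXN-EXAMPLE"]},
    "SDH-CPLX": {"Components": ["Sdh-flavo", "Sdh-Fe-S", "Sdh-membrane-1", "Sdh-membrane-2"]},
    "Sdh-flavo": {"Gene": ["sdhA"]},
    "Sdh-Fe-S": {"Gene": ["sdhB"]},
    "Sdh-membrane-1": {"Gene": ["sdhC"]},
    "Sdh-membrane-2": {"Gene": ["sdhD"]},
}


def slot_values(frame, slot):
    return KB.get(frame, {}).get(slot, [])


def enzymatic_reactions_of(reaction):
    # Follow the Reaction slot backwards: which Enzymatic-Reactions point here?
    return [f for f, slots in KB.items() if reaction in slots.get("Reaction", [])]


def polypeptides_of(protein):
    # A complex is expanded through Components (recursively); a frame with
    # no Components is treated as a polypeptide.
    components = slot_values(protein, "Components")
    if not components:
        return [protein]
    result = []
    for component in components:
        result.extend(polypeptides_of(component))
    return result


def genes_of_pathway(pathway):
    """Pathway -> reactions -> enzymatic reactions -> enzymes -> polypeptides -> genes."""
    genes = []
    for reaction in slot_values(pathway, "Reaction-List"):
        for enzrxn in enzymatic_reactions_of(reaction):
            for enzyme in slot_values(enzrxn, "Enzyme"):
                for polypeptide in polypeptides_of(enzyme):
                    for gene in slot_values(polypeptide, "Gene"):
                        if gene not in genes:
                            genes.append(gene)
    return genes


print(genes_of_pathway("PWY-EXAMPLE"))  # ['sdhA', 'sdhB', 'sdhC', 'sdhD']
```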

Internal note
- Note: Refer to a local copy of ptools-fns.html to go through the semantic inference layer functions

File Import/Export Capabilities
- PGDBs can be exported in whole or part to:
  - SBML (Systems Biology Markup Language, sbml.org)
    - Import supported by many simulation packages
    - File -> Export -> Selected Reactions to SBML File
  - Pathway Tools attribute-value format and column-delimited format files
    - http://brg.ai.sri.com/ptools/flatfile-format.shtml
    - Dump entire PGDB to a suite of files: File -> Export -> Entire DB to Flat Files
    - Dump selected frames to a single file: File -> Export -> Selected Frames to File
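
A program that consumes attribute-value files needs only a small parser. The Python sketch below assumes a simplified layout (one `ATTRIBUTE - VALUE` pair per line, `//` ending each frame's record, `#` starting a comment) and ignores any further rules; the format description at the URL above is authoritative. The sample record and its attribute names are illustrative.

```python
def parse_attribute_value(text):
    """Parse attribute-value text into a list of {ATTRIBUTE: [values]} records.

    Assumed, simplified layout: one "ATTRIBUTE - VALUE" pair per line,
    a line containing only "//" ends a record, and lines starting with "#"
    are comments. Continuation lines and annotations are not handled.
    """
    records, current = [], {}
    for raw in text.splitlines():
        stripped = raw.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank line or comment
        if stripped == "//":  # end of one frame's record
            if current:
                records.append(current)
            current = {}
            continue
        attribute, sep, value = raw.partition(" - ")
        if not sep:
            raise ValueError(f"unrecognized line: {raw!r}")
        # A repeated attribute means the slot has several values.
        current.setdefault(attribute, []).append(value)
    if current:  # tolerate a missing final "//"
        records.append(current)
    return records


SAMPLE = """\
# hypothetical example record
UNIQUE-ID - CPLXB3-0035
TYPES - Protein-Complexes
COMPONENTS - Sdh-flavo
COMPONENTS - Sdh-Fe-S
//
"""

print(parse_attribute_value(SAMPLE))
```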

Import/Export
- Import from attribute-value or column-delimited files
  - File -> Import -> Frames From File
- Import/Export to/from internal Pathway Tools format that allows pathways, reactions, enzymes, and compounds to be easily moved between Pathway Tools installations
  - Edit -> Add Pathway to File Export List
  - File -> Export -> Selected Pathways to File
  - File -> Import -> Pathways from File
- Import/Export to/from MDL molfile format
  - Edit -> Import compound structure from molfile
  - Edit -> Export compound structure to molfile

Miscellaneous Exports
- Overview -> Highlight -> Save to File
- Overview -> Highlight -> Load from File
- Gene / Protein Sequence / Save to file
- Chromosome -> Show Sequence of a Segment of Replicon

Napster Comes to Bioinformatics
- Public sharing of Pathway/Genome Databases
- PGDB registry maintained by SRI at URL http://biocyc.org/registry.html
- Registry operations
  - List contents of registry
  - Download PGDBs listed in the registry
  - Register PGDBs you have created

Registry Details
- Why register your PGDB?
  - Declare existence of your PGDB in a central location
  - Facilitate download by other scientists
- Why download a PGDB?
  - Desktop Navigator provides more functionality than the Web version
  - Comparative operations
  - Programmatic querying and processing of the PGDB
- Registration process
  - Registered PGDBs have open availability by default
  - Authors can provide their own license agreements
  - Registered PGDBs reside on authors’ FTP site

BioWarehouse
- Biospice.org

New Import/Export Tools
- Suggestions?
- Volunteers?

Updating a PGDB From an External Genome DB
- Example: AraCyc forms a pathway module to the TAIR DB
  - TAIR is the authoritative source for gene and gene-product information
  - Update AraCyc to reflect updates in TAIR

Proposed Approach
- Export TAIR to PathoLogic files
- Build AraCyc2 from those PathoLogic files (automated PathoLogic only)
- Compare AraCyc1 (A1) to AraCyc2 (A2)
  A. Import new genes/proteins from A2 to A1
  B. Delete from A1 genes/proteins not found in A2
  C. Rename genes/proteins in A1 whose names changed in A2
- Run name matcher on A1’
- Check for pathways with no enzymes and report them so the user can keep any that PathoLogic would otherwise delete
- What about enzymes that were assigned to a pathway by the hole filler?
- Re-run pathway predictor
- Remember which pathways the user deletes so they are not re-predicted by PathoLogic
- Consider movement of genes from contig to chromosome
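
Steps A to C reduce to set differences between the two builds. The Python sketch assumes each gene has a stable key that survives renaming (the `LOCUS-n` keys and gene names are hypothetical; the slide does not say how genes are matched), and it approximates "pathways with no enzymes" as pathways whose genes would all be deleted.

```python
def compare_builds(a1, a2):
    """Compare two builds, each a dict: stable gene key -> gene name.

    The stable key (one that does not change when a gene is renamed) is an
    assumption made for this sketch.
    """
    keys1, keys2 = set(a1), set(a2)
    new = sorted(keys2 - keys1)      # step A: import these into A1
    deleted = sorted(keys1 - keys2)  # step B: delete these from A1
    renamed = {k: (a1[k], a2[k])     # step C: rename in A1 to the A2 name
               for k in sorted(keys1 & keys2) if a1[k] != a2[k]}
    return new, deleted, renamed


def pathways_left_without_genes(pathway_genes, deleted):
    """Pathways whose genes would all be deleted; report these to the user."""
    gone = set(deleted)
    return sorted(p for p, genes in pathway_genes.items() if not set(genes) - gone)


a1 = {"LOCUS-1": "abc1", "LOCUS-2": "abc2", "LOCUS-3": "abc3"}
a2 = {"LOCUS-1": "abc1", "LOCUS-3": "xyz3", "LOCUS-4": "abc4"}
new, deleted, renamed = compare_builds(a1, a2)
print(new, deleted, renamed)  # ['LOCUS-4'] ['LOCUS-2'] {'LOCUS-3': ('abc3', 'xyz3')}
print(pathways_left_without_genes(
    {"PWY-A": ["LOCUS-2"], "PWY-B": ["LOCUS-1", "LOCUS-2"]}, deleted))  # ['PWY-A']
```

Keying on a stable identifier rather than on gene names is what lets step C detect renames; matching on names alone would report a renamed gene as one deletion plus one addition.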
