The Pathway Tools Schema
SRI International
Bioinformatics
Motivations for Understanding Schema
 Pathway Tools visualizations and analyses depend upon the software being able to find precise information in precise places within a Pathway/Genome DB
 When writing complex queries to PGDBs, those queries must name classes and slots within the schema
 A Pathway/Genome Database is a web of interconnected objects; each object represents a biological entity
Reference
 Pathway Tools User’s Guide, Volume I
   Appendix A: Guide to the Pathway Tools Schema
Web of Relationships for One Enzyme
 Pathway: TCA Cycle
 Reaction: Succinate + FAD = fumarate + FADH2
 Enzymatic-reaction
 Enzyme complex: Succinate dehydrogenase
 Subunits: Sdh-flavo, Sdh-Fe-S, Sdh-membrane-1, Sdh-membrane-2
 Genes: sdhA, sdhB, sdhC, sdhD
Frame Data Model
 Frame Data Model: organizational structure for a PGDB
 Knowledge base (KB, Database, DB)
 Frames
 Slots
 Facets
 Annotations
Knowledge Base
 Collection of frames and their associated slots, values, facets, and annotations
 AKA: Database, PGDB
 Can be stored within
   An Oracle or MySQL DB
   A disk file
   Pathway Tools binary program
Frames
 Entities with which facts are associated
 Kinds of frames:
   Classes: Genes, Pathways, Biosynthetic Pathways
   Instances (objects): trpA, TCA cycle
 Classes:
   Superclass(es)
   Subclass(es)
   Instance(s)
 A symbolic frame name (id, key) uniquely identifies each frame
Slots
 Encode attributes/properties of a frame
   Integer, real number, string
 Represent relationships between frames
   The value of a slot is the identifier of another frame
 Every slot is described by a “slot frame” in a KB that defines meta information about that slot
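The frame, slot, and knowledge-base vocabulary above can be pictured with a minimal Python sketch. This is not how Pathway Tools stores a PGDB; the `Frame` class, the `add_frame` helper, and the frame IDs are hypothetical names chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """One frame: a symbolic ID plus a mapping from slot name to value."""
    frame_id: str
    slots: dict = field(default_factory=dict)


# The knowledge base maps each unique frame ID to its frame.
kb = {}


def add_frame(frame_id, **slots):
    """Create a frame, refusing IDs that are already in use."""
    if frame_id in kb:
        raise ValueError(f"duplicate frame ID: {frame_id}")
    kb[frame_id] = Frame(frame_id, dict(slots))
    return kb[frame_id]


# An attribute slot holds a plain value (here a string).
add_frame("TRPA-MONOMER", common_name="tryptophan synthase alpha subunit")
# A relationship slot holds the ID of another frame.
add_frame("trpA", product="TRPA-MONOMER")

# Following a relationship means looking up the stored ID in the KB.
product_frame = kb[kb["trpA"].slots["product"]]
```

Because a relationship is stored as an ID rather than as a nested object, every frame exists once and can be reached from any number of other frames.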
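Slot Links
 [Diagram: the web of relationships for succinate dehydrogenase, with each link labeled by the slot that encodes it]
 in-pathway: Succinate + FAD = fumarate + FADH2 -> TCA Cycle
 reaction: Enzymatic-reaction -> Succinate + FAD = fumarate + FADH2
 catalyzes: Succinate dehydrogenase -> Enzymatic-reaction
 component-of: Sdh-flavo, Sdh-Fe-S, Sdh-membrane-1, Sdh-membrane-2 -> Succinate dehydrogenase
 product: sdhA, sdhB, sdhC, sdhD -> one subunit each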
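A query that walks this web names one slot per hop. The sketch below uses plain Python dictionaries rather than any Pathway Tools API; the frame IDs (`Enzrxn-1`, `Rxn-1`, and so on) are hypothetical, and it assumes each slot points in the gene-to-pathway direction.

```python
# Hypothetical frame IDs; slot names follow the diagram labels.
kb = {
    "sdhA": {"product": ["Sdh-flavo"]},
    "Sdh-flavo": {"component-of": ["Succinate-dehydrogenase"]},
    "Succinate-dehydrogenase": {"catalyzes": ["Enzrxn-1"]},
    "Enzrxn-1": {"reaction": ["Rxn-1"]},
    "Rxn-1": {"in-pathway": ["TCA-Cycle"]},
}


def follow(start, slot_path):
    """Follow a sequence of slots from one frame and return the frames reached."""
    frontier = [start]
    for slot in slot_path:
        frontier = [
            value
            for frame_id in frontier
            for value in kb.get(frame_id, {}).get(slot, [])
        ]
    return frontier


# From a gene to the pathways its product takes part in.
pathways = follow(
    "sdhA", ["product", "component-of", "catalyzes", "reaction", "in-pathway"]
)
```

A misspelled slot name simply yields an empty result, which is one reason queries depend on knowing the schema.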
Slots
 Number of values
   Single valued
   Multivalued: sets, bags
 Slot values
   Any LISP object: Integer, real, string, symbol (frame name)
 Slotunits define properties of slots: datatypes, classes, constraints
 Two slots are inverses if they encode opposite relationships
   Slot Product in class Genes
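Inverse slots can be kept consistent by writing both directions whenever a value is stored. The sketch below assumes, for illustration only, that the inverse of Product (on genes) is a Gene slot on the product frame; `INVERSES` and `put_value` are hypothetical names, and multivalued slots are modeled as sets.

```python
from collections import defaultdict

# Assumed inverse pair for illustration: Product on genes, Gene on polypeptides.
INVERSES = {"product": "gene", "gene": "product"}

# frame ID -> slot name -> set of values (a multivalued slot modeled as a set)
kb = defaultdict(lambda: defaultdict(set))


def put_value(frame_id, slot, value):
    """Add a slot value and keep the inverse slot on the target frame in sync."""
    kb[frame_id][slot].add(value)
    inverse = INVERSES.get(slot)
    if inverse is not None:
        kb[value][inverse].add(frame_id)


put_value("trpA", "product", "TRPA-MONOMER")
```

After the call, the relationship can be read from either end without a search over the whole KB.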
Representation of Function
 [Diagram: the same web of objects, annotated with example slots stored at each level]
 Pathway: TCA Cycle
 Reaction: Succinate + FAD = fumarate + FADH2 (EC#, Keq)
 Enzymatic-reaction (Cofactors, Inhibitors)
 Proteins: Succinate dehydrogenase; Sdh-flavo, Sdh-Fe-S, Sdh-membrane-1, Sdh-membrane-2 (Molecular wt, pI)
 Genes: sdhA, sdhB, sdhC, sdhD (Left-end-position)
Monofunctional Monomer
Pathway
Reaction
Enzymatic-reaction
Monomer
Gene
Bifunctional Monomer
Pathway
Reaction
Reaction
Enzymatic-reaction
Enzymatic-reaction
Monomer
Gene
Monofunctional Multimer
Pathway
Reaction
Enzymatic-reaction
Multimer
Monomer
Monomer
Monomer
Monomer
Gene
Gene
Gene
Gene
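The three cases above differ in only two counts: how many enzymatic reactions a protein is linked to, and whether it has component subunits. A rough sketch, assuming a protein frame is a plain dictionary with `catalyzes` and `components` slots; the labels for zero or more than two activities are assumptions that do not appear in the slides.

```python
def classify(protein):
    """Label a protein frame by subunit structure and number of catalyzed activities."""
    structure = "Multimer" if protein.get("components") else "Monomer"
    activity_count = len(protein.get("catalyzes", []))
    function = {
        0: "Nonenzymatic",      # assumed label, not from the slides
        1: "Monofunctional",
        2: "Bifunctional",
    }.get(activity_count, "Multifunctional")  # assumed label for three or more
    return f"{function} {structure}"


monomer = {"catalyzes": ["enzrxn-1"]}
bifunctional = {"catalyzes": ["enzrxn-1", "enzrxn-2"]}
multimer = {"catalyzes": ["enzrxn-1"], "components": ["m1", "m2", "m3", "m4"]}
labels = [classify(p) for p in (monomer, bifunctional, multimer)]
```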
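Pathway and Substrates
 [Diagram: one Pathway linked to several Reactions; one Reaction shown with its substrates]
 in-pathway: links each Reaction to the Pathway
 left: links a Reaction to Reactant-1 and Reactant-2
 right: links a Reaction to Product-1 and Product-2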
Transcriptional Regulation
 [Diagram: objects involved in regulating the trp transcription unit]
 Transcription unit: trpLEDCBA
 Genes: trpL, trpE, trpD, trpC, trpB, trpA
 Other objects shown: trp, apoTrpR, TrpR*trp, RpoSig70, pro001, site001, Int001, Int003, Int005
Principal Classes
 Class names are capitalized, plural, separated by dashes
 Genetic-Elements, with subclasses:
   Chromosomes
   Plasmids
 Genes
 Transcription-Units
 RNAs
   rRNAs, snRNAs, tRNAs, Charged-tRNAs
 Proteins, with subclasses:
   Polypeptides
   Protein-Complexes
 Reactions, with subclasses:
   Transport-Reactions
 Enzymatic-Reactions
 Pathways
 Compounds-And-Elements
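Queries often ask whether a frame belongs to a class or to any of its subclasses. The sketch below records the superclass links explicitly stated in the list above and walks them upward. It assumes a single superclass per class for brevity, although a class may have several; `SUPERCLASS`, `ancestors`, and `is_a` are hypothetical names.

```python
# Direct superclass of each class, taken from the list of principal classes.
SUPERCLASS = {
    "Chromosomes": "Genetic-Elements",
    "Plasmids": "Genetic-Elements",
    "Polypeptides": "Proteins",
    "Protein-Complexes": "Proteins",
    "Transport-Reactions": "Reactions",
}


def ancestors(cls):
    """Return the chain of superclasses from nearest to most general."""
    chain = []
    while cls in SUPERCLASS:
        cls = SUPERCLASS[cls]
        chain.append(cls)
    return chain


def is_a(cls, ancestor):
    """True if cls is the given class or one of its subclasses."""
    return cls == ancestor or ancestor in ancestors(cls)


polypeptide_is_protein = is_a("Polypeptides", "Proteins")
```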
Frame IDs of Instances
 Instance frame ID conventions have evolved over time
 Examples:
   Pathways: TRPSYN-PWY, P23-PWY
   Genes: AG10045
   Monomers: TRPA-MONOMER, AG10045-MONOMER
Slots in Multiple Classes
 Common-Name
 Synonyms
 Names (computed as union of Common-Name, Synonyms)
 Comment
 Citations
 DB-Links
Genes Slots
 Component-Of (links to replicon, transcription unit)
 Left-End-Position
 Right-End-Position
 Centisome-Position
 Transcription-Direction
 Product
Proteins Slots
 Molecular-Weight-Seq
 Molecular-Weight-Exp
 pI
 Locations
 Modified-Form
 Unmodified-Form
 Component-Of
Polypeptides Slots
 Gene
Protein-Complexes Slots
 Components
Reactions Slots
 EC-Number
 Left, Right
 Substrates (computed as union of Left, Right)
 DeltaG0
 Keq
 Spontaneous?
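Computed slots such as Substrates (or Names, computed from Common-Name and Synonyms) are derived from other slots rather than stored. A minimal sketch of that union, using the succinate dehydrogenase reaction from the earlier slides; `computed_union` is a hypothetical helper and the reaction is a plain dictionary, not a Pathway Tools object.

```python
def computed_union(frame, source_slots):
    """Combine the values of several slots, keeping first-seen order and dropping duplicates."""
    combined = []
    for slot in source_slots:
        for value in frame.get(slot, []):
            if value not in combined:
                combined.append(value)
    return combined


reaction = {
    "Left": ["succinate", "FAD"],
    "Right": ["fumarate", "FADH2"],
}
substrates = computed_union(reaction, ["Left", "Right"])
```

Deriving the value on demand means it cannot drift out of step with Left and Right when either is edited.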
Enzymatic-Reactions Slots
 Enzyme
 Reaction
 Activators
 Inhibitors
 Physiologically-Relevant
 Cofactors
 Prosthetic-Groups
 Alternative-Substrates
 Alternative-Cofactors
Pathways Slots
 Reaction-List
 Predecessors
 Primaries
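Reaction-List names the reactions in a pathway, and Predecessors records which reaction comes before which. The sketch below assumes each Predecessors value is a (reaction, predecessor) pair and uses hypothetical reaction IDs; it orders a linear pathway with the standard-library `graphlib` module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical pathway frame; each pair is assumed to mean (reaction, its predecessor).
pathway = {
    "Reaction-List": ["Rxn-3", "Rxn-1", "Rxn-2"],
    "Predecessors": [("Rxn-2", "Rxn-1"), ("Rxn-3", "Rxn-2")],
}


def ordered_reactions(pathway):
    """Return the pathway's reactions so that every predecessor comes first."""
    graph = {rxn: set() for rxn in pathway["Reaction-List"]}
    for rxn, predecessor in pathway["Predecessors"]:
        graph[rxn].add(predecessor)
    return list(TopologicalSorter(graph).static_order())


order = ordered_reactions(pathway)
```

A cyclic pathway has no such ordering; `static_order` raises `graphlib.CycleError` in that case, so cycles would need separate handling.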