Ontologies for Gene Expression
SRI International
Bioinformatics
Ontologies for Gene Expression
History of ontologies in bioinformatics
BioOntologies Consortium
Ontologies for the biochemical networks that control gene expression
Ontologies
Clear thinking about how to structure information
Clearly understand each field in a database
Formal and informal definitions for database elements
Type of value, range of values
Product field of Gene class can be a Protein or an RNA
Ability to enforce data correctness
Ability to compute with database elements in a reliable fashion
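The value-type constraint above (the Product field of the Gene class accepts a Protein or an RNA) can be sketched as a small lookup table plus a check. This is only an illustration under stated assumptions: the `SLOT_TYPES` table, the `check_slot` function, and the `Pathway` class used as a counterexample are hypothetical names, not part of any tool named in the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Protein:
    name: str


@dataclass(frozen=True)
class RNA:
    name: str


@dataclass(frozen=True)
class Pathway:
    name: str


# Hypothetical slot declarations: class name -> slot name -> allowed value types.
SLOT_TYPES = {"Gene": {"Product": (Protein, RNA)}}


def check_slot(class_name, slot, value):
    """Return True if `value` is an allowed type for `slot` of `class_name`.

    Slots that have no declaration are rejected rather than silently accepted.
    """
    allowed = SLOT_TYPES.get(class_name, {}).get(slot)
    return allowed is not None and isinstance(value, allowed)
```

A loader could call `check_slot("Gene", "Product", value)` before storing each value, which is one way to enforce data correctness at entry time instead of discovering bad values later.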
History of Ontologies in Bioinformatics
1994: Meeting on Interoperation of Molecular Biology Databases (MIMBD-94)
BioOntologies meetings in 1997, 1998, 1999, 2000, 2001
Ontology tutorials at ISMB conference
BioOntologies Consortium
BioOntologies Consortium
Concerned with ontology infrastructure for bioinformatics
Exchange of ontologies
Beware: bioinformatics ontologies are all expressed in different ontology languages
Software for constructing, interpreting, and applying ontologies
http://bioontology.ingenuity.com/
BioOntologies Consortium
ISMB-2000 paper evaluating ontology exchange languages for bioinformatics
Define criteria for evaluating existing languages
No existing languages satisfy all criteria
Desired: XML syntax, frame semantics
1999: Karp and Chaudhri develop XOL language
2000: OIL/DAML succeeds XOL
BioOntologies Consortium – Potential Interactions
Standards and tools
DAML/OIL
SRI’s GKB Editor ontology editor
Collaborate on ontology development
Post ontologies on BioOntologies web site
Be Precise About Ontology Uses
Data submission
Data exchange among databases
High-level database design
Mapping from ontologies to database management systems essential
Beware of flatfiles
Beware of XML
ArrayExpress
Ontology for specifying experiments
MAML import and export
SQL query access
EcoCyc Project Overview
E. coli Encyclopedia and model organism database
Tracks the evolving annotation of the E. coli genome
Over 3000 literature citations
Collaborative development via internet
Karp (SRI) -- Bioinformatics architect
Riley (MBL) -- Metabolic pathways, signal transduction
Saier (UCSD) and Paulsen (TIGR)-- Transport
Collado (UNAM)-- Regulation of gene expression
Ontology: 1000 biological classes
Database content: 16,000 instances
Over 3,300 registered users
Encoding Transcriptional Regulation in EcoCyc -- Goals
Capture transcriptional regulatory mechanisms within a well-structured ontology
Provide a training set for inference of gene networks
Interpret gene-expression datasets in the context of known regulatory mechanisms
Compute with regulatory mechanisms and pathways
Summary statistics
Pattern discovery
Complex queries
Consistency checking
Pathway Tools Extensions for Transcriptional Regulation
Integration of RegulonDB (Collado et al.)
Regulation ontology
Editing tools for regulatory interactions
New visualizations
EcoCyc Ontology for Transcriptional Regulation
Terminology: Transcription Unit
Definition: A set of coding regions and associated control regions that yield a single transcript
“Operons” must have more than one gene
Prokaryotic terminology
Key features of ontology
Model gross structure of transcription units, transcription factors, RNA polymerase
Model all molecular interactions as biochemical reactions
Binding of transcription factors to ligands and to DNA sites
Binding of RNA polymerase to promoter
Ontology for Transcriptional Regulation – Current Limitations
Focused on prokaryotic regulation
Mechanisms based on control of transcription initiation only, e.g., no attenuation
Ontology for Regulatory Interactions
Common slots
Citations, Comment, Common-Name, Synonyms
Class DNA-Regions
Left-End-Position, Right-End-Position, Relative-StartDistance
Class Transcription-Units
Components (Promoter, transcription-factor binding sites, genes, terminator)
Class Promoters
Component-Of
Promoter-Strength-Exp, Promoter-Strength-Seq
Promoter-Evidence
Ontology for Regulatory Interactions
Class DNA-Binding-Sites
Component-Of
Regulated-Promoter, Relative-Center-Distance
Type-Of-Evidence
Classes Protein-Complexes, Polypeptides
Components / Component-Of
Class Binding-Reactions
Reactants
Activators
Inhibitors
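The classes and slots listed above could be mirrored in code roughly as follows. This is a minimal sketch, not the Pathway Tools schema: the slot names come from the slides, but the Python class layout, the choice to make Promoter and DNABindingSite subclasses of DNARegion, and the `inverse_links_ok` helper are assumptions made for illustration. The frame ids in the example (`trpLEDCBA`, `pro001`, `site001`) are taken from the trp figure labels; gene components are omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Frame:
    """Slots common to all classes: Citations, Comment, Common-Name, Synonyms."""
    frame_id: str
    common_name: str = ""
    synonyms: List[str] = field(default_factory=list)
    citations: List[str] = field(default_factory=list)
    comment: str = ""


@dataclass
class DNARegion(Frame):
    left_end_position: Optional[int] = None
    right_end_position: Optional[int] = None


@dataclass
class TranscriptionUnit(DNARegion):
    # Frame ids of the promoter, binding sites, genes, and terminator.
    components: List[str] = field(default_factory=list)


@dataclass
class Promoter(DNARegion):
    component_of: Optional[str] = None  # frame id of the transcription unit


@dataclass
class DNABindingSite(DNARegion):
    component_of: Optional[str] = None
    regulated_promoter: Optional[str] = None
    relative_center_distance: Optional[float] = None


@dataclass
class BindingReaction(Frame):
    reactants: List[str] = field(default_factory=list)
    activators: List[str] = field(default_factory=list)
    inhibitors: List[str] = field(default_factory=list)


def inverse_links_ok(unit, parts):
    """Check that each component of `unit` points back to it via component_of."""
    by_id = {p.frame_id: p for p in parts}
    return all(
        c in by_id and by_id[c].component_of == unit.frame_id
        for c in unit.components
    )


# Example wiring using ids from the trp figure (genes omitted).
unit = TranscriptionUnit("trpLEDCBA", components=["pro001", "site001"])
promoter = Promoter("pro001", component_of="trpLEDCBA")
site = DNABindingSite("site001", component_of="trpLEDCBA",
                      regulated_promoter="pro001")
```

Storing references as frame ids rather than object pointers keeps the sketch close to how a frame database links objects, and makes the Components / Component-Of inverse relationship easy to verify.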
EcoCyc Ontology for Transcriptional Regulation
One DB object defined for each biological entity and for each molecular interaction
[Figure: example DB objects for the trp transcription unit. Labels: trp, apoTrpR, TrpR*trp, RpoSig70, trpLEDCBA, pro001, site001, Int001, Int003, Int005, trpL, trpE, trpD, trpC, trpB, trpA]
Integration of RegulonDB
RegulonDB has been loaded into EcoCyc
RegulonDB originally relational
Lisp loader tools developed for relational table dumps
Statistics:
528 transcription units
620 promoters
617 DNA binding sites
83 transcription factors
Consistency Checks on RegulonDB Data
Find transcription units containing:
Undefined components
No gene components
Genes that are not contiguous
Genes with conflicting transcription directions
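The four checks above can be sketched as a single function over a transcription unit's component list. This is an illustrative sketch, not the checker actually run on RegulonDB: the `Gene` record, its `index` field (the gene's ordinal position along the chromosome), and the function name are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Gene:
    gene_id: str
    index: int      # hypothetical: ordinal position of the gene on the chromosome
    direction: str  # "+" or "-"


def check_transcription_unit(
    components: List[str],
    genes: Dict[str, Gene],
    known_ids: Set[str],
) -> List[str]:
    """Return the list of problems found for one transcription unit."""
    problems = []

    # 1. Components that refer to objects missing from the database.
    if any(c not in known_ids for c in components):
        problems.append("undefined components")

    # 2. A unit must contain at least one gene.
    unit_genes = [genes[c] for c in components if c in genes]
    if not unit_genes:
        problems.append("no gene components")
        return problems

    # 3. Genes are contiguous when their positions form an unbroken run.
    positions = sorted({g.index for g in unit_genes})
    if positions[-1] - positions[0] + 1 != len(positions):
        problems.append("genes not contiguous")

    # 4. All genes in one unit should be transcribed in the same direction.
    if len({g.direction for g in unit_genes}) > 1:
        problems.append("conflicting transcription directions")

    return problems
```

Returning a list of problem labels, rather than stopping at the first failure, lets a curator see every issue with a unit in one pass.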
Interactive Editing Tools
SRI created interactive tools for creating and modifying regulatory mechanisms
Ongoing updates to RegulonDB occur in EcoCyc
Visualization Capabilities
Transcription units
Transcription unit containing a gene: araA
Details of a transcription unit
Regulons: CRP, NARL
Pathway control
Overview: show reactions controlled by a TF (CRP, FNR), show other reactions controlled by the same TF(s) (use a reaction in purine biosynthesis)
Characterization of the E. coli Genetic Network
551 transcription units include 1115 (25%) genes
Controlled by 86 transcription factors
All experimentally determined
Genes per Transcription Unit
[Figure: histogram. x-axis: number of genes per transcription unit; y-axis: number of transcription units]
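A distribution such as genes per transcription unit can be tabulated directly from the database once each unit's members are known. The sketch below assumes a plain dictionary mapping each unit to its member list; the function name and the toy ids in the usage note are made up for illustration and are not EcoCyc data.

```python
from collections import Counter
from typing import Dict, List


def size_histogram(members: Dict[str, List[str]]) -> Counter:
    """Map each group size to the number of groups having that size.

    With units -> genes this gives genes per transcription unit; the same
    function works for units -> binding sites or factors -> regulated units.
    """
    return Counter(len(items) for items in members.values())
```

For example, `size_histogram({"tu1": ["g1"], "tu2": ["g2", "g3"], "tu3": ["g4"]})` counts two units with one gene and one unit with two genes.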
Binding Sites per Transcription Unit
[Figure: histogram. x-axis: number of binding sites per transcription unit; y-axis: number of transcription units]
Transcription Factor Reach
[Figure: histogram. x-axis: number of transcription units regulated by given transcription factor; y-axis: number of transcription factors]
Transcription Units per Pathway
[Figure: histogram. x-axis: number of operons per pathway; y-axis: number of pathways]
Pathways per Transcription Unit
[Figure: histogram. x-axis: number of pathways in an operon; y-axis: number of operons]
Visualization of the Full E. coli Genetic Network
Influences of transcription factors on other transcription factors
50 of 85 TFs do not affect other TFs
Maximum network depth of 3
Only CRP has a branching factor greater than 2
No feedback loops other than autoregulation
Negative autoregulation is the dominant form of feedback
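Properties like those listed above (factors with no effect on other factors, network depth, branching factor, feedback loops, autoregulation) can be computed from a map of which factors regulate which. The sketch below is one possible formulation under stated assumptions: the slides do not define "depth" or "branching factor", so depth is taken here as the longest chain of regulatory edges and branching factor as the number of other factors a factor regulates. The function name and the toy network in the usage note are hypothetical.

```python
from typing import Dict, Set


def analyze_tf_network(network: Dict[str, Set[str]]) -> dict:
    """Summarize a TF -> regulated-TFs map (a TF may list itself)."""
    # Self-loops are autoregulation; set them aside before looking for cycles.
    autoregulated = {tf for tf, targets in network.items() if tf in targets}
    others = {tf: {t for t in targets if t != tf}
              for tf, targets in network.items()}

    no_effect = {tf for tf, targets in others.items() if not targets}
    branching = {tf: len(targets) for tf, targets in others.items()}

    state = {}  # tf -> "visiting" while on the DFS stack, else its depth
    found_cycle = False

    def depth(tf: str) -> int:
        nonlocal found_cycle
        if state.get(tf) == "visiting":
            found_cycle = True  # reached a TF already on the current path
            return 0
        if tf in state:
            return state[tf]
        state[tf] = "visiting"
        d = 0
        for target in others.get(tf, ()):
            d = max(d, 1 + depth(target))
        state[tf] = d
        return d

    max_depth = max((depth(tf) for tf in others), default=0)
    return {
        "autoregulated": autoregulated,
        "no_effect_on_other_tfs": no_effect,
        "branching": branching,
        "max_depth": max_depth,  # not meaningful if feedback loops exist
        "has_feedback_loop": found_cycle,
    }
```

On a made-up network `{"A": {"A", "B", "C"}, "B": {"D"}, "C": set(), "D": {"D"}}`, the function reports A and D as autoregulated, C and D as having no effect on other factors, a maximum depth of 2, and no feedback loop beyond autoregulation.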