Ontologies for Gene Expression

SRI International
Bioinformatics
Ontologies for Gene Expression
 History of ontologies in bioinformatics
 BioOntologies Consortium
 Ontologies for the biochemical networks that control gene expression

Ontologies
 Clear thinking about how to structure information
 Clearly understand each field in a database
   Formal and informal definitions for database elements
   Type of value, range of values
   Product field of Gene class can be a Protein or an RNA
 Ability to enforce data correctness
 Ability to compute with database elements in a reliable fashion
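
The idea of declaring a value type and a value range for each field, and using those declarations to enforce data correctness, can be sketched as follows. This is a minimal illustration and not the schema machinery of any SRI tool: `SlotSpec`, `GENE_SLOTS`, `check_gene`, and the `Length` slot with its range are hypothetical names invented for the example. Only the rule that Product must be a Protein or an RNA comes from the slide.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple


class Protein:
    """Placeholder class for a protein product."""


class RNA:
    """Placeholder class for an RNA product."""


@dataclass(frozen=True)
class SlotSpec:
    """Hypothetical slot definition: allowed value types plus an optional numeric range."""
    name: str
    allowed_types: Tuple[type, ...]
    value_range: Optional[Tuple[float, float]] = None

    def validate(self, value: Any) -> None:
        """Raise TypeError or ValueError if the value violates the declaration."""
        if not isinstance(value, self.allowed_types):
            names = ", ".join(t.__name__ for t in self.allowed_types)
            raise TypeError(
                f"{self.name}: expected one of ({names}), got {type(value).__name__}"
            )
        if self.value_range is not None:
            low, high = self.value_range
            if not low <= value <= high:
                raise ValueError(f"{self.name}: {value} is outside [{low}, {high}]")


# Product mirrors the slide; Length and its range are invented for illustration.
GENE_SLOTS: Dict[str, SlotSpec] = {
    "Product": SlotSpec("Product", (Protein, RNA)),
    "Length": SlotSpec("Length", (int,), (1, 10_000_000)),
}


def check_gene(values: Dict[str, Any]) -> List[str]:
    """Return one error message per violated slot (empty list if the record is valid)."""
    errors = []
    for slot_name, value in values.items():
        spec = GENE_SLOTS.get(slot_name)
        if spec is None:
            errors.append(f"{slot_name}: undefined slot")
            continue
        try:
            spec.validate(value)
        except (TypeError, ValueError) as exc:
            errors.append(str(exc))
    return errors
```

Calling `check_gene({"Product": Protein(), "Length": 1200})` returns an empty list, while a record whose Product is a plain string is reported as a type error instead of being stored silently.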

History of Ontologies in Bioinformatics
 1994 Meeting on Interoperation of Molecular Biology Databases (MIMBD-94)
 BioOntologies meetings in 1997, 1998, 1999, 2000, 2001
 Ontology tutorials at ISMB conference
 BioOntologies Consortium

BioOntologies Consortium
 Concerned with ontology infrastructure for bioinformatics
 Exchange of ontologies
   Beware: bioinformatics ontologies are all expressed in different ontology languages
 Software for constructing, interpreting, applying ontologies
 http://bioontology.ingenuity.com/

BioOntologies Consortium
 ISMB-2000 paper evaluating ontology exchange languages for bioinformatics
   Define criteria for evaluating existing languages
   No existing languages satisfy all criteria
   Desired: XML syntax, frame semantics
 1999: Karp and Chaudhri develop XOL language
 2000: OIL/DAML succeeds XOL

BioOntologies Consortium – Potential Interactions
 Standards and tools
   DAML/OIL
   SRI’s GKB Editor ontology editor
 Collaborate on ontology development
 Post ontologies on BioOntologies web site

Be Precise About Ontology Uses
 Data submission
 Data exchange among databases
 High-level database design
 Mapping from ontologies to database management systems essential
   Beware of flatfiles
   Beware of XML

ArrayExpress
 Ontology for specifying experiments
 MAML import and export
 SQL query access

EcoCyc Project Overview
 E. coli Encyclopedia and model organism database
   Tracks the evolving annotation of the E. coli genome
   Over 3000 literature citations
 Collaborative development via internet
   Karp (SRI) -- Bioinformatics architect
   Riley (MBL) -- Metabolic pathways, signal transduction
   Saier (UCSD) and Paulsen (TIGR) -- Transport
   Collado (UNAM) -- Regulation of gene expression
 Ontology: 1000 biological classes
 Database content: 16,000 instances
 Over 3,300 registered users

Encoding Transcriptional Regulation in EcoCyc -- Goals
 Capture transcriptional regulatory mechanisms within a well-structured ontology
 Provide a training set for inference of gene networks
 Interpret gene-expression datasets in the context of known regulatory mechanisms
 Compute with regulatory mechanisms and pathways
   Summary statistics
   Pattern discovery
   Complex queries
   Consistency checking

Pathway Tools Extensions for Transcriptional Regulation
 Integration of RegulonDB (Collado et al.)
 Regulation ontology
 Editing tools for regulatory interactions
 New visualizations

EcoCyc Ontology for Transcriptional Regulation
 Terminology: Transcription Unit
   Definition: A set of coding regions and associated control regions that yield a single transcript
   “Operons” must have more than one gene
   Prokaryotic terminology
 Key features of ontology
   Model gross structure of transcription units, transcription factors, RNA polymerase
   Model all molecular interactions as biochemical reactions
     Binding of transcription factors to ligands and to DNA sites
     Binding of RNA polymerase to promoter

Ontology for Transcriptional Regulation – Current Limitations
 Focused on prokaryotic regulation
 Mechanisms based on control of transcription initiation only, e.g., no attenuation

Ontology for Regulatory Interactions
 Common slots
   Citations, Comment, Common-Name, Synonyms
 Class DNA-Regions
   Left-End-Position, Right-End-Position, Relative-StartDistance
 Class Transcription-Units
   Components (Promoter, transcription-factor binding sites, genes, terminator)
 Class Promoters
   Component-Of
   Promoter-Strength-Exp, Promoter-Strength-Seq
   Promoter-Evidence

Ontology for Regulatory Interactions
 Class DNA-Binding-Sites
   Component-Of
   Regulated-Promoter, Relative-Center-Distance
   Type-Of-Evidence
 Classes Protein-Complexes, Polypeptides
   Components / Component-Of
 Class Binding-Reactions
   Reactants
   Activators
   Inhibitors

EcoCyc Ontology for Transcriptional Regulation
 One DB object defined for each biological entity and for each molecular interaction
[Diagram: objects in the trp operon example: trp, apoTrpR, TrpR*trp, RpoSig70, trpLEDCBA, pro001, site001, trpL, trpE, trpD, trpC, trpB, trpA, Int001, Int003, Int005]
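
The "one object per entity and per interaction" design can be sketched with the object names from the trp example. This is a rough illustration under stated assumptions, not the EcoCyc schema. The `Frame` type and both functions are hypothetical. The class names `Compounds` and `Genes`, the class assigned to each object, and the guess about which binding event each of Int001, Int003, and Int005 denotes are all assumptions, since the slide gives only the labels. Slot names are the ones listed for the ontology classes.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Frame:
    """One database object; `kind` names the ontology class it belongs to."""
    frame_id: str
    kind: str
    slots: Dict[str, List[str]] = field(default_factory=dict)


def build_trp_example() -> Dict[str, Frame]:
    """Build one frame per entity and per interaction in the trp example."""
    gene_ids = ["trpL", "trpE", "trpD", "trpC", "trpB", "trpA"]
    frames = [
        Frame("trp", "Compounds"),            # ASSUMPTION: class name
        Frame("apoTrpR", "Polypeptides"),     # ASSUMPTION: class assignment
        Frame("TrpR*trp", "Protein-Complexes", {"Components": ["apoTrpR", "trp"]}),
        Frame("RpoSig70", "Protein-Complexes"),
        Frame("pro001", "Promoters", {"Component-Of": ["trpLEDCBA"]}),
        Frame("site001", "DNA-Binding-Sites",
              {"Component-Of": ["trpLEDCBA"], "Regulated-Promoter": ["pro001"]}),
        Frame("trpLEDCBA", "Transcription-Units",
              {"Components": ["pro001", "site001"] + gene_ids}),
        # ASSUMPTION: the mapping of Int ids to binding events is a guess.
        Frame("Int005", "Binding-Reactions", {"Reactants": ["apoTrpR", "trp"]}),
        Frame("Int001", "Binding-Reactions", {"Reactants": ["TrpR*trp", "site001"]}),
        Frame("Int003", "Binding-Reactions",
              {"Reactants": ["RpoSig70", "pro001"], "Inhibitors": ["Int001"]}),
    ]
    frames += [Frame(g, "Genes", {"Component-Of": ["trpLEDCBA"]}) for g in gene_ids]
    return {f.frame_id: f for f in frames}


def inhibiting_interactions(frames: Dict[str, Frame], unit_id: str) -> List[str]:
    """Ids listed as Inhibitors of any binding reaction at the unit's promoters."""
    components = frames[unit_id].slots.get("Components", [])
    promoters = {c for c in components if frames[c].kind == "Promoters"}
    found = []
    for frame in frames.values():
        if frame.kind != "Binding-Reactions":
            continue
        if promoters & set(frame.slots.get("Reactants", [])):
            found.extend(frame.slots.get("Inhibitors", []))
    return found
```

Because interactions are objects in their own right, a question such as "what inhibits polymerase binding at this transcription unit?" becomes a traversal over slots: under the assumptions above, `inhibiting_interactions(build_trp_example(), "trpLEDCBA")` follows the promoter to the polymerase-binding reaction and returns its Inhibitors.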

Integration of RegulonDB
 RegulonDB has been loaded into EcoCyc
   RegulonDB originally relational
   Lisp loader tools developed for relational table dumps
 Statistics:
   528 transcription units
   620 promoters
   617 DNA binding sites
   83 transcription factors

Consistency Checks on RegulonDB Data
 Find transcription units containing:
   Undefined components
   No gene components
   Genes that are not contiguous
   Genes with conflicting transcription directions
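
The four checks listed above can be sketched as a single function. This is a minimal illustration, not the checker used for RegulonDB or EcoCyc. `Gene`, `TranscriptionUnit`, and `check_unit` are hypothetical names. The sketch also assumes that each gene carries an `order` rank along the chromosome, so that "contiguous" means consecutive ranks; the slides do not say how contiguity was actually tested.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Gene:
    gene_id: str
    order: int   # ASSUMPTION: rank of the gene along the chromosome
    strand: str  # "+" or "-", the direction of transcription


@dataclass
class TranscriptionUnit:
    unit_id: str
    components: List[str]  # ids of promoters, binding sites, genes, terminators


def check_unit(unit: TranscriptionUnit,
               genes: Dict[str, Gene],
               other_objects: Set[str]) -> List[str]:
    """Return a description of every consistency problem found in one unit."""
    problems = []

    # 1. Components that are not defined anywhere in the database.
    undefined = [c for c in unit.components
                 if c not in genes and c not in other_objects]
    if undefined:
        problems.append("undefined components: " + ", ".join(undefined))

    # 2. Units without any gene component.
    unit_genes = [genes[c] for c in unit.components if c in genes]
    if not unit_genes:
        problems.append("no gene components")
        return problems

    # 3. Genes that do not occupy consecutive positions in the gene order.
    ranks = sorted(g.order for g in unit_genes)
    if any(b - a != 1 for a, b in zip(ranks, ranks[1:])):
        problems.append("genes are not contiguous")

    # 4. Genes transcribed in different directions.
    if len({g.strand for g in unit_genes}) > 1:
        problems.append("conflicting transcription directions")

    return problems
```

Running `check_unit` over every transcription unit and collecting the non-empty results gives a report that a curator can work through.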

Interactive Editing Tools
 SRI created interactive tools for creating and modifying regulatory mechanisms
 Ongoing updates to RegulonDB occur in EcoCyc

Visualization Capabilities
 Transcription units
   Transcription unit containing a gene: araA
   Details of a transcription unit
 Regulons: CRP, NARL
 Pathway control
   Overview: show reactions controlled by a TF (CRP, FNR), show other reactions controlled by the same TF(s) (use a reaction in purine biosynthesis)

Characterization of the E. coli Genetic Network
 551 transcription units include 1115 (25%) genes
 Controlled by 86 transcription factors
 All experimentally determined

Genes per Transcription Unit
[Bar chart. x-axis: Number of genes per transcription unit; y-axis: Number of transcription units]

Binding Sites per Transcription Unit
[Bar chart. x-axis: Number of binding sites per transcription unit; y-axis: Number of transcription units]

Transcription Factor Reach
[Bar chart. x-axis: Number of transcription units regulated by given transcription factor; y-axis: Number of transcription factors]

Transcription Units per Pathway
[Bar chart. x-axis: Number of operons per pathway; y-axis: Number of pathways]

Pathways per Transcription Unit
[Bar chart. x-axis: Number of pathways in an operon; y-axis: Number of operons]

Visualization of the Full E. coli Genetic Network
 Influences of transcription factors on other transcription factors
   50 of 85 TFs do not affect other TFs
   Maximum network depth of 3
   Only CRP has a branching factor greater than 2
   No feedback loops other than autoregulation
   Negative auto-regulation is the dominant form of feedback
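
Network properties of this kind can be computed from a list of regulatory edges, as the sketch below suggests. It is an illustration under stated assumptions and not the analysis code behind the slide. All function names are hypothetical. "Depth" is taken to be the number of edges on the longest regulator-to-target path, and "branching factor" the number of other TFs a TF regulates; the slide does not define either term. The edges in the usage note are invented toy data, not E. coli regulation.

```python
from typing import Dict, List, Set, Tuple

# One regulatory influence: (regulator, target, sign), where sign is "+" or "-".
Edge = Tuple[str, str, str]


def build_graph(edges: List[Edge]) -> Dict[str, Set[str]]:
    """Map each TF to the other TFs it regulates (autoregulation is left out)."""
    graph: Dict[str, Set[str]] = {}
    for regulator, target, _sign in edges:
        graph.setdefault(regulator, set())
        graph.setdefault(target, set())
        if regulator != target:
            graph[regulator].add(target)
    return graph


def has_feedback_loop(graph: Dict[str, Set[str]]) -> bool:
    """Detect a cycle among distinct TFs with a depth-first search."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {node: WHITE for node in graph}

    def visit(node: str) -> bool:
        color[node] = GRAY
        for nxt in graph[node]:
            if color[nxt] == GRAY:
                return True  # back edge: a loop through several TFs
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[node] == WHITE and visit(node) for node in graph)


def max_depth(graph: Dict[str, Set[str]]) -> int:
    """Longest path in edges; only defined when there are no feedback loops."""
    if has_feedback_loop(graph):
        raise ValueError("depth is undefined for a network with feedback loops")
    memo: Dict[str, int] = {}

    def depth(node: str) -> int:
        if node not in memo:
            memo[node] = max((1 + depth(n) for n in graph[node]), default=0)
        return memo[node]

    return max((depth(node) for node in graph), default=0)


def summarize(edges: List[Edge]) -> Dict[str, int]:
    """Collect summary statistics of the kind listed on the slide."""
    graph = build_graph(edges)
    auto_signs = [sign for reg, tgt, sign in edges if reg == tgt]
    return {
        "tfs": len(graph),
        "tfs_not_affecting_other_tfs": sum(1 for t in graph.values() if not t),
        "max_depth": max_depth(graph),
        "max_branching_factor": max((len(t) for t in graph.values()), default=0),
        "negative_autoregulation": auto_signs.count("-"),
        "positive_autoregulation": auto_signs.count("+"),
    }
```

For a toy network with edges A→B, A→C, A→D, B→E, E→F and self-edges on B, C, and F, `summarize` reports six TFs, three of which affect no other TF, a depth of 3, and a maximum branching factor of 3.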