Pathway - Internet Database Lab.
RDF-based Integration of Pathway
Database and Gene Ontology
SNU OOPSLA LAB.
2005
DongHyuk Im
Contents
Introduction
Pathway Database
Enzyme Database
Gene Ontology
Related Works
Our Approach
Supporting Function
Data Transformation
Integration of KEGG, Enzyme, Gene Ontology
Querying using SeRQL
Pathway?
In most chemical reaction mechanisms, a compound (substrate) is
converted into a compound (product) by the action of an enzyme
Importance
Comparing and analyzing pathways helps us understand the process of
creating compounds and the evolutionary relationships between organisms
Drug Discovery
Pathway
Map : Glycolysis / Gluconeogenesis
Map : Aquifex aeolicus
Enzyme Database
EC number
Recommended name
Alternative names(if any)
Catalytic activity
Cofactors (if any)
Pointers to the SWISS-PROT entry(ies) that
correspond to the enzyme (if any)
Pointers to disease(s) associated with a deficiency of
the enzyme (if any)
Enzyme Hierarchy
Four levels
[*]
  [1]
  [2]
    [2.1]
    [2.2]
      [2.2.1]
      [2.2.2]
        [2.2.2.1]
        [2.2.2.2]
        [2.2.2.3]
      [2.2.3]
    [2.3]
  [3]
EC number
Ex) 1.1.1.1 is a member of the top level group [1]
The leftmost number identifies the highest level
[2.4.2.3] and [2.4.2.4] (siblings) : similar reactions in pathway
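The hierarchy rules above can be sketched in a few lines of Python. This is only an illustration of the dotted EC numbering; the function names (`ec_parent`, `are_siblings`, `top_level_group`) are hypothetical and not part of the system described in these slides.

```python
def ec_parent(ec):
    """Return the parent class of an EC number, e.g. '2.4.2.3' -> '2.4.2'.

    A top-level class such as '2' has no parent below the root [*],
    so None is returned for it.
    """
    parts = ec.split(".")
    return ".".join(parts[:-1]) if len(parts) > 1 else None


def are_siblings(a, b):
    """Two distinct EC numbers are siblings when they share the same parent."""
    return a != b and ec_parent(a) == ec_parent(b)


def top_level_group(ec):
    """The leftmost number identifies the highest level of the hierarchy."""
    return ec.split(".")[0]


if __name__ == "__main__":
    print(top_level_group("1.1.1.1"))
    print(ec_parent("2.4.2.3"))
    print(are_siblings("2.4.2.3", "2.4.2.4"))
```

Sibling enzymes found this way are candidates for "similar reactions" when a query has to be relaxed.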
Gene Ontology
KEGG
To computerize all aspects of cellular functions in
terms of the pathway of interacting molecules or
genes
To maintain gene catalogs for all organisms and link
each gene product to a pathway component
To organize a database of all chemical compounds in
the cell and link each compound to a pathway
component
To develop computational technologies for pathway
comparison, reconstruction, and analysis
Why RDF Integration?
Pathway data model : DAG
RDF is a good model for representing pathway
RDF data model : DAG
Integrating the multiple knowledge sources available on the internet
is one of the major problems for biologists
RDF is a good model for a common standard
Enzyme, GO : hierarchical structure
RDF is a good model for representing hierarchical structure
GO annotation is important
Enzymes (proteins) in a given pathway need GO annotation
Related Works
KEGG: Kyoto Encyclopedia of Genes and Genomes ,
1999, Nucleic Acids Res.
YeastHub: a semantic web case for integrating data in
the life science domain, 2005, Bioinformatics
LIGAND: database of chemical compounds and
reactions in biological pathways, 2002, Nucleic Acids
Res.
Gene Ontology: tool for the unification of biology, the
Gene Ontology Consortium, 2000, Nature Genetics.
Functions Our System Supports
KEGG
Search compound
Path prediction
Search Enzyme
Functions our system adds
Integration Query (pathway+enzyme+GO)
Relaxation Query using GO hierarchy
Searching pathway using enzyme information
Search Compounds
target
Compound : C00668
Pathway Prediction Tool
compound
Relaxation query using enzyme hierarchy
Search Enzyme
Enzyme : 5.3.1.9
From Pathway to Gene Ontology
Select enzyme
Data Translation for Integration
GENOS Storage
XSLT
KGML Data
KEGG RDF Data
Adding GO ID
Enzyme RDF Data
XSLT : http://www.w3.org/2005/02/13-KEGG/
GO RDF Data
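The slides perform the KGML-to-RDF step with XSLT. As a rough illustration of what that translation does to a single entry, here is a minimal Python sketch. It assumes the usual KGML `entry` attributes (`id`, `name`, `type`, `reaction`, `link`) and assumes that the base URI `http://www.w3.org/2005/02/13-KEGG/` is combined with the database prefix as in the RDF examples in this deck; `to_uri` and `entry_triples` are hypothetical helper names, and the sample KGML fragment is constructed for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed base URI, taken from the resource URIs shown in the KEGG RDF slides.
BASE = "http://www.w3.org/2005/02/13-KEGG/"

# Illustrative KGML fragment (constructed for this sketch).
KGML = """<pathway name="path:aae00010" org="aae" number="00010">
  <entry id="1" name="aae:aq_186" type="gene" reaction="rn:R00710"
         link="http://www.genome.jp/dbget-bin/www_bget?aae+aq_186"/>
</pathway>"""


def to_uri(kegg_id):
    """Map 'aae:aq_186' to BASE + 'aae#aq_186' (single identifiers only)."""
    db, local = kegg_id.split(":", 1)
    return f"{BASE}{db}#{local}"


def entry_triples(kgml_text):
    """Turn each KGML entry into (subject, predicate, object) tuples."""
    root = ET.fromstring(kgml_text)
    triples = []
    for entry in root.findall("entry"):
        node = "_" + entry.get("id")  # blank node label, e.g. '_1'
        triples.append((node, "rdf:type", entry.get("type").capitalize()))
        triples.append((node, "k:name", to_uri(entry.get("name"))))
        if entry.get("reaction"):
            triples.append((node, "k:reaction", to_uri(entry.get("reaction"))))
        if entry.get("link"):
            triples.append((node, "k:link", entry.get("link")))
    return triples


if __name__ == "__main__":
    for triple in entry_triples(KGML):
        print(triple)
```

The sketch ignores graphics elements and entries whose `name` lists several identifiers; a real stylesheet would have to handle both.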
KEGG RDF Data(1/2)
<k:entry>
<Gene rdf:nodeID="_1">
<k:name rdf:resource="http://www.w3.org/2005/02/13-KEGG/aae#aq_186"/>
<k:reaction rdf:resource="http://www.w3.org/2005/02/13-KEGG/rn#R00710"/>
<k:link rdf:resource="http://www.genome.jp/dbget-bin/www_bget?aae+aq_186"/>
<k:graphics><Rectangle k:name="aldH1" k:fgcolor="#000000"
k:bgcolor="#BFFFBF" k:x="170" k:y="1018" k:width="45" k:height="17"/>
</k:graphics>
</Gene>
</k:entry>
<k:entry>
<Enzyme rdf:nodeID="_3">
<k:name rdf:resource="http://www.w3.org/2005/02/13-KEGG/ec#1.2.1.5"/>
<k:graphics>
<Rectangle k:name="1.2.1.5" k:fgcolor="#000000"
k:bgcolor="#FFFFFF" k:x="170" k:y="1039" k:width="45" k:height="17"/>
</k:graphics>
</Enzyme>
</k:entry>
Gene entry
Enzyme entry
No
information
Compound entry
<k:entry>
<Compound rdf:nodeID="_4">
<k:name rdf:resource="http://www.w3.org/2005/02/13-KEGG/cpd#C00033"/>
<k:link rdf:resource="http://www.genome.jp/dbget-bin/www_bget?compound+C00033"/>
<k:graphics>
<Circle k:name="C00033" k:fgcolor="#000000"
k:bgcolor="#FFFFFF" k:x="102" k:y="971" k:width="8" k:height="8"/>
</k:graphics>
</Compound>
</k:entry>
KEGG RDF Data(2/2)
Relation
<k:relation>
<ECrel>
<k:entry1 rdf:resource="_42"/>
<k:entry2 rdf:resource="_48"/>
<compound rdf:resource="_88"/>
</ECrel>
</k:relation>
Reaction
<k:reaction reversible="" rdf:about="http://www.w3.org/2005/02/13-KEGG/rn#R00710">
<k:substrate rdf:resource="http://www.w3.org/2005/02/13-KEGG/cpd#C00084"/>
<k:product rdf:resource="http://www.w3.org/2005/02/13-KEGG/cpd#C00033"/>
</k:reaction>
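To show how such a reaction element can be read back, the following Python sketch extracts substrates and products with the standard library. It assumes the `k:` prefix is bound to `http://www.w3.org/2005/02/13-KEGG/` (the slides do not show the namespace declarations), wraps the fragment in an `rdf:RDF` root so that it parses, and leaves out the `reversible` attribute. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
K_NS = "http://www.w3.org/2005/02/13-KEGG/"  # assumed binding for the k: prefix

# The reaction from the slide, wrapped in a root element with namespace
# declarations (added here so the fragment is parseable on its own).
DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:k="http://www.w3.org/2005/02/13-KEGG/">
  <k:reaction rdf:about="http://www.w3.org/2005/02/13-KEGG/rn#R00710">
    <k:substrate rdf:resource="http://www.w3.org/2005/02/13-KEGG/cpd#C00084"/>
    <k:product rdf:resource="http://www.w3.org/2005/02/13-KEGG/cpd#C00033"/>
  </k:reaction>
</rdf:RDF>"""


def local_id(uri):
    """Return the fragment after '#', e.g. '...cpd#C00033' -> 'C00033'."""
    return uri.rsplit("#", 1)[-1]


def reactions(xml_text):
    """Map each reaction ID to a (substrates, products) pair of ID lists."""
    root = ET.fromstring(xml_text)
    result = {}
    for rxn in root.iter(f"{{{K_NS}}}reaction"):
        about = rxn.get(f"{{{RDF_NS}}}about")
        if about is None:
            continue  # k:reaction used as a property inside an entry
        substrates = [local_id(e.get(f"{{{RDF_NS}}}resource"))
                      for e in rxn.findall(f"{{{K_NS}}}substrate")]
        products = [local_id(e.get(f"{{{RDF_NS}}}resource"))
                    for e in rxn.findall(f"{{{K_NS}}}product")]
        result[local_id(about)] = (substrates, products)
    return result


if __name__ == "__main__":
    print(reactions(DOC))
```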
How to Process KEGG Pathway
Problem
GENOS (Sesame) does not support multiple graphs
KEGG data consists of multiple documents
Ex) map00010.rdf, aae00010.rdf …
Solution
Using namespaces, we can distinguish maps
When storing pathway data, the pathway's map name is added
as a namespace in the resources table of GENOS
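The idea can be sketched as a small in-memory stand-in for the resources table. This is not GENOS code: the `ResourceTable` class and its `intern` method are hypothetical, and the only point is that keying resources by (namespace, local name) keeps the blank node `_1` of one map apart from the `_1` of another.

```python
class ResourceTable:
    """Toy resources table keyed by (namespace, localname)."""

    def __init__(self):
        self._ids = {}

    def intern(self, namespace, localname):
        """Return the ID for a resource, assigning a new one on first sight."""
        key = (namespace, localname)
        if key not in self._ids:
            self._ids[key] = len(self._ids) + 1
        return self._ids[key]


if __name__ == "__main__":
    table = ResourceTable()
    # The same blank node label from three different documents.
    a = table.intern("aae#00010", "_1")
    b = table.intern("aae#00020", "_1")
    c = table.intern("map#00010", "_1")
    print(a, b, c)
```

Without the map name in the key, all three `_1` nodes would collapse into a single resource and their triples would be mixed together.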
Processing Pathway Data
<k:Pathway k:org="aae" k:number="00010" k:title="Glycolysis / Gluconeogenesis">
….
….
<k:entry>
<Gene rdf:nodeID="_1">
<k:name rdf:resource="http://www.w3.org/2005/02/13-KEGG/aae#aq_186"/>
<k:reaction rdf:resource="http://www.w3.org/2005/02/13-KEGG/rn#R00710"/>
<k:link rdf:resource="http://www.genome.jp/dbget-bin/www_bget?aae+aq_186"/>
<k:graphics><Rectangle k:name="aldH1" k:fgcolor="#000000"
k:bgcolor="#BFFFBF" k:x="170" k:y="1018" k:width="45" k:height="17"/>
</k:graphics>
</Gene>
</k:entry>
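resources table of GENOS (columns: ID, NameSpace, Localname)
Localnames shown include Glycolysis/…, aq_186, and _1
The blank node _1 is stored once per map: aae#00010 / _1 (ID 3), aae#00020 / _1 (ID 6), map#00010 / _1
Without the map namespace these _1 entries would conflict
triples table of GENOS (columns: Subject, Predicate, Object)
Subjects shown: 3, 6, 8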
Integrating Databases
Enzyme number
GO ID
Relaxation Querying using SeRQL
Figure: compounds C1 and C2 connected through enzyme E1; E1 is related by subclassof to the relaxed class E1.*
SeRQL
SELECT C1, C2
FROM Path_EXP
WHERE E1 LIKE "1.*"
use Prefix
Dewey order
Ex. 1.1 and 1.2 are children of 1
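The prefix test behind the relaxed query can be illustrated outside SeRQL. The Python sketch below assumes path edges are available as (C1, enzyme, C2) tuples; the function names and the sample edges are hypothetical placeholders, not data from the system.

```python
def relax(ec, levels_up=1):
    """Generalize an EC number by dropping trailing levels: '5.3.1.9' -> '5.3.1'."""
    parts = ec.split(".")
    keep = max(1, len(parts) - levels_up)  # never drop the top level
    return ".".join(parts[:keep])


def matches_prefix(ec, prefix):
    """True when ec equals prefix or lies below it in the Dewey-ordered hierarchy."""
    return ec == prefix or ec.startswith(prefix + ".")


def query(path_edges, prefix):
    """Return (C1, C2) pairs whose connecting enzyme falls under prefix."""
    return [(c1, c2) for c1, ec, c2 in path_edges if matches_prefix(ec, prefix)]


if __name__ == "__main__":
    # Illustrative edges only; compound names C_a..C_d are placeholders.
    edges = [
        ("C00084", "1.2.1.5", "C00033"),
        ("C_a", "5.3.1.9", "C_b"),
        ("C_c", "10.1.1.1", "C_d"),
    ]
    print(query(edges, "1"))
    print(query(edges, relax("5.3.1.9")))
```

Appending the dot before comparing matters: a plain string prefix "1" would also match a class numbered 10, which is not a child of 1.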
Considering Performance
KEGG : Pathway List
Genes           Map
aae:aq_018      path:aae03010
aae:aq_020      path:aae03010
aae:aq_021      path:aae00400
….              ….
eco:b1236       path:eco00052
eco:b1236       path:eco00500
eco:b1236       path:eco00520
….              ….
using genes_index
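A gene-to-pathway index of this kind can be sketched as a dictionary lookup. The Python below is an assumption about what `genes_index` provides, not its actual implementation; the row pairing in `PATHWAY_LIST` follows the order of the listing on the slide, and `build_gene_index` is a hypothetical name.

```python
from collections import defaultdict

# Gene / map pairs, paired in the order they appear in the listing above.
PATHWAY_LIST = """\
aae:aq_018 path:aae03010
aae:aq_020 path:aae03010
aae:aq_021 path:aae00400
eco:b1236 path:eco00052
eco:b1236 path:eco00500
eco:b1236 path:eco00520
"""


def build_gene_index(text):
    """Map each gene ID to the list of pathway maps it appears in."""
    index = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue
        gene, pathway = line.split()
        index[gene].append(pathway)
    return dict(index)


if __name__ == "__main__":
    index = build_gene_index(PATHWAY_LIST)
    # One lookup replaces a scan over every pathway document.
    print(index["eco:b1236"])
```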
Schedule
Implementation (~11/30)
Integrated Databases
Query Processor for pathway
Simple UI (Web :JSP)
Complete Paper (~12/10)