Integrating and querying disease and pathway ontologies:
building an OWL model and using RDFS queries
Julie Chabalier, Olivier Dameron, Anita Burgun
EA 3888 – Conceptual Modeling of Biomedical Knowledge
Faculty of Medicine - University of Rennes 1
http://www.ea3888.univ-rennes1.fr
Introduction
Disease description in current medical ontologies
• Clinical features
• Etiology
• Location
• Morphology
Example: SNOMED Clinical Terms® (SNOMED CT®)
Disease attributes: definitional manifestation, associated morphology, causative agent, finding site
http://www.snomed.org/
EA 3888 – University of Rennes 1
Introduction
Characterization of diseases: biological knowledge required
• Genes
- A gene mutation may result in a disease
• Metabolic pathways
- A pathway may be shared by different phenotypes
• Biological processes
- Different processes may explain different grades of a disease
Biological knowledge → absent from medical ontologies
Objectives
Integration of disease and pathway ontologies
• Ontology integration
- Identify candidate ontologies
- Get candidate ontologies in an adequate formalism
- Integrate formalized ontologies
• Querying the resulting ontology
- Consistency checking
- Exploiting biomedical knowledge
Candidate ontologies
KEGG Orthology (KO) hierarchy
• Organization of metabolic pathway and disease maps in the
KEGG knowledge base
• DAG of four levels
Candidate ontologies
Gene Ontology (GO)
~20,000 terms organized in 3 hierarchies:
- Molecular Function
- Cellular Component
- Biological Process
→ Used to enrich the KO pathway definitions
Candidate ontologies
SNOMED-CT: clinical description of diseases
[Figure: SNOMED CT excerpt showing the concepts Disorder of brain, Organic mental disorder, Neoplasm of brain, Dementia, Intracranial glioma and Alzheimer's disease, with findingSite relations to Brain structure and Cerebral structure]
→ Used to enrich the KO disease definitions
Formalism
OWL as a common formalism
• Unambiguous combination of several ontologies (URI, namespaces)
• Defined semantics
• Expressiveness (e.g., disjointness)
Getting candidate ontologies in OWL-DL
• KO: conversion of the 3 upper levels (available in text)
• GO: extraction of Biological Process hierarchy (available in OWL)
• SNOMED: extraction and conversion of the relevant concepts and
relations (from UMLS)
Ontology integration
Setting up relationships between ontologies
• Aligning: defining relationships between terms (is-a, part-of, etc.)
• Mapping: defining equivalence relationships between terms
Integration framework
[Figure: GO Biological Processes (pathway descriptions), KO Pathways and Diseases (disease and pathway descriptions) and SNOMED Diseases (disease descriptions) are combined into the BioMed Ontology]
Mapping GO processes – KO pathways
MetaMap program*: lexical mapping (labels and synonyms)
[Figure: GO hierarchy (Metabolism, Macromolecule metabolism, Carbohydrate metabolism) shown beside the KO hierarchy (Metabolism, Carbohydrate metabolism, Fructose and mannose metabolism)]
*Aronson, A.R. (2001) Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program. Proceedings of the AMIA Symposium, 17-21.
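The idea of a lexical mapping on labels and synonyms can be sketched in a few lines of Python. This is a simplified stand-in based on exact matching of normalized strings, not the MetaMap program itself, which does considerably more. The term identifiers and the synonym "metabolic process" are hypothetical placeholders chosen for illustration.

```python
import re

def normalize(label):
    """Lowercase a label and collapse punctuation and whitespace so that
    trivially different spellings compare equal."""
    return " ".join(re.findall(r"[a-z0-9]+", label.lower()))

def lexical_mapping(source_terms, target_terms):
    """Return sorted (source_id, target_id) pairs whose labels or synonyms
    match after normalization. Each argument maps a term id to its names."""
    index = {}
    for target_id, names in target_terms.items():
        for name in names:
            index.setdefault(normalize(name), set()).add(target_id)
    pairs = set()
    for source_id, names in source_terms.items():
        for name in names:
            for target_id in index.get(normalize(name), ()):
                pairs.add((source_id, target_id))
    return sorted(pairs)

# Toy data: the ids are hypothetical, the labels come from the slide.
ko_terms = {
    "KO:metabolism": ["Metabolism"],
    "KO:carbohydrate": ["Carbohydrate metabolism"],
    "KO:fructose_mannose": ["Fructose and mannose metabolism"],
}
go_terms = {
    "GO:metabolism": ["Metabolism", "metabolic process"],
    "GO:macromolecule": ["Macromolecule metabolism"],
    "GO:carbohydrate": ["Carbohydrate metabolism"],
}
mappings = lexical_mapping(ko_terms, go_terms)
```

In this toy run the composite KO term "Fructose and mannose metabolism" finds no GO counterpart, which is the situation the alignment step addresses.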
Aligning GO processes – KO pathways
GO: atomic concepts
KO: composite concepts
Patterns to segment and recompose KO terms before the mapping:
KO: Fructose and mannose metabolism
- Fructose metabolism → GO: Fructose metabolism
- mannose metabolism → GO: Mannose metabolism
[Figure: GO hierarchy (Carbohydrate metabolism, Cellular carbohydrate metabolism, Monosaccharide metabolism, Hexose metabolism, Fructose metabolism, Mannose metabolism) shown beside the KO hierarchy (Carbohydrate metabolism, Fructose and mannose metabolism)]
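A minimal sketch of one such segmentation pattern follows. It assumes the composite label has the shape "X and Y head" (optionally "X, Y and Z head") with a single-word head; the actual patterns used in the project are not given on the slides, and the function name is hypothetical.

```python
import re

def segment_and_recompose(label):
    """Split a coordinated label into atomic labels sharing the same head.

    'Fructose and mannose metabolism' becomes
    ['Fructose metabolism', 'mannose metabolism'].
    Labels without a coordination are returned unchanged.
    """
    label = label.strip()
    # The last word is taken as the shared head (assumption of this sketch).
    rest, _, head = label.rpartition(" ")
    parts = [p.strip() for p in re.split(r",\s*|\s+and\s+", rest) if p.strip()]
    if len(parts) < 2:
        return [label]
    return [f"{part} {head}" for part in parts]

atomic_terms = segment_and_recompose("Fructose and mannose metabolism")
```

Each recomposed label can then go through the same lexical mapping as an ordinary term. Labels whose head spans several words, or whose coordination is not a plain enumeration, would need additional patterns.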
Mapping & aligning GO processes – KO pathways
[Figure: KO: Carbohydrate metabolism and KO: Fructose and mannose metabolism placed alongside the GO hierarchy (Carbohydrate metabolism, Cellular carbohydrate metabolism, Monosaccharide metabolism, Hexose metabolism, Fructose metabolism, Mannose metabolism)]
Mapping of KO diseases and SNOMED diseases
MetaMap program
[Figure: KO hierarchy (Human diseases, Neurodegenerative disorders, Alzheimer's disease) shown beside the SNOMED hierarchy (Disorder of brain, Organic mental disorder, Dementia, Alzheimer's disease)]
Alignment of pathways and diseases
[Figure: integration diagram with link 1 between GO Biological Processes and KO Diseases, and link 2 between KO Pathways and KO Diseases]
• Alignment: inferring relationships between:
1 - GO processes and KO diseases
2 - KO pathways and KO diseases
• Condition of alignment: if at least one gene is involved in both a disease D and a pathway P, then D hasPathway P
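The alignment condition amounts to a non-empty intersection of gene sets. A minimal sketch, assuming genes are available as plain sets per disease and per pathway; the function name is hypothetical and gene4 is a made-up placeholder.

```python
def align_by_shared_genes(disease_genes, pathway_genes):
    """Return {(disease, pathway): shared_genes} for every pair that shares
    at least one gene, i.e. every inferred hasPathway relationship."""
    relations = {}
    for disease, d_genes in disease_genes.items():
        for pathway, p_genes in pathway_genes.items():
            shared = set(d_genes) & set(p_genes)
            if shared:
                relations[(disease, pathway)] = shared
    return relations

# Toy data with placeholder gene names.
disease_genes = {"KO: Alzheimer's disease": {"gene1", "gene2"}}
pathway_genes = {
    "KO: Glycolysis/Gluconeogenesis": {"gene1", "gene3"},
    "KO: Fructose and mannose metabolism": {"gene4"},
}
has_pathway = align_by_shared_genes(disease_genes, pathway_genes)
```

Keeping the shared genes alongside each relationship makes it possible to trace why a given hasPathway link was inferred.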
Alignment of GO processes and KO diseases
1 - hasPathway between KO diseases and GO biological processes, derived through gene identifiers:
KO disease genes (KEGG geneId) → KEGG mapping (KEGG geneId - UniProt id) → UniProt id → GOA → GO id
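The identifier chain from a KO disease to GO processes can be read as two successive lookups. The sketch below assumes both the KEGG gene to UniProt mapping and the GOA annotations have already been loaded into dictionaries; all identifiers are invented placeholders, not real KEGG, UniProt or GO ids.

```python
def go_processes_for_diseases(disease_genes, kegg_to_uniprot, goa_annotations):
    """Map each disease to the GO ids reachable through
    KEGG gene id -> UniProt id -> GOA annotation."""
    result = {}
    for disease, genes in disease_genes.items():
        go_ids = set()
        for gene in genes:
            for uniprot_id in kegg_to_uniprot.get(gene, ()):
                go_ids.update(goa_annotations.get(uniprot_id, ()))
        result[disease] = go_ids
    return result

# Placeholder identifiers for illustration only.
disease_genes = {"disease_X": {"kegg:geneA", "kegg:geneB"}}
kegg_to_uniprot = {"kegg:geneA": ["UP_A"], "kegg:geneB": ["UP_B1", "UP_B2"]}
goa_annotations = {
    "UP_A": ["GO:process_1"],
    "UP_B1": ["GO:process_1", "GO:process_2"],
    # UP_B2 has no annotation in this toy example
}
disease_processes = go_processes_for_diseases(
    disease_genes, kegg_to_uniprot, goa_annotations
)
```

Genes without a UniProt mapping, or proteins without annotations, simply contribute nothing rather than raising an error.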
Alignment of KO pathways and KO diseases
2 - hasPathway between KO diseases and KO pathways:
KO: Human diseases > KO: Neurodegenerative disorders > KO: Alzheimer's disease (genes: gene1, gene2)
KO: Metabolism > KO: Carbohydrate metabolism > KO: Glycolysis/Gluconeogenesis (genes: gene1, gene3)
Shared gene1: KO: Alzheimer's disease hasPathway KO: Glycolysis/Gluconeogenesis
Integration result
BioMed Ontology
13982 classes:
• 13555 classes from GO
• 281 classes from KO
- 252 pathway classes
- 19 disease classes
• 146 classes from SNOMED
Integration results
BioMed Ontology
• 144 KO pathways associated with GO processes (57%)
• 15 KO diseases associated with SNOMED diseases (94%)
• 15 KO diseases associated with 836 distinct pathways (GO & KO)
→ 3144 hasPathway relationships
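Some of the reported figures can be cross-checked with simple arithmetic, using only the counts given on the integration slides. The denominator behind the 94% figure is not stated, so it is not recomputed here.

```python
# Class counts per source ontology, as reported on the slides.
class_counts = {"GO": 13555, "KO": 281, "SNOMED": 146}
total_classes = sum(class_counts.values())

# Share of KO pathway classes associated with a GO process.
ko_pathway_classes = 252
mapped_pathways = 144
pathway_share_percent = round(100 * mapped_pathways / ko_pathway_classes)

# Average number of hasPathway relationships per associated disease.
relationships_per_disease = 3144 / 15
```

The total matches the 13982 classes reported, and 144 out of 252 rounds to the reported 57%.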
Querying the BioMed Ontology
Exploiting knowledge and checking consistency
• Taking into account the explicit relationships
• RDFS is sufficient
→ RDF query language: SeRQL
• The SeRQL implementation in Sesame can exploit RDFS semantics
• Exploitation of explicit relationships
SeRQL queries
Example of an exploitation query
Which pathways are shared by 2 neurological disorders: glioma & Alzheimer’s disease?
SELECT DISTINCT Pathway, label(PathwayName)
FROM
    {kpath:ko05010} rdfs:subClassOf {SuperClass},
    {SuperClass} rdf:type {owl:Restriction},
    {SuperClass} owl:onProperty {ea3888hp:hasPathway},
    {SuperClass} owl:someValuesFrom {Pathway},
    {Pathway} rdfs:label {PathwayName}
INTERSECT
SELECT DISTINCT Pathway, label(PathwayName)
FROM
    {kpath:ko05214} rdfs:subClassOf {SuperClass},
    {SuperClass} rdf:type {owl:Restriction},
    {SuperClass} owl:onProperty {ea3888hp:hasPathway},
    {SuperClass} owl:someValuesFrom {Pathway},
    {Pathway} rdfs:label {PathwayName}
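The graph pattern matched by each half of the query can be mimicked over a small in-memory triple list, which may help in reading it: a disease class is a subclass of an owl:Restriction on hasPathway whose someValuesFrom filler is the pathway. This is an illustrative sketch, not how Sesame evaluates SeRQL; the blank node names and ex: pathway identifiers are hypothetical.

```python
def pathways_of(disease, triples):
    """Follow disease -rdfs:subClassOf-> restriction, where the restriction
    is on ea3888hp:hasPathway, and collect its owl:someValuesFrom fillers."""
    by_subject = {}
    for s, p, o in triples:
        by_subject.setdefault(s, []).append((p, o))
    found = set()
    for predicate, restriction in by_subject.get(disease, ()):
        if predicate != "rdfs:subClassOf":
            continue
        props = by_subject.get(restriction, [])
        if (("rdf:type", "owl:Restriction") in props
                and ("owl:onProperty", "ea3888hp:hasPathway") in props):
            found.update(o for q, o in props if q == "owl:someValuesFrom")
    return found

def restriction(node, pathway):
    """Triples describing one existential hasPathway restriction."""
    return [
        (node, "rdf:type", "owl:Restriction"),
        (node, "owl:onProperty", "ea3888hp:hasPathway"),
        (node, "owl:someValuesFrom", pathway),
    ]

# Toy graph: two disease classes with one pathway in common.
triples = (
    [("kpath:ko05010", "rdfs:subClassOf", "_:r1"),
     ("kpath:ko05010", "rdfs:subClassOf", "_:r2"),
     ("kpath:ko05214", "rdfs:subClassOf", "_:r3")]
    + restriction("_:r1", "ex:pathwayA")
    + restriction("_:r2", "ex:pathwayB")
    + restriction("_:r3", "ex:pathwayA")
)

# Set intersection plays the role of INTERSECT in the query.
shared = pathways_of("kpath:ko05010", triples) & pathways_of("kpath:ko05214", triples)
```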
Query results
Which pathways are shared by 2 neurological disorders: glioma & Alzheimer’s disease?
→ 37 pathways:
MAPK signaling pathway
Focal adhesion
Insulin signaling pathway
Melanogenesis
B cell receptor signaling pathway
heart development
central nervous system development
axon guidance
peptidyl-serine phosphorylation
protein amino acid phosphorylation
cell cycle
cell-cell signaling
cell cycle arrest
lipid catabolic process
lipid metabolic process
ubiquitin cycle
transport
ErbB signaling pathway
Wnt signaling pathway
protein tetramerization
intracellular signaling cascade
protein modification process
glycogen metabolic process
anagen
induction of apoptosis
negative regulation of apoptosis
apoptosis
anti-apoptosis
Natural killer cell mediated cytotoxicity
cell proliferation
DNA replication
chromosome organization and biogenesis
calcium ion homeostasis
signal transduction
response to UV
negative regulation of cell growth
cytoskeleton organization and biogenesis
Query results
By leveraging the pathway hierarchy:
→ 66 pathways (37 + 29)
[Figure: Alzheimer’s disease hasPathway Intracellular protein transport; Glioma hasPathway Protein transport into nucleus, translocation]
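One way to read "leveraging the pathway hierarchy" is that two diseases also share a pathway when a pathway of one subsumes a pathway of the other. The sketch below implements that reading; it is an interpretation, not the project's actual query. The pairing of each disease with its process is read from the figure, and the direct parent link between the two processes is a simplifying assumption.

```python
def ancestors(term, parents):
    """All strict ancestors of a term in a DAG given as {child: [parents]}."""
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen

def shared_via_hierarchy(pathways_a, pathways_b, parents):
    """Pairs (pa, pb) where the pathways are identical or one subsumes the other."""
    pairs = set()
    for pa in pathways_a:
        for pb in pathways_b:
            if pa == pb or pa in ancestors(pb, parents) or pb in ancestors(pa, parents):
                pairs.add((pa, pb))
    return pairs

# Assumed direct subclass link, for illustration only.
parents = {
    "Protein transport into nucleus, translocation": ["Intracellular protein transport"],
}
alzheimer = {"Intracellular protein transport", "apoptosis"}
glioma = {"Protein transport into nucleus, translocation", "apoptosis"}
shared_pairs = shared_via_hierarchy(alzheimer, glioma, parents)
```

Here "apoptosis" is shared directly, while the two transport processes are only related through the hierarchy.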
Query results
Example of a consistency query:
• Detect whether a specific pathway and a more general one are associated with the same disease
[Figure: Disease1 hasPathway Pathway1 and Disease1 hasPathway Pathway2, where one pathway is more general than the other]
Removal of redundant relationships
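This consistency check can be sketched as a search for disease links that point at an ancestor of another linked pathway. The sketch assumes the more general link is the redundant one, since it is already entailed by the specific link together with the subclass hierarchy; the slides do not say which of the two is removed.

```python
def ancestors(term, parents):
    """All strict ancestors of a term in a DAG given as {child: [parents]}."""
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen

def redundant_links(disease_pathways, parents):
    """Return (disease, general_pathway) links implied by a more specific
    pathway that is linked to the same disease."""
    redundant = set()
    for disease, pathways in disease_pathways.items():
        linked = set(pathways)
        for specific in linked:
            for general in ancestors(specific, parents) & linked:
                redundant.add((disease, general))
    return redundant

# Toy data: Pathway1 is assumed to be a subclass of Pathway2.
parents = {"Pathway1": ["Pathway2"]}
disease_pathways = {"Disease1": {"Pathway1", "Pathway2"}}
to_remove = redundant_links(disease_pathways, parents)
```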
Conclusion
BioMed Ontology project
→ Integration
• Automatic method of integration of biomedical ontologies
- Deals with the huge quantity of biomedical data
- Takes into account the frequent updates of biomedical sources
• BioMed ontology
- Integrates 3 biomedical ontologies (KO, GO, SNOMED)
- Takes into account the formal evolution of the biomedical ontologies (OWL)
→ Querying
• RDFS queries are enough:
- to detect some basic inconsistencies of the BioMed ontology
- to exploit the BioMed ontology
Perspectives
• Biological evaluation: study of glioma
• Increase the number of integrated biomedical sources (e.g.
OMIM, BioPax)
• Improve the mapping/alignment techniques by taking into
account the semantics in the patterns
• Associate a degree of confidence with the Disease/Pathway
relationships (based, for example, on the GO evidence code)
BioMed ontology project:
http://www.ea3888.univ-rennes1.fr/biomed_ontology/