Semantic Web Applications in Bioinformatics

“Semantic Web” Applications
in Bioinformatics
Amr AL-Hossary
M.B.B.Ch
Agenda
• Web & Semantic Web
• RDF & RDF Schema
– Elements
– Schema
– Name space
– Queries
– Pain, design patterns, and limits
• Applications of SW in Bioinformatics
Web Today
• Documents for HUMANS
• Increasing dramatically
• Hard to process on semantic level
– e.g. searching for “give her a ring” doesn’t
return “engage her”.
• Solution (semantic web)
– annotating definitions
– abstract representation of classes & relations
What is the Semantic Web?
Myths about Semantic Web…
• is Top-Down
• needs ontologies at the beginning
• requires all information to be converted to
RDF
• must be centralized
• handles only binary relations
• requires the entire graph to exist on one
memory store
Truths
• A means of describing (a Web of) Data
• A system defining and incorporating
semantics
• A mechanism for making statements on
things
• A format for associating metadata
• A strategy for federating data systems
(with or without triplestore)
Needs
• shared definitions of knowledge
domains, i.e. ontologies,
• association of concepts to existing data,
• metadata information describing
information sources and contents,
• search tools able to make the best use of
this additional information.
RDF
• (Directed) graph data model
• Set of Binary relations (triples)
– Subject Predicate Object
• Unlike a DBMS: the absence of a statement
does NOT mean it is false (open-world assumption).
• XML and RDF/OWL are inherently
different
– XML = thesaurus document structure
– RDF = thesaurus document content
Recombinant Data Space
• RDF is about Graphs rather than
statements
• Separate Graphs can be merged easily
into one aggregate graph
• Graphs can be filtered and pivoted,
without losing meaning
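Recombinant Data Space
[Figure: separate graphs. Nodes: Medicine, Zaynab, Amr, Ali, NU, Abolhouda. Edge labels: "Son of", "Studied", "Works in".]
Recombinant Data Space
[Figure: the graphs merged into one aggregate graph. Nodes: Medicine, Zaynab, Amr, NU, Abolhouda, Ali.]
Recombinant Data Space
[Figure: the merged graph filtered. Nodes: Medicine, Amr, NU, Abolhouda.]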
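The merge, filter, and pivot operations described on these slides can be sketched with plain Python sets. This is a minimal illustration, not a triplestore: the predicate names (`sonOf`, `studied`, `worksIn`) and the exact edges are assumptions, since the original figure is not reproduced in the transcript.

```python
# Each graph is a set of (subject, predicate, object) triples.
# The edges below are illustrative assumptions, not a transcription of the figure.
family = {("Amr", "sonOf", "Ali")}
education = {("Amr", "studied", "Medicine")}
work = {("Amr", "worksIn", "NU"), ("Abolhouda", "worksIn", "NU")}


def merge(*graphs):
    """Merge graphs by set union; nodes with the same name become one node."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def filter_by_predicate(graph, predicates):
    """Keep only the triples whose predicate is in `predicates`."""
    return {t for t in graph if t[1] in predicates}


def pivot(graph, node):
    """Return every triple that mentions `node` as subject or object."""
    return {t for t in graph if node in (t[0], t[2])}


aggregate = merge(family, education, work)
professional = filter_by_predicate(aggregate, {"studied", "worksIn"})
around_nu = pivot(aggregate, "NU")
```

Because every triple is self-describing, the filtered and pivoted views are still valid graphs: nothing in `professional` or `around_nu` depends on the triples that were left out.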
RDF Elements
• Resources R
• Properties P
• Literal Values L
• Assertions "R P L" or "R P R"
• Namespaces
RDF Schema
• RDF Schema (RDFS) is a vocabulary to create
vocabularies...
– Comparable to XML Schema or XML DTD
– Used to standardize which “tags” the creator of a
graph is allowed to use for annotating resources
• Introduces notions such as "Class" and
"Subclass"
• Helps define which relations a resource of a
certain type may have
RDFS Namespace Elements
• X rdf:type rdfs:Class
– denotes that resource X is a class
• R rdf:type rdf:Property
– denotes that resource R is a property
• R rdfs:domain X
– denotes that the subject of R must be an X
• R rdfs:range Y
– denotes that the object of R must be a Y
Cited from http://www.nettab.org/2007/slides/Tutorial_Stoermer.pdf
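In RDFS itself, rdfs:domain and rdfs:range are read as inference rules rather than as validation constraints: if R has domain X, any subject of R is inferred to be an X. The sketch below shows that reading in plain Python. The `ex:` names and the dictionary-based schema layout are hypothetical, chosen only for illustration.

```python
RDF_TYPE = "rdf:type"

# Hypothetical schema: property -> declared domain and range.
schema = {
    "ex:givesTalk": {"domain": "ex:Person", "range": "ex:Talk"},
}

# Hypothetical data graph with a single assertion.
data = {("ex:hst", "ex:givesTalk", "ex:nettabTalk")}


def infer_types(triples, schema):
    """Derive rdf:type triples from domain/range declarations."""
    inferred = set()
    for subject, predicate, obj in triples:
        declaration = schema.get(predicate)
        if declaration is None:
            continue  # no schema information for this property
        if "domain" in declaration:
            inferred.add((subject, RDF_TYPE, declaration["domain"]))
        if "range" in declaration:
            inferred.add((obj, RDF_TYPE, declaration["range"]))
    return inferred


entailed = infer_types(data, schema)
```

Here `entailed` contains the statements that `ex:hst` is an `ex:Person` and `ex:nettabTalk` is an `ex:Talk`, even though neither type was asserted directly.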
Query Languages
• RDQL
• SERQL
• SPARQL (W3C Recommendation since January 2008)
SPARQL Example
PREFIX nettab: <http://www.nettab.org/tutorial-ns#>
SELECT ?x ?z
WHERE { ?x nettab:givesTalk ?z }
Matching triple:
Subject: http://www.nettab.org/tutorial-ns#hst
Predicate: http://www.nettab.org/tutorial-ns#givesTalk
Object: http://www.know-who.net/talks/nettab.ppt
Cited from http://www.nettab.org/2007/slides/Tutorial_Stoermer.pdf
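To show what the triple pattern in the WHERE clause does, here is a minimal Python sketch of matching a single pattern against a list of triples. It is not a SPARQL engine; it only illustrates how variables (terms starting with "?") get bound. The `worksAt` triple is a hypothetical addition so that there is something that does not match.

```python
NS = "http://www.nettab.org/tutorial-ns#"

triples = [
    (NS + "hst", NS + "givesTalk", "http://www.know-who.net/talks/nettab.ppt"),
    (NS + "hst", NS + "worksAt", NS + "unitn"),  # hypothetical, for contrast
]


def match(pattern, triples):
    """Yield one {variable: value} binding per triple that fits `pattern`."""
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable matches anything, but must stay consistent
                # if it occurs more than once in the pattern.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break  # a constant term must match exactly
        else:
            yield binding


results = list(match(("?x", NS + "givesTalk", "?z"), triples))
```

Only the first triple fits the pattern, so `results` holds a single binding with `?x` set to the subject and `?z` set to the object.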
RDF Limitation (& Design Patterns)
• N-ary Relations
– RDF understands only binary relations
– e.g. "Amr leads Pulse in 2009" relates three things at once
• Exceptions
– e.g. cells have a nucleus, but human RBCs are unnucleated
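A common design pattern for the n-ary case is to introduce an intermediate node that stands for the relation itself and to attach each participant to it with a binary triple. The sketch below applies that pattern to the leadership example. All identifiers (`ex:leadership1`, `ex:Leadership`, `ex:leader`, `ex:group`, `ex:year`) are hypothetical names made up for this illustration.

```python
def reify_nary(relation_id, relation_type, roles):
    """Express an n-ary relation as binary triples around one relation node."""
    triples = [(relation_id, "rdf:type", relation_type)]
    triples += [(relation_id, role, value) for role, value in roles.items()]
    return triples


# "Amr leads Pulse in 2009" as one relation node with three role triples.
leadership = reify_nary(
    "ex:leadership1",
    "ex:Leadership",
    {"ex:leader": "ex:Amr", "ex:group": "ex:Pulse", "ex:year": "2009"},
)
```

The cost of the pattern is an extra node and one more hop in every query, which is why it counts as a pain point as much as a solution.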
Example Applications of
Semantic Web in Bioinformatics
Vocabularies (Thesauri)
• Example Thesauri in Medicine
– UMLS
– SNOMED
– MESH
– Galen
OKKAM (Entity Name System, ENS)
Addison’s disease in medical
vocabularies
• Synonyms
– Addisonian syndrome
– Bronzed disease
– Addison melanoderma
– Asthenia pigmentosa
– Primary adrenal deficiency
– Primary adrenal insufficiency
– Primary adrenocortical insufficiency
– Chronic adrenocortical insufficiency
[Figure labels grouping the synonyms: Eponym, Symptoms, Clinical Varieties]
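One practical use of such a synonym list is to normalize free-text terms to a single preferred name before searching or merging records. The following is a minimal sketch assuming simple case- and whitespace-insensitive string matching; real vocabularies such as UMLS group terms by meaning and use concept identifiers, which are not modeled here.

```python
PREFERRED = "Addison's disease"

SYNONYMS = [
    "Addisonian syndrome",
    "Bronzed disease",
    "Addison melanoderma",
    "Asthenia pigmentosa",
    "Primary adrenal deficiency",
    "Primary adrenal insufficiency",
    "Primary adrenocortical insufficiency",
    "Chronic adrenocortical insufficiency",
]


def normalize(term):
    """Lower-case a term and collapse runs of whitespace."""
    return " ".join(term.lower().split())


# Map every known spelling, including the preferred one, to the preferred name.
INDEX = {normalize(s): PREFERRED for s in SYNONYMS + [PREFERRED]}


def lookup(term):
    """Return the preferred name for `term`, or None if it is unknown."""
    return INDEX.get(normalize(term))
```

For example, `lookup("bronzed  DISEASE")` resolves to the preferred name, while an unrelated term such as `lookup("Diabetes")` returns `None`.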
Addison’s disease in four vocabularies (broader to narrower terms)
• SNOMED International
– Disease/Diagnosis
– Disease of Endocrine System
– Disease of the Adrenal Gland
– Addison’s Disease
• MeSH
– Disease
– Endocrine Disease
– Adrenal Gland Disease
– Adrenal Gland Hypofunction
– Addison’s Disease
• AOD
– Endocrine Disorder
– Adrenal Disorder
– Adrenal Cortical Disorder
– Corticoadrenal insufficiency
– Addison’s Disease
• Read Codes
– Endocrine Diseases
– Disorder of Adrenal Gland
– Hypoadrenalism
– Adrenal Hypofunction
– Corticoadrenal insufficiency
– Addison’s Disease
Organizing concept
• Source vocabularies: SNOMED, MeSH, AOD, Read Codes
• UMLS
– Endocrine Diseases
– Adrenal Gland Diseases
– Adrenal Cortex Diseases
– Hypoadrenalism
– Adrenal Gland Hypofunction
– Adrenal cortical hypofunction
– Addison’s Disease
UMLS vocabularies available in
RDF/OWL
• NCI Thesaurus (OWL)
– http://ncicb.nci.nih.gov/core/EVS
• Gene Ontology
– http://www.geneontology.org
• Repository of biomedical ontologies (OBO,
OWL)
– http://www.bioontology.org/ncbo/faces/index.xhtml
• User-defined Datatypes
– Based on syntax used in Protégé
– Semantics derived from XML Schema datatypes
– For numbers: min, max, digits, fraction digits
– For strings: length (min, max, equal), regular
expression patterns
– Class (Teenager complete restriction (age
someValuesFrom (datatype(xsd:int
minInclusive(“13”^^xsd:int)
maxInclusive(“19”^^xsd:int)))))
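Read informally, the Teenager definition says that an individual is a Teenager exactly when at least one of its age values is an integer between 13 and 19 inclusive. A minimal Python sketch of that membership test follows. The dictionary layout for an individual (an "age" key holding a list of values) is an assumption made for the example, not part of OWL or Protégé.

```python
def is_teenager(individual):
    """True if some 'age' value is an integer in the closed range 13..19.

    Mirrors someValuesFrom: one qualifying value is enough, even if the
    individual also carries other age values outside the range.
    """
    ages = individual.get("age", [])
    return any(isinstance(age, int) and 13 <= age <= 19 for age in ages)
```

With this reading, `is_teenager({"age": [15]})` holds, `is_teenager({"age": [12]})` does not, and an individual with no recorded age is simply not classified as a Teenager by this check.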
Biological Pathway eXchange
(BioPAX)
• Represent:
– Metabolic pathways
– Signaling pathways
– Protein-protein, molecular interactions
– Gene regulatory pathways
– Genetic interactions
• Community effort: pathway databases
distribute pathway information in standard
format
cPath
• cPath is a database and software suite for
storing, visualizing, and analyzing biological
pathways
cPath Key Features
• Identifier mapping system e.g. proteins
• Scalable pathway data aggregation
• Simple web interface for browse and query
• Standard web service API for application
communication
• 100% open source
– Java, Tomcat, MySQL, Lucene, Struts, YUI
• Local installation and customization
iHOP (information Hyperlink Over Protein)
Adding value via text mining
Pathway Commons
A Genome – Phenome
Integrated Approach for
Mining Disease-Causal Genes
using Semantic Web
Gudivada Ranga Chandra
Department of Biomedical Engineering/University
of Cincinnati
Division of Biomedical Informatics/ Cincinnati
Children’s Hospital Medical Center
Questions?
References
• RDF standard and technologies (presentation)
Heiko Stoermer, University of Trento, Italy
– http://www.nettab.org/2007/slides/Tutorial_Stoermer.pdf
• The Unified Medical Language System (UMLS) and the Semantic Web (presentation)
Olivier Bodenreider, National Library of Medicine, USA
– http://www.nettab.org/2007/slides/Tutorial_Bodenreider.pdf
• Semantic Web for Health Care and Life Science Interest Group: A Vision for Advancing
Research Communities (presentation)
Eric Neumann, Teranode Co., USA.
– http://www.nettab.org/2007/slides/SemanticWeb_Neumann.pdf
• OKKAM web site
– http://www.okkam.org/
• Unified Medical Language System (UMLS)
– http://www.nlm.nih.gov/research/umls/
• Biological Pathway eXchange (BioPAX)
– http://www.biopax.org/
• cPath: Demo Site
– http://cbio.mskcc.org/cpath/
• iHOP (information Hyperlink Over Protein)
– http://www.ihop-net.org/UniPub/iHOP/
• Pathway Commons
– http://www.pathwaycommons.org/pc/