BioRDF Overview and Update
By
Kei Cheung, Ph.D.
Yale Center for Medical Informatics
C-SHALS 2009, Boston, Massachusetts, February 25, 2009
BioRDF
Objectives
Enhance the HCLS KB
Increase the value and use of the HCLS KB by identifying scientific use cases
Work on a human-friendly user interface
Document and publish findings to help accelerate/promote adoption of the Semantic Web
Participants
Universities, pharmaceutical companies, start-up companies, government institutes, W3C, etc.
BioRDF Activities/Tasks
Invited Talks
UMLS, NCBO, NIF, Biogateway, WikiNeuron, Gene Wiki, 3D Web Visualization, BioSIOC/aTag, VoID
HCLS KB
Two instances of HCLS KB have been created
DERI (Virtuoso)
Free University in Berlin (Allegro Graph)
add receptors to the picture
aTags
SenseLab and TCMGeneDIT
Neuroscience use case
Neurocommons
Matthias Samwald, Kei Cheung
Query Federation
Kei Cheung, Rob Frost, Kingsley Idehen, Scott Marshall, Adrian Paschke, Eric Prud’hommeaux, Matthias Samwald, Jun Zhao
Brain: Neuron and Synapse
Courtesy of NIDA
aTags
Very simple, generic way of expressing biomedical statements
A short snippet of text + a list of ontology terms used for describing the text
Using established vocabulary (SIOC, OBO ontologies)
Encoded in RDFa (easy to embed in existing HTML-based systems)
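The structure described on this slide (a text snippet plus a list of ontology terms, serialized as RDFa) can be sketched roughly as follows. This is only an illustration: the `ATag` class, the `to_rdfa` helper, and the `example.org` URIs are hypothetical, and the choice of `sioc:content` and `sioc:topic` is an assumption about how the SIOC vocabulary might be applied, not the actual aTag markup.

```python
from dataclasses import dataclass, field
from html import escape


@dataclass
class ATag:
    """A short text snippet plus the ontology terms that describe it."""
    uri: str                                   # identifier of the statement (hypothetical)
    text: str                                  # the short snippet of text
    terms: list = field(default_factory=list)  # (term URI, label) pairs


def to_rdfa(tag: ATag) -> str:
    """Render the aTag as an HTML fragment carrying RDFa attributes.

    Assumption: sioc:content holds the snippet and sioc:topic links each term.
    The real aTag markup may use different properties.
    """
    links = "".join(
        f'<a rel="sioc:topic" href="{escape(uri, quote=True)}">{escape(label)}</a>'
        for uri, label in tag.terms
    )
    return (
        f'<div xmlns:sioc="http://rdfs.org/sioc/ns#" about="{escape(tag.uri, quote=True)}">'
        f'<span property="sioc:content">{escape(tag.text)}</span>{links}</div>'
    )


# Placeholder URIs; a real aTag would point at OBO or other ontology terms.
example = ATag(
    uri="http://example.org/atag/1",
    text="Transmitter T seems to activate receptor R",
    terms=[
        ("http://example.org/term/T", "Transmitter T"),
        ("http://example.org/term/R", "Receptor R"),
    ],
)
print(to_rdfa(example))
```

Because the output is ordinary HTML with a few extra attributes, a fragment like this could be pasted into an existing page or content management system without changing how the page renders.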
aTags
Transmitter T seems to activate receptor R
Receptor R is expressed in brain region B
Region B has strong axonal projections into brain region B2
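One way to read these three statements is that they become connected through the terms they share (R, then B). The sketch below illustrates that reading with plain Python sets; the term labels are the placeholders from the slide rather than real ontology identifiers, and `reachable_terms` is a hypothetical helper, not part of any aTag tooling.

```python
# Each entry: (snippet text, set of term labels used to describe it).
# Labels are the slide's placeholders, not real ontology identifiers.
atags = [
    ("Transmitter T seems to activate receptor R", {"T", "R"}),
    ("Receptor R is expressed in brain region B", {"R", "B"}),
    ("Region B has strong axonal projections into brain region B2", {"B", "B2"}),
]


def reachable_terms(tags, start):
    """Return every term connected to `start` via statements that share a term."""
    seen, frontier = {start}, [start]
    while frontier:
        term = frontier.pop()
        for _text, terms in tags:
            if term in terms:
                # terms - seen builds a new set, so updating `seen` here is safe.
                for other in terms - seen:
                    seen.add(other)
                    frontier.append(other)
    return seen


print(sorted(reachable_terms(atags, "T")))  # ['B', 'B2', 'R', 'T']
```

Starting from transmitter T, the shared terms lead to receptor R, region B, and finally region B2, which is the kind of chain a query over many aTags could surface.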
aTags
aTags will be created by
conversion of existing biomedical datasets
manual curation of data (highlight a text snippet in the browser and click on a del.icio.us-like bookmarklet)
Design philosophy: simplicity and practicality
Use existing resources
Play along with existing systems (HTML content management, RDFa-enabled search engines)
Query Federation
A Journey to Query Federation: from SPARQL Endpoint to Linked Data
Application demo
Receptor explorer
Mismatch between Wikipedia and DBpedia
Comparison of Triplestores
Linked Data description and deployment
voiD
FeDeRate
URI
Receptor Explorer
[Figure: Receptor Explorer architecture. Navigation path: Receptor; Genes involved in receptor; Publications about gene; Clinical trials referencing publications. Labeled components: App; ESB/SOA; VectorC Semantic Service Bus; map; RDF; web sites. Labeled data sources: HCLS KB (DERI), DBpedia, linkedct.org, Entrez Gene, PubMed, Clinicaltrials.gov, SenseLab Receptors, Bio2RDF, Wikipedia. Copyright 2008 VectorC, LLC]
A Semantic Mismatch between Wikipedia and DBpedia
[Figure: two panels, labeled Wikipedia and DBpedia]
Triplestore Comparison
Features compared for Virtuoso and Allegro Graph
Class Hierarchy Inference
Virtuoso: Yes
Allegro Graph: Yes
Linked Data Deployment
Virtuoso: Built-in support
Allegro Graph: 3rd party software (e.g., Pubby)
Query Federation
Virtuoso: Linked Data Spaces (SPARQL against resource URIs)
Allegro Graph: Built-in support (Sesame and Oracle only). For other triplestores, a 3rd party middleware approach is required.
Federated Query (FeDeRate)
[Figure: FeDeRate query mediation. Labels: Query Mediation; Local query 1, DBPedia (RDF); Local query 2, IUPHAR (SQL); Local query n]
Federation Scenario
PREFIX db: <http://www.w3.org/2003/01/21-RDF-RDB-access/ns#SqlDB?properties=..%2Ftest%2F>
PREFIX re: <http://receptor.example/re#>
PREFIX dp: <http://receptor.example/dp#>
SELECT ?abstract ?code ?ligand ?hum_seq_id ?chr ?refseq
FROM NAMED db:IUPHAR.prop
FROM NAMED db:DBPedia.rdf
WHERE
{
  # Get info from the (SQL) IUPHAR receptor tables.
  GRAPH db:IUPHAR.prop {
    ?r re:Code             ?code .
    ?r re:Ligand           ?ligand .
    ?r re:Human_nucleotide ?hum_seq_id }
  # Get info from (RDF) DBPedia.
  GRAPH db:DBPedia.rdf {
    ?p dp:chromosome ?chr .
    ?p dp:refseq     ?refseq .
    ?p dp:symbol     ?symbol .
    ?p db:abstract   ?abstract }
}
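The mediation idea behind this scenario (run one local query per source, then combine the results) can be sketched in a few lines of Python. This does not reproduce how FeDeRate works internally; it is a minimal stand-in under stated assumptions. The SQL table layout, the triples, and every value such as `CODE-1` or `SEQ-1` are made-up placeholders, and joining on `hum_seq_id == refseq` is an assumed join key that the query on the slide does not state.

```python
import sqlite3


def load_sql_source():
    """Hypothetical stand-in for the SQL (IUPHAR) source, with placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE receptor (code TEXT, ligand TEXT, human_nucleotide TEXT)")
    conn.executemany(
        "INSERT INTO receptor VALUES (?, ?, ?)",
        [("CODE-1", "LIGAND-1", "SEQ-1"), ("CODE-2", "LIGAND-2", "SEQ-2")],
    )
    return conn


# Hypothetical stand-in for the RDF (DBPedia) source: (subject, predicate, object).
RDF_TRIPLES = [
    ("p1", "dp:refseq", "SEQ-1"),
    ("p1", "dp:chromosome", "CHR-1"),
    ("p1", "dp:symbol", "SYM-1"),
    ("p1", "db:abstract", "ABSTRACT-1"),
]


def local_query_sql(conn):
    """Local query against the SQL source: one dict per receptor row."""
    rows = conn.execute("SELECT code, ligand, human_nucleotide FROM receptor")
    return [{"code": c, "ligand": l, "hum_seq_id": s} for c, l, s in rows]


def local_query_rdf(triples):
    """Local query against the RDF source: one dict per fully described subject."""
    wanted = {"dp:chromosome": "chr", "dp:refseq": "refseq",
              "dp:symbol": "symbol", "db:abstract": "abstract"}
    by_subject = {}
    for s, p, o in triples:
        if p in wanted:
            by_subject.setdefault(s, {})[wanted[p]] = o
    return [b for b in by_subject.values() if len(b) == len(wanted)]


def federate(sql_rows, rdf_rows):
    """Mediation step: join on hum_seq_id == refseq (an assumed join key)."""
    by_refseq = {r["refseq"]: r for r in rdf_rows}
    return [{**s, **by_refseq[s["hum_seq_id"]]}
            for s in sql_rows if s["hum_seq_id"] in by_refseq]


results = federate(local_query_sql(load_sql_source()), local_query_rdf(RDF_TRIPLES))
print(results)
```

Only the placeholder receptor whose sequence identifier appears in both sources survives the join, which mirrors the shape of a federated result: columns from the SQL side and the RDF side in a single row.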
Example Join between IUPHAR & DBPedia (GABAB receptor)
[Figure: two panels, labeled IUPHAR and DBPedia]
voiD: Vocabulary of Interlinked Datasets
• Motivation
– Effective dataset selection
– Efficient discovery of datasets, by search engines or data publishers
– SPARQL query optimisation and query federation
• Two high-level concepts
– Dataset: a dataset is published and maintained by a single provider and is accessible on the Web through dereferenceable URIs or a SPARQL endpoint
– Linkset: a subset of a void:Dataset that stores triples expressing the interlinking relationships between datasets
• voiD Vocabulary, http://rdfs.org/ns/void/html
• voiD User's Guide, http://rdfs.org/ns/void-guide
Biological Dataset in voiD Format
:senselabontology a void:Dataset ;
    dcterms:title "SenseLab Neuron Ontology" ;
    dcterms:description "Neuroscience ontology derived from the SenseLab NeuronDB database." ;
    dcterms:license <> ;  # TODO
    foaf:homepage <http://neuroweb.med.yale.edu/senselab/> ;
    void:exampleResource <http://purl.org/science/owl/sciencecommons/identified_by_pmid> ;
    void:exampleResource <http://purl.org/ycmi/senselab/neuron_ontology.owl#has_Receptor> ;
    void:exampleResource <http://purl.org/ycmi/senselab/neuron_ontology.owl#NMDA> ;
    dcterms:creator :senselab ;  ## this organization can be further defined
    dcterms:source <http://purl.org/ycmi/senselab/neuron_ontology.owl#> ;
    dcterms:subject <http://purl.org/ycmi/senselab/neuron_ontology.owl#Receptor> ;
    dcterms:subject <http://dbpedia.org/resource/Receptor_(biochemistry)> ;
    dcterms:subject <http://dbpedia.org/resource/Neurotransmitter_receptor> ;
    dcterms:subject <http://dbpedia.org/resource/Sensory_receptor> ;
    dcterms:source <doi:10.1093/bib/bbm018> ;
    void:feature :owl ;  ## this technical feature can be further defined
    void:sparqlEndpoint <http://hcls.deri.org:8080/> ;
    void:vocabulary <http://www.obofoundry.org/ro/ro.owl> .
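A description like this is what makes dataset selection possible: a client can compare the `dcterms:subject` values against a topic of interest and pick up the matching `void:sparqlEndpoint`. The sketch below shows that idea with plain dictionaries instead of a real RDF parser. The first entry copies values from the SenseLab description; the second entry and the `select_endpoints` helper are hypothetical.

```python
# voiD-style descriptions held as plain dicts (no RDF library involved).
DATASETS = {
    ":senselabontology": {
        "dcterms:title": "SenseLab Neuron Ontology",
        "dcterms:subject": [
            "http://purl.org/ycmi/senselab/neuron_ontology.owl#Receptor",
            "http://dbpedia.org/resource/Receptor_(biochemistry)",
            "http://dbpedia.org/resource/Neurotransmitter_receptor",
            "http://dbpedia.org/resource/Sensory_receptor",
        ],
        "void:sparqlEndpoint": "http://hcls.deri.org:8080/",
    },
    # Made-up placeholder entry, present only so the filter has something to reject.
    ":otherdataset": {
        "dcterms:title": "Placeholder dataset",
        "dcterms:subject": ["http://example.org/topic/unrelated"],
        "void:sparqlEndpoint": "http://example.org/sparql",
    },
}


def select_endpoints(datasets, subject_uri):
    """Return SPARQL endpoints of datasets whose dcterms:subject includes subject_uri."""
    return sorted(
        d["void:sparqlEndpoint"]
        for d in datasets.values()
        if subject_uri in d.get("dcterms:subject", []) and "void:sparqlEndpoint" in d
    )


print(select_endpoints(DATASETS, "http://dbpedia.org/resource/Neurotransmitter_receptor"))
# ['http://hcls.deri.org:8080/']
```

A federation engine could run a filter of this kind before dispatching local queries, so that only endpoints likely to hold relevant data are contacted.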
voiD Deployment
Deploy a voiD file (in Turtle, RDF/XML, or RDFa format) onto the Web server
Make it accessible to search engines, such as Sindice (http://sindice.com/)
Publish a Semantic Sitemap file (sitemap.xml) on the server
“...... allows Data publishers to state where documents containing RDF data are located, and to advertise alternative means to access it ......” [1]
Use the datasetURI property in the sitemap.xml to point to the voiD description of a dataset, e.g., http://neuroweb.med.yale.edu/senselab/senselab-void.ttl#senselabontology
[1] http://sw.deri.org/2007/07/sitemapextension/
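The last deployment step can be sketched as a small script that writes a sitemap.xml whose dataset entry points at the voiD description. The `sc:` namespace URI and the `dataset` / `datasetLabel` / `datasetURI` element names reflect my reading of the Semantic Sitemap extension [1] and should be checked against that document before use; `build_sitemap` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Assumed namespace for the Semantic Sitemap extension; verify against [1].
SC_NS = "http://sw.deri.org/2007/07/sitemapextension/scschema.xsd"


def build_sitemap(label, dataset_uri):
    """Return sitemap XML with one dataset entry pointing at a voiD description."""
    ET.register_namespace("", SITEMAP_NS)   # serialize sitemap elements unprefixed
    ET.register_namespace("sc", SC_NS)      # serialize extension elements as sc:*
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    dataset = ET.SubElement(urlset, f"{{{SC_NS}}}dataset")
    ET.SubElement(dataset, f"{{{SC_NS}}}datasetLabel").text = label
    ET.SubElement(dataset, f"{{{SC_NS}}}datasetURI").text = dataset_uri
    return ET.tostring(urlset, encoding="unicode")


xml_text = build_sitemap(
    "SenseLab Neuron Ontology",
    "http://neuroweb.med.yale.edu/senselab/senselab-void.ttl#senselabontology",
)
print(xml_text)
```

The resulting file would be placed next to the voiD file on the Web server so that crawlers reading the sitemap can follow the datasetURI to the dataset description.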
URI Issues
Proliferation of synonymous URIs
http://dbpedia.org/resource/Dopamine_receptor
http://purl.org/ycmi/senselab/neuron_ontology.owl#Dopaminergic_Receptor
Potential problems
Performance
Maintenance
Possible solutions
Involvement of a nomenclature committee (e.g., IUPHAR) and a domain authority (e.g., Neuroscience Information Framework or NIF)
Persistent/permanent URI scheme (e.g., PURL)
E.g., http://purl.org/nif/ontology/NIF-Molecule.owl#nifext_5832
Dereferenceable URIs
A dereferenceable URI is a resource identification mechanism that uses the HTTP protocol to obtain a representation of the resource it identifies
For Linked Data, the representation takes the form of an information resource that describes the resource that the URI identifies.
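Dereferencing in this sense amounts to an ordinary HTTP GET with an Accept header that asks for an RDF representation. The sketch below uses only Python's standard library; the helper names and the particular media types in `RDF_ACCEPT` are illustrative choices, and whether a given server honours them is not guaranteed.

```python
import urllib.request

# Media types requested from the server; an illustrative choice, not a standard list.
RDF_ACCEPT = "application/rdf+xml, text/turtle;q=0.9"


def build_rdf_request(uri: str) -> urllib.request.Request:
    """Prepare (but do not send) an HTTP GET asking for an RDF representation."""
    return urllib.request.Request(uri, headers={"Accept": RDF_ACCEPT})


def dereference(uri: str, timeout: float = 10.0):
    """Send the request; return (final URL after redirects, content type, body bytes)."""
    with urllib.request.urlopen(build_rdf_request(uri), timeout=timeout) as resp:
        return resp.geturl(), resp.headers.get("Content-Type"), resp.read()


# Building the request needs no network access; dereference() does.
req = build_rdf_request("http://dbpedia.org/resource/Dopamine_receptor")
print(req.get_method(), req.full_url, req.get_header("Accept"))
```

Calling `dereference` on a Linked Data URI typically follows a redirect from the identifier of the thing to the document describing it, which is why the function returns the final URL alongside the body.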
Future Directions
Submit a paper describing the query federation work to a journal or conference
Continue and extend current tasks: Query Federation and aTag
Add new tasks
e.g., semantic wiki, workflow, user interface, …
Expand the HCLS KB (both instances)
e.g., new datasets such as UMLS
Collaborate with other task forces
e.g., LODD (natural alternative use case, Faviki) and SWAN/SIOC
The End