HCLS$$CSHALS2009$$Tutorial$Cheung

BioRDF Overview and Update
By
Kei Cheung, Ph.D.
Yale Center for Medical Informatics
C-SHALS 2009, Boston, Massachusetts, February 25, 2009
BioRDF
Objectives
– Enhance the HCLS KB
– Increase the value and use of the HCLS KB by identifying scientific use cases
– Work on a human-friendly user interface
– Document and publish findings to help accelerate/promote adoption of the Semantic Web
Participants
– Universities, pharmaceutical companies, start-up companies, government institutes, W3C, etc.
BioRDF Activities/Tasks
Invited Talks
– UMLS, NCBO, NIF, Biogateway, WikiNeuron, Gene Wiki, 3D Web Visualization, BioSIOC/aTag, voiD
HCLS KB
– Two instances of the HCLS KB have been created
– DERI (Virtuoso)
– Free University in Berlin (Allegro Graph)
add receptors to the picture
aTags


SenseLab and TCMGeneDIT
Neuroscience use case


Neurocommons
Matthias Samwald, Kei Cheung
Query Federation
– Kei Cheung, Rob Frost, Kingsley Idehen, Scott Marshall, Adrian Paschke, Eric Prud’hommeaux, Matthias Samwald, Jun Zhao
Brain: Neuron and Synapse
Courtesy of NIDA
aTags
– A very simple, generic way of expressing biomedical statements
– A short snippet of text plus a list of ontology terms used for describing the text
– Uses established vocabularies (SIOC, OBO ontologies)
– Encoded in RDFa (easy to embed in existing HTML-based systems)
aTags
– Transmitter T seems to activate receptor R
– Receptor R is expressed in brain region B
– Region B has strong axonal projections into brain region B2
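As a rough illustration of the idea, the sketch below renders one of the statements above as an RDFa-annotated HTML fragment. The markup pattern (a sioc:Item whose text is sioc:content and whose terms are linked with sioc:topic), the function name render_atag, and the example.org term URI are assumptions made for illustration; they are not taken from the published aTag format.

```python
from html import escape


def render_atag(text, topics):
    """Render a text snippet plus ontology terms as an RDFa-annotated fragment.

    `topics` is a list of (label, uri) pairs. The SIOC-based markup pattern
    is an assumption for illustration, not the official aTag format.
    """
    links = "".join(
        f'<a rel="sioc:topic" href="{escape(uri, quote=True)}">{escape(label)}</a>'
        for label, uri in topics
    )
    return (
        '<span xmlns:sioc="http://rdfs.org/sioc/ns#" typeof="sioc:Item">'
        f'<span property="sioc:content">{escape(text)}</span> {links}</span>'
    )


snippet = render_atag(
    "Receptor R is expressed in brain region B",
    [("Receptor R", "http://example.org/terms/ReceptorR")],  # hypothetical term URI
)
print(snippet)
```

Because the output is ordinary HTML with a few extra attributes, it could be pasted into an existing page or content management system without changing how the page renders.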
aTags
aTags will be created by
– conversion of existing biomedical datasets
– manual curation of data (highlight a text snippet in the browser and click on a del.icio.us-like bookmarklet)
Design philosophy: simplicity and practicality
– Use existing resources
– Play along with existing systems (HTML content management, RDFa-enabled search engines)
Query Federation
A Journey to Query Federation: from SPARQL Endpoint to Linked Data
– Application demo
– Receptor explorer
– Mismatch between Wikipedia and DBpedia
– Comparison of Triplestores
– Linked Data description and deployment
– voiD
– FeDeRate
– URI
Receptor Explorer
[Diagram: Receptor → Genes involved in receptor → Publications about gene → Clinical trials referencing publications. Components shown: App; VectorC Semantic Service Bus (ESB/SOA); RDF sources HCLS KB (DERI), DBpedia, linkedct.org, Bio2RDF; web sites SenseLab Receptors, Wikipedia, Entrez Gene, PubMed, Clinicaltrials.gov. Copyright 2008 VectorC, LLC]
A Semantic Mismatch between Wikipedia and DBpedia
[Figure: side-by-side panels labeled Wikipedia and DBpedia]
Triplestore Comparison
Features | Virtuoso | Allegro Graph
Class Hierarchy Inference | Yes | Yes
Linked Data Deployment | Built-in support | 3rd party software (e.g., Pubby)
Query Federation | Built-in support (Sesame and Oracle only). For other triplestores, a 3rd party middleware approach is required. Linked Data Spaces (SPARQL against resource URIs) | Federated Query (FeDeRate)
Federated Query
[Diagram: FeDeRate performs query mediation, issuing local query 1 to DBPedia (RDF), local query 2 to IUPHAR (SQL), and so on up to local query n]
Federation Scenario
PREFIX db: <http://www.w3.org/2003/01/21-RDF-RDB-access/ns#SqlDB?properties=..%2Ftest%2F>
PREFIX re: <http://receptor.example/re#>
PREFIX dp: <http://receptor.example/dp#>
SELECT ?abstract ?code ?ligand ?hum_seq_id ?chr ?refseq
FROM NAMED db:IUPHAR.prop
FROM NAMED db:DBPedia.rdf
WHERE
{
  # Get info from the (SQL) IUPHAR receptor tables.
  GRAPH db:IUPHAR.prop {
    ?r re:Code             ?code .
    ?r re:Ligand           ?ligand .
    ?r re:Human_nucleotide ?hum_seq_id }
  # Get info from (RDF) DBPedia.
  GRAPH db:DBPedia.rdf {
    ?p dp:chromosome ?chr .
    ?p dp:refseq     ?refseq .
    ?p dp:symbol     ?symbol .
    ?p db:abstract   ?abstract }
}
Example Join between IUPHAR & DBPedia (GABAB receptor)
[Figure: result panels labeled IUPHAR and DBPedia]
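To make the join step concrete, here is a minimal sketch of how a mediator might combine the bindings returned by two local queries. It is not FeDeRate's implementation. The join condition (IUPHAR's ?hum_seq_id equal to DBPedia's ?refseq) is an assumption, since the slides do not state it, and all row values are placeholders rather than real query results.

```python
def join_bindings(left, right, left_key, right_key):
    """Hash-join two lists of variable bindings (dicts) on one key per side."""
    index = {}
    for row in right:
        index.setdefault(row[right_key], []).append(row)
    joined = []
    for row in left:
        for match in index.get(row[left_key], []):
            joined.append({**row, **match})
    return joined


# Placeholder bindings standing in for the two local query results (not real data).
iuphar_rows = [
    {"code": "CODE-1", "ligand": "LIGAND-1", "hum_seq_id": "SEQ-1"},
    {"code": "CODE-2", "ligand": "LIGAND-2", "hum_seq_id": "SEQ-2"},
]
dbpedia_rows = [{"refseq": "SEQ-1", "chr": "CHR-A", "abstract": "ABSTRACT-1"}]

# Assumed join condition: IUPHAR ?hum_seq_id equals DBPedia ?refseq.
result = join_bindings(iuphar_rows, dbpedia_rows, "hum_seq_id", "refseq")
print(result)
```

Only rows with a matching key on both sides survive, which is the behaviour one would expect when a receptor appears in both sources.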
voiD: Vocabulary of Interlinked Datasets
• Motivation
– Effective dataset selection
– Efficient discovery of datasets by search engines or data publishers
– SPARQL query optimisation and query federation
• Two high-level concepts
– Dataset: a dataset is published and maintained by a single provider and accessible on the Web through de-referenceable URIs or a SPARQL endpoint
– Linkset: a subset of a void:Dataset that stores triples expressing the interlinking relationship between datasets
• voiD Vocabulary, http://rdfs.org/ns/void/html
• voiD User's Guide, http://rdfs.org/ns/void-guide
Biological Dataset in voiD Format
@prefix void: <http://rdfs.org/ns/void#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix : <#> .

:senselabontology a void:Dataset ;
    dcterms:title "SenseLab Neuron Ontology" ;
    dcterms:description "Neuroscience ontology derived from the SenseLab NeuronDB database." ;
    dcterms:license <> ; # TODO
    foaf:homepage <http://neuroweb.med.yale.edu/senselab/> ;
    void:exampleResource <http://purl.org/science/owl/sciencecommons/identified_by_pmid> ;
    void:exampleResource <http://purl.org/ycmi/senselab/neuron_ontology.owl#has_Receptor> ;
    void:exampleResource <http://purl.org/ycmi/senselab/neuron_ontology.owl#NMDA> ;
    dcterms:creator :senselab ; ## this organization can be further defined
    dcterms:source <http://purl.org/ycmi/senselab/neuron_ontology.owl#> ;
    dcterms:subject <http://purl.org/ycmi/senselab/neuron_ontology.owl#Receptor> ;
    dcterms:subject <http://dbpedia.org/resource/Receptor_(biochemistry)> ;
    dcterms:subject <http://dbpedia.org/resource/Neurotransmitter_receptor> ;
    dcterms:subject <http://dbpedia.org/resource/Sensory_receptor> ;
    dcterms:source <doi:10.1093/bib/bbm018> ;
    void:feature :owl ; ## this technical feature can be further defined
    void:sparqlEndpoint <http://hcls.deri.org:8080/> ;
    void:vocabulary <http://www.obofoundry.org/ro/ro.owl> .
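One use of such a description is dataset selection: given several voiD descriptions, pick the endpoints whose dcterms:subject matches the topic of a query. The sketch below assumes the descriptions have already been parsed into plain dictionaries. The dictionary layout, the function name select_endpoints, and the second dataset are hypothetical; the first entry reuses values from the SenseLab description above.

```python
def select_endpoints(datasets, subject_uri):
    """Return SPARQL endpoints of datasets whose subjects include subject_uri."""
    return [
        d["sparqlEndpoint"]
        for d in datasets
        if subject_uri in d.get("subjects", ()) and d.get("sparqlEndpoint")
    ]


datasets = [
    {
        "title": "SenseLab Neuron Ontology",
        "subjects": {
            "http://dbpedia.org/resource/Neurotransmitter_receptor",
            "http://dbpedia.org/resource/Sensory_receptor",
        },
        "sparqlEndpoint": "http://hcls.deri.org:8080/",
    },
    {
        # Hypothetical second dataset, included only to show the filtering.
        "title": "Example dataset",
        "subjects": {"http://example.org/topic/other"},
        "sparqlEndpoint": "http://example.org/sparql",
    },
]

endpoints = select_endpoints(
    datasets, "http://dbpedia.org/resource/Neurotransmitter_receptor"
)
print(endpoints)
```

A federated query engine could use a lookup of this kind to decide which endpoints are worth contacting before sending any local queries.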
voiD Deployment
– Deploy a voiD file (in Turtle, RDF/XML, or RDFa format) onto the Web server
– Make it accessible to search engines, such as Sindice (http://sindice.com/)
– Publish a Semantic Sitemap file (sitemap.xml) on the server
“...... allows Data publishers to state where documents containing RDF data are located, and to advertise alternative means to access it ......” [1]
– Use the datasetURI property in the sitemap.xml to point to the voiD description of a dataset, e.g., http://neuroweb.med.yale.edu/senselab/senselab-void.ttl#senselabontology
[1] http://sw.deri.org/2007/07/sitemapextension/
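The sitemap step can be sketched as follows. This is a minimal example that builds a sitemap.xml whose dataset entry points at the voiD description. The element names (sc:dataset, sc:datasetLabel, sc:datasetURI, sc:sparqlEndpointLocation) and the extension namespace follow my reading of the Semantic Sitemap extension [1] and should be checked against it; the function name build_semantic_sitemap is hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Namespace of the Semantic Sitemap extension (assumed; verify against [1]).
SC_NS = "http://sw.deri.org/2007/07/sitemapextension/scschema.xsd"


def build_semantic_sitemap(label, dataset_uri, sparql_endpoint=None):
    """Return sitemap XML with one dataset entry pointing at a voiD description."""
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("sc", SC_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    dataset = ET.SubElement(urlset, f"{{{SC_NS}}}dataset")
    ET.SubElement(dataset, f"{{{SC_NS}}}datasetLabel").text = label
    ET.SubElement(dataset, f"{{{SC_NS}}}datasetURI").text = dataset_uri
    if sparql_endpoint:
        ET.SubElement(dataset, f"{{{SC_NS}}}sparqlEndpointLocation").text = sparql_endpoint
    return ET.tostring(urlset, encoding="unicode")


sitemap_xml = build_semantic_sitemap(
    "SenseLab Neuron Ontology",
    "http://neuroweb.med.yale.edu/senselab/senselab-void.ttl#senselabontology",
    sparql_endpoint="http://hcls.deri.org:8080/",
)
print(sitemap_xml)
```

The resulting string would be saved as sitemap.xml at the root of the server so that crawlers can find the voiD file from it.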
URI Issues
Proliferation of synonymous URIs
– http://dbpedia.org/resource/Dopamine_receptor
– http://purl.org/ycmi/senselab/neuron_ontology.owl#Dopaminergic_Receptor
Potential problems
– Performance
– Maintenance
Possible solutions
– Involvement of a nomenclature committee (e.g., IUPHAR) and a domain authority (e.g., Neuroscience Information Framework or NIF)
– Persistent/permanent URI scheme (e.g., PURL), e.g., http://purl.org/nif/ontology/NIF-Molecule.owl#nifext_5832
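Until an authority settles on one identifier, an application has to reconcile synonymous URIs itself. The sketch below shows one possible approach: a small union-find structure that maps every known alias to a single canonical URI. The class name UriAliases and the choice of the DBpedia URI as the preferred form are arbitrary assumptions for illustration.

```python
class UriAliases:
    """Track groups of synonymous URIs and resolve each to one canonical URI."""

    def __init__(self):
        self._parent = {}

    def _find(self, uri):
        self._parent.setdefault(uri, uri)
        while self._parent[uri] != uri:
            # Path halving keeps later lookups short.
            self._parent[uri] = self._parent[self._parent[uri]]
            uri = self._parent[uri]
        return uri

    def declare_same(self, preferred, alias):
        """Record that alias denotes the same thing; preferred's canonical URI wins."""
        root_preferred, root_alias = self._find(preferred), self._find(alias)
        if root_preferred != root_alias:
            self._parent[root_alias] = root_preferred

    def canonical(self, uri):
        """Return the canonical URI for uri (uri itself if no alias is known)."""
        return self._find(uri)


aliases = UriAliases()
aliases.declare_same(
    "http://dbpedia.org/resource/Dopamine_receptor",  # arbitrary preferred form
    "http://purl.org/ycmi/senselab/neuron_ontology.owl#Dopaminergic_Receptor",
)
print(aliases.canonical(
    "http://purl.org/ycmi/senselab/neuron_ontology.owl#Dopaminergic_Receptor"
))
```

Every query then pays for this extra lookup and the alias table has to be kept current, which is one way to picture the performance and maintenance problems listed above.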
Dereferenceable URIs
– A dereferenceable URI is a resource identification mechanism that uses the HTTP protocol to obtain a representation of the resource it identifies
– For Linked Data, the representation takes the form of an information resource that describes the resource that the URI identifies.
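In practice, dereferencing is an HTTP GET with an Accept header asking for an RDF format. The sketch below uses only the Python standard library. The function names and the particular Accept value are assumptions, and whether a given server honours content negotiation has to be checked per server.

```python
from urllib.parse import urldefrag
from urllib.request import Request, urlopen

# Media types requested, in order of preference (an illustrative choice).
RDF_ACCEPT = "text/turtle, application/rdf+xml;q=0.9"


def build_dereference_request(uri):
    """Build an HTTP GET request asking for an RDF representation of uri."""
    # The fragment (after '#') is never sent to the server, so strip it first.
    document_url, _fragment = urldefrag(uri)
    return Request(document_url, headers={"Accept": RDF_ACCEPT})


def dereference(uri, timeout=10):
    """Fetch the description of uri (network call; urlopen follows redirects)."""
    with urlopen(build_dereference_request(uri), timeout=timeout) as response:
        return response.headers.get("Content-Type"), response.read()


request = build_dereference_request(
    "http://purl.org/ycmi/senselab/neuron_ontology.owl#NMDA"
)
print(request.full_url, request.get_header("Accept"))
```

For a hash URI such as the one above, the client retrieves the whole ontology document and then looks up the fragment identifier inside it.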
Future Directions
– Submit a paper describing the query federation work to a journal or conference
– Continue and extend current tasks: Query Federation and aTag
– Add new tasks, e.g., semantic wiki, workflow, user interface, …
– Expand the HCLS KB (both instances), e.g., new datasets such as UMLS
– Collaborate with other task forces, e.g., LODD (natural alternative use case, Faviki) and SWAN/SIOC
The End