CambridgeSemanticWebGatherings$$Meeting$$2008-11

Introduction to the W3C Semantic Web for
Health Care and Life Sciences Interest Group
Eric Prud’hommeaux
What is the Mission of HCLS IG?
The mission of HCLS is to develop, advocate for, and
support the use of Semantic Web technologies for
biological science, translational medicine and health
care. These domains stand to gain tremendous benefit
by adoption of Semantic Web technologies, as they
depend on the interoperability of information from many
domains and processes for efficient decision support.
Task Forces
• Terminology – Semantic Web representation of existing resources
  • Task lead - John Madden
• BioRDF – integrated neuroscience knowledge base
  • Task lead - Kei Cheung
• Linking Open Drug Data – aggregation of Web-based drug data
  • Task lead - Chris Bizer
• Scientific Discourse – building communities through networking
  • Task leads - Tim Clark, John Breslin
• Clinical Observations Interoperability – patient recruitment in trials
  • Task lead - Vipul Kashyap
• Other Projects: Clinical Decision Support, URI Workshop,
Collaborations with CDISC & HL7
Terminology: Overview
• Goal is to identify use cases and methods for extracting
Semantic Web representations from existing, standard
medical record terminologies, e.g. UMLS
• Methods should be reproducible and, to the extent
possible, not lossy
• Identify and document issues along the way related to
identification schemes, expressiveness of the relevant
languages
• Initial effort will start with SNOMED-CT and UMLS
Semantic Networks and focus on a particular subdomain (e.g. pharmacological classification)
BioRDF: Answering Questions
Goals: Get answers to questions posed to a body of
collective knowledge in an effective way
Knowledge used: Publicly available databases, and text
mining
Strategy: Integrate knowledge using careful modeling,
exploiting Semantic Web standards and technologies
BioRDF: Looking for Targets for Alzheimer’s
• Signal transduction pathways are
considered to be rich in “druggable”
targets
• CA1 Pyramidal Neurons are
known to be particularly damaged
in Alzheimer’s disease
• Casting a wide net, can we find
candidate genes known to be
involved in signal transduction and
active in Pyramidal Neurons?
Source: Alan Ruttenberg
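The question above amounts to a join across two kinds of annotation. A minimal sketch in Python, assuming toy in-memory triples; the gene names (geneA, geneB, geneC) and the predicate names (involvedIn, activeIn) are hypothetical placeholders, not data or vocabulary from the BioRDF knowledge base:

```python
# Toy (subject, predicate, object) triples; every name here is a
# hypothetical placeholder, not data from the knowledge base.
TRIPLES = [
    ("geneA", "involvedIn", "signal transduction"),
    ("geneA", "activeIn", "CA1 pyramidal neuron"),
    ("geneB", "involvedIn", "signal transduction"),
    ("geneB", "activeIn", "Purkinje cell"),
    ("geneC", "involvedIn", "lipid metabolism"),
    ("geneC", "activeIn", "CA1 pyramidal neuron"),
]


def subjects(triples, predicate, obj):
    """Return the set of subjects that carry the given predicate and object."""
    return {s for s, p, o in triples if p == predicate and o == obj}


def candidate_genes(triples):
    """Genes involved in signal transduction AND active in CA1 pyramidal neurons."""
    in_pathway = subjects(triples, "involvedIn", "signal transduction")
    in_neuron = subjects(triples, "activeIn", "CA1 pyramidal neuron")
    return sorted(in_pathway & in_neuron)


print(candidate_genes(TRIPLES))  # ['geneA']
```

Only geneA satisfies both conditions; once the annotations share identifiers, the two sets can come from different source databases.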
BioRDF: Integrating Heterogeneous Data
Data sources shown: PDSPki, Gene Ontology, NeuronDB, Reactome, BAMS,
Antibodies, Entrez Gene, Allen Brain Atlas, MESH, Literature,
Mammalian Phenotype, SWAN, AlzGene, BrainPharm, PubChem, Homologene
Source: Susie Stephens
BioRDF: SPARQL Query
Source: Alan Ruttenberg
BioRDF: Results: Genes, Processes
Gene, ID        Process
DRD1, 1812      adenylate cyclase activation
ADRB2, 154      adenylate cyclase activation
ADRB2, 154      arrestin mediated desensitization of G-protein coupled receptor protein signaling pathway
DRD1IP, 50632   dopamine receptor signaling pathway
DRD1, 1812      dopamine receptor, adenylate cyclase activating pathway
DRD2, 1813      dopamine receptor, adenylate cyclase inhibiting pathway
GRM7, 2917      G-protein coupled receptor protein signaling pathway
GNG3, 2785      G-protein coupled receptor protein signaling pathway
GNG12, 55970    G-protein coupled receptor protein signaling pathway
DRD2, 1813      G-protein coupled receptor protein signaling pathway
ADRB2, 154      G-protein coupled receptor protein signaling pathway
CALM3, 808      G-protein coupled receptor protein signaling pathway
HTR2A, 3356     G-protein coupled receptor protein signaling pathway
DRD1, 1812      G-protein signaling, coupled to cyclic nucleotide second messenger
SSTR5, 6755     G-protein signaling, coupled to cyclic nucleotide second messenger
MTNR1A, 4543    G-protein signaling, coupled to cyclic nucleotide second messenger
CNR2, 1269      G-protein signaling, coupled to cyclic nucleotide second messenger
HTR6, 3362      G-protein signaling, coupled to cyclic nucleotide second messenger
GRIK2, 2898     glutamate signaling pathway
GRIN1, 2902     glutamate signaling pathway
GRIN2A, 2903    glutamate signaling pathway
GRIN2B, 2904    glutamate signaling pathway
ADAM10, 102     integrin-mediated signaling pathway
GRM7, 2917      negative regulation of adenylate cyclase activity
LRP1, 4035      negative regulation of Wnt receptor signaling pathway
ADAM10, 102     Notch receptor processing
ASCL1, 429      Notch signaling pathway
HTR2A, 3356     serotonin receptor signaling pathway
ADRB2, 154      transmembrane receptor protein tyrosine kinase activation (dimerization)
PTPRG, 5793     transmembrane receptor protein tyrosine kinase signaling pathway
EPHA4, 2043     transmembrane receptor protein tyrosine kinase signaling pathway
NRTN, 4902      transmembrane receptor protein tyrosine kinase signaling pathway
CTNND1, 1500    Wnt receptor signaling pathway
Many of the genes are related to AD through gamma secretase (presenilin) activity
Source: Alan Ruttenberg
LODD: Introduction
Use Semantic Web technologies to
1. publish structured data on the Web
2. set links between data from one data source to data within other data sources
[Figure: Linked Data Browsers, Linked Data Mashups and Search Engines over data sources A, B, C, D and E, whose Things are connected by typed links]
Source: Chris Bizer
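The two steps above (publish structured data, then set typed links between sources) can be sketched with plain triples. This is a minimal illustration in Python, not the LODD tooling; every URI, the link types (testedIn, location) and the follow helper are hypothetical:

```python
# Each source publishes triples about its own Things. A typed link is a
# triple whose object is a Thing in another source. All URIs are hypothetical.
VOCAB = "http://vocab.example/"

SOURCE_A = [
    ("http://a.example/drug/1", VOCAB + "name", "ExampleDrug"),
    ("http://a.example/drug/1", VOCAB + "testedIn", "http://b.example/trial/7"),
]
SOURCE_B = [
    ("http://b.example/trial/7", VOCAB + "location", "http://c.example/place/3"),
]
SOURCE_C = [
    ("http://c.example/place/3", VOCAB + "name", "ExampleCity"),
]


def follow(start, path, triples):
    """Follow a sequence of link types from a start URI; return reachable objects."""
    frontier = {start}
    for predicate in path:
        frontier = {o for s, p, o in triples if s in frontier and p == predicate}
    return frontier


# Merging the sources is simple concatenation because the Things share URIs.
merged = SOURCE_A + SOURCE_B + SOURCE_C
places = follow(
    "http://a.example/drug/1",
    [VOCAB + "testedIn", VOCAB + "location"],
    merged,
)
print(places)  # {'http://c.example/place/3'}
```

A browser, mashup or search engine built on such data only needs to traverse link types; it does not need to know which source a Thing came from.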
LODD: Potential Links between Data Sets
Source: Chris Bizer
LODD: Data Set Evaluation
Source: Chris Bizer
LODD: Potential questions to answer
• Physicians and Pharmacists
  • What are alternative drugs for a given indication (disease)?
  • What are equivalent drugs (generic version of a brand name, or the
    chemical name of an active ingredient)?
  • Are there ongoing clinical trials for a drug?
• Patients
  • What background information is available about a drug?
  • What are the contraindications of a drug?
  • Which alternative drugs are available?
  • What are the results of clinical trials for a drug?
• Pharmaceutical Companies
  • What are other companies with drugs in similar areas?
  • Which companies have a similar therapeutic focus?
Source: Chris Bizer
LODD: Linked Version of ClinicalTrials.gov
• Total number of triples: 6,998,851
• Number of Trials: 61,920
• RDF links to other data sources: 177,975
• Links to:
  • DBpedia and YAGO (from intervention and conditions)
  • GeoNames (from locations)
  • Bio2RDF.org's PubMed (from references)
Source: Chris Bizer
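For a sense of scale, the figures above can be turned into per-trial averages. A short Python calculation using only the three numbers on the slide; the variable names are illustrative:

```python
# Figures quoted on the slide.
triples = 6_998_851
trials = 61_920
links = 177_975

triples_per_trial = triples / trials   # average size of one trial description
links_per_trial = links / trials       # average outgoing links per trial
link_share = links / triples           # fraction of triples that are outgoing links

print(f"{triples_per_trial:.1f} triples per trial")      # 113.0
print(f"{links_per_trial:.2f} outgoing links per trial")  # 2.87
print(f"{100 * link_share:.1f}% of triples are links")    # 2.5%
```

These are averages only; the slide does not say how links are distributed across individual trials.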
LODD: Mashing Clinical Trials and Geo
[Figure labels: Classification of Places; Geo Coordinates]
Source: Chris Bizer
Scientific Discourse: Overview
Source: Tim Clark
Scientific Discourse: Goals
• Provide a Semantic Web platform for scientific
discourse in biomedicine
  • Linked to – key concepts, entities and knowledge
  • Specified – by ontologies
  • Integrated with – existing software tools
  • Useful to – Web communities of working scientists
Source: Tim Clark
Scientific Discourse: Some Parameters
• Discourse categories: research questions, scientific assertions
or claims, hypotheses, comments and discussion, and evidence
• Biomedical categories: genes, proteins, antibodies, animal
models, laboratory protocols, biological processes, reagents,
disease classifications, user-generated tags, and bibliographic
references
• Driving biological project: cross-application of discoveries,
methods and reagents in stem cell, Alzheimer and Parkinson
disease research
• Informatics use cases: interoperability of web-based research
communities with (a) each other (b) key biomedical ontologies (c)
algorithms for bibliographic annotation and text mining (d) key
resources
Source: Tim Clark
Scientific Discourse: SWAN+SIOC
• SIOC
  • Represent activities and contributions of online communities
  • Integration with blogging, wiki and CMS software
  • Use of existing ontologies, e.g. FOAF, SKOS, DC
• SWAN
  • Represents scientific discourse (hypotheses, claims, evidence,
    concepts, entities, citations)
  • Used to create the SWAN Alzheimer knowledge base
  • Active beta participation of 144 Alzheimer researchers
  • Ongoing integration into SCF Drupal toolkit
Source: Tim Clark
Scientific Discourse: SIOC Ontology
Source: John Breslin
Scientific Discourse: SWAN KB
Source: Tim Clark
COI: Bridging Bench to Bedside
• How can existing Electronic Health Records (EHR)
formats be reused for patient recruitment?
• Quasi-standard formats for clinical data:
• HL7/RIM/DCM – healthcare delivery systems
• CDISC/SDTM – clinical trial systems
• How can we map across these formats?
• Can we ask questions in one format when the data is represented in
another format?
Source: Holger Stenzhorn
COI: Use Case
Pharmaceutical companies pay a lot to test drugs
Pharmaceutical companies express protocol in CDISC
-- precipitous gap --
Hospitals exchange information in HL7/RIM
Hospitals have relational databases
Source: Eric Prud’hommeaux
Inclusion Criteria
Type 2 diabetes on diet and exercise therapy or
monotherapy with metformin, insulin
secretagogue, or alpha-glucosidase inhibitors, or
a low-dose combination of these at 50%
maximal dose. Dosing is stable for 8 weeks prior
to randomization.
…
?patient takes metformin .
Source: Holger Stenzhorn
Exclusion Criteria
Use of warfarin (Coumadin), clopidogrel
(Plavix) or other anticoagulants.
…
?patient doesNotTake anticoagulant .
Source: Holger Stenzhorn
Criteria in SPARQL
?medication1 sdtm:subject ?patient ;
             spl:activeIngredient ?ingredient1 .
?ingredient1 spl:classCode 6809 . # metformin
OPTIONAL {
  ?medication2 sdtm:subject ?patient ;
               spl:activeIngredient ?ingredient2 .
  ?ingredient2 spl:classCode 11289 . # anticoagulant
} FILTER (!BOUND(?medication2))
Source: Holger Stenzhorn
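The OPTIONAL { ... } FILTER (!BOUND(...)) pattern above expresses negation: keep a patient who takes metformin only if no anticoagulant medication can be matched for that patient. A minimal Python sketch of the same logic, assuming a flattened list of (patient, class code) records; the class codes 6809 and 11289 come from the slide, while the patient identifiers and the eligible function are hypothetical:

```python
# Class codes as given in the query on the slide.
METFORMIN = 6809
ANTICOAGULANT = 11289

# Hypothetical records: (patient, class code of a medication's active ingredient).
MEDICATIONS = [
    ("patient1", METFORMIN),
    ("patient2", METFORMIN),
    ("patient2", ANTICOAGULANT),
    ("patient3", ANTICOAGULANT),
]


def eligible(medications):
    """Patients on metformin for whom no anticoagulant record exists."""
    on_metformin = {p for p, code in medications if code == METFORMIN}
    on_anticoagulant = {p for p, code in medications if code == ANTICOAGULANT}
    # Set difference plays the role of OPTIONAL + FILTER (!BOUND(?medication2)).
    return sorted(on_metformin - on_anticoagulant)


print(eligible(MEDICATIONS))  # ['patient1']
```

patient2 meets the inclusion criterion but is removed by the exclusion criterion; patient3 never meets the inclusion criterion.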
Getting Involved
• Benefits to getting involved include:
  • early access to use cases and best practices
  • influence on standards recommendations
  • cost-effective exploration of new technology through collaboration
• Get involved by contacting the chairs:
  • [email protected]