HCLS Workshop @ ISWC
Eric Neumann and Tonya Hongsermeier
University of Georgia, Nov 6, 2006
W3C Semantic Web
for HealthCare and Life Sciences Interest Group
Launched Nov 2005: http://www.w3.org/2001/sw/hcls
Co-chairs: Dr. Tonya Hongsermeier (Partners HealthCare); Eric Neumann (Teranode)
Chartered to develop and support the use of SW technologies and practices
to improve collaboration, research and development, and innovation
adoption in the Health Care and Life Science domains
Based on a foundation of semantically rich specifications that support
process and information interoperability
HCLS Objectives:
Core vocabularies and ontologies to support cross-community data integration and
collaborative efforts
Guidelines and Best Practices for Resource Identification to support integrity and
version control
Better integration of Scientific Publication with people, data, software, publications,
and clinical trials
HCLS Philosophy
• Share use-cases, applications, demonstrations,
experiences
• Expose collections as RDF using public tools
• Develop (where appropriate) core vocabularies
for data integration
HCLS Activities
• BioRDF - data + NLP as RDF
• BioONT - ontology coordination
• Adaptive Clinical Protocols and Pathways
• Drug Safety and Efficacy
• Scientific Publishing - evidence management
Outline
• Basic Informatics Challenges
• Bench-to-Bedside Applications
• What is the Semantic Web?
• Current Activities… Case Studies
Drug Discovery
and Medicine
• Health
• Practice
• Safety
• Prevention
• Privacy
• Knowledge
Hygieia, G. Klimt
Data Expansion
Large Data Sets
Variables >> Samples
Many New
Data Types
Combine
Which Formats?
Where Information Advances are Most
Needed
Supporting Innovative Applications in R&D
Translational Medicine (Biomarkers)
Molecular Mechanisms (Systems)
Data Provenance, Rich Annotation
Clinical Information
eHealth Records, EDC, Clinical Submission Documents
Safety Information, Pharmacovigilance, Adverse Events,
Biomarker data
Standards
Central Data Sources
• Genomics, Diseases, Chemistry, Toxicology
MetaData
• Ontologies
• Vocabularies
The Big Picture -
Hard to understand from
just a few Points of View
Complete view tells a very different Story
Distributed Nature of R&D
Silos of Data…
Data Integration:
Biology Requirements
Papers
Disease
Proteins
Genes
Retention
Policy
Assays
Compounds
Audit
Trail
Curation
Tools
Ontology Experiment
New Regulatory Issues Confronting
Pharmaceuticals
Tox/Efficacy
ADME Optim
from Innovation or Stagnation, FDA Report March 2004
Translational Medicine in Drug R&D
Early
Middle
Late
Cellular
Systems
Human
In Vitro Studies
Animal Studies
Clinical Studies
Disease Models (Therapeutic Relevance)
Toxicities
Target/System Efficacy
$500K
$5M
$500M
Translational Research
• Improve communication between basic and clinical science so
that more therapeutic insights may be derived from new
scientific ideas - and vice versa.
• Testing of theories emerging from preclinical experimentation
on disease-affected human subjects.
• Information obtained from preliminary human experimentation
can be used to refine our understanding of the biological
principles underpinning the heterogeneity of human disease and
polymorphism(s).
• http://www.translational-medicine.com/info/about
Reference NIH Digital Roadmap activity
HCLS Framework:
Biomedical Research
Molecular, Cellular and Systems Biology/Physiology
Organism as an integrated and interacting network of genes, proteins and
biochemical reactions
Human body as a system of interacting organs
Molecular Cell Biology/Genomic and Proteomic Research
Gene Sequencing, Genotyping, Protein Structures
Cell Signaling and other Pathways
Biomarker Research
Discovery of genes and gene products that can be used to measure
disease progression or impacts of drug
Pharmaco-genomics
Impact of genetic inheritance on
Drug Discovery and Translational Research
Use of preclinical research to identify promising drug candidates
HCLS Framework:
Clinical Research
Clinical Trials
Determination of efficacy, impact and safety of drugs for
particular diseases
Pharmaco-vigilance/ADE Surveillance
Monitoring of impacts of drugs on patients, especially safety and
adverse event related information
Patient Cohort Identification and Management
Identifying patient cohorts for drug trials is a challenging task
Translational Research
Test theories emerging from pre-clinical experimentation on
disease affected human subjects
Development of EHRs/EMRs for both clinical research and
practice
Currently EHRs/EMRs are focused on clinical workflow processes
Re-using that information for clinical research and trials is a
challenging task
Ecosystem: Goal State
/* Need to expand this with Biomedical Research + Clinical Practice */
Biomedical Research
Clinical Practice
/* Need to expand this to include Healthcare and Biomedical Research
Players as well… Show an integrated picture with “continuous” information
flow */
What is the Semantic Web?
It’s Semantic Webs
It’s Text Extraction
It’s AI
It’s Web 2.0
It’s Data Tracking
It’s a Global Conspiracy
It’s Ontologies
http://www.w3.org/2006/Talks/0125-hclsig-em/
The Current Web
What the computer sees:
“Dumb” links
No semantics - <a href> treated
just like <bold>
Minimal machine-processable
information
The Semantic Web
Machine-processable semantic
information
Semantic context published –
making the data more
informative to both
humans and machines
Understanding the Semantic Web
• Vision
Some day in the future…
Today: describing data
• Core Concept: TRIPLES…
Subject: <Patient HB2122>
Property: <shows_sign>
Object: <Disease Pneumococcal_Meningitis>
• Specifications
RDF, OWL, GRDDL. Coming soon: SPARQL, RIF
• Applications
Data Aggregation: Recombinant Data
Statements: Annotating things
• Practices
Everything gets a URI…
New definition of Data Interoperability:
• DTA: Data Transit Authority
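The subject/property/object statement on this slide can be sketched in a few lines of Python. This is only an illustration: the `Triple` type and the `objects_of` helper are names invented here, and the identifiers are kept as the plain labels from the slide, whereas a real RDF store would use full URIs.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement: a subject, a property, and an object."""
    subject: str
    prop: str
    obj: str


# The example statement from the slide, using its labels verbatim.
statement = Triple("Patient HB2122", "shows_sign", "Disease Pneumococcal_Meningitis")


def objects_of(triples, subject, prop):
    """Return every object asserted for the given subject and property."""
    return [t.obj for t in triples if t.subject == subject and t.prop == prop]


signs = objects_of([statement], "Patient HB2122", "shows_sign")
```

Because each statement is self-describing, further statements about the same patient can be added to the list without changing any schema.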
Application Space :
Semantic Web Drug DD
Therapeutics
Critical Path
Chem Lib
manufacturing
NDA
Production
Genomics
Clinical
Studies
HTS
eADME
Biology
Compound
Opt
DMPK
genes
Patent
informatics
URI - A key element
• Uniform Resource Identifier
• Specification used in HTML, XML, and RDF-OWL
• Fundamental to RDF: It IS the only valid SW identifier!
• Two forms:
HTTP- http://biopax.org/pathway/kreb_cycle.owl
URN- urn:lsid:biopax.org:pathway:kreb_cycle
• Resolution
Mapping retrievable data to a URI
Does not mean getting everything known about a URI
Not clear how to best handle versioning
See Alan’s slides…
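As a rough illustration of the two identifier forms, the sketch below tells an HTTP URI from an LSID URN and splits the LSID into its colon-separated parts (authority, namespace, object, optional revision). The function names are assumptions made for this example, and no resolution is attempted: the code only inspects the identifier strings.

```python
from urllib.parse import urlparse


def identifier_form(uri):
    """Classify an identifier as 'http', 'lsid', or 'other' by its scheme."""
    scheme = urlparse(uri).scheme.lower()
    if scheme in ("http", "https"):
        return "http"
    if scheme == "urn" and uri.lower().startswith("urn:lsid:"):
        return "lsid"
    return "other"


def parse_lsid(urn):
    """Split urn:lsid:authority:namespace:object[:revision] into named parts."""
    parts = urn.split(":")
    if len(parts) < 5 or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError("not an LSID: " + urn)
    return {
        "authority": parts[2],
        "namespace": parts[3],
        "object": parts[4],
        "revision": parts[5] if len(parts) > 5 else None,
    }


http_form = identifier_form("http://biopax.org/pathway/kreb_cycle.owl")
lsid_parts = parse_lsid("urn:lsid:biopax.org:pathway:kreb_cycle")
```

The optional revision slot is the only place where the LSID syntax itself addresses versioning; the slide's open question about how best to handle versions is not settled by this sketch.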
REST-fulness
• REST is a term coined by Roy Fielding to describe an
architecture style of networked systems. REST is an acronym
standing for Representational State Transfer.
http://www.molbio.org/gene (get gene list)
http://www.molbio.org/gene/hugsk3b (get gene info)
• Can REST == URI, and if so, when?
Yes, if we agree return function is identical to URI
resolution
• Issues:
Should it return RDF always? - standardized
Resolution is only a subset of services, how do we handle
non-resolution services: are these URI’s as well?
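The idea that a REST call and URI resolution can be the same operation might be sketched as follows. Everything here is hypothetical: `GENES` is a placeholder in-memory store, `resolve` is an illustrative name, and no network request is made to the example host.

```python
from urllib.parse import urlparse

# Placeholder records keyed by the last path segment of the gene URI.
GENES = {"hugsk3b": {"label": "hugsk3b"}}


def resolve(uri):
    """Return the representation that a GET on this URI would yield."""
    segments = [s for s in urlparse(uri).path.split("/") if s]
    if segments == ["gene"]:
        # Collection URI: resolves to the list of known gene identifiers.
        return sorted(GENES)
    if len(segments) == 2 and segments[0] == "gene":
        # Item URI: resolves to the description of a single gene.
        return GENES.get(segments[1])
    return None


gene_list = resolve("http://www.molbio.org/gene")
gene_info = resolve("http://www.molbio.org/gene/hugsk3b")
```

A service that does more than return a description (a search, for example) has no entry in this mapping, which is the open issue the slide raises about non-resolution services.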
Opportunities for Semantics in HealthCare
Enhanced interoperability via:
Semantic Tagging
Grounding of concepts in Standardized Vocabularies
Complex Definitions
Semantics-based Observation Capture
Inference on Diseases
Phenotypes
Genetics
Mechanisms
Semantics-based Clinical Decision Support
Guided Data Interpretation
Guided Ordering
Semantics-based Knowledge Management
Data Semantics in the Life Sciences
Pathways,
Biomarkers
Publications
Publications + data
Image +
Text
Text
Data Items
Text + data
items
Histology Profiling
Data Items
genomics
Systems
Biology
Complex
Objects with
Categorical/
Taxonomic
Data Items
Gene expression
Categorical
Taxonomic
Data Items
Complex
Objects
Clinical Findings
Composite
Objects with
Embedded
“process”
Clinical trials
Unstructured
Data Types
Structured
and Complex Data Types
RDB => RDF
Virtualized RDF
XML => RDF (GRDDL)
XSL
XML
RDF
GRDDL
RDFa: Bridging the Hypertext and
Semantic Webs
<div xmlns:cc="http://web.resource.org/cc/"
xmlns:dc="http://purl.org/dc/1.1/"
about="photo2.jpg">
This photo was taken by
<span property="dc:creator">Ben Adida</span>
and is licensed under a
<a rel="cc:license"
href="http://cc.org/licenses/by/2.5/">
Creative Commons License
</a>.
</div>
Resulting statements:
photo2.jpg dc:creator Ben Adida
photo2.jpg cc:license licenses/by/2.5/
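To show how the markup above yields statements, here is a minimal sketch using Python's standard `html.parser`. It is not a conforming RDFa processor: it handles only the `about`, `property`, and `rel`/`href` attributes used in this snippet and leaves prefixes such as `dc:` unexpanded. The class name `RDFaSketch` is an assumption of this example.

```python
from html.parser import HTMLParser


class RDFaSketch(HTMLParser):
    """Collect (subject, property, value) tuples from a small RDFa snippet."""

    def __init__(self):
        super().__init__()
        self.subject = None
        self.triples = []
        self._pending = None  # property waiting for its text value

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if "about" in a:
            self.subject = a["about"]
        if "property" in a:
            self._pending = a["property"]
        if "rel" in a and "href" in a:
            self.triples.append((self.subject, a["rel"], a["href"]))

    def handle_data(self, data):
        # The first non-blank text after a property attribute is its literal value.
        if self._pending and data.strip():
            self.triples.append((self.subject, self._pending, data.strip()))
            self._pending = None


SNIPPET = """<div xmlns:cc="http://web.resource.org/cc/"
     xmlns:dc="http://purl.org/dc/1.1/" about="photo2.jpg">
This photo was taken by
<span property="dc:creator">Ben Adida</span>
and is licensed under a
<a rel="cc:license" href="http://cc.org/licenses/by/2.5/">Creative Commons License</a>.
</div>"""

parser = RDFaSketch()
parser.feed(SNIPPET)
parser.close()
triples = parser.triples
```

The same page therefore serves human readers as hypertext and machines as two statements about `photo2.jpg`.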
Example:
Knowledge
Aggregation
Courtesy of
BG-Medicine
Case Study: Omics
ApoA1 …
… is produced by the Liver
… is expressed less in Atherosclerotic Liver
… is correlated with DKK1
… is cited regarding Tangier’s disease
… has Tx Reg elements like HNFR1
Subject Verb Object
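The aggregation in this case study can be sketched as merging subject-verb-object statements from separate sources under their shared subject. The source names (`expression`, `correlation`, `literature`, `regulation`) and the verb spellings are assumptions made for illustration; the statements themselves are the ones listed on the slide.

```python
from collections import defaultdict

# Statements from the slide, grouped under hypothetical source names.
sources = {
    "expression": [
        ("ApoA1", "is_produced_by", "Liver"),
        ("ApoA1", "is_expressed_less_in", "Atherosclerotic Liver"),
    ],
    "correlation": [("ApoA1", "is_correlated_with", "DKK1")],
    "literature": [("ApoA1", "is_cited_regarding", "Tangier's disease")],
    "regulation": [("ApoA1", "has_tx_reg_element", "HNFR1")],
}


def aggregate(sources):
    """Merge all statements by subject, remembering which source supplied each."""
    merged = defaultdict(set)
    for source_name, triples in sources.items():
        for subject, verb, obj in triples:
            merged[subject].add((verb, obj, source_name))
    return merged


apoa1_facts = aggregate(sources)["ApoA1"]
```

No source needs to know about the others; the shared subject identifier is what lets the facts line up.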
Knowledge Mining using Semantic Web
“Gene prioritization through genomic data
fusion”
Aerts et al., 2006, Nature Biotechnology
Use of quantitative and qualitative
information for statistical ranking.
Can be used to identify novel
genes involved in diseases
Potential Linked Clinical Ontologies
Clinical Obs
Disease
Descriptions
SNOMED
Applications CDISC
ICD10
RCRIM
(HL7)
Clinical Trials Disease
Models
ontology
Mechanisms
Pathways
(BioPAX)
IRB
Tox
Extant ontologies
Genomics
Molecules
Under development
Bridge concept
Case Study: BioPAX (Pathways)
<bp:PATHWAYSTEP rdf:ID="xDshToXGSK3bPathwayStep">
  <bp:next-step rdf:resource="#xGSK3bToBetaCateninPathwayStep"/>
  <bp:step-interactions>
    <bp:MODULATION rdf:ID="xDshToXGSK3b">
      <bp:left rdf:resource="#xDsh"/>
      <bp:right rdf:resource="#xGSK-3beta"/>
      <bp:participants rdf:resource="#xGSK-3beta"/>
      <bp:name rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Dishevelled to GSK3beta</bp:name>
      <bp:direction rdf:datatype="http://www.w3.org/2001/XMLSchema#string">IRREVERSIBLE-LEFT-TO-RIGHT</bp:direction>
      <bp:control-type rdf:datatype="http://www.w3.org/2001/XMLSchema#string">INHIBITION</bp:control-type>
      <bp:participants rdf:resource="#xDsh"/>
    </bp:MODULATION>
  </bp:step-interactions>
</bp:PATHWAYSTEP>
Case Study: BioPAX (Pathways)
<bp:PATHWAYSTEP rdf:ID="xDshToXGSK3bPathwayStep">
  <bp:next-step rdf:resource="#xGSK3bToBetaCateninPathwayStep"/>
  <bp:step-interactions>
    <bp:MODULATION rdf:ID="xDshToXGSK3b">
      <bp:left rdf:resource="#xDsh"/>
      <bp:right rdf:resource="#xGSK-3beta"/>
      <bp:participants rdf:resource="#xGSK-3beta"/>
      <bp:name rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Dishevelled to GSK3beta</bp:name>
      <bp:direction rdf:datatype="http://www.w3.org/2001/XMLSchema#string">IRREVERSIBLE-LEFT-TO-RIGHT</bp:direction>
      <bp:control-type rdf:datatype="http://www.w3.org/2001/XMLSchema#string">INHIBITION</bp:control-type>
      <drug:affectedBy rdf:resource="http://pharma.com/cmpd/CHIR99102"/>
      <bp:participants rdf:resource="#xDsh"/>
    </bp:MODULATION>
  </bp:step-interactions>
</bp:PATHWAYSTEP>
Modulation
affectedBy
CHIR99102
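The point of this slide, that a new `drug:affectedBy` link can sit alongside existing BioPAX properties, can be illustrated by reading a shortened copy of the fragment with Python's standard XML parser. The fragment on the slide declares no namespaces, so the wrapper below supplies them: the `rdf` namespace is the standard one, the `bp` URI is assumed to be the BioPAX Level 2 namespace, and the `drug` URI is purely hypothetical. The `links` function is likewise an illustrative name.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BP = "http://www.biopax.org/release/biopax-level2.owl#"  # assumed namespace
DRUG = "http://pharma.com/ns/drug#"  # hypothetical namespace

DOC = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:bp="{BP}" xmlns:drug="{DRUG}">
  <bp:MODULATION rdf:ID="xDshToXGSK3b">
    <bp:left rdf:resource="#xDsh"/>
    <bp:right rdf:resource="#xGSK-3beta"/>
    <bp:control-type>INHIBITION</bp:control-type>
    <drug:affectedBy rdf:resource="http://pharma.com/cmpd/CHIR99102"/>
  </bp:MODULATION>
</rdf:RDF>"""


def links(doc):
    """List (subject id, property name, target) for every rdf:resource link."""
    root = ET.fromstring(doc)
    found = []
    for node in root:
        subject = node.get(f"{{{RDF}}}ID")
        for child in node:
            target = child.get(f"{{{RDF}}}resource")
            if target is not None:
                prop = child.tag.split("}", 1)[1]  # drop the namespace part
                found.append((subject, prop, target))
    return found


modulation_links = links(DOC)
```

Code written before the drug annotation existed would keep working on this data, since it simply never asks for the `affectedBy` property.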
Case Study: Drug Discovery
Dashboards
Dashboards and Project Reports
Next generation browsers for semantic information via Semantic
Lenses
Renders OWL-RDF, XML, and HTML documents
Lenses act as information aggregators and logic style-sheets
add { ls:TheraTopic
hs:classView:TopicView
}
Drug Discovery Dashboard
http://www.w3.org/2005/04/swls/BioDash
Topic: GSK3beta Topic
Disease: DiabetesT2
Alt Dis: Alzheimers
Target: GSK3beta
Cmpd: SB44121
CE: DBP
Team: GSK3 Team
Person: John
Related Set
Path: WNT
Bridging Chemistry and Molecular Biology
Semantic Lenses: Different Views of the same
data
BioPax
Components
Target Model
urn:lsid:uniprot.org:uniprot:P49841
Apply Correspondence Rule:
if ?target.xref.lsid == ?bpx:prot.xref.lsid
then ?target.correspondsTo.?bpx:prot
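The correspondence rule above can be read as a join on the shared LSID reference. The sketch below assumes a simple dictionary layout for target and protein records; the record ids and the second protein's LSID are hypothetical, while the UniProt LSID is the one shown on the slide.

```python
def apply_correspondence(targets, proteins):
    """Emit (target, 'correspondsTo', protein) wherever the xref LSIDs match."""
    by_lsid = {}
    for protein in proteins:
        by_lsid.setdefault(protein["xref_lsid"], []).append(protein["id"])
    return [
        (target["id"], "correspondsTo", protein_id)
        for target in targets
        for protein_id in by_lsid.get(target["xref_lsid"], [])
    ]


targets = [
    {"id": "target:GSK3beta", "xref_lsid": "urn:lsid:uniprot.org:uniprot:P49841"},
]
proteins = [
    {"id": "bpx:xGSK-3beta", "xref_lsid": "urn:lsid:uniprot.org:uniprot:P49841"},
    # Hypothetical LSID, included only to show a non-matching record.
    {"id": "bpx:xDsh", "xref_lsid": "urn:lsid:example.org:protein:placeholder"},
]

correspondences = apply_correspondence(targets, proteins)
```

The two records are never asserted to be the same object; they are linked only because both cite the same external reference.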
Bridging Chemistry and Molecular Biology
• Lenses can aggregate, accentuate,
or even analyze new result sets
• Behind the lens, the data can be
persistently stored as RDF-OWL
• Correspondence does not need
to mean “same descriptive
object”, but may mean objects
with identical references
Pathway Polymorphisms
• Merge directly onto pathway graph
• Identify targets with lowest chance of genetic variance
• Predict parts of pathways with highest functional variability
• Map genetic influence to potential pathway elements
• Select mechanisms of action that are minimally impacted by polymorphisms
Non-synonymous
polymorphisms
from db-SNP
BioRDF Neuro Tasks
• Aggregate facts and models around Parkinson’s
Disease
• BIRN / Human Brain Project
• SWAN: scientific annotations and evidence
• NeuroCommons
• Use RDF and OWL to describe 'Brain Connectivity'
Neuronal data in SenseLab
BioRDF: Reagents
RDF resources that describe various kinds of
experimental reagents, starting with antibodies:
Initial RDF that captures: Gene, the fact that this is an antibody,
various kinds of pages about the antibody, such as vendor
documentation, and any other properties that are explicitly captured
in the source material
Work with the Ontology task force to identify appropriate ontologies
and vocabularies to use in the RDF.
Write queries against the RDF to answer questions of the sort posed
on the Alzforum's
BioRDF: NCBI
• NCBI Data: URIs and as RDF (Olivier Bodenreider)
• Terminology Integration: NLM’s UMLS, MeSH,
SNOMED…
Conclusions:
Key Semantic Web Principles
1. Plan for change
2. Free data from the application that created it
3. Lower reliance on overly complex Middleware
4. The value in "as needed" data integration
5. Big wins come from many little ones
6. The power of links - network effect
7. Open-world, open solutions are cost effective
8. Importance of "Partial Understanding"