Semantic Web Technologies: A
Paradigm for Medical Informatics
Chimezie Ogbuji (Owner, Metacognition LLC.)
http://metacognition.info/presentations/SWTMedicalInformatics.pdf
http://metacognition.info/presentations/SWTMedicalInformatics.ppt
Who I am
Circa 2001: Introduced to web standards and
Semantic Web technologies
2003-2011: Lead architect of the in-house clinical
repository project at CCF (Cleveland Clinic Foundation)
2006-2011: Member representative of CCF in the
World Wide Web Consortium (W3C)
◦ Editor of various standards and chair of the Semantic
Web Health Care and Life Sciences Interest Group
2011-2012: Senior Research Associate at CWRU
Center for Clinical Investigations
2012-current: Started business providing resource
and data management software for home
healthcare agencies (Metacognition LLC)
Medical Informatics Challenges
Semantic interoperability
◦ Exchange of data with common meaning between
sender and receiver
Most of the intended benefits of HIT depend
on interoperability between systems
Difficulties integrating patient record
systems with other information resources
are among the major issues hampering their
effectiveness
◦ Interoperability is a major goal for meaningful use
of Electronic Health Records (EHR)
Rodrigues et al. 2013; Kadry et al. 2010; Shortliffe and Cimino, 2006
Requirements and Solutions
Semantic interoperability requires:
◦ Structured data
◦ A common controlled vocabulary
Solutions emphasize the meaning of data
rather than how they are structured
◦ “Semantic” paradigms
Registries and Research DBs
Patient registries and clinical research
repositories capture data elements in a
uniform manner
The structure of the underlying data
needs to be able to evolve along with the
investigations they support
Thus, schema extensibility is important
Querying Interfaces
Standardized interfaces for querying
facilitate:
◦ Accessibility to clinical information systems
◦ Distributed querying of data from where they
reside
Requires:
◦ Semantically-equivalent data structures
Alternatively, data are centralized in data
warehouses
Austin et al. 2007, “Implementation of a query interface for a generic record server”
Biomedical Ontologies
Ontologies are artifacts that conceptualize a
domain as a taxonomy of classes and
constraints on relationships between their
members
Represented in a particular formalism
Increasingly adopted as a foundation for the
next generation of biomedical vocabularies
Construction involves representing a domain
of interest independently of the behavior of
the applications that use the ontology
An important means of achieving semantic
interoperability
Biomedical Ontology Communities
Prominent examples of adoption by life
science and healthcare terminology
communities:
◦ The Open Biological and Biomedical
Ontologies (OBO) Foundry
◦ Gene Ontology (GO)
◦ National Center for Biomedical Ontology
(NCBO) BioPortal
◦ International Health Terminology Standards
Development Organization (IHTSDO)
Semantic Web and Technologies
The Semantic Web is a vision of how the
existing infrastructure of the World Wide
Web (WWW) can be extended so
that machines can interpret the
meaning of data on it
Semantic Web technologies are the
standards and technologies that have
been developed to achieve the vision
An Analogy
The (technological) singularity is a theoretical
moment when artificial intelligence (AI)
will have progressed to greater-than-human intelligence
Despite remaining in the realm of science
fiction, it has motivated many useful
developments along the way
◦ The use of ontologies for knowledge
representation and IBM Watson capabilities,
for example
Background: Graphs
Graphs are data structures comprising
nodes and edges that connect them
The edges can be directional
Either the nodes, the edges, or both can
be labeled
The labels provide meaning to the graphs
(edge labels in particular)
[Figure: two nodes connected by a labeled edge (Node, edge, Node)]
Resource Description Framework
The Resource Description Framework
(RDF) is a graph-based knowledge
representation language for describing
resources
Its edges are directional and both nodes
and edges are labeled
It uses Uniform Resource Identifiers
(URIs) for labeling
Foundation for Semantic Web
technologies
RDF: Continued
The edges are statements (triples) that go
from a subject to an object
Some objects are text values
Some subjects and objects can be left
unlabeled (Blank nodes)
◦ Anonymous resources: not important to label
them uniquely
The URI of the edge is the predicate
Predicates used together for a common
purpose are a vocabulary
[Figure: example RDF graph with nodes Dr. X, Chime, and the text value "Chimezie Ogbuji", connected by edges labeled author, treats, subject of record, and full name]
Subject: Dr. X (a URI)
Object: Chime
Predicate: treats
Vocabulary:
◦ treats, subject of record, author, and full name
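A minimal sketch of the example as a set of (subject, predicate, object) triples in plain Python. The `http://example.org/` namespace, the `uri` helper, and the `_:record` blank node are hypothetical, introduced only for illustration; in particular, attaching author and subject of record to an unlabeled record node is an assumed reading of the figure.

```python
# Hypothetical namespace; the slides do not give real URIs.
EX = "http://example.org/"

def uri(local_name):
    """Build a URI string in the hypothetical example namespace."""
    return EX + local_name

# Each triple is (subject, predicate, object).
# "_:record" stands in for a blank node (an unlabeled resource).
triples = {
    (uri("DrX"), uri("treats"), uri("Chime")),
    (uri("DrX"), uri("author"), "_:record"),
    ("_:record", uri("subjectOfRecord"), uri("Chime")),
    (uri("Chime"), uri("fullName"), '"Chimezie Ogbuji"'),  # a text value
}

# The vocabulary is the set of predicates used in the graph.
vocabulary = {p for (_, p, _) in triples}

def objects(graph, subject, predicate):
    """Return every object reachable from subject along predicate."""
    return {o for (s, p, o) in graph if s == subject and p == predicate}

patients_of_dr_x = objects(triples, uri("DrX"), uri("treats"))
```

Storing the graph as a set makes duplicate statements collapse automatically, which matches the way an RDF graph is defined as a set of triples.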
RDF vocabularies
How meaning is interpreted from an RDF graph
There are vocabularies that constrain how
predicates are used
◦ Want a sense of treats where the subject is a clinician
and the object is a patient
There is a predicate relating resources to the
classes they are a member of (type)
There are vocabularies that define constraints on
class hierarchies
These comprise a basic RDF Schema (RDFS)
language
Represented as an RDF graph
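[Figure: RDFS example graph. Classes Clinical Diagnosis, Hypertension DX, Physician, Person, and Patient are linked by "is a" edges; Dr. X, Chime, and a diagnosis are linked to classes by "type" edges and to each other by treats, author, and subject of record edges]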
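The way these constraints give meaning to a graph can be sketched with three simplified RDFS-style rules in plain Python: domain, range, and class hierarchy ("is a"). Local names stand in for URIs. The domain and range of treats follow the clinician/patient example above; the class hierarchy is an assumed reading of the figure, and the rule engine is an illustration rather than a complete RDFS implementation.

```python
TYPE, SUBCLASS, DOMAIN, RANGE = "type", "is a", "domain", "range"

schema = {
    ("Physician", SUBCLASS, "Person"),
    ("Patient", SUBCLASS, "Person"),
    ("Hypertension DX", SUBCLASS, "Clinical Diagnosis"),
    ("treats", DOMAIN, "Physician"),  # assumed: subjects of treats are physicians
    ("treats", RANGE, "Patient"),     # assumed: objects of treats are patients
}
data = {("Dr. X", "treats", "Chime")}

def rdfs_closure(graph):
    """Apply domain, range, and subclass rules until nothing new is added."""
    graph = set(graph)
    while True:
        new = set()
        for s, p, o in graph:
            for p2, rel, cls in graph:
                if p2 == p and rel == DOMAIN:
                    new.add((s, TYPE, cls))
                if p2 == p and rel == RANGE:
                    new.add((o, TYPE, cls))
            if p == TYPE:
                for sub, rel, sup in graph:
                    if rel == SUBCLASS and sub == o:
                        new.add((s, TYPE, sup))
        if new <= graph:
            return graph
        graph |= new

inferred = rdfs_closure(schema | data)
```

From the single statement that Dr. X treats Chime, the closure contains the statements that Dr. X is a Physician and a Person and that Chime is a Patient and a Person.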
Ontologies for RDF
The Web Ontology Language (OWL) is
used to describe ontologies for RDF
graphs
More sophisticated constraints than RDFS
Commonly expressed as an RDF graph
Defines the meaning of RDF statements
through constraints:
◦ On their predicates
◦ On the classes the resources they relate
belong to
[Figure: the example graph of classes (Clinical Diagnosis, Hypertension DX, Physician, Person, Patient) and instances (Dr. X, Chime), annotated "Governed by OWL/RDFS for domain"]
OWL Formats
Most common format for describing
ontologies
Distribution format of ontologies in the
NCBO BioPortal
SNOMED CT distributions include an
OWL representation
◦ RDF graphs can describe medical content in a
SNOMED CT-compliant way through the use of
this vocabulary
Validation and Deduction
OWL is based on a formal, mathematical
logic that can be used for validating the
structure of an ontology and RDF data
that conform to it (consistency checking)
Used to deduce additional RDF
statements implied by the meaning of a
given RDF graph (logical inference)
Logical reasoners are used for this
Inference
Can infer anatomical location from
SNOMED CT definitions
[Figure: graph with the classes Hypertension DX and Systemic circulatory system structure, two type edges, and a finding site edge]
Hypertension DX <-> 1201005 / “Benign essential hypertension (disorder)”
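A rough sketch of this kind of deduction in plain Python, assuming the class definition is stored as a simple table of "every member has some value of this predicate in this class" restrictions. The `restrictions` table, the `dx1` identifier, and the blank node naming are hypothetical; a real OWL reasoner works from the ontology's logical axioms rather than from a lookup like this.

```python
# Class -> list of (predicate, filler class) pairs. Every member of the
# class is implied to have some value of `predicate` that belongs to the
# filler class. Names come from the slide; the structure is an assumption.
restrictions = {
    "Hypertension DX": [
        ("finding site", "Systemic circulatory system structure"),
    ],
}

data = {("dx1", "type", "Hypertension DX")}  # dx1 is a hypothetical diagnosis

def infer_restrictions(graph, restrictions):
    """Materialize the implied values as blank nodes typed by the filler class."""
    inferred = set(graph)
    for s, p, o in graph:
        if p != "type":
            continue
        for i, (predicate, filler) in enumerate(restrictions.get(o, [])):
            node = f"_:{s}_{i}"  # anonymous resource for the implied value
            inferred.add((s, predicate, node))
            inferred.add((node, "type", filler))
    return inferred

result = infer_restrictions(data, restrictions)
```

The anatomical location was never stated for `dx1`; it follows only from the diagnosis being typed as Hypertension DX.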
Querying RDF Graphs
SPARQL is the official query language for
RDF graphs
Comparable to relational query languages
◦ Primary difference: it queries RDF triples,
whereas SQL queries tables of arbitrary
dimensions
Includes various web protocols for querying
RDF graphs
Foundation of SPARQL is the triple pattern
(?clinician, treats, ?patient)
◦ ?clinician and ?patient are variables (like a
wildcard)
Which physicians have given essential hypertension diagnoses and to whom?
[Figure: query graph pattern linking ?physician, ?dx, ?patient, and Hypertension DX via author, treats, subject of record, and type edges]
Triple patterns:
(?physician, author, ?dx)
(?physician, treats, ?patient)
(?dx, subject of record, ?patient)
(?dx, type, Hypertension DX)
Results:
?physician | ?patient | ?dx
Dr. X | Chime | …
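How the four triple patterns above combine into one answer can be sketched with a small pattern matcher in plain Python. This is an illustration of conjunctive triple pattern matching, not a SPARQL engine; the `dx1` identifier and the Dr. Y / Pat statement are hypothetical data added so that the query has something to exclude.

```python
def is_var(term):
    """Variables are written with a leading question mark, as in SPARQL."""
    return term.startswith("?")

def match_pattern(pattern, triple, binding):
    """Extend binding so that pattern equals triple, or return None on conflict."""
    binding = dict(binding)
    for term, value in zip(pattern, triple):
        if is_var(term):
            if binding.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return binding

def query(graph, patterns):
    """Evaluate a list of triple patterns as a conjunctive query."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in graph:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings

graph = {
    ("Dr. X", "author", "dx1"),
    ("Dr. X", "treats", "Chime"),
    ("dx1", "subject of record", "Chime"),
    ("dx1", "type", "Hypertension DX"),
    ("Dr. Y", "treats", "Pat"),  # no hypertension diagnosis, so no match
}
patterns = [
    ("?physician", "author", "?dx"),
    ("?physician", "treats", "?patient"),
    ("?dx", "subject of record", "?patient"),
    ("?dx", "type", "Hypertension DX"),
]
results = query(graph, patterns)
```

A variable that appears in more than one pattern, such as ?dx, must take the same value everywhere, which is what joins the patterns together.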
SPARQL over Relational Data
Most common implementations convert
SPARQL to SQL and evaluate over:
◦ a relational database designed for RDF
storage
◦ an existing relational database
There are products for both approaches
Former requires native storage of RDF
◦ Relational structure doesn’t change even as
RDF vocabulary does (schema extensibility)
Elliott et al. 2009, “A Complete Translation from SPARQL into Efficient SQL”
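A minimal sketch of the first approach, assuming a single three-column `triples` table in SQLite (via Python's standard `sqlite3` module). The table layout, the `to_sql` helper, and the sample rows are assumptions for illustration; this is a naive self-join translation, not the algorithm from the cited paper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The relational structure stays the same whatever RDF vocabulary is stored.
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
    ("Dr. X", "author", "dx1"),  # dx1 is a hypothetical identifier
    ("Dr. X", "treats", "Chime"),
    ("dx1", "subject of record", "Chime"),
    ("dx1", "type", "Hypertension DX"),
    ("Dr. Y", "treats", "Pat"),  # hypothetical non-matching data
])

def to_sql(patterns):
    """Translate triple patterns into one SQL self-join over the triples table."""
    select, where, params, seen = [], [], [], {}
    for i, pattern in enumerate(patterns):
        for column, term in zip("spo", pattern):
            ref = f"t{i}.{column}"
            if term.startswith("?"):
                if term in seen:
                    # Repeated variable: becomes a join condition.
                    where.append(f"{ref} = {seen[term]}")
                else:
                    seen[term] = ref
                    select.append(f'{ref} AS "{term[1:]}"')
            else:
                # Constant term: becomes a bound parameter.
                where.append(f"{ref} = ?")
                params.append(term)
    tables = ", ".join(f"triples AS t{i}" for i in range(len(patterns)))
    sql = f"SELECT {', '.join(select)} FROM {tables}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql, params

patterns = [
    ("?physician", "author", "?dx"),
    ("?physician", "treats", "?patient"),
    ("?dx", "subject of record", "?patient"),
    ("?dx", "type", "Hypertension DX"),
]
sql, params = to_sql(patterns)
rows = conn.execute(sql, params).fetchall()
```

Adding a new predicate to the vocabulary only adds rows to `triples`; no `ALTER TABLE` is needed, which is the schema extensibility the slide refers to.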
SPARQL over Existing Relational Data
“Virtual RDF view”
◦ Translation to SQL follows a given mapping
from existing relational structures to an RDF
vocabulary
◦ Allows non-disruptive evolution of existing
systems
◦ Well-suited as a standard querying interface
over clinical data repositories
◦ They can be queried with SPARQL, securely
over encrypted HTTP (HTTPS)
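[Figure: architecture. Legacy / existing applications query the patient registry or data repository (relational) with SQL; 3rd party applications send SPARQL over secure HTTP to a mapping and translation layer, which issues SQL against the same repository and presents it as RDF (SNOMED CT perhaps)]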
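The mapping and translation layer can be sketched as a lookup from RDF predicates to columns of an existing table, again using SQLite through Python's standard `sqlite3` module. The `diagnosis` table, its columns, the `MAPPING` dictionary, and the `pattern_to_sql` helper are all hypothetical; the sketch handles a single triple pattern with a constant predicate and says nothing about how any particular product does this.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# An existing relational table that legacy applications keep using as-is.
conn.executescript("""
    CREATE TABLE diagnosis (id TEXT, patient_id TEXT, physician_id TEXT, code TEXT);
    INSERT INTO diagnosis VALUES ('dx1', 'Chime', 'Dr. X', '1201005');
""")

# Hypothetical mapping: predicate -> (table, subject column, object column).
MAPPING = {
    "author": ("diagnosis", "physician_id", "id"),
    "subject of record": ("diagnosis", "id", "patient_id"),
}

def pattern_to_sql(pattern):
    """Translate one triple pattern with a constant predicate into SQL."""
    s, p, o = pattern
    table, s_col, o_col = MAPPING[p]
    sql = f"SELECT {s_col}, {o_col} FROM {table}"
    conditions, params = [], []
    for term, col in ((s, s_col), (o, o_col)):
        if not term.startswith("?"):
            conditions.append(f"{col} = ?")
            params.append(term)
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql, params

sql, params = pattern_to_sql(("?dx", "subject of record", "Chime"))
rows = conn.execute(sql, params).fetchall()
```

No triples are stored anywhere: each row of `diagnosis` is presented as RDF statements only at query time, so the existing system is left untouched.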
Example: Cleveland Clinic
(SemanticDB)
Content repository and data production
system released in Jan. 2008
80 million (native) RDF statements
◦ Uses vocabulary from a patient record OWL
ontology for the registry
Based on
◦ Existing registry of heart surgery and CV
interventions
◦ 200,000 patient records
◦ Generating over 100 publications per year
Pierce et al. 2012, “SemanticDB: A Semantic Web Infrastructure for Clinical Research and Quality Reporting”
Cohort Identification
Interface developed in conjunction with
Cycorp
Leverages their logical reasoning system
(Cyc)
◦ Identifies cohorts using natural language (NL)
sentence fragments
◦ Converts fragments to SPARQL
◦ SPARQL is evaluated against RDF store
Example: Mayo Clinic (MCLSS)
Mayo Clinic Life Sciences System (MCLSS)
◦ Effort to represent Mayo Clinic EHR data as
RDF graphs
◦ Patient demographics, diagnoses, procedures,
lab results, and free-text notes
◦ Goal was to wrap MCLSS relational database
and expose as read-only, query-able RDF
graphs that conform to standard ontologies
◦ Virtual RDF view
Pathak et al. 2012, "Using Semantic Web Technologies for Cohort Identification from Electronic Health
Records for Clinical Research"
Example: Mayo Clinic (CEM)
Clinical Element Model (CEM)
◦ Represents logical structure of data in EHR
◦ Goal: translate CEM definitions into OWL and
patient (instance) data into conformant RDF
◦ Use tools (logical reasoners) to check the
semantic consistency of the ontology and
instance data, and to extract new knowledge
via deduction
◦ Instance data validation:
correct number of linked components, value within
data range, existence of units, etc.
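The three kinds of instance checks listed above can be illustrated with plain procedural Python. This is a stand-in for illustration only: the work described here performs validation with OWL and logical reasoners, and the `CONSTRAINTS` table, the `SystolicBP` element, its range, and its unit are hypothetical values, not taken from the CEM.

```python
# Hypothetical constraints for one element type.
CONSTRAINTS = {
    "SystolicBP": {
        "components": 1,        # required number of linked components
        "range": (0, 300),      # permitted data range for the value
        "units": {"mm[Hg]"},    # accepted units
    },
}

def validate(instance):
    """Return a list of human-readable constraint violations."""
    rule = CONSTRAINTS[instance["type"]]
    errors = []
    if len(instance.get("components", [])) != rule["components"]:
        errors.append("wrong number of linked components")
    low, high = rule["range"]
    value = instance.get("value")
    if value is None or not (low <= value <= high):
        errors.append("value outside data range")
    if instance.get("unit") not in rule["units"]:
        errors.append("missing or unknown unit")
    return errors

ok = {"type": "SystolicBP", "components": ["c1"], "value": 120, "unit": "mm[Hg]"}
bad = {"type": "SystolicBP", "components": [], "value": 500}
ok_errors = validate(ok)
bad_errors = validate(bad)
```

Expressing the same constraints in OWL has the advantage that a generic reasoner can check them, so no element-specific validation code needs to be written.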
Tao et al. 2012, "A semantic-web oriented representation of the clinical element model for secondary use
of electronic health records data"
Summary
Schema extensibility
◦ Use of RDF
Semantic Interoperability
◦ Domain modeling using OWL and RDFS
Standardized query interfaces
◦ Querying over SPARQL
Incremental, non-disruptive adoption
◦ Virtual RDF views
Main challenge: highly disruptive innovation