Ontologies - University of Connecticut


Ontologies
CSE
5810
Prof. Steven A. Demurjian, Sr.
Computer Science & Engineering Department
The University of Connecticut
371 Fairfield Road, Box U-255
Storrs, CT 06269-2155
[email protected]
http://www.engr.uconn.edu/~steve
(860) 486 - 4818
ONTO-1
Motivation

Ontologies – Biomedical and Clinical
• What are they?
• How are they Used?
What Issues will Ontologies Face in the Future?
• Each HIT System has its Own Ontology
• HIE Requires
• Integration of Patient Data
• Dealing with Semantic Differences (one EMR has weight in lbs, one in kg)
• Reconciling Ontologies
– Each HIT System with Ontology for Same Info
– Ontology + Data Impacts Integration
– How do we Resolve Dramatic Differences?
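The lbs/kg semantic difference mentioned on this slide can be illustrated with a minimal Python sketch. The record layout, field names, and patient id below are hypothetical and stand in for whatever two EMR systems actually export; only the pound-to-kilogram factor is a fixed definition.

```python
LB_TO_KG = 0.45359237  # kilograms per pound (exact by definition)

def weight_in_kg(value, unit):
    """Return a weight in kilograms, whatever unit the source EMR used."""
    unit = unit.strip().lower()
    if unit in ("kg", "kilogram", "kilograms"):
        return float(value)
    if unit in ("lb", "lbs", "pound", "pounds"):
        return float(value) * LB_TO_KG
    raise ValueError(f"unknown weight unit: {unit!r}")

# Hypothetical records for the same patient from two EMR systems.
emr_a = {"patient_id": "p1", "weight": 154.0, "unit": "lbs"}
emr_b = {"patient_id": "p1", "weight": 70.0, "unit": "kg"}

for record in (emr_a, emr_b):
    kg = weight_in_kg(record["weight"], record["unit"])
    print(record["patient_id"], round(kg, 1), "kg")
```

Normalizing to one unit before integration is only the simplest case; the reconciliation questions on this slide concern differences that no fixed conversion factor resolves.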
ONTO-2
Placing Ontologies into Perspective

Historical Evolution of WWW
Ontology
• Definition and Description
• RDF and OWL
Present Biomedical Ontologies
Applications of Biomedical Ontologies
• Clinical Trials
• OASIS: Integration Technique
• Clinical Decision Support System
ONTO-3
Current Information Systems on WWW

First Generation:
• Raw data, largely hand-coded by the user, was published online
• For example, static web pages
Second Generation:
• Dynamic content generation driven by MDA and databases
• Machines generate the corresponding HTML
Third Generation: Semantic Web:
• Generating machine-processable information whose content is machine-understandable, enabling intelligent services such as information brokers, search agents, and information filters to process domain-related information
ONTO-4
What other Advances have Taken Place?

XML
• XML was designed to store and transport data, and to be both human- and machine-readable
• W3C published XML 1.0 as a Recommendation on February 10, 1998
HTML5
• Fifth revision of HTML
• Markup language used for structuring and presenting content on the World Wide Web
• W3C published it as a Recommendation in October 2014
ONTO-5
What are Ontologies?

Definition (from Philosophy):
• Ontology is the study of being or existence and forms the basic subject matter of metaphysics. It seeks to describe the basic categories and relationships of being or existence in order to define entities and types of entities within its framework.
Definition (from Computer Science):
• In computer science, an ontology is a “specification of a conceptualization”.
• That is, “a data model that represents a set of concepts within a domain and the relationships between those concepts”.
ONTO-6
Advantages of Ontology

Semantic way of representing knowledge of the domain
Intelligent systems can use reasoning systems to make inferences within the ontology
Two main Objectives
• Share a common structure of information
• Reuse a similar ontology in another domain
ONTO-7
Development of Ontology

Determine the domain and scope (range) of the knowledge
Look for an existing ontology in a similar domain
• Reuse without change (will it be possible?)
• Basis to evolve to a domain-specific solution
List all of the terminologies or concepts of the domain
List all of the classes and instances to be created in the ontology
Create the properties that will relate these concepts in the ontology
ONTO-8
Example of Ontology
[Figure: example ontology. Class: Wine. Individual: Australian Yellow Tail. Properties: Color (Yellow), Flavor (Delicate), Maker (Australia, German).]
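The class / individual / property vocabulary of the Wine example can be sketched in a few lines of Python. This is only an illustration: the `Individual` type and the `individuals_of` helper are hypothetical names, and pairing each property with a single value (for instance Maker with Australia) is an assumption read off the slide labels.

```python
from dataclasses import dataclass, field

@dataclass
class Individual:
    name: str                                        # the individual's label
    cls: str                                         # the class it belongs to
    properties: dict = field(default_factory=dict)   # property name -> value

# Assumed reading of the slide: one individual of class Wine with three properties.
yellow_tail = Individual(
    name="Australian Yellow Tail",
    cls="Wine",
    properties={"Color": "Yellow", "Flavor": "Delicate", "Maker": "Australia"},
)

def individuals_of(cls, individuals):
    """Return the names of all individuals that belong to the given class."""
    return [i.name for i in individuals if i.cls == cls]

print(individuals_of("Wine", [yellow_tail]))
print(yellow_tail.properties["Color"])
```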
ONTO-9
Parkinson’s Disease Management Ontology
ONTO-10
Parkinson’s Disease Management Ontology
ONTO-11
Parkinson’s Disease Management Ontology
ONTO-12
Parkinson’s Treatment Ontology
ONTO-13
Parkinson’s Treatment Ontology
ONTO-14
Neurological-Disease Ontology
ONTO-15
Neurological-Disease Ontology
ONTO-16
Excerpt of Medical Condition Ontology
ONTO-17
Patient Ontology
ONTO-18
Skeleton Ontology
What is Phenotypic?
A phenotype is the composite of an organism's observable characteristics or traits
ONTO-19
How do Ontologies Relate to other Models?

UML Model
[Figure: UML class diagram, recovered as text]
Substance: Id: Integer; name: String; statusCode: String; effectiveTime: Date; repeatNumber: Int
Observation: Id: Integer; statusCode: String; name: String; value: String
Person: Id: Integer; name: Name; address: Address; bday: String; tel: String
Name: family-name: String; given-name: String; prefix: String; suffix: String
Address: street: String; locality: String; region: String; country: String
Patient: Ethnicity: String; prefLang: String; race: String; Email: String; gender: String; getAllergies(); get_clinical_notes(); get_demographics(); get_medications(); get_immunizations()
Provider: deaNumber: String; npiNumber: String; Ethnicity: String; race: String; Email: String; gender: String
Associations: hasMedicalObservations, takesPrescribedMedication
ONTO-20
How do Ontologies Relate to other Models?

Entity Relationship Diagram
[Figure 3.3: Sample EHR Model in ERD. Entities: Observation, Patient, Substance. Attributes shown: id, name, statusCode, value, effectiveTime, repeatNumber, Ethnicity, race, prefLang, address, tel, bday.]
ONTO-21
How do Ontologies Relate to other Models?

XML Schema
<xs:element name="Patient">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="id" type="xs:integer"/>
      <xs:element name="ethnicity" type="xs:string"/>
      <xs:element name="race" type="xs:string"/>
      <!-- ………. -->
      <xs:element name="tel" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

<xs:element name="Substance">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="id" type="xs:integer"/>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="statusCode" type="xs:string"/>
      <!-- ………. -->
      <xs:element name="repeatNumber" type="xs:integer"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

<xs:element name="Observation">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="id" type="xs:integer"/>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="value" type="xs:string"/>
      <xs:element name="statusCode" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

<!-- Relationship elements; xs:complexType wrappers added so that xs:sequence is legally placed -->
<xs:element name="takesPrescribedMedication">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="Patient"/>
      <xs:element ref="Substance"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

<xs:element name="hasMedicalObservation">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="Patient"/>
      <xs:element ref="Observation"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
ONTO-22
How do we Model Ontologies?

Researchers proposed the Semantic Web Stack, a hierarchy of languages in which each layer exploits and uses the capabilities of the layers below
OWL and RDF belong to the family of knowledge representation languages
• RDF: Resource Description Framework (http://www.w3.org/RDF/)
• OWL: Web Ontology Language (http://www.w3.org/TR/owl-features/)
RDF is reminiscent of Semantic Networks, which were popular in the 1970s
ONTO-23
Introduction to RDF / OWL
ONTO-24
RDF: Resource Description Framework

RDF represents knowledge as triples: Subject – Predicate – Object
For example,
Students (Subject) – registerTo (Predicate) – Classes (Object)
One triple in RDF is referred to as a statement
RDF is a grammar-based language whose common serialization (RDF/XML) uses XML syntax
RDFS (RDF Schema) has a syntax similar to RDF and provides a schema vocabulary for RDF, for example rdfs:Class, rdfs:subClassOf, etc.
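To make the Subject / Predicate / Object idea concrete, here is a minimal Python sketch that stores statements as tuples and answers simple pattern queries. It is not an RDF library; the `match` helper and the two `rdf:type` statements are illustrative assumptions added alongside the slide's Students / registerTo / Classes example.

```python
# Each statement is one (subject, predicate, object) triple.
triples = {
    ("Students", "registerTo", "Classes"),
    ("Students", "rdf:type", "rdfs:Class"),   # assumed extra statement
    ("Classes", "rdf:type", "rdfs:Class"),    # assumed extra statement
}

def match(store, s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )

# Which statements use the predicate registerTo?
print(match(triples, p="registerTo"))
# Which resources are declared as classes?
print([t[0] for t in match(triples, p="rdf:type", o="rdfs:Class")])
```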
ONTO-25
RDF: Resource Description Framework

RDF syntax of the above example:
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdfs:Class rdf:about="http://www.example.com/examle#Students"
              rdfs:label="Students">
  </rdfs:Class>
  <rdfs:Class rdf:about="http://www.example.com/examle#Classes"
              rdfs:label="Classes">
  </rdfs:Class>
</rdf:RDF>

All the concepts described in RDF are identified using a URI
• (ex. http://www.example.com/examle#Students)

RDF can be viewed as a standardized framework for providing metadata about domain concepts.
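As a rough illustration of how the URI-identified concepts above can be read by a program, the following Python sketch parses the two class declarations with the standard-library XML parser. It treats the document as plain XML rather than using a full RDF parser, and the enclosing `rdf:RDF` element with its namespace declarations is an assumption needed to make the fragment well-formed.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

doc = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:rdfs="{RDFS}">
  <rdfs:Class rdf:about="http://www.example.com/examle#Students" rdfs:label="Students"/>
  <rdfs:Class rdf:about="http://www.example.com/examle#Classes" rdfs:label="Classes"/>
</rdf:RDF>"""

root = ET.fromstring(doc)

# Map each class URI (rdf:about) to its human-readable label (rdfs:label).
classes = {
    el.get(f"{{{RDF}}}about"): el.get(f"{{{RDFS}}}label")
    for el in root.findall(f"{{{RDFS}}}Class")
}

for uri, label in classes.items():
    print(label, "->", uri)
```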
ONTO-26
OWL: Web Ontology Language

OWL is placed above RDF and RDFS in the Semantic Web Stack, utilizing the features offered by the layers below (RDF, RDFS, XML)
OWL's design has been influenced by description logic and knowledge representation paradigms
• SHIQ, Semantic Networks, Frames, SHOE, DAML, OIL, DAML+OIL
OWL provides richer semantic capabilities than its predecessor RDF
• For example, in the previous example, the predicate registerTo is of type rdf:Property
ONTO-27
OWL: Web Ontology Language

OWL differentiates between properties by defining
• owl:ObjectProperty, for connecting two concepts (registerTo), and
• owl:DatatypeProperty, for connecting a concept to a datatype (utilized from XML Schema)
Both kinds of property are specializations of rdf:Property
OWL also defines owl:AnnotationProperty for embedding metadata onto classes, rules and axioms
The following slide illustrates the use of OWL, RDF and RDFS (taken from a cardiac ontology built in OWL using the Protégé tool)
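The distinction between the two property kinds can be sketched in Python as follows. This is a simplified illustration, not OWL semantics: the `PROPERTIES` table, the `hasName` datatype property, and the `valid_statement` helper are hypothetical, with only `registerTo`, Students and Classes taken from the slides.

```python
# Known resources (concepts identified in the ontology).
RESOURCES = {"Students", "Classes"}

# Object properties link two resources; datatype properties link a resource to a literal.
PROPERTIES = {
    "registerTo": {"kind": "object"},
    "hasName": {"kind": "datatype", "datatype": str},  # hypothetical property
}

def valid_statement(subject, prop, value):
    """Check that a statement uses the property in line with its kind."""
    spec = PROPERTIES[prop]
    if subject not in RESOURCES:
        return False
    if spec["kind"] == "object":
        return value in RESOURCES                 # object must be another resource
    return isinstance(value, spec["datatype"])    # object must be a typed literal

print(valid_statement("Students", "registerTo", "Classes"))
print(valid_statement("Students", "hasName", "CSE 5810 cohort"))
print(valid_statement("Students", "hasName", 42))
```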
ONTO-28
OWL: Web Ontology Language
<owl:Class rdf:ID="Veins">
  <rdfs:subClassOf>
    <owl:Class rdf:ID="Heart"/>
  </rdfs:subClassOf>
</owl:Class>
<Veins rdf:ID="Pulmonary_Vein"/>

[Figure: hierarchy Heart → Vein → Pulmonary Vein]

Veins is declared a subclass of Heart, and Pulmonary_Vein is declared an individual (instance) of Veins.

The next slide illustrates OWL properties and the expressive power of OWL to restrict the domain and range values accepted by these properties.
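Reading the OWL fragment on this slide literally, Veins is a subclass of Heart and Pulmonary_Vein is a typed individual of Veins. The Python sketch below shows the kind of inference a reasoner draws from such declarations: an individual also belongs to every superclass of its class. The dictionaries and helper names are illustrative assumptions, and the class graph is assumed to be acyclic.

```python
# Direct declarations taken from the slide's OWL fragment.
SUBCLASS_OF = {"Veins": "Heart"}            # class -> direct superclass
INSTANCE_OF = {"Pulmonary_Vein": "Veins"}   # individual -> asserted class

def superclasses(cls):
    """Follow rdfs:subClassOf links upward (assumes no cycles)."""
    found = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        found.append(cls)
    return found

def types_of(individual):
    """Asserted class plus every class inferred through subclassing."""
    direct = INSTANCE_OF[individual]
    return [direct] + superclasses(direct)

print(types_of("Pulmonary_Vein"))
```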
ONTO-29
OWL: Web Ontology Language
<owl:ObjectProperty rdf:ID="Complications">
  <rdfs:domain rdf:resource="#Cardiology_Diseases"/>
  <rdfs:range>
    <owl:Class>
      <owl:unionOf rdf:parseType="Collection">
        <owl:Class rdf:about="#Cardiology_Complications"/>
        <owl:Class rdf:about="#Cardiology_Diseases"/>
        <owl:Class rdf:about="#Cardiology_Causes"/>
      </owl:unionOf>
    </owl:Class>
  </rdfs:range>
</owl:ObjectProperty>

The object property “Complications” takes its domain values from the class “Cardiology_Diseases” and its range values from the union of the classes Cardiology_Complications, Cardiology_Diseases and Cardiology_Causes
OWL combined with RDF/RDFS provides an environment for developing domain ontologies by organizing and describing the domain concepts
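The domain and union-range declaration for “Complications” can be mimicked with a small Python check. Two assumptions are worth stating: the individual-to-class assignments in `TYPES` are simplified and hypothetical, and the sketch treats domain and range as validation constraints, whereas an OWL reasoner would use them to infer class membership.

```python
# Domain and range of the property, as declared on the slide.
DOMAIN = {"Complications": {"Cardiology_Diseases"}}
RANGE = {"Complications": {"Cardiology_Complications",
                           "Cardiology_Diseases",
                           "Cardiology_Causes"}}   # union of three classes

# Hypothetical, simplified typing of a few individuals.
TYPES = {
    "Atrial_septal_defect": "Cardiology_Diseases",
    "Arrhythmia": "Cardiology_Complications",
    "Dyspnea": "Cardiology_Symptoms",
}

def conforms(subject, prop, obj):
    """True if the subject is in the property's domain and the object in its range."""
    return TYPES.get(subject) in DOMAIN[prop] and TYPES.get(obj) in RANGE[prop]

print(conforms("Atrial_septal_defect", "Complications", "Arrhythmia"))
print(conforms("Atrial_septal_defect", "Complications", "Dyspnea"))
```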
ONTO-30
Disease Ontology
[Figure: Hierarchical organization of Cardiology Diseases; callout: instances of Mitral_Valve_Disorders]
ONTO-31
Disease Ontology
[Figure: Representation of “Mitral_Valve_Prolapse” knowledge using properties and instances; callout: property defined]
ONTO-32
Implemented Ontology in OWL Format
…………..
<Congenital_Heart_Disease rdf:ID="Atrial_septal_defect">
  <Complications>
    <Cardiac_Arrhythmias rdf:ID="Arrhythmia">
      <Has_Intervention rdf:datatype="http://www.w3.org/2001/XMLSchema#string">defibrillation</Has_Intervention>
      <Have_Symptoms>
        <Cardiology_Symptoms rdf:ID="Dyspnea"/>
      </Have_Symptoms>
      <Has_Diagnosis_Test>
        <Cardiology_Diagnosis_Test rdf:ID="Coronary_Angiography">
          <Has_Synonyms rdf:datatype="http://www.w3.org/2001/XMLSchema#string">coronary catheterization</Has_Synonyms>
………………..
ONTO-33
Bio-Medical Ontologies

Review a Wide Range of Available Ontologies and Standards:
• OpenCyc
• WordNet
• GALEN
• UMLS
• SNOMED-CT
• FMA
• Gene Ontology
ONTO-34
Sample EHR Model in UML via HL7 CDA
[Figure: UML class diagram, recovered as text]
Vitals: Id: Integer; effectiveTime: IVL_TS
Immunization: Id: Integer; code: CD; statusCode: CS; effectiveTime: IVL_TS; product: CD; routeCode: CD
Procedure: Id: Integer; code: CD; statusCode: CS; effectiveTime: IVL_TS; approachSiteCode: CD; targetSiteCode: CD; methodCode: CE
Observation: Id: Integer; statusCode: String; effectiveTime: IVL_TS; code: CD; value: ANY; targetSiteCode: CD
Visit: Id: Integer; pId: Integer; visitDate: Date
Substance Administration: Id: Integer; name: String; statusCode: CS; effectiveTime: IVL_TS; doseQuantity: IVL_PQ; routeCode: CE; repeatNumber: ANY
Patient Provider: pId: Integer; providerId: Integer
Person: Id: Integer; name: Name; address: Address; bday: String; tel: String
Name: family-name: String; given-name: String; prefix: String; suffix: String
Address: street: String; locality: String; region: String; country: String
Provider: deaNumber: String; npiNumber: String; Ethnicity: String; race: String; Email: String; gender: String
Patient: Ethnicity: String; prefLang: String; race: String; Email: String; gender: String; getAllergies(); get_clinical_notes(); get_demographics(); get_medications(); get_immunizations()
Associations: hasVitals, hasImmunizationRecords, hasMedicalObservations, performedProcedures, encounters, patientList
** CD, CE, CS, IVL_TS, ANY – HL7 CDA datatypes
ONTO-35
OWL Equivalent for Observation
<owl:Class rdf:ID="IVL_TS">
  <owl:DatatypeProperty rdf:ID="Low"/>
  <owl:DatatypeProperty rdf:ID="High"/>
  <owl:DatatypeProperty rdf:ID="width"/>
  <owl:DatatypeProperty rdf:ID="center"/>
  <owl:DatatypeProperty rdf:ID="lowClosed"/>
  <owl:DatatypeProperty rdf:ID="highClosed"/>
</owl:Class>
<owl:Class rdf:ID="Observation">
  <owl:DatatypeProperty rdf:ID="id"/>
  <owl:DatatypeProperty rdf:ID="hasStatusCode"/>
  <owl:Attribute rdf:ID="hasEffectiveTime"/>
  <owl:Attribute rdf:ID="hasCode"/>
  <owl:Attribute rdf:ID="hasValue"/>
  <owl:Attribute rdf:ID="hasTargetSiteCode"/>
</owl:Class>
ONTO-36
OWL Equivalent for Observation
<owl:Class rdf:ID="CD">
  <owl:Attribute rdf:ID="text"/>
  <owl:DatatypeProperty rdf:ID="code"/>
  <owl:DatatypeProperty rdf:ID="codeSystem"/>
  <owl:DatatypeProperty rdf:ID="codeSystemName"/>
  <owl:DatatypeProperty rdf:ID="codeSystemVersion"/>
  <owl:DatatypeProperty rdf:ID="displayName"/>
</owl:Class>
<owl:Attribute rdf:ID="hasEffectiveTime">
  <owl:Domain rdf:resource="#Observation"/>
  <owl:Range rdf:resource="#IVL_TS"/>
</owl:Attribute>
<owl:Attribute rdf:ID="hasCode">
  <owl:Domain rdf:resource="#Observation"/>
  <owl:Range rdf:resource="#CD"/>
</owl:Attribute>
<owl:Attribute rdf:ID="hasValue">
  <owl:Domain rdf:resource="#Observation"/>
  <owl:Range rdf:resource="#ANY"/>
</owl:Attribute>
<owl:Attribute rdf:ID="hasTargetSiteCode">
  <owl:Domain rdf:resource="#Observation"/>
  <owl:Range rdf:resource="#CD"/>
</owl:Attribute>
Note: owl:Attribute, owl:Domain and owl:Range are schematic here; standard OWL expresses such links with owl:ObjectProperty plus rdfs:domain and rdfs:range.
ONTO-37
Sample OWL Ontology Model
[Figure: three panels, (a) Diagnosis Ontology Model, (b) Test Ontology Model, (c) Anatomy Ontology Model. Legend: Class, Attribute, Association, Datatype Attribute.]
ONTO-38
Ontology Example: OpenCyc

OpenCyc is an upper-level ontology developed by Cycorp Inc.
OpenCyc contains 6,000 concepts and 60,000 hand-coded assertions that capture “common sense language”, so that AI algorithms can perform human-like reasoning
ONTO-39
Example of OpenCyc
ONTO-40
Ontology Example: WordNet

WordNet is an electronic lexical database developed at Princeton University that serves as a resource for applications in natural language processing and information retrieval.
Example: senses of “cancer”
• cancer, malignant neoplastic disease: any malignant growth or tumor caused by abnormal and uncontrolled cell division; it may spread to other parts of the body through the lymphatic system or the blood stream
• Cancer, Crab: (astrology) a person who is born while the sun is in Cancer
• Cancer: a small zodiacal constellation in the northern hemisphere; between Leo and Gemini
• Cancer, Cancer the Crab, Crab: the fourth sign of the zodiac; the sun is in this sign from about June 21 to July 22
• Cancer, genus Cancer: type genus of the family Cancridae
ONTO-41
Unified Medical Language System

UMLS was developed by the National Library of Medicine
Disease is a semantic type with around 392 relations (109 semantic relations and 22 other relations).
Pneumonia is categorized under one semantic type, Disease, but has hundreds of relations.
ONTO-42
Example Ontology: SNOMED-CT

SNOMED-CT stands for Systematized Nomenclature of Medicine Clinical Terms. SNOMED-CT is the result of merging two ontologies: SNOMED-RT and Clinical Terms.
ONTO-43
Example Ontology: Clinical Trials

Low participation in clinical trials is a major problem in clinical and translational research.
Matching patient records to clinical trials is presently a manual and tedious procedure.
Need a Semantic Bridge between Clinical Ontologies (SNOMED-CT, etc.) and raw patient data for
• retrieving matching patient records, clinical guidelines and clinical decision support systems (CDSS).
ONTO-44
Technical Challenges

Challenges to be faced in real-world scenarios:
• Knowledge Engineering
• Scalability
• Noisy or Incomplete Data
Knowledge Engineering
• The clinical ontology has the concept “Drug”, which describes the active composition of the various drugs
• However, the patient record contains a list of vendor-specific drug names
The clinical ontology describes the cause of a disorder. The patient records only specify the presence or absence of the disorder and where the clinical test was conducted.
ONTO-45
Architecture of Solution
[Figure: solution architecture. Components: Clinical Trials, Patient Data, SNOMED-CT, Query, Ontology, ABox, TBox, Reasoner.]
ONTO-46
Implementation Approach

Mapping Patient Data Terminology to SNOMED-CT
• Using UMLS as intermediate target
• NLP mapping techniques
• Manual Mapping
Map the raw patient data to SNOMED-CT terminology.
• Example: Cerner Drug: Lactulose Syrup 20G/30ml
• SNOMED-CT: administeredSubstance
Allow the user to specify which terms in the definition are to be matched.
The Last Bullet Means Ontology Matching is NOT Fully Automated!
This is a Real Problem for Interoperating Data!
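The mapping approach on this slide (manual mapping, plus UMLS as an intermediate target) might be sketched as follows in Python. Everything concrete here is hypothetical: the lookup tables, the placeholder identifiers such as `UMLS:LACTULOSE-PLACEHOLDER`, and the `map_term` function are illustrations only, and real mappings would come from licensed UMLS and SNOMED-CT releases.

```python
# Hypothetical lookup tables; identifiers are placeholders, not real codes.
LOCAL_TO_UMLS = {"lactulose syrup 20g/30ml": "UMLS:LACTULOSE-PLACEHOLDER"}
UMLS_TO_SNOMED = {"UMLS:LACTULOSE-PLACEHOLDER": "SNOMED:LACTULOSE-PLACEHOLDER"}
MANUAL = {}  # user-supplied overrides: local term -> SNOMED-CT concept

def map_term(term):
    """Map a raw patient-data term to a SNOMED-CT concept, reporting how."""
    key = term.strip().lower()
    if key in MANUAL:                       # manual mapping takes priority
        return MANUAL[key], "manual"
    umls_id = LOCAL_TO_UMLS.get(key)        # step 1: local term -> UMLS
    if umls_id in UMLS_TO_SNOMED:           # step 2: UMLS -> SNOMED-CT
        return UMLS_TO_SNOMED[umls_id], "umls"
    return None, "unmapped"                 # left for a human to resolve

print(map_term("Lactulose Syrup 20G/30ml"))
print(map_term("Some Vendor Drug"))
```

The "unmapped" outcome is the point of the slide's last bullets: terms that fall through both tables still need a person to decide the match.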
ONTO-47
Contrast in Representation
• Example:
• SNOMED-CT: Disease1
  hasAgent Virus007
  Infection due to Bacteria001
  Infection due to MicroBacteria007
• Patient Record: Disease1 Positive.
• Because the patient record carries so little information, the query reasoner cannot find records that hold only partial data.
ONTO-48
How are Observations Reconciled?
Clinical Trials   Description
NCT00084266       Patients with MRSA
NCT00288808       Patients with warfarin
NCT00298870       Patients on steroids
NCT00304382       Patients with Pneumonia, source of Blood or Sputum

NCT00084266: ∃ associatedObservation.MRSA
NCT00304382: ∃ associatedObservation.Pneumococcal_Pneumonia ⊓ ∃ hasSpecimenSource.(Blood ⊔ Sputum)
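The two trial criteria on this slide combine existential restrictions ("has some observation of this kind") with intersection and union. A minimal Python sketch of matching patients against such criteria is shown below; the data layout and the `satisfies` / `matching_trials` helpers are assumptions, and a real system would delegate this to a description-logic reasoner over SNOMED-CT.

```python
# Each criterion maps a property to the set of acceptable values (a union);
# a trial requires every listed property to be satisfied (an intersection).
TRIALS = {
    "NCT00084266": {"associatedObservation": {"MRSA"}},
    "NCT00304382": {
        "associatedObservation": {"Pneumococcal Pneumonia"},
        "hasSpecimenSource": {"Blood", "Sputum"},
    },
}

def satisfies(patient, criteria):
    """True if, for every property, the patient has SOME acceptable value."""
    return all(patient.get(prop, set()) & allowed
               for prop, allowed in criteria.items())

def matching_trials(patient):
    return sorted(t for t, c in TRIALS.items() if satisfies(patient, c))

# Hypothetical patient record, already mapped to the ontology's terms.
patient = {
    "associatedObservation": {"Pneumococcal Pneumonia"},
    "hasSpecimenSource": {"Sputum"},
}
print(matching_trials(patient))
```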
ONTO-49
Clinical Decision Support System
Clinical Decision Support Systems (CDSS) are
• Interactive computer programs
• Designed to assist physicians and other health professionals with decision-making tasks
Components of CDSS:
• Knowledge Base
• Rule-Based Engine
• Case Base
• Business Models
ONTO-50
Example of Usage of Rules
IF
  “RULE 1” & “RULE 2” & “RULE 3” … “RULE n”
THEN
  “INTERVENTION 1 or RULE M”

IF
  p.getGender() == “male” & p.getAge() == 34 & p.getBP() < 140 & p.getInsulinLevel() < 20
THEN
  “Asthma Intervention Level 2”

Class Patient
  hasGender “male” ⊓ hasAge “34” ⊓ hasBP LessThan 140 ⊓ hasInsulinLevel LessThan 20
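The IF/THEN rule on this slide can be written as a small executable sketch in Python. The `Patient` fields and the `RULES` / `recommend` names are hypothetical, and the thresholds follow the rule form of the example (blood pressure below 140, insulin level below 20).

```python
from dataclasses import dataclass

@dataclass
class Patient:
    gender: str
    age: int
    bp: float
    insulin_level: float

# Each rule pairs a list of conditions with the intervention it triggers.
RULES = [
    (
        [
            lambda p: p.gender == "male",
            lambda p: p.age == 34,
            lambda p: p.bp < 140,
            lambda p: p.insulin_level < 20,
        ],
        "Asthma Intervention Level 2",
    ),
]

def recommend(patient):
    """Return the interventions of every rule whose conditions all hold."""
    return [action for conditions, action in RULES
            if all(cond(patient) for cond in conditions)]

print(recommend(Patient("male", 34, 120, 15)))
print(recommend(Patient("male", 34, 150, 15)))
```

Keeping the rules as data rather than hard-coded branches mirrors the separation between a knowledge base and a rule engine in a CDSS.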
ONTO-51
Ontology Integration

All ontologies developed have a common aim: describing the domain knowledge
Integration of ontologies is becoming critical
• Applications tend to use multiple ontologies
• Concepts in the various ontologies overlap, or the same concept is described in multiple ways
For example, the concept “Blood” is described differently:
• “Fluid” in one ontology
• “Substance” in another ontology
• “semi-solid” in a third ontology
Need to Reconcile these Differences When Attempting to “Combine” Data that Originates from Different Ontologies
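A first step toward reconciling such differences is simply detecting them. The Python sketch below flags concepts that different ontologies classify differently; the ontology names, the extra “Heart” concept, and the `conflicts` helper are hypothetical, with only the three descriptions of “Blood” taken from the slide.

```python
# Each ontology maps a concept to the category it is described as.
ONTOLOGIES = {
    "ontology_a": {"Blood": "Fluid", "Heart": "Organ"},
    "ontology_b": {"Blood": "Substance", "Heart": "Organ"},
    "ontology_c": {"Blood": "semi-solid"},
}

def conflicts(ontologies):
    """Return concepts described in more than one way, with each source's view."""
    seen = {}
    for source, concepts in ontologies.items():
        for concept, category in concepts.items():
            seen.setdefault(concept, {})[source] = category
    return {concept: views for concept, views in seen.items()
            if len(set(views.values())) > 1}

for concept, views in conflicts(ONTOLOGIES).items():
    print(concept, views)
```

Detection is the mechanical part; deciding which description should win, or how to relate them, is the integration problem this slide raises.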
ONTO-52
Example of Conflicting Ontologies
Ontology 1:
Disease References Symptoms, which Reference Treatments
Hierarchy of:
Disease
• Respiratory Disease
• Cardio Disease
• Nervous Disease
Symptoms
• General Symptoms
• Behavioral Symptoms
Treatment
• General Treatment
• Surgical Treatments

Ontology 2:
Symptoms Reference Diseases, which Reference Treatments
Hierarchy of:
Symptoms
• General Symptoms
• Behavioral Symptoms
Disease
• Respiratory Disease
• Cardio Disease
• Nervous Disease
Treatment
• General Treatment
• Surgical Treatments

Previously Discussed Issues:
• How do you Integrate Ontologies Across HIT to Support HIE and Virtual Chart?
• How do you Merge Data Intensive Conflicting Ontologies?
• How do you Query from Inside Out?
ONTO-53
Ontology Integration

Semantic vs. Structural Integration?
Difficulties arise when integrating similar, identical, and complementary ontologies.
ONTO-54
OASIS

Ontology Mapping and Integration Framework
ONTO-55
Summary - Ontologies

Ontology
• Definition and Descriptions
• Many Examples in Practice
• OWL and RDF
Biomedical Ontology
• OpenCyc
• WordNet
• SNOMED-CT
Application of Biomedical Ontology
• Clinical Trials
• OASIS: Integration Technique
• Clinical Decision Support System
Integration of Ontologies
ONTO-56