Linking levels of granularity and expressing
contexts & views using formal ontologies:
Experience with the Digital Anatomist FMA &
other health & bio ontologies
Alan Rector
Medical Informatics Group, University of Manchester
www.cs.man.ac.uk/mig
www.opengalen.org
img.cs.man.ac.uk
mygrid.man.ac.uk
Manchester Medical Informatics Group
OpenGALEN
Messages
• Logic-based ontologies work to manage granularity, scale, and context
– If you normalise (“untangle”) them
– And you make special provision for part-whole relations
• Developing formal ontologies is best done through Intermediate Representations
– Formal ontologies are the “Assembly languages” of knowledge representation
• Use specialised high level languages for applications – but define their semantics well
– Preserve work despite changes in underlying technology
– Focus on process not product
• “Just in time” ontologies
• Language (“terms”) must be separated from concepts
– Conflation → false problems
• Synonymy / homonymy etc. are linguistic issues
Why use Logic-based Ontologies?
Because knowledge is fractal! And changeable!
Five Roles for Terminology/Ontologies
• “Software Engineering”
– Saying each thing in exactly one place
• In a way usable by computers
– “Coherence without uniformity”
– Evolvable
• Linkage between domains
– Health and Bio Sciences
– Macro, Micro, and Molecular scales
– Contexts: Normal / abnormal; species; stage of development
– Healthcare delivery and Clinical research
– Patient Records and Decision Support
• Indexing Information
– Metadata and the semantic web
•www.semanticweb.org www.w3c.org
• Content of Databases and Patient Records
– Structural linkage within EPR/EHR & messages
– Content of EPR/EHR & messages
• Capturing information - the user interface
Logic based ontologies
• The descendants / partial formalisation of semantic nets, frame
systems, and object hierarchies via KL-ONE and KRL
• “is-kind-of” = “implies”
– “Dog is a kind of wolf” means “All dogs are wolves”
– Therefore logically computable
• Modern examples: OIL, DAML+OIL (“OWL”?)
– “OKBC meets Logic Based Ontologies”
– Underpinned by Description Logic Reasoners (FaCT, RACER, Cerebra; older: GRAIL, K-REP)
• Others LOOM, CLASSIC, BACK, GRAIL,...
• www.ontoknowledge.org/oil www.semanticweb.org
• http://oiled.man.ac.uk
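Read logically, each “is-kind-of” link is an implication, which is what makes the hierarchy computable. A minimal sketch, assuming a toy ontology stored as a Python dict of told parents; the classes Canid and Animal and the helpers `classes_of` and `is_kind_of` are hypothetical additions, not part of any reasoner's API:

```python
# A minimal sketch: each told "X is a kind of Y" link read as the rule
# "for all x, X(x) implies Y(x)". Dog and Wolf come from the slide;
# Canid and Animal are hypothetical extra classes.
TOLD_KIND_OF: dict[str, set[str]] = {
    "Dog": {"Wolf"},      # "Dog is a kind of wolf" = "All dogs are wolves"
    "Wolf": {"Canid"},
    "Canid": {"Animal"},
    "Animal": set(),
}


def classes_of(asserted: set[str]) -> set[str]:
    """Forward-chain the implications: every class an individual must
    belong to, given the classes it is asserted to belong to."""
    result = set(asserted)
    frontier = list(asserted)
    while frontier:
        cls = frontier.pop()
        for parent in TOLD_KIND_OF.get(cls, set()):
            if parent not in result:
                result.add(parent)
                frontier.append(parent)
    return result


def is_kind_of(sub: str, sup: str) -> bool:
    """sub is a kind of sup iff being a sub implies being a sup."""
    return sup in classes_of({sub})


print(sorted(classes_of({"Dog"})))  # ['Animal', 'Canid', 'Dog', 'Wolf']
print(is_kind_of("Wolf", "Dog"))    # False: implication runs one way only
```

Real description logic reasoners such as FaCT or RACER also reason over definitions and restrictions, not just told links; this only shows the implication reading of “is-kind-of”.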
Logic-based Ontologies:
Conceptual Lego
Diagram: building blocks labelled gene, hand, protein, cell, extremity, expression, body, lung, chronic, inflammation, acute, infection, bacterial, abnormal, deletion, normal, polymorphism, ischaemic
Logic-based Ontologies:
Conceptual Lego
“SNPolymorphism of CFTRGene causing Defect in MembraneTransport of ChlorideIon causing Increase in Viscosity of Mucus in CysticFibrosis…”
“Hand which is anatomically normal”
What’s in a “Logic based ontology”?
• Primitive concepts - in a hierarchy
– Described but not defined
• Properties - relations between concepts
– Also in a hierarchy
• Descriptors - property-concept pairs
– qualified by “some”, “only”, “at least”, “at most”
• Defined concepts
– Made from primitive concepts and descriptors
• Axioms
– disjointness, further description of defined concepts
• A Reasoner
– to organise it for you
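The quantifiers on descriptors have precise meanings. The sketch below checks them against one small, fully known model; it assumes a dict-based model and the hypothetical names `Descriptor`, `satisfies` and `make_hand`. A DL reasoner works under open-world semantics over all possible models, so this illustrates the quantifiers, not how a reasoner works:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical finite model: individual -> {"classes": set, "props": {prop: [fillers]}}.
Model = dict


@dataclass(frozen=True)
class Descriptor:
    prop: str                # e.g. "hasSubdivision"
    quantifier: str          # "some", "only", "at least" or "at most"
    filler: str              # class the fillers are tested against
    n: Optional[int] = None  # cardinality, needed for "at least"/"at most"


def satisfies(model: Model, ind: str, d: Descriptor) -> bool:
    """Does individual `ind` satisfy descriptor `d` in this one known model?"""
    fillers = model[ind]["props"].get(d.prop, [])
    matching = [f for f in fillers if d.filler in model[f]["classes"]]
    if d.quantifier == "some":
        return len(matching) >= 1
    if d.quantifier == "only":
        return len(matching) == len(fillers)  # vacuously true if no fillers
    if d.quantifier in ("at least", "at most"):
        if d.n is None:
            raise ValueError(f"{d.quantifier!r} needs a cardinality n")
        if d.quantifier == "at least":
            return len(matching) >= d.n
        return len(matching) <= d.n
    raise ValueError(f"unknown quantifier: {d.quantifier!r}")


def make_hand(num_fingers: int) -> Model:
    """Build a model with one Hand whose subdivisions are all Fingers."""
    model: Model = {"hand1": {"classes": {"Hand"}, "props": {"hasSubdivision": []}}}
    for i in range(num_fingers):
        model[f"finger{i}"] = {"classes": {"Finger"}, "props": {}}
        model["hand1"]["props"]["hasSubdivision"].append(f"finger{i}")
    return model


# "exactly 5" expressed as "at least 5" plus "at most 5".
exactly_5 = [Descriptor("hasSubdivision", "at least", "Finger", 5),
             Descriptor("hasSubdivision", "at most", "Finger", 5)]
print(all(satisfies(make_hand(5), "hand1", d) for d in exactly_5))  # True
print(all(satisfies(make_hand(6), "hand1", d) for d in exactly_5))  # False
```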
Logic Based Ontologies: A crash course
Primitives, Descriptions, Definitions, Reasoning, Validating
Diagram: primitives Thing, Feature, Structure, pathological, red, Heart, MitralValve and Encrustation, with these expressions:
MitralValve * ALWAYS partOf: Heart
Encrustation * ALWAYS feature: pathological
Structure + feature: pathological
Structure + involves: Heart
Encrustation + involves: MitralValve + (feature: pathological)
red + feature: pathological
red + partOf: Heart
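The crash-course steps can be sketched in a few lines. This assumes single-inheritance primitives, Encrustation placed under Structure, the hypothetical names PathologicalStructure and HeartStructure for the two definitions, a rule that involving a part means involving its whole, and an invented domain table for validation. It uses simple structural subsumption, which is adequate only for a fragment this small:

```python
# Primitives: child -> parent (single inheritance, an assumption).
PARENT = {
    "Feature": "Thing", "Structure": "Thing",
    "pathological": "Feature", "red": "Feature",
    "Heart": "Structure", "MitralValve": "Structure",
    "Encrustation": "Structure",  # assumed placement
}
# Descriptions: necessary ("ALWAYS") restrictions.
ALWAYS = {
    "MitralValve": {("partOf", "Heart")},
    "Encrustation": {("feature", "pathological")},
}
# Definitions: hypothetical name -> (base primitive, restrictions).
DEFINITIONS = {
    "PathologicalStructure": ("Structure", {("feature", "pathological")}),
    "HeartStructure": ("Structure", {("involves", "Heart")}),
}
# Validation: the primitive each property may be applied to (invented table).
DOMAIN = {"partOf": "Structure", "involves": "Structure", "feature": "Structure"}


def ancestors(c):
    """c and all of its primitive ancestors."""
    out = {c}
    while c in PARENT:
        c = PARENT[c]
        out.add(c)
    return out


def inherited(base):
    """ALWAYS restrictions of base and of every ancestor."""
    rs = set()
    for a in ancestors(base):
        rs |= ALWAYS.get(a, set())
    return rs


def wholes(part):
    """Everything `part` is transitively partOf."""
    found, todo = set(), [part]
    while todo:
        for prop, filler in inherited(todo.pop()):
            if prop == "partOf" and filler not in found:
                found.add(filler)
                todo.append(filler)
    return found


def expand(base, restrictions):
    """Stated plus inherited restrictions, with 'involves' pushed up part-whole."""
    rs = set(restrictions) | inherited(base)
    for prop, filler in list(rs):
        if prop == "involves":
            rs |= {("involves", w) for w in wholes(filler)}
    return rs


def classify(base, restrictions):
    """Names of the definitions that subsume base + restrictions."""
    rs = expand(base, restrictions)
    return sorted(
        name for name, (dbase, drs) in DEFINITIONS.items()
        if dbase in ancestors(base)
        and all(any(p == dp and df in ancestors(f) for p, f in rs) for dp, df in drs)
    )


def validate(base, restrictions):
    """Restrictions whose property cannot apply to `base`."""
    return [(p, f) for p, f in restrictions if DOMAIN[p] not in ancestors(base)]


print(classify("Encrustation", {("involves", "MitralValve")}))
# ['HeartStructure', 'PathologicalStructure']
print(validate("red", {("partOf", "Heart")}))  # [('partOf', 'Heart')]
```

The encrustation is found to be pathological because of its own description, and to involve the heart only because of the part-whole rule; neither was stated in the query.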
Bridging Bio and Health Informatics
• Define concepts with ‘pieces’ from different
scales and disciplines
– “Polymorphism which causes defect which causes disease”
Bridging Scales and context with Ontologies
Levels: Species, Genes, Protein, Function, Disease
– Gene in Species
– Protein coded by gene in species
– Function of protein coded by gene in species
– Disease caused by abnormality in function of protein coded by gene in species
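Each level in this chain is built by composing the previous one, so the scales stay linked. A minimal sketch with a hypothetical `Concept` dataclass; a specific gene and species could be substituted for the generic heads:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Concept:
    """A head concept, optionally qualified by one link to another concept."""
    head: str
    link: Optional[str] = None
    target: Optional["Concept"] = None

    def render(self) -> str:
        if self.target is None:
            return self.head
        return f"{self.head} {self.link} {self.target.render()}"

    def heads(self) -> list:
        """Every building block used, outermost first."""
        return [self.head] + (self.target.heads() if self.target else [])


# Each level reuses the previous one unchanged: scales are linked by composition.
gene = Concept("Gene", "in", Concept("Species"))
protein = Concept("Protein", "coded by", gene)
function = Concept("Function", "of", protein)
disease = Concept("Disease", "caused by abnormality in", function)

print(disease.render())
# Disease caused by abnormality in Function of Protein coded by Gene in Species
```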
Using composition to express context
• Normal and abnormal
– Hand → isSubdivisionOf some UpperExtremity
– Hand & AnatomicallyNormal → hasSubdivision exactly-5 fingers
• Homologies and Orthologies
– Thumb of Hand of Human → hasFeature Opposable
– Thumb of Hand of NonHumanPrimate → not hasFeature Opposable
Representing context and views by
variant properties
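Diagram: the property is_part_of has two sub-properties, is_structurally_part_of (linking CardiacValve to Heart) and is_clinically_part_of (linking Pericardium to Heart); other labels are Organ and OrganPart. Both “Disease of CardiacValve” and “Disease of Pericardium” fall under “Disease of (is_part_of) Heart”.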
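With variant properties, one query over the parent property collects both views, while a query over a sub-property keeps them apart. A sketch assuming that CardiacValve is linked to Heart by is_structurally_part_of and Pericardium by is_clinically_part_of; the tables and helper names are hypothetical:

```python
# Property hierarchy: sub-property -> parent property.
SUPER_PROPERTY = {
    "is_structurally_part_of": "is_part_of",
    "is_clinically_part_of": "is_part_of",
}
# Asserted facts (subject, property, object), as assumed from the diagram.
FACTS = [
    ("CardiacValve", "is_structurally_part_of", "Heart"),
    ("Pericardium", "is_clinically_part_of", "Heart"),
]


def implied(prop: str) -> list:
    """A property together with all of its super-properties."""
    out = [prop]
    while prop in SUPER_PROPERTY:
        prop = SUPER_PROPERTY[prop]
        out.append(prop)
    return out


def parts_of(whole: str, view: str) -> list:
    """Parts of `whole` under one view (one property in the hierarchy)."""
    return sorted(s for s, p, o in FACTS if o == whole and view in implied(p))


def disease_of_scope(whole: str, view: str) -> list:
    """Locations covered by 'Disease of <whole>' under that view."""
    return [whole] + parts_of(whole, view)


print(disease_of_scope("Heart", "is_part_of"))               # ['Heart', 'CardiacValve', 'Pericardium']
print(disease_of_scope("Heart", "is_structurally_part_of"))  # ['Heart', 'CardiacValve']
```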
The cost: Ontologies are not Thesauri
A Mixed Hierarchy (Not from Digital Anatomist)
organ
heart: a kind of organ
heart valve: a part of heart
aortic valve: a kind of heart valve
aortic valve cusp: a part of aortic valve
Works for navigation by humans
Works for ‘Disease of…’ and ‘Procedure on…’
Fails for ‘Surface of…’
How can the computer know the difference?
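The question on the slide can be made concrete by labelling each link. A sketch assuming that ‘Disease of…’ and ‘Procedure on…’ are inherited along both kind and part links while ‘Surface of…’ is inherited only along kind links; the rule table `RULE_LINKS` and the helper names are hypothetical:

```python
# The mixed hierarchy with each link labelled: (child, parent, type).
EDGES = [
    ("heart", "organ", "kind"),
    ("heart valve", "heart", "part"),
    ("aortic valve", "heart valve", "kind"),
    ("aortic valve cusp", "aortic valve", "part"),
]
# Assumed: which link types each relation may be inherited along.
RULE_LINKS = {
    "Disease of": {"kind", "part"},
    "Procedure on": {"kind", "part"},
    "Surface of": {"kind"},
}


def ancestors(node: str, types: set) -> set:
    """Nodes reachable upwards from `node` using only links of the given types."""
    out, todo = set(), [node]
    while todo:
        current = todo.pop()
        for child, parent, link_type in EDGES:
            if child == current and link_type in types and parent not in out:
                out.add(parent)
                todo.append(parent)
    return out


def propagates(relation: str, specific: str, general: str) -> bool:
    """Is '<relation> specific' a kind of '<relation> general'?"""
    return general in ancestors(specific, RULE_LINKS[relation])


print(propagates("Disease of", "aortic valve cusp", "heart"))  # True
print(propagates("Surface of", "aortic valve cusp", "heart"))  # False
```

A thesaurus that stores only unlabelled edges cannot choose between these two answers; the link types are the information it is missing.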
From a thesaurus to a logic-based ontology
Untangle part-whole and is-kind-of in anatomic ontology
Link Clinical Ontology with Anatomical ontology
Add rule that “Disorder of part → disorder of whole”
Reasoner can then create automatically:
A logic-based is-kind-of (subsumption) hierarchy
disorder of organ
disorder of heart
disorder of valve in heart
disorder of aortic valve in heart
disorder of cusp in aortic valve in heart
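Once kind-of and part-of are held separately, the disorder hierarchy can be generated rather than hand-built. A minimal sketch, assuming the rule is applied by treating both link types as subsumption for disorders and then taking the transitive reduction; the dicts and helper names are hypothetical, and the labels use entity names rather than the slide's “valve in heart” phrasing:

```python
# Untangled anatomy: kind-of and part-of held as separate relations.
KIND_OF = {"heart": ["organ"], "aortic valve": ["heart valve"]}
PART_OF = {"heart valve": ["heart"], "aortic valve cusp": ["aortic valve"]}
ENTITIES = ["organ", "heart", "heart valve", "aortic valve", "aortic valve cusp"]


def subsumers(x: str) -> set:
    """Entities whose disorders include disorders of x: the rule
    'disorder of part -> disorder of whole' plus ordinary kind-of."""
    out, todo = set(), [x]
    while todo:
        n = todo.pop()
        for p in KIND_OF.get(n, []) + PART_OF.get(n, []):
            if p not in out:
                out.add(p)
                todo.append(p)
    return out


def direct_parents(x: str) -> set:
    """Transitive reduction: drop subsumers already implied by another one."""
    s = subsumers(x)
    return {p for p in s if not any(p in subsumers(q) for q in s)}


def disorder_tree(root: str = "organ") -> list:
    """Indented lines of the generated 'disorder of ...' hierarchy."""
    lines = []

    def walk(node: str, depth: int) -> None:
        lines.append("  " * depth + f"disorder of {node}")
        for child in sorted(e for e in ENTITIES if node in direct_parents(e)):
            walk(child, depth + 1)

    walk(root, 0)
    return lines


print("\n".join(disorder_tree()))
```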
The Cost: Normalising (untangling) Ontologies
Diagram: Structure, Function and Part-whole hierarchies, before and after untangling
The Digital Anatomist FMA is very well untangled – therefore a good starting point
The Cost: Normalising (untangling) Ontologies
Making each meaning explicit and separate
Before: a single PhysSubstance hierarchy in which ProteinHormone, SteroidHormone, Insulin and Enzyme are each filed under several parents (Protein, Steroid, Hormone, Catalyst)
After: two primitive hierarchies plus definitions
– Substance → BodySubstance → Protein, Steroid, …
– ActionRole → PhysiologicRole → HormoneRole, CatalystRole, …
– Hormone = Substance & playsRole-HormoneRole
– ProteinHormone = Protein & playsRole-HormoneRole
– SteroidHormone = Steroid & playsRole-HormoneRole
– Catalyst = Substance & playsRole-CatalystRole
– Enzyme ?=? Protein & playsRole-CatalystRole
...and helping keep arguments rational and meetings short!
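Defining each hormone or catalyst as a substance plus a role lets the reasoner rebuild the old multi-parent hierarchy on demand. A sketch assuming single-parent trees, each definition being exactly one substance plus one role, and Insulin placed as a hypothetical primitive under Protein:

```python
# Primitive substance and role trees (child -> parent); Insulin's placement is hypothetical.
SUBSTANCE_PARENT = {"BodySubstance": "Substance", "Protein": "BodySubstance",
                    "Steroid": "BodySubstance", "Insulin": "Protein"}
ROLE_PARENT = {"PhysiologicRole": "ActionRole", "HormoneRole": "PhysiologicRole",
               "CatalystRole": "PhysiologicRole"}
# Defined concepts from the slide: (substance, role played).
DEFINED = {
    "Hormone": ("Substance", "HormoneRole"),
    "ProteinHormone": ("Protein", "HormoneRole"),
    "SteroidHormone": ("Steroid", "HormoneRole"),
    "Catalyst": ("Substance", "CatalystRole"),
    "Enzyme": ("Protein", "CatalystRole"),  # "?=?" on the slide
}


def up(tree: dict, x: str) -> set:
    """x and everything above it in a single-parent tree."""
    out = {x}
    while x in tree:
        x = tree[x]
        out.add(x)
    return out


def subsumes(general: tuple, specific: tuple) -> bool:
    """Both the substance and the role of `general` must be at or above those of `specific`."""
    (g_sub, g_role), (s_sub, s_role) = general, specific
    return g_sub in up(SUBSTANCE_PARENT, s_sub) and g_role in up(ROLE_PARENT, s_role)


def classify(expr: tuple) -> set:
    """Most specific defined concepts subsuming expr (expr's own definition excluded)."""
    hits = {name for name, d in DEFINED.items() if d != expr and subsumes(d, expr)}
    return {h for h in hits
            if not any(o != h and subsumes(DEFINED[h], DEFINED[o]) for o in hits)}


print(classify(("Insulin", "HormoneRole")))  # {'ProteinHormone'}
print(classify(DEFINED["Enzyme"]))           # {'Catalyst'}
```

The Enzyme entry is treated here as a full definition; the slide's “?=?” flags exactly that equivalence as open to question.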
Other benefits
• Limit combinatorial explosions
– From “phrase book” to “dictionary + grammar”
– Avoid the “exploding bicycle” (number of codes for pedal-cycle accidents):
• 1980 - ICD-9 (E826): 8
• 1990 - READ-2 (T30..): 81
• 1995 - READ-3: 87
• 1996 - ICD-10 (V10-19): 587
• V31.22 Occupant of three-wheeled motor vehicle injured in collision with pedal cycle, person on outside of vehicle, nontraffic accident, while working for income
– and meanwhile elsewhere in ICD-10
• W65.40 Drowning and submersion while in bath-tub, street and highway, while engaged in sports activity
• X35.44 Victim of volcanic eruption, street and highway, while resting, sleeping, eating or engaging in other vital activities
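The difference between a phrase book and a dictionary plus grammar is multiplication versus addition. A sketch with invented axes and sizes (not the actual ICD-10 structure) and one hypothetical grammar rule:

```python
from math import prod

# Invented axes for describing an accident; sizes are illustrative only.
AXES = {
    "victim": ["pedal cyclist", "pedestrian", "car occupant", "three-wheeler occupant"],
    "collision_with": ["pedal cycle", "car", "bus", "fixed object", "none"],
    "position": ["driver", "passenger", "outside of vehicle"],
    "setting": ["traffic", "nontraffic"],
    "activity": ["sports", "leisure", "working for income", "resting"],
}

# Phrase book: one pre-coordinated code per combination of values.
phrase_book_size = prod(len(values) for values in AXES.values())
# Dictionary: one entry per value; a grammar combines them on demand.
dictionary_size = sum(len(values) for values in AXES.values())


def sanctioned(choices: dict) -> bool:
    """Hypothetical grammar rule: only vehicle occupants have a position."""
    return "position" not in choices or choices["victim"].endswith("occupant")


print(phrase_book_size, dictionary_size)  # 480 18
```

Five small axes already give 480 phrases from 18 dictionary entries, and a grammar can also reject nonsense such as a pedestrian in the driver's seat, where a phrase book simply enumerates every combination.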
Linking ontologies: to integrate or to map?
Mapping Solution
Diagram: one classification places Hypertension under CV Diseases; another splits it into “Hypertension (excluding Pregnancy)” and “Hypertension in Pregnancy”, the latter under “Diseases of Pregnancy”. Also shown: the definition “Hypertension which occurs-in pregnancy”.
Integrating rather than Cross Mapping
Special Requirements for Anatomy
Parts and Wholes
1. Disorders of the part are disorders of the whole
– e.g. ‘Diseases of the aortic valve’ are ‘diseases of the heart’
– One description logic solution: rewrite
• Disease of Heart → Disease of (Heart OR Part of Heart)
• ‘SEP triplets’ – Schulz and Hahn, U Freiburg
2. Subdivisions of layers are layers of subdivisions
– e.g. ‘The skin of the hand’ is a subdivision of ‘the skin of the upper extremity’
– No scalable DL solution known
• Needs additional rules (Rousset) or extensive rewriting
– Intermediate representation essential
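SEP triplets turn the part-whole rule into plain is-a reasoning. A sketch of the idea as it is usually described: each concept X gets a structure node X_S that subsumes an entity node X_E and a parts node X_P, and “B part of A” becomes B_S is-a A_P. The tuple encoding of nodes and the helper names are hypothetical, and kind-of links are omitted:

```python
from collections import defaultdict

# Part-whole facts (part, whole).
PART_OF = [("aortic valve cusp", "aortic valve"), ("aortic valve", "heart")]


def build_sep(part_of):
    """Encode part-whole as a plain is-a graph over (concept, S/E/P) nodes."""
    isa = defaultdict(set)
    for concept in {c for pair in part_of for c in pair}:
        isa[(concept, "E")].add((concept, "S"))  # X as a whole is an X-structure
        isa[(concept, "P")].add((concept, "S"))  # so is any proper part of X
    for part, whole in part_of:
        isa[(part, "S")].add((whole, "P"))       # anything of the part is a part of the whole
    return isa


def is_a(isa, node, target) -> bool:
    """Ordinary transitive is-a check; no special part-whole reasoning needed."""
    seen, todo = set(), [node]
    while todo:
        current = todo.pop()
        if current == target:
            return True
        if current not in seen:
            seen.add(current)
            todo.extend(isa.get(current, ()))
    return False


ISA = build_sep(PART_OF)
print(is_a(ISA, ("aortic valve cusp", "E"), ("heart", "S")))  # True
print(is_a(ISA, ("aortic valve cusp", "E"), ("heart", "E")))  # False
```

“Disease of Heart” would then be defined by location (heart, S), which catches the cusp, while something that must apply to the heart as a whole would use (heart, E).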
Fractal vs Bridging principles
• Fractal
– Substances make up Structures
– Aggregations of things at one scale make up substances/structures at the next
• E.g. Cells → Tissues; molecules → substances; …
– Processes act on Things or other processes
– …
• Local
– Atoms are bound in molecules
– Genes code for proteins (in many ways and variations)
Adding a High Level Language
Intermediate Representations
• Domain experts should work in a domain-oriented language
– A “View” for each application
• Enforcing standards
– Buffering differences
• Capture all of the information in one place, including
– Metadata for provenance, editorial status, etc.
– Comments for users
– Comments for developers
– Links to external knowledge sources
An Example from OpenGALEN
Simplicity for terminologists...
"Open fixation of a fracture of the neck of the left
femur"
MAIN fixing
  ACTS_ON fracture
    HAS_LOCATION neck of long bone
      IS_PART_OF femur
        HAS_LATERALITY left
  HAS_APPROACH open
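An intermediate representation like this can be expanded mechanically. A sketch assuming two-space indentation marks nesting (with the nesting taken from the formal expansion on the following slide) and a GRAIL-like “which” rendering; it is not OpenGALEN's actual translator, which also maps each term to richer concepts such as ‘SurgicalFixing’:

```python
# Assumed input: the intermediate representation with two-space nesting.
IR = """\
MAIN fixing
  ACTS_ON fracture
    HAS_LOCATION neck of long bone
      IS_PART_OF femur
        HAS_LATERALITY left
  HAS_APPROACH open
"""


def parse(text: str) -> dict:
    """Turn indented 'LINK value' lines into a tree of {'concept', 'links'} nodes."""
    root = None
    stack = []  # (depth, node) pairs along the current branch
    for raw in text.splitlines():
        if not raw.strip():
            continue
        depth = (len(raw) - len(raw.lstrip(" "))) // 2
        keyword, _, value = raw.strip().partition(" ")
        node = {"concept": value, "links": []}
        if keyword == "MAIN":
            if depth != 0 or root is not None:
                raise ValueError("expected exactly one MAIN line, unindented")
            root = node
        else:
            while stack and stack[-1][0] >= depth:
                stack.pop()
            if not stack:
                raise ValueError(f"no parent for line: {raw!r}")
            stack[-1][1]["links"].append((keyword, node))
        stack.append((depth, node))
    return root


def render(node: dict) -> str:
    """GRAIL-like '(X which LINK Y ...)' rendering; multi-word names are quoted."""
    name = node["concept"]
    if " " in name:
        name = f"'{name}'"
    if not node["links"]:
        return name
    body = " ".join(f"{link} {render(child)}" for link, child in node["links"])
    return f"({name} which {body})"


print(render(parse(IR)))
```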
Simplicity for End Users...
Structured Data Entry
Screenshot: a form titled “FRACTURE SURGERY” (menus File, Edit, Help) with pick-lists for procedure (Reduction, Fixation), approach (Open, Closed), bone (Femur, Tibia, Fibula, Ankle, Humerus, Radius, Ulna, Wrist, More...), side (Left, Right) and site on the bone (Shaft, Neck, Gt Troch, More...); Open, Femur and Left appear as the selected values.
…but with a solid, formal foundation
(that no one wants to see or work in)
(‘SurgicalProcess’ which
  isMainlyCharacterisedBy (performance which
    isEnactmentOf (‘SurgicalFixing’ which
      hasSpecificSubprocess (‘SurgicalAccessing’ which
        hasSurgicalOpenClosedness (SurgicalOpenClosedness which
          hasAbsoluteState surgicallyOpen))
      actsSpecificallyOn (PathologicalBodyStructure which <
        involves Bone
        hasUniqueAssociatedProcess FracturingProcess
        hasSpecificLocation (Collum which
          isSpecificSolidDivisionOf (Femur which
            hasLeftRightSelector leftSelection))>))))
Experience of Intermediate Representation
• Training time:
– Time to do independent work: 3 months → 3 days + 3 days
• Productivity:
– Raw: 50/day → 150/day
– Review: 50% of effort → 10% of effort
– Arguments: Many per cycle → rare
– Evolution – updates for revision or schemas: Months → weeks or days
• Coupling
– Dependencies: High → Low
• Focus
– Technical and clinical/scientific: conflated → clearly separated
Language and Concepts
• Concepts – units of thought
– e.g. Tumour which is malignant
Tumour whether benign or malignant
– Can be similar or logically equivalent
• Operations are logical
• Terms – units of language
– e.g. “Neoplasm”
“Malignant tumour”
“Cancer”
– Can be synonyms, homonyms, metonyms, etc.
• Operations are lexical and linguistic
• Many arguments involve confusing the two
– Rector’s Law: the length of an argument is inversely proportional to the strength of the evidence
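Keeping terms and concepts in separate tables keeps the two kinds of operation separate too: synonymy and homonymy are lookups in the lexicon, subsumption is reasoning over concepts. A sketch with hypothetical concept IDs, “Tumour” added as an extra term, and the zodiac sense of “Cancer” as an illustrative homonym:

```python
# Hypothetical concept identifiers: units of thought, related logically.
CONCEPT_PARENT = {"C_MALIGNANT_TUMOUR": "C_TUMOUR"}
# Lexicon: units of language -> the concepts each term can denote.
LEXICON = {
    "Neoplasm": {"C_TUMOUR"},
    "Tumour": {"C_TUMOUR"},
    "Malignant tumour": {"C_MALIGNANT_TUMOUR"},
    "Cancer": {"C_MALIGNANT_TUMOUR", "C_ZODIAC_SIGN_CANCER"},
}


def synonyms(term: str) -> set:
    """Lexical: other terms sharing at least one sense with `term`."""
    senses = LEXICON[term]
    return {t for t, cs in LEXICON.items() if t != term and cs & senses}


def homonyms() -> set:
    """Lexical: terms with more than one sense."""
    return {t for t, cs in LEXICON.items() if len(cs) > 1}


def kind_of(concept: str, general: str) -> bool:
    """Logical: subsumption between concepts, independent of any term."""
    while concept != general and concept in CONCEPT_PARENT:
        concept = CONCEPT_PARENT[concept]
    return concept == general


print(synonyms("Cancer"))  # {'Malignant tumour'}
print(homonyms())          # {'Cancer'}
```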
It works for what it does
• Faster, more accurate development & updating
– Dutch experience: Cost cut by 70%
• Mostly by reducing unproductive arguments & committee meetings
– The French experience
• A practical way to agree
– Gene Ontology
• About 10% additional subsumptions in initial tests
– myGrid
• Web service description/specification
• Reduces granularity of consensus required
– “Coherence without uniformity”
• Experience in GALEN
– Successful loosely coupled collaborations
• The Drug Ontology
– Untangled “forms and routes” in six weeks after two years of prior frustration with ‘simple’ methods
But it has limitations:
A specialised fragment of logic for a specialised task
• Not enough on its own – needs other tools for:
– Defaults & exceptions, meta models, constraint based reasoning, full first order logic for Decision Support, quantities & units, complex spatio-temporal reasoning …
– Best used as part of a larger environment
• As a complement to frame systems, e.g. Protégé?
• Limitations even within the paradigm
– Expressiveness costs computational complexity
• Scaling for complex tasks still being investigated
– Use in very large ontologies still developing
– Interactions of features are a highly specialised topic
It doesn’t make the coffee!
Relevant Experience with Logic Based
Ontologies in Bio Health Applications
• Health informatics
– Surgical procedures and diseases
• Development of French National Classification & maintenance of Dutch
– UK Drug ontology & HL7 forms and routes analysis
– Gene Ontology Next Generation (GONG)
– CLEF – integration and language engineering in Cancer Research
– Analysis of UMLS (Schulz & Hahn, Freiburg)
– MedSyndicate: Language engineering in biomedicine (Schulz & Hahn, Freiburg)
– SNOMED RT/CT (CAP/Apelon)
– Re-representation of Digital Anatomist FMA (early experiments only)
• Bioinformatics
– Tambis – information fusion and database mediation
– myGrid – web services for bioinformatics
– IRBAIN – Language engineering in bioinformatics
– Starch – Art history
– Multiflora – Language Engineering and Cladistics in Botany
Summary: Logic based ontologies
• Work to manage granularity, scale, and context
– If you normalise (“untangle”) them
– And you make special provision for part-whole relations
• Are best developed through Intermediate Representations
– Formal ontologies are the “Assembly languages” of knowledge
representation
• Use specialised high level languages for applications – but define their semantics well
– Focus on Process rather than product
• “Just in time ontologies”
• Save time and make work more reliable
• Require separation of language (“terms”) from concepts
– Conflation → false problems
• Have limitations
– Need to be embedded in a broader environment
• Hybrid representation systems