The challenge of making ontologies useful and usable


Alan Rector
School of Computer Science / Northwest Institute of Bio-Health Informatics
www.co-ode.org
www.clinical-escience.org
www.opengalen.org
Users face a complex landscape
Inhabited by many tribes, each tribe in its own teepee
• Description logic
• Complexity theory
• Fuzziness
• Default logics
• KR Logics
• Logic programming
• Argumentation
• Belief Revision
• Natural Language
• Bayesian analysis
and what feel like class divides
The chain of value
The chain of theorem envy
[Figure: a chain of four roles: Pure Mathematician, Logician/KR Researcher, Knowledge Engineer, Application Builder]
Speech bubbles between neighbouring roles:
• Too neat, too academic! Doesn’t understand!
• Too scruffy, too ad hoc! Doesn’t understand!
No one person can understand it all - must manage the chain
… but logicians are often seen as policemen
Incomplete! Undecidable! Higher order! No semantics!
… and seem to insist on solving harder problems than the user actually has
… often without examples of why not
… or insist users understand the solution space
And don’t come back until you have the semantics clear
The semantics is your job!
Meet users where they are
[Deborah McGuinness, Stanford]
So what is an ontology?
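[Figure: spectrum of ontology formality, from least to most formal]
– Catalog/ID
– Terms/glossary
– Thesauri
– Informal Is-a
– Formal Is-a
– Formal instance
– Frames (properties)
– Value restrictions
– Disjointness, Inverse, part-of
– General Logical constraints
Examples shown on the figure: Gene Ontology, Mouse Anatomy, Arom, EcoCyc, PharmGKB, TAMBIS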
My definition of an ontology
• Short version:
“a representation of the shared background knowledge for a community”
• Long version:
“an implementable model of the entities that need to be understood in common in order for some group of software systems and their users to function and communicate at the level required for a set of tasks”
• ... and “it doesn’t make the coffee”
Just one of at least three components of a complete system
But what’s it for?
“Ontologies” in Information Systems
• What information systems can say and how: “Models of Meaning”
– Mathematical theories - although usually weak ones
• evolved at the same time as Entity Relation and UML style modelling
• Managing Scalability / complexity - “Knowledge driven systems”
– Housekeeping tools for expert systems
• Organising complex collections of rules, forms, guidelines, ...
• Interoperability
– The common grounding information needed to achieve communication
– Standards and terminology
• Communication with users
– Document design decisions
• Testing and quality assurance
– sufficient constraints to know when it breaks
– Empower users to make changes safely
– ... but “They don’t make the coffee”
– just one component of the system / theory
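As a minimal sketch of "sufficient constraints to know when it breaks", the Python below checks a toy hierarchy against one disjointness constraint. The class names, the `PARENTS` and `DISJOINT` tables, and the helper functions are all hypothetical illustrations, not taken from any real ontology or tool.

```python
# Hypothetical toy hierarchy: each class maps to its direct is-a parents.
PARENTS = {
    "Hand": set(),
    "Normal": set(),
    "Abnormal": set(),
    "NormalHand": {"Hand", "Normal"},
    "DeformedHand": {"Hand", "Abnormal"},
    # A deliberate authoring mistake: filed under two incompatible parents.
    "MisfiledHand": {"NormalHand", "DeformedHand"},
}

# Hypothetical constraint: nothing may be both Normal and Abnormal.
DISJOINT = [{"Normal", "Abnormal"}]


def ancestors(cls, parents):
    """Return cls together with everything it is-a, followed transitively."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def violations(parents, disjoint):
    """List (class, disjoint pair) for every class that falls under both members."""
    found = []
    for cls in sorted(parents):
        supers = ancestors(cls, parents)
        for pair in disjoint:
            if pair <= supers:
                found.append((cls, tuple(sorted(pair))))
    return found


print(violations(PARENTS, DISJOINT))
```

With the constraint stated, the misfiled class is reported rather than silently accepted, which is the sense in which a user can make changes and still find out when something breaks.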
The scaling problem: The combinatorial explosion
• It keeps happening!
– “Simple” brute force solutions do not scale up!
• Conditions x sites x modifiers x activity x context
– Huge number of terms to author
– Software CHAOS
[Figure labels: “Combination of things to be done & time to do each thing”, “Things to build”, “Actual”]
• Terms and forms needed
– Increases exponentially
• Effort per term or form
– Must decrease to compensate
• To give the effectiveness we want
– Or might accept
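A rough sketch of the arithmetic behind "Conditions x sites x modifiers x activity x context". The axis names come from the slide; the counts in `AXES` are invented purely for illustration, since the slide gives no numbers.

```python
from math import prod

# Hypothetical axis sizes; only the axis names come from the slide.
AXES = {
    "conditions": 50,
    "sites": 20,
    "modifiers": 10,
    "activity": 5,
    "context": 4,
}


def enumerated_terms(axes):
    """Terms to author if every combination is written out by hand."""
    return prod(axes.values())


def primitives_needed(axes):
    """Terms to author if combinations are composed from per-axis primitives."""
    return sum(axes.values())


print(enumerated_terms(AXES))   # product of the axis sizes
print(primitives_needed(AXES))  # sum of the axis sizes
```

Under these assumed counts, brute-force enumeration means 200,000 terms against 89 primitives, and each extra axis multiplies the first figure while only adding to the second.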
The means: Logic as the clips for “Conceptual Lego”
gene, hand, protein, extremity, polysaccharide, body, cell, expression, chronic, Lung, acute, infection, inflammation, abnormal, normal, bacterium, deletion, polymorphism, ischaemic, virus, mucus
Logic as the clips for “Conceptual Lego”
“SNPolymorphism of CFTRGene causing Defect in MembraneTransport of Chloride Ion causing Increase in Viscosity of Mucus in CysticFibrosis…”
“Hand which is anatomically normal”
Build complex representations from modularised primitives
• Primitives: Species, Genes, Protein, Function, Disease
– Gene in humans
– Protein coded by gene in humans
– Function of Protein coded by gene in humans
– Disease caused by abnormality in Function of Protein coded by gene in humans
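The "Conceptual Lego" idea can be sketched in a few lines of Python: a small set of primitives, plus one way of clipping them together. The `Concept` class, its `which` method and the relation strings are hypothetical names chosen for this illustration; a real system would use a description logic and a reasoner rather than string rendering.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Concept:
    """A primitive name, optionally refined by (relation, Concept) links."""
    name: str
    links: Tuple[Tuple[str, "Concept"], ...] = ()

    def which(self, relation: str, filler: "Concept") -> "Concept":
        """Return a new concept with one more link clipped on."""
        return Concept(self.name, self.links + ((relation, filler),))

    def render(self) -> str:
        """Spell the composed concept out as a phrase."""
        if not self.links:
            return self.name
        refinements = " and ".join(
            f"{relation} {filler.render()}" for relation, filler in self.links
        )
        return f"{self.name} {refinements}"


# Each step reuses the previous one instead of authoring a new term by hand.
gene = Concept("Gene").which("in", Concept("humans"))
protein = Concept("Protein").which("coded by", gene)
function = Concept("Function").which("of", protein)
disease = Concept("Disease").which("caused by abnormality in", function)

print(disease.render())
```

Only the primitives and the relations are authored; the long phrase falls out of the composition.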
and more forms
A conceptual Coat rack
Fractal tailoring of reusable resources: example of data collection forms for trials
Renin dependent Hypertension at St Stevens Hospitals for the National Hypertension Survey
• Solution space
– Ontologies
– Information Models
– Logics
– Rules
– Frames
– Planners
– Logic programming
– Bayes nets
– Decision theory
– Fuzzy sets
– Open / closed world
– …
• Problem space
– Answer questions
– Advising on actions
– Hazard monitoring
– Creating forms
– Discovering resources
– Constrain actions
– Assess risk
– …
Problem space & solution space
[Figure: Problem space and Solution space, linked by Guidelines, Patterns, Tools]
Matching problems and solutions is worthwhile science & craft
• Patterns, guidelines and tools
• Reformulations of users’ “solutions”
• Collaborations with behavioural scientists
• Challenges and demonstrations
Some observations…
Inter-rater variability
ART & ARCHITECTURE THESAURUS (AAT)
Domain: art, architecture, decorative arts, material culture
Content: 125,000 terms
Structure: 7 facets, 33 polyhierarchies
– Associated concepts (beauty, freedom, socialism)
– Physical attributes (red, round, waterlogged)
– Style/Period (French, impressionist, surrealist)
– Agents (printmaker, architect, jockey)
– Activities (analysing, running, painting)
– Materials (iron, clay, emulsifier)
– Objects (gun, house, painting, statue, arm)
Synonyms
Links to ‘associated’ terms
Access: lexical string match; hierarchical view
And to real world problems
The Coding of Chocolate
An international conversion guide
?
SNOMED-CT   Term         CTV3
C-F0811     Bounty bar   UbOVv
C-F0816     Crème egg    UbOW2
C-F0817     Kit Kat      UbOW3
C-F0819     Mars Bar     UbOW4
C-F081A     Milky Way    UbOW5
C-F081B     Smarties     UbOW6
C-F081C     Twix         UbOW7
C-F0058     Snicker      Ub1pT
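A minimal sketch of what such a "conversion guide" amounts to in code, using the code pairs listed on the slide above. The row pairings are read off the flattened slide text, so treat them as illustrative rather than authoritative; the `convert` helper is a hypothetical name.

```python
# Rows as read from the slide: (SNOMED-CT code, term, CTV3 code).
# The pairings are an assumption reconstructed from the slide layout.
ROWS = [
    ("C-F0811", "Bounty bar", "UbOVv"),
    ("C-F0816", "Crème egg", "UbOW2"),
    ("C-F0817", "Kit Kat", "UbOW3"),
    ("C-F0819", "Mars Bar", "UbOW4"),
    ("C-F081A", "Milky Way", "UbOW5"),
    ("C-F081B", "Smarties", "UbOW6"),
    ("C-F081C", "Twix", "UbOW7"),
    ("C-F0058", "Snicker", "Ub1pT"),
]

SNOMED_TO_CTV3 = {snomed: ctv3 for snomed, _term, ctv3 in ROWS}
SNOMED_TO_TERM = {snomed: term for snomed, term, _ctv3 in ROWS}


def convert(snomed_code):
    """Return the CTV3 code paired with a SNOMED-CT code, or None if unmapped."""
    return SNOMED_TO_CTV3.get(snomed_code)


print(convert("C-F0819"), SNOMED_TO_TERM["C-F0819"])
```

A lookup like this translates codes, not meanings: it cannot tell whether the same term names the same thing in both places.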
Technology is improving
• Understanding of the Web stack is improving
• OWL is improving
– OWL 1.1
• but we are just beginning to learn how to use it
• Tools are improving
– Protege4, NEON, ...
• Applications are happening
– In Bioinformatics
– In Health Informatics
• Moore’s law is coming to the rescue
– We are crossing a critical threshold
... but for human issues we are just starting: almost ready to ask the important questions
Challenges
• Understanding problems
– From users’ perspective
– From value perspective
• Matching solutions to problems
– Solutions exist to solve problems
• ... solution designers exist to make better solutions
• Understanding misunderstandings
– Not trying to do the impossible
• Chocolate bars on the two sides of the Atlantic are different
• Improving the technology
– The dog just barely walks on its hind legs
• So
– What can we do?
– What’s it good for?
– Is it useful and usable?