tokyo_1 - Mitsu Okada laboratory, Keio University.

Ontology and Its Applications
Barry Smith
http://ontologist.com
OVERVIEW
Part I: A Brief Overview of Developments in
Ontology at the Borderlines of Philosophy
and Computation
Part II: Ontology and Biomedical Informatics
www.ifomis.org
IFOMIS
now part of
European Centre for Ontological Research,
Saarbrücken, Germany
Institute for Formal Ontology and
Medical Information Science
16 staff
2 medical informaticians
1 neurologist
1 chemist
1 radiologist
2 computer scientists
9 philosophers
The problem
Different communities of researchers use
different and often incompatible concepts /
categories in expressing the results of
their work
Example: Medicine
blood is a tissue
blood is a body fluid
How to integrate competing conceptualizations?
Example: Molecular Biology
GDB
Genome Database of the Human Genome Project
GenBank
National Center for Biotechnology Information,
Bethesda, Maryland
What is a gene?
GDB: a gene is a DNA fragment that
can be transcribed and translated into
a protein
GenBank: a gene is a DNA region of
biological interest with a name and
that carries a genetic trait or
phenotype
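
The two definitions above can be read as membership tests, and doing so makes the mismatch concrete: they pick out different sets of DNA segments. The sketch below is illustrative only. The `DnaSegment` record, its fields and both example segments are hypothetical and are not taken from GDB or GenBank.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DnaSegment:
    # Hypothetical record; the fields paraphrase the two definitions above.
    name: Optional[str]
    transcribed: bool
    translated_to_protein: bool
    of_biological_interest: bool
    carries_trait: bool


def is_gene_gdb(seg: DnaSegment) -> bool:
    # GDB reading: a fragment that can be transcribed and translated into a protein.
    return seg.transcribed and seg.translated_to_protein


def is_gene_genbank(seg: DnaSegment) -> bool:
    # GenBank reading: a named region of biological interest carrying a trait or phenotype.
    return seg.name is not None and seg.of_biological_interest and seg.carries_trait


# Hypothetical: named and trait-bearing, transcribed into RNA but never translated.
rna_only = DnaSegment("hypothetical-rna-locus", True, False, True, True)
# Hypothetical: codes for a protein but has no name and no known trait.
unnamed_orf = DnaSegment(None, True, True, False, False)

for seg in (rna_only, unnamed_orf):
    print(seg.name, is_gene_gdb(seg), is_gene_genbank(seg))
```

Each segment counts as a gene under exactly one of the two definitions. Records from the two databases therefore cannot simply be merged under a shared "gene" label.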
How to integrate competing
conceptualizations
for example across the granular divide
between medicine and molecular biology?
Answer:
ONTOLOGY!
But what does “ontology” mean?
Three senses of ‘ontology’
1. Philosophical sense:
Aristotle: an inventory of the types of entities and
relations in reality
Quine: an inventory of ontological commitments
2. Knowledge engineering sense: an ontology as
a consensus representation of the concepts
used in a given domain
3. Gene Ontology sense: a controlled vocabulary
for database annotation / indexing
Two Communities
Reference Ontology Community: An ontology is
an inventory of the types of entities and
relations which exist in a given domain of
reality
KR Community: an ontology is a consensus
representation of the concepts used in a
given domain of discourse
“Ontology” as used in KR / AI
had its roots in Quine’s doctrine of
ontological commitment and in the
‘internal metaphysics’ of
Carnap/Putnam
Quineanism:
ontology is the study of the ontological
commitments or presuppositions
embodied in scientific theories
(or in the beliefs of those experts,
or in the databases of that company)
Quineanism, too, faces the
integration problem
If an ontology is the set of ontological
commitments of a theory
how can we cope with questions pertaining to
the relations between the objects to which
different theories are committed?
Quine can tell us what there is
but can he tell us how these things are related to one another?
The problem of the unity of science
The logical positivist solution to this problem
addressed a world in which sciences are
identified with
printed texts
What if sciences are identified with
information systems
or with
the contents of websites?
The Semantic Web Initiative
The Web is a vast edifice of
heterogeneous data sources
Needs the ability to query and integrate
across different and often incompatible
conceptual systems
How can we resolve such incompatibilities and
make the various parts of the web
interoperable?
Enforce conceptual compatibility via
standardized taxonomies applied to websites
as meta-tags formulated within the framework
of a common web language like OWL
Tim Berners-Lee:
hyperlinked vocabularies, called
‘ontologies’, will be used by Web authors
‘to explicitly define their words and
concepts as they post their stuff online.’
‘codes would let software "agents" analyze
the Web on our behalf, making smart
inferences that go far beyond the simple
linguistic analyses performed by today's
search engines.’
A new silver bullet
Metadata in Web commerce
agree on a metadata standard for washing
machines as concerns size, price, etc.
create machine-readable databases and
put them on the net
so consumers can query multiple sites
simultaneously
and search for highly specific, reliable,
context-sensitive results
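
The commerce scenario works only if every site publishes the same fields in the same units. Below is a minimal sketch that assumes a hypothetical shared schema (`WashingMachine` with capacity and price) and three invented sites under `example` domains.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class WashingMachine:
    # Hypothetical agreed-upon metadata: same field names and units on every site.
    site: str
    model: str
    capacity_kg: float
    price_eur: float


# Hypothetical catalogues published by three independent sites.
SITES: Dict[str, List[WashingMachine]] = {
    "shop-a.example": [WashingMachine("shop-a.example", "A100", 7.0, 399.0),
                       WashingMachine("shop-a.example", "A200", 9.0, 549.0)],
    "shop-b.example": [WashingMachine("shop-b.example", "B7", 8.0, 479.0)],
    "shop-c.example": [WashingMachine("shop-c.example", "C-Max", 10.0, 699.0)],
}


def query_all(min_capacity_kg: float, max_price_eur: float) -> List[WashingMachine]:
    """Query every site at once; meaningful only because all sites share the schema."""
    hits = [m for catalogue in SITES.values() for m in catalogue
            if m.capacity_kg >= min_capacity_kg and m.price_eur <= max_price_eur]
    return sorted(hits, key=lambda m: m.price_eur)


for m in query_all(min_capacity_kg=8.0, max_price_eur=600.0):
    print(m.site, m.model, m.price_eur)
```

The query is only as good as the agreement behind the schema. A site that reports capacity in pounds, or mislabels its prices, silently corrupts the result.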
Metadata in science
agree on metadata standards for
molecules (genes, proteins, drugs), clinical
phenomena, therapies ...
create machine-readable databases and
put them on the net
so biomedical researchers can query
multiple sites simultaneously
and search for highly specific, reliable,
context-sensitive results
A world of exhaustive,
reliable metadata
would be utopia
(Cory Doctorow)
Problem 1: People lie
Cheating in assigning meta-tags can
confer benefits to the cheaters
Metadata exists in a competitive world.
Some people are crooks.
Some people are cranks.
Semantic Web effort
thus far devoted primarily to developing
systems for standardized representation of
web pages and web processes
(= ontology of web typography)
not to the harder task of developing
ontologies
(reliable taxonomies, term hierarchies)
for the content of such web pages
Problem 2: People are lazy
Half the pages on GeoCities are called
“Please title this page”
Problem 3: People are stupid
The vast majority of the Internet's users
(even those who are native speakers of
English)
cannot spell or punctuate
Will internet users learn to accurately tag
their information with whatever taxonomy
and syntax they're supposed to be using?
even with correct XML syntax:
<BUSINESS-CARD>
<FIRSTNAME>Jules</FIRSTNAME>
<LASTNAME>Deryck</LASTNAME>
<COMPANY>Newco</COMPANY>
<MEMBEROF>XTC Group</MEMBEROF>
<JOBTITLE>Business Manager</JOBTITLE>
<TEL>+32(0)3.471.99.60</TEL>
<FAX>+32(0)3.891.99.65</FAX>
<GSM>+32(0)465.23.04.34</GSM>
<WEBSITE>www.newco.com</WEBSITE>
<ADDRESS>
<STREET>Dendersesteenweg 17</STREET>

errors still abound
Is "Jules" the first name of the person, or of the business card?

errors still abound
Is Jules or Newco the member of XTC Group?
<BUSINESS-CARD>
<FIRSTNAME>Jules</FIRSTNAME>
<LASTNAME>Deryck</LASTNAME>
<COMPANY>Newco</COMPANY>
<MEMBEROF>XTC Group</MEMBEROF>
<JOBTITLE>Business Manager</JOBTITLE>
<TEL>+32(0)3.471.99.60</TEL>
<FAX>+32(0)3.891.99.65</FAX>
<GSM>+32(0)465.23.04.34</GSM>
<WEBSITE>www.newco.com</WEBSITE>
<ADDRESS>
<STREET>Dendersesteenweg 17</STREET>
<ZIP>2630</ZIP>
<CITY>Aartselaar</CITY>
<COUNTRY>Belgium</COUNTRY>
</ADDRESS>
</BUSINESS-CARD>

errors still abound
Do the phone numbers and address belong to Jules or to the business?

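The ambiguity raised in the questions above can be checked mechanically. A parser accepts the card, but nothing in the resulting tree says whose telephone number it is. The sketch below uses Python's standard `xml.etree.ElementTree`. The nested `PERSON`/`ORGANIZATION` layout is a hypothetical restructuring, not a published business-card schema, and it settles ownership only because its author decided the answers in advance.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Abridged copy of the flat card above: well-formed, but ownership is only implied.
FLAT = """<BUSINESS-CARD>
  <FIRSTNAME>Jules</FIRSTNAME><LASTNAME>Deryck</LASTNAME>
  <COMPANY>Newco</COMPANY><MEMBEROF>XTC Group</MEMBEROF>
  <TEL>+32(0)3.471.99.60</TEL>
</BUSINESS-CARD>"""

# Hypothetical restructuring in which each field sits inside the entity it describes.
NESTED = """<BUSINESS-CARD>
  <PERSON>
    <FIRSTNAME>Jules</FIRSTNAME><LASTNAME>Deryck</LASTNAME>
    <EMPLOYER>Newco</EMPLOYER>
  </PERSON>
  <ORGANIZATION>
    <NAME>Newco</NAME><MEMBEROF>XTC Group</MEMBEROF>
    <TEL>+32(0)3.471.99.60</TEL>
  </ORGANIZATION>
</BUSINESS-CARD>"""


def owner_of(xml_text: str, tag: str) -> Optional[str]:
    """Return the tag of the first element (in document order) with a direct `tag` child."""
    root = ET.fromstring(xml_text)
    for parent in root.iter():  # document order, starting with the root itself
        for child in parent:
            if child.tag == tag:
                return parent.tag
    return None


print(owner_of(FLAT, "TEL"))         # BUSINESS-CARD: person or company? The markup cannot say.
print(owner_of(NESTED, "TEL"))       # ORGANIZATION
print(owner_of(NESTED, "MEMBEROF"))  # ORGANIZATION
```

Only the nested version answers the questions, and only because whoever wrote it already knew the answers. The syntax itself contributed nothing.
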
Problem 4: Building good
ontologies/standardized
taxonomies is very difficult
and the constraints imposed by OWL and
similar languages make the job even
harder
Problem 5: Ontology Impedance
= semantic mismatch between ontologies
‘gene’ used in websites issued by
biotech companies involved in gene
patenting
medical researchers interested in role of
genes in predisposition to smoking
insurance companies
Problem 6: The Concept Orientation
Tom Gruber: An ontology is a specification of
a conceptualization
Semantic Web: specify Tom’s, and Dick’s,
and Harry’s conceptualizations carefully,
ensure that all are formulated in a common
(XML-based) syntax
Presto: conceptualizations will somehow
become integrated
even a world of
exhaustive, reliable
metadata
would not solve the problem of
integration
expressing different systems of
concepts
in a common syntactic environment
does not resolve conceptual
incompatibilities
different conceptualizations
need not interconnect at all
we cannot make incompatible
terminology-systems interconnect
just by looking at concepts,
or knowledge or language
to decide which of a plurality of
competing conceptualizations to accept
we need some tertium quid
we need, in other words,
to take the world itself into account
Compare the way biologists resolve
disagreements as to whether they
mean the same thing by different
words:
by pointing to the objects in their lab
The Semantic Web
is a machine for creating syllogisms (Clay Shirky)
Humans are mortal
Greeks are human
Therefore, Greeks are mortal
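
Mechanically, the syllogism above amounts to the transitivity of "every A is a B". Below is a minimal sketch of that single inference rule applied to a set of facts. It is not a description of how any particular Semantic Web reasoner is implemented.

```python
from typing import Set, Tuple

# Each pair (A, B) reads "every A is a B".
FACTS: Set[Tuple[str, str]] = {("Greeks", "humans"), ("humans", "mortals")}


def entailed(facts: Set[Tuple[str, str]]) -> Set[Tuple[str, str]]:
    """Close the facts under: every A is a B, every B is a C, so every A is a C."""
    closed = set(facts)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closed):
            for (c, d) in list(closed):
                if b == c and (a, d) not in closed:
                    closed.add((a, d))
                    changed = True
    return closed


print(("Greeks", "mortals") in entailed(FACTS))  # True
```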
Lewis Carroll
No interesting poems are unpopular among
people of real taste
No modern poetry is free from affectation
All your poems are on the subject of soapbubbles
No affected poetry is popular among people
of real taste
No ancient poetry is on the subject of soapbubbles
Therefore: All your poems are bad.
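
Carroll's puzzle yields to the same kind of chaining once each premise is read as an implication about a single poem and its contrapositive is added. The sketch assumes one premise that Carroll leaves implicit: every poem is either ancient or modern. It also abbreviates "popular among people of real taste" as `popular_with_taste`. Strictly, the chain ends in "not interesting", which the slide renders as "bad".

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Lit = Tuple[str, bool]  # (property, value) asserted of one poem

PREMISES: List[Tuple[Lit, Lit]] = [
    (("interesting", True), ("popular_with_taste", True)),  # no interesting poems are unpopular
    (("modern", True), ("affected", True)),                 # no modern poetry is free from affectation
    (("yours", True), ("about_soapbubbles", True)),         # all your poems are on soap-bubbles
    (("affected", True), ("popular_with_taste", False)),    # no affected poetry is popular
    (("ancient", True), ("about_soapbubbles", False)),      # no ancient poetry is on soap-bubbles
    (("ancient", False), ("modern", True)),                 # assumed: every poem is ancient or modern
]


def negate(lit: Lit) -> Lit:
    return (lit[0], not lit[1])


def consequences(start: Lit, premises: List[Tuple[Lit, Lit]]) -> Set[Lit]:
    """Breadth-first chaining over each premise and its contrapositive."""
    rules: Dict[Lit, List[Lit]] = {}
    for a, b in premises:
        rules.setdefault(a, []).append(b)
        rules.setdefault(negate(b), []).append(negate(a))  # contrapositive
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in rules.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


print(("interesting", False) in consequences(("yours", True), PREMISES))  # True
```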
the promise of the Semantic Web
it will improve all the areas of your life where
you currently use syllogisms
Semantic Web
compatibility problems should be solved
automatically
(by machine)
Hence ontologies must be applications
running in real time
Semantic Web methodology
Get syntax right first
(Conceptualism; weak expressive resources;
weak Description Logics – to ensure
computational tractability)
and integration of ‘concepts’ will take care of
itself
but only at the price of Procrustean
simplification
IFOMIS methodology
Get ontology right first
(use powerful logic to develop ontology as
theory of reality
and solve tractability problems later)
only thus will we have some hope of
genuine integration across different
disciplines and data resources
Belnap
“it is a good thing logicians were around
before computer scientists;
if computer scientists had got there first,
then we wouldn’t have numbers
because arithmetic is undecidable”
It is a good thing
philosophical ontology was around before
Description Logics, because otherwise
we would have only hierarchies of
concepts together with abstract
mathematical models
and no universals or instances in reality…
Recall:
GDB: a gene is a DNA fragment that
can be transcribed and translated into
a protein
GenBank: a gene is a DNA region of
biological interest with a name and
that carries a genetic trait or
phenotype
Ontology
‘fragment’, ‘region’, ‘name’, ‘carry’, ‘trait’,
‘type’
... ‘part’, ‘whole’, ‘function’, ‘inhere’,
‘substance’ …
are ontological terms in the sense of
traditional (philosophical) ontology
The idea of a reference ontology
a theory of the kinds of entities existing in
reality and of the relations between them
The Reference Ontology Community
IFOMIS (Saarbrücken)
Laboratories for Applied Ontology
(Trento/Rome, Turin)
Ontology Works (Baltimore)
Department of Biological Structure (Seattle)
Medical Ontology Research (Bethesda)
The Gene Ontology / Open Biological
Ontologies Consortium
IFOMIS’s long-term goal
Build a robust high-level reference
ontology
THE WORLD’S FIRST
INDUSTRIAL-STRENGTH
PHILOSOPHY
as the basis for an ontologically coherent
unification of biomedical knowledge and
terminology
Two upper-level reference ontologies
BFO (Saarbrücken) – Basic Formal
Ontology
DOLCE (Trento/Rome)
Aristotle
First ontologist
Edmund Husserl
Formal Ontology
term coined by Husserl
= the theory of those ontological structures
such as part-whole, universal-particular
which apply to all domains whatsoever
Husserl’s
Logical Investigations, 1900/01
–Aristotelian theory of universals and
particulars
–theory of part and whole
–theory of ontological dependence
–the theory of boundaries and fusion
Formal Ontology
contrasted with material or regional ontologies
(compare relation between pure and applied
mathematics)
Husserl’s idea:
If we can build a good formal ontology, this should
save time and effort in building reference
ontologies for each successive material domain
In formal ontology
as in formal logic, we can grasp the
properties of given structures in such
a way as to establish in one go the
properties of all formally similar
structures
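
The point about formal ontology has a small computational analogue. A test for a formal property, such as being a partial order, is written once without reference to what the related items are. It then applies unchanged to every structure of that form. Below is a minimal sketch. Treating part_of as reflexive and using a four-part anatomical chain are simplifying assumptions made for illustration.

```python
from itertools import product
from typing import Hashable, Iterable, Set, Tuple

Pair = Tuple[Hashable, Hashable]


def is_partial_order(domain: Iterable[Hashable], rel: Set[Pair]) -> bool:
    """Formal test, indifferent to what the elements are: reflexive, antisymmetric, transitive."""
    items = list(domain)
    reflexive = all((x, x) in rel for x in items)
    antisymmetric = all(not ((x, y) in rel and (y, x) in rel) or x == y
                        for x, y in product(items, repeat=2))
    transitive = all((x, z) in rel
                     for x, y, z in product(items, repeat=3)
                     if (x, y) in rel and (y, z) in rel)
    return reflexive and antisymmetric and transitive


# Structure 1 (pure mathematics): divisibility on {1, 2, 4, 8}.
nums = [1, 2, 4, 8]
divides = {(a, b) for a in nums for b in nums if b % a == 0}

# Structure 2 (an applied domain): reflexive part_of on a simplified anatomical chain.
parts = ["mesothelium of pleura", "visceral pleura", "pleura", "pleural sac"]
part_of = {(parts[i], parts[j]) for i in range(len(parts)) for j in range(i, len(parts))}

print(is_partial_order(nums, divides), is_partial_order(parts, part_of))  # True True
```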
Compare:
1) pure mathematics (theories of structures
such as order, set, function, mapping)
employed in every domain
2) applied mathematics, applications of
these theories = re-using the same
definitions, theorems, proofs in new
application domains
3) physical chemistry, biophysics, etc. =
adding detail
Three levels of ontology
1) formal (top-level) ontology = ?????
biomedical ontology has nothing like the
technology of definitions, theorems and
proofs provided by pure mathematics
2) domain ontology
= UMLS Semantic Network, GO, GALEN CORE
3) terminology-based ontology
= UMLS, SNOMED-CT, GALEN, FMA
The Concept Orientation
An ontology is a consensus
representation of concepts
‘concept’ runs together:
a) meaning shared in common by
synonymous terms
b) idea shared in common in the minds of
those who use these terms
c) universal, type, feature or property
shared in common by entities in the world
There are more word meanings
than there are universals / types of
entities in reality
unicorn
devil
canceled workshop
prevented pregnancy
imagined mammal
fractured lip ...

[Diagram: the space of word meanings compared with the space of universals]

if ontological relations are defined
across the whole space of word
meanings
rather than across the space of universals
instantiated in reality
then our tools for dealing with such relations
are blunted
meningitis is_a disease of
the nervous system
is a statement about universals
in reality
A is_a B =def.
‘A’ is narrower in meaning than ‘B’
unicorn is_a one-horned mammal
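
Toy data make the gap between the two readings of is_a concrete. In the sketch below, the feature sets, the instance labels and both functions are hypothetical. The "reality" test also simplifies matters by treating a universal as nothing more than the set of its instances, which real reference ontologies do not do.

```python
# Hypothetical toy data: each term's meaning as a set of defining features ...
MEANING = {
    "unicorn": {"mammal", "one-horned", "horse-like"},
    "one-horned mammal": {"mammal", "one-horned"},
    "Indian rhinoceros": {"mammal", "one-horned", "thick-skinned"},
}
# ... and the particulars each term actually applies to (made-up labels).
INSTANCES = {
    "unicorn": set(),  # nothing in reality is a unicorn
    "one-horned mammal": {"rhino-1", "rhino-2"},
    "Indian rhinoceros": {"rhino-1", "rhino-2"},
}


def is_a_linguistic(a: str, b: str) -> bool:
    # 'A' is narrower in meaning than 'B': A's features include all of B's.
    return MEANING[a] >= MEANING[b]


def is_a_reality(a: str, b: str) -> bool:
    # Both terms must have instances, and every instance of A must be an instance of B.
    return bool(INSTANCES[a]) and bool(INSTANCES[b]) and INSTANCES[a] <= INSTANCES[b]


print(is_a_linguistic("unicorn", "one-horned mammal"))          # True
print(is_a_reality("unicorn", "one-horned mammal"))             # False
print(is_a_reality("Indian rhinoceros", "one-horned mammal"))   # True
```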
The linguistic reading of ‘concept’
yields a smudgy view of reality, built out of
relations like:
‘synonymous_with’
‘associated_to’
Fruit SimilarTo Vegetable
Orange NarrowerThan Fruit
Orange SynonymWith Apfelsine
(Goble & Shadbolt)

The concept-based approach
can provide some half-way coherent
treatment of is_a relations
but it can’t cope at all with relations
like
part_of =def. composes, with one or more
other physical units, some larger whole
contains =def. is the receptacle for fluids or
other substances
connected_to =def.
Directly attached to another
physical unit as tendons are
connected to muscles.
How can a meaning or concept
be directly attached to another
physical unit as tendons are
connected to muscles ?
An example of the concept
orientation
Unified Medical Language System
(UMLS)
UMLS Metathesaurus:
1 million biomedical concepts
2.8 million concept names
from more than 100 controlled vocabularies
and classifications
built by US National Library of Medicine
UMLS Source Vocabularies
MeSH – Medical Subject Headings
…
ICD International Classification of Diseases
…
GO – Gene Ontology
…
FMA – Foundational Model of Anatomy
…
To reap the benefits of standardization
we need to make ONE SYSTEM out of
many different terminologies
=
UMLS “Semantic Network”
nearest thing to an “ontology” in the UMLS
UMLS SN
described by its authors as “An
Upper Level Ontology for the
Biomedical Domain”
(Compare the Semantic Web
initiative)
UMLS SN
134 Semantic Types
54 types of edges (relations)
yielding a graph containing more than 6,000
edges

[Figure: Fragment of UMLS SN]

UMLS SN Top Level
entity
  physical object
    organism
  conceptual entity
event
conceptual entity
Organism Attribute
Finding
Idea or Concept
Occupation or Discipline
Organization
Group
Group Attribute
Intellectual Product
Language
conceptual entity
  idea or concept
    functional concept
      body system
entity
  physical object
  conceptual entity
    idea or concept
      functional concept
        body system   (confusion of entity and concept)
Functional Concept:
Body system is_a Functional Concept.
but:
Concepts do not perform functions or have
physical parts.
This: [image] is not a concept
Confusion of Ontology and Epistemology
Physical Object
  Substance
    Food
    Chemical
    Body Substance

Confusion of Ontology and Epistemology
Chemical
  Chemical Viewed Structurally
  Chemical Viewed Functionally

Chemical
  Chemical Viewed Structurally
    Inorganic Chemical
    Organic Chemical
  Chemical Viewed Functionally
    Enzyme
    Biomedical or Dental Material

The Hydraulic Equation
BP = CO*PVR
arterial blood pressure is directly
proportional to the product of blood flow
(cardiac output, CO) and peripheral
vascular resistance (PVR)
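
As a worked example of the relation as stated on the slide, the sketch below multiplies cardiac output by peripheral vascular resistance. The units are assumptions chosen for illustration: CO in L/min and PVR in mmHg·min/L, so that the product is in mmHg. The input values are also illustrative. This simplified form ignores venous pressure.

```python
def arterial_pressure_mmhg(cardiac_output_l_min: float, pvr_mmhg_min_per_l: float) -> float:
    """BP = CO * PVR; L/min times mmHg·min/L gives mmHg."""
    return cardiac_output_l_min * pvr_mmhg_min_per_l


bp = arterial_pressure_mmhg(5.0, 18.0)            # illustrative resting values
bp_doubled_co = arterial_pressure_mmhg(10.0, 18.0)
print(bp, bp_doubled_co)  # 90.0 180.0
```

Doubling CO at constant PVR doubles BP, which is what "directly proportional" asserts. Every term in the equation is a measurable physical quantity.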
Confusion of Ontology and Epistemology
blood pressure is an Organism Function,
cardiac output is a Laboratory or Test Result
or Diagnostic Procedure
BP = CO*PVR thus asserts that
blood pressure is proportional either to a
laboratory or test result or to a diagnostic
procedure

[Figure: Fragment of UMLS SN]

UMLS Semantic Network
anatomical abnormality associated_with
daily or recreational activity
educational activity associated_with
pathologic function
bacterium causes experimental model of
disease
GO: the Gene Ontology
3 large telephone directories of
standardized designations for gene
functions and products
organized into hierarchies via is_a and
part_of
When a gene is identified
three important types of questions need to
be addressed:
1. Where is it located in the cell?
2. What functions does it have on the
molecular level?
3. To what biological processes do these
functions contribute?
GO’s three ontologies
biological processes
molecular functions
cellular components
GO is three ontologies
cellular components
molecular functions
biological processes
December 16, 2003:
1372 component terms
7271 function terms
8069 process terms
The Cellular Component
Ontology (counterpart of anatomy)
flagellum
chromosome
membrane
cell wall
nucleus
The Molecular Function Ontology
ice nucleation
protein stabilization
kinase activity
binding
The Molecular Function ontology is
(roughly) an ontology of actions on the
molecular level of granularity
Biological Process Ontology
Examples:
glycolysis
death
adult walking behavior
response to blue light
= occurrents on the level of granularity of
cells, organs and whole organisms
Each of GO’s ontologies
is organized in a graph-theoretical
structure involving two sorts of links or
edges:
is-a (= is a subtype of)
(copulation is-a biological process)
part-of
(cell wall part-of cell)
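
A GO-style graph with both kinds of edge can be sketched as a list of (child, relation, parent) triples. The `copulation` and `cell wall` edges come from the slide. The `nuclear chromosome` edges are a simplified illustration rather than a quotation of GO. Following both is_a and part_of upward is how annotations to a specific term are usually propagated to more general ones. Treat the traversal here as a sketch, not as GO's own tooling.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# (child, relation, parent) triples. The first two come from the slide;
# the nuclear chromosome edges are a simplified illustration.
EDGES: List[Tuple[str, str, str]] = [
    ("copulation", "is_a", "biological process"),
    ("cell wall", "part_of", "cell"),
    ("nuclear chromosome", "is_a", "chromosome"),
    ("nuclear chromosome", "part_of", "nucleus"),
]

PARENTS: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
for child, relation, parent in EDGES:
    PARENTS[child].append((relation, parent))


def ancestors(term: str) -> Set[str]:
    """Every term reachable from `term` by following is_a or part_of edges upward."""
    found: Set[str] = set()
    stack = [term]
    while stack:
        for _relation, parent in PARENTS[stack.pop()]:
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


print(sorted(ancestors("nuclear chromosome")))  # ['chromosome', 'nucleus']
```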
GO is species-independent
an ontology of the unchanging
universal building blocks of life
(substances and processes)
and of the structures they form
The Gene Ontology
error prone
in part because of its sloppy treatment of
relations
menopause part_of death
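
Errors like this one can be caught mechanically if relations carry explicit constraints. The sketch assumes one such constraint: a process can be part of another only if it unfolds within the other's time span. The `Process` type, the `life` process and all timings are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Process:
    # Hypothetical representation: a named process with start and end times (years of age).
    name: str
    start: float
    end: float


def can_be_part_of(part: Process, whole: Process) -> bool:
    """Assumed necessary condition: a process part unfolds within the whole's time span."""
    return whole.start <= part.start and part.end <= whole.end


# Made-up timings for a single organism.
menopause = Process("menopause", 48.0, 52.0)
death = Process("death", 80.0, 80.01)
life = Process("life", 0.0, 80.01)

print(can_be_part_of(menopause, death))  # False: the asserted part_of edge fails the check
print(can_be_part_of(menopause, life))   # True
```

The check is necessary, not sufficient. Passing it does not establish a part_of edge, but failing it rules the edge out.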
Primary aim of GO
not rigorous definition and principled
classification
but rather: providing a practically useful
framework for keeping track of the biological
annotations that are applied to gene products
Problems with GO Molecular
Functions
anti-coagulant activity (defined as: “a
substance that retards or prevents
coagulation”)
enzyme activity (defined as: “a substance
that catalyzes”)
structural molecule (defined as: “the action
of a molecule that contributes to structural
integrity”)
GO:0005199: structural
constituent of cell wall
Definition: The action of a molecule that
contributes to the structural integrity of a
cell wall.
confuses actions, which GO includes in its
function ontology, with constituents, which
GO includes in its cellular component
ontology
cars
red cars
Cadillacs
cars with radios

Why do these problems arise?
Because GO has no clear formal
understanding of the role of relations in
organizing an ontology
(thus also no clear understanding of the
difference between a function and the
activity which is the realization of a
function – GO runs these two together)
Thesis
GO can realize its goal more adequately
(and avoid many coding errors) by taking
ontology (especially the logic of
classifications and definitions) seriously
Digital Anatomist
Foundational Model of Anatomy
(Department of Biological Structure,
University of Washington, Seattle)
The first crack
in the wall of
the Concept Orientation
[Diagram: fragment of the FMA is_a hierarchy, with nodes Anatomical Structure, Anatomical Space, Organ, Organ Part, Organ Component, Organ Subdivision, Organ Cavity, Organ Cavity Subdivision, Serous Sac, Serous Sac Cavity, Serous Sac Cavity Subdivision, Pleural Sac, Pleural Cavity, Interlobar Recess, Pleura (Wall of Sac), Parietal Pleura, Visceral Pleura, Mediastinal Pleura, Tissue, Mesothelium of Pleura]

[Diagram: the pleura hierarchy (Pleural Sac, Pleural Cavity, Pleura (Wall of Sac), Parietal, Visceral and Mediastinal Pleura, Interlobar Recess, Mesothelium of Pleura) extended down through Tissue, Cell and Organelle]
Reference Ontology
for Anatomy at every
level of granularity

The Gene Ontology
The second crack
in the wall
European Bioinformatics Institute, ...
Open source
Transgranular
Cross-Species
Components, Processes, Functions

But:
No logical structure
Viciously circular definitions
Poor rules for coding, definitions,
treatment of relations, classifications
so highly error-prone
New GO / OBO Reform Effort
OBO = Open Biological Ontologies
OBO Library
Gene Ontology
MGED Ontology
Cell Ontology
Disease Ontology
Sequence Ontology
Fungal Ontology
Plant Ontology
Mouse Anatomy Ontology
Mouse Development Ontology
...
coupled with
Relations Ontology (IFOMIS)
suite of relations for biomedical ontology to
be submitted to CEN as basis for
standardization of biomedical ontologies
+ alignment of FMA and GALEN
END