RO_Dusseldorf_Sep200.. - Buffalo Ontology Site

The Relation Ontology
Barry Smith
1
Concepts, Types and Frames
Concepts, Frames
Types, Relational Structures
2
Concepts, Types and Frames
Linguistic Approach: Concepts, Frames
Scientific Approach: Types, Relational Structures
3
[Figure: TLR-2 signalling pathway. Node labels: TLR2:MyD88 complex; TLR2-MyD88 binding; TIR-TIR binding; TLR2; MyD88; LTA binding; TLR2-TLR2 ligand binding; TIR domain; process. Relation labels: has_disposition, has_participant, has_part, has_lower_level_granularity]
4
how to define relations such as this?
5
Uses of ‘ontology’ in PubMed abstracts
6
By far the most successful: The Gene Ontology
How to do biology across the genome?
[Slide graphic: a raw protein amino-acid sequence fills the slide]
9
what cellular component?
what molecular function?
what biological process?
10
GO used in curation of literature
what cellular component?
what molecular function?
what biological process?
11
and in integration of databases
GlyProt
MouseEcotope
sphingolipid transporter activity
DiabetInGene
GluChem
12
The GO Idea
GlyProt
MouseEcotope
Holliday junction helicase complex
DiabetInGene
GluChem
13
The GO Idea
GlyProt
MouseEcotope
sphingolipid transporter activity
DiabetInGene
GluChem
14
GO used in reasoning
Clark et al., 2005
is_a
part_of
15
GO provides a controlled system of
representations for use in
annotating data
• multi-species
• multi-disciplinary
• multi-granularity, from molecules to population
18
Gene products involved in cardiac muscle development in humans
19
$100 million invested in literature
curation using GO
over 11 million annotations
relating gene products described
in the UniProt, Ensembl and other
databases to terms in the GO
20
GO allows a new kind of biological
research
based on analysis and
comparison of the massive
quantities of annotations linking
GO terms to the gene products
described in scientific literature
and in scientific databases
21
GO is amazingly successful in
overcoming data silo problems
but it covers only
– cellular components
– molecular functions
– biological processes
22
The Open Biomedical Ontologies (OBO) Foundry

RELATION TO TIME: CONTINUANT (INDEPENDENT, DEPENDENT) and OCCURRENT

GRANULARITY: ORGAN AND ORGANISM
  Continuant, independent: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
  Continuant, dependent: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
  Occurrent: Biological Process (GO)

GRANULARITY: CELL AND CELLULAR COMPONENT
  Continuant, independent: Cell (CL); Cellular Component (FMA, GO)
  Continuant, dependent: Cellular Function (GO)

GRANULARITY: MOLECULE
  Continuant, independent: Molecule (ChEBI, SO, RnaO, PrO)
  Continuant, dependent: Molecular Function (GO)
  Occurrent: Molecular Process (GO)
23
The OBO Foundry
– to extend the GO to enable intelligent
integration of gigantic bodies of
heterogeneous data across the entire
domain of the life sciences, including
clinical medicine
– to create an evolving, map-like,
computable representation of the entire
domain of biological and medical reality
24
The OBO Foundry
Initial Candidate Members
– GO Gene Ontology
– CL Cell Ontology
– SO Sequence Ontology
– PATO Phenotype (Quality) Ontology
– FMA Foundational Model of Anatomy
– ChEBI Chemical Entities of Biological Interest
– CARO Common Anatomy Reference Ontology
– PRO Protein Ontology
25
The OBO Foundry
Under development
– Disease Ontology
– Infectious Disease Ontology
– Mammalian Phenotype Ontology
– Plant Trait Ontology
– Environment Ontology
– Ontology for Biomedical Investigations
– Behavior Ontology
– RNA Ontology
– RO Relation Ontology
26
A success story in top-down
information integration
Ontologies configured as extensions
of a single upper level ontology (BFO)
Used by hundreds of researchers to
promote interoperability of
experimental data in scores of high-throughput
domains of biology and
medicine via semantic annotation
27
The linguistic approach
Bottom-up, focused on linguistic
properties manifested by the contents
of a large corpus viewed from a
cognitive perspective
(mapping/modeling meanings or
concepts rather than entities in reality)
28
Automatic mining of
“associations” from MEDLINE
FACTA: Finding Associated Concepts with Text
Analysis
– What diseases are related to a particular chemical?
– What proteins are related to a particular disease?
http://text0.mib.man.ac.uk/software/facta/
29
For the linguistic approach
fiction may be no less important than fact
English has no privileged status (the larger
the corpus, the better)
consistency (and thus additivity) of
annotations is not important, because
cognitive perspectives differ
goal is automatic generation of semantic
annotations via pattern-matching
30
For the scientific approach
factual discourse alone is important
English is the lingua franca
regimentation is allowed
goal of truth: to create a single
computer-processable map of reality
via painstaking Handarbeit (manual work)
truth is one, so we strive for consistency
of annotations
31
The linguistic approach is concerned
with knowledge representation
The scientific approach is concerned
with reality representation
32
OBO Relation Ontology (RO 1.0)
Foundational: is_a, part_of
Spatial: located_in, contained_in, adjacent_to
Temporal: transformation_of, derives_from, preceded_by
Participation: has_participant, has_agent
33
Relation Ontology
supports consistent linkage of OBO
Foundry ontologies through a
common system of formally defined
relations
to enable reasoning both within and
across ontologies, and thus also
within and between the literature
annotated in its terms
34
Relation Ontology
instance_of
is_a (= is a subtype of)
depends_on
part_of
inheres_in
has_input
has_participant
….
http://obofoundry.org/ro/
35
Basic Formal Ontology (BFO)
Continuant
– Independent Continuant
– Dependent Continuant
Occurrent (Process, Event)
http://ifomis.uni-saarland.de/bfo/
36
Fundamental Dichotomy
Continuants preserve their identity through
change
Occurrents (aka processes)
– have temporal parts
– unfold themselves in successive phases
– exist only in their phases
– have all their parts of necessity
37
instance_of
[Diagram: a hierarchy of types (Continuant, with Independent Continuant (thing) and Dependent Continuant (quality); Occurrent (process, event)) drawn above a row of instances, with instance_of linking instances to types]
38
types vs. instances
compare OWL: T-box vs. A-box
(terminology vs. assertions)
39
3 kinds of (binary) relations
Between types
• human is_a mammal
• human heart part_of human
Between an instance and a type
• this human instance_of the type human
• this human allergic_to the type tamiflu
Between instances
• Mary’s heart part_of Mary
• Mary’s aorta connected_to Mary’s heart
40
depends_on
[Diagram: Continuant, with Independent Continuant (thing) and Dependent Continuant (quality); Occurrent (process, event); instances below. Annotation: quality depends on bearer]
41
Dependent continuants
the whiteness quality of this cheese
your role as lecturer
the disposition of this peach to ripen
42
depends_on
[Diagram: Continuant, with Independent Continuant (thing) and Dependent Continuant (quality); Occurrent (process); instances below. Annotation: temperature depends on bearer]
43
depends_on
[Diagram: Continuant, with Independent Continuant (thing) and Dependent Continuant (quality, …); Occurrent (process, event); instances below. Annotation: event depends on participant]
44
Type-level relations presuppose the
underlying instance-level relations
A is_a B =def. A and B are types and all
instances of A are instances of B
A part_of B =def. All instances of A are
instance-level-parts-of some instance
of B
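The two definitions above can be sketched as checks over a finite set of instance-level facts. This is only an illustration, not part of RO: the instance names, the dictionary layout and the function names are all hypothetical, and a check over sample data can only refute a universal claim, never establish it.

```python
from typing import Dict, Set, Tuple

# Hypothetical toy data: each instance mapped to the types it instantiates.
INSTANCE_OF: Dict[str, Set[str]] = {
    "mary": {"human", "mammal"},
    "rex": {"dog", "mammal"},
    "marys_heart": {"human heart"},
}

# Hypothetical instance-level parthood assertions, as (part, whole) pairs.
INSTANCE_PART_OF: Set[Tuple[str, str]] = {("marys_heart", "mary")}


def instances(type_name: str) -> Set[str]:
    """Return all recorded instances of the given type."""
    return {x for x, types in INSTANCE_OF.items() if type_name in types}


def is_a(a: str, b: str) -> bool:
    """A is_a B: all instances of A are instances of B."""
    return instances(a) <= instances(b)


def part_of(a: str, b: str) -> bool:
    """A part_of B: every instance of A is an instance-level part of some instance of B."""
    return all(
        any((x, y) in INSTANCE_PART_OF for y in instances(b))
        for x in instances(a)
    )


print(is_a("human", "mammal"))          # all recorded humans are mammals
print(is_a("mammal", "human"))          # rex is a mammal but not a human
print(part_of("human heart", "human"))  # every recorded heart has a human whole
```

Note that both checks are vacuously true for a type with no recorded instances, which is one reason the sketch is not a substitute for the universal reading of the definitions.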
45
The assertions linking terms in
ontologies must hold universally
Hence all type-level relations in RO
are provided with
All-Some definitions
(For linguists, Some-Some relations
are equally important)
47
Including only All-Some relations means:
All relations evaluable as
1. Transitive
2. Symmetric
3. Reflexive
4. Anti-Symmetric
All relations support logical reasoning
– as contrasted with: is_related_to,
is_associated_with,
is_narrower_in_meaning_than …
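What it means for a relation to be "evaluable" for these four properties can be illustrated on a finite relation stored as a set of pairs. The sketch below is an assumption-laden toy: the instance names and the sample parthood relation are hypothetical, and the helper functions are not taken from any RO tooling.

```python
from typing import Iterable, Set, Tuple

Pair = Tuple[str, str]


def is_transitive(rel: Set[Pair]) -> bool:
    """If (a, b) and (b, d) hold, then (a, d) must hold."""
    return all((a, d) in rel for (a, b) in rel for (c, d) in rel if b == c)


def is_symmetric(rel: Set[Pair]) -> bool:
    """If (a, b) holds, then (b, a) must hold."""
    return all((b, a) in rel for (a, b) in rel)


def is_reflexive(rel: Set[Pair], domain: Iterable[str]) -> bool:
    """Every element of the domain is related to itself."""
    return all((x, x) in rel for x in domain)


def is_antisymmetric(rel: Set[Pair]) -> bool:
    """If (a, b) and (b, a) both hold, then a and b are identical."""
    return all(a == b for (a, b) in rel if (b, a) in rel)


# Hypothetical instance-level parthood facts, including reflexive pairs.
DOMAIN = ["nucleus_1", "cell_1", "organ_1"]
PART_OF: Set[Pair] = {(x, x) for x in DOMAIN} | {
    ("nucleus_1", "cell_1"),
    ("cell_1", "organ_1"),
    ("nucleus_1", "organ_1"),
}

print(is_transitive(PART_OF), is_symmetric(PART_OF),
      is_reflexive(PART_OF, DOMAIN), is_antisymmetric(PART_OF))
```

A vague relation such as is_associated_with offers no definition from which any of these answers could be derived, which is the contrast the slide draws.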
49
Reasoning should be able to cascade from
one relational assertion (A R1 B) to the next
(B R2 C).
A query to find all DNA binding proteins should
also find all transcription factor proteins, because
– Transcription factor is_a DNA binding
protein
Only the All-Some structure guarantees such
cascading of relational assertions
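A minimal sketch of such a cascading query, assuming a small hand-written is_a table and a handful of made-up annotations. The gene product identifiers gp1 to gp3, the extra "protein" parent and the function names are hypothetical and only serve the illustration.

```python
from typing import Dict, Set

# Asserted is_a links between types. The first comes from the slide;
# the second is an assumption added to show a two-step cascade.
IS_A: Dict[str, Set[str]] = {
    "transcription factor": {"DNA binding protein"},
    "DNA binding protein": {"protein"},
}

# Hypothetical annotations: gene product -> the type it is annotated with.
ANNOTATIONS: Dict[str, str] = {
    "gp1": "transcription factor",
    "gp2": "DNA binding protein",
    "gp3": "protein",
}


def ancestors(type_name: str) -> Set[str]:
    """All types reachable from type_name by following is_a links."""
    seen: Set[str] = set()
    stack = [type_name]
    while stack:
        current = stack.pop()
        for parent in IS_A.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def find_all(query_type: str) -> Set[str]:
    """Gene products annotated with query_type or with any of its subtypes."""
    return {
        gp
        for gp, annotated in ANNOTATIONS.items()
        if annotated == query_type or query_type in ancestors(annotated)
    }


print(sorted(find_all("DNA binding protein")))
```

The query for DNA binding proteins returns gp1 as well as gp2 only because is_a is read as holding for all instances of the subtype; a Some-Some reading would not license that step.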
50
Organisms are continuants
they are entities which endure
through time through gain and
loss of parts
Processes are occurrents
they are entities which unfold
through time, and have all their
parts as a matter of necessity
53
human testis part_of adult human
being
but not
human being has_part human testis
and not even
male human being has_part human
testis
54
part_of for continuant types
A part_of B =def.
For all x, t if x instance_of A at t then
there is some y, y instance_of B at t
and x instance_level_part_of y at t
cell membrane part_of cell
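The time-indexed definition can be sketched in the same spirit, with every instance-level fact carrying a time. The instance names m1 and c1, the integer time points and the tuple layout are hypothetical choices for this illustration.

```python
from typing import Set, Tuple

TypeFact = Tuple[str, str, int]   # (instance, type, time)
PartFact = Tuple[str, str, int]   # (part, whole, time)

# Hypothetical facts: one membrane and one cell observed at times 1 and 2.
INSTANCE_OF_AT: Set[TypeFact] = {
    ("m1", "cell membrane", 1),
    ("m1", "cell membrane", 2),
    ("c1", "cell", 1),
    ("c1", "cell", 2),
}
PART_OF_AT: Set[PartFact] = {("m1", "c1", 1), ("m1", "c1", 2)}


def part_of(a: str, b: str,
            types: Set[TypeFact] = INSTANCE_OF_AT,
            parts: Set[PartFact] = PART_OF_AT) -> bool:
    """For all x, t: if x instance_of A at t, then some y is instance_of B at t
    and x is an instance-level part of y at t."""
    for (x, x_type, t) in types:
        if x_type != a:
            continue
        has_whole = any(
            y_type == b and t_y == t and (x, y, t) in parts
            for (y, y_type, t_y) in types
        )
        if not has_whole:
            return False
    return True


print(part_of("cell membrane", "cell"))
print(part_of("cell", "cell membrane"))
```

Dropping the parthood fact for time 2 makes the first check fail, which shows why the time index matters for continuants that can gain and lose parts.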
55
part_of for occurrent types
A part_of B =def.
For all x, if x instance_of A then
there is some y, y instance_of B and
x instance_level_part_of y
EVERY A IS PART OF SOME B
56
transformation_of
A transformation_of B =Def.
Every instance of A was at some earlier
time an instance of B
– adult transformation_of child
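As a rough sketch, the definition can be checked against time-stamped instantiation facts. The individuals anna and ben, the integer time points and the function signature are hypothetical.

```python
from typing import Set, Tuple

TypeFact = Tuple[str, str, int]   # (instance, type, time)

# Hypothetical facts: anna is a child at time 1 and an adult at time 2;
# ben is a child at both times.
FACTS: Set[TypeFact] = {
    ("anna", "child", 1),
    ("anna", "adult", 2),
    ("ben", "child", 1),
    ("ben", "child", 2),
}


def transformation_of(a: str, b: str, facts: Set[TypeFact] = FACTS) -> bool:
    """Every instance of A was, at some earlier time, an instance of B.
    The same individual must appear on both sides of the comparison."""
    for (x, x_type, t) in facts:
        if x_type != a:
            continue
        was_b_earlier = any(
            y == x and y_type == b and t_earlier < t
            for (y, y_type, t_earlier) in facts
        )
        if not was_b_earlier:
            return False
    return True


print(transformation_of("adult", "child"))
print(transformation_of("child", "adult"))
```

The identity test `y == x` is what separates transformation_of from derives_from: here one and the same continuant changes type over time.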
59
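transformation_of
[Diagram: the same instance c shown at two times, t and t1, along a time axis, instantiating a different type (C, C1) at each time. Examples: pre-RNA, then mature RNA; child, then adult]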
60
derives_from
[Diagram: instances c (of type C) and c' (of type C') at time t, and instance c1 (of type C1) at time t1, along a time axis]
zygote derives_from ovum
zygote derives_from sperm
correction to original Genome Biology paper:
derivation is never one-to-one
61
derives_from
two continuants fuse to form a
new continuant
C
C1
c at t
c1 at t1
C'
c' at t
fusion
62
derives_from
one initial continuant is replaced by two
successor continuants
C
c at t
C1
c1 at t1
C2
c2 at t1
fission
63
derives_from combined with
transformation_of
one continuant detaches itself from an
initial continuant, which itself continues
to exist
C
c at t
c at t1
C1
c1 at t
budding
64
derives_from combined with
transformation_of
one continuant absorbs a second
continuant while itself continuing to exist
C
c at t
c at t1
C'
c' at t
capture
65
ISO “Concept logic” for
mereology
Toronto part_of Ontario
brain part_of central nervous system
ISO (“Guidelines for the Construction, Format,
and Management of Monolingual Controlled
Vocabularies”, ANSI/NISO Z39.19-2005) sees
these as examples of the same part_of relation
66
Instances vs. types
Instance-level relations and type-level
relations have logically distinct properties
Type relations are liftings of instance
relations
67
What is symmetric on the level
of instances need not be
symmetric on the level of types
adjacency on the instance
level is always symmetric
68
Not however on the level of
types:
seminal vesicle adjacent_to urinary
bladder
Not: urinary bladder adjacent_to
seminal vesicle
69
Similarly, on the level of types, while:
nucleus adjacent_to cytoplasm
it is not the case that
cytoplasm adjacent_to nucleus
70
continuous_with
on the instance level is always symmetric
a continuous_with b on the instance
level means: there is a fiat boundary
between a and b
if a continuous_with b,
then b continuous_with a
71
72
continuous_with as a relation
between types
A continuous_with B =Def.
for all x, if x instance_of A then there is
some y such that y instance_of B and x
continuous_with y
73
continuous_with is not symmetric
Consider lymph node and lymphatic vessel
Each lymph node is continuous with
some lymphatic vessel, but there are
lymphatic vessels (e.g. lymphs and
lymphatic trunks) which are not
continuous with any lymph nodes
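The lymph node example can be replayed on toy data to show how a relation that is symmetric between instances loses its symmetry when lifted to types. The instance names and the single asserted link are hypothetical.

```python
from typing import Dict, Set, Tuple

# Hypothetical instances: one lymph node and two lymphatic vessels.
INSTANCE_OF: Dict[str, str] = {
    "node_1": "lymph node",
    "vessel_1": "lymphatic vessel",
    "vessel_2": "lymphatic vessel",
}

# One asserted instance-level link, closed under symmetry.
_ASSERTED: Set[Tuple[str, str]] = {("node_1", "vessel_1")}
CONTINUOUS: Set[Tuple[str, str]] = _ASSERTED | {(b, a) for (a, b) in _ASSERTED}


def instances(type_name: str) -> Set[str]:
    """Return all recorded instances of the given type."""
    return {x for x, t in INSTANCE_OF.items() if t == type_name}


def continuous_with(a: str, b: str) -> bool:
    """Type level: every instance of A is continuous with some instance of B."""
    return all(
        any((x, y) in CONTINUOUS for y in instances(b))
        for x in instances(a)
    )


# Instance level: symmetric by construction.
print(all((b, a) in CONTINUOUS for (a, b) in CONTINUOUS))
# Type level: holds in one direction only, because vessel_2 touches no node.
print(continuous_with("lymph node", "lymphatic vessel"))
print(continuous_with("lymphatic vessel", "lymph node"))
```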
74
3 kinds of binary relations
Between types
• human is_a mammal
• cell nucleus part_of cell
Between an instance and a type
• this human instance_of the type human
• this human allergic_to the type penicillin
Between instances
• Mary’s heart part_of Mary
• Mary’s aorta connected_to Mary’s heart
75
Linguistic vs. scientific approach
to semantic annotation
Semantic annotation can provide support for
logical reasoning across the content of
scientific literature only if the distinctions
between relations at the type level and
relations at the instance level are taken
into account.
(Many?) linguistic accounts of relations do
not take account of this distinction.
76
Why not?
Because linguistic accounts (like dictionaries)
focus on relations between meanings, not on
instances in reality
Because linguistic accounts focus on what is
meaningfully combinable, rather than on what
is logically inferrable
Because linguistic accounts focus on relations
captured grammatically, not on relations
observed experimentally and captured in
scientific theories
77
Sophia Ananiadou
UK National Centre for Text Mining
The Relation Ontology
Barry Smith
78
Do linguistics and biology truly ever meet?
79