Eg - Buffalo Ontology Site

The Ontology of Experiments
+ PATO
Barry Smith
http://ontology.buffalo.edu/smith
1
Plan
1. The Experiment Ontology
2. Upper Level Ontologies
3. The Ontology of Biomedical
Investigations
4. Phenotype Ontology
2
EXPO
The Ontology of Experiments
L. Soldatova, R. King
Department of Computer Science
The University of Wales, Aberystwyth
3
EXPO
controlled vocabulary;
meta-model;
theory of content;
knowledge management
– knowledge systematization;
– knowledge sharing;
– knowledge treatment;
– knowledge reusability;
data integration.
4
EXPO Formalisation of Science
The goal of science is to increase our knowledge
of the natural world through the performance of
experiments.
This knowledge should, ideally, be expressed in a
formal logical language.
Formal languages promote semantic clarity, which
in turn supports the free exchange of scientific
knowledge and simplifies scientific reasoning.
5
6
7
8
Suggested Upper Merged
Ontology
Adam Pease
[email protected]
http://www.articulatesoftware.com
9
SUMO top level
Entity
– Physical
• Object
– SelfConnectedObject
» Substance
» CorpuscularObject
» Food
– Region
– Collection
– Agent
• Process
– Abstract
• SetOrClass
• Relation
• Quantity
– Number
– PhysicalQuantity
• Attribute
• Proposition
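The hierarchy on this slide can be written down as a simple child-to-parent map. The following is a minimal sketch, assuming the parent links implied by the slide's indentation; the dictionary encoding and the helper names ancestors and is_subclass are illustrative assumptions and are not SUMO's own format.

```python
# Parent links read off the slide's indentation (illustrative encoding only,
# not SUMO's native representation).
PARENT = {
    "Physical": "Entity",
    "Abstract": "Entity",
    "Object": "Physical",
    "Process": "Physical",
    "SelfConnectedObject": "Object",
    "Region": "Object",
    "Collection": "Object",
    "Agent": "Object",
    "Substance": "SelfConnectedObject",
    "CorpuscularObject": "SelfConnectedObject",
    "Food": "SelfConnectedObject",
    "SetOrClass": "Abstract",
    "Relation": "Abstract",
    "Quantity": "Abstract",
    "Attribute": "Abstract",
    "Proposition": "Abstract",
    "Number": "Quantity",
    "PhysicalQuantity": "Quantity",
}


def ancestors(term):
    """Return the superclasses of term, nearest first, ending at Entity."""
    chain = []
    while term in PARENT:
        term = PARENT[term]
        chain.append(term)
    return chain


def is_subclass(term, candidate):
    """True if candidate is term itself or one of its superclasses."""
    return term == candidate or candidate in ancestors(term)


print(ancestors("Food"))
print(is_subclass("Number", "Abstract"))
```

Walking the parent map upward is enough here because every term on the slide has exactly one parent.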
10
Suggested Upper Merged
Ontology
1000 terms, 4000 axioms, 750 rules
Associated domain ontologies totalling 20,000
terms and 60,000 axioms
[includes ontology of boundaries from BS]
11
SUMO Structure
Structural Ontology
Base Ontology
Set/Class Theory
Numeric
Graph
Temporal
Measure
Mereotopology
Processes
Objects
Qualities
12
SUMO+Domain Ontology
SUMO: Structural Ontology, Base Ontology, Set/Class Theory, Numeric, Temporal, Graph, Measure, Processes, Mereotopology, Objects, Qualities
Mid-Level
Domain ontologies: Geography, WMD, Transnational Issues, Communications, Government, Transportation, Elements, NAICS, Financial Ontology, Distributed Computing, People, UnitedStates, Afghanistan, France, …, Military, Terrorist Attack Types, Terrorist, Economy, ECommerce Services, Terrorist Attacks, Biological Viruses, World Airports
Total Terms: 20399
Total Axioms: 67108
Rules: 2500
13
entity
physical
object
process
dual object process
intentional process
intentional psychological
process
recreation or exercise
organizational process
guiding
keeping
maintaining
repairing
poking
content development
making
constructing
manufacture
publication
cooking
searching
social interaction
maneuver
motion
internal change
shape change
abstract
14
corpuscular object =def.
A SelfConnectedObject whose parts have properties that are not
shared by the whole.
Subclass(es)
organic object
artifact
Coordinate term(s)
content bearing object
food
substance
Axiom: corpuscular object is disjoint from substance.
substance =def.
An Object in which every part is similar to every other in every
relevant respect.
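The disjointness axiom above says that nothing can be both a corpuscular object and a substance. A minimal sketch of how such an axiom could be checked against data follows; the individuals and their class memberships are hypothetical examples, and the function name violations is an assumption for illustration.

```python
# Disjointness axiom from the slide: corpuscular object is disjoint from substance.
DISJOINT = [("CorpuscularObject", "Substance")]

# Hypothetical individuals, each tagged with the classes it is asserted to belong to.
memberships = {
    "my_watch": {"CorpuscularObject", "Artifact"},
    "glass_of_water": {"Substance"},
    "bad_record": {"CorpuscularObject", "Substance"},
}


def violations(data):
    """Return (individual, class_a, class_b) for each broken disjointness axiom."""
    found = []
    for individual, classes in data.items():
        for a, b in DISJOINT:
            if a in classes and b in classes:
                found.append((individual, a, b))
    return found


print(violations(memberships))
```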
15
advantages of SUMO
clear logical infrastructure: FOL (too
expressive for decidability, more intuitive
(human friendly) than e.g. OWL)
much more coherent than e.g. CYC upper
level
much more coherent than the upper level
hard wired into OWL-DL (and a fortiori into
OWL-FULL)
16
problems with SUMO as Upper-Level
it contains its own tiny biology (protein,
crustacean, fruit-Or-vegetable ...)
it is overwhelmingly an ontology for abstract
entities (sets, functions in the
mathematical sense, ...)
no clear treatment of relations between
instances vs. relations between types
[all of these problems can be fixed]
17
EXPO: Experiment Ontology
18
representational style part_of experimental hypothesis
experimental actions part_of experimental design
19
equipment part_of experimental design
(confuses object with specification)
20
21
OBI
The Ontology of Biomedical Investigations
grew out of FuGE, FuGO, MGED, PSI
development activities
22
Overview of the Ontology of
Biomedical Investigations
with thanks to Trish Whetzel
(FuGO Working Group)
23
OBI née FuGO
Purpose
– Provide a resource for the unambiguous description of the components of biomedical investigations such as the design, protocols and instrumentation, material, data and types of analysis and statistical tools applied to the data
– NOT designed to model biology
Enables
– consistent annotation of data across different technological and biological domains
– powerful queries
– semantically-driven data integration
24
Motivation for OBI
Standardization efforts in biological and technological domains
– Standard syntax - Data exchange formats
• To provide a mechanism for software interoperability, e.g. FuGE Object Model
– Standard semantics - Controlled vocabularies or ontology
• Centralize commonalities for annotation term needs across domains to describe an investigation/study/experiment, e.g. FuGO
25
Emerging FuGO Design Principles
OBO Foundry ontology, utilize ontology best practices
– Inherit top level classes from an Upper Level ontology
– Use of the Relation Ontology
– Follow additional OBO Foundry principles
– Facilitates interoperability with other OBO Foundry ontologies
Open source approach
– Protégé/OWL
– Weekly conference calls
– Shared environment using Sourceforge (SF) and SF mailing lists
26
OBI Collaborating Communities
Crop sciences Generation Challenge Programme (GCP),
Environmental genomics MGED RSBI Group,
www.mged.org/Workgroups/rsbi
Genomic Standards Consortium (GSC),
www.genomics.ceh.ac.uk/genomecatalogue
HUPO Proteomics Standards Initiative (PSI), psidev.sourceforge.net
Immunology Database and Analysis Portal, www.immport.org
Immune Epitope Database and Analysis Resource (IEDB),
http://www.immuneepitope.org/home.do
International Society for Analytical Cytology, http://www.isac-net.org/
Metabolomics Standards Initiative (MSI),
Neurogenetics, Biomedical Informatics Research Network (BIRN),
Nutrigenomics MGED RSBI Group, www.mged.org/Workgroups/rsbi
Polymorphism
Toxicogenomics MGED RSBI Group, www.mged.org/Workgroups/rsbi
Transcriptomics MGED Ontology Group
27
28
29
FuGO - Top Level Universals
Continuant: an entity that endures/remains the same through time
• Dependent Continuant: depends on another entity
E.g. Environment (depends on the set of ranges of conditions, e.g. geographic location)
E.g. Characteristics (entities that can be measured, e.g. temperature, unit)
– Realizable: an entity that is realizable through a process (executed/run)
E.g. Software (a set of machine instructions)
E.g. Design (the plan that can be realized in a process)
E.g. Role (the part played by an entity within the context of a process)
• Independent Continuant: stands on its own
E.g. All physical entities (instrument, technology platform, document etc.)
E.g. Biological material (organism, population etc.)
Occurrent: an entity that occurs/unfolds in time
• E.g. Temporal Regions, Spatio-Temporal Regions (single actions or events)
• Process
E.g. Investigation (the entire ‘experimental’ process)
E.g. Study (process of acquiring and treating the biological material)
E.g. Assay (process of performing some tests and recording the results)
30
31
32
Basic Formal Ontology
a true upper level ontology
no interference with domain ontologies
no interference with physics / cognition
no abstracta
no negative entities
explicit treatment of instances, types and relations
33
Three dichotomies
instance vs. type
continuant vs. occurrent
dependent vs. independent
everything in the ontology is a type
types exist in reality through their instances
34
instance vs. type
experiments as instances
experiments as types
ontologies relate to types (kinds, universals)
we need to keep track of instances in
databases
35
BFO
Continuant
– Independent Continuant
– Dependent Continuant
Occurrent (Process)
36
BFO
Continuant
– Independent Continuant (molecule, cell, organ, organism)
– Dependent Continuant (quality, function, disease)
Occurrent (Process)
– Functioning, Side-Effect, Stochastic Process, ...
37
BFO
Continuant
– Independent Continuant (molecule, cell, organ, organism)
– PATO
Occurrent (Process)
– Functioning, Side-Effect, Stochastic Process, ...
38
Unifying goal: integration
Integrating data
– within and across these domains
– across levels of granularity
– across different perspectives
Requires
– Rigorous formal definitions in both ontologies
and annotation schemas
39
Some thoughts on the ontology
itself
Outline
– Definitions
• how do we define PATO terms?
• what exactly is it we’re defining?
– is_a hierarchy
• what are the top-level distinctions?
• what are the finer grained distinctions?
– shapes and colors
40
It’s all about the definitions
OBO Foundry Principle
– Definitions should describe things in reality,
not how terms are used
• definitions should not use the word ‘describing’
Scope of PATO = Phenotypic qualities
41
Old PATO
Entity – Attribute – Value
Eye – Red – Dark
New PATO
Entity – Quality
Eye – Red
Eye – Dark Red
Dark Red is_a Red
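The practical gain of the Entity – Quality model is that a query for a general quality also finds annotations made with its subtypes. A minimal sketch, assuming a single-parent is_a map; only "dark red is_a red" comes from the slide, while the "color" parent, the "white" value and the three fly-eye annotations are hypothetical examples.

```python
# is_a links between quality types. Only "dark red" -> "red" is from the slide;
# the "color" parent is an assumption added for illustration.
IS_A = {"dark red": "red", "red": "color", "white": "color"}

# Hypothetical Entity-Quality annotations.
annotations = [
    ("eye of fly 1", "dark red"),
    ("eye of fly 2", "red"),
    ("eye of fly 3", "white"),
]


def is_a(quality, ancestor):
    """True if quality equals ancestor or reaches it by following is_a links."""
    while quality is not None:
        if quality == ancestor:
            return True
        quality = IS_A.get(quality)
    return False


def bearers_with(quality):
    """Entities annotated with the given quality or any of its subtypes."""
    return [entity for entity, q in annotations if is_a(q, quality)]


print(bearers_with("red"))
print(bearers_with("dark red"))
```

Under the old Entity – Attribute – Value model the same query would have to inspect the attribute and the value separately; here the is_a link carries that information.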
42
What a quality is NOT
Qualities are not measurements
– Instances of qualities exist independently of their
measurements
– Qualities can have zero or more measurements
These are not the names of qualities:
– percentage
– process
– abnormal
– high
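The separation of a quality from its measurements can be made concrete in a data model: the quality instance is one record, and measurements are attached to it as a list that may be empty. This is a minimal sketch; the class names, field names and the mouse temperature values are hypothetical and are not part of PATO.

```python
from dataclasses import dataclass, field


@dataclass
class Measurement:
    value: float
    unit: str


@dataclass
class QualityInstance:
    quality_type: str  # the type this instance instantiates
    bearer: str        # the entity the quality inheres in
    measurements: list = field(default_factory=list)


# A quality instance exists even if nobody has measured it.
unmeasured = QualityInstance("temperature", "mouse 8")

# The same kind of quality, measured twice (hypothetical readings).
measured = QualityInstance("temperature", "mouse 7")
measured.measurements.append(Measurement(37.1, "degree Celsius"))
measured.measurements.append(Measurement(37.3, "degree Celsius"))

print(len(unmeasured.measurements), len(measured.measurements))
```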
43
Some examples of qualities
The particular redness of the left eye of a single
individual fly
– An instance of a quality type
The color ‘red’
– A quality type
Note: the eye does not instantiate ‘red’
PATO represents quality types (universals)
– PATO definitions can be used to classify quality
instances by the types they instantiate
44
the particular case of redness (of a particular fly eye) instantiates the type “red”
an instance of an eye (in a particular fly) instantiates the type “eye”
the particular case of redness inheres in (is a quality of, has_bearer) the instance of an eye
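The relations in this diagram can be stored as subject-relation-object triples. A minimal sketch, assuming hypothetical identifiers "redness#1" and "eye#1" for the two instances; the helper names objects and bearer_types_of are illustrative assumptions.

```python
# Triples for the diagram. "redness#1" and "eye#1" are hypothetical instance ids.
triples = [
    ("redness#1", "instantiates", "red"),
    ("eye#1", "instantiates", "eye"),
    ("redness#1", "inheres_in", "eye#1"),
]


def objects(subject, relation):
    """All objects o such that (subject, relation, o) is asserted."""
    return [o for s, r, o in triples if s == subject and r == relation]


def bearer_types_of(quality_type):
    """Types whose instances bear some instance of quality_type."""
    result = set()
    for s, r, o in triples:
        if r == "instantiates" and o == quality_type:
            for bearer in objects(s, "inheres_in"):
                result.update(objects(bearer, "instantiates"))
    return result


# The eye instance instantiates "eye" only; it does not instantiate "red".
print(objects("eye#1", "instantiates"))
print(bearer_types_of("red"))
```

Keeping instantiates and inheres_in as separate relations is what prevents the eye from being classified as an instance of "red".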
45
Qualities are dependent entities
Qualities require bearers
– Bearers can be physical objects or processes
Example:
– A shape requires a physical object to bear it
– If the physical object ceases to exist (e.g. it
decomposes), then the shape ceases to exist
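The dependence of a quality on its bearer can be sketched as a constraint in a small registry: a quality can only be added for an existing bearer, and removing the bearer removes its qualities. The World class and its method names are hypothetical and serve only to illustrate the rule.

```python
class World:
    """Toy registry in which qualities cannot outlive their bearers."""

    def __init__(self):
        self.bearers = set()
        self.qualities = {}  # quality id -> bearer id

    def add_bearer(self, bearer):
        self.bearers.add(bearer)

    def add_quality(self, quality, bearer):
        # A quality requires a bearer that already exists.
        if bearer not in self.bearers:
            raise ValueError("a quality requires an existing bearer")
        self.qualities[quality] = bearer

    def destroy(self, bearer):
        # When the bearer ceases to exist, so do the qualities that inhere in it.
        self.bearers.discard(bearer)
        self.qualities = {q: b for q, b in self.qualities.items() if b != bearer}


world = World()
world.add_bearer("leaf")
world.add_quality("shape of leaf", "leaf")
world.destroy("leaf")
print(world.qualities)
```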
46
Scale | Bearer | Quality | Definition (proposed)
Physical | Continuant | Mass | Equivalent to the sum of the mass of the parts of the bearer (mass at the particle level is primitive/outwith PATO)
Physical | Continuant | Opacity | An optical quality manifest by the capacity of the bearer to block light
Phys/Chem | Liquid | Concentration | A compositional relational quality manifest by the relative quantity of some chemical type contained by the bearer
Molecular | Gene | splicing | quality manifest by the splicing processes undergone by the bearer
Cellular | Cell | ploidy | A cellular quality manifest by the number of genomes that are part of the bearer
Cellular | Cell | transformative potency?? | A cellular quality manifest by the capacity of the bearer cell to differentiate to different cell types
Organismal | Tissue | tone |
Organismal | Organism | reproductive quality |
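The proposed definition of mass is recursive: the mass of a bearer is the sum of the masses of its parts, with mass at the lowest level taken as primitive. A minimal sketch of that recursion, assuming a nested-dictionary encoding of parthood; the structure and the numeric values are hypothetical and carry no units.

```python
def mass(entity):
    """Mass of an entity: primitive at the leaves, summed over parts otherwise."""
    if "parts" in entity:
        return sum(mass(part) for part in entity["parts"])
    return entity["mass"]


# Hypothetical bearer with one primitive part and one composite part.
bearer = {
    "parts": [
        {"mass": 2.0},
        {"parts": [{"mass": 0.5}, {"mass": 0.25}]},
    ]
}

print(mass(bearer))
```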
48
How many types of shape are
there?
notched, T-shaped, Y-shaped, branched,
unbranched, antrorse, retrorse, curled, curved,
wiggly, squiggly, round, flat, square, oblong,
elliptical, ovoid, cuboid, spherical, egg-shaped,
rod-shaped, heart-shaped, …
How do we define them?
How do we compare them?
Shapes cannot be organized in a linear scale
Compare: problem of classifying RNA structures
49
Standard case: monadic qualities
Examples
– E=kidney, Q=hypertrophied
– autodef: a kidney which is hypertrophied
We assume that there is more contextual data
(not shown)
– e.g. genotype, environment, number of organisms
in study that showed phenotype
Interpretation (with the rest of the database
record):
– all fish in this experiment with a particular
genotype had a hypertrophied kidney at some
point in time
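The "autodef" for a monadic quality follows a fixed pattern, so it can be generated from the E and Q terms. This is a minimal sketch of that pattern only; the function name and the simple vowel-based article rule are assumptions, not how any PATO tool actually works.

```python
def autodef(entity, quality):
    """Build the text definition 'a <entity> which is <quality>'."""
    # Simple article choice based on the first letter (an assumption for illustration).
    article = "an" if entity[0].lower() in "aeiou" else "a"
    return f"{article} {entity} which is {quality}"


print(autodef("kidney", "hypertrophied"))
```

The contextual data mentioned on the slide (genotype, environment, number of organisms) would sit in the rest of the database record and is not part of the generated definition.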
50
Who should use PATO?
Originally:
– model organism mutant phenotypes
But also:
– ontology-based evolutionary systematics
– neuroscience; BIRN
– clinical uses
• OMIM
• clinical records (clinical manifestations)
• drug effects, chemical effects
– to define terms in other ontologies
• e.g. diploid cell; invasive tumor, engineered gene, condensed
chromosome
51