Transcript ICBO_Masuya

ICBO 2011
July 28-30, 2011
Buffalo, New York
An Advanced Strategy for Integration of
Biological Measurement Data
Hiroshi Masuya1, Georgios V. Gkoutos2,
Nobuhiko Tanaka1, Kazunori Waki1, Yoshihiro Okuda3,
Tatsuya Kushida3, Norio Kobayashi4, Koji Doi4,
Kouji Kozaki5, Robert Hoehndorf1,
Shigeharu Wakana1, Tetsuro Toyoda4 and
Riichiro Mizoguchi5
1: RIKEN BioResource Center, Tsukuba, Japan
2: Department of Genetics, University of Cambridge, UK
3: NalaPro Technologies, Inc, Tokyo, Japan
4: RIKEN BASE, Yokohama, Japan
5: Department of Knowledge Systems, ISIR, Osaka University, Japan
Motivation of this study
Phenotypes represent a broad range of
variations in measured qualities
Integrated phenotypic
information whole
Organism A
Organism B
Organism C
Organism D
Sophisticated
informatics
infrastructure
(ontology)
Mining…
Biological knowledge
To contribute to the development of the informatics
infrastructure for the description, exchange and mining of
phenotypic data.
Phenotypic Quality (PATO):
PATO provides a practical basis for vocabulary and
semantics for the description of phenotype information
across species.
•Single-hierarchy model of
“quality”, suited to BFO
•Standard for phenotype annotation
across species (“EQ” annotation)
•Less confusing than “EAV”
annotation for people unfamiliar
with ontologies
•Basis of inferences of cross-species phenotype equivalence with EQ.
(e.g. mouse phenotype and disease)
MP:0002269 ! muscular atrophy
E: MA:0000015 ! muscle
Q: PATO:0001623 ! atrophied
HP:0003202 ! Amyotrophy
E: FMA:30316 ! muscle
Q: PATO:0001623 ! atrophied
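The EQ comparison on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the term IDs are the ones shown on the slide, while the `EQ` class, the `ANATOMY_EQUIVALENTS` table and the function names are assumptions introduced for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EQ:
    """An Entity-Quality annotation: an anatomy term plus a PATO quality term."""
    entity: str
    quality: str


# EQ decompositions of the two phenotype terms shown on the slide.
EQ_DEFINITIONS = {
    "MP:0002269": EQ("MA:0000015", "PATO:0001623"),  # muscular atrophy (mouse)
    "HP:0003202": EQ("FMA:30316", "PATO:0001623"),   # Amyotrophy (human)
}

# Assumption: a cross-species anatomy mapping stating that the mouse and
# human "muscle" terms correspond. This table is hypothetical.
ANATOMY_EQUIVALENTS = {frozenset({"MA:0000015", "FMA:30316"})}


def entities_match(a: str, b: str) -> bool:
    """Entities match if identical or listed as cross-species equivalents."""
    return a == b or frozenset({a, b}) in ANATOMY_EQUIVALENTS


def phenotypes_equivalent(term1: str, term2: str) -> bool:
    """Two phenotypes are treated as equivalent if quality and entity both match."""
    eq1, eq2 = EQ_DEFINITIONS[term1], EQ_DEFINITIONS[term2]
    return eq1.quality == eq2.quality and entities_match(eq1.entity, eq2.entity)
```

Calling `phenotypes_equivalent("MP:0002269", "HP:0003202")` returns `True` here only because the hypothetical anatomy table links the two muscle terms; a real pipeline would take that link from a curated anatomy mapping.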
Expansion of PATO
We attempted to expand the PATO ontology to ensure a
more advanced, explicit and consistent knowledge
framework.
Objectives:
1. To provide a fundamental classification of quality
values on the basis of measurement scales.
2. To provide a strict data model for handling the context-dependencies of ordinal values.
3. To provide a model of a datum (or description) as an
informational entity with the structure of common
formalisms.
Fundamental classification of quality-value (1)
Refrain from the 2-hierarchy model
(and the EAV formalism)
There was a lot of discussion about
PATO adopting the 1-hierarchy model and EQ…
1. A number of studies claim that the fundamental classification of
values, “scales of measurement” (Stevens S.S., 1946), is beneficial
for data integration in the field of experimental science.
length: 20 cm; long, short
temperature: 37 ℃ (310.15 K); high, low
color: red, blue, ...
This classification takes the mathematical operations permitted on the values as its starting point!
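A rough sketch of how the scales of measurement restrict operations is given below. The four scale names follow Stevens (1946); the `Scale` enum, the `permits` helper and the operation labels are assumptions made for this illustration and are not part of PATO or YAMATO.

```python
from enum import IntEnum


class Scale(IntEnum):
    """Stevens' four scales, ordered so that a higher scale permits more operations."""
    NOMINAL = 1   # e.g. color: red, blue
    ORDINAL = 2   # e.g. long/short, high/low
    INTERVAL = 3  # e.g. temperature in degrees Celsius
    RATIO = 4     # e.g. length in cm, temperature in kelvin


# Lowest scale at which each operation becomes meaningful.
MIN_SCALE_FOR = {
    "equality": Scale.NOMINAL,
    "order": Scale.ORDINAL,
    "difference": Scale.INTERVAL,
    "ratio": Scale.RATIO,
}


def permits(scale: Scale, operation: str) -> bool:
    """Return True if the operation is meaningful for values on this scale."""
    return scale >= MIN_SCALE_FOR[operation]


def celsius_to_kelvin(celsius: float) -> float:
    """Convert an interval-scale temperature to the ratio-scale kelvin value."""
    return celsius + 273.15
```

For example, `permits(Scale.INTERVAL, "ratio")` is `False`: 37 ℃ is not "twice as hot" as 18.5 ℃, whereas ratios of the corresponding kelvin values are meaningful.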
Fundamental classification of quality-value (2)
2. A foundation for the explicit description of change of quality is
needed.
Color 1: green to orange
Color 2: orange to green
[Figure: panels 1 and 2, each shown at times t1 and t2]
1. Growing boy and his height quality
Ontology | System of quality | Formalism
BFO, PATO | 1-hierarchy | EQ
DOLCE | 2-hierarchy | EAV
2. Explicit description of color change is needed.
Qualitative and quantitative descriptions are
integrated in a single knowledge framework in
DOLCE.
For the coordination of ongoing efforts,
equivalence mapping of these systems is
beneficial.
Model of context-dependency of ordinal value (1)
Problem of “large ant and
small elephant”
I’m big!!
How to classify value instances?
I’m small..
Context A: simple comparison
value C
value A
“Large” class
“Small” class
value D
value B
smaller
Threshold X
(some value)
larger
Model of context-dependency of ordinal value (2)
Context B: deviation based comparison
(context of inference of cross-species equivalence of phenotypes)
value A
value C
“abnormally large” class
larger
deviation
Threshold Y1 and Y2
(deviation-based
value)
smaller
value B
value D
“normal size” class
larger
smaller
“abnormally small” class
A knowledge model of the context dependencies of ordinal-scale
values is needed!
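The two contexts can be contrasted with a small Python sketch. It assumes that the deviation in Context B is a z-score (distance from the population mean in standard deviations) and that Y1 and Y2 are -2 and +2; the slides do not specify either choice. All weights are made-up illustrative numbers, not measured data.

```python
from statistics import mean, stdev


def classify_simple(value: float, threshold_x: float) -> str:
    """Context A: compare the value against a single absolute threshold X."""
    return "large" if value > threshold_x else "small"


def classify_by_deviation(value, population, y1=-2.0, y2=2.0) -> str:
    """Context B: compare the deviation from the population mean against Y1 < Y2.

    Assumption: deviation is measured as a z-score over `population`.
    """
    z = (value - mean(population)) / stdev(population)
    if z > y2:
        return "abnormally large"
    if z < y1:
        return "abnormally small"
    return "normal size"


# Hypothetical body weights in grams (illustrative only).
ANT_WEIGHTS_G = [0.002, 0.003, 0.003, 0.004, 0.003]
ELEPHANT_WEIGHTS_G = [4.0e6, 5.0e6, 5.5e6, 6.0e6, 4.5e6]

big_ant_g = 0.010
small_elephant_g = 2.0e6

# Context A with a hypothetical threshold X of 1 kg.
ant_simple = classify_simple(big_ant_g, 1000.0)
elephant_simple = classify_simple(small_elephant_g, 1000.0)

# Context B, each value judged against its own species distribution.
ant_deviation = classify_by_deviation(big_ant_g, ANT_WEIGHTS_G)
elephant_deviation = classify_by_deviation(small_elephant_g, ELEPHANT_WEIGHTS_G)
```

With these numbers the same ant is "small" in Context A and "abnormally large" in Context B, while the elephant is "large" in Context A and "abnormally small" in Context B, which is the "large ant and small elephant" problem.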
Model of datum as an informational entity
The current versions of DOLCE, BFO and PATO deal only with the primary
reality and do not deal with quality description.
1. A distinction between a “true value” and an “empirical
measurement” that approximates it is needed.
[Figure: weight in Reality (unknown…) versus weight as Information]
2. Modeling informational entities with common
formalisms (e.g. EQ, EAV and so on) and their
relationships would be useful!
[Figure: weight in Reality (unknown...) represented as Information in the EQ and EAV formalisms]
Expansion of PATO with YAMATO
framework
BFO
quality
PATO
Mapping
Interoperability
YAMATO
DOLCE
A reference ontology
“PATO2YAMATO”
•Equivalence mapping between
1- and 2-hierarchy models
•Model of context dependency
•Model of datum with common
formalisms
OBI
quale
quality-space
Proposals based on practical use…
Yet Another More Advanced Top-Level Ontology
(YAMATO: Mizoguchi, 2009)
Features:
• Framework for interoperability of quality-related concepts between
top-level ontologies. Supports classification by scales of
measurement.
• Model of context dependency with “role”
• Detailed model of “representation” (an informational object) that
involves quality representation.
Equivalence mapping of 1- and 2-hierarchy models
BFO
(Upper level)
quality
identical
property
quality
YAMATO
(Upper level)
quality_
quantity
(convertible)
generic quality
identical
quality value
quality space
DOLCE
(Upper level)
Classification of quality value
(scales of measurements :
Stevens S.S, 1946)
identical
region
quale
About 1,000 PATO terms were manually
mapped to the YAMATO framework.
Modeling of context dependency with “role”
I’m a teacher.
(at school)
(at home)
In the context of the distribution for weight, some weight quality values play
the large-role and thereby become role-holders: abnormally heavy
context
role-holder
(Entity playing a role)
Distribution for
weight
role
Abnormally
heavy
large-role
depend on
Concept model of
role and role-holder
I’m a husband
An entity often plays
different “roles” with
different characteristics
under different contexts
playable
heavier than
normal value
potential player
weight
quality value
qualitative value for
weight
Potential
player
Role-holder
Implementation and representation in Hozo ontology editor
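The role and role-holder idea can be mimicked in ordinary code as a rough analogy. The sketch below is not the Hozo or YAMATO representation: the class names, the `play` function and the normal upper bound of 30.0 are hypothetical and serve only to show how a weight quality value becomes an "abnormally heavy" role-holder inside one context.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class QualityValue:
    """A potential player: a quality value that exists independently of any context."""
    quality: str
    magnitude: float


@dataclass(frozen=True)
class Role:
    """A role defined within a context, with a condition saying who can play it."""
    name: str
    context: str
    playable: Callable[[QualityValue], bool]


@dataclass(frozen=True)
class RoleHolder:
    """An entity playing a role, e.g. 'abnormally heavy'."""
    label: str
    player: QualityValue
    role: Role


def play(value: QualityValue, role: Role, label: str) -> Optional[RoleHolder]:
    """Return a role-holder if the value can play the role, otherwise None."""
    return RoleHolder(label, value, role) if role.playable(value) else None


# Hypothetical upper bound of the normal range in the distribution for weight.
NORMAL_UPPER = 30.0

large_role = Role(
    name="large-role",
    context="distribution for weight",
    playable=lambda v: v.quality == "weight" and v.magnitude > NORMAL_UPPER,
)

heavy = play(QualityValue("weight", 45.0), large_role, "abnormally heavy")
normal = play(QualityValue("weight", 25.0), large_role, "abnormally heavy")
```

Here `heavy` is a role-holder labeled "abnormally heavy" and `normal` is `None`; the same `QualityValue` could play a different role under another context with a different `playable` condition.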
Inter-relationships among contexts
Classification of organisms
Inherit
Inherit
Inference of classification:
“Abnormally light in elephant” is heavier than “abnormally heavy in ant”
in the simple comparison context.
Inference of the Classification of
“abnormally heavy”
Coordination of ordinal values under
different contexts
Abnormally heavy in elephant
Context of
distribution of
weight in
elephant
larger
Normal weight in elephant
larger
Abnormally light in elephant
larger
Abnormally heavy in ant
larger
Context of
distribution of
weight in ant
Normal weight in ant
larger
Abnormally light in ant
Simple
comparison
context
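One way to picture this coordination is as a chain of "larger" relations whose transitive closure yields orderings across contexts. The sketch below assumes that the diagram reads top to bottom, with each item larger than the one beneath it; the dictionary layout and the function name are illustrative and do not reflect how Hozo stores or infers these relations.

```python
# Direct "larger than" assertions. The within-species links come from the two
# distribution contexts; the elephant-to-ant link is the one asserted in the
# simple comparison context.
LARGER = {
    "abnormally heavy in elephant": ["normal weight in elephant"],
    "normal weight in elephant": ["abnormally light in elephant"],
    "abnormally light in elephant": ["abnormally heavy in ant"],
    "abnormally heavy in ant": ["normal weight in ant"],
    "normal weight in ant": ["abnormally light in ant"],
    "abnormally light in ant": [],
}


def is_larger(a: str, b: str) -> bool:
    """Infer whether a is larger than b by following direct links transitively."""
    stack = list(LARGER.get(a, []))
    seen = set()
    while stack:
        current = stack.pop()
        if current == b:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(LARGER.get(current, []))
    return False
```

For instance, `is_larger("normal weight in elephant", "normal weight in ant")` is inferred as `True` although no direct link between those two classes is asserted.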
Quality representation in YAMATO
YAMATO provides “quality representation” as the foundation for
formalized informational entities such as EQ, EAV and so on.
Informational entities
Reality
Quality
representation
Weight
Quality
(Symbolization)
Quality representation is modeled in a way consistent with the content-bearing
informational entity, “representation”.
Basic structure for
representation by symbol
(Mizoguchi, 2004)
quality representation
EP (=EQ)
(BFO, PATO)
EAV
(DOLCE)
Sentence of
natural language
Coding of genetic
information
Tuple
Triple
natural language
nucleotide
sequence
*entity, #property
*entity, #generic
quality, value
alphabet
molecular symbol
quality
measurement
quality
measurement
anything…
Specification of
gene product
*: symbolization operation, #: Class => individual operation (equivalent to punning in OWL 2)
Current status of the reference ontology:
PATO2YAMATO
•About 1,000 PATO terms are included in the YAMATO framework.
•Basic forms of context-dependent ordinal values are defined.
They are workable under the classification of organisms.
•Basic forms of quality representation (EAV and EQ) are
already defined in YAMATO.
http://www.brc.riken.go.jp/lab/bpmp/ontology/ontology_pato2yato.html
Preliminary trial of simple conversion of EQ to EAV
1,450 EQ annotations:
(OBO cross-product file for
Mammalian Phenotype
ontology)
reference:
PATO2YAMATO
EAV-quality
representations in
YAMATO framework
The ontology helps the automatic conversion from EQ to EAV!
We are planning a full conversion of EQ annotations across multiple species into coordinated EAV-quality representations.
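The conversion idea can be sketched as a lookup through a reference mapping. The code below is a toy illustration rather than the PATO2YAMATO procedure: the mapping entry that splits "atrophied" into the attribute "size" and the value "atrophied" is hypothetical, as are the class and function names.

```python
from typing import Dict, NamedTuple, Tuple


class EQ(NamedTuple):
    """Entity-Quality pair (1-hierarchy style)."""
    entity: str
    quality: str


class EAV(NamedTuple):
    """Entity-Attribute-Value triple (2-hierarchy style)."""
    entity: str
    attribute: str
    value: str


# Hypothetical excerpt of a reference mapping from a PATO quality term to a
# (generic quality, quality value) pair. The real content would come from
# the reference ontology.
QUALITY_TO_ATTRIBUTE_VALUE: Dict[str, Tuple[str, str]] = {
    "PATO:0001623": ("size", "atrophied"),
}


def eq_to_eav(eq: EQ, mapping: Dict[str, Tuple[str, str]]) -> EAV:
    """Split the single EQ quality into an attribute and a value via the mapping."""
    attribute, value = mapping[eq.quality]
    return EAV(eq.entity, attribute, value)


muscle_atrophy = eq_to_eav(EQ("MA:0000015", "PATO:0001623"), QUALITY_TO_ATTRIBUTE_VALUE)
```

The design point is that the EQ quality term carries both the attribute and its value, so the conversion needs an external reference that knows how to separate them; unmapped terms raise a `KeyError` here and would need manual curation.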
Summary of this talk
This study shows:
• YAMATO’s framework helps to coordinate different “qualities”
for phenotype information at both the reality and description
levels.
• The role model successfully coordinated ordinal values
dependent on multiple contexts (deviation-based and simple
comparison).
Future views:
• Automatic conversion of EQ of multiple species to EAV.
• Modeling of contexts of experimental conditions.
• Integration of qualitative and quantitative phenotype data.
• Coordination of more complicated phenotype data sets from
multiple species and experiments.
Acknowledgements
RIKEN BioResource Center
Nobuhiko Tanaka, Kazunori Waki, Terue Takatsuki
University of Cambridge
Georgios V. Gkoutos, Robert Hoehndorf
NalaPro Technologies Inc
Yoshihiro Okuda, Tatsuya Kushida
Enegate corp
Mamoru Ota
RIKEN BASE
Norio Kobayashi, Koji Doi, Tetsuro Toyoda
Department of Knowledge Systems, ISIR, Osaka University
Koji Kozaki, Riichiro Mizoguchi
貴為和以
“Harmony is to be valued.”
In “Seventeen-article constitution”
(A.D. 603, YAMATO imperial court in ancient Japan)
Authored by Prince Shōtoku (A.D. 573–621)
Thank you!