Annotating Experimental Records
using Ontologies
Olga Giraldo, Unal de Colombia/CIAT
Jael Garcia, Universität der Bundeswehr
Alexander Garcia, UAMS
Motivation and Research Question
• Knowledge-based approach to managing
laboratory information
– it combines elements from the Semantic Web (SW),
e.g. ontologies supporting organization and
classification, with elements from Social Tagging
Systems, e.g. collaboration, ad-hoc organization
strategies.
• How can we semantically annotate laboratory
records?
• How can we facilitate the coexistence of
laboratory notebooks and electronic laboratory
records?
Motivation and Research Question
• Easy to use, highly portable,
easy to share, low cost…
• Great artifacts for supporting
design
• Legal requirement
Mutis
Marie Curie
da Vinci
Research Question
• How can we facilitate the coexistence of
laboratory notebooks and electronic
laboratory records?
• How can we semantically annotate laboratory
records?
Our Approach
• Documents should be able to “know about”
their own content for automated processes to
“know what to do” with them.
Semantics….
Materials and Methods
• Our scenario: supporting
the annotation of
experimental data for
some of the processes
routinely run at the
Center for International
Tropical Agriculture (CIAT)
biotechnology laboratory
• 15 laboratory notebooks
together with their
corresponding electronic
records, e.g. XLS files,
outputs from lab
equipment, etc.
• 10 biologists
• Direct non-intrusive
observation: 6 months
• Ontology and prototype
development: iterative
and collaborative process
• Existing ontologies
Results
• Data types
• Rhetorical structure
• Ontologies
• Orchestration of ontologies
• Tags and ontologies
• Lessons
Results
• Data Types
– Manuscript
– Digital
– Digital data with manuscript annotations
Results
• Manuscript
– Lists
– To-dos
– How-tos (protocols)
– Incomplete results
– Dates
– Formulas
– Electronic paths
– Sources for information (URLs)
[Figure: example notebook pages, panels A to D]
Results
• Digital
– Photos
– Lists
– Incomplete results
– Protocols
– Figures
– Sequences
Results
• Digital + Manuscript
– Digital files and print-outs tagged with manuscript information.
[Figure: example annotated print-outs, panels A and B]
Results
• We identified the rhetorical structure implicit
in those laboratory notebooks we studied
• And the metadata describing such structure
Rhetorical structure: Header, Body.
• Header: metadata describing a lab notebook
– Title (DC)
– Creator (DC/AgMes)
– Notes (AgMes)
– Date of creation (DC)
– Date of finalization (M4L)
– Laboratory notebook number (M4L)
– Language (DC)
• Body: metadata describing an experimental activity
– Project (OBI/AGROVOC)
– Date (DC)
– Laboratory procedure (M4L)
– Page number (M4L)
– Recorded by (M4L)
– Protocol (OBI)
– Comments (BioPortal, NCIt, SNOMED)
– Purpose (M4L)
– Security measurements (M4L)
– Outcome (NCIt)
– Materials & Methods, experimental design
• Body, Materials & Methods: samples, reagents, assays, equipment and supplies
– Samples: DNA, RNA, whole plant, etc. (OBI, CHEBI, PO)
– Reagents: buffer, dNTP mix (CHEBI, M4L)
– Assay: DNA extraction, PCR, gel electrophoresis (OBI, M4L)
– Equipment & supplies: freezer, centrifuge, shaker, glove, etc. (OBI, PEO, SEP, SNOMED, BIRNLex, M4L)
• Body, Experimental design (OBI, M4L)
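The header/body structure above can be sketched as a small data model. This is only an illustration, not the M4L implementation: the class names, field names, and all example values below are assumptions introduced here; only the field labels in the comments come from the slide.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class NotebookHeader:
    """Header: metadata describing a lab notebook (labels from the slide)."""
    title: str                # Title (DC)
    creator: str              # Creator (DC/AgMes)
    notebook_number: str      # Laboratory notebook number (M4L)
    language: str             # Language (DC)
    date_created: str         # Date of creation (DC)
    date_finalized: str = ""  # Date of finalization (M4L)
    notes: str = ""           # Notes (AgMes)


@dataclass
class ExperimentalActivity:
    """Body: metadata describing one experimental activity."""
    project: str              # Project (OBI/AGROVOC)
    date: str                 # Date (DC)
    procedure: str            # Laboratory procedure (M4L)
    page_number: int          # Page number (M4L)
    recorded_by: str          # Recorded by (M4L)
    purpose: str = ""         # Purpose (M4L)
    outcome: str = ""         # Outcome (NCIt)
    samples: List[str] = field(default_factory=list)    # OBI, CHEBI, PO
    reagents: List[str] = field(default_factory=list)   # CHEBI, M4L
    assays: List[str] = field(default_factory=list)     # OBI, M4L
    equipment: List[str] = field(default_factory=list)  # OBI, PEO, SEP, ...


@dataclass
class LabNotebook:
    header: NotebookHeader
    body: List[ExperimentalActivity] = field(default_factory=list)

    def pages_for_assay(self, assay: str) -> List[int]:
        """Return the sorted page numbers whose activity lists the assay."""
        return sorted(a.page_number for a in self.body if assay in a.assays)


# Hypothetical example values, for illustration only.
notebook = LabNotebook(
    header=NotebookHeader(
        title="Example notebook",
        creator="Example Researcher",
        notebook_number="NB-001",
        language="es",
        date_created="2010-06-01",
    ),
    body=[
        ExperimentalActivity(
            project="Example project", date="2010-06-02",
            procedure="DNA extraction", page_number=3,
            recorded_by="Example Researcher",
            samples=["whole plant"], reagents=["buffer"],
            assays=["DNA extraction"], equipment=["centrifuge"],
        ),
        ExperimentalActivity(
            project="Example project", date="2010-06-03",
            procedure="PCR", page_number=4,
            recorded_by="Example Researcher",
            samples=["DNA"], reagents=["dNTP mix"],
            assays=["PCR", "gel electrophoresis"], equipment=["freezer"],
        ),
    ],
)

pcr_pages = notebook.pages_for_assay("PCR")
```

Keeping the header separate from the list of activities mirrors the slide: one set of notebook-level metadata, then one record per experimental activity.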
DNA Extraction
We focused on: DNA extraction, PCR and electrophoresis
A typical process in a plant biotechnology laboratory: mechanical pulverization of plant material
[Figure: ontology diagram of the process; relation labels: is a, inheres in, bearer of]
Results
• M4L: our ontology for the experimental
processes we studied
– Based on OBI.
– Terms proposed to OBI: 197, including new terms
plus terms from other ontologies
– Other terms will be proposed to other ontologies,
e.g. ChEBI, GO, PO
No.  N. of concepts  Ontology
0    149             Metadata for Laboratory Notebook (M4L)
1    87              Chemical Entities of Biological Interest (CHEBI) (Degtyarenko et al., 2008)
2    59              Ontology for Biomedical Investigations (OBI) (Brinkman et al., 2010)
3    17              Medical Subject Headings ontology (MSH) (Moerchen et al., 2008)
4    14              Gene Ontology (GO) (Ashburner et al., 2000)
5    6               Sample Processing and Separation Techniques (SEP) (http://psidev.info/index.php?q=node/312)
6    6               BIRN Project lexicon (BIRNLex) (Bug et al., 2008)
7    5               Gene Regulation Ontology (GRO) (Beisswanger et al., 2008)
8    5               National Cancer Institute thesaurus (NCIt) (Ceusters et al., 2005)
9    5               Plant Ontology Consortium (POC) (Jaiswal et al., 2005)
10   5               SNOMED-CT (http://www.nlm.nih.gov/research/umls/Snomed/snomed_main.html)
11   1               BioTop Ontology (Beisswanger et al., 2007)
12   1               Foundational Model of Anatomy (FMA) (Rosse and Mejino, 2003)
13   1               Ontology for Genetic Interval (OGI) (Lin et al., 2010)
14   1               Parasite Experiment Ontology (PEO) (http://wiki.knoesis.org/index.php/Parasite_Experiment_ontology)
15   1               Proteomics Data and Process Provenance (PDPP) (Sahoo et al., 2006)
Results
• We have structured the
descriptive layers by reusing
and extending existing
ontologies.
• For supporting the
annotation within our
scenario we have identified
three main layers, namely:
– i) that related to the
document itself,
– ii) the annotation layer, and
– iii) that related to the
experiment.
Results
• Orchestration of ontologies: Annotation
Ontology
The Annotation Ontology (AO) is a vocabulary for performing
several types of annotation (comments, entity annotations
or semantic tags, textual annotations or classic tags, notes,
examples, errata...) on any kind of electronic document
(text, images, audio, tables...) and on document parts. AO
does not provide any domain ontology; instead it fosters the
reuse of existing ones, so as not to break the Semantic Web
principle of scalability.
[Figure: example annotation graph built with the Annotation Ontology.
Nodes: Annotation, Qualifier, Definition, Selector, ImageSelector, InitEndCornerSelector, Document, Topic, Provenance, MOAT, moat:Tag, moat:Meaning, foaf:Person, ANNOT1, ANNOT2.
Properties: rdf:type, rdfs:subClassOf, aof:annotates, aof:onDocument, ao:context, ao:hasTopic, aos:init, aos:end, ann:body, pav:createdBy, pav:createdOn, tags:name, moat:hasMeaning, moat:tagMeaning, aoex:hasMoatMeaning.
Values: selector corners (304,507) and (360,618); topic GenBank:AB005238; body "Partial sequence on psy promoter"; creator http://www.tags4lab.org/foaf.rdf#olga.giraldo; created on June 1, 2010; meaning http://www.ncbi.nlm.nih.gov/pubmed/12520345.]
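As a rough sketch of how one annotation from the figure could be written down as subject-predicate-object triples: the property labels are copied from the slide, but the way they are attached to ANNOT1 is only one plausible reading of the diagram, and the document identifier, the helper functions, and the use of plain Python tuples instead of an RDF library are assumptions made here.

```python
# Property labels are copied as plain strings from the slide; no RDF
# library is used, so prefixes are not resolved to real namespaces.
DOCUMENT = "doc:example-notebook-page"  # hypothetical document identifier
CREATOR = "http://www.tags4lab.org/foaf.rdf#olga.giraldo"


def make_annotation(annot_id, document, topic, body, creator, created_on):
    """Build the triples for one annotation as (subject, predicate, object)."""
    return [
        (annot_id, "rdf:type", "Annotation"),
        (annot_id, "aof:annotates", document),
        (annot_id, "ao:hasTopic", topic),
        (annot_id, "ann:body", body),
        (annot_id, "pav:createdBy", creator),
        (annot_id, "pav:createdOn", created_on),
    ]


def objects(triples, subject, predicate):
    """Return every object stored for the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


# One reading of ANNOT1 in the figure (the grouping is an assumption).
annot1 = make_annotation(
    "ANNOT1",
    DOCUMENT,
    topic="GenBank:AB005238",
    body="Partial sequence on psy promoter",
    creator=CREATOR,
    created_on="June 1, 2010",
)

topics = objects(annot1, "ANNOT1", "ao:hasTopic")
```

In a real deployment the same triples would normally live in an RDF store so that they can be queried together with the domain ontologies.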
Results
• The AO structures both the semantic
annotations and the tags generated by
users.
– This supports complex SPARQL queries
involving several ontologies, for
instance:
• “Retrieve from the eLabBook the
pages tagged by Tim Andrews or
Lisa Watson with the tags rice and
iron for which there is a LIMS
data entry”
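The logic of that example query can be sketched in plain Python over in-memory records. This is not SPARQL and not the prototype's code: the record layout, the page identifiers, the third tagger, and the reading that a page must carry both tags (each applied by either of the two named people) are all assumptions made for illustration.

```python
# Hypothetical tagging records: (page, tagger, tag).
taggings = [
    ("page-12", "Tim Andrews", "rice"),
    ("page-12", "Lisa Watson", "iron"),
    ("page-15", "Tim Andrews", "rice"),
    ("page-20", "Another Tagger", "rice"),
    ("page-20", "Another Tagger", "iron"),
    ("page-31", "Lisa Watson", "rice"),
    ("page-31", "Lisa Watson", "iron"),
]

# Hypothetical set of pages that have a LIMS data entry.
lims_pages = {"page-12", "page-20"}


def find_pages(taggings, lims_pages, taggers, required_tags):
    """Pages carrying all required tags from the given taggers,
    restricted to pages that have a LIMS data entry."""
    tags_by_page = {}
    for page, tagger, tag in taggings:
        if tagger in taggers:
            tags_by_page.setdefault(page, set()).add(tag)
    return sorted(
        page
        for page, tags in tags_by_page.items()
        if required_tags <= tags and page in lims_pages
    )


result = find_pages(
    taggings,
    lims_pages,
    taggers={"Tim Andrews", "Lisa Watson"},
    required_tags={"rice", "iron"},
)
```

With this sample data only page-12 qualifies: page-15 lacks the iron tag, page-20 was tagged by someone else, and page-31 has no LIMS entry.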
Concluding Remarks
• Although several electronic laboratory notebooks (ELNs)
have been proposed and replacing paper-based records has
been a consistent trend for several years, the technology
has not yet been widely adopted. Laboratory Information
Management Systems (LIMS) in combination with
paper-based laboratory notebooks continue to be
commonly used, particularly in academic
environments.
Concluding Remarks
• Sharing and organizing information happens
on a concept basis
– researchers studying genes involved in iron
transport share information with those who
undertake nutritional studies assessing the effects
of iron intake in human populations
– Clustering information based on concepts
Concluding Remarks
• Simple tagging mechanisms proved to be
valuable resources for organizing information
– Tag clouds were used as tables of contents (TOCs)
– Tags were also used to support a quick view of
laboratory pages
– Tags tend to stabilize over time
– Tags were a valuable source of terms and of
evidence (use cases) for those terms
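The use of tags as a table of contents can be illustrated with a minimal sketch: an inverted index from each tag to the pages carrying it, plus per-tag counts of the kind a tag cloud would display. The page numbers, tags, and function names are hypothetical and not taken from the prototype.

```python
from collections import Counter, defaultdict

# Hypothetical notebook pages and the tags users attached to them.
page_tags = {
    1: ["rice", "iron"],
    2: ["rice", "PCR"],
    3: ["PCR"],
    4: ["iron", "DNA extraction"],
}


def build_tag_index(page_tags):
    """Map each tag to the sorted list of pages that carry it (a tag-based TOC)."""
    index = defaultdict(list)
    for page in sorted(page_tags):
        for tag in set(page_tags[page]):  # ignore a tag repeated on one page
            index[tag].append(page)
    return dict(index)


def tag_cloud(page_tags):
    """Count on how many pages each tag appears (the weight shown in a cloud)."""
    return Counter(tag for tags in page_tags.values() for tag in set(tags))


toc = build_tag_index(page_tags)
cloud = tag_cloud(page_tags)
```

Looking up `toc["rice"]` then plays the role of a TOC entry, while `cloud` gives the relative weight of each tag.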
Concluding Remarks
• Time is difficult to model
• Incremental prototyping and participatory
design were key (community engagement)
• Limitations in the technology:
– Tablets, electronic pen, first-generation iPad, now
Motorola XOOM
– Browser compatibility
• Laboratory notebooks look like specialized
wikis
Future Work
• Focus on one technology: Android OS
• Semantic LIMS
• Support the whole cycle (LIMS record,
notebook, machine-generated data)
• Automatic annotation of machine-generated data
• Adopt minimal amounts of information
• Adopt techniques from Personal Information
Management approaches
• Look more like a wiki
Acknowledgments
• John Bateman, Oscar Corcho, Joe Tohme,
Cesar Montana, Alberto Labarga
• The CIAT biotech lab