A Semantic Grid Browser for the Life Sciences Applied to the Study of Infectious Diseases
Document Navigation: Ontologies or
Knowledge Organisation Systems
Simon Jupp
Bio-Health Informatics Group
University of Manchester, UK
Bio-ontologies SIG, ISMB 2007
Vienna, Austria
Introduction
Document navigation over a Semantic Web
Ontologies are being developed as background
knowledge to drive the Semantic Web.
Message: Formal ontologies are not the only
knowledge artefact needed. Artefacts with weaker
semantics have their role and are the best solution in
some circumstances.
COHSE (Conceptual Open Hypermedia SErvice)
Navigation via hypertext is a mainstay of the WWW
Problem: Links are typically embedded in Web
pages: hard-coding, format restrictions, ownership,
legacy resources, maintenance, unary targets etc.
COHSE Architecture
[Figure: COHSE architecture. Labels: HTML document in; Knowledge Service (Ontology, SKOS); DLS Agent; Resource Service (Search Engine, Annotation DB); linked HTML document out.]
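The document-in, linked-document-out flow can be illustrated with a minimal sketch. This is not the COHSE implementation: the function name `link_terms`, the term-to-URL table and the example.org URLs are hypothetical, and the input is treated as plain text rather than parsed HTML.

```python
import html
import re


def link_terms(text, targets):
    """Return HTML in which every known term in `text` is wrapped in a link.

    `targets` maps a term to a URL (a hypothetical stand-in for what a
    knowledge service and a resource service would supply).
    """
    if not targets:
        return html.escape(text)
    # Longest terms first, so "polio virus" wins over "polio".
    terms = sorted(targets, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    lookup = {term.lower(): url for term, url in targets.items()}
    pieces, last = [], 0
    for match in pattern.finditer(text):
        # Escape the untouched text between matches.
        pieces.append(html.escape(text[last:match.start()]))
        url = lookup[match.group(0).lower()]
        pieces.append('<a href="{}">{}</a>'.format(
            html.escape(url, quote=True), html.escape(match.group(0))))
        last = match.end()
    pieces.append(html.escape(text[last:]))
    return "".join(pieces)
```

Because the links are computed when the page is served, the source document itself is never edited, which is the point of keeping links out of the page.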
Sealife project
A realisation of a Semantic Grid Browser, which links the current Web to the
emerging eScience infrastructure
Built on existing tools developed by partners:
COHSE, but also GoPubMed and others.
Applications: from cells via tissue to patients
Evidence-based medicine
Patent and literature mining
Molecular biology
www.biotec.tu-dresden.de/sealife
What is a Sealife browser?
A Sealife browser is a browser that identifies concepts in
texts and offers appropriate services based on these
concepts.
Three main components:
1) an underlying “ontology” or set of “ontologies”
2) a text mining component
3) a service composition module
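The three components can be sketched as a small pipeline. This is an illustration under assumptions, not the Sealife code: the toy vocabulary, the naive label matcher and the example.org service URL templates are all hypothetical.

```python
import re

# 1) Background knowledge: concept id -> labels (hypothetical toy vocabulary).
VOCABULARY = {
    "tuberculosis": ["tuberculosis", "TB"],
    "poliovirus": ["poliovirus", "polio virus"],
}

# Services that can be offered for any concept (hypothetical URL templates).
SERVICES = {
    "Literature search": "https://example.org/search?q={concept}",
    "Definition": "https://example.org/define/{concept}",
}


def find_concepts(text, vocabulary):
    """2) Text mining: return ids of concepts whose labels occur in the text."""
    found = []
    for concept, labels in vocabulary.items():
        for label in labels:
            # Whole-word, case-insensitive match on each label.
            if re.search(r"\b" + re.escape(label) + r"\b", text, re.IGNORECASE):
                found.append(concept)
                break
    return found


def offer_services(concepts, services):
    """3) Service composition: build one link per (concept, service) pair."""
    return {
        concept: {name: url.format(concept=concept) for name, url in services.items()}
        for concept in concepts
    }


def browse(text):
    """Run the whole pipeline on one piece of text."""
    return offer_services(find_concepts(text, VOCABULARY), SERVICES)
```

Calling `browse("New guidelines for TB treatment")` returns the services for the concept `tuberculosis` only, because the label "TB" is the only vocabulary entry found in the text.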
Sealife use case - study of disease
NeLI: National electronic Library of Infection portal. Range
of users but few links.
Given a document about Tuberculosis, where would
users want to navigate to next?
User Group            Question                               Targets
Family Doctor (GP)    Tuberculosis drugs and side effects?   British National Formulary (BNF)
Clinicians            Tuberculosis treatment guidelines?     Public Health Observatories (PHO)
Molecular Biologists  Drug resistant tuberculosis species?   PubMed
General Public        What is tuberculosis?                  Health Protection Agency (HPA) or the NHS Direct online website
http://www.neli.org.uk
Background Knowledge
What style of knowledge model is best suited for
navigation?
Do we need strict semantics? Are they a help or a
hindrance?
All I want to say is that this concept “has something to
do with” that concept. This doesn’t fit well with
formal ontology.
e.g. Polio has something to do with polio virus / polio disease / polio treatment: get
me documents about all of them!
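A minimal sketch of what a loose “has something to do with” link buys us for navigation. The relation table, the document index and the file names are hypothetical; the point is only that no ontological distinction between virus, disease and treatment is needed to collect the documents.

```python
# Hypothetical "has something to do with" links between concepts.
RELATED = {
    "polio": ["polio virus", "polio disease", "polio treatment"],
}

# Hypothetical index from concept to documents about it.
DOCUMENTS = {
    "polio virus": ["virus-structure.html"],
    "polio disease": ["symptoms.html", "history.html"],
    "polio treatment": ["vaccination.html"],
}


def documents_about(concept, related=RELATED, documents=DOCUMENTS):
    """Collect documents for a concept and everything loosely related to it."""
    targets = [concept] + related.get(concept, [])
    results = []
    for target in targets:
        for doc in documents.get(target, []):
            if doc not in results:  # keep first occurrence, preserve order
                results.append(doc)
    return results
```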
Sealife background knowledge
To cover molecular biology through to medicine we need a
large knowledge artefact to serve as background
knowledge for SeaLife. This artefact must support sensible
navigation between documents on the web.
Luckily…
[Figure: existing bio-ontologies grouped by area. Area headings: Proteins, Sequence, Pathways, Anatomy, Phenotype, Gene products, Transcript, Cell type, Development. Ontologies shown:]
-Protein covalent bond
-Protein domain
-UniProt taxonomy
-Sequence types and features
-Genetic Context
-Mosquito gross anatomy
-Mouse adult gross anatomy
-Mouse gross anatomy and development
-C. elegans gross anatomy
-Arabidopsis gross anatomy
-Cereal plant gross anatomy
-Drosophila gross anatomy
-Dictyostelium discoideum anatomy
-Fungal gross anatomy (FAO)
-Plant structure
-Maize gross anatomy
-Medaka fish anatomy and development
-Zebrafish anatomy and development
-Pathway ontology
-Event (INOH pathway ontology)
-Systems Biology
-Protein-protein interaction
-BRENDA tissue / enzyme source
-Molecule role
-Molecular Function
-Biological process
-Cellular component
-eVOC (Expressed Sequence Annotation for Humans)
-Arabidopsis development
-Cereal plant development
-Plant growth and developmental stage
-C. elegans development
-Drosophila development
-Human developmental anatomy, abstract version
-Human developmental anatomy, timed version
-Plasmodium life cycle
-NCI Thesaurus
-Mouse pathology
-Human disease
-Cereal plant trait
-PATO (attribute and value)
-Mammalian phenotype
-Habronattus courtship
-Loggerhead nesting
-Animal natural history and life history
History of Medical Vocabularies
[Figure: timeline of medical vocabularies with year marks at 1603, 1700, 1785, 1855, 1900, 1975, 1985, 1995 and 2005. Vocabularies shown: Synopsis Nosologiae Methodicae, ICD, ICD9, MeSH, SNOP, CPT, EmTree, OPCS, OPCS3, OPCS4, OPCS4.3, SNOMED-2, SNOMED International, SNOMED-RT, SNOMED-CT, READ, CTV3, ICPC, DM&D, UMLS, GALEN, FMA.]
What do we need for navigation?
The bio-medical domain is rich in vocabularies and
ontologies.
Large lexical resource including textual definitions and
synonyms
There is a varying degree of semantics, expressivity and
formality in these vocabularies (e.g. MeSH) and ontologies
(e.g. FMA). Most include some form of hierarchy.
Hierarchies are well suited for driving navigation.
Question: Do we want strict sub/super class relationships?
Or do we want looser notions such as broader/narrower?
Ontology or Vocabulary?
Initial approach to COHSE and SeaLife was to represent
everything in OWL
The strict semantics of OWL do not always lend
themselves to sensible navigation, and conversion from
vocabularies to OWL is difficult. It’s hard to model some
things in OWL…
MeSH:
Head <-- Ear
     <-- Nose
Accident <-- Traffic Accident
         <-- Accident Prevention
OBO/OWL:
Nucleus part_of Cell
Cell has_part Nucleus - Not always True
PolioDisease causedby PolioVirus
SKOS (Simple Knowledge Organisation System)
Purpose: Subject Metadata and information retrieval
e.g. This document is about tuberculosis
Model for representing concept schemes, thesauri,
classification systems, taxonomies etc.
A SKOS concept is an individual of the class skos:Concept;
relationships are just asserted facts between concepts (not
restrictions!)
Model ideally suited for our task,
e.g. prefLabel, altLabel, broader, narrower, related.
RDF/XML representation
http://www.w3.org/2004/02/skos/
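To make “individuals plus asserted facts” concrete, here is a minimal sketch that stores SKOS-style statements as plain triples. The `ConceptScheme` class, the example.org namespace and the tuberculosis/infection link are hypothetical illustrations; only the SKOS and RDF namespace URIs and property names are taken from the standards.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/concept/"  # hypothetical namespace


class ConceptScheme:
    """A tiny triple store holding SKOS-style assertions."""

    def __init__(self):
        self.triples = set()

    def add_concept(self, name, pref_label, alt_labels=()):
        """Declare an individual of skos:Concept with its labels."""
        uri = EX + name
        self.triples.add((uri, RDF_TYPE, SKOS + "Concept"))
        self.triples.add((uri, SKOS + "prefLabel", pref_label))
        for label in alt_labels:
            self.triples.add((uri, SKOS + "altLabel", label))
        return uri

    def assert_relation(self, subject, prop, obj):
        """Assert a plain fact such as broader, narrower or related."""
        self.triples.add((subject, SKOS + prop, obj))

    def objects(self, subject, prop):
        """Return every value asserted for `prop` on `subject`, sorted."""
        return sorted(o for s, p, o in self.triples
                      if s == subject and p == SKOS + prop)


scheme = ConceptScheme()
tb = scheme.add_concept("tuberculosis", "Tuberculosis", alt_labels=["TB"])
infection = scheme.add_concept("infection", "Infection")
scheme.assert_relation(tb, "broader", infection)
scheme.assert_relation(infection, "narrower", tb)
```

Nothing here is inferred: `narrower` is stored only because it was asserted, which is the weaker-semantics trade the talk is arguing for.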
Conversion to SKOS
• relationship:part_of --> skos:broader ( e.g. finger part_of hand)
• relationship:contains --> skos:narrower ( e.g. skull contains brain)
• relationship:causes --> skos:related ( e.g. PolioVirus causes PolioDisease)
Sub properties:
obo:part_of is a sub-property of skos:broader
obo:has_part is a sub-property of skos:narrower
skos:broader and skos:narrower are inverses of each other
Leaves us open to migration back to OWL
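The relationship mapping above can be sketched as a lookup table applied to (subject, relation, object) triples. The function name `to_skos` and the triple representation are assumptions made for illustration; the three mappings are the ones listed on the slide.

```python
# Mapping from the slide: source relationship -> SKOS property.
RELATION_TO_SKOS = {
    "part_of": "skos:broader",
    "contains": "skos:narrower",
    "causes": "skos:related",
}


def to_skos(triples, mapping=RELATION_TO_SKOS):
    """Rewrite (subject, relation, object) triples using SKOS properties.

    Returns the converted triples and the triples that had no mapping,
    so nothing is dropped silently.
    """
    converted, unmapped = [], []
    for subject, relation, obj in triples:
        if relation in mapping:
            converted.append((subject, mapping[relation], obj))
        else:
            unmapped.append((subject, relation, obj))
    return converted, unmapped
```

Returning the unmapped triples separately is a design choice of this sketch: it makes it visible which source relations still need a decision.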
Conversion to SKOS
[Figure: OBO ontologies, OWL ontologies, MeSH, SNOMED, NeLI, concept schemas, taxonomies, thesauri and other sources are all converted to SKOS.]
Advantage of this approach
For a given concept e.g. “Polio Virus”, we can query multiple resources
and bring related concepts together.
Sources queried: MeSH, Disease Ontology, SNOMED
Term found                  SKOS relation to Poliovirus
Brunhilde Virus             skos:altTerm
Spinal cord disease         skos:broaderThan
Postpoliomyelitis Syndrome  skos:narrowerThan
Microorganism               skos:broaderThan
Enterovirus                 skos:broaderThan
• Rapid (and cheap!) generation of knowledge artefact
• Take advantage of efforts in multiple biomedical communities
• We don’t have to make any strong ontological distinctions
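Bringing related concepts together from several converted sources can be sketched as a simple merge. The function name, the relation names and the sample data (`source_a`, `source_b` and which terms each returns) are hypothetical and chosen only to show the shape of the result.

```python
from collections import defaultdict


def merge_related(concept, sources):
    """Group terms related to `concept` by relation, across all sources.

    `sources` maps a source name to a lookup: concept -> list of
    (relation, term) pairs. Each term remembers which sources gave it.
    """
    merged = defaultdict(dict)  # relation -> {term: [source, ...]}
    for source_name, lookup in sources.items():
        for relation, term in lookup.get(concept, []):
            merged[relation].setdefault(term, []).append(source_name)
    return dict(merged)


# Hypothetical sample data; the source names and assignments are made up.
SAMPLE_SOURCES = {
    "source_a": {"Poliovirus": [("broader", "Enterovirus")]},
    "source_b": {"Poliovirus": [("broader", "Enterovirus"),
                                ("broader", "Microorganism")]},
}
```

A term reported by two sources appears once, with both source names attached, which also gives a first handle on the redundancy that merging introduces.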
Disadvantage of this approach
Trade-off: We lose inferential power when querying the knowledge
resource
Unwanted concepts and relationships, especially from OWL conversion, e.g.
‘Physical Entity’, ‘Continuant’ etc.
Linking overload!
Inability to do inconsistency checking
Potentially large redundancy in our knowledge base
Maintenance and scalability (>1,000,000 concepts), especially for dynamic
hyper-linking.
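One simple way to limit unwanted concepts and linking overload is a stop-list applied before links are generated. This is only a sketch of that idea, not something the talk describes: the function name is hypothetical and the stop-list contains just the two upper-level concepts named above.

```python
# Upper-level concepts named on the slide as unwanted link targets.
UNWANTED = {"physical entity", "continuant"}


def filter_link_targets(concepts, unwanted=UNWANTED):
    """Drop concepts whose label is on the stop-list (case-insensitive)."""
    return [c for c in concepts if c.strip().lower() not in unwanted]
```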
Plug for Manchester’s SKOS plug-in - Protégé 4
• Instance hierarchy viewer
• OBO or OWL --> SKOS wizards
• Various rendering options
Conclusion
We need a large knowledge artefact that supports navigation between
related web resources
Rapid generation and reuse of existing terminologies (cheap)
Loosening the semantics of our model enables this with an acceptable
trade-off
SKOS is a suitable model to represent our knowledge artefact
We have strong use cases from the life sciences - demos available
Still some open issues…
Acknowledgments
Manchester: Robert Stevens, Sean Bechhofer, Yeliz Yesilada
Other: COHSE developers, Sealife project, NeLI, Patty Kostkova
Thank you.