ISMB 2002
Fifth Annual Bio-Ontologies Meeting
August 8, 2002
Experiences in visualizing and navigating
biomedical ontologies and knowledge bases
Olivier Bodenreider
Lister Hill National Center
for Biomedical Communications
Bethesda, Maryland - USA
Introduction 1
- Biomedical knowledge
  - Terminologies (names)
  - Ontologies (objects)
  - Knowledge bases (facts)
- Common features
  - Terms / Concepts
  - Inter-concept relationships
    - Hierarchical
    - Associative
Lister Hill National Center for Biomedical Communications
Introduction 2
- Challenges
  - Volume of information
    - 10^4 - 10^6 concepts
    - 10^5 - 10^7 relationships
  - Orientation
    - Mapping to concepts
    - Visualizing concept spaces
    - Navigating concept spaces
[Figure labels: knowledge, term]
Introduction 3
SemNav
- UMLS browser
- Entry point: biomedical term
- Display related concepts
- Display properties of inter-concept relationships
- Allow navigation among concepts
GenNav
- GO browser
- Entry point: GO term or gene product name/symbol
- Display related GO terms and gene products
- Display properties of term/term and term/gene product relationships
- Allow navigation between GO terms and gene products
Outline
- Background
  - Unified Medical Language System (UMLS)
  - Gene Ontology
- Overview of the browsers
  - SemNav
  - GenNav
- Common features
- Differences
UMLS and GO
Unified Medical Language System
- Developed at NLM since 1990
- 13th edition in 2002
- Integrates some 60 terminological resources
  - Clinical vocabularies (including specialties)
  - Core terminologies (anatomy, drugs, medical devices)
  - Administrative terminologies, standards
- Integration
  - Synonymous terms are clustered in a concept
  - Hierarchies (trees) are combined in a graph structure
Terminology integration: Terms
Term (source vocabularies)
- Duchenne muscular dystrophy (MeSH, SNOMED, CTV3, Jablonski, CRISP, DxPlain, MedDRA, LOINC)
- Duchenne’s muscular dystrophy (COSTAR)
- Duchenne de Boulogne muscular dystrophy (Jablonski)
- Duchenne type progressive muscular dystrophy (SNOMED)
- pseudohypertrophic muscular dystrophy (MeSH, CTV3, SNOMED)
- X-linked recessive muscular dystrophy (Jablonski)
- severe generalized familial muscular dystrophy (SNOMED)
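The clustering of synonymous terms into one concept can be sketched in a few lines of Python. This is a minimal illustration, not the Metathesaurus data model: the `Concept` class, its methods, and the identifier `CONCEPT_DMD` are hypothetical names introduced here, and only the terms and source vocabularies come from the slide.

```python
from dataclasses import dataclass, field

# Terms and source vocabularies as listed on the slide.
TERMS = {
    "Duchenne muscular dystrophy": ["MeSH", "SNOMED", "CTV3", "Jablonski",
                                    "CRISP", "DxPlain", "MedDRA", "LOINC"],
    "Duchenne's muscular dystrophy": ["COSTAR"],
    "Duchenne de Boulogne muscular dystrophy": ["Jablonski"],
    "Duchenne type progressive muscular dystrophy": ["SNOMED"],
    "pseudohypertrophic muscular dystrophy": ["MeSH", "CTV3", "SNOMED"],
    "X-linked recessive muscular dystrophy": ["Jablonski"],
    "severe generalized familial muscular dystrophy": ["SNOMED"],
}


@dataclass
class Concept:
    """One concept: a cluster of synonymous terms (hypothetical structure)."""
    concept_id: str                            # placeholder, not a real UMLS identifier
    terms: dict = field(default_factory=dict)  # term -> set of source vocabularies

    def add_term(self, term, sources):
        # The same term may be contributed by several vocabularies.
        self.terms.setdefault(term, set()).update(sources)

    def sources(self):
        # Every vocabulary that contributes at least one term to this concept.
        return set().union(*self.terms.values())

    def terms_from(self, source):
        # All terms that a given vocabulary contributes to this concept.
        return sorted(t for t, s in self.terms.items() if source in s)


concept = Concept("CONCEPT_DMD")
for term, srcs in TERMS.items():
    concept.add_term(term, srcs)

print(len(concept.terms), "terms from", len(concept.sources()), "vocabularies")
print(concept.terms_from("Jablonski"))
```

Keeping the source vocabularies attached to each term makes it possible to filter a concept's terms by vocabulary later on.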
Terminology integration: Relationships
[Figure: hierarchies from SNOMED, MeSH, AOD, and Read Codes combined in the UMLS. Concepts shown: Adrenal Gland Diseases, Adrenal Cortex Diseases, Hypoadrenalism, Adrenal Gland Hypofunction, Adrenal cortical hypofunction, Addison’s Disease.]
UMLS
- Two-level structure
  - Semantic Network
    - 134 Semantic Types (STs)
    - 54 types of relationships among STs
  - Metathesaurus
    - 800,000 concepts
    - ~10 M inter-concept relationships
  - Link = categorization
[Figure: a Concept in the Metathesaurus is linked by categorization to a Semantic Type in the Semantic Network.]
Semantic Types
[Figure: Metathesaurus concepts categorized by semantic types of the Semantic Network.
Semantic types: Anatomical Structure; Fully Formed Anatomical Structure; Embryonic Structure; Body Part, Organ or Organ Component; Disease or Syndrome; Pharmacologic Substance; Population Group.
Concepts: Mediastinum; Saccular Viscus; Angina Pectoris; Esophagus; Heart; Left Phrenic Nerve; Heart Valves; Fetal Heart; Cardiotonic Agents; Tissue Donors.]
Gene Ontology
- Developed by the GO Consortium
- Several components
  - Ontology (~11,000 concepts)
    - Molecular functions
    - Cellular components
    - Biological processes
  - Gene products (~125,000)
  - Associations between gene products and GO concepts (~357,000)
SemNav
MeSH Browser
SemNav Visualization options
SemNav Relationships
[Figure: Semantic types: Biologically Active Substance; Amino Acid, Peptide or Protein; Disease or Syndrome. Concepts: Muscular Dystrophy, Duchenne; Dystrophin.]
GenNav
Material and Methods
Common features and differences
Mapping query terms
- Mapping terms to concepts
  - Matching criteria (exact, approximate)
  - Normalization techniques
    - work well on clinical terms
    - less applicable to gene names
- Query disambiguation
  - With semantic type in SemNav
  - With species in GenNav
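The mapping steps listed above (normalization, exact then approximate matching, and disambiguation by semantic type) can be sketched as follows. This is a simplified illustration under stated assumptions, not the normalization used by the UMLS lexical tools or by SemNav: `normalize`, `add_entry`, `lookup`, the `INDEX` structure, and the 0.8 similarity cutoff are all hypothetical, and the ambiguous term "cold" is an example chosen for this sketch rather than taken from the slides.

```python
import difflib
import re

INDEX = {}  # normalized term -> list of (concept name, semantic type)


def normalize(term):
    """Simplified normalization: lowercase, drop possessives and
    punctuation, and sort the words so that word order is ignored."""
    text = term.lower().replace("\u2019", "'")
    text = re.sub(r"'s\b", "", text)           # Duchenne's -> Duchenne
    text = re.sub(r"[^a-z0-9 ]+", " ", text)   # punctuation -> space
    return " ".join(sorted(text.split()))


def add_entry(term, concept, semantic_type):
    INDEX.setdefault(normalize(term), []).append((concept, semantic_type))


def lookup(query, semantic_type=None, cutoff=0.8):
    """Exact match on the normalized form, then an approximate fallback;
    an optional semantic type filters ambiguous results."""
    key = normalize(query)
    hits = INDEX.get(key)
    if hits is None:
        close = difflib.get_close_matches(key, list(INDEX), n=1, cutoff=cutoff)
        hits = INDEX[close[0]] if close else []
    if semantic_type is not None:
        hits = [h for h in hits if h[1] == semantic_type]
    return [concept for concept, _ in hits]


# Illustrative entries (assumptions made for this sketch).
add_entry("Muscular Dystrophy, Duchenne", "Muscular Dystrophy, Duchenne",
          "Disease or Syndrome")
add_entry("cold", "Common Cold", "Disease or Syndrome")
add_entry("cold", "Cold Temperature", "Natural Phenomenon or Process")

print(lookup("Duchenne's muscular dystrophy"))   # exact after normalization
print(lookup("Duchene muscular dystrophy"))      # approximate (misspelled)
print(lookup("cold"))                            # ambiguous
print(lookup("cold", semantic_type="Disease or Syndrome"))
```

A GenNav-style variant would filter on a species attribute instead of a semantic type.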
Visualization
- Graph vs. Trees (Forest)
  - Multiple inheritance is better visualized by graphs than by trees
  - Off-the-shelf, freely available graph visualization packages are available (GraphViz)
- Need to reduce complexity
  - Transitive reduction on complex graphs
  - Feature selection
    - e.g., a given vocabulary in SemNav
    - e.g., a given species in GenNav
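As an illustration of the complexity reduction mentioned above, the sketch below removes redundant edges from a small hierarchy (transitive reduction of an acyclic graph) and writes the result as DOT text that GraphViz can render. It is a minimal sketch, not the browsers' implementation: the function names are hypothetical, the graph is assumed to be acyclic, and the direct edge from "Body Part, Organ or Organ Component" to "Anatomical Structure" is added here only to give the reduction something to remove.

```python
def transitive_reduction(edges):
    """Return the edges of an acyclic graph that are not implied by a
    longer path. Each edge is a (child, parent) pair."""
    succ = {}
    for child, parent in edges:
        succ.setdefault(child, set()).add(parent)
        succ.setdefault(parent, set())

    def reachable(start):
        # All nodes reachable from start by one or more edges.
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            for nxt in succ[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    reduced = set()
    for child, parent in edges:
        # Redundant if parent is reachable through another direct successor.
        via_other = any(parent in reachable(w)
                        for w in succ[child] if w != parent)
        if not via_other:
            reduced.add((child, parent))
    return reduced


def to_dot(edges):
    """Serialize edges as a DOT digraph for GraphViz."""
    lines = ["digraph concepts {"]
    for child, parent in sorted(edges):
        lines.append(f'  "{child}" -> "{parent}";')
    lines.append("}")
    return "\n".join(lines)


EDGES = [
    ("Body Part, Organ or Organ Component", "Fully Formed Anatomical Structure"),
    ("Fully Formed Anatomical Structure", "Anatomical Structure"),
    ("Embryonic Structure", "Anatomical Structure"),
    # Redundant edge, added for illustration only.
    ("Body Part, Organ or Organ Component", "Anatomical Structure"),
]

reduced = transitive_reduction(EDGES)
print(to_dot(reduced))
```

The DOT text can be saved to a file and rendered with the GraphViz `dot` program.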
Navigation
- Tool for exploration
  - Navigation among concepts (SemNav and GenNav)
  - Navigation between two poles (gene products and GO concepts in GenNav)
- Self-contained (SemNav) or open to external resources (GenNav)
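Navigation between two poles can be pictured as a two-way index over associations, so that a user can move from a gene product to its GO terms and back. The sketch below is only an illustration of that idea: the `AssociationIndex` class and its methods are hypothetical, and the identifiers `product_A`, `product_B`, `term_1`, and `term_2` are placeholders, not real gene products or GO terms.

```python
from collections import defaultdict


class AssociationIndex:
    """Two-way index over (gene product, GO term) associations."""

    def __init__(self):
        self._terms_by_product = defaultdict(set)
        self._products_by_term = defaultdict(set)

    def add(self, gene_product, go_term):
        # Store the association in both directions.
        self._terms_by_product[gene_product].add(go_term)
        self._products_by_term[go_term].add(gene_product)

    def terms_for(self, gene_product):
        # Pole 1 -> pole 2: GO terms associated with a gene product.
        return sorted(self._terms_by_product.get(gene_product, set()))

    def products_for(self, go_term):
        # Pole 2 -> pole 1: gene products associated with a GO term.
        return sorted(self._products_by_term.get(go_term, set()))

    def co_annotated(self, gene_product):
        # Two hops: other gene products sharing at least one GO term.
        others = set()
        for term in self._terms_by_product.get(gene_product, set()):
            others |= self._products_by_term[term]
        others.discard(gene_product)
        return sorted(others)


index = AssociationIndex()
index.add("product_A", "term_1")   # placeholder identifiers
index.add("product_A", "term_2")
index.add("product_B", "term_2")

print(index.terms_for("product_A"))
print(index.products_for("term_2"))
print(index.co_annotated("product_A"))
```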
Conclusions
- Most of the lessons learned while developing SemNav (for browsing general biomedical knowledge) were applicable to GenNav (for browsing molecular biology knowledge)
- The lexical techniques suitable for mapping text to clinical terminologies require adaptation to the specificity of molecular biology terminologies
Olivier Bodenreider
Lister Hill National Center
for Biomedical Communications
Bethesda, Maryland - USA
Contact: [email protected]
SemNav
http://umlsks.nlm.nih.gov*
► Resources ► Semantic Navigator
(* free UMLS registration required)
GenNav
http://etbsun2.nlm.nih.gov:8000/perl/gennav.pl