Transcript SC32WG2-Wuh

Enhancing Quality of Retrieval
Through Concept Edit History - EVS Update
Frank Hartel
Sherri De Coronado
Gilberto Fragoso
Iris Guo
Kim Ong
February 26, 2003
NCICB Jamboree
Outline
• Terminology development -- concept
creation, modification, split, merge,
retirement
• Edit history usage
• TDE Ontylog editor extension
• Next steps
• Summary
Elementary Edit Actions In
Terminology Development
(Create, Modify, Split, Merge, Retire)
[Figure: timeline of terminology baselines, Version 1 through Version 4. Between successive versions, concepts undergo Create, Modify, Split, Merge, and Retire actions.]
Evolution of versions/baseline over time
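The five elementary edit actions can be represented as a small data model. The following is a minimal sketch, not the TDE implementation; the names `EditAction`, `EditEvent`, `actions_between`, and the sample events are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class EditAction(Enum):
    """The five elementary edit actions named on the slide."""
    CREATE = "create"
    MODIFY = "modify"
    SPLIT = "split"
    MERGE = "merge"
    RETIRE = "retire"


@dataclass(frozen=True)
class EditEvent:
    action: EditAction
    concept: str               # concept the action was applied to
    reference: Optional[str]   # other concept involved in a split or merge, if any
    baseline: int              # version in which the change first appears (assumed field)


def actions_between(events, from_version, to_version):
    """Return the events that lead from one baseline to a later one."""
    return [e for e in events if from_version < e.baseline <= to_version]


# Made-up sample history spanning four baselines.
events = [
    EditEvent(EditAction.CREATE, "C1", None, 1),
    EditEvent(EditAction.SPLIT, "C1", "C2", 2),
    EditEvent(EditAction.MERGE, "C3", "C1", 3),
    EditEvent(EditAction.RETIRE, "C4", None, 4),
]
```

With this model, the difference between any two baselines is simply the list of events recorded after the first and up to the second.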
Scientific Reasons for
Concept Splits
• Oncogene ras discovered based on sequence
homology (hybridization) to the v-onc gene
of the Harvey strain of murine sarcoma
virus.
• Subsequently, it was discovered that there
were multiple related ras genes, Ha-ras and
Ki-ras. Later on, a new ras, N-ras, was
found.
Scientific Reasons for
Concept Merges
• BCL1 gene discovered in the vicinity of a
t(11;14) translocation, involved in the
malignant transformation of B cells.
• PRAD1 gene found in parathyroid adenomas
bearing chromosomal abnormalities.
• CCND1 codes for one of a set of proteins,
cyclins, that regulate cell cycle progression.
• BCL1, PRAD1, and CCND1 were later recognized
as the same gene, so their concepts are merged.
Concept Based Retrieval
[Figure: a user submits the concepts used for retrieval (C1, C2) to a search engine, which returns the relevant documents. Each document carries indexing terms, for example D1<C1, C2> and D2<C1, C3, C4>.]
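The retrieval model in the figure can be sketched in a few lines. This is an illustration only: the function name `retrieve` is hypothetical, and the matching rule (a document is returned if it shares at least one concept with the query) is an assumption, since the slide does not state one.

```python
# Indexing terms per document, taken from the figure.
index = {
    "D1": {"C1", "C2"},
    "D2": {"C1", "C3", "C4"},
}


def retrieve(index, query_concepts):
    """Return ids of documents indexed with at least one query concept."""
    query = set(query_concepts)
    return sorted(doc for doc, terms in index.items() if terms & query)
```

Under this rule, a query on C1 returns both documents, while a query on C2 returns only D1.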
Edit History Usage
[Figure: thesaurus Versions 1 through 4, each with its own set of pre-indexed documents (R1 through R4). The edit history between versions records new, modify, retire, merge, and split actions. The search engine receives the concepts used for retrieval.]
• Documents are often indexed using different versions of
the terminology.
• Re-indexing documents to keep pace with changes made
to the terminology is impractical and can be very costly.
• Edit history can improve precision and recall by relating
concepts across versions.
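One way edit history might be used at query time is to expand a query with concepts that are related to it by a split or merge, so that documents indexed under an older version are still found. The sketch below is an assumption about how this could work, not the EVS design. The function names and the tuple layout `(action, source, target)` are hypothetical, and the choice of CCND1 as the surviving concept of the merge is assumed for the example.

```python
def build_links(history):
    """Link concepts that are related by a split or a merge event."""
    links = {}
    for action, source, target in history:
        if action in ("split", "merge"):
            links.setdefault(source, set()).add(target)
            links.setdefault(target, set()).add(source)
    return links


def expand_query(query_concepts, links):
    """Add concepts one split/merge step away from each query concept."""
    query = set(query_concepts)
    expanded = set(query)
    for concept in query:
        expanded |= links.get(concept, set())
    return expanded


# Made-up history based on the ras and CCND1 examples.
history = [
    ("split", "ras", "Ha-ras"),
    ("split", "ras", "Ki-ras"),
    ("merge", "BCL1", "CCND1"),
    ("merge", "PRAD1", "CCND1"),
    ("modify", "CCND1", None),   # ignored: modifications do not relate concepts
]
links = build_links(history)
```

The expansion is deliberately limited to one step. Following links transitively would pull in sibling concepts (a query on Ha-ras would reach Ki-ras through ras), which may raise recall at the expense of precision.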
Edit History Storage
Terminology Development Environment
Terminology Development Environment
• Previously, only three types of edit action were
logged: add, modify, and delete.
• Concepts created through split actions are
confounded with newly created concepts.
• Concepts merged into other concepts are
indistinguishable from retired concepts.
• Failure to explicitly track merge and split edit
actions may result in a low recall rate* in
information retrieval.
* Recall is the number of relevant documents retrieved as a fraction of all relevant documents.
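The definition of recall can be made concrete with a short worked example. The function name and the document ids below are made up for illustration.

```python
def recall(retrieved, relevant):
    """Fraction of all relevant documents that were actually retrieved."""
    relevant = set(relevant)
    if not relevant:
        raise ValueError("recall is undefined when there are no relevant documents")
    return len(set(retrieved) & relevant) / len(relevant)


# Four documents are relevant; the search finds two of them plus one irrelevant one.
relevant_docs = {"D1", "D2", "D3", "D4"}
retrieved_docs = {"D1", "D2", "D9"}
score = recall(retrieved_docs, relevant_docs)   # 2 of 4 relevant documents found: 0.5
```

If D3 and D4 were indexed under a concept that was later merged away, a search that ignores the merge misses them, which is the low-recall situation described above.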
Approach Taken to Extend TDE
• Create reusable concept edit tree Java bean
• Develop user interface for processing split,
merge, and retirement edit actions
• Log edit events in TDE history database with
clarity and precision
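To illustrate what logging edit events "with clarity and precision" might look like, here is a sketch that uses an in-memory SQLite table. The actual TDE history database schema is not shown in these slides, so the table name, column names, and `log_event` helper are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE concept_history (
        id        INTEGER PRIMARY KEY AUTOINCREMENT,
        concept   TEXT NOT NULL,
        action    TEXT NOT NULL
                  CHECK (action IN ('create', 'modify', 'split', 'merge', 'retire')),
        reference TEXT,            -- other concept involved in a split or merge
        editor    TEXT NOT NULL,
        edited_at TEXT NOT NULL    -- ISO 8601 date or timestamp
    )
    """
)


def log_event(conn, concept, action, editor, edited_at, reference=None):
    """Insert one explicit edit event; unknown actions are rejected by the CHECK."""
    conn.execute(
        "INSERT INTO concept_history (concept, action, reference, editor, edited_at) "
        "VALUES (?, ?, ?, ?, ?)",
        (concept, action, reference, editor, edited_at),
    )


log_event(conn, "C1", "split", "modeler1", "2003-02-26", reference="C2")
```

Restricting the `action` column to the five named actions means a split can no longer be recorded as a generic add, which is the ambiguity the extension is meant to remove.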
Extend Ontylog Editor With Plug-Ins
Use Concept Edit Tree widget to build plug-ins
TDE Extension - Split Panel
Roles and properties may be transferred
from one concept to another using drag & drop.
A concept is created as a result of a split.
Edit action is explicitly logged in the TDE History database as a split event.
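The split operation described above can be sketched as follows. This is not the TDE code: the dictionary layout, the function name `split_concept`, and the sample role and property values are hypothetical, and the sketch assumes that transferred items are moved (not copied) from the source concept.

```python
def split_concept(concepts, history, source, new_name, roles=(), properties=()):
    """Create `new_name` from `source`, moving the selected roles and properties."""
    if new_name in concepts:
        raise ValueError(f"{new_name} already exists")
    moved = {"roles": set(roles), "properties": set(properties)}
    # Validate everything first so a failed split leaves the source untouched.
    for kind, items in moved.items():
        missing = items - concepts[source][kind]
        if missing:
            raise ValueError(f"{source} has no such {kind}: {sorted(missing)}")
    for kind, items in moved.items():
        concepts[source][kind] -= items
    concepts[new_name] = moved
    history.append(("split", source, new_name))   # logged explicitly as a split


# Made-up concept data for the ras example.
concepts = {
    "ras": {
        "roles": {("homologous_to", "v-onc")},
        "properties": {"synonym: Ha-ras", "synonym: Ki-ras"},
    }
}
history = []
split_concept(concepts, history, "ras", "Ha-ras", properties=["synonym: Ha-ras"])
```

Because the event is appended as `("split", source, new_name)`, the new concept can later be told apart from one created from scratch.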
TDE Extension - Merge Panel
Concept to stay
Concept to retire
Non-redundant roles and properties are transferred
from the retiring concept to the resultant merged
concept.
Edit action is explicitly logged in the TDE History database as a merge event.
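A merge can be sketched in the same style. The data layout, the function name `merge_concepts`, and the sample roles are hypothetical, and treating CCND1 as the concept to stay is an assumption made for the example. Storing roles and properties as sets makes the "non-redundant" transfer a plain set union.

```python
def merge_concepts(concepts, history, retiring, surviving):
    """Fold `retiring` into `surviving`, then mark `retiring` as retired."""
    if retiring == surviving:
        raise ValueError("cannot merge a concept into itself")
    old, kept = concepts[retiring], concepts[surviving]
    for kind in ("roles", "properties"):
        kept[kind] |= old[kind]        # set union drops redundant entries
    old["retired"] = True
    history.append(("merge", retiring, surviving))   # logged explicitly as a merge


# Made-up concept data for the BCL1 / CCND1 example.
concepts = {
    "BCL1": {
        "roles": {("gene_in_process", "B-cell transformation")},
        "properties": {"synonym: BCL1"},
        "retired": False,
    },
    "CCND1": {
        "roles": {("gene_encodes", "Cyclin D1")},
        "properties": {"synonym: CCND1"},
        "retired": False,
    },
}
history = []
merge_concepts(concepts, history, retiring="BCL1", surviving="CCND1")
```

The logged `("merge", "BCL1", "CCND1")` event is what distinguishes BCL1 from a concept that was simply retired.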
TDE Extension - Preretirement
Concept to retire
• Sub-concepts are re-treed.
• Role relationships targeting (i.e., pointing to)
the retiring concept are either removed or
re-targeted.
Concept can be retired only if all preconditions are met.
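The two preconditions listed above can be checked mechanically. The following sketch assumes a hypothetical in-memory layout in which each concept stores its parents and its outgoing roles as `(role_name, target)` pairs; the function name and sample concepts are made up.

```python
def retirement_blockers(concepts, name):
    """List what still prevents `name` from being retired (empty list: it can be retired)."""
    blockers = []
    for other, concept in concepts.items():
        if other == name:
            continue
        if name in concept["parents"]:
            blockers.append(f"sub-concept {other} must be re-treed")
        for role, target in sorted(concept["roles"]):
            if target == name:
                blockers.append(f"role {role} of {other} must be removed or re-targeted")
    return blockers


concepts = {
    "Gene": {"parents": set(), "roles": set()},
    "Old": {"parents": {"Gene"}, "roles": set()},
    "Child": {"parents": {"Old"}, "roles": set()},
    "Protein": {"parents": set(), "roles": {("encoded_by", "Old")}},
}
before = retirement_blockers(concepts, "Old")

# Pre-retirement work: re-tree the sub-concept and remove the incoming role.
concepts["Child"]["parents"] = {"Gene"}
concepts["Protein"]["roles"] = set()
after = retirement_blockers(concepts, "Old")
```

Returning the list of blockers, not just a yes/no answer, tells the modeler what remains to be fixed before the retirement can proceed.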
TDE Extension - Retire Panel
A non-editable tree shows
concept definition information
pertinent to the retiring
concept.
Edit action is explicitly logged in the TDE History database as a retire event.
Next Steps
• Consolidate edit history logged by
individual modelers in the Terminology
Development Environment (TDE) into
concept history data useful to Distributed
Terminology System (DTS) users
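As one possible reading of this step, the per-modeler logs could be merged chronologically and regrouped by concept. The sketch below is an assumption about the consolidation, not the planned implementation; the log layout `(timestamp, concept, action)`, the function name, and the sample entries are hypothetical.

```python
import heapq
from collections import defaultdict


def consolidate(modeler_logs):
    """Merge per-modeler logs (each sorted by timestamp) into per-concept histories."""
    merged = heapq.merge(*modeler_logs.values())   # chronological across modelers
    concept_history = defaultdict(list)
    for timestamp, concept, action in merged:
        concept_history[concept].append((timestamp, action))
    return dict(concept_history)


# Made-up logs from two modelers; ISO dates sort correctly as strings.
modeler_logs = {
    "modeler_a": [
        ("2003-01-10", "ras", "create"),
        ("2003-02-01", "ras", "split"),
    ],
    "modeler_b": [
        ("2003-01-15", "ras", "modify"),
        ("2003-01-20", "BCL1", "create"),
        ("2003-02-15", "BCL1", "merge"),
    ],
}
concept_history = consolidate(modeler_logs)
```

The result is keyed by concept, which is the view an end user of the terminology needs, instead of by the modeler who made the edit.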
Next Steps
• Extend caBIO and DTS Server capability to
facilitate high quality information retrieval
[Figure: architecture diagram. Labelled components: End User Applications, caBIO.jar, XMLRPC Client, DTS XMLRPC API, DTS Server, EVS, DTS Extension, History Server, Edit history database, Repositories of Indexed Document, External Databases, and Concepts used for retrieval. Legend: components shown in parentheses are to be developed.]
Summary
• Tracking explicit edit actions in TDE is
essential to terminology and concept-based
information retrieval.
• We have successfully extended the TDE Ontylog
editor to explicitly track split, merge, and
retirement edit events.
• Concept history data and supporting APIs will
soon become available to DTS users and
developers through caBIO.
caBIO (Cancer Bioinformatics Infrastructure Objects)
EVS Team
Frank Hartel
Sherri De Coronado
Gilberto Fragoso
Margaret Haber
Larry Wright
Jim Oberthaler
Northrop Grumman, Inc.
Kevric Corporation
Aspen Inc.
Apelon, Inc.
Kim Ong
Iris Guo
Bob Dione
Contact
Dr. Francis W. Hartel
Center for Bioinformatics
National Cancer Institute
6116 Executive Blvd.
Rockville, MD 20892-8335
Phone: (301) 435-3869
Fax: (301) 480-4222
Email: [email protected]