Searching and Exploring Biomedical Data


Vagelis Hristidis
School of Computing and Information Sciences
Florida International University
Roadmap
• Why is it challenging to search EMRs?
• XOntoRank: Leveraging Ontologies to improve sensitivity in EMR search
• ObjectRank: Use authority flow to rank EMR entities
• BioNav: Using MeSH to explore the results of PubMed queries

Vagelis Hristidis, Searching and Exploring Biomedical Data
ELECTRONIC MEDICAL RECORDS (EMRs)
• Adoption of EMRs is hard, largely for political reasons
◦ No unique patient ID
◦ Confidentiality
◦ HIPAA (Health Insurance Portability and Accountability Act)
• Move towards XML-based formats.
• One of the most promising: Health Level 7’s Clinical Document Architecture (CDA).
• EMRs pose new challenges for computer scientists
◦ Confidentiality, authentication, secure exchange
◦ Storage, scalability
◦ Dictionaries, term disambiguation
◦ Search for interesting patterns (data mining)
◦ Data integration, schema mapping
◦ Searching and exploring

SAMPLE CDA FRAGMENT
CDA Document – Tree View
LIMITATIONS OF

Traditional IR
• Text-based search engines do not exploit the XML tags or the hierarchical structure of XML.
• The whole XML document is treated as a single unit, which is unacceptable given the possibly large sizes of XML documents.
• Proximity in XML can also be measured in terms of containment edges.

General XML Search
• EMRs have known but complex semantics.
• EMRs include free text, numeric data, time sequences, and negative statements.
• EMRs routinely reference external information sources such as dictionaries and ontologies.
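
The point about proximity in terms of containment edges can be illustrated with a minimal sketch. It counts the parent-child edges on the tree path between two XML elements. The XML fragment and its element names are hypothetical (they are not taken from a real CDA document), and the helper names are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical CDA-like fragment; element names are illustrative only.
DOC = """
<section>
  <title>Medications</title>
  <entry>
    <observation><value>Asthma</value></observation>
    <substance><name>Theophylline</name></substance>
  </entry>
  <entry>
    <substance><name>Albuterol</name></substance>
  </entry>
</section>
"""


def path_to_root(node, parent_of):
    """Return the nodes from `node` up to the root, inclusive."""
    path = [node]
    while path[-1] in parent_of:
        path.append(parent_of[path[-1]])
    return path


def containment_distance(root, a, b):
    """Count the containment (parent-child) edges on the tree path from a to b."""
    parent_of = {child: parent for parent in root.iter() for child in parent}
    up_a = path_to_root(a, parent_of)
    up_b = path_to_root(b, parent_of)
    steps_from_b = {node: i for i, node in enumerate(up_b)}
    for steps_from_a, node in enumerate(up_a):
        if node in steps_from_b:  # first shared ancestor is the lowest one
            return steps_from_a + steps_from_b[node]
    raise ValueError("nodes are not in the same tree")


def find_by_text(root, text):
    """Return the first element whose text equals `text`."""
    return next(e for e in root.iter() if (e.text or "").strip() == text)


root = ET.fromstring(DOC)
asthma = find_by_text(root, "Asthma")
theophylline = find_by_text(root, "Theophylline")
albuterol = find_by_text(root, "Albuterol")

d_same_entry = containment_distance(root, asthma, theophylline)
d_other_entry = containment_distance(root, asthma, albuterol)
print(d_same_entry, d_other_entry)
```

In this toy fragment the two keywords that share an entry are closer in containment edges than keywords in different entries, which is the kind of signal a flat text index cannot see.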

Syntax vs. Semantics in Schema
Example – query “Asthma Theophylline”
More details at [Hristidis et al. NSF Symposium on Next Generation of Data Mining ’07]
XOntoRank: Leverage Ontological Knowledge
Algorithm to enhance keyword search using ontological knowledge (e.g., SNOMED) [ICDE’08 poster, ICDE’09 full paper]
[Figure: fragment of the medical dictionary (SNOMED) drawn as a graph. Concepts shown: 301229001 Bronchial Finding; 118946009 Disorder of Thorax; 50043002 Disorder of Respiratory system; 41427001 Disorder of Bronchus; 79688008 Respiratory Obstruction; 405944004 Asthmatic Bronchitis; 195967001 Asthma; 266364000 Asthma attack; 82094008 Lower respiratory tract structure; 955009 Bronchial Structure. Edge labels: “Is a”, “Finding site of”, “May be”.]
Example 1
q = {“bronchitis”, “albuterol”}
result =
Example 2
q = {“asthma”, “albuterol”}
result = ???
XOntoRank
• A CDA node may be associated with a query keyword w through the ontology.
• XOntoRank first assigns scores to ontological concepts
◦ OntoScore OS(): Semantic relevance of a concept c in the ontology to a query keyword w.
• Then, given these scores, it assigns Node Scores NS() to document nodes
• Other aggregation functions are possible.
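
The two-step scoring can be sketched as follows. The aggregation formula from the slide is not preserved in this transcript, so the sketch assumes that a node's score is the maximum of its textual score and the OntoScore of the concept it references. The `CdaNode` class, the toy textual score, and the precomputed OntoScore value are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class CdaNode:
    """Hypothetical stand-in for a CDA document node."""
    text: str
    concept_code: Optional[str] = None  # ontology concept referenced by the node


def text_score(node: CdaNode, keyword: str) -> float:
    """Toy textual score: 1.0 if the keyword is a word of the node text."""
    return 1.0 if keyword.lower() in node.text.lower().split() else 0.0


def node_score(node: CdaNode, keyword: str,
               onto_score: Dict[Tuple[str, str], float]) -> float:
    """NS(v, w), assuming max() as the aggregation function."""
    os_value = onto_score.get((node.concept_code, keyword.lower()), 0.0)
    return max(text_score(node, keyword), os_value)


# Hypothetical OS(c, w) value: relevance of concept 405944004
# (Asthmatic Bronchitis) to the keyword "asthma".
ONTO_SCORE = {("405944004", "asthma"): 0.25}

node = CdaNode(text="Bronchitis", concept_code="405944004")
ns_asthma = node_score(node, "asthma", ONTO_SCORE)          # reached via the ontology
ns_bronchitis = node_score(node, "bronchitis", ONTO_SCORE)  # reached via the text
ns_albuterol = node_score(node, "albuterol", ONTO_SCORE)    # no association
print(ns_asthma, ns_bronchitis, ns_albuterol)
```

Under this assumption, a node that never mentions "asthma" can still receive a non-zero score for that keyword because of the concept code it carries.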
Computing OntoScore of Concept Given Query Keyword
• Three ways to view the ontology graph:
◦ As an unlabeled, undirected graph.
◦ As a taxonomy.
◦ As a complete set of relationships.
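
As a rough illustration of the first view (unlabeled, undirected graph), the sketch below scores each concept by how many hops it is from a concept whose name contains the keyword. The decay-per-hop rule, the `decay` value, and the edges are assumptions for illustration. The edges are not asserted to match SNOMED, and this is not necessarily the scoring used by XOntoRank.

```python
from collections import deque

# Illustrative edges between concept names; not asserted to match SNOMED.
EDGES = [
    ("Asthma attack", "Asthma"),
    ("Asthma", "Disorder of Bronchus"),
    ("Asthmatic Bronchitis", "Disorder of Bronchus"),
    ("Disorder of Bronchus", "Bronchial Structure"),
]


def build_undirected(edges):
    """Adjacency sets that ignore edge labels and directions."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def onto_scores(graph, keyword, decay=0.5):
    """Score each concept as decay**d, where d is the hop distance to the
    nearest concept whose name contains the keyword as a word."""
    seeds = [c for c in graph if keyword.lower() in c.lower().split()]
    scores = {c: 1.0 for c in seeds}
    queue = deque(seeds)
    while queue:  # multi-source breadth-first search
        current = queue.popleft()
        for neighbor in graph[current]:
            if neighbor not in scores:
                scores[neighbor] = scores[current] * decay
                queue.append(neighbor)
    return scores


scores = onto_scores(build_undirected(EDGES), "asthma")
for concept, value in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{value:.2f}  {concept}")
```

The taxonomy and full-relationship views would differ mainly in which edges are followed and how much each edge type is allowed to contribute.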
Authority Flow Ranking in EMRs
Query: “pericardial effusion”
[Figure: a subset of the electronic health record dataset, drawn as a graph. Nodes v1 to v7 are Hospitalization, Medication, Cardiac, EventsPlan, and Employee records connected by edges labeled associated_with, prescribed_to, prescribed_by, and recorded_by. Attribute values visible in the figure include PatientID=“1438”; History=“48 year old..”; History=“18 year old boy with an aggressive form of chest lymphoma…”; Allergies=“NKDA”; Events=“….small residual pericardial effusion…..”; Events=“4 month old baby… pericardial effusion...”; Complication=“apical impulse … Echo large increasing pericardial effusion…”; Title=“Pediatric Cardiologist”; and TimeStampCreated values from 2003-02-13 to 2004-12-23.]
A subset of the electronic health record dataset.
Work under submission.
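Authority Flow Ranking
[Figure: schema of the EMR dataset. Labels recoverable from the figure: Associated_ Events, Employee, Patient, Hospitalization, Medication; edge labels created_by, for, of, prescribed_by, prescribed_to, recorded_by; edge-type abbreviations A-E, A-H, P-M, H-E, M-E, P-E, H-M.]
Schema of the EMR dataset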
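
Authority flow over such a data graph can be sketched as an ObjectRank-style power iteration. Authority starts at the nodes that contain the query keywords (the base set) and flows along edges, with each edge type given its own transfer rate. The node names, edge-type names, transfer rates, and damping value below are hypothetical; the rates actually used for the EMR schema are not given in this transcript.

```python
def object_rank(nodes, edges, transfer_rate, base_set,
                damping=0.85, iterations=50):
    """ObjectRank-style authority flow (a sketch, not the authors' code).

    edges: list of (source, target, edge_type) tuples.
    transfer_rate: edge_type -> share of a node's authority sent along that
    type, split evenly among the node's outgoing edges of that type.
    """
    out_count = {}
    for src, _, etype in edges:
        out_count[(src, etype)] = out_count.get((src, etype), 0) + 1

    # Random jumps land only on nodes that contain the query keywords.
    jump = {n: (1.0 / len(base_set) if n in base_set else 0.0) for n in nodes}
    rank = dict(jump)
    for _ in range(iterations):
        incoming = {n: 0.0 for n in nodes}
        for src, dst, etype in edges:
            weight = transfer_rate[etype] / out_count[(src, etype)]
            incoming[dst] += weight * rank[src]
        rank = {n: damping * incoming[n] + (1 - damping) * jump[n]
                for n in nodes}
    return rank


# Hypothetical toy graph: an events record mentions the query phrase.
NODES = ["events_1", "hospitalization_1", "medication_1"]
EDGES = [
    ("events_1", "hospitalization_1", "events_to_hosp"),
    ("hospitalization_1", "events_1", "hosp_to_events"),
    ("hospitalization_1", "medication_1", "hosp_to_med"),
    ("medication_1", "hospitalization_1", "med_to_hosp"),
]
RATES = {"events_to_hosp": 0.5, "hosp_to_events": 0.5,
         "hosp_to_med": 0.3, "med_to_hosp": 0.3}

rank = object_rank(NODES, EDGES, RATES, base_set={"events_1"})
for node in sorted(rank, key=rank.get, reverse=True):
    print(f"{rank[node]:.4f}  {node}")
```

Nodes that do not contain the keywords still receive a score when they are connected to nodes that do, which is what lets related hospitalizations or medications appear in the ranking.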
User Study
Explaining Subgraph
User Study Results
[Figure: two bar charts, Mean Sensitivity (average sensitivity) and Mean Specificity (average specificity), each with a y-axis from 0 to 1, comparing CO085BM25, BM25, CO085, and CO030.]
BM25: Traditional Information Retrieval Ranking Function
CO: Clinical ObjectRank (Authority Flow)
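
For reference, sensitivity and specificity of a result list can be computed as in the sketch below. The item sets are made up for illustration, and the exact protocol of the user study (how relevance was judged and how results were cut off) is not described in this transcript.

```python
def sensitivity_specificity(retrieved, relevant, all_items):
    """Return (sensitivity, specificity) for one query.

    sensitivity = TP / (TP + FN): share of relevant items that were retrieved.
    specificity = TN / (TN + FP): share of irrelevant items that were left out.
    """
    tp = len(retrieved & relevant)
    fn = len(relevant - retrieved)
    fp = len(retrieved - relevant)
    tn = len(all_items - retrieved - relevant)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity


# Made-up example: 10 candidate items, 4 relevant, 4 retrieved.
all_items = set(range(1, 11))
relevant = {1, 2, 3, 4}
retrieved = {1, 2, 3, 5}

sens, spec = sensitivity_specificity(retrieved, relevant, all_items)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```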
Biological Databases (cont’d) – Results Navigation [ICDE09, TKDE 2010]
• With SUNY Buffalo.
• Demo at http://db.cse.buffalo.edu/bionav/
• Most publications in PubMed are annotated with Medical Subject Headings (MeSH) terms.
• Present results in the MeSH tree.
• Propose a navigation model and smart expansion techniques that may skip tree levels.

BioNav: Exploring PubMed Results
- Query Keyword: prothymosin
- Number of results: 313
- Navigation Tree stats:
• # of nodes: 3941
• depth: 10
• total citations: 30897
Big tree with many duplicates!

Static Navigation Tree for query “prothymosin”
MESH (313)
  Amino Acids, Peptides, and Protei
    Proteins (307)
      Nucleoproteins (40)
        Histones (15)
        4 more nodes
      45 more nodes
    2 more nodes
  Biological Phenomena, … (217)
    Cell Physiology (161)
      Cell Growth Processes (99)
      15 more nodes
    3 more nodes
  Genetic Processes (193)
    Gene Expression (92)
      Transcription, Genetic (25)
      1 more node
    10 more nodes
  95 more nodes
BioNav: Exploring PubMed Results
Reveal to the user a selected set of descendant concepts that:
(a) Collectively contain all results
(b) Minimize the expected user navigation cost
Unlike static navigation, not all children of the root are necessarily revealed.
BioNav Evaluation
[Figure: bar chart of Overall Navigation Cost (# of Concepts Revealed + # of EXPAND Actions), y-axis from 0 to 20, comparing Static and BioNav.]
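
The cost metric on this slide (concepts revealed plus EXPAND actions) can be made concrete for the static baseline, where every EXPAND reveals all children of the expanded concept. The tree below is a hypothetical toy hierarchy, not the MeSH tree, and the function name is an assumption for this sketch. BioNav's own expansion strategy is not reproduced here.

```python
# Hypothetical concept hierarchy: parent -> list of children.
CHILDREN = {
    "root": ["A", "B", "C", "D"],
    "A": ["A1", "A2", "A3"],
    "A2": ["A2x", "A2y"],
}


def static_navigation_cost(children, path):
    """Overall cost of walking `path` from the root to a target concept
    when each EXPAND reveals every child of the expanded concept.

    cost = (# of concepts revealed) + (# of EXPAND actions)
    """
    revealed = 0
    expands = 0
    for parent, child in zip(path, path[1:]):
        if child not in children.get(parent, []):
            raise ValueError(f"{child!r} is not a child of {parent!r}")
        revealed += len(children[parent])  # all siblings are shown too
        expands += 1
    return revealed + expands


cost_a2 = static_navigation_cost(CHILDREN, ["root", "A", "A2"])
cost_a2y = static_navigation_cost(CHILDREN, ["root", "A", "A2", "A2y"])
print(cost_a2, cost_a2y)
```

With wide levels, the revealed-concepts term dominates in this model, which is why revealing only a selected subset of descendants can lower the overall cost.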
References
• Abhijith Kashyap, Vagelis Hristidis, Michalis Petropoulos, and Sotiria Tavoulari. Effective Navigation of Query Results Based on Concept Hierarchies. IEEE Transactions on Knowledge and Data Engineering (TKDE), 2010.
• Fernando Farfán, Vagelis Hristidis, Anand Ranganathan, and Michael Weiner. XOntoRank: Ontology-Aware Search of Electronic Medical Records. IEEE International Conference on Data Engineering (ICDE), 2009.
• Abhijith Kashyap, Vagelis Hristidis, Michalis Petropoulos, and Sotiria Tavoulari. BioNav: Effective Navigation on Query Results of Biomedical Databases. IEEE International Conference on Data Engineering (ICDE), 2009.
• Vagelis Hristidis, Fernando Farfán, Redmond P. Burke, Anthony F. Rossi, and Jeffrey A. White. Information Discovery on Electronic Medical Records. National Science Foundation Symposium on Next Generation of Data Mining and Cyber-Enabled Discovery for Innovation (NGDM), 2007.

Supported by
• NSF IIS-0811922: Information Discovery on Domain Data Graphs, 2008-2011
• NSF CAREER IIS-0952347, 2010-2015