Addressing Users’ Healthcare Needs
through Personal Health Messages
Presenter : Jason H.D. Cho
Department of Computer Science,
University of Illinois at Urbana-Champaign, Urbana, IL
Health Informatics and Data Science
•
Healthcare is becoming an emerging area.
•
Lots of data readily available!
•
Medical web forums, which we have used in this work, typically span millions of posts.
•
Traditionally, health informatics has relied on Electronic Medical Records (EMR).
•
In 2006, fewer than 10% of hospitals used EMR; by 2009, almost 50% of hospitals had started using EMR.
EMR and Medical Web Forums
Why bother with medical web forums?
•
Electronic Medical Records are traditionally used in health informatics.
•
Privacy issues.
•
Data not readily available.
•
In this talk, I’ll discuss how personal health messages (in our case, medical web forums) can be used to address problems similar to those that EMRs address.
•
I’ll present two works, each from a different perspective:
•
Macro and Micro.
Addressing Users’ health needs
•
Macro Perspective
•
Learn what the vast majority of users are saying.
Chee, 2011
Addressing Users’ health needs
•
Micro Perspective
•
Users would like to conduct their own research.
In this talk…
•
I’ll present two works that address the macro and micro perspectives.
•
Macro Perspective : Comparative Effectiveness Research
•
Micro Perspective : Case Retrieval System (Under submission)
Aggregating Personal Health
Messages for Scalable Comparative
Effectiveness Research
Jason H.D. Cho (1,4), Vera Q.Z. Liao (1,4),
Yunliang Jiang (1,4,5), Bruce R. Schatz (1,2,3,4)
1 Department of Computer Science,
2 Institute of Genomic Biology,
3 Department of Medical Information Science,
4 University of Illinois at Urbana-Champaign, Urbana, IL
5 Twitter, Inc., San Francisco, CA
Comparative Effectiveness Research
•
Generation and synthesis of evidence that compares the benefits and harms of
alternative methods to prevent, diagnose, treat, and monitor clinical conditions
or to improve the delivery of care.
•
The American Recovery and Reinvestment Act of 2009 (ARRA) allotted $1.1
billion to support CER.
•
Existing Approaches :
•

Randomized trials – precise, but expensive to conduct, generally not scalable.

Research reviews – scalable, but limited to work already in the existing literature.
Our approach:

Low cost, can generate hypotheses quickly, scalable

MedHelp (1 million health messages), Yahoo! Answers (10 million health messages),
HealthBoards (1 million health messages)
General Technique & Terminologies
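Attitude of a context: sgn(Positive - Negative). In the example shown, sgn(Positive - Negative) = -1, so the context has a negative attitude.
[Figure: a Treatment mention within a Sentence, within its surrounding Context]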
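The attitude rule can be sketched in a few lines of Python. This is a minimal illustration that assumes "Positive" and "Negative" are counts of positive and negative words in the context; the toy word lists `POSITIVE` and `NEGATIVE` are hypothetical stand-ins for whatever sentiment resource the work actually used.

```python
import re

# HYPOTHETICAL toy lexicons; a real system would use a full sentiment lexicon.
POSITIVE = {"good", "great", "helped", "better", "relief"}
NEGATIVE = {"bad", "worse", "nausea", "pain", "awful"}


def sign(x: int) -> int:
    """Return 1, 0, or -1 according to the sign of x."""
    return (x > 0) - (x < 0)


def context_attitude(context: str) -> int:
    """Attitude of a context: sgn(#positive words - #negative words)."""
    words = re.findall(r"[a-z']+", context.lower())
    positive = sum(w in POSITIVE for w in words)
    negative = sum(w in NEGATIVE for w in words)
    return sign(positive - negative)


print(context_attitude("Chemo made me feel awful, the nausea and pain were bad."))  # prints -1
```

Only the sign is kept, so a context with many negative words counts the same as one with a single negative word; this matches the sgn-based definition rather than a graded sentiment score.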
Our Approach
•
We found that users’ sentiment towards a treatment is a good indicator of its effectiveness.
Users’ sentiment towards a treatment: the sum of the context attitudes the user expresses towards the treatment of interest.
Preference between two treatments: a treatment is preferred if more people have net positive sentiment towards it than towards the other.
•
We introduce three different approaches to determine effectiveness based on users’ sentiments.
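A minimal sketch of the aggregation step, assuming each context has already been assigned an attitude of -1, 0, or +1 by the sgn rule; the `(user_id, treatment, attitude)` input format and both function names are hypothetical.

```python
from collections import defaultdict

# One mention: (user_id, treatment, attitude of the context, in {-1, 0, +1}).
Mention = tuple[str, str, int]


def user_treatment_sentiment(mentions: list[Mention]) -> dict[tuple[str, str], int]:
    """Sum the context attitudes each user expresses towards each treatment."""
    totals: dict[tuple[str, str], int] = defaultdict(int)
    for user, treatment, attitude in mentions:
        totals[(user, treatment)] += attitude
    return dict(totals)


def has_net_positive(sentiment: dict[tuple[str, str], int], user: str, treatment: str) -> bool:
    """True when the user's summed attitude towards the treatment is above zero."""
    return sentiment.get((user, treatment), 0) > 0


mentions = [
    ("u1", "chemotherapy", 1),
    ("u1", "chemotherapy", -1),
    ("u1", "chemotherapy", 1),
    ("u2", "hormonal therapy", -1),
]
sentiment = user_treatment_sentiment(mentions)
# {('u1', 'chemotherapy'): 1, ('u2', 'hormonal therapy'): -1}
```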
Individual Effectiveness Comparison Study
•
Consider authors who explicitly compare two treatments.
•
This approach is more precise, since each author directly compares the two treatments against each other.
•
However, few patients compare two treatments directly, so we can relax the definition of effectiveness.

The relaxed approach should be consistent with the individual effectiveness comparison study.
Example:
Chemotherapy : +
Hormonal Therapy : -
Chemotherapy : Hormonal Therapy : +
Result: chemotherapy is preferred over hormonal therapy.
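The individual comparison can be sketched as follows, assuming per-user net sentiments (summed context attitudes) are already available. Only authors who mention both treatments are counted, and each casts one vote for the treatment they view more positively; the function name, the vote rule for ties, and the sample data are illustrative assumptions.

```python
from collections import Counter
from typing import Optional


def individual_preference(
    sentiment: dict[tuple[str, str], int], a: str, b: str
) -> tuple[int, int, Optional[str]]:
    """
    sentiment maps (user, treatment) -> net sentiment.
    Returns (votes for a, votes for b, preferred treatment or None on a tie).
    """
    users_a = {u for (u, t) in sentiment if t == a}
    users_b = {u for (u, t) in sentiment if t == b}
    votes: Counter = Counter()
    for user in users_a & users_b:  # only authors who discuss both treatments
        diff = sentiment[(user, a)] - sentiment[(user, b)]
        if diff > 0:
            votes[a] += 1
        elif diff < 0:
            votes[b] += 1
    if votes[a] == votes[b]:
        return votes[a], votes[b], None
    return votes[a], votes[b], (a if votes[a] > votes[b] else b)


sentiment = {
    ("u1", "chemo"): 2, ("u1", "hormonal"): -1,
    ("u2", "chemo"): 1, ("u2", "hormonal"): 0,
    ("u3", "chemo"): -1, ("u3", "hormonal"): 1,
    ("u4", "chemo"): 3,  # excluded: never mentions hormonal therapy
}
print(individual_preference(sentiment, "chemo", "hormonal"))  # (2, 1, 'chemo')
```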
Population Effectiveness Comparison Study
•
Compare the group of people who favor a treatment against those who do not.
•
This approach allows us to leverage a much larger cohort pool.
•
In our experiments, individual and population comparisons gave similar preference results.
This allows us to run population effectiveness comparisons in lieu of individual effectiveness comparisons, increasing the size of the cohort pool by an order of magnitude.
Example:
Chemotherapy : +
Chemotherapy : Hormonal Therapy : +
Hormonal Therapy : -
Result: chemotherapy is preferred over hormonal therapy.
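A minimal sketch of the population comparison, assuming every user who mentions a treatment joins that treatment's cohort and is counted as positive or negative by the sign of their net sentiment (zero counts toward the cohort only). Comparing treatments by their positive fraction is an assumption about how "prefer" is operationalized; names and data are hypothetical.

```python
from collections import defaultdict
from typing import Optional


def sentiment_fractions(sentiment: dict[tuple[str, str], int]) -> dict[str, tuple[float, float, int]]:
    """Return treatment -> (fraction positive, fraction negative, cohort size)."""
    counts: dict[str, list[int]] = defaultdict(lambda: [0, 0, 0])  # positive, negative, total
    for (_, treatment), score in sentiment.items():
        c = counts[treatment]
        c[2] += 1
        if score > 0:
            c[0] += 1
        elif score < 0:
            c[1] += 1
    return {t: (p / n, q / n, n) for t, (p, q, n) in counts.items()}


def preferred_by_population(fractions, a: str, b: str) -> Optional[str]:
    """Prefer the treatment whose cohort has the larger positive fraction (None on a tie)."""
    pa, pb = fractions[a][0], fractions[b][0]
    if pa == pb:
        return None
    return a if pa > pb else b


sentiment = {
    ("u1", "chemo"): 2, ("u2", "chemo"): -1, ("u3", "chemo"): 1, ("u4", "chemo"): 0,
    ("u1", "hormonal"): -1, ("u5", "hormonal"): 1, ("u6", "hormonal"): -1,
}
fractions = sentiment_fractions(sentiment)
print(preferred_by_population(fractions, "chemo", "hormonal"))  # chemo
```

Unlike the individual comparison, users here need not mention both treatments, which is what enlarges the cohort.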
Demographics Effectiveness Comparison
•
Different demographics may react differently to a given treatment.
•
We conducted a population effectiveness comparison study on each demographic group of interest.
•
Two types of comparison:
•
Cross-group comparison: compares two different demographic groups on one treatment.
Within-group comparison: compares two treatments within one demographic group.
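Both comparison types reduce to population comparisons restricted by demographic group. A minimal sketch, assuming each user-treatment pair already carries a net sentiment and a group label such as "young" or "old" (how users are split into groups is not specified here, so the labels and data are placeholders):

```python
from collections import defaultdict


def positive_fraction_by_group(records):
    """records: iterable of (group, treatment, net_sentiment), one per user-treatment pair."""
    positive = defaultdict(int)
    total = defaultdict(int)
    for group, treatment, score in records:
        total[(group, treatment)] += 1
        positive[(group, treatment)] += score > 0
    return {key: positive[key] / total[key] for key in total}


def cross_group(frac, treatment, g1, g2):
    """Cross-group: same treatment, two groups (positive-fraction difference g1 - g2)."""
    return frac[(g1, treatment)] - frac[(g2, treatment)]


def within_group(frac, group, t1, t2):
    """Within-group: same group, two treatments (positive-fraction difference t1 - t2)."""
    return frac[(group, t1)] - frac[(group, t2)]


records = [
    ("old", "beta blocker", 1), ("old", "beta blocker", 1),
    ("old", "beta blocker", -1), ("old", "beta blocker", 0),
    ("young", "beta blocker", 1), ("young", "beta blocker", -1),
    ("young", "beta blocker", -1), ("young", "beta blocker", 0),
    ("old", "ace inhibitor", -1), ("old", "ace inhibitor", 1),
]
frac = positive_fraction_by_group(records)
print(cross_group(frac, "beta blocker", "old", "young"))  # 0.25
```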
[Figure: beta blocker sentiment for the Young and Old groups]
Older people prefer beta blockers more than younger people do.
Q: How do we extract patients’ demographics?
Demographic Extraction
•
Approach 1: Utilize users’ profiles.
•
What if a user did not list demographic information?
We implemented a rule-learning demographic extraction algorithm to solve this problem.
Demographic Extraction
We introduce a rule-learning algorithm to extract age.
I am 30 years old, ...
…, a 30 year old …
He is 30 years old.
Day 30 for me.
1. Extract all phrases in users’ health messages that match the demographic information on their profile pages.
2. Run a frequent sequential pattern mining algorithm (PrefixSpan) to mine frequent patterns.
3. Remove low-precision frequent patterns.
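The first two steps can be sketched in Python. `age_windows` (hypothetical) approximates step 1 by replacing the user's profile age with an `<AGE>` placeholder and keeping a few surrounding tokens; the window width of 3 is an assumption. `prefixspan` is a minimal PrefixSpan over sequences of single tokens, where the support of a pattern is the number of windows that contain it as a (not necessarily contiguous) subsequence.

```python
import re
from collections import Counter


def age_windows(message: str, profile_age: int, width: int = 3) -> list[list[str]]:
    """Step 1 (sketch): token windows around mentions of the user's profile age."""
    tokens = re.findall(r"\w+", message.lower())
    windows = []
    for i, tok in enumerate(tokens):
        if tok == str(profile_age):
            windows.append(tokens[max(0, i - width):i] + ["<AGE>"] + tokens[i + 1:i + 1 + width])
    return windows


def prefixspan(db: list[list[str]], min_support: int) -> list[tuple[list[str], int]]:
    """Step 2: PrefixSpan; returns (pattern, support) for every frequent pattern."""
    results: list[tuple[list[str], int]] = []

    def mine(prefix: list[str], projected: list[tuple[int, int]]) -> None:
        # Count each token at most once per projected suffix (sequence, start offset).
        support: Counter = Counter()
        for sid, start in projected:
            support.update(set(db[sid][start:]))
        for item, count in support.items():
            if count < min_support:
                continue
            pattern = prefix + [item]
            results.append((pattern, count))
            # Project every suffix past the first occurrence of `item`.
            next_projected = []
            for sid, start in projected:
                seq = db[sid]
                for pos in range(start, len(seq)):
                    if seq[pos] == item:
                        next_projected.append((sid, pos + 1))
                        break
            mine(pattern, next_projected)

    mine([], [(sid, 0) for sid in range(len(db))])
    return results


db = [
    ["i", "am", "<AGE>", "years", "old"],
    ["i", "am", "<AGE>", "years", "old", "today"],
    ["a", "<AGE>", "year", "old"],
]
patterns = prefixspan(db, min_support=2)
```

Frequent patterns such as `["<AGE>", "old"]` become candidate extraction rules; step 3 then filters out candidates that rarely recover the correct age.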
Demographic Extraction Performance Evaluation
[Table: # Users, Users w/ Age, # Inferred, # Inferred & has Age, # Inferred & no Age, and Precision for the Breast Cancer and Heart Disease forums]
Compared to the baseline approach, our approach removed most of the irrelevant inferred ages.
This approach doubled the number of people with demographic information.
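Step 3 of the rule-learning algorithm (removing low-precision patterns) can be sketched by checking each pattern against users whose profile age is known: precision is the share of extractions that return the profile age. For simplicity this sketch treats a pattern as a contiguous token template with an `<AGE>` slot, which is narrower than PrefixSpan's gapped patterns; the 0.8 threshold, the helper names, and the sample messages are hypothetical.

```python
import re


def template_regex(template: list[str]) -> re.Pattern:
    """Turn a template such as ['i', 'am', '<AGE>', 'years', 'old'] into a regex."""
    parts = [r"(\d{1,3})" if tok == "<AGE>" else re.escape(tok) for tok in template]
    return re.compile(r"\b" + r"\W+".join(parts) + r"\b")


def pattern_precision(template: list[str], labeled: list[tuple[str, int]]) -> float:
    """labeled: (message, profile_age) pairs. Precision = correct extractions / all extractions."""
    regex = template_regex(template)
    hits = correct = 0
    for message, age in labeled:
        for match in regex.finditer(message.lower()):
            hits += 1
            correct += int(match.group(1)) == age
    return correct / hits if hits else 0.0


def keep_high_precision(templates, labeled, threshold: float = 0.8):
    """Drop templates whose precision on users with a known age falls below the threshold."""
    return [t for t in templates if pattern_precision(t, labeled) >= threshold]


labeled = [
    ("I am 30 years old and ...", 30),
    ("He is 55 years old, my dad", 30),  # talks about someone else
    ("Day 12 for me", 30),               # a number that is not an age
    ("i am 42 years old", 42),
]
t1 = ["i", "am", "<AGE>", "years", "old"]
t2 = ["<AGE>", "years", "old"]
t3 = ["day", "<AGE>"]
print(keep_high_precision([t1, t2, t3], labeled))  # [['i', 'am', '<AGE>', 'years', 'old']]
```

The surviving templates can then be applied to users without a profile age, which is how extraction enlarges the pool of people with demographic information.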
Our Findings
•
We used MedHelp forums as our data source, and selected forum categories based
on diseases of interest.
•
We chose the diseases and treatments for our experiments from the Institute of Medicine’s list of 100 CER priorities.
•
We used a statistical test to determine preference significance.
•
Many of the findings were consistent with existing medical literature, such as that from Cochrane Reviews, the Agency for Healthcare Research and Quality (AHRQ), and the New England Journal of Medicine.
•
We show some of the results that were statistically significant.
In the population effectiveness comparison study, 50% of our findings were consistent with existing literature; for the rest, we were unable to find literature that verified our claims.
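The test used for preference significance is not named here. As one common option (an assumption, not necessarily the authors' choice), the individual comparison can be checked with an exact two-sided sign test on how many users preferred each treatment, under the null hypothesis that either treatment is equally likely to be preferred.

```python
from math import comb


def sign_test_p_value(wins_a: int, wins_b: int) -> float:
    """Two-sided exact binomial (sign) test of H0: P(prefer a) = P(prefer b) = 0.5."""
    n = wins_a + wins_b
    if n == 0:
        return 1.0  # no evidence either way
    k = min(wins_a, wins_b)
    # P(X <= k) for X ~ Binomial(n, 0.5); doubled for a two-sided test, capped at 1.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


print(sign_test_p_value(8, 2))  # 0.109375
```

Python's arbitrary-precision integers keep `comb` and `2 ** n` exact, so the sketch also works for cohorts in the thousands.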
How big is the cohort pool?
•
Population Effectiveness Comparison:
Generally, each treatment had thousands of patients.
For breast cancer: Radiation (2,393), Chemotherapy (2,878), Hormonal Therapy (1,680), approximately 7,000 patients in total.
For heart disease: Anticoagulants (2,162), Inhibitor (2,422), Blocker (7,257), Device (2,457), approximately 14,300 patients in total.
Population Effectiveness Comparison
Breast Cancer Treatment Comparison
[Figure: proportion of users with Positive and Negative sentiment towards Radiation, Hormonal, and Chemotherapy]
New England Journal of Medicine: Patients who had radiation therapy showed fewer post-treatment side effects than those who had hormonal treatments.
Cochrane Review: Chemotherapy is advantageous over hormonal therapy in reducing tumor response rate.
Population Effectiveness Comparison
Heart Disease Treatments Comparison
[Figure: proportion of users with Positive and Negative sentiment towards Blocker, Anticoagulants, and Device]
New England Journal of Medicine: Warfarin is at least as effective as devices (pacemakers, ICDs), but is oftentimes more cost-effective.
New England Journal of Medicine: Patients using beta blockers often take Warfarin (anticoagulant).
Demographic Effectiveness Comparison
[Figure: proportion of users with Positive and Negative sentiment; left panel: age comparison (Young, Old) on beta blockers; right panel: gender comparison (Male, Female) on inhibitors]
Agency for Healthcare Research and Quality: ACE inhibitors reduce composite efficacy endpoints similarly in males and females.
Archives of Internal Medicine: Younger people tend to be more impacted by cognitive impairment than older people.
Conclusion
•
We showed how CER hypotheses can be generated using health messages.
We showed that preference, as measured by sentiment, can be a good indicator of treatment effectiveness.
We also introduced a high-precision demographic extraction algorithm to broaden the cohort pool.
Personal health messages are scalable: MedHelp was used as our data source, but other forums can be aggregated to further broaden the cohort pool.
•
The results from our algorithm were consistent with existing medical literature.
Future Works
•
Investigate signals that can be good indicators of effectiveness (depth).
Entity-relation semantics extraction to analyze the relation between a treatment and its aspects (effectiveness, side effects, etc.).
A shallow information extraction approach can be utilized to determine whether a subsection of forum text is about symptoms or treatments.
•
Merge multiple sources to leverage a bigger cohort pool (breadth).
Other medical web forums, such as WebMD and HealthBoards.
Social networks and micro-blogs, such as Facebook and Twitter, and other sources.
Resolving Healthcare Forum
Posts via Similar Thread Retrieval
Jason H.D. Cho (1,4), Parikshit Sondhi (1,4),
Chengxiang Zhai (1,4), Bruce R. Schatz (1,2,3,4)
* Slides Courtesy of Parikshit Sondhi
1 Department of Computer Science,
2 Institute of Genomic Biology,
3 Department of Medical Information Science,
4 University of Illinois at Urbana-Champaign, Urbana, IL
Case Retrieval Task
•
Users often want to conduct research by themselves.
•
They may be curious about which disease they have, or which medications they could take.
•
Macro-level tasks cannot address this, since they assume users already know what they want.
Query Characteristics
• Queries meant for human experts not automated systems
• Simple non-technical language
• Presence of emotional statements
Document Characteristics
Envisioned Response
The following threads discuss similar problems:
• Doritos Allergy Very Severe and New
• Certain Foods + Beer = Flushing and Head Pounding…Help!
• Peanut/Food Allergies
…
Method Overview
•
Baseline Weighing: First Post BM-25; Thread BM-25
•
Semantic Weighing: Medical term extraction; Shallow Information Extraction
•
Post Weighing: Monotonic Weighing; Parabolic Weighing
•
Forum Category Weighing: Uniform Weighing (FCUW); Feedback Weighing (FCFW)
Shallow Information Extraction
I am severly allergic to some product that is found in
both Tostitos and Doritos, as well as random other
types of chips. I know the solution is "don't eat chips"
but what could the product be? I don't
want to accidentally consume it. When I eat this, I get
very bad stomach cramps and it ruins the rest of my
day/night - the only solution is to go to sleep so I can't
feel it. Help! Any ideas on this?
Sentence labels:
Physical Examination (PE): disease, symptoms
Medication (MED): treatment, prevention
Background (BKG): neither PE nor MED
Sondhi, 2010
Medical Entity Extraction
•
Applied ADEPT toolkit (MacLean and Heer 2013)
•
High precision but low recall
Post Weighing
c'(w, t): weighted count of word w in thread t
Not all posts are equally representative
Sondhi, 2013
Post Weighing
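$c'(w,t) = \sum_{i=1}^{K} f(i,K)\,c(w,p_i)$; e.g., for a thread with K = 3 posts, $c'(w,t) = f(1,3)\,c(w,p_1) + f(2,3)\,c(w,p_2) + f(3,3)\,c(w,p_3)$
$f(i,K)$: the weight of post $i$ in a thread with $K$ posts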
Monotonic Post Weighing
[Figure: relative post weight vs. post position i for K = 10, with curves for m = 1, 2, 3]
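The weighted thread count c'(w, t), the sum over posts p_1..p_K of f(i, K) · c(w, p_i), can be sketched as below. The exact monotonic weight is not given on the slide; `monotonic_weight` uses a hypothetical decay ((K - i + 1)/K)^m, which decreases with position i and decays faster for larger m, purely to illustrate the shape. Any other f(i, K), such as a parabolic one, can be passed in instead.

```python
from typing import Callable


def monotonic_weight(i: int, K: int, m: float = 1.0) -> float:
    """HYPOTHETICAL monotonic weight for post i (1-based) of K; larger m decays faster."""
    return ((K - i + 1) / K) ** m


def weighted_thread_count(
    word: str,
    posts: list[list[str]],
    f: Callable[[int, int], float] = monotonic_weight,
) -> float:
    """c'(w, t) = sum over i of f(i, K) * c(w, p_i) for the K posts of thread t."""
    K = len(posts)
    return sum(f(i, K) * post.count(word) for i, post in enumerate(posts, start=1))


thread = [["rash", "itch", "rash"], ["rash"], ["thanks"]]
print(weighted_thread_count("rash", thread))  # weights 1, 2/3, 1/3 -> about 2.667
print(weighted_thread_count("rash", thread, f=lambda i, K: 1.0))  # uniform weighting -> 3.0
```

With uniform weights the result reduces to the plain count c(w, t) used by thread-level BM-25.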
Parabolic Post Weighing
Post Weighing Methods Evaluation
[Figure: accuracy of Uniform, Monotonic, and Parabolic post weighing across forums (FF, UF, LQ, Cross Forum)]
Forum Category Weighing
•
Relevance feedback based on the top-k retrieved documents.
•
Forum Category Uniform Weighing (FCUW): weighs the top-k forum categories equally.
•
Forum Category Feedback Weighing (FCFW): weighs forum categories based on how frequently they appear in the retrieved documents.
FCUW weight: probability of randomly selecting the forum ID.
FCFW weight: ratio of the current forum ID amongst the retrieved documents.
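A minimal sketch of both category weights, assuming FCUW gives every category seen in the top k the same weight and FCFW uses each category's share of the top-k threads. How the weight is folded into the new score is not recoverable here, so `rerank` uses a hypothetical multiplicative boost `score * (1 + lam * weight)`; `k`, `lam`, and all names are illustrative.

```python
from collections import Counter


def category_weights(top_k_categories: list[str], feedback: bool = True) -> dict[str, float]:
    """FCFW (feedback=True): share of the top-k threads. FCUW (feedback=False): uniform over seen categories."""
    counts = Counter(top_k_categories)
    if feedback:
        k = len(top_k_categories)
        return {cat: n / k for cat, n in counts.items()}
    return {cat: 1 / len(counts) for cat in counts}


def rerank(scored: list[tuple[str, str, float]], k: int = 10, lam: float = 0.5,
           feedback: bool = True) -> list[tuple[str, float]]:
    """scored: (thread_id, category, base score). Boost scores by category weight (hypothetical form)."""
    ranked = sorted(scored, key=lambda x: x[2], reverse=True)
    weights = category_weights([cat for _, cat, _ in ranked[:k]], feedback)
    rescored = [(tid, s * (1 + lam * weights.get(cat, 0.0))) for tid, cat, s in ranked]
    return sorted(rescored, key=lambda x: x[1], reverse=True)


scored = [("t1", "allergies", 10.0), ("t2", "allergies", 9.0),
          ("t3", "diet", 9.5), ("t4", "skin", 8.0)]
print([tid for tid, _ in rerank(scored, k=3)])  # ['t1', 't2', 't3', 't4']
```

In this toy run, t2 overtakes t3 because its category dominates the top 3, which is the intended effect of feedback weighing.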
State of the Art Baseline
•
Baseline BM-25 formula:
•
c(w,t): Count of word w in thread t
•
c(w,q): Count of word w in query q
•
FPBM-25: Considers only the content of the first post to represent the thread document.
•
TBM-25: Considers the content of the entire thread to represent the thread document.
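The BM-25 formula itself is not reproduced here. As a stand-in, this sketch uses the standard Okapi BM25 ranking function with the common defaults k1 = 1.2 and b = 0.75, weights each query term by c(w, q), and uses a Lucene-style idf (with +1) to keep scores non-negative; the exact variant and parameters in the work may differ. `thread_document` shows the only difference between FPBM-25 and TBM-25: which text represents the thread.

```python
import math
from collections import Counter


def bm25_scores(query: list[str], docs: dict[str, list[str]],
                k1: float = 1.2, b: float = 0.75) -> dict[str, float]:
    """Okapi BM25 with c(w, t) = term count in the thread and c(w, q) = term count in the query."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs.values()) / N
    df = Counter(w for d in docs.values() for w in set(d))
    q_counts = Counter(query)
    scores = {}
    for tid, doc in docs.items():
        c_t = Counter(doc)
        norm = k1 * (1 - b + b * len(doc) / avgdl)  # document length normalization
        score = 0.0
        for w, c_wq in q_counts.items():
            if c_t[w] == 0:
                continue
            idf = math.log((N - df[w] + 0.5) / (df[w] + 0.5) + 1)
            score += c_wq * idf * c_t[w] * (k1 + 1) / (c_t[w] + norm)
        scores[tid] = score
    return scores


def thread_document(posts: list[list[str]], first_post_only: bool) -> list[str]:
    """FPBM-25 uses only the first post; TBM-25 concatenates every post in the thread."""
    return posts[0] if first_post_only else [w for post in posts for w in post]


threads = {
    "t1": [["peanut", "allergy", "hives"], ["try", "antihistamine"]],
    "t2": [["knee", "pain"], ["peanut", "allergy"]],
}
fp = bm25_scores(["peanut", "allergy"], {t: thread_document(p, True) for t, p in threads.items()})
tb = bm25_scores(["peanut", "allergy"], {t: thread_document(p, False) for t, p in threads.items()})
```

Here t2 only matches under TBM-25, because its allergy discussion appears in a reply rather than in the question post.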
ShallowEx: Relevance Scoring
Modified query count = combination of the word counts in PE sentences, MED sentences, and BKG sentences
Gives higher importance to PE and MED sentences
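A minimal sketch of a modified query count, assuming the query post's sentences have already been labeled PE, MED, or BKG and that the combination is a weighted sum. Both the linear form and the `LABEL_WEIGHTS` values are hypothetical; the slide only states that PE and MED sentences receive higher importance. The resulting weighted counts could stand in for c(w, q) in BM-25.

```python
from collections import Counter

# HYPOTHETICAL weights: PE and MED sentences count more than background sentences.
LABEL_WEIGHTS = {"PE": 1.0, "MED": 1.0, "BKG": 0.3}


def modified_query_counts(labeled_sentences: list[tuple[str, list[str]]],
                          weights: dict[str, float] = LABEL_WEIGHTS) -> dict[str, float]:
    """labeled_sentences: (label, tokens) pairs for the sentences of a query post."""
    counts: Counter = Counter()
    for label, tokens in labeled_sentences:
        for w in tokens:
            counts[w] += weights[label]
    return dict(counts)


query = [
    ("PE", ["stomach", "cramps"]),
    ("MED", ["antacid"]),
    ("BKG", ["chips", "chips", "day"]),
]
print(modified_query_counts(query))
```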
MedicalEx: Relevance Scoring
Modified query frequency = combination of the count of occurrences labeled as a medical entity and the count of occurrences not labeled as a medical entity
Post Weighing: Relevance Scoring
Modified thread frequency: post weight × post frequency, summed over the posts of the thread
Forum Category Weighing Scoring
New score: the retrieval score combined with the weights for forum category weighing (shown for Forum Category Feedback Weighing)
Evaluation via Pooling
•
350K threads and 20 queries from HealthBoards
•
2 judges first judged 100 query-thread pairs
•
88% agreement (κ = 0.76)
•
730 total judged query-thread pairs
•
324 relevant
•
406 irrelevant
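Cohen's kappa corrects raw agreement for chance: κ = (p_o - p_e) / (1 - p_e). With p_o = 0.88 and κ = 0.76, solving gives a chance agreement of p_e = 0.5. The sketch below reproduces that with a purely hypothetical split of the 100 pairs (each judge marking 50 relevant, agreeing on 88); it is one of many splits consistent with the reported numbers, not the actual judgments.

```python
from collections import Counter


def cohens_kappa(judge_a: list[str], judge_b: list[str]) -> float:
    """Cohen's kappa for two judges labeling the same items."""
    n = len(judge_a)
    p_o = sum(a == b for a, b in zip(judge_a, judge_b)) / n  # observed agreement
    freq_a, freq_b = Counter(judge_a), Counter(judge_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)  # chance agreement
    if p_e == 1.0:
        return 1.0  # both judges used a single identical label throughout
    return (p_o - p_e) / (1 - p_e)


# HYPOTHETICAL split: 44 both relevant, 44 both irrelevant, 12 disagreements.
a = ["R"] * 44 + ["I"] * 44 + ["R"] * 6 + ["I"] * 6
b = ["R"] * 44 + ["I"] * 44 + ["I"] * 6 + ["R"] * 6
print(round(cohens_kappa(a, b), 2))  # 0.76
```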
Results: Semantic Methods
Run | Method | P@5 | Recall@30 | MAP
B1 | Baseline TBM-25 | 0.3000 | 0.2846 | 0.1977
B2 | Baseline FPBM-25 | 0.4700 | 0.4975 | 0.3316
S1 | B2+MedEx | 0.4600 | 0.4283 | 0.2918
S2 | B2+ShallowEx | 0.5300 (12.7%) | 0.4847 (-2.5%) | 0.3481 (4.9%)
Shallow extraction is better than medical entity extraction.
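The reported metrics can be sketched as follows, assuming unjudged threads count as irrelevant (the usual convention with pooled judgments) and that the parenthesized percentages are relative improvements over B2. The toy ranking is hypothetical.

```python
def precision_at_k(ranking: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top k results that are relevant."""
    return sum(d in relevant for d in ranking[:k]) / k


def recall_at_k(ranking: list[str], relevant: set[str], k: int) -> float:
    """Fraction of all relevant threads found in the top k."""
    return sum(d in relevant for d in ranking[:k]) / len(relevant)


def average_precision(ranking: list[str], relevant: set[str]) -> float:
    """Mean of precision@rank over the ranks of relevant threads (missing ones add 0)."""
    hits, total = 0, 0.0
    for rank, d in enumerate(ranking, start=1):
        if d in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(runs: list[tuple[list[str], set[str]]]) -> float:
    """MAP over queries; each run is (ranking, relevant set) for one query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


def relative_improvement(new: float, base: float) -> float:
    """Percentage change of a metric relative to the baseline."""
    return 100 * (new - base) / base


ranking = ["d1", "d2", "d3", "d4", "d5"]
relevant = {"d1", "d3", "d6"}
print(precision_at_k(ranking, relevant, 5))        # 0.4
print(round(relative_improvement(0.5100, 0.4700), 1))  # 8.5
```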
Results: Post Weighing
Run | Method | P@5 | Recall@30 | MAP
B2 | Baseline FPBM-25 | 0.4700 | 0.4975 | 0.3316
P1 | Monotonic | 0.5100 (8.5%) | 0.5240 (5.3%) | 0.3631 (9.5%)
P2 | Parabolic | 0.5100 (8.5%) | 0.5040 | 0.3494
Both post weighing schemes outperform the baseline.
Results: Forum Category Weighing
Run | Method | P@5 | Recall@30 | MAP
B2 | Baseline FPBM-25 | 0.4700 | 0.4975 | 0.3316
P1 | Uniform Weighing | 0.5200 (10.6%) | 0.4678 (-7.0%) | 0.3334 (0.5%)
P2 | Feedback Weighing | 0.5100 (8.5%) | 0.4610 (-7.3%) | 0.3389 (2.2%)
Uniform Weighing and Feedback Weighing have similar performance, but FCFW has fewer parameters to tune.
Results: Method Combinations
Run | Method | P@5 | Recall@30 | MAP
B2 | Baseline FPBM-25 | 0.4700 | 0.4975 | 0.3316
S2 | Baseline FPBM-25 + ShallowEx | 0.5300 | 0.4847 | 0.3481
C2 | Monotonic + ShallowEx | 0.5400 (14.9%) | 0.5354 (7.6%) | 0.3745 (12.9%)
C3 | Parabolic + ShallowEx | 0.5100 | 0.5155 | 0.3573
C4 | Monotonic + ShallowEx + FCFW | 0.5200 | 0.5625 (13.1%) | 0.3702
Monotonic + ShallowEx performs the best on P@5 and MAP.
What we Learnt
•
Fairly high P@5 accuracy is achievable
•
Shallow information extraction is better for query understanding
•
Utility of posts drops steadily with position
•
Easy extension of baseline method
Conclusion
•
It is possible to address health problems from both the macro and micro perspectives using health messages.
•
Macro: Comparative Effectiveness Research
•
Micro: Case retrieval task
•
Health informatics is an emerging area: much work has been done, and much remains to be done.
Future works
•
Utilizing Medical web forums
•
Phones can be used to measure health as well!
•
Many fitness apps are out on the market.
•
Gait patterns are known to be indicative of health.
•
If this line of work sounds interesting, please feel free to talk to me!
Questions?
Thank you!
Demographics?
•
It is possible that only a subset of demographics may be posting on medical forums.

For example, people who are severely ill are less likely to post, and those who are more educated are more likely to use the web.
People who have had a negative experience with a treatment are more likely to post.
•
However, these forums are not geographically limited, whereas many randomized trials are limited to a particular region, i.e., that of the hospitals that conducted the study.
•
Furthermore, we expect these sampling biases to be evenly distributed across treatments.
•
Finally, while not utilized in our approach, patients often post symptoms or diagnosis results in their thread posts, which would later allow us to filter by symptoms.
•
People tend to post negative symptoms; these can be expected to be evenly spread across treatments.
How big is the cohort pool?
•
Demographic Effectiveness Comparison:
Generally, hundreds of patients for each treatment and each demographic group. The examples below are for the older population.
For breast cancer: Radiation (739), Hormonal (525), Chemotherapy (770)
For heart disease: Anticoagulants (153), Device (166), Inhibitor (217), Blocker (414)
•
We have only used one source, MedHelp. We can broaden this pool by:
Aggregating multiple sources (WebMD, HealthBoards, disease-specific forums, or even micro-blogs such as Twitter).
Developing a treatment-agnostic supervised demographic inference algorithm.
Treatment Lists?
•
We used a top-down approach, extracting keywords from various reliable sources (such as the Mayo Clinic’s website and those of various government-sponsored agencies).
•
We also used a bottom-up approach that utilized the UMLS thesaurus to generate keywords.
MetaMap was initially used to extract treatments from forum threads.
These words were then queried against the MedlinePlus Connect API to determine whether they indeed belong to the treatment class.
References
•
J. H. D. Cho, V. Q. Liao, Y. Jiang, and B. Schatz. Aggregating Personal Health Messages for Scalable Comparative Effectiveness Research. ACM BCB, 2013.
•
J. H. D. Cho, P. Sondhi, C. Zhai, and B. Schatz. Resolving Healthcare Forum Posts via Similar Thread Retrieval. WWW, 2014.
•
K. Pattabiraman, P. Sondhi, and C. Zhai. Exploiting Forum Thread Structures to Improve Thread Clustering. ICTIR, 2013.
•
P. Sondhi, M. Gupta, C. Zhai, and J. Hockenmaier. Shallow Information Extraction from Medical Forum Data. COLING, 2010.
•
B. W. Chee, R. Berlin, and B. Schatz. Predicting Adverse Drug Events from Personal Health Messages. AMIA, 2011.
•
D. L. MacLean and J. Heer. Identifying medical terms in patient-authored text: a crowdsourcing-based approach. Journal of the American Medical Informatics Association, pages amiajnl–2012–001110+, May 2013.
ShallowEx: Extraction Model
Performance results for different feature sets
[Figure: percentage accuracy (60 to 76%) of the Order-1 CRF and SVM classifiers for each feature set]
We use the best-performing SVM-based classifier (Posts: 175, Sentences: 1494).
Acknowledgements
•
We thank the anonymous reviewers for their insightful comments. This research was
supported in part by Health Information Technology Center (HITC) Fellowship at the
University of Illinois at Urbana-Champaign, and State Farm Doctoral Scholarship. We
would also like to thank Sean Massung for helping the authors with the revision.