Transcript Document

Automated Vocabulary Maintenance System for the Open
Access, Collaborative Consumer Health Vocabulary
Kristina M Doing-Harris, BCompSci, MA, MS, PhD; Qing Zeng-Treitler, PhD
Department of Biomedical Informatics, University of Utah, Salt Lake City, USA
Introduction
• Controlled vocabularies play an important role in the development of
biomedical informatics applications.
• The consumer health vocabulary (CHV) has been rising in prominence.
• Controlled vocabularies require maintenance and updating because
language itself continues to evolve.
• In healthcare especially, there is a constant stream of new names (e.g., new
medications, disorders, tests) being coined in the literature.
• The CHV must keep up with these changes in the language used by consumers.
Main Question
How can a consumer health vocabulary evolve with
consumer language?
Schematic Diagram of the AVM system
[Figure: schematic of the AVM pipeline. Panels and labels: PatientsLikeMe.com
(source website); raw text file excerpt; Stage 1 (A, B & C); excerpt from
n-gram database; excerpt from potential term database; Stage 2 (C); CHV Update
Wiki. Column labels visible in the database excerpts include ID, Term, Ngram,
CUI, firstPOS_N, in_NP, Frequency, freq_in_subs, isTerm, cScore, inGoldStand,
and inMedRec.]
CHV Update Wiki
www.ConsumerHealthVocab.utah.edu/AutoVocabMaint
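The n-gram database in the schematic stores word sequences together with their frequencies. As a rough illustration only, n-gram counting over raw page text might look like the sketch below. The tokenizer pattern, the `max_n` limit, and the function name `count_ngrams` are assumptions made for this example and are not taken from the AVM system.

```python
import re
from collections import Counter


def count_ngrams(text, max_n=3):
    """Count all n-grams of 1..max_n tokens in a piece of raw text.

    Hypothetical helper: the token pattern and max_n default are
    illustrative choices, not the AVM system's actual settings.
    """
    # Split into word-like tokens, keeping internal hyphens and apostrophes.
    tokens = re.findall(r"[A-Za-z0-9][A-Za-z0-9'-]*", text)
    counts = Counter()
    for n in range(1, max_n + 1):
        # Slide a window of n tokens across the token list.
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


if __name__ == "__main__":
    sample = "back pain and back pain treatment"
    for ngram, freq in count_ngrams(sample, max_n=2).most_common(3):
        print(ngram, freq)
```

Counting every window in this way also picks up page navigation text, which is why a later filtering step is needed before reviewers see the list.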
Results
• Combined filter: a termhood score threshold of 3.6 for terms found in
the medical records, and a C-value threshold of 15.
• This produced 774 candidate terms, of which 237 were valid.
• Reviewers can expect about 1 valid term for every 3 or 4 candidate terms.
• This is better than the initial n-gram list, which averaged 1 valid term
for every 137 candidate terms.
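The reviewer workload implied by these figures can be checked with a few lines of arithmetic. The sketch below uses only the counts reported above; the variable names are illustrative.

```python
# Counts reported in the Results section.
candidates = 774                      # candidate terms after filtering
valid = 237                           # candidates judged valid
baseline_candidates_per_valid = 137   # initial n-gram list: 1 valid per 137

# Share of filtered candidates that are valid terms.
precision = valid / candidates

# Candidates a reviewer reads, on average, per valid term found.
candidates_per_valid = candidates / valid

# How much less reading per valid term compared with the unfiltered list.
improvement = baseline_candidates_per_valid / candidates_per_valid

print(f"precision: {precision:.1%}")
print(f"candidates per valid term: {candidates_per_valid:.2f}")
print(f"reduction vs. initial list: {improvement:.1f}x")
```

About 31% of the filtered candidates are valid, or roughly 3.3 candidates per valid term, which is about 42 times fewer candidates per valid term than the initial n-gram list.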
Summary of Conclusions
• Social network data can be used to provide a living corpus.
• It can be mined to provide new consumer health vocabulary terms.
• Using automatic term recognition (ATR) and dictionary lookup can produce
a concise list of candidate terms.
• This allows the consumer health vocabulary to evolve with consumer
language.
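For readers unfamiliar with the C-value measure mentioned in the Results, the following is a minimal sketch of the commonly published formulation (Frantzi et al.): a candidate's frequency is reduced by the average frequency of the longer candidates that contain it, then weighted by the log of its length in words. It is an illustration under that assumption, not the AVM system's own code. The function names and the toy frequencies are hypothetical.

```python
import math
from collections import defaultdict


def is_sublist(short, long_):
    """True if the token tuple `short` occurs contiguously inside `long_`."""
    n = len(short)
    return any(long_[i:i + n] == short for i in range(len(long_) - n + 1))


def c_values(freq):
    """Compute C-values for candidates given as {token tuple: frequency}.

    Assumes the textbook formula; single-word candidates score 0 because
    log2(1) == 0, a known property of the basic formulation.
    """
    # For each candidate, collect frequencies of longer candidates containing it.
    containers = defaultdict(list)
    for longer, longer_freq in freq.items():
        for shorter in freq:
            if len(shorter) < len(longer) and is_sublist(shorter, longer):
                containers[shorter].append(longer_freq)

    scores = {}
    for term, f in freq.items():
        nested_in = containers.get(term, [])
        # Subtract the mean frequency of the containing candidates, if any.
        adjusted = f - sum(nested_in) / len(nested_in) if nested_in else f
        scores[term] = math.log2(len(term)) * adjusted
    return scores


if __name__ == "__main__":
    toy = {("back", "pain"): 10, ("low", "back", "pain"): 4}  # made-up counts
    for term, score in c_values(toy).items():
        print(" ".join(term), round(score, 3))
```

A threshold such as the one reported in the Results would then be applied to these scores to decide which candidates go forward for review.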
CHV Website
www.ConsumerHealthVocab.org
Acknowledgements
NLM Training Grant No. RO1 LM07222
Contact Information
[email protected]