Biomedical Informatics and Clinical NLP in Translational Science Research
Piet C. de Groen, M.D.
Overview - Examples
• Patient-specific research – N=1 study
• Understanding a disease
• Finding the right MD, diagnosis and treatment
Renal Transplant patient
May, 2005
Hepatobiliary Clinic Consultation
• Abnormal liver tests – using Lipitor™
• Diarrhea and weight loss
Challenge
• Very complex medical history
• Nobody understands the case
• HUGE history with hundreds of notes
Patient January 16, 2006
Total weight of printed pages presented for review:
5 lbs.
Patient January 16, 2006
Total number of X-rays presented for review:
16,902
Questions
• What exactly is the patient’s problem?
– Are the abnormal liver tests and weight loss due to Lipitor?
– When did she use Lipitor?
– What was the weight on what date?
• Impossible to review all notes!
– Which notes are relevant to current symptoms?
– Which notes have weights and drug information?
What I need
• I need to see trends over time
– Weight
– Lipitor use
– Effects of Lipitor on lipids and liver tests
• But I cannot see trends over time
– EMR does not have structured data for weight or Lipitor use
– EMR only allows for display of laboratory test results in very large tables or simple graphs
Data Warehouse to the Rescue!
• Demographics
– MC # = xx-xxx-xxx
• Clinical Notes
– Patient Vitals: Weight exists
• Result
– 243 notes: 43 had weight
[Figure: Weight in kg (y-axis 40 to 70) by year, 1996 to 2005; annotated events: Start Dialysis, Transplant, New Problem]
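The weight trend on this slide depends on pulling a weight out of free-text notes. A minimal sketch of that step follows. It assumes notes are available as (date, text) pairs and that weights are written in a form such as "Weight: 61.0 kg". The note format, the regular expression, and the name `extract_weights` are illustrative assumptions, not the warehouse's actual interface.

```python
import re
from datetime import date

# Assumed pattern: the word "weight", an optional ":" or "=", a number, then "kg".
WEIGHT_RE = re.compile(
    r"\bweight\b\s*[:=]?\s*(\d{2,3}(?:\.\d+)?)\s*kg\b", re.IGNORECASE
)

def extract_weights(notes):
    """Return (date, weight_kg) pairs, sorted by date, for notes that state a weight."""
    series = []
    for note_date, text in notes:
        match = WEIGHT_RE.search(text)
        if match:
            series.append((note_date, float(match.group(1))))
    return sorted(series)

# Hypothetical notes for illustration only; not patient data.
notes = [
    (date(2001, 3, 2), "Vitals: Weight: 61.0 kg. BP 120/80."),
    (date(1999, 6, 14), "Patient reports fatigue. No vitals recorded."),
    (date(1998, 1, 9), "weight 66.5 kg, height 160 cm"),
]
weights = extract_weights(notes)
```

Notes without a recognizable weight are skipped, which mirrors the slide's result that only some of the notes contained a weight.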
What happened to Cholesterol?
She was on Lipitor, but:
– When was it discontinued?
– Did it do anything to her lipid levels?
NLP to the rescue!
• Sort 33 identified Clinical Notes by date
• First note is from 1997
– Lipitor is highlighted in the note
– …Dr. X recommended discontinuation of Pravachol and initiation of Lipitor … have written a prescription for Lipitor …
• Last note is from 2005
– … Lipitor was discontinued in 2004 …
– March 2004 note confirms discontinuation
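The search described above (find notes that mention a drug, sort them by date, highlight the term) can be sketched as a plain keyword match. This is a simplified stand-in, not the NLP system used in the talk; the function name `find_mentions`, the bracket highlighting, and the example notes are assumptions made for illustration.

```python
import re
from datetime import date

def find_mentions(notes, term, context=40):
    """Return (date, snippet) for each note mentioning `term`, oldest first."""
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    hits = []
    for note_date, text in sorted(notes, key=lambda n: n[0]):
        match = pattern.search(text)
        if match:
            start = max(0, match.start() - context)
            end = min(len(text), match.end() + context)
            # Wrap the matched term in brackets so it stands out in the snippet.
            snippet = (
                text[start:match.start()]
                + "[" + match.group(0) + "]"
                + text[match.end():end]
            )
            hits.append((note_date, snippet))
    return hits

# Hypothetical notes; the exact dates are placeholders.
notes = [
    (date(2005, 5, 10), "Lipitor was discontinued in 2004."),
    (date(2000, 2, 1), "Follow-up visit. No medication changes."),
    (date(1997, 8, 20), "Have written a prescription for Lipitor."),
]
hits = find_mentions(notes, "lipitor")
```

Reading the first and last hits in date order is what gives the start and stop of the drug in this example.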
Warehouse to the Rescue!
• Demographics
– MC # = xx-xxx-xxx
• Tests
– Cholesterol exists
• Clinical Notes
– “Lipitor”
• Result
– 22 cholesterol levels
– 243 notes: 33 mentioned “Lipitor”
[Figure: Cholesterol in mg/dL (y-axis 0 to 350) by year, 1993 to 2006, annotated with “Lipitor”]
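Once the drug's start and stop dates are known from the notes, each cholesterol result can be tagged as before, on, or after treatment. A minimal sketch, assuming lab results arrive as (date, value) pairs; the function names, the exact start and stop days, and all values below are made up for illustration and are not the patient's data.

```python
from datetime import date

def label_by_treatment(measurements, start, stop):
    """Tag each (date, value) as 'before', 'on', or 'after' treatment."""
    labeled = []
    for when, value in sorted(measurements):
        if when < start:
            phase = "before"
        elif when <= stop:
            phase = "on"
        else:
            phase = "after"
        labeled.append((when, value, phase))
    return labeled

def mean_by_phase(labeled):
    """Average the values within each treatment phase."""
    totals = {}
    for _, value, phase in labeled:
        running_sum, count = totals.get(phase, (0.0, 0))
        totals[phase] = (running_sum + value, count + 1)
    return {phase: s / n for phase, (s, n) in totals.items()}

# Hypothetical cholesterol values in mg/dL.
measurements = [
    (date(1995, 1, 1), 300.0),
    (date(1996, 1, 1), 320.0),
    (date(2000, 1, 1), 200.0),
    (date(2005, 1, 1), 260.0),
]
labeled = label_by_treatment(measurements, date(1997, 1, 1), date(2004, 3, 1))
means = mean_by_phase(labeled)
```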
Recommendations
• 72-hour stool fat on 100 gram fat diet
– 689 gram, 23 gram fat/day (2-7 Normal)
• EGD/EUS with biopsies and aspirate
– Esophagitis - ? Candida – biopsy negative
– Duodenal diverticula, normal pancreas
– Duodenal biopsy normal
– Aerobes > 100,000 Gram negative bacillus cfu/mL
– Anaerobes > 10,000 Bacteroides fragilis cfu/mL
– Yeast 1,000-10,000 cfu/mL
• Small Bowel X-ray
– Numerous diverticula
Understanding a disease
Hepatocellular Cancer in Obesity
Spring 2006
Based on simple queries of MCLSS
• For NASH the ICD-9 code 571.8 was used; this code may include other diagnoses, but the vast majority is NASH
• For Primary Liver Cancer the ICD-9 codes 155.0 and 155.1 were used
• For Obesity ICD-9 code 278.0 was used, or the Diagnosis section of Clinical Notes
• BMI was retrieved from Clinical Notes; the maximum value during the lifetime was used
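The query logic in the bullets above can be sketched as a small filter: keep patients with a primary liver cancer code, then apply the BMI cutoff to each patient's lifetime maximum BMI. The record layout, the function names, and the sample rows are assumptions for illustration; the actual MCLSS queries are not shown in the slides. This sketch uses BMI only and leaves out obesity found in the Diagnosis section of notes.

```python
LIVER_CANCER_CODES = {"155.0", "155.1"}  # ICD-9 codes named on the slide

def max_bmi_by_patient(bmi_records):
    """Map each patient id to the maximum BMI recorded for that patient."""
    best = {}
    for patient_id, bmi in bmi_records:
        if patient_id not in best or bmi > best[patient_id]:
            best[patient_id] = bmi
    return best

def obese_liver_cancer_cases(diagnoses, bmi_records, bmi_cutoff=30.0):
    """Patients with a liver cancer code whose lifetime maximum BMI exceeds the cutoff."""
    max_bmi = max_bmi_by_patient(bmi_records)
    with_cancer = {pid for pid, code in diagnoses if code in LIVER_CANCER_CODES}
    return sorted(pid for pid in with_cancer if max_bmi.get(pid, 0.0) > bmi_cutoff)

# Hypothetical records: (patient id, ICD-9 code) and (patient id, BMI).
diagnoses = [("A", "155.0"), ("B", "278.0"), ("C", "155.1"), ("C", "278.0")]
bmi_records = [("A", 27.5), ("A", 29.0), ("B", 35.2), ("C", 31.4), ("C", 28.0)]
cases = obese_liver_cancer_cases(diagnoses, bmi_records)
```

Using the lifetime maximum means a patient who lost weight before the cancer diagnosis still counts as a case with BMI>30, which matches the rule stated on the slide.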
Primary Liver Cancer
NASH Cases with BMI>30
[Figure: Cases per year (y-axis 0 to 40), 1992 to 2006; series: Males, Females]
Cancers with Increasing Incidence
2012 report US: 1999 through 2008
CA: A Cancer Journal for Clinicians
Volume 62, Issue 2, pages 118-128, 4 JAN 2012 DOI: 10.3322/caac.20141
http://onlinelibrary.wiley.com/doi/10.3322/caac.20141/full#fig2
Finding the right MD, diagnosis and treatment
Interval Colorectal Cancer
Time Line
Example of Interval Colorectal Cancer
[Figure: patient time line over years 1 to 5 with Endoscopy, Pathology, and Diagnoses tracks; marked interval: < 3 years. Legend: Benign Colon, Colon Cancer, Non-Colon Disease]
[Flowchart: cohort selection from pathology and endoscopy data]
(Pathology data) Colon Cancer 1993-2006
– 4,203,857 specimens
– Part description = “COL/RECT” AND Valid MCN: 238,177 specimens
– Diagnosis_code = One of 50 identified cancer diagnosis codes: 19,259 specimens
– Unique? (One specimen may have multiple diagnosis codes): 13,477 specimens (10,136 patients)
(Endoscopy data) Colonoscopy 1992-2004
– 325,370 Procedures
Combined
– Remove Patients with Research Authorization = ‘No’
– Patients with CC diagnosis and C procedure: 2,692 patients
– Extract all C procedures, the date and other features: 4,743 procedures (date, other features)
– Compare the CC diagnosis and C dates
– Missed Lesions (Anatomic location, tumor size, other characteristics)
Methods
[Figure: case categories, each drawn on a time line of years 1 to 5]
Negative History
– No lesions at colonoscopy, Pathology = Colorectal Cancer: Truly Missed; Probably Missed
– Lesions at colonoscopy: Seen, removed; Seen, not removed
Colorectal Cancer History
– Recurrent, 2nd, 3rd cancer not prevented
Results Summary
• Truly missed case
– 90 days to 3 years: 82
• Probably missed case
– 3 to 5 years: 54
• A lesion was seen
– removed <5 years: 95
– not removed <5 years: 8
• Local recurrence or 2nd, 3rd cancer: >44
• Total: >283
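The two "missed" categories are defined by the time between a colonoscopy that showed no lesions and the cancer diagnosis. A minimal sketch of that rule, assuming a year is counted as 365 days and that the boundaries are inclusive as coded; the slides do not state how boundaries were handled, and the function name is hypothetical.

```python
from datetime import date

def classify_missed(colonoscopy_date, cancer_date):
    """Classify a cancer found after a colonoscopy that showed no lesions.

    Returns 'truly missed' (90 days to 3 years), 'probably missed'
    (3 to 5 years), or None when the interval falls outside both windows.
    """
    days = (cancer_date - colonoscopy_date).days
    if 90 <= days < 3 * 365:
        return "truly missed"
    if 3 * 365 <= days <= 5 * 365:
        return "probably missed"
    return None

# Hypothetical dates for illustration.
one_year = classify_missed(date(2000, 1, 1), date(2001, 1, 1))
four_years = classify_missed(date(2000, 1, 1), date(2004, 1, 1))
one_month = classify_missed(date(2000, 1, 1), date(2000, 2, 1))
```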
©Ralph A. Clevenger
Tumor Growth Curves
3 Months Doubling Time
[Figure: Tumor Size (mm, 0 to 100) versus Time Interval (days, 0 to 1800), with t=3 yrs marked; series: Truly Missed, Probably Missed, Seen & Removed, Recurrent, 2nd, 3rd]
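Growth curves with a fixed doubling time follow size(t) = size(0) × 2^(t / T), where T is the doubling time. A minimal sketch of that relation follows. It assumes 3 months is about 91 days, and it applies the doubling to whatever size measure is passed in, since the slide does not say whether diameter or volume is the quantity that doubles. The function names are hypothetical.

```python
import math

def size_after(initial_size, days, doubling_days=91.0):
    """Size after `days` of exponential growth with the given doubling time."""
    return initial_size * 2 ** (days / doubling_days)

def days_to_reach(initial_size, target_size, doubling_days=91.0):
    """Days needed to grow from initial_size to target_size."""
    return doubling_days * math.log2(target_size / initial_size)

# Example: one doubling takes 91 days, three doublings take 273 days.
after_one_doubling = size_after(5.0, 91.0)
days_for_three_doublings = days_to_reach(5.0, 40.0)
```

Running the relation backwards from the size at diagnosis gives an estimate of the size at the earlier colonoscopy, which is one way to read curves like these.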
Numbers for each Endoscopist
[Figure, two panels by endoscopist (x-axis 2 to 46). Upper panel: Number Not Detected (0 to 20). Lower panel: Number Seen (0 to 100). Legend: Truly Missed, Probably Missed, Seen & Removed, Recurrent, 2nd, 3rd]
Miss Rate for each Endoscopist
[Figure: % Not Detected (0 to 25) by Endoscopist (x-axis 2 to 46). Legend: Truly Missed, Probably Missed, Seen & Removed]
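A per-endoscopist miss rate can be computed from counts of cancers not detected and cancers seen. The slides do not give the exact formula, so this sketch assumes percent not detected = not detected / (not detected + seen). The function name and the counts are made up for illustration.

```python
def miss_rate_percent(not_detected, seen):
    """Percent of cancers not detected; None when there are no cases at all."""
    total = not_detected + seen
    if total == 0:
        return None
    return 100.0 * not_detected / total

# Hypothetical counts per endoscopist: (not detected, seen).
counts = {1: (3, 57), 2: (0, 40), 3: (0, 0)}
rates = {
    endoscopist: miss_rate_percent(missed, seen)
    for endoscopist, (missed, seen) in counts.items()
}
```

Returning None for an endoscopist with no cases keeps an empty record from showing up as a 0% miss rate.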
[Figure, two panels by endoscopist (x-axis 2 to 46, y-axis 0 to 8)]
Detection of cancers in previously seen patients (self)
Detection of cancers in patients seen by colleagues (others)
Overview - Examples
• Patient-specific research – N=1 study
• Understanding a disease
• Finding the right MD, diagnosis and treatment