AICML-W08-8minTalkOsmar - Department of Computing Science


Contact: Osmar R. Zaïane, Ph.D.
Associate Professor
Department of Computing Science
University of Alberta
352 Athabasca Hall
Edmonton, Alberta
Canada T6G 2E8
Telephone: Office +1 (780) 492 2860
Fax +1 (780) 492 1071
E-mail: [email protected]
http://www.cs.ualberta.ca/~zaiane/
Since 1999
Appointed Scientific Director as of January
[Figure labels: Database Management Systems, Artificial Intelligence, HCI]
Data Mining for Health Applications
Perinatal Data Analysis
Preterm birth tied to lifelong health problems
250,000 birth records, Central & Northern Alberta, over 12 years.
About 1% are preterm.
[Figure: chart by year, 1992 to 2003; vertical axis from 0 to 120]
– Goals:
• Understand preterm births in Alberta
• Understand what causes risk in pregnancy
• Predict preterm births → Decision Support for Hospital
• Recommend what data needs to be collected
– Data is noisy and has missing information
Information Extraction
Digitized text database
Text, XML or web pages
Automatic Information Extraction
Relational Database
Data Anonymization, Privacy Preservation, …
…
PRINCIPAL DIAGNOSIS: Anemia and GI bleed.
SECONDARY DIAGNOSES: Diabetes , mitral valve replacement , atrial
fibrillation , and chronic kidney disease.
HISTORY OF PRESENT ILLNESS: The patient is an 86-year-old woman
with a history of diabetes , chronic kidney disease , congestive heart failure
with ejection fraction of 45% to 50% who presents from clinic with a chief
complaint of fatigue and weakness for one week. She had had worsening
right groin and hip pain , status post a total hip replacement approximately 13
years ago which had been worsening for two weeks , and she has also
recently completed a course of Levaquin for urinary tract infection. She
presented to Dr. Parrent office complaining of fatigue and weakness for one
week. She has had some abdominal pain in a band-like distribution around
her right side. She was found to have a hematocrit of 21 down from 30 eight
days ago and was sent to the emergency department for transfusion and
workup of her anemia.
PRE-ADMISSION MEDICATIONS: Caltrate plus D one tab p.o. b.i.d. ,
Lantus 7 units SC q.p.m. , NovoLog 4 units/4 units/5 units
SC t.i.d. , Imdur 30 mg b.i.d. , amlodipine 5 mg b.i.d. , furosemide 80 mg
daily , valsartan 120 mg daily , warfarin 4 mg daily , iron sulfate 325 mg p.o.
daily , and multivitamin daily.
PAST MEDICAL HISTORY: Chronic kidney disease , presumed due to
congestive heart failure/diuresis/renal artery disease/early diabetic
nephropathy; type 2 diabetes; previous stroke; congestive heart failure with
ejection fraction of 45% to 50%; rheumatic valvular disease with mitral valve
replacement and tricuspid valve repair; atrial fibrillation; history of small
bowel obstruction; status post right total hip replacement approximately 13
years ago.
FAMILY HISTORY: No family history of kidney disease or heart
disease.
SOCIAL HISTORY: She has 10 children , lives alone with home care in ME
, but has moved in to live with her daughter in News Irv In She denies
tobacco use and drinks alcohol rarely. ALLERGIES: Codeine and Benadryl.
ADMISSION PHYSICAL EXAMINATION: Vital signs were temperature
96.7 , heart rate 60 , blood pressure 153/74 , respirations 22 , and
SaO2 95% on room air. The patient is a frail elderly woman in no acute
distress. She has poor dentition. JVP is difficult to assess secondary to
tricuspid regurgitation. Lungs were clear to auscultation bilaterally.
Cardiovascular exam showed bradycardia with heart rate in the 50s that was
irregular , S1 plus S2 with 3/6 systolic murmur heard throughout with
mechanical sounding S2. Abdomen was mildly tender to palpation in the mid
epigastrium with no rebound or guarding. Extremities showed venous stasis
changes in her lower extremities bilaterally. Feet were cool with diminished
DP and PT pulses. On neurological exam , she was alert and oriented x3 and
cranial nerves II through XII were intact.
…
Breast Cancer Detection
“Screening mammograms can detect breast cancer & early
detection increases the chance of successful treatment”
• Regular for 50 to 69
• Regular if prescribed for 40 to 49 and over 70
Health Canada reports that only 12% of eligible women in Alberta underwent regular
screening in 2002. Today only 40% according to ACB. Goal: 80%. Most are single
readings; 25% get a double reading, selected randomly (not enough staff).
False-positive and false-negative rates vary among radiologists [3.5% to 21%]
(American Cancer Society)
Goal 1: Mammography Classification (build a tool that ranks mammograms by
priority – 2nd screening recommendation)
– Current prototype: limited visual features; classifies
malignant, benign, normal; accuracy about 80%.
– Are there better visual features to exploit?
Goal 2: Breast MRI Classification
– What are the appropriate visual features to use?
Data Mining Canonical Tasks
> Association Rules
– Efficient discovery of frequent itemsets
– Automatically finding relationships in large data
> Supervised Learning
– Associative classifier – rule-based and transparent
learning model
> Unsupervised Learning
– Parameter-free clustering
– Clustering in high dimensional spaces & sub-spaces
> Outlier Detection
– Finding aberrations in data and ranking outliers.
> Privacy Preservation
– Sharing data without compromising data privacy or
jeopardizing data mining outcome – a tradeoff.
Data Mining Canonical Tasks
> Association Rules
– Efficient discovery of frequent itemsets
– Automatically finding relationships in large data
> Supervised Learning
– Associative classifier – rule-based and transparent
learning model
> Unsupervised Learning
– Parameter-free clustering
– Clustering in high dimensional spaces & sub-spaces
> Outlier Detection
– Finding aberrations in data and ranking outliers.
> Privacy Preservation
– Sharing data without compromising data privacy or
jeopardizing data mining outcome – a tradeoff.
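As a rough illustration of the first task listed above, frequent itemset discovery, the sketch below counts itemsets level by level in the Apriori style. The transactions, the item names `A`/`B`/`C`, and the `min_support` threshold are made up for illustration; this is a textbook-style sketch, not the group's own algorithm.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Return {itemset: count} for itemsets found in at least min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1 candidates: every single item that occurs anywhere.
    items = sorted({item for t in transactions for item in t})
    candidates = [frozenset([item]) for item in items]
    result = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        result.update(level)
        # Join frequent k-itemsets into (k+1)-candidates, then prune any
        # candidate that has an infrequent k-subset.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = [c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return result

# Hypothetical transactions (illustrative only).
records = [
    {"A", "B", "C"},
    {"A", "C"},
    {"B"},
    {"A", "B", "C"},
]
found = frequent_itemsets(records, min_support=3)
for itemset, count in sorted(found.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), count)
```

The pruning step is what keeps the search tractable on large data: an itemset can only be frequent if all of its subsets are.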
Perinatal Data Analysis
Preterm birth tied to lifelong health problems
250,000 birth records, Central & Northern Alberta, over 12 years.
About 1% are preterm.
[Figure: chart by year, 1992 to 2003; vertical axis from 0 to 120]
– Goals:
• Understand preterm births in Alberta
• Understand what causes risk in pregnancy
• Predict preterm births → Decision Support for Hospital
• Recommend what data needs to be collected
– Data is noisy and has missing information
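The figures on this slide imply a heavily imbalanced prediction problem. The short calculation below uses only the 250,000 records and the roughly 1% preterm share stated on the slide; the "always predict term" baseline is a hypothetical comparison added here for illustration, not something from the talk.

```python
total_records = 250_000
preterm_share = 0.01  # "about 1%" per the slide

preterm = round(total_records * preterm_share)
term = total_records - preterm

# Hypothetical baseline: a model that labels every birth as "term".
baseline_accuracy = term / total_records
baseline_recall = 0 / preterm  # it never finds a preterm birth

print(f"preterm records: {preterm}, term records: {term}")
print(f"baseline accuracy: {baseline_accuracy:.2%}")
print(f"baseline recall on preterm births: {baseline_recall:.0%}")
```

A model can therefore look very accurate while being useless for decision support, which is why recall and precision on the preterm class are the more informative measures here.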
Breast Cancer Detection
“Screening mammograms can detect breast cancer & early
detection increases the chance of successful treatment”
• Regular for 50 to 69
• Regular if prescribed for 40 to 49 and over 70
Health Canada reports that only 12% of eligible women in Alberta underwent regular
screening in 2002. Today only 40% according to ACB. Goal: 80%. Most are single
readings; 25% get a double reading, selected randomly (not enough staff).
False-positive and false-negative rates vary among radiologists [3.5% to 21%]
(American Cancer Society)
Mammography Classification
Build a tool that ranks mammograms by
priority → recommends 2nd screening
– Current prototype: limited visual features; classifies
malignant, benign, normal; Error Rate 20%.
– Are there better visual features to exploit?
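One way to picture the ranking idea on this slide: instead of picking the 25% for double reading at random, sort mammograms by a classifier's estimated malignancy score and send the top of the list for a second reading. The sketch below assumes such scores already exist; the function name `rank_for_second_reading`, the ids, and the score values are all hypothetical.

```python
import math

def rank_for_second_reading(scores, fraction=0.25):
    """Sort mammogram ids by score (most suspicious first) and flag the top fraction.

    scores: dict mapping a mammogram id to an estimated malignancy score in [0, 1].
    fraction: share of cases that can get a second reading (25% per the slide).
    """
    ranked = sorted(scores, key=lambda case_id: scores[case_id], reverse=True)
    n_flagged = math.ceil(len(ranked) * fraction)
    return ranked, ranked[:n_flagged]

# Hypothetical classifier outputs (illustrative values only).
scores = {"m1": 0.05, "m2": 0.80, "m3": 0.30, "m4": 0.10,
          "m5": 0.65, "m6": 0.02, "m7": 0.15, "m8": 0.40}
ranked, flagged = rank_for_second_reading(scores)
print("priority order:", ranked)
print("second reading:", flagged)
```

With the same reading capacity, the second look goes to the cases the classifier considers most suspicious rather than to a random sample.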
Breast Cancer Detection (future?)
Mammograms are relatively cheap to produce but have MANY disadvantages.
MRI is expensive but has many advantages.
[Image: © Siemens Medical]
• Build a tool for Breast MRI Classification
• What are the appropriate visual features to use?
Information Extraction
Digitized text database
Text, XML or web pages
Automatic Information Extraction
Relational Database
Data Anonymization, Privacy Preservation, …
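To make the path from free text to a relational database concrete, here is a minimal sketch that pulls a few medication entries out of a fragment of the sample discharge summary and loads them into an in-memory SQLite table. The regular expression, the table name `medication`, and its columns are assumptions made for illustration; a real extractor would need to handle far more variation (units, routes, multi-word drug names, misspellings).

```python
import re
import sqlite3

# Fragment of the PRE-ADMISSION MEDICATIONS line from the sample record.
text = ("Imdur 30 mg b.i.d. , amlodipine 5 mg b.i.d. , furosemide 80 mg "
        "daily , valsartan 120 mg daily , warfarin 4 mg daily")

# Assumed pattern: one-word drug name, integer dose in mg, then a frequency.
pattern = re.compile(r"([A-Za-z]+) (\d+) mg (b\.i\.d\.|daily)")
rows = [(name, int(dose), freq) for name, dose, freq in pattern.findall(text)]

# Load the extracted tuples into a relational table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE medication (drug TEXT, dose_mg INTEGER, frequency TEXT)")
conn.executemany("INSERT INTO medication VALUES (?, ?, ?)", rows)

# Once structured, the data can be queried like any other table.
twice_daily = [row[0] for row in conn.execute(
    "SELECT drug FROM medication WHERE frequency = 'b.i.d.' ORDER BY drug")]
print(rows)
print("twice daily:", twice_daily)
```

Anonymization and privacy preservation would then operate on the structured table rather than on the raw narrative text.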