國立雲林科技大學
National Yunlin University of Science and Technology
Comparing Association Rules and Decision Trees for Disease Prediction
Advisor : Dr. Hsu
Presenter : Yu-San Hsieh
Author : Carlos Ordonez
CIKM 2006, pp. 17-24
Intelligent Database Systems Lab
N.Y.U.S.T.
I. M.
Outline

─ Motivation
─ Objective
─ Method
─ Experiments
─ Conclusions
Motivation

Mining association rules in a medical data set raises several problems
─ Many discovered rules are irrelevant
─ Most relevant rules appear only at low support
─ The number of discovered rules becomes large at low support
The large number of rules makes search slow and
interpretation by the domain expert difficult.
Objective

We propose search constraints to find only
medically significant association rules and
make search more efficient.
Method

Medical dataset
Transforming
Search constraints
─ Phase 1
─ Phase 2

Phase 1: search constraints
─ Support, confidence
─ User-specified maximum itemset size κ
─ group : A→g, group(Aj) = gj
   group(AGE) = 0: AGE is not group-constrained
   group(AL) = 1: AL is constrained to belong to group 1
   group(attribute(a)) ≠ group(attribute(b))
   (-1.0 <= IL < 0.2) and (-1.0 <= LA < 0.2) are not in the same itemset
─ ac : A→C, ac(Aj) = cj
   example: AGE → LAD
   ac(AGE) = 1: AGE is in the antecedent
   ac(LAD) = 2: LAD is in the consequent
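The Phase 1 constraints above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's implementation: the helper names `itemset_ok` and `rule_ok` are hypothetical, the group number given to IL and LA is made up (the slide only says they may not share an itemset), and any ac value other than 1 or 2 is treated as unconstrained.

```python
from itertools import combinations

KAPPA = 4  # user-specified maximum itemset size

# group(Aj) = gj; 0 means "not group-constrained".
# AGE and AL follow the slide; the value 2 for IL and LA is an assumption.
GROUP = {"AGE": 0, "AL": 1, "IL": 2, "LA": 2}

# ac(Aj) = cj; 1 = antecedent only, 2 = consequent only (from the slide).
AC = {"AGE": 1, "LAD": 2}


def itemset_ok(items, group=GROUP, kappa=KAPPA):
    """Check the size and group constraints for a list of (attribute, item) pairs."""
    if len(items) > kappa:
        return False
    for (attr_a, _), (attr_b, _) in combinations(items, 2):
        ga, gb = group.get(attr_a, 0), group.get(attr_b, 0)
        # Two attributes from the same non-zero group may not appear together.
        if ga != 0 and ga == gb:
            return False
    return True


def rule_ok(antecedent, consequent, ac=AC):
    """Check the antecedent/consequent constraint for a candidate rule."""
    if any(ac.get(attr, 0) == 2 for attr, _ in antecedent):
        return False
    if any(ac.get(attr, 0) == 1 for attr, _ in consequent):
        return False
    return True


same_group = [("IL", "-1.0<=IL<0.2"), ("LA", "-1.0<=LA<0.2")]
mixed = [("AGE", "40<age<=60"), ("AL", "-1.0<=AL<0.2")]
print(itemset_ok(same_group))  # False: IL and LA share a group
print(itemset_ok(mixed))       # True: AGE is unconstrained
```

Checking the constraints while candidate itemsets are generated, rather than filtering finished rules, is what lets constraints of this kind shrink the search space as well as the output.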
Phase 2
─ Association rules
─ Decision tree
Experiments

The medical data set
─ 655 patients and 25 attributes (numeric and categorical)
─ Three basic elements for analysis
   Perfusion defect
   Coronary stenosis
   Risk factors
─ Default parameter setting
   Maximal itemset size κ = 4
   Minimum support = 1%
   Minimum confidence = 70%
─ Negation, ac and group constraints
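To make the default thresholds concrete, the sketch below works out what 1% support means for 655 patients and tests a rule against both thresholds. The function names and the patient counts in the example are hypothetical; only the data set size and the two thresholds come from the slide.

```python
import math

N_PATIENTS = 655        # from the slide
MIN_SUPPORT = 0.01      # 1%
MIN_CONFIDENCE = 0.70   # 70%


def min_support_count(n_total=N_PATIENTS, min_support=MIN_SUPPORT):
    """Smallest number of patients a rule must cover to reach minimum support."""
    return math.ceil(n_total * min_support)


def rule_metrics(n_antecedent, n_both, n_total=N_PATIENTS):
    """Support and confidence of X => Y from raw counts.

    n_antecedent: patients matching X
    n_both:       patients matching both X and Y
    """
    support = n_both / n_total
    confidence = n_both / n_antecedent
    return support, confidence


def passes_defaults(n_antecedent, n_both, n_total=N_PATIENTS):
    """True if the rule meets both default thresholds."""
    support, confidence = rule_metrics(n_antecedent, n_both, n_total)
    return support >= MIN_SUPPORT and confidence >= MIN_CONFIDENCE


print(min_support_count())      # 7 patients (1% of 655 is 6.55)
print(passes_defaults(20, 15))  # True: support about 2.3%, confidence 75%
print(passes_defaults(8, 6))    # False: 6 patients is below 1% support
```

With a threshold this low, a rule needs to cover only seven patients, which is consistent with the earlier point that the number of discovered rules grows large at low support.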
Conclusions

Decision trees are less effective than
constrained association rules
─ Predicting disease with several related target attributes
─ Low confidence factors
─ Slight overfitting
─ Rule complexity
─ Data set fragmentation
My opinion

Advantage
─ Produces medically useful rules, reduces the number
   of discovered rules and improves running time
Drawback
─ Lack of quantitative evaluation
─ Mostly analysis of rules
Application
─ Prediction
─ Classification
Method

Transformed to binary dimensions
─ Numerical data: age
   0 < age <= 40 and 40 < age <= 60
─ Categorical data: sex
   sex = Male and sex = Female
First constraint
─ An attribute has negation
   Additional items are created, corresponding to each
   negated categorical value or each negated interval
   example: not(0 <= LM < 30), not(0 <= LAD < 50), not(0 <= LCX < 50), …
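The transformation above can be illustrated with a small Python sketch that turns one patient record into binary items, including a negated interval. The function name `to_items`, the record layout, and the sample values are hypothetical; the intervals are the ones on the slide, and only LAD is negated here to keep the example short.

```python
def to_items(record):
    """Map one patient record (a dict) to a set of binary item labels."""
    items = set()

    # Numerical attribute: one item per interval. Only the two intervals
    # shown on the slide are used, so ages above 60 produce no age item.
    age = record["age"]
    if 0 < age <= 40:
        items.add("0<age<=40")
    elif 40 < age <= 60:
        items.add("40<age<=60")

    # Categorical attribute: one item per value.
    items.add("sex=" + record["sex"])

    # Negation: an extra item is created for the negated interval, so a
    # patient outside the interval still yields an item that rules can use.
    lad = record["LAD"]
    if 0 <= lad < 50:
        items.add("0<=LAD<50")
    else:
        items.add("not(0<=LAD<50)")

    return items


patient = {"age": 35, "sex": "Male", "LAD": 70}  # made-up record
print(sorted(to_items(patient)))
# ['0<age<=40', 'not(0<=LAD<50)', 'sex=Male']
```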
Experiments

Predictive association rules
[Figure: predictive association rules for healthy and diseased arteries; labels LCX, LAD, RCA]
Experiments

Predictive decision tree
─ Using the CN4.5 decision tree algorithm
─ Focused on predicting LAD disease (LAD ≥ 50 as the
   target class)
─ Result : maximal height = 3
   Numeric dimensions and automatic splits
   Manually binned variables
   Confidence↓, not useful