HASAR: Mining Sequential Association Rules for Atherosclerosis Risk Factor Analysis
Laurent Brisson, Nicolas Pasquier, Céline Hebert,
Martine Collard
I3S Laboratory, University of Nice-Sophia Antipolis
GREYC Laboratory, University of Caen
Contents
1. Analytic question & Objectives
2. Model & Data Preparation
3. Algorithms
4. Results
Analytic Question
Are there any differences in the development of risk factors and other characteristics between men of the risk group who came down with the observed cardiovascular diseases and those who stayed healthy?
Objectives
Evolution of risk factors according to behavioural changes
• Groups RG versus PG and NG
• Healthy patients (NCVD) versus those with
cardiovascular diseases (CVD)
• Groups based on patient education level and job
Sequential Rules
IDE_itemset ∧ BEH_time_itemset ⇒ RF_time_item
• IDE_itemset: static identification attributes
  - Age of the patient
  - Educational level of the patient
  - Alcohol consumption at the beginning of the study

• BEH_time_itemset: behavioural change attributes
  - Consumption of cigarettes per day
  - Physical activity after work
  - Physical activity at work
  - Different kinds of diet
  - Medication for cholesterol
  - Medication for blood pressure

• RF_time_item: risk factor change attribute
  - Cholesterol level
  - HDL cholesterol level
  - LDL cholesterol level
  - Triglyceride level
  - Obesity
  - …

Model
IDE_itemset ∧ BEH_time_itemset ⇒ RF_time_item
• Action period: a period in which at least one control occurs
• Latency period: a waiting time before effects can be observed
• Observation period: a period in which exactly one control occurs
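To make the three periods concrete, here is a minimal Python sketch of how one patient might be checked against one rule. It assumes, hypothetically, that each control carries a time stamp in months since entry and a set of change items, that the action period is given as (start, end), and that the observation period begins right after the latency period and lasts `obs_length` months. The class names, item strings and window arithmetic are illustrative, not the HASAR implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Control:
    month: int        # hypothetical time stamp: months since the patient entered the study
    items: frozenset  # change items recorded at this control, e.g. "BEH_SMOKING=Decreased"


@dataclass(frozen=True)
class Rule:
    ide: frozenset    # static identification items (IDE_itemset)
    beh: frozenset    # behavioural change items (BEH_time_itemset)
    rf: str           # one risk factor change item (RF_time_item)
    action: tuple     # action period as (start month, end month), inclusive
    latency: int      # months to wait after the action period (hypothetical unit)
    obs_length: int   # length of the observation period in months


def patient_supports(rule: Rule, static_items: frozenset, controls: list) -> bool:
    """True if one patient matches the whole rule under the model's time constraints."""
    a_start, a_end = rule.action
    in_action = [c for c in controls if a_start <= c.month <= a_end]
    if not in_action:  # action period: at least one control
        return False
    o_start = a_end + rule.latency + 1  # observation starts once the latency period is over
    o_end = a_end + rule.latency + rule.obs_length
    in_obs = [c for c in controls if o_start <= c.month <= o_end]
    if len(in_obs) != 1:  # observation period: exactly one control
        return False
    if not rule.ide <= static_items:
        return False
    # Behavioural changes may be observed at any control of the action period.
    seen = frozenset().union(*(c.items for c in in_action))
    return rule.beh <= seen and rule.rf in in_obs[0].items
```

With `latency=0` the observation period starts the month after the action period ends, so the two windows never overlap; controls falling inside the latency period are simply ignored.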
Data Preparation: Creation of Change Variables
Example: BEH_OBESITY, derived from two consecutive controls N and N+1
(h = Entry.height, w = Control.weight, * = any value)

BEH_OBESITY   id   Control N          Control N+1
Unknown       -2   h or w = *         h or w = Unknown
Unknown       -2   h or w = Unknown   h or w = *
Stay_normal    1   w / h² <= 25       w / h² <= 25
Decreased      2   w / h² > 25        w / h² <= 25
Increased      3   w / h² <= 25       w / h² > 25
Stay_high      4   w / h² > 25        w / h² > 25
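The derivation of BEH_OBESITY can be sketched as a small Python function returning the change codes -2, 1, 2, 3 and 4 from the slide. It assumes that a missing measurement is represented by `None` and that height is in metres, so that w / h² is the usual body mass index; the function and constant names are hypothetical.

```python
from typing import Optional

# Codes of the BEH_OBESITY change variable (hypothetical constant names).
UNKNOWN, STAY_NORMAL, DECREASED, INCREASED, STAY_HIGH = -2, 1, 2, 3, 4


def bmi(height: Optional[float], weight: Optional[float]) -> Optional[float]:
    """w / h², or None when the height or the weight is unknown."""
    if height is None or weight is None:
        return None
    return weight / height ** 2


def beh_obesity(height: Optional[float], weight_n: Optional[float],
                weight_n1: Optional[float], threshold: float = 25.0) -> int:
    """BEH_OBESITY code between control N and control N+1.

    `height` is Entry.height (assumed in metres, measured once at entry);
    `weight_n` and `weight_n1` are Control.weight at controls N and N+1.
    """
    before, after = bmi(height, weight_n), bmi(height, weight_n1)
    if before is None or after is None:  # h or w unknown at either control
        return UNKNOWN
    high_before, high_after = before > threshold, after > threshold
    if high_before and high_after:
        return STAY_HIGH
    if high_before:
        return DECREASED  # > 25 at N, <= 25 at N+1
    if high_after:
        return INCREASED  # <= 25 at N, > 25 at N+1
    return STAY_NORMAL
```

A value of exactly 25 counts as normal, matching the `<= 25` condition.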
Data Preparation: Flattening Operation
Initial table: one row per control
ID Control | ID Patient | BEH 1 … BEH k | RF 1 … RF m
Flattened table: one row per patient
ID Patient | IDE 1 … IDE j (static attributes) | control 1: BEH 1 … BEH k, RF 1 … RF m | … | control n: BEH 1 … BEH k, RF 1 … RF m
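A minimal pure-Python sketch of the flattening operation follows. The dictionary layout, the `attr@control<i>` column naming and the assumption that control IDs increase over time are all hypothetical; patients with fewer than n controls get `None` in the missing columns so that every row has the same width.

```python
from collections import defaultdict


def flatten(static_rows: dict, control_rows: list, change_attrs: list) -> dict:
    """Turn a one-row-per-control table into a one-row-per-patient table.

    static_rows:  {patient id: {"IDE 1": ..., ..., "IDE j": ...}}
    control_rows: [{"ID Control": ..., "ID Patient": ..., "BEH 1": ..., "RF 1": ...}, ...]
    change_attrs: the BEH and RF column names to copy for every control
    """
    by_patient = defaultdict(list)
    for row in control_rows:
        by_patient[row["ID Patient"]].append(row)
    # n = the largest number of controls of any patient (fixed row width).
    n = max((len(rows) for rows in by_patient.values()), default=0)

    flat = {}
    for pid, static in static_rows.items():
        # Assumes control IDs increase with time, so sorting gives chronological order.
        controls = sorted(by_patient.get(pid, []), key=lambda r: r["ID Control"])
        out = {"ID Patient": pid, **static}
        for i in range(n):
            row = controls[i] if i < len(controls) else {}
            for attr in change_attrs:
                out[f"{attr}@control{i + 1}"] = row.get(attr)  # None if no such control
        flat[pid] = out
    return flat
```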
Evolutionary Approach
A genetic algorithm searching for temporal rules
• Fixed-length chromosome: Identification | Behaviours | Risk factor
• Identification: one gene for each static identification attribute (IDE 1 … IDE j)
• Behaviours: one gene for each kind of behavioural change (BEH 1 … BEH k), associated with the action period
• Risk factor: one gene describing a risk factor (RF i), associated with the observation period
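One possible encoding of the fixed-length chromosome is sketched below. The attribute domains, the use of `None` for "attribute absent from the rule", and the single-gene mutation operator are illustrative assumptions, not the operators used by HASAR.

```python
import random
from dataclasses import dataclass, replace
from typing import Optional, Tuple

# Hypothetical attribute domains, for illustration only.
IDE_DOMAINS = {"AGE": ["<40", "40-50", ">50"], "EDUCATION": ["basic", "secondary", "university"]}
BEH_DOMAINS = {"BEH_SMOKING": ["Decreased", "Increased"], "BEH_OBESITY": ["Decreased", "Increased"]}


@dataclass(frozen=True)
class Chromosome:
    """Fixed length: every attribute keeps its gene position; None means 'not in the rule'."""
    ide: Tuple[Optional[str], ...]  # one gene per static identification attribute
    beh: Tuple[Optional[str], ...]  # one gene per kind of behavioural change
    action: Tuple[int, int]         # action period (start, end) shared by the BEH genes
    rf: Tuple[str, str]             # the single risk factor gene: (attribute, value)
    observation: Tuple[int, int]    # observation period (start, end) of the RF gene

    def decode(self) -> str:
        """Render the chromosome as a readable sequential rule."""
        left = [f"{a}={v}" for a, v in zip(IDE_DOMAINS, self.ide) if v is not None]
        left += [f"{a}={v}{list(self.action)}"
                 for a, v in zip(BEH_DOMAINS, self.beh) if v is not None]
        attr, value = self.rf
        return " ∧ ".join(left) + f" ⇒ {attr}={value}{list(self.observation)}"


def mutate(ch: Chromosome, rng: random.Random) -> Chromosome:
    """Hypothetical mutation: redraw one IDE or BEH gene (possibly to None)."""
    domains = list(IDE_DOMAINS.values()) + list(BEH_DOMAINS.values())
    genes = list(ch.ide + ch.beh)
    i = rng.randrange(len(genes))
    genes[i] = rng.choice(domains[i] + [None])
    n_ide = len(IDE_DOMAINS)
    # The chromosome length never changes; only gene values do.
    return replace(ch, ide=tuple(genes[:n_ide]), beh=tuple(genes[n_ide:]))
```

Keeping one slot per attribute is what makes the chromosome fixed-length: crossover and mutation can act position by position without ever producing a malformed rule.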
• Fitness function: support × confidence × lift
• A latency period separates the action period of the behaviours from the observation period of the risk factor
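The fitness function can be computed from four patient counts. This sketch uses the standard definitions support = P(A ∧ C), confidence = P(C | A) and lift = confidence / P(C). How HASAR counts a patient as matching the antecedent or the consequent (for example, with respect to the time windows) is not specified on the slide, so the counts are taken as given.

```python
def fitness(n_total: int, n_antecedent: int, n_consequent: int, n_both: int) -> float:
    """support * confidence * lift of a rule A ⇒ C, from patient counts.

    n_total:      number of patients
    n_antecedent: patients matching A (IDE and BEH parts)
    n_consequent: patients matching C (the RF part)
    n_both:       patients matching A and C
    """
    if n_total == 0 or n_antecedent == 0 or n_consequent == 0:
        return 0.0  # the rule never fires or the consequent never occurs
    support = n_both / n_total
    confidence = n_both / n_antecedent
    lift = confidence / (n_consequent / n_total)
    return support * confidence * lift
```

For example, with 100 patients, 20 matching the antecedent, 25 matching the consequent and 10 matching both, support = 0.1, confidence = 0.5 and lift = 2.0, so the fitness is 0.1. The number of patients cancels out: the product reduces to n_both³ / (n_antecedent² × n_consequent).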
Genetic Algorithm Optimization
• A CLOSE-based approach for initialization
• The CLOSE algorithm improves:
  - extraction efficiency, by reducing the search space (use of generators and frequent closed itemsets)
  - result relevance, by suppressing redundant rules (generation of rule bases)
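The two notions CLOSE relies on, closed itemsets and their generators, can be illustrated by brute force. This sketch is not CLOSE itself, which explores generators level by level and prunes the search space; it simply enumerates every frequent itemset, computes its closure (the items shared by all transactions containing it), and keeps the minimal itemsets with each closure as generators. It is only suitable for tiny examples.

```python
from itertools import combinations


def closure(itemset: frozenset, transactions: list) -> frozenset:
    """Galois closure: the items shared by every transaction that contains itemset."""
    covering = [t for t in transactions if itemset <= t]
    if not covering:  # empty intersection convention: the whole item set
        return frozenset().union(*transactions)
    return frozenset(covering[0]).intersection(*covering[1:])


def closed_itemsets_and_generators(transactions: list, min_support: int) -> dict:
    """Map each frequent closed itemset to (support count, list of its generators).

    Brute force over all itemsets: fine for toy data, exponential in general.
    """
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(len(items) + 1):  # increasing size, so subsets come first
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            count = sum(1 for t in transactions if candidate <= t)
            if count < min_support:
                continue
            _, gens = result.setdefault(closure(candidate, transactions), (count, []))
            # Generator = minimal itemset with this closure: keep the candidate only
            # if no recorded generator is a proper subset of it.
            if not any(g < candidate for g in gens):
                gens.append(candidate)
    return result
```

On the four transactions {a, b, c}, {a, b}, {a, b, c} and {c} with a minimum support of 2, {a} and {b} are the two generators of the closed itemset {a, b} (support 3), while {a, b, c} (support 2) has the generators {a, c} and {b, c}.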
Results: Patient Class Comparison
• Best rules on PG versus NG and RG
• Best rules on CVD versus NCVD
Results: Initialization Methods
• Comparison on the RG group
Conclusion
• Different tendencies among groups
• Confirmation of prior medical knowledge
• Contradictions with some "assumptions"
• Further investigation with the assistance of medical experts
Future Research
• To analyse relationships between time windows and various risk factors
• To develop new evaluation criteria
• To integrate physicians’ prior knowledge
• To apply the HASAR approach to other temporal datasets