Identifying Interesting Association Rules with
Download
Report
Transcript Identifying Interesting Association Rules with
Identifying Interesting Association
Rules with Genetic Algorithms
Elnaz Delpisheh
York University
Department of Computer Science and Engineering
April 6, 2016
Data mining
Too much
data
•I = {i1,i2,...,in} is a set of items.
•D = {t1,t2,...,tn} is a transactional database.
•ti is a nonempty subset of I.
•An association rule is of the form AB, where
A and B are the itemsets, A⊂ I, B⊂ I, and
A∩B=∅ .
•Apriori algorithm is mostly used for association
rule mining.
•{milk, eggs}{bread}.
2
Data
Data Mining
Association
rules
Apriori Algorithm
TID
3
List of item IDs
T100
I1,I2,I3
T200
I2, I4
T300
I2, I3
T400
I1,I2,I4
T500
I1, I3
T600
I2, I3
T700
I1, I3
T800
I1, I2, I3, I5
T900
I1, I2, I3
Apriori Algorithm (Cont.)
4
Association rule mining
Too much
data
Data
Data Mining
Too many
association
rules
Association
rules
5
Interestingness criteria
Comprehensibility.
Conciseness.
Diversity.
Generality.
Novelty.
Utility.
...
6
Interestingness measures
Subjective measures
Data and the user’s prior knowledge are considered.
Comprehensibility, novelty, surprisingness, utility.
Objective measures
The structure of an association rule is considered.
Conciseness, diversity, generality, peculiarity.
Example: Support
It represents the generality of a rule.
It counts the number of transactions containing both A and B.
7
Drawbacks of objective measures
Detabase-dependence
Lack of knowledge about the database
Threshold dependence
Solution
Multiple database reanalysis
Problem
o Large number of disk I/O
Detabase-independence
8
Genetic algorithm-based learning
(ARMGA )
Initialize population
2. Evaluate individuals in population
3. Repeat until a stopping criteria is met
1.
Select individuals from the current population
B. Recombine them to obtain more individuals
C. Evaluate new individuals
D. Replace some or all the individuals of the current population
by off-springs
A.
4.
9
Return the best individual seen so far
ARMGA Modeling
Given an association rule XY
Requirement
Conf(XY) > Supp(Y)
Aim is to maximise
10
ARMGA Encoding
Michigan Strategy
Given an association k-rule XY, where X,Y⊂I, I is
a set of items I=i1,i2,..., in, and X∩Y=.
For example
{A1,...,Aj}{Aj+1,...,Ak}
11
ARMGA Encoding (Cont.)
The aforementioned encoding highly depends on the length
of the chromosome.
We use another type of encoding:
Given a set of items {A,B,C,D,E,F}
Association rule ACFB is encoded as follows
00A11B00C01D11E00F
00: Item is antecedent
11: Item is consequence
01/10: Item is absent
12
ARMGA Operators
Select
Crossover
Mutation
13
ARMGA Operators-Select
Select(c,ps): Acts as a filter of the chromosome
C: Chromosome
Ps: pre-specified probability
14
ARMGA Operators-Crossover
This operation uses a two-point strategy
15
ARMGA Operators-Mutate
16
ARMGA Initialization
17
ARMGA Algorithm
18
Empirical studies and Evaluation
Implement the entire procedure using Visual C++
Use WEKA to produce interesting association rules
Compare the results
19
20