Transcript LCS E2

LEARNING CLASSIFIER SYSTEMS
APPLICATIONS
CITS4404 – GROUP E2
Kieran Quirke-Brown - 20927193
Jason Seotis - 21877352
Sebastian Williams - 20919522
SEMINAR OUTLINE
• LCS Overview
• E-Fraud Detection
• Breast Cancer Grade Classification
OVERVIEW
• LCS is a rule-based machine learning algorithm
• Classifiers define the rules in an LCS
• 2 types of parameters
• Rule parameters – define additional information about each rule e.g. fitness
• System parameters - used to control the learning process e.g. population size and the
learning rate
CLASSIFIERS
• A classifier represents knowledge in the system and is made up of a rule and
its parameters
• The size of the population is defined by a system parameter, N, which
specifies the maximum number of classifiers in the system
CLASSIFIER STRUCTURE
[Diagram: anatomy of a classifier]
• Rule = Condition : Action, e.g. 00#10#11 : 01
• Rule Parameters, e.g. Fitness = 66, Experience = 5, ...
• Classifier = Rule + Rule Parameters
LCS STYLES
• Two main styles
• Pittsburgh
• Michigan
• Both applications implement Michigan-style
LCS TYPES
• Two main ways to calculate rule fitness
• Strength-based e.g. ZCS
• Accuracy-based e.g. XCS
• Both applications implement a continuous version of XCS called XCSR
• Uses interval-based representation instead of ternary representation
• A rule matches if the input parameter falls within the interval defined for that rule
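Since both applications rely on this interval matching, a minimal sketch may help (the rule format here, a list of (lower, upper) tuples with None standing in for the # wildcard, is an illustrative assumption):

```python
# Minimal sketch of XCSR interval matching; the rule format here
# (a list of (lower, upper) tuples, None as the # wildcard) is illustrative.
def matches(rule_intervals, inputs):
    """Return True if every input value falls inside its rule interval."""
    for interval, x in zip(rule_intervals, inputs):
        if interval is None:          # wildcard (#): matches any value
            continue
        lower, upper = interval
        if not (lower <= x <= upper):
            return False
    return True

# A two-attribute rule: first attribute must lie in [0.1, 0.3], second is free
rule = [(0.1, 0.3), None]
print(matches(rule, [0.2, 0.9]))  # True
print(matches(rule, [0.5, 0.9]))  # False
```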
XCSR OVERVIEW
[Diagram: the XCSR learning cycle]
• The Environment provides an input, e.g. (0.2, 0.6)
• The Population holds classifiers of the form Condition : Action with a fitness F, e.g. (0.1-0.3)(0.2-0.6): 1, F = 44
• The Match Set contains the classifiers whose conditions match the input, e.g. (0.1-0.3)(0.2-0.6): 1, F = 44; (0.1-0.2)(0.5-0.7): 0, F = 33; (#),(0.5-1.3): 0, F = 5
• The Prediction Array gives a prediction per action, e.g. Action 1 = 44, Action 0 = 38
• Action Selection (tournament or roulette-wheel selection, TS/RWS) picks an action and forms the Action Set, e.g. perform action = 1
• The Environment issues a reward and the rule parameters are updated
• A Genetic Algorithm evolves the population
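The prediction-array step of this cycle can be sketched as follows (the classifier tuples and their prediction/fitness values are hypothetical, and this uses the standard XCS fitness-weighted average):

```python
# Sketch of building a prediction array from a match set.
# Each classifier is a hypothetical (action, prediction, fitness) tuple;
# the system prediction per action is the fitness-weighted average.
def prediction_array(match_set):
    sums, fits = {}, {}
    for action, prediction, fitness in match_set:
        sums[action] = sums.get(action, 0.0) + prediction * fitness
        fits[action] = fits.get(action, 0.0) + fitness
    return {a: sums[a] / fits[a] for a in sums}

match_set = [(1, 44.0, 1.0), (0, 33.0, 0.5), (0, 43.0, 0.5)]
print(prediction_array(match_set))  # {1: 44.0, 0: 38.0}
```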
LEARNING TYPES
• Supervised Learning
- Comparing a system’s response to the expected response
- E.g. Breast Cancer Classification
• Reinforcement Learning
- Classifiers are updated using feedback from the environment
- E.g. E-Fraud Detection
E-FRAUD DETECTION
AN LCS IMPLEMENTATION BY MOHAMMAD BEHDAD
QUICK FACTS
WHAT PERCENTAGE OF AUSTRALIANS (15 YO AND OLDER) FELL VICTIM TO FRAUD IN 2014-15?
Approx. 8.5% ~ 1.6 million
FRAUDULENT ACTIVITIES COST AUSTRALIANS $1 BILLION IN 2010-11
THIS AMOUNT DOUBLED TO $2.1 BILLION IN 2014-15
TYPES OF E-FRAUD
• Phishing
• Spam
• Network intrusion
XCSR SYSTEM PARAMETERS
Symbol | Property | Value
N | Maximum population size | –
R | Actual reward | 0 – 1000
β | Learning rate | usually 0.2
ε0 | Minimum error | max(reward)/100
θGA | GA threshold | 25 – 50
θsub | Subsumption threshold | 20
θdel | Deletion threshold | 20
χ | Crossover probability | 0.5 – 1.0
μ | Mutation probability | 0.01 – 0.05
This table lists some of the properties defined for the system
XCSR RULE SETUP AND COVERING FUNCTION
• Using training data (Supervised)
• Start with no data (Reinforcement)
• Generate new classifiers with covering
REWARD AND LEARNING
• Learning is based on how close the prediction is to the actual reward
• Send out action as either 0 or 1 (not malicious or malicious)
• The reward returned is either 0 or max
• Update a number of parameters
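A hedged sketch of this parameter update, in the style of standard XCS (Widrow-Hoff deltas; the constants and the simplified accuracy step are assumptions, and the full XCS update also normalises accuracy across the action set, omitted here):

```python
# Sketch of updating a classifier's rule parameters after a reward.
# BETA is the learning rate (system parameter β); ALPHA, EPS0, NU are
# typical accuracy-function constants - all values here are illustrative.
BETA, ALPHA, EPS0, NU = 0.2, 0.1, 10.0, 5

def update_classifier(cl, reward):
    """cl holds 'p' (prediction), 'eps' (error) and 'F' (fitness)."""
    cl['p'] += BETA * (reward - cl['p'])                     # prediction -> reward
    cl['eps'] += BETA * (abs(reward - cl['p']) - cl['eps'])  # running error
    # accuracy is 1 below the minimum error, and decays above it
    kappa = 1.0 if cl['eps'] < EPS0 else ALPHA * (cl['eps'] / EPS0) ** -NU
    cl['F'] += BETA * (kappa - cl['F'])                      # fitness -> accuracy
    return cl

cl = {'p': 0.0, 'eps': 0.0, 'F': 0.5}
update_classifier(cl, 1000.0)   # prediction moves 20% of the way to the reward
```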
EVOLUTION
• Crossover
• a cut off point is chosen and
the bounds from two parents
are swapped
• Mutation
• The upper or lower bounds are
randomly mutated
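These interval operators could be sketched as follows (the one-point cut scheme, step size and rates are assumptions for illustration):

```python
import random

# Sketch of one-point crossover and bound mutation on interval conditions.
# A condition is a list of (lower, upper) intervals; all rates are illustrative.
def crossover(parent_a, parent_b):
    """Swap the interval bounds of two parents after a random cut-off point."""
    cut = random.randrange(1, len(parent_a))
    return parent_a[:cut] + parent_b[cut:], parent_b[:cut] + parent_a[cut:]

def mutate(condition, step=0.1, rate=0.05):
    """Randomly shift the upper or lower bound of some intervals."""
    result = []
    for lower, upper in condition:
        if random.random() < rate:
            if random.random() < 0.5:
                lower += random.uniform(-step, step)
            else:
                upper += random.uniform(-step, step)
        result.append((min(lower, upper), max(lower, upper)))  # keep bounds ordered
    return result
```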
7 DIFFICULTIES WHEN DETECTING
1. Class Imbalance Problem
2. Online Learning
3. Adaptive Adversaries
4. Concept drift
5. Unequal misclassification cost
6. Noise
7. Fast processing over large data sets
ADAPTIVE ADVERSARIES EXPERIMENT
• If the adversarial environment detects that the XCSR misclassifies a particular input, it
sends more of the same input
• This actually helped the XCSR correct the mistake
REAL WORLD EXPERIMENT
• KDD cup 1999 data
• Based on 41 attributes of continuous and discrete data
• Length of connection, number of urgent packets,
number of “compromised” conditions…
• Types of attacks:
• 22 in training
• 39 in testing
Cost per Example (CPE)

CPE = (1/N) Σ_{i=1..m} Σ_{j=1..m} CM(i, j) × C(i, j)

where N is the total number of examples, CM is the confusion matrix and C is the
cost of misclassification.
Example confusion matrix
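A small worked example of the CPE formula, using an illustrative 2x2 confusion matrix and cost matrix:

```python
# Cost per Example: average misclassification cost over all N examples,
# where cm[i][j] counts class-i examples predicted as class j and
# cost[i][j] is the (illustrative) cost of that mistake.
def cost_per_example(cm, cost):
    n = sum(sum(row) for row in cm)                    # N = total examples
    total = sum(cm[i][j] * cost[i][j]
                for i in range(len(cm)) for j in range(len(cm)))
    return total / n

cm = [[90, 10],    # 90 correct, 10 misclassified from class 0
      [5, 95]]     # 5 misclassified, 95 correct from class 1
cost = [[0, 1],    # correct predictions cost nothing
        [2, 0]]
print(cost_per_example(cm, cost))  # (10*1 + 5*2) / 200 = 0.1
```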
EXPERIMENTAL RESULTS
• A subset of the entire dataset was used to reduce run times
• With on-line learning, the training data was run through only once
KNOWLEDGE DISCOVERY FOR BREAST CANCER
RESEARCH
A DATA MINING APPLICATION FOR XCS
QUICK FACTS
HOW MANY WOMEN IN AUSTRALIA WERE DIAGNOSED WITH BREAST CANCER IN 2012?
15,166
BREAST CANCER ACCOUNTS FOR APPROX. 27% OF CANCER IN WOMEN
EARLY DETECTION HAS SHOWN A 5-YEAR SURVIVAL RATE CLOSE TO 100%
THE APPLICATION
• An XCS was applied to a dataset of primary breast cancer patients for the
purpose of knowledge discovery.
• The goal was not to teach the XCS how to diagnose future patients but to find
patterns and relationships in the existing data.
• These discoveries could then be passed on to breast cancer researchers and
improve future diagnosis.
THE DATASET
• 45 attributes collected on each patient, collectively describing the status of
the breast cancer in the patient.
• 1150 cases were taken from the dataset and fed into the XCS
• Purpose of XCS is to find relationships between the grade of the cancer and
the 44 other attributes.
3 TYPES OF ATTRIBUTES, 45 TOTAL PER PATIENT

1) Numerical Attributes
No. | Attribute Name | Example
1 | Grade | G2
2 | Age | 57
3 | Tumour Size | 2.345
4 | Tumour Weight | 40
5 | DCIS extent beyond invasive tumour | 24
6 | DCIS Margin | 12
7 | Size of DCIS Invasive | 54
8 | Nodes examined total | 7
9 | Involved nodes total | 14
10 | Immuno pS2 score | 115
11 | Immuno ER H Score | 245
12 | Immuno PR score | 195
13 | Tubule Formation Score | 2
14 | Nuclear Pleomorphism Score | 1
15 | Mitotic Figure Score | 3

2) Boolean Attributes
No. | Attribute Name | Example
16 | Multifocal? | TRUE
17 | Multicentric? | FALSE
18 | DCIS Necrosis? | FALSE
19 | DCIS component? | FALSE
20 | Extra Nodal Invasion? | TRUE
21 | LCIS component? | TRUE
22 | Immuno Done? | FALSE
23 | Immuno pS2 pos? | FALSE
24 | Immuno ER pos? | FALSE
25 | Immuno PR pos? | FALSE
26 | Immuno C erb B2 pos? | TRUE
27 | Immuno Bcl 2 pos? | TRUE
28 | Final Operation? | FALSE

3) Categorical Attributes
No. | Attribute Name | Example
29 | Excision Margin score | C
30 | Core Biopsy B Code | B3
31 | Primary Type | C50.0
32 | Specimen type | Mastectomy
33 | Involved Margin which | Medial
34 | DCIS Nuclear Grade | High
35 | Laterality | B
36 | Histology | M-82113M
37 | Histology Calcification | Benign
38 | Synchronous Tumour | N
39 | Lympho Vascular Invasion | X
40 | Immuno Bcl 2 Strength | Moderate
41 | Immuno C erb B2 Strength | Strong 3
42 | Pathological M Stage | pM0
43 | Pathological N Stage | pN3a
44 | Pathological T Stage | pT1a
45 | DCIS type | Solid and Micropapillary
STRUCTURE OF THE INVESTIGATION
[Diagram: the knowledge discovery system]
Frenchay Breast Cancer dataset → Data Preparation → XCS Algorithm → Rule Compaction → Knowledge Evaluation
DATA PREPARATION
• Class Imbalance Problem
• Not all three grades were represented equally in the dataset, which potentially
hinders the learning ability of the XCS
• The dataset was oversampled to solve this problem (cases of G1 and G3 were
selected at random and replicated until all class representations were equal)
[Chart: representation of each grade (G1, G2, G3) in the dataset, as a % of cases]
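The random oversampling described above could be sketched like this (the function and label names are illustrative):

```python
import random

# Sketch of random oversampling: minority classes are replicated at random
# until every class has as many cases as the largest one.
def oversample(cases, label_of):
    by_label = {}
    for case in cases:
        by_label.setdefault(label_of(case), []).append(case)
    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)                # keep the original cases
        balanced.extend(random.choice(group)  # add random replicas
                        for _ in range(target - len(group)))
    return balanced

cases = [('case1', 'G1'), ('case2', 'G2'), ('case3', 'G2'), ('case4', 'G2')]
balanced = oversample(cases, lambda c: c[1])  # G1 is replicated up to 3 cases
```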
DATA PREPARATION (CONT.)
• Missing Data Problem
• Data could be missing for two reasons:
1) The data was not available when the database was created (e.g. specimen type)
2) The existence of one attribute depends on another attribute taking a specific value
(e.g. if Histology = M85203 then the DCIS attribute is not applicable)
• Solved using the Wild-to-Wild mechanism: missing attributes are replaced by #, so they are
essentially not considered when creating the match set.
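In code, the wild-to-wild idea amounts to encoding a missing value as the wildcard (a minimal sketch; None marks a missing attribute here):

```python
# Sketch of the wild-to-wild mechanism: a missing attribute becomes '#',
# so it can never stop a rule from entering the match set.
def encode_case(raw_values):
    return ['#' if value is None else value for value in raw_values]

print(encode_case([0.4, None, 1]))  # [0.4, '#', 1]
```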
DATA PREPARATION (CONT.)
• Encoding
• Numerical Attributes normalised between 0 and 1:

x' = (x − min(X)) / (max(X) − min(X))

• Boolean Attributes = {0, 1, #}
• Categorical Attributes with n categories = {0, 1, … , n-1, #}
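The min-max normalisation above, sketched over a list of values:

```python
# Min-max normalisation: x' = (x - min(X)) / (max(X) - min(X)),
# mapping every numerical attribute into [0, 1].
def normalise(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

print(normalise([57, 30, 84]))  # [0.5, 0.0, 1.0]
```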
THE XCS ALGORITHM
• Algorithm used was a typical XCS
• Fitness based on accuracy of payoff prediction rather than payoff itself
• GA acts on action set instead of population
• Previous action set is not stored (application is a one step
problem)
• Reward based on if the rules in the action set correctly diagnosed
the cancer (supervised learning).
• Roulette wheel selection
• Population started empty
THE XCS ALGORITHM (CONT.)
• Rule conditions catered for the three data types
• Intervals were used when matching numerical attributes
• E.g.: (0.0 – 0.9)(#)(1)(3) … (#)(0.4 – 0.6): G2
• System parameters were chosen empirically
N = 10,000 (max population size; not always full)
P# = 0.75 (probability of inserting a # during covering)
…
COMPACTION PROCESS
Too many rules to be interpreted by a human.
• Rule Filtering: weak rules removed
• Rule Clustering:
• Group similar rules together in clusters in an attempt to generalise
• The QT-Clust algorithm was chosen for this application
• Each cluster is then turned into one rule using an Aggregate Average Rule
KNOWLEDGE EVALUATION
• Compacted Ruleset was presented to a domain expert, in this
case a consultant pathologist in the domain of primary breast
cancer
Example entry from the evaluation table:
Rule: IF Immuno ER pos = TRUE AND Mitotic Figure Score <= 1 THEN Grade = G1
Usefulness (1–5): 1
Contradiction (Y/N): N
Interesting/new knowledge (Y/N): N
Remarks: …
RESULTS
XCS (2901 rules) → Rule Clustering (300 rules) → Knowledge Evaluation
• No. of rules with usefulness = 4: 10
• No. of rules deemed to be interesting: 9
• No. of rules deemed to contradict current knowledge: 0
QUESTIONS?
REFERENCES
[1] Kharbat, F. et al. (2008). Knowledge Discovery from Medical Data: An Empirical Study with XCS.
Studies in Computational Intelligence (SCI) 125, 93–121
[2] Batista, G., Prati, R., Monard, M. (2004). A study of the behaviour of several methods for balancing
machine learning training data. SIGKDD Explorations, 6(1), 20–29
[3] Holmes, J., Sager, J., Bilker, W. (2004). A comparison of three methods for covering missing data in
XCS. In: 7th International Workshop on Learning Classifier Systems (IWLCS-2004)
[4] Butz, M., Wilson, S.W. (2001). An algorithmic description of XCS. In: Advances in Learning Classifier
Systems, Proceedings of the Third International Conference (IWLCS 2000). Springer, Berlin Heidelberg
New York, pp. 253–272
[5] Heyer, L., Kruglyak, S., Yooseph, S. (1999). Exploring expression data: identification and analysis of
coexpressed genes. Genome Research, 9(11), 1106–1115
[6] Behdad, M. (2013). Application of Learning Classifier Systems to Fraud Detection Problems.
University of Western Australia, School of Computer Science and Software Engineering
[7] KDD Cup 1999 Data. http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html. Information and
Computer Science, University of California, Irvine. October 28, 1999