Ensemble Methods


Examples of Ensemble Methods

How to generate an ensemble of classifiers?
– Bagging
– Boosting
© Tan, Steinbach, Kumar, Introduction to Data Mining, 4/18/2004
Bagging

Sampling with replacement
Original Data:      1   2   3   4   5   6   7   8   9   10
Bagging (Round 1):  7   8   10  8   2   5   10  10  5   9
Bagging (Round 2):  1   4   9   1   2   3   2   7   3   2
Bagging (Round 3):  1   8   5   10  5   5   9   6   3   7
Build a classifier on each bootstrap sample.
At each draw, every example has probability 1/N of being selected, so the probability that a given example is never selected in N draws is (1 − 1/N)^N, which approaches 1/e ≈ 0.368 for large N.
Consequently, the probability that an example appears at least once is about 1 − 1/e ≈ 0.632, so a bootstrap sample contains roughly 63% of the distinct original examples on average.
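To make the sampling concrete, here is a minimal bagging sketch in Python (illustrative code, not from the book; the synthetic data and function names such as bagging_fit are placeholders), using scikit-learn decision trees as the base classifiers:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, n_rounds=3, seed=0):
    """Fit one base classifier per bootstrap sample (sampling with replacement)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    models = []
    for _ in range(n_rounds):
        idx = rng.integers(0, n, size=n)  # N draws with replacement, each index has prob. 1/N
        models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return models

def bagging_predict(models, X):
    """Majority vote over the ensemble (assumes integer class labels)."""
    votes = np.stack([m.predict(X) for m in models])
    return np.array([np.bincount(col).argmax() for col in votes.T])

# Tiny synthetic run:
X = np.arange(10).reshape(-1, 1)
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
models = bagging_fit(X, y)
print(bagging_predict(models, X))

# Numerical check of the 1/e argument: the chance an example is never drawn
# in N tries is (1 - 1/N)**N, which approaches 1/e ≈ 0.368 as N grows.
N = 10_000
print((1 - 1 / N) ** N, np.exp(-1))
```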
Boosting

An iterative procedure that adaptively changes the distribution of the training data by focusing more on previously misclassified records:
– Initially, all N records are assigned equal weights
– Unlike bagging, the weights may change at the end of each boosting round
Boosting
Records that are wrongly classified will have their weights increased.
Records that are classified correctly will have their weights decreased.
Original Data:       1   2   3   4   5   6   7   8   9   10
Boosting (Round 1):  7   3   2   8   7   9   4   10  6   3
Boosting (Round 2):  5   4   9   4   2   5   1   7   4   2
Boosting (Round 3):  4   4   8   10  4   5   4   6   3   4
• Example 4 is hard to classify.
• Its weight is increased, so it is more likely to be chosen again in subsequent rounds: it appears once in Round 1, three times in Round 2, and five times in Round 3 (see the sketch below).
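A minimal sketch of this mechanism (illustrative values only; the tripling factor below is made up, the real factor comes from AdaBoost's update rule on the next slides):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10
weights = np.full(n, 1.0 / n)   # initially, all N records get equal weight

weights[3] *= 3.0               # suppose record 4 (index 3) was misclassified
weights /= weights.sum()        # renormalize so the weights form a distribution

# Draw the next round's training set according to the weights: record 4
# now appears more often than it would under uniform sampling.
next_round = rng.choice(n, size=n, replace=True, p=weights) + 1  # 1-based record ids
print(next_round)
```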
Example: AdaBoost

Base classifiers: $C_1, C_2, \ldots, C_T$

Error rate:

$$\varepsilon_i = \frac{1}{N} \sum_{j=1}^{N} w_j \,\delta\!\left( C_i(x_j) \neq y_j \right)$$

where $\delta(\cdot)$ equals 1 if its argument is true and 0 otherwise.
Importance of a classifier:

$$\alpha_i = \frac{1}{2} \ln\!\left( \frac{1 - \varepsilon_i}{\varepsilon_i} \right)$$
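As a sketch of these two formulas (the toy arrays and the helper name error_and_importance below are placeholders), note that with ten points of weight 0.1 and two mistakes, the slide's convention reproduces the α = 1.9459 that appears in the illustration later on:

```python
import numpy as np

def error_and_importance(w, y_true, y_pred):
    """Weighted error eps_i and importance alpha_i of one base classifier,
    following the slide's convention eps_i = (1/N) * sum_j w_j * delta(...)."""
    eps = np.sum(w * (y_pred != y_true)) / len(w)
    alpha = 0.5 * np.log((1.0 - eps) / eps)
    return eps, alpha

# Toy check: 10 records with weight 0.1 each, two of them misclassified.
w = np.full(10, 0.1)
y_true = np.zeros(10, dtype=int)
y_pred = y_true.copy()
y_pred[:2] = 1                                   # two wrong predictions
print(error_and_importance(w, y_true, y_pred))   # eps = 0.02, alpha ≈ 1.9459
```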
Example: AdaBoost

Weight update:

$$w_i^{(j+1)} = \frac{w_i^{(j)}}{Z_j} \times \begin{cases} e^{-\alpha_j} & \text{if } C_j(x_i) = y_i \\ e^{\alpha_j} & \text{if } C_j(x_i) \neq y_i \end{cases}$$

where $Z_j$ is the normalization factor.
If any intermediate round produces an error rate higher than 50%, the weights are reverted back to 1/N and the resampling procedure is repeated.
Classification:

$$C^{*}(x) = \arg\max_{y} \sum_{j=1}^{T} \alpha_j \,\delta\!\left( C_j(x) = y \right)$$
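Putting the update rule and the final vote together, here is a minimal AdaBoost sketch (illustrative, not the book's code; the function names are placeholders). It uses scikit-learn decision stumps as the base classifiers and labels in {−1, +1}, and it follows the common convention where the weights sum to 1 and the weighted error is computed directly as Σ w_j δ(C_j(x_i) ≠ y_i), which folds the slide's 1/N factor into the weights:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, T=3, seed=0):
    """Train up to T base classifiers on weighted resamples; y in {-1, +1}."""
    rng = np.random.default_rng(seed)
    n = len(X)
    w = np.full(n, 1.0 / n)            # initially, equal weights
    models, alphas = [], []
    for _ in range(T):
        idx = rng.choice(n, size=n, replace=True, p=w)   # resample by weight
        stump = DecisionTreeClassifier(max_depth=1).fit(X[idx], y[idx])
        pred = stump.predict(X)
        eps = np.sum(w * (pred != y))   # weighted error over all records
        if eps >= 0.5:                  # slide's rule: revert to 1/N and redo
            w = np.full(n, 1.0 / n)
            continue
        alpha = 0.5 * np.log((1.0 - eps) / eps)
        w = w * np.exp(-alpha * y * pred)   # decrease if correct, increase if wrong
        w /= w.sum()                        # Z_j: normalize back to a distribution
        models.append(stump)
        alphas.append(alpha)
    return models, alphas

def adaboost_predict(models, alphas, X):
    """C*(x): sign of the alpha-weighted vote (two-class case)."""
    scores = sum(a * m.predict(X) for m, a in zip(models, alphas))
    return np.sign(scores)

# Tiny synthetic run:
X = np.arange(10).reshape(-1, 1)
y = np.array([1, 1, 1, -1, -1, -1, -1, 1, 1, 1])
models, alphas = adaboost_fit(X, y, T=3)
print(adaboost_predict(models, alphas, X))
```

Resampling by weight, as in the tables above, is one variant; scikit-learn's AdaBoostClassifier instead passes the weights to the base learner as sample weights, which avoids sampling noise.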
Illustrating AdaBoost
[Figure: AdaBoost on a one-dimensional toy data set of + and − points. Each data point starts with weight 0.1. Boosting Round 1 (B1) produces region weights 0.0094 / 0.0094 / 0.4623 with α = 1.9459; Round 2 (B2) produces 0.0009 / 0.3037 / 0.0422 with α = 2.9323; Round 3 (B3) produces 0.0276 / 0.1819 / 0.0038 with α = 3.8744. The overall classifier is the α-weighted combination of the three rounds.]
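As a sanity check on the figure's numbers (pure arithmetic, assuming only the importance formula above), inverting α = ½ ln((1 − ε)/ε) recovers each round's weighted error rate:

```python
import math

# eps = 1 / (1 + exp(2 * alpha)) inverts alpha = 0.5 * ln((1 - eps) / eps)
for rnd, alpha in enumerate([1.9459, 2.9323, 3.8744], start=1):
    eps = 1.0 / (1.0 + math.exp(2.0 * alpha))
    print(f"Round {rnd}: alpha = {alpha:.4f} -> eps = {eps:.4f}")
# Round 1 gives eps = 0.02: the misclassified points carried 2% of the weight.
```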