Ensemble Methods
Examples of Ensemble Methods
How to generate an ensemble of classifiers?
– Bagging
– Boosting
© Tan, Steinbach, Kumar, Introduction to Data Mining, 4/18/2004
Bagging
Sampling with replacement
Original Data:       1   2   3   4   5   6   7   8   9  10
Bagging (Round 1):   7   8  10   8   2   5  10  10   5   9
Bagging (Round 2):   1   4   9   1   2   3   2   7   3   2
Bagging (Round 3):   1   8   5  10   5   5   9   6   3   7
Build a classifier on each bootstrap sample
In each of the N draws, every example has probability 1/N of being selected
The probability that a given example is not selected in any of the N draws is (1 − 1/N)^N, which approaches 1/e ≈ 0.368 for large N
The probability that an example is selected at least once is therefore 1 − 1/e ≈ 0.632
On average, a bootstrap sample thus contains about 63.2% of the distinct examples in the original data (see the sketch below)
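To make this concrete, here is a minimal Python sketch of bagging (not from the original slides; the scikit-learn decision tree and names such as bagging_fit and n_rounds are illustrative choices). It draws bootstrap samples with replacement, builds one classifier per sample, and predicts by majority vote; the last lines check the 63.2% figure empirically.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, n_rounds=10, seed=0):
    # One base classifier per bootstrap sample (sampling with replacement)
    rng = np.random.default_rng(seed)
    n = len(X)
    models = []
    for _ in range(n_rounds):
        idx = rng.integers(0, n, size=n)   # n draws; each example has prob. 1/n per draw
        models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return models

def bagging_predict(models, X):
    # Majority vote over the ensemble (assumes non-negative integer class labels)
    votes = np.stack([m.predict(X) for m in models])
    return np.array([np.bincount(col).argmax() for col in votes.T])

# Empirical check of the 63.2% figure:
n = 100_000
idx = np.random.default_rng(0).integers(0, n, size=n)
print(len(np.unique(idx)) / n)   # about 0.632, i.e. 1 - 1/e of the examples are distinct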
© Tan,Steinbach, Kumar
Introduction to Data Mining
4/18/2004
‹#›
Boosting
An iterative procedure that adaptively changes the distribution of the training data by focusing more on previously misclassified records
– Initially, all N records are assigned equal weights
– Unlike bagging, the weights may change at the end of each boosting round
Boosting
Records that are wrongly classified will have their
weights increased
Records that are classified correctly will have
their weights decreased
Original Data:       1   2   3   4   5   6   7   8   9  10
Boosting (Round 1):  7   3   2   8   7   9   4  10   6   3
Boosting (Round 2):  5   4   9   4   2   5   1   7   4   2
Boosting (Round 3):  4   4   8  10   4   5   4   6   3   2
• Example 4 is hard to classify
• Its weight is increased, so it is more likely to be chosen again in subsequent rounds; in Round 3 above it appears four times (a small sketch of this effect follows)
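Here is that sketch in Python (the weight values are invented for illustration): drawing the next round's training set in proportion to the current weights makes the up-weighted record appear far more often.

import numpy as np

rng = np.random.default_rng(0)
records = np.arange(1, 11)        # records 1..10
weights = np.full(10, 0.1)
weights[3] *= 5                   # suppose record 4 was misclassified
weights /= weights.sum()          # renormalize into a sampling distribution

# record 4 now has probability ~0.36 per draw vs. ~0.07 for the others
print(rng.choice(records, size=10, replace=True, p=weights))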
Example: AdaBoost
Base classifiers: $C_1, C_2, \ldots, C_T$

Error rate of classifier $C_i$:

$$\varepsilon_i = \frac{1}{N} \sum_{j=1}^{N} w_j \, \delta\big(C_i(x_j) \neq y_j\big)$$

Importance of a classifier:

$$\alpha_i = \frac{1}{2} \ln\!\left(\frac{1 - \varepsilon_i}{\varepsilon_i}\right)$$
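In code, these two quantities might look as follows (a sketch; it assumes the weight vector is kept normalized so it sums to 1, which plays the role of the slide's 1/N factor when training starts from equal weights):

import numpy as np

def weighted_error(w, y_pred, y_true):
    # epsilon_i: total weight of the misclassified records (w.sum() == 1 assumed)
    return float(np.sum(w * (y_pred != y_true)))

def importance(eps):
    # alpha_i = 0.5 * ln((1 - eps) / eps): large when eps is small,
    # zero at eps = 0.5 (no better than random guessing)
    return 0.5 * np.log((1 - eps) / eps)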
Example: AdaBoost
Weight update:

$$w_i^{(j+1)} = \frac{w_i^{(j)}}{Z_j} \times \begin{cases} \exp(-\alpha_j) & \text{if } C_j(x_i) = y_i \\ \exp(\alpha_j) & \text{if } C_j(x_i) \neq y_i \end{cases}$$

where $Z_j$ is the normalization factor
If any intermediate round produces an error rate higher than 50%, the weights are reverted to 1/N and the resampling procedure is repeated
Classification:

$$C^*(x) = \arg\max_{y} \sum_{j=1}^{T} \alpha_j \, \delta\big(C_j(x) = y\big)$$
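Putting the pieces together, here is a compact sketch of the full loop under two illustrative assumptions the slides leave open: binary labels in {-1, +1} (so the arg max reduces to a sign), and scikit-learn decision stumps as base classifiers.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, T=10, seed=0):
    # y is assumed to take values in {-1, +1}
    rng = np.random.default_rng(seed)
    N = len(X)
    w = np.full(N, 1.0 / N)
    models, alphas = [], []
    while len(models) < T:
        idx = rng.choice(N, size=N, replace=True, p=w)      # resample by weight
        stump = DecisionTreeClassifier(max_depth=1).fit(X[idx], y[idx])
        pred = stump.predict(X)
        eps = np.sum(w * (pred != y))
        if eps > 0.5:                      # error above 50%: revert to 1/N and resample
            w = np.full(N, 1.0 / N)
            continue
        alpha = 0.5 * np.log((1 - eps) / max(eps, 1e-10))   # guard against eps == 0
        w = w * np.exp(-alpha * y * pred)  # exp(-alpha) if correct, exp(+alpha) if not
        w /= w.sum()                       # divide by Z_j, the normalization factor
        models.append(stump)
        alphas.append(alpha)
    return models, alphas

def adaboost_predict(models, alphas, X):
    # Sign of the importance-weighted vote: the arg max rule for {-1, +1} labels
    return np.sign(sum(a * m.predict(X) for a, m in zip(alphas, models)))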
Illustrating AdaBoost
[Figure: the ten training points (labeled + + + − − − − − + +) start with equal weight 0.1 each. After Boosting Round 1, classifier B1 leaves the correctly classified points at weight 0.0094 and raises the misclassified ones to 0.4623; its importance is α = 1.9459.]
Illustrating AdaBoost
[Figure: the three boosting rounds and the combined result.
Round 1 (B1): point weights 0.0094 and 0.4623, α = 1.9459.
Round 2 (B2): point weights 0.0009, 0.0422, and 0.3037, α = 2.9323.
Round 3 (B3): point weights 0.0038, 0.0276, and 0.1819, α = 3.8744.
Overall: the α-weighted vote of B1, B2, and B3 reproduces the original labeling (+ + + − − − − − + +).]