Ensemble Methods


Ensemble Methods

An ensemble method constructs a set of base classifiers from the training data.
– Also called ensemble learning or classifier combination.

It predicts the class label of previously unseen records by aggregating the predictions made by the multiple classifiers.
General Idea
[Figure: general idea of an ensemble. Step 1: create multiple data sets D1, D2, ..., Dt-1, Dt from the original training data D. Step 2: build a classifier C1, C2, ..., Ct-1, Ct on each data set. Step 3: combine the classifiers into a single ensemble classifier C*.]
Why does it work?
Suppose there are 25 base classifiers.
– Each classifier has error rate ε = 0.35.
– Assume the classifiers are independent.
– The ensemble (majority vote) is wrong only when at least 13 of the 25 base classifiers are wrong, so the probability that the ensemble classifier makes a wrong prediction is:

$$\sum_{i=13}^{25} \binom{25}{i}\,\varepsilon^{i}\,(1-\varepsilon)^{25-i} \approx 0.06$$
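The 0.06 figure can be checked directly; a quick calculation in Python (not part of the original slides):

    from math import comb

    # Probability that at least 13 of 25 independent base classifiers,
    # each with error rate eps = 0.35, are wrong at the same time.
    eps = 0.35
    p_ensemble_error = sum(comb(25, i) * eps**i * (1 - eps)**(25 - i)
                           for i in range(13, 26))
    print(round(p_ensemble_error, 3))  # about 0.06

The individual error rate of 0.35 drops to about 0.06 under majority voting, which is why the independence assumption matters: strongly correlated errors would not cancel out this way.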
Methods
By manipulating the training data set: a classifier is built on a sampled subset of the training data.
– Two such ensemble methods: bagging (bootstrap aggregating) and boosting.
Characteristics
Ensemble methods work better with unstable classifiers.
– Base classifiers that are sensitive to minor perturbations in the training set, for example decision trees and ANNs.
– The variability among training examples is one of the primary sources of error in a classifier.
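To make "unstable" concrete, a small illustration (not from the original slides; the toy data and scikit-learn decision trees are assumptions of this sketch): two trees grown on nearly identical training sets can still disagree on a noticeable fraction of new points.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)
    X = rng.uniform(size=(100, 2))
    y = (X[:, 0] + X[:, 1] > 1).astype(int)
    y[rng.random(100) < 0.1] ^= 1          # add 10% label noise

    # Two trees trained on slightly different subsets of the same data
    t1 = DecisionTreeClassifier(random_state=0).fit(X[:95], y[:95])
    t2 = DecisionTreeClassifier(random_state=0).fit(X[5:], y[5:])

    grid = rng.uniform(size=(1000, 2))
    disagreement = np.mean(t1.predict(grid) != t2.predict(grid))
    print(f"trees disagree on {disagreement:.1%} of new points")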
Bias-Variance Decomposition
Consider the trajectories of a projectile launched with a particular force f at a particular angle θ.
– Suppose the target is t, but the projectile hits at x, at distance d away from t.
– The observed distance can be divided into three components:

$$d_{f,\theta}(y, t) = \text{Bias}_{\theta} + \text{Variance}_{f} + \text{Noise}_{t}$$
Two Decision Trees (1)
Two Decision Trees (2)

Bias: the stronger the assumptions a classifier makes about the nature of its decision boundary, the larger the classifier's bias will be.
– A smaller tree makes a stronger assumption.
– If the bias is too large, the algorithm cannot learn the target.

Variance: variability in the training data affects the expected error, because different compositions of the training set may lead to different decision boundaries.

Intrinsic noise in the target class:
– The target class for some domains can be nondeterministic.
– The same attribute values can appear with different class labels.
Bagging
Sampling with replacement
Original Data:       1   2   3   4   5   6   7   8   9  10
Bagging (Round 1):   7   8  10   8   2   5  10  10   5   9
Bagging (Round 2):   1   4   9   1   2   3   2   7   3   2
Bagging (Round 3):   1   8   5  10   5   5   9   6   3   7
Build a classifier on each bootstrap sample.

Each training example has probability 1 − (1 − 1/n)^n of being selected. When n is large, a bootstrap sample Di contains about 63.2% of the original training data.
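The 63.2% figure is just 1 − 1/e; a quick empirical check (not part of the original slides):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 10_000
    sample = rng.integers(0, n, size=n)     # bootstrap: draw n indices with replacement
    unique_fraction = len(np.unique(sample)) / n
    print(f"{unique_fraction:.3f}")         # about 0.632, i.e. 1 - (1 - 1/n)**n -> 1 - 1/e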
Bagging Algorithm
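A minimal sketch of the bagging procedure (not the slide's own pseudocode; it assumes scikit-learn decision stumps as the base classifiers, majority voting, and non-negative integer class labels):

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def bagging_predict(X_train, y_train, X_test, n_rounds=10, seed=0):
        """Train one base classifier per bootstrap sample, then majority-vote."""
        rng = np.random.default_rng(seed)
        n = len(X_train)
        votes = []
        for _ in range(n_rounds):
            idx = rng.integers(0, n, size=n)   # bootstrap sample (with replacement)
            base = DecisionTreeClassifier(max_depth=1).fit(X_train[idx], y_train[idx])
            votes.append(base.predict(X_test))
        votes = np.array(votes)                # shape: (n_rounds, n_test)
        # Majority vote over the base classifiers for each test record
        return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)

Any unstable base learner can be substituted for the stump; the bootstrap sampling and the vote are the parts that define bagging.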
A Bagging Example (1)
Consider a one-level binary decision tree x <= k, where k is a split point chosen to minimize the entropy.

Without bagging, the best decision stump is:
– x <= 0.35 or x >= 0.75
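Finding such a split point can be sketched in a few lines (hypothetical helper, not from the slides; candidate thresholds are taken at the observed x values for simplicity, and labels are assumed to be 0/1):

    import numpy as np

    def entropy(y):
        """Entropy of a 0/1 label array."""
        if len(y) == 0:
            return 0.0
        p = np.mean(y)
        if p in (0.0, 1.0):
            return 0.0
        return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

    def best_stump_split(x, y):
        """Threshold k of the stump 'x <= k' with the lowest weighted entropy."""
        best_k, best_h = None, float("inf")
        for k in np.unique(x):
            left, right = y[x <= k], y[x > k]
            h = (len(left) * entropy(left) + len(right) * entropy(right)) / len(y)
            if h < best_h:
                best_k, best_h = k, h
        return best_k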
A Bagging Example (2)
A Bagging Example (3)
A Bagging Example (4)
Summary on Bagging
Bagging improves generalization error by reducing the variance of the base classifiers.

Bagging depends on the stability of the base classifier.
– If a base classifier is unstable, bagging helps to reduce the errors associated with random fluctuations in the training data.
– If a base classifier is stable, the error of the ensemble is primarily caused by bias in the base classifier, and bagging may even make the error larger, because each bootstrap sample contains only about 63% of the distinct training examples.
Boosting
An iterative procedure that adaptively changes the distribution of the training data by focusing more on previously misclassified records.
– Initially, all N records are assigned equal weights.
– Unlike in bagging, the weights may change at the end of each boosting round.
– The weights can be used by a base classifier to learn a model that is biased toward higher-weight examples.
Boosting
Records that are wrongly classified will have their weights increased.

Records that are classified correctly will have their weights decreased.
Original Data:        1   2   3   4   5   6   7   8   9  10
Boosting (Round 1):   7   3   2   8   7   9   4  10   6   3
Boosting (Round 2):   5   4   9   4   2   5   1   7   4   2
Boosting (Round 3):   4   4   8  10   4   5   4   6   3   4
• Example 4 is hard to classify.
• Its weight is increased; therefore it is more likely to be chosen again in subsequent rounds.
Example: AdaBoost

Base classifiers: C1, C2, …, CT

Error rate of base classifier Ci (δ is 1 if its argument is true and 0 otherwise):

$$\varepsilon_i = \sum_{j=1}^{N} w_j\,\delta\!\left(C_i(x_j) \neq y_j\right)$$

Importance of a classifier:

$$\alpha_i = \frac{1}{2}\ln\!\left(\frac{1-\varepsilon_i}{\varepsilon_i}\right)$$
Example: AdaBoost

Weight update:

$$w_i^{(j+1)} = \frac{w_i^{(j)}}{Z_j} \times \begin{cases} \exp(-\alpha_j) & \text{if } C_j(x_i) = y_i \\ \exp(\alpha_j) & \text{if } C_j(x_i) \neq y_i \end{cases}$$

where $Z_j$ is the normalization factor that ensures $\sum_i w_i^{(j+1)} = 1$.
If any intermediate round produces an error rate higher than 50%, the weights are reverted to 1/N and the resampling procedure is repeated.

Classification:

$$C^{*}(x) = \arg\max_{y} \sum_{j=1}^{T} \alpha_j\,\delta\!\left(C_j(x) = y\right)$$
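Putting these pieces together, a minimal AdaBoost sketch (an illustration under stated assumptions, not the slides' reference implementation: two-class labels coded as -1/+1 and scikit-learn decision stumps as base classifiers):

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def adaboost_train(X, y, T=10, seed=0):
        """y is assumed to be coded as -1/+1."""
        rng = np.random.default_rng(seed)
        n = len(X)
        w = np.full(n, 1.0 / n)                    # initial weights: 1/N each
        stumps, alphas = [], []
        for _ in range(T):
            idx = rng.choice(n, size=n, replace=True, p=w)   # resample by weight
            stump = DecisionTreeClassifier(max_depth=1).fit(X[idx], y[idx])
            pred = stump.predict(X)
            err = np.sum(w * (pred != y))          # weighted error rate
            if err > 0.5:                          # revert weights and resample again
                w = np.full(n, 1.0 / n)
                continue
            alpha = 0.5 * np.log((1.0 - err) / max(err, 1e-10))
            w = w * np.exp(-alpha * y * pred)      # shrink correct, grow wrong
            w = w / w.sum()                        # Z_j normalization
            stumps.append(stump)
            alphas.append(alpha)
        return stumps, alphas

    def adaboost_predict(stumps, alphas, X):
        # Weighted vote: sign of sum_j alpha_j * C_j(x) for -1/+1 labels
        scores = sum(a * s.predict(X) for a, s in zip(alphas, stumps))
        return np.sign(scores)

For more than two classes, the argmax form of C*(x) shown above would replace the sign of the weighted sum.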
A Boosting Example (1)
Consider a one-level binary decision tree x <= k, where k is a split point chosen to minimize the entropy.

Without boosting, the best decision stump is:
– x <= 0.35 or x >= 0.75
A Boosting Example (2)
A Boosting Example (3)