multipleLearners - Heather Dewey
Ensemble Methods
“No free lunch theorem”
Wolpert and Macready 1995
Solution search also involves searching for
learners
Different algorithms
Different parameters
Different input representations/features
Different data
Base learner
Diversity over accuracy
Model combination
Voting
Bagging
Boosting
Cascading
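Voting is the simplest of the combination schemes listed above. A minimal sketch, assuming each base learner outputs a single class label and that the function name `majority_vote` and the example labels are hypothetical, not from the slides:

```python
from collections import Counter

def majority_vote(predictions):
    """Return the label predicted by the most base learners.

    Ties are broken in favor of the label that appears first in
    `predictions` (an arbitrary choice made for this sketch).
    """
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical outputs of three base learners for one input.
votes = ["spam", "ham", "spam"]
label = majority_vote(votes)
print(label)
```

Weighted voting would follow the same pattern, with each learner's vote scaled by a per-learner weight.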
Data set = [1,2,3,4,5,6,7,8,9,10]
Samples:
Input to learner 1 = [10,2,5,10,3]
Input to learner 2 = [4,5,2,7,6,3]
Input to learner 3 = [8,8,4,9,1]
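The sampling above can be sketched as drawing from the data set with replacement, once per learner. This is only an illustration: the function name `bootstrap_samples`, the fixed sample size of 5, and the seed are assumptions, and the drawn values will differ from the ones on the slide.

```python
import random

def bootstrap_samples(data, n_learners, sample_size, seed=0):
    """Draw one sample per learner, with replacement, from the same data set."""
    rng = random.Random(seed)  # seeded only so the sketch is repeatable
    return [rng.choices(data, k=sample_size) for _ in range(n_learners)]

data = list(range(1, 11))  # [1, 2, ..., 10], as on the slide
samples = bootstrap_samples(data, n_learners=3, sample_size=5)
for i, sample in enumerate(samples, start=1):
    print(f"Input to learner {i} = {sample}")
```

Because sampling is with replacement, a value can appear more than once in a learner's input (as 10 and 8 do on the slide), and some values are left out entirely. That is what makes the learners differ.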
Create complementary learners
Train successive learners on the mistakes of
predecessors
Weak learners combine to a strong learner
AdaBoost – Adaptive Boosting
Allows for a smaller training set
Simple classifiers
Binary
Modify probability of drawing examples from
a training set based on errors
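Step 3
$$\alpha_1 = \frac{1}{2}\log\left(\frac{1 - \text{error}}{\text{error}}\right)$$
error = 0.33
$$\alpha_1 = \frac{1}{2}\log\left(\frac{1 - 0.33}{0.33}\right) \approx 0.35$$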
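The Step 3 arithmetic can be checked in a few lines, using the natural logarithm. The reweighting function that follows is an assumption: the slides do not show the update rule, so the sketch uses the usual discrete AdaBoost rule of scaling each example's probability by exp(+weight) if it was misclassified and exp(-weight) otherwise, then renormalizing. The names `learner_weight` and `reweight` are hypothetical.

```python
import math

def learner_weight(error):
    """Weight of a learner from its error rate, as in Step 3."""
    return 0.5 * math.log((1.0 - error) / error)

def reweight(probs, misclassified, alpha):
    """Raise the drawing probability of misclassified examples, lower the rest.

    Assumed rule (standard discrete AdaBoost, not shown on the slides):
    multiply by exp(alpha) if misclassified, exp(-alpha) otherwise,
    then renormalize so the probabilities sum to 1.
    """
    scaled = [
        p * math.exp(alpha if wrong else -alpha)
        for p, wrong in zip(probs, misclassified)
    ]
    total = sum(scaled)
    return [s / total for s in scaled]

alpha = learner_weight(0.33)
print(round(alpha, 2))  # 0.35

# Three equally likely examples; the first one was misclassified.
new_probs = reweight([1 / 3, 1 / 3, 1 / 3], [True, False, False], alpha)
print(new_probs)
```

After the update the misclassified example is more likely to be drawn for the next learner, which is how successive learners come to focus on their predecessors' mistakes.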
Demo
Sequence classifiers by complexity
Use classifier j+1 if classifier j doesn’t meet a
confidence threshold
Train cascading classifiers on instances the
previous classifier is not confident about
Most examples classified quickly, harder ones
passed to more expensive classifiers
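The cascade described above can be sketched as a loop over classifiers ordered from cheapest to most expensive. This is a sketch under stated assumptions: each classifier is a function returning a `(label, confidence)` pair, the 0.9 threshold is arbitrary, and the two toy classifiers `cheap` and `expensive` are hypothetical stand-ins.

```python
def cascade_predict(x, classifiers, threshold=0.9):
    """Try classifiers in order; stop at the first confident answer.

    `classifiers` is a list of functions, cheapest first, each
    returning (label, confidence). If none is confident, the last
    classifier's label is returned.
    """
    label = None
    for classify in classifiers:
        label, confidence = classify(x)
        if confidence >= threshold:
            return label
    return label

# Hypothetical toy classifiers for a single numeric input.
def cheap(x):
    # Confident only when the input is far from the decision boundary.
    return ("positive" if x > 0 else "negative", 0.95 if abs(x) > 5 else 0.6)

def expensive(x):
    # Slower but always confident in this toy setup.
    return ("positive" if x >= 1 else "negative", 0.99)

print(cascade_predict(10, [cheap, expensive]))   # handled by cheap
print(cascade_predict(0.5, [cheap, expensive]))  # passed on to expensive
```

Training follows the same shape: each later classifier would be fit only on the instances its predecessor was not confident about.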
Boosting and Cascading
Object detection/tracking
Collaborative filtering
Neural networks
Optical character recognition ++
Biometrics
Data mining
Ensemble methods are proven effective,
but why?