Boosting - UCLA Human Genetics
Boosting
Ashok Veeraraghavan
Boosting Methods
• Combine many weak classifiers to produce a powerful committee.
• Resembles Bagging and other committee-based methods.
General Idea of Boosting
• Weak classifiers (error rate only slightly better than random).
• Sequentially apply weak classifiers to modified versions of the data.
• Combine the predictions of these classifiers to produce a powerful classifier.
• Remember "Bagging".
Preliminaries
• Two-class classification.
• Output: $Y \in \{-1, +1\}$.
• Predictor variables: $X$.
• Weak classifier: $G(X)$.
• Training error: $\overline{err}_{train} = \frac{1}{N}\sum_{i=1}^{N} I\big(y_i \neq G(x_i)\big)$.
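A one-line NumPy version of this definition (a sketch; the name training_error is illustrative, and G stands for any weak classifier returning predictions in {-1, +1}):

import numpy as np

def training_error(G, X, y):
    # fraction of training points where the classifier disagrees with the label
    return np.mean(G(X) != y)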
Schematic of AdaBoost
[Schematic: the training sample is fit by $G_1(x)$; successively reweighted samples are fit by $G_2(x), G_3(x), \dots, G_M(x)$; the final classifier is $G(x) = \mathrm{sign}\big(\sum_{m=1}^{M} \alpha_m G_m(x)\big)$.]
AdaBoost
• Initialize the observation weights $w_i = 1/N$, $i = 1, \dots, N$.
• For $m = 1$ to $M$:
  – Fit a classifier $G_m(x)$ to the training data using weights $w_i$.
  – Compute the weighted training error $err_m = \frac{\sum_{i=1}^{N} w_i\, I(y_i \neq G_m(x_i))}{\sum_{i=1}^{N} w_i}$.
  – Compute $\alpha_m = \log\big(\frac{1 - err_m}{err_m}\big)$.
  – Update $w_i \leftarrow w_i \cdot \exp\big[\alpha_m \cdot I\big(y_i \neq G_m(x_i)\big)\big]$.
• Output $G(x) = \mathrm{sign}\big(\sum_{m=1}^{M} \alpha_m G_m(x)\big)$.
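A minimal NumPy sketch of this algorithm, using decision stumps as the weak classifiers $G_m(x)$. The function names (fit_stump, adaboost) are illustrative, and the weight normalization after each update is an added numerical-stability step not spelled out on the slide.

import numpy as np

def fit_stump(X, y, w):
    # Fit a weighted decision stump: a threshold on a single feature.
    n, d = X.shape
    best = None
    for j in range(d):
        for thr in np.unique(X[:, j]):
            for sign in (1, -1):
                pred = np.where(X[:, j] <= thr, sign, -sign)
                err = np.sum(w * (pred != y)) / np.sum(w)   # weighted error err_m
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best  # (weighted error, feature, threshold, sign)

def stump_predict(X, j, thr, sign):
    return np.where(X[:, j] <= thr, sign, -sign)

def adaboost(X, y, M=50):
    # AdaBoost with stump weak learners; labels y must be in {-1, +1}.
    n = len(y)
    w = np.full(n, 1.0 / n)                      # initialize observation weights
    stumps, alphas = [], []
    for m in range(M):
        err, j, thr, sign = fit_stump(X, y, w)
        err = max(err, 1e-12)                    # guard against log(0)
        alpha = np.log((1 - err) / err)
        pred = stump_predict(X, j, thr, sign)
        w *= np.exp(alpha * (pred != y))         # up-weight misclassified points
        w /= w.sum()                             # renormalize (added for stability)
        stumps.append((j, thr, sign))
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(X, stumps, alphas):
    # G(x) = sign( sum_m alpha_m * G_m(x) )
    agg = sum(a * stump_predict(X, j, thr, s) for a, (j, thr, s) in zip(alphas, stumps))
    return np.sign(agg)

Typical use would be stumps, alphas = adaboost(X_train, y_train, M=100) followed by adaboost_predict(X_test, stumps, alphas).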
Additive model
• Additive model: $f(x) = \sum_{m=1}^{M} \beta_m\, b(x; \gamma_m)$; e.g., Neural Networks, Wavelets, MARS, etc.
• Fit by the optimization problem
  $\min_{\{\beta_m, \gamma_m\}_1^M} \; \sum_{i=1}^{N} L\Big(y_i, \sum_{m=1}^{M} \beta_m\, b(x_i; \gamma_m)\Big).$
• Joint optimization over all $M$ terms is computationally intensive.
• Resort to "Forward Stagewise Additive Modeling": add one new term at a time, without re-adjusting the previously fitted terms (see the sketch below).
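A minimal NumPy sketch of forward stagewise fitting with squared-error loss. The names fit_basis and predict_basis are hypothetical placeholders for whatever basis family is used (e.g., a regression stump), and the small constant in the denominator is an added guard.

import numpy as np

def forward_stagewise(X, y, fit_basis, predict_basis, M=100):
    # At step m a single basis function b(x; gamma_m) is fit to the current
    # residuals and added with coefficient beta_m; earlier terms are frozen.
    f = np.zeros(len(y))                       # f_{m-1}(x) evaluated on the training set
    terms = []
    for m in range(M):
        r = y - f                              # residuals under squared-error loss
        gamma = fit_basis(X, r)                # fit the basis parameters to the residuals
        b = predict_basis(X, gamma)
        beta = (b @ r) / (b @ b + 1e-12)       # one-dimensional least-squares coefficient
        f = f + beta * b
        terms.append((beta, gamma))
    return terms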
Exponential Loss and AdaBoost
• AdaBoost is equivalent to "Forward Stagewise Additive Modeling" with an exponential loss.
• Loss function: $L(y, f(x)) = \exp(-y \cdot f(x))$.
• Basis functions: the individual weak classifiers $G_m(x)$.
• The exponential loss makes AdaBoost computationally efficient: each stage has a simple closed-form solution.
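A compressed version of that argument, in the forward-stagewise view (the half-log coefficient and the constant weight rescaling are standard facts not spelled out on the slide):

% Stage-m subproblem under exponential loss:
(\beta_m, G_m)
  = \arg\min_{\beta, G} \sum_{i=1}^{N} \exp\!\big[-y_i\,(f_{m-1}(x_i) + \beta\,G(x_i))\big]
  = \arg\min_{\beta, G} \sum_{i=1}^{N} w_i^{(m)} \exp\!\big[-\beta\, y_i\, G(x_i)\big],
\qquad w_i^{(m)} = \exp\!\big(-y_i\, f_{m-1}(x_i)\big).
% For fixed \beta > 0 the optimal G_m is the weak classifier with the smallest
% weighted error err_m; minimizing over \beta then gives
\beta_m = \tfrac{1}{2}\,\log\frac{1 - err_m}{err_m},
% so \alpha_m = 2\beta_m, and the resulting update
w_i^{(m+1)} = w_i^{(m)}\, \exp\!\big[\alpha_m\, I(y_i \neq G_m(x_i))\big]\cdot e^{-\beta_m}
% matches AdaBoost's weight update up to the constant factor e^{-\beta_m},
% which is common to all observations.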
Loss Functions and Performance
• Misclassification loss.
• Exponential loss.
• Binomial deviance.
• Squared error.
• The choice of loss function depends upon the task at hand (classification vs. regression) and the distribution of the data.
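A small NumPy illustration of how these losses behave as functions of the margin $y \cdot f(x)$ (the margin values are arbitrary; the binomial deviance is written on the margin scale):

import numpy as np

m = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])          # a few margin values y*f(x)

losses = {
    "misclassification": (m < 0).astype(float),     # I(sign(f(x)) != y)
    "exponential":       np.exp(-m),                 # AdaBoost's loss exp(-y f(x))
    "binomial deviance": np.log1p(np.exp(-2 * m)),   # deviance on the margin scale
    "squared error":     (1 - m) ** 2,               # (y - f(x))^2 = (1 - y f(x))^2
}

# Exponential loss and deviance penalize negative margins increasingly;
# squared error also penalizes confidently correct (large positive) margins.
for name, vals in losses.items():
    print(f"{name:>18s}: " + "  ".join(f"{v:6.3f}" for v in vals))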
Off-the-shelf Procedures for Data Mining
• "Off-the-shelf": usable with minimum preprocessing of the data.
• Requirements:
  – Qualitative understanding of the relationship between predictors and output.
  – Filter out irrelevant features.
  – Tolerate outliers.
• Decision Trees come closest to meeting these requirements.
Trees
• Trees partition the input space into disjoint regions $R_j$ ($j = 1, 2, \dots, J$).
• $f(x) = \rho_j$ for $x \in R_j$.
• Formally, $T(X; \Theta) = \sum_{j=1}^{J} \rho_j\, I(x \in R_j)$, with $\Theta = \{R_j, \rho_j\}_1^J$.
• $\hat{\Theta} = \arg\min_{\Theta} \sum_{j=1}^{J} \sum_{x_i \in R_j} L(y_i, \rho_j)$.
• Exact minimization over the regions is a computationally intensive search.
Trees Contd.
• Divide the minimization into two parts and iterate:
  – Finding $\rho_j$ given $R_j$: typically the mean (or majority class) of the $y_i$ falling in $R_j$.
  – Finding $R_j$: greedy recursive partitioning, as sketched below.
• Advantages:
  – Qualitative understanding of the output.
  – Internal feature selection.
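One step of that greedy search, as a NumPy sketch for squared-error loss (the name best_split is illustrative; recursing on each of the two returned regions grows the tree):

import numpy as np

def best_split(X, y):
    # Scan every feature and candidate threshold; return the split whose two
    # regions, each predicting its mean response rho, minimize total squared error.
    n, d = X.shape
    best = (np.inf, None, None)                    # (loss, feature, threshold)
    for j in range(d):
        for thr in np.unique(X[:, j])[:-1]:        # drop the max so both regions are non-empty
            left, right = X[:, j] <= thr, X[:, j] > thr
            rho_left, rho_right = y[left].mean(), y[right].mean()
            loss = np.sum((y[left] - rho_left) ** 2) + np.sum((y[right] - rho_right) ** 2)
            if loss < best[0]:
                best = (loss, j, thr)
    return best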
Boosting Trees
• $G(x) = \mathrm{sign}\big(\sum_{m=1}^{M} \alpha_m G_m(x)\big)$, with $G_m(x) = T(x; \Theta_m)$.
• Boosting done by Forward Stagewise Additive Modeling or by AdaBoost.
• Squared-error loss: each stage fits a regression tree to the current residuals.
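A small scikit-learn sketch of boosted trees; the dataset and parameter values are arbitrary placeholders, and AdaBoostClassifier's default weak learner is a single-split decision tree (a stump).

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# M = 200 boosting iterations, each adding one small tree to the committee.
boosted_trees = AdaBoostClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("test accuracy:", boosted_trees.score(X_te, y_te))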
Optimal Size of Trees
• Usually oversized trees are grown and then pruned to a predetermined size.
• Instead, grow all trees to the same size.
• The optimal size of each tree is estimated beforehand from the data and the target function.
ANOVA (analysis of variance) expansion
$f(X) = \sum_j f_j(X_j) + \sum_{jk} f_{jk}(X_j, X_k) + \sum_{jkl} f_{jkl}(X_j, X_k, X_l) + \dots$
The tree size limits the interaction order that the boosted model can capture.
Regularization
• How many boosting iterations $M$?
• If $M$ is too large, poor generalization, i.e., overfitting.
• Usually $M$ is selected by validation.
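One way to do that selection with scikit-learn (a sketch with placeholder data; staged_predict returns the committee's predictions after 1, 2, ..., M trees, so the whole validation curve comes from a single fit):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

gbm = GradientBoostingClassifier(n_estimators=500, learning_rate=0.1,
                                 max_depth=3, random_state=0).fit(X_tr, y_tr)

# Validation accuracy after each boosting iteration; pick the best M.
val_acc = [accuracy_score(y_val, pred) for pred in gbm.staged_predict(X_val)]
best_M = int(np.argmax(val_acc)) + 1
print(f"validation picks M = {best_M} of {gbm.n_estimators} iterations")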
Shrinkage
• Scale the contribution of each tree by a factor $0 < \varepsilon < 1$:
  $f_m(x) = f_{m-1}(x) + \varepsilon \sum_{j=1}^{J} \rho_{jm}\, I(x \in R_{jm})$
• Trade-off between $\varepsilon$ and $M$.
• Small $\varepsilon$ requires large $M$.
• Computations grow proportionally to $M$!
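A scikit-learn sketch of that trade-off (placeholder data; learning_rate plays the role of $\varepsilon$ and n_estimators the role of $M$):

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Smaller epsilon generally needs a larger M to reach a comparable fit,
# at a proportionally higher computational cost.
for eps, M in [(1.0, 50), (0.1, 500)]:
    gbm = GradientBoostingClassifier(learning_rate=eps, n_estimators=M,
                                     max_depth=2, random_state=0).fit(X_tr, y_tr)
    print(f"epsilon={eps:>4}, M={M:>4}: test accuracy = {gbm.score(X_te, y_te):.3f}")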
Applications
• Non-intrusive power monitoring systems
• Tumor classification with gene expression data
• Text classification