Learning under concept drift: an overview
Zhimin He
iTechs – ISCAS
2013-03-21
Agenda
What’s Concept Drift
Causes of a Concept Drift
Types of Concept Drift
Detecting and Handling Concept Drift
Implications for Software Engineering Research
Definitions
Prediction
$X_t \in \mathbb{R}^p$ is a vector in p-dimensional feature space
observed at time $t$, and $y_t$ is the corresponding label.
We call $X_t$ an instance and a pair $(X_t, y_t)$ a labeled
instance. We refer to instances $(X_1, \ldots, X_t)$ as
historical data and instance $X_{t+1}$ as the target (or testing)
instance.
The task is to predict a label $y_{t+1}$ for the target
instance $X_{t+1}$.
Definitions(cont.)
Concept Drift
Every instance $X_t$ is generated by a source $S_t$.
If all the data is sampled from the same source, i.e.
$S_1 = S_2 = \ldots = S_{t+1} = S$, we say that the concept is stable.
If $S_i \neq S_j$ for some two time points $i$ and $j$, we say that
there is a concept drift.
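The definition can be illustrated with a toy stream. The sketch below is an assumption-laden example, not taken from the slides: `make_stream`, `accuracy`, and the two labeling rules are hypothetical. The source switches at a chosen time point, and a rule that was perfect for the first source fails on the second.

```python
import random

def make_stream(n, change_point, seed=0):
    """Build a list of (x, y) pairs whose source switches at `change_point`."""
    rng = random.Random(seed)
    stream = []
    for t in range(n):
        x = rng.uniform(0.0, 1.0)
        if t < change_point:
            y = int(x > 0.5)    # first source: label is 1 when x > 0.5
        else:
            y = int(x <= 0.5)   # second source: the labeling rule is inverted
        stream.append((x, y))
    return stream

def accuracy(rule, data):
    """Fraction of instances for which `rule(x)` equals the true label."""
    return sum(rule(x) == y for x, y in data) / len(data)

stream = make_stream(n=200, change_point=100)
old_rule = lambda x: int(x > 0.5)   # learned while the concept was stable

acc_before = accuracy(old_rule, stream[:100])   # data from the first source
acc_after = accuracy(old_rule, stream[100:])    # data from the second source
print(acc_before, acc_after)
```

A model fitted on historical data from one source carries no guarantee for a target instance drawn from another.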
Causes of Concept Drift
Let $X \in \mathbb{R}^p$ be an instance in p-dimensional
feature space, with $X \in c_i$, where $\{c_1, c_2, \ldots, c_k\}$ is the
set of class labels.
The optimal classifier for assigning $X$ to a class $c_i$ is
determined by the prior probabilities of the
classes $P(c_i)$ and the class-conditional
probability density functions $p(X \mid c_i)$, $i = 1, \ldots, k$.
Concept / data source:
a set of prior probabilities of the classes and class-conditional pdfs:
$$S = \{(P(c_1), p(X \mid c_1)), (P(c_2), p(X \mid c_2)), \ldots, (P(c_k), p(X \mid c_k))\}$$
Causes of Concept Drift (cont.)
$$p(c_i \mid X) = \frac{P(c_i)\, p(X \mid c_i)}{p(X)}$$
Concept drift may occur in three ways:
Class priors $P(c_i)$ might change over time.
The distributions of one or several classes $p(X \mid c_i)$
might change (virtual drift).
The posterior distributions of the class memberships
$p(c_i \mid X)$ might change (real drift).
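As a small numeric illustration of the first case, the sketch below applies Bayes' rule to two classes. The Gaussian class-conditional densities, the parameter values, and the function names are assumptions chosen for the example. The class-conditional densities stay fixed and only the priors change, yet the posterior for the same instance shifts.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def posteriors(x, priors, class_params):
    """Return p(c_i | x) for every class using Bayes' rule."""
    joint = [p * gaussian_pdf(x, m, s) for p, (m, s) in zip(priors, class_params)]
    evidence = sum(joint)                 # p(x), the normalizing term
    return [j / evidence for j in joint]

# Assumed class-conditional densities: N(0, 1) for c_1 and N(2, 1) for c_2.
params = [(0.0, 1.0), (2.0, 1.0)]
x = 1.0   # equally likely under both class-conditional densities

before = posteriors(x, [0.5, 0.5], params)   # balanced priors
after = posteriors(x, [0.9, 0.1], params)    # priors have changed over time
print(before, after)
```

Because the instance is equally likely under both densities, the posterior here simply follows the priors, which makes the effect of a prior change easy to see.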
Types of Concept Drift
Types:
Sudden drift
Gradual drift
Incremental drift
Reoccurring contexts
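One way to picture the four types is to describe how the generating source evolves over time. The following is a rough sketch under assumed conventions, not a definition from the slides: each source is reduced to a single mean value, the transition window `START`..`END` and the period of 100 steps are arbitrary, and `source_mean` is a hypothetical helper.

```python
import random

OLD_MEAN, NEW_MEAN = 0.0, 1.0
START, END = 100, 200   # assumed transition window for the example

def source_mean(t, kind, rng):
    """Return the mean of the source that generates the instance at time t."""
    # Fraction of the transition completed, clipped to [0, 1].
    progress = min(max((t - START) / (END - START), 0.0), 1.0)
    if kind == "sudden":
        # The old source is replaced by the new one at a single time point.
        return OLD_MEAN if t < START else NEW_MEAN
    if kind == "gradual":
        # Both sources are active; the new one is sampled with growing probability.
        return NEW_MEAN if rng.random() < progress else OLD_MEAN
    if kind == "incremental":
        # The source itself moves in small steps through intermediate concepts.
        return OLD_MEAN + progress * (NEW_MEAN - OLD_MEAN)
    if kind == "reoccurring":
        # A previously seen concept comes back after some time.
        return NEW_MEAN if (t // 100) % 2 == 1 else OLD_MEAN
    raise ValueError(f"unknown drift type: {kind}")

rng = random.Random(0)
for kind in ("sudden", "gradual", "incremental", "reoccurring"):
    print(kind, [source_mean(t, kind, rng) for t in (50, 150, 250)])
```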
Detecting and Handling Concept Drift
Detecting
Monitoring the raw data
Monitoring parameters of learners
Monitoring prediction errors of learners
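Monitoring prediction errors can be sketched as follows. This is a simplified threshold rule loosely modeled on the Drift Detection Method (DDM), which the slides do not name. The class name `ErrorRateMonitor`, the 30-sample warm-up, and the factor of 3 are assumptions of the example. The monitor tracks the running error rate p and its standard deviation s, remembers the point where p + s was lowest, and signals drift when the current p + s exceeds that level by a margin.

```python
import math

class ErrorRateMonitor:
    """Signal drift when the running error rate rises well above its best level."""

    def __init__(self, min_samples=30, drift_level=3.0):
        self.min_samples = min_samples
        self.drift_level = drift_level
        self.n = 0
        self.errors = 0
        self.p_min = float("inf")
        self.s_min = float("inf")

    def update(self, is_error):
        """Feed one prediction outcome; return True when drift is signaled."""
        self.n += 1
        self.errors += int(is_error)
        p = self.errors / self.n                  # running error rate
        s = math.sqrt(p * (1.0 - p) / self.n)     # its standard deviation
        if self.n < self.min_samples:
            return False                          # too few samples to judge
        if p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = p, s         # remember the best level so far
        return p + s > self.p_min + self.drift_level * self.s_min

# Synthetic outcomes: 10% errors for 100 steps, then every prediction is wrong.
outcomes = [t % 10 == 0 for t in range(100)] + [True] * 50
monitor = ErrorRateMonitor()
detected_at = None
for t, is_error in enumerate(outcomes):
    if monitor.update(is_error):
        detected_at = t
        break
print(detected_at)
```

In practice the learner would typically be retrained or reset once drift is signaled, and the monitor restarted along with it.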
Handling
Ensemble learning
Instance selection
Instance weights
Training windows
Training windows are naturally suitable for sudden concept
drift, while ensembles are more flexible in terms of change
type.
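The remark about training windows and sudden drift can be made concrete with a deliberately trivial learner. In this sketch the "model" is only a majority vote over the labels it remembers. The class name, the window size of 10, and the synthetic label stream are all assumptions of the example. The windowed learner forgets the old concept after a few instances, while the learner that keeps the full history does not recover.

```python
from collections import Counter, deque

class WindowedMajorityClassifier:
    """Predict the most frequent label among the last `window_size` labels."""

    def __init__(self, window_size=None):
        # maxlen=None keeps the full history; a finite maxlen forgets old labels.
        self.window = deque(maxlen=window_size)

    def predict(self, default=None):
        if not self.window:
            return default
        return Counter(self.window).most_common(1)[0][0]

    def learn(self, label):
        self.window.append(label)

def count_errors(model, labels):
    """Test-then-train loop: predict each label, then reveal it to the model."""
    errors = 0
    for y in labels:
        errors += int(model.predict() != y)
        model.learn(y)
    return errors

# Sudden drift: the label switches from "a" to "b" halfway through the stream.
labels = ["a"] * 50 + ["b"] * 50
windowed_errors = count_errors(WindowedMajorityClassifier(window_size=10), labels)
full_errors = count_errors(WindowedMajorityClassifier(window_size=None), labels)
print(windowed_errors, full_errors)
```

The window size trades off reaction speed against stability: a short window adapts quickly but learns from less data.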
Detecting and Handling Concept Drift (cont.)
Overall solution for learning under concept drift
Implications for SE Research
Concept drift is a fundamental issue for SE
predictions
Cost estimation, defect prediction…
Especially in the cross-company/cross-project context
It can harm the performance of prediction models
Detecting and handling concept drift is a
challenging task!
Quality problems of SE data, e.g., insufficient data
Data generation context is highly unstable.
It has become an increasingly popular research
topic in the SE field!
E.g., Burak Turhan [JESE 2012], Jayalath Ekanayake
[MSR 2009, JESE 2011]
References
1. Indre Zliobaite, “Learning under Concept Drift: an
Overview,” Technical report, 2009
2. A. Dries and R. Ulrich, “Adaptive Concept Drift
Detection,” Journal of Statistical Analysis and Data
Mining, 2009
3. L. Minku, A. White, and X. Yao, “The impact of diversity
on on-line ensemble learning in the presence of concept
drift,” IEEE Transactions on Knowledge and Data
Engineering, 2009
4. M. Kelly, D. Hand, and N. Adams, “The impact of
changing populations on classifier performance,”
KDD, 1999