Total Variation and Euler's Elastica for
Supervised Learning
Tong Lin, Hanlin Xue, Ling Wang, Hongbin Zha
Contact: [email protected]
Key Laboratory of Machine Perception, School of EECS, Peking University, Beijing 100871, China
MOTIVATION
Supervised learning infers a function that maps inputs to desired outputs under the guidance of training data. The state-of-the-art algorithm is the SVM, based on the large-margin principle and the kernel trick. It has been observed that SVMs are liable to overfitting, especially on small-sample data sets; sometimes an SVM attains 100% training accuracy.
We argue that maximal margin should not be the sole criterion for supervised learning; the curvature and gradients of the decision boundary can play an important role in avoiding overfitting. Our method is inspired by the great success of total variation (TV) and Euler's elastica (EE) in image processing. We extend TV and EE to high-dimensional supervised learning settings.
PROBLEM DEFINITION
Given training data $\{(\mathbf{x}_i, y_i)\}_{i=1}^{n}$, the goal is to infer the underlying mapping

$$u : \mathbf{x}_i \mapsto y_i, \quad i = 1, \dots, n.$$

A general framework for supervised learning is the following regularization framework:

$$\min_u \; \sum_{i=1}^{n} L(u(\mathbf{x}_i), y_i) + \lambda S(u),$$

where $L$ denotes the loss function and $S(u)$ is the regularization/smoothing term. With the squared loss, this discrete model can be changed into the continuous form

$$\min_u \; \int (u - y)^2 \, d\mathbf{x} + \lambda S(u).$$

Our purpose is to introduce two smoothing terms (TV & EE) for supervised learning tasks.
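To fix ideas, here is a toy discrete instance of this framework (our own illustration, not from the poster): a linear model with squared loss and the stand-in smoother $S(u) = \|\mathbf{w}\|^2$, i.e., ridge regression.

```python
import numpy as np

# Toy instance of  min_u  sum_i L(u(x_i), y_i) + lam * S(u):
# u(x) = w.x, squared loss, and S(u) = ||w||^2 as a stand-in smoother
# (ridge regression); the minimizer solves the regularized normal equations.
def fit_ridge(X, y, lam):
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)
w = fit_ridge(X, y, lam=1e-2)
print(w)  # approximately [1, -2, 0.5]
```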
MODELS

Total Variation. Total variation measures the total amount of change of a function; it has been widely used in image processing, e.g., in ROF denoising. TV model:

$$J_{TV}[u] = \int (u - y)^2 \, d\mathbf{x} + \lambda \int |\nabla u| \, d\mathbf{x}$$

The proposed TV model can be seen as a special case of the following EE model (set $a = 1$, $b = 0$).
Euler's Elastica (EE) model in image inpainting:

$$J_{EE}[I] = \frac{\lambda}{2} \int_{E} (I - I_0)^2 \, d\mathbf{x} + \int_{E \cup K} (a + b\kappa^2) \, |\nabla I| \, d\mathbf{x}$$

Curvature formula:

$$\kappa = \nabla \cdot \mathbf{n}, \qquad \mathbf{n} = \frac{\nabla I}{|\nabla I|}$$
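As a concrete aside (our own NumPy sketch, not the authors' code; the grid, spacing, and circle test case are assumptions), the curvature formula can be evaluated on a 2D grid with central finite differences:

```python
import numpy as np

def level_set_curvature(I, h=1.0, eps=1e-8):
    # kappa = div( grad I / |grad I| ), central differences on a grid
    Iy, Ix = np.gradient(I, h)               # axis 0 = y, axis 1 = x
    norm = np.sqrt(Ix**2 + Iy**2) + eps      # regularized |grad I|
    ny, nx = Iy / norm, Ix / norm            # unit normal n = grad I / |grad I|
    return np.gradient(ny, h)[0] + np.gradient(nx, h)[1]  # div n

# Sanity check: signed distance of a circle of radius 0.5;
# in 2D the level-set curvature is 1/r, so ~2 on the circle.
h = 0.01
x, y = np.meshgrid(np.arange(-1, 1 + h, h), np.arange(-1, 1 + h, h))
I = np.sqrt(x**2 + y**2) - 0.5
print(level_set_curvature(I, h)[100, 150])  # at the point (0.5, 0): ~2.0
```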
[Figure: curvature of a curve and Euler's elastic energy]
EE model:

$$J_{EE}[u] = \int (u - y)^2 \, d\mathbf{x} + \lambda \int (a + b\kappa^2) \, |\nabla u| \, d\mathbf{x}$$

where

$$\kappa = \nabla \cdot \left( \frac{\nabla u}{|\nabla u|} \right)$$

is the mean curvature of a hypersurface (the decision boundary).
[Figure: geometric intuition of TV/EE regularization]
FLOW CHART
1. Energy functional with TV/EE regularization terms:

$$\min_u J[u] = \int (u - y)^2 \, d\mathbf{x} + \lambda S(u), \qquad S_{TV}(u) = \int |\nabla u| \, d\mathbf{x}, \quad S_{EE}(u) = \int (a + b\kappa^2) \, |\nabla u| \, d\mathbf{x}$$

2. Euler-Lagrange PDE via the calculus of variations
3. RBF function approximations
4. Two numerical methods to find the PDE solutions

IMPLEMENTATION

Using the calculus of variations, we get the following Euler-Lagrange PDEs.

TV model:

$$\lambda \, \nabla \cdot \left( \frac{\nabla u}{|\nabla u|} \right) - (u - y) = 0$$
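For completeness, here is a one-line sketch of the variational step behind the TV equation (our own derivation; boundary terms are dropped, and the factor 2 from the squared loss is absorbed into $\lambda$):

$$\frac{d}{d\epsilon}\bigg|_{\epsilon=0} J_{TV}[u + \epsilon v] = \int 2(u - y) \, v \, d\mathbf{x} + \lambda \int \frac{\nabla u \cdot \nabla v}{|\nabla u|} \, d\mathbf{x} = \int \left[ 2(u - y) - \lambda \, \nabla \cdot \left( \frac{\nabla u}{|\nabla u|} \right) \right] v \, d\mathbf{x} = 0 \quad \forall v,$$

where the second equality uses integration by parts.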
EE model:

$$\lambda \, \nabla \cdot V - (u - y) = 0$$

where $\phi(\kappa) = a + b\kappa^2$ and

$$V = \phi(\kappa) \, \frac{\nabla u}{|\nabla u|} - \frac{1}{|\nabla u|} \nabla \big( \phi'(\kappa) \, |\nabla u| \big) + \frac{\nabla u}{|\nabla u|^{3}} \left( \nabla u^{T} \, \nabla \big( \phi'(\kappa) \, |\nabla u| \big) \right)$$

Radial Basis Function Approximation

The PDEs are nonlinear and high-dimensional, so we use function approximations to find the numerical solutions:

$$u(\mathbf{x}) = \sum_{i=1}^{m} w_i \, \varphi_i(\mathbf{x}) = \Phi \mathbf{w}, \qquad \varphi_i(\mathbf{x}) = \exp\!\big( -c \, \|\mathbf{x} - \mathbf{x}_i\|^2 \big)$$
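A minimal sketch of this RBF expansion in NumPy (the data, the width $c$, and the small ridge term are our own choices for illustration; the poster solves the PDEs rather than fitting by least squares):

```python
import numpy as np

def rbf_design(X, centers, c):
    # Phi[i, j] = exp(-c * ||x_i - center_j||^2)
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return np.exp(-c * d2)

# Toy usage: represent u(x) = Phi(x) w on random 2D data.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (50, 2))
y = np.sign(X[:, 0] * X[:, 1])           # XOR-like labels
Phi = rbf_design(X, X, c=5.0)            # centers at the training points
w = np.linalg.solve(Phi + 1e-6 * np.eye(50), y)  # interpolation + tiny ridge
print(np.abs(Phi @ w - y).max())         # u(x_i) ~ y_i
```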
Two numerical methods are used to solve the resulting equations for the weights:

(1) GD: Gradient Descent Time Marching
(2) LagLE: Lagged Linear Equation Iteration, where the nonlinear coefficients are lagged (frozen at the previous iterate) so that each update reduces to a linear system
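Below is a rough sketch of the GD time-marching option under the RBF parameterization (a simplification of the authors' scheme: we descend a discretized TV energy evaluated at the training points, and the step size, RBF width, and $\lambda$ are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (50, 2))
y = np.sign(X[:, 0] * X[:, 1])            # XOR-like labels in {-1, +1}
c, lam, tau = 5.0, 0.1, 1e-3              # RBF width, TV weight, time step

d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
Phi = np.exp(-c * d2)                                      # basis values, (n, m)
# gradient of basis i at point j: -2c (x_j - x_i) phi_i(x_j)
G = -2 * c * (X[:, None, :] - X[None, :, :]) * Phi[:, :, None]  # (n, m, d)

w = np.zeros(len(X))
for _ in range(2000):
    u = Phi @ w
    g_fit = 2 * Phi.T @ (u - y)                            # grad of sum (u-y)^2
    Du = np.einsum('jmd,m->jd', G, w)                      # grad u at each x_j
    nrm = np.linalg.norm(Du, axis=1, keepdims=True) + 1e-8
    g_tv = np.einsum('jmd,jd->m', G, Du / nrm)             # grad of sum |grad u|
    w -= tau * (g_fit + lam * g_tv)                        # explicit time marching
print((np.sign(Phi @ w) == y).mean())                      # training accuracy
```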
EXPERIMENTAL RESULTS
The proposed TV/EE methods are
compared with SVM, BPNN, and LR
(Laplacian Regularization) on
benchmark data sets for binary
classification, multi-class classification,
and regression.
[Figure: classification results on synthetic data sets, comparing SVM and EE decision boundaries]