Transcript Lecture 6

Overview
• Basis expansion
• Splines
• (Natural) cubic splines
• Smoothing splines
• Nonparametric logistic regression
• Multidimensional splines
• Wavelets
Linear basis expansion (1)
Linear regression
x1   x2   x3    y
 1   -3    6   12
 …    …    …    …
True model: y = f(x) = β_1 x_1 + β_2 x_2 + β_3 x_3 + ε
Question: How to find f̂ ?
Answer: Solve a system of linear equations to obtain β̂_1, β̂_2, β̂_3
Linear basis expansion (2)
Nonlinear model
x1   x2   x3    y
 1   -3   -1   12
 …    …    …    …
True model: y = β_1 x_1 x_2 + β_2 x_2 e^{x_3} + β_3 sin(x_3) + β_4 x_1² + ε
Question: How to find f̂ ?
Answer: A) Introduce new variables
u_1 = x_1 x_2,  u_2 = x_2 e^{x_3},  u_3 = sin(x_3),  u_4 = x_1²
Linear basis expansion (3)
Nonlinear model
B) Transform the data set
u1     u2      u3    u4     y
-3   -1.1   -0.84     1    12
 …      …       …     …     …
True model: y = β_1 u_1 + β_2 u_2 + β_3 u_3 + β_4 u_4 + ε
C) Apply linear regression to obtain β̂_1, β̂_2, β̂_3, β̂_4
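A minimal sketch of steps B) and C) in Python with NumPy (the data are simulated purely for illustration; the coefficient values are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
# Hypothetical raw inputs and response, simulated for illustration only
x1, x2, x3 = rng.normal(size=(3, n))
y = 2*x1*x2 - x2*np.exp(x3) + 0.5*np.sin(x3) + 3*x1**2 + rng.normal(scale=0.1, size=n)

# Step B: transform the data set into the new variables u1, ..., u4
U = np.column_stack([x1*x2, x2*np.exp(x3), np.sin(x3), x1**2])

# Step C: ordinary least squares on the transformed data
beta_hat, *_ = np.linalg.lstsq(U, y, rcond=None)
print(beta_hat)  # estimates of beta_1, ..., beta_4 (close to 2, -1, 0.5, 3 here)
```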
Linear basis expansion (4)
Conclusion:
We can easily fit any model of the type
f(X) = Σ_{m=1}^{M} β_m h_m(X),
i.e., we can easily undertake a linear basis expansion in X.
Example: If the model is known to be nonlinear, but the exact form is unknown, we can try to introduce quadratic and interaction terms:
f(X) = β_1 X_1 + … + β_p X_p + β_11 X_1² + β_12 X_1 X_2 + …
Piecewise polynomial functions
Assume X is one-dimensional.
Def. Assume the domain [a, b] of X is split into intervals [a, ξ_1], [ξ_1, ξ_2], ..., [ξ_n, b]. Then f(X) is said to be piecewise polynomial if f(X) is represented by a separate polynomial on each interval.
Note: The points ξ_1, ..., ξ_n are called knots.
Piecewise polynomials
Example. Continuous piecewise linear function
Alternative A. Introduce a linear function on each interval together with continuity constraints:
y_1 = α_1 x + β_1,  y_2 = α_2 x + β_2,  y_3 = α_3 x + β_3
subject to  y_1(ξ_1) = y_2(ξ_1)  and  y_2(ξ_2) = y_3(ξ_2)
(6 parameters − 2 constraints = 4 free parameters)
[Insert Fig. 5.1, lower left]
Alternative B. Use a basis expansion (4 free parameters):
h_1(X) = 1,  h_2(X) = X,  h_3(X) = (X − ξ_1)_+,  h_4(X) = (X − ξ_2)_+
where (·)_+ denotes the positive part.
Theorem. The given formulations are equivalent.
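As a quick illustration of Alternative B (hypothetical data and knots), fitting with the truncated basis automatically gives a fit that is continuous at the knots:

```python
import numpy as np

def plinear_basis(x, knots):
    """Basis for a continuous piecewise linear function: 1, x, (x - xi_l)_+."""
    cols = [np.ones_like(x), x]
    cols += [np.maximum(x - xi, 0.0) for xi in knots]
    return np.column_stack(cols)

# Hypothetical data with knots at 0.3 and 0.7; 4 columns = 4 free parameters
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 1, 200))
y = np.abs(x - 0.5) + rng.normal(scale=0.05, size=x.size)

H = plinear_basis(x, knots=np.array([0.3, 0.7]))
beta_hat, *_ = np.linalg.lstsq(H, y, rcond=None)
fit = H @ beta_hat  # continuous at the knots by construction
```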
Splines
Definition. A piecewise polynomial is called an order-M spline if it has continuous derivatives up to order M−2 at the knots.
Alternative definition. An order-M spline is a function which can be represented by the basis functions (K = #knots)
h_j(X) = X^{j−1},  j = 1, …, M
h_{M+l}(X) = (X − ξ_l)_+^{M−1},  l = 1, …, K
Theorem. The definitions above are equivalent.
Terminology. An order-4 spline is called a cubic spline.
[Insert Fig. 5.2, LR] (look at the basis and compare the number of free parameters)
Note. For cubic splines, the discontinuity at the knots is not visible to the eye.
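A sketch of the truncated power basis from the alternative definition; with M = 4 it produces a cubic spline basis (the helper name is my own):

```python
import numpy as np

def truncated_power_basis(x, knots, M=4):
    """Order-M spline basis: x^0, ..., x^(M-1) and (x - xi_l)_+^(M-1)."""
    cols = [x**j for j in range(M)]                              # h_1, ..., h_M
    cols += [np.maximum(x - xi, 0.0)**(M - 1) for xi in knots]   # h_{M+1}, ..., h_{M+K}
    return np.column_stack(cols)

# Cubic spline basis (M = 4) with two knots: M + K = 6 free parameters
x = np.linspace(0, 1, 50)
H = truncated_power_basis(x, knots=[0.3, 0.7], M=4)
print(H.shape)  # (50, 6)
```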
Variance of spline estimators – boundary effects
[Insert Fig. 5.3]
Natural cubic spline
Def. A cubic spline f is called a natural cubic spline if its 2nd and 3rd derivatives are zero at a and b.
Note: This implies that f is linear on the two extreme intervals.
Basis functions of natural cubic splines:
N_1(X) = 1,  N_2(X) = X,  N_{k+2}(X) = d_k(X) − d_{K−1}(X),  k = 1, …, K−2
where d_k(X) = [ (X − ξ_k)_+³ − (X − ξ_K)_+³ ] / (ξ_K − ξ_k)
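A sketch of these basis functions in code, following the d_k formula above (helper name is my own):

```python
import numpy as np

def natural_cubic_basis(x, knots):
    """Natural cubic spline basis N_1, ..., N_K for knots xi_1 < ... < xi_K."""
    knots = np.asarray(knots, dtype=float)
    K = len(knots)

    def d(k):  # 0-based k: d_k uses knots[k] and the last knot knots[K-1]
        num = (np.maximum(x - knots[k], 0.0)**3
               - np.maximum(x - knots[-1], 0.0)**3)
        return num / (knots[-1] - knots[k])

    cols = [np.ones_like(x), x]
    cols += [d(k) - d(K - 2) for k in range(K - 2)]   # N_{k+2} = d_k - d_{K-1}
    return np.column_stack(cols)

x = np.linspace(0, 10, 200)
N = natural_cubic_basis(x, knots=[1, 3, 5, 7, 9])
print(N.shape)  # (200, 5): one basis function per knot
```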
Fitting smooth functions to data
Minimize a penalized sum of squared residuals
RSS(f, λ) = Σ_{i=1}^{N} (y_i − f(x_i))² + λ ∫ (f''(t))² dt
where λ ≥ 0 is the smoothing parameter.
λ = 0 : any function interpolating the data
λ = +∞ : the least squares line fit
Optimality of smoothing splines
Theorem. The function f minimizing RSS(f, λ) for a given λ is a natural cubic spline with knots at all unique values of x_i (note: N knots!).
The optimal spline can be computed as follows:
f(x) = Σ_{j=1}^{N} N_j(x) θ_j = N(x)ᵀ θ
RSS(θ, λ) = (y − Nθ)ᵀ (y − Nθ) + λ θᵀ Ω_N θ
where N_{ij} = N_j(x_i) and (Ω_N)_{ij} = ∫ N_i''(t) N_j''(t) dt
θ̂ = (NᵀN + λΩ_N)⁻¹ Nᵀy
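A rough numerical sketch of this computation, reusing the natural_cubic_basis helper sketched earlier; the penalty matrix Ω_N is approximated on a fine grid rather than computed in closed form:

```python
import numpy as np

def natural_cubic_basis_d2(x, knots):
    """Second derivatives of the natural cubic spline basis at the points x."""
    knots = np.asarray(knots, dtype=float)
    K = len(knots)

    def d2(k):  # second derivative of d_k (0-based k)
        return 6.0 * (np.maximum(x - knots[k], 0.0)
                      - np.maximum(x - knots[-1], 0.0)) / (knots[-1] - knots[k])

    cols = [np.zeros_like(x), np.zeros_like(x)]       # N_1'' = N_2'' = 0
    cols += [d2(k) - d2(K - 2) for k in range(K - 2)]
    return np.column_stack(cols)

def smoothing_spline_smoother(x, y, lam):
    """Fitted values S_lambda y and effective df of a smoothing spline (sketch)."""
    knots = np.unique(x)                              # knots at all unique x_i
    N = natural_cubic_basis(x, knots)                 # helper sketched earlier
    grid = np.linspace(knots[0], knots[-1], 2000)
    D2 = natural_cubic_basis_d2(grid, knots)
    Omega = (D2.T @ D2) * (grid[1] - grid[0])         # crude quadrature for Omega_N
    S = N @ np.linalg.solve(N.T @ N + lam * Omega, N.T)   # smoother matrix S_lambda
    return S @ y, np.trace(S)                         # fitted values, df_lambda
```

The trace of S_λ returned here anticipates the effective degrees of freedom discussed below.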
A smoothing spline is a linear smoother
The fitted function
f̂ = N (NᵀN + λΩ_N)⁻¹ Nᵀy = S_λ y
is linear in the response values.
Degrees of freedom of smoothing splines
The effective degrees of freedom is
dfλ = trace(Sλ)
i.e., the sum of the diagonal elements of S_λ.
Smoothing splines and eigenvectors
It can be shown that
S_λ = (I + λK)⁻¹
where K is the so-called penalty matrix.
Furthermore, the eigen-decomposition is
S_λ = Σ_{k=1}^{N} ρ_k(λ) u_k u_kᵀ,  with  ρ_k(λ) = 1 / (1 + λ d_k)
Note: d_k and u_k are the eigenvalues and eigenvectors, respectively, of K.
Smoothing splines and shrinkage
S_λ y = Σ_{k=1}^{N} u_k ρ_k(λ) ⟨u_k, y⟩
• The smoothing spline decomposes the vector y with respect to the basis of eigenvectors and shrinks the respective contributions.
• The eigenvectors, ordered by ρ_k(λ), increase in complexity; the higher the complexity, the more the contribution is shrunk.
Smoothing splines and local curve fitting
• The eigenvalues ρ_k(λ) are decreasing functions of λ: the higher λ, the stronger the penalization.
• The smoother matrix S_λ has a banded nature → local fitting method
• df_λ = trace(S_λ) = Σ_{k=1}^{N} 1 / (1 + λ d_k)
[Insert Fig. 5.8]
Fitting smoothing splines in practice (1)
Reinsch form:
S_λ = (I + λK)⁻¹
Theorem. If f is a natural cubic spline with vector of values f at the knots and vector of second derivatives γ at the interior knots, then
Qᵀ f = R γ
where Q and R are band matrices that depend only on the knots ξ.
Theorem. K = Q R⁻¹ Qᵀ
Fitting smoothing splines in practice (2)
Reinsch algorithm
• Evaluate Qᵀy
• Compute R + λQᵀQ and find its Cholesky decomposition (in linear time, thanks to the band structure!)
• Solve the matrix equation (R + λQᵀQ) γ = Qᵀy (in linear time!)
• Obtain f = y − λQγ
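A dense-matrix sketch of this algorithm; the explicit entries of Q and R in terms of the knot spacings h_i = x_{i+1} − x_i are not given on the slide, so the construction below is an assumption based on the standard band-matrix formulation:

```python
import numpy as np

def reinsch_smooth(x, y, lam):
    """Smoothing spline fit at sorted, distinct knots x via the Reinsch form."""
    n = len(x)
    h = np.diff(x)                        # knot spacings h_i = x_{i+1} - x_i

    # Band matrices Q (n x (n-2)) and R ((n-2) x (n-2)), depending on the knots only
    Q = np.zeros((n, n - 2))
    R = np.zeros((n - 2, n - 2))
    for j in range(n - 2):
        Q[j, j]     = 1.0 / h[j]
        Q[j + 1, j] = -1.0 / h[j] - 1.0 / h[j + 1]
        Q[j + 2, j] = 1.0 / h[j + 1]
        R[j, j] = (h[j] + h[j + 1]) / 3.0
        if j + 1 < n - 2:
            R[j, j + 1] = R[j + 1, j] = h[j + 1] / 6.0

    # Reinsch steps: solve (R + lam Q^T Q) gamma = Q^T y, then f = y - lam Q gamma
    A = R + lam * (Q.T @ Q)
    L = np.linalg.cholesky(A)             # a banded Cholesky would make this O(n)
    gamma = np.linalg.solve(L.T, np.linalg.solve(L, Q.T @ y))
    return y - lam * (Q @ gamma)          # fitted values f
```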
Automated selection of smoothing parameters (1)
What can be selected:
Regression splines
• Degree of spline
• Placement of knots
→ MARS procedure
Smoothing spline
• Penalization parameter
Automated selection of smoothing parameters (2)
Fixing the degrees of freedom
df_λ = trace(S_λ) = Σ_{k=1}^{N} 1 / (1 + λ d_k)
• If we fix df_λ, then we can find λ by solving this equation numerically.
• One could try two different values of df_λ and choose one based on F-tests, residual plots, etc.
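A sketch of that numerical step, assuming a df_λ(λ) function is available (e.g., the smoothing_spline_smoother sketch above); SciPy's brentq does the root finding:

```python
import numpy as np
from scipy.optimize import brentq

def df_for_lambda(lam, x, y):
    """Effective degrees of freedom trace(S_lambda), via the earlier smoother sketch."""
    _, df = smoothing_spline_smoother(x, y, lam)
    return df

def lambda_for_df(target_df, x, y, lo=1e-8, hi=1e8):
    # df_lambda decreases monotonically in lambda, so a bracketing root finder
    # on log(lambda) recovers the lambda that yields the requested df
    g = lambda loglam: df_for_lambda(np.exp(loglam), x, y) - target_df
    return np.exp(brentq(g, np.log(lo), np.log(hi)))
```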
Automated selection of smoothing parameters (3)
The bias-variance trade off
df_λ = trace(S_λ) = Σ_{k=1}^{N} 1 / (1 + λ d_k)
[Insert Fig. 5.9]
EPE – integrated squared prediction error
CV – cross-validation
Nonparametric logistic regression
Logistic regression model
log [ Pr(Y = 1 | X = x) / Pr(Y = 0 | X = x) ] = f(x)
Note: X is one-dimensional
What can f be?
• Linear → ordinary logistic regression (Chapter 4)
• Smooth enough → nonparametric logistic regression (splines and other methods)
• Other choices are possible
Nonparametric logistic regression
Problem formulation:
Maximize the penalized log-likelihood
l_p(f, λ) = l(f) − (1/2) λ ∫ (f''(t))² dt
where l(f) is the unpenalized binomial log-likelihood.
Good news: The solution is still a natural cubic spline.
Bad news: There is no closed-form expression for that spline function.
Nonparametric logistic regression
How to proceed?
Use Newton–Raphson to compute the spline numerically, i.e.:
• Compute the gradient ∂l_p/∂θ and the Hessian ∂²l_p/∂θ∂θᵀ (analytically).
1. Compute the Newton direction using the current value of the parameter and the derivative information.
2. Compute the new value of the parameter from the old value via the update formula
θ^new = θ^old − (∂²l_p/∂θ∂θᵀ)⁻¹ ∂l_p/∂θ
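A sketch of this iteration when f is expanded in a natural cubic spline basis, f = Nθ, so that ∂l_p/∂θ = Nᵀ(y − p) − λΩθ and ∂²l_p/∂θ∂θᵀ = −(NᵀWN + λΩ) with W = diag(p_i(1 − p_i)); N and Ω are assumed to come from sketches like the earlier ones:

```python
import numpy as np

def penalized_logistic_spline(N, Omega, y, lam, n_iter=25):
    """Newton-Raphson for the penalized binomial log-likelihood with f = N @ theta."""
    theta = np.zeros(N.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(N @ theta)))         # Pr(Y = 1 | x_i)
        grad = N.T @ (y - p) - lam * (Omega @ theta)   # gradient of l_p
        W = p * (1.0 - p)
        hess = -(N.T * W) @ N - lam * Omega            # Hessian of l_p
        theta = theta - np.linalg.solve(hess, grad)    # Newton update from the slide
    return theta
```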
Multidimensional splines
How to fit data smoothly in higher dimensions?
A) Use bases of one-dimensional functions and produce a two-dimensional basis by tensor products:
g_{jk}(X) = h_{1j}(X_1) h_{2k}(X_2),
g(X) = Σ_{j,k} θ_{jk} g_{jk}(X)
Problem: exponential growth of the basis with the dimension
[Insert Fig. 6.10]
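A small sketch of the tensor product construction, reusing the truncated_power_basis helper sketched earlier for each coordinate (function names are my own):

```python
import numpy as np

def tensor_product_basis(H1, H2):
    """Columns g_jk(x_i) = h_1j(x1_i) * h_2k(x2_i) for basis matrices H1, H2."""
    n = H1.shape[0]
    return (H1[:, :, None] * H2[:, None, :]).reshape(n, -1)

# Example: 6 basis functions per coordinate -> 36 tensor product functions,
# illustrating how the basis grows exponentially with the dimension
rng = np.random.default_rng(2)
x1, x2 = rng.uniform(0, 1, size=(2, 100))
H1 = truncated_power_basis(x1, knots=[0.3, 0.7], M=4)   # sketched earlier
H2 = truncated_power_basis(x2, knots=[0.3, 0.7], M=4)
G = tensor_product_basis(H1, H2)
print(G.shape)  # (100, 36)
```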
Multidimensional splines
How to fit data smoothly in higher dimensions?
B) Formulate a new problem
min_f Σ_i (y_i − f(x_i))² + λ J[f]
• The solution is the thin-plate spline.
• Similar properties as in one dimension hold (e.g., λ = 0 gives an interpolating function).
• The solution in two dimensions is essentially a sum of radial basis functions:
f(x) = β_0 + βᵀx + Σ_j α_j η(‖x − x_j‖)
where η is a radial kernel (for the thin-plate spline, η(z) = z² log z²).
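A sketch of the λ = 0 (pure interpolation) case in two dimensions; the block system handling the polynomial part is the standard thin-plate construction and is my own addition here:

```python
import numpy as np

def eta(r):
    """Thin-plate radial kernel eta(r) = r^2 log(r^2), with eta(0) = 0."""
    out = np.zeros_like(r)
    nz = r > 0
    out[nz] = r[nz]**2 * np.log(r[nz]**2)
    return out

def thin_plate_interpolate(X, y):
    """Solve for f(x) = beta0 + beta^T x + sum_j alpha_j eta(||x - x_j||)
    passing through all data points (X is n x 2)."""
    n = X.shape[0]
    E = eta(np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1))   # n x n kernel
    P = np.hstack([np.ones((n, 1)), X])                                # 1, x1, x2
    A = np.block([[E, P], [P.T, np.zeros((3, 3))]])
    sol = np.linalg.solve(A, np.concatenate([y, np.zeros(3)]))
    return sol[:n], sol[n:]                                            # alpha_j and (beta0, beta)
```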
Wavelets
Introduction
• The idea: fit a bumpy function while removing noise.
• Application areas: signal processing, compression.
• How it works: the function is represented in a basis of bumpy (localized) functions, and the small coefficients are filtered out.
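A sketch of this filtering using the PyWavelets package (an assumption; the slides do not name a library), with the Symmlet-8 wavelet mentioned on the next slide and an arbitrary illustrative threshold:

```python
import numpy as np
import pywt  # PyWavelets, assumed to be installed

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 1024)
signal = np.sin(8 * np.pi * t) * (t > 0.3) + rng.normal(scale=0.2, size=t.size)

# Represent the function in the wavelet basis (Symmlet-8 = 'sym8')
coeffs = pywt.wavedec(signal, 'sym8')

# Filter: shrink the small coefficients towards zero (soft thresholding);
# the threshold here is arbitrary and purely illustrative
thr = 0.2 * np.max(np.abs(np.concatenate(coeffs)))
coeffs = [pywt.threshold(c, thr, mode='soft') for c in coeffs]

denoised = pywt.waverec(coeffs, 'sym8')
```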
Wavelets
Basis functions (Haar Wavelets, Symmlet-8 Wavelets)
[Insert Fig. 5.13]
Wavelets
Example
[Insert Fig. 5.14]