Transcript Document

An Overview of Dimension Reduction Methods
박정희
Department of Computer Engineering, Chungnam National University
Outline
• Dimension reduction
• Dimension reduction methods
• Linear dimension reduction
• Nonlinear dimension reduction
• Graph-based dimension reduction
• Applications
Dimension reduction
[Figure: an objects-by-features data table (features A1, A2, A3, …, class label C) is reduced by dimension reduction to a table with fewer features (B1, B2, …, C).]
• dimension reduction
  – feature extraction
  – feature selection
• dimension: the number of features
Dimension reduction
• Reduce the dimensionality of high-dimensional data
• Identify new, meaningful underlying features
• Reduce the computational overhead of subsequent processing stages
• Reduce noise effects
• Enable visualization of the data
Dimension reduction methods
• PCA (principal component analysis)
• LDA (linear discriminant analysis)
• LLE (locally linear embedding)
• Isomap
• LPP (locality preserving projection)
• UDP (unsupervised discriminant projection)
• Kernel-based nonlinear dimension reduction
• …
Linear dimension reduction
• Linear dimension reduction maps a data vector $a \in \mathbb{R}^m$ to $t = W^T a \in \mathbb{R}^p$, where $W$ is a matrix of size $m \times p$ ($p \ll m$)
• How to find the $p$ column vectors of $W$?
  - use a training data set
  - what is the objective criterion?
• Traditional linear dimension reduction methods
  - PCA (principal component analysis)
  - LDA (linear discriminant analysis)
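As a minimal sketch of this mapping, the code below applies a randomly generated $W$ in place of one learned from training data; all names and sizes are made up for illustration:

```python
import numpy as np

# Dimensions: original space R^m, reduced space R^p with p << m.
m, p = 100, 3
rng = np.random.default_rng(0)

a = rng.standard_normal(m)         # a data vector in R^m
W = rng.standard_normal((m, p))    # m x p matrix whose columns are the projection directions

t = W.T @ a                        # reduced representation t = W^T a in R^p
print(t.shape)                     # (3,)
```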
Principal component analysis
• Minimize information loss by dimension reduction
• Capture as much of the variability in the data as possible
PCA (or Karhunen-Loève transform)
• Given a data set $\{a_1, \dots, a_n\}$, the centroid is
  $c = \frac{1}{n} \sum_{i=1}^{n} a_i$
• Total scatter matrix:
  $S_t = \sum_{i=1}^{n} (a_i - c)(a_i - c)^T$
• Find a projection vector $w$ to maximize
  $\sum_{i=1}^{n} (w^T a_i - w^T c)(w^T a_i - w^T c)^T = w^T \left( \sum_{i=1}^{n} (a_i - c)(a_i - c)^T \right) w = w^T S_t w$
• Find the eigenvector $w$ corresponding to the largest eigenvalue of $S_t$:
  $S_t w = \lambda w$
• The eigenvectors $w_1, \dots, w_p$ corresponding to the $p$ largest eigenvalues of $S_t$ give the reduction to a $p$-dimensional space.
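A sketch of these steps in Python, assuming a data matrix A whose rows are the vectors $a_1, \dots, a_n$ (the data here is random, purely for illustration):

```python
import numpy as np

def pca(A, p):
    """Project the rows of A onto the p leading eigenvectors of the total scatter matrix."""
    c = A.mean(axis=0)                      # centroid c
    X = A - c                               # centered data
    St = X.T @ X                            # total scatter matrix S_t (m x m)
    eigvals, eigvecs = np.linalg.eigh(St)   # eigendecomposition of the symmetric S_t
    order = np.argsort(eigvals)[::-1][:p]   # indices of the p largest eigenvalues
    W = eigvecs[:, order]                   # projection matrix W (m x p)
    return X @ W, W                         # reduced data (n x p) and the directions

A = np.random.default_rng(0).standard_normal((200, 50))   # 200 samples in R^50
T, W = pca(A, p=2)
print(T.shape)    # (200, 2)
```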
Linear Discriminant analysis(LDA)
• PCA seeks directions which are efficient for
representing data
• LDA seeks directions which are useful for discriminating between data in different classes
LDA
• Given a data set $\{a_1, \dots, a_n\}$ with class label information,
  global centroid: $c = \frac{1}{n} \sum_{i=1}^{n} a_i$,
  class centroid: $c_i = \frac{1}{n_i} \sum_{a \in \text{class } i} a$
  ($r$: the number of classes, $n_i$: the number of data points in class $i$)
• Within-class scatter matrix:
  $S_w = \sum_{i=1}^{r} \sum_{a \in \text{class } i} (a - c_i)(a - c_i)^T$
• Between-class scatter matrix:
  $S_b = \sum_{i=1}^{r} n_i (c_i - c)(c_i - c)^T$
• Maximize the between-class scatter and minimize the within-class scatter:
  $\max_w \dfrac{w^T S_b w}{w^T S_w w}$
• Solve the generalized eigenvalue problem $S_b w = \lambda S_w w$
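A sketch of LDA under these definitions, assuming class labels are given as an integer array; scipy.linalg.eigh is used for the generalized eigenvalue problem, and $S_w$ is assumed nonsingular (in practice it may need regularization):

```python
import numpy as np
from scipy.linalg import eigh

def lda(A, labels, p):
    """Return p discriminant directions from the generalized problem S_b w = lambda S_w w."""
    c = A.mean(axis=0)                               # global centroid
    m = A.shape[1]
    Sw = np.zeros((m, m))                            # within-class scatter
    Sb = np.zeros((m, m))                            # between-class scatter
    for cls in np.unique(labels):
        Ai = A[labels == cls]
        ci = Ai.mean(axis=0)                         # class centroid
        Sw += (Ai - ci).T @ (Ai - ci)
        d = (ci - c).reshape(-1, 1)
        Sb += len(Ai) * (d @ d.T)
    # generalized symmetric eigenvalue problem; assumes Sw is positive definite
    eigvals, eigvecs = eigh(Sb, Sw)
    order = np.argsort(eigvals)[::-1][:p]            # directions with the largest ratio
    return eigvecs[:, order]

A = np.random.default_rng(0).standard_normal((150, 20))
labels = np.repeat([0, 1, 2], 50)
W = lda(A, labels, p=2)
print(W.shape)    # (20, 2)
```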
Nonlinear dimension reduction
• It is difficult to represent nonlinearly structured
data by linear dimension reduction
Locally linear embedding(LLE)
• Expectation: each data point and its neighbors lie on a
locally linear patch of the manifold
• [reference] Nonlinear dimensionality reduction by locally linear embedding, S.T. Roweis and L.K. Saul, Science, vol. 290, pp. 2323-2326, 2000
LLE
• Algorithm
1. Find the nearest neighbors of each data point
2. Express each data point $x_i$ as a linear combination of its neighbors:
   $x_i \approx \sum_j w_{ij} x_j$, where $\sum_j w_{ij} = 1$ and $w_{ij} = 0$ if $x_j$ is not a near neighbor of $x_i$
3. Find the coordinates $y_i$ of each point $x_i$ in the lower-dimensional space by using the weights $w_{ij}$ found in step 2:
   minimize $\Phi(y) = \sum_i \big\| y_i - \sum_j w_{ij} y_j \big\|^2$
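Rather than re-implementing the weight and embedding solves, one quick way to try the algorithm is scikit-learn's LocallyLinearEmbedding; the swiss-roll data set and parameter values below are only illustrative:

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

X, _ = make_swiss_roll(n_samples=1000, random_state=0)    # 3-D data on a nonlinear manifold
lle = LocallyLinearEmbedding(n_neighbors=10, n_components=2)
Y = lle.fit_transform(X)     # low-dimensional coordinates y_i
print(Y.shape)               # (1000, 2)
```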
Isomap
• Preserve the intrinsic geometry of the data, as captured in the geodesic manifold distances between all pairs of data points
• [reference] A Global Geometric Framework for Nonlinear Dimensionality Reduction, J. B. Tenenbaum, V. de Silva and J. C. Langford, Science, vol. 290, pp. 2319-2323, 2000
Isomap
• Algorithm
1. Find the nearest neighbors of each data point and
create a weighted graph by connecting a point to its
nearest neighbors
2. Redefine the distances between points to be the
length of the shortest path between the two points
3. Apply classical MDS(Multidimensional scaling)
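The same kind of experiment with scikit-learn's Isomap, which internally builds the neighborhood graph, computes shortest-path distances, and applies classical MDS; the data and parameter values are again just illustrative:

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

X, _ = make_swiss_roll(n_samples=1000, random_state=0)
iso = Isomap(n_neighbors=10, n_components=2)   # steps 1-3 of the algorithm above
Y = iso.fit_transform(X)
print(Y.shape)    # (1000, 2)
```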
Example by Isomap
Kernel-based dimension reduction
• Map the data nonlinearly into a feature space and then apply linear dimension reduction, e.g.
  $\Phi(x) = \Phi(x_1, x_2) = (x_1^2, \sqrt{2}\, x_1 x_2, x_2^2)$
• How to define a nonlinear mapping $\Phi$?
Kernel methods
• If a kernel function $k(x, y)$ satisfies Mercer's condition, then there exists a mapping $\Phi$ such that
  $\langle \Phi(x), \Phi(y) \rangle = k(x, y)$
  (the input space $A$ with inner product $\langle x, y \rangle$ is mapped to the feature space $\Phi(A)$)
• Gaussian kernel: $k(x, y) = \exp(-\| x - y \|^2 / \sigma)$
• Polynomial kernel: $k(x, y) = (\langle x, y \rangle + \beta)^d$
• As long as an algorithm can be written in terms of inner products, it can be performed in the feature space
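A small sketch of the two kernels above; the hyperparameter values sigma, beta, and d are arbitrary choices for illustration, not values prescribed by the slides:

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    # k(x, y) = exp(-||x - y||^2 / sigma)
    return np.exp(-np.sum((x - y) ** 2) / sigma)

def polynomial_kernel(x, y, beta=1.0, d=2):
    # k(x, y) = (<x, y> + beta)^d
    return (np.dot(x, y) + beta) ** d

x = np.array([1.0, 2.0])
y = np.array([0.5, -1.0])
print(gaussian_kernel(x, y), polynomial_kernel(x, y))
```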
Nonlinear discriminant analysis
• As long as an algorithm can be written in terms of inner products, it can be performed in the feature space
• In LDA, maximize
  $\max_w \dfrac{w^T S_b w}{w^T S_w w}$,  where
  $S_w = \sum_{i=1}^{r} \sum_{a \in \text{class } i} (a - c_i)(a - c_i)^T$,
  $S_b = \sum_{i=1}^{r} n_i (c_i - c)(c_i - c)^T$
• LDA written in terms of inner products:
  $w^T S_w w = \sum_{i=1}^{r} \sum_{a \in \text{class } i} w^T (a - c_i)(a - c_i)^T w = \sum_{i=1}^{r} \sum_{a \in \text{class } i} (w^T a - w^T c_i)(a^T w - c_i^T w)$
Graph-based dimension reduction
• Graph $G = \langle X, W \rangle$
  $X = \{x_i,\ 1 \le i \le n\}$: $n$ nodes (the data points)
  $W = \{w_{ij}\},\ 1 \le i, j \le n$: similarity (or weight) matrix
  $W$ can be made sparse by using $\varepsilon$-neighborhoods or $k$ nearest neighbors for edge construction
• Let $y_i$ be the embedding of $x_i$
• [LPP (locality preserving projection)] ensures that if $x_i$ and $x_j$ are close, then $y_i$ and $y_j$ are close as well:
  minimize $\sum_{ij} (y_i - y_j)^2 w_{ij}$
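A sketch of the graph construction step described above: a k-nearest-neighbor similarity matrix W with heat-kernel weights. The values of k and sigma are illustrative choices, and the data is random:

```python
import numpy as np

def knn_similarity(X, k=5, sigma=1.0):
    """Sparse similarity matrix: w_ij > 0 only for k-nearest-neighbor pairs."""
    n = X.shape[0]
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(sq[i])[1:k + 1]                  # k nearest neighbors (skip i itself)
        W[i, nbrs] = np.exp(-sq[i, nbrs] / sigma)          # heat-kernel weight
    return np.maximum(W, W.T)                              # symmetrize

X = np.random.default_rng(0).standard_normal((100, 20))
W = knn_similarity(X)
print((W > 0).sum(axis=1)[:5])    # each node connects to roughly k neighbors
```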
Graph-based methods
• Let $y_i$ be the embedding of $x_i$:
  minimize $\sum_{ij} \| y_i - y_j \|^2 w_{ij}$
• Linearization ($y_i = g^T x_i$):
  minimize $\sum_{ij} \| g^T x_i - g^T x_j \|^2 w_{ij}$
• Using a penalty graph $\langle X, P = (p_{ij}) \rangle$:
  maximize $\dfrac{\sum_{ij} \| y_i - y_j \|^2 p_{ij}}{\sum_{ij} \| y_i - y_j \|^2 w_{ij}}$
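A sketch of solving the linearized objective: with the graph Laplacian $L = D - W$, the sum above equals $2\, g^T X^T L X g$ when the rows of X are the data points, and following the LPP paper cited in the references the minimizing directions come from a generalized eigenvalue problem under the constraint $g^T X^T D X g = 1$. The matrix $X^T D X$ is assumed nonsingular here, and the dense similarity matrix is made up for illustration:

```python
import numpy as np
from scipy.linalg import eigh

def lpp_directions(X, W, p):
    """Directions g minimizing sum_ij ||g^T x_i - g^T x_j||^2 w_ij (rows of X are data points)."""
    D = np.diag(W.sum(axis=1))        # degree matrix
    L = D - W                         # graph Laplacian
    A = X.T @ L @ X                   # objective matrix
    B = X.T @ D @ X                   # constraint matrix (assumed positive definite)
    eigvals, eigvecs = eigh(A, B)     # generalized symmetric eigenvalue problem
    return eigvecs[:, :p]             # eigenvectors of the p smallest eigenvalues

X = np.random.default_rng(1).standard_normal((100, 20))
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
W = np.exp(-sq / sq.mean())           # dense, scaled heat-kernel weights (illustration only)
np.fill_diagonal(W, 0.0)
G = lpp_directions(X, W, p=2)
print(G.shape)    # (20, 2)
```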
Applications: Face recognition
[Figure: face images of high data dimensionality are mapped by dimension reduction to low-dimensional features, and a query face image is matched against them.]
Applications: text mining
• Each document becomes a 'term' vector,
– each term is a component (attribute) of the vector,
– the value of each component is the number of
times the corresponding term occurs in the
document.
• Document data has high dimensionality
             team  coach  play  ball  score  game  win  lost  timeout  season
Document 1      3      0     5     0      2     6    0     2        0       2
Document 2      0      7     0     2      1     0    0     3        0       0
Document 3      0      1     0     0      1     2    2     0        3       0
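A small sketch of building such term vectors with scikit-learn's CountVectorizer; the three toy documents below are made up and do not correspond to the counts in the table:

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "the team won the game after a timeout late in the season",
    "the coach lost the ball and the score stayed level",
    "play after play the team tried to score and win the game",
]
vec = CountVectorizer()
X = vec.fit_transform(docs)            # sparse document-term count matrix
print(vec.get_feature_names_out())     # the terms (vector components)
print(X.toarray())                     # each row is one document's term vector
```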
Reference
• Introduction to Data Mining, P. Tan, M. Steinbach and V. Kumar, Addison Wesley, 2006
• Pattern Classification, R. Duda, P. Hart and D. Stork, Wiley-Interscience, 2001
• [LPP] Locality preserving projections, X. He and P. Niyogi, Proc. Conf. Neural Information Processing Systems, 2003
• Graph embedding and extensions: a general framework for dimensionality reduction, S. Yan, D. Xu, H. Zhang, Q. Yang and S. Lin, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29(1), 2007