Nonnegative Matrix Factorization with Sparseness Constraints
Nonnegative Matrix Factorization with Sparseness Constraints
S. Race
MA591R
Introduction to NMF
Factor A ≈ WH
A – m x n matrix of data
  m non-negative scalar variables
  n measurement vectors form the columns of A
W – m x r matrix of "basis vectors"
H – r x n coefficient matrix
  describes how strongly each building block is present in the measurement vectors
Introduction to NMF (cont'd)
Purpose:
“parts-based” representation of the data
Data compression
Noise reduction
Examples:
Term – Document Matrix
Image processing
Any data composed of hidden parts
Introduction to NMF (cont'd)
Optimize the accuracy of the solution:
  min ||A − WH||_F subject to W, H ≥ 0
We can drop the nonnegativity constraints by squaring the factors elementwise (sketched below):
  min ||A − (W∘W)(H∘H)||_F, where ∘ denotes the elementwise product
Many options for the objective function
Many options for the algorithm
W and H will depend on the initial choices
Convergence is not always guaranteed
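To make the two formulations concrete, a minimal NumPy sketch (the function names are mine, not from the talk):

```python
import numpy as np

def nmf_objective(A, W, H):
    """Frobenius-norm NMF objective ||A - WH||_F, with W, H >= 0."""
    return np.linalg.norm(A - W @ H, "fro")

def unconstrained_objective(A, U, V):
    """Same objective after reparameterizing W = U.U and H = V.V:
    elementwise squares are non-negative for any real U, V, so the
    explicit constraints disappear."""
    return np.linalg.norm(A - (U * U) @ (V * V), "fro")
```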
Common Algorithms
Alternating Least Squares
Paatero & Tapper 1994
Multiplicative Update Rules
Lee & Seung (1999 Nature; 2000 NIPS)
Used by Hoyer; sketched after this list
Gradient Descent
Hoyer 2004
Berry-Plemmons 2004
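For reference, a sketch of the standard Lee-Seung multiplicative updates for the Frobenius objective; the small eps guard against division by zero is an addition of mine:

```python
import numpy as np

def multiplicative_update(A, W, H, eps=1e-9):
    """One round of Lee-Seung multiplicative updates for
    min ||A - WH||_F; the factors stay non-negative automatically."""
    H = H * (W.T @ A) / (W.T @ W @ H + eps)
    W = W * (A @ H.T) / (W @ H @ H.T + eps)
    return W, H
```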
Why is sparsity important?
Nature of some data
Text-mining
Disease patterns
Better Interpretation of Results
Storage concerns
Non-negative Sparse Coding I
Proposed by Patrik Hoyer in 2002
Add a penalty function to the objective to encourage sparseness
OBJ: min ||A − WH||_F^2 + λ f(H), subject to W, H ≥ 0
The parameter λ controls the trade-off between accuracy and sparseness
f only needs to be strictly increasing: f(H) = Σij Hij works
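A sketch of this penalized objective and of the multiplicative H-update used for it (my reading of Hoyer's 2002 rule, in which λ simply joins the denominator and keeps it positive):

```python
import numpy as np

def nnsc_objective(A, W, H, lam):
    """||A - WH||_F^2 + lam * f(H), with f(H) the sum of entries."""
    return np.linalg.norm(A - W @ H, "fro") ** 2 + lam * H.sum()

def nnsc_update_H(A, W, H, lam):
    """Multiplicative H-update for the penalized objective."""
    return H * (W.T @ A) / (W.T @ W @ H + lam)
```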
Sparse Objective Function
The objective can always be decreased by scaling W up and H down:
  set W = cW and H = (1/c)H with c > 1; the product WH is unchanged, but the penalty λf(H) shrinks
Thus, left alone, the objective will simply yield the plain NMF solution
A constraint on the scale of H or W is needed:
  fix the norm of the columns of W or of the rows of H, as sketched below
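A minimal sketch of that scale fix, giving the columns of W unit L2 norm (the helper name is hypothetical):

```python
import numpy as np

def normalize_columns(W, H):
    """Give each column of W unit L2 norm and rescale the matching
    row of H so that the product WH is unchanged."""
    norms = np.linalg.norm(W, axis=0)
    norms[norms == 0] = 1.0        # leave all-zero columns untouched
    return W / norms, H * norms[:, None]
```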
Non-negative Sparse Coding I
Pros
  Simple and efficient
  With W fixed, the multiplicative update for H is guaranteed to reach the global minimum of the (convex) H-subproblem
Cons
  Sparseness is controlled only implicitly: the optimal λ must be found by trial and error
  Sparseness is constrained only for H
NMF with sparseness constraints II
First we need some way to define the sparseness of a vector:
  a vector with a single nonzero entry is maximally sparse
  a multiple of the vector of all ones, e, is minimally sparse
CBS Inequality: the Cauchy-Bunyakovsky-Schwarz inequality bounds the ratio of norms, 1 ≤ ||x||_1/||x||_2 ≤ √n, with the lower bound attained exactly at maximally sparse vectors and the upper bound at multiples of e
How can we combine these ideas?
Hoyer’s Sparseness Parameter
sparseness(x) = (√n − ||x||_1/||x||_2) / (√n − 1)
  where n is the dimensionality of x
This measure equals 1 for a maximally sparse vector and 0 for a multiple of e, so we can control a vector's sparseness by manipulating its L1 and L2 norms; a code sketch follows
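A direct sketch of the measure (undefined at the zero vector, which is left unguarded here):

```python
import numpy as np

def sparseness(x):
    """Hoyer's sparseness measure: 1 for a vector with a single
    nonzero entry, 0 for a constant vector."""
    n = x.size
    ratio = np.abs(x).sum() / np.linalg.norm(x)  # ||x||_1 / ||x||_2
    return (np.sqrt(n) - ratio) / (np.sqrt(n) - 1)
```

For example, sparseness(np.array([1.0, 0.0, 0.0, 0.0])) gives 1.0, while sparseness(np.ones(4)) gives 0.0.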
[Figure: Hoyer's sparseness function for vectors with n = 2]
Implementing Sparseness Constraints
Now that we have an explicit measure of sparseness, how can we incorporate it into the algorithm?
Hoyer: at each step, project each column of the matrix onto the nearest vector of the desired sparseness.
Hoyer’s Projection Algorithm
Problem: given any vector x, find the closest (in the Euclidean sense) non-negative vector s with a given L1 norm and a given L2 norm
We can easily solve this problem in the three-dimensional case and extend the result.
Hoyer’s Projection Algorithm
Set s_i = x_i + (L1 − Σx_i)/n for all i
Set Z = {}
Iterate:
  Set m_i = L1/(n − size(Z)) if i not in Z, 0 otherwise
  Set s = m + β(s − m), where β ≥ 0 solves the quadratic ||m + β(s − m)||_2 = L2
  If s is non-negative, we are finished
  Set Z = Z ∪ {i : s_i < 0}
  Set s_i = 0 for all i in Z
  Calculate c = (Σs_i − L1)/(n − size(Z))
  Set s_i = s_i − c for all i not in Z
The Algorithm in words
Project x onto the hyperplane Σs_i = L1
Within this space, move radially outward from the center of the joint constraint hypersphere toward the current point
If the result is non-negative, the destination is reached
Else, set the negative values to zero and project to a new point in the same fashion
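A NumPy sketch of the projection, following the pseudocode above (I take the positive root of the quadratic; clamping the discriminant at zero is a numerical guard of mine):

```python
import numpy as np

def project_to_sparseness(x, L1, L2):
    """Closest non-negative s to x (in the Euclidean sense) with
    ||s||_1 = L1 and ||s||_2 = L2."""
    n = x.size
    s = x + (L1 - x.sum()) / n              # project onto sum(s) = L1
    Z = np.zeros(n, dtype=bool)             # indices fixed at zero
    while True:
        m = np.where(Z, 0.0, L1 / (n - Z.sum()))
        d = s - m
        # beta >= 0 solving ||m + beta*d||_2 = L2, a quadratic in beta
        a, b = (d * d).sum(), 2.0 * (m * d).sum()
        c = (m * m).sum() - L2 ** 2
        beta = (-b + np.sqrt(max(b * b - 4 * a * c, 0.0))) / (2 * a)
        s = m + beta * d
        if (s >= 0).all():
            return s
        Z |= s < 0                          # fix negative entries at zero
        s[Z] = 0.0
        shift = (s.sum() - L1) / (n - Z.sum())
        s[~Z] -= shift                      # restore the L1 constraint
```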
NMF with sparseness constraints
Step 1: Initialize W and H to random positive matrices
Step 2: If constraints apply to W, H, or both, project each column of W or each row of H, respectively, to have unchanged L2 norm and the desired L1 norm
NMF w/ Sparseness Algorithm
Step 3: Iterate
  If sparseness constraints on W apply:
    set W = W − μ_W (WH − A)H^T
    project the columns of W as in step 2
  Else, take a standard multiplicative step for W
  If sparseness constraints on H apply:
    set H = H − μ_H W^T (WH − A)
    project the rows of H as in step 2
  Else, take a standard multiplicative step for H
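Putting the steps together, a condensed sketch of one iteration with a constraint on W only; it reuses the project_to_sparseness sketch above, the step size mu_W and target sparseness sW are assumptions, and Hoyer's relation L1 = L2(√n − s(√n − 1)) converts the desired sparseness into an L1 target at fixed L2:

```python
import numpy as np

def sparse_nmf_step(A, W, H, sW, mu_W=0.01, eps=1e-9):
    """One iteration: projected gradient step for the constrained W,
    standard multiplicative step for the unconstrained H."""
    n = W.shape[0]
    W = W - mu_W * (W @ H - A) @ H.T          # gradient step for W
    for j in range(W.shape[1]):
        L2 = np.linalg.norm(W[:, j])          # keep the L2 norm fixed
        L1 = L2 * (np.sqrt(n) - sW * (np.sqrt(n) - 1))
        W[:, j] = project_to_sparseness(W[:, j], L1, L2)
    H = H * (W.T @ A) / (W.T @ W @ H + eps)   # multiplicative step for H
    return W, H
```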
Advantages of New Method
Sparseness is controlled explicitly, with a parameter that is easily interpreted
Sparseness of W, H, or both can be constrained
The number of iterations required grows very slowly with the dimensionality of the problem
[Figure: iterations required vs. problem size; dotted lines mark the min and max iterations, the solid line the average]
An Example from Hoyer’s Work
Text Mining Results
Text to Matrix Generator
Dimitrios Zeimpekis and E. Gallopoulos, University of Patras
http://scgroup.hpclab.ceid.upatras.gr/scgroup/Projects/TMG/
NMF with sparseness constraints code from Hoyer's web page