Large-Scale Sparse Logistic Regression
Jieping Ye
Arizona State University
Joint work with Jun Liu and Jianhui Chen
Center for Evolutionary Functional Genomics
Sparse Logistic Regression
• Prediction: disease or not
• Confidence (probability)
• Identification of informative features
Logistic Regression
Logistic Regression (LR) has been applied to:
• Document classification (Brzezinski, 1999)
• Natural language processing (Jurafsky and Martin, 2000)
• Computer vision (Friedman et al., 2000)
• Bioinformatics (Liao and Chin, 2007)
Regularization is commonly applied to reduce overfitting and obtain a robust classifier. Two well-known regularizations are:
• L2-norm regularization (Minka, 2007)
• L1-norm regularization (Koh et al., 2007)
Sparse Logistic Regression
L1-norm regularization leads to sparse logistic regression (SLR):
• Simultaneous feature selection and classification
• Enhanced model interpretability
• Improved classification performance
Applications:
• M.-Y. Park and T. Hastie. Penalized Logistic Regression for Detecting Gene Interactions. Biostatistics, 2008.
• T. Wu et al. Genome-wide Association Analysis by Lasso Penalized Logistic Regression. Bioinformatics, 2009.
Large-Scale Sparse Logistic Regression
Many applications involve data of large dimensionality.
• The MRI images used in Alzheimer’s disease studies contain more than 1 million voxels (features).
Major challenge:
• How to scale sparse logistic regression to large-scale problems?
The Proposed Lassplore Algorithm
• Lassplore (LArge-Scale SParse LOgistic REgression) is a first-order method.
• Each iteration of Lassplore involves only matrix-vector multiplications:
  - Scales to large-size problems
  - Efficient for sparse data
• Lassplore achieves the optimal convergence rate among all first-order methods.
Outline
Logistic Regression
Sparse Logistic Regression
Lassplore
Experiments
Logistic Regression (1)
The logistic regression model is given by
$$\mathrm{Prob}(b \mid a) = \sigma\!\left(b\,(w^T a + c)\right) = \frac{1}{1 + \exp\!\left(-b\,(w^T a + c)\right)},$$
where $a \in \mathbb{R}^n$ is the sample and $b \in \{-1, +1\}$ is the class label.
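As a quick illustration, here is a minimal NumPy sketch of this probability (the function and variable names are my own, not from the talk):

```python
import numpy as np

def logistic_prob(w, c, a, b):
    """Prob(b | a) = 1 / (1 + exp(-b (w^T a + c)))."""
    margin = b * (np.dot(w, a) + c)
    return 1.0 / (1.0 + np.exp(-margin))

# Example: a 3-dimensional sample with label b = +1
w = np.array([0.5, -1.0, 2.0])
a = np.array([1.0, 0.0, 0.3])
print(logistic_prob(w, c=0.1, a=a, b=+1))  # probability of observing b = +1
```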
Logistic Regression (2)
Given a set of $m$ training samples $\{a_i, b_i\}_{i=1}^m$, the parameters $w$ and $c$ are chosen so that the likelihood $\prod_{i=1}^m \mathrm{Prob}(b_i \mid a_i)$, with $\mathrm{Prob}(b_i \mid a_i) = \sigma\!\left(b_i\,(w^T a_i + c)\right)$, is maximized. Equivalently, we can compute $w$ and $c$ by minimizing the average logistic loss
$$\ell(w, c) = \frac{1}{m} \sum_{i=1}^{m} \log\!\left(1 + \exp\!\left(-b_i\,(w^T a_i + c)\right)\right).$$
Without regularization, this may lead to overfitting.
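A minimal NumPy sketch of this loss and its gradient (helper and variable names are my own); note that the gradient requires only matrix-vector products, which is what makes the first-order approach below scale:

```python
import numpy as np

def avg_logistic_loss_and_grad(w, c, A, b):
    """Average logistic loss and its gradient.

    A : (m, n) data matrix, one sample per row
    b : (m,) labels in {-1, +1}
    """
    m = A.shape[0]
    t = b * (A @ w + c)                     # margins b_i (w^T a_i + c)
    loss = np.mean(np.logaddexp(0.0, -t))   # log(1 + exp(-t)), computed stably
    q = -b / (1.0 + np.exp(t))              # derivative of loss_i w.r.t. (w^T a_i + c)
    grad_w = A.T @ q / m                    # only a matrix-vector product
    grad_c = np.sum(q) / m
    return loss, grad_w, grad_c
```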
L1-ball Constrained Logistic Regression
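The formula on this slide is not reproduced in the transcript; a plausible reconstruction of the L1-ball constrained problem, writing $z$ for the radius of the ball (consistent with the projection figure later), is

$$\min_{w,\,c}\; \frac{1}{m}\sum_{i=1}^{m}\log\!\left(1+\exp\!\left(-b_i\,(w^T a_i + c)\right)\right) \quad \text{subject to} \quad \|w\|_1 \le z.$$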
Favorable Properties:
• Obtaining a sparse solution
• Performing feature selection and classification simultaneously
• Improving classification performance
How to solve the L1-ball constrained optimization problem?
Gradient Method for Sparse Logistic Regression
Let us consider gradient descent for solving the constrained optimization problem
$$\min_{x \in G} \; g(x),$$
where each iteration takes a step along the negative gradient with step size $1/L_k$:
$$x_{k+1} = x_k - g'(x_k)/L_k.$$
For the constrained problem, this step is followed by a Euclidean projection of the iterate back onto the feasible set $G$ (here, the L1-ball), discussed next.
Euclidean Projection onto the L1-Ball
[Figure: points $v_1$, $v_2$, $v_3$ in the plane and their Euclidean projections $\pi(v_1)$, $\pi(v_2)$, $\pi(v_3)$ onto the L1-ball of radius $z$.]
The Euclidean projection onto the L1-ball (Duchi et al., 2008) is a building block, and it can be solved in linear time (Liu and Ye, 2009).
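For concreteness, here is a minimal sketch of the standard sort-based projection onto the L1-ball (an $O(n \log n)$ variant; the linear-time method of Liu and Ye, 2009 avoids the full sort). The function name is my own:

```python
import numpy as np

def project_l1_ball(v, z):
    """Euclidean projection of v onto the L1-ball {x : ||x||_1 <= z}.

    Sort-based O(n log n) variant; the linear-time algorithm of
    Liu and Ye (2009) avoids the full sort.
    """
    if np.sum(np.abs(v)) <= z:
        return v.copy()                       # already inside the ball
    u = np.sort(np.abs(v))[::-1]              # sorted magnitudes, descending
    cssv = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (cssv - z))[0][-1]
    theta = (cssv[rho] - z) / (rho + 1.0)     # soft-threshold level
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)
```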
Gradient Method & Nesterov’s Method (1)
Convergence rates:
• $g(\cdot)$ smooth and convex: Gradient Descent $O(1/k)$; Nesterov’s method $O(1/k^2)$
• $g(\cdot)$ smooth and strongly convex with condition number $C$: Gradient Descent $O\!\left(\left(\tfrac{C-1}{C+1}\right)^{2k}\right)$; Nesterov’s method $O\!\left(\left(1-\tfrac{1}{\sqrt{C}}\right)^{k}\right)$
Nesterov’s method achieves the lower-complexity bound of smooth optimization for first-order black-box methods, and is thus an optimal method.
Gradient Method & Nesterov’s Method (2)
The theoretical number of iterations (up to a constant factor) for achieving an accuracy of $10^{-8}$:
• $g(\cdot)$ smooth and convex: Gradient Descent $10^{8}$; Nesterov’s method $10^{4}$
• $g(\cdot)$ smooth and strongly convex with condition number $C = 10^{4}$: Gradient Descent $4.6 \times 10^{4}$; Nesterov’s method $1.8 \times 10^{3}$
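These counts follow from the rates on the previous slide; a short Python check, with $\epsilon = 10^{-8}$ and $C = 10^{4}$ as stated:

```python
import numpy as np

eps, C = 1e-8, 1e4

# Smooth and convex: O(1/k) vs O(1/k^2)
k_gd_convex  = 1 / eps            # 1e8 iterations
k_nes_convex = 1 / np.sqrt(eps)   # 1e4 iterations

# Smooth and strongly convex with condition number C
k_gd_strong  = np.log(1 / eps) / (2 * np.log((C + 1) / (C - 1)))   # ~4.6e4
k_nes_strong = np.log(1 / eps) / (-np.log(1 - 1 / np.sqrt(C)))     # ~1.8e3

print(k_gd_convex, k_nes_convex, k_gd_strong, k_nes_strong)
```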
Characteristics of Lassplore
• First-order black-box oracle-based method
  - At each iteration, we only need to evaluate the function value and the gradient
• Utilizes Nesterov’s method (Nesterov, 2003)
  - Global convergence rate of $O(1/k^2)$ for the general case
  - Linear convergence rate for the strongly convex case
• An adaptive line search scheme
  - The step size is allowed to increase during the iterations
  - This line search scheme is applicable to general smooth convex optimization
Key Components and Settings
The classical gradient step uses only the current iterate:
$$x_{k+1} = x_k - g'(x_k)/L_k.$$
Nesterov’s scheme instead forms a search point from the two most recent iterates and steps from there:
$$s_k = x_k + \beta_k (x_k - x_{k-1}), \qquad x_{k+1} = s_k - g'(s_k)/L_k.$$
Previous schemes for choosing $L_k$ and $\beta_k$:
• Nesterov’s constant scheme (Nesterov, 2003)
• Nemirovski’s line search scheme (Nemirovski, 1994)
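Putting these pieces together, a minimal sketch of a Nesterov-style accelerated projected-gradient loop (my own simplified illustration with a fixed $L$ and the standard momentum sequence; it is not the exact Lassplore update):

```python
import numpy as np

def accelerated_projected_gradient(grad, project, x0, L, n_iter=100):
    """Sketch of a Nesterov-style accelerated projected-gradient loop.

    grad(x)    : returns g'(x)
    project(x) : Euclidean projection onto the feasible set G (e.g. the L1-ball)
    L          : (estimate of the) Lipschitz constant of the gradient
    """
    x_prev, x = x0.copy(), x0.copy()
    t_prev, t = 1.0, 1.0
    for _ in range(n_iter):
        beta = (t_prev - 1.0) / t                 # momentum weight beta_k
        s = x + beta * (x - x_prev)               # search point s_k
        x_prev, x = x, project(s - grad(s) / L)   # projected gradient step from s_k
        t_prev, t = t, (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
    return x
```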
Previous Line Search Schemes
Nesterov’s constant scheme (Nesterov, 2003):
• $L_k$ is set to a constant value $L$, the Lipschitz constant of the gradient of $g(\cdot)$
• $\beta_k$ depends on the condition number $C$
Nemirovski’s line search scheme (Nemirovski, 1994):
• $L_k$ is allowed to increase, and is upper-bounded by $2L$
• $\beta_k$ is identical for every function $g(\cdot)$
Proposed Line Search Scheme
Characteristics:
• $L_k$ is tuned adaptively (it may both increase and decrease) and is upper-bounded by $2L$
• $\beta_k$ depends on $L_k$
• The scheme preserves the optimal convergence rate (see the paper for the technical proof); a sketch of the general idea follows below
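The exact Lassplore rule is given in the paper; the sketch below only illustrates the general idea of an adaptive estimate of $L_k$ that may shrink at the start of an iteration and grows until a quadratic upper bound on $g$ holds at the new point (all names are my own):

```python
def adaptive_step(g, grad, project, s, L, eta_up=2.0, eta_down=0.9):
    """Generic adaptive estimate of L_k for one accelerated-gradient step.

    This is NOT the exact Lassplore rule; it only illustrates letting L_k
    decrease (at the start of an iteration) and increase (until a quadratic
    upper bound on g holds at the candidate point).
    """
    L = max(L * eta_down, 1e-12)                  # try a smaller L_k first
    gs, ds = g(s), grad(s)
    while True:
        x_new = project(s - ds / L)               # candidate projected step
        diff = x_new - s
        # Accept once g(x_new) <= g(s) + <g'(s), diff> + (L/2) ||diff||^2
        if g(x_new) <= gs + ds @ diff + 0.5 * L * diff @ diff:
            return x_new, L
        L *= eta_up                               # otherwise increase L_k and retry
```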
Related Work
• Y. Nesterov. Gradient methods for minimizing composite objective function. Technical Report 2007/76, 2007.
• S. Becker, J. Bobin, and E. J. Candès. NESTA: a fast and accurate first-order method for sparse recovery. 2009.
• A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2:183-202, 2009.
• K.-C. Toh and S. Yun. An accelerated proximal gradient algorithm for nuclear norm regularized least squares problems. Preprint, National University of Singapore, March 2009.
• S. Ji and J. Ye. An accelerated gradient method for trace norm minimization. The Twenty-Sixth International Conference on Machine Learning, 2009.
Experiments: Data Sets
Comparison of the Line Search Schemes
Comparison of the proposed adaptive scheme (Adap) with the scheme proposed by Nemirovski (Nemi).
[Figures: $L_k$ and the objective value versus iteration for the two schemes.]
Pathwise Solutions: Warm Start vs. Cold Start
Comparison with ProjectionL1 (Schmidt et al., 2007)
Adaptive Scheme
Comparison with l1-logreg (Koh et al., 2007)
Drosophila Gene Expression Image Analysis
[Images from the BDGP and Fly-FISH databases.]
Drosophila embryogenesis is divided into 17 developmental stages (1-17).
Sparse Logistic Regression: Application (2)
Summary
The Lassplore algorithm for sparse logistic regression:
• First-order black-box method
• Optimal convergence rate
• Adaptive line search scheme
Future work:
• Apply the proposed approach to other mixed-norm regularized optimization problems
• Biological image analysis
The Lassplore Package
http://www.public.asu.edu/~jye02/Software/lassplore/
Thank you!