Dimension Reduction-Based Penalized Logistic Regression
for Cancer Classification Using Microarray Data
By L. Shen and E.C. Tan
Name of student: Kung-Hua Chang
Date: July 8, 2005
SoCalBSI
California State University at Los Angeles
Background
• Microarray data have the characteristic that the number of samples is much smaller than the number of variables (genes).
• This causes the “curse of dimensionality” problem.
• To address this problem, dimension reduction methods such as Singular Value Decomposition (SVD) and Partial Least Squares (PLS) are used.
Background (cont’d)
• Singular Value Decomposition and Partial Least Squares.
• Given an m × n matrix X that stores all of the gene expression data, X can be approximated by keeping only the k largest singular values of its singular value decomposition: X ≈ U_k D_k V_k^T.
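A minimal NumPy sketch of this idea (illustrative only; the matrix sizes and variable names below are made up, not taken from the paper):

    import numpy as np

    # Toy gene expression matrix: m samples (rows) by n genes (columns).
    rng = np.random.default_rng(0)
    X = rng.standard_normal((60, 2000))

    # Thin SVD: X = U @ diag(s) @ Vt
    U, s, Vt = np.linalg.svd(X, full_matrices=False)

    # Rank-k approximation: keep only the k largest singular values.
    k = 5
    X_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

    # The k component scores per sample are what a downstream classifier
    # (e.g. penalized logistic regression) would be trained on.
    scores = U[:, :k] * s[:k]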
Background (cont’d)
Logistic regression and least squares regression.
Both are ways to fit a line that approximates a set of points.
Background (cont’d)
The difference is that logistic
regression equations are solved
iteratively. A trial equation is fitted
and tweaked over and over in order
to improve the fit. Iterations stop
when the improvement from one
step to the next is suitably small.
Least squares regression, by contrast, can be solved explicitly.
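To make the contrast concrete, here is a toy Python sketch (not the authors' code; plain gradient ascent is used below only because it is the simplest iterative scheme to show, and the step size and tolerance are arbitrary):

    import numpy as np

    def fit_least_squares(X, y):
        # Explicit solution: the normal equations are solved once.
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        return beta

    def fit_logistic(X, y, step_size=0.1, tol=1e-6, max_iter=1000):
        # Iterative solution: start from a trial coefficient vector and keep
        # tweaking it until the improvement between steps is suitably small.
        beta = np.zeros(X.shape[1])
        for _ in range(max_iter):
            p = 1.0 / (1.0 + np.exp(-X @ beta))   # predicted probabilities
            update = step_size * (X.T @ (y - p))  # gradient of the log-likelihood
            beta += update
            if np.max(np.abs(update)) < tol:      # convergence check
                break
        return beta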
Background (cont’d)
Penalized logistic regression is ordinary logistic regression with a penalty term added to the cost function; the penalty discourages large coefficients and helps avoid overfitting when there are far more genes than samples.
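For example, with a quadratic (ridge-type) penalty the quantity being minimized looks roughly like the sketch below; the penalty weight lam and the function name are illustrative, not taken from the paper:

    import numpy as np

    def penalized_logistic_cost(beta, X, y, lam):
        # Ordinary logistic-regression cost (negative log-likelihood) ...
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        nll = -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
        # ... plus a penalty term that discourages large coefficients.
        return nll + lam * np.sum(beta ** 2)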
Background (cont’d)
Support Vector Machine (SVM)
SVM tries to find a hyperplane that can separate the different classes of data.
With a nonlinear kernel, it is not a linear model.
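An illustrative usage sketch with scikit-learn's SVC (scikit-learn and the settings below are assumptions for illustration, not the configuration used in the paper):

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X_train = rng.standard_normal((40, 5))     # 40 samples, 5 reduced features (toy data)
    y_train = (X_train[:, 0] > 0).astype(int)  # toy binary class labels

    svm = SVC(kernel="rbf")                    # nonlinear (RBF) kernel
    svm.fit(X_train, y_train)                  # learn the separating surface
    print(svm.predict(X_train[:3]))            # predict classes for new samples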
Hypothesis
The combination of dimension reduction and penalized logistic regression has the best performance compared to the support vector machine and least squares regression.
Data Analysis
Seven publicly available cancer data sets were used, each split into training and testing cases.
Data Analysis
Generally, the partial least squares-based classifier uses less time than the singular value decomposition-based classifier.
Data Analysis (cont’d)
Penalized logistic regression training requires solving a set of linear equations iteratively until convergence, while least squares regression training requires solving a set of linear equations only once. It is therefore reasonable that penalized logistic regression uses more time than least squares regression.
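A sketch of why this is the case, assuming a ridge-type penalty and the usual iteratively reweighted least squares (IRLS) scheme (an illustration, not the authors' implementation):

    import numpy as np

    def fit_least_squares(X, y):
        # One linear system, solved exactly once
        # (assumes X has full column rank, e.g. after dimension reduction).
        return np.linalg.solve(X.T @ X, X.T @ y)

    def fit_penalized_logistic(X, y, lam=1.0, tol=1e-6, max_iter=50):
        # Each IRLS iteration solves a penalized weighted linear system,
        # and the iterations repeat until the update becomes negligible.
        n_features = X.shape[1]
        beta = np.zeros(n_features)
        for _ in range(max_iter):
            p = 1.0 / (1.0 + np.exp(-X @ beta))               # current probabilities
            w = p * (1 - p)                                    # IRLS weights
            A = X.T @ (w[:, None] * X) + lam * np.eye(n_features)
            g = X.T @ (y - p) - lam * beta
            update = np.linalg.solve(A, g)                     # one linear solve per iteration
            beta += update
            if np.max(np.abs(update)) < tol:
                break
        return beta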
Data Analysis (cont’d)
The overall time required by the partial least squares and SVD-based regression methods is much less than that of the support vector machine.
Conclusion
The combination of dimension reduction and penalized logistic regression has the best performance compared to the support vector machine and least squares regression.
References
• [1] L. Shen and E.C. Tan, “Dimension Reduction-Based Penalized Logistic Regression for Cancer Classification Using Microarray Data,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2005 (to appear, June 2005).
• [2] SoCalBSI: http://instructional1.calstatela.edu/jmomand2/
• [3] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer-Verlag, New York, 2001.