Bayesian Factor Regression Models
in the “Large p, Small n” Paradigm
Mike West, Duke University
Presented by: John Paisley
Duke University
Outline
Empirical Factor Regression (SVD)
Latent Factor Regression
Sparse Factor Regression
Linear Regression &
Empirical Factor Regression
Linear Regression
SVD Regression
D is a diagonal matrix of singular values
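As a sketch of the two regressions named above, assume X is the p × n predictor matrix (one column per sample) with singular value decomposition X = ADF. The symbols A, F, beta and theta are notation assumed here for illustration; only D appears in the slide text.

```latex
\begin{aligned}
y &= X^\top \beta + \epsilon && \text{(linear regression)} \\
X &= A D F, \quad A^\top A = I, \quad F F^\top = I && \text{(SVD of } X\text{)} \\
y &= F^\top \theta + \epsilon, \quad \theta = D A^\top \beta && \text{(SVD regression)}
\end{aligned}
```

The third line follows from the first two, since X'beta = F'DA'beta.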
Empirical Factor Regression
By definition,
Regression is now done in factor space, using
generalized shrinkage (ridge regression)
priors on the factor regression coefficients, e.g. the relevance vector machine (RVM)
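A minimal sketch of ridge regression in factor space, assuming X is p × n with thin SVD X = A diag(d) F and a single ridge penalty lam. The function name, the parameter lam and the synthetic data are hypothetical, and this is a point estimate rather than the full Bayesian analysis.

```python
import numpy as np

def svd_factor_ridge(X, y, k, lam=1.0):
    """Ridge regression of y on the top-k empirical (SVD) factors of X.

    X is p x n (one column per sample), y has length n.
    lam is a hypothetical ridge penalty; lam = 0 gives least squares.
    """
    # Thin SVD: X = A @ diag(d) @ F, singular values in descending order.
    A, d, F = np.linalg.svd(X, full_matrices=False)
    A, d, F = A[:, :k], d[:k], F[:k, :]
    # Rows of F are orthonormal (F @ F.T = I), so the ridge solution
    # (F F' + lam I)^{-1} F y reduces to a uniform shrinkage of F @ y.
    theta = (F @ y) / (1.0 + lam)
    return A, d, F, theta

# Synthetic data for illustration only: p = 50 predictors, n = 10 samples.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 10))
y = X.T @ rng.standard_normal(50)
A, d, F, theta = svd_factor_ridge(X, y, k=10, lam=0.0)
fitted = F.T @ theta  # fitted responses in factor space
```

With one shared penalty every factor coefficient is shrunk by the same amount; an RVM-style prior would instead give each coefficient its own shrinkage.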
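Problem of inversion:
the mapping from beta to the factor-space coefficients is many-to-one,
so beta is not uniquely determined by the factor-space coefficients;
the canonical “least-norm” inverse is used to map back to beta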
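The inversion problem can be illustrated numerically. The sketch below assumes the SVD X = A diag(d) F and the mapping theta = diag(d) A' beta (assumed notation), and uses synthetic data; it shows two different beta vectors mapping to the same theta, with the least-norm one having the smaller norm.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 200, 12                      # synthetic sizes, with p much larger than n
X = rng.standard_normal((p, n))
A, d, F = np.linalg.svd(X, full_matrices=False)  # X = A @ diag(d) @ F

def to_theta(beta):
    """Map predictor-space coefficients to factor-space coefficients."""
    return d * (A.T @ beta)

theta = rng.standard_normal(n)

# Least-norm inverse: beta = A D^{-1} theta, which lies in the column space of A.
beta_ln = A @ (theta / d)

# Any vector orthogonal to the columns of A can be added without changing theta.
z = rng.standard_normal(p)
z_perp = z - A @ (A.T @ z)
beta_alt = beta_ln + z_perp

same_theta = np.allclose(to_theta(beta_ln), to_theta(beta_alt))
norm_ln, norm_alt = np.linalg.norm(beta_ln), np.linalg.norm(beta_alt)
```

The least-norm inverse coincides with applying the Moore-Penrose pseudoinverse of X' to the fitted values F' theta.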
Example: Biscuit Dough Data
NIR spectroscopy reflectance values are predictors
Response is fat content of dough samples
39 training and 39 testing samples: the data are pooled, and the
test responses are treated as missing values to be imputed
Top 16 factors used, based on size of singular values
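The pooling scheme can be sketched as follows, under simplifying assumptions: a ridge point estimate stands in for Bayesian imputation, and the data are synthetic and low-rank, not the biscuit dough data. The function name and the parameter lam are hypothetical.

```python
import numpy as np

def pooled_factor_predict(X_train, X_test, y_train, k, lam=1e-3):
    """Predict test responses from the top-k SVD factors of the pooled predictors.

    X_train and X_test are p x n_train and p x n_test; y_train has length n_train.
    """
    n_tr = X_train.shape[1]
    X = np.hstack([X_train, X_test])            # pool training and test predictors
    _, _, F = np.linalg.svd(X, full_matrices=False)
    F = F[:k, :]                                # keep factors with the largest singular values
    F_tr, F_te = F[:, :n_tr], F[:, n_tr:]
    # Ridge fit on the training columns only; test responses are treated as unknown.
    theta = np.linalg.solve(F_tr @ F_tr.T + lam * np.eye(k), F_tr @ y_train)
    return F_te.T @ theta                       # predicted (imputed) test responses

# Synthetic example: 39 training and 39 test samples driven by 5 latent factors.
rng = np.random.default_rng(1)
p, k, n_tr, n_te = 100, 5, 39, 39
B = rng.standard_normal((p, k))
Lam = rng.standard_normal((k, n_tr + n_te))
X = B @ Lam
y = Lam.T @ rng.standard_normal(k)
y_hat = pooled_factor_predict(X[:, :n_tr], X[:, n_tr:], y[:n_tr], k=k, lam=1e-8)
```

Because the factors are computed from the pooled predictors, the test samples help define the factor space even though their responses are never used in the fit.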
Example: Biscuit Dough Data (2)
Left: Fitted and predicted vs true values
Right: Least-norm inverse of beta
~ 1700 nm range is absorbance region for fat
As can be seen, solution is not sparse
Latent Factor Regression
Loosen the exact SVD factorization of X
to a factor model with loading matrix B plus noise
Under proper constraints on B, this finds common structure in X
and isolates idiosyncrasies to noise
Now, variation in X has less effect on y
The implied prior is
When the variance Phi → 0,
this reverts to empirical
linear regression
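One way to write this out, as a sketch: assume the standard latent factor form below, with lambda_i the k latent factors of sample i, standard normal factors, and Phi the idiosyncratic noise covariance. The symbols lambda_i, nu_i, theta and the distributional choices are assumptions made for illustration.

```latex
\begin{aligned}
x_i &= B \lambda_i + \nu_i, \qquad \nu_i \sim N(0, \Phi), \qquad \lambda_i \sim N(0, I_k) \\
y_i &= \theta^\top \lambda_i + \epsilon_i \\
E[\lambda_i \mid x_i] &= B^\top (B B^\top + \Phi)^{-1} x_i \\
E[y_i \mid x_i] &= x_i^\top \beta, \qquad \beta = (B B^\top + \Phi)^{-1} B \theta
\end{aligned}
```

Under these assumptions the noise term Phi damps the influence of x on the inferred factors, and hence on y; as Phi shrinks, x determines the factors exactly.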
Sparse Latent Factor Regression
With respect to gene expression profiling, “multiple biological
factors underlie patterns of gene expression variation, so
latent factor approaches are natural – we imagine that
latent factors reflect individual biological functions… This
is a motivating context for sparse models.”
Each column of B represents the genes involved in a particular
biological factor.
Each row of B represents a particular gene’s involvement across
biological factors.
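The column and row readings of B can be illustrated with a toy sparse loading matrix. The values and the helper names below are made up for illustration and are not taken from the gene expression analysis.

```python
import numpy as np

# Toy loading matrix: 5 genes (rows) x 3 factors (columns); values are made up.
B = np.array([
    [0.9, 0.0,  0.0],
    [0.0, 0.0,  1.2],
    [0.4, 0.0, -0.7],
    [0.0, 0.8,  0.0],
    [0.0, 0.0,  0.5],
])

def genes_in_factor(B, j):
    """Indices of genes with a non-zero loading on factor j (column j of B)."""
    return np.flatnonzero(B[:, j])

def factors_for_gene(B, g):
    """Indices of factors on which gene g has a non-zero loading (row g of B)."""
    return np.flatnonzero(B[g, :])

genes_factor_2 = genes_in_factor(B, 2)    # genes involved in the third factor
factors_gene_2 = factors_for_gene(B, 2)   # factors the third gene takes part in
```

If one factor turns out to be the relevant one for the response, reading off the non-zero entries of its column gives the candidate gene list.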
Example: Gene Expression Data
p = 6128 genes measured using Affymetrix DNA microarrays
n = 49 breast cancer tumor samples
k = 25 factors
Factor 3 separates tumors by estrogen receptor (ER) status
Red: ER-positive tumors
Blue: ER-negative tumors
Example: Gene Expression Data (2)
Comparison with results obtained using empirical SVD factors
Conclusion
Sparse factor regression modeling is a
promising framework for dimensionality
reduction of predictors.
Only those factors that are relevant (e.g.
factor 3) are of interest. Therefore, only those
genes with non-zero values in that column of
B are meaningful.