Regression
Usman Roshan
CS 675
Machine Learning
Regression
• Same problem as classification except that the
target variable yi is continuous.
• Popular solutions
– Linear regression (perceptron)
– Support vector regression
– Logistic regression (for regression)
Linear regression
• Suppose target values are generated by a
function yi = f(xi) + ei
• We will estimate f(xi) by g(xi,θ).
• Suppose each ei is generated by a Gaussian
distribution with mean 0 and variance σ2 (the same
variance for all ei).
• This implies that the distribution of yi given the
input xi and parameters θ (denoted p(yi|xi,θ)) is
normal with mean g(xi,θ) and variance σ2.
Linear regression
• Apply maximum likelihood to estimate g(x, θ)
• Assume the pairs (xi,yi) are i.i.d.
• Then probability of data given model
(likelihood) is P(X|θ) = p(x1,y1)p(x2,y2)…p(xn,yn)
• Each p(xi,yi)=p(yi|xi)p(xi)
• p(yi|xi) is normally distributed with mean
g(xi,θ) and variance σ2
Linear regression
Maximizing the log likelihood (like for classification) gives us
least squares (linear regression)
From Alpaydin 2010
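The step from log likelihood to least squares, written out (a standard derivation; "const" collects everything that does not depend on θ):

\log P(X\mid\theta) = \sum_{i=1}^{n}\log p(y_i\mid x_i) + \sum_{i=1}^{n}\log p(x_i)
= \sum_{i=1}^{n}\log\Big[\tfrac{1}{\sqrt{2\pi}\,\sigma}\exp\Big(-\tfrac{(y_i-g(x_i,\theta))^2}{2\sigma^2}\Big)\Big] + \text{const}
= -\tfrac{1}{2\sigma^2}\sum_{i=1}^{n}\big(y_i-g(x_i,\theta)\big)^2 + \text{const}

Since σ is fixed, maximizing the log likelihood over θ is exactly minimizing Σi (yi − g(xi,θ))², i.e., least squares.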
Ridge regression
• Regularized linear regression, also known as
ridge regression
• Minimize: \frac{1}{n}\sum_{i=1}^{n}\big(y_i - (w^T x_i + w_0)\big)^2 + \frac{\lambda}{2}\lVert w\rVert^2
• Has been used in statistics for a long time to
address singularity problems in linear
regression.
• The linear regression (least squares) solution is
given by w = (XTX)-1XTy
• The ridge regression solution is w = (XTX+λI)-1XTy
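A minimal numpy sketch of the two closed-form solutions above (illustrative, not from the slides; a column of ones is appended so that w0 is absorbed into w, and for brevity the intercept is regularized along with the rest of w):

import numpy as np

def least_squares(X, y):
    # Append a ones column so the intercept w0 is part of w.
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    # w = (X^T X)^{-1} X^T y, via solve() rather than an explicit inverse
    return np.linalg.solve(Xb.T @ Xb, Xb.T @ y)

def ridge(X, y, lam):
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    # w = (X^T X + lambda I)^{-1} X^T y; the lambda*I term makes the matrix
    # invertible even when X^T X is singular (the problem mentioned above)
    return np.linalg.solve(Xb.T @ Xb + lam * np.eye(Xb.shape[1]), Xb.T @ y)

Using solve() instead of forming the inverse explicitly is both faster and numerically safer.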
Logistic regression
• Similar to linear regression derivation
• Here we predict with the sigmoid function
instead of a linear function
• We still minimize the sum of squares between
predicted and actual values
• The output yi is constrained to the range [0,1]
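A gradient-descent sketch of this setup (the learning rate and iteration count are arbitrary illustrative choices): the prediction is sigmoid(wTx + w0) and the objective is the mean squared error against targets yi in [0,1], as described above.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_regression(X, y, lr=0.1, iters=5000):
    n, d = X.shape
    w, w0 = np.zeros(d), 0.0
    for _ in range(iters):
        p = sigmoid(X @ w + w0)          # predictions in (0,1)
        # chain rule through the squared error and the sigmoid:
        # d/df (p - y)^2 = 2(p - y) p (1 - p), where f = w^T x + w0
        g = 2.0 * (p - y) * p * (1.0 - p)
        w -= lr * (X.T @ g) / n
        w0 -= lr * g.mean()
    return w, w0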
Support vector regression
• Makes no assumptions about the probability
distribution of the data and output (like the
support vector machine).
• Change the loss function in the support vector
machine problem to the ε-insensitive loss to
obtain support vector regression
Support vector regression
• Solved by applying Lagrange multipliers like in
SVM
• Solution w is given by a linear combination of
support vectors (like in SVM)
• The solution w can also be used for ranking
features.
• From a risk minimization
perspective the loss
would be \frac{1}{n}\sum_{i=1}^{n}\max\big(0,\ |y_i - (w^T x_i + w_0)| - \varepsilon\big)
• From a regularized perspective it is
\frac{1}{n}\sum_{i=1}^{n}\max\big(0,\ |y_i - (w^T x_i + w_0)| - \varepsilon\big) + \frac{\lambda}{2}\lVert w\rVert^2
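A subgradient-descent sketch of the regularized objective above (ε, λ, the step size, and the iteration count are illustrative; in practice a dedicated solver such as scikit-learn's LinearSVR would be used):

import numpy as np

def fit_svr(X, y, eps=0.1, lam=0.01, lr=0.01, iters=5000):
    n, d = X.shape
    w, w0 = np.zeros(d), 0.0
    for _ in range(iters):
        r = y - (X @ w + w0)             # residuals
        active = np.abs(r) > eps         # points outside the eps-tube
        s = np.sign(r) * active          # per-point subgradient direction
        # subgradient of (1/n) sum max(0, |r_i| - eps) + (lam/2) ||w||^2
        w -= lr * (-(X.T @ s) / n + lam * w)
        w0 -= lr * (-s.mean())
    return w, w0

Note that each update to w is a linear combination of the xi lying outside the ε-tube, mirroring the statement above that the solution is a linear combination of support vectors.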
Applications
• Prediction of continuous phenotypes in mice
from genotype (Predicting unobserved phen…)
• Data are vectors xi where each feature takes on
values 0, 1, and 2 to denote number of alleles of
a particular single nucleotide polymorphism
(SNP)
• Data has about 1500 samples and 12,000 SNPs
• The output yi is a phenotype value, for example
coat color (represented by integers) or chemical
levels in the blood
Mouse phenotype prediction from
genotype
• Rank SNPs by Wald test
– First perform linear regression y = wx + w0
– Calculate p-value on w using t-test
• t-test: (w − wnull)/stderr(w)
• wnull = 0, so the statistic is w/stderr(w)
• stderr(w) = \sqrt{\sum_i (y_i - w x_i - w_0)^2 \,/\, \big((n-2)\sum_i (x_i - \bar{x})^2\big)}
– Rank SNPs by p-values
– OR by the residual sum of squares Σi(yi − wxi − w0)²
• Rank SNPs by Pearson correlation coefficient
• Rank SNPs by support vector regression (the w vector in SVR)
• Rank SNPs by ridge regression (the w vector)
• Run SVR and ridge regression on the top k ranked SNPs under
cross-validation (see the sketch below).
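A sketch of this pipeline on simulated genotype data (the sizes, the choice of ridge rather than SVR in the final step, and all parameter values are illustrative):

import numpy as np
from scipy import stats
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n, p, k = 200, 1000, 50                       # samples, SNPs, top k to keep
X = rng.integers(0, 3, size=(n, p)).astype(float)       # genotypes in {0,1,2}
y = X[:, :5] @ rng.normal(size=5) + rng.normal(size=n)  # toy phenotype

# Wald test per SNP: simple linear regression y = w x + w0; linregress
# returns the two-sided p-value of the t-test on the slope w.
pvals = np.array([stats.linregress(X[:, j], y).pvalue for j in range(p)])

# Pearson correlation gives an alternative single-SNP ranking
# (ranking by np.abs(corrs) would be analogous to ranking by p-value).
corrs = np.array([stats.pearsonr(X[:, j], y)[0] for j in range(p)])

top = np.argsort(pvals)[:k]                   # k most significant SNPs
score = cross_val_score(Ridge(alpha=1.0), X[:, top], y,
                        scoring="r2", cv=5).mean()
print("mean cross-validated R^2 on top-k SNPs:", score)

In a careful evaluation the ranking would be recomputed inside each training fold; ranking on the full data first, as in this sketch, leaks information into the cross-validation estimate.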
CD8 phenotype in mice
From Al-jouie and Roshan, ICMLA workshop, 2015
MCH phenotype in mice
Fly startle response time prediction
from genotype
• Same experimental study as previously
• Using whole genome sequence data to predict
quantitative trait phenotypes in Drosophila
melanogaster
• Data has 155 samples and 2 million SNPs
(features)
Fly startle response time
Rice phenotype prediction from
genotype
• Same experimental study as previously
• Improving the Accuracy of Whole Genome
Prediction for Complex Traits Using the Results
of Genome Wide Association Studies
• Data has 413 samples and 36901 SNPs
(features)
• Best linear unbiased prediction (BLUP)
method improved by prior SNP knowledge
(from genome-wide association studies)
Different rice phenotypes