Linear Regression - Lyle School of Engineering
CSE 5331/7331
Fall 2007
Regression
Margaret H. Dunham
Department of Computer Science and Engineering
Southern Methodist University
Some slides extracted from Data Mining, Introductory and Advanced Topics, Prentice Hall, 2002.
CSE 5331/7331 F'07
© Prentice Hall
Table of Contents
Linear Regression
Nonlinear Regression
Logistic Regression
Metrics
Remember High School?
y = mx + b
You need two points to determine a straight line.
You need two points to find values for m and b.
THIS IS REGRESSION
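As a small illustration of the two-point idea, the following Python sketch solves for m and b from two given points. The function name `line_through` and the sample points are illustrative assumptions, not part of the original slides.

```python
def line_through(p1, p2):
    """Return (m, b) for the line y = m*x + b passing through two points."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("a vertical line has no finite slope")
    m = (y2 - y1) / (x2 - x1)   # slope: rise over run
    b = y1 - m * x1             # intercept: solve y1 = m*x1 + b for b
    return m, b


m, b = line_through((1, 3), (3, 7))
print(m, b)
```

With more than two noisy points, no single line generally passes through all of them, which is why the later slides fit a line instead of solving for it exactly.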
Regression
Predict future values based on past values
Linear Regression assumes linear relationship exists.
$y = c_0 + c_1 x_1 + \dots + c_n x_n$
Find values to best fit the data
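To make the linear model concrete, here is a minimal Python sketch that evaluates it for one input vector. The function name `predict` and the coefficient values are hypothetical, chosen only for illustration.

```python
def predict(coeffs, xs):
    """Evaluate y = c0 + c1*x1 + ... + cn*xn.

    coeffs holds [c0, c1, ..., cn]; xs holds [x1, ..., xn].
    """
    c0, rest = coeffs[0], coeffs[1:]
    if len(rest) != len(xs):
        raise ValueError("need exactly one coefficient per input, plus c0")
    return c0 + sum(c * x for c, x in zip(rest, xs))


# Hypothetical coefficients c0=1, c1=2, c2=3 applied to inputs x1=4, x2=5.
print(predict([1.0, 2.0, 3.0], [4.0, 5.0]))
```

Regression is the task of choosing the values in `coeffs` from data; this sketch covers only the prediction step.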
Linear Regression
Linear Regression
Assume data fits a predefined function
Determine best values for regression coefficients $c_0, c_1, \dots, c_n$.
Assume an error: $y = c_0 + c_1 x_1 + \dots + c_n x_n + e$
Estimate the error using the mean squared error over the training set.
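A minimal sketch of this idea for the single-input case follows. It assumes the usual closed-form least squares estimates for c0 and c1, and takes the mean squared error to be the average of the squared differences between observed and predicted y. The function names and the four data points are illustrative assumptions, not taken from the slides.

```python
def fit_line(xs, ys):
    """Least squares estimates (c0, c1) for the model y = c0 + c1*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    c1 = sxy / sxx
    c0 = mean_y - c1 * mean_x
    return c0, c1


def mean_squared_error(xs, ys, c0, c1):
    """Average squared difference between observed y and c0 + c1*x."""
    return sum((y - (c0 + c1 * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Made-up training set for illustration.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 8.0]
c0, c1 = fit_line(xs, ys)
print(c0, c1, mean_squared_error(xs, ys, c0, c1))
```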
Linear Regression Poor Fit
Why use least squares (the sum of squared errors)?
http://curvefit.com/sum_of_squares.htm
A linear model doesn’t always fit the data well
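The poor-fit case can be sketched with made-up data that follows a parabola. The best straight line, found with the usual closed-form least squares estimates, still leaves a large mean squared error. The data and function names are illustrative assumptions.

```python
def fit_line(xs, ys):
    """Least squares estimates (c0, c1) for the model y = c0 + c1*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    c1 = sxy / sxx
    return mean_y - c1 * mean_x, c1


# Made-up data lying exactly on the curve y = x^2.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [x * x for x in xs]

c0, c1 = fit_line(xs, ys)
mse = sum((y - (c0 + c1 * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)
print(c0, c1, mse)
```

For this symmetric data the fitted line is flat, so it captures none of the curvature even though y is fully determined by x.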
Nonlinear Regression
Data does not nicely fit a straight line
Fit data to a curve
Many possible functions
Not as easy and straightforward as linear regression
How nonlinear regression works:
http://curvefit.com/how_nonlin_works.htm
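General nonlinear regression is usually iterative, and the sketch below does not implement it. It shows a simpler shortcut that works only for some curve families: an exponential model y = a·exp(b·x) becomes linear after taking logarithms, so it can be fitted with ordinary linear least squares on (x, ln y). The model choice, function name, and data are assumptions made for illustration.

```python
import math


def fit_exponential(xs, ys):
    """Fit y = a * exp(b*x) by fitting a straight line to (x, ln y).

    Requires every y to be positive so the logarithm is defined.
    """
    if any(y <= 0 for y in ys):
        raise ValueError("log transform needs strictly positive y values")
    logs = [math.log(y) for y in ys]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_l = sum(logs) / n
    sxl = sum((x - mean_x) * (l - mean_l) for x, l in zip(xs, logs))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxl / sxx                        # slope of the line in log space
    a = math.exp(mean_l - b * mean_x)    # intercept mapped back from log space
    return a, b


# Made-up noiseless data generated from a = 2, b = 0.5.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]
a, b = fit_exponential(xs, ys)
print(a, b)
```

The transform minimizes squared error in ln y, not in y, so on noisy data its result generally differs from a true nonlinear least squares fit.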
Logistic Regression
Generalized linear model
Predict discrete outcome
– Binomial (binary) logistic regression
– Multinomial logistic regression
One dependent variable
Logistic Regression by Gerard E. Dallal
http://www.tufts.edu/~gdallal/logistic.htm
Logistic Regression (cont’d)
Log Odds Function:
$\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x$
p is the probability that the outcome is 1
Odds – The probability the event occurs divided by the probability that it does not occur
Log Odds function is strictly increasing as p increases
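The log odds function and its inverse can be sketched in a few lines of Python. The function names and the coefficient values passed to `predict_probability` are hypothetical. In practice the coefficients would be estimated from data, which this sketch does not do.

```python
import math


def log_odds(p):
    """Log odds (logit) of a probability p strictly between 0 and 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))


def probability(z):
    """Invert the log odds: p = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(beta0, beta1, x):
    """Probability that the outcome is 1 when log odds = beta0 + beta1*x."""
    return probability(beta0 + beta1 * x)


# The log odds grow with p, are negative below p = 0.5 and positive above it.
for p in (0.1, 0.5, 0.9):
    print(p, log_odds(p))
```

Because the log odds can be any real number while p stays between 0 and 1, a linear expression in x can be mapped to a valid probability through `probability`.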
Why Log Odds?
Shape of curve is desirable
Relationship to probability
Range: −∞ to +∞
P-value
The probability, assuming the null hypothesis is true, that a test statistic takes a value at least as extreme as the observed value
http://en.wikipedia.org/wiki/P-value
http://sportsci.org/resource/stats/pvalues.html
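As one concrete case, the sketch below computes p-values for a test statistic that is assumed to follow a standard normal distribution under the null hypothesis (a z statistic). That distributional assumption and the function names are illustrative and are not stated in the slides. Other tests use other reference distributions.

```python
import math


def upper_tail_p_value(z):
    """P(Z >= z) for a standard normal Z: the one-sided p-value."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def two_sided_p_value(z):
    """P(|Z| >= |z|) for a standard normal Z: the two-sided p-value."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# An observed z of 1.96 is the familiar cutoff for a two-sided 5% test.
print(upper_tail_p_value(1.96), two_sided_p_value(1.96))
```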
Correlation
Examine the degree to which the values for two variables behave similarly.
Correlation coefficient r:
• 1 = perfect positive correlation
• -1 = perfect negative (opposite) correlation
• 0 = no linear correlation
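The three reference values of r can be reproduced with a short sketch, assuming r denotes the Pearson correlation coefficient. The function name and the data are made up for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
r_up = pearson_r(xs, [2 * x + 1 for x in xs])    # y rises linearly with x
r_down = pearson_r(xs, [-x for x in xs])         # y falls linearly with x
r_curve = pearson_r(xs, [x * x for x in xs])     # y depends on x, but not linearly
print(r_up, r_down, r_curve)
```

The last case is a reminder that r near 0 rules out only a linear relationship: here y is completely determined by x.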
Covariance
Degree to which two variables vary in the same manner
Correlation is normalized and covariance is not
http://www.ds.unifi.it/VL/VL_EN/expect/expect3.html
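The lack of normalization can be seen in a small sketch: rescaling one variable rescales the covariance by the same factor. The code assumes the sample covariance with an n − 1 denominator (a population version would divide by n), and the names and data are illustrative.

```python
def covariance(xs, ys):
    """Sample covariance of two equal-length sequences (n - 1 denominator)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

cov_original = covariance(xs, ys)
# Expressing x in different units (here, multiplied by 10) changes the covariance.
cov_rescaled = covariance([10 * x for x in xs], ys)
print(cov_original, cov_rescaled)
```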
Residual
Error
Difference between desired output and predicted output
In practice, the sum of squared residuals is often used
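A minimal sketch of both quantities, using made-up desired and predicted outputs. The function names are hypothetical.

```python
def residuals(desired, predicted):
    """Per-example residual: desired output minus predicted output."""
    return [d - p for d, p in zip(desired, predicted)]


def sum_of_squares(desired, predicted):
    """Sum of squared residuals over all examples."""
    return sum(r * r for r in residuals(desired, predicted))


desired = [3.0, 5.0, 7.0]
predicted = [2.5, 5.0, 8.0]
print(residuals(desired, predicted), sum_of_squares(desired, predicted))
```

Squaring keeps positive and negative residuals from cancelling each other and penalizes large errors more heavily.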