Linear Regression - Lyle School of Engineering


CSE 5331/7331
Fall 2007
Regression
Margaret H. Dunham
Department of Computer Science and Engineering
Southern Methodist University
Some slides extracted from Data Mining, Introductory and Advanced Topics, Prentice Hall, 2002.
CSE 5331/7331 F'07
© Prentice Hall
Table of Contents
• Linear Regression
• Nonlinear Regression
• Logistic Regression
• Metrics

Remember High School?
y = mx + b
• You need two points to determine a straight line.
• You need two points to find values for m and b.

THIS IS REGRESSION
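
As a small illustration of the two-point idea, the sketch below solves y = mx + b for m and b from two known points. The function name `line_through` and the sample points are assumptions made for this example, not part of the original slides.

```python
def line_through(p1, p2):
    """Return (m, b) for the line y = m*x + b passing through two points."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("a vertical line has no finite slope m")
    m = (y2 - y1) / (x2 - x1)   # slope: rise over run
    b = y1 - m * x1             # intercept: solve y1 = m*x1 + b for b
    return m, b


m, b = line_through((1.0, 3.0), (3.0, 7.0))
print(m, b)  # slope 2.0, intercept 1.0
```

With more than two points that do not lie exactly on one line, no single m and b passes through all of them, which is where fitting comes in.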
Regression
• Predict future values based on past values
• Linear regression assumes that a linear relationship exists:
  $y = c_0 + c_1 x_1 + \dots + c_n x_n$
• Find values to best fit the data
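
A minimal sketch of "finding values to best fit the data" for the single-predictor case y = c0 + c1*x, using the standard closed-form least squares solution. The function name `fit_simple_linear` and the sample data are hypothetical and chosen only for illustration.

```python
def fit_simple_linear(xs, ys):
    """Least squares estimates (c0, c1) for the model y = c0 + c1*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # c1 = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    c1 = sxy / sxx
    c0 = mean_y - c1 * mean_x   # the fitted line passes through the means
    return c0, c1


# Hypothetical past observations that are roughly, but not exactly, linear.
past_x = [1.0, 2.0, 3.0, 4.0, 5.0]
past_y = [2.2, 4.1, 6.2, 7.9, 10.1]
c0, c1 = fit_simple_linear(past_x, past_y)
print("prediction for x = 6:", c0 + c1 * 6.0)
```

The general case with n predictors is usually solved with a linear algebra routine rather than by hand, but the idea is the same.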

Linear Regression
Linear Regression
• Assume the data fits a predefined function
• Determine the best values for the regression coefficients $c_0, c_1, \dots, c_n$
• Assume an error: $y = c_0 + c_1 x_1 + \dots + c_n x_n + e$
• Estimate the error using the mean squared error over the training set
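
The mean squared error is commonly written as $\frac{1}{k}\sum_{i=1}^{k}(y_i - \hat{y}_i)^2$, where $\hat{y}_i$ is the value predicted for the i-th of k training examples. The sketch below assumes that standard definition; the helper names, the coefficients, and the tiny training set are made up for this example.

```python
def predict(coeffs, xs):
    """Evaluate c0 + c1*x1 + ... + cn*xn for one row of predictor values."""
    c0, rest = coeffs[0], coeffs[1:]
    return c0 + sum(c * x for c, x in zip(rest, xs))


def mean_squared_error(coeffs, rows, targets):
    """Average of the squared errors e = y - prediction over the training set."""
    errors = [y - predict(coeffs, row) for row, y in zip(rows, targets)]
    return sum(e * e for e in errors) / len(errors)


# Hypothetical training set with two predictors per example.
rows = [(1.0, 2.0), (2.0, 0.0), (3.0, 1.0)]
targets = [8.0, 5.0, 10.0]
coeffs = [1.0, 2.0, 3.0]          # c0, c1, c2
mse = mean_squared_error(coeffs, rows, targets)
print(mse)
```

Here the predictions are 9, 5, and 10, so only the first example contributes an error.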
Linear Regression Poor Fit
• Why use the sum of squares (least squares)?
  http://curvefit.com/sum_of_squares.htm
• Linear regression doesn't always work well
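
One way to see both points at once is to fit a straight line to data that actually follows a curve. In the sketch below (hypothetical data, helper name `fit_line` assumed for the example), the signed errors cancel to zero even though the line fits badly, while the sum of squared errors exposes the poor fit.

```python
def fit_line(xs, ys):
    """Least squares line y = c0 + c1*x; returns (c0, c1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return mean_y - slope * mean_x, slope


xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [x * x for x in xs]              # data follows y = x^2, not a line
c0, c1 = fit_line(xs, ys)             # best line is flat: y = 2
errors = [y - (c0 + c1 * x) for x, y in zip(xs, ys)]
sum_of_errors = sum(errors)                   # positive and negative errors cancel
sum_of_squares = sum(e * e for e in errors)   # squaring keeps every miss visible
print(c0, c1, sum_of_errors, sum_of_squares)
```

The fitted line is horizontal, so it carries no information about how y changes with x, yet a plain sum of errors would report a perfect score.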
Nonlinear Regression
• Data does not nicely fit a straight line
• Fit data to a curve
• Many possible functions
• Not as easy and straightforward as linear regression
• How nonlinear regression works:
  http://curvefit.com/how_nonlin_works.htm
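
As one possible illustration of fitting a curve, the sketch below fits the model y = a * exp(b * x), chosen here purely as an example of the "many possible functions". It starts from an initial guess and refines it iteratively with Gauss-Newton steps; this is an assumed technique for the example, not necessarily the method described at the link above. All names and data are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Least squares line y = c0 + c1*x; returns (c0, c1)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    c1 = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
          / sum((x - mx) ** 2 for x in xs))
    return my - c1 * mx, c1


def fit_exponential(xs, ys, iterations=20):
    """Fit y = a * exp(b * x) by Gauss-Newton iteration (requires all y > 0)."""
    # Initial guess: straight-line fit to (x, ln y).
    ln_a, b = fit_line(xs, [math.log(y) for y in ys])
    a = math.exp(ln_a)
    for _ in range(iterations):
        r = [y - a * math.exp(b * x) for x, y in zip(xs, ys)]   # residuals
        ja = [math.exp(b * x) for x in xs]                      # d(model)/da
        jb = [a * x * math.exp(b * x) for x in xs]              # d(model)/db
        # Solve the 2x2 normal equations (J^T J) * delta = J^T r.
        s_aa = sum(u * u for u in ja)
        s_ab = sum(u * v for u, v in zip(ja, jb))
        s_bb = sum(v * v for v in jb)
        g_a = sum(u * e for u, e in zip(ja, r))
        g_b = sum(v * e for v, e in zip(jb, r))
        det = s_aa * s_bb - s_ab * s_ab
        a += (s_bb * g_a - s_ab * g_b) / det
        b += (s_aa * g_b - s_ab * g_a) / det
    return a, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]   # synthetic, noise-free data
a, b = fit_exponential(xs, ys)
print(a, b)
```

Unlike the linear case there is no closed-form answer in general, so the result can depend on the starting guess and the number of iterations.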
Logistic Regression
• Generalized linear model
• Predict discrete outcome
  – Binomial (binary) logistic regression
  – Multinomial logistic regression
• One dependent variable
• Logistic Regression by Gerard E. Dallal
  http://www.tufts.edu/~gdallal/logistic.htm
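
A minimal sketch of binary logistic regression with one predictor, assuming the usual model P(y = 1 | x) = 1 / (1 + exp(-(b0 + b1*x))) and fitting it by plain gradient ascent on the log-likelihood. The function names, learning rate, iteration count, and the pass/fail data are all assumptions made for this example; statistical packages typically use faster fitting procedures.

```python
import math


def sigmoid(z):
    """Map a real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, learning_rate=0.1, iterations=20000):
    """Estimate (b0, b1) by gradient ascent on the mean log-likelihood."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(iterations):
        # Gradient components: mean of (y - p) and mean of (y - p) * x.
        g0 = sum(y - sigmoid(b0 + b1 * x) for x, y in zip(xs, ys)) / n
        g1 = sum((y - sigmoid(b0 + b1 * x)) * x for x, y in zip(xs, ys)) / n
        b0 += learning_rate * g0
        b1 += learning_rate * g1
    return b0, b1


# Hypothetical data: hours studied and whether the exam was passed (1) or not (0).
hours = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
passed = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
b0, b1 = fit_logistic(hours, passed)
print("P(pass | 1 hour) :", sigmoid(b0 + b1 * 1.0))
print("P(pass | 4 hours):", sigmoid(b0 + b1 * 4.0))
```

The predicted value is a probability; a discrete outcome is obtained by thresholding it, for example at 0.5.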

Logistic Regression (cont’d)
• Log odds function:
  $\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x$
• p is the probability that the outcome is 1
• Odds: the probability that the event occurs divided by the probability that it does not occur
• The log odds function is strictly increasing as p increases
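
The log odds function and its inverse can be sketched in a few lines. The function names `log_odds` and `probability` are chosen for this example only.

```python
import math


def log_odds(p):
    """Log odds (logit): log(p / (1 - p)), defined for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return math.log(p / (1.0 - p))


def probability(log_odds_value):
    """Inverse of log_odds: maps any real number back into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-log_odds_value))


for p in (0.1, 0.5, 0.9):
    print(p, log_odds(p))   # values increase with p; p = 0.5 gives 0.0
```

Because the mapping is strictly increasing, ranking cases by log odds is the same as ranking them by probability.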


Why Log Odds?
• Shape of curve is desirable
• Relationship to probability
• Range: −∞ to +∞
P-value
• The probability, assuming the null hypothesis is true, of observing a value at least as extreme as the one actually observed
• http://en.wikipedia.org/wiki/P-value
• http://sportsci.org/resource/stats/pvalues.html
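
As a sketch of the idea, the code below computes p-values for an observed test statistic z, assuming that z follows a standard normal distribution under the null hypothesis. That distributional assumption and the function names are choices made for this example.

```python
import math


def upper_tail_p_value(z):
    """One-sided p-value: P(Z >= z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def two_sided_p_value(z):
    """Two-sided p-value: P(|Z| >= |z|) for a standard normal Z."""
    return math.erfc(abs(z) / math.sqrt(2.0))


print(upper_tail_p_value(1.96))   # roughly 0.025
print(two_sided_p_value(1.96))    # roughly 0.05
```

A small p-value means the observed value would be unusual if the null hypothesis held.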

Correlation
• Examine the degree to which the values for two variables behave similarly.
• Correlation coefficient r:
  • 1 = perfect correlation
  • -1 = perfect but opposite correlation
  • 0 = no correlation
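
The three reference values of r can be reproduced with a short sketch of the Pearson correlation coefficient, which is assumed here to be the coefficient the slide refers to. The function name and the data are illustrative only.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return numerator / denominator


same = pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])        # moves together
opposite = pearson_r([1.0, 2.0, 3.0], [6.0, 4.0, 2.0])    # moves in opposite directions
unrelated = pearson_r([-1.0, 0.0, 1.0], [1.0, 0.0, 1.0])  # no linear relationship
print(same, opposite, unrelated)
```

Note that the last pair is perfectly related by y = x^2, so r = 0 rules out only a linear relationship.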
Covariance
• Degree to which two variables vary in the same manner
• Correlation is normalized and covariance is not
• http://www.ds.unifi.it/VL/VL_EN/expect/expect3.html
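
The difference between the two measures shows up when the units change. The sketch below uses the population covariance (dividing by n, an assumption for this example) and hypothetical height and weight data: converting metres to centimetres multiplies the covariance by 100 but leaves the normalized value unchanged.

```python
import math


def covariance(xs, ys):
    """Population covariance: mean of (x - mean_x) * (y - mean_y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


def normalized(xs, ys):
    """Covariance divided by both standard deviations (the correlation)."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))


heights_m = [1.5, 1.6, 1.7, 1.8]           # hypothetical heights in metres
weights_kg = [50.0, 58.0, 63.0, 72.0]      # hypothetical weights in kilograms
heights_cm = [100.0 * h for h in heights_m]

print(covariance(heights_m, weights_kg), covariance(heights_cm, weights_kg))
print(normalized(heights_m, weights_kg), normalized(heights_cm, weights_kg))
```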

Residual
• Error
• Difference between desired output and predicted output
• May actually use the sum of squares
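
A short sketch of both quantities, using made-up desired and predicted outputs; the function name `residuals` is an assumption for this example.

```python
def residuals(desired, predicted):
    """Residual for each example: desired output minus predicted output."""
    return [d - p for d, p in zip(desired, predicted)]


desired = [3.0, 5.0, 7.0]
predicted = [2.5, 5.0, 8.0]
res = residuals(desired, predicted)             # [0.5, 0.0, -1.0]
sum_of_squares = sum(r * r for r in res)        # 0.25 + 0.0 + 1.0 = 1.25
print(res, sum_of_squares)
```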
