Linear Regression and Correlation
Topic 18
Linear Regression
Linear regression models the link between two factors, i.e. how one value depends on the other.
E.g. Driver's age – risk of accident
Gender – time spent shopping
Car price – depends on age (of car)
Sales – depend on marketing
Crickets and Temperature
Crickets make their chirping sounds by rapidly sliding one wing over the other.
The faster they move their wings, the higher the chirping sound that is produced.
Analysing the data
First graph the data using the XY (Scatter) option.
Then right click on one of the data points and select Add Trendline.
Select the Linear Regression type.
Now right click on the Trendline and select Format Trendline, then select Options, and finally select Display equation on Chart.
We can now predict the Temperature.
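The same trendline can be sketched outside Excel. The block below is a minimal least-squares fit in plain Python; the function name `fit_line` and the five readings are invented for illustration and are not the data from the slides.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (numerator) and sum of squared x deviations (denominator)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical readings, made up for this sketch (NOT the slide data)
wing_beats = [14.0, 15.0, 16.0, 17.0, 18.0]      # beats per second
temperatures = [22.0, 24.5, 26.0, 28.5, 30.0]    # degrees C

slope, intercept = fit_line(wing_beats, temperatures)
print(f"Temperature = {slope:.4f} * beats + {intercept:.4f}")
```

The printed equation plays the same role as the one Excel displays on the chart.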
Line of Best Fit
You can see differences between the measured values and the calculated values. Why?
Mean Squared Error (MSE)
The mean squared error (MSE) of an estimator is the expected value of the square of the "error".
The error is the amount by which the estimator differs from the quantity to be estimated.
The difference occurs because of randomness, or because the estimator doesn't account for information that could produce a more accurate estimate.
Root Mean Square Error
The root mean square error (RMSE) is a frequently used measure of the difference between values predicted by a model and the values actually observed from the thing being modelled or estimated. It is the square root of the MSE.
The lower the value of the RMSE, the better the fit of calculated to observed data.
RMSE
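As a rough sketch, the MSE can be estimated from a sample as the average of the squared errors, and the RMSE is its square root. The function names and the four value pairs below are assumptions made for illustration only.

```python
import math


def mean_squared_error(observed, predicted):
    """Average of the squared differences between observed and predicted values."""
    errors = [o - p for o, p in zip(observed, predicted)]
    return sum(e * e for e in errors) / len(errors)


def root_mean_squared_error(observed, predicted):
    """Square root of the MSE, expressed in the same units as the data."""
    return math.sqrt(mean_squared_error(observed, predicted))


# Hypothetical values, made up for this sketch
observed = [20.0, 22.0, 25.0, 27.0]
predicted = [21.0, 21.0, 26.0, 25.0]

print("MSE :", mean_squared_error(observed, predicted))
print("RMSE:", root_mean_squared_error(observed, predicted))
```

Because the RMSE is in the same units as the measurements, it is the figure usually quoted as the "plus or minus" on a prediction.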
Stating the Error
For our crickets we could then say:
Temperature Y = 1.8635X – 3.7532
where X is the recorded beats per second of the cricket's wings,
accurate to ± 2.07 °C.
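The stated equation can be wrapped in a small function. This sketch assumes that the 2.07 figure is the RMSE of the fit and simply adds and subtracts it from the estimate; the function names are illustrative.

```python
SLOPE = 1.8635       # from the trendline equation on the slide
INTERCEPT = -3.7532  # from the trendline equation on the slide
ERROR = 2.07         # assumed here to be the RMSE, in degrees C


def predict_temperature(beats_per_second):
    """Estimate the temperature (degrees C) from wing beats per second."""
    return SLOPE * beats_per_second + INTERCEPT


def prediction_range(beats_per_second):
    """Return (low, high): the estimate minus and plus the stated error."""
    estimate = predict_temperature(beats_per_second)
    return estimate - ERROR, estimate + ERROR


low, high = prediction_range(16.0)
print(f"Estimate: {predict_temperature(16.0):.2f} C, range {low:.2f} to {high:.2f} C")
```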
Correlation Coefficient
The correlation coefficient is a measure of how well trends in the predicted values follow trends in the actual values.
It is a measure of how well the predicted values from a forecast model "fit" with the real-life data.
Correlation Coefficient
The correlation coefficient is a number between -1 and +1.
If there is no relationship between the predicted values and the actual values, the correlation coefficient is 0 or close to 0 (the predicted values are no better than random numbers).
As the strength of the relationship between the predicted values and actual values increases, so does the magnitude of the correlation coefficient.
A perfect fit gives a coefficient of +1.0 or -1.0. Thus the closer the correlation coefficient is to +1 or -1, the better.
A demonstration of correlation
Correlation
The two main methods of calculating correlations are:
Spearman's Rank Correlation Coefficient and
Pearson's or the Product-Moment Correlation Coefficient.
Spearman's Rank Correlation Coefficient
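In calculating this coefficient, we use the Greek letter 'rho' (ρ).
The formula used to calculate this coefficient is:
$\rho = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}$
where d is the difference between the two ranks of each pair of values and n is the number of pairs.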
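A minimal sketch of the Spearman calculation: rank each list, take the rank differences d, and apply the formula. The helper names are illustrative, and the sketch assumes there are no tied values, since the simple formula is only exact in that case.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward. Assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(xs, ys):
    """Spearman's rank correlation: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - (6 * d_squared) / (n * (n ** 2 - 1))


# Hypothetical values, made up for this sketch
xs = [10, 20, 30, 40, 50]
ys = [7, 3, 12, 9, 15]
print("rho =", spearman_rho(xs, ys))
```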
Pearson's or Product-Moment
Correlation Coefficient
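The Pearson Correlation Coefficient is denoted by the symbol r. Its formula is based on the standard deviations of the x-values and the y-values:
$r = \frac{\operatorname{cov}(x, y)}{s_x s_y}$
where $s_x$ and $s_y$ are the standard deviations of the x-values and the y-values, and cov(x, y) is their covariance.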
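One common way to compute Pearson's r is to divide the covariance of x and y by the product of their standard deviations. The sketch below does this in plain Python, using sample (n - 1) versions of both; the function name and the data are illustrative assumptions.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r: covariance of x and y divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return covariance / (sd_x * sd_y)


# Hypothetical readings, made up for this sketch
wing_beats = [14.0, 15.0, 16.0, 17.0, 18.0]
temperatures = [22.0, 24.5, 26.0, 28.5, 30.0]
print("r =", pearson_r(wing_beats, temperatures))
```

The (n - 1) factors cancel, so using population standard deviations and covariance throughout would give the same r.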
Coefficient of Determination R Squared
Shows the proportion of the variation in y that is explained by x.
The version most common in statistics texts is based on an analysis of variance decomposition as follows:
$SST = SSR + SSE$, $R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$
SST is the total sum of squares, SSR is the explained sum of squares, and SSE is the residual sum of squares.
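The three sums of squares can be checked numerically. This sketch fits a least-squares line, then computes SST, SSR and SSE and takes R squared as SSR / SST; the function name and the data are assumptions made for illustration.

```python
def sums_of_squares(xs, ys):
    """Fit a least-squares line and return (SST, SSR, SSE, R squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    fitted = [slope * x + intercept for x in xs]

    sst = sum((y - mean_y) ** 2 for y in ys)             # total variation in y
    ssr = sum((f - mean_y) ** 2 for f in fitted)         # variation explained by the line
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))  # residual (unexplained) variation
    return sst, ssr, sse, ssr / sst


# Hypothetical values, made up for this sketch
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
sst, ssr, sse, r_squared = sums_of_squares(xs, ys)
print(f"SST={sst:.2f} SSR={ssr:.2f} SSE={sse:.2f} R^2={r_squared:.2f}")
```

For a least-squares line with an intercept, SSR and SSE add up to SST, so SSR / SST and 1 - SSE / SST agree.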
Coefficient of Determination R Squared
Thankfully Excel
calculates this for you: