Chapter 5
Regression
BPS - 3rd Ed.
Chapter 5
1
Objectives of Regression
To describe the change in Y per unit X
To predict the average level of Y at a given level of X
“Returning Birds” Example
Plot data first to see if
relation can be described
by straight line
(important!)
Illustrative data from
Exercise 4.4
Y = adult birds joining
colony
X = percent of birds
returning, prior year
If data can be described by a straight line …
describe the relationship with an equation:
Y = (intercept) + (slope)(X)
May also be written:
Y = (slope)(X) + (intercept)
Intercept: where the line crosses the Y axis (the value of Y when X = 0)
Slope: “angle” (steepness) of the line, the change in Y per unit of X
Linear Regression
Algebraic line: every point falls on the line:
exact y = intercept + (slope)(X)
Statistical line: scatter cloud suggests a linear trend:
“predicted y” = intercept + (slope)(X)
Regression Equation
ŷ = a + bx, where
– ŷ (“y-hat”) is the predicted value of Y
– a is the intercept
– b is the slope
– x is a value for X
Determine a & b for the “best fitting line”
The TI calculators reverse a & b!
What Line Fits Best?
If we try to draw the
line by eye, different
people will draw
different lines
We need a method
to draw the “best
line”
This method is
called “least
squares”
The “least squares” regression line
Each point has a residual:
Residual = observed y – predicted y
= vertical distance of the point from the prediction line
The least squares line minimizes the sum of the squared residuals
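As a rough sketch of what "sum of the squared residuals" means in practice, the Python below compares two candidate lines on the GDP and life-expectancy figures from this chapter's case study. The first line uses the coefficients the chapter reports for that data; the second is a hypothetical "by eye" line invented here purely for comparison. The function names are illustrative and do not come from the text.

```python
# Per capita GDP (x) and life expectancy (y) from the chapter's case study.
gdp = [21.4, 23.2, 20.0, 22.7, 20.8, 18.6, 21.5, 22.0, 23.8, 21.2]
life = [77.48, 77.53, 77.32, 78.63, 77.17, 76.39, 78.51, 78.15, 78.99, 77.37]


def residuals(a, b, xs, ys):
    """Observed y minus predicted y (a + b*x) for every data point."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


def sum_squared_residuals(a, b, xs, ys):
    """The quantity that the least squares line makes as small as possible."""
    return sum(e ** 2 for e in residuals(a, b, xs, ys))


# Line reported in the chapter for this data (rounded coefficients).
sse_fitted = sum_squared_residuals(68.716, 0.420, gdp, life)

# Hypothetical line drawn "by eye" (an assumption, not from the source).
sse_by_eye = sum_squared_residuals(67.0, 0.500, gdp, life)

print(f"fitted line: {sse_fitted:.3f}")
print(f"by-eye line: {sse_by_eye:.3f}")
```

Any other straight line tried in place of the by-eye line should also give a sum at least as large as the least squares value, apart from small effects of rounding the fitted coefficients.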
Calculating Least Squares
Regression Coefficients
Formula
(next slide)
Technology
– TI-30XIIS
– Two variable Applet
– Other
Formulas
b = slope coefficient
a = intercept coefficient
$$b = r\,\frac{s_y}{s_x}$$
$$a = \bar{y} - b\,\bar{x}$$
where $s_x$ and $s_y$ are the standard deviations of
the two variables, and $r$ is their correlation
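The two formulas translate directly into code. The sketch below is one possible rendering: the function name and argument names are made up for illustration, and the numbers passed in are the summary statistics the chapter lists for the life expectancy case study.

```python
def least_squares_from_summary(x_bar, y_bar, s_x, s_y, r):
    """Return (a, b) using b = r * s_y / s_x and a = y_bar - b * x_bar."""
    b = r * s_y / s_x
    a = y_bar - b * x_bar
    return a, b


# Summary statistics from the life expectancy case study in this chapter.
a, b = least_squares_from_summary(
    x_bar=21.52, y_bar=77.754, s_x=1.532, s_y=0.795, r=0.809
)
print(f"y-hat = {a:.3f} + {b:.3f}x")
```

Because the code carries the unrounded slope into the intercept, its intercept can differ in the third decimal place from a hand calculation that rounds b to 0.420 first.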
Technology: Calculator
BEWARE!
TI calculators label the slope and
intercept backwards!
Regression Line
For the “bird data”:
a = 31.9343
b = -0.3040
The linear regression equation is:
ŷ = 31.9343 - 0.3040x
The slope (-0.3040) represents the
average change in Y per unit X
Use of Regression for Prediction
Suppose an individual colony has 60% returning (x =
60). What is the predicted number of new birds for
this colony?
Answer:
ŷ = a + bx = 31.9343 + (-0.3040)(60) = 13.69
Interpretation: the regression model predicts
13.69 new birds (ŷ) for a colony with x = 60.
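The same prediction can be written as a tiny function. This is only a sketch: the constant and function names are invented here, while the coefficients are the ones given for the bird data, with the slope taken as negative as the slide's "(-0.3040)" indicates.

```python
BIRD_INTERCEPT = 31.9343  # a, from the bird data
BIRD_SLOPE = -0.3040      # b, negative: more returning birds, fewer new birds


def predict_new_birds(percent_returning):
    """Predicted number of new adult birds (y-hat) for a given percent returning."""
    return BIRD_INTERCEPT + BIRD_SLOPE * percent_returning


print(round(predict_new_birds(60), 2))
```

Evaluating the function at two x values one unit apart gives predictions that differ by the slope, which is the "average change in Y per unit X" reading of b.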
Prediction via Regression Line
Number of new birds and Percent returning
When X = 60,
the regression
model predicts
Y = 13.69
Case Study
Per Capita Gross Domestic Product
and Average Life Expectancy for
Countries in Western Europe
Regression Calculation
Case Study
Country          Per Capita GDP (x)   Life Expectancy (y)
Austria          21.4                 77.48
Belgium          23.2                 77.53
Finland          20.0                 77.32
France           22.7                 78.63
Germany          20.8                 77.17
Ireland          18.6                 76.39
Italy            21.5                 78.51
Netherlands      22.0                 78.15
Switzerland      23.8                 78.99
United Kingdom   21.2                 77.37
Life Expectancy and GDP (Europe)
Case Study (Life Expectancy)
[Scatterplot: life expectancy (yrs) against per capita GDP]
Regression Calculation
by Hand (Life Expectancy Study)
Calculations:
$$\bar{x} = 21.52, \quad s_x = 1.532, \quad \bar{y} = 77.754, \quad s_y = 0.795, \quad r = 0.809$$
$$b = r\,\frac{s_y}{s_x} = (0.809)\,\frac{0.795}{1.532} = 0.420$$
$$a = \bar{y} - b\,\bar{x} = 77.754 - (0.420)(21.52) = 68.716$$
ŷ = 68.716 + 0.420x
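For readers who want to check the hand calculation against the raw data, here is a minimal sketch using Python's standard `statistics` module. The `correlation` helper is written out by hand and is an illustrative choice, not something the text prescribes; the data are the ten countries from the case study table.

```python
from statistics import mean, stdev

# Per capita GDP (x) and life expectancy (y) for the ten countries.
gdp = [21.4, 23.2, 20.0, 22.7, 20.8, 18.6, 21.5, 22.0, 23.8, 21.2]
life = [77.48, 77.53, 77.32, 78.63, 77.17, 76.39, 78.51, 78.15, 78.99, 77.37]


def correlation(xs, ys):
    """Correlation r: average product of standardized values, dividing by n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    products = ((x - x_bar) / s_x * (y - y_bar) / s_y for x, y in zip(xs, ys))
    return sum(products) / (len(xs) - 1)


r = correlation(gdp, life)
b = r * stdev(life) / stdev(gdp)  # slope
a = mean(life) - b * mean(gdp)    # intercept

print(f"x-bar = {mean(gdp):.2f}, s_x = {stdev(gdp):.3f}")
print(f"y-bar = {mean(life):.3f}, s_y = {stdev(life):.3f}")
print(f"r = {r:.3f}, b = {b:.3f}, a = {a:.3f}")
```

Small differences from the hand-worked values in the last decimal place are expected, since the hand calculation rounds each intermediate result.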
BPS/3e Two Variable Applet
Applet: Data Entry
Applet: Calculations
Applet: Scatterplot
Applet: least squares line
Interpretation
Life Expectancy Case Study
Model: ŷ = 68.716 + (0.420)X
Slope: For each one-unit increase in per capita GDP,
predicted life expectancy increases by 0.420 years
Prediction example: What is the predicted life
expectancy in a country with a GDP of 20.0?
ANSWER:
ŷ = 68.716 + (0.420)(20.0) = 77.12
Coefficient of Determination (R²)
(Fact 4 on p. 111)
The “coefficient of determination” (R²) quantifies the fraction
of the variation in Y “mathematically explained” by X
Examples:
r = 1: R² = 1: regression line explains all (100%) of
the variation in Y
r = .7: R² = .49: regression line explains almost half
(49%) of the variation in Y
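To make "fraction of the variation explained" concrete, the sketch below computes R² two ways for the life expectancy case study data: as the square of the correlation, and as one minus the ratio of the squared residuals to the total variation of Y about its mean. The function names are illustrative, and the second formula is a standard identity for the least squares line rather than something stated on the slide.

```python
from statistics import mean

gdp = [21.4, 23.2, 20.0, 22.7, 20.8, 18.6, 21.5, 22.0, 23.8, 21.2]
life = [77.48, 77.53, 77.32, 78.63, 77.17, 76.39, 78.51, 78.15, 78.99, 77.37]


def r_squared(xs, ys):
    """Square of the correlation r between xs and ys."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy ** 2 / (sxx * syy)


def fraction_explained(xs, ys):
    """1 - (sum of squared residuals) / (total variation of y about its mean)."""
    x_bar, y_bar = mean(xs), mean(ys)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    a = y_bar - b * x_bar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - y_bar) ** 2 for y in ys)
    return 1 - sse / sst


print(f"r squared:          {r_squared(gdp, life):.3f}")
print(f"fraction explained: {fraction_explained(gdp, life):.3f}")
```

The two printed values should agree up to floating-point error, which is why the squared correlation can be read as a fraction of variation.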
We are NOT going to cover the analysis of
residual plots (pp. 113-116)
Outliers and Influential Points
An outlier is an observation that lies far from the regression line
Outliers in the y direction have large residuals
Outliers in the x direction are often influential
– removal of an influential point would markedly
change the regression and correlation values
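One way to check whether a point is influential is to refit the line without it and compare. The sketch below does this with the life expectancy case study data, dropping Ireland only because it has the smallest GDP in the table; the source does not claim Ireland is an outlier, and the helper name `fit_line` is invented for this illustration.

```python
from statistics import mean

# (country, per capita GDP, life expectancy) rows from the case study table.
rows = [
    ("Austria", 21.4, 77.48), ("Belgium", 23.2, 77.53),
    ("Finland", 20.0, 77.32), ("France", 22.7, 78.63),
    ("Germany", 20.8, 77.17), ("Ireland", 18.6, 76.39),
    ("Italy", 21.5, 78.51), ("Netherlands", 22.0, 78.15),
    ("Switzerland", 23.8, 78.99), ("United Kingdom", 21.2, 77.37),
]


def fit_line(points):
    """Return (intercept, slope, r) of the least squares line for (x, y) pairs."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope, sxy / (sxx * syy) ** 0.5


all_points = [(x, y) for _, x, y in rows]
without_ireland = [(x, y) for name, x, y in rows if name != "Ireland"]

for label, pts in (("all countries", all_points), ("without Ireland", without_ireland)):
    a, b, r = fit_line(pts)
    print(f"{label}: y-hat = {a:.3f} + {b:.3f}x, r = {r:.3f}")
```

How large a change counts as "marked" is a judgment call; the comparison only makes the size of the change visible.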
Outliers:
Case Study
Gesell Adaptive Score and Age at First Word
From all the data: r² = 41%
After removing child 18: r² = 11%
Cautions
About Correlation and Regression
Describe only linear relationships
Are influenced by outliers
Cannot be used to predict beyond the
range of X (do not extrapolate)
Beware of lurking variables (variables other
than X and Y)
– Association does not always equal causation!
Do not extrapolate (Sarah’s height)
Sarah’s height is plotted against her age
Can you predict her height at age 42 months?
Can you predict her height at age 30 years (360 months)?
[Plot: height (cm) against age (months)]
Do not extrapolate (Sarah’s height)
Regression equation:
ŷ = 71.95 + .383(X)
At age 42 months:
ŷ = 71.95 + .383(42) = 88
(Reasonable)
At age 360 months:
ŷ = 71.95 + .383(360) = 209.8
(That’s nearly 7 feet tall!)
[Plot: height (cm) against age (months), with the line extended to 390 months]
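A simple guard against extrapolation is to refuse predictions outside the range of X that was actually observed. The sketch below applies that idea to Sarah's regression equation. The observed range of 36 to 60 months is an assumption made for illustration, since the transcript does not list her data, and the function name is hypothetical.

```python
def predict_height_cm(age_months, observed_range=(36, 60)):
    """Predict height from y-hat = 71.95 + 0.383x, but only inside the observed ages.

    observed_range is an assumed (min, max) of the ages in the data, in months.
    """
    low, high = observed_range
    if not low <= age_months <= high:
        raise ValueError(
            f"age {age_months} months is outside the observed range "
            f"{low}-{high}; do not extrapolate"
        )
    return 71.95 + 0.383 * age_months


print(round(predict_height_cm(42)))  # inside the range, so a prediction is returned

try:
    predict_height_cm(360)  # 30 years: far outside the observed ages
except ValueError as error:
    print(error)
```

Raising an error is one design choice among several; returning the prediction with a warning would also work, as long as the caller cannot miss that the value lies outside the data.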
Caution: Correlation does not always
mean causation
Even very strong correlations
may not correspond to a causal
relationship between x and y
(Beware of the lurking variable!)