13-w11-stats250-bgunderson-chapter-14

Author(s): Brenda Gunderson, Ph.D., 2011
License: Unless otherwise noted, this material is made available under the
terms of the Creative Commons Attribution–Non-commercial–Share
Alike 3.0 License: http://creativecommons.org/licenses/by-nc-sa/3.0/
We have reviewed this material in accordance with U.S. Copyright Law and have tried to maximize your
ability to use, share, and adapt it.
Copyright holders of content included in this material should contact [email protected] with any
questions, corrections, or clarification regarding the use of content.
For more information about how to cite these materials visit http://open.umich.edu/education/about/terms-of-use.
Recall our Regression Example: Exam 2 vs Final
Exam 2 (x):   33   65   44   64   60   40
Final (y):    53   80   78   93   88   58
Least Squares Regression Line:
$$\hat{y} = 21.67 + 1.046(x)$$
r² = 0.791
r = 0.889
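As a check on the numbers above, here is a minimal Python sketch that recomputes the least squares line and the correlation from the six score pairs. It assumes the scores are paired in the order listed; the variable names are illustrative and not part of the original slides.

```python
# Exam scores from the example, assumed paired in the order listed.
exam2 = [33, 65, 44, 64, 60, 40]   # x: exam 2 scores (out of 75)
final = [53, 80, 78, 93, 88, 58]   # y: final exam scores (out of 100)

n = len(exam2)
x_bar = sum(exam2) / n
y_bar = sum(final) / n

# Sums of squares and cross-products about the means.
s_xx = sum((x - x_bar) ** 2 for x in exam2)
s_yy = sum((y - y_bar) ** 2 for y in final)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(exam2, final))

b1 = s_xy / s_xx                    # sample slope
b0 = y_bar - b1 * x_bar             # sample intercept
r = s_xy / (s_xx * s_yy) ** 0.5     # sample correlation

print(f"y-hat = {b0:.2f} + {b1:.3f}(x)")
print(f"r = {r:.3f}, r^2 = {r ** 2:.3f}")
```

The printed slope, intercept, r, and r² should agree with the values on the slide after rounding.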
On to Inference: Sample Regression Line vs Population Regression Line
(pg 192)
[Figure: Regression Line for the Sample]
[Figure: Regression Line for the Population]
From Utts, Jessica M. and Robert F. Heckard. Mind on Statistics, Fourth Edition. 2012.
Used with permission.
Inference in Linear Regression
Linear Model:
Response y = [β0 + β1(x)] + ε
= [Population relationship] + Randomness
For each x, the population of y values is
normally distributed
with some mean (which may depend on x in a linear way)
and a standard deviation σ that does not depend on x.
From Utts, Jessica M. and Robert F. Heckard. Mind on Statistics, Fourth Edition. 2012.
Used with permission.
The ε's are the true error terms (not observed);
they have a normal distribution
with mean 0 and standard deviation σ.
We cannot see the ε's, but we can see the residuals (observed errors),
so we use the residuals to assess whether the assumptions about the true
errors are reasonable.
Goals in Regression: (pg 194)
1. Estimate the regression line based on data.
2. Measure the strength of the linear relationship with the correlation.
3. Use the estimated equation for predictions.
4. Assess whether the linear relationship is statistically significant.
5. Provide interval estimates (CIs) for our predictions.
6. Understand and check the assumptions of our model.
Estimating Std Dev for Regression

Measuring the average size of the residuals:
$$s = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n - 2}} = \sqrt{\frac{SSE}{n - 2}}$$
Note: Why n – 2?
Estimating the Standard Deviation:
Exam 2 and Final Exam Scores

Model Summary (b)
Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .889 (a)   .791       .738                8.24671
a. Predictors: (Constant), exam 2 scores (out of 75)
b. Dependent Variable: final exam scores (out of 100)

ANOVA (b)
Model 1      Sum of Squares   df   Mean Square   F        Sig.
Regression   1027.967          1   1027.967      15.115   .018 (a)
Residual      272.033          4     68.008
Total        1300.000          5
a. Predictors: (Constant), exam 2 scores (out of 75)
b. Dependent Variable: final exam scores (out of 100)
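The "Std. Error of the Estimate" in the output above can be reproduced by hand. The following is a rough Python sketch, assuming the six score pairs from the example are paired in the order listed: it fits the line, sums the squared residuals (SSE), and divides by n - 2 before taking the square root.

```python
import math

exam2 = [33, 65, 44, 64, 60, 40]   # x: exam 2 scores
final = [53, 80, 78, 93, 88, 58]   # y: final exam scores
n = len(exam2)

x_bar = sum(exam2) / n
y_bar = sum(final) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(exam2, final))
      / sum((x - x_bar) ** 2 for x in exam2))
b0 = y_bar - b1 * x_bar

# Residual = observed y minus fitted y-hat.
residuals = [y - (b0 + b1 * x) for x, y in zip(exam2, final)]
sse = sum(e ** 2 for e in residuals)   # Residual Sum of Squares
s = math.sqrt(sse / (n - 2))           # divide by n - 2, not n

print(f"SSE = {sse:.3f}, s = {s:.5f}")
```

The SSE should match the Residual row of the ANOVA table, and s is the square root of the Residual Mean Square.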
Significant Linear Relationship?
(pg 195)
H0: β1 = 0 versus Ha: β1 ≠ 0

What happens if the null hypothesis is true?
$$t = \frac{\text{sample statistic} - \text{null value}}{\text{standard error of the sample statistic}}$$
t-test for the population slope β1
To test H0: β1 = 0 we would use
$$t = \frac{b_1 - 0}{\mathrm{s.e.}(b_1)}$$
where
$$\mathrm{s.e.}(b_1) = \frac{s}{\sqrt{\sum (x_i - \bar{x})^2}}$$
and the degrees of freedom for the t-distribution are n – 2.
This could be modified to test a variety of hypotheses.
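To make the t-test concrete, here is a small Python sketch that computes s.e.(b1) and the t statistic for the exam data. The critical value 2.776 is an assumption taken from a standard t table (df = 4, two-sided 5% level); it is hard-coded rather than computed, and the variable names are illustrative.

```python
import math

exam2 = [33, 65, 44, 64, 60, 40]   # x: exam 2 scores
final = [53, 80, 78, 93, 88, 58]   # y: final exam scores
n = len(exam2)

x_bar = sum(exam2) / n
y_bar = sum(final) / n
s_xx = sum((x - x_bar) ** 2 for x in exam2)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(exam2, final))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Estimated standard deviation of the errors, with n - 2 degrees of freedom.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(exam2, final))
s = math.sqrt(sse / (n - 2))

se_b1 = s / math.sqrt(s_xx)        # standard error of the slope
t_stat = (b1 - 0) / se_b1          # null value is 0 under H0

T_CRIT = 2.776                     # assumed t-table value, df = 4, alpha = 0.05 two-sided
reject_h0 = abs(t_stat) > T_CRIT

print(f"s.e.(b1) = {se_b1:.3f}, t = {t_stat:.3f}, reject H0: {reject_h0}")
```

To test a different null value for the slope, only the `0` in the numerator of `t_stat` would change.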
Try It! Significant Linear Relationship
between Exam 2 and Final Scores?
Is there a significant (non-zero) linear relationship
between exam 2 score and final exam score?
Is exam 2 a useful linear predictor for final score?
Test H0: β1 = 0 versus Ha: β1 ≠ 0 at the 5% level.
A = Yes or B = No

pg 196
Based on the previous t-test at the 5% significance
level, do you think a 95% confidence interval
for the true slope would contain the value 0?
Exam 2 and Final Exam Scores
Confidence Interval for population slope β1
$$b_1 \pm t^* \, \mathrm{s.e.}(b_1)$$
where df = n – 2 for the t* value
Compute the 95% CI for the population slope.
Could you interpret the 95% confidence level?
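One way to carry out the interval computation is sketched below in Python. It assumes the rounded slope b1 = 1.046 and s = 8.24671 from the earlier output, and a t* of 2.776 taken from a standard t table for df = 4; none of these are computed from scratch here.

```python
import math

exam2 = [33, 65, 44, 64, 60, 40]   # x: exam 2 scores
n = len(exam2)
x_bar = sum(exam2) / n
s_xx = sum((x - x_bar) ** 2 for x in exam2)

b1 = 1.046        # sample slope from the fitted line (rounded)
s = 8.24671       # Std. Error of the Estimate from the SPSS output
T_STAR = 2.776    # assumed t-table multiplier for 95% confidence, df = n - 2 = 4

se_b1 = s / math.sqrt(s_xx)
margin = T_STAR * se_b1
ci_slope = (b1 - margin, b1 + margin)

print(f"95% CI for the slope: ({ci_slope[0]:.3f}, {ci_slope[1]:.3f})")
```

If the interval excludes 0, that is consistent with rejecting H0: β1 = 0 at the 5% level in the two-sided t-test.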
Inference about the population slope using SPSS
Coefficients (a)
                             Unstandardized Coefficients   Standardized Coefficients
Model 1                      B         Std. Error          Beta                        t       Sig.
(Constant)                   21.667    14.125                                          1.534   .200
exam 2 scores (out of 75)     1.046      .269              .889                        3.888   .018
a. Dependent Variable: final exam scores (out of 100)
SPSS ANOVA F-test for Regression
Note: Third way to test H0: β1 = 0 versus Ha: β1 ≠ 0
ANOVA (b)
Model 1      Sum of Squares   df   Mean Square   F        Sig.
Regression   1027.967          1   1027.967      15.115   .018 (a)
Residual      272.033          4     68.008
Total        1300.000          5
a. Predictors: (Constant), exam 2 scores (out of 75)
b. Dependent Variable: final exam scores (out of 100)
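The F statistic in the ANOVA table is the ratio of the two mean squares, and in simple linear regression it equals the square of the slope's t statistic. A brief Python sketch, using only the rounded values printed in the SPSS output (so agreement is approximate):

```python
# Values copied from the SPSS ANOVA and Coefficients output (rounded).
ms_regression = 1027.967
ms_residual = 68.008
t_slope = 3.888

f_stat = ms_regression / ms_residual   # F = MS Regression / MS Residual
t_squared = t_slope ** 2               # should be close to F

print(f"F = {f_stat:.3f}, t^2 = {t_squared:.3f}")
```

The small gap between F and t² comes from rounding in the printed output. Both tests report the same Sig. value of .018.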
Recap pages 195-196
Learning about the population slope β1
1. t-test for β1: $t = \frac{b_1 - \text{null}}{\mathrm{s.e.}(b_1)}$, df = n – 2
2. CI for β1: $b_1 \pm t^* \, \mathrm{s.e.}(b_1)$, df = n – 2
3. F-test for β1: $F = \frac{MS_{\text{Regr}}}{MS_{\text{Error}}}$, df = 1, n – 2
Which of the following could be used to test
H0: β1 = 0 vs Ha: β1 ≠ 0? Select all that apply.
A) t-test
B) CI
C) F-test
Which of the following could be used to test
H0: β1 = 2 vs Ha: β1 ≠ 2? Select all that apply.
A) t-test
B) CI
C) F-test
Which of the following could be used to test
H0: β1 = 0 vs Ha: β1 > 0? Select all that apply.
A) t-test
B) CI
C) F-test
Predicting for Individuals versus Estimating the Mean
$$\hat{y} = 21.67 + 1.046(x)$$

How would you predict the final exam score
for Barb who scored 60 points on exam 2?

How would you estimate the mean final exam score
for all students who scored 60 points on exam 2?
The estimate for predicting a future observation
and the estimate for the mean response are the same.
What about their standard errors?
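Both questions lead to the same point estimate: plug x = 60 into the fitted line. A tiny Python sketch, using the rounded coefficients from the slide (the function name is hypothetical):

```python
def predict_final(exam2_score):
    """Fitted final exam score from the rounded sample regression line."""
    return 21.67 + 1.046 * exam2_score

# Same number serves as Barb's predicted score and as the estimated
# mean final score for all students with 60 points on exam 2.
estimate_at_60 = predict_final(60)
print(f"y-hat at x = 60: {estimate_at_60:.2f}")
```

With unrounded coefficients the estimate would differ slightly in the second decimal place.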
Predicting for Individuals versus Estimating the Mean
A population of individuals and a population of means…




Std dev for a population of individuals?
Std dev for a population of means?
Which standard deviation is larger?
So a prediction interval for an individual response
will be (wider or narrower) than a
confidence interval for a mean response.
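Predicting for Individuals versus Estimating the Mean
Confidence interval for a mean response:
$$\hat{y} \pm t^* \, \mathrm{s.e.}(\text{fit})$$
where
$$\mathrm{s.e.}(\text{fit}) = s \sqrt{\frac{1}{n} + \frac{(x - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$$
df = n – 2
Prediction interval for an individual response:
$$\hat{y} \pm t^* \, \mathrm{s.e.}(\text{pred})$$
where
$$\mathrm{s.e.}(\text{pred}) = \sqrt{s^2 + \left(\mathrm{s.e.}(\text{fit})\right)^2}$$
df = n – 2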
Try It! Exam 2 versus Final Exam

Construct a 95% CI for the mean final exam score for all
students who scored x = 60 points on exam 2.
Recall:
n = 6, $\bar{x} = 51$, $\sum (x_i - \bar{x})^2 = S_{XX} = 940$,
$\hat{y} = 21.67 + 1.046(x)$, and s = 8.24671.
Confidence interval for a mean response:
$$\hat{y} \pm t^* \, \mathrm{s.e.}(\text{fit})$$
where
$$\mathrm{s.e.}(\text{fit}) = s \sqrt{\frac{1}{n} + \frac{(x - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$$
df = n – 2
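A possible way to work this Try It! numerically is sketched below in Python. It uses the recalled summary values and the rounded regression coefficients; the multiplier 2.776 is an assumption taken from a standard t table for df = 4, and the variable names are illustrative.

```python
import math

# Summary values recalled for the exam data.
n = 6
x_bar = 51.0
s_xx = 940.0      # sum of squared deviations of x about its mean
s = 8.24671       # estimated standard deviation of the errors
T_STAR = 2.776    # assumed t-table multiplier, 95% confidence, df = 4

x_new = 60
y_hat = 21.67 + 1.046 * x_new

# Standard error for estimating the MEAN response at x_new.
se_fit = s * math.sqrt(1 / n + (x_new - x_bar) ** 2 / s_xx)
margin = T_STAR * se_fit
ci_mean = (y_hat - margin, y_hat + margin)

print(f"s.e.(fit) = {se_fit:.3f}")
print(f"95% CI for the mean response: ({ci_mean[0]:.2f}, {ci_mean[1]:.2f})")
```

Note how the term (x - x̄)² makes s.e.(fit) grow as x moves away from the mean exam 2 score of 51.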
Try It! Exam 2 versus Final Exam

Construct a 95% PI for the final exam score for a student
who scored x = 60 points on exam 2.
Prediction interval for an individual response:
$$\hat{y} \pm t^* \, \mathrm{s.e.}(\text{pred})$$
where
$$\mathrm{s.e.}(\text{pred}) = \sqrt{s^2 + \left(\mathrm{s.e.}(\text{fit})\right)^2}$$
and
$$\mathrm{s.e.}(\text{fit}) = s \sqrt{\frac{1}{n} + \frac{(x - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$$
df = n – 2
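The prediction interval can be sketched the same way in Python. As before, the summary values and rounded coefficients come from the example, and the multiplier 2.776 is an assumed t-table value for df = 4. The sketch also computes the confidence interval width so the two can be compared.

```python
import math

# Summary values for the exam data.
n = 6
x_bar = 51.0
s_xx = 940.0
s = 8.24671
T_STAR = 2.776    # assumed t-table multiplier, 95% level, df = 4

x_new = 60
y_hat = 21.67 + 1.046 * x_new

se_fit = s * math.sqrt(1 / n + (x_new - x_bar) ** 2 / s_xx)
# An individual response adds the error variance s^2 on top of the
# uncertainty in the fitted mean.
se_pred = math.sqrt(s ** 2 + se_fit ** 2)

pi = (y_hat - T_STAR * se_pred, y_hat + T_STAR * se_pred)
ci_width = 2 * T_STAR * se_fit
pi_width = 2 * T_STAR * se_pred

print(f"s.e.(pred) = {se_pred:.3f}")
print(f"95% PI: ({pi[0]:.2f}, {pi[1]:.2f})")
print(f"PI width {pi_width:.2f} vs CI width {ci_width:.2f}")
```

With these inputs the prediction interval is more than twice as wide as the confidence interval for the mean, and its upper limit falls above 100, the maximum possible final score. That probably reflects how little information six observations provide about a single student.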
