A Broad Overview of Key Statistical Concepts

Diagnostics – Part II
Using statistical tests to check whether
the assumptions we made about the
model are realistic
Diagnostic methods
• Some simple (but subjective) plots. (Then)
• Some formal statistical tests. (Now)
Simple linear regression model
The response Yi is a function of a systematic linear
component and a random error component:
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$
with assumptions that:
• Error terms have mean 0, i.e., $E(\varepsilon_i) = 0$.
• $\varepsilon_i$ and $\varepsilon_j$ are uncorrelated (independent).
• Error terms have same variance, i.e., $\mathrm{Var}(\varepsilon_i) = \sigma^2$.
• Error terms $\varepsilon_i$ are normally distributed.
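All of the tests that follow work on the residuals, which stand in for the unobservable error terms. A minimal sketch of how those residuals could be obtained from a least-squares fit is given here. The function names and the data are made up for illustration and are not from the slides.

```python
from statistics import mean

def fit_line(x, y):
    """Return (b0, b1) for the least-squares line y = b0 + b1 * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def residuals(x, y):
    """Residuals e_i = y_i - (b0 + b1 * x_i), used as estimates of the errors."""
    b0, b1 = fit_line(x, y)
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Made-up illustrative data (an assumption, not data from the slides).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
e = residuals(x, y)
print(fit_line(x, y))
print(sum(e))  # least-squares residuals sum to zero, up to rounding
```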
Why should we keep NAGGING
ourselves about the model?
• All of the estimates, confidence intervals,
prediction intervals, hypothesis tests, etc.
have been developed assuming that the
model is correct.
• If the model is incorrect, then the formulas
and methods we use are at risk of being
incorrect. (Some are more forgiving than
others.)
Summary of the tests we’ll learn …
• Durbin-Watson test for detecting correlated
(adjacent) error terms.
• Modified Levene test for constant error
variance.
• (Ryan-Joiner) correlation test for normality
of error terms.
The Durbin-Watson test for
uncorrelated (adjacent) error terms
Durbin-Watson test statistic:
$$D = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2}$$
Compare D to Durbin-Watson test bounds in Table B.7:
• If D > upper bound (dU), conclude no positive correlation.
• If D < lower bound (dL), conclude positive correlation.
• If D is between the two bounds, the test is inconclusive.
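To make the statistic concrete, here is a small sketch that computes D directly from a list of residuals in time order. The two residual series are made up for illustration (they are not the Blaisdell data). D always lies between 0 and 4: values well below 2 point toward positive autocorrelation and values well above 2 toward negative autocorrelation.

```python
def durbin_watson(e):
    """D = sum_{t=2..n} (e_t - e_{t-1})^2 / sum_{t=1..n} e_t^2."""
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    den = sum(et ** 2 for et in e)
    return num / den

# Made-up residuals in time order (assumptions, for illustration only).
smooth = [0.5, 0.4, 0.3, 0.1, -0.1, -0.3, -0.4, -0.5]  # neighbours are alike
jumpy = [0.5, -0.4, 0.3, -0.1, 0.1, -0.3, 0.4, -0.5]   # neighbours alternate

print(durbin_watson(smooth))  # well below 2
print(durbin_watson(jumpy))   # well above 2
```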
Example: Blaisdell Company
[Regression plot: Company Sales ($ millions) vs. Industry Sales ($ millions)]
Company = -1.45475 + 0.176283 Industry
S = 0.0860563, R-Sq = 99.9 %, R-Sq(adj) = 99.9 %
Seasonally adjusted quarterly data, 1988 to 1992.
Reasonable fit, but are the error terms positively
auto-correlated?
Blaisdell Company Example:
Durbin-Watson test
• Stat >> Regression >> Regression. Under
Options…, select Durbin-Watson statistic.
• Durbin-Watson statistic = 0.73
• Table B.7 with level of significance α=0.01,
(p-1)=1 predictor variable, and n=20 (5 years,
4 quarters each) gives dL= 0.95 and dU=1.15.
• Since D=0.73 < dL=0.95, conclude error terms
are positively auto-correlated.
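The comparison against the table bounds can be written as a three-way decision. The sketch below uses the values quoted in this example (D = 0.73, dL = 0.95, dU = 1.15). The function name is hypothetical, and the bounds must still be looked up in the table for the chosen α, number of predictors, and n.

```python
def dw_positive_test(d, d_lower, d_upper):
    """One-sided Durbin-Watson decision for positive autocorrelation."""
    if d < d_lower:
        return "positive autocorrelation"
    if d > d_upper:
        return "no positive autocorrelation"
    return "inconclusive"

# Values from the Blaisdell example: D = 0.73, dL = 0.95, dU = 1.15.
print(dw_positive_test(0.73, 0.95, 1.15))
```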
For completeness’ sake … one more
thing about Durbin-Watson test
• If test for negative auto-correlation is
desired, use D*=4-D instead. If D* < dL,
then conclude error terms are negatively
auto-correlated.
• If two-sided test is desired (both positive
and negative auto-correlation possible),
conduct both one-sided tests, D and D*,
separately. Level of significance is then 2α.
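A rough sketch of the two-sided procedure follows: D is checked against the bounds for positive autocorrelation and D* = 4 - D for negative autocorrelation. The function name is hypothetical, and the bounds from the Blaisdell example are reused purely for illustration. If the bounds are tabulated at level α, the combined procedure has level 2α.

```python
def dw_two_sided(d, d_lower, d_upper):
    """Run both one-sided checks: D (positive) and D* = 4 - D (negative)."""
    d_star = 4.0 - d
    if d < d_lower:
        return "positive autocorrelation"
    if d_star < d_lower:
        return "negative autocorrelation"
    if d > d_upper and d_star > d_upper:
        return "no autocorrelation detected"
    return "inconclusive"

# Bounds reused from the Blaisdell example (dL = 0.95, dU = 1.15).
for d in (0.73, 2.0, 3.2):
    print(d, dw_two_sided(d, 0.95, 1.15))
```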
Modified Levene Test for
nonconstant error variance
• Divide the data set into two roughly equal-sized
groups, based on the level of X.
• If the error variance is either increasing or
decreasing with X, the absolute deviations of the
residuals around their group median will be larger
for one of the two groups.
• Two-sample t* to test whether mean of absolute
deviations for one group differs significantly from
mean of absolute deviations for second group.
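The three steps above can be sketched in a few lines. This is an illustration under stated assumptions, not Minitab's implementation: the groups are split at the median of X (one way to get roughly equal sizes), the t* uses a pooled variance, and the residuals are made up so that their spread grows with X.

```python
from math import sqrt
from statistics import mean, median

def modified_levene_t(residuals, x):
    """Pooled two-sample t* on absolute deviations around the group medians."""
    cut = median(x)  # assumption: split the data at the median of X
    g1 = [e for e, xi in zip(residuals, x) if xi <= cut]
    g2 = [e for e, xi in zip(residuals, x) if xi > cut]
    d1 = [abs(e - median(g1)) for e in g1]
    d2 = [abs(e - median(g2)) for e in g2]
    n1, n2 = len(d1), len(d2)
    m1, m2 = mean(d1), mean(d2)
    ss = sum((d - m1) ** 2 for d in d1) + sum((d - m2) ** 2 for d in d2)
    pooled_var = ss / (n1 + n2 - 2)
    t_star = (m1 - m2) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t_star, n1 + n2 - 2  # statistic and its degrees of freedom

# Made-up data: residual spread increases with x.
x = [1, 2, 3, 4, 5, 6, 7, 8]
resid = [0.1, -0.1, 0.2, -0.2, 1.0, -1.5, 2.0, -2.5]
t_star, df = modified_levene_t(resid, x)
print(t_star, df)
```

A large |t*| relative to the t distribution with n - 2 degrees of freedom suggests nonconstant error variance. With only two groups, the F-type Levene statistic reported by software should equal t* squared when the same median-centered deviations are used.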
Modified Levene Test in Minitab
• Use Manip >> Code >> Numeric to numeric … to
create a GROUP variable based on the values of X.
• Stat >> Regression >> Regression. Under Storage
…, select residuals.
• Stat >> Basic statistics >> 2 Variances … Specify
Samples (RESI1) and Subscripts (GROUP). Select
OK. Look in session window for Levene P-value.
Example: How is plutonium activity
related to alpha particle counts?
[Regression plot: alpha vs. plutonium]
alpha = 0.0070331 + 0.0055370 plutonium
S = 0.0125713, R-Sq = 91.6 %, R-Sq(adj) = 91.2 %
A residual versus fits plot suggesting
non-constant error variance
Plutonium Alpha Example:
Modified Levene’s Test
Levene's Test (any continuous distribution)
Test Statistic: 9.452
P-Value: 0.006
It is highly unlikely (P=0.006) that we’d get such an extreme
Levene statistic (L=9.452) if the variances of the two groups
were equal.
Reject the null hypothesis at the 0.01 level, and conclude that
the error variances are not constant.
(Ryan-Joiner) Correlation test for
normality of error terms in Minitab
• H0: Error terms are normally distributed vs.
HA: Error terms are not normally distributed
• Stat >> Regression >> Regression. Under
Storage…, select residuals.
• Stat >> Basic statistics >> Normality Test.
Select residuals (RESI1) and request Ryan-Joiner
test. Select OK.
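The idea behind the correlation test is to correlate the ordered residuals with the normal scores they would be expected to have under normality. A minimal sketch follows, assuming the common approximation (i - 3/8) / (n + 1/4) for the plotting positions; the function names are hypothetical and the data are made up. The critical value for r has to be taken from a table and is not computed here.

```python
from math import sqrt
from statistics import NormalDist, mean

def normal_scores(n):
    """Approximate expected normal order statistics (assumed formula)."""
    std_normal = NormalDist()
    return [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]

def ryan_joiner_r(residuals):
    """Correlation between the sorted residuals and their normal scores."""
    e = sorted(residuals)
    z = normal_scores(len(e))
    e_bar, z_bar = mean(e), mean(z)
    sez = sum((a - e_bar) * (b - z_bar) for a, b in zip(e, z))
    see = sum((a - e_bar) ** 2 for a in e)
    szz = sum((b - z_bar) ** 2 for b in z)
    return sez / sqrt(see * szz)

# Made-up samples: one perfectly "normal-shaped", one right-skewed.
normal_like = normal_scores(20)
skewed = [v ** 2 for v in normal_like]
print(ryan_joiner_r(normal_like))  # essentially 1
print(ryan_joiner_r(skewed))       # noticeably below 1
```

An r close to 1 is consistent with normality; an r below the tabled critical value leads to rejecting H0.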
[Histogram: 100 chi-square (1 df) data values; Percent vs. chi]
Normal probability plot and test for
100 chi-square (1 df) data values
[Histogram: 100 normal(0,1) data values; Percent vs. normal]
Normal probability plot and test for
100 normal(0,1) data values
Normal probability plot for Tree
diameter (X) and C-dating Age (Y)
[Normal Probability Plot of the Residuals (response is Age): Normal Score vs. Residual]
Tree diameter and Age Example:
Ryan-Joiner Correlation Test
Some closing comments
• Checking of assumptions is important, but
be aware of the “robustness” of your
methods, so you don’t get too hung up.
• Model checking is an art as well as a
science.
• Do not think that there is some definitive
correct answer “in the back of the book.”
• Use your knowledge of the subject matter.