Correlation and Regression
It’s the Last Lecture
Hooray!
Correlation
• Analyze → Correlate → Bivariate…
– Click over the variables you wish to correlate
– Options… – can select descriptives and pairwise vs. listwise deletion
• Pairwise deletion – each correlation uses only the cases with data for both variables in that pair (default)
• Listwise deletion – only cases with data for all of the variables are included
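The same choice can be made outside SPSS. A minimal Python sketch, assuming a pandas DataFrame `df` in which missing data are coded as NaN; the column names ("t1gen", "t2gen", "bditot") are placeholders for your own variables:

```python
import pandas as pd
from scipy import stats

cols = ["t1gen", "t2gen", "bditot"]   # placeholder variable names

# Pairwise deletion: each correlation uses every case that has data
# for that particular pair of variables (pandas' default behavior).
pairwise_r = df[cols].corr(method="pearson")

# Listwise deletion: drop any case that is missing data on *any* of the
# variables, then correlate the remaining complete cases.
listwise_r = df[cols].dropna().corr(method="pearson")

# r, p, and N for a single pair (pairwise-complete cases)
pair = df[["t1gen", "t2gen"]].dropna()
r, p = stats.pearsonr(pair["t1gen"], pair["t2gen"])
print(f"r = {r:.3f}, p = {p:.3f}, N = {len(pair)}")
```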
Correlation
• Assumptions:
– Linear relationship between variables
• Inspect scatterplot
– Normality
• Shapiro-Wilk’s W
• Other issues:
– Range restriction & heterogeneous subgroups
• Identified methodologically
– Outliers
• Inspect scatterplot
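These checks are easy to script as well. A sketch of the scatterplot and Shapiro-Wilk inspection, reusing the hypothetical `df` and placeholder column names from the sketch above:

```python
import matplotlib.pyplot as plt
from scipy import stats

pair = df[["t1gen", "bditot"]].dropna()

# Scatterplot: look for a roughly linear cloud and for outliers.
plt.scatter(pair["t1gen"], pair["bditot"])
plt.xlabel("Time 1 Generality")
plt.ylabel("Total BDI Score")
plt.show()

# Shapiro-Wilk W test of normality for each variable
for col in pair.columns:
    w, p = stats.shapiro(pair[col])
    print(f"{col}: W = {w:.3f}, p = {p:.3f}")
```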
Correlations
Time 1 / Time 2 Generality = mean of all ASQ Stability and Globality scores for bad events at that time point; Total BDI Score = sum of all 21 BDI items (measured at Time 1, N = 112, and Time 2, N = 98).

                                Time 1       Time 2       Time 1 Total   Time 2 Total
                                Generality   Generality   BDI Score      BDI Score
Time 1        Pearson r         1            .507**       .097           .131
Generality    Sig. (2-tailed)   .            .000         .310           .198
              N                 112          98           112            98
Time 2        Pearson r         .507**       1            .098           .160
Generality    Sig. (2-tailed)   .000         .            .335           .116
              N                 98           98           98             98
Time 1 Total  Pearson r         .097         .098         1              .692**
BDI Score     Sig. (2-tailed)   .310         .335         .              .000
              N                 112          98           112            98
Time 2 Total  Pearson r         .131         .160         .692**         1
BDI Score     Sig. (2-tailed)   .198         .116         .000           .
              N                 98           98           98             98

**. Correlation is significant at the 0.01 level (2-tailed).
Correlation
• Partial Correlation – removes variance due to a 3rd variable, like ANCOVA
– Analyze → Correlate → Partial…
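A partial correlation can also be computed by hand: regress each variable on the control variable and correlate the residuals. A sketch, using the same placeholder column names as before:

```python
import numpy as np
from scipy import stats

def partial_corr(x, y, z):
    """Correlation between x and y after removing the variance each
    shares with the control variable z (residualize, then correlate).
    Note: pearsonr's p-value here uses n - 2 df; SPSS uses n - 3 when
    one variable is partialled out, so it will differ slightly."""
    x_res = x - np.polyval(np.polyfit(z, x, 1), z)
    y_res = y - np.polyval(np.polyfit(z, y, 1), z)
    return stats.pearsonr(x_res, y_res)

d = df[["t2gen", "bditot", "t1gen"]].dropna()
r, p = partial_corr(d["t2gen"].values, d["bditot"].values, d["t1gen"].values)
print(f"partial r = {r:.3f}, p = {p:.3f}")
```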
Correlations
(Generality = mean of all ASQ Stability and Globality scores for bad events; Total BDI Score = sum of all 21 BDI items.)

Control Variables: -none-(a)
                                         Time 2       Total BDI   Time 1
                                         Generality   Score       Generality
Time 2       Correlation                 1.000        .160        .507
Generality   Significance (2-tailed)     .            .116        .000
             df                          0            96          96
Total BDI    Correlation                 .160         1.000       .131
Score        Significance (2-tailed)     .116         .           .198
             df                          96           0           96
Time 1       Correlation                 .507         .131        1.000
Generality   Significance (2-tailed)     .000         .198        .
             df                          96           96          0

Control Variables: Time 1 Generality
                                         Time 2       Total BDI
                                         Generality   Score
Time 2       Correlation                 1.000        .109
Generality   Significance (2-tailed)     .            .286
             df                          0            95
Total BDI    Correlation                 .109         1.000
Score        Significance (2-tailed)     .286         .
             df                          95           0

a. Cells contain zero-order (Pearson) correlations.
Regression
• Analyze → Regression → Linear…
– Use if both predictor(s) and criterion
variables are continuous
– Dependent = Criterion
– Independent = Predictor(s)
– Statistics…
• Regression Coefficients (b & β)
– Estimates
– Confidence intervals
– Covariance matrix
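An equivalent model can be fit outside SPSS. A minimal statsmodels sketch; the column names ("t2bditot" as the criterion, "t1bditot" and "t1pess" as predictors) are placeholders:

```python
import statsmodels.api as sm

data = df[["t2bditot", "t1bditot", "t1pess"]].dropna()   # listwise deletion

y = data["t2bditot"]                                  # criterion (Dependent)
X = sm.add_constant(data[["t1bditot", "t1pess"]])     # predictors (Independent)

model = sm.OLS(y, X).fit()
print(model.summary())      # B, SE, t, p, R Square, F, etc.
print(model.conf_int())     # confidence intervals for the coefficients
```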
Regression
– Statistics…
• Model fit
• R square change
• Descriptives
• Part and partial correlations
• Collinearity diagnostics
– Recall that you don’t want your predictors to be too highly related to one another
– Collinearity/Multicollinearity – when predictors are too highly correlated with one another
– Eigenvalues of the scaled and uncentered cross-products matrix, condition indices, and variance-decomposition proportions are displayed along with variance inflation factors (VIF) and tolerances for individual predictors
– Tolerances should be > .2; VIF should be < 4
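Tolerance and VIF can be computed directly from the predictor matrix. A sketch, reusing the placeholder predictor names from the regression sketch above:

```python
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

X = sm.add_constant(df[["t1bditot", "t1pess"]].dropna())

# VIF for each predictor (the constant is skipped); tolerance = 1 / VIF.
for i, name in enumerate(X.columns):
    if name == "const":
        continue
    vif = variance_inflation_factor(X.values, i)
    print(f"{name}: VIF = {vif:.3f}, tolerance = {1 / vif:.3f}")
```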
Regression
– Statistics…
• Residuals
– Durbin-Watson
» Tests for correlation among residuals (i.e., autocorrelation); a significant correlation implies non-independent data
» Clicking on this will also display a histogram of
residuals, a normal probability plot of residuals,
and the case numbers and standardized residuals
for the 10 cases with the largest standardized
residuals
– Casewise diagnostics
» Identifies outliers according to pre-specified criteria
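Both diagnostics have direct equivalents in statsmodels. A sketch, reusing the fitted `model` from the regression sketch above:

```python
import numpy as np
from statsmodels.stats.stattools import durbin_watson

# Durbin-Watson: values near 2 suggest no autocorrelation among residuals;
# values near 0 or 4 suggest positive or negative autocorrelation.
print(f"Durbin-Watson = {durbin_watson(model.resid):.3f}")

# Casewise diagnostics: flag cases with large standardized residuals
# (3 standard deviations is SPSS's default cutoff).
std_resid = model.get_influence().resid_studentized_internal
print("Cases with |standardized residual| > 3:",
      np.where(np.abs(std_resid) > 3)[0])
```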
Regression
– Plots…
• Plot standardized residuals (*ZRESID) on the y-axis and standardized predicted values (*ZPRED) on the x-axis
• Check “Normal probability plot” under
“Standardized Residual Plots”
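The same two plots can be produced in Python. A sketch, again reusing the fitted `model`:

```python
import matplotlib.pyplot as plt
from scipy import stats

# Standardized residuals (*ZRESID) vs. standardized predicted values (*ZPRED)
z_pred = (model.fittedvalues - model.fittedvalues.mean()) / model.fittedvalues.std()
z_resid = model.get_influence().resid_studentized_internal

plt.scatter(z_pred, z_resid)
plt.axhline(0)
plt.xlabel("Standardized predicted value")
plt.ylabel("Standardized residual")
plt.show()

# Normal probability plot of the residuals
stats.probplot(model.resid, dist="norm", plot=plt)
plt.show()
```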
Regression
• Assumptions:
– Observations are independent
– Linearity of Regression
• Look for residuals that get larger at extreme values; also check whether the residuals are normally distributed
– Save unstandardized residuals
» Click Save… and, under “Residuals”, check “Unstandardized” when you run your regression
– Run a Shapiro-Wilk’s W test on this variable (RES_1)
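The saved-residuals step translates directly; in the sketch below the model’s residuals play the role of RES_1:

```python
from scipy import stats

# Shapiro-Wilk W on the unstandardized residuals (SPSS's RES_1)
w, p = stats.shapiro(model.resid)
print(f"Shapiro-Wilk W = {w:.3f}, p = {p:.3f}")
```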
Regression
– Normality in Arrays
• Examine the normal probability plot of the residuals; the residuals should resemble a normal distribution curve
[Figure: example normal probability plots of residuals, labeled BAD and GOOD]
Regression
– Homogeneity of Variance in Arrays
• Look for residuals getting more spread out as a function of predicted value, i.e. a cone-shaped pattern in the plot of standardized residuals vs. standardized predicted values
[Figure: example plots of standardized residuals vs. standardized predicted values, labeled BAD (cone-shaped) and GOOD]
Regression Output
Descriptive Statistics
                          Mean      Std. Deviation    N
Time 2 Total BDI Score    7.8980    6.86916          98
Time 1 Total BDI Score    9.17      6.430            98
Time 1 Pessimism          4.7819     .44929          98

Variables Entered/Removed(b)
Model   Variables Entered                              Variables Removed   Method
1       Time 1 Pessimism, Time 1 Total BDI Score(a)    .                   Enter
a. All requested variables entered.
b. Dependent Variable: Time 2 Total BDI Score

Correlations
                                                Time 2 Total   Time 1 Total   Time 1
                                                BDI Score      BDI Score      Pessimism
Pearson Correlation    Time 2 Total BDI Score   1.000           .692           .131
                       Time 1 Total BDI Score    .692          1.000           .084
                       Time 1 Pessimism          .131           .084          1.000
Sig. (1-tailed)        Time 2 Total BDI Score    .              .000           .099
                       Time 1 Total BDI Score    .000           .              .205
                       Time 1 Pessimism          .099           .205           .
N                      Time 2 Total BDI Score   98             98             98
                       Time 1 Total BDI Score   98             98             98
                       Time 1 Pessimism         98             98             98
Regression Output
Model Summary(b)
Model 1
  R                              .696(a)
  R Square                       .484
  Adjusted R Square              .473
  Std. Error of the Estimate     4.98632
  R Square Change                .484
  F Change                       44.543
  df1, df2                       2, 95
  Sig. F Change                  .000
  Durbin-Watson                  1.951
a. Predictors: (Constant), Time 1 Pessimism, Time 1 Total BDI Score
b. Dependent Variable: Time 2 Total BDI Score

ANOVA(b)
Model 1        Sum of Squares   df   Mean Square   F        Sig.
Regression     2214.959          2   1107.480      44.543   .000(a)
Residual       2362.020         95     24.863
Total          4576.980         97
a. Predictors: (Constant), Time 1 Pessimism, Time 1 Total BDI Score
b. Dependent Variable: Time 2 Total BDI Score

Coefficients(a)
Model 1                   Unstandardized Coefficients   Standardized Coefficients
                          B         Std. Error          Beta                        t       Sig.
(Constant)                -4.193    5.419                                           -.774   .441
Time 1 Total BDI Score      .732     .079                .686                       9.270   .000
Time 1 Pessimism           1.123    1.131                .073                        .993   .323

                          Correlations                       Collinearity Statistics
                          Zero-order   Partial   Part        Tolerance   VIF
Time 1 Total BDI Score    .692         .689      .683        .993        1.007
Time 1 Pessimism          .131         .101      .073        .993        1.007
a. Dependent Variable: Time 2 Total BDI Score
Regression Output
Collinearity Diagnostics(a)
Model 1                                     Variance Proportions
                                                         Time 1 Total   Time 1
Dimension   Eigenvalue   Condition Index    (Constant)   BDI Score      Pessimism
1           2.761         1.000             .00          .04            .00
2            .235         3.428             .01          .96            .01
3            .004        25.224             .99          .00            .99
a. Dependent Variable: Time 2 Total BDI Score

Residuals Statistics(a)
                       Minimum     Maximum    Mean      Std. Deviation   N
Predicted Value          .4173     24.5050    7.8980    4.77856          98
Residual               -9.50501    19.97385    .00000   4.93465          98
Std. Predicted Value    -1.565      3.475      .000     1.000            98
Std. Residual           -1.906      4.006      .000      .990            98
a. Dependent Variable: Time 2 Total BDI Score
Logistic Regression
• Analyze → Regression → Binary Logistic…
– Use if criterion is dichotomous [no assumptions about predictor(s)]
– Use “Multinomial Logistic…” if criterion is polychotomous (3+ groups)
• Don’t worry about that though
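The same model can be fit outside SPSS. A minimal statsmodels sketch, assuming the criterion is a 0/1 column named "attritor" (as in the output below, where Attritor = 0 and Non-Attritor = 1) and the predictors are "t1gen" and "t1bidtot":

```python
import numpy as np
import statsmodels.api as sm

data = df[["attritor", "t1gen", "t1bidtot"]].dropna()

y = data["attritor"]                                # dichotomous criterion (0/1)
X = sm.add_constant(data[["t1gen", "t1bidtot"]])    # predictors

logit = sm.Logit(y, X).fit()
print(logit.summary())            # B, SE, z tests, log-likelihood

# Exp(B) (odds ratios) and their confidence intervals
print(np.exp(logit.params))
print(np.exp(logit.conf_int()))
```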
Logistic Regression
• Assumptions:
– Observations are independent
– Criterion is dichotomous
• No stats needed to show either one of these
• Important issues:
– Outliers
• Save… → under “Influence”, check “Cook’s” and “Leverage values”
• Cook’s statistic – outlier = any case with a value > 4/(n-k-1), where n = # of cases & k = # of predictors
• Leverage values – outlier = anything > .5
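These influence statistics can be approximated with statsmodels by refitting the model as a binomial GLM, whose influence methods mirror the logistic fit; a sketch under that assumption, reusing `y` and `X` from the logistic sketch above (a recent statsmodels version is assumed for get_influence on GLM results):

```python
import statsmodels.api as sm

glm = sm.GLM(y, X, family=sm.families.Binomial()).fit()
infl = glm.get_influence()

n = X.shape[0]              # number of cases
k = X.shape[1] - 1          # number of predictors (excluding the constant)

cooks = infl.cooks_distance[0]   # statsmodels returns (distance, p-value)
leverage = infl.hat_matrix_diag

print("Cook's outliers (> 4/(n-k-1)):", (cooks > 4 / (n - k - 1)).sum())
print("High-leverage cases (> .5):", (leverage > .5).sum())
```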
Logistic Regression
– Multicollinearity
• Tolerance and/or VIF statistics aren’t easily
obtained with SPSS, so you’ll just have to
let this one go
• Options…
– Classification plots
• Table of the actual # of Ss in each criterion group vs. predicted group membership – shows, in detail, how well the regression predicted the data
Logistic Regression
• Options…
– Hosmer-Lemeshow goodness-of-fit
• More robust than the traditional χ2 goodness-of-fit statistic, particularly for models with continuous covariates and small sample sizes
– Casewise listing of residuals
• Helps ID cases with large residuals (outliers)
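The Hosmer-Lemeshow statistic is not built into statsmodels, but a deciles-of-risk version is short to code. A sketch, reusing `y`, `X`, and the fitted `logit` from the sketch above:

```python
import numpy as np
import pandas as pd
from scipy import stats

def hosmer_lemeshow(y, p_hat, groups=10):
    """Group cases into deciles of predicted risk, then compare observed
    and expected event counts with a chi-square statistic on g - 2 df."""
    d = pd.DataFrame({"y": np.asarray(y), "p": np.asarray(p_hat)})
    d["decile"] = pd.qcut(d["p"], q=groups, duplicates="drop")
    g = d.groupby("decile", observed=True)
    obs, exp, n = g["y"].sum(), g["p"].sum(), g["y"].count()
    chi2 = (((obs - exp) ** 2) / (exp * (1 - exp / n))).sum()
    return chi2, stats.chi2.sf(chi2, len(n) - 2)

chi2, p = hosmer_lemeshow(y, logit.predict(X))
print(f"Hosmer-Lemeshow chi-square = {chi2:.3f}, p = {p:.3f}")
```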
Logistic Regression
• Options…
– Correlations of estimates
• Correlations among the parameter estimates (the regression coefficients), shown in the Correlation Matrix output
– Iteration history
– CI for exp(B)
• Provides confidence intervals for exp(B), the odds ratio associated with each predictor
• Categorical…
– If any predictors are discrete, they must be
identified here, as well as which group is the
reference group (identified as 0 vs. 1)
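Outside SPSS, the equivalent step is dummy coding the categorical predictor before fitting. A sketch in which the "gender" column is purely hypothetical:

```python
import pandas as pd
import statsmodels.api as sm

# Dummy-code a discrete predictor; drop_first makes the dropped level the
# reference group (the role played by the 0 vs. 1 choice in SPSS).
dummies = pd.get_dummies(df["gender"], prefix="gender", drop_first=True)

X = sm.add_constant(pd.concat([df[["t1gen"]], dummies], axis=1).astype(float))
logit = sm.Logit(df["attritor"], X, missing="drop").fit()
print(logit.summary())
```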
Logistic Regression Output
Case Processing Summary
Unweighted Cases(a)                        N     Percent
Selected Cases    Included in Analysis    112    100.0
                  Missing Cases             0       .0
                  Total                   112    100.0
Unselected Cases                            0       .0
Total                                     112    100.0
a. If weight is in effect, see classification table for the total number of cases.

Iteration History(a,b,c)
                   -2 Log        Coefficients:
Iteration          likelihood    Constant
Step 0   1         87.117        1.500
         2         84.443        1.885
         3         84.397        1.945
         4         84.397        1.946
         5         84.397        1.946
a. Constant is included in the model.
b. Initial -2 Log Likelihood: 84.397
c. Estimation terminated at iteration number 5 because parameter estimates changed by less than .001.

Dependent Variable Encoding
Original Value    Internal Value
Attritor          0
Non-Attritor      1
Logistic Regression Output
Classification Table(a,b)
                                     Predicted
                                     attritor                     Percentage
Observed                             Attritor    Non-Attritor     Correct
Step 0   attritor   Attritor          0           14                 .0
                    Non-Attritor      0           98               100.0
         Overall Percentage                                         87.5
a. Constant is included in the model.
b. The cut value is .500

Variables in the Equation
                    B       S.E.   Wald     df   Sig.   Exp(B)
Step 0   Constant   1.946   .286   46.385   1    .000   7.000

Variables not in the Equation
                                 Score   df   Sig.
Step 0   Variables   t1gen       1.905   1    .168
                     t1bidtot     .892   1    .345
         Overall Statistics      3.078   2    .215
Logistic Regression Output
Iteration History(a,b,c,d)
                   -2 Log        Coefficients
Iteration          likelihood    Constant    t1gen    t1bidtot
Step 1   1         85.242         -.254      .410     -.021
         2         81.564        -1.417      .777     -.038
         3         81.407        -1.942      .920     -.043
         4         81.407        -1.981      .930     -.044
         5         81.407        -1.981      .930     -.044
a. Method: Enter
b. Constant is included in the model.
c. Initial -2 Log Likelihood: 84.397
d. Estimation terminated at iteration number 5 because parameter estimates changed by less than .001.

Model Summary
Step   -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
1      81.407(a)           .026                   .050
a. Estimation terminated at iteration number 5 because parameter estimates changed by less than .001.

Omnibus Tests of Model Coefficients
                 Chi-square   df   Sig.
Step 1   Step    2.990        2    .224
         Block   2.990        2    .224
         Model   2.990        2    .224
Logistic Regression Output
Hosmer and Lemeshow Test
Step   Chi-square   df   Sig.
1      6.603        8    .580

Contingency Table for Hosmer and Lemeshow Test
            attritor = Attritor        attritor = Non-Attritor
            Observed   Expected        Observed   Expected        Total
Step 1  1       4        2.776             7        8.224           11
        2       1        1.993            10        9.007           11
        3       0        1.672            11        9.328           11
        4       3        1.424             8        9.576           11
        5       1        1.310            10        9.690           11
        6       2        1.318            10       10.682           12
        7       1        1.042            10        9.958           11
        8       1         .925            10       10.075           11
        9       1         .819            10       10.181           11
       10       0         .722            12       11.278           12

Classification Table(a)
                                     Predicted
                                     attritor                     Percentage
Observed                             Attritor    Non-Attritor     Correct
Step 1   attritor   Attritor          0           14                 .0
                    Non-Attritor      0           98               100.0
         Overall Percentage                                         87.5
a. The cut value is .500
Logistic Regression Output
Variables in the Equation
                                                                     95.0% C.I. for EXP(B)
                      B       S.E.    Wald    df   Sig.   Exp(B)     Lower    Upper
Step 1(a)  t1gen       .930    .639   2.114   1    .146   2.534       .724    8.873
           t1bidtot   -.044    .041   1.152   1    .283    .957       .884    1.037
           Constant  -1.981   2.960    .448   1    .503    .138
a. Variable(s) entered on step 1: t1gen, t1bidtot.

Correlation Matrix
                      Constant   t1gen    t1bidtot
Step 1   Constant      1.000     -.985    -.038
         t1gen         -.985     1.000    -.108
         t1bidtot      -.038     -.108    1.000
Logistic Regression Output
Step number: 1
Observed Groups and Predicted Probabilities
[Text-based histogram of observed groups (A = Attritor, N = Non-Attritor) plotted against predicted probability from 0 to 1; consistent with the classification table, the cases cluster above the .50 cut value and are predicted to be Non-Attritors.]
Predicted Probability is of Membership for Non-Attritor
The Cut Value is .50
Symbols: A - Attritor, N - Non-Attritor
Each Symbol Represents 2 Cases.