Data Analysis - University of Western Ontario


Soc 3306a Lecture 9:
Multivariate 2
More on Multiple Regression:
Building a Model and Interpreting
Assumptions for Multiple Regression
Random sample
 Distribution of y is relatively normal
 Check
Standard deviation of y is constant for each value of x
Check scatterplots (Figure 1)
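A numeric companion to the visual checks can be sketched in Python. This is only an illustration under stated assumptions: the data are made up, and the helper names `skewness` and `sd_by_x` are hypothetical. It reports the skewness of y (near 0 for a roughly symmetric distribution) and the standard deviation of y at each value of x (similar values are consistent with constant spread).

```python
import statistics
from collections import defaultdict

def skewness(values):
    """Moment-based skewness: 0 for a perfectly symmetric distribution."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def sd_by_x(x, y):
    """Sample standard deviation of y within each distinct value of x."""
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi)
    return {xi: statistics.stdev(ys) for xi, ys in sorted(groups.items())}

# Made-up illustrative data: four y values at each of three x values
x = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]
y = [4, 5, 6, 5, 7, 8, 9, 8, 10, 11, 12, 11]

y_skew = skewness(y)
spread = sd_by_x(x, y)
print(f"skewness of y: {y_skew:.3f}")
for xi, sd in spread.items():
    print(f"x = {xi}: SD of y = {sd:.3f}")
```

Large differences between the group standard deviations would point to heteroscedasticity. The scatterplot remains the main check.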
Problems to Watch For…
Violation of assumptions, especially
normality of DV and heteroscedasticity
(Figure 1)
 Simpson’s Paradox
 Multicollinearity
Building a Model in SPSS (Figure 2)
Should be driven by your theory
You can add your variables one at a time, checking
at each step whether there is significant
improvement in the explanatory power of the
model. Use Method=Enter.
In Block 1, enter your main IV. Under Statistics, ask
for R squared change.
Click Next, and enter an additional IV.
Check the Change Statistics in the Model
Summary, and watch changes in R² and the coefficients
(esp. partial correlations) carefully.
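The test behind the R squared change can be reproduced by hand. The sketch below assumes the standard F-change formula for nested models. The function name `f_change` is hypothetical, and the R² values and sample size in the example are made up for illustration.

```python
def f_change(r2_reduced, r2_full, n, k_full, m):
    """F statistic for the R-squared change when m predictors are added.

    n      : sample size
    k_full : number of predictors in the larger model
    m      : number of predictors added in this block
    """
    df_resid = n - (k_full + 1)
    return ((r2_full - r2_reduced) / m) / ((1 - r2_full) / df_resid)

# Made-up values: R2 rises from .30 to .36 when one IV is added (n = 104)
f = f_change(0.30, 0.36, n=104, k_full=2, m=1)
print(f"F change = {f:.2f} on 1 and {104 - 3} df")
```

A large F change (small p-value) suggests that the added IV improves the explanatory power of the model.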
Multiple Correlation R (Figure 1)
Measures correlation of all IV’s with DV
 Is the correlation of y values with the
predicted y values
 Always positive (between 0 and +1)
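A small Python sketch of this definition: fit a two-predictor model by least squares, then correlate the observed y values with the predicted ones. The data are made up, and the helper names (`cross`, `pearson`, `fit_two_predictors`) are hypothetical.

```python
import math

def mean(v):
    return sum(v) / len(v)

def cross(u, v):
    """Sum of cross-products of deviations from the means."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))

def pearson(u, v):
    return cross(u, v) / math.sqrt(cross(u, u) * cross(v, v))

def fit_two_predictors(x1, x2, y):
    """Least-squares fit of y = a + b1*x1 + b2*x2 via the normal equations."""
    s11, s22, s12 = cross(x1, x1), cross(x2, x2), cross(x1, x2)
    s1y, s2y = cross(x1, y), cross(x2, y)
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    a = mean(y) - b1 * mean(x1) - b2 * mean(x2)
    return a, b1, b2

# Made-up illustrative data
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [3, 4, 8, 7, 12, 11]

a, b1, b2 = fit_two_predictors(x1, x2, y)
y_hat = [a + b1 * u + b2 * v for u, v in zip(x1, x2)]
multiple_r = pearson(y, y_hat)
print(f"R = {multiple_r:.3f}")
```

R is at least as large as the absolute bivariate correlation of y with either predictor alone.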
Coefficient of Determination R²
Measures the proportional reduction in
error (PRE) in predicting y using the
prediction equation (taking x into account)
rather than the mean of y
R² = (TSS - SSE)/TSS
This is the proportion of the variation in y that is explained
TSS = Total variability around the mean of y
SSE = Residual sum of squares or error
This is the unexplained variability
RSS = TSS - SSE
This is the regression sum of squares
The explained variability in y
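The decomposition can be verified on a small example. The sketch below uses made-up data and a single predictor to keep the arithmetic visible. The variable names are illustrative.

```python
# Made-up illustrative data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(y)

x_bar, y_bar = sum(x) / n, sum(y) / n

# Least-squares prediction equation: y_hat = a + b*x
b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
     / sum((xi - x_bar) ** 2 for xi in x))
a = y_bar - b * x_bar
y_hat = [a + b * xi for xi in x]

tss = sum((yi - y_bar) ** 2 for yi in y)               # total, around the mean of y
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained (residual)
rss = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained (regression)
r2 = (tss - sse) / tss

print(f"TSS = {tss:.2f}, SSE = {sse:.2f}, RSS = {rss:.2f}, R2 = {r2:.2f}")
```

For these numbers TSS = 6, SSE = 2.4 and the regression sum of squares is 3.6, so R² = 0.6: using the prediction equation instead of the mean of y reduces the prediction error by 60%.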
F Statistic and p-value
This is an ANOVA table
F is the ratio of the regression mean
square (RSS/df, with df = k) to the residual (error)
mean square (SSE/df, with df = n-(k+1)), where k = # of predictors
 The larger the F, the smaller the p-value
 Very small p-value (<.01 or .001) is strong
evidence for the significance of the model
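The F ratio can be sketched as a short function. The name `f_statistic` is hypothetical, and the sums of squares in the example are made-up illustrative values. The p-value itself needs the F distribution, which the Python standard library does not provide, so it is left to the SPSS output here.

```python
def f_statistic(rss, sse, n, k):
    """F = regression mean square / residual mean square.

    rss : regression (explained) sum of squares
    sse : residual (error) sum of squares
    n   : sample size
    k   : number of predictors
    """
    regression_ms = rss / k
    residual_ms = sse / (n - (k + 1))
    return regression_ms / residual_ms

# Made-up values: RSS = 3.6, SSE = 2.4, five cases, one predictor
f = f_statistic(rss=3.6, sse=2.4, n=5, k=1)
print(f"F = {f:.2f} on 1 and 3 df")
```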
Slope (b), β, t-statistic and p-value
Slope is measured in the actual units of the variables:
the change in y for a 1-unit change in x
In multiple regression, each slope is controlled
for all other x variables
β is the standardized slope, so the strength of predictors can be compared
t = b/se with df = n-(k+1), where k = # of predictors
Small p-value indicates significant relationship
with y, controlling for other variables in model
Note: in bivariate regression, t² = F and β = r
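The two bivariate identities in the note can be checked numerically. The sketch below uses made-up data. The standard error of the slope is computed with the usual formula sqrt(MSE / Sxx), which is an addition not stated on the slide.

```python
import math

# Made-up illustrative data, one predictor (k = 1)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n, k = len(x), 1

x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)   # this is TSS
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b = sxy / sxx                              # slope in actual units
a = y_bar - b * x_bar
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

df = n - (k + 1)
mse = sse / df                             # residual mean square
se_b = math.sqrt(mse / sxx)                # standard error of the slope
t = b / se_b
f = ((syy - sse) / k) / mse                # F from the ANOVA table

beta = b * math.sqrt(sxx / syy)            # slope rescaled by sd(x)/sd(y)
r = sxy / math.sqrt(sxx * syy)             # Pearson correlation

print(f"t^2 = {t ** 2:.3f}, F = {f:.3f}")
print(f"beta = {beta:.3f}, r = {r:.3f}")
```

With more than one predictor these identities no longer hold: each slope then has its own t, while F tests the model as a whole.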
Simpson’s Paradox (Figure 3)
Indicates a spurious relationship
 See printouts in Figure 1
 Indicated by change in the sign of partial
 Can also check the partial regression plots
(ask for all partial plots under Plots)
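A toy illustration of the sign change, using made-up data for two groups. The helper `slope` is hypothetical. Pooling the groups gives a positive slope, while the slope within each group (that is, controlling for group) is negative.

```python
def slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    return sxy / sxx

# Made-up data: (x values, y values) for each group
group_a = ([1, 2, 3], [6, 5, 4])
group_b = ([6, 7, 8], [11, 10, 9])

pooled_x = group_a[0] + group_b[0]
pooled_y = group_a[1] + group_b[1]

pooled = slope(pooled_x, pooled_y)   # ignores group membership
within_a = slope(*group_a)           # group A only
within_b = slope(*group_b)           # group B only

print(f"pooled slope: {pooled:+.2f}")
print(f"within-group slopes: {within_a:+.2f}, {within_b:+.2f}")
```

Here the positive pooled association comes entirely from the difference between the groups.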
Multicollinearity (Figure 1 and 2)
Two independent variables in the model, i.e. x1 and
x2, are correlated with y but also highly correlated
(>.700) with each other
Both explain much the same portion of the variation
in y, so adding x2 to the model does not appreciably increase
explanatory value (R, R²)
Check correlation between IV’s in correlation matrix.
Ask for and check partial correlations in multiple
regression (Part and Partial under Statistics)
If the partial correlation in the multiple model is much lower
than the bivariate correlation, multicollinearity is indicated
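The comparison of bivariate and partial correlations can be sketched in Python. The data are made up so that x2 is nearly a copy of x1. The helper names are hypothetical, and `partial_r` uses the standard first-order partial correlation formula, which is not given on the slide.

```python
import math

def pearson(u, v):
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)

def partial_r(r_yx, r_yz, r_xz):
    """Correlation of y with x, controlling for z (first-order partial)."""
    return (r_yx - r_yz * r_xz) / math.sqrt((1 - r_yz ** 2) * (1 - r_xz ** 2))

# Made-up data: x2 tracks x1 closely, and y depends mainly on x1
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [1.2, 1.9, 3.1, 4.2, 4.8, 6.1, 7.0, 8.2]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

r_12 = pearson(x1, x2)                      # correlation between the IVs
r_y1 = pearson(y, x1)
r_y2 = pearson(y, x2)                       # bivariate correlation of y with x2
r_y2_given_1 = partial_r(r_y2, r_y1, r_12)  # same, controlling for x1

print(f"r(x1, x2) = {r_12:.3f}")
print(f"r(y, x2) = {r_y2:.3f}, partial r(y, x2 | x1) = {r_y2_given_1:.3f}")
```

The IVs correlate well above .700 with each other, and the correlation of y with x2 drops sharply once x1 is controlled. This is the pattern the slide describes.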
A Few Tips for SPSS Mini 6
Review the PowerPoint slides for Lectures 8 and 9
Read assignment over carefully before starting.
Build your model carefully, one block at a time.
Watch for spurious relationships. Revise model if necessary.
Drop any unnecessary variables (e.g. those showing evidence
of multicollinearity, or new variables that do not
appreciably increase R²). Keep your model
simple. Aim for good explanatory value with the
fewest variables possible.