#### Transcript Data Analysis - University of Western Ontario

Soc 3306a Lecture 9: Multivariate 2

More on Multiple Regression: Building a Model and Interpreting Coefficients

##### Assumptions for Multiple Regression

- Random sample
- Distribution of y is relatively normal (check the histogram)
- Standard deviation of y is constant for each value of x (check the scatterplots, Figure 1)

##### Problems to Watch For…

- Violation of assumptions, especially normality of the DV and heteroscedasticity (Figure 1)
- Simpson’s Paradox
- Multicollinearity

##### Building a Model in SPSS (Figure 2)

- Model building should be driven by your theory.
- You can add your variables one at a time, checking at each step whether there is a significant improvement in the explanatory power of the model.
- Use Method=Enter.
- In Block 1, enter your main IV.
- Under Statistics, ask for R² change.
- Click Next, and enter the additional IV.
- Check the Change Statistics in the Model Summary, and watch changes in R² and in the coefficients (esp. partial correlations) carefully.

##### Multiple Correlation R (Figure 1)

- Measures the correlation of all IVs with the DV
- Is the correlation of the y values with the predicted y values
- Always positive (between 0 and +1)

##### Coefficient of Determination R²

- Measures the proportional reduction in error (PRE) in predicting y using the prediction equation (taking x into account) rather than the mean of y
- R² = (TSS - SSE)/TSS
- This is the proportion of the variation in y that is explained

##### TSS, SSE and RSS

- TSS = total sum of squares: the total variability around the mean of y
- SSE = residual (error) sum of squares: the unexplained variability
- RSS = TSS - SSE = regression sum of squares: the explained variability in y

##### F Statistic and p-value

- Reported in the ANOVA table
- F is the ratio of the regression mean square (RSS/df) to the residual (error) mean square (SSE/df)
- The larger the F, the smaller the p-value
- A very small p-value (<.01 or .001) is strong evidence for the significance of the model

##### Slope (b), β, t-statistic and p-value

- Slope is measured in the actual units of the variables.
- A slope is the change in y for a 1-unit change in x.
- In multiple regression, each slope is controlled for all other x variables.
- β is the standardized slope, so it can be used to compare the strength of predictors.
- t = b/se with df = n - (k + 1), where k = number of predictors
- A small p-value indicates a significant relationship with y, controlling for the other variables in the model.
- Note: in bivariate regression, t² = F and β = r.

##### Simpson’s Paradox (Figure 3)

- Indicates a spurious relationship
- See the printouts in Figure 1
- Indicated by a change in the sign of the partial correlations
- Can also check the partial regression plots (ask for all partial plots under Plots)

##### Multicollinearity (Figure 1 and 2)

- Two independent variables in the model, e.g. x1 and x2, are correlated with y but also highly correlated (>.700) with each other.
- Both explain largely the same portion of the variation in y, so adding x2 to the model does not appreciably increase explanatory value (R, R²).
- Check the correlation between IVs in the correlation matrix.
- Ask for and check the partial correlations in the multiple regression (Part and Partial under Statistics).
- If a partial correlation in the multiple model is much lower than the bivariate correlation, multicollinearity is indicated.

##### A Few Tips for SPSS Mini 6

- Review the PowerPoint slides for Lectures 8 and 9.
- Read the assignment over carefully before starting.
- When creating your model, build it carefully one block at a time.
- Watch for spurious relationships.
- Revise the model if needed. Drop any unnecessary variables (i.e. variables showing evidence of multicollinearity, or new variables that do not appreciably increase R²).
- Keep your model simple. Aim for good explanatory value with the fewest variables possible.
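
The quantities defined in this lecture (TSS, SSE, RSS, R², F, t, β, and the R² change when a second IV is entered) can also be computed by hand, which may help when reading the SPSS printouts. The following is a minimal Python sketch, not SPSS output and not part of the original slides. The data values and the helper names `solve`, `ss`, `corr` and `fit` are made up for illustration, and `x1` and `x2` are deliberately constructed to be highly correlated with each other.

```python
from math import sqrt

def solve(a, b):
    """Solve the linear system a*x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ss(v):
    """Sum of squared deviations around the mean."""
    mean = sum(v) / len(v)
    return sum((p - mean) ** 2 for p in v)

def corr(u, v):
    """Pearson correlation r between two equal-length lists."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((p - mu) * (q - mv) for p, q in zip(u, v)) / sqrt(ss(u) * ss(v))

def fit(xs, y):
    """Least-squares regression of y on the predictor columns in xs."""
    n, k = len(y), len(xs)                 # k = number of predictors
    cols = [[1.0] * n] + xs                # intercept column + predictors
    xtx = [[sum(p * q for p, q in zip(c1, c2)) for c2 in cols] for c1 in cols]
    xty = [sum(p * q for p, q in zip(c, y)) for c in cols]
    b = solve(xtx, xty)                    # b[0] = intercept, b[1:] = slopes
    y_hat = [sum(b[j] * cols[j][i] for j in range(k + 1)) for i in range(n)]
    tss = ss(y)                            # total variability around mean of y
    sse = sum((obs - pred) ** 2 for obs, pred in zip(y, y_hat))  # unexplained
    rss = tss - sse                        # explained (regression) variability
    mse = sse / (n - (k + 1))              # error mean square, df = n-(k+1)
    # Diagonal of the inverse of X'X, needed for each coefficient's std. error
    inv_diag = [solve(xtx, [float(i == j) for i in range(k + 1)])[j]
                for j in range(k + 1)]
    return {
        "b": b, "tss": tss, "sse": sse, "rss": rss,
        "r2": rss / tss,                   # R2 = (TSS - SSE)/TSS
        "f": (rss / k) / mse,              # regression MS / error MS
        "t": [b[j] / sqrt(mse * inv_diag[j]) for j in range(k + 1)],  # t = b/se
        "beta": [b[j + 1] * sqrt(ss(xs[j]) / tss) for j in range(k)],
    }

# Made-up illustrative data (not from the lecture)
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2, 1, 4, 3, 6, 5, 8, 7]
y = [3, 4, 8, 7, 12, 11, 15, 16]

one = fit([x1], y)        # "Block 1": main IV only
two = fit([x1, x2], y)    # "Block 2": additional IV entered

print("r(x1, x2)      =", round(corr(x1, x2), 3))
print("R2, x1 only    =", round(one["r2"], 3))
print("R2, x1 and x2  =", round(two["r2"], 3))
print("R2 change      =", round(two["r2"] - one["r2"], 3))
print("F, x1 only     =", round(one["f"], 3), " t squared =", round(one["t"][1] ** 2, 3))
print("beta, x1 only  =", round(one["beta"][0], 3), " r =", round(corr(x1, y), 3))
```

In the one-predictor fit, the printed t² should match F and β should match r, as noted on the slope slide. Comparing the two R² values mirrors the R² change check in the SPSS Model Summary. Solving the normal equations directly keeps the sketch dependency-free, but it is only suitable for small, well-behaved examples like this one.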