Transcript July 28

Logistic Regression
July 28, 2008
Ivan Katchanovski, Ph.D.
POL 242Y-Y
Binary Logistic Regression
• Appropriate when the dependent variable is a dummy
variable
– Dummy variable: a variable that includes two categories
which assume values 1 and 0
– Example: “Conservative party supporter”: Yes=1; No=0
– Binary: two values
• One or many independent variables
• Assumes non-linear relationship
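The non-linear relationship can be illustrated with the logistic function, which converts logged odds into a probability between 0 and 1. The sketch below is illustrative only; the function name is an assumption and does not come from the slides.

```python
import math

def probability_from_log_odds(log_odds):
    """Logistic function: map logged odds to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-log_odds))

# Equal steps in the logged odds produce unequal steps in the probability,
# which is why the relationship is non-linear (S-shaped).
for log_odds in (-4, -2, 0, 2, 4):
    print(log_odds, round(probability_from_log_odds(log_odds), 3))
```

The probability changes fastest near logged odds of 0 (probability .5) and flattens out toward 0 and 1.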
Regression Coefficients and Odds Ratio
• Regression coefficients:
– Interpretation is similar to interpretation of unstandardized
regression coefficients in linear regression
• Effect of a change of one unit of an independent variable on the
logged odds of the dependent variable
• Logged odds are not very easy to grasp
• Odds Ratio:
– Effect of a one-unit change in an independent variable
on the odds of the dependent variable: the odds are
multiplied by the odds ratio
• Easier to grasp than logged odds
• If the odds ratio is greater than 1: positive relationship
• If the odds ratio is less than 1: negative relationship
• If the odds ratio is equal to 1: no relationship
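The link between the two quantities is that the odds ratio is the exponential of the regression coefficient, Exp(B). A minimal Python sketch of that conversion and of the three rules above follows; the function names are hypothetical.

```python
import math

def odds_ratio(b):
    """Odds ratio Exp(B) for a regression coefficient B on the logged-odds scale."""
    return math.exp(b)

def direction(ratio):
    """Classify the relationship from the odds ratio, following the three rules above."""
    if math.isclose(ratio, 1.0):
        return "no relationship"
    return "positive" if ratio > 1.0 else "negative"

# Illustrative coefficients only: negative, zero, and positive.
for b in (-0.5, 0.0, 0.5):
    ratio = odds_ratio(b)
    print(b, round(ratio, 3), direction(ratio))
```

A positive coefficient always gives an odds ratio above 1, a negative coefficient an odds ratio below 1, and a coefficient of 0 an odds ratio of exactly 1.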
Statistical Significance
• Statistical significance of a regression
coefficient:
– Statistically significant if
p(obtained)<p(critical)=.05 or .01 or .001
– Statistically nonsignificant if
p(obtained)>p(critical)=.05
• Direction of association should be reported
only for statistically significant regression
coefficients
Pseudo R Square
• R Square analogs in logistic regression
– Power of independent variables in predicting the dependent
variable
• Cox & Snell R square
– Ranges from 0 (no association) to a maximum that is
less than 1, even for perfect association
• Nagelkerke R square
– Adjusts Cox & Snell R square so that its
maximum value can equal 1
– Ranges between 0 (no association) and 1 (perfect
association)
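Both measures can be computed from the log-likelihoods of the intercept-only model and the fitted model. The sketch below uses the standard formulas; the function names and the toy numbers (100 cases, half of them coded 1, and a perfectly fitting model) are assumptions chosen only to show the two upper limits.

```python
import math

def cox_snell_r2(ll_null, ll_model, n):
    """Cox & Snell R square from the null and fitted log-likelihoods."""
    return 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)

def nagelkerke_r2(ll_null, ll_model, n):
    """Cox & Snell R square divided by its maximum possible value."""
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell_r2(ll_null, ll_model, n) / max_cox_snell

# Hypothetical example: 100 cases, half coded 1, and a model that fits perfectly
# (log-likelihood of 0).
n = 100
ll_null = n * math.log(0.5)
ll_perfect = 0.0
print(round(cox_snell_r2(ll_null, ll_perfect, n), 3))
print(round(nagelkerke_r2(ll_null, ll_perfect, n), 3))
```

Even with a perfect fit, Cox & Snell stays below 1 in this example, while Nagelkerke reaches 1.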
Example: Multiple Research Hypotheses
• First: The level of economic development has a
positive effect on the odds that countries are
democratic
• Second: Former British colonies are more likely to
be democratic compared to other countries
• Third: Protestant countries are more likely to be
democratic compared to other countries
• Fourth: Ethnic and linguistic homogeneity has a
positive effect on the odds of countries being
democratic
Example: Variables
• Dataset: World
• Dependent Variable:
– Democracy (Is country democratic?)
• Dummy variable
• Independent Variables:
– GDP per capita ($1000)
• Interval-ratio
– Ethno-linguistic heterogeneity
• Ordinal treated as interval-ratio
– Colony variable
• Transformed into dummy variables
– Religious culture variable
• Transformed into dummy variables
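Transforming a categorical variable into dummy variables means creating one 0/1 variable per category and leaving one category out as the reference. The sketch below is illustrative: the function name and the five sample values are hypothetical, and treating "British" as the reference category is an assumption based on the second hypothesis, not something the slides state.

```python
def make_dummies(values, reference):
    """Return one 0/1 column per category, omitting the reference category."""
    categories = sorted(set(values) - {reference})
    return {cat: [1 if v == cat else 0 for v in values] for cat in categories}

# Hypothetical colony values for five countries; "British" is the assumed reference.
colony = ["British", "French", "Spanish", "Other", "British"]
dummies = make_dummies(colony, reference="British")
for name, column in dummies.items():
    print(name, column)
```

Each dummy coefficient is then interpreted relative to the omitted reference category.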
Binary Logistic Regression: SPSS Commands
• SPSS Command: Analyze-Regression-Binary Logistic
• “Dependent” box: Select the dependent variable
• “Covariates” box: Select independent variables
• Method: “Enter”
Table: Determinants of democracy

                                  Regression       Odds ratio    (Standard
Variable                          coefficient B    Exp(B)        error)
GDP per cap ($1000)               .336***          1.399         (.105)
French colony                     -1.619           .198          (1.148)
Spanish colony                    .433             1.542         (.823)
Other country                     1.026            2.789         (1.035)
Catholic                          .389             1.476         (1.218)
Muslim                            -1.091           .336          (1.253)
Other religion                    -.194            .824          (1.149)
Ethno-linguistic heterogeneity    -.349            .705          (.434)
Constant                          -.633            .531          (1.557)

Nagelkerke R square               .645
N                                 92

*** Statistically significant at the .01 level, ** statistically significant at the .05 level,
* statistically significant at the .1 level
Example: Statistical Significance
• Number of cases: N=92
• The .1 or 10% significance level can be used
• Regression coefficient of the GDP variable:
• SPSS: p(obtained)=.001 < p(critical)=.01=1%
• Statistically significant at the .01 or 1% level
• Regression coefficients of the other independent variables:
• SPSS: p(obtained) ranges from .159 to .866 > p(critical)=.1
• Statistically nonsignificant
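The p-values above can be approximated from the table with a Wald test, which divides each coefficient by its standard error and takes a two-sided p-value from the normal distribution. The sketch below is a check on the reported values, not SPSS output: the function name is an assumption, and because the table's B and standard errors are rounded, the results only approximately match what SPSS reports.

```python
import math

def wald_p_value(b, se):
    """Two-sided p-value for the Wald statistic z = B / SE (normal approximation)."""
    z = b / se
    return math.erfc(abs(z) / math.sqrt(2))

# (B, standard error) pairs copied from the table of determinants of democracy.
coefficients = {
    "GDP per cap ($1000)": (0.336, 0.105),
    "French colony": (-1.619, 1.148),
    "Spanish colony": (0.433, 0.823),
    "Other country": (1.026, 1.035),
    "Catholic": (0.389, 1.218),
    "Muslim": (-1.091, 1.253),
    "Other religion": (-0.194, 1.149),
    "Ethno-linguistic heterogeneity": (-0.349, 0.434),
}

p_values = {name: wald_p_value(b, se) for name, (b, se) in coefficients.items()}
for name, p in p_values.items():
    verdict = "significant at .1" if p < 0.1 else "nonsignificant"
    print(f"{name}: p = {p:.3f} ({verdict})")
```

Only the GDP coefficient falls below the .1 threshold; the smallest and largest of the remaining p-values belong to French colony and Other religion.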
Example: Regression Coefficients and Odds Ratio
• Regression coefficient of the GDP per capita
variable=.336
• An increase of $1000 in GDP per capita
increases the logged odds of a country being
democratic by .336, holding the other variables constant
• Odds ratio of GDP per capita=1.399
• An increase of $1000 in GDP per capita
multiplies the odds of a country being democratic by
about 1.4, an increase of about 40%
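The arithmetic behind this interpretation can be sketched in a few lines of Python. The coefficient .336 comes from the table; the function name and the choice of a $5000 increase are illustrative assumptions.

```python
import math

B_GDP = 0.336  # regression coefficient of GDP per capita ($1000) from the table

def odds_multiplier(b, units=1):
    """Factor by which the odds are multiplied when the variable rises by `units`."""
    return math.exp(b * units)

# A one-unit ($1000) increase reproduces the odds ratio Exp(B) in the table.
print(round(odds_multiplier(B_GDP), 3))

# Effects multiply rather than add: a $5000 increase multiplies the odds by
# exp(5 * B), not by 5 * Exp(B).
print(round(odds_multiplier(B_GDP, units=5), 3))
```

Because the effect is multiplicative on the odds, larger changes in GDP per capita compound rather than accumulate linearly.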
Example: Interpretation
• Nagelkerke R square=.645
• The logistic regression model has strong predictive power
• The first research hypothesis is supported by logistic
regression analysis
• The level of economic development has a positive and
statistically significant effect on the odds of countries being
democracies
• None of the other research hypotheses is supported by the
logistic regression analysis: their regression coefficients
are not statistically significant