Transcript Session 10
Education 795 Class Notes
Applied Research
Logistic Regression
Note set 10
Today’s Agenda
Announcements (ours and yours)
Q/A
Applied Research
Logistic regression
Pure vs. Applied Research
Pure research
‘Pure research is that type of research which is
directed towards increase of knowledge in
science… where the primary aim is a fuller
understanding of the subject under study rather
than the application thereof’ (NSF, 1959)
Applied research
‘Research carried out for the purpose of solving
practical problems’ (Pedhazur & Schmelkin, 1991)
What do You Think?
The pure researcher believes the applied
scientists are not creative, that applied work
attracts only mediocre men/women, and that
applied research is like working from a
cookbook.
The applied researcher believes the pure
scientist to be a snob, working in his/her ivory
tower and afraid to put his/her findings to a
real test… like Bacon’s spider, spinning webs
out of his substance.
(Storer, 1966, p. 108)
Sociobehavioral Research and
Policy Advocacy
Another endless debate over whether
scientists should limit the presentation
of their findings or also act as
advocates for policies they presumably
support…
Poem for the Day
Thou shalt not answer questionnaires
Or quizzes upon World-Affairs,
Nor with compliance
Take any test. Thou shalt not sit
With statisticians nor commit
A social science
(Auden, 1950, p. 69)
We Turn Now to
Logistic Regression
When / why do we use logistic regression?
Theory behind logistic regression
Running logistic regression on SPSS
Interpreting logistic regression analysis
When and Why?
To test predictors when the outcome
variable is a categorical dichotomy
(yes/no, pass/fail, survive/die)
Used because having a categorical
dichotomy as an outcome variable
violates the assumptions of linearity and
homogeneity of variance in ordinary linear regression
Dichotomous Outcomes
Aldrich & Nelson present two solutions
to the violation of the linear regression
assumptions for dichotomous outcomes
Linear Probability Models
Weighted Least Squares (which we will not
cover)
Nonlinear Probability Models
Logit (Most commonly used)
Probit (which we will not cover)
Nonlinear Probability Model
Log(P/(1-P)) = b0 + b1X1 + … + bnXn
Let’s look at the left hand side
P=probability of success (defined by the
researcher, e.g. pass, graduate, survive)
1-P=probability of failure (1-probability of success)
This ratio is never negative: it ranges from 0 to
infinity, removing the upper bound of 1 on P
Taking the logarithm of this ratio removes the
lower bound of 0, so the left hand side can take any
value while P itself stays between 0 and 1
The left hand side is commonly referred to as the
“logit” or the log(odds)
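A minimal sketch of the left hand side, assuming nothing beyond the definition above (the function name logit and the sample probabilities are illustrative choices, not part of the original notes). It prints the odds and the log odds for a few values of P, so you can see that the odds are never negative and the logit can be negative, zero, or positive.

```python
import math

def logit(p):
    """Return the log odds ln(p / (1 - p)) for a probability strictly between 0 and 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return math.log(p / (1.0 - p))

# Illustrative probabilities only; any values strictly between 0 and 1 work.
for p in (0.01, 0.25, 0.5, 0.75, 0.99):
    odds = p / (1.0 - p)
    print(f"P={p:.2f}  odds={odds:8.4f}  logit={logit(p):8.4f}")
```

Note that P = 0.5 gives odds of 1 and a logit of 0, and that probabilities equally far above and below 0.5 give logits of equal size and opposite sign.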
Odds Ratio
P/(1-P) is called the odds of success.
Simple Example: Hat with 5 red chips and 10
green chips. You win if you pull a red chip.
Probability of winning is 1/3
Probability of not winning is 2/3
Odds of winning = (1/3) / (2/3) = 1/2, or 1:2
In other words, your odds of winning are
1 to 2 (there are 2 green chips for every 1 red
chip; in lay terms, you are less likely to win
than to lose)
Odds Ratio
Let’s reverse the example
Simple Example: Hat with 10 red chips and 5
green chips. You win if you pull a red chip.
Probability of winning is 2/3
Probability of not winning is 1/3
Odds of winning = (2/3) / (1/3) = 2, or 2:1
In other words, your odds of winning are
2 to 1 (there is 1 green chip for every 2 red
chips; in lay terms, you are more likely to win
than to lose)
Moral of the Story
For odds less than 1, success
is less likely than failure
For odds greater than 1,
success is more likely than failure
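The two hat examples can be checked with a short sketch. The function name odds_of_winning is a hypothetical helper introduced here for illustration; it uses exact fractions so the arithmetic matches the hand calculation.

```python
from fractions import Fraction

def odds_of_winning(red, green):
    """Odds of drawing a red chip: P(red) / P(not red), as an exact fraction."""
    p_win = Fraction(red, red + green)
    return p_win / (1 - p_win)

print(odds_of_winning(5, 10))   # first hat: 5 red chips, 10 green chips
print(odds_of_winning(10, 5))   # second hat: 10 red chips, 5 green chips
```

The first hat gives odds below 1 and the second gives odds above 1, which is the pattern summarized above.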
Logistic Function
Continuous
Smooth S-shaped curve
Takes on values between 0 and 1 (0 < P < 1)
Increases monotonically
Symmetric around a logit of 0 (where P = 0.5)
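The properties in this list can be checked numerically. This is a rough sketch, assuming the standard logistic function 1 / (1 + e^(-z)) with z standing for the logit; the grid of z values is arbitrary.

```python
import math

def logistic(z):
    """Logistic function 1 / (1 + e^(-z)); z is the logit (log odds)."""
    return 1.0 / (1.0 + math.exp(-z))

zs = [-6, -3, -1, 0, 1, 3, 6]   # arbitrary grid of logit values
ps = [logistic(z) for z in zs]
for z, p in zip(zs, ps):
    print(f"z={z:+d}  P={p:.4f}")

# Monotonic increase: each value exceeds the one before it.
print(all(a < b for a, b in zip(ps, ps[1:])))
# Symmetry about z = 0: logistic(-z) equals 1 - logistic(z).
print(all(abs(logistic(-z) - (1 - logistic(z))) < 1e-12 for z in zs))
```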
Let’s take a look
The S-Shaped Curve
[Figure: S-shaped logistic curve, with P(Y=1) on the vertical axis]
Solving for P
Single predictor: P = e^(b0+b1X1) / (1 + e^(b0+b1X1))
Multiple predictors: P = e^(b0+b1X1+…+bnXn) / (1 + e^(b0+b1X1+…+bnXn))
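The algebra behind solving for P can be written out step by step. This is a sketch for the single-predictor case, using the same symbols as the nonlinear probability model; the multiple-predictor case follows by replacing b0 + b1X1 with b0 + b1X1 + … + bnXn throughout.

```latex
\begin{align*}
\ln\frac{P}{1-P} &= b_0 + b_1 X_1 \\
\frac{P}{1-P} &= e^{b_0 + b_1 X_1} \\
P &= (1 - P)\, e^{b_0 + b_1 X_1} \\
P \left(1 + e^{b_0 + b_1 X_1}\right) &= e^{b_0 + b_1 X_1} \\
P &= \frac{e^{b_0 + b_1 X_1}}{1 + e^{b_0 + b_1 X_1}}
   = \frac{1}{1 + e^{-(b_0 + b_1 X_1)}}
\end{align*}
```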
Assumptions of Nonlinear Model
Random sample
Independent observations
X’s are independent (minimal
collinearity among the predictors)
Estimation
A technique called maximum likelihood
estimation is used to fit logit models
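To make the idea concrete, here is a minimal sketch of maximum likelihood estimation for a one-predictor logit model. It uses Newton-Raphson, which is one common way to maximize the likelihood; the notes do not say which algorithm SPSS uses, so treat this as an illustration rather than a description of SPSS. The data (XS, YS) and all function names are made up for the example.

```python
import math

def predicted_p(b0, b1, x):
    """P(Y = 1) under the logit model with coefficients b0 and b1."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def log_likelihood(b0, b1, xs, ys):
    """Sum of y*ln(P) + (1 - y)*ln(1 - P) over all cases."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = predicted_p(b0, b1, x)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

def fit_logit(xs, ys, iterations=25):
    """Newton-Raphson maximum likelihood for logit(P) = b0 + b1*x."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        # Gradient (g) of the log-likelihood and 2x2 information matrix (h).
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = predicted_p(b0, b1, x)
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: add (information matrix inverse) times (gradient).
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical toy data, invented for this sketch (not from the class data set).
XS = [1, 2, 3, 4, 5, 6, 7, 8]
YS = [0, 0, 1, 0, 1, 0, 1, 1]

b0, b1 = fit_logit(XS, YS)
print(f"b0={b0:.4f}  b1={b1:.4f}  exp(b1)={math.exp(b1):.4f}")
print(f"log-likelihood at the estimates: {log_likelihood(b0, b1, XS, YS):.4f}")
```

The estimates are the values of b0 and b1 that make the observed pattern of 0s and 1s most probable; no other pair of coefficients gives a higher log-likelihood for these data.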
Interpretation Issues
Unlike the coefficients in a linear regression
model, logistic regression coefficients cannot be
interpreted as the rate of change in the
expected value of the dependent variable; each b
is the change in the log odds of Y = 1 for a
one unit increase in X
The rate of change in the probability of Y = 1
is dependent upon the value of X (and all
other Xs, if there are any in the analysis)
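A quick numerical illustration of this point, assuming a one-predictor model with made-up coefficients (B0 and B1 below are hypothetical, not estimates from any data). The same one unit increase in X changes the probability by different amounts depending on where X starts, while the ratio of the odds stays constant.

```python
import math

B0, B1 = -5.0, 0.1   # hypothetical coefficients chosen for illustration

def prob(x):
    """P(Y = 1) at a given X under the hypothetical model."""
    return 1.0 / (1.0 + math.exp(-(B0 + B1 * x)))

def delta_p(x):
    """Change in P(Y = 1) when X goes from x to x + 1."""
    return prob(x + 1) - prob(x)

def odds_ratio(x):
    """Odds at x + 1 divided by odds at x."""
    p_now, p_next = prob(x), prob(x + 1)
    return (p_next / (1 - p_next)) / (p_now / (1 - p_now))

for x in (20, 50, 80):
    print(f"X={x}: P={prob(x):.3f}  change in P={delta_p(x):.4f}  odds ratio={odds_ratio(x):.4f}")
```

The change in P is largest near P = 0.5 and shrinks toward either tail, but the odds ratio equals exp(B1) at every X.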
Interpreting Coefficients
The output from SPSS will include the b’s
(the estimated changes in the log odds) and exp(b)
(the odds ratios).
How do we interpret the odds ratios?
If exp(b)=1.6, then a one unit increase in X predicts
a 60% increase in the odds of success.
If exp(b)=2.6 then a one unit increase in X predicts
a 160% increase in the odds of success
If exp(b)=.7 then a one unit increase in X predicts a
30% decrease in the odds of success
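The three interpretations above all follow one rule: subtract 1 from exp(b) and multiply by 100. A small sketch (the function name percent_change_in_odds is a hypothetical helper for illustration):

```python
def percent_change_in_odds(exp_b):
    """Percent change in the odds of success for a one unit increase in X."""
    return (exp_b - 1.0) * 100.0

for exp_b in (1.6, 2.6, 0.7):
    print(f"exp(b)={exp_b}: {percent_change_in_odds(exp_b):+.0f}% change in the odds")
```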
A Simple Example
Many natural applications of logistic
regression are from medicine
Understanding Coronary Heart Disease
(CHD):
100 Cases
2 Variables
Age (measured in years)
Evidence of CHD (1 = Yes, 0 = No)
Sample Output
Sample Output 2
Output 3
Exp(B) indicates the change in
odds resulting from a unit change
in the predictor:
Exp(B) > 1: as X increases, the probability of the outcome increases
Exp(B) < 1: as X increases, the probability of the outcome decreases
Wald is similar to the t-statistic
Tests the null hypothesis b = 0
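A rough sketch of how a Wald test for a single coefficient can be computed by hand, assuming the common form in which the statistic is (b / SE) squared and is compared to a chi-square distribution with 1 degree of freedom. The values of b and se below are hypothetical and are not taken from the CHD output.

```python
import math

def wald_test(b, se):
    """Return the Wald statistic (b / se)^2 and its p-value on 1 degree of freedom."""
    z = b / se
    wald = z * z
    # For 1 df, P(chi-square > wald) equals P(|Z| > |z|) for a standard normal Z.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return wald, p_value

b, se = 0.5, 0.2   # hypothetical coefficient and standard error
wald, p_value = wald_test(b, se)
print(f"Wald={wald:.2f}  p={p_value:.4f}  exp(b)={math.exp(b):.3f}")
```

A small p-value leads us to reject the null hypothesis that b = 0, in the same way a large t-statistic would in linear regression.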
In Class Project
Run a logistic regression with one
predictor.
Interpret the odds ratio
For Next Week
Read and Discuss
The Status of Women and Minorities
Among Community College Faculty.
Perna, L. W. (2003). Research in Higher
Education, 44(2), p. 204-240