Long-Term Correlates of Family Foster Care
Binary Logistic Regression
“To be, or not to be, that is the
question.” (William Shakespeare, “Hamlet”)
Binary Logistic Regression
Also known as “logistic” or sometimes
“logit” regression
Foundation from which more complex
models derived
e.g., multinomial regression and ordinal logistic
regression
Dichotomous Variables
Two categories indicating whether an event
has occurred or some characteristic is
present
Sometimes called “binary” or “binomial”
variables
Dichotomous DVs
Placed in foster care or not
Diagnosed with a disease or not
Abused or not
Pregnant or not
Service provided or not
Single (Dichotomous) IV Example
DV = continue fostering, 0 = no, 1 = yes
Customary to code category of interest 1 and
the other category 0
IV = married, 0 = not married, 1 = married
N = 131 foster families
Are two-parent families more likely to
continue fostering than one-parent
families?
Crosstabulation
Table 2.1
Relationship between marital status and
continuation is statistically significant
[χ²(1, N = 131) = 5.65, p = .017]
A higher percentage of two-parent families
(62.20%) than single-parent families
(40.82%) planned to continue fostering
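The crosstabulation result can be checked by hand. The sketch below is illustrative only: Table 2.1 is not reproduced in this transcript, so the cell counts are an assumption reconstructed from the reported percentages (20/49 = 40.82%, 51/82 = 62.20%) and N = 131.

```python
# Cell counts are an ASSUMPTION: reconstructed from the reported percentages
# and N = 131, because Table 2.1 itself is not shown in the transcript.
observed = {
    ("one-parent", "continue"): 20,
    ("one-parent", "not continue"): 29,
    ("two-parent", "continue"): 51,
    ("two-parent", "not continue"): 31,
}

rows = ["one-parent", "two-parent"]
cols = ["continue", "not continue"]

n = sum(observed.values())
row_totals = {r: sum(observed[(r, c)] for c in cols) for r in rows}
col_totals = {c: sum(observed[(r, c)] for r in rows) for c in cols}

# Pearson chi-square: sum over cells of (observed - expected)^2 / expected.
chi_square = 0.0
for r in rows:
    for c in cols:
        expected = row_totals[r] * col_totals[c] / n
        chi_square += (observed[(r, c)] - expected) ** 2 / expected

# Percentage of each family type planning to continue.
pct_continue = {r: 100 * observed[(r, "continue")] / row_totals[r] for r in rows}

print(round(chi_square, 2))
print({r: round(p, 2) for r, p in pct_continue.items()})
```

With these assumed counts the statistic agrees with the reported value to two decimals, which suggests the reconstruction is plausible but does not confirm it.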
Strength & Direction of
Relationships
Different ways to quantify the relationship
between IV(s) and DV
Probabilities
Odds
Odds Ratio (OR)
• Also abbreviated as e^B, Exp(B) (on SPSS output), or
exp(B)
% change
Roadmap to Computations
Probabilities
Odds
p / (1 - p)
Odds Ratios
Odds(1) / Odds(0)
% change
100(OR - 1)
Probabilities
Percentages in Table 2.1 as probabilities
(e.g., 62.20% as .6220)
p
• Probability that event will occur (continue)
• e.g., probability that one-parent families plan to
continue is .4082
1 – p
• Probability that event will not occur (not continue)
• e.g., probability that one-parent families do not plan
to continue is .5918 (1 - .4082)
Odds
Ratio of probability that event will occur to
probability that it will not
e.g., odds of continuation for one-parent
families are .69 (.4082 / .5918)
Odds = p / (1 - p)
Can range from 0 to positive infinity
Probabilities and Odds
Table 2.2
Odds = 1
Both outcomes equally likely
Odds > 1
Probability that event will occur greater than
probability that it will not
Odds < 1
Probability that event will occur less than
probability that it will not
Odds Ratio (OR)
Odds of the event for one value of the IV
(two-parent families) divided by the odds
for a different value of the IV, usually a
value one unit lower (one-parent families)
e.g., odds of continuing for two-parent
families more than double the odds for
one-parent families
OR = 1.6455 / .6898 = 2.39
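A minimal sketch of the path from probabilities to odds to the odds ratio, using the two probabilities reported from Table 2.1. The function and variable names are illustrative, not part of the original slides.

```python
def odds(p):
    """Odds of an event with probability p (p must be less than 1)."""
    return p / (1 - p)

p_one_parent = 0.4082  # probability of continuing, one-parent families
p_two_parent = 0.6220  # probability of continuing, two-parent families

odds_one_parent = odds(p_one_parent)
odds_two_parent = odds(p_two_parent)

# Odds ratio: odds for the higher value of the IV divided by odds for the lower.
odds_ratio = odds_two_parent / odds_one_parent

print(round(odds_one_parent, 4), round(odds_two_parent, 4), round(odds_ratio, 2))
```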
OR (cont’d)
Plays a central role in quantifying the
strength and direction of relationships
between IVs and DVs in binary, multinomial,
and ordinal logistic regression
OR < 1 indicates a negative relationship
OR > 1 indicates a positive relationship
OR = 1 indicates no linear relationship
ORs > 1
e.g., OR of 2.39
A one-unit increase in the independent variable
increases the odds of continuing by a factor of
2.39
The odds of continuing are 2.39 times higher
for two-parent compared to one-parent families
ORs < 1
e.g., OR = .50
A one-unit increase in the independent variable
decreases the odds of continuing by a factor of
.50
The odds that two-parent families will continue
are .50 (or one-half) of the odds that one-parent families will continue
ORs < 1 (cont’d)
Compute reciprocal (i.e., 1 / .50 = 2.00)
Express relationship as opposite event of
interest (e.g., discontinuing)
A one-unit increase in the independent variable
increases the odds of discontinuing by a factor
of 2.00
The odds that two-parent families will
discontinue are 2.00 times (or twice) the odds
of one-parent families
OR to Percentage Change
% change = 100(OR – 1)
Alternative way to express OR
e.g., A one-unit increase in the independent
variable increases the odds of continuing by
139.00%
• 100(2.39 – 1) = 139.00
e.g., A one-unit increase in the independent
variable decreases the odds of continuing by
50.00%
• 100(.50 – 1) = -50.00
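The percentage-change formula is simple enough to wrap in a helper. This is a small sketch; the function name is illustrative.

```python
def percent_change(odds_ratio):
    """Percentage change in the odds for a one-unit increase in the IV."""
    return 100 * (odds_ratio - 1)

# The two examples from the slide.
increase = percent_change(2.39)  # OR > 1: odds increase
decrease = percent_change(0.50)  # OR < 1: odds decrease

print(round(increase, 2), round(decrease, 2))
```

A negative result means the odds decrease; an OR of exactly 1 gives 0% change.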
Comparing OR > 1 and OR < 1
Compute reciprocal of one of the ORs
e.g., OR of 2.00 and an OR of .50
Reciprocal of .50 is 2.00 (1 / .50 = 2.00)
ORs are equal in size (but not in direction of
the relationship)
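One way to see why an OR of 2.00 and an OR of .50 are equal in size is to take the reciprocal, or to compare them on the log scale, where Exp(B) = OR implies B = ln(OR). The sketch below uses illustrative function names.

```python
import math

def reciprocal(odds_ratio):
    """Re-express an OR in terms of the opposite event or comparison."""
    return 1 / odds_ratio

def log_odds_ratio(odds_ratio):
    """Slope (B) that corresponds to a given OR."""
    return math.log(odds_ratio)

or_positive = 2.00
or_negative = 0.50

print(reciprocal(or_negative))
# On the log scale the two ORs have the same magnitude and opposite signs.
print(round(log_odds_ratio(or_positive), 3), round(log_odds_ratio(or_negative), 3))
```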
Qualitative Descriptors for OR
Table 2.3
Use cautiously with IVs that aren’t
dichotomous
Question & Answer
Are two-parent families more likely to
continue fostering than one-parent
families?
Yes. The odds of continuing are 2.39 times
(139%) higher for two-parent compared to one-parent families. The probability of continuing is
.41 for one-parent families and .62 for two-parent families.
Binary Logistic Regression
Example
DV = continue fostering, 0 = no, 1 = yes
Customary to code category of interest 1 and
the other category 0
IV = married, 0 = not married, 1 = married
N = 131 foster families
Are two-parent families more likely to
continue fostering than one-parent
families?
Statistical Significance
Table 2.4
Relationship between marital status and
continuation is statistically significant (Wald χ²
= 5.544, p = .019)
Direction of Relationship
B = slope
Positive slope, positive relationship
• OR > 1
Negative slope, negative relationship
• OR < 1
0 slope, no linear relationship
• OR = 1
Direction/Strength of
Relationship
Positive relationship between marital
status and continuation
Two-parent families more likely to continue
B = .869
Exp(B) = OR = 2.385
• % change = 100(2.385 - 1) = 139%
The odds of continuing are 2.39 times (139%)
higher for two-parent compared to one-parent
families
Roadmap to Computations
Logits
ln(p / (1 – p))
L short for ln(p / (1 – p))
Odds
e^L
Odds Ratios
Odds(1) / Odds(0)
% change
100(OR - 1)
Probabilities
e^L / (1 + e^L)
Binary Logistic Regression Model
ln(π / (1 - π)) = α + β1X1 + β2X2 + … + βkXk, or
ln(π / (1 - π)) = η
π is the probability of the event
η (eta) is the abbreviation for the linear
predictor (right hand side of this equation)
k = number of independent variables
Logit Link
ln(π / (1 - π))
Log of the odds that the DV equals 1 (event
occurs)
Connects (i.e., links) DV to linear combination of
IVs
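A minimal sketch of the logit link and its inverse, assuming the model form given above. The function names and the example coefficients are hypothetical and chosen only to show the mechanics.

```python
import math

def linear_predictor(alpha, betas, xs):
    """eta = alpha + beta_1 * x_1 + ... + beta_k * x_k."""
    return alpha + sum(b * x for b, x in zip(betas, xs))

def logit(pi):
    """Logit link: log of the odds that the DV equals 1."""
    return math.log(pi / (1 - pi))

def inverse_logit(eta):
    """Map a linear predictor back to a probability between 0 and 1."""
    return math.exp(eta) / (1 + math.exp(eta))

# Hypothetical coefficients and IV values, for illustration only.
eta = linear_predictor(alpha=-1.0, betas=[0.5, 0.25], xs=[2, 4])
pi = inverse_logit(eta)

print(eta, round(pi, 3), round(logit(pi), 3))
```

Applying the link to the probability recovers the linear predictor, which is what it means for the link to connect the DV to the linear combination of IVs.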
Estimated Logits (L)
ln(p / (1 - p)) = a + B1X1 + B2X2 + … + BkXk
ln(p / (1 – p))
Log of the odds that the DV equals 1 (event
occurs)
Estimated logit, L
Does not have intuitive or substantive meaning
Useful for examining curvilinear relationships
and interaction effects
Primarily useful for estimating probabilities,
odds, and ORs
Estimated Logits (L)
L(Continue) = a + (B_Married)(X_Married)
L(Continue) = -.372 + (.869)(X_Married)
a = intercept
B = slope
Logit to Odds
If L = 0:
Odds = e^L = e^0 = 1.00
If L = .50:
Odds = e^L = e^(.50) = 1.65
If L = 1.00:
Odds = e^L = e^(1.00) = 2.72
Logits to Odds (cont’d)
Table 2.4
One-parent families
• L(Continue) = -.372 = -.372 + (.869)(0)
• Odds of continuing = e^(-.372) = .69
Two-parent families
• L(Continue) = .497 = -.372 + (.869)(1)
• Odds of continuing = e^(.497) = 1.65
Odds to OR
OR = 1.65 / .69 = 2.39, or
e^(.869) = 2.39, labeled Exp(B)
Table 2.4
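The logit-to-odds-to-OR steps can be sketched with the rounded coefficients from Table 2.4 (a = -.372, B = .869). Because the slide values were presumably computed from unrounded coefficients, results from this sketch can differ from the slides in the second or third decimal. Names are illustrative.

```python
import math

A = -0.372         # intercept from Table 2.4 (rounded)
B_MARRIED = 0.869  # slope for married from Table 2.4 (rounded)

def estimated_logit(married):
    """Estimated logit for married = 0 (one-parent) or 1 (two-parent)."""
    return A + B_MARRIED * married

# Odds are the exponentiated logits.
odds_by_group = {m: math.exp(estimated_logit(m)) for m in (0, 1)}

# Two equivalent routes to the odds ratio.
or_from_odds = odds_by_group[1] / odds_by_group[0]
or_from_slope = math.exp(B_MARRIED)

print(round(estimated_logit(1), 3))
print({m: round(o, 2) for m, o in odds_by_group.items()})
print(round(or_from_odds, 3), round(or_from_slope, 3))
```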
OR to Percentage Change
% change = 100(OR – 1)
e.g., A one-unit increase in the independent
variable increases the odds of continuing by
139.00%
• 100(2.39 – 1) = 139.00
e.g., A one-unit increase in the independent
variable decreases the odds of continuing by
50.00%
• 100(.50 – 1) = -50.00
Logits to Probabilities
p(Continue) = e^L / (1 + e^L)
One-parent families, L(Continue) = -.372
p(Continue) = e^(-.372) / (1 + e^(-.372)) = .69 / 1.69 = .41
Two-parent families, L(Continue) = .497
p(Continue) = e^(.497) / (1 + e^(.497)) = 1.65 / 2.65 = .62
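A short sketch of the logit-to-probability conversion, p = e^L / (1 + e^L), applied to the two estimated logits. The function name is illustrative.

```python
import math

def probability_from_logit(logit_value):
    """Convert an estimated logit L to a probability: e^L / (1 + e^L)."""
    return math.exp(logit_value) / (1 + math.exp(logit_value))

p_one_parent = probability_from_logit(-0.372)  # one-parent families
p_two_parent = probability_from_logit(0.497)   # two-parent families

print(round(p_one_parent, 2), round(p_two_parent, 2))
```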
Question & Answer
Are two-parent families more likely to
continue fostering than one-parent
families?
Yes. The odds of continuing are 2.39 times
(139%) higher for two-parent compared to one-parent families. The probability of continuing is
.41 for one-parent families and .62 for two-parent families.
Single (Quantitative) IV Example
DV = continue fostering, 0 = no, 1 = yes
Customary to code category of interest 1 and
other category 0
IV = number of resources
N = 131 foster families
Are foster families with more resources
more likely to continue fostering?
Statistical Significance
Table 2.5
Relationship between resources and
continuation is statistically significant (Wald χ²
= 4.924, p = .026)
H0: β = 0, β ≥ 0, β ≤ 0, same as
H0: OR = 1, OR ≥ 1, OR ≤ 1
Likelihood ratio χ² better than Wald
Direction/Strength of
Relationship
Positive relationship between resources
and continuation
Families with more resources are more likely to
continue
B = .212
Exp(B) = OR = 1.237
• % change = 100(1.237 – 1) = 24%
The odds of continuing are 1.24 times (24%)
higher for each additional resource
Estimated Logits
L(Continue) = -1.227 + (.212)(X)
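The estimated logit equation can be turned into odds and probabilities for each number of resources. The sketch below uses the rounded coefficients shown above, so individual values may differ from the slide figures in the last decimal. Names are illustrative.

```python
import math

A = -1.227  # intercept (rounded)
B = 0.212   # slope for number of resources (rounded)

def predict(resources):
    """Return (logit, odds, probability) for a given number of resources."""
    logit_value = A + B * resources
    odds_value = math.exp(logit_value)
    probability = odds_value / (1 + odds_value)
    return logit_value, odds_value, probability

table = {x: predict(x) for x in range(1, 12)}

for x, (logit_value, odds_value, probability) in table.items():
    print(x, round(logit_value, 2), round(odds_value, 2), round(probability, 2))
```

The logit rises by the same amount (B) for each additional resource, while the odds and probabilities do not change by a constant amount.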
Figures
Resources.xls
Effect of Resources on Continuation (Logits)
[Figure: estimated logits (vertical axis) by number of resources (horizontal axis)]
Resources: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
Logits: -1.01, -0.80, -0.59, -0.38, -0.16, 0.05, 0.26, 0.47, 0.68, 0.90, 1.11
Effect of Resources on Continuation (Odds)
[Figure: odds (vertical axis) by number of resources (horizontal axis)]
Resources: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
Odds: 0.36, 0.45, 0.55, 0.69, 0.85, 1.05, 1.30, 1.60, 1.98, 2.45, 3.03
Effect of Resources on Continuation (Probabilities)
[Figure: probabilities (vertical axis) by number of resources (horizontal axis)]
Resources: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
Probabilities: 0.27, 0.31, 0.36, 0.41, 0.46, 0.51, 0.56, 0.62, 0.66, 0.71, 0.75
Question & Answer
Are foster families with more resources
more likely to continue fostering?
Yes. The odds of continuing are 1.24 times
(24%) higher for each additional resource. The
probability of continuing is .31 for families with
two resources, .51 for families with 6
resources, and .71 for families with 10
resources.
Relationship of Linear Predictor
to Logits, Odds & p
Relationship between linear predictor and
logits is linear
Relationship between linear predictor and
odds is non-linear
Relationship between linear predictor and p
is non-linear
Challenge is to summarize changes in odds and
probabilities associated with changes in IVs in
the most meaningful and parsimonious way
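To make the linear versus non-linear distinction concrete, the sketch below steps the linear predictor from -3 to 3 in equal increments. The logit equals the linear predictor, so it changes by 1 at every step; the odds are multiplied by a constant factor; the probability changes by different amounts depending on where the step starts. Names are illustrative.

```python
import math

def odds_from_eta(eta):
    return math.exp(eta)

def probability_from_eta(eta):
    return 1 / (1 + math.exp(-eta))

etas = [-3, -2, -1, 0, 1, 2, 3]
odds_values = [odds_from_eta(e) for e in etas]
probabilities = [probability_from_eta(e) for e in etas]

steps = range(len(etas) - 1)
# Multiplicative change in the odds for each one-unit step (constant).
odds_multipliers = [odds_values[i + 1] / odds_values[i] for i in steps]
# Additive change in the probability for each one-unit step (not constant).
probability_steps = [probabilities[i + 1] - probabilities[i] for i in steps]

print([round(m, 3) for m in odds_multipliers])
print([round(s, 3) for s in probability_steps])
```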
Logit as Function of Linear Predictor
[Figure: logit (vertical axis, -3.00 to 3.00) by linear predictor (horizontal axis, -3.00 to 3.00)]
Odds as Function of Linear Predictor
[Figure: odds (vertical axis, .00 to 21.00) by linear predictor (horizontal axis, -3.00 to 3.00)]
Probabilities as Function of Linear Predictor
[Figure: probability (vertical axis, .00 to 1.00) by linear predictor (horizontal axis, -3.00 to 3.00)]
IVs to z-scores
z-scores (standard scores)
Only the IV (not DV)--semi-standardized slopes
One-unit increase in the IV refers to a one-standard-deviation increase
OR interpreted as expected change in the odds
associated with a one standard deviation
increase in the IV
Conversion to z-scores changes intercept, slope,
and OR, but not associated test statistics
Table 2.6 (compare to Table 2.5)
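Table 2.6 is not reproduced in this transcript, so the sketch below derives the z-score coefficients algebraically from the raw-score equation, L = -1.227 + .212X, together with the mean (6.60) and standard deviation (1.93) of resources reported in this lecture. Treat the derived intercept and slope as approximations built from rounded inputs, not as the table's values.

```python
import math

A_RAW = -1.227  # raw-score intercept (rounded)
B_RAW = 0.212   # raw-score slope (rounded)
MEAN = 6.60     # mean number of resources
SD = 1.93       # standard deviation of resources

# Substituting X = MEAN + SD * z into a + B * X gives the z-score equation.
a_z = A_RAW + B_RAW * MEAN  # intercept: logit at the mean (z = 0)
b_z = B_RAW * SD            # slope: change in logit per one SD
or_z = math.exp(b_z)        # OR for a one-standard-deviation increase

def probability_at_z(z):
    logit_value = a_z + b_z * z
    return 1 / (1 + math.exp(-logit_value))

print(round(a_z, 3), round(b_z, 3), round(or_z, 2))
print([round(probability_at_z(z), 2) for z in (-2, 0, 2)])
```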
Figures
zResources.xls
Effect of zResources on Continuation (Probabilities)
[Figure: probabilities (vertical axis) by standardized resources (horizontal axis)]
Standardized Resources: -3, -2, -1, 0, 1, 2, 3
Probabilities: 0.26, 0.34, 0.44, 0.54, 0.64, 0.73, 0.80
Question & Answer
Are foster families with more resources
more likely to continue fostering?
Yes. The odds of continuing are 1.51 times
(51%) higher for each one standard deviation
(1.93) increase in resources. The probability of
continuing is .34 for families with resources
two standard deviations below the mean, .54 for
families with the mean number of resources
(6.60), and .73 for families with resources two
standard deviations above the mean.
IVs Centered
Centering
Typically center on mean
Useful when testing interactions, curvilinear
relationships, or when no meaningful 0 point
(e.g., no family with 0 resources)
Centering doesn’t change slope, OR, or
associated test statistics, but does change the
intercept
Table 2.7 (compare to Table 2.5)
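Table 2.7 is not reproduced in this transcript. As a sketch under that limitation, the code below centers resources on the reported mean (6.60) and shows that only the intercept moves: the slope and OR stay the same, and predictions match the raw-score equation. Coefficients are the rounded raw-score values, and all names are illustrative.

```python
import math

A_RAW = -1.227  # raw-score intercept (rounded)
B_RAW = 0.212   # raw-score slope (rounded)
MEAN = 6.60     # mean number of resources

# Substituting X = MEAN + c into a + B * X changes only the intercept.
a_centered = A_RAW + B_RAW * MEAN
b_centered = B_RAW

def p_raw(resources):
    return 1 / (1 + math.exp(-(A_RAW + B_RAW * resources)))

def p_centered(c):
    return 1 / (1 + math.exp(-(a_centered + b_centered * c)))

print(round(a_centered, 3), round(math.exp(b_centered), 2))
print(round(p_centered(-4), 2), round(p_centered(0), 2))
```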
Figures
cResources.xls
Effect of cResources on Continuation (Probabilities)
[Figure: probabilities (vertical axis) by centered resources (horizontal axis)]
Centered Resources: -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5
Probabilities: 0.29, 0.34, 0.39, 0.44, 0.49, 0.54, 0.60, 0.65, 0.69, 0.74, 0.77
Question & Answer
Are foster families with more resources
more likely to continue fostering?
Yes. The odds of continuing are 1.24 times
(24%) higher for each additional resource. The
probability of continuing is .34 for families with
4 resources below the mean, .54 for families
with the mean number of resources (6.60), and
.74 for families with 4 resources above the
mean.
Multiple IV Example
DV = continue fostering, 0 = no, 1 = yes
IV = married, 0 = not married, 1 = married
IV = number of resources (z-scores)
N = 131 foster families
Customary to code the category of interest as 1
and the other category as 0
Are foster families with more resources
more likely to continue fostering,
controlling for marital status?
Statistical Significance
Table 2.12
Relationship between set of IVs and
continuation is statistically significant (χ² =
6.58, p = .037)
H0: β1 = β2 = βk = 0, same as
H0: ψ1 = ψ2 = ψk = 1
• ψ (psi) is symbol for population value of OR
Statistical Significance (cont’d)
Table 2.13
Relationship between resources and
continuation is not statistically significant,
controlling for marital status (χ² = .92, p = .338)
Relationship between marital status and
continuation is not statistically significant,
controlling for resources (χ² = 1.42, p = .234)
H0: β = 0, β ≥ 0, β ≤ 0, same as
H0: ψ = 1, ψ ≥ 1, ψ ≤ 1
• ψ (psi) is symbol for population value of OR
Likelihood ratio χ² better than Wald
Statistical Significance (cont’d)
Table 2.9
Relationship between resources and
continuation is not statistically significant,
controlling for marital status (χ² = .91, p = .340)
Relationship between marital status and
continuation is not statistically significant,
controlling for resources (χ² = 1.41, p = .235)
H0: β = 0, β ≥ 0, β ≤ 0, same as
H0: ψ = 1, ψ ≥ 1, ψ ≤ 1
• ψ (psi) is symbol for population value of OR
Wald χ², but likelihood ratio χ² better
Estimated Logits
L(Continue) = -.183 + (.228)(XzResources) + (.570)(XMarried)
ORs & Percentage Change
ORzResources = 1.256 (ns)
The odds of continuing are 1.26 times (26%)
higher for each one standard deviation (1.93)
increase in resources, controlling for marital
status
ORMarried = 1.769 (ns)
The odds of continuing are 1.77 times (77%)
higher for two-parent compared to one-parent
families, controlling for resources
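With two IVs, each OR is the exponentiated slope for that IV, and predicted probabilities depend on both IVs at once. The sketch below uses the rounded coefficients from the estimated logit equation (-.183, .228, .570); names are illustrative, and values may differ from the tables in the last decimal.

```python
import math

A = -0.183           # intercept (rounded)
B_ZRESOURCES = 0.228 # slope for standardized resources (rounded)
B_MARRIED = 0.570    # slope for married (rounded)

def estimated_logit(z_resources, married):
    return A + B_ZRESOURCES * z_resources + B_MARRIED * married

def probability(z_resources, married):
    return 1 / (1 + math.exp(-estimated_logit(z_resources, married)))

# Each OR holds the other IV constant.
or_zresources = math.exp(B_ZRESOURCES)
or_married = math.exp(B_MARRIED)

print(round(or_zresources, 2), round(or_married, 2))
for married, label in ((0, "One-Parent"), (1, "Two-Parent")):
    print(label, [round(probability(z, married), 2) for z in (-2, 0, 2)])
```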
Figures
Married & zResources.xls
Effect of Resources and Marital Status on Plans to Continue Fostering (Odds)
[Figure: odds (vertical axis) by standardized resources (horizontal axis), one line per family type]
Standardized Resources: -3, -2, -1, 0, 1, 2, 3
One-Parent: 0.42, 0.53, 0.66, 0.83, 1.05, 1.31, 1.65
Two-Parent: 0.74, 0.93, 1.17, 1.47, 1.85, 2.32, 2.92
Effect of Resources and Marital Status on Plans to Continue Fostering (Probabilities)
[Figure: probabilities (vertical axis) by standardized resources (horizontal axis), one line per family type]
Standardized Resources: -3, -2, -1, 0, 1, 2, 3
One-Parent: 0.30, 0.35, 0.40, 0.45, 0.51, 0.57, 0.62
Two-Parent: 0.43, 0.48, 0.54, 0.60, 0.65, 0.70, 0.74
Presenting Odds and Probabilities
in Tables
Tables 2.10 and 2.11
Question & Answer
Are foster families with more resources
more likely to continue fostering,
controlling for marital status?
No (ns). The odds of continuing are 1.26 times
(26%) higher for each one standard deviation
(1.93) increase in resources, controlling for
marital status.
Question & Answer (cont’d)
For one-parent families the probability of
continuing is .35 for families with resources
two standard deviations below the mean, .45 for
families with the mean number of resources,
and .57 for families with resources two
standard deviations above the mean. For two-parent families the probability of continuing is
.48 for families with resources two standard
deviations below the mean, .60 for families with
the mean number of resources, and .70 for
families with resources two standard deviations
above the mean.
Comparing the Relative Strength
of IVs
Size of slope and OR depend on how the IV
is measured
When IVs measured the same way (e.g., two
dichotomous IVs or two continuous IVs
transformed to z-scores) relative strength can
be compared
Nothing comparable to standardized slope
(Beta)
Nested Models
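Full model: IV1, IV2, IV3
Reduced models with two IVs: (IV1, IV2); (IV2, IV3); (IV1, IV3)
Reduced models with one IV: IV1 and IV2, nested in (IV1, IV2); IV2 and IV3, nested in (IV2, IV3); IV1 and IV3, nested in (IV1, IV3)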
Nested Models (cont’d)
One regression model is nested within another if it
contains a subset of variables included in the
model within which it’s nested, and same cases are
analyzed in both models
The more complex model called the “full model”
The nested model called the “reduced model.”
Comparison of full and reduced models allows you
to examine whether one or more variable(s) in the
full model contribute to explanation of the DV
Sequential Entry of IVs
Used to compare full and reduced models
e.g., family resources entered first, and then
marital status
Fchange used in linear regression
Sequential Entry of IVs (cont’d)
SPSS GZLM doesn’t allow sequential entry of IVs
Estimate models separately and compare
omnibus likelihood ratio χ² values
Reduced model χ²(1) = 5.168
Full model χ²(2) = 6.585
χ² difference = 6.585 – 5.168 = 1.417
df difference = 2 – 1 = 1
p = .234
Chi-square Difference.xls
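The chi-square difference test can be reproduced without a spreadsheet. The sketch below relies on a standard identity for 1 degree of freedom: the upper-tail probability of a chi-square value x equals erfc(sqrt(x / 2)). It therefore only covers comparisons where the models differ by one IV; the function name is illustrative.

```python
import math

def chi_square_p_value_df1(x):
    """Upper-tail p value for a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2))

reduced_chi_square, reduced_df = 5.168, 1  # resources only
full_chi_square, full_df = 6.585, 2        # resources and marital status

chi_square_difference = full_chi_square - reduced_chi_square
df_difference = full_df - reduced_df

# The helper is only valid when the models differ by exactly one parameter.
assert df_difference == 1
p_value = chi_square_p_value_df1(chi_square_difference)

print(round(chi_square_difference, 3), df_difference, round(p_value, 3))
```

A non-significant difference indicates that the added IV (here, marital status) does not improve on the reduced model.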
Assumptions Necessary for
Testing Hypotheses
No assumptions unique to binary logistic
regression other than ones discussed in
GZLM lecture
Model Evaluation
Evaluate your model before you test
hypotheses or interpret substantive
results
Outliers
Analogs of R²
Outliers
Atypical cases
Can lead to flawed conclusions
Can provide theoretical insights
Common causes
Data entry errors
Model misspecification
Rare events
Outliers (cont’d)
Leverage
Residuals
Standardized or unstandardized deviance
residuals
Influence
Cook’s D
Leverage
Think of a seesaw
Leverage value for each case
Cases with greater leverage can exert a
disproportionately large influence
Leverage value for each case
No clear benchmarks
Identify cases with substantially different
leverage values than those of other cases
Residuals
Difference between actual and estimated
values of the DV for a case
Residual for each case
Large residual indicates a case for which
model fits poorly
Residuals (cont’d)
Standardized or unstandardized deviance
residuals
Not normally distributed
Values less than -2 or greater than +2 warrant
some concern
Values less than -3 or greater than +3 merit
close inspection
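As a rough sketch of how these cutoffs might be applied, the code below computes the unstandardized deviance residual for a binary outcome, sign(y - p) * sqrt(-2 * [y ln(p) + (1 - y) ln(1 - p)]), and labels it using the ±2 and ±3 benchmarks. The cases are hypothetical, the function names are illustrative, and SPSS's standardized version additionally adjusts for leverage, which is not done here.

```python
import math

def deviance_residual(y, p):
    """Unstandardized deviance residual for outcome y (0 or 1) and fitted probability p."""
    log_likelihood = y * math.log(p) + (1 - y) * math.log(1 - p)
    sign = 1 if y == 1 else -1
    return sign * math.sqrt(-2 * log_likelihood)

def flag(residual):
    """Apply the benchmarks: beyond +/-3 inspect closely, beyond +/-2 some concern."""
    size = abs(residual)
    if size > 3:
        return "inspect closely"
    if size > 2:
        return "some concern"
    return "ok"

# Hypothetical (outcome, fitted probability) pairs, for illustration only.
cases = [(1, 0.62), (0, 0.41), (1, 0.05), (0, 0.995)]

for y, p in cases:
    residual = deviance_residual(y, p)
    print(y, p, round(residual, 2), flag(residual))
```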
Influence
Cases whose deletion results in substantial
changes to regression coefficients
Cook’s D for each case
Approximate aggregate change in regression
parameters resulting from deletion of a case
Values of 1.0 or more indicate a problematic
degree of influence for an individual case
Index Plot
Scatterplot
Horizontal axis (X)
• Case id
Vertical axis (Y)
• Leverage values, or
• Residuals, or
• Cook’s D
Index Plot: Leverage Values
Index Plot: Standardized
Deviance Residuals
Index Plot: Cook’s D
Analogs of R²
None in standard use and each may give
different results
Typically much smaller than R² values in
linear regression
Difficult to interpret
Multicollinearity
SPSS GZLM doesn’t compute
multicollinearity statistics
Use SPSS linear regression
Problematic levels
Tolerance < .10 or
VIF > 10
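The two benchmarks are equivalent because VIF is the reciprocal of tolerance, and tolerance is 1 minus the R² obtained by regressing one IV on the remaining IVs. A small sketch with illustrative function names and hypothetical R² values:

```python
def tolerance(r_squared):
    """Tolerance for an IV, given R-squared from regressing it on the other IVs."""
    return 1 - r_squared

def vif(r_squared):
    """Variance inflation factor: the reciprocal of tolerance."""
    return 1 / tolerance(r_squared)

def is_problematic(r_squared):
    """Apply the benchmarks: tolerance < .10 or VIF > 10."""
    return tolerance(r_squared) < 0.10 or vif(r_squared) > 10

# Hypothetical R-squared values, for illustration only.
for r_squared in (0.50, 0.95):
    print(r_squared, round(tolerance(r_squared), 2), round(vif(r_squared), 2),
          is_problematic(r_squared))
```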
Additional Topics
Polychotomous IVs
Curvilinear relationships
Interactions
Overview of the Process
Select IVs and decide whether to test
curvilinear relationships or interactions
Carefully screen and clean data
Transform and code variables as needed
Estimate regression model
Examine assumptions necessary to
estimate binary regression model, examine
model fit, and revise model as needed
Overview of the Process (cont’d)
Test hypotheses about the overall model
and specific model parameters, such as
ORs
Create tables and graphs to present
results in the most meaningful and
parsimonious way
Interpret results of the estimated model
in terms of logits, probabilities, odds, and
odds ratios, as appropriate
Additional Regression Models for
Dichotomous DVs
Binary probit regression
Substantive results essentially indistinguishable
from binary logistic regression
Choice between this and binary logistic
regression largely one of convenience and
discipline-specific convention
Many researchers prefer binary logistic
regression because it provides odds ratios
whereas probit regression does not, and binary
logistic regression comes with a wider variety
of fit statistics
Additional Regression Models for
Dichotomous DVs (cont’d)
Complementary log-log (clog-log) and log-log models
Probability of the event is very small or large
Loglinear regression
Limited to categorical IVs
Discriminant analysis
Limited to continuous IVs