Long-Term Correlates of Family Foster Care


Binary Logistic Regression
“To be or not to be, that is the
question.” (William Shakespeare, “Hamlet”)
Binary Logistic Regression

Also known as “logistic” or sometimes
“logit” regression

Foundation from which more complex
models derived
 e.g., multinomial regression and ordinal logistic
regression
Dichotomous Variables

Two categories indicating whether an event
has occurred or some characteristic is
present

Sometimes called “binary” or “binomial”
variables
Dichotomous DVs





Placed in foster care or not
Diagnosed with a disease or not
Abused or not
Pregnant or not
Service provided or not
Single (Dichotomous) IV Example

DV = continue fostering, 0 = no, 1 = yes
 Customary to code category of interest 1 and
the other category 0
IV = married, 0 = not married, 1 = married
 N = 131 foster families


Are two-parent families more likely to
continue fostering than one-parent
families?
Crosstabulation

Table 2.1

Relationship between marital status and
continuation is statistically significant
[χ²(1, N = 131) = 5.65, p = .017]

A higher percentage of two-parent families
(62.20%) than single-parent families
(40.82%) planned to continue fostering
Strength & Direction of
Relationships

Different ways to quantify the relationship
between IV(s) and DV
 Probabilities
 Odds
 Odds Ratio (OR)
• Also abbreviated as e^B, Exp(B) (on SPSS output), or
exp(B)
 % change
Roadmap to Computations
Probabilities: p
Odds = p / (1 - p)
Odds Ratio = Odds(1) / Odds(0)
% change = 100(OR - 1)
Probabilities

Percentages in Table 2.1 as probabilities
(e.g., 62.20% as .6220)
p
• Probability that event will occur (continue)
• e.g., probability that one-parent families plan to
continue is .4082
1 – p
• Probability that event will not occur (not continue)
• e.g., probability that one-parent families do not plan
to continue is .5918 (1 - .4082)
Odds

Ratio of probability that event will occur to
probability that it will not
 e.g., odds of continuation for one-parent
families are .69 (.4082 / .5918)
odds = p / (1 - p)

Can range from 0 to positive infinity
Probabilities and Odds
Table 2.2
 Odds = 1

 Both outcomes equally likely

Odds > 1
 Probability that event will occur greater than
probability that it will not

Odds < 1
 Probability that event will occur less than
probability that it will not
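The conversion from probabilities to odds can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `odds_from_probability` is an assumption made for this example, and the two probabilities are the Table 2.1 percentages quoted above.

```python
def odds_from_probability(p):
    """Return the odds p / (1 - p) for a probability p with 0 <= p < 1."""
    if not 0 <= p < 1:
        raise ValueError("p must satisfy 0 <= p < 1")
    return p / (1 - p)

# Probabilities of planning to continue, taken from the Table 2.1 percentages
p_one_parent = 0.4082
p_two_parent = 0.6220

odds_one_parent = odds_from_probability(p_one_parent)
odds_two_parent = odds_from_probability(p_two_parent)

# Odds below 1: continuing is less likely than not continuing
print(f"one-parent odds: {odds_one_parent:.4f}")
# Odds above 1: continuing is more likely than not continuing
print(f"two-parent odds: {odds_two_parent:.4f}")
# Odds of exactly 1: both outcomes equally likely
print(f"odds when p = .50: {odds_from_probability(0.5):.2f}")
```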
Odds Ratio (OR)

Odds of the event for one value of the IV
(two-parent families) divided by the odds
for a different value of the IV, usually a
value one unit lower (one-parent families)

e.g., odds of continuing for two-parent
families more than double the odds for
one-parent families
 OR = 1.6455 / .6898 = 2.39
OR (cont’d)

Plays a central role in quantifying the
strength and direction of relationships
between IVs and DVs in binary, multinomial,
and ordinal logistic regression
 OR < 1 indicates a negative relationship
 OR > 1 indicates a positive relationship
 OR = 1 indicates no linear relationship
ORs > 1

e.g., OR of 2.39
 A one-unit increase in the independent variable
increases the odds of continuing by a factor of
2.39
 The odds of continuing are 2.39 times higher
for two-parent compared to one-parent families
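As a rough check on the odds ratio above, the following sketch recomputes it from the two Table 2.1 probabilities. The helper name `odds_ratio` and the direction labels are assumptions introduced for illustration.

```python
def odds_ratio(p_group1, p_group0):
    """Odds of the event when IV = 1 divided by the odds when IV = 0."""
    odds1 = p_group1 / (1 - p_group1)
    odds0 = p_group0 / (1 - p_group0)
    return odds1 / odds0

def direction(or_value):
    """Describe the direction of the relationship implied by an OR."""
    if or_value > 1:
        return "positive"
    if or_value < 1:
        return "negative"
    return "no linear relationship"

# Two-parent families (IV = 1) versus one-parent families (IV = 0)
or_married = odds_ratio(0.6220, 0.4082)
print(f"OR = {or_married:.2f} ({direction(or_married)})")
```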
ORs < 1

e.g., OR = .50
 A one-unit increase in the independent variable
decreases the odds of continuing by a factor of
.50
 The odds that two-parent families will continue
are .50 (or one-half) of the odds that one-parent families will continue
ORs < 1 (cont’d)


Compute reciprocal (i.e., 1 / .50 = 2.00)
Express relationship as opposite event of
interest (e.g., discontinuing)
 A one-unit increase in the independent variable
increases the odds of discontinuing by a factor
of 2.00
 The odds that two-parent families will
discontinue are 2.00 times (or twice) the odds
of one-parent families
OR to Percentage Change


% change = 100(OR – 1)
Alternative way to express OR
 e.g., A one-unit increase in the independent
variable increases the odds of continuing by
139.00%
• 100(2.39 – 1) = 139.00
 e.g., A one-unit increase in the independent
variable decreases the odds of continuing by
50.00%
• 100(.50 – 1) = -50.00
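The percentage-change formula, together with the reciprocal used for ORs below 1, can be written as two small functions. This is a minimal sketch; the function names are assumptions for this example, and the inputs are the ORs of 2.39 and .50 used on the slides.

```python
def percent_change(odds_ratio):
    """Percentage change in the odds for a one-unit increase: 100(OR - 1)."""
    return 100 * (odds_ratio - 1)

def reciprocal(odds_ratio):
    """Re-express an OR in terms of the opposite event (e.g., discontinuing)."""
    return 1 / odds_ratio

for or_value in (2.39, 0.50):
    print(f"OR = {or_value:.2f}: "
          f"% change = {percent_change(or_value):.2f}, "
          f"reciprocal = {reciprocal(or_value):.2f}")
```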
Comparing OR > 1 and OR < 1

Compute reciprocal of one of the ORs

e.g., OR of 2.00 and an OR of .50
 Reciprocal of .50 is 2.00 (1 / .50 = 2.00)
 ORs are equal in size (but not in direction of
the relationship)
Qualitative Descriptors for OR
Table 2.3
 Use cautiously with IVs that aren’t
dichotomous

Question & Answer

Are two-parent families more likely to
continue fostering than one-parent
families?
 Yes. The odds of continuing are 2.39 times
(139%) higher for two-parent compared to one-parent families. The probability of continuing is
.41 for one-parent families and .62 for two-parent families.
Binary Logistic Regression
Example

DV = continue fostering, 0 = no, 1 = yes
 Customary to code category of interest 1 and
the other category 0
IV = married, 0 = not married, 1 = married
 N = 131 foster families


Are two-parent families more likely to
continue fostering than one-parent
families?
Statistical Significance

Table 2.4
 Relationship between marital status and
continuation is statistically significant (Wald χ²
= 5.544, p = .019)
Direction of Relationship

B = slope
Positive slope, positive relationship
• OR > 1
Negative slope, negative relationship
• OR < 1
0 slope, no linear relationship
• OR = 1
Direction/Strength of
Relationship

Positive relationship between marital
status and continuation
 Two-parent families more likely to continue
 B = .869
 Exp(B) = OR = 2.385
• % change = 100(2.385 - 1) = 139%
 The odds of continuing are 2.39 times (139%)
higher for two-parent compared to one-parent
families
Roadmap to Computations
Logits: ln(p / (1 - p)) = η (L is short for ln(p / (1 - p)))
Odds = e^L
Odds Ratio = Odds(1) / Odds(0)
% change = 100(OR - 1)
Probabilities: p = e^L / (1 + e^L)
Binary Logistic Regression Model
ln(π / (1 - π)) = α + β₁X₁ + β₂X₂ + … + βₖXₖ, or
 ln(π / (1 - π)) = η

π is the probability of the event
 η (eta) is the abbreviation for the linear
predictor (right-hand side of this equation)
 k = number of independent variables

Logit Link

ln(π / (1 - π))
 Log of the odds that the DV equals 1 (event
occurs)
 Connects (i.e., links) DV to linear combination of
IVs
Estimated Logits (L)
ln(p / (1 - p)) = a + B₁X₁ + B₂X₂ + … + BₖXₖ

ln(p / (1 - p))
 Log of the odds that the DV equals 1 (event
occurs)
 Estimated logit, L
 Does not have intuitive or substantive meaning
 Useful for examining curvilinear relationships
and interaction effects
 Primarily useful for estimating probabilities,
odds, and ORs
Estimated Logits (L)
L(Continue) = a + (B_Married)(X_Married)
L(Continue) = -.372 + (.869)(X_Married)
a = intercept
 B = slope

Logit to Odds

If L = 0:
Odds = e^L = e^0 = 1.00

If L = .50:
Odds = e^L = e^.50 = 1.65

If L = 1.00:
Odds = e^L = e^1.00 = 2.72
Logits to Odds (cont’d)

Table 2.4
 One-parent families
• L(Continue) = -.372 = -.372 + (.869)(0)
• Odds of continuing = e^-.372 = .69
 Two-parent families
• L(Continue) = .497 = -.372 + (.869)(1)
• Odds of continuing = e^.497 = 1.65
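The logit-to-odds step can be reproduced with the rounded Table 2.4 coefficients. This is a sketch under that assumption: the constant and function names are invented for illustration, and because the coefficients are rounded to three decimals the results can differ from the slide values in the second decimal place.

```python
import math

A = -0.372          # intercept (a), as reported from Table 2.4
B_MARRIED = 0.869   # slope (B) for married, as reported from Table 2.4

def logit_continue(married):
    """Estimated logit for married = 0 (one-parent) or 1 (two-parent)."""
    return A + B_MARRIED * married

def odds_from_logit(logit):
    """Odds = e^L."""
    return math.exp(logit)

odds_one_parent = odds_from_logit(logit_continue(0))
odds_two_parent = odds_from_logit(logit_continue(1))

print(f"one-parent: L = {logit_continue(0):.3f}, odds = {odds_one_parent:.2f}")
print(f"two-parent: L = {logit_continue(1):.3f}, odds = {odds_two_parent:.2f}")

# The ratio of the two odds equals e^B, the quantity labeled Exp(B)
print(f"OR = {odds_two_parent / odds_one_parent:.3f}, e^B = {math.exp(B_MARRIED):.3f}")
```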
Odds to OR

OR = 1.65 / .69 = 2.39, or

e^.869 = 2.39, labeled Exp(B)
 Table 2.4
OR to Percentage Change

% change = 100(OR – 1)
 e.g., A one-unit increase in the independent
variable increases the odds of continuing by
139.00%
• 100(2.39 – 1) = 139.00
 e.g., A one-unit increase in the independent
variable decreases the odds of continuing by
50.00%
• 100(.50 – 1) = -50.00
Logits to Probabilities
p(Continue) = e^L / (1 + e^L)

One-parent families, L(Continue) = -.372
p(Continue) = e^-.372 / (1 + e^-.372) = .69 / 1.69 = .41

Two-parent families, L(Continue) = .497
p(Continue) = e^.497 / (1 + e^.497) = 1.65 / 2.65 = .62
Question & Answer

Are two-parent families more likely to
continue fostering than one-parent
families?
 Yes. The odds of continuing are 2.39 times
(139%) higher for two-parent compared to one-parent families. The probability of continuing is
.41 for one-parent families and .62 for two-parent families.
Single (Quantitative) IV Example

DV = continue fostering, 0 = no, 1 = yes
 Customary to code category of interest 1 and
other category 0
IV = number of resources
 N = 131 foster families


Are foster families with more resources
more likely to continue fostering?
Statistical Significance

Table 2.5
 Relationship between resources and
continuation is statistically significant (Wald χ²
= 4.924, p = .026)
 H0: β = 0, β ≠ 0, β ≤ 0, same as
 H0: OR = 1, OR ≠ 1, OR ≤ 1
 Likelihood ratio χ² better than Wald
Direction/Strength of
Relationship

Positive relationship between resources
and continuation
 Families with more resources are more likely to
continue
 B = .212
 Exp(B) = OR = 1.237
• % change = 100(1.237 – 1) = 24%
 The odds of continuing are 1.24 times (24%)
higher for each additional resource
Estimated Logits
L(Continue) = -1.227 + (.212)(X)
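To see how this equation translates into logits, odds, and probabilities across the range of resources, the sketch below evaluates it for 1 through 11 resources. It assumes the rounded coefficients shown in the equation, so values may differ slightly in the last decimal from figures built on unrounded estimates; the names `A`, `B`, and `predict` are illustrative.

```python
import math

A = -1.227  # intercept, from the estimated logit equation
B = 0.212   # slope for number of resources

def predict(resources):
    """Return (logit, odds, probability) for a given number of resources."""
    logit = A + B * resources
    odds = math.exp(logit)
    probability = odds / (1 + odds)
    return logit, odds, probability

print("resources  logit   odds   prob")
for x in range(1, 12):
    logit, odds, probability = predict(x)
    print(f"{x:9d}  {logit:5.2f}  {odds:5.2f}  {probability:5.2f}")
```

Each additional resource adds B to the logit and multiplies the odds by e^B, while the change in probability depends on where on the curve the family starts.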
Figures

Resources.xls
Effect of Resources on Continuation (Logits)
Figure: estimated logits (vertical axis) by number of resources (horizontal axis)
Resources  1  2  3  4  5  6  7  8  9  10  11
Logits  -1.01  -0.80  -0.59  -0.38  -0.16  0.05  0.26  0.47  0.68  0.90  1.11
Effect of Resources on Continuation (Odds)
Figure: odds of continuing (vertical axis) by number of resources (horizontal axis)
Resources  1  2  3  4  5  6  7  8  9  10  11
Odds  0.36  0.45  0.55  0.69  0.85  1.05  1.30  1.60  1.98  2.45  3.03
Effect of Resources on Continuation (Probabilities)
Figure: probability of continuing (vertical axis) by number of resources (horizontal axis)
Resources  1  2  3  4  5  6  7  8  9  10  11
Probabilities  0.27  0.31  0.36  0.41  0.46  0.51  0.56  0.62  0.66  0.71  0.75
Question & Answer

Are foster families with more resources
more likely to continue fostering?
 Yes. The odds of continuing are 1.24 times
(24%) higher for each additional resource. The
probability of continuing is .31 for families with
two resources, .51 for families with 6
resources, and .71 for families with 10
resources.
Relationship of Linear Predictor
to Logits, Odds & p

Relationship between linear predictor and
logits is linear

Relationship between linear predictor and
odds is non-linear

Relationship between linear predictor and p
is non-linear
 Challenge is to summarize changes in odds and
probabilities associated with changes in IVs in
the most meaningful and parsimonious way
Logit as Function of Linear Predictor
Figure: logit (vertical axis, -3.00 to 3.00) plotted against the linear predictor (horizontal axis, -3.00 to 3.00)
Odds as Function of Linear Predictor
Figure: odds (vertical axis, .00 to 21.00) plotted against the linear predictor (horizontal axis, -3.00 to 3.00)
Probabilities as Function of Linear Predictor
Figure: probability (vertical axis, .00 to 1.00) plotted against the linear predictor (horizontal axis, -3.00 to 3.00)
IVs to z-scores

z-scores (standard scores)
 Only the IV (not DV)--semi-standardized slopes
 One-unit increase in the IV refers to a one-standard-deviation increase
 OR interpreted as expected change in the odds
associated with a one standard deviation
increase in the IV
 Conversion to z-scores changes intercept, slope,
and OR, but not associated test statistics
 Table 2.6 (compare to Table 2.5)
Figures

zResources.xls
Effect of zResources on Continuation (Probabilities)
Figure: probability of continuing (vertical axis) by standardized resources (horizontal axis)
Standardized Resources  -3  -2  -1  0  1  2  3
Probabilities  0.26  0.34  0.44  0.54  0.64  0.73  0.80
Question & Answer

Are foster families with more resources
more likely to continue fostering?
 Yes. The odds of continuing are 1.51 times
(51%) higher for each one standard deviation
(1.93) increase in resources. The probability of
continuing is .34 for families with resources
two standard deviations below the mean, .54 for
families with the mean number of resources
(6.60), and .73 for families with resources two
standard deviations above the mean.
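The effect of converting the IV to z-scores can be illustrated algebraically: the slope is multiplied by the standard deviation and the intercept is shifted to the mean. The sketch below is an assumption-laden check, not a copy of Table 2.6; it derives the z-score coefficients from the rounded raw-score equation (a = -1.227, B = .212) and the mean (6.60) and standard deviation (1.93) quoted above.

```python
import math

A, B = -1.227, 0.212    # raw-score intercept and slope
MEAN, SD = 6.60, 1.93   # mean and standard deviation of resources

# Substituting X = MEAN + SD * z into a + B*X gives the z-score equation
a_z = A + B * MEAN      # intercept: logit at the mean (z = 0)
b_z = B * SD            # slope: change in logit per one-SD increase
or_z = math.exp(b_z)    # OR for a one-standard-deviation increase

def probability(z):
    """Estimated probability of continuing at z standard deviations from the mean."""
    logit = a_z + b_z * z
    return 1 / (1 + math.exp(-logit))

print(f"OR per SD = {or_z:.2f}")
for z in (-2, 0, 2):
    print(f"z = {z:+d}: p = {probability(z):.2f}")
```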
IVs Centered

Centering
 Typically center on mean
 Useful when testing interactions, curvilinear
relationships, or when no meaningful 0 point
(e.g., no family with 0 resources)
 Centering doesn’t change slope, OR, or
associated test statistics, but does change the
intercept
 Table 2.7 (compare to Table 2.5)
Figures

cResources.xls
Effect of cResources on Continuation (Probabilities)
Figure: probability of continuing (vertical axis) by centered resources (horizontal axis)
Centered Resources  -5  -4  -3  -2  -1  0  1  2  3  4  5
Probabilities  0.29  0.34  0.39  0.44  0.49  0.54  0.60  0.65  0.69  0.74  0.77
Question & Answer

Are foster families with more resources
more likely to continue fostering?
 Yes. The odds of continuing are 1.24 times
(24%) higher for each additional resource. The
probability of continuing is .34 for families with
4 resources below the mean, .54 for families
with the mean number of resources (6.60), and
.74 for families with 4 resources above the
mean.
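Centering can be checked the same way: subtracting the mean from the IV moves the intercept but leaves the slope, and therefore the OR, unchanged. A minimal sketch, assuming the rounded raw-score coefficients and the mean of 6.60 quoted above; the function names are hypothetical and the centered intercept is derived here rather than taken from Table 2.7.

```python
import math

A, B = -1.227, 0.212   # raw-score intercept and slope
MEAN = 6.60            # mean number of resources

a_centered = A + B * MEAN   # intercept after centering; slope stays B

def prob_raw(resources):
    """Probability from the raw-score equation."""
    return 1 / (1 + math.exp(-(A + B * resources)))

def prob_centered(centered_resources):
    """Probability from the centered equation (0 = mean number of resources)."""
    return 1 / (1 + math.exp(-(a_centered + B * centered_resources)))

for c in (-4, 0, 4):
    print(f"centered = {c:+d}: raw p = {prob_raw(MEAN + c):.2f}, "
          f"centered p = {prob_centered(c):.2f}")
```

Both equations give the same probability for the same family, which is why only the interpretation of the intercept changes.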
Multiple IV Example

DV = continue fostering, 0 = no, 1 = yes
 Customary to code the category of interest as 1
and the other category as 0
IV = married, 0 = not married, 1 = married
IV = number of resources (z-scores)
N = 131 foster families

Are foster families with more resources
more likely to continue fostering,
controlling for marital status?
Statistical Significance

Table 2.12
 Relationship between set of IVs and
continuation is statistically significant (χ² =
6.58, p = .037)
 H0: β₁ = β₂ = βₖ = 0, same as
 H0: ψ₁ = ψ₂ = ψₖ = 1
• ψ (psi) is symbol for population value of OR
Statistical Significance (cont’d)

Table 2.13
 Relationship between resources and
continuation is not statistically significant,
controlling for marital status (χ² = .92, p = .338)
 Relationship between marital status and
continuation is not statistically significant,
controlling for resources (χ² = 1.42, p = .234)
 H0: β = 0, β ≠ 0, β ≤ 0, same as
 H0: ψ = 1, ψ ≠ 1, ψ ≤ 1
• ψ (psi) is symbol for population value of OR
 Likelihood ratio χ² better than Wald
Statistical Significance (cont’d)

Table 2.9
 Relationship between resources and
continuation is not statistically significant,
controlling for marital status (χ² = .91, p = .340)
 Relationship between marital status and
continuation is not statistically significant,
controlling for resources (χ² = 1.41, p = .235)
 H0: β = 0, β ≠ 0, β ≤ 0, same as
 H0: ψ = 1, ψ ≠ 1, ψ ≤ 1
• ψ (psi) is symbol for population value of OR
 Wald χ², but likelihood ratio χ² better
Estimated Logits
L(Continue) = -.183 + (.228)(X_zResources) + (.570)(X_Married)
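With two IVs, the same conversions apply to the combined linear predictor. The sketch below evaluates the equation above for one-parent and two-parent families at several values of standardized resources. The constant and function names are assumptions for illustration, and the coefficients are the rounded values from the equation.

```python
import math

A = -0.183          # intercept
B_ZRESOURCES = 0.228  # slope for standardized resources
B_MARRIED = 0.570     # slope for married (0 = one-parent, 1 = two-parent)

def odds(z_resources, married):
    """Odds of continuing for a given z-score of resources and marital status."""
    return math.exp(A + B_ZRESOURCES * z_resources + B_MARRIED * married)

def probability(z_resources, married):
    """Probability of continuing: odds / (1 + odds)."""
    o = odds(z_resources, married)
    return o / (1 + o)

print("z   one-parent p   two-parent p")
for z in range(-3, 4):
    print(f"{z:+d}  {probability(z, 0):12.2f}  {probability(z, 1):13.2f}")

# Holding resources constant, the ratio of odds equals e^B for married
print(f"OR married = {odds(0, 1) / odds(0, 0):.3f}")
```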
ORs & Percentage Change

OR_zResources = 1.256 (ns)
 The odds of continuing are 1.26 times (26%)
higher for each one standard deviation (1.93)
increase in resources, controlling for marital
status

OR_Married = 1.769 (ns)
 The odds of continuing are 1.77 times (77%)
higher for two-parent compared to one-parent
families, controlling for resources
Figures

Married & zResources.xls
Effect of Resources and Marital Status on Plans to Continue Fostering (Odds)
Figure: odds of continuing (vertical axis) by standardized resources (horizontal axis), plotted separately by marital status
Standardized Resources  -3  -2  -1  0  1  2  3
One-Parent  0.42  0.53  0.66  0.83  1.05  1.31  1.65
Two-Parent  0.74  0.93  1.17  1.47  1.85  2.32  2.92
Effect of Resources and Marital Status on Plans to Continue Fostering (Probabilities)
Figure: probability of continuing (vertical axis) by standardized resources (horizontal axis), plotted separately by marital status
Standardized Resources  -3  -2  -1  0  1  2  3
One-Parent  0.30  0.35  0.40  0.45  0.51  0.57  0.62
Two-Parent  0.43  0.48  0.54  0.60  0.65  0.70  0.74
Presenting Odds and Probabilities
in Tables

Tables 2.10 and 2.11
Question & Answer

Are foster families with more resources
more likely to continue fostering,
controlling for marital status?
 No (ns). The odds of continuing are 1.26 times
(26%) higher for each one standard deviation
(1.93) increase in resources, controlling for
marital status.
 Cont’d
Question & Answer (cont’d)
 For one-parent families the probability of
continuing is .35 for families with resources
two standard deviations below the mean, .45 for
families with the mean number of resources,
and .57 for families with resources two
standard deviations above the mean. For two-parent families the probability of continuing is
.48 for families with resources two standard
deviations below the mean, .60 for families with
the mean number of resources, and .70 for
families with resources two standard deviations
above the mean.
Comparing the Relative Strength
of IVs

Size of slope and OR depend on how the IV
is measured
 When IVs measured the same way (e.g., two
dichotomous IVs or two continuous IVs
transformed to z-scores) relative strength can
be compared

Nothing comparable to standardized slope
(Beta)
Nested Models
IV1, IV2, IV3
  IV1, IV2
    IV1
    IV2
  IV2, IV3
    IV2
    IV3
  IV1, IV3
    IV1
    IV3
Nested Models (cont’d)




One regression model is nested within another if it
contains a subset of variables included in the
model within which it’s nested, and same cases are
analyzed in both models
The more complex model called the “full model”
The nested model called the “reduced model.”
Comparison of full and reduced models allows you
to examine whether one or more variable(s) in the
full model contribute to explanation of the DV
Sequential Entry of IVs

Used to compare full and reduced models
 e.g., family resources entered first, and then
marital status

F-change used in linear regression
Sequential Entry of IVs (cont’d)

SPSS GZLM doesn’t allow sequential entry of IVs
 Estimate models separately and compare
omnibus likelihood ratio χ² values
 Reduced model χ²(1) = 5.168
 Full model χ²(2) = 6.585
 χ² difference = 6.585 - 5.168 = 1.417
 df difference = 2 - 1 = 1
 p = .234
 Chi-square Difference.xls
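The chi-square difference test above can be reproduced with the Python standard library. This sketch relies on an identity that holds only for 1 degree of freedom (the upper-tail probability equals erfc(sqrt(x / 2))); for other df a statistics library such as `scipy.stats.chi2.sf` would be needed. The variable and function names are assumptions for illustration.

```python
import math

def chi2_upper_tail_df1(x):
    """Upper-tail p value for a chi-square statistic with exactly 1 df."""
    return math.erfc(math.sqrt(x / 2))

# Omnibus likelihood ratio chi-square values from the two separately fitted models
reduced_chi2, reduced_df = 5.168, 1   # resources only
full_chi2, full_df = 6.585, 2         # resources and marital status

chi2_difference = full_chi2 - reduced_chi2
df_difference = full_df - reduced_df

# The erfc shortcut is valid here only because df_difference is 1
assert df_difference == 1
p_value = chi2_upper_tail_df1(chi2_difference)

print(f"chi-square difference = {chi2_difference:.3f}, "
      f"df = {df_difference}, p = {p_value:.3f}")
```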
Assumptions Necessary for
Testing Hypotheses

No assumptions unique to binary logistic
regression other than ones discussed in
GZLM lecture
Model Evaluation

Evaluate your model before you test
hypotheses or interpret substantive
results
 Outliers
 Analogs of R²
Outliers
Atypical cases
 Can lead to flawed conclusions
 Can provide theoretical insights
 Common causes

 Data entry errors
 Model misspecification
 Rare events
Outliers (cont’d)

Leverage

Residuals
 Standardized or unstandardized deviance
residuals

Influence
 Cook’s D
Leverage





Think of a seesaw
Leverage value for each case
Cases with greater leverage can exert a
disproportionately large influence
No clear benchmarks
 Identify cases with substantially different
leverage values than those of other cases
Residuals
Difference between actual and estimated
values of the DV for a case
 Residual for each case
 Large residual indicates a case for which
model fits poorly

Residuals (cont’d)

Standardized or unstandardized deviance
residuals
 Not normally distributed
 Values less than -2 or greater than +2 warrant
some concern
 Values less than -3 or greater than +3 merit
close inspection
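The screening rule above can be sketched in code. The slides do not give the residual formula, so this example assumes the usual unstandardized deviance residual for an ungrouped 0/1 outcome, sign(y - p) * sqrt(-2 ln(likelihood of the observed outcome)). The cases listed are hypothetical and the function names are invented for illustration.

```python
import math

def deviance_residual(y, p):
    """Unstandardized deviance residual for outcome y (0 or 1) and fitted probability p."""
    log_likelihood = math.log(p) if y == 1 else math.log(1 - p)
    sign = 1 if y == 1 else -1
    return sign * math.sqrt(-2 * log_likelihood)

def flag(residual):
    """Apply the +/-2 and +/-3 screening benchmarks."""
    size = abs(residual)
    if size > 3:
        return "inspect closely"
    if size > 2:
        return "some concern"
    return "ok"

# Hypothetical cases: (observed outcome, fitted probability of continuing)
cases = [(1, 0.62), (0, 0.41), (1, 0.05), (0, 0.995)]
for y, p in cases:
    d = deviance_residual(y, p)
    print(f"y = {y}, p = {p:.3f}: residual = {d:+.2f} ({flag(d)})")
```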
Influence
Cases whose deletion results in substantial
changes to regression coefficients
 Cook’s D for each case

 Approximate aggregate change in regression
parameters resulting from deletion of a case
 Values of 1.0 or more indicate a problematic
degree of influence for an individual case
Index Plot

Scatterplot
 Horizontal axis (X)
• Case id
 Vertical axis (Y)
• Leverage values, or
• Residuals, or
• Cook’s D
Index Plot: Leverage Values
Index Plot: Standardized
Deviance Residuals
Index Plot: Cook’s D
Analogs of R²
None in standard use and each may give
different results
 Typically much smaller than R² values in
linear regression
 Difficult to interpret

Multicollinearity

SPSS GZLM doesn’t compute
multicollinearity statistics

Use SPSS linear regression

Problematic levels
 Tolerance < .10 or
 VIF > 10
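The two benchmarks are equivalent because VIF is the reciprocal of tolerance. The sketch below assumes the standard definitions, which are not spelled out on the slide: tolerance = 1 - R², where R² comes from regressing one IV on the remaining IVs. The R² values used are hypothetical.

```python
def tolerance(r_squared):
    """Tolerance for an IV: 1 minus the R-squared from regressing it on the other IVs."""
    return 1 - r_squared

def vif(r_squared):
    """Variance inflation factor: the reciprocal of tolerance."""
    return 1 / tolerance(r_squared)

def problematic(r_squared):
    """Apply the benchmarks: tolerance < .10 or VIF > 10."""
    return tolerance(r_squared) < 0.10 or vif(r_squared) > 10

# Hypothetical R-squared values for two different IVs
for r2 in (0.50, 0.95):
    print(f"R2 = {r2:.2f}: tolerance = {tolerance(r2):.2f}, "
          f"VIF = {vif(r2):.2f}, problematic = {problematic(r2)}")
```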
Additional Topics
Polychotomous IVs
 Curvilinear relationships
 Interactions

Overview of the Process





Select IVs and decide whether to test
curvilinear relationships or interactions
Carefully screen and clean data
Transform and code variables as needed
Estimate regression model
Examine assumptions necessary to
estimate binary regression model, examine
model fit, and revise model as needed
Overview of the Process (cont’d)
Test hypotheses about the overall model
and specific model parameters, such as
ORs
 Create tables and graphs to present
results in the most meaningful and
parsimonious way
 Interpret results of the estimated model
in terms of logits, probabilities, odds, and
odds ratios, as appropriate

Additional Regression Models for
Dichotomous DVs

Binary probit regression
 Substantive results essentially indistinguishable
from binary logistic regression
 Choice between this and binary logistic
regression largely one of convenience and
discipline-specific convention
 Many researchers prefer binary logistic
regression because it provides odds ratios
whereas probit regression does not, and binary
logistic regression comes with a wider variety
of fit statistics
Additional Regression Models for
Dichotomous DVs (cont’d)

Complementary log-log (clog-log) and log-log models
 Probability of the event is very small or large

Loglinear regression
 Limited to categorical IVs

Discriminant analysis
 Limited to continuous IVs