Multilevel Regression Models


Multilevel Regression Models
Sean F. Reardon
17 June 2004
Outline
I. What/Why Multilevel Regression?
II. The Basic Multilevel Regression Model
III. Growth Models
IV. A Taste of Advanced Topics
Part I.A.
What are multilevel data and
multilevel analysis?
What are multilevel data?
• Multilevel data are data where observations are clustered in units
• Observations within the same unit may be more similar, on average, than observations in separate units
– What effect does this have on estimation and statistical inference?
Examples of multilevel data with
contextual clustering
• Observations of students, clustered within
schools
• Observations of siblings, clustered within
families
• Observations of individuals, clustered
within countries, states, or neighborhoods
Examples of multilevel data with
intra-person clustering
• Repeated test scores, clustered within
students
• Multiple measures of a latent construct,
clustered within persons
Other examples of multilevel
data
• Patients, clustered within doctors
• Coefficient estimates, clustered within
studies (meta-analysis)
• Widget sizes, clustered within factories
• And so on…
What is multilevel regression
analysis?
• Also called
– Hierarchical Linear Models
– Mixed Models
– Multilevel Models
– Growth Models
– Slopes-as-Outcomes Models
Multilevel Regression Models
• A form of regression models
• Used to answer questions about the relationship of
context to individual outcomes
• Used to estimate both within-unit and between-unit relationships (and cross-level interactions)
– e.g., within- vs. between-school relationships between
SES and achievement
Part I.B.
What’s wrong with OLS?
The OLS Model
Yi = β0 + β1Xi + εi
εi ~ N(0, σ²)
Assumptions of OLS
• Linearity
• Errors are normally distributed
• Errors are homoskedastic
• Errors are uncorrelated/independent
– Knowing the error term for one observation is not informative of the error term of any other observation
Some Example Data
• Data from the Early Childhood Longitudinal Study-Kindergarten Cohort (NCES, 1998-2004)
– Longitudinal study of 21,000 kindergarten students in the kindergarten class of 1998-99
– Followed through fifth grade (2003-04)
ECLS-K data
• Subsample
– 399 kindergarten students
– sampled from 17 schools
• Math Score:
– Fall kindergarten math test scores
– Administered 2-3 months into school year
• Age
– Age in months at time of math assessment
– Ranges from 60-79 months
What is the relationship between
age and math scores?
• Note: this is NOT a growth model
• It is a cross-sectional model
• A growth model requires repeated measures, so we can observe intra-individual growth
OLS Regression:
Math on Age
. reg math age

      Source |       SS       df       MS              Number of obs =     422
-------------+------------------------------           F(  1,   420) =   32.38
       Model |  1765.41947     1  1765.41947           Prob > F      =  0.0000
    Residual |  22896.5737   420  54.5156517           R-squared     =  0.0716
-------------+------------------------------           Adj R-squared =  0.0694
       Total |  24661.9932   421  58.5795563           Root MSE      =  7.3835

------------------------------------------------------------------------------
        math |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .4956666   .0871016     5.69   0.000     .3244572     .666876
       _cons |  -10.91381   6.049008    -1.80   0.072    -22.80391    .9762943
------------------------------------------------------------------------------
OLS Regression:
Math on Age
Next look at the residuals from this model.
Are they homoscedastic? Normally
distributed? Independent?
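A quick way to examine these questions in Stata (a minimal sketch, not from the original slides; it assumes the school identifier is named s_id1, as in the ANOVA output below):

. predict resid, residuals            // save the OLS residuals
. histogram resid, normal             // normality check
. rvfplot                             // residuals vs. fitted values: homoskedasticity check
. graph box resid, over(s_id1)        // residuals by school: a first look at independence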
[Figure: Math IRT scale scores vs. age in months, whole sample, with the OLS regression line (OLS predicted values)]
[Figure: Math IRT scale scores vs. age in months, panels by fall school identification number, with OLS predicted values]
[Figure: Math IRT scale scores vs. age in months, School #216, with the whole-sample OLS line (OLS--Sample)]
[Figure: Math IRT scale scores vs. age in months, School #1212, with the whole-sample OLS line (OLS--Sample)]
• The residuals look correlated with each other within schools
• Formal test of this dependence: a one-way random effects ANOVA of the residuals on school
Random Effects ANOVA
One-way Analysis of Variance for resid: Residuals

                                            Number of obs =       422
                                                R-squared =    0.1638

    Source             SS         df        MS          F     Prob > F
------------------------------------------------------------------------
Between s_id1       3750.169      17    220.59817      4.65     0.0000
Within s_id1       19146.405     404     47.392091
------------------------------------------------------------------------
Total              22896.574     421     54.386161

        Intraclass       Asy.
        correlation      S.E.       [95% Conf. Interval]
        ------------------------------------------------
           0.13487     0.05204       0.03287     0.23687

        Estimated SD of s_id1 effect             2.718128
        Estimated SD within s_id1                6.884191
        Est. reliability of a s_id1 mean          0.78517
             (evaluated at n=23.44)
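Output of this form comes from Stata's loneway command; a minimal sketch, assuming the OLS residuals were saved as resid:

. loneway resid s_id1                 // one-way random effects ANOVA of the residuals on school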
[Figure: Math IRT scale scores vs. age in months, panels by fall school identification number, showing the whole-sample OLS line (OLS--Sample) and within-school fitted lines (OLS--School)]
[Figure: Math IRT scale scores vs. age in months, within-school fitted regression lines compared with the whole-sample OLS line]
Consequences of Non-Independence of Residuals
• The computation of standard errors in OLS
depends on the assumption of independence
of errors
• If errors are not independent, then standard
errors will, in general, be too small (so the
probability of Type I errors is larger than it
should be)
Two extreme examples
• n individuals observed from each of K
schools (total of nK observations)
• if Yik = Yjk for all i and j in school k, then knowing k completely determines Y, so there are really only K unique observations
• In this case, we can just treat each school as a single observation (with outcome Ȳ.k, the school mean), and use OLS on the sample of K schools
Two extreme examples
• n individuals observed from each of K
schools (total of nK observations)
• if YikYjk for all i and j and k, then knowing
k tells us nothing about Y, so there are really
nK unique observations
• In this case, there is no dependence of the
errors, so we can use OLS on the sample of
nK students.
When do we need multilevel
regression?
• In the intermediate case, where knowing the
school gives us some, but not complete
information about Y.
• e.g., test scores vary both within and between
schools
• e.g., individuals vary within and between
neighborhoods
• e.g., mood varies both within individuals (over
time) and between individuals
Intermission I
Part II.A.
Farewell OLS
What we know so far
• Two observations within the same unit may be
more similar than two observations chosen at
random
• If the regression model does not explain all of the between-unit differences (and it is unlikely that it will), we will have correlated errors within units
• This is a violation of the independence of residuals
assumption in OLS
• At a minimum, this results in incorrect standard
errors (too small)
How do we allow dependence in
the regression model?
• We want a model that explicitly allows the
level of the outcome variable to vary across
level-two units
• For example, we want to let the mean
reading score differ across schools
• So let’s write a model that allows this
Some notation
• i indexes level-one units (people within
schools, observations within persons)
• j indexes level-two units (e.g., schools, if
we have students nested within schools)
• We will use r to denote a level-one residual,
and u to denote a level-two residual
Farewell OLS: Our first
multilevel model
• Instead of:
Yij = β0 + β1Xij + εij
• Let's write:
Yij = β0 + β1Xij + uj + rij
Farewell OLS: Our first multilevel model

Yij = β0 + β1Xij + uj + rij

where:
– Yij: outcome for observation i in unit j
– β0: intercept
– β1: coefficient on X
– Xij: value of X for observation i in unit j
– uj: residual term specific to unit j
– rij: residual term specific to observation i in unit j
What is uj?
• A residual term
• Specific to unit j
• Common to all observations in unit j
• Subscript j, no subscript i
• Interpretation: the difference between the overall intercept and the intercept in unit j
What is rij?
• A residual term
• Specific to observation i in unit j
• Has a mean of 0, so any part of εij that is common to all observations within j has been removed
• So the rij's may be independent
• Not guaranteed to be independent
Features of this model
• Note that: εij = uj + rij
• We also have:
Var(εij) = Var(uj + rij)
= Var(uj) + Var(rij) + 2·Cov(uj, rij)
= Var(uj) + Var(rij)    (since uj and rij are assumed uncorrelated)
• We will come back to variance decomposition
later
Features of this model
• The level of Yij – after adjusting for Xij – may vary across the units
• We have made no assumptions yet about the distribution of the uj's or the rij's.
• The relationship between X and Y does not depend on j (β1 does not depend on j)
So how do we estimate this
model?
• We want an estimate of β1, the relationship between Xij and Yij.
• Two approaches:
– Fixed Effects estimator
– Random Effects estimator
Another way to write this model
Yij = β0 + β1Xij + uj + rij
    = (β0 + uj) + β1Xij + rij
    = β0j + β1Xij + rij
where β0j = β0 + uj
The fixed effects estimator
Yij = β0j + β1Xij + rij
• We have 'absorbed' the level-two error terms (the uj's) into the intercept
• Now each aggregate unit has its own intercept, so between-unit variation is accounted for in the intercepts
• This solves the dependence problem with the rij's (they may still not be independent, but not because of unexplained variation between level-two units)
The fixed effects estimator
Yij = β0j + β1Xij + rij
• Three methods of obtaining the fixed effects estimate of β1 from this model (see the sketch below):
– Dummy variables for each unit
– Change or difference scores
– Deviations from unit mean values
• All three are mathematically equivalent
• All can be estimated via OLS, with some adjustment of the degrees of freedom
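A minimal Stata sketch of the dummy-variable and mean-deviation versions (not from the original slides; it assumes the school identifier is s_id1); all give the same estimate of β1:

. regress math age i.s_id1            // dummy variable for each school
. areg math age, absorb(s_id1)        // same estimate, with the school dummies absorbed
. xtset s_id1
. xtreg math age, fe                  // within (deviations-from-school-means) estimator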
[Figure: Math: Within School Slopes (Fixed-Effects) vs. OLS; Math IRT scale scores vs. age in months, fixed-effects model fitted lines and the whole-sample OLS line]
The random effects estimator
• We treat the variance between units as
consisting of parameter variance (true
variance between units) and error variance
(extra variance produced because of
sampling)
• We treat units with larger samples as having
more reliable estimated unit means
The random effects estimator
• So our estimate of the unit mean for a
particular unit is a weighted average of the
unit mean estimated in a fixed effects model
and the overall mean—our estimates are
shrunken toward the grand mean
• Let’s see a picture of this:
[Figure: Math: Within School Slopes vs. OLS; two panels (Fixed-Effects and Random Effects) plotting Math IRT scale scores against age in months, each showing school-specific fitted lines (Xb + u[s_id1]) and the whole-sample fitted values]
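A minimal Stata sketch of the random effects estimator (not from the original slides; it assumes the school identifier is s_id1):

. xtset s_id1
. xtreg math age, re                  // random effects (GLS) estimator of the age slope
. predict u_hat, u                    // estimated school-level effects, shrunken toward zero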
Part II.B.
The basic multilevel model
The basic multilevel model
• Random effects ANOVA is the simplest
random effects model
• The random effects model is a very simple
kind of multilevel model
• So we are building up here to the multilevel
model
The multilevel model as a random effects model
• We write the random effects model as:
Yij = β0 + uj + rij
• We can also write it as:
Yij = β0j + rij          (Level-1 model)
β0j = γ00 + uj           (Level-2 model)
(here we're using the γ00 notation where before we used β0; this is the notation of HLM)
HLM Notation (Null Model)
• Level-1 model:
Yij = β0j + rij
• Level-2 model:
β0j = γ00 + uj
• Mixed model:
Yij = γ00 + uj + rij
HLM Notation
• Level-1 model:
Yij = β0j + β1jXij + rij
• Level-2 model:
β0j = γ00 + uj
β1j = γ10
• Mixed model:
Yij = γ00 + γ10Xij + uj + rij
HLM Notation
• Mixed model:
Yij = γ00 + γ10Xij + uj + rij
– γ00 + γ10Xij is the structural part of the model: the "fixed effects"
– uj + rij is the stochastic (random) part of the model: uj is the "random effect"
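This random-intercept model can be fit in several packages; a minimal sketch using Stata's mixed command (an assumption on my part; the talk itself fits these models in HLM):

. mixed math age || s_id1:            // fixed effects for the intercept and age; random intercept uj by school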
Part II.C.
Variance Decomposition
Reminder: The unconditional (null)
random effects model
• The one-way random effects ANOVA model (mixed/composite form):
Yij = γ00 + uj + rij
• We can also write it as:
Yij = β0j + rij          (Level-1 model)
β0j = γ00 + uj           (Level-2 model)
• Useful as a baseline model
• Allows us to decompose the variance
Variance decomposition
• Var(Yij) = Var(uj) + Var(rij) = τ00 + σ²
• Intraclass Correlation (ρ): the proportion of the total variance in Yij that is between level-2 units
ρ = τ00 / (τ00 + σ²)
Multilevel Analyses
• Analytic Problems:
– Explain variation in means across units
– Estimate within- and between-unit relationships
– Distinguish contextual from compositional
variation in means across units
– Explain how and why within-unit relationships
differ across units
Explaining variation in means
across units
• Why do some schools have higher mean
achievement levels?
• Why do some hospitals have lower
mortality rates?
• Why do some countries have higher infant
mortality rates?
Explaining variation in means
across units
• Means-as-outcomes regression (MLM):
Yij = β0j + rij
β0j = γ00 + γ01Wj + uj
where Wj is a variable indicating some characteristic of unit j (no i subscript)
– Wj may be inherent to level 2
• School curriculum, doctor/patient ratio, regime type
– Wj may be a compositional property of unit j
• School racial composition, patient diagnosis composition, average maternal education level
(Note: why don't we just compute the means of Y in each unit and use OLS at level 2?)
Explaining variation in means
across units
• Called "means-as-outcomes" because the Wj's can only explain mean differences in Yij across units (Wj only predicts the intercept, not the slope)
• uj is now a level-2 residual
• We can compute an R² at both levels of the model:
– τ00 from the null model is the total level-2 variance that can be explained
– R²(between) = [τ00(null) - τ00(model)] / τ00(null)
– R²(within) = [σ²(null) - σ²(model)] / σ²(null)
Example (null model)
The outcome variable is MATH1

Final estimation of fixed effects:
----------------------------------------------------------------------------
                                    Standard             Approx.
Fixed Effect            Coefficient    Error    T-ratio    d.f.    P-value
----------------------------------------------------------------------------
For INTRCPT1, B0
  INTRCPT2, G00           20.369682  0.129474   157.327     867      0.000
----------------------------------------------------------------------------

Final estimation of variance components:
-----------------------------------------------------------------------------
Random Effect         Standard     Variance     df    Chi-square    P-value
                      Deviation    Component
-----------------------------------------------------------------------------
INTRCPT1,    U0        3.25337     10.58439    867    3733.93573      0.000
level-1,     R         6.46230     41.76128
-----------------------------------------------------------------------------

Intraclass correlation: ρ = 10.58439 / (10.58439 + 41.76128) = 0.202
Example: Means-as-outcomes model
The outcome variable is MATH1

Final estimation of fixed effects:
----------------------------------------------------------------------------
                                    Standard             Approx.
Fixed Effect            Coefficient    Error    T-ratio    d.f.    P-value
----------------------------------------------------------------------------
For INTRCPT1, B0
  INTRCPT2, G00           19.585257  0.133633   146.560     866      0.000
  S_PRIVAT, G01            3.691165  0.286361    12.890     866      0.000
----------------------------------------------------------------------------

Final estimation of variance components:
-----------------------------------------------------------------------------
Random Effect         Standard     Variance     df    Chi-square    P-value
                      Deviation    Component
-----------------------------------------------------------------------------
INTRCPT1,    U0        2.87151      8.24560    866    3070.97445      0.000
level-1,     R         6.46291     41.76917
-----------------------------------------------------------------------------

Level-2 variance explained: R²(between) = (10.58439 - 8.24560) / 10.58439 = 0.221
Part II.D.
Variable Centering
Variable Centering
• An important topic with major implications for fitting and interpreting multilevel models…
• …which we will not have time to cover today.
Part II.E.
Random Coefficients and
the Full 2-Level Model
Individual-Level Model
Yij = 0j + 1jX1ij + 2jX2ij + … + KjXKij + rij
Slope on X1
for unit j
Intercept
for unit j
Outcome for
person i in unit j
Slope on X2
for unit j
Slope on XK
for unit j
Contextual Model
Yij = 0j + 1jX1ij + 2jX2ij + … + KjXKij + rij
0j = 00
1j = 10
2j = 20
…
Kj = K0
In OLS, the intercept and
slopes are fixed – they are the
same in all units
Contextual Questions
• Does the intercept vary across units?
• Can we predict the intercepts using level-2
covariates (Z’s)?
• Do the slopes vary across units?
• Can we predict the slopes using level-2
covariates (Z’s)?
Does the intercept vary across units?
Yij = 0j + 1jX1ij + 2jX2ij + … + KjXKij + rij
0j = 00 + u0j
1j = 10
2j = 20
…
Kj = K0
In the random effects model,
the intercept varies around
some grand mean intercept
(00), and the slopes are fixed
– they are the same in all units
Test H0: Var(u0j) = 0
Can we predict the intercepts?
Yij = 0j + 1jX1ij + 2jX2ij + … + KjXKij + rij
0j = 00 + 01Z1 + 02Z2 + … + 0MZM + u0j
1j = 10
Here, the Zm’s predict the
2j = 20
intercept.
…
Kj = K0
Test H0: 0m = 0
Do the slopes vary across units?
Yij = 0j + 1jX1ij + 2jX2ij + … + KjXKij + rij
0j = 00 + u0j
1j = 10 + u1j
2j = 20 + u2j
The intercept and each of the
slopes varies around thei
grand means (the k0’s)
…
Kj = K0 + uKj
Test H0: Var(ukj) = 0
Can we predict the slopes?
Yij = 0j + 1jX1ij + 2jX2ij + … + KjXKij + rij
0j = 00 + 01Z1 + 02Z2 + … + 0MZM + u0j
1j = 10 + 11Z1 + 12Z2 + … + 1MZM + u1j
2j = 20 + 21Z1 + 22Z2 + … + 2MZM + u2j
…
Kj = K0 + K1Z1 + K2Z2 + … + KMZM + uKj
Here, the Zm’s predict the
slopes.
Test H0: km = 0
Example
• ECLS-K Fall Kindergarten data
• 8,799 white and black students in 807
schools (618 public, 189 private schools)
• SES measured by a standardized SES variable
• Outcome is Fall K math score
Research Questions
• What are the within-school relationships of race and SES to math scores?
• Do average math scores vary across schools?
• Are math scores higher in private schools?
• Does the relationship between SES and math
scores vary across schools?
• Is the relationship between SES and math scores
lower in private schools?
Example (cont.)
• See HLM command files lecture8[a-g].hlm
and corresponding HLM output files
lecture8[a-g].txt
• We will meet in the lab 2/11/04.
Intermission II
Part III.A.
Growth Modeling
Growth Models
• Allow us to model development over time and to investigate correlates of between-person variation in growth trajectories over time
• The fitted model describes an expected growth trajectory for each person, rather than a single expected value on an outcome measure
Examples
• Modeling inter-individual changes in some
outcome
– School achievement, income, attitudes
• Modeling growth in national characteristics
– GNP, population, etc
• Modeling change in organizational
outcomes
– Business profits, hospital mortality rates
[Figure: observed outcome trajectories (values 0-4) vs. age (11 to 15), one panel per individual; graphs by id]
The Growth Model
• Made up of a within-unit model of change and a between-unit model of inter-individual variation in change
• Requires repeated measures of the outcome within each unit
• Requires a multilevel error structure, since errors are likely not independent
Within-unit model of change
• Outcome varies as a function of time:
Yit = fi(time) + eit
• Simple case: f is linear:
Yit = π0i + π1i(timeit) + eit
where π0i is the intercept for unit i and π1i is the growth slope for unit i
[Figure: observed trajectories by individual (panels by id), and fitted within-person growth lines of the outcome (0-4) vs. age (11 to 15)]
Between-unit model
• Model between-unit differences in growth trajectories:
Yit = π0i + π1i(timeit) + eit
π0i = γ00 + γ01(Xi) + r0i          (between-unit model of the intercept)
π1i = γ10 + γ11(Xi) + r1i          (between-unit model of the slope)
[Figure: fitted growth trajectories, outcome (0-4) vs. age (11 to 15)]
Within-unit model for change
• Simple case: f is linear:
Yit = π0i + π1i(timeit) + eit
• Need to specify a zero-point for time
– Pick a point that is interpretable and substantively meaningful for your study
– e.g., age, time in school, time since the institution opened, calendar time, etc.
– Affects estimation and interpretation of the intercept (see the sketch below)
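A minimal Stata sketch of the linear growth model with a recentered time variable (not from the original slides; it assumes long-format data with variables tolerance, age, and id, as in the Singer & Willett tolerance example used below):

. gen agec = age - 11                                    // center age at 11
. mixed tolerance agec || id: agec, covariance(unstructured)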
Defining time
TOLERANCEit = 0i + 1i(AGEit) + eit
0i = 00 + r0i
1i = 10 + r1i
Final estimation of fixed effects:
---------------------------------------------------------------------------Standard
Approx.
Fixed Effect
Coefficient
Error
T-ratio
d.f.
P-value
---------------------------------------------------------------------------For
INTRCPT1, B0
INTRCPT2, G00
-0.081187
0.511521
-0.159
15
0.876
For
TIME slope, B1
INTRCPT2, G10
0.130812
0.043074
3.037
15
0.009
----------------------------------------------------------------------------
Defining time
TOLERANCEit = 0i + 1i(AGEit-11) + eit
0i = 00 + r0i
1i = 10 + r1i
Final estimation of fixed effects:
---------------------------------------------------------------------------Standard
Approx.
Fixed Effect
Coefficient
Error
T-ratio
d.f.
P-value
---------------------------------------------------------------------------For
INTRCPT1, B0
INTRCPT2, G00
1.357750
0.074445
18.238
15
0.000
For
TIME slope, B1
INTRCPT2, G10
0.130812
0.043074
3.037
15
0.009
----------------------------------------------------------------------------
Modeling Inter-personal variation
in growth trajectories
TOLERANCEit = 0i + 1i(AGEit-11) + eit
0i = 00 + 01(MALEi) + r0i
1i = 10 + 11(MALEi) + r1i
Final estimation of fixed effects:
---------------------------------------------------------------------------Standard
Approx.
Fixed Effect
Coefficient
Error
T-ratio
d.f.
P-value
---------------------------------------------------------------------------For
INTRCPT1, B0
INTRCPT2, G00
1.355556
0.102740
13.194
14
0.000
MALE, G01
0.005016
0.155328
0.032
14
0.975
For
TIME slope, B1
INTRCPT2, G10
0.102333
0.058323
1.755
14
0.101
MALE, G11
0.065095
0.088177
0.738
14
0.473
----------------------------------------------------------------------------
Parameters of the growth model
Yit = 0i + 1i(timeit) + eit
0i = 00 + 01(Xi) + r0i
1i = 10 + 11(Xi) + r1i
Structural parameters of the
growth model
π0i: true intercept for individual i
π1i: true slope for individual i
γ00: population average intercept (for individuals with X = 0)
γ01: population average difference in level-one intercept for individuals with a one-unit difference in X
γ10: population average slope (for individuals with X = 0)
γ11: population average difference in level-one slope for individuals with a one-unit difference in X
Stochastic (random) parameters
of the growth model
Var(eit) = σe²: level-1 residual variance
Var(r0i) = σ0²: level-2 residual variance in the true intercept (π0i)
Var(r1i) = σ1²: level-2 residual variance in the true slope (π1i)
Cov(r0i, r1i) = σ01: level-2 residual covariance between the true intercept (π0i) and true slope (π1i)
Part III.B.
Additional Issues in
Growth Modeling
Growth Modeling Issues
• Timing of observations
• Centering the time variable
• Variable numbers of observations
• Missing observations
• Time-varying covariates
• Slope-intercept covariances
• Non-linear growth curves
Timing of observations
• If each person (unit) has the same number of observations, and if the timing of observations is the same for all units, the data (and design) are said to be balanced.
• In this case, the growth model is equivalent to a repeated-measures ANOVA
• But if not, the growth model is more flexible than the repeated-measures ANOVA model
Centering time in growth models
• Group-mean centering time yields an unbiased estimate of the average within-person growth rate
• Any other centering of time yields a biased estimate of the average within-person growth rate if individuals' mean times are correlated with their mean outcomes
• In a balanced design, mean time is the same for all persons, so centering does not affect the slope estimate
• In an unbalanced design, it may be necessary to center time
• Centering time affects interpretation of the intercept
Variable numbers of observations
• Number and timing of observations may differ by
design or because of missingness
• Types of missingness (see S&W p. 157-159):
– MCAR: missing completely at random
– CDD: covariate dependent dropout
– MAR: missing at random
• The growth model estimates are unbiased under
any of these types of missingness
Time-varying covariates
• So far, we have considered only the
relationship between stable person-level
covariates and the growth trajectory
• What about time-varying covariates?
• A time-varying covariate is a covariate
whose value changes over time
• Examples from S&W chapter 5
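A time-varying covariate simply enters the level-1 (within-person) model alongside time; a minimal sketch, with a hypothetical covariate TVCit not taken from the talk:

Yit = π0i + π1i(timeit) + π2i(TVCit) + eit,   with π2i = γ20 (plus r2i if its effect varies across persons)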
Examining the covariance matrix
in growth models
Yit = 0i + 1i(timeit) + eit
0i = 00 + 01(Xi) + r0i
1i = 10 + 11(Xi) + r1i
Var(r0i) = 02: level–2 residual variance in true intercept (0i)
Var(r1i) = 12: level–2 residual variance in true slope (1i)
Cov(r0i, r1i) = 01: level–2 residual covariance in true
intercept (0i) and true slope (1i)
The slope-intercept covariance
• Do individuals with high initial values of Y have faster growth rates of Y?
– Is r0i correlated with r1i?
• It depends on how we center the time variable
• It is possible to observe any correlation (from -1 to +1) between r0i and r1i, depending on where we place the intercept (i.e., where time = 0).
[Figure: CNLSY reading fitted growth curves; PIAT reading score vs. age (centered at 6.5 years)]
[Figure: Positive mood fitted growth curves; positive mood level vs. time (days since start of treatment)]
Non-linear growth curves
• So far we have assumed that each
individual’s growth trajectory is linear
(constant growth rate over time)
• Now we consider cases where growth may
be non-linear
– Polynomial curve
– Piecewise linear
– Discontinuous
Polynomial growth curves
• Quadratic growth trajectory:
Yit = π0i + π1i(timeit) + π2i(timeit²) + eit
π0i = γ00 + r0i
π1i = γ10 + r1i
π2i = γ20 + r2i
[Figure: Wage Growth; hourly wage ($) vs. age (18 to 30), quadratic fitted trajectory]
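A minimal Stata sketch of a quadratic growth model (not from the original slides; the variable names wage, age, and id are hypothetical):

. gen agesq = age^2
. mixed wage age agesq || id: age agesq, covariance(unstructured)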
Piecewise linear growth curves
• Piecewise growth trajectory:
Yit = π0i + π1i(time1it) + π2i(time2it) + eit
π0i = γ00 + r0i
π1i = γ10 + r1i
π2i = γ20 + r2i
[Figure: Wage Growth; hourly wage ($) vs. age (18 to 30), piecewise-linear trajectory with a change in slope at GED receipt]
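To fit the piecewise model, the two time variables split the clock at a knot; a minimal Stata sketch with a hypothetical knot at age 22 (the actual knot in the wage example is not given on the slide):

. gen time1 = min(age, 22)            // growth before the knot
. gen time2 = max(age - 22, 0)        // additional growth after the knot
. mixed wage time1 time2 || id: time1 time2, covariance(unstructured)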
Discontinuous growth curves
• Discontinuous growth trajectory:
Yit = π0i + π1i(timeit) + π2i(eventit) + eit
π0i = γ00 + r0i
π1i = γ10 + r1i
π2i = γ20 + r2i
[Figure: Wage Growth; hourly wage ($) vs. age (18 to 30), with a discontinuity (jump) at GED receipt]
Discontinuous growth curves
• Discontinuous growth trajectory (with time-varying growth rate):
Yit = π0i + π1i(timeit) + π2i(eventit) + π3i(timeit × eventit) + eit
π0i = γ00 + r0i
π1i = γ10 + r1i
π2i = γ20 + r2i
π3i = γ30 + r3i
[Figure: Wage Growth; hourly wage ($) vs. age (18 to 30), with both a jump and a change in slope at GED receipt]
Intermission III
Part IV.A.
A Taste of Advanced Topics
3+ Level Models
Examples of 3-level data
• Students > classrooms > schools
• Students > schools > districts
• Patients > Doctors > Hospitals
• Children > Families > Neighborhoods
• Repeated observations > individuals > contexts
Data may have more than 3 levels, but the more levels, the more data needed to model relationships.
• For example:
Repeated obs > students > classrooms > schools > districts > countries > planets …
The 3-level null model
Yijk = π0jk + eijk          (Level-1 model)
π0jk = β00k + r0jk          (Level-2 model)
β00k = γ000 + u00k          (Level-3 model)

Yijk = γ000 + u00k + r0jk + eijk
– γ000: true grand mean intercept
– u00k: level-3 error/residual term: deviation of cluster k mean from the grand mean
– r0jk: level-2 error/residual term: deviation of cluster j mean from cluster k mean
– eijk: level-1 error/residual: deviation of observation i from cluster j mean
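A minimal Stata sketch of a 3-level null model (not from the original slides; the level-3 identifier district is hypothetical, with schools identified by s_id1):

. mixed math || district: || s_id1:   // random intercepts for districts (level 3) and schools (level 2)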
3-level variance decomposition
• σ0² = Var(eijk): true level-1 variance (variance within j, between i)
• σ00² = Var(r0jk): true level-2 variance (variance within k, between j)
• σ000² = Var(u00k): true level-3 variance (variance between k)
• Var(Yijk) = σ0² + σ00² + σ000²
3-level variance decomposition
• Remember the ICC from the 2-level model: the proportion of true variance in Y that lies between clusters
– ICC = σ00² / (σ0² + σ00²)
• We apply the same logic to the 3-level model:
• Proportion of total variance that lies between level-3 units:
σ000² / (σ0² + σ00² + σ000²)
• Proportion of level-1 + level-2 variance that lies between level-2 units:
σ00² / (σ0² + σ00²)
The 3-level growth model
Ytjk = π0jk + π1jk(timetjk) + etjk
π0jk = β00k + β01k(Xjk) + r0jk
π1jk = β10k + β11k(Xjk) + r1jk
β00k = γ000 + γ001(Zk) + u00k
β01k = γ010
β10k = γ100 + γ101(Zk) + u10k
β11k = γ110
Part IV.B.
A Taste of Advanced Topics:
Meta-Analysis
Meta-Analysis
• The problem: we often have a lot of studies,
each trying to estimate the same parameter
– effect of small classes on learning rates
– effect of welfare receipt on income, maternal
depression, child welfare, etc.
Multiple studies
• Suppose we conducted a number of similar studies
to estimate the effect of treatment T on outcome Y.
• Each study gives us an estimate of d, the
standardized effect of T on Y.
• The estimates of d may vary across studies –
Why?
• We would like to estimate the true average effect
of T in the population
Possible reasons for varying
estimates across studies
• The d’s may vary because of sampling variance
(each study is conducted with a different sample)
• The d’s may vary because of differences in the
populations of each study sample
• The d’s may vary because of differences in the
study design (e.g., different instruments)
• The d’s may vary because of differences in the
treatment studied (differences in implementation,
duration, etc.)
Approach 1
• In each study, we fit a regression model to estimate the treatment effect β1:
Yi = β0 + β1(Ti) + ri
• But the treatment effect may vary across studies:
β1j = γ10 + u1j
• Here γ10 is the true mean effect of T across studies, and u1j is the deviation of the effect in study j from this mean.
Approach 1
• If we had access to all the data from each study, we could fit a multilevel model:
Yij = β0j + β1j(Tij) + rij
β0j = γ00 + u0j
β1j = γ10 + u1j
• But what if we don't have access to all the original data?
Approach 2
• Recall that in each study, we have a regression model like this:
Yi = β0 + β1(Ti) + ri
• Typically, each study will report the estimate of β1 and some measure of its sampling variance (its standard error)
• Remember (lecture 4) that we estimate the grand mean by weighting individual estimates by their precision (the inverse of the variance of the estimate)
• We use the standard errors of the study-specific effect estimates to construct these weights, so we don't need the original data.
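A minimal sketch of the precision-weighted average (an illustration, not the talk's own computation): with each study j reporting an effect estimate dj and standard error sej, use weights wj = 1/sej², so the pooled estimate is Σ(wj·dj)/Σ(wj), with standard error 1/√Σ(wj). In Stata, with one row per study and hypothetical variables d and se:

. gen w = 1/(se^2)
. summarize d [aw=w]                  // the weighted mean is the precision-weighted pooled effect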
Key Points
• We need from each study an estimate of the
treatment effect and its sampling variance
(standard error)
• The treatment effects must be measured in
the same metric across all studies
Part IV.C.
A Taste of Advanced Topics:
Cross-Classified Data and Models
Cross-Classified Data
• Observations nested in multiple, non-hierarchical units
– e.g. persons nested in schools and
neighborhoods
– patients nested in multiple doctors/clinics
– students nested in multiple classrooms
– repeated observations on relationship
dynamics nested in partners (where
partner changes are common)
Cross-Classified Data
• Yijk is outcome Y for person i in
neighborhood j and school k
Yijk = π0jk + rijk
π0jk = γ000 + u0j + v0k + w0jk
Cross-Classified Data
• As in 3-level hierarchical data, each observation can be decomposed into:
– a grand mean (common to all observations)
– a row-specific (e.g. school) deviation from the grand mean
– a column-specific (e.g. nbhd) deviation from the grand mean
– a row*column-specific deviation
– an individual deviation from the row*column mean
Yijk = γ000 + u0j + v0k + w0jk + rijk
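A minimal Stata sketch of a crossed (non-nested) random effects model (an assumption on my part, with hypothetical identifiers nbhd and school):

. mixed y || _all: R.nbhd || school:  // crossed random effects for neighborhoods and schools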
Conclusion
Resources:
• Textbooks
– Raudenbush & Bryk (2002) Hierarchical Linear
Models. Sage.
– Singer & Willett (2003) Applied Longitudinal Data
Analysis
• Multilevel Listserv
– http://www.nursing.teaching.man.ac.uk/staff/mcampbell/multilevel.html
– http://www.jiscmail.ac.uk/lists/multilevel.html
Resources:
• Software
– HLM
– MLwiN
– SAS (PROC MIXED)
– SPSS
– Stata (-gllamm-)