Are Teacher-Level Value-Added Estimates Biased?
An Experimental Validation of Non-Experimental Estimates
Thomas J. Kane, HGSE
Douglas O. Staiger, Dartmouth College
LAUSD Data
• Grades 2 through 5
• Three Time Periods:
  - Years before Random Assignment: Spring 2000 through Spring 2003
  - Years of Random Assignment: Either Spring 2004 or 2005
  - Years after Random Assignment: Spring 2005 (or 2006) through Spring 2007
• Outcomes:
  - California Standards Test (Spring 2004-2007)
  - Stanford 9 Tests (Spring 2000 through 2002)
  - California Achievement Test (Spring 2003)
  - All standardized by grade and year.
• Covariates:
  - Student: baseline math and reading scores (interacted with grade), race/ethnicity (Hispanic, white, black, other or missing), ever retained, Title I, eligible for free lunch, gifted and talented, special education, English language development (level 1-5).
  - Peers: Means of all the above for students in classrooms.
  - Fixed Effects: School x Grade x Track x Year
• Sample Exclusions:
  - Special Education Exclusion: >20 percent special education classes
  - Small and Large Class Exclusion: Fewer than 5 or more than 36 students in class
Experimental Design
• Sample of NBPTS applicants from Los Angeles area.
• Sample of Comparison teachers working in same school, grade, calendar track.
• LAUSD chief of staff wrote letters to principals inviting them to draw up two classrooms that they would be willing to assign to either teacher.
• If principal agreed, classroom rosters (not individual students) were randomly assigned by LAUSD on the day of switching.
• Yielded 78 pairs of teachers (156 classrooms and 3500 students) for whom we had estimates of “value-added” impacts from the pre-experimental period.
Step 1: Estimate a Variety of Non-Experimental Specifications using Pre-Experimental Data
$A_{ijt} = X_{ijt}\beta + \nu_{ijt}$, where $\nu_{ijt} = \mu_j + \theta_{jt} + \varepsilon_{ijt}$
$A_{ijt}$ = Student test score (level or gain).
$X_{ijt}$ = Student and classroom-level covariates.
$\mu_j$ = Teacher effect
$\theta_{jt}$ = Non-persistent classroom by year shock
$\varepsilon_{ijt}$ = Student by year error
Generate Empirical Bayes estimates (VAj) of teacher effects using a variety of specifications of A, X.
Step 2: Test Validity of VAj in Predicting Within-Pair Experimental Differences
At the classroom level:
$Y_{jp} = \alpha_p + \beta VA_{jp} + \varepsilon_{jp}$ for $j = 1, 2$ and $p = 1, \dots, 78$
Differencing within each pair, p = 1 through 78:
$Y_{2p} - Y_{1p} = \beta (VA_{2p} - VA_{1p}) + \tilde{\varepsilon}_p$ for $p = 1, \dots, 78$
When the dependent variable Y is equal to...
Baseline characteristics, then
$H_0: \beta = 0$
Test scores, experimental year,
$H_0: \beta = 1$
Test scores, experimental year + 1,
H o :   1,   1
Test scores, experimental year + 2,
H o :   1,   1
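The within-pair test in Step 2 amounts to a regression through the origin of the outcome difference on the value-added difference. The sketch below is a minimal illustration of that slope calculation only; the function names and the example numbers are hypothetical, and it does not compute the robust standard errors reported in the tables.

```python
def pair_differences(pairs):
    """Turn [((va_1, y_1), (va_2, y_2)), ...] into within-pair differences.

    Returns two lists: VA_2p - VA_1p and Y_2p - Y_1p, one entry per pair.
    """
    va_diffs = [t2[0] - t1[0] for t1, t2 in pairs]
    y_diffs = [t2[1] - t1[1] for t1, t2 in pairs]
    return va_diffs, y_diffs


def within_pair_slope(va_diffs, y_diffs):
    """OLS slope of y_diffs on va_diffs with no constant: sum(x*y) / sum(x*x)."""
    if len(va_diffs) != len(y_diffs):
        raise ValueError("need one outcome difference per value-added difference")
    sxx = sum(x * x for x in va_diffs)
    if sxx == 0:
        raise ValueError("value-added differences are all zero")
    sxy = sum(x * y for x, y in zip(va_diffs, y_diffs))
    return sxy / sxx


# Hypothetical pairs (not from the study): each teacher is
# (pre-experimental VA, experimental-year classroom mean score).
example_pairs = [
    ((0.05, 0.10), (0.25, 0.28)),
    ((-0.10, -0.05), (0.10, 0.17)),
    ((0.20, 0.30), (0.00, 0.12)),
]
va_d, y_d = pair_differences(example_pairs)
# Compare the slope with 1 for test scores, or with 0 for baseline characteristics.
print(within_pair_slope(va_d, y_d))
```

Differencing removes the pair effect, which is why no constant is included in the slope formula.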
Summary of Sample Comparisons
• The experimental sample of teachers was more experienced. (15 vs. 10.5 years in LAUSD)
• The pre-experimental mean and s.d. of VAj were similar in the experimental and non-experimental samples.
• Could not reject the hypothesis of no relationship between VA2p-VA1p and differences in mean baseline characteristics.
• Could not reject the hypothesis of no differential attrition or teacher switching.
Table 3: Non-experimental Specifications of Teacher Effects
Standard Deviation of Each Component (in Student-level Standard Deviation Units), with Mean Sample Size per Teacher

Specification Used for                                      Teacher    Teacher by Year   Mean Sample Size
Non-experimental Teacher Effect                             Effects    Random Effect     per Teacher
Math Levels with...
  No Controls                                               0.455      0.224             48.612
  Student/Peer Controls (incl. prior scores)                0.228      0.180             42.843
  Student/Peer Controls (incl. prior scores) & School F.E.  0.216      0.178             42.843
  Student Fixed Effects                                     0.098      0.072             48.612
Math Gains with...
  No Controls                                               0.228      0.223             45.171
  Student/Peer Controls                                     0.227      0.220             45.171
  Student/Peer Controls & School F.E.                       0.217      0.221             45.171

Note: The above estimates are based on the total variance in estimated teacher fixed effects using observations from the pre-experimental data (years 1999-2000 through 2002-03). See the text for discussion of the estimation of the decomposition into teacher by year random effects, student-level error, and "actual" teacher effects. The sample was limited to schools with teachers in the experimental sample. Any individual students who were in the experiment were dropped from the pre-experimental estimation, to avoid any spurious relationship due to regression to the mean, etc.
Why would student fixed-effect models underestimate differences in teacher value added?
• When we demean student data, we subtract off 1/T of current teacher’s effect (T = # years of data on each student)
  → underestimate magnitude of teacher effect by 1/T (i.e., need d.f. correction)
• In our data, typical student had 2-4 years of data, so magnitude is biased down by ½ to ¼.
• Subtract off even more of teacher effect if some of current teacher’s effect persists into scores in future years (FE model assumes no persistence)
  → underestimate magnitude by 1/T for teacher in year T (since this teacher’s effect only in last year’s score)
  → underestimate magnitude by more than 1/T for teachers in earlier years, with downward bias largest for first teacher.
• If first teacher’s effect completely persistent, we would subtract off all of the effect & estimate no variance in 1st year teacher effect.
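The demeaning argument above can be checked with a few lines of arithmetic. The sketch below is an illustration under simplifying assumptions that are not in the slides: a single student, no noise, and a persistence share that decays geometrically (`persistence ** k` after k years). The function names are hypothetical.

```python
from statistics import mean


def scores_from_teacher_effects(effects, persistence):
    """Score in year t = sum over earlier years s of persistence**(t-s) * effects[s].

    persistence = 0 means no carry-over; persistence = 1 means full carry-over.
    """
    return [
        sum(persistence ** (t - s) * effects[s] for s in range(t + 1))
        for t in range(len(effects))
    ]


def demean(scores):
    """Subtract the student's own mean, as a student fixed-effects model does."""
    m = mean(scores)
    return [x - m for x in scores]


# Case 1: T = 4 years, only the last teacher has an effect (size 1), no persistence.
# Demeaning removes 1/T of that effect, leaving 1 - 1/4.
last_teacher = demean(scores_from_teacher_effects([0.0, 0.0, 0.0, 1.0], 0.0))
print(last_teacher[-1])

# Case 2: only the first teacher has an effect and it persists fully.
# Every year's score rises by the same amount, so demeaning removes all of it.
first_teacher = demean(scores_from_teacher_effects([1.0, 0.0, 0.0, 0.0], 1.0))
print(first_teacher[0])
```

With T = 2 the first case leaves half of the effect, which matches the ½ bound quoted for students with two years of data.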
Table 6: Predicting Outcomes During Experimental Period (Pair-level Regressions)

                                                            Test Score First Year           Test Score        Test Score
Specification Used for                                                                      Second Year       Third Year
Non-experimental Teacher Effect                             Coefficient        R2     N     Coefficient       Coefficient
Math Levels with...
  No Controls                                               0.495*** (0.103)   0.183  78    0.273** (0.103)   0.128 (0.097)
  Student/Peer Controls (incl. prior scores)                0.863*** (0.178)   0.213  78    0.378* (0.174)    0.068 (0.137)
  Student/Peer Controls (incl. prior scores) & School F.E.  0.918*** (0.180)   0.230  78    0.410* (0.178)    0.102 (0.140)
  Student Fixed Effects                                     1.987*** (0.488)   0.161  78    0.915 (0.471)     0.344 (0.428)
Math Gains with...
  No Controls                                               0.833*** (0.204)   0.168  78    0.373 (0.191)     0.054 (0.153)
  Student/Peer Controls                                     0.841*** (0.211)   0.170  78    0.381 (0.201)     0.054 (0.159)
  Student/Peer Controls & School F.E.                       0.878*** (0.217)   0.176  78    0.405 (0.210)     0.067 (0.165)

Note: Each baseline characteristic listed in the columns was used as a dependent variable, regressing the within-pair difference in mean baseline characteristic on different non-experimental estimates of teacher effects. The coefficients were estimated in separate bivariate regressions with no constant. Robust standard errors are reported in parentheses.
Figure 1: Within Pair Differences in Pre-experimental Value-added and End of First Year Test Score (Mathematics)
[Scatter plot. Horizontal axis: Within Pair Difference in Pre-experimental Value-added. Series: Observed, 45-degree Line, Linear Fitted Values, Lowess Fitted Values.]
Structural Model for Estimating Fadeout Parameter, δ
$A_{ijt} = \phi_{ijt} + \varepsilon_{ijt}$, where $\phi_{ijt} = \delta \phi_{ijt-1} + \mu_{jt} + \theta_{jt}$
$A_{ijt}$ = Student test score (level or gain)
$\phi_{ijt}$ = Cumulative school/teacher impact
$\delta$ = Fade-out parameter for school/teacher impact
$\mu_{jt}$ = Effect of teacher in year t
$\theta_{jt}$ = Non-persistent classroom by year shock
$\varepsilon_{ijt}$ = Serially correlated student by year error.
IV Strategy for Estimating Fade-Out Parameter (δ) in Non-Exp Data
• We can rewrite the error component model as:
  $A_{ijt} = \delta A_{ijt-1} + \mu_{jt} + \theta_{jt} + (\varepsilon_{ijt} - \delta \varepsilon_{ijt-1})$
• OLS estimates of δ biased, because $A_{ijt-1}$ correlated with error
• Use prior year teacher dummies to instrument for $A_{ijt-1}$
  - Assumes that prior year teacher assignment is not correlated with $\varepsilon_{it}$ or $\varepsilon_{it-1}$
  - Control for teacher or classroom fixed effects to capture current teacher/classroom effects.
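The rewriting step on this slide can be spelled out by substituting the lagged cumulative impact out of the structural model. The derivation below is a sketch: the symbol names ($\phi$ for cumulative impact, $\mu$, $\theta$, $\varepsilon$) are assumed labels for glyphs that do not survive in this transcript, with $\delta$ taken from the slide title.

```latex
\begin{align*}
A_{ijt} &= \phi_{ijt} + \varepsilon_{ijt},
  \qquad \phi_{ijt} = \delta\,\phi_{ijt-1} + \mu_{jt} + \theta_{jt} \\
\phi_{ijt-1} &= A_{ijt-1} - \varepsilon_{ijt-1} \\
A_{ijt} &= \delta\,(A_{ijt-1} - \varepsilon_{ijt-1}) + \mu_{jt} + \theta_{jt} + \varepsilon_{ijt} \\
        &= \delta A_{ijt-1} + \mu_{jt} + \theta_{jt}
           + (\varepsilon_{ijt} - \delta\,\varepsilon_{ijt-1})
\end{align*}
```

The composite error contains $\varepsilon_{ijt-1}$, which is also part of $A_{ijt-1}$, so the lagged score is correlated with the error. That is the reason given on the slide for instrumenting rather than using OLS.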
Table 11: IV Estimates of Teacher Effect Fade-out Coefficient

                              A            B            C
Math                          0.489***     0.478***     0.401***
                              (0.006)      (0.006)      (0.007)
  N:                          89,277       89,277       89,277
English Language Arts         0.533***     0.514***     0.413***
                              (0.007)      (0.007)      (0.009)
  N:                          87,798       87,798       87,798
Current Teacher F.E.          Yes          No           No
Current Classroom F.E.        No           Yes          Yes
Student Controls              No           No           Yes

Note: Coefficients were estimated using separate 2SLS regressions with student test score as the dependent variable. Each specification included controls as indicated, grade-by-year F.E. Baseline test score is instrumented using a teacher dummy variable for the teacher associated with the baseline test.
Joint Validity of Non-Experimental Estimates of δ and VAj
$Y_{2pt} - Y_{1pt} = \beta (\delta^t VA_{2p} - \delta^t VA_{1p}) + \tilde{\varepsilon}_{pt}$
Test of Joint Validity: $H_0: \beta = 1$
Table 12: Predicting Outcomes in Future Years Using Estimated Fade-out Coefficients

                                                                                                  Years 0, 1,    P-value for Test of
Specification Used for                                                                            and 2          Coefficients Equivalent
Non-experimental Teacher Effect                 Year 0        Year 1        Year 2        Pooled         Across Years
Math Levels with...
  Student/Peer Controls (incl. prior scores)    0.852***      0.894*        0.209         0.843***       0.311
                                                (0.177)       (0.429)       (0.826)       (0.207)
Math Gains with...
  Student/Peer Controls                         0.828***      0.889         0.060         0.819***       0.289
                                                (0.207)       (0.477)       (0.941)       (0.239)
Language Arts Levels with...
  Student/Peer Controls (incl. prior scores)    0.987***      1.155         2.788         1.054**        0.144
                                                (0.277)       (0.689)       (1.454)       (0.343)
Language Arts Gains with...
  Student/Peer Controls                         0.826**       0.668         1.880         0.829**        0.170
                                                (0.262)       (0.631)       (1.413)       (0.319)
N:                                              78            78            78            234

Note: Each year's classroom average test score was used as the dependent variable, regressing the within-pair difference in average test score on different non-experimental estimates of teacher effects discounted in year two by the coefficients in column "C" of Table 11 and in year three by the square of those same coefficients. The coefficients were estimated in separate regressions with no constant. Robust standard errors are reported in parentheses.
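The discounting described in the note to Table 12 is simple arithmetic: the value-added difference is multiplied by the fade-out coefficient raised to the number of years since the experimental year. The sketch below illustrates this using the column "C" coefficients from Table 11; the function and constant names are hypothetical, and the value-added inputs in the example are made up.

```python
# Column "C" fade-out coefficients reported in Table 11.
MATH_DELTA = 0.401
ELA_DELTA = 0.413


def discount_factor(delta, years_after):
    """delta ** years_after: 1 in the experimental year, delta one year later, delta**2 after two."""
    if years_after < 0:
        raise ValueError("years_after must be 0 or greater")
    return delta ** years_after


def discounted_va_difference(va_2, va_1, delta, years_after):
    """Regressor for the joint-validity test: delta**t * VA_2p - delta**t * VA_1p."""
    factor = discount_factor(delta, years_after)
    return factor * va_2 - factor * va_1


for t in (0, 1, 2):
    print(t, discount_factor(MATH_DELTA, t), discount_factor(ELA_DELTA, t))

# Made-up value-added estimates for one pair, discounted two years out.
print(discounted_va_difference(0.20, -0.10, MATH_DELTA, 2))
```

If both the value-added estimates and the fade-out coefficient are valid, the coefficient on this discounted difference should be close to 1 in every year, which is the hypothesis tested in the table.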
Potential Sources of Fade-out
• Unused knowledge may become inoperable.
• Grade-specific content is not entirely reflected in future achievement. (e.g., even if you’ve not forgotten logarithms, may not hurt you in calculus)
Potential Sources of Fade-out
• Unused knowledge becomes inoperable.
• Grade-specific content is not entirely relevant for future achievement. (e.g., even if you’ve not forgotten logarithms, may not hurt you in calculus)
• Takes more effort to keep students at high performance level than at low performance level.
• Students of best teachers mixed with students of worst teachers in following year, and new teacher will focus effort on students who are behind. (→ no fade-out if teachers were all effective)
Is Teacher-Student Sorting Different in Los Angeles?
Table 13: Comparing Assortative Matching in Los Angeles to Other Urban Districts

                                                  Los Angeles       New York City     Boston
                                                  Math     ELA      Math     ELA      Math     ELA
Standard Deviation in Teacher's Value-added       0.184    0.135    0.157    0.121    0.191    0.162
Standard Deviation in Baseline Achievement in
  Teacher's Classroom                             0.400    0.408    0.512    0.513    0.528    0.539
Correlation between Teacher's Value-added and
  Baseline Achievement in Teacher's Classroom     0.120    0.118    0.041    0.083    0.114    0.103

Note: Estimated using non-experimental samples of 4th and 5th graders in years 2000-2003 for Los Angeles, 2000-2006 for New York City, and 2006-2007 for Boston. Teacher value-added and baseline achievement estimated including student-level controls for baseline test scores, race/ethnicity, special ed, ELL, and free lunch status; classroom peer means of the student-level characteristics; and grade-by-year F.E.
Summary of Main Findings:
• All non-experimental specifications provided information regarding experimental outcomes, but those controlling for baseline score yielded unbiased predictions with highest explanatory power.
• The experimental impacts in both math and English language arts seem to fade out at annual rate of .4-.6.
• Similar fade-out was observed non-experimentally.
• Depending on source, fade-out has important implications for calculations of long-term benefits of improvements in average teacher effects.
Next steps:
• Test for “complementarities” in teacher effects across years. (e.g., What is the effect of having a high or low value-added teacher in two consecutive years?) (Current experiment won’t help, but STAR experiment might.)
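Empirical Methods:
2. Generating Empirical Bayes Estimates of Non-Experimental Teacher Effects
$VA_j = \bar{\nu}_j \left( \frac{\hat{\sigma}^2_{\mu}}{\mathrm{Var}(\bar{\nu}_j)} \right) = \bar{\nu}_j \left( \frac{\hat{\sigma}^2_{signal}}{\hat{\sigma}^2_{signal} + \hat{\sigma}^2_{noise}} \right)$
where $\hat{\sigma}^2_{noise} = \left( \sum_t h_{jt} \right)^{-1}$
and $h_{jt} = \frac{1}{\mathrm{Var}(\bar{\nu}_{jt} \mid \mu_j)} = \frac{1}{\hat{\sigma}^2_{\theta} + \hat{\sigma}^2_{\varepsilon} / n_{jt}}$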
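The shrinkage formula on this slide can be illustrated with a short calculation. The sketch below rests on assumptions beyond what survives in the transcript: the teacher's average residual is taken as a precision-weighted mean of yearly classroom means (weights h_jt), the variance components are supplied as known inputs rather than estimated, and all names are hypothetical.

```python
def precisions(class_sizes, var_theta, var_eps):
    """h_jt = 1 / (var_theta + var_eps / n_jt), one value per classroom-year."""
    return [1.0 / (var_theta + var_eps / n) for n in class_sizes]


def empirical_bayes_va(mean_residuals, class_sizes, var_mu, var_theta, var_eps):
    """Shrink a teacher's weighted mean residual toward zero.

    mean_residuals: classroom-year mean residuals for one teacher.
    class_sizes:    number of students n_jt in each of those classroom-years.
    var_mu:         variance of true teacher effects (the "signal").
    var_theta:      variance of the classroom-by-year shock.
    var_eps:        variance of the student-level error.
    """
    if len(mean_residuals) != len(class_sizes):
        raise ValueError("need one class size per classroom-year residual")
    h = precisions(class_sizes, var_theta, var_eps)
    total_h = sum(h)
    # Precision-weighted mean residual (assumed weighting, see lead-in).
    weighted_mean = sum(w * v for w, v in zip(h, mean_residuals)) / total_h
    noise = 1.0 / total_h                    # variance of the weighted mean given mu_j
    reliability = var_mu / (var_mu + noise)  # signal / (signal + noise)
    return weighted_mean * reliability


# Illustrative inputs only: one year of data is shrunk more than three years.
print(empirical_bayes_va([0.30], [25], 0.04, 0.03, 0.50))
print(empirical_bayes_va([0.30, 0.30, 0.30], [25, 25, 25], 0.04, 0.03, 0.50))
```

More classroom-years raise the total precision, so the noise term falls and the estimate is shrunk less toward zero.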

Table 1: Sample Comparison - Teachers

                                   Experimental School                   Non-experimental School
                                   Experimental     Non-experimental     Non-experimental
                                   Sample           Sample               Sample
Mean Teacher Effect in Math        -0.009           -0.003               0.005
  S.D.                             0.195            0.196                0.196
Mean Teacher Effect in ELA         -0.010           0.003                0.003
  S.D.                             0.149            0.148                0.147
Black, Non-Hispanic                0.166            0.138                0.123
Hispanic                           0.258            0.311                0.325
White, Non-Hispanic                0.466            0.447                0.425
Other, Non-Hispanic                0.110            0.102                0.123
Teacher Race/Ethnicity Missing     0.000            0.003                0.003
Years of Experience                15.490           10.542               10.758
N:                                 165              1,785                11,352

Note: Descriptive statistics based on the experimental years (2003-04 and 2004-05). The mean teacher effect in math and ELA were estimated using the full sample of schools and teachers, controlling for baseline scores, student characteristics, and peer controls.
Table 2: Sample Comparison - Students

                      Experimental School                   Non-experimental School
                      Experimental     Non-experimental     Non-experimental
                      Sample           Sample               Sample
Math Scores
  2004 Mean           0.027            -0.110               0.024
       S.D.           0.931            0.941                1.008
  2005 Mean           -0.008           -0.113               0.028
       S.D.           0.936            0.940                1.007
  2006 Mean           0.001            -0.100               0.037
       S.D.           0.960            0.941                1.006
  2007 Mean           -0.016           -0.092               0.030
       S.D.           0.956            0.941                1.006
ELA Scores
  2004 Mean           0.038            -0.113               0.023
       S.D.           0.913            0.936                1.008
  2005 Mean           0.009            -0.117               0.027
       S.D.           0.920            0.930                1.009
  2006 Mean           0.039            -0.096               0.037
       S.D.           0.923            0.928                1.001
  2007 Mean           0.018            -0.095               0.037
       S.D.           0.940            0.936                1.000
N:                    3,554            43,766               273,525

Note: Descriptive statistics based on the experimental years (2003-04 and 2004-05). Students present both years are counted only once.
Table 2: Sample Comparison - Students (cont.)

                      Experimental School                   Non-experimental School
                      Experimental     Non-experimental     Non-experimental
                      Sample           Sample               Sample
Black, Non-Hispanic   0.112            0.115                0.113
Hispanic              0.768            0.779                0.734
White, Non-Hispanic   0.077            0.060                0.088
Other, Non-Hispanic   0.044            0.046                0.066
Grade 2               0.377            0.280                0.288
Grade 3               0.336            0.201                0.207
Grade 4               0.113            0.215                0.211
Grade 5               0.131            0.305                0.294
N:                    3,554            43,766               273,525

Note: Descriptive statistics based on the experimental years (2003-04 and 2004-05). Students present both years are counted only once.
Table 4. Baseline Student Characteristics Regressed on Non-Experimental Teacher Effects

                                          Baseline Scores        Baseline Demographics & Program Participation
                                                                                                                        English Language
Specification Used for                    Math       Language    Gifted and   Special                          Free     Status
Non-experimental Teacher Effect           Score      Score       Talented     Education   Hispanic   Black     Lunch    Level 1 to 3
Math Levels with Student/Peer Controls    -0.081     0.036       -0.014       -0.049      -0.053     0.008     0.031    -0.026
                                          (0.230)    (0.268)     (0.022)      (0.033)     (0.041)    (0.041)   (0.061)  (0.071)
  N:                                      44         44          78           78          78         78        78       78
ELA Levels with Student/Peer Controls     0.089      0.296       0.023        -0.066      -0.037     0.008     0.084    -0.097
                                          (0.323)    (0.359)     (0.032)      (0.051)     (0.097)    (0.066)   (0.085)  (0.132)
  N:                                      44         44          78           78          78         78        78       78

Note: Each baseline characteristic listed in the columns was used as a dependent variable, regressing the within-pair difference in mean baseline characteristic on different non-experimental estimates of teacher effects. The coefficients were estimated in separate bivariate regressions with no constant. Robust standard errors are reported in parentheses. Baseline math and language arts scores were missing for the pairs that were in second grade.
Table 5: Attrition and Teacher Switching

Specification Used for                    Missing Test Score                         Switched
Non-experimental Teacher Effect           First Year    Second Year    Third Year    Teacher
Math Levels with Student/Peer Controls    -0.004        0.029          -0.018        -0.028
                                          (0.049)       (0.057)        (0.058)       (0.133)
  N:                                      78            78             78            78
ELA Levels with Student/Peer Controls     -0.035        0.006          0.030         -0.148
                                          (0.077)       (0.084)        (0.097)       (0.171)
  N:                                      78            78             78            78

Note: Each baseline characteristic listed in the columns was used as a dependent variable, regressing the within-pair difference in mean baseline characteristic on different non-experimental estimates of teacher effects. The coefficients were estimated in separate bivariate regressions with no constant. Robust standard errors are reported in parentheses.
Table 7: Predicting Experimental Performance in Math and ELA (Student-level Regressions)
Specification Used for
Non-experimental Teacher Effect
Math Levels with Student/Peer Controls
N:
ELA Levels with Student/Peer Controls
N:
Student-Level Controls
Second Year Teacher F.E.
Second x Third Year Teacher F.E.
First Year Score
Second Year Score
Third Year Score
0.845***
(0.181)
2,905
0.423*
(0.178)
2,685
0.421*
(0.185)
2,305
0.08
(0.145)
2,504
0.076
(0.290)
1,892
1.073***
(0.271)
2,903
0.605*
(0.275)
2,691
0.718*
(0.280)
2,312
0.589*
(0.249)
2,503
0.626
(0.376)
1,891
No
No
Yes
No
No
No
Yes
Note: The above were estimated with student-level regressions using fixed effects for each experimental teacher pair. Robust standard errors (in parentheses) allow for clustering at the teacher-pair level. The sample for specifications including teacher fixed effects is limited to students in grades 3-5 as teacher identifiers for secondary grades are not yet available.
Table 8. Estimating Fade-Out in the Non-Experimental Sample (Student-level Regressions)
Specification Used for
Non-experimental Teacher Effect
Math Levels with Student/Peer Controls
N:
ELA Levels with Student/Peer Controls
N:
Student-Level Controls
Second Year Teacher F.E.
Second x Third Year Teacher F.E.
2004-05
2005-06
2006-07
1.096***
(0.016)
114,767
0.952***
(0.010)
108,505
0.246***
(0.011)
97,908
0.144***
(0.016)
67,079
0.115***
(0.012)
88,993
0.008
(0.026)
32,429
0.869***
(0.022)
114,963
0.745***
(0.012)
108,656
0.223***
(0.013)
98,009
0.135***
(0.020)
67,140
0.140***
(0.015)
89,028
0.067*
(0.032)
32,442
No
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Note: The 2004-05 teacher effect is estimated using data from 1999-2000 through 2002-03 excluding schools that participated in the experiment. Above we report the coefficients on that estimated 2004-05 teacher effect in predicting a student's 2004-05, 2005-06, and 2006-07 scores respectively. The sample for specifications including teacher fixed effects is limited to students in grades 3-5 as teacher identifiers for secondary grades are not yet available.
Table 9. Baseline Student Characteristics Regressed on Non-Experimental Teacher Effects
Baseline Demographics & Program Participation
Specification Used for Non-experimental
Teacher Effect & Sample
Baseline Gifted and Special
Talented Education Hispanic
Score
Sample
Math Levels (incl. prior scores, student/peer controls)
Experimental Sample
-0.109
(0.242)
1,840
Sample
Black
Free
Lunch
Level
1 to 3
-0.016
(0.031)
3,038
-0.047
(0.035)
3,038
-0.045
(0.033)
3,038
0.006
(0.039)
3,038
0.033
(0.049)
3,038
-0.014
(0.075)
3,038
0.028+
(0.015)
34,196
-0.032**
(0.013)
34,196
0.016
(0.032)
34,196
0.000
(0.018)
34,196
-0.006
(0.010)
34,196
-0.018
(0.045)
34,196
-0.011*
(0.005)
359,368
-0.005
(0.004)
359,368
-0.006
(0.014)
359,368
Non-experimental Teachers in
Experimental Schools
0.130+
(0.071)
24,864
Non-experimental Teachers in Non-experimental Schools
0.241*** 0.038*** -0.021*** 0.001
(0.033)
(0.007)
(0.005)
(0.007)
258,533 359,368 359,368 359,368
ELA Levels (incl. prior scores, student/peer controls)
Experimental Sample
0.398
(0.378)
1,843
English
Language
Status
0.036
(0.036)
3,038
-0.075
(0.054)
3,038
-0.011
(0.090)
3,038
0.000
(0.073)
3,038
0.075
(0.067)
3,038
-0.100
(0.138)
3,038
0.042*
(0.021)
34,196
-0.029+
(0.017)
34,196
Non-experimental Teachers in
Experimental Schools
0.190
(0.127)
24,914
0.003
(0.045)
34,196
-0.009
(0.028)
34,196
-0.014
(0.012)
34,196
-0.051
(0.055)
34,196
Non-experimental Teachers in Non-experimental Schools
0.233*** 0.042*** -0.021*** 0.000
(0.044)
(0.009)
(0.006)
(0.010)
258,885 359,368 359,368 359,368
-0.004
(0.007)
359,368
-0.003
(0.005)
359,368
-0.032
(0.020)
359,368
Note: Each baseline characteristic listed in the columns was used as a dependent variable, regressing the baseline
characteristic on non-experimental estimate of a student's 2003-04 or 2004-05 teacher effect. The coefficients were
estimated in separate regressions. Robust standard errors are reported in parentheses. All specifications include school by
grade and grade by year fixed effects.
Why would current gains be related to prior teacher assignments?
• We find teacher effect fading out
• Let $VA_t$ = value added of teacher in year t
  $a_k$ = % left after k years
• Then $A_t = VA_t + a_1 VA_{t-1} + a_2 VA_{t-2} + \dots$
  Implies gains include % of prior teacher effect
• $(A_t - A_{t-1}) = VA_t + (a_1 - 1) VA_{t-1} + (a_2 - a_1) VA_{t-2} + \dots$
• Our estimate of $a_1 \approx 0.5$ implies
  - Variance of prior teacher effect would be roughly 25% of the variance of current teacher effect.
  - Prior teacher effect would enter with negative sign.
• Does fade-out mean the non-structural approach would be biased? Do we need to estimate full human capital production function?
  - Depends partially on correlation among $VA_{jt}, VA_{jt-1}, \dots$
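The coefficients in the gains equation follow mechanically from the persistence shares. The sketch below is a minimal illustration; the function name is hypothetical, and the second-year share of 0.25 is an assumption for the example (geometric decay from a1 = 0.5), not a figure from the slides.

```python
def gain_coefficients(persistence):
    """Coefficients on VA_t, VA_{t-1}, VA_{t-2}, ... in the gain A_t - A_{t-1}.

    persistence[k] is a_k, the share of a teacher's effect left after k years,
    with persistence[0] == 1 for the current teacher.
    """
    coefs = [persistence[0]]
    for k in range(1, len(persistence)):
        # Lag k enters the level with a_k and last year's level with a_{k-1}.
        coefs.append(persistence[k] - persistence[k - 1])
    return coefs


coefs = gain_coefficients([1.0, 0.5, 0.25])
print(coefs)

# Relative variance contribution of the prior teacher when current and prior
# teacher effects have equal variance: the square of its coefficient.
print(coefs[1] ** 2)
```

With a1 near 0.5, the prior teacher enters current gains with a coefficient near -0.5, which is the 25 percent variance share and the negative sign stated above.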
Why would current gains be related to future teacher assignments?
• Students are assigned to future teachers based on current performance.
  - e.g., tracking, student sorting
  - This is why the unadjusted mean end of year score was a biased measure of teacher effects. (If differences in baseline scores were just random noise, mean student scores from the non-experimental period would have been a noisy but unbiased estimator).
• In value-added regression, this generates relationship between future teacher assignment (in t+1) and current end-of-year score (in t) (that is, future teacher assignments are endogenous to current year gains).
• We would expect future teacher assignments to be related to current gains, as Rothstein (2007) reports.
Table 10. Effects of Current and Prior Year Teachers Using Non-Experimental Sample
(Student-level Regressions)
Specification Used for Non-experimental Teacher Effect
Math Levels with Student/Peer Controls - 2004-05 Teacher
Student-level
Controls
0.843***
(0.033)
Math Levels with Student/Peer Controls - 2003-04 Teacher
2004-05
Student-level
Controls
Student-level
Controls
0.852***
(0.033)
0.849***
(0.033)
-0.488***
(0.020)
-0.488***
(0.020)
Math Levels with No Controls - 2005-06 Teacher
Using baseline score as dependent variable
N:
0.068**
(0.025)
40,672
40,672
40,672
Note: The 2003-04, 2004-05 and 2005-06 teacher effect regressors were estimated using data from 1999-2000 through 2002-03 excluding schools that participated in the experiment. Above we report the coefficients on the estimated 2004-05 teacher effect in predicting a student's 2004-05 scores. All specifications include school by grade and grade by year fixed effects, and student-level controls.
What is the variance in teacher effects on student achievement?
Non-Experimental Studies:
• Armour (1971), Hanushek (1976), McCaffrey et al. (2004), Murnane and Phillips (1981), Rockoff (2004), Hanushek, Rivkin and Kain (2005), Jacob and Lefgren (2005), Aaronson, Barrow and Sander (2007), Kane, Rockoff and Staiger (2006), Gordon, Kane and Staiger (2006)
• Standard Deviation in teacher-effect estimated .10 to .25 student-level standard deviations.
Experimental Study (TN Class-Size Experiment):
• Nye, Konstantopoulos and Hedges (2004)
• Teachers and students were randomly assigned to classes of various sizes, grades K through 3.
• Looked at teacher effect, net of class size category effects and school effects.
• Standard Deviation in teacher-effect estimated .08 to .11 student-level standard deviations. Even higher (.10 to .18) in low SES schools.
Table 3: Non-experimental Specifications of Teacher Effects (cont.)
Standard Deviation of Each Component (in Student-level Standard Deviation Units), with Mean Sample Size per Teacher

Specification Used for                                      Teacher    Teacher by Year   Mean Sample Size
Non-experimental Teacher Effect                             Effects    Random Effect     per Teacher
English Language Arts Levels with...
  No Controls                                               0.458      0.220             48.391
  Student/Peer Controls (incl. prior scores)                0.182      0.169             42.730
  Student/Peer Controls (incl. prior scores) & School F.E.  0.173      0.168             42.730
  Student Fixed Effects                                     0.082      0.041             48.391
English Language Arts Gains with...
  No Controls                                               0.186      0.205             44.366
  Student/Peer Controls                                     0.177      0.202             44.366
  Student/Peer Controls & School F.E.                       0.170      0.202             44.366

Note: The above estimates are based on the total variance in estimated teacher fixed effects using observations from the pre-experimental data (years 1999-2000 through 2002-03). See the text for discussion of the estimation of the decomposition into teacher by year random effects, student-level error, and "actual" teacher effects. The sample was limited to schools with teachers in the experimental sample. Any individual students who were in the experiment were dropped from the pre-experimental estimation, to avoid any spurious relationship due to regression to the mean, etc.
Interpretation of Coefficient on Lagged Student Performance
$A_{ijt} = X_{ijt}\beta + \nu_{ijt} = A_{ij(t-1)}\beta_o + \sum_k x_{kjt}\beta_k + \nu_{ijt}$
• We estimate several non-experimental specifications, $\beta_o = 0$ (no controls), $\beta_o = 1$ (“gains”), $\beta_o < 1$ (“quasi-gains”), and ask:
  - Which yields unbiased estimates of teacher effects ($\mu_j$)?
  - Which minimizes the mean squared error in predicting student outcomes?
• We place no structural interpretation on $\beta_o$.
• $\beta_o$ presumably plays a number of different roles: (i) systematic selection of students to teachers, (ii) fade-out of prior educational inputs, (iii) measurement error.
  - These separate roles are difficult to identify.
  - The various biases introduced may or may not be offsetting.