EVALUATION OF THE NSRCG SCHOOL SAMPLE
Donsig Jang and Xiaojing Lin
Third International Conference on
Establishment Surveys
Montreal, Canada, June 21, 2007
Outline
• Sampling options for repeated establishment surveys
• Reasons to keep the same sample in establishment surveys
• Issues in keeping the same sample
• Example: NSRCG school sample
• Summary
• Recommendation for 2008 NSRCG School Sample
Sampling options for repeated establishment surveys
• Keep the same sample over time, with supplemental samples for births
– Efficient change estimates BUT
– Response burden
– Inefficient “cross-sectional” estimates
• An independent sample in each survey round
• Sample coordination to maximize overlap between samples
– Rotation samples (Sigman and Monsour 1995)
– Permanent random number technique (Ohlsson 1995, 2001), sketched below
– Keyfitz procedure (Keyfitz 1951)
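The permanent random number (PRN) idea in the last group is simple enough to sketch: each establishment keeps a fixed uniform random number for life, and each round selects the n smallest PRNs from the current frame, so continuing units tend to be reselected while births and deaths are handled automatically. A minimal sketch under those assumptions (the frame layout and the sequential-SRS variant are illustrative, not taken from the references):

import random

def assign_prns(frame, seed):
    """Give each unit without a PRN a permanent uniform random number.

    Units already carrying a PRN keep it unchanged across rounds;
    reusing the same numbers is what creates the overlap.
    """
    rng = random.Random(seed)
    for unit in frame:
        unit.setdefault("prn", rng.random())
    return frame

def sequential_srs(frame, n):
    """Sequential simple random sampling: take the n smallest PRNs.

    Deaths simply drop out of the frame and births receive fresh PRNs,
    so every round is a probability sample from the current frame.
    """
    return sorted(frame, key=lambda u: u["prn"])[:n]

# Round 1: a hypothetical frame of 2,000 schools.
frame_2003 = assign_prns([{"id": i} for i in range(1, 2001)], seed=1)
sample_2003 = sequential_srs(frame_2003, 300)

# Round 2: some deaths, some births; surviving units keep their PRNs.
frame_2006 = [u for u in frame_2003 if u["id"] % 40 != 0]      # deaths
frame_2006 += [{"id": i} for i in range(2001, 2191)]           # births
frame_2006 = assign_prns(frame_2006, seed=2)                   # only births get new PRNs
sample_2006 = sequential_srs(frame_2006, 300)

overlap = {u["id"] for u in sample_2003} & {u["id"] for u in sample_2006}
print(f"overlap between rounds: {len(overlap)} of 300")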
Reasons to keep the same sample in establishment surveys
• Difficulty in identifying the point of contact
• Costly efforts in gaining participation
• Often requires a nontrivial process to gather information – previous survey participation would help
Issues in keeping the same sample
• Can the retained sample remain representative of the current cross-sectional population?
– Depends on how dynamic the population is over time
· coverage issues: births vs. deaths
· sample efficiency: distributional changes
• Alternatives
– Independent sample from the most up-to-date sample frame
– Coordination of samples
· E.g., Keyfitz procedure to maximize the sample overlap between the current and the previous samples
National Survey of Recent College Graduates (NSRCG)
• Repeated every two or three years
• Collects education, demographic, and employment information from recent college graduates (bachelor’s and master’s) majoring in science, engineering, and health fields
• Two-stage sample design
– 1st stage: select schools and obtain the list of graduates from selected schools
– 2nd stage: select graduates from the lists provided by schools
• NSF-sponsored survey
NSRCG
List collection from schools
• Identify the point of contact (usually an institutional coordinator)
• Gather the list of graduates with key sampling and locating information, including:
– degree award dates
– degree level
– field of major
– race/ethnicity
– gender
– date of birth
– SSN
– student ID
– mailing addresses, including parents’ addresses
– phone numbers (landline, cell)
– emails, etc.
NSRCG
List collection from schools (continued)
• Need a good understanding of the information requested and the file format
• Time-consuming and costly effort
– different schools have different issues
• A crucial part of the quality of the survey
– strive to get an almost perfect cooperation rate (99%)
– Out of 300 schools:
· only four final refusals in 2003
· only five refusals in 2006
NSRCG
School sample selection
• For the 1995, 1997, 1999, and 2001 surveys
– 275 schools initially selected in 1995 and kept, with 5 supplemental samples added over three survey rounds (to account for frame coverage)
• A new sample of 300 schools selected in 2003:
– To reflect rapid changes in the S&E populations in the 1990s
– Health field added to the survey as an eligible field of study
NSRCG
School sample selection (continued)
• Probability proportional to size (PPS) sampling with a composite size measure
• Composite size measures calculated to achieve equal weights within each of the NSRCG analytic domains, constructed from a combination of:
– degree year, degree level, field of major, race/ethnicity, and gender
• Population dynamics
– new schools (births), closed schools (deaths), schools with no S&E graduates (temporarily ineligible), etc.
→ coverage issue
– distributions of schools changed (in terms of composite size measures)
→ potential factor affecting the sample efficiency
2003 NSRCG school sample
In both 2001 and 2003 NSRCG     170 (57%)
Only in 2003 NSRCG              130 (43%)
Total                           300

Excessive effort (time and resources) to achieve a 99% response rate (4 schools refused)
[Figure: Distribution of list submission dates in 2003 NSRCG, in days, for schools in both the 2001 and 2003 samples vs. schools only in the 2003 sample]
School sample after 2003 NSRCG – 2006 NSRCG
Frame evaluation

                                        Frame    Sample                    Graduate count
                                        school   school   AY2001    AY2002    AY2003    AY2004    AY2005
                                        count    count
In 2003 frame but not in 2006 frame         48        0    3,077     1,092         0         0        23
In both 2003 and 2006 frames             1,762      300  624,297   639,411   671,868   702,021   722,727
Not in 2003 frame but in 2006 frame        190        0        0       570     4,369     7,396     6,819

2003 frame based on AY2001 IPEDS counts
2006 frame based on AY2003 and AY2004 IPEDS counts
Graduate counts dropped from and added to the population

                                                 Eligible in 2003 but     Newly eligible for
                                                 ineligible in 2006       2006 NSRCG
Domain                                           Count     Percentage     Count     Percentage
Degree level     Bachelor                         3476       0.35          5252       0.49
                 Master                           5380       1.97          1775       0.60
Race/ethnicity   Non-Hispanic White               5461       0.66          4109       0.48
                 Asian, Pacific Islander,
                   Nonresident                    2643       1.04          1610       0.54
                 Hispanic, Black,
                   American Indian                 752       0.40          1308       0.63
Gender           Male                             3949       0.71          3967       0.65
                 Female                           4907       0.69          3060       0.41
Graduate counts dropped from and added to the population

                                                 Eligible in 2003 but     Newly eligible for
                                                 ineligible in 2006       2006 NSRCG
Field of major                                   Count     Percentage     Count     Percentage
Chemistry                                           15       0.06           109       0.46
Physics/Astronomy                                    0       0.00             0       0.00
Other Physical Sciences                             30       0.23            44       0.34
Mathematics/Statistics                              36       0.10           120       0.30
Computer Sciences                                  687       0.56         2,329       1.53
Environmental, Geological and
  Agricultural Sciences                            585       1.81            69       0.27
Aerospace Engineering                                0       0.00            33       0.55
Chemical Engineering                                 0       0.00             0       0.00
Civil Engineering                                    1       0.00            86       0.34
Electrical Engineering                             242       0.43           723       1.06
Industrial Engineering                               0       0.00             0       0.00
Mechanical Engineering                               0       0.00           112       0.31
Other Engineering                                  266       0.87           171       0.51
Biological Sciences                              1,165       0.80           273       0.18
Psychology                                         182       0.09         1,079       0.52
Economics                                           39       0.08           129       0.21
Sociology/Anthropology                              57       0.07            76       0.08
Other Social Sciences                              321       0.53           398       0.59
Political Science                                  145       0.17           251       0.24
Health-Related – Nursing                           163       0.16           741       0.71
Health-Related – all else                        4,922       3.62           284       0.23
2006 NSRCG School Sample
• No significant change in the population
– Kept the same school sample without any supplemental sample
[Figure: Distribution of list submission dates in 2006 NSRCG, in days, for schools in both the 2001 and 2006 samples vs. schools only in the 2006 sample]
2008 NSRCG?
• Evaluate the current sampling strategy (keeping the same sample) by doing:
– frame evaluation
– comparisons with other sampling schemes
· Independent PPS
· Keyfitz procedure
2008 NSRCG
Frame evaluation

                                        Frame    Sample           Graduate count
                                        school   school   AY2001    AY2002    AY2006
                                        count    count
In 2003 frame but not in 2008 frame         78        2    4,643     2,584         0
In both 2003 and 2008 frames             1,732      298  622,731   637,919   744,070
Not in 2003 frame but in 2008 frame        294        0        0       494    11,755

2003 frame based on AY2001 IPEDS counts
2008 frame based on AY2006 IPEDS counts
Graduate counts dropped from and added to the population
Sample Evaluation
• Three sample selection methods considered
– Keep the 2003 school sample with a supplemental sample of size 4
– Independent PPS with composite size measures based on updated frame information
– Keyfitz procedure
PPS sample selection procedure
• Define the composite size measure for school i:

S_i = \sum_d \frac{m_d}{M_d} \, M_{id}

where m_d is the graduate sample size for domain d,
M_d is the population size of domain d,
M_{id} is the population size of domain d in school i, and
domain d is constructed from a combination of graduate year, degree level, field of major, race/ethnicity, and gender
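A minimal sketch of this size-measure calculation, assuming the domain-level counts sit in plain dictionaries; the domain labels and numbers are illustrative, not NSRCG inputs:

def composite_size_measure(school_counts, domain_targets, domain_totals):
    """Composite measure of size: S_i = sum over d of (m_d / M_d) * M_id.

    school_counts  : {domain d: M_id}, graduate counts for school i
    domain_targets : {domain d: m_d},  target graduate sample sizes
    domain_totals  : {domain d: M_d},  frame graduate totals
    """
    return sum(
        domain_targets[d] / domain_totals[d] * m_id
        for d, m_id in school_counts.items()
        if domain_totals[d] > 0
    )

# Domains collapsed to degree level x field for brevity.
domain_targets = {"BS-Eng": 900, "MS-Eng": 400, "BS-Bio": 600}
domain_totals = {"BS-Eng": 90000, "MS-Eng": 20000, "BS-Bio": 60000}
school_a = {"BS-Eng": 450, "MS-Eng": 100, "BS-Bio": 200}

# S_a = 450*0.01 + 100*0.02 + 200*0.01 = 8.5, the expected graduate sample
# from school a if it were taken with certainty and subsampled at rates m_d/M_d.
print(composite_size_measure(school_a, domain_targets, domain_totals))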
PPS sample selection procedure
• School i is selected with probability p_i proportional to its size measure S_i
• Achieves equal weights within each domain d
• Distributional changes in the NSRCG graduate population would cause unequal weight variation within domains
• Independent PPS with up-to-date frame data is desirable if the weight variation is severe
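The slides do not name a particular PPS algorithm, so the sketch below uses systematic PPS on the composite size measures as one illustrative choice, with a closing comment on why domain-specific within-school rates produce the equal weights described above:

import random

def systematic_pps(sizes, n, seed=2008):
    """Systematic PPS: select n units with probability proportional to size.

    Assumes no size is large enough that n * S_i exceeds the total size;
    in practice such schools would be taken with certainty.
    """
    total = sum(sizes)
    step = total / n
    start = random.Random(seed).uniform(0, step)
    selected, cum, k = [], 0.0, 0
    for i, size in enumerate(sizes):
        cum += size
        while k < n and start + k * step < cum:
            selected.append(i)
            k += 1
    return selected

sizes = [8.5, 3.2, 12.0, 0.7, 5.1, 9.4, 2.2, 6.6]   # composite MOS for 8 hypothetical schools
print(systematic_pps(sizes, n=3))

# School i is selected with p_i = n * S_i / sum(S).  If graduates in domain d
# are then subsampled within school i at rate f_id = (m_d / M_d) / p_i, the
# overall inclusion probability is p_i * f_id = m_d / M_d for every graduate
# in domain d, that is, a constant weight of M_d / m_d within the domain.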
Keyfitz procedure
• Maximizes the overlap between two samples
• The first sample (2003 NSRCG) was selected with PPS
• The second sample’s inclusion probabilities depend on:
– the updated size measures
– the first-sample inclusion probabilities
– the actual sample realization in the first round
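A minimal sketch of the Keyfitz (1951) rule in its simplest setting, one unit drawn per stratum with selection probabilities summing to one in each round; the school-level PPS application evaluated here builds on the same conditional-retention idea, and the names below are illustrative:

import random

def keyfitz_update(old_p, new_p, old_pick, rng=random.Random(1951)):
    """Keyfitz retention rule for a one-unit-per-stratum design.

    old_p, new_p : {unit: selection probability}, each summing to 1
    old_pick     : the unit selected in the first round
    Unconditionally each unit comes out with probability new_p, while the
    chance of retaining old_pick is as large as possible.
    """
    p_old = old_p[old_pick]
    p_new = new_p.get(old_pick, 0.0)           # 0.0 if the unit left the frame
    # Retain outright if the old unit's probability did not decrease.
    if p_new >= p_old:
        return old_pick
    # Otherwise retain with probability new/old ...
    if rng.random() < p_new / p_old:
        return old_pick
    # ... and if dropped, replace it with a unit whose probability grew,
    # chosen with probability proportional to the increase new_p - old_p.
    gainers = {u: p - old_p.get(u, 0.0) for u, p in new_p.items()
               if p > old_p.get(u, 0.0)}
    r = rng.uniform(0.0, sum(gainers.values()))
    cum = 0.0
    for unit, gain in gainers.items():
        cum += gain
        if r <= cum:
            return unit
    return max(gainers, key=gainers.get)        # guard against float round-off

old_p = {"A": 0.5, "B": 0.3, "C": 0.2}
new_p = {"A": 0.3, "B": 0.3, "C": 0.3, "D": 0.1}   # A shrank; C and the birth D grew
print(keyfitz_update(old_p, new_p, old_pick="A"))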
Simulation of sampling procedures
• Generate 1,000 “independent” school samples for each of the following options:
– Keep the same school sample, with a supplemental sample of size 4 from the newly eligible schools (“births”)
– Independent PPS sampling using MOS calculated from the 2008 NSRCG frame
– Keyfitz procedure
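A compact sketch of how such a comparison can be organized; the summary statistic (Kish's unequal weighting effect, 1 + CV^2, of the resulting weights) and the callable interface are assumptions for illustration, not the evaluation code behind these slides:

import statistics

def unequal_weighting_effect(weights):
    """Kish's 1 + CV^2: variance inflation attributable to unequal weights."""
    mean_w = statistics.fmean(weights)
    return 1.0 + statistics.pvariance(weights, mu=mean_w) / mean_w ** 2

def simulate(select_sample, frame, n_reps=1000):
    """Repeat a selection scheme n_reps times and average the weight variation.

    select_sample(frame, rep) must return a list of (school, weight) pairs;
    plug in keep-the-2003-sample-plus-births, independent PPS, or Keyfitz.
    """
    results = []
    for rep in range(n_reps):
        sample = select_sample(frame, rep)
        results.append(unequal_weighting_effect([w for _, w in sample]))
    return statistics.fmean(results)

# Usage, with the three options implemented as callables:
#   simulate(keep_2003_sample_plus_births, frame_2008)
#   simulate(independent_pps, frame_2008)
#   simulate(keyfitz_procedure, frame_2008)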
Summary
• Keeping the same sample is a cost-effective option
• Concern about statistical inefficiency due to the dynamic nature of the population
• Frame coverage corrected by a supplemental sample
• Evaluation of the NSRCG school sample:
– empirical frame evaluation
– samples simulated based on two alternative methods
• Distributional changes (in terms of the composite size measure) would make the final sample inefficient:
– weight variation within planned domains
– over- or under-estimation of graduates in some domains
Recommendation
• Keep the same school sample, with a supplemental sample of size 4, for the 2008 NSRCG