Transcript Weng
A Resampling Study of NASS Survey
MPPS Sampling Strategy
By Stanley Weng
National Agricultural Statistics Service
U.S. Department of Agriculture
1
INTRODUCTION
MPPS
• Multivariate Probability Proportional to
Size
• Address multiple, and often competing,
purposes (multi targets) of a survey
• Used for NASS Crops Survey (CS) etc.,
since 1999
2
MPPS
• Technically
Sample was selected using a Poisson
method. Each farm i had a unique
probability of selection, formed by
$\pi_i = \min\{1, \max\{p_i^{(m)}, m = 1, \dots, M\}\}$
3
MPPS
where $p_i^{(m)}$ is the item m selection
probability, determined by
▪ auxiliary data with the assumption of the
variance proportional to (a power of) the
auxiliary variable value
▪ optimal allocation
▪ a desired item-level sample size
4
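The combination rule above can be sketched in a few lines. This is only an illustration under stated assumptions: the item-level probability is taken to be proportional to a power of the auxiliary value and scaled to a desired item-level sample size, and the function names, the `power` parameter and its default of 0.75 are hypothetical, not taken from the NASS procedure.

```python
import numpy as np

def mpps_probabilities(aux, targets, power=0.75):
    """Combine item-level probabilities into one selection probability per farm.

    aux     : (N, M) nonnegative auxiliary values, one column per item m
    targets : length-M desired item-level sample sizes
    power   : exponent applied to the auxiliary value (hypothetical default)
    """
    aux = np.asarray(aux, dtype=float)
    targets = np.asarray(targets, dtype=float)
    size = aux ** power
    # Assumed form of the item-level probability: n_m * x_im^g / sum_j x_jm^g
    item_probs = targets * size / size.sum(axis=0)
    # pi_i = min{1, max_m p_i^(m)}
    pi = np.minimum(1.0, item_probs.max(axis=1))
    return pi, item_probs

def poisson_sample(pi, rng):
    """Poisson sampling: each unit is selected independently with probability pi_i."""
    pi = np.asarray(pi, dtype=float)
    return np.flatnonzero(rng.random(pi.shape[0]) < pi)

# Made-up auxiliary data for four farms and two items
aux = [[100, 0], [50, 20], [10, 80], [0, 40]]
pi, item_probs = mpps_probabilities(aux, targets=[2, 2], power=1.0)
selected = poisson_sample(pi, np.random.default_rng(0))
```

Taking the maximum over items means a farm that matters for any one item keeps a high chance of selection, which is how a single sample can serve several targets.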
MPPS
• Development and application of the
MPPS strategy at NASS:
Amrhein, Hicks and Kott (1996)
Amrhein and Bailey (1998)
Bailey and Kott (1997)
Hicks, Amrhein and Kott (1996)
Kott, Amrhein and Hicks (1998).
5
A COMPARISON STUDY
• This study was designed to compare
MPPS with the previously used SRS
((Stratified) Simple Random
Sampling) strategy
6
THIS STUDY
• Explored the resampling approach to
reveal the statistical characteristics/
behavior of NASS Ag survey data
• Raised issues for further investigation
to improve our understanding and
practice of NASS Ag survey sampling
/estimation
7
RESAMPLING
Population bootstrap
• Base sample S: June Crop Survey MPPS samples
• Pseudo population $U^*$
Composed of replicates of base sample
elements, according to the (integerized)
weight of the element
8
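A minimal sketch of building the pseudo population, assuming the weights are integerized by rounding to the nearest integer (the slides do not state the integerization rule, and the function name is hypothetical):

```python
import numpy as np

def build_pseudo_population(values, weights):
    """Replicate each base-sample element according to its integerized weight."""
    values = np.asarray(values)
    # Assumption: integerize by rounding; other rules (e.g. randomized rounding) are possible
    counts = np.rint(np.asarray(weights, dtype=float)).astype(int)
    return np.repeat(values, counts, axis=0)

# Made-up base sample of three elements with weights 2.4, 1.0 and 3.6
pseudo_pop = build_pseudo_population([10, 20, 30], [2.4, 1.0, 3.6])
```

With these made-up inputs the three elements are replicated 2, 1 and 4 times.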
RESAMPLING
• Resamples $S^*_r$, $r = 1, \dots, R$
Independent samples, drawn from $U^*$
by the Poisson and SRS sampling
strategies respectively
9
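Drawing one resample from the pseudo population under each strategy might look like the following sketch. The per-unit probabilities `probs` and the choice of the SRS size as the rounded expected Poisson sample size are assumptions made for illustration; the function names are hypothetical.

```python
import numpy as np

def poisson_resample(probs, rng):
    """Indices of pseudo-population units selected by independent Bernoulli draws."""
    probs = np.asarray(probs, dtype=float)
    return np.flatnonzero(rng.random(probs.shape[0]) < probs)

def srs_resample(pop_size, n, rng):
    """Indices of a simple random sample of size n, drawn without replacement."""
    return rng.choice(pop_size, size=n, replace=False)

rng = np.random.default_rng(2005)
probs = np.array([0.9, 0.5, 0.5, 0.2, 0.2, 0.2, 0.1, 0.1])  # made-up values
n_srs = int(round(probs.sum()))  # assumed way of making sample sizes comparable
s_poisson = poisson_resample(probs, rng)
s_srs = srs_resample(probs.shape[0], n_srs, rng)
```

The Poisson sample size is random, while the SRS size is fixed, so some adjustment of this kind is needed before the two sets of resamples are compared.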
RESAMPLING
● Resample totals $t^*_r$, $r = 1, 2, \dots, R$, and
$t^*_{RS} = \frac{1}{R} \sum_{r=1}^{R} t^*_r$
10
RESAMPLING
• Resampling variance estimate for the
sample total estimate
$V^*_{RS} = \frac{1}{R-1} \sum_{r=1}^{R} (t^*_r - t^*_{RS})^2$
Bootstrap statistic
11
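The mean of the resample totals and the resampling variance estimate can be computed directly from the R totals. A short sketch with a hypothetical function name and made-up totals:

```python
import numpy as np

def resampling_variance(resample_totals):
    """Return (t*_RS, V*_RS): the mean of the resample totals and their variance
    with divisor R - 1."""
    t = np.asarray(resample_totals, dtype=float)
    t_rs = t.mean()
    v_rs = ((t - t_rs) ** 2).sum() / (t.size - 1)
    return float(t_rs), float(v_rs)

# Three made-up resample totals
t_rs, v_rs = resampling_variance([10.0, 12.0, 14.0])
```

For these three totals the mean is 12 and the variance estimate is 4.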
DATA
• The crop component of the 2004 and
2005 June QAS, for all 42
participating states
• Certainty elements were eliminated
from sample, to avoid unnecessary
complication
12
RESAMPLING VAR ESTIMATES
Based on 1000 resamples
• Naive Comparison
Log-Log Plot
▪ Resampling variance est vs sample
total across crops – for each state
▪ Overlay: Poisson (*) vs SRS (^)
13
Naive Comparison
• General linear trend
(Assumption: the variance is
proportional to a power of the total)
• For the majority of crops, the SRS variance
appeared greater than the Poisson
variance (but often not appreciably)
14
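If the variance is proportional to a power of the total, the points on the log-log plot fall along a straight line whose slope is that power. A sketch of fitting the trend by ordinary least squares, using synthetic numbers rather than survey results (the function name is hypothetical):

```python
import numpy as np

def fit_loglog_trend(totals, variances):
    """Fit log(variance) = intercept + slope * log(total) by least squares."""
    slope, intercept = np.polyfit(np.log(totals), np.log(variances), 1)
    return float(slope), float(intercept)

# Synthetic example: variance = 3 * total^1.5 exactly
totals = np.array([1e4, 1e5, 1e6, 1e7])
variances = 3.0 * totals ** 1.5
slope, intercept = fit_loglog_trend(totals, variances)
```

Crops whose points sit far from the fitted line are candidates for the closer inspection described later in the talk.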
[Figure: Log-Log Plot of Resampling Variance Est vs Total Across Crops: CA]
Overlay: Poisson (*) vs SRS (^)
Crops plotted: pot, srg, saf, sun, dwh, bar, ctp, oat, ohy, ctu, wwh, ric, crn, alf
Vertical axis: log_var_psn (ticks from 14 to 22); horizontal axis: log_tot (ticks from 9.0 to 13.5)
NOTE: 1 obs hidden.
15
Validity of the Comparison
• Need additional information to justify the comparison
• The quality of the resampling variance
estimate depends on the statistical quality
of the resample totals, which also provides
evidence for the appropriateness of the
sampling strategy
• Among various aspects, the most
important: NORMALITY
16
Normality
Q-Q plot of resample totals
• Demonstration: CA
▪ Most crops: Good shape of Q-Q Plot
(Corn, Potatoes)
▪ Exception: Other Hay
Evidence that Poisson was better than
SRS
17
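The coordinates of a normal Q-Q plot of resample totals, and a simple numerical summary of how straight it is, can be sketched as follows. The plotting positions (i - 0.5)/n and the use of the Q-Q correlation as a summary are assumptions for illustration, the function names are hypothetical, and the data are simulated.

```python
import numpy as np
from statistics import NormalDist

def normal_qq(totals):
    """Return (theoretical normal quantiles, standardized sorted totals)."""
    x = np.sort(np.asarray(totals, dtype=float))
    n = x.size
    probs = (np.arange(1, n + 1) - 0.5) / n  # assumed plotting positions
    std_normal = NormalDist()
    theo = np.array([std_normal.inv_cdf(p) for p in probs])
    z = (x - x.mean()) / x.std(ddof=1)
    return theo, z

def qq_correlation(totals):
    """Correlation between the two Q-Q axes; values near 1 suggest normality."""
    theo, z = normal_qq(totals)
    return float(np.corrcoef(theo, z)[0, 1])

rng = np.random.default_rng(42)
normal_like = rng.normal(loc=1e5, scale=5e3, size=1000)  # simulated totals
skewed = rng.exponential(scale=1e5, size=1000)           # simulated totals
r_normal = qq_correlation(normal_like)
r_skewed = qq_correlation(skewed)
```

A markedly lower correlation for one strategy's resample totals than for the other's would be one way to quantify the visual impression from the Q-Q plots.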
Outliers on the log-log plot
• Located far apart from the general trend
• The two sampling strategies gave
appreciably different estimates
• Demonstration:
▪ CA: Other Hay
▪ MT: Potatoes
Evidence that SRS was better
24
[Figure: Log-Log Plot of Resampling Variance Est vs Total Across Crops: MT]
Overlay: Poisson (*) vs SRS (^)
Crops plotted: mus, sun, can, pot, saf, fla, crn, oat, ohy, dwh, bar, alf, wwh, swh
Vertical axis: log_var_psn (ticks from 12.5 to 25.0); horizontal axis: log_tot (ticks from 7 to 15)
NOTE: 5 obs hidden.
25
FINITE SAMPLE RESAMPLING
Complexities
- Due to the special features of survey
sampling
● Nonindependence arising in sampling
without replacement
● Other complexities of finite population
structure by designs and estimators
28
FINITE SAMPLE RESAMPLING
Effects of discreteness
(Davison & Hinkley, 1997, 2.3.2)
▪ Discrete empirical distribution
and in particular,
▪ In finite population sampling, the
pseudo population formed by replicates of
sample elements
29
FINITE SAMPLE RESAMPLING
Issues with this study
• Comparable sample size
- Addressed by size adjustment
• Impact of the base sample
- Not clear
30
Impact of Base Sample
For finite population resampling, the general
guideline:
▪ The resampling population mimics the
original population, and
▪ The resamples mimic the base sample:
they are drawn from $U^*$ by a design identical to the
one by which the base sample was
originally drawn
(Särndal et al., 1992, Ch. 11)
31
AT ISSUE
● How should the resampling technique
be correctly modified to
accommodate the finite population sampling
situation?
32
AT ISSUE
● In the literature, most reported finite
sample resampling studies used
(stratified) SRS, which bears the most
similarity to infinite population
independent random sampling, the
standard setting that the resampling
technique is based on
33
SUMMARY
• An Approach
Resampling & analysis of resamples,
using statistical graphical and diagnostic
techniques, to reveal statistical
characteristics / behavior of NASS Ag
survey data
34
SUMMARY
● Sampling strategy comparison
▪ Poisson seemed to be preferable to
stratified simple random sampling
▪ A national comparison table of the two
strategies across crops and states is to be
produced for a comprehensive picture with
likely causal factors identified
35
FURTHER INVESTIGATION
To develop statistical understanding,
the resampling setting of this study and
other statistical information techniques
will be further explored
36
FURTHER INVESTIGATION
▪ Behavior of Studentized bootstrap
statistics
▪ Statistical function
(Booth, Butler, and Hall, 1994;
Davison & Hinkley, 1997)
▪ Examine different survey data
37
THANK YOU
38
ALF   Alfalfa All Harvested Acres
BAR   Barley All Planted Acres
CAN   Canola All Planted Acres
CRN   Corn Planted Acres
CTP   Pima Cotton Planted Acres
CTU   Upland Cotton Planted Acres
DEB   Dry Beans Planted Acres
DWH   Durum Wheat Planted Acres
FLA   Flaxseed Planted Acres
MUS   Mustard All Planted Acres
OAT   Oats All Planted Acres
OHY   Other Hay Harvested Acres
PNT   Peanuts All Planted Acres
POT   Potatoes All Planted Acres
RIC   Rice All Grain Planted Acres
RYE   Rye All Planted Acres
SAF   Safflower All Planted Acres
SGB   Sugarcane All Planted Acres*
SOY   Soybeans All Planted Acres
SPT   Sweet potatoes Planted Acres
SRG   Sorghum All Planted Acres
SUG   Sugarcane For Sugar Harvested Acres
SUN   Sunflowers All Planted Acres
SWH   Spring Wheat Irr Planted Acres
WWH   Winter Wheat All Planted Acres
39