Revision of Basic Statistical Concepts


Sampling Design
 M. Burgman & J. Carey 2002
Types of Samples
• Point samples (including neighbour distance samples)
• Transects
– line intercept sampling
– line intersect sampling
– belt transects
• Plots
– circular, square, rectangular plots
– quadrats
– nested quadrats
• Permanent or temporary sites
Arrangement of Samples
• Subjective (Haphazard, Judgement)
• Systematic Sampling
• Search Sampling
• Probability Sampling
– Random: Simple, Stratified (restricted)
– Multistage
– Cluster
– Multiphase: Double
• Variable Probability Sampling
– PPS/PPP
Systematic Sampling
Samples are selected systematically according
to a pre-determined plan.
e.g. grid samples
• evaluation of spatial patterns
• simplicity of site location (cost)
• guaranteed coverage of an area
• representation of management units
• facilitation of mapping
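A grid sample like the one mentioned above can be laid out in a few lines. This is a minimal sketch, assuming a rectangular study area and a square grid; the function name, its parameters, and the example extent are illustrative and not part of the original slides.

```python
def grid_sample_points(x_min, x_max, y_min, y_max, spacing, offset=(0.0, 0.0)):
    """Return (x, y) sample locations on a regular square grid.

    All names and arguments here are hypothetical. `offset` shifts the
    grid origin, for example to a randomly chosen starting point.
    """
    # Count grid nodes with integer counters so that floating-point
    # error does not accumulate across the grid.
    nx = int((x_max - x_min - offset[0]) // spacing) + 1
    ny = int((y_max - y_min - offset[1]) // spacing) + 1
    points = []
    for i in range(nx):
        for j in range(ny):
            x = x_min + offset[0] + i * spacing
            y = y_min + offset[1] + j * spacing
            points.append((x, y))
    return points


# Example: a 100 m x 50 m area sampled every 25 m.
points = grid_sample_points(0, 100, 0, 50, spacing=25)
```

Every site location follows from the spacing and the origin, which is the "simplicity of site location" benefit listed above.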
Systematic Sampling
• If the ordering of units in a population is
random, any predesignated positions will be a
simple random sample.
• Bias may be introduced if there is a spatial
pattern in the population.
• Formulae for random samples may not be
applicable.
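To illustrate the point about spatial pattern, the sketch below builds an artificial population with a periodic trend and samples it at an interval equal to the period. The population, the period of 10 units, and the function name are assumptions chosen for illustration only.

```python
import math


def systematic_sample(population, step, start=0):
    """Take every `step`-th unit, beginning at index `start`."""
    return population[start::step]


# Hypothetical population: 1000 units following a sine wave with period 10.
population = [math.sin(2 * math.pi * i / 10) for i in range(1000)]
true_mean = sum(population) / len(population)

# The sampling interval equals the period of the pattern, so every
# selected unit sits at the same phase of the wave.
aligned = systematic_sample(population, step=10, start=2)
aligned_mean = sum(aligned) / len(aligned)

print(f"true mean    = {true_mean:.3f}")
print(f"sample mean  = {aligned_mean:.3f}")
```

The sample mean here reflects only the phase at which the grid happens to start, not the population mean, and the usual random-sample variance formula would report almost no uncertainty.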
Assumptions of Systematic Sampling
• no spatial or temporal trends in the variable
• no natural strata
• no correlations among individual samples
Given these assumptions, a systematic sample
will, on average, estimate the true mean with
the same precision as a simple random
sample or a stratified random sample of the
same size.
Simple Random Sampling
• sample mean (unbiased estimate of $\mu$)
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
• sample variance (unbiased estimate of $\sigma^2$)
$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$
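The two estimators for a simple random sample can be computed directly. This is a small sketch with made-up data; the function names are illustrative, and the standard library's `statistics.mean` and `statistics.variance` give the same results.

```python
import statistics


def sample_mean(xs):
    """Arithmetic mean of the sample."""
    return sum(xs) / len(xs)


def sample_variance(xs):
    """Sample variance with the n - 1 divisor."""
    n = len(xs)
    mean = sample_mean(xs)
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


# Hypothetical measurements from five sampling units.
data = [4.0, 7.0, 6.0, 3.0, 5.0]
print(sample_mean(data), sample_variance(data))

# Cross-check against the standard library.
assert sample_mean(data) == statistics.mean(data)
assert sample_variance(data) == statistics.variance(data)
```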
Stratified Random Sampling
A population is classified into a number
of strata. Each stratum is sampled
independently.
Simple random sampling is
employed within strata.
• fewer samples are required to
obtain a given level of precision
• independent sampling of strata is useful for
management, administration, and mapping.
Stratified Random Sampling
mean
$$\bar{x}_{all} = \sum_{i=1}^{m} p_i \bar{x}_i$$
where m = number of strata, and
$p_i$ = proportion of the total made up by the ith stratum,
e.g. $p_i = A_i / A$
Stratified Random Sampling
standard error of overall mean
$$s_{\bar{x}_{all}} = \sqrt{\sum_{i=1}^{m} p_i^2 \frac{s_i^2}{n_i}} = \sqrt{\sum_{i=1}^{m} \frac{A_i^2 s_{\bar{x}_i}^2}{A^2}}$$
where $A_i$ is the area of the ith stratum,
A is the total area,
$s_{\bar{x}_i}$ is the standard error of the mean within the ith stratum, and
$n_i$ is the number of sampling units in the ith stratum.
Stratified Random Sampling
confidence limits for the mean
$$CL_{mean} = \bar{x}_{all} \pm s_{\bar{x}_{all}} \, t_{[\alpha, n-1]}$$
confidence limits for the whole population
$$CL_{pop} = A \left( \bar{x}_{all} \pm s_{\bar{x}_{all}} \, t_{[\alpha, n-1]} \right)$$
where A = total number of units over all strata
(e.g. total area in m2, when $\bar{x}_{all}$ has been calculated per m2)
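The stratified mean, its standard error, and the confidence limits can be chained together as follows. This is a sketch under stated assumptions: strata are weighted by area ($p_i = A_i / A$), the critical t value is looked up by the reader and passed in as `t_crit`, and all function names and numbers are invented for illustration.

```python
import math


def stratified_mean(areas, means):
    """Area-weighted mean over strata: sum of p_i * xbar_i."""
    total_area = sum(areas)
    return sum(a / total_area * m for a, m in zip(areas, means))


def stratified_se(areas, variances, counts):
    """Standard error of the overall mean: sqrt(sum p_i^2 * s_i^2 / n_i)."""
    total_area = sum(areas)
    return math.sqrt(
        sum((a / total_area) ** 2 * v / n
            for a, v, n in zip(areas, variances, counts))
    )


def confidence_limits(mean, se, t_crit, total_units=None):
    """Limits for the mean, or for the whole population if total_units is given."""
    lower, upper = mean - t_crit * se, mean + t_crit * se
    if total_units is not None:
        return total_units * lower, total_units * upper
    return lower, upper


# Hypothetical two-stratum survey.
areas = [60.0, 40.0]      # stratum areas
means = [10.0, 20.0]      # stratum sample means (per unit area)
variances = [4.0, 9.0]    # stratum sample variances
counts = [4, 9]           # sampling units per stratum

mean_all = stratified_mean(areas, means)
se_all = stratified_se(areas, variances, counts)
# 2.179 is the two-sided 5% t value for 12 degrees of freedom (n - 1 with n = 13).
cl_mean = confidence_limits(mean_all, se_all, t_crit=2.179)
cl_pop = confidence_limits(mean_all, se_all, t_crit=2.179, total_units=sum(areas))
print(mean_all, se_all, cl_mean, cl_pop)
```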
Allocation of Samples
proportional to area:
$$n_i = p_i N = \frac{A_i}{A} N$$
where $p_i$ = proportion of total area in stratum i,
N = total number of samples, and
$n_i$ = number of samples allocated to stratum i.
to minimize variance:
$$n_i = \left[ \frac{A_i s_i}{\sum_{i} A_i s_i} \right] N$$
where $s_i$ = standard deviation in stratum i
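Both allocation rules reduce to weighting each stratum and scaling by the total number of samples. The sketch below uses invented areas and standard deviations; the function names are illustrative, and the results are left unrounded because the slides do not say how fractional allocations should be rounded.

```python
def proportional_allocation(areas, total_samples):
    """n_i = (A_i / A) * N."""
    total_area = sum(areas)
    return [total_samples * a / total_area for a in areas]


def variance_minimizing_allocation(areas, std_devs, total_samples):
    """n_i = N * A_i * s_i / sum(A_i * s_i)."""
    weights = [a * s for a, s in zip(areas, std_devs)]
    total_weight = sum(weights)
    return [total_samples * w / total_weight for w in weights]


# Hypothetical strata: the smaller stratum is the more variable one.
areas = [60.0, 40.0]
std_devs = [2.0, 3.0]

by_area = proportional_allocation(areas, total_samples=30)
by_variance = variance_minimizing_allocation(areas, std_devs, total_samples=30)
print(by_area, by_variance)
```

In this example the variance-minimizing rule shifts samples from the larger, less variable stratum to the smaller, more variable one.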
Random Sampling within Blocks
Combination of systematic and
random sampling.
Gives coverage of an area,
together with some protection
from bias.
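One way to combine the two approaches is to divide the area into a regular grid of blocks and place a random point inside each block. This is a sketch assuming a rectangular area and equal-sized blocks; the function and its parameters are hypothetical.

```python
import random


def random_points_within_blocks(x_min, x_max, y_min, y_max,
                                n_cols, n_rows, per_block=1, rng=None):
    """Place `per_block` uniformly random points inside every grid block."""
    rng = rng or random.Random()
    width = (x_max - x_min) / n_cols
    height = (y_max - y_min) / n_rows
    points = []
    for col in range(n_cols):
        for row in range(n_rows):
            # Lower-left corner of the current block.
            x0 = x_min + col * width
            y0 = y_min + row * height
            for _ in range(per_block):
                points.append((rng.uniform(x0, x0 + width),
                               rng.uniform(y0, y0 + height)))
    return points


# Example: a 100 x 50 area split into 4 x 2 blocks, one point per block.
block_points = random_points_within_blocks(0, 100, 0, 50, n_cols=4, n_rows=2)
```

The blocks guarantee coverage, while the random position within each block avoids locking the sample onto a regular spatial pattern.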
Cluster Sampling
• Clusters of individuals are chosen at random,
and all units within the chosen clusters are
measured.
• Useful when population units
cluster together, either naturally,
or because of sampling
methods.
Cluster Sampling
Examples: schools of fish
clumps of plants
leaves on eucalypt trees
pollen grains in soil core samples
vertebrates in quadrat samples
• Two-stage cluster sampling:
clusters are selected, and a sample is taken
from each cluster (i.e. each cluster is
subsampled)
Multistage Sampling
The division of a population into primary sampling units, only some
of which are sampled. Each of those selected is further subdivided
into secondary sampling units, providing a hierarchical subdivision
of sampling units.
Motivations include access, stratification, and efficiency.
Procedure for Multistage Sampling
• A study area (or a population) is partitioned into N
large units (termed first-stage or primary units)
• A first-stage sample of n of these is selected
randomly.
• Each first-stage unit is subdivided into M second-stage units.
• A second-stage sample of m of these is selected
randomly.
• The m elements of the second-stage sample are
concentrated within n first-stage samples.
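The selection procedure above can be sketched with the standard library's random sampling. The data structure (a dictionary from primary unit to its list of second-stage units), the function name, and the example sizes are all assumptions made for illustration.

```python
import random


def two_stage_sample(primary_units, n, m, rng=None):
    """Select n primary units at random, then m second-stage units in each.

    `primary_units` maps a primary unit id to the list of its
    second-stage units (a hypothetical layout for this sketch).
    """
    rng = rng or random.Random()
    # Stage 1: choose n of the N primary units without replacement.
    chosen = rng.sample(sorted(primary_units), n)
    # Stage 2: choose m of the M second-stage units within each chosen unit.
    return {unit: rng.sample(primary_units[unit], m) for unit in chosen}


# Example: N = 6 primary units, each divided into M = 10 second-stage units.
study_area = {p: [f"unit{p}-sub{s}" for s in range(10)] for p in range(6)}
selection = two_stage_sample(study_area, n=3, m=4)
```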
Multistage Sampling Statistics
When the primary units are of equal size, the
population mean is estimated from a multistage sample
by the arithmetic mean of the nm measurements $x_{ij}$:
$$\bar{x} = \frac{1}{nm} \sum_{i=1}^{n} \sum_{j=1}^{m} x_{ij} = \frac{1}{n} \sum_{i=1}^{n} \bar{x}_i$$
where
$$\bar{x}_i = \frac{1}{m} \sum_{j=1}^{m} x_{ij}$$
is the mean of the m selected subunits in the ith
primary unit
Multistage Sampling
To estimate the total amount I of the measured
variable (e.g. the total amount of a pollutant),
$$I = N M \bar{x}$$
and
$$s_I^2 = (N M)^2 s_{\bar{x}}^2$$
Multistage Sampling
When the primary units are of unequal size, the
population mean is estimated from a multistage sample by
$$\bar{x} = \frac{\sum_{i=1}^{n} M_i \bar{x}_i}{\sum_{i=1}^{n} M_i}$$
where
$$\bar{x}_i = \frac{1}{m_i} \sum_{j=1}^{m_i} x_{ij}$$
Multistage Sampling
The total amount of the variable is given by
$$I = \frac{N}{n} \sum_{i=1}^{n} M_i \bar{x}_i$$
Gilbert (1987 - Statistical Methods for Environmental Pollution
Monitoring) provides formulae for allocating samples among sampling
units, for estimating variances, and for including costs in the sample
allocation protocol.
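The point estimates for the equal-size and unequal-size cases can be written as short functions. This sketch covers only the means and totals given on the slides, not the variance or cost formulae in Gilbert (1987); the function names and data are invented.

```python
def unit_means(samples):
    """Mean of the selected subunits within each sampled primary unit."""
    return [sum(unit) / len(unit) for unit in samples]


def mean_equal_units(samples):
    """Mean of the primary-unit means (primary units of equal size)."""
    means = unit_means(samples)
    return sum(means) / len(means)


def total_equal_units(samples, N, M):
    """I = N * M * xbar, with N primary units of M subunits each."""
    return N * M * mean_equal_units(samples)


def mean_unequal_units(samples, sizes):
    """Size-weighted mean: sum(M_i * xbar_i) / sum(M_i)."""
    means = unit_means(samples)
    return sum(M_i * xb for M_i, xb in zip(sizes, means)) / sum(sizes)


def total_unequal_units(samples, sizes, N):
    """I = (N / n) * sum(M_i * xbar_i)."""
    means = unit_means(samples)
    n = len(samples)
    return N / n * sum(M_i * xb for M_i, xb in zip(sizes, means))


# Hypothetical data: n = 2 sampled primary units, 2 subunits measured in each.
samples = [[1.0, 3.0], [5.0, 7.0]]
print(mean_equal_units(samples))                          # equal-size case
print(total_equal_units(samples, N=10, M=20))
print(mean_unequal_units(samples, sizes=[10, 30]))        # unequal-size case
print(total_unequal_units(samples, sizes=[10, 30], N=10))
```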
Sampling Methods revisited
[Figure: schematic layouts of six sampling designs: simple random sampling; stratified random sampling (showing a stratum); two-stage sampling (showing 1° and 2° units); cluster sampling (showing a cluster); systematic sampling; random sampling within segments.]
Double Sampling
(multiphase sampling)
When two or more techniques are available to
measure a variable, double sampling may
improve the efficiency of the measurement
protocol.
• Use the easiest (and least accurate) method to
measure all samples (n' samples).
• Use the more accurate technique to measure a
relatively small proportion of samples (n samples,
where n ≪ n').
• Correct the relatively inaccurate measurements,
using the relationship between the measurements
made with both techniques.
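One common way to make this correction, assuming the relationship between the two methods is linear, is a regression estimator: fit a line to the n paired measurements, then adjust the mean of the accurate measurements by the difference between the cheap-method means of the full and paired samples. The sketch below is an illustration of that idea with invented data and names, not necessarily the exact procedure the slides had in mind.

```python
def double_sampling_mean(paired_cheap, paired_accurate, all_cheap):
    """Regression estimate of the mean of the accurate measurement.

    paired_cheap, paired_accurate: the n units measured by both methods.
    all_cheap: cheap-method measurements on all n' units.
    """
    n = len(paired_cheap)
    x_bar = sum(paired_cheap) / n
    y_bar = sum(paired_accurate) / n
    # Least-squares slope of accurate on cheap measurements.
    sxy = sum((x - x_bar) * (y - y_bar)
              for x, y in zip(paired_cheap, paired_accurate))
    sxx = sum((x - x_bar) ** 2 for x in paired_cheap)
    slope = sxy / sxx
    # Shift the accurate mean to the cheap-method mean of the full sample.
    x_bar_all = sum(all_cheap) / len(all_cheap)
    return y_bar + slope * (x_bar_all - x_bar)


# Hypothetical data: 6 units measured cheaply, 3 of them also measured accurately.
all_cheap = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
paired_cheap = [1.0, 2.0, 3.0]
paired_accurate = [2.0, 4.0, 6.0]
estimate = double_sampling_mean(paired_cheap, paired_accurate, all_cheap)
print(estimate)
```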
Double Sampling
Examples
• GIS interpretation
• Chemical assays
• Wildlife surveys
• Inventories
• Monitoring plots
Double Sampling
Double sampling will be more efficient than simple
random sampling if
• the underlying relationship between the methods is
linear
• optimum values of n and n' are used (Gilbert, 1987)
• $$\frac{C_A}{C_I} > \frac{\left(1 + \sqrt{1 - \rho^2}\right)^2}{\rho^2}$$
where $C_A$ is the cost of an accurate measurement,
$C_I$ is the cost of an inaccurate measurement, and
$\rho$ is the correlation coefficient between the methods.
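The cost condition can be checked numerically. The sketch below reads the inequality as C_A / C_I > (1 + sqrt(1 - rho^2))^2 / rho^2; the function names and the example costs are assumptions for illustration.

```python
import math


def cost_ratio_threshold(rho):
    """Smallest cost ratio C_A / C_I above which double sampling pays off."""
    return (1 + math.sqrt(1 - rho ** 2)) ** 2 / rho ** 2


def double_sampling_is_efficient(cost_accurate, cost_inaccurate, rho):
    """True if the cost ratio exceeds the threshold for this correlation."""
    return cost_accurate / cost_inaccurate > cost_ratio_threshold(rho)


# With a correlation of 0.9 the accurate method must cost roughly
# two and a half times as much as the cheap one.
print(round(cost_ratio_threshold(0.9), 3))
print(double_sampling_is_efficient(10.0, 1.0, rho=0.9))
print(double_sampling_is_efficient(5.0, 4.0, rho=0.9))
```

The threshold falls towards 1 as the correlation approaches 1, so a strongly correlated cheap method is worth using even when it is only slightly cheaper.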
Example of Double Sampling
(Gilbert 1987)
Contaminated soil at a nuclear weapons test facility in Nevada
[Figure: scatter plot of 239,240Pu (nCi/m2) against 241Am (nCi/m2), with fitted regression line]
y = 22112 + 18.06 (x - 1051.8)
ρ = 0.998