
DQO Training Course
Day 1
Module 4
Key Concepts Underlying
DQOs and VSP
Presenter: Sebastian Tindall
(60 minutes)
(75 minute lunch break)
1 of 37
Key Points
• Have fun while learning key statistical concepts using hands-on illustrations
• This module prepares the way for a more in-depth look at the DQO Process and the use of VSP

2 of 37
The
Big
Picture
Schedule
Health
Risk
Sampling
Cost
Remediation
Cost
Decision
Error
Compliance
Waste
Disposal
Cost
3 of 37
Our Focus
Unnecessary
Disposal
and/or
Cleanup
Cost
Sampling
Cost
$
$
4 of 37
Balance in Sampling Design
The statistician’s aim in designing surveys and
experiments is to meet a desired degree of
reliability at the lowest possible cost under the
existing budgetary, administrative, and physical
limitations within which the work must be
conducted. In other words, the aim is efficiency: the most information (smallest error) for the
money.
Some Theory of Sampling,
Deming, W.E., 1950
5 of 37
Our Methodology:
Use Hands-On Illustrations of...


Basic statistical concepts needed for VSP
and the DQO Process
Using...
Visual
Sample
Plan
6 of 37
Our Methodology:
Use Hands-On Illustrations of...


Basic statistical concepts needed for VSP
and the DQO Process
Using coin flips
– Pennies
  • Demo #1
  • Demo #2
– Quarter
7 of 37
How Many Samples
Should We Take?
5?
50?
8 of 37
How Many Times Should I Flip a
Coin Before I Decide it is
Contaminated (Biased Tails)?
One tail, 50%
Two tails, 25%
Three tails, 12.5%
Four tails, 6%
Five tails, 3%
Six tails, 1.6%
Seven tails, 0.8%
Eight tails, 0.4%
Nine tails, 0.2%
Ten tails, 0.1%
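The percentages above are the chance that a fair coin produces that many tails in a row, which is 0.5 raised to the number of flips. A minimal sketch of the arithmetic follows; the function names and the 5% threshold are illustrative assumptions, not part of the course material.

```python
def prob_all_tails(n_flips: int, p_tail: float = 0.5) -> float:
    """Chance that n_flips independent flips all land tails."""
    return p_tail ** n_flips


def flips_needed(threshold: float, p_tail: float = 0.5) -> int:
    """Smallest run of tails whose chance under a fair coin is at or below threshold."""
    n = 1
    while prob_all_tails(n, p_tail) > threshold:
        n += 1
    return n


# Reproduce the table: one through ten tails in a row with a fair coin.
for n in range(1, 11):
    print(f"{n:2d} tails in a row: {prob_all_tails(n):.2%}")

# Assumed (hypothetical) tolerance: call the coin biased only if a fair coin
# would produce the run less than 5% of the time.
print("Flips needed at a 5% threshold:", flips_needed(0.05))
```

With the assumed 5% threshold, four tails in a row (6.25%) is not enough evidence, while five in a row (about 3%) is.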
9 of 37
Football Field
One-Acre
Football Field
30'0"
10 of 37
Example Problem
• A 1-acre field was contaminated with mill tailings in the 1960s
• Cleanup standard:
  – “The mean 226Ra concentration in the upper 6 in. of soil must be less than 6.0 pCi/g.”
• There is a good chance that the actual mean 226Ra concentration is between 4.0 and 6.0 pCi/g

11 of 37
Example Problem (cont.)
• Historical data suggest a standard deviation of 1.6 pCi/g
• It costs $1,000 to collect, process, and analyze one sample
• The maximum sampling budget is $5,000

12 of 37
Graph of Perfect Decision Making
[Figure: the ideal decision rule. Vertical axis: chance of deciding the site is dirty, from 0.0 to 1.0. Horizontal axis: true mean 226Ra concentration, low to high, with the 6 pCi/g action level marked.]
13 of 37
Graph of Typical Decision Making
[Figure: a typical decision curve. Vertical axis: chance of deciding the site is dirty, from 0.0 to 1.0. Horizontal axis: true mean 226Ra concentration, low to high, with the 6 pCi/g action level marked.]
14 of 37
Marbles
Color          Ra-226, pCi/g
Clear          3
White          4
Green          5
Red            6
Dark Yellow    7
Blue           8
Black          9
15 of 37
Simplified Decision Process
• Take some number of samples
• Find the average 226Ra concentration in our samples
• If we pass the appropriate QA/G-9 test, decide the site is clean
• If we fail the appropriate QA/G-9 test, decide the site is dirty

16 of 37
Example of Ad Hoc Sampling
Design and the Results
• Suppose we choose to take 5 samples for various reasons: low cost, tradition, convenience, etc.
• Need a volunteer to do the sampling
• Need a volunteer to record results
• We will follow the QA/G-9 One-Sample t-Test directions using an Excel spreadsheet

17 of 37
One-Sample t-Test Equation
from EPA’s Practical Methods
for Data Analysis, QA/G-9
$$\text{Calculated } t = \frac{\text{sample mean} - \text{AL}}{\text{std. dev} / \sqrt{n}}$$
If calculated t is less than table value,
decide site is clean
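The t calculation can be sketched in a few lines of Python instead of a spreadsheet. The five readings below are hypothetical marble draws, not course data. The sketch also assumes that "table value" means the negative one-sided 95% critical value for n - 1 degrees of freedom (2.132 for 4 degrees of freedom, from a standard t table), which is the usual convention when the starting assumption is that the site is dirty.

```python
import math
import statistics

ACTION_LEVEL = 6.0  # pCi/g, cleanup standard from the example problem
T_TABLE = 2.132     # one-sided 95% t value, 4 degrees of freedom (standard table)


def one_sample_t(sample, action_level):
    """Calculated t = (sample mean - AL) / (std. dev / sqrt(n))."""
    n = len(sample)
    sample_mean = statistics.mean(sample)
    std_dev = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
    return (sample_mean - action_level) / (std_dev / math.sqrt(n))


# Hypothetical readings in pCi/g (illustration only).
readings = [4, 5, 6, 5, 7]
t_calc = one_sample_t(readings, ACTION_LEVEL)

# Assumed decision rule: the site is declared clean only if t falls below -T_TABLE.
decision = "clean" if t_calc < -T_TABLE else "dirty"
print(f"calculated t = {t_calc:.3f}, decision: {decision}")
```

For these hypothetical readings the sample mean (5.4) is below the action level, but not by enough relative to the scatter in five samples, so the site is still declared dirty.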
18 of 37
Comparing UCL to Action Level is Like Student’s t-Test
[Figure: four sample means (X) and their UCLs plotted on a true mean 226Ra concentration axis running from 2 to 8, with the action level at 6.]
UCL = 4: 4 - 6 = -2
UCL = 5: 5 - 6 = -1
UCL = 7: 7 - 6 = 1
UCL = 8: 8 - 6 = 2
19 of 37
20 of 37
Key Concepts Defined
Latin Letters   Concepts
N               population size (population unit)
n               number of samples
x               sample mean is a statistic
s               sample standard deviation is a statistic
H0              null hypothesis (action level)

Greek Letters   Concepts
μ               population mean is a statistical parameter
σ               population standard deviation is a statistical parameter
α               alpha error rate
β               beta error rate
Δ               width of the gray region
21 of 37
Learn the Jargon
• t-test
• UCL - upper confidence limit
• AL - action level
• N - target population
• n - population units sampled
• μ - population mean
• x - sample mean
• σ - population standard deviation
• s - sample standard deviation
• H0 - null hypothesis
• α - alpha error rate
• β - beta error rate
• Δ - width of gray region
22 of 37
t-test
$$\text{Calculated } t = \frac{\text{sample mean} - \text{AL}}{s / \sqrt{n}}$$
If calculated t is less than table value, decide
site is clean
23 of 37
Upper Confidence Limit, UCL
For a 95% UCL and assuming sufficient n:
If you repeatedly calculate 95% UCLs for many
independent random sampling events, in the long
run, you would be correct 95% of the time in
claiming that the true mean is less than or equal to
your UCLs.
Note: Different sample means (X̄) will produce different UCLs
$$UCL = \bar{X} + t_{1-\alpha,\,df}\,\frac{s}{\sqrt{n}}$$
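As a rough illustration of the UCL formula, the sketch below computes a 95% UCL for five hypothetical readings (not course data) and checks that comparing the UCL with the action level leads to the same decision as comparing the calculated t with the negative table value. The t value of 2.132 is the standard one-sided 95% table entry for 4 degrees of freedom.

```python
import math
import statistics

ACTION_LEVEL = 6.0  # pCi/g, from the example problem
T_TABLE = 2.132     # t for 1 - alpha = 0.95 and df = n - 1 = 4 (standard table)


def upper_confidence_limit(sample, t_value):
    """UCL = sample mean + t * (s / sqrt(n))."""
    n = len(sample)
    return statistics.mean(sample) + t_value * statistics.stdev(sample) / math.sqrt(n)


def calculated_t(sample, action_level):
    """Calculated t = (sample mean - AL) / (s / sqrt(n))."""
    n = len(sample)
    return (statistics.mean(sample) - action_level) / (
        statistics.stdev(sample) / math.sqrt(n)
    )


# Hypothetical readings in pCi/g (illustration only).
readings = [4, 5, 6, 5, 7]
ucl = upper_confidence_limit(readings, T_TABLE)
t_calc = calculated_t(readings, ACTION_LEVEL)

clean_by_ucl = ucl < ACTION_LEVEL    # UCL below the action level
clean_by_t = t_calc < -T_TABLE       # t below the negative table value
print(f"UCL = {ucl:.2f} pCi/g, clean by UCL: {clean_by_ucl}, clean by t: {clean_by_t}")
```

The two comparisons agree because UCL < AL is an algebraic rearrangement of t < -t(table).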
24 of 37
Upper Confidence Limit, UCL
A more common interpretation, though some experts dislike it:
For a single UCL, you are 95% confident that the
true mean is less than or equal to your calculated
UCL.
(See Hahn and Meeker, Statistical Intervals: A
Guide for Practitioners, p. 31.)
25 of 37
Action Level
A measurement threshold value of the Population
Parameter (e.g., true mean) that provides the criterion for
choosing among alternative actions.
26 of 37
N
Target Population: The set of N population units about which
inferences will be made
Population Units: The N objects (environmental units) that
make up the target or sampled population
n
The number of population units selected and measured is n
27 of 37
10 x 10 Field
Population = All 100 Population Units
28 of 37
10 x 10 Field
Population = All 100 Population Units
Sample = 5 Population Units
Measured values: 1.5, 1.9, 2.3, 1.7, 1.5
29 of 37
Population Mean
μ
The average of all N population units
$$\mu = \frac{1}{N}\sum_{i=1}^{N} X_i$$
Sample Mean
X̄
The average of the n population units actually measured
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$
30 of 37
Population Standard Deviation
σ
The average deviation of all N population units from the
population mean
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N}\left(X_i - \mu\right)^2}{N}}$$
Sample Standard Deviation
s
The “average” deviation of the n measured units from the
sample mean
$$s = \sqrt{\frac{\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2}{n - 1}}$$
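The sample mean and sample standard deviation formulas can be checked by hand on the five values measured in the 10 x 10 field illustration (1.5, 1.9, 2.3, 1.7, 1.5). This is a minimal sketch; the function names are illustrative. The population formula is shown only to highlight the different divisor, since the other 95 population units were not measured.

```python
import math


def mean(values):
    """Average: (1 / n) * sum of X_i."""
    return sum(values) / len(values)


def sample_std(values):
    """s: squared deviations from the sample mean, divided by n - 1."""
    x_bar = mean(values)
    return math.sqrt(sum((x - x_bar) ** 2 for x in values) / (len(values) - 1))


def population_std(values):
    """sigma: squared deviations from the mean, divided by N (needs every unit)."""
    mu = mean(values)
    return math.sqrt(sum((x - mu) ** 2 for x in values) / len(values))


measured = [1.5, 1.9, 2.3, 1.7, 1.5]  # the 5 sampled units from the 10 x 10 field

x_bar = mean(measured)
s = sample_std(measured)
print(f"sample mean = {x_bar:.2f}, sample standard deviation = {s:.3f}")

# Using the N divisor on the same 5 values gives a smaller number,
# which is why the sample formula divides by n - 1 instead.
print(f"same data with the N divisor: {population_std(measured):.3f}")
```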
31 of 37
The Null Hypothesis
H0
The initial assumption about how the
true mean relates to the action level
Example: The site is dirty. (We’ll
assume this for the rest of this
discussion)
$$H_0: \mu \ge \text{Action Level}$$
32 of 37
The Alternative Hypothesis
HA
The alternative hypothesis is
accepted only when there is
overwhelming evidence that the null
condition is false.
$$H_A: \mu < \text{Action Level}$$
33 of 37
(Null Hypothesis = Site is Dirty)
The Alpha Error Rate (Type 1, False +)
α
The chance of deciding that a dirty site is clean
when the true mean is equal to the action level
The Beta Error Rate (Type 2, False -)
β
The chance of deciding a clean site is dirty when
the true mean is equal to the lower bound of the
gray region (LBGR)
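Both error rates can be estimated by simulation for the ad hoc five-sample design. The sketch below rests on several assumptions that are not stated in the course material: measurements are normally distributed with the historical standard deviation of 1.6 pCi/g, the LBGR is taken as 4.0 pCi/g (the low end of the range given in the example problem), and the site is declared clean only when the calculated t falls below the negative one-sided 95% table value (2.132 for 4 degrees of freedom).

```python
import math
import random
import statistics

ACTION_LEVEL = 6.0  # pCi/g, from the example problem
LBGR = 4.0          # pCi/g, assumed lower bound of the gray region
SIGMA = 1.6         # pCi/g, historical standard deviation from the example
N_SAMPLES = 5       # ad hoc design
T_TABLE = 2.132     # one-sided 95% t value, 4 degrees of freedom (standard table)


def chance_of_deciding_dirty(true_mean, trials=4000, seed=0):
    """Fraction of simulated sampling events that end with a 'dirty' decision."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    dirty = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, SIGMA) for _ in range(N_SAMPLES)]
        t = (statistics.mean(sample) - ACTION_LEVEL) / (
            statistics.stdev(sample) / math.sqrt(N_SAMPLES)
        )
        if not t < -T_TABLE:  # null hypothesis (dirty) is not rejected
            dirty += 1
    return dirty / trials


# Alpha: deciding clean when the true mean sits exactly at the action level.
alpha_est = 1.0 - chance_of_deciding_dirty(ACTION_LEVEL)
# Beta: deciding dirty when the true mean sits at the LBGR.
beta_est = chance_of_deciding_dirty(LBGR)

print(f"estimated alpha = {alpha_est:.3f}, estimated beta = {beta_est:.3f}")
```

Under these assumptions alpha stays near the 5% built into the table value, while beta for only five samples is considerably larger. Calling `chance_of_deciding_dirty` over a range of true means traces out a curve of the kind shown in the "Graph of Typical Decision Making" slide.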
34 of 37
The Width of Gray Region
$$AL - \mu_1 = \Delta$$
Gray Region = AL - LBGR
The lower bound of the gray region (μ1)
is defined as the hypothetical true mean
concentration where the site should be declared
clean with a reasonably high probability
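Once the gray region width is set, a required number of samples can be estimated and compared with the budget. The sketch below uses a commonly cited approximation for the one-sample t-test, n = σ²(z₁₋α + z₁₋β)²/Δ² + 0.5·z₁₋α². The formula itself, α = β = 0.05, and an LBGR of 4.0 pCi/g are assumptions made for illustration; the course material quoted here does not state them. The standard deviation, cost per sample, and budget come from the example problem.

```python
import math
from statistics import NormalDist


def samples_needed(sigma, delta, alpha, beta):
    """Approximate n for a one-sample t-test (assumed formula, see lead-in)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # standard normal quantile
    z_beta = NormalDist().inv_cdf(1 - beta)
    n = (sigma ** 2) * (z_alpha + z_beta) ** 2 / delta ** 2 + 0.5 * z_alpha ** 2
    return math.ceil(n)  # round up to a whole number of samples


SIGMA = 1.6               # pCi/g, historical standard deviation
ACTION_LEVEL = 6.0        # pCi/g
LBGR = 4.0                # pCi/g, assumed lower bound of the gray region
DELTA = ACTION_LEVEL - LBGR
COST_PER_SAMPLE = 1000    # dollars
BUDGET = 5000             # dollars

n_required = samples_needed(SIGMA, DELTA, alpha=0.05, beta=0.05)  # assumed error rates
n_affordable = BUDGET // COST_PER_SAMPLE

print(f"samples required: {n_required}, samples affordable: {n_affordable}")
```

Under these assumed error rates the estimate exceeds what the budget allows, which is the kind of trade-off among sampling cost, gray region width, and error rates that the DQO Process is meant to surface.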
35 of 37
Summary
Decisions about population parameters, such
as the true mean, μ, and the true standard
deviation, σ, are based on statistics such as
the sample mean, X̄, and the sample standard
deviation, s. Since these decisions are based
on incomplete information, they can be in
error.
36 of 37
End of Module 4
Thank you
Questions?
We will now take a 75 minute lunch break.
Please be back at 1:00 pm.
37 of 37