Impact of Sample Size
The Impact of Sample
Size
©Dr. B. C. Paul 2005 revised 2009
Note – The information supplied in these slides is regarded as common
knowledge to those familiar with basic statistics and similar material can be
found in most basic statistics text books. The slides include a shot of
computer output from the computer program SPSS.
Often Interested in Information
about averages
Hubbert’s Hammers makes clobber balls
for use in a doll recycling plant. Hardness
is important to determining the longevity
of the hammer balls. Herby has been
getting some customer complaints about
his balls not holding up and pulls a few off
the assembly line for testing. He gets
values of 3, 3.6, 4.2, 4.1, 2.7, 4.7, and 4.3.
The balls are supposed to have a hardness
of 4.5. Does Herby have a problem?
Herby Runs to SPSS, enters his
sample data
He gets an average of 3.8 and a Stdev of
0.73.
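For anyone without SPSS handy, the same two numbers can be checked with a minimal Python sketch. The variable names are made up for illustration; the data are Herby's seven readings from above. It assumes the sample (n - 1) standard deviation, which is the one that matches the 0.73 reported.

```python
import statistics

# Herby's seven hardness readings from the assembly line
hardness = [3, 3.6, 4.2, 4.1, 2.7, 4.7, 4.3]

mean = statistics.mean(hardness)
# statistics.stdev is the sample standard deviation (divides by n - 1)
stdev = statistics.stdev(hardness)

print(f"average = {mean:.2f}, stdev = {stdev:.2f}")
```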
Interpreting the Data
Everyone knows not every ball will be 4.5
hardness, but on average they need to be.
Herby knows that if he ran to his assembly
line and grabbed another 7 balls at
random he would get a different number.
Herby’s World
Herby knows that 95% of the time a sample of 7 grab balls
will be within 1.96 standard deviation units of the true mean.
$\mu = \bar{X} \pm 1.96\,\sigma$
(He’s spent too much time looking at normal distribution tables)
Herby Formulates a “Hypothesis
Test”
Herby thinks the endurance of his balls has gone
down.
The “null hypothesis” is that this one sad looking
sample is not enough to conclude the mean ball
hardness on the assembly line has changed
If the sample mean falls within 1.96 standard deviation
units of the target mean of 4.5, Herby can be 95%
certain the spec on his assembly line is still in
tolerance
If not Herby will reject the “null hypothesis” and
conclude that his assembly line is screwed
Oh gosh – get out the crosses and garlic –
we’re starting to sound like statisticians.
The “Alpha Level”
In reality Herby could grab 7 balls on a perfectly
normal assembly line and get any value
Yet Herby is going to declare a disaster if he does not
come out within 1.96 standard deviation units of his
target value
Because in the real world a sample could come
from anywhere, one of the decisions we have to
make is how willing are we to be wrong.
This is called setting our Alpha Level
Statistics is Like a Court of Law
Innocent until proven guilty
We start with a null hypothesis that Herby’s assembly
line is innocent – i.e. the mean really is still on target
We set our Alpha Level – we are willing to take a 5%
chance of hanging an innocent man
That’s how Herby got the 1.96 standard deviation units (95%
of all random samples he could take would be within 1.96
standard deviation units of the true mean)
Oh by the way – Herby had to assume the distribution of
hardness of balls on his assembly line pretty much fit a normal
distribution model
If we are only willing to take a 1% chance we would have to
say 2.575 standard deviation units
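The link between an alpha level and the number of standard deviation units can be sketched in a few lines of Python instead of a normal table. The helper name z_for_alpha is made up for illustration, and the sketch assumes a two-sided test on a normal distribution. The exact 1% value comes out near 2.576; the 2.575 used in these slides is the usual table rounding.

```python
from statistics import NormalDist

def z_for_alpha(alpha):
    """Two-sided critical z value: alpha is split evenly between both tails."""
    return NormalDist().inv_cdf(1 - alpha / 2)

for alpha in (0.05, 0.01):
    print(f"alpha = {alpha:.2f} -> z = {z_for_alpha(alpha):.3f}")
```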
Comments on Alpha Levels
People hate to be wrong
Tempting to say I don’t want any chance of
being wrong or some very tiny percentage
Null hypothesis becomes an excuse to do
nothing – if you do nothing you won’t be
wrong in what you did
And won’t be worth a darn thing to anyone
High confidence levels are expensive to get
You do see them in drug testing
Don’t Forget – It’s a Model
A warning on high confidence levels and modeling
reality
Normal distribution is a mathematical model
Normal distribution goes from plus to minus infinity
Many engineered phenomena and numbers have finite ranges of
values
Normal distribution model often fits the center and way out into
the fringes
But not all the way out to infinity
We use models because they are fully defined and because we can
make mistakes on models that we cannot afford with trial and error
in the real world
When we demand ridiculously low alpha levels we begin
exposing the lack of fit in our model
We still get answers but at some points they are “for adults and
entertainment purposes only”
OK – Let’s Get on With Herby’s Test
Plug into the Equation
$3.8 \pm 1.96 \times \sigma$
Holy Marshmallows! What do we use for
standard deviation?
Our standard deviation was the standard deviation for
individual samples – not averages
What’s the Big Deal About
Individual Samples and Averages?
In a large general ed class what kind of
range do you get on people’s test scores?
Ever noticed that certain professors’ average
test scores tend to come out about
the same value year after year?
Point- In a random sample, the standard
deviation of an average will always be less
than the standard deviation of individual
values.
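A quick way to see this without a proof is to simulate it. The sketch below is only an illustration under made-up assumptions: ball hardness is drawn from a normal distribution with a true mean of 4.5 and a standard deviation of 0.73 (values borrowed from Herby's story, not measured facts), and the balls are grouped into samples of 7.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable
TRUE_MEAN, TRUE_SD, N = 4.5, 0.73, 7  # illustrative values, assumed for the demo

# 7000 simulated individual balls, then averaged in groups of 7
individuals = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(7000)]
sample_means = [
    statistics.mean(individuals[i:i + N]) for i in range(0, len(individuals), N)
]

sd_individuals = statistics.stdev(individuals)
sd_means = statistics.stdev(sample_means)
print(f"spread of individual balls: {sd_individuals:.2f}")
print(f"spread of 7-ball averages:  {sd_means:.2f}")
```

The averages should come out much more tightly bunched than the individual balls, which is the whole point of the slide.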
OK- I Believe – Now Get Me the
Doggone Standard Deviation
For a random sample the standard deviation of
the mean is
$\sigma_{Mean} = \frac{\sigma_{samples}}{\sqrt{n}}$
where n = # samples used in the mean
If you think I’m going to try showing you the proof you’re out of
your mind.
OK – Let’s Roll
Our standard deviation of the mean is
$\sigma_{Mean} = \frac{0.73}{\sqrt{7}} = 0.276$
Plug into the magic equation
$4.34 = 3.8 + 1.96 \times 0.276$
Oh Crud – The Assembly Line is Turning Out Weak Balls!
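Herby's 5% alpha level test can be retraced with a short Python sketch. The variable names are made up for illustration; the numbers are the sample average, standard deviation, and sample size from the slides. It assumes, as Herby did, that hardness is roughly normally distributed.

```python
import math

sample_mean, sample_sd, n = 3.8, 0.73, 7
target = 4.5
z = 1.96  # standard deviation units for a 5% alpha level

# standard deviation of the mean, not of individual balls
sd_of_mean = sample_sd / math.sqrt(n)
lower_limit = sample_mean - z * sd_of_mean
upper_limit = sample_mean + z * sd_of_mean

print(f"standard deviation of the mean: {sd_of_mean:.3f}")
print(f"plausible range for the true mean: {lower_limit:.2f} to {upper_limit:.2f}")
if lower_limit <= target <= upper_limit:
    print("target is inside the range: cannot reject the null hypothesis")
else:
    print("target is outside the range: reject the null hypothesis")
```

The upper limit lands at about 4.34, short of the 4.5 target, which is why Herby rejects the null hypothesis here.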
What if We Had Set a Lower Alpha
Level
Plug and Chug for 1% Alpha Level
$4.51 = 3.8 + 2.575 \times 0.276$
Now we look ok
Note from the standard deviation formula that
larger samples suck in the standard deviation of the mean
If there really is a problem with Herby’s balls –
how big a sample will it take to see the problem?
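To get a feel for how fast a larger sample tightens things up, here is a rough Python sketch of the plus or minus part at a 1% alpha level for a few sample sizes. It assumes the sample standard deviation stays at 0.73 as more balls are grabbed, which is a guess and not something the data can promise; the function name half_width and the list of sample sizes are made up for illustration.

```python
import math

sample_sd = 0.73  # assumed to stay the same for bigger samples
z = 2.575         # 1% alpha level, as used in these slides

def half_width(n):
    """Plus or minus part of the interval for a sample of n balls."""
    return z * sample_sd / math.sqrt(n)

for n in (7, 15, 30, 100):
    print(f"n = {n:3d}: plus or minus {half_width(n):.2f}")
```

Because n sits under a square root, cutting the plus or minus part in half takes about four times as many balls.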
Need to Decide How Accurate You
Need Your Estimate to Be
Herby’s assembly line is supposed to turn out
balls of 4.5 hardness
How far out of spec can Herby tolerate things?
Suppose Herby decides he needs his estimates to be good to
within 0.5 hardness units.
Next Herby has to decide how much of a chance
he is willing to take that he will shut down the
line and issue recalls when nothing is really
wrong at all.
Suppose Herby wants 99% confidence (i.e. alpha
level is 1%)
99% of a normal distribution is within +/- 2.575 standard
deviation units of the true mean
Herby’s Task
Herby needs to detect a 0.5 hardness unit
departure from the 4.5 target hardness but still
have less than a 1% chance of shutting the line down by
mistake.
Formula is
$L = Z \times \frac{\sigma}{\sqrt{n}}$
where L is the minimum error that must be detected
and Z is the Z value for our alpha level
Note that this is just the plus or minus part of our
confidence interval formula
Doing the Math
First solve for our sample size needed
$n = \frac{Z^2 \times \sigma^2}{L^2}$
Then plug into the equation and solve
$n = \frac{2.575^2 \times 0.73^2}{0.5^2}$
n = 14.13, which as a practical matter means Herby needs a sample of 15
to actually achieve the desired accuracy with an acceptable risk.
Note – this also implies that higher confidence requires more money spent
on sampling and testing.
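The sample size arithmetic can be double-checked with a small Python sketch. The function name required_sample_size and its parameter names are made up for illustration; it simply evaluates the sample size formula from these slides and rounds up, since you cannot test a fraction of a ball. It assumes the 0.73 standard deviation from the first 7 balls is a fair stand-in for the whole line.

```python
import math

def required_sample_size(z, sigma, tolerance):
    """Return (exact n, n rounded up) for n = (z * sigma / tolerance) ** 2."""
    n_exact = (z * sigma / tolerance) ** 2
    return n_exact, math.ceil(n_exact)

n_exact, n_needed = required_sample_size(z=2.575, sigma=0.73, tolerance=0.5)
print(f"n = {n_exact:.2f}, so grab {n_needed} balls")
```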
Herby’s Assembly Line Analysis to
Date
Herby has grabbed a sample of 7 balls off the assembly
line
With this sample Herby is 95% sure he has a problem
with the hardness of the balls being produced
When Herby checked for only a 1% chance that he was
going to shut the line down for no reason at all Herby’s
sample could not furnish him enough certainty
To detect a 0.5 unit departure from the target hardness
of 4.5 and doing so with no more than a 1% chance of
stopping the line for a quirk of sampling Herby must take
a grab of 15 balls off the assembly line
Now It’s Your Turn
Do Homework Unit #2 homework 2
In this case the concern is that the ore fed
to the mill has to be within a narrow range
or the recovery at the mill will go to heck.
How many mining faces do you need to be
blending from to keep the ore grade in
tolerance?