Impact of Sample Size


The Impact of Sample Size
©Dr. B. C. Paul 2005 revised 2009
Note – The information supplied in these slides is regarded as common
knowledge to those familiar with basic statistics and similar material can be
found in most basic statistics textbooks. The slides include a shot of
computer output from the computer program SPSS.
Often Interested in Information about Averages

Hubbert’s Hammers makes clobber balls
for use in a doll recycling plant. Hardness
is important to determining the longevity
of the hammer balls. Herby has been
getting some customer complaints about
his balls not holding up and pulls a few off
the assembly line for testing. He gets
values of 3, 3.6, 4.2, 4.1, 2.7, 4.7, and 4.3.
The balls are supposed to have a hardness
of 4.5. Does Herby have a problem?
Herby Runs to SPSS, Enters His Sample Data

He gets an average of 3.8 and a Stdev of
0.73.
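For anyone without SPSS handy, the same two numbers can be checked with a few lines of Python. This is only a sketch: the variable names are made up for illustration, and it assumes SPSS reported the sample standard deviation (the n - 1 version), which is what reproduces 0.73 from these seven values.

```python
import statistics

# Herby's seven hardness readings from the slide
hardness = [3, 3.6, 4.2, 4.1, 2.7, 4.7, 4.3]

mean = statistics.mean(hardness)
# statistics.stdev divides by n - 1 (sample standard deviation);
# assumed here to match what SPSS reported
stdev = statistics.stdev(hardness)

print(f"mean = {mean:.2f}, stdev = {stdev:.2f}")
```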
Interpreting the Data


Everyone knows not every ball will be 4.5
hardness, but on average they need to be.
Herby knows that if he ran to his assembly
line and grabbed another 7 balls at
random he would get a different number.
Herby’s World
Herby knows that 95% of the time the average of a grab sample of 7 balls
will be within 1.96 standard deviation units of the true mean.

$\mu = \bar{X} \pm 1.96\,\sigma$

(He’s spent too much time looking at normal distribution tables.)
Herby Formulates a “Hypothesis Test”


Herby thinks the endurance of his balls has gone
down.
The “null hypothesis” is that this one sad looking
sample is not enough to conclude the mean ball
hardness on the assembly line has changed



If the sample mean falls within 1.96 standard deviation
units of the target mean of 4.5, Herby has no evidence at the
95% confidence level that his assembly line has drifted out of
tolerance
If not Herby will reject the “null hypothesis” and
conclude that his assembly line is screwed
Oh gosh – get out the crosses and garlic –
we’re starting to sound like statisticians.
The “Alpha Level”

In reality Herby could grab 7 balls on a perfectly
normal assembly line and get any value


Yet Herby is going to declare a disaster if he does not
come out within 1.96 standard deviation units of his
target value
Because in the real world a sample could come
from anywhere, one of the decisions we have to
make is how willing are we to be wrong.

This is called setting our Alpha Level
Statistics is Like a Court of Law

Innocent until proven guilty


We start with a null hypothesis that Herby’s assembly
line is innocent – i.e. the mean really is still on target
We set our Alpha Level – we are willing to take a 5%
chance of hanging an innocent man

That’s how Herby got the 1.96 standard deviation units (95%
of all random samples he could take would be within 1.96
standard deviation units of the true mean)


Oh by the way – Herby had to assume the distribution of
hardness of balls on his assembly line pretty much fit a normal
distribution model
If we are only willing to take a 1% chance we would have to
say 2.575 standard deviation units
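The 1.96 and 2.575 values come out of normal distribution tables, but they can also be computed. The sketch below uses `NormalDist` from Python's standard library; the helper name `z_for_alpha` is made up for illustration. It splits the alpha level evenly between the two tails, which is the two-sided setup the slides assume. The exact 1% value is about 2.576, so the 2.575 used in these slides is a rounded table value.

```python
from statistics import NormalDist

def z_for_alpha(alpha):
    """Two-sided critical value: alpha / 2 is left in each tail."""
    return NormalDist().inv_cdf(1 - alpha / 2)

z95 = z_for_alpha(0.05)  # 5% chance of hanging an innocent man
z99 = z_for_alpha(0.01)  # 1% chance

print(f"alpha = 5%: z = {z95:.3f}")
print(f"alpha = 1%: z = {z99:.3f}")
```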
Comments on Alpha Levels

People hate to be wrong


Tempting to say I don’t want any chance of
being wrong or some very tiny percentage
Null hypothesis becomes an excuse to do
nothing – if you do nothing you won’t be
wrong in what you did


And won’t be worth a darn thing to anyone
High confidence levels are expensive to get

You do see them in drug testing
Don’t Forget – It’s a Model

A warning on high confidence levels and modeling
reality

Normal distribution is a mathematical model

We use models because they are fully defined and because we can
make mistakes on models that we cannot afford with trial and error
in the real world

Normal distribution goes from plus to minus infinity

Many engineered phenomena and numbers have finite ranges of
values

Normal distribution model often fits the center and way out into
the fringes

But not all the way out to infinity
When we demand ridiculously low alpha levels we begin
exposing the lack of fit in our model

We still get answers but at some points they are “for adults and
entertainment purposes only”
OK – Let’s Get on With Herby’s Test

Plug into the Equation
$\mu = 3.8 \pm 1.96\,\sigma$

Holy Marshmallows! What do we use for
standard deviation?

Our standard deviation was the standard deviation for
individual samples – not averages
What’s the Big Deal About Individual Samples and Averages?



In a large general ed class what kind of
range do you get on people’s test scores?
Ever noticed that certain professors’
average test scores tend to come out about
the same value year after year?
Point- In a random sample, the standard
deviation of an average will always be less
than the standard deviation of individual
values.
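A quick simulation can make the point concrete. This is a sketch under stated assumptions: the "true" assembly line here is hypothetical (normal, mean 4.5, standard deviation 0.73), the seed is fixed only so the run is repeatable, and the variable names are invented for the example.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical assembly line: these are assumed values, not measurements
TRUE_MEAN, TRUE_SD, N = 4.5, 0.73, 7

# 7000 individual balls, then grouped into 1000 grab samples of 7
individuals = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(7000)]
averages = [
    statistics.mean(individuals[i:i + N])
    for i in range(0, len(individuals), N)
]

sd_individuals = statistics.stdev(individuals)
sd_averages = statistics.stdev(averages)

print(f"spread of individual balls:    {sd_individuals:.3f}")
print(f"spread of 7-ball averages:     {sd_averages:.3f}")
```

The averages bunch up much more tightly than the individual balls do, which is the same thing the professor's year-after-year class average is showing.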
OK- I Believe – Now Get Me the Doggone Standard Deviation

For a random sample the standard deviation of
the mean is

$\sigma_{\text{mean}} = \dfrac{\sigma_{\text{samples}}}{\sqrt{n}}$

where n = the number of samples used in the mean.
If you think I’m going to try showing you the proof, you’re out of
your mind.
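OK – Let’s Roll

Our standard deviation of the mean is
$\sigma_{\text{mean}} = \dfrac{0.73}{\sqrt{7}} = 0.276$

Plug into the magic equation
$4.34 = 3.8 + 1.96 \times 0.276$
The upper limit of 4.34 falls short of the 4.5 target.
Oh Crud – The Assembly Line is Turning Out Weak Balls!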
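The arithmetic on this slide can be sketched in Python as follows. The variable names are made up for illustration, and the inputs are the rounded values from the slides (3.8, 0.73, 1.96) rather than the raw data.

```python
import math

sample_mean, sample_sd, n = 3.8, 0.73, 7
target = 4.5
z = 1.96  # 5% alpha level

# Standard deviation of the mean, not of individual balls
sd_of_mean = sample_sd / math.sqrt(n)

lower = sample_mean - z * sd_of_mean
upper = sample_mean + z * sd_of_mean
target_inside = lower <= target <= upper

print(f"sd of mean = {sd_of_mean:.3f}")
print(f"95% range  = {lower:.2f} to {upper:.2f}")
print("target in range" if target_inside else "target outside range: reject the null hypothesis")
```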
What if We Had Set a Lower Alpha Level

Plug and Chug for 1% Alpha Level
$4.51 = 3.8 + 2.575 \times 0.276$

Now we look ok (the 4.5 target sits just inside the 4.51 upper limit)
Note from standard deviation formula that
larger samples suck in the standard deviation

If there really is a problem with Herby’s balls –
how big a sample will it take to see the problem?
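Running the same check at both alpha levels side by side shows how the verdict flips. A minimal sketch, using the Z values from the slides (1.96 and 2.575); the `results` dictionary is just an illustrative way to hold the answers. Only the upper limit is compared because the target of 4.5 is above the sample mean.

```python
import math

sample_mean, sample_sd, n, target = 3.8, 0.73, 7, 4.5
sd_of_mean = sample_sd / math.sqrt(n)

results = {}
for alpha, z in [(0.05, 1.96), (0.01, 2.575)]:
    upper = sample_mean + z * sd_of_mean
    # Target is above the sample mean, so only the upper limit matters
    results[alpha] = (round(upper, 2), target <= upper)
    verdict = "looks ok" if target <= upper else "reject the null hypothesis"
    print(f"alpha = {alpha:.0%}: upper limit = {upper:.2f} -> {verdict}")
```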
Need to Decide How Accurate You Need Your Estimate to Be

Herby’s assembly line is supposed to turn out
balls of 4.5 hardness

How far out of spec can Herby tolerate things?


Suppose Herby decides he needs his estimates to be good to
within 0.5 hardness units.
Next Herby has to decide how much of a chance
he is willing to take that he will shut down the
line and issue recalls when nothing is really
wrong at all.

Suppose Herby wants 99% confidence (i.e. alpha
level is 1%)

99% of a normal distribution is within +/- 2.575 standard
deviation units of the true mean
Herby’s Task


Herby needs to detect a 0.5 hardness unit
departure from the 4.5 target hardness but still
have less than a 1% chance of shutting the line down by
mistake.
Formula is

$L = \dfrac{Z\,\sigma}{\sqrt{n}}$

where L is the minimum error that must be detected and
Z is the Z value for our alpha level.
Note that this is just the plus or minus part of our
confidence interval formula
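To get a feel for how the plus or minus part shrinks as the sample grows, the formula can be evaluated for a few sample sizes. This is a sketch: the helper name `margin` and the particular sample sizes tried are arbitrary choices for illustration, and the 0.73 standard deviation is taken from Herby's first sample.

```python
import math

def margin(z, sd, n):
    """Plus-or-minus part of the confidence interval: Z * sigma / sqrt(n)."""
    return z * sd / math.sqrt(n)

z, sd = 2.575, 0.73  # 1% alpha level, Herby's sample standard deviation

margins = {n: margin(z, sd, n) for n in (7, 10, 20, 30)}
for n, m in margins.items():
    flag = "good enough" if m <= 0.5 else "too loose"
    print(f"n = {n:2d}: +/- {m:.3f} hardness units ({flag})")
```

With the original 7 balls the margin is wider than the 0.5 units Herby wants, so that sample cannot do the job at the 1% alpha level.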
Doing the Math

First solve for our sample size needed

$n = \dfrac{Z^2 \sigma^2}{L^2}$

Then plug into the equation and solve

$n = \dfrac{2.575^2 \times 0.73^2}{0.5^2} = 14.13$

n = 14.13, which as a practical matter means we need a sample of 15
to actually achieve the desired accuracy with an acceptable risk.
Note – this also implies that higher confidence requires more money spent
on sampling and testing.
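The same calculation as a short Python sketch, with illustrative variable names. Rounding up with `math.ceil` is the "practical matter" step: you cannot test 0.13 of a ball, and rounding down would leave the margin slightly too wide.

```python
import math

z = 2.575   # Z value for a 1% alpha level
sd = 0.73   # standard deviation from Herby's first sample
L = 0.5     # minimum error that must be detected

n_exact = (z * sd / L) ** 2
n_needed = math.ceil(n_exact)  # always round up to a whole ball

print(f"n = {n_exact:.2f}, so grab {n_needed} balls")
```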
Herby’s Assembly Line Analysis to Date




Herby has grabbed a sample of 7 balls off the assembly
line
With this sample Herby is 95% sure he has a problem
with the hardness of the balls being produced
When Herby checked for only a 1% chance that he was
going to shut the line down for no reason at all Herby’s
sample could not furnish him enough certainty
To detect a 0.5 unit departure from the target hardness
of 4.5 and doing so with no more than a 1% chance of
stopping the line for a quirk of sampling Herby must take
a grab of 15 balls off the assembly line
Now It’s Your Turn


Do Homework Unit #2 homework 2
In this case the concern is that the ore fed
to the mill has to be within a narrow range
or the recovery at the mill will go to heck.
How many mining faces do you need to be
blending from to keep the ore grade in
tolerance?