Determining the Sample Size
Doing research costs…

Power of a hypothesis test is generally an increasing function of sample size.
Margin of error is generally a decreasing function of sample size.
Cost of research is generally an increasing function of sample size.
Who is paying the bills?
Before the research project begins…

Formulation of hypotheses.
Choice of significance/confidence levels.
What size of effect are we looking for?
What power do we want?
How many observations do we need?
Can we afford it?
Research Setting
You are a statistician working for a manufacturer. A process engineer is investigating the mean amount of time required to complete an assembly task. He wishes to show that the mean assembly time is less than 30 seconds. Past experience with similar studies leads him to believe that the assembly times will be approximately normally distributed, and that the sample range for 1000 observations will be approximately 9 seconds.

Formulation of hypotheses: In this case, Ha: μ < 30 sec., so that H0: μ ≥ 30 sec.

Choice of significance/confidence levels: We weigh the possible consequences of making either a Type I error or a Type II error, and decide that we want α = 0.05. We also want to obtain a 95% confidence interval estimate of the mean assembly time.

The engineer tells us that he wants to be able to detect a mean difference of 0.5 sec.; i.e., if the mean assembly time is 29.5 sec. or less, we should be able to detect it.

We want to be able to detect this size of effect with probability 0.90. We also want to be able to estimate the mean assembly time with a confidence interval of width 0.8 sec.
What size sample do we need?

Our test statistic is

$$T = \frac{\bar{X} - 30}{S/\sqrt{n}}.$$

Under H0 (with μ = 30), this statistic has a t(n−1) distribution. What is the distribution under Ha? It is noncentral t with n−1 degrees of freedom and noncentrality parameter

$$\delta = \frac{\mu - 30}{\sigma/\sqrt{n}}.$$
Power Analysis

If T_{n−1,δ} is a random variable having the above noncentral t distribution, then the power of the test is the probability that this r.v. falls below the critical value of the test. Thus the power depends on the true value of μ, the sample size n, and the true population standard deviation σ. We know what size of effect we want to be able to detect, but we need to know something about σ.
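For reference, the power can be written as an explicit function of the three unknowns. The notation π(μ, n, σ) is introduced here only for convenience, and t_{n−1,0.05} denotes the upper 5% point of the central t(n−1) distribution, so that the test rejects H0 when T < −t_{n−1,0.05}:

$$
\pi(\mu, n, \sigma) = P\left(T_{n-1,\delta} < -t_{n-1,0.05}\right), \qquad \delta = \frac{\mu - 30}{\sigma/\sqrt{n}}.
$$

For μ < 30, δ is negative, and it becomes more negative as n grows or as σ shrinks; this is the main reason why, for a fixed true mean below 30, the power rises with n and falls as σ increases.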
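“Guess-timating” σ
If the range of assembly times for 1000 observations is 9 sec., and if assembly time is a normally distributed r.v., then, since about 99.7% of a normal population lies within 3σ of the mean, the range spans roughly 6σ, and we can “guess-timate” that

$$\sigma \approx \frac{\text{Range}}{6} = \frac{9}{6} = 1.5 \text{ sec.}$$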
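As a quick check on the arithmetic, plugging the guess-timate σ ≈ 1.5 sec. and the alternative μ = 29.5 sec. into the noncentrality parameter leaves a function of n alone (n = 50 below is purely illustrative):

$$
\delta = \frac{29.5 - 30}{1.5/\sqrt{n}} = -\frac{\sqrt{n}}{3}, \qquad n = 50:\quad \delta = -\frac{\sqrt{50}}{3} \approx -2.357.
$$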
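Sample Size for Hypothesis Test
We then want to find n so that

$$P\left(T_{n-1,\delta} < -t_{n-1,0.05}\right) \ge 0.90,$$

where t_{n−1,0.05} is the upper 5% point of the t(n−1) distribution and δ is evaluated at μ = 29.5 sec. and σ = 1.5 sec.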
Can we use SAS to do this calculation?
If we run the following program using a
range of possible sample sizes, we will be
able to solve our problem.


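```sas
/* Power of the one-sided t test of H0: mu >= 30 vs Ha: mu < 30, */
/* evaluated at mu = 29.5 with sigma guess-timated as 1.5 sec.  */
data power;
   do n = 50 to 150;                    /* range of candidate sample sizes */
      cp    = tinv(0.05, n-1);          /* lower 5% critical value of t(n-1) */
      delta = 0.5/(1.5/sqrt(n));        /* |noncentrality| for a 0.5 sec. effect */
      power = probt(cp, n-1, -delta);   /* P(T < cp) under the alternative */
      output;
   end;
run;

/* The first n listed is the smallest sample size with power >= 0.90 */
proc print data=power noobs;
   where power >= 0.90;
run;
```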
Sample Size for Estimation
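The form of the confidence interval is

$$\bar{X} \pm t_{n-1,0.025}\,\frac{S}{\sqrt{n}}.$$

We want the margin of error to be 0.4 sec., i.e., an interval width of 0.8 sec. Because S is random, we work with the expected width, using E(S) = σ √(2/(n−1)) Γ(n/2) / Γ((n−1)/2) for a normal sample. We then want a value of n to satisfy the following inequality:

$$\frac{2\,t_{n-1,0.025}\,\sqrt{2}\,\Gamma(n/2)\,\sigma}{\sqrt{n(n-1)}\;\Gamma\!\left(\frac{n-1}{2}\right)} \le 0.8.$$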
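For a rough starting value, a large-sample shortcut can be applied to the width criterion: replacing t_{n−1,0.025} by z_{0.025} ≈ 1.96 and E(S) by σ (both assumptions that are only accurate for large n) gives

$$
\frac{2(1.96)(1.5)}{\sqrt{n}} \le 0.8 \;\Longrightarrow\; \sqrt{n} \ge 7.35 \;\Longrightarrow\; n \ge 54.0,
$$

so the search over n can reasonably start in the mid-50s; the exact criterion may require a slightly larger n.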
The following SAS program, run for a
range of values of n, will help us to solve
our problem.
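```sas
/* Expected width of the 95% CI for mu, with sigma guess-timated as 1.5 sec. */
data width;
   do n = 50 to 150;                              /* candidate sample sizes */
      cp     = -tinv(0.025, n-1);                 /* upper 2.5% point of t(n-1) */
      /* Gamma(n/2)/Gamma((n-1)/2) computed on the log scale; GAMMA() alone */
      /* overflows once n/2 exceeds about 171                               */
      gratio = exp(lgamma(n/2) - lgamma((n-1)/2));
      width  = 2*cp*sqrt(2)*gratio*1.5/sqrt(n*(n-1));
      output;
   end;
run;

proc print data=width noobs;
   where width <= 0.8;   /* first n listed meets the 0.8 sec. width target */
run;
```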
Sample Size


We had two criteria for the sample size;
one based on the power of the test, the
other based on the margin of error of
estimation.
We choose the larger of the two values
of n to be sure that we achieve both
the desired power and the desired
margin of error.


To obtain our sample sizes, we had to make an assumption about the value of σ. If our “guess-timate” for σ was too small, then our power would be less than the desired value and our margin of error would be larger than the desired value.
If our “guess-timate” was too large, then we might be wasting resources with a sample size that would be larger than necessary.
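To gauge how sensitive the design is to the guess-timate, a normal approximation to the power, π ≈ Φ(Δ√n/σ − z_{0.05}) with Δ = 0.5 sec. and z_{0.05} ≈ 1.645 (an approximation assumed here rather than the exact noncentral-t power), can be evaluated at an illustrative sample size of n = 78 for a few values of σ:

$$
\begin{array}{c|c}
\sigma \text{ (sec.)} & \text{approximate power at } n = 78 \\ \hline
1.2 & \Phi(2.035) \approx 0.979 \\
1.5 & \Phi(1.299) \approx 0.903 \\
2.0 & \Phi(0.563) \approx 0.713
\end{array}
$$

Underestimating σ by half a second would cut the power from about 0.90 to about 0.71, while overestimating it buys more power than the engineer asked for.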