Two-Sample Inference Procedures


Two-Sample Inference Procedures with Means
Suppose we have a population of adult men with a mean height of 71 inches and a standard deviation of 2.6 inches. We also have a population of adult women with a mean height of 65 inches and a standard deviation of 2.3 inches. Assume heights are normally distributed.
Describe the distribution of the difference in heights between males and females (male - female).
Normal distribution with $\mu_{x-y} = 6$ inches & $\sigma_{x-y} = 3.471$ inches
[Figure: distributions of female heights (mean 65), male heights (mean 71), and the difference = male - female (mean 6)]
Remember:
$\mu_{x-y} = \mu_x - \mu_y$
$\sigma_{x-y} = \sqrt{\sigma_x^2 + \sigma_y^2}$
We will be interested in the difference of means, so we will use this to find the standard error.
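As a quick check of the rules above, here is a minimal Python sketch using the height figures from the example. The variable names are illustrative only, and the variance-addition rule assumes the two heights are independent.

```python
import math

# Population parameters from the height example (inches).
mu_men, sigma_men = 71.0, 2.6
mu_women, sigma_women = 65.0, 2.3

# Means subtract: mu_(x-y) = mu_x - mu_y
mu_diff = mu_men - mu_women

# Variances ADD for independent variables, even when we subtract:
# sigma_(x-y) = sqrt(sigma_x^2 + sigma_y^2)
sigma_diff = math.sqrt(sigma_men**2 + sigma_women**2)

print(f"mean = {mu_diff:.1f} in, sd = {sigma_diff:.3f} in")
```

The printed values should agree with the 6 inches and 3.471 inches stated on the slide.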
We will do the calculator
simulation in class.
a) What is the probability that the
mean height of 30 men is at most 5
inches taller than the mean height of
30 women?
$P(\bar{x}_m - \bar{x}_w \le 5) = .0573$
b) What is the 70th percentile for the
difference (male-female) in mean
heights of 30 men and 30 women?
6.332 inches
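Parts a) and b) can be reproduced without a calculator. The sketch below uses Python's standard-library `statistics.NormalDist`; the variable names are illustrative, and it assumes the two samples of 30 are independent, so the standard deviation of the difference in sample means is $\sqrt{\sigma_m^2/30 + \sigma_w^2/30}$.

```python
import math
from statistics import NormalDist

n_men = n_women = 30
mu_diff = 71.0 - 65.0  # mean of (x-bar_m - x-bar_w)

# SD of the difference in sample means (population sigmas are known here).
sd_diff = math.sqrt(2.6**2 / n_men + 2.3**2 / n_women)

diff_of_means = NormalDist(mu=mu_diff, sigma=sd_diff)

# a) P(x-bar_m - x-bar_w <= 5)
p_at_most_5 = diff_of_means.cdf(5)

# b) 70th percentile of the difference in sample means
pct_70 = diff_of_means.inv_cdf(0.70)

print(f"a) {p_at_most_5:.4f}")
print(f"b) {pct_70:.3f} inches")
```

Both results should be close to the slide's answers of .0573 and 6.332 inches.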
Two-Sample Procedures with Means
• The goal of these inference
procedures is to compare the
responses to two treatments or
to compare the characteristics
of two populations.
• We have INDEPENDENT samples
from each treatment or
population
Assumptions:
• Have two SRS’s from the
populations or two randomly
assigned treatment groups
• Samples are independent
• Both populations are normally
distributed
– Have large sample sizes
– Graph BOTH sets of data
• σ's are known/unknown
Formulas
Since in real life we will NOT know both σ's, we will use t-procedures.
Degrees of Freedom
Option 1: use the smaller of the two
values n1 – 1 and n2 – 1
This will produce conservative
results – higher p-values & lower
confidence.
Option 2: approximation used by technology
$df = \dfrac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{1}{n_1 - 1}\left(\dfrac{s_1^2}{n_1}\right)^2 + \dfrac{1}{n_2 - 1}\left(\dfrac{s_2^2}{n_2}\right)^2}$
Calculator does this automatically!
Confidence intervals:
CI = statistic ± critical value × SD of statistic
(The SD of the statistic is called the standard error.)
$(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$
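The two degrees-of-freedom options can be compared with a short Python sketch. The function names `conservative_df` and `welch_df` are hypothetical, chosen for illustration; the inputs are the sample statistics from the headache-remedy example later in this transcript (s = 8.7 and 7.5, n = 12 each).

```python
def conservative_df(n1: int, n2: int) -> int:
    """Option 1: the smaller of n1 - 1 and n2 - 1."""
    return min(n1 - 1, n2 - 1)


def welch_df(s1: float, n1: int, s2: float, n2: int) -> float:
    """Option 2: the approximation that technology uses."""
    v1 = s1**2 / n1  # s1^2 / n1
    v2 = s2**2 / n2  # s2^2 / n2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))


print(conservative_df(12, 12))
print(round(welch_df(8.7, 12, 7.5, 12), 2))
```

Option 2 never gives fewer degrees of freedom than Option 1, which is why Option 1 is the conservative choice.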
Pooled procedures:
• Used for two populations with the
same variance
• When you pool, you average the two sample variances to estimate the common population variance.
• DO NOT use on AP Exam!!!!!
We do NOT know the variances of the population,
so ALWAYS tell the calculator NO for pooling!
Two competing headache remedies claim to give fast-acting relief. An experiment was performed to
compare the mean lengths of time required for bodily
absorption of brand A and brand B. Assume the
absorption time is normally distributed. Twelve people
were randomly selected and given an oral dosage of
brand A. Another 12 were randomly selected and given
an equal dosage of brand B. The length of time in
minutes for the drugs to reach a specified level in the
blood was recorded. The results follow:
           mean   SD    n
Brand A    20.1   8.7   12
Brand B    18.9   7.5   12
Describe the shape & standard error of the sampling distribution of the difference in the mean speed of absorption.
Normal distribution with center estimated by $\bar{x}_A - \bar{x}_B = 1.2$ minutes & S.E. = 3.316 minutes
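The estimated difference and its standard error follow directly from the summary table. A minimal sketch, with illustrative variable names:

```python
import math

# Summary statistics from the table (absorption time in minutes).
mean_a, sd_a, n_a = 20.1, 8.7, 12
mean_b, sd_b, n_b = 18.9, 7.5, 12

# Point estimate of the difference of means (A - B).
diff = mean_a - mean_b

# Standard error: sqrt(s_A^2 / n_A + s_B^2 / n_B)
se = math.sqrt(sd_a**2 / n_a + sd_b**2 / n_b)

print(f"difference = {diff:.1f} min, SE = {se:.3f} min")
```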
Find a 95% confidence interval for the difference in mean lengths of time required for bodily absorption of each brand.
Assumptions (state assumptions!):
Have 2 independent SRS's from volunteers
Given that the absorption time is normally distributed
σ's unknown
Formula & calculations:
$(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}} = (20.1 - 18.9) \pm 2.080 \sqrt{\dfrac{8.7^2}{12} + \dfrac{7.5^2}{12}} \approx (-5.685, 8.085)$
From calculator df = 21.53; by hand, use t* for df = 21 & 95% confidence level. (The interval shown is the calculator's, which uses df = 21.53; with the table value t* = 2.080 the hand calculation gives about (-5.697, 8.097).)
Conclusion in context:
We are 95% confident that the true difference in mean lengths of time required for bodily absorption of each brand is between -5.685 minutes and 8.085 minutes.
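The hand calculation can be sketched in Python as follows. The function name `two_sample_t_interval` is hypothetical, and the critical value is passed in rather than computed: t* = 2.080 is the table value for df = 21 at 95% confidence, as the slide indicates. Because the calculator uses df = 21.53 instead, its interval (-5.685, 8.085) is slightly narrower than what this sketch prints.

```python
import math


def two_sample_t_interval(mean1, s1, n1, mean2, s2, n2, t_star):
    """Return (low, high) for (mean1 - mean2) +/- t* x standard error."""
    diff = mean1 - mean2
    margin = t_star * math.sqrt(s1**2 / n1 + s2**2 / n2)
    return diff - margin, diff + margin


# Brand A vs. Brand B, with the table value t* = 2.080 (df = 21).
low, high = two_sample_t_interval(20.1, 8.7, 12, 18.9, 7.5, 12, t_star=2.080)
print(f"({low:.3f}, {high:.3f})")
```

Either way the interval contains 0, so the data are consistent with no difference between the brands.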
Note: confidence interval statements
• Matched pairs – refer to “mean difference”
• Two-Sample – refer to “difference of means”
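Hypothesis Statements:
$H_0: \mu_1 - \mu_2 = 0$ (equivalently, $\mu_1 = \mu_2$)
$H_a: \mu_1 - \mu_2 < 0$ (equivalently, $\mu_1 < \mu_2$)
$H_a: \mu_1 - \mu_2 > 0$ (equivalently, $\mu_1 > \mu_2$)
$H_a: \mu_1 - \mu_2 \ne 0$ (equivalently, $\mu_1 \ne \mu_2$)
Be sure to define BOTH $\mu_1$ and $\mu_2$!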
Hypothesis Test:
Test statistic = (statistic - parameter) / (SD of statistic)
$t = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$
Since we usually assume $H_0$ is true, $\mu_1 - \mu_2$ equals 0, so we can usually leave it out.
The length of time in minutes for the drugs
to reach a specified level in the blood was
recorded. The results follow:
        Brand A   Brand B
mean    20.1      18.9
SD      8.7       7.5
n       12        12
Is there sufficient evidence that these
drugs differ in the speed at which they
enter the blood stream?
Assumptions (state assumptions!):
Have 2 independent SRS's from volunteers
Given that the absorption time is normally distributed
σ's unknown
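Hypotheses & define variables!
$H_0: \mu_A = \mu_B$
$H_a: \mu_A \ne \mu_B$
where $\mu_A$ is the true mean absorption time for Brand A & $\mu_B$ is the true mean absorption time for Brand B
Formula & calculations (1 = Brand A, 2 = Brand B):
$t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \dfrac{20.1 - 18.9}{\sqrt{\dfrac{8.7^2}{12} + \dfrac{7.5^2}{12}}} \approx .362$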
Conclusion in context
p-value = .7210, df = 21.53, α = .05
Since p-value > α, I fail to reject $H_0$. There is not sufficient evidence to suggest that these drugs differ in the speed at which they enter the blood stream.
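For readers without a calculator at hand, the test can be sketched with only Python's standard library. This is an illustration under stated assumptions, not the calculator's algorithm: the function names are hypothetical, and the two-sided p-value comes from integrating the t density numerically with Simpson's rule, which is adequate for moderate |t| such as this one.

```python
import math


def t_pdf(x: float, df: float) -> float:
    """Density of the t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p_value(t: float, df: float, steps: int = 2000) -> float:
    """1 - P(-|t| < T < |t|), using Simpson's rule on [0, |t|] (steps must be even)."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    return 1 - 2 * area_0_to_t


def two_sample_t_test(mean1, s1, n1, mean2, s2, n2):
    """Unpooled two-sample t test of H0: mu1 - mu2 = 0 (two-sided)."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df, two_sided_p_value(t, df)


t, df, p = two_sample_t_test(20.1, 8.7, 12, 18.9, 7.5, 12)
print(f"t = {t:.3f}, df = {df:.2f}, p-value = {p:.4f}")
```

The output should land close to the values above (df = 21.53, p-value = .7210), which supports the decision to fail to reject $H_0$ at α = .05.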
Robustness:
• Two-sample procedures are more
robust than one-sample procedures
• BEST to have equal sample sizes! (but
not necessary)
A modification has been made to the process
for producing a certain type of time-zero film
(film that begins to develop as soon as the
picture is taken). Because the modification
involves extra cost, it will be incorporated only
if sample data indicate that the modification
decreases true average development time by
more than 1 second. Should the company
incorporate the modification?
Original   8.6  5.1  4.5  5.4  6.3  6.6  5.7  8.5
Modified   5.5  4.0  3.8  6.0  5.8  4.9  7.0  5.7