AP Statistics Section 13.1 A

Which of two popular drugs,
Lipitor or Pravachol, helps lower
bad cholesterol more? 4000
people with heart disease were
randomly assigned to two
treatment groups: Lipitor or
Pravachol.
At the end of the study,
researchers compared the mean
“bad cholesterol levels” for each
group. This is a question about
comparing two means.
The researchers also compared the
proportion of subjects who died,
had a heart attack or suffered
other serious consequences in the
first two years. This is a question
about comparing two proportions.
Two-sample problems can arise from a
randomized comparative experiment that
randomly divides the subjects into two groups
and exposes each group to a different
treatment. Unlike the matched pairs design
studied earlier, there is no matching of the units
in the two samples and the samples can even be
of different sizes. Two-sample problems also
arise when comparing two different samples
randomly selected from two populations.
Conditions for Comparing Two Means
SRS: We have two SRSs from two distinct populations. This allows
us to generalize our findings. We measure the same variable for both
groups.
Normality: Both populations are Normally distributed. In practice, it is
enough that the distributions have similar shapes and that the
data have no strong outliers. More on this at the end of the notes.
Independence: The samples are independent. That is, one sample has
no influence on the other. Paired observations violate independence,
for example. When sampling without replacement from two distinct
populations, each population must be at least 10 times as large as
the corresponding sample size.
We want to compare the two population means,
either by giving a confidence interval for their
difference $\mu_1 - \mu_2$ or by testing the hypothesis
of no difference, $H_0: \mu_1 - \mu_2 = 0$. To do inference
about the difference between the means of the
two populations, we start with the difference
between the means of the two samples, $\bar{x}_1 - \bar{x}_2$.
The Two-Sample z Statistic
Here are the facts about the sampling distribution of the
difference between the two sample means of independent SRSs.
1. The mean of $\bar{x}_1 - \bar{x}_2$ equals $\mu_1 - \mu_2$ (i.e. the difference of
sample means is an unbiased estimator of the difference of
population means).
2. The variance of the difference is the sum of the variances
of $\bar{x}_1$ and $\bar{x}_2$, which is
\[ \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} \]
Note: the variances add because the samples are
independent. The standard deviations do not.
3. If the two population distributions are both Normal, then
the distribution of $\bar{x}_1 - \bar{x}_2$ is also Normal.
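The note in fact 2 can be checked with a little arithmetic. The sketch below uses hypothetical values for the population standard deviations and sample sizes (they are not from the notes) and compares the correct standard deviation of the difference with the mistaken result of adding the two standard deviations.

```python
import math

def sd_of_difference(sigma1, n1, sigma2, n2):
    """Standard deviation of (xbar1 - xbar2) for two independent samples."""
    var1 = sigma1 ** 2 / n1          # variance of the first sample mean
    var2 = sigma2 ** 2 / n2          # variance of the second sample mean
    return math.sqrt(var1 + var2)    # variances add; take the root at the end

# Hypothetical values, chosen only to keep the arithmetic easy to follow.
sigma1, n1 = 6.0, 9     # sd of xbar1 is 6 / 3 = 2
sigma2, n2 = 6.0, 16    # sd of xbar2 is 6 / 4 = 1.5

sd_diff = sd_of_difference(sigma1, n1, sigma2, n2)
naive_sum = sigma1 / math.sqrt(n1) + sigma2 / math.sqrt(n2)

print(sd_diff)     # sqrt(4 + 2.25) = 2.5
print(naive_sum)   # 2 + 1.5 = 3.5, which overstates the spread
```

Adding the standard deviations directly always gives a value at least as large as the correct one, so it would make the difference look more variable than it is.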
Two-sample z statistic (for use when $\sigma$ is known)
Suppose that $\bar{x}_1$ is the mean of an SRS of size $n_1$ drawn from a
Normally distributed population with mean $\mu_1$ and standard
deviation $\sigma_1$ and that $\bar{x}_2$ is the mean of an SRS of size $n_2$ drawn
from a Normally distributed population with mean $\mu_2$ and
standard deviation $\sigma_2$. Then the two-sample z statistic
\[ z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \]
has the standard Normal distribution.
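As a minimal sketch of the z statistic above, the function below plugs summary values into the formula. The function name, the parameter `null_diff` (the hypothesized value of the difference in population means) and the numbers are all assumptions made for illustration; none of them come from the notes.

```python
import math

def two_sample_z(xbar1, xbar2, sigma1, sigma2, n1, n2, null_diff=0.0):
    """Two-sample z statistic; requires known population sds."""
    sd_diff = math.sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
    return ((xbar1 - xbar2) - null_diff) / sd_diff

# Hypothetical summary values (not from the notes).
z = two_sample_z(xbar1=52.0, xbar2=50.0,
                 sigma1=4.0, sigma2=3.0,
                 n1=16, n2=9)

# sd of the difference is sqrt(16/16 + 9/9) = sqrt(2), so z = 2 / sqrt(2).
print(round(z, 3))
```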
It is really very unlikely that both
population standard deviations are
known. Since this is rarely the
case, let’s consider the more useful
t procedures.
The Two-Sample t Procedures
Because we don’t know the population
standard deviations, we estimate them by
the standard deviations from our two
samples. Recall that this is called the
standard error:
\[ SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \]
We standardize our estimate $\bar{x}_1 - \bar{x}_2$,
using the two-sample t statistic:
\[ t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \]
The level C confidence interval for
$\mu_1 - \mu_2$ is given by the formula:
\[ (\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \]
The degrees of freedom will equal
the smaller of $n_1 - 1$ and $n_2 - 1$.
Note: The two-sample t statistic
has approximately a t distribution.
It does not have exactly a t
distribution even if the
populations are both exactly
Normal.
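The pieces of the two-sample t procedure (standard error, t statistic, and the conservative degrees of freedom) can be sketched as three small functions. The function names and the summary values below are hypothetical and serve only to show how the formulas fit together.

```python
import math

def standard_error(s1, n1, s2, n2):
    """Standard error of (xbar1 - xbar2), using sample sds."""
    return math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)

def two_sample_t(xbar1, xbar2, s1, s2, n1, n2, null_diff=0.0):
    """Two-sample t statistic for a hypothesized difference null_diff."""
    return ((xbar1 - xbar2) - null_diff) / standard_error(s1, n1, s2, n2)

def conservative_df(n1, n2):
    """Degrees of freedom: the smaller of n1 - 1 and n2 - 1."""
    return min(n1 - 1, n2 - 1)

# Hypothetical summary values (not from the notes).
t = two_sample_t(xbar1=20.0, xbar2=17.0, s1=3.0, s2=4.0, n1=9, n2=16)
df = conservative_df(9, 16)

# SE = sqrt(9/9 + 16/16) = sqrt(2), so t = 3 / sqrt(2); df = min(8, 15).
print(round(t, 3), df)
```

Using the smaller of the two degrees of freedom is the conservative choice described above: it gives wider intervals and larger P-values than the t distribution that best approximates the statistic.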
Example 13.2-3: Does increasing the amount of calcium in our
diet reduce blood pressure? Examination of a large sample of
people revealed a relationship between calcium intake and
blood pressure. The relationship was strongest for black men.
Such observational studies do not establish causation.
Researchers therefore designed a randomized comparative
experiment. The subjects in part of the experiment were 21
healthy black men. A randomly chosen group of 10 of the men
received a calcium supplement for 12 weeks. The control group
of 11 men received a placebo pill that looked identical. The
experiment was double-blind. The response variable is the
decrease in systolic blood pressure for a subject after 12 weeks,
in mm of Hg. An increase appears as a negative response.
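Summary statistics (decrease in systolic blood pressure, mm of Hg):
Group          n     mean      s
1 (Calcium)    10    5.000     8.743
2 (Placebo)    11    -.273     5.901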
Hypothesis:
The population of interest is healthy black men.
Wish to test $H_0: \mu_1 = \mu_2$ vs $H_a: \mu_1 > \mu_2$ where
$\mu_1$ = mean decrease in the systolic blood pressure of the calcium group
$\mu_2$ = mean decrease in the systolic blood pressure of the control group
Conditions:
SRS: While the randomization in the experiment helps, the subjects are
volunteers and not an SRS, so results may not generalize to the population.
Normality of $\bar{x}$: A boxplot of the data shows no outliers in either group
and both distributions are approximately Normal. So I will assume the
population distributions are approximately Normal.
Independence: Because of the randomization, I will assume the two
groups are independent samples. For each group, $N \geq 10n$ since we
are sampling without replacement.
Calculations:
\[ t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{(5 - (-.273)) - 0}{\sqrt{\frac{8.743^2}{10} + \frac{5.901^2}{11}}} = 1.604 \]
Degrees of freedom = 10 - 1 = 9
p-value between .05 and .1
p-value = .072
TI-83/84: STAT TESTS 4: 2-SampTTest
Choose NO to pooled question.
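The calculation above can be reproduced without a t table. The sketch below recomputes t from the summary statistics in the example and then approximates the one-sided P-value by numerically integrating the t density with Simpson's rule. The helper names `t_pdf` and `upper_tail_p` are hypothetical, and the integration is just one simple way to get the tail area, not the method a calculator necessarily uses.

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def upper_tail_p(t, df, steps=2000):
    """P(T >= t) for t >= 0: 0.5 minus the area from 0 to t (Simpson's rule)."""
    h = t / steps                                # steps must be even
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

# Summary statistics from the calcium example.
xbar1, s1, n1 = 5.000, 8.743, 10     # calcium group
xbar2, s2, n2 = -0.273, 5.901, 11    # placebo group

se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
t = (xbar1 - xbar2) / se             # hypothesized difference is 0
df = min(n1, n2) - 1                 # conservative degrees of freedom
p_value = upper_tail_p(t, df)

print(round(t, 3), df, round(p_value, 3))
```

The result should agree with the t of 1.604 and the P-value of about .072 found above, since both use 9 degrees of freedom.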
Interpretation:
My p-value of .072 is greater than the commonly accepted significance
level of .05, so I fail to reject $H_0$. My conclusion is that the experiment
failed to show that calcium reduces blood pressure.
Example: Construct and interpret a 90% confidence
interval for the previous example.
\[ (\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = (5 - (-.273)) \pm 1.833 \sqrt{\frac{8.743^2}{10} + \frac{5.901^2}{11}} \]
= (-.754, 11.300)
I am 90% confident that the difference between the mean decreases in systolic
blood pressure of the calcium group and the control group is between -.754 and
11.300 mm of Hg.
TI-83/84: STAT TESTS 0: 2-SampTInt
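The interval arithmetic can be checked with a few lines of code. This sketch uses the summary statistics from the example and the critical value 1.833 (for 9 degrees of freedom) given in the notes; the variable names are my own.

```python
import math

# Summary statistics from the calcium example.
xbar1, s1, n1 = 5.000, 8.743, 10     # calcium group
xbar2, s2, n2 = -0.273, 5.901, 11    # placebo group
t_star = 1.833                       # critical value used in the notes (df = 9)

diff = xbar1 - xbar2
margin = t_star * math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
lower, upper = diff - margin, diff + margin

print(round(lower, 3), round(upper, 3))   # about -0.754 and 11.3
```

Because the interval contains 0, it is consistent with the test's failure to reject the hypothesis of no difference.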
We know that sample size does
influence the P-value of a test. A result
that fails to be significant at a specified
level $\alpha$ in a small sample may be
significant in a larger sample.
Subsequent analysis of data from an
experiment with more subjects
resulted in a P-value of 0.008.
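To see why sample size matters, the sketch below holds the sample means and standard deviations from the calcium example fixed and quadruples both group sizes. The larger group sizes (40 and 44) are hypothetical and are not the data from the subsequent analysis mentioned above; they only show how the t statistic responds to n.

```python
import math

def two_sample_t(xbar1, xbar2, s1, s2, n1, n2):
    """Two-sample t statistic for the hypothesis of no difference."""
    return (xbar1 - xbar2) / math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)

# Observed group sizes from the example.
t_small = two_sample_t(5.000, -0.273, 8.743, 5.901, 10, 11)

# Hypothetical: same means and sds, four times as many subjects per group.
t_large = two_sample_t(5.000, -0.273, 8.743, 5.901, 40, 44)

print(round(t_small, 3))   # about 1.604
print(round(t_large, 3))   # about 3.208, twice as large
```

Quadrupling both sample sizes halves the standard error, so the same observed difference gives a t statistic twice as large and therefore a smaller P-value.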
Robustness Again
The two-sample t procedures are more robust
than the one-sample t methods, particularly
when the distributions are not symmetric.
When the sizes of the two samples are equal
and the two populations being compared have
distributions with similar shape, probability
values from the t table are quite accurate for a
broad range of distributions, even when the
sample sizes are as small as 5.
As a guide, $n_1 + n_2$ should be greater than or
equal to 10 with both $n_1 \geq 5$ and $n_2 \geq 5$.
In planning a two-sample study, choose
equal sample sizes if you can.
The two-sample t procedures are most
robust against non-Normality in this case
and the conservative P-values are most
accurate.