Lecture 7 Outline: Thurs, Sep 25

• Two-sample t-tests – Chapter 2.3
– Levene's test for equality of two variances – Chapter 4.5.3
• Inferences in a two-treatment randomized experiment – Chapter 2.4
• Interpretation of p-values – Chapter 2.5.1
• Practical and statistical significance – Chapter 4.5.1
• Choosing a sample size for a study – material in Chapter 23.4, but the slides will give you the information you need
Two independent samples
• Probability model: independent simple random samples from two populations
– Y11, …, Y1n1 – sample from population I
– Y21, …, Y2n2 – sample from population II
• Examples:
– Perished and surviving sparrows
– Men's and women's scores on a social insight test
– Cholesterol in urban and rural Guatemalans
Two-sample t-test
• Population parameters: μ1, σ1, μ2, σ2
• H0: μ2 − μ1 = 0, H1: μ2 − μ1 ≠ 0
• Equal-spread model: σ1 = σ2 (call it σ)
• Statistics from samples of size n1 and n2 from pops. 1 and 2: Ȳ1, Ȳ2, s1, s2
• For Bumpus' data:
Ȳ1 = .728, Ȳ2 = .738, Ȳ2 − Ȳ1 = .010, s1 = .024, s2 = .020
Sampling Distribution of Ȳ2 − Ȳ1
• SD(Ȳ2 − Ȳ1) = σ √(1/n1 + 1/n2)
• SE(Ȳ2 − Ȳ1) = sp √(1/n1 + 1/n2)
• Pooled estimate of σ² (equal-spread model):
sp² = [(n1 − 1)s1² + (n2 − 1)s2²] / [(n1 − 1) + (n2 − 1)]
• See Display 2.8
Two-sample t-test
• H0: μ2 − μ1 = δ*, H1: μ2 − μ1 ≠ δ*
• Test statistic: T = |t| = |(Ȳ2 − Ȳ1) − δ*| / SE(Ȳ2 − Ȳ1)
• If the population distributions are normal with equal σ, then if H0 is true, the test statistic t has a Student's t distribution with n1 + n2 − 2 degrees of freedom.
• The p-value equals the probability that T would be greater than the observed |t| under the random sampling model if H0 is true; it is calculated from the Student's t distribution.
• For Bumpus data, two-sided p-value = .0809,
suggestive but inconclusive evidence of a
difference
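For readers working outside JMP, the pooled test can be reproduced from summary statistics alone. The Python sketch below uses the means and SDs from the slide; the group sizes n1 = 24 and n2 = 35 are assumed values for illustration, since the slide does not state them.

```python
# Pooled (equal-spread) two-sample t-test from summary statistics.
# Means and SDs are from the slide; n1, n2 are assumed for illustration.
import math
from scipy import stats

def pooled_t_test(ybar1, s1, n1, ybar2, s2, n2, delta0=0.0):
    """Return (t, two-sided p-value) for H0: mu2 - mu1 = delta0."""
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df  # pooled variance
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))           # SE(Ybar2 - Ybar1)
    t = ((ybar2 - ybar1) - delta0) / se
    return t, 2 * stats.t.sf(abs(t), df)

t, p = pooled_t_test(0.728, 0.024, 24, 0.738, 0.020, 35)
print(f"t = {t:.2f}, two-sided p = {p:.4f}")
```

With these assumed sample sizes the p-value lands near the slide's value; small differences reflect rounding of the summary statistics.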
One-sided p-values
• If H1: μ2 − μ1 > δ*, the test statistic is
T = t = [(Ȳ2 − Ȳ1) − δ*] / SE(Ȳ2 − Ȳ1)
• If H1: μ2 − μ1 < δ*, the test statistic is
T = t = −[(Ȳ2 − Ȳ1) − δ*] / SE(Ȳ2 − Ȳ1)
• The p-value is the probability that T would be ≥ the observed T0 if H0 is true
Confidence Interval for μ2 − μ1
• 100(1 − α)% confidence interval for μ2 − μ1:
(Ȳ2 − Ȳ1) ± t_df(1 − α/2) SE(Ȳ2 − Ȳ1)
• For a 95% confidence interval, t_df(.975) ≈ 2
• Factors affecting the width of the confidence interval:
– Sample size
– Population standard deviation σ
– Level of confidence (1 − α)
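The interval above can likewise be computed from summary statistics. This sketch reuses the Bumpus means and SDs from the earlier slide; the group sizes n1 = 24 and n2 = 35 are assumed, not taken from the slides.

```python
# 100(1 - alpha)% CI for mu2 - mu1 from summary statistics.
# Means and SDs are from the Bumpus slide; n1, n2 are assumed values.
import math
from scipy import stats

def pooled_ci(ybar1, s1, n1, ybar2, s2, n2, conf=0.95):
    """Pooled-SD confidence interval for mu2 - mu1."""
    df = n1 + n2 - 2
    sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)
    se = sp * math.sqrt(1 / n1 + 1 / n2)
    tcrit = stats.t.ppf(1 - (1 - conf) / 2, df)  # t_df(1 - alpha/2), about 2
    diff = ybar2 - ybar1
    return diff - tcrit * se, diff + tcrit * se

lo, hi = pooled_ci(0.728, 0.024, 24, 0.738, 0.020, 35)
print(f"95% CI for mu2 - mu1: ({lo:.4f}, {hi:.4f})")
```

Raising the confidence level widens the interval, illustrating the third factor in the list above.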
Two sample tests and CIs in JMP
• Click on Analyze, Fit Y by X, put Group
variable in X and response variable in Y,
and click OK
• Click on red triangle next to Oneway
Analysis and click Means/ANOVA/t-test
• To see the means and standard deviations
themselves, click on Means and Std Dev
under red triangle
Bumpus’ Data Revisited
• Bumpus concluded that sparrows were subjected to
stabilizing selection – birds that were markedly different
from the average were more likely to have died.
• Bumpus (1898): “The process of selective elimination is
most severe with extremely variable individuals, no matter
in what direction the variations may occur. It is quite as
dangerous to be conspicuously above a certain standard of
organic excellence as it is to be conspicuously below the
standard. It is the type that nature favors.”
• Bumpus’ hypothesis is that the variance of physical
characteristics in the survivor group should be smaller than
the variance in the perished group
Testing Equal Variances
• Two independent samples from populations with variances σ1² and σ2²
• H0: σ1² = σ2² vs. H1: σ1² ≠ σ2²
• Levene’s Test – Section 4.5.3
• In JMP, Fit Y by X, under red triangle next to
Oneway Analysis of humerus by group, click
Unequal Variances. Use Levene’s test.
• p-value = .4548, no evidence that variances are not
equal, thus no evidence for Bumpus’ hypothesis.
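Outside JMP, Levene's test is available in SciPy (whose default is the median-centered, Brown-Forsythe variant). The samples below are simulated stand-ins, not Bumpus' actual measurements, so the p-value will not match the slide's .4548.

```python
# Levene's test for equality of two variances, as in Section 4.5.3.
# The samples are simulated stand-ins, not Bumpus' humerus data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
perished = rng.normal(0.728, 0.024, size=24)  # hypothetical humerus lengths
survived = rng.normal(0.738, 0.020, size=35)

stat, p = stats.levene(perished, survived, center="median")
print(f"Levene statistic = {stat:.3f}, p-value = {p:.4f}")
```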
t-tests for randomized experiments
• Section 2.4
• t-test (with its associated Student t distribution
under H0) has been developed in Ch. 2 for making
inferences to populations using the random
sampling probability model.
• In Ch. 1, we studied making causal inferences in
the additive treatment effect model using the
probability model of a randomized experiment.
| Y  Y * |
t

• The two-sample t-statistic SE(Y  Y ) is a
reasonable test statistic for testing H0: additive
treatment effect is  *
2
1
2
1
t-test for randomized experiments cont.
• The t-test provides an approximately correct p-value and confidence interval for a randomized
experiment, i.e., the distribution of the t-statistic
under the null hypothesis of an additive treatment
effect of  * is well approximated by the Student’s
t distribution with n1  n2  2 degrees of freedom.
• See Display 2.11
• Bottom line: t-test in JMP can be used to make
approximately correct inferences (p-values and
CIs) for randomized experiments but inferences
should be phrased in terms of additive treatment
effects rather than a difference in population means.
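The approximation can be checked by simulation: re-randomize the group labels many times, recompute the t-statistic each time, and compare the resulting randomization p-value with the Student's t p-value. The data below are simulated, not from the text.

```python
# Sketch: the randomization distribution of the two-sample t statistic
# is well approximated by Student's t. Data are simulated for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
control = rng.normal(10.0, 2.0, size=12)  # hypothetical responses
treated = rng.normal(11.0, 2.0, size=12)
obs_t = stats.ttest_ind(treated, control, equal_var=True).statistic

pooled = np.concatenate([control, treated])
n1 = len(control)
perm_ts = np.empty(2000)
for i in range(2000):
    perm = rng.permutation(pooled)  # re-randomize the group labels
    perm_ts[i] = stats.ttest_ind(perm[n1:], perm[:n1], equal_var=True).statistic

rand_p = np.mean(np.abs(perm_ts) >= abs(obs_t))    # randomization p-value
t_p = 2 * stats.t.sf(abs(obs_t), len(pooled) - 2)  # Student's t p-value
print(f"randomization p = {rand_p:.3f}, t-approximation p = {t_p:.3f}")
```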
Notes about tests, p-values
• Interpretation of p-value:
– Formally: the probability of random sampling (or random assignment) leading to a test statistic at least as large as the observed one if H0 is true.
– Informally: the degree of credibility in H0.
• Conclusions from p-values
– (a) Small p-values mean either (i) H0 is wrong or (ii)
we obtained an unusual sample
– (b) Large p-values mean either (i) H0 is correct or (ii)
the study isn’t large enough to conclude otherwise (i.e.,
the data are consistent with H0 being true but do not
prove it).
Conceptual Question 2.8
• Suppose the following statement is made in
a statistical summary: “A comparison of
breathing capacities in individuals in
households with low nitrogen dioxide levels
and individuals in households with high
nitrogen dioxide levels indicated that there
is no difference in the means (two-sided p-value = .24)." What is wrong with this
statement?
Interpretation of p-values
• So, what p-values are small and what are large?
• For reference, the chance of:
– 3 heads in 3 coin tosses is .125
– 4 heads in 4 coin tosses is .063
– 5 heads in 5 coin tosses is .031
– 6 heads in 6 coin tosses is .016
– 7 heads in 7 coin tosses is .008
– 8 heads in 8 coin tosses is .004
• See Display 2.12 for a subjective guide.
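The reference chances above are simply (1/2)^k, the probability of k heads in k tosses of a fair coin:

```python
# The coin-toss reference chances are (1/2)**k: the probability of
# k heads in k tosses of a fair coin.
for k in range(3, 9):
    print(k, "heads in", k, "tosses:", 0.5 ** k)
```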
Practical and Statistical Significance
• Section 4.5.1
• p-values indicate statistical significance, the
extent to which a null hypothesis is
contradicted by data
• This must be distinguished from practical
significance, the practical importance of the
finding.
Example
• Investigators compare WISC vocabulary scores for big city
and rural children.
• They take a simple random sample of 2500 big city
children and an independent simple random sample of
2500 rural children.
• The big city children average 26 on the test and their SD is 10 points; the rural children average only 25 and their SD is also 10 points.
• Two-sample t-test: t = 1 / (10 √(1/2500 + 1/2500)) ≈ 3.5, two-sided p-value ≈ .0004
• Difference between big city children and rural children is
highly significant, rural children are lagging behind in
development of language skills and the investigators
launch a crusade to pour money into rural schools.
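The arithmetic behind that p-value is easy to check directly; with 2500 children per group and SD 10, even a 1-point difference in means produces a large t-statistic.

```python
# Arithmetic for the WISC example: n = 2500 per group, SD = 10,
# observed means 26 (big city) and 25 (rural).
import math
from scipy import stats

n = 2500
se = 10 * math.sqrt(1 / n + 1 / n)  # standard error of the difference
t = (26 - 25) / se
p = 2 * stats.t.sf(t, 2 * n - 2)    # two-sided p-value
print(f"SE = {se:.3f}, t = {t:.2f}, two-sided p = {p:.5f}")
```

This is exactly the point of the example: statistical significance here reflects the huge sample sizes, not a practically important difference.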
Example Continued
• Confidence interval for mean difference between
rural and big city children: (.43,1.28).
• WISC test – 40 words child has to define. Two
points given for correct definition, one for
partially correct definition.
• Likely value of mean difference between big city
and rural children is about one partial
understanding of a word out of forty.
• Not a good basis for a crusade. Actually, the investigators have shown that there is almost no difference between big city and rural children on the WISC vocabulary scale.
Practical vs. Statistical Significance
• The p-value of a test depends on the sample size.
With a large sample, even a small difference can
be "statistically significant," that is, hard to explain
by the luck of the draw. This doesn’t necessarily
make it important. Conversely, an important
difference may not be statistically significant if the
sample is too small.
• Always accompany p-values for tests of
hypotheses with confidence intervals. Confidence
intervals provide information about the likely
magnitude of the difference and thus provide
information about its practical importance.
Conclusions from a Study
• A successful experiment has both statistical and practical
significance.
• Often the results of a study may be summarized by a
confidence interval on a key parameter (e.g., treatment
effect)
• Display 23.1 – four possible outcomes to a confidence
interval procedure.
• First three outcomes – A, B and C – are successes in that it
is possible to draw an inferential conclusion that
distinguishes between the important alternatives in one
way or another. But outcome D is a failure because both
the null hypothesis and practically significant alternatives
remain plausible.
Designing a Study
• Role of research design is to avoid outcome D.
This is accomplished by making the confidence interval short enough that it cannot simultaneously include both parameter values.
• How to make confidence interval short enough
(Display 23.2)?
– Make s small through blocking, covariates, improved
measurement (more later in course)
– Choose large enough sample size.
Choosing the sample size
• Suppose the null hypothesis is that μ = 0 in a matched-pairs study.
• Let PSD denote the practically significant alternative that is closest to zero.
• A confidence interval for μ has margin of error t_(n−1)(.975) · s/√n ≈ 2s/√n
• We want the CI to have margin of error less than |PSD|, i.e., we want the sample size n to satisfy 2s/√n < |PSD|
• Solving for n gives that the sample size needs to be at least 4s²/PSD².
• The sample size calculation requires an estimate of σ (s) before conducting the study.
Example
• Blood platelet aggregation before and after
smoking cigarettes
• The smallest medically significant difference is
considered to be 1 platelet. The standard deviation
of differences before and after in the population is
estimated to be 8. How large a sample should be
taken so that the confidence interval is not likely
to contain both the null hypothesis that the
difference is zero and a difference of 1 platelet?
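Applying the slides' rule n ≥ 4s²/PSD² to this example, with s = 8 and PSD = 1 platelet:

```python
# Sample-size rule from the slides: n >= 4 * s**2 / PSD**2,
# applied to the platelet example (s = 8, PSD = 1 platelet).
import math

def min_sample_size(s, psd):
    """Smallest n so the ~95% CI half-width 2*s/sqrt(n) is at most |psd|."""
    return math.ceil(4 * s**2 / psd**2)

print(min_sample_size(8, 1))  # -> 256
```

So roughly 256 pairs are needed; a larger assumed s, or a smaller practically significant difference, pushes the required n up quadratically.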
Choosing Sample Size
• Similar principles can be used to find
appropriate sample sizes for two
independent sample studies and randomized
experiments