Transcript Lecture8
STAT 651
Lecture 8
Copyright (c) Bani Mallick
Topics in Lecture #8
Sign test for paired comparisons
Wilcoxon signed rank test for paired
comparisons
Comparing two population means: first
pass
Book Sections Covered in Lecture #8
My own material (sign test)
Chapter 6.5 (Wilcoxon signed rank test,
although I think my explanation is better)
Chapters 6.1-6.2 (comparing two
population means: this lecture will not
have any formulae)
Lecture 7 Review: Sample Size
Calculations
You want to test at level (Type I error) α the
null hypothesis that the mean = 0
• You want power 1 - β to detect a change
from the hypothesized mean by the amount
Δ or more, i.e., the mean is greater than Δ
or the mean is less than -Δ
• There is a formula for this, which I showed you
in class.
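The formula itself is not reproduced on this slide. As a rough sketch only, here is the standard normal-approximation formula for a two-sided test with known σ, n = ((z_{α/2} + z_β) σ / Δ)²; this is an assumption and may differ in detail from the version shown in class. The function name and the example numbers are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def sample_size(alpha, power, sigma, delta):
    """Approximate n for a two-sided level-alpha z-test of mean = 0
    to have the requested power when the true mean is +/- delta."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    z_beta = z.inv_cdf(power)            # z_{beta}, where power = 1 - beta
    return ceil(((z_alpha + z_beta) * sigma / delta) ** 2)

# Hypothetical inputs: 5% level, 80% power, sigma = 1, detect a shift of 0.5
print(sample_size(alpha=0.05, power=0.80, sigma=1.0, delta=0.5))
```

Halving Δ roughly quadruples the required n, which is why small effects are expensive to detect.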
Lecture 7 Review: Never Accept a
Null Hypothesis
Suppose we use a 95% confidence interval and it
includes zero. Why do I say: "with 95%
confidence, I cannot reject the hypothesis that the
population mean is zero"?
I never, ever say: "I can therefore conclude
that the population mean is zero."
Lecture 7 Review: Never Accept a
Null Hypothesis
If you pick a tiny sample size, there is no
statistical power to reject the null
hypothesis
In particular, p-values are not the probability
that the null hypothesis is true.
Lecture 7 Review: P-Values
The p-value is NOT the probability that the
null hypothesis is true.
p-values are simply a mechanical way to
understand what will happen to hypothesis
tests when you go out and compute them.
For example, if you take n = 2, you will have
no power, hence you will have high p-values.
Does this mean that the null hypothesis has a
high probability of being correct? No! It
means you have a rotten study.
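A small simulation makes the point concrete. This is only a sketch under assumed settings: the true mean is 1 (so the null hypothesis mean = 0 is false), the standard deviation is 1, and the data are normal. The function name, seed, and number of replications are hypothetical choices.

```python
import random
from statistics import mean, stdev

def rejection_rate(n, t_crit, true_mean=1.0, sd=1.0, reps=2000, seed=651):
    """Fraction of simulated samples in which a two-sided t-test rejects mean = 0."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        t_stat = mean(sample) / (stdev(sample) / n ** 0.5)
        if abs(t_stat) > t_crit:
            rejections += 1
    return rejections / reps

# 95% critical values: t_{0.025}(1) = 12.706 and t_{0.025}(29) = 2.045
print("n = 2: ", rejection_rate(2, 12.706))
print("n = 30:", rejection_rate(30, 2.045))
```

With n = 2 the test rarely rejects even though the null hypothesis is false, so a high p-value there says more about the study than about the hypothesis.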
Lecture 7 Review: Student's t-Distribution
The (1-α)100% CI when σ was known was
$\bar{X} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$
The (1-α)100% CI when σ is unknown is
$\bar{X} \pm t_{\alpha/2}(n-1)\,s/\sqrt{n}$
You replace σ by s and $z_{\alpha/2}$ by $t_{\alpha/2}(n-1)$
Lecture 7 Review: Student's t-Distribution
Take 95% confidence, α = 0.05
$z_{\alpha/2}$ = 1.96
n = 3,   n-1 = 2,   $t_{\alpha/2}(n-1)$ = 4.303
n = 10,  n-1 = 9,   $t_{\alpha/2}(n-1)$ = 2.262
n = 30,  n-1 = 29,  $t_{\alpha/2}(n-1)$ = 2.045
n = 121, n-1 = 120, $t_{\alpha/2}(n-1)$ = 1.98
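To see what the larger critical value does in practice, here is a minimal sketch that computes the interval as mean ± critical value × s/√n. The sample is made up for illustration, and the function name is hypothetical; the critical values 2.262 and 1.96 are the ones from the table above.

```python
from math import sqrt
from statistics import mean, stdev

def t_interval(data, t_crit):
    """Return (lower, upper) = mean -/+ t_crit * s / sqrt(n)."""
    n = len(data)
    half_width = t_crit * stdev(data) / sqrt(n)
    centre = mean(data)
    return centre - half_width, centre + half_width

# Hypothetical sample of n = 10 values, so n - 1 = 9 and t_{alpha/2}(9) = 2.262
sample = [4.1, 5.3, 3.8, 6.0, 5.1, 4.7, 5.5, 4.9, 5.2, 4.4]
print("t interval:", t_interval(sample, 2.262))
print("z interval:", t_interval(sample, 1.96))  # narrower: treats s as if it were sigma
```

The t interval is wider because it pays for not knowing σ; the gap shrinks as n grows.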
Paired Comparisons
We have shown how to test
H0: population mean difference = 0;
HA: population mean difference ≠ 0;
using t-statistics (confidence intervals and
tests).
Paired Comparisons
Unfortunately, it often arises (as it does for
the hormone assay) that the differences
between two variables can have many
outliers.
We know that outliers affect the sample mean
and especially the sample standard deviation,
making the latter larger.
Larger standard deviations mean larger
confidence intervals and hence less power.
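A quick numerical illustration, using made-up differences rather than the hormone assay data: replacing a single value with an outlier shifts the mean and inflates the standard deviation, and with it the standard error, far more.

```python
from statistics import mean, stdev

clean = [1.2, 0.8, 1.5, 1.1, 0.9, 1.3]
with_outlier = clean[:-1] + [13.0]   # same data, last value replaced by an outlier

for label, data in [("clean", clean), ("with outlier", with_outlier)]:
    se = stdev(data) / len(data) ** 0.5
    print(f"{label:>12}: mean = {mean(data):.2f}, s = {stdev(data):.2f}, se = {se:.2f}")
```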
Paired Comparisons
There are two alternative methods that are
not so affected by outliers
These are the Wilcoxon signed rank test and
the sign test
Both are available in SPSS: “Analyze”,
“Nonparametric Tests”, “2 Related Samples”,
then also select the “Sign” test.
Paired Comparisons
The sign test is simple: recode the data
+1 = positive difference
0 = no difference
-1 = negative difference
Then run a t-test and compute the p-value
Problem: No confidence intervals
Serves as a check on t-inferences
HAND EXAMPLE
Data    Signs
-2      -1
-1      -1
 3       1
 5       1
 8       1
 9       1
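The hand example can be sketched in a few lines. The recoding and the t statistic on the signs follow the recipe on the slide. The exact binomial p-value is added here for comparison: it is the classical textbook form of the sign test, not something taken from the lecture. All function names are hypothetical.

```python
from math import comb, sqrt
from statistics import mean, stdev

def signs(diffs):
    """Recode each difference as +1, 0, or -1."""
    return [(d > 0) - (d < 0) for d in diffs]

def one_sample_t(values):
    """t statistic for testing that the mean of `values` is zero."""
    return mean(values) / (stdev(values) / sqrt(len(values)))

def exact_sign_p(diffs):
    """Two-sided binomial p-value, ignoring zero differences (classical sign test)."""
    pos = sum(d > 0 for d in diffs)
    neg = sum(d < 0 for d in diffs)
    n, k = pos + neg, min(pos, neg)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

data = [-2, -1, 3, 5, 8, 9]
recoded = signs(data)
print(recoded)                 # [-1, -1, 1, 1, 1, 1]
print(one_sample_t(recoded))   # t statistic on the signs, 5 degrees of freedom
print(exact_sign_p(data))      # 0.6875
```

Notice that the sign test only sees two negatives and four positives; the sizes 8 and 9 carry no extra weight.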
Paired Comparisons
The Wilcoxon signed rank test is simple:
recode the data!
Take the absolute values of the data
Order the absolute values from smallest to
largest
To the smallest absolute value, assign the
number –1 if the actual difference is
negative, 0 if there is no difference, +1 if the
difference is positive
Paired Comparisons
The Wilcoxon signed rank test is simple:
To the jth absolute value in order, assign the
number –j if the actual difference is
negative, 0 if there is no difference, +j if the
difference is positive
Then run a t-test and compute the p-value
Problem: No confidence intervals
Serves as check on t-inferences
HAND EXAMPLE
Data    Absolute    Rank    Signed Rank
-2      2           2       -2
-1      1           1       -1
 3      3           3        3
 5      5           4        4
 8      8           5        5
 9      9           6        6
(Run t-test on these guys)
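The same recoding can be sketched in code. This minimal version assumes there are no tied absolute values and no zero differences, which holds for the hand example but is not handled in general. The function name is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def signed_ranks(diffs):
    """Rank |d| from smallest (1) to largest (n), then attach the sign of d.

    Assumes no tied absolute values and no zero differences."""
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    result = [0] * len(diffs)
    for rank, i in enumerate(order, start=1):
        result[i] = rank if diffs[i] > 0 else -rank
    return result

data = [-2, -1, 3, 5, 8, 9]
sr = signed_ranks(data)
t_stat = mean(sr) / (stdev(sr) / sqrt(len(sr)))
print(sr)       # [-2, -1, 3, 4, 5, 6]
print(t_stat)   # t statistic on the signed ranks, 5 degrees of freedom
```

Because 8 and 9 become ranks 5 and 6, an extreme value can never count for more than rank n, which is what limits the influence of outliers.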
Armspan Data
Sign test p-value = 0.486
Wilcoxon signed rank test p-value = 0.281
t-test p-value = 0.282
All consistent!
Hormone Assay Data
Remember that in the hormone assay data,
we seemed to get different inferences based
on whether we used the raw data or their
logarithms
The sign test is not affected by
transformations
The Wilcoxon test may be slightly affected by
transformations when studying paired
comparisons
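Here is a small check of both claims on made-up positive paired measurements (not the hormone assay data). Because the logarithm is increasing, each pair keeps its sign; the sizes of the differences change, though, so their ordering, and hence the signed ranks, can change.

```python
from math import log

# Hypothetical paired positive measurements (e.g. two assay methods on the same samples)
x = [1.2, 3.5, 0.8, 10.0, 2.2]
y = [1.0, 4.1, 0.5, 2.0, 2.9]

def sign(d):
    return (d > 0) - (d < 0)

raw_signs = [sign(a - b) for a, b in zip(x, y)]
log_signs = [sign(log(a) - log(b)) for a, b in zip(x, y)]
print(raw_signs == log_signs)   # True: log is increasing, so the order of each pair is kept

# The sizes of the differences do change, so their ranks can change too
raw_order = sorted(range(len(x)), key=lambda i: abs(x[i] - y[i]))
log_order = sorted(range(len(x)), key=lambda i: abs(log(x[i]) - log(y[i])))
print(raw_order, log_order)
```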
Hormone Assay Data
t-test on raw data, p = 0.244
t-test of log data, p = 0.000
Sign test, logged or raw data, p = 0.001
Wilcoxon signed rank test, raw data, p =
0.016, logged data p = 0.000
Remember, I claimed that the log data scale
was most nearly bell-shaped, and hence
thought there was a difference!
Comparing Two Population Means
A great deal of our effort will go into
comparing population means.
Bluebonnet Heights on red petals: does
environment matter?
Are true building costs different in
Bryan and College Station, after
accounting for land valuation?
Comparing Two Population Means
We’ll use all our methods
Histograms, boxplots, q-q plots, confidence
intervals, nonparametric tests
Comparing Two Populations
There are two populations
Take a sample from each population
The sample sizes need not be the same
Population 1:
n1
Population 2:
n2
Comparing Two Populations
Each sample will have a sample standard deviation
Population 1: s1
Population 2: s2
Comparing Two Populations
Each sample will have a sample mean
Population 1: $\bar{X}_1$
Population 2: $\bar{X}_2$
Those are the statistics. What are the
parameters?
Comparing Two Populations
Each population will have a population standard
deviation
Population 1:
σ1
Population 2:
σ2
Comparing Two Populations
Each population will have a population mean
Population 1:
μ1
Population 2:
μ2
Comparing Two Populations
How do we compare the population means μ1
and μ2?
The usual way is to take their difference:
Δ = μ1 - μ2
If the population means are equal, what is
their difference?
Comparing Two Populations
The usual way is to take their difference:
Δ = μ1 - μ2
If the population means are equal, their
difference = 0
Suppose we form a confidence interval for
the difference. What do we learn?
Say a 95% CI is from 1 to 3?
Comparing Two Populations
The usual way is to take their difference:
Δ = μ1 - μ2
Suppose we form a confidence interval for
the difference. What do we learn?
Say a 95% CI is from 1 to 3?
Population 1 has a mean that is between 1
and 3 units larger than population 2, with
95% confidence
Comparing Two Populations
Before learning how this confidence interval is
computed, let’s look at an example.
NHANES Comparison
“Analyze”, “Compare Means”, “Independent
Samples” will get you the analysis in SPSS
You will get lots and lots of things, so we
have to be a little careful
First do the plots, then the analysis!
You will get means and standard errors
NHANES Comparison
[Figure: boxplots of Log(Saturated Fat) by Health Status. Cancer: N = 59; Healthy: N = 60.]
NHANES Comparison (Cancer Cases)
[Figure: Normal Q-Q Plot of Log(Saturated Fat). Horizontal axis: Observed Value; vertical axis: Expected Normal Value.]
NHANES Comparison (Healthy Cases)
[Figure: Normal Q-Q Plot of Log(Saturated Fat). Horizontal axis: Observed Value; vertical axis: Expected Normal Value.]
NHANES Comparison
Healthy: Mean = 2.9905, s = .6173,
se = .0797
Cancer: Mean = 2.6969, s = .6423,
se = .0836
Note: The sample standard deviations are
nearly numerically equal. This agrees with the
box plots, where the IQRs are nearly equal
Note how small the standard errors are
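The standard error of a sample mean is s/√n, so it can be recomputed from the summary numbers. This is a minimal sketch; the group sizes (Healthy N = 60, Cancer N = 59) are an assumption read off the boxplot slide, and the function name is hypothetical.

```python
from math import sqrt

def standard_error(s, n):
    """Standard error of a sample mean: s / sqrt(n)."""
    return s / sqrt(n)

# Group sizes are an assumption read off the boxplot (Healthy N = 60, Cancer N = 59)
print(round(standard_error(0.6173, 60), 4))  # Healthy
print(round(standard_error(0.6423, 59), 4))  # Cancer
```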
NHANES Comparison
The next thing is that there will be two rows,
one for “Equal Variances Assumed”, the other
for “Equal Variances Not Assumed”
Because we have been careful, the variability
looks to be in the same ballpark. Thus I
would assume equal variances
NHANES Comparison
What happens if the variances do not look
equal?
Generally the results are not very different
unless the sample sizes are quite small.
Generally, people quote the “Variances
assumed equal” p-values and CI
You have a backup, nonparametric rank tests,
that we will discuss later. It’s pretty hard to
make a huge blunder
NHANES Comparison
The “Mean Difference” is 0.2937. Since the
healthy cases had a higher mean, this is
Mean(Healthy) – Mean(Cancer)
The 95% CI is from 0.065 to 0.5223
What is this a CI for?
NHANES Comparison
The “Mean Difference” is 0.2937. Since the
healthy cases had a higher mean, this is
Mean(Healthy) - Mean(Cancer)
The 95% CI is from 0.065 to 0.5223
What is this a CI for? In the log scale,
healthy people eat between 0.065 and
0.5223 more saturated fat than women
who developed breast cancer, with 95%
confidence.
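This lecture deliberately gives no formula for the interval. As a sketch only, the standard textbook pooled-variance ("equal variances assumed") interval comes close to the SPSS numbers when fed the summary statistics from the earlier slide. Assumptions: group sizes of 60 (Healthy) and 59 (Cancer) read off the boxplot, and the 120-degrees-of-freedom critical value 1.98 used as a stand-in for 117 degrees of freedom. The function name is hypothetical.

```python
from math import sqrt

def pooled_ci(mean1, s1, n1, mean2, s2, n2, t_crit):
    """CI for mu1 - mu2 assuming equal population variances."""
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    se_diff = sqrt(pooled_var * (1 / n1 + 1 / n2))
    diff = mean1 - mean2
    return diff - t_crit * se_diff, diff + t_crit * se_diff

# Healthy minus Cancer. n = 60 and 59 are assumptions read off the boxplot;
# t_crit = 1.98 is the 120-df value, used as a stand-in for 117 df.
low, high = pooled_ci(2.9905, 0.6173, 60, 2.6969, 0.6423, 59, t_crit=1.98)
print(round(low, 3), round(high, 3))
```

The interval excludes zero, which is the same conclusion as rejecting equal population means at the 5% level.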