Transcript Lecture8

STAT 651
Lecture 8
Copyright (c) Bani Mallick
Topics in Lecture #8
• Sign test for paired comparisons
• Wilcoxon signed rank test for paired comparisons
• Comparing two population means: first pass
Book Sections Covered in Lecture #8
• My own material (sign test)
• Chapter 6.5 (Wilcoxon signed rank test, although I think my explanation is better)
• Chapters 6.1-6.2 (comparing two population means: this lecture will not have any formulae)
Lecture 7 Review: Sample Size Calculations
• You want to test, at level α (the Type I error rate), the null hypothesis that the mean = 0
• You want power 1 - β to detect a change from the hypothesized mean of Δ or more, i.e., the mean is greater than Δ or the mean is less than -Δ
• There is a formula for this, which I showed you in class.
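The formula is not reproduced on the slide. As a hedged sketch, the standard normal-approximation version for a two-sided level-α test with a known (or guessed) population standard deviation σ is n = ((z_{α/2} + z_β) σ / Δ)², rounded up. The version shown in class may differ in detail (for example, a t-based adjustment); the function name `sample_size` and the value σ = 1 are illustrative assumptions.

```python
import math
from statistics import NormalDist


def sample_size(alpha: float, power: float, sigma: float, delta: float) -> int:
    """Approximate n for a two-sided level-alpha test of H0: mean = 0.

    Normal-approximation formula n = ((z_{alpha/2} + z_beta) * sigma / delta)^2,
    with beta = 1 - power. sigma is an assumed (guessed) population SD.
    """
    z = NormalDist()                     # standard normal distribution
    z_alpha2 = z.inv_cdf(1 - alpha / 2)  # 1.96 when alpha = 0.05
    z_beta = z.inv_cdf(power)            # about 0.84 when power = 0.80
    n = ((z_alpha2 + z_beta) * sigma / delta) ** 2
    return math.ceil(n)                  # round up to a whole subject


# Level 0.05, power 0.80, assumed sigma = 1, detect a shift of 0.5 or more
print(sample_size(0.05, 0.80, sigma=1.0, delta=0.5))
# Halving the detectable shift roughly quadruples the required n
print(sample_size(0.05, 0.80, sigma=1.0, delta=0.25))
```

Under these assumptions the answers are 32 and 126: because Δ enters squared in the denominator, detecting half the shift costs about four times as many subjects.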
Lecture 7 Review: Never Accept a Null Hypothesis
• Suppose we compute a 95% confidence interval and it includes zero. Why do I say: "with 95% confidence, I cannot reject that the population mean is zero"?
• I never, ever say: "I can therefore conclude that the population mean is zero."
Lecture 7 Review: Never Accept a Null Hypothesis
• If you pick a tiny sample size, there is essentially no statistical power to reject the null hypothesis
• In particular, p-values are not the probability that the null hypothesis is true.
Lecture 7 Review: P-Values
• The p-value is NOT the probability that the null hypothesis is true.
• p-values are simply a mechanical way to understand what will happen to hypothesis tests when you go out and compute them.
• For example, if you take n = 2, you will have almost no power, and hence you will tend to get high p-values.
• Does this mean that the null hypothesis has a high probability of being correct? No! It means you have a rotten study.
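To see why a tiny sample produces high p-values, here is a minimal sketch of the power of a two-sided level-0.05 z-test of H0: mean = 0 when σ is known. That is a simplifying assumption: at n = 2 the t-test has even less power, because its critical value is far larger than 1.96. The helper `z_test_power`, the true mean 0.5 and σ = 1 are hypothetical.

```python
import math
from statistics import NormalDist


def z_test_power(n: int, delta: float, sigma: float, alpha: float = 0.05) -> float:
    """Power of a two-sided level-alpha z-test of H0: mean = 0 when the true mean is delta."""
    z = NormalDist()
    crit = z.inv_cdf(1 - alpha / 2)        # rejection cutoff, 1.96 for alpha = 0.05
    shift = delta * math.sqrt(n) / sigma   # standardized true effect
    # Probability that the test statistic lands in either rejection region
    return z.cdf(shift - crit) + z.cdf(-shift - crit)


for n in (2, 10, 50):
    print(n, round(z_test_power(n, delta=0.5, sigma=1.0), 3))
```

With n = 2 the power is only about 0.11, so a large p-value is the expected outcome even though the null hypothesis is false here; with n = 50 the power is above 0.9.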
Lecture 7 Review: Student’s t-Distribution
• The (1 - α)100% CI when σ was known was $\bar{X} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$
• The (1 - α)100% CI when σ is unknown is $\bar{X} \pm t_{\alpha/2}(n-1)\,s/\sqrt{n}$
• You replace σ by s and $z_{\alpha/2}$ by $t_{\alpha/2}(n-1)$

Lecture 7 Review: Student’s t-Distribution
Take 95% confidence, α = 0.05, so $z_{\alpha/2}$ = 1.96.

n      n - 1    $t_{\alpha/2}(n-1)$
3      2        4.303
10     9        2.262
30     29       2.045
121    120      1.98
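A minimal sketch of the two interval formulas, using only the 95% critical values tabulated above, so with σ unknown it handles just n = 3, 10, 30 and 121. The helper name `ci_95` and the data [1, 2, 3] are hypothetical.

```python
import math
from statistics import mean, stdev

# 95% critical values t_{alpha/2}(n-1) from the table, keyed by degrees of freedom n - 1
T_CRIT_95 = {2: 4.303, 9: 2.262, 29: 2.045, 120: 1.98}
Z_CRIT_95 = 1.96


def ci_95(data, sigma=None):
    """95% CI for the population mean.

    sigma known:   Xbar +/- z_{alpha/2} * sigma / sqrt(n)
    sigma unknown: Xbar +/- t_{alpha/2}(n-1) * s / sqrt(n)
    """
    n = len(data)
    xbar = mean(data)
    if sigma is not None:
        half = Z_CRIT_95 * sigma / math.sqrt(n)
    else:
        df = n - 1
        if df not in T_CRIT_95:
            raise ValueError(f"no tabulated t value for df = {df}")
        half = T_CRIT_95[df] * stdev(data) / math.sqrt(n)  # stdev uses divisor n - 1
    return xbar - half, xbar + half


print(ci_95([1.0, 2.0, 3.0]))             # sigma unknown: t with 2 df
print(ci_95([1.0, 2.0, 3.0], sigma=1.0))  # sigma known: z
```

Here s = 1 happens to equal the assumed σ, so the gap between the two intervals (about 2 ± 2.48 versus 2 ± 1.13) comes entirely from 4.303 versus 1.96.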
Paired Comparisons
• We have shown how to test
  $H_0$: population mean difference = 0;
  $H_A$: population mean difference ≠ 0;
using t-statistics (confidence intervals and tests).
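A paired t-test is the one-sample t-test applied to the within-subject differences. A minimal sketch computing the t-statistic and its degrees of freedom (the p-value needs t tables or software); the helper `paired_t` and the before/after numbers are hypothetical.

```python
import math
from statistics import mean, stdev


def paired_t(before, after):
    """t-statistic for H0: population mean difference = 0, from paired measurements."""
    if len(before) != len(after):
        raise ValueError("paired data must have equal lengths")
    diffs = [a - b for a, b in zip(after, before)]  # one difference per subject
    n = len(diffs)
    se = stdev(diffs) / math.sqrt(n)                # standard error of the mean difference
    return mean(diffs) / se, n - 1                  # statistic and its degrees of freedom


before = [10, 12, 9, 11]  # hypothetical first measurement per subject
after = [11, 14, 9, 13]   # hypothetical second measurement on the same subjects
t_stat, df = paired_t(before, after)
print(f"t = {t_stat:.3f} on {df} df")
```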
Paired Comparisons
• Unfortunately, it often happens (as it does for the hormone assay) that the differences between two variables have many outliers.
• We know that outliers affect the sample mean and especially the sample standard deviation, making the latter larger.
• Larger standard deviations mean wider confidence intervals and hence less power.
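A quick illustration with made-up differences of how a single outlier inflates the sample mean and, much more, the sample standard deviation:

```python
from statistics import mean, stdev

# Hypothetical paired differences: a clean set, then the same set with one wild value
clean = [1.0, 2.0, 3.0, 4.0, 5.0]
with_outlier = [1.0, 2.0, 3.0, 4.0, 50.0]

for label, d in (("clean", clean), ("with outlier", with_outlier)):
    print(f"{label:13s} mean = {mean(d):6.2f}  sd = {stdev(d):6.2f}")
```

One bad value moves the mean from 3 to 12 but multiplies the standard deviation by more than 13 (about 1.6 to about 21.3), and the confidence interval widens in proportion.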
Paired Comparisons
• There are two alternative methods that are not so affected by outliers
• These are the Wilcoxon signed rank test and the sign test
• Both are available in SPSS: “Analyze”, “Nonparametric Tests”, “2 Related Samples”, and also click on “Sign” to get the sign test.
Paired Comparisons
• The sign test is simple: recode the data
  +1 = positive difference
  0 = no difference
  -1 = negative difference
• Then run a t-test and compute the p-value
• Problem: no confidence intervals
• Serves as a check on t-inferences
HAND EXAMPLE

Data   Signs
 -2     -1
 -1     -1
  3      1
  5      1
  8      1
  9      1
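Applied to the hand example, the recoding and the t-test on the signs look like the sketch below. For comparison it also computes the exact two-sided binomial p-value (zero differences dropped), which is the textbook form of the sign test and a different calculation from the t-on-signs recipe above. The function names are hypothetical.

```python
import math
from statistics import mean, stdev


def signs(diffs):
    """Recode each paired difference as +1, 0 or -1."""
    return [(d > 0) - (d < 0) for d in diffs]


def one_sample_t(values):
    """t-statistic for H0: population mean = 0."""
    return mean(values) / (stdev(values) / math.sqrt(len(values)))


def exact_sign_test_p(diffs):
    """Two-sided exact binomial sign test p-value; zero differences are dropped."""
    s = [x for x in signs(diffs) if x != 0]
    n, k = len(s), sum(1 for x in s if x > 0)
    tail = min(k, n - k)  # size of the rarer sign
    p = 2 * sum(math.comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(p, 1.0)


data = [-2, -1, 3, 5, 8, 9]  # the hand-example differences
print(signs(data))
print(one_sample_t(signs(data)))
print(exact_sign_test_p(data))
```

Four of the six differences are positive, and the exact binomial p-value is 2 × (1 + 6 + 15) / 64 = 0.6875: six observations carry very little information about direction.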
Paired Comparisons
• The Wilcoxon signed rank test is simple: recode the data!
• Take the absolute values of the data
• Order the absolute values from smallest to largest
• To the smallest absolute value, assign the number -1 if the actual difference is negative, 0 if there is no difference, +1 if the difference is positive
Paired Comparisons
• The Wilcoxon signed rank test is simple:
• To the jth absolute value in order, assign the number -j if the actual difference is negative, 0 if there is no difference, +j if the difference is positive
• Then run a t-test and compute the p-value
• Problem: no confidence intervals
• Serves as a check on t-inferences
HAND EXAMPLE

Data   Absolute   Rank   Signed Rank
 -2       2         2        -2
 -1       1         1        -1
  3       3         3         3
  5       5         4         4
  8       8         5         5
  9       9         6         6

(Run a t-test on these signed ranks)
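A sketch of the signed-rank recoding for the hand example, followed by the t-test on the signed ranks. Tied absolute values receive the average of their ranks; the slides do not address ties, so that (common) convention is an assumption here. `signed_ranks` is a hypothetical name.

```python
import math
from statistics import mean, stdev


def signed_ranks(diffs):
    """Rank |d| from smallest (rank 1) to largest, then attach the sign of d.

    Tied absolute values share the average of their ranks; zero differences get 0.
    """
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # extend j across a run of tied absolute values
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return [math.copysign(r, d) if d != 0 else 0.0 for r, d in zip(ranks, diffs)]


data = [-2, -1, 3, 5, 8, 9]  # the hand-example differences
sr = signed_ranks(data)
t = mean(sr) / (stdev(sr) / math.sqrt(len(sr)))  # t-test on the signed ranks
print(sr)
print(round(t, 3))
```

The signed ranks come out as -2, -1, 3, 4, 5, 6, matching the table, and the t-statistic computed on them is about 1.87.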
Armspan Data

Sign test p-value = 0.486

Wilcoxon signed rank test p-value = 0.281

t-test p-value = 0.282

All consistent!
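Hormone Assay Data
• Remember that in the hormone assay data, we seemed to get different inferences depending on whether we used the raw data or their logarithms
• The sign test is not affected by increasing transformations such as the logarithm, because they do not change the sign of a paired difference
• The Wilcoxon test may be slightly affected by transformations when studying paired comparisons, because a transformation can change the ordering (and hence the ranks) of the absolute differences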
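A small made-up example of why the two tests react differently to taking logarithms. For positive x and y, log x - log y has the same sign as x - y, so the signs (and the sign test) cannot change, but the ordering of the absolute differences can. The pairs below are hypothetical.

```python
import math


def sign(x):
    """+1, 0 or -1 according to the sign of x."""
    return (x > 0) - (x < 0)


def ranks_of_abs(diffs):
    """Rank of |d| (1 = smallest); assumes no ties, which holds for the data below."""
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0] * len(diffs)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return ranks


# Hypothetical positive paired measurements (x, y), e.g. two assay readings per subject
pairs = [(2.0, 1.0), (12.0, 10.0), (5.0, 8.0)]

raw = [x - y for x, y in pairs]
logged = [math.log(x) - math.log(y) for x, y in pairs]

print([sign(d) for d in raw], [sign(d) for d in logged])  # identical signs
print(ranks_of_abs(raw), ranks_of_abs(logged))            # the ranks change
```

The signs stay (+, +, -) on both scales, while the ranks of the absolute differences go from 1, 2, 3 to 3, 1, 2.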
Hormone Assay Data
• t-test on raw data, p = 0.244
• t-test on log data, p = 0.000
• Sign test, logged or raw data, p = 0.001
• Wilcoxon signed rank test, raw data, p = 0.016; logged data, p = 0.000
• Remember, I claimed that the data were most nearly bell-shaped on the log scale, and hence concluded that there was a difference!
Comparing Two Population Means
• A great deal of our effort will go into comparing population means.
• Bluebonnet Heights on red petals: does environment matter?
• Are true building costs different in Bryan and College Station, after accounting for land valuation?
Comparing Two Population Means
• We’ll use all our methods
• Histograms, boxplots, q-q plots, confidence intervals, nonparametric tests
Comparing Two Populations
• There are two populations
• Take a sample from each population
• The sample sizes need not be the same
Population 1: $n_1$
Population 2: $n_2$
Comparing Two Populations
• Each sample will have a sample standard deviation
Population 1: $s_1$
Population 2: $s_2$
Comparing Two Populations
• Each sample will have a sample mean
Population 1: $\bar{X}_1$
Population 2: $\bar{X}_2$
• Those are the statistics. What are the parameters?
Comparing Two Populations
• Each population will have a population standard deviation
Population 1: $\sigma_1$
Population 2: $\sigma_2$
Comparing Two Populations
• Each population will have a population mean
Population 1: $\mu_1$
Population 2: $\mu_2$
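A minimal sketch, with made-up data, of the sample statistics that estimate these parameters: the sample size, sample mean and sample standard deviation for each group, plus the difference of the sample means as an estimate of the difference of the population means.

```python
from statistics import mean, stdev

# Hypothetical samples; the sample sizes need not be equal
sample1 = [5.1, 4.8, 6.0, 5.5, 5.2]
sample2 = [4.2, 4.9, 4.4, 4.0]

for name, x in (("Population 1", sample1), ("Population 2", sample2)):
    # n, Xbar and s estimate nothing about n itself, but Xbar estimates mu and s estimates sigma
    print(f"{name}: n = {len(x)}, mean = {mean(x):.3f}, s = {stdev(x):.3f}")

# The difference of sample means estimates mu1 - mu2
print(f"estimated difference = {mean(sample1) - mean(sample2):.3f}")
```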
Comparing Two Populations
• How do we compare the population means $\mu_1$ and $\mu_2$?
• The usual way is to take their difference: $\Delta = \mu_1 - \mu_2$
• If the population means are equal, what is their difference?
Comparing Two Populations
• The usual way is to take their difference: $\Delta = \mu_1 - \mu_2$
• If the population means are equal, their difference = 0
• Suppose we form a confidence interval for the difference. What do we learn?
• Say a 95% CI is from 1 to 3?
Comparing Two Populations
• The usual way is to take their difference: $\Delta = \mu_1 - \mu_2$
• Suppose we form a confidence interval for the difference. What do we learn?
• Say a 95% CI is from 1 to 3?
• Population 1 has a mean that is between 1 and 3 units larger than that of population 2, with 95% confidence
Comparing Two Populations

Before learning how this confidence interval is
computed, let’s look at an example.
NHANES Comparison


“Analyze”, “Compare Means”, “Independent
Samples” will get you the analysis in SPSS
You will get lots and lots of things, so we
have to be a little careful

First do the plots, then the analysis!

You will get means and standard errors
NHANES Comparison
[Figure: boxplots of Log(Saturated Fat) by Health Status: Cancer (N = 59) and Healthy (N = 60); one point is labeled 119.]
NHANES Comparison (Cancer Cases)
[Figure: Normal Q-Q plot of Log(Saturated Fat) for the cancer cases; Expected Normal Value vs. Observed Value.]
NHANES Comparison (Healthy Cases)
[Figure: Normal Q-Q plot of Log(Saturated Fat) for the healthy cases; Expected Normal Value vs. Observed Value.]
NHANES Comparison
• Healthy: Mean = 2.9905, s = .6173, se = .00769
• Cancer: Mean = 2.6969, s = .6423, se = .00836
• Note: The sample standard deviations are numerically nearly equal. This agrees with the boxplots, where the IQRs are nearly equal
• Note how small the standard errors are
NHANES Comparison
• The next thing is that there will be two rows, one for “Equal Variances Assumed”, the other for “Equal Variances Not Assumed”
• Because we have been careful (plots first), we can see that the variability in the two groups is in the same ballpark. Thus I would assume equal variances
NHANES Comparison
• What happens if the variances do not look equal?
• Generally the results are not very different unless the sample sizes are quite small.
• Generally, people quote the “Equal Variances Assumed” p-values and CI
• You have a backup, nonparametric rank tests, which we will discuss later. It’s pretty hard to make a huge blunder
NHANES Comparison

The “Mean Difference” is 0.2937. Since the
healthy cases had a higher mean, this is
Mean(Healthy) – Mean(Cancer)

The 95% CI is from 0.065 to 0.5223

What is this a CI for?
NHANES Comparison
• The “Mean Difference” is 0.2937. Since the healthy cases had a higher mean, this is Mean(Healthy) – Mean(Cancer)
• The 95% CI is from 0.065 to 0.5223
• What is this a CI for? In the log scale, healthy women eat between 0.065 and 0.5223 more saturated fat than women who developed breast cancer, with 95% confidence.
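The slides defer the formula, but as a hedged check, the standard pooled-variance (equal variances) interval, (mean1 - mean2) ± t · s_p · √(1/n1 + 1/n2), reproduces the reported numbers. Assumptions: group sizes of 60 (healthy) and 59 (cancer) as read from the boxplot, and the tabulated value 1.98 for 120 degrees of freedom standing in for the exact t value at 117 degrees of freedom. `pooled_ci` is a hypothetical helper.

```python
import math


def pooled_ci(n1, mean1, s1, n2, mean2, s2, t_crit):
    """CI for mu1 - mu2 assuming equal variances (pooled standard deviation)."""
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)  # pooled variance
    se = math.sqrt(sp2) * math.sqrt(1 / n1 + 1 / n2)                  # SE of the difference
    diff = mean1 - mean2
    return diff - t_crit * se, diff + t_crit * se


# Healthy vs Cancer summaries from the slides; sample sizes read from the boxplot (assumption)
lo, hi = pooled_ci(60, 2.9905, 0.6173, 59, 2.6969, 0.6423, t_crit=1.98)
print(f"95% CI for Mean(Healthy) - Mean(Cancer): ({lo:.4f}, {hi:.4f})")
```

This gives roughly 0.065 to 0.522, in agreement with the reported 0.065 to 0.5223 up to rounding.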