Let’s revisit the t-test and add Analysis of Variance

T-Test
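• Two Sample t-test
• Comparing two sample means.

$$ t = \frac{\text{Signal}}{\text{Error}} = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{S_{X_1}^2}{n_1} + \dfrac{S_{X_2}^2}{n_2}}} $$

Signal: the difference between the two sample means.
Error: the standard error of the mean differences.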
It is evident from the formula
that the smaller the
variability, the larger the t
value.
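The relationship between variability and t can be checked numerically. The following is a minimal sketch using only the Python standard library; the function name `two_sample_t` and both data sets are hypothetical and chosen only so that the two comparisons share the same mean difference but differ in spread.

```python
import math
import statistics


def two_sample_t(x1, x2):
    """t = (mean1 - mean2) / sqrt(s1^2/n1 + s2^2/n2), following the slide formula."""
    signal = statistics.mean(x1) - statistics.mean(x2)
    error = math.sqrt(statistics.variance(x1) / len(x1)
                      + statistics.variance(x2) / len(x2))
    return signal / error


# Hypothetical data: both pairs have means of 3 and 5, but different spread.
wide_a, wide_b = [1, 2, 3, 4, 5], [3, 4, 5, 6, 7]
narrow_a, narrow_b = [2, 3, 3, 3, 4], [4, 5, 5, 5, 6]

t_wide = two_sample_t(wide_a, wide_b)
t_narrow = two_sample_t(narrow_a, narrow_b)

print(f"t with more variability: {t_wide:.3f}")
print(f"t with less variability: {t_narrow:.3f}")
```

With the same signal, the pair with less variability yields the t value that is larger in magnitude.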
Formulas of variation
Variance:
$$ S^2 = \frac{\sum (X - \bar{X})^2}{n - 1} $$
Standard Deviation (SD):
$$ SD = \sqrt{S^2} $$
Standard error of the mean (SEM):
$$ SEM = \frac{SD}{\sqrt{n}} $$
Let’s take an output from a t-test analysis
Example from the PASW tutorial
Independent Samples Test
Dependent variable: "This is the dependent variable for weight"

Levene's Test for Equality of Variances
                               F       Sig.
Equal variances assumed        1.138   .300
Equal variances not assumed

Levene’s test determines if the variance in one group is different from the other. This is an important assumption. Here Sig. = .300 is greater than .05, so equal variances can be assumed.

t-test for Equality of Means
                               t        df       Sig. (2-tailed)   Mean Difference   Std. Error Difference
Equal variances assumed        -8.462   18       .000              -32.50000         3.84086
Equal variances not assumed    -8.462   17.648   .000              -32.50000         3.84086

The results are significant. Sig. (2-tailed) is the p value, i.e. the probability of obtaining a difference at least this large if the null hypothesis is true; it is the risk of a Type 1 error we accept by rejecting the null hypothesis.

t-test for Equality of Means (cont’d)
                               95% Confidence Interval of the Difference
                               Lower       Upper
Equal variances assumed        -40.56935   -24.43065
Equal variances not assumed    -40.58090   -24.41910
Confidence intervals
Confidence interval: Definition
• In statistics, a confidence interval (CI) is a
particular kind of interval estimate of a
population parameter and is used to indicate the
reliability of an estimate. It is an observed
interval (i.e. it is calculated from the
observations), in principle different from sample
to sample, that frequently includes the
parameter of interest, if the experiment is
repeated.
• http://en.wikipedia.org/wiki/Confidence_interval
Confidence intervals
• Confidence intervals can be for a variety of
statistics.
– Means, t statistics, etc…
• For the mean difference as seen in the t-test output, the confidence interval
encompasses 95% of all expected mean differences
given the error estimated from our data.
– Thus for our example, we expect to obtain a
mean difference between -40.57 and -24.43 (inclusive) 95% of the time
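The interval in the SPSS output can be reproduced from the reported mean difference and standard error. This is a minimal sketch; the critical t of 2.101 for 18 degrees of freedom (two-tailed, alpha = .05) is taken from standard t tables and is not shown on the slides.

```python
# Values copied from the "Equal variances assumed" row of the output.
mean_diff = -32.5
se_diff = 3.84086
df = 18

# Assumption: two-tailed critical t for alpha = .05 and df = 18 (from t tables).
t_crit = 2.101

t_obs = mean_diff / se_diff            # the t statistic is signal / error
lower = mean_diff - t_crit * se_diff   # lower bound of the 95% CI
upper = mean_diff + t_crit * se_diff   # upper bound of the 95% CI

print(f"t = {t_obs:.3f}")
print(f"95% CI of the difference: {lower:.2f} to {upper:.2f}")
```

The bounds agree with the reported Lower and Upper values to about two decimal places; the small remainder comes from rounding the critical t.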
CI cont’d
• As we can see, the value of zero is not within
that CI (-40.57 to -24.43). Therefore, we would reject the
null hypothesis.
– Indeed, the p value obtained is less than
.05
The value of CI
• In most experimental work, investigators
simply report the inferential statistic, the p
value and, sometimes power.
• In many clinical papers, CI is reported, as
clinicians feel that the range of possible
values is more informative.
CI cont’d
• If we know the population values of a normal
distribution, we use the Z statistic for the number
of SD away from the mean. Thus, for a 95% CI the exact
values would be +/- 1.96 SD.
• When we don’t have the population values, we use a t
statistic for the number of SD away from the
mean, which varies depending on the sample
size (see example in the next few slides).
CI cont’d
• Any values within the CI could be
considered common values and many
physicians would regard those values as
normal. However, that would have to be
determined against many other measures
where a pattern would be obtained.
CI example
Scores: 100, 115, 125, 111, 123, 198
n = 6
Df = 6 - 1 = 5
Mean = 128.67
SS = 6173.3

$$ SD = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} = \sqrt{\frac{6173.3}{5}} = \sqrt{1234.67} = 35.14 $$

$$ SEM = \frac{SD}{\sqrt{n}} = \frac{35.14}{\sqrt{6}} = 14.34 $$
CI cont’d
• For 5 degrees of freedom the critical t is
2.571 (taken from the t-test tables).
• Distance from the mean = 14.34 × 2.571 =
±36.88
• CI = 128.67 ± 36.88 = 91.79 to 165.55
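The worked CI example can be retraced step by step. The sketch below uses only the Python standard library with the six scores and the critical t of 2.571 from the slides; the variable names are illustrative.

```python
import math
import statistics

scores = [100, 115, 125, 111, 123, 198]
n = len(scores)

mean = statistics.mean(scores)
ss = sum((x - mean) ** 2 for x in scores)   # sum of squared deviations
sd = math.sqrt(ss / (n - 1))                # sample standard deviation
sem = sd / math.sqrt(n)                     # standard error of the mean

t_crit = 2.571                              # critical t for df = 5, from the slide
margin = t_crit * sem                       # distance from the mean
ci_low, ci_high = mean - margin, mean + margin

print(f"mean={mean:.2f} SS={ss:.1f} SD={sd:.2f} SEM={sem:.2f}")
print(f"95% CI: {ci_low:.2f} to {ci_high:.2f}")
```

Carrying full precision through every step, rather than rounding SD and SEM first, can shift the last reported digit slightly.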
Tests of normality
• As we discussed before, one of the assumptions of
parametric statistics is that the samples come from normally
distributed populations.
• We can test whether or not the samples come
from normally distributed populations.
• The tests are:
• Shapiro-Wilk Test for samples of fewer than 50, although it can handle
larger sample sizes.
• Kolmogorov-Smirnov Test which is quite suitable for large
sample sizes.
Example of output from SPSS
Tests of Normality
                 Kolmogorov-Smirnov(a)        Shapiro-Wilk
                 Statistic   df   Sig.        Statistic   df   Sig.
TestVariable     .375        6    .008        .741        6    .016
a. Lilliefors Significance Correction
We can see here that the data is not normally distributed: both tests are significant (Sig. < .05).
Clearly not normal
What to do when data is not normal
• 1. Transform the data using various
formulas suited for the shape of the data.
– Square root.
– Inverse cubed
– Log base 10
– Ln
– Etc
• 2. Use nonparametric statistics that are
insensitive to violations including shape.
Nonparametric tests
• Since we have been discussing the t-test
we will offer an alternative to it.
• There are two:
– Mann-Whitney U test
– Wilcoxon Rank Sum test
• Both provide identical results. The story is that
both were independently developed at the same
time.
Analysis of Variance (ANOVA)
One-Way Analysis of Variance
ANOVA aka Single Factor Analysis of Variance
1) when is a one-way analysis of variance
used?
2) sources of variation: generally from
treatment and from individual differences.
3) an example of a one-way analysis
of variance
4) assumptions underlying F-distribution
When would you use a one-way analysis of
variance?
Example 1:
-What if you were interested in investigating the efficacy
of 3 types of medication as headache remedies?
-You would need to consider…
-IV: medication Type:
Subjects would be randomly allocated to one of
three levels; 1) Tylenol, 2) Bayer, or 3) Advil
condition
-DV: elapsed time (in minutes) from ingesting the
medication to reporting disappearance of headache


1. Analysis of variance is mostly used when you have more than 2 means.
2. F = t² when you have only two groups.
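The equivalence between F and the squared t for two groups can be verified numerically. This is a sketch under stated assumptions: the two groups are hypothetical data, `pooled_t` uses the pooled-variance (equal variances assumed) form of the t-test, and `one_way_f` is an illustrative helper, not code from the lecture.

```python
import math
import statistics


def pooled_t(x1, x2):
    """Independent-samples t with a pooled variance estimate."""
    n1, n2 = len(x1), len(x2)
    pooled_var = ((n1 - 1) * statistics.variance(x1)
                  + (n2 - 1) * statistics.variance(x2)) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (statistics.mean(x1) - statistics.mean(x2)) / se


def one_way_f(groups):
    """F = mean square between groups / mean square within groups."""
    scores = [x for g in groups for x in g]
    grand_mean = sum(scores) / len(scores)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (len(groups) - 1)) / (ss_within / (len(scores) - len(groups)))


# Hypothetical two-group data.
group_1 = [1, 2, 3, 4, 5]
group_2 = [3, 4, 5, 6, 7]

t_value = pooled_t(group_1, group_2)
f_value = one_way_f([group_1, group_2])
print(f"t = {t_value:.3f}, t squared = {t_value ** 2:.3f}, F = {f_value:.3f}")
```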
Example 2:
-What if we wanted to know whether or not the
household income of adults was different depending on
political affiliation?
-in this case we have 5 groups, representing the
political parties.
They are: Liberal, NDP, PC, Reform, Bloc
Note: This was before the PC and Reform and Canadian
Alliance merged.
IV (grouping variable): preferred party with 5 levels
DV (the variable whose values will be influenced by the
IV), which is household income
Conceptual basis of analysis of variance
***We want to explain why people differ from each other
-is it because of your treatment variable (independent
variable)?
-or is it just random variation (error)?
i.e., want to track down the sources of variation
e.g., let's investigate how often UWO students go home
during 1 semester
Here is a random sample of 12 students and the number
of times they go home in a semester.
8, 4, 6, 1, 7, 5, 2, 7, 4, 3, 7, 4
Now we allocate subjects to the distance they have to travel if they
wish to visit the homestead:
< 2 hours drive: 8, 7, 7, 5
2 to 4 hours drive: 6, 7, 4, 4
> 4 hours drive: 3, 1, 2, 4
From the one-way analysis of variance we will be
able to identify two sources of variance:
1) distance from home to UWO (treatment or
categorization)
2) residual variation that could be due to lots of
things (this is the variation that cannot be
explained by your IV) or error
This is exactly what happens in an
Analysis of Variance
• variation is broken down into 2 components:
1. variation due to differences between groups
2. variation due to differences within groups
• the analysis measures whether the between groups
variance is larger than would be expected by chance
by comparing it to the variance within groups
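The split into between-groups and within-groups variation can be illustrated with the UWO travel data from the earlier slide. This is a minimal sketch; the dictionary keys and variable names are illustrative.

```python
# Number of trips home per semester, grouped by driving distance (from the slide).
groups = {
    "< 2 hours": [8, 7, 7, 5],
    "2 to 4 hours": [6, 7, 4, 4],
    "> 4 hours": [3, 1, 2, 4],
}

scores = [x for g in groups.values() for x in g]
grand_mean = sum(scores) / len(scores)
group_means = {name: sum(g) / len(g) for name, g in groups.items()}

# Total variation: every score against the grand mean.
ss_total = sum((x - grand_mean) ** 2 for x in scores)

# Between groups: each group mean against the grand mean, weighted by group size.
ss_between = sum(len(g) * (group_means[name] - grand_mean) ** 2
                 for name, g in groups.items())

# Within groups: each score against its own group mean.
ss_within = sum((x - group_means[name]) ** 2
                for name, g in groups.items() for x in g)

print(f"SS total   = {ss_total:.3f}")
print(f"SS between = {ss_between:.3f}")
print(f"SS within  = {ss_within:.3f}")
```

The two components add up to the total sum of squares, which is what allows the analysis to attribute variation either to distance or to residual error.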
Let’s expand on a previous example:
Data copied from an Excel worksheet
representing dollars in thousands.

         Bloc       Liberal    NDP        Reform     PC
         12         23         34         45         56
         14         21         34         46         57
         15         24         35         45         58
         11         22         36         44         59
         17         24         37         41         60
         16         25         38         42         68
Means    14.16667   23.16667   35.66667   43.83333   59.66667

Grand Mean = 35.3
Is the variation between means larger
compared to individual differences?
Do you remember the formula
for variance?

$$ S^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \text{sample variance} $$
The analysis of variance (F test) essentially uses the same
conceptual format.

$$ F = \frac{\sum_j n_j (\bar{x}_j - \bar{x}_{..})^2 / (J - 1)}{\sum_j \sum_i (x_{ij} - \bar{x}_j)^2 / (N - J)} $$

Numerator: between group (treatment) variance (J = number of groups)
Denominator: within group (individual subject) variance (N = total sample size)
Remember that the F test (ratio) is a statistic used to compare
the size of the variance from one source against another. For
us, it is comparing between group variance against individual
subject variance.
Assumptions associated with
the F distribution
1. Observations come from normally distributed populations.
2. Observations represent random samples from populations.
3. Population variances are equal.
4. Numerator and denominator of F ratio are independent.
The numerator and denominator would be dependent if a score or subject in 1 condition is contingent on having
some score or subject in another condition,
e.g., scores are dependent when a subject in 1 condition scoring high means that a
subject in another condition must score low
How would you construct an F-distribution?
1. Determine the number of levels and the number of subjects per level.
2. From a sample distribution, randomly sample with replacement.
3. With each sampling, calculate the F statistic.
4. Plot as many calculated Fs as possible to obtain a sampling distribution of Fs.
5. We can now determine beyond which point an F will be observed less than 5% of the time if sampling from the same population.
• This is called the critical F.
• The critical F changes depending on the number of levels and the number of subjects per level.
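The five steps above can be sketched as a small simulation. Several choices here are assumptions made for illustration: all groups are drawn from one standard normal population with `random.gauss`, 4 levels with 5 subjects each are used, and the number of simulated samples and the seed are arbitrary. The function names are hypothetical.

```python
import random


def f_statistic(groups):
    """Between-groups mean square divided by within-groups mean square."""
    scores = [x for g in groups for x in g]
    grand_mean = sum(scores) / len(scores)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (len(groups) - 1)) / (ss_within / (len(scores) - len(groups)))


def simulate_critical_f(levels, n_per_level, n_sims=4000, alpha=0.05, seed=1):
    """Estimate the F value exceeded only alpha of the time under the null."""
    rng = random.Random(seed)
    fs = []
    for _ in range(n_sims):
        # Every level is sampled from the SAME population (assumed normal here).
        groups = [[rng.gauss(0, 1) for _ in range(n_per_level)]
                  for _ in range(levels)]
        fs.append(f_statistic(groups))
    fs.sort()
    return fs[int((1 - alpha) * n_sims)]


critical_f = simulate_critical_f(levels=4, n_per_level=5)
print(f"Simulated critical F for 4 levels x 5 subjects: {critical_f:.2f}")
```

With 4 levels and 5 subjects per level the degrees of freedom are 3 and 16, so the simulated value should land near the tabled critical F for those degrees of freedom; the exact figure varies with the seed and the number of simulations.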
F-Distribution
Determination of an F
critical from a probability
density function.
The F critical depends on
the number of levels and the
number of subjects used in
each sample.
One-way analysis of variance
Example Problem
A researcher was interested in whether or not various
cholesterol-reducing drugs called statins actually resulted in a
decrease of blood serum low-density lipoprotein (LDL). These
drugs work by inhibiting “HMG CoA
reductase”, a rate-controlling enzyme for the production of
cholesterol. Male subjects with higher than recommended
cholesterol levels (>160 mg/dL) were randomly assigned to one of
four levels of the IV called “LDL Reducing Drugs”. The DV is
the LDL amount in blood in mg/dL.
1. Atorvastatin
2. Fluvastatin
3. Simvastatin
4. Regular treatment not consisting of a statin.
Three weeks after being prescribed the compound, all
subjects were asked to visit the research clinic and have their LDL
levels measured.
Hypotheses
µ refers to mean of the population
H0: µA = µF = µS = µR
(null)
H1: not all means are equal (alternate)
Note: You may have noticed that the alternate hypothesis simply states that not all
means are equal. The analysis that we will conduct here simply determines if there are
means which are not equal (this is an omnibus test). The analysis will not specify which
means are different from one another. Following the ANOVA you will have to conduct
post hoc analyses, which we will study later in the lecture.
The data
IV in column 1 (Statin):
1=Atorvastatin
2=Fluvastatin
3=Simvastatin
4=Regular treatment.
DV in column 2 (LDL):
Measurements in mg/dL.

Statin   LDL
1        110
1        103
1        90
1        94
1        101
2        120
2        115
2        113
2        105
2        114
3        100
3        101
3        110
3        106
3        104
4        150
4        144
4        129
4        133
4        130
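Results

$$ F = \frac{\sum_j n_j (\bar{x}_j - \bar{x}_{..})^2 / (J - 1)}{\sum_j \sum_i (x_{ij} - \bar{x}_j)^2 / (N - J)} = \frac{SS_{BG} / df_{BG}}{SS_{WG} / df_{WG}} = \frac{MS_{BG}}{MS_{WG}} $$

SS_BG / df_BG: sum of squares between groups / df = mean squares between groups
SS_WG / df_WG: sum of squares between individuals (within groups) / df = mean squares between individuals

$$ F = \frac{4206.8 / 3}{774.00 / 16} = \frac{1402.267}{48.375} = 28.987 $$

$$ F_{\alpha = .05}(3, 16) = 3.24 $$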
Since the obtained value is larger compared to the critical value, we can reject the
null hypothesis that all samples come from the same population. Hence, a significant
treatment effect is observed and we can make a statement that statins have an effect.
See tables in the next two slides for the critical values.
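The hand calculation can be retraced in a few lines. This is a minimal sketch using the LDL data and the critical value of 3.24 from the slides; the dictionary layout and variable names are illustrative.

```python
# LDL measurements (mg/dL) per drug level, copied from the data slide.
ldl = {
    "Atorvastatin": [110, 103, 90, 94, 101],
    "Fluvastatin": [120, 115, 113, 105, 114],
    "Simvastatin": [100, 101, 110, 106, 104],
    "Regular treatment": [150, 144, 129, 133, 130],
}

scores = [x for g in ldl.values() for x in g]
grand_mean = sum(scores) / len(scores)
group_means = {name: sum(g) / len(g) for name, g in ldl.items()}

ss_between = sum(len(g) * (group_means[name] - grand_mean) ** 2
                 for name, g in ldl.items())
ss_within = sum((x - group_means[name]) ** 2
                for name, g in ldl.items() for x in g)

df_between = len(ldl) - 1             # J - 1
df_within = len(scores) - len(ldl)    # N - J

ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_ratio = ms_between / ms_within

f_critical = 3.24                     # F(3, 16) at alpha = .05, from the slide
print(f"SS between = {ss_between:.1f}, SS within = {ss_within:.1f}")
print(f"MS between = {ms_between:.3f}, MS within = {ms_within:.3f}")
print(f"F = {f_ratio:.3f}, reject H0: {f_ratio > f_critical}")
```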
F table of critical values, with the DF for treatment and the DF for error marked.
How to use tables:
http://www.statsoft.com/textbook/distribution-tables/
F table cont’d
Results from SPSS
SPSS ANOVA output, with the Treatment and Error rows marked.
SPSS results match our hand calculations.
Testing the assumptions
1) Normal distribution: Use Shapiro-Wilk’s test of
normality
2) Random sampling: make sure that you sample
randomly, but we will have to take your word for it.
3) Equal variances: tests of homogeneity of variances
can be used (e.g., Levene's test).
4) Numerator and denominator are independent: if
samples are random, can assume that this is true.
Failures to meet the assumptions
1) F distribution is not terribly affected by small departures. Can
transform data if you expect a large departure from normality.
2) Not randomly sampling the population can be problematic.
This can be the case if you hand pick samples. Conclusions
don’t generalize to the population.
3) Can be a problem if variances are extremely different or if
sample sizes are unequal. Can transform data or use a
nonparametric test.
4) Don't have subjects' scores be dependent on one another.
Comparing Groups
• The analysis of variance does not determine
specific group differences.
• We could use the t-test but we would end up
with an unacceptable family wise error (FW).
– FW is the accumulation of Type1 errors committed
with every comparison.
• Three comparisons using the t-test at alpha = .05 would mean we have a FW
of about 0.15 (3 × .05), meaning that there is roughly a 15% chance that at least one
comparison shows a significant difference between the means
due to chance alone.
– We can correct this with a Bonferroni correction
• BC = per comparison alpha (PCa) / number of comparisons.
• This value becomes the new PCa.
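The accumulation of Type 1 error and the Bonferroni adjustment can be put into numbers. This sketch assumes the comparisons are independent, in which case the family-wise rate is 1 - (1 - alpha)^k; the simpler product k × alpha is an upper bound on it. The function names are illustrative.

```python
from math import comb


def familywise_error(alpha, n_comparisons):
    """Chance of at least one Type 1 error, assuming independent comparisons."""
    return 1 - (1 - alpha) ** n_comparisons


def bonferroni_alpha(alpha, n_comparisons):
    """Per comparison alpha after the Bonferroni correction."""
    return alpha / n_comparisons


fw_three = familywise_error(0.05, 3)
upper_bound_three = 3 * 0.05
corrected_three = bonferroni_alpha(0.05, 3)

# Number of pairwise comparisons among 4 groups, as in a four-level design.
n_pairs = comb(4, 2)
fw_all_pairs = familywise_error(0.05, n_pairs)

print(f"FW for 3 comparisons: {fw_three:.4f} (upper bound {upper_bound_three:.2f})")
print(f"Bonferroni per comparison alpha for 3 comparisons: {corrected_three:.4f}")
print(f"FW for all {n_pairs} pairwise comparisons of 4 groups: {fw_all_pairs:.4f}")
```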
Comparing Groups Cont’d
• The Bonferroni correction is somewhat
conservative.
– Type2 errors become more likely.
• It is recommended to use Tukey’s
Honestly Significant Difference test (HSD).
– This test is considered to be a good
compromise between Type1 and Type2
errors.
Tukey's HSD
(Honestly Significant Difference)
1) used to test for a significant difference between each pair of means
2) a post-hoc test
i.e., you didn't plan to do that specific test ahead of time;
you're reacting to a significant result after you found it
controls for the Type I error rate (α) across a bunch of tests (called
family-wise α)
3) only used if:
(a) The ANOVA is significant.
(b) The main effect has more than two groups.
4) calculate q, where:
n = # of subjects/group
MSerror = within groups mean square from the
ANOVA table

$$ q = \frac{\bar{X}_i - \bar{X}_j}{\sqrt{MS_{error} / n}} $$
Our Statin example.
• q critical = 4.05, when you have 4 groups and 16 df for error.
• MSerror from the original analysis = 48.375
• n = 5
• Let’s compare the Atorvastatin group to the control group.
• Thus, we compare means of 99.6 and 137.2.

$$ q = \frac{\bar{X}_a - \bar{X}_c}{\sqrt{MS_{error} / n}} = \frac{99.6 - 137.2}{\sqrt{48.375 / 5}} = \frac{-37.6}{3.11} = -12.09 $$

Thus, these two groups are significantly different
from one another (the magnitude of q, 12.09, exceeds the critical value of 4.05).
Notice that I’m not concerned about direction. It’s the magnitude that matters here.
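The same q calculation can be repeated for every pair of groups. This is a minimal sketch that takes MSerror = 48.375, n = 5 and the critical q of 4.05 from the slides and recomputes the group means from the LDL data; it illustrates the formula and is not a substitute for the SPSS post hoc output.

```python
import math
from itertools import combinations

# LDL measurements (mg/dL) per drug level, copied from the data slide.
ldl = {
    "Atorvastatin": [110, 103, 90, 94, 101],
    "Fluvastatin": [120, 115, 113, 105, 114],
    "Simvastatin": [100, 101, 110, 106, 104],
    "Regular treatment": [150, 144, 129, 133, 130],
}

ms_error = 48.375   # within-groups mean square from the ANOVA
n = 5               # subjects per group
q_critical = 4.05   # 4 groups, 16 df for error, from the slide

standard_error = math.sqrt(ms_error / n)
means = {name: sum(g) / len(g) for name, g in ldl.items()}

q_values = {}
for a, b in combinations(ldl, 2):
    # Only the magnitude matters, so take the absolute difference.
    q_values[(a, b)] = abs(means[a] - means[b]) / standard_error

for (a, b), q in q_values.items():
    verdict = "significant" if q > q_critical else "not significant"
    print(f"{a} vs {b}: q = {q:.2f} ({verdict})")
```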
Percentage Points of the Studentized Range
Percentage points of the studentized range (cont'd)
Post Hoc Tests
Shown here are examples of
the Tukey and the Bonferroni
tests using data from our
fictitious study.
Homogeneous Subsets
This simply shows aggregates or subsets of groups that
are not different from one another.