Basics of Statistics
Research Statistics, Lecture 5
Test statistic: Group Comparison
Jobayer Hossain
Larry Holmes, Jr
October 30, 2008
Hypothesis Testing (Quantitative variable)
Hypothesis Testing Procedure
One group sample
– One sample t-test
– Sign test
Two-group sample
– Independent: Two-sample t-test, Mann-Whitney U test
– Not Independent: Paired t-test, Wilcoxon Signed Rank test
More than two groups sample
– Analysis of variance
– Kruskal Wallis test
One group sample - one sample t-test
Test for value of a single mean
E.g., test to see if mean SBP of all AIDHC
employees is 120 mm Hg
Assumptions
– Parent population is normal
– Sample observations (subjects) are independent
One group sample - one sample t-test
Formula
Let x1, x2, …, xn be a random sample from a normal population with mean µ and variance σ². Then the following statistic is distributed as Student's t with (n-1) degrees of freedom:
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$
where $\bar{x}$ is the sample mean and $s$ is the sample standard deviation.
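As a minimal sketch of the one-sample t statistic (sample mean minus hypothesized mean, divided by the standard error), the Python below uses only the standard library. The function name `one_sample_t` and the blood pressure readings are hypothetical, chosen for illustration, and are not the lecture's data.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """Return (t, df) for testing H0: population mean == mu0."""
    n = len(sample)
    mean = statistics.mean(sample)
    s = statistics.stdev(sample)            # sample SD, divides by n - 1
    t = (mean - mu0) / (s / math.sqrt(n))   # difference over the standard error
    return t, n - 1

# Hypothetical systolic blood pressure readings in mm Hg (illustration only).
sbp = [118, 125, 130, 121, 117, 128, 135, 122]
t_stat, df = one_sample_t(sbp, 120)
print(f"t = {t_stat:.3f} with {df} d.f.")
```

The resulting t would then be compared with a Student's t table at n-1 degrees of freedom to obtain the p-value.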
One group sample - one sample t-test
Computation in Excel:
– Excel does not have a 1-sample test, but we can fool
it.
– Suppose we want to test whether the mean height of pediatric patients in our data set 1 is 50 inches.
– Create a dummy column parallel to the hgt column with an equal number of cells, all set to 0.0.
– Run the matched-sample (paired) t-test using hgt and the dummy column, with 50 as the hypothesized mean difference.
– The p-value for the two-tailed test is 0.0092.
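The dummy-column trick works because the paired differences hgt - 0 are just the hgt values, so the paired test with a hypothesized difference of 50 reduces to the one-sample test. The sketch below checks this numerically; the helper names and the height values are hypothetical, not the values in data set 1.

```python
import math
import statistics

def one_sample_t(x, mu0):
    """One-sample t statistic for H0: mean == mu0."""
    return (statistics.mean(x) - mu0) / (statistics.stdev(x) / math.sqrt(len(x)))

def paired_t(x, y, hypothesized_diff=0.0):
    """Paired t statistic for H0: mean(x - y) == hypothesized_diff."""
    d = [a - b for a, b in zip(x, y)]
    return (statistics.mean(d) - hypothesized_diff) / (statistics.stdev(d) / math.sqrt(len(d)))

hgt = [48.0, 52.5, 55.0, 47.5, 53.0, 56.5]   # hypothetical heights in inches
dummy = [0.0] * len(hgt)                     # the all-zero dummy column

t_paired = paired_t(hgt, dummy, hypothesized_diff=50)
t_single = one_sample_t(hgt, 50)
print(t_paired, t_single)                    # the two statistics coincide
```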
One group sample - one sample t-test
Using SPSS:
– Analyze> Compare Means >One Sample T Test
> Select variable (e.g. height) > Test value: (e.g.
50) > ok
– P-value is .009
– Interpretation: The mean height of the pediatric
patients in our dataset 1 is statistically
significantly different from 50 inches.
One group sample - Sign Test
(Nonparametric)
Use:
(1) Compares the median of a single group with a specified value (used instead of the one-sample t-test).
Hypothesis: H0: Median = c
Ha: Median ≠ c
Test Statistic:
We take the differences of the observations from the hypothesized median (xi − c). Under H0, the number of positive differences follows a Binomial distribution with success probability 0.5. For large sample sizes, this distribution is approximately normal.
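A minimal sketch of the exact sign test follows, counting positive differences and using the Binomial(n, 0.5) distribution for the two-sided p-value. Dropping observations equal to c is a common convention assumed here, and the function name and height values are hypothetical.

```python
from math import comb

def sign_test(sample, c):
    """Two-sided exact sign test of H0: median == c.

    Returns (number of positive differences, number of non-zero differences, p-value).
    """
    diffs = [x - c for x in sample if x != c]   # assumption: values equal to c are dropped
    n = len(diffs)
    n_pos = sum(d > 0 for d in diffs)
    k = min(n_pos, n - n_pos)
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled for a two-sided test and capped at 1
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return n_pos, n, min(1.0, 2 * tail)

heights = [52, 47, 55, 58, 51, 49, 56, 53, 54, 57]   # hypothetical heights in inches
n_pos, n, p_value = sign_test(heights, 50)
print(n_pos, n, p_value)
```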
One group sample - Sign Test
(Nonparametric)
SPSS: Analyze> Nonparametric Tests>
Binomial
Two-group (independent) samples - two-sample t-statistic
Use
– Test for equality of two means
Assumptions
– Parent population is normal
– Sample observations (subjects) are
independent.
Two-group (independent) samples - two-sample t-statistic
Formula (two groups)
– Case 1: Equal Population Standard Deviations:
The following statistic is distributed as a t distribution with (n1+n2-2) d.f.
$$t = \frac{\bar{x}_1 - \bar{x}_2}{S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$
The pooled standard deviation,
$$S_p = \sqrt{\frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2}}$$
n1 and n2 are the sample sizes and S1 and S2 are the sample standard deviations of the two groups.
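The equal-variance case can be sketched in a few lines: pool the two sample variances weighted by their degrees of freedom, then divide the difference in means by the pooled standard error. The function name and the two small samples are hypothetical, for illustration only.

```python
import math
import statistics

def pooled_t(x1, x2):
    """Return (t, pooled SD, df) for the equal-variance two-sample t-test."""
    n1, n2 = len(x1), len(x2)
    s1_sq = statistics.variance(x1)   # sample variances (n - 1 denominators)
    s2_sq = statistics.variance(x2)
    sp = math.sqrt(((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2))
    t = (statistics.mean(x1) - statistics.mean(x2)) / (sp * math.sqrt(1 / n1 + 1 / n2))
    return t, sp, n1 + n2 - 2

group_f = [50, 52, 54, 56, 58]   # hypothetical heights, group f
group_m = [53, 55, 57, 59, 61]   # hypothetical heights, group m
t_stat, sp, df = pooled_t(group_f, group_m)
print(f"t = {t_stat:.3f}, Sp = {sp:.3f}, d.f. = {df}")
```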
Two-group (independent) samples - two-sample t-statistic
Formula (two groups)
– Case 2: Unequal population standard deviations
The following statistic follows a t distribution.
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
The d.f. of this statistic is,
$$v = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}$$
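For the unequal-variance case, the sketch below computes the t statistic and its approximate degrees of freedom from the two per-group terms s²/n. The function name, the `delta0` parameter (the hypothesized value of µ1 - µ2) and the data are hypothetical.

```python
import math
import statistics

def unequal_variance_t(x1, x2, delta0=0.0):
    """Return (t, df) for the two-sample t-test without assuming equal variances."""
    n1, n2 = len(x1), len(x2)
    v1 = statistics.variance(x1) / n1   # s1^2 / n1
    v2 = statistics.variance(x2) / n2   # s2^2 / n2
    t = ((statistics.mean(x1) - statistics.mean(x2)) - delta0) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

sample_1 = [10, 12, 14, 16, 18]   # hypothetical, more spread out
sample_2 = [11, 12, 13]           # hypothetical, less spread out
t_stat, df = unequal_variance_t(sample_1, sample_2)
print(f"t = {t_stat:.3f}, d.f. = {df:.2f}")
```

The degrees of freedom are generally not a whole number; they fall between the smaller of (n1-1, n2-1) and n1+n2-2.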
Two-group (independent) samples - two-sample t-statistic
MS Excel (in Tools -> Data Analysis…)
Two Groups (Independent Samples):
– t-Test: Two-Sample Assuming Equal Variances
– t-Test: Two-Sample Assuming Unequal Variances
Two-group (independent) samples - two-sample t-statistic
Using SPSS:
– Analyze> Compare Means> Independent-Samples T-test
– Select hgt as a Test Variable
– Select sex as a Grouping Variable
– In Define Groups, type f for Group 1 and m for Group 2
– Click Continue then OK
– It gives us the p-value 0.205. We can assume equal variances, as the p-value of the F statistic for testing equality of variances is 0.845.
Two-group (independent) samples - Wilcoxon Rank-Sum Test (Nonparametric)
Use: Compares medians of two independent
groups.
Corresponds to t-Test for 2 Independent Means
Test Statistic:
Let X and Y be two samples of sizes m and n, and let N = m + n. Rank all N observations together, from smallest to largest. Then the test statistic is
Wm = sum of the ranks of all observations of variable X.
Two-group (independent) samples - Wilcoxon Rank-Sum Test (Nonparametric)
Asthmatic score A        Asthmatic score B
Score    Rank            Score    Rank
71       1               85       5
82       3.5             82       3.5
77       2               94       8
92       7               97       9
88       6               ...      ...
Rank Sum 19.5            Rank Sum 25.5
The two scores of 82 tie for ranks 3 and 4, so each receives the average rank 3.5.
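The rank sums in this example can be reproduced with a short sketch: rank all scores together, give tied scores the average of the ranks they occupy, and sum the ranks per group. The helper `average_ranks` is a hypothetical name, and only the scores actually listed on the slide are used.

```python
def average_ranks(values):
    """Rank values from 1 (smallest); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j to the end of the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1            # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

score_a = [71, 82, 77, 92, 88]
score_b = [85, 82, 94, 97]               # only the scores listed on the slide

ranks = average_ranks(score_a + score_b)
w_a = sum(ranks[:len(score_a)])          # rank sum for group A
w_b = sum(ranks[len(score_a):])          # rank sum for group B
print(w_a, w_b)
```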
Two-group (independent) samples - Wilcoxon Rank-Sum Test (Nonparametric)
SPSS:
– Two Groups: Analyze> Nonparametric Tests> 2
Independent Samples
Two-group (matched) samples - paired t-statistic
Use: Compares equality of means of two
matched or paired samples (e.g. pretest
versus posttest)
Assumptions:
– Parent population is normal
– Sample observations (subjects) are
independent
Two-group (matched) samples - paired t-statistic
Formula
– The following statistic follows a t distribution with n-1 d.f.
$$t = \frac{\bar{d}}{s_d/\sqrt{n}}$$
where d is the difference between the two matched samples, $\bar{d}$ is the mean of d, and $s_d$ is the standard deviation of the variable d.
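A small sketch of the paired statistic follows: take the within-pair differences, then divide their mean by their standard error. The pretest and posttest scores and the function name are hypothetical.

```python
import math
import statistics

def paired_t_statistic(before, after):
    """Return (t, df) for H0: the mean of the paired differences is zero."""
    d = [a - b for a, b in zip(after, before)]   # one difference per subject
    n = len(d)
    t = statistics.mean(d) / (statistics.stdev(d) / math.sqrt(n))
    return t, n - 1

pretest = [72, 65, 80, 58, 77]    # hypothetical scores
posttest = [75, 70, 82, 63, 80]   # same subjects, same order
t_stat, df = paired_t_statistic(pretest, posttest)
print(f"t = {t_stat:.3f} with {df} d.f.")
```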
More on test statistic
One-sided
– There can only be one direction of effect
– The investigator is only interested in one
direction of effect.
– Greater power to detect difference in
expected direction
Two-sided
– Difference could go in either direction
– More conservative
More on test statistic
One sided
– One group: A single mean differs from a known value in a specific direction, e.g., mean > 0 or median > 0
– Two groups: Two means differ from one another in a specific direction, e.g., mean2 < mean1 or median2 < median1
Two sided
– One group: A single mean differs from a known value in either direction, e.g., mean ≠ 0 or median ≠ 0
– Two groups: Two means are not equal, that is, mean1 ≠ mean2 or median1 ≠ median2
Two-group (matched) samples - Wilcoxon Signed-Rank Test (Nonparametric)
USE:
– Compares medians of two paired samples.
Test Statistic
– Obtain difference scores, Di = X1i - X2i
– Take absolute values of the differences, |Di|
– Assign ranks to the absolute values (lower to higher), Ri
– Sum up ranks for positive differences (T+) and negative differences (T-)
Test statistic is the smaller of T- or T+ (2-tailed)
Example of Wilcoxon signed rank test (two matched samples)
         Hours of Sleep
Subject  Drug   Placebo  Difference  Rank Ignoring Sign
1        6.1    5.2       0.9        3.5
2        7.0    7.9      -0.9        3.5
3        8.2    3.9       4.3        10
4        7.6    4.7       2.9        7
5        6.5    5.3       1.2        5
6        8.4    5.4       3.0        8
7        6.9    4.2       2.7        6
8        6.7    6.1       0.6        2
9        7.4    3.8       3.6        9
10       5.8    6.3      -0.5        1
3rd & 4th ranks are tied hence averaged.
P-value of this test is 0.02. Hence the test is significant at any level more than 2%, indicating the drug is more effective than placebo.
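The ranks in this example can be checked with a short sketch that follows the listed steps: difference, absolute value, averaged ranks, then the two signed sums. Differences are rounded to one decimal so that the two 0.9 values tie exactly in floating point; that rounding and the helper name `average_rank` are choices made for this sketch.

```python
drug    = [6.1, 7.0, 8.2, 7.6, 6.5, 8.4, 6.9, 6.7, 7.4, 5.8]
placebo = [5.2, 7.9, 3.9, 4.7, 5.3, 5.4, 4.2, 6.1, 3.8, 6.3]

# Step 1: difference scores (rounded so that equal differences compare equal)
diffs = [round(d - p, 1) for d, p in zip(drug, placebo)]
diffs = [d for d in diffs if d != 0]     # zero differences are dropped (none here)

# Steps 2-3: rank the absolute differences, averaging ranks for ties
abs_sorted = sorted(abs(d) for d in diffs)

def average_rank(value):
    first = abs_sorted.index(value) + 1  # 1-based rank of the first tied value
    count = abs_sorted.count(value)      # number of values sharing this rank
    return first + (count - 1) / 2

# Step 4: sum the ranks separately for positive and negative differences
t_plus = sum(average_rank(abs(d)) for d in diffs if d > 0)
t_minus = sum(average_rank(abs(d)) for d in diffs if d < 0)
statistic = min(t_plus, t_minus)         # two-tailed test statistic
print(t_plus, t_minus, statistic)
```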
Two-group (matched) samples - Wilcoxon Signed-Rank Test (Nonparametric)
SPSS:
– Two Matched Groups: Analyze> Nonparametric
Tests> 2 Related Samples
Comparing > 2 independent
samples: F statistic (Parametric)
Use:
– Compares means of more than two groups
– Testing the equality of population variances.
Comparing > 2 independent samples: F statistic (Parametric)
Let X and Y be two independent Chi-square variables with n1 and n2 d.f. respectively. Then the following statistic follows an F distribution with n1 and n2 d.f.
$$F_{n_1, n_2} = \frac{X/n_1}{Y/n_2}$$
Let X and Y be two independent normal variables with sample sizes n1 and n2. Then, when the two population variances are equal, the following statistic follows an F distribution with (n1 - 1) and (n2 - 1) d.f.
$$F_{n_1 - 1,\, n_2 - 1} = \frac{s_x^2}{s_y^2}$$
where $s_x^2$ and $s_y^2$ are the sample variances of X and Y.
Comparing > 2 independent samples: F
statistic (Parametric)
Hypotheses:
H0: µ1 = µ2 = … = µn
Ha: not all of the means are equal (at least one µi differs from the others)
Comparison will be done using analysis of
variance (ANOVA) technique.
ANOVA uses F statistic for this comparison.
The ANOVA technique will be covered in
another class session.
Proportion Tests
Use
– Test for equality of two Proportions
E.g. proportions of subjects in two treatment groups who
benefited from treatment.
– Test for the value of a single proportion
E.g., to test if the proportion of smokers in a population
is some specified value (less than 1)
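Proportion Tests
Formula
– One Group:
$$z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0 (1 - p_0)}{n}}}$$
– Two Groups:
$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p} (1 - \hat{p}) \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}, \quad \text{where } \hat{p} = \frac{x_1 + x_2}{n_1 + n_2}.$$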
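Both z statistics can be sketched with the standard library, using the standard normal distribution for a two-sided p-value. The function names and the counts are hypothetical; x denotes the number of subjects with the outcome and n the group size.

```python
import math
from statistics import NormalDist

def one_proportion_z(x, n, p0):
    """z test of H0: population proportion == p0. Returns (z, two-sided p-value)."""
    p_hat = x / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    return z, 2 * (1 - NormalDist().cdf(abs(z)))

def two_proportion_z(x1, n1, x2, n2):
    """z test of H0: the two population proportions are equal."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)   # pooled proportion under H0
    z = (p1 - p2) / math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return z, 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical counts: 60 smokers out of 100, tested against p0 = 0.5
z1, p1_value = one_proportion_z(60, 100, 0.5)
# Hypothetical counts: 30 of 50 benefited in group 1, 20 of 50 in group 2
z2, p2_value = two_proportion_z(30, 50, 20, 50)
print(z1, p1_value, z2, p2_value)
```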
Proportion Test
SPSS:
– One Group: Analyze> Nonparametric Tests> Binomial
– Two Groups?
Proportion of males in Dataset 1
SPSS:
– Recode sex as numeric: Transform> Recode> Into Different Variables> make all selections there and click Change after recoding the character variable into numeric.
– Analyze> Nonparametric Tests> Binomial> select Test variable> Test proportion
Set null hypothesis = 0.5
The p-value = 1.0
Chi-square statistic
USE
– Testing the population variance σ² = σ0².
– Testing the goodness of fit.
– Testing the independence/ association of attributes
Assumptions
– Sample observations should be independent.
– Cell frequencies should be >= 5.
– Total observed and expected frequencies are equal
Chi-square statistic
Formula: If xi (i = 1, 2, …, n) are independent and normally distributed with mean µ and standard deviation σ, then
$$\sum_{i=1}^{n} \left(\frac{x_i - \mu}{\sigma}\right)^2$$
is a χ² distribution with n d.f.
If we don't know µ, then we estimate it using the sample mean, and then
$$\sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{\sigma}\right)^2$$
is a χ² distribution with (n - 1) d.f.
Chi-square statistic
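For a contingency table we use the following chi-square test statistic,
$$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$$
distributed as χ² with (n - 1) d.f. for a goodness-of-fit test over n cells, or with (r - 1)(c - 1) d.f. for a contingency table with r rows and c columns.
Oi = Observed Frequency
Ei = Expected Frequency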
Chi-square statistic
          Male O(E)   Female O(E)   Total
Group 1   9 (10)      9 (10)        20
Group 2   8 (10)      12 (10)       20
Group 3   11 (10)     9 (10)        20
Total     30          30            60
Chi-square statistic – calculation of
expected frequency
To obtain the expected frequency for any
cell, use:
(corresponding row total × column total) / grand total
E.g., for the cell for group 1 and female, substituting: 30 × 20 / 60 = 10
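The expected-frequency rule (row total times column total, divided by grand total) and the sum of (O - E)²/E can be sketched as below. The observed counts are hypothetical, chosen only so that each group totals 20 and each column totals 30, which gives an expected count of 10 in every cell; the function names are also hypothetical.

```python
def expected_counts(observed):
    """Expected cell counts: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def chi_square_statistic(observed):
    """Sum of (O - E)^2 / E over all cells of the table."""
    expected = expected_counts(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )

# Hypothetical observed counts: rows are groups 1-3, columns are the two sexes.
observed = [[11, 9], [8, 12], [11, 9]]
expected = expected_counts(observed)
chi2 = chi_square_statistic(observed)
print(expected, chi2)
```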
Chi-square statistic
SPSS:
– Analyze> Descriptive Statistics> Crosstabs> Statistics> Chi-square
– Select variables.
– Click on Cell button to select items you want
in cells, rows, and columns.
Credits
Thanks are due to all whose works have
been consulted prior to the preparation of
these slides.
Questions