Basics of Statistics - University of Delaware
Student’s t statistic
Use
Test for equality of two means
E.g., compare two groups of subjects given
different treatments
Test for value of a single mean
E.g., test to see if a single group of subjects
differs from a known value
Also ‘matched sample’ test where a single
group is compared before and after treatment
(test for zero treatment effect)
Advanced
Tests of significance of correlation/regression
coefficients.
Student’s t statistic
Assumptions
Parent population is normal
Sample observations (subjects) are
independent.
Robustness
To normality: departures from normality affect the Type I error rate and power
and may lead to inappropriate interpretation. In real life we cannot expect
exactly normal data, but it should not be too skewed
Student’s t statistic
Formula (single group)
Let x_1, x_2, ..., x_n be a random sample from a normal
population with mean µ and variance σ². Then the
following statistic is distributed as Student's t with
(n - 1) degrees of freedom:

t = \frac{\bar{x} - \mu}{s / \sqrt{n}}
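As a quick numeric check, the statistic can be computed directly and compared with R's t.test. A sketch with toy data; the vector x and the value mu0 below are placeholders, not from the course dataset:

# Sketch: one-sample t statistic by hand (toy data)
x <- c(49.1, 50.4, 48.7, 51.2, 49.9)
mu0 <- 50
t_stat <- (mean(x) - mu0) / (sd(x) / sqrt(length(x)))
t_stat
t.test(x, mu = mu0)$statistic   # same value as the hand computation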
Student’s t statistic
Formula (two groups)
Case 1: Two matched samples
The following statistic follows a t distribution with (n - 1) d.f.:

t = \frac{\bar{d}}{s_d / \sqrt{n}}

where \bar{d} is the mean of the differences between the matched samples and s_d is
the standard deviation of those differences.
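In R the matched-samples case is just a one-sample t on the differences; a sketch with toy before/after data:

# Sketch: matched-samples t as a one-sample t on the differences (toy data)
before <- c(6.1, 7.0, 8.2, 7.6, 6.5)
after  <- c(5.9, 6.4, 7.1, 7.8, 6.0)
d <- before - after
t_stat <- mean(d) / (sd(d) / sqrt(length(d)))
t_stat
t.test(before, after, paired = TRUE)$statistic   # identical result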
Student’s t statistic
Formula (two groups)
Case 2: Equal Population Standard Deviations:
The following statistic is distributed as Student's t with (n_1 + n_2 - 2) d.f.:

t = \frac{\bar{x}_1 - \bar{x}_2}{S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}

The pooled standard deviation is

S_p = \sqrt{\frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2}}

where n_1 and n_2 are the sample sizes and S_1 and S_2 are the sample
standard deviations of the two groups.
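A sketch of the pooled formula checked against t.test with var.equal = TRUE (toy data, not the course dataset):

# Sketch: pooled two-sample t computed from the formula (toy data)
x1 <- c(50, 52, 47, 49, 53)
x2 <- c(45, 48, 46, 50, 44)
n1 <- length(x1); n2 <- length(x2)
sp <- sqrt(((n1 - 1) * var(x1) + (n2 - 1) * var(x2)) / (n1 + n2 - 2))
t_stat <- (mean(x1) - mean(x2)) / (sp * sqrt(1 / n1 + 1 / n2))
t_stat
t.test(x1, x2, var.equal = TRUE)$statistic   # agrees with the hand computation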
Student’s t statistic
Formula (two groups)
Case 3: Unequal population standard deviations
The following statistic follows a t distribution:

t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}

The d.f. of this statistic is

v = \frac{\left( \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} \right)^2}{\frac{(s_1^2 / n_1)^2}{n_1 - 1} + \frac{(s_2^2 / n_2)^2}{n_2 - 1}}
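A sketch of the unequal-variance (Welch) statistic and its d.f., using the same toy data as above; t.test gives this case by default:

# Sketch: Welch t statistic and its approximate d.f. (toy data)
x1 <- c(50, 52, 47, 49, 53)
x2 <- c(45, 48, 46, 50, 44)
v1 <- var(x1) / length(x1)
v2 <- var(x2) / length(x2)
t_stat <- (mean(x1) - mean(x2)) / sqrt(v1 + v2)
df <- (v1 + v2)^2 / (v1^2 / (length(x1) - 1) + v2^2 / (length(x2) - 1))
c(t = t_stat, df = df)
t.test(x1, x2)   # var.equal = FALSE by default; reports the same t and d.f.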
Student’s t statistic
One-sided
There can only be one direction of effect
The investigator is only interested in one
direction of effect.
Greater power to detect difference in
expected direction
Two-sided
Difference could go in either direction
More conservative
Student’s t statistic
One-sided:
One group: a single mean differs from a known value in a specific direction, e.g., mean > 0.
Two groups: two means differ from one another in a specific direction, e.g., mean2 < mean1.
Two-sided:
One group: a single mean differs from a known value in either direction, e.g., mean ≠ 0.
Two groups: the two means are not equal, i.e., mean1 ≠ mean2.
Student’s t statistic
SPSS
One Group: Analyze>Compare Means> One-Sample T Test
Two Groups (Matched Samples):
Analyze>Compare Means> Paired Samples T
Test
Two Groups: Analyze>Compare Means>
Independent Samples T Test
Student’s t statistic
R
The default t-test call is
t.test(x, y = NULL, alternative = "two.sided", mu = 0,
       paired = FALSE, var.equal = FALSE, conf.level = 0.95)
where x and y are numeric data vectors. We only need to change
the defaults that do not match the case we want to run. For example,
One Group: t.test(x, alternative = "greater", mu = 30)
Two Groups (Matched Samples): t.test(x, y, alternative = "less", mu = 0, paired = TRUE)
Two Groups: t.test(x, y, alternative = "greater", mu = 0, var.equal = TRUE)
Student’s t-statistic
MS Excel (in Tools -> Data Analysis…)
One Group: Not available
Two Groups (Matched Samples):
t-Test: Paired two sample for mean
Two Groups (Independent Samples):
t-Test: Two-Sample Assuming Equal Variances
t-Test: Two-Sample Assuming Unequal Variances
Example 1
Consider the heights of children 4 to 12 years
old in dataset 1 of our course website
(variable ‘hgt’). Suppose we want to test if
the average height (µ) for this age group in
the population is 50 inches, using our sample
of 60 children. We will use a 5% level of significance.
This is a one-sample, two-sided test.
Example 1
Hypotheses:
H0: µ = 50
Ha: µ ≠ 50
Computation in Excel:
Excel does not have a 1-sample test, but we can
fool it.
Create a dummy column parallel to the hgt column
with an equal number of cells, all set to 0.0
Run the Matched sample test using hgt and the
dummy column and 50 as the hypothesized mean
difference.
The p-value for the two-tailed test is 0.0092
Example 1
Using SPSS:
Analyze> Compare Means> One-Sample T Test > Select hgt > Test Value: 50 > OK
P-value is .009
Using R,
t.test(df1$hgt, mu=50)
The two-tailed p-value is 0.0092
Example 2
Suppose we want to compare the heights of two
groups (hgt for each sex in dataset 1).
H0: Mean heights are equal for the two sexes.
Ha: Mean heights are not equal
Using MS-Excel:
Sort data by sex (data>sort>by:sex)
In Data Analysis…, choose t-Test: Two-Sample Assuming Equal Variances
select the range of hgt for all sex = f as Variable 1 Range
select the range of hgt for all sex = m as Variable 2 Range
P-value for two-sided test = 0.205
Example 2
Using SPSS:
Analyze>Compare Means>Independent-Samples T Test>
Select hgt as a Test Variable
Select sex as a Grouping Variable
In Define Groups, type f for Group 1 and m for Group
2
Click Continue then OK
It gives us the p-value 0.205. We can assume equal
variances, as the p-value of the F statistic for testing
equality of variances is 0.845.
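The same comparison can be run in R; a sketch, assuming dataset 1 has been read into a data frame called df1 (the name is an assumption) with columns hgt and sex:

# Sketch: two-group comparison of hgt by sex in R (df1 is assumed)
t.test(hgt ~ sex, data = df1, var.equal = TRUE)   # pooled-variance t-test
var.test(hgt ~ sex, data = df1)                   # F test for equal variances (SPSS reports Levene's test instead)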
Sign Test (Nonparametric)
Use:
(1) Compare the median of a single group with a
specified value (instead of single sample t-test).
(2) Compare medians of two matched groups
(instead of Two matched samples t-test)
Test Statistic:
The number of positive differences (x_i - c), i.e., the number of
observations above the hypothesized value c. Under H0 this count
follows a Binomial(n, 1/2) distribution.
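Because the count of positive differences is Binomial(n, 1/2) under H0, the test can be carried out directly with binom.test; a sketch with toy paired data:

# Sketch: sign test via binom.test (toy paired data)
x <- c(7.1, 6.8, 7.4, 6.9, 7.6, 6.5)
y <- c(6.6, 7.0, 6.9, 6.2, 7.1, 6.4)
d <- x - y
d <- d[d != 0]                        # zero differences are dropped
binom.test(sum(d > 0), length(d), p = 0.5)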
Sign Test (Nonparametric)
SPSS: Analyze> Nonparametric Tests>
Binomial
R: SIGN.test(x, y = NULL, md = 0,
alternative = "two.sided", conf.level =
0.95)
For testing the median (md) of a single
sample, use data only for one variable.
To compare paired data, use two paired
variables.
NB: This test requires the BSDA package
Wilcoxon Signed-Rank Test:
Use:
Compares medians of two paired samples.
Test Statistic:
Consider n pairs of observations on two variables X and Y.
The Wilcoxon signed-rank statistic is
WS = the sum of the ranks of the positive differences,
after assigning ranks to the absolute values of the differences.
Wilcoxon Rank-Sum Test
Use: Compares medians of two
independent groups.
Test Statistic:
Let X and Y be two samples of sizes m and n, and let N = m + n.
Rank all N observations together. Then the statistic is
Wm = the sum of the ranks of the observations from sample X.
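A sketch of the rank-sum computation with toy data, alongside wilcox.test (whose reported statistic is the equivalent Mann-Whitney form of Wm):

# Sketch: rank-sum statistic Wm computed by hand (toy data)
x <- c(6.1, 7.0, 8.2)            # sample X, m = 3
y <- c(5.2, 7.9, 3.9, 4.7)       # sample Y, n = 4
r <- rank(c(x, y))               # ranks of all N = m + n observations
Wm <- sum(r[seq_along(x)])       # sum of the ranks belonging to X
Wm
wilcox.test(x, y)$statistic      # reports W = Wm - m(m+1)/2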
Wilcoxon Signed-Rank Test &
Wilcoxon Rank-Sum Test
SPSS:
Two Matched Groups: Analyze>
Nonparametric Tests> 2 Related Samples
Two Groups: Analyze> Nonparametric
Tests> 2 Independent Samples
Wilcoxon Signed-Rank Test &
Wilcoxon Rank-Sum Test
R:
The default test is
wilcox.test(x, y, alternative = "two.sided", mu
= 0, paired = FALSE, exact = FALSE, conf.int =
FALSE, conf.level = 0.95)
Two matched Groups: wilcox.test(x, y, alternative =
“less", paired = TRUE)
Two Groups: wilcox.test(x, y, alternative =
“greater“)
Example 3 (two matched samples)
Hours of sleep:
Subject   Drug   Placebo   Difference   Rank ignoring sign
1         6.1    5.2        0.9          3.5
2         7.0    7.9       -0.9          3.5
3         8.2    3.9        4.3         10
4         7.6    4.7        2.9          7
5         6.5    5.3        1.2          5
6         8.4    5.4        3.0          8
7         6.9    4.2        2.7          6
8         6.7    6.1        0.6          2
9         7.4    3.8        3.6          9
10        5.8    6.3       -0.5          1
3rd & 4th ranks are tied hence averaged.
P-value of this test is 0.02. Hence the test is significant at any level more
than 2%, indicating the drug is more effective than placebo.
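A sketch reproducing this test in R from the table above (exact = FALSE because the tied ranks prevent an exact p-value):

# Sketch: Example 3 in R
drug    <- c(6.1, 7.0, 8.2, 7.6, 6.5, 8.4, 6.9, 6.7, 7.4, 5.8)
placebo <- c(5.2, 7.9, 3.9, 4.7, 5.3, 5.4, 4.2, 6.1, 3.8, 6.3)
wilcox.test(drug, placebo, paired = TRUE, exact = FALSE)   # p-value close to the 0.02 quoted above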
Proportion Tests
Use
Test for equality of two Proportions
E.g. proportions of subjects in two treatment
groups who benefited from treatment.
Test for the value of a single proportion
E.g., to test if the proportion of smokers in a
population is some specified value (less than 1)
Proportion Tests
Formula
One Group:

z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0 (1 - p_0)}{n}}}

Two Groups:

z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p}) \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}}

where \hat{p} = \frac{x_1 + x_2}{n_1 + n_2}.
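A sketch of the one-group z statistic computed from the formula and checked against prop.test (toy counts; prop.test reports the squared statistic):

# Sketch: one-sample proportion z statistic (toy counts)
x <- 36; n <- 60; p0 <- 0.5
phat <- x / n
z <- (phat - p0) / sqrt(p0 * (1 - p0) / n)
z
# without continuity correction, prop.test reports z^2 as its chi-squared statistic
prop.test(x, n, p = p0, correct = FALSE)$statistic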
Proportion Test
SPSS:
One Group: Analyze> Nonparametric Tests> Binomial
Two Groups?
R:
The default tests are:
One Group: binom.test(x, n, p = 0.5, alternative =
"two.sided", conf.level = 0.95)
Two Groups: prop.test(c(x,y), c(m,n), p = NULL,
alternative = "two.sided", conf.level = 0.95, correct
= TRUE)
x and y are the numbers of successes and m and n
are the sample sizes.
Example 4: Proportion of males in
Dataset 1
R:
n=60 and there are 30 males
binom.test(30,60) returns a p-value of 1.0.
SPSS:
recode sex as numeric Transform> Recode>Into Different Variables> Make
all selections there and click on Change after
recoding character variable into numeric.
Analyze> Nonparametric test> Binomial> select Test
variable> Test proportion
Set null hypothesis = 0.5
The p-value = 1.0
Chi-square statistic
USE
Testing the population variance: σ² = σ0².
Testing the goodness of fit.
Testing the independence/ association of attributes
Assumptions
Sample observations should be independent.
Cell frequencies should be >= 5.
Total observed and expected frequencies are
equal
Chi-square statistic
Formula: If x_i (i = 1, 2, ..., n) are independent
and normally distributed with mean µ and
standard deviation σ, then

\chi^2 = \sum_{i=1}^{n} \left( \frac{x_i - \mu}{\sigma} \right)^2

follows a \chi^2 distribution with n d.f.

If we don't know µ, we estimate it by the sample mean \bar{x}, and then

\chi^2 = \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{\sigma} \right)^2

follows a \chi^2 distribution with (n - 1) d.f.
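A small simulation sketch illustrating the second statement (the values n = 5, σ = 2 and µ = 10 are arbitrary):

# Sketch: sum((x - xbar)^2) / sigma^2 behaves like chi-square with n - 1 d.f.
set.seed(1)
n <- 5; sigma <- 2
stat <- replicate(10000, {
  x <- rnorm(n, mean = 10, sd = sigma)
  sum((x - mean(x))^2) / sigma^2
})
mean(stat)   # close to n - 1 = 4, the mean of a chi-square(4) distribution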
Chi-square statistic
For counted data we use the following
chi-square test statistic:

\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}

where O_i is the observed frequency and E_i is the expected
frequency in cell i. For a goodness-of-fit test over n categories
this statistic has (n - 1) d.f.; for an r × c contingency table
it has (r - 1)(c - 1) d.f.
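A sketch with toy counts showing the same statistic via R's chisq.test (correct = FALSE turns off the Yates continuity correction so the result matches the formula above):

# Sketch: Pearson chi-square for a 2 x 2 table (toy counts)
tab <- matrix(c(20, 10, 15, 15), nrow = 2,
              dimnames = list(sex = c("f", "m"), grp = c("A", "B")))
chisq.test(tab, correct = FALSE)$expected   # the expected frequencies E_i
chisq.test(tab, correct = FALSE)            # (r - 1)(c - 1) = 1 d.f.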
Chi-square statistic
SPSS:
Analyze> Descriptive stat> Crosstabs>
statistics> Chi-square
Select variables.
Click on Cell button to select items you
want in cells, rows, and columns.
Example 5 (class demonstration)
Make a contingency table using two variables
sex and grp from our dataset.
Analyze> Descriptive statistics> crosstabs>
select variables for rows and columns
Statistics> Chi-square> Continue> Cells>
selection> ok.
It will give us a contingency table and the p-value
of the Pearson Chi-square test.
For this particular case, the p-value of the Pearson Chi-square test is 0.549 with 2 d.f.
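The same cross-tabulation and test in R, a sketch assuming dataset 1 is loaded as a data frame df1 (the name is an assumption) with columns sex and grp:

# Sketch: R equivalent of Example 5 (df1 is assumed)
tab <- table(df1$sex, df1$grp)
tab                # the contingency table
chisq.test(tab)    # Pearson chi-square test of independence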
F-statistic
Use:
Testing the equality of population
variances.
Testing the significance of differences among
several means in analysis of variance.
F-statistic
Let X and Y be two independent chi-square variables with
n_1 and n_2 d.f. respectively. Then the following statistic
follows an F distribution with n_1 and n_2 d.f.:

F_{n_1, n_2} = \frac{X / n_1}{Y / n_2}

Let X and Y be two independent normal samples of sizes n_1 and n_2.
Then, when the two population variances are equal, the ratio of
sample variances follows an F distribution with (n_1 - 1) and (n_2 - 1) d.f.:

F_{n_1 - 1, n_2 - 1} = \frac{s_x^2}{s_y^2}

where s_x^2 and s_y^2 are the sample variances of X and Y.
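A sketch with toy data showing the variance ratio and R's built-in F test (var.test) for equality of two variances:

# Sketch: F ratio of two sample variances (toy data)
x <- c(50, 52, 47, 49, 53)
y <- c(45, 48, 46, 50, 44)
var(x) / var(y)    # the ratio s_x^2 / s_y^2
var.test(x, y)     # F test with (n1 - 1, n2 - 1) d.f.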
F-statistic
Hypotheses:
H0: µ1 = µ2 = ... = µn
Ha: not all the means are equal (at least one mean differs)
The comparison is done using the analysis of
variance (ANOVA) technique, which uses the F
statistic. ANOVA will be covered in another
class session.