#### Transcript Determining Sample Size

```
Determining Sample Size

Statistical Significance
What factors influence the probability of statistical significance?
◦ Alpha
◦ Sample size
◦ Amount of variability in the sample
◦ Magnitude of the differences between groups/categories/intervals

Determining Sample Size
n = (t * s / E)^2
Where
◦ n = sample size
◦ t = t score associated with the desired significance level
◦ s = the estimated standard deviation
◦ E = the amount of error that can be tolerated

Where to get s?
If we don’t have population data, how do we know or even estimate s?
◦ One solution: take a small sample
◦ Not always practical

Sample Size for Proportion
n = (t * s / E)^2
With a proportion, the largest s is associated with a proportion of .5
Using .5 is thus a “prudent” assumption when choosing sample size

Example
How big should the NYS Housing and Community Renewal survey be?
◦ Want to be at least 90% confident
◦ Can tolerate a margin of error of plus or minus three percentage points
◦ n = (t * s / E)^2

POWER

IS MY COIN FAKE?
How many flips before you are confident the coin is fake?

Flips   Probability
1       0.5
2       0.25
3       0.125
4       0.0625
5       0.03125
6       0.015625
7       0.007813
8       0.003906
9       0.001953
10      0.000977

Relationship between Power and hypothesis testing

                          | Accept Null Hypothesis | Reject Null Hypothesis
Null Hypothesis is true   | Correct decision       | Type I error (alpha typically set to 5%)
Null Hypothesis is false  | Type II error          | Correct decision: the probability of making
                          |                        | this decision correctly is defined as Power

Requirements to estimate Power
Type of test (e.g. two-sample independent t-test, one tail)
Alpha
Effect size of interest
How much accuracy is desirable
Sample size
Standard deviation of sample

Requirements to estimate Power
Type of test (e.g. two-sample independent t-test, one tail)
◦ Given

Requirements to estimate Power
Alpha
◦ Prefer to avoid a Type I error (rejecting the null hypothesis although it is true): use a lower alpha (.01)
◦ Prefer to avoid a Type II error (accepting the null hypothesis although it is false): use a higher alpha (.05)

Requirements to estimate Power
Effect size of interest
◦ Determined by theory or intuition
  • Are men heavier than women? What is an “important” difference?
  • Two kilograms?
  • Twenty kilograms?

Requirements to estimate Power
Effect size of interest
◦ Cohen’s D
◦ Cohen's D = (Mt - Mc) / SDpooled
  • Mt = mean of treatment or group 1
  • Mc = mean of control or group 2
◦ SDpooled = sqrt( (SDt^2 * (Nt - 1) + SDc^2 * (Nc - 1)) / (Nc + Nt - 2) )

Requirements to estimate Power
Cohen’s D
◦ Tells us how big a difference is substantively important
◦ Expresses difference in standard deviation units
Rules of thumb
◦ .2 small effect
◦ .5 moderate effect
◦ .8 large effect
Consider using Cohen’s D if you have no intuition about effect size or what is an important difference

Stata Examples
Class data
◦ Are men heavier than women?
10.2 in textbook
Captain Beaver is warned by Colonel Verleaf that if the mean efficiency rating for the 150 platoons under Verleaf’s command falls below 80, Captain Beaver will be transferred to Minot Air Base (a base in the middle of nowhere). Beaver takes a sample of 20 platoons and finds the following: mean = 85; s = 13.5
◦ Null hypothesis µ = 80
◦ Alternative hypothesis µ = 85
◦ sd = 13.5
◦ n = 20

Problem 10.2
. *PROBLEM 10.2
. power onemean 80 85, sd(13.5) n(20)

Estimated power for a one-sample mean test
t test
Ho: m = m0  versus  Ha: m != m0

Study parameters:

        alpha =    0.0500
            N =        20
        delta =    0.3704
           m0 =   80.0000
           ma =   85.0000
           sd =   13.5000

Estimated power:

        power =    0.3495

Stata Example
Power for Proportion
12.10 in textbook
VISTA manager William suspects 50% of his volunteers are over 65 years old. A survey of 16 volunteers reveals that 44% are over age 65. How much power does he have?

Problem 12.10
. *PROBLEM 12.10
. power oneproportion .5 .44, test(wald) n(16)

Estimated power for a one-sample proportion test
Wald z test
Ho: p = p0  versus  Ha: p != p0

Study parameters:

        alpha =    0.0500
            N =        16
        delta =   -0.0600
           p0 =    0.5000
           pa =    0.4400

Estimated power:

        power =    0.0772

Sample size and power
[Figure: Estimated power for a one-sample mean test (t test), H0: μ = μ0 versus Ha: μ ≠ μ0. Power (1-β), from 0 to 1, plotted against sample size (N), from 0 to 100.]
Parameters: α = .05, δ = .37, μ0 = 80, μa = 85, σ = 14

Effect Size and Power
[Figure: Estimated power for a one-sample mean test (t test), H0: μ = μ0 versus Ha: μ ≠ μ0. Power (1-β), from 0 to 1, plotted against the alternative mean (μa), from 60 to 100.]
Parameters: α = .05, N = 20, μ0 = 80, σ = 14

Are incomes higher in Mixed Income Developments?
NYSHCR survey of tenants
0 = Not mixed income, 1 = mixed income

Are incomes higher in Mixed Income Developments?
. *ARE INCOMES HIGHER IN MIXED INCOME DEVELOPMENTS?
. ttest household_income, by(mixed_income)

Two-sample t test with equal variances

   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       0 |   2,000    22499.16    412.2586    18436.76    21690.66    23307.66
       1 |     395    26554.08    716.2176    14234.54    25145.99    27962.17
---------+--------------------------------------------------------------------
combined |   2,395    23167.92    365.2108    17872.95    22451.76    23884.08
---------+--------------------------------------------------------------------
    diff |           -4054.925    980.8007               -5978.231   -2131.618

    diff = mean(0) - mean(1)                                      t =  -4.1343
Ho: diff = 0                                     degrees of freedom =     2393

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000

Are incomes higher in Mixed Income Developments?
. power twomeans 22499 26554, sd1(18436) sd2(14234) n1(2000) n2(395)

Estimated power for a two-sample means test
Satterthwaite's t test assuming unequal variances
Ho: m2 = m1  versus  Ha: m2 != m1

Study parameters:

        alpha =    0.0500
            N =      2395
           N1 =      2000
           N2 =       395
        N2/N1 =    0.1975
        delta = 4055.0000
           m1 =  2.25e+04
           m2 =  2.66e+04
          sd1 =  1.84e+04
          sd2 =  1.42e+04

Estimated power:

        power =    0.9984
```
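
The NYS Housing and Community Renewal example in the transcript states the inputs (90% confidence, a margin of error of three percentage points, and the prudent s = .5) but not the arithmetic. A minimal Python sketch of the formula n = (t * s / E)^2 follows. It assumes the large-sample normal z score can stand in for t, and the function name `sample_size` is illustrative, not something from the slides.

```python
import math
from statistics import NormalDist


def sample_size(confidence: float, s: float, margin: float) -> int:
    """Return n = (t * s / E)^2, rounded up to a whole respondent.

    Assumption: the two-sided normal z score is used in place of t,
    which is reasonable only when the resulting sample is large.
    """
    t = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((t * s / margin) ** 2)


# 90% confidence, s = .5 (worst case for a proportion), E = .03
n_survey = sample_size(confidence=0.90, s=0.5, margin=0.03)
print(n_survey)
```

Under these assumptions the survey would need roughly 750 respondents. Tightening the margin of error or raising the confidence level increases n quickly, because both enter the formula squared.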
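
The probabilities in the "IS MY COIN FAKE?" table halve with each flip, which is what a fair coin landing the same way on every one of k flips would give (0.5 to the power k). The sketch below assumes that reading of the table; `prob_same_every_flip` and `flips_needed` are hypothetical helper names.

```python
def prob_same_every_flip(flips: int) -> float:
    """Probability that a fair coin lands heads on every one of `flips` flips."""
    return 0.5 ** flips


def flips_needed(alpha: float) -> int:
    """Smallest run length whose probability under a fair coin drops below alpha."""
    flips = 1
    while prob_same_every_flip(flips) >= alpha:
        flips += 1
    return flips


# Reproduce the table's probability column for 1 to 10 flips.
for k in range(1, 11):
    print(k, prob_same_every_flip(k))

print(flips_needed(0.05), flips_needed(0.01))
```

With alpha set to .05, the run has to reach five flips before a fair coin becomes an implausible explanation; a stricter alpha of .01 requires seven.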
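
The Cohen's D slide gives the formula without a worked case. As an illustration only, the sketch below applies it to the group means, standard deviations, and counts reported in the transcript's `ttest` output for the mixed-income comparison, treating group 1 (mixed income) as the "treatment" group. That pairing and the function names are assumptions made for the example.

```python
import math


def pooled_sd(sd_t: float, n_t: int, sd_c: float, n_c: int) -> float:
    """Pooled standard deviation of two groups, weighted by degrees of freedom."""
    numerator = sd_t ** 2 * (n_t - 1) + sd_c ** 2 * (n_c - 1)
    return math.sqrt(numerator / (n_c + n_t - 2))


def cohens_d(mean_t: float, mean_c: float,
             sd_t: float, n_t: int, sd_c: float, n_c: int) -> float:
    """Difference in means expressed in pooled standard deviation units."""
    return (mean_t - mean_c) / pooled_sd(sd_t, n_t, sd_c, n_c)


# Values copied from the ttest output: group 1 = mixed income, group 0 = not.
d_income = cohens_d(mean_t=26554.08, mean_c=22499.16,
                    sd_t=14234.54, n_t=395, sd_c=18436.76, n_c=2000)
print(round(d_income, 2))
```

By the slide's rules of thumb, a value near .2 counts as a small effect, even though the t test on the same data is highly significant. A large sample can detect a small difference.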
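
The "Sample size and power" figure shows power rising with N for the Problem 10.2 setup. The Python sketch below approximates that relationship with a normal (z) approximation to the two-sided one-sample test. This is a swapped-in simplification: Stata's `power onemean` uses the t distribution, so its reported 0.3495 at N = 20 is somewhat lower than the value this approximation produces. The function name is hypothetical.

```python
from statistics import NormalDist


def approx_power_onemean(m0: float, ma: float, sd: float, n: int,
                         alpha: float = 0.05) -> float:
    """Approximate two-sided power for a one-sample mean test.

    Assumption: a z test stands in for the t test, so results run
    slightly high for small n.
    """
    z = NormalDist()
    shift = abs(ma - m0) / sd * n ** 0.5   # standardized effect times sqrt(n)
    crit = z.inv_cdf(1 - alpha / 2)        # two-sided critical value
    return z.cdf(shift - crit) + z.cdf(-shift - crit)


power_20 = approx_power_onemean(m0=80, ma=85, sd=13.5, n=20)
for n in (20, 40, 60, 80, 100):
    print(n, round(approx_power_onemean(80, 85, 13.5, n), 3))
```

Scanning the printed values gives a rough sense of how many platoons Captain Beaver would need to sample before the test reaches a conventional power target such as .8.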