Transcript Lecture5
Stat 651
Lecture 5
Copyright (c) Bani Mallick
Topics in Lecture #5
Confidence intervals for a population mean
$\mu$ when the population standard deviation
$\sigma$ is known.
Properties of confidence intervals: what
things make them longer and shorter.
Sample size calculation for a population
mean $\mu$ when the population standard
deviation $\sigma$ is known: a simple illustration
of a method.
Book Sections Covered in Lecture #5
Chapter 5.1
Chapter 5.2
Chapter 5.3
Lecture 4 Review: Pr(X < c) for
Normal Populations
Compute the z-score
$$z = \frac{c - \mu}{\sigma}$$
Look up value in Table 1
Lecture 4 Review: Pr(X > c) for
Normal Populations
Compute the z-score
$$z = \frac{c - \mu}{\sigma}$$
Look up the value for z in Table 1
Subtract this value from 1.0
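The two review recipes can be sketched in Python. This is only an illustration: it assumes the standard normal CDF from Python's `statistics.NormalDist` stands in for the Table 1 lookup, and the numbers in the example (mean 100, s.d. 15, cutoff 130) are hypothetical, not from the lecture.

```python
from statistics import NormalDist

def prob_less_than(c, mu, sigma):
    """Pr(X < c) for a normal population with mean mu and s.d. sigma."""
    z = (c - mu) / sigma            # compute the z-score
    return NormalDist().cdf(z)      # stands in for the Table 1 lookup

def prob_greater_than(c, mu, sigma):
    """Pr(X > c): look up the value for z, then subtract it from 1.0."""
    return 1.0 - prob_less_than(c, mu, sigma)

# Hypothetical values: mu = 100, sigma = 15, c = 130, so z = 2.0
print(round(prob_less_than(130, 100, 15), 4))     # 0.9772
print(round(prob_greater_than(130, 100, 15), 4))  # 0.0228
```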
Lecture 4 Review: Inference
The sample mean $\bar{X}$ is a random variable
Its own “population” mean is $\mu$
Its standard deviation is $\sigma/\sqrt{n}$
Note how the standard deviation of the
sample mean becomes smaller as the
sample size becomes larger
More data = more precision!!!!!
Lecture 4 Review: Central Limit
Theorem
The sample mean $\bar{X}$ is a random variable
Its own “population” mean is $\mu$
Its standard deviation is $\sigma/\sqrt{n}$
In “large enough” samples, the sample mean
is very nearly normally distributed, i.e., has a
bell-shaped histogram
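A small simulation can make these review facts concrete. The sketch below is an assumption-laden illustration, not part of the lecture: it draws repeated samples from an exponential population (mean 1, s.d. 1, chosen only because it is clearly non-normal) and compares the s.d. of the simulated sample means with $\sigma/\sqrt{n}$. The function name and the settings (`reps`, `seed`) are hypothetical.

```python
import random
from math import sqrt
from statistics import mean, stdev

def simulate_sample_means(n, reps=2000, seed=0):
    """Draw `reps` samples of size n and return the list of their sample means.

    Assumed population: exponential with mean 1 and s.d. 1 (skewed, not normal).
    """
    rng = random.Random(seed)
    return [mean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

for n in (4, 25, 100):
    means = simulate_sample_means(n)
    # Columns: n, average of the sample means (should be near mu = 1),
    # s.d. of the sample means, and the theoretical value sigma / sqrt(n).
    print(n, round(mean(means), 3), round(stdev(means), 3), round(1 / sqrt(n), 3))
```

Plotting a histogram of `means` for each n would show the bell shape emerging as n grows.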
Confidence Interval for a Population
Mean
A considerable part of basic statistics is to
make inferences about the population mean $\mu$
It is impossible to know the value of $\mu$
exactly.
This is a key factoid: why do I say this with
such certainty?
Confidence Interval for a Population
Mean
A considerable part of basic statistics is to
make inferences about the population mean $\mu$
It is impossible to know the value of $\mu$
exactly.
Because (almost) every sample will give
you a unique sample mean, and that
sample mean will not equal the
population mean.
Confidence Interval for a Population
Mean
What we can do is to construct an interval of
possible values for the population mean $\mu$.
The interval is determined by how much
“confidence” we want in saying that the
population mean $\mu$ is in the interval.
The interval is always of the form
$\bar{X} \pm$ confidence factor
Confidence Interval for a Population
Mean
$\bar{X} \pm$ confidence factor
The confidence factor is determined by how
much confidence we want in concluding that the
population mean is actually in the interval
Which interval has higher confidence of including
the population mean?
-100 to -50 OR
-150 to 0
Confidence Interval for a Population
Mean: Formal Method
The first method assumes that the population
standard deviation $\sigma$ is known.
Suppose we want to be 95% confident that our
interval includes the population mean $\mu$, i.e., the
probability is 95% that the population mean $\mu$ is
in the interval.
Here is the interval:
$$\bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}} \quad\text{to}\quad \bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}}$$
WOMEN’S INTERVIEW SURVEY OF
HEALTH (WISH)
I computed the reported mean caloric intake
at the start of the study, and the mean
reported caloric intake at the end
My random variable X was the change
(difference)
My hypothesis is that the population mean of
X is < 0. In other words, I think women
report fewer calories the more they are
asked about their diet (Hawthorne effect).
WISH: Change in Caloric Intake
[Figure: box plot, N = 271; axis label "Change in mean Energ" (truncated in source)]
Does it look like a big change?
Note that the scale of the box plot
is -3000 to 2000
WISH
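The sample size is n = 271
The sample mean change is $\bar{X} = -180$
I am going to pretend that the population
standard deviation is $\sigma = 600$.
$$\bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}} \quad\text{to}\quad \bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}}$$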
WISH: Change in Reported Caloric
Intake
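$n = 271$, $\sigma = 600$, $\bar{X} = -180$
$$1.96\,\frac{\sigma}{\sqrt{n}} \approx 71$$
$$\bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}} = -180 - 71 = -251$$
$$\bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}} = -180 + 71 = -109$$
95% CI = -251 to -109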
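The arithmetic above can be checked with a few lines of Python. This is a minimal sketch using the slide's numbers (n = 271, pretended population s.d. 600, sample mean change -180); the function name is hypothetical.

```python
from math import sqrt

def ci_95_known_sigma(xbar, sigma, n):
    """95% CI for the population mean when the population s.d. is known."""
    confidence_factor = 1.96 * sigma / sqrt(n)
    return xbar - confidence_factor, xbar + confidence_factor

low, high = ci_95_known_sigma(xbar=-180, sigma=600, n=271)
print(round(1.96 * 600 / sqrt(271)))  # 71
print(round(low), round(high))        # -251 -109
```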
Review
$\sigma = 600$, $n = 271$, $\bar{X} = -180$
Then, with 95% probability, the true population
mean change is in the interval from -251 to -109
The chance is 95% that the population
mean change is between 251 and 109
calories lower
Is there a Hawthorne effect?
Confidence Intervals
You can construct a confidence interval for
the population mean with any level of
confidence.
Generally, people report the 95% CI, but
sometimes they report the 90% and 99%
confidence intervals.
This is easy to do via a formula, and even
easier to do via SPSS.
Confidence Interval for a Population
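Mean $\mu$ when $\sigma$ is Known
Want 90%, 95% and 99% chance of interval
including $\mu$.
90%: $\bar{X} - 1.645\,\sigma/\sqrt{n}$ to $\bar{X} + 1.645\,\sigma/\sqrt{n}$
95%: $\bar{X} - 1.96\,\sigma/\sqrt{n}$ to $\bar{X} + 1.96\,\sigma/\sqrt{n}$
99%: $\bar{X} - 2.58\,\sigma/\sqrt{n}$ to $\bar{X} + 2.58\,\sigma/\sqrt{n}$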
Confidence Intervals
There is a general formula given on page 200
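If you want a $(1-\alpha)100\%$ confidence interval
for the population mean $\mu$ when the population
s.d. $\sigma$ is known, use the formula
$$\bar{X} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \quad\text{to}\quad \bar{X} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$
The term $z_{\alpha/2}$ is the value in Table 1 that gives
probability $1 - \alpha/2$.
$\alpha = 0.10$: $z_{\alpha/2} = 1.645$; $\alpha = 0.05$: $z_{\alpha/2} = 1.96$;
$\alpha = 0.01$: $z_{\alpha/2} = 2.58$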
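For readers without the table at hand, the multiplier can also be computed. The sketch below assumes Python's `statistics.NormalDist().inv_cdf` as a stand-in for reading Table 1 backwards; the function names are hypothetical. Note that the computed 99% multiplier is about 2.576, which the slides round to 2.58.

```python
from math import sqrt
from statistics import NormalDist

def z_critical(confidence):
    """Value z with standard normal probability 1 - alpha/2 below it."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

def ci_known_sigma(xbar, sigma, n, confidence=0.95):
    """(1 - alpha)100% CI for the population mean, population s.d. known."""
    confidence_factor = z_critical(confidence) * sigma / sqrt(n)
    return xbar - confidence_factor, xbar + confidence_factor

for level in (0.90, 0.95, 0.99):
    print(level, round(z_critical(level), 3))
# 0.9 1.645
# 0.95 1.96
# 0.99 2.576
```

For example, `ci_known_sigma(-180, 600, 271, confidence=0.99)` gives roughly -274 to -86 after rounding to whole calories.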
WISH
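The sample size is n = 271
The sample mean change is $\bar{X} = -180$
I am going to pretend that the population
standard deviation is $\sigma = 600$.
I want a 99% confidence interval: $z_{\alpha/2} = 2.58$
$$\bar{X} - 2.58\,\frac{\sigma}{\sqrt{n}} \quad\text{to}\quad \bar{X} + 2.58\,\frac{\sigma}{\sqrt{n}}$$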
WISH: Change in Reported Caloric
Intake
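$n = 271$, $\sigma = 600$, $\bar{X} = -180$
$$2.58\,\frac{\sigma}{\sqrt{n}} \approx 94$$
$$\bar{X} - 2.58\,\frac{\sigma}{\sqrt{n}} = -180 - 94 = -274$$
$$\bar{X} + 2.58\,\frac{\sigma}{\sqrt{n}} = -180 + 94 = -86$$
99% CI = -274 to -86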
WISH: Change in Reported Caloric
Intake
99% CI = -274 to -86
The chance is 99% that the population mean
change in reported caloric intake is between
274 and 86 calories lower
The chance is less than 1% that there is no
change in the population mean.
WISH: Change in Reported Caloric
Intake
99% CI = -274 to -86
95% CI = -251 to -109
Note that the 99% CI is longer than the 95%
CI.
This is clear(!): the more confidence you
want, the longer the CI has to be.
Put another way, the less willing you are to be
wrong, the more conservative your claims.
Effect of Sample Size
95% CI = -251 to -109 with n = 271
If n = 1000, the 95% CI would be from
-217 to -143
Note how the CI gets shorter in length as the
sample size gets larger.
This is a general fact: the larger the
sample size the shorter the CI.
Effect of Population Standard
Deviation
95% CI = -251 to -109 with $\sigma = 600$
If $\sigma = 2000$, the 95% CI would be from
-418 to +58
Note how the CI gets longer in length as the
population standard deviation $\sigma$ gets larger.
This is a general fact: the larger the
population standard deviation $\sigma$ the
longer the CI.
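The effects of sample size and of population standard deviation on the 95% CI can be reproduced side by side. A minimal sketch, reusing the WISH sample mean change of -180 and the three (s.d., n) combinations quoted in the slides; the function name is hypothetical.

```python
from math import sqrt

def half_width_95(sigma, n):
    """Confidence factor of the 95% CI: 1.96 * sigma / sqrt(n)."""
    return 1.96 * sigma / sqrt(n)

xbar = -180
for sigma, n in [(600, 271), (600, 1000), (2000, 271)]:
    e = half_width_95(sigma, n)
    print(sigma, n, round(xbar - e), round(xbar + e))
# 600 271 -251 -109
# 600 1000 -217 -143
# 2000 271 -418 58
```

Because n enters through a square root, quadrupling the sample size only halves the length of the interval, whereas the length grows in direct proportion to the population standard deviation.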
Using SPSS to Construct CI
SPSS actually assumes that the population
standard deviation is unknown: we will consider
this case later.
Its default is a 95% CI
You can easily change to any level of confidence
SPSS demo using WISH data
Sample Size Determination
In general, this is a relatively complex issue,
depending very heavily on the experiment.
I will show you a simple calculation in the
special case that the population standard
deviation $\sigma$ is known.
Of course, $\sigma$ is not known in practice, and more
complex methods are required, but this will give
you a feel for the process.
Sample Size Determination
The usual answer to “what sample size
should I take” is “what can you afford”.
Remember, more precision with larger sample
sizes
Less precision with smaller sample sizes
Sample Size Determination
$\bar{X} \pm$ confidence factor
The length of a confidence interval is
$2 \times$ confidence factor
Thus, our 95% CI for WISH was -251 to -109,
so that the length was 142 calories
What if I wanted the length to be 100 calories?
Then the CI would have to be
$\bar{X} \pm 50$
Sample Size Determination
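$\bar{X} \pm$ confidence factor $= \bar{X} \pm z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$
The length of the CI is
$$2\,z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$
If I want the length of a confidence interval to
be $2 \times E$,
then I have to set
$$2E = 2\,z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$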
Now I do some algebra
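One way to carry out that algebra, written with the usual symbols ($E$ is half the desired CI length, $\sigma$ the known population standard deviation, $z_{\alpha/2}$ the Table 1 multiplier), is sketched here:

$$
\begin{aligned}
2E &= 2\,z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \\
E &= z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \\
\sqrt{n} &= z_{\alpha/2}\,\frac{\sigma}{E} \\
n &= \left( z_{\alpha/2}\,\frac{\sigma}{E} \right)^{2}
\end{aligned}
$$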
Sample Size Determination
I want the length of a confidence interval to be
$2 \times E$;
then the sample size I need is
$$n = \left( z_{\alpha/2}\,\frac{\sigma}{E} \right)^{2}$$
Sample Size Determination
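Consider WISH, where $\sigma = 600$. Suppose I want
the length of the 95% CI to be
$2 \times E = 100$
$E = 50$, $z_{\alpha/2} = 1.96$
$$n = \left( z_{\alpha/2}\,\frac{\sigma}{E} \right)^{2} = \left( 1.96 \times \frac{600}{50} \right)^{2} \approx 553$$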
Sample Size Determination
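Consider WISH, where $\sigma = 600$. Suppose I want
the length of the 95% CI to be
$2 \times E = 60$
$E = 30$, $z_{\alpha/2} = 1.96$
$$n = \left( z_{\alpha/2}\,\frac{\sigma}{E} \right)^{2} = \left( 1.96 \times \frac{600}{30} \right)^{2} \approx 1{,}537$$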
Sample Size Determination
95% confidence
Length = 100, E = 50, n = 553
Length = 60, E = 30, n = 1,537
General fact: the more precise you want to
be (shorter CI), the larger the sample size
you will need.
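The two WISH sample size calculations can be reproduced with a short Python sketch. The function name is hypothetical. The slides report the result to the nearest integer; a common convention, not stated in the lecture, is to round up so that the interval is at most the target length, so both are printed.

```python
from math import ceil

def required_sample_size(sigma, E, z=1.96):
    """Unrounded n solving z * sigma / sqrt(n) = E, i.e. (z * sigma / E) ** 2."""
    return (z * sigma / E) ** 2

for E in (50, 30):
    n_exact = required_sample_size(sigma=600, E=E)
    # Columns: E, n to the nearest integer (as on the slides), n rounded up
    print(E, round(n_exact), ceil(n_exact))
# 50 553 554
# 30 1537 1537
```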
Sample Size Determination
General fact: the larger the population
standard deviation, the larger the sample
size you will need to have a CI of length
$2 \times E$
Reactive Oxygen Species (ROS)
Data
Rats fed with fish-oil-enhanced diets
Response is the change in ROS for an animal
when the cells are exposed to butyrate
ROS Data
[Figure: box plot of "Change in Response", N = 20; vertical axis from -2 to 14]
ROS Data
Sample mean = 3.21
Sample size is n = 20
Pretend $\sigma = 3.33$
Then $\sigma/\sqrt{n} = 0.74$
95% interval for population mean change is
[3.21 - 0.74 * 1.96, 3.21 + 0.74 * 1.96] =
[1.76, 4.66]
Does butyrate increase ROS? How
certain are we?
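A quick check of the ROS interval in Python, as a sketch with a hypothetical function name. It follows the slide in rounding the standard deviation of the sample mean to 0.74 before multiplying by 1.96; carrying full precision instead shifts the endpoints slightly, to about 1.75 and 4.67.

```python
from math import sqrt

def ros_interval(xbar=3.21, sigma=3.33, n=20, z=1.96, round_se=True):
    """95% CI for the ROS mean change; optionally round sigma/sqrt(n) to 2 places."""
    se = sigma / sqrt(n)
    if round_se:
        se = round(se, 2)   # 0.74, the value used on the slide
    return xbar - z * se, xbar + z * se

low, high = ros_interval()
print(round(low, 2), round(high, 2))   # 1.76 4.66

low, high = ros_interval(round_se=False)
print(round(low, 2), round(high, 2))   # 1.75 4.67
```

Either way the whole interval lies above 0, which is the feature that matters for the question of whether butyrate increases ROS.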
ROS Data
$\sigma = 3.33$, $n = 20$
95% interval for population mean change is
[1.76, 4.66]
The length of the CI is $2 \times E = 2.90$
What sample size would I need to make the
length of the CI = 1.00? Here $2 \times E = 1.00$, $E = 0.50$, and
$$n = \left( z_{\alpha/2}\,\frac{\sigma}{E} \right)^{2} = \left( 1.96 \times \frac{3.33}{0.50} \right)^{2} \approx 170$$