Transcript Lecture5

Stat 651
Lecture 5
Copyright (c) Bani Mallick
1
Topics in Lecture #5



Confidence intervals for a population mean
μ when the population standard deviation
σ is known.
Properties of confidence intervals: what
things make them longer and shorter.
Sample size calculation for a population
mean μ when the population standard
deviation σ is known: a simple illustration
of a method.
Book Sections Covered in Lecture #5

Chapter 5.1

Chapter 5.2

Chapter 5.3
Lecture 4 Review: Pr(X < c) for
Normal Populations

Compute the z-score
$$z = \frac{c - \mu}{\sigma}$$

Look up value in Table 1
Lecture 4 Review: Pr(X > c) for
Normal Populations

Compute the z-score
$$z = \frac{c - \mu}{\sigma}$$

Look up the value for z in Table 1

Subtract this value from 1.0
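
The two review recipes can be sketched in a few lines of Python. This is only an illustration: the standard library's `NormalDist().cdf` stands in for the Table 1 lookup, and the numbers c = 120, μ = 100, σ = 15 are made up, not taken from the lecture.

```python
from statistics import NormalDist

def prob_less_than(c, mu, sigma):
    """Pr(X < c) for a normal population with mean mu and s.d. sigma."""
    z = (c - mu) / sigma            # compute the z-score
    return NormalDist().cdf(z)      # stands in for looking z up in Table 1

def prob_greater_than(c, mu, sigma):
    """Pr(X > c): subtract the table value from 1.0."""
    return 1.0 - prob_less_than(c, mu, sigma)

# Hypothetical values chosen only for illustration
print(prob_less_than(120, 100, 15))
print(prob_greater_than(120, 100, 15))
```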
Lecture 4 Review: Inference


The sample mean $\bar{X}$
is a random variable

Its own “population” mean is μ

Its standard deviation is σ/√n

Note how the standard deviation of the
sample mean becomes smaller as the
sample size becomes larger

More data = more precision!!!!!
Lecture 4 Review: Central Limit
Theorem


The sample mean $\bar{X}$
is a random variable

Its own “population” mean is μ

Its standard deviation is σ/√n

In “large enough” samples, the sample mean
is very nearly normally distributed, i.e., has a
bell-shaped histogram
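
A small simulation can make these facts concrete. This is a sketch under assumptions of my own: the population is exponential with rate 1 (mean 1, s.d. 1, strongly skewed), and the sample size 25 and the 5000 repetitions are arbitrary choices, none of them from the lecture.

```python
import math
import random
import statistics

def simulate_sample_means(n, reps=5000, seed=0):
    """Draw `reps` samples of size n from a skewed population; return their means."""
    rng = random.Random(seed)
    means = []
    for _ in range(reps):
        # Exponential population with rate 1: population mean 1 and s.d. 1
        sample = [rng.expovariate(1.0) for _ in range(n)]
        means.append(sum(sample) / n)
    return means

means = simulate_sample_means(n=25)
print("mean of the sample means:", statistics.fmean(means))   # should be near mu = 1
print("s.d. of the sample means:", statistics.stdev(means))   # should be near sigma / sqrt(n)
print("sigma / sqrt(n):         ", 1.0 / math.sqrt(25))
```

A histogram of `means` would look roughly bell-shaped even though the population itself is far from normal.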
Confidence Interval for a Population
Mean

A considerable part of basic statistics is to
make inferences about the population mean μ

It is impossible to know the value of μ
exactly.

This is a key factoid: why do I say this with
such certainty?

Because (almost) every sample will give
you a unique sample mean, and that
sample mean will not equal the
population mean.
Confidence Interval for a Population
Mean

What we can do is to construct an interval of
possible values for the population mean μ.

The interval is determined by how much
“confidence” we want in saying that the
population mean μ is in the interval.

The interval is always of the form
$\bar{X} \pm$ confidence factor
Confidence Interval for a Population
Mean
$\bar{X} \pm$ confidence factor

The confidence factor is determined by how
much confidence we want in concluding that the
population mean is actually in the interval

Which interval has higher confidence of including
the population mean?
-100 to -50 OR
-150 to 0
Confidence Interval for a Population
Mean: Formal Method

The first method assumes that the population
standard deviation σ is known.

Suppose we want to be 95% confident that our
interval includes the population mean μ, i.e., the
probability is 95% that the population mean μ is
in the interval.

Here is the interval:
$$\bar{X} - 1.96 \frac{\sigma}{\sqrt{n}} \quad \text{to} \quad \bar{X} + 1.96 \frac{\sigma}{\sqrt{n}}$$
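
One way to read the 95%: if the sampling were repeated many times, about 95% of the intervals built this way would include μ. The sketch below checks that by simulation. The normal population and the values μ = 0, σ = 1, n = 30 are assumptions made only for this illustration.

```python
import math
import random

def coverage(mu, sigma, n, reps=2000, z=1.96, seed=1):
    """Fraction of simulated intervals xbar +/- z*sigma/sqrt(n) that contain mu."""
    rng = random.Random(seed)
    half_width = z * sigma / math.sqrt(n)   # the confidence factor
    hits = 0
    for _ in range(reps):
        # Draw one sample of size n and compute its sample mean
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / reps

# Hypothetical population: mu = 0, sigma = 1, samples of size 30
print(coverage(mu=0.0, sigma=1.0, n=30))   # expect a value close to 0.95
```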
WOMEN’S INTERVIEW SURVEY OF
HEALTH (WISH)

I computed the reported mean caloric intake
at the start of the study, and the mean
reported caloric intake at the end

My random variable X was the change
(difference)

My hypothesis is that the population mean of
X is < 0. In other words, I think women
report fewer calories the more they are
asked about their diet (Hawthorne Effect).
WISH: Change in Caloric Intake
[Box plot: change in reported caloric intake, N = 271; axis label "Change in mean Energ"; vertical scale from -3000 to 2000]
Does it look like a big change?
Note that the scale of the box plot is -3000 to 2000.
WISH

The sample size is n = 271

The sample mean change is $\bar{X} = -180$

I am going to pretend that the population
standard deviation is σ = 600.

$$\bar{X} - 1.96 \frac{\sigma}{\sqrt{n}} \quad \text{to} \quad \bar{X} + 1.96 \frac{\sigma}{\sqrt{n}}$$
WISH: Change in Reported Caloric
Intake

n = 271, σ = 600, $\bar{X} = -180$
$$1.96 \frac{\sigma}{\sqrt{n}} \approx 71$$
$$\bar{X} - 1.96 \frac{\sigma}{\sqrt{n}} = -180 - 71 = -251$$
$$\bar{X} + 1.96 \frac{\sigma}{\sqrt{n}} = -180 + 71 = -109$$

95% CI = -251 to -109
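
The same arithmetic can be checked with a short Python sketch. The function name `z_interval` is my own, not something from the lecture or from SPSS; the inputs are the WISH values n = 271, σ = 600 and sample mean -180.

```python
import math

def z_interval(xbar, sigma, n, z=1.96):
    """Interval xbar +/- z * sigma / sqrt(n), for a known population s.d. sigma."""
    half_width = z * sigma / math.sqrt(n)   # the confidence factor
    return xbar - half_width, xbar + half_width

lo, hi = z_interval(xbar=-180, sigma=600, n=271)
print(round(lo), round(hi))   # -251 -109
```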
Review

σ = 600, n = 271, $\bar{X} = -180$

Then, with 95% probability, the true population
mean change is in the interval from -251 to -109

The chance is 95% that the population
mean change is between 251 and 109
calories lower

Is there a Hawthorne effect?
Confidence Intervals

You can construct a confidence interval for
the population mean with any level of
confidence.

Generally, people report the 95% CI, but
sometimes they report the 90% and 99%
confidence intervals.

This is easy to do via a formula, and even
easier to do via SPSS.
Confidence Interval for a Population
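Mean μ when σ is Known

Want 90%, 95% and 99% chance of interval
including μ.

90%: $\bar{X} - 1.645 \frac{\sigma}{\sqrt{n}}$ to $\bar{X} + 1.645 \frac{\sigma}{\sqrt{n}}$
95%: $\bar{X} - 1.96 \frac{\sigma}{\sqrt{n}}$ to $\bar{X} + 1.96 \frac{\sigma}{\sqrt{n}}$
99%: $\bar{X} - 2.58 \frac{\sigma}{\sqrt{n}}$ to $\bar{X} + 2.58 \frac{\sigma}{\sqrt{n}}$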
Confidence Intervals

There is a general formula given on page 200

If you want a (1-α)100% confidence interval
for the population mean μ when the population
s.d. σ is known, use the formula
$$\bar{X} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \quad \text{to} \quad \bar{X} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$$

The term $z_{\alpha/2}$ is the value in Table 1 that gives
probability 1 - α/2.

α = 0.10, $z_{\alpha/2}$ = 1.645; α = 0.05, $z_{\alpha/2}$ = 1.96;
α = 0.01, $z_{\alpha/2}$ = 2.58
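
Instead of reading Table 1 backwards, the multiplier for any confidence level can be computed. This sketch assumes Python's `statistics.NormalDist` as a stand-in for the table; the helper name `z_critical` is hypothetical.

```python
from statistics import NormalDist

def z_critical(alpha):
    """Value z with Pr(Z < z) = 1 - alpha/2 for a standard normal Z."""
    return NormalDist().inv_cdf(1 - alpha / 2)

for alpha in (0.10, 0.05, 0.01):
    # Rounded to two decimals these match the table values 1.64/1.65, 1.96, 2.58
    print(alpha, round(z_critical(alpha), 3))
```

The 99% value comes out near 2.576, which the table-based 2.58 rounds up slightly.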
WISH

The sample size is n = 271

The sample mean change is $\bar{X} = -180$

I am going to pretend that the population
standard deviation is σ = 600.

I want a 99% confidence interval: $z_{\alpha/2}$ = 2.58
$$\bar{X} - 2.58 \frac{\sigma}{\sqrt{n}} \quad \text{to} \quad \bar{X} + 2.58 \frac{\sigma}{\sqrt{n}}$$
WISH: Change in Reported Caloric
Intake

n = 271, σ = 600, $\bar{X} = -180$
$$2.58 \frac{\sigma}{\sqrt{n}} \approx 94$$
$$\bar{X} - 2.58 \frac{\sigma}{\sqrt{n}} = -180 - 94 = -274$$
$$\bar{X} + 2.58 \frac{\sigma}{\sqrt{n}} = -180 + 94 = -86$$

99% CI = -274 to -86
WISH: Change in Reported Caloric
Intake

99% CI = -274 to -86

The chance is 99% that the population mean
change in reported caloric intake is a decrease
of between 86 and 274 calories

The chance is less than 1% that there is no
change in the population mean.
WISH: Change in Reported Caloric
Intake

99% CI = -274 to -86

95% CI = -251 to -109

Note that the 99% CI is longer than the 95%
CI.

This is clear(!): the more confidence you
want, the longer the CI has to be.

Put another way, the less willing you are to be
wrong, the more conservative your claims.
Effect of Sample Size

95% CI = -251 to -109 with n = 271

If n = 1000, the 95% CI would be from
-217 to -143

Note how the CI gets shorter in length as the
sample size gets larger.

This is a general fact: the larger the
sample size the shorter the CI.
Effect of Population Standard
Deviation

95% CI = -251 to -109 with σ = 600

If σ = 2000, the 95% CI would be from
-418 to +58

Note how the CI gets longer in length as the
population standard deviation σ gets larger.

This is a general fact: the larger the
population standard deviation σ the
longer the CI.
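
Both general facts come from the same expression: the length of the interval is 2 x z x σ/√n, so it shrinks as n grows and stretches as σ grows. A minimal sketch, with a helper name `ci_length` of my own choosing and the WISH scenarios from the slides as inputs:

```python
import math

def ci_length(sigma, n, z=1.96):
    """Length of the interval xbar +/- z * sigma / sqrt(n)."""
    return 2 * z * sigma / math.sqrt(n)

# (sigma, n) pairs: the original WISH setting, a larger n, a larger sigma
for sigma, n in [(600, 271), (600, 1000), (2000, 271)]:
    print(f"sigma={sigma}, n={n}: length = {ci_length(sigma, n):.1f}")
```

The lengths printed here are computed before rounding the endpoints, so they can differ by a calorie or so from lengths read off the rounded intervals.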
Using SPSS to Construct CI

SPSS actually assumes that the population
standard deviation is unknown: we will consider
this case later.

Its default is a 95% CI

You can easily change to any level of confidence

SPSS demo using WISH data
Sample Size Determination

In general, this is a relatively complex issue,
depending very heavily on the experiment.

I will show you a simple calculation in the
special case that the population standard
deviation σ is known.

Of course, σ is not known in practice, and more
complex methods are required, but this will give
you a feel for the process.
Sample Size Determination

The usual answer to “what sample size
should I take” is “what can you afford”.

Remember, more precision with larger sample
sizes

Less precision with smaller sample sizes
Sample Size Determination
$\bar{X} \pm$ confidence factor

The length of a confidence interval is
2 x confidence factor

Thus, our 95% CI for WISH was -251 to -109,
so that the length was 142 calories.
What if I wanted the length to be 100 calories?
Then the CI would have to be
$\bar{X} \pm 50$
Sample Size Determination
$$\bar{X} \pm \text{confidence factor} = \bar{X} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$$

The length of the CI is
$$2 z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$$

If I want the length of a confidence interval to
be 2xE, then I have to set
$$2E = 2 z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$$
Now I do some algebra
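
The algebra can be spelled out step by step, using the same symbols as the slide: divide both sides by 2, solve for the square root of n, then square.

```latex
\begin{align*}
2E &= 2 z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \\
E &= z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \\
\sqrt{n} &= z_{\alpha/2} \frac{\sigma}{E} \\
n &= \left( z_{\alpha/2} \frac{\sigma}{E} \right)^2
\end{align*}
```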
Sample Size Determination

I want the length of a confidence interval to be
2xE
then the sample size I need is
$$n = \left( z_{\alpha/2} \frac{\sigma}{E} \right)^2$$
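
The formula is easy to turn into a small calculator. This is a sketch, not a full sample size method: the helper name `sample_size` is my own, σ is treated as known, and the example inputs are the WISH values σ = 600 with E = 50.

```python
import math

def sample_size(sigma, E, z=1.96):
    """n = (z * sigma / E)^2, the sample size giving a CI of length 2*E."""
    return (z * sigma / E) ** 2

n_raw = sample_size(sigma=600, E=50)
print(round(n_raw))       # 553, rounding to the nearest whole number
print(math.ceil(n_raw))   # 554, rounding up so the length is at most 2*E
```

Rounding to the nearest integer reproduces the figure used in this lecture. Rounding up is the cautious choice, since a sample size just below the formula's value gives an interval slightly longer than 2xE.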
Sample Size Determination

Consider WISH, where σ = 600. Suppose I want
the length of the 95% CI to be
2xE = 100

E = 50, $z_{\alpha/2}$ = 1.96
$$n = \left( z_{\alpha/2} \frac{\sigma}{E} \right)^2 = \left( 1.96 \times \frac{600}{50} \right)^2 \approx 553$$
Sample Size Determination

Consider WISH, where σ = 600. Suppose I want
the length of the 95% CI to be
2xE = 60

E = 30, $z_{\alpha/2}$ = 1.96
$$n = \left( z_{\alpha/2} \frac{\sigma}{E} \right)^2 = \left( 1.96 \times \frac{600}{30} \right)^2 \approx 1{,}537$$
Sample Size Determination

95% confidence

Length = 100, E = 50, n = 553

Length = 60, E = 30, n = 1,537

General fact: the more precise you want to
be (shorter CI), the larger the sample size
you will need.
Sample Size Determination

General fact: the larger the population
standard deviation, the larger the sample
size you will need to have a CI of length
2xE
Reactive Oxygen Species (ROS)
Data

Rats fed with fish oil enhanced diets

Response is the change in ROS for an animal
when the cells are exposed to butyrate
ROS Data
[Box plot: change in response, N = 20; vertical scale from -2 to 14]
ROS Data

Sample mean = 3.21

Sample size is n = 20

Pretend σ = 3.33

Then $\frac{\sigma}{\sqrt{n}} = 0.74$

95% interval for population mean change is

[3.21 - 0.74 * 1.96, 3.21 + 0.74 * 1.96] =
[1.76, 4.66]

Does butyrate increase ROS? How
certain are we?
ROS Data


σ = 3.33, n = 20
95% interval for population mean change is
[1.76, 4.66]

The length of the CI is 2xE = 2.90

What sample size would I need to make the
length of the CI = 1.00? Here 2xE = 1.00, E =
0.50, and
$$n = \left( z_{\alpha/2} \frac{\sigma}{E} \right)^2 = \left( 1.96 \times \frac{3.33}{0.50} \right)^2 \approx 170$$