dishonest-coin principle
16 Mathematics of Normal Distributions
16.1 Approximately Normal Distributions of Data
16.2 Normal Curves and Normal Distributions
16.3 Standardizing Normal Data
16.4 The 68-95-99.7 Rule
16.5 Normal Curves as Models of Real-Life Data Sets
16.6 Distribution of Random Events
16.7 Statistical Inference
Copyright © 2010 Pearson Education, Inc.
Excursions in Modern Mathematics, 7e: 16.7 - 2
Statistical Inference
Suppose that we have an honest coin and intend to toss it 100 times. We are going to do this just once, and we will let X denote the resulting number of heads. Been there, done that! What's new now is that we have a solid understanding of the statistical behavior of the random variable X: it has an approximately normal distribution with mean $\mu = 50$ and standard deviation $\sigma = 5$, and this allows us to make some very reasonable predictions about the possible values of X.
For starters, we can predict the chance that X will fall somewhere between 45 and 55 (one standard deviation below and above the mean): it is about 68%. Likewise, we know that the chance that X will fall somewhere between 40 and 60 (two standard deviations) is about 95%, and between 35 and 65 (three standard deviations) is a whopping 99.7%.
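The 68-95-99.7 rule turns a mean and a standard deviation into three intervals. A minimal sketch of that step follows; the helper name `rule_intervals` is hypothetical, chosen only for illustration.

```python
def rule_intervals(mu: float, sigma: float) -> dict[str, tuple[float, float]]:
    """Apply the 68-95-99.7 rule: map each coverage label to the interval mu +/- k*sigma."""
    k_for_label = {"68%": 1, "95%": 2, "99.7%": 3}  # number of standard deviations
    return {label: (mu - k * sigma, mu + k * sigma) for label, k in k_for_label.items()}


# 100 tosses of an honest coin: mu = 50 heads, sigma = 5 heads
for label, (lo, hi) in rule_intervals(50, 5).items():
    print(f"about {label} chance that {lo} <= X <= {hi}")
```

With $\mu = 50$ and $\sigma = 5$ this reproduces the intervals 45 to 55, 40 to 60, and 35 to 65.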
What if, instead of tossing the coin 100 times,
we were to toss it n times?
Not surprisingly, the bell-shaped distribution would still be there; only the values of $\mu$ and $\sigma$ would change. Specifically, for n sufficiently large (typically n ≥ 30), the number of heads in n tosses would be a random variable with an approximately normal distribution with mean $\mu = n/2$ heads and standard deviation $\sigma = \sqrt{n}/2$ heads. This is an important fact for which we have coined the name the honest-coin principle.
THE HONEST-COIN PRINCIPLE
Let X denote the number of heads in n tosses of an honest coin (assume n ≥ 30). Then, X has an approximately normal distribution with mean $\mu = n/2$ and standard deviation $\sigma = \sqrt{n}/2$.
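The honest-coin principle is just two formulas, so it fits in a small function. This is a sketch; the name `honest_coin` and the choice to reject n < 30 are illustrative assumptions that mirror the rule of thumb stated in the principle.

```python
import math


def honest_coin(n: int) -> tuple[float, float]:
    """Mean and standard deviation of the number of heads in n tosses of an honest coin."""
    if n < 30:
        # The normal approximation is only trusted here for n >= 30 (rule of thumb).
        raise ValueError("the honest-coin principle assumes n >= 30")
    return n / 2, math.sqrt(n) / 2


print(honest_coin(100))  # (50.0, 5.0)
print(honest_coin(256))  # (128.0, 8.0)
```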
Example 16.9 Coin-Tossing Experiments: Part 2
An honest coin is going to be tossed 256
times. Before this is done, we have the
opportunity to make some bets. Let’s say that
we can make a bet (with even odds) that if
the number of heads tossed falls somewhere
between 120 and 136, we will win; otherwise,
we will lose. Should we make such a bet?
Let X denote the number of heads in 256
tosses of an honest coin.
By the honest-coin principle, X is a random variable having a distribution that is approximately normal with mean $\mu = 256/2 = 128$ heads and standard deviation $\sigma = \sqrt{256}/2 = 16/2 = 8$ heads. The values 120 and 136 are exactly one standard deviation below and above the mean of 128, which means that there is about a 68% chance that the number of heads will fall somewhere between 120 and 136.
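We should indeed make this bet! We win roughly 68% of the time and lose only about 32% of the time, so at even odds the bet is in our favor. A similar calculation tells us that there is about a 95% chance that the number of heads will fall somewhere between 112 and 144 (two standard deviations from the mean), and the chance that the number of heads will fall somewhere between 104 and 152 (three standard deviations) is about 99.7%.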
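As a cross-check on the normal approximation, the exact probability of winning can be computed from binomial coefficients, and from it the expected gain of the even-odds bet. This sketch assumes a stake of one dollar (win $1 or lose $1); the helper `prob_between` is a hypothetical name.

```python
from math import comb


def prob_between(n: int, lo: int, hi: int) -> float:
    """Exact P(lo <= X <= hi) for X = number of heads in n tosses of an honest coin."""
    return sum(comb(n, k) for k in range(lo, hi + 1)) / 2**n


p_win = prob_between(256, 120, 136)
# Even odds: +1 with probability p_win, -1 otherwise.
expected_gain = p_win * 1 + (1 - p_win) * (-1)
print(f"P(win) = {p_win:.3f}, expected gain per $1 bet = {expected_gain:+.3f}")
```

Because both endpoints count as wins, the exact probability lands a little above the 68% estimate, and the expected gain is positive either way.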
Dishonest Coin
What happens when the coin being tossed
is not an honest coin? Surprisingly, the
distribution of the number of heads X in n
tosses of such a coin is still approximately
normal, as long as the number n is not too
small (a good rule of thumb is n ≥ 30). All
we need now is a dishonest-coin principle
to tell us how to find the mean and the
standard deviation.
THE DISHONEST-COIN PRINCIPLE
Let X denote the number of heads in n tosses of a coin (assume n ≥ 30). Let p denote the probability of heads on each toss of the coin. Then, X has an approximately normal distribution with mean $\mu = n \cdot p$ and standard deviation $\sigma = \sqrt{n \cdot p \cdot (1 - p)}$.
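A sketch of the dishonest-coin principle as a function follows. The name `dishonest_coin` is illustrative. Setting p = 0.5 recovers the honest-coin formulas, since $\sqrt{n \cdot 0.5 \cdot 0.5} = \sqrt{n}/2$.

```python
import math


def dishonest_coin(n: int, p: float) -> tuple[float, float]:
    """Mean n*p and standard deviation sqrt(n*p*(1-p)) of the number of heads in n tosses."""
    if n < 30:
        raise ValueError("the dishonest-coin principle assumes n >= 30")
    if not 0 <= p <= 1:
        raise ValueError("p must be a probability between 0 and 1")
    return n * p, math.sqrt(n * p * (1 - p))


mu, sigma = dishonest_coin(100, 0.20)
print(f"mu = {mu:g}, sigma = {sigma:g}")  # mu = 20, sigma = 4
print(dishonest_coin(256, 0.5))           # honest coin again: (128.0, 8.0)
```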
Example 16.10 Coin-Tossing Experiments: Part 3
A coin is rigged so that it comes up heads
only 20% of the time (i.e., p = 0.20). The coin
is tossed 100 times (n = 100) and X is the
number of heads in the 100 tosses. What can
we say about X?
According to the dishonest-coin principle, the distribution of the random variable X is approximately normal with mean $\mu = 100 \times 0.20 = 20$ and standard deviation $\sigma = \sqrt{100 \times 0.20 \times 0.80} = \sqrt{16} = 4$.
Applying the 68-95-99.7 rule with $\mu = 20$ and $\sigma = 4$ gives the following facts:
■ There is about a 68% chance that X will be somewhere between 16 and 24 ($\mu - \sigma \le X \le \mu + \sigma$).
■ There is about a 95% chance that X will be somewhere between 12 and 28 ($\mu - 2\sigma \le X \le \mu + 2\sigma$).
■ The number of heads is almost guaranteed (about 99.7%) to fall somewhere between 8 and 32 ($\mu - 3\sigma \le X \le \mu + 3\sigma$).
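These are approximations, so it can be reassuring to compare them with the exact binomial probabilities for n = 100 and p = 0.20. The sketch below does that; `binom_prob_between` is a hypothetical helper name, and the two sets of numbers should be close rather than identical.

```python
from math import comb


def binom_prob_between(n: int, p: float, lo: int, hi: int) -> float:
    """Exact P(lo <= X <= hi) when X counts heads in n tosses with P(heads) = p."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(lo, hi + 1))


# Example 16.10: n = 100, p = 0.20, so mu = 20 and sigma = 4
bands = {"68%": (16, 24), "95%": (12, 28), "99.7%": (8, 32)}
for label, (lo, hi) in bands.items():
    exact = binom_prob_between(100, 0.20, lo, hi)
    print(f"rule says about {label}; exact P({lo} <= X <= {hi}) = {exact:.4f}")
```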
In this example, heads and tails are no longer interchangeable concepts: heads is an outcome with probability p = 0.20, while tails is an outcome with much higher probability (1 − p = 0.80). We can, however, apply the principle equally well to describe the distribution of the number of tails in 100 tosses of the same dishonest coin: the distribution for the number of tails is approximately normal with mean $\mu = 100 \times 0.80 = 80$ and standard deviation $\sigma = \sqrt{100 \times 0.80 \times 0.20} = 4$.
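One way to see that heads and tails share the same standard deviation is to compute the mean and standard deviation directly from the binomial probabilities, once with p = 0.20 (heads) and once with 1 − p = 0.80 (tails). This is an illustrative sketch; `exact_mean_sd` is a hypothetical name.

```python
from math import comb, sqrt


def exact_mean_sd(n: int, p: float) -> tuple[float, float]:
    """Mean and standard deviation computed directly from the binomial probabilities."""
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    mean = sum(k * q for k, q in enumerate(pmf))
    var = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))
    return mean, sqrt(var)


heads = exact_mean_sd(100, 0.20)  # counting heads, p = 0.20
tails = exact_mean_sd(100, 0.80)  # counting tails, 1 - p = 0.80
print(f"heads: mean {heads[0]:.2f}, sd {heads[1]:.2f}")
print(f"tails: mean {tails[0]:.2f}, sd {tails[1]:.2f}")
```

The formula $\sqrt{n \cdot p \cdot (1 - p)}$ is symmetric in p and 1 − p, which is why both standard deviations come out to 4.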
Central Limit Theorem
The dishonest-coin principle is a special
version of one of the most important laws in
statistics, a law generally known as the
central limit theorem. We will now briefly
illustrate why the importance of the
dishonest-coin principle goes beyond the
tossing of coins.
Example 16.11 Sampling for Defective Light Bulbs
An assembly line produces 100,000 light
bulbs a day, 20% of which generally turn out
to be defective. Suppose that we draw a
random sample of n = 100 light bulbs. Let X
represent the number of defective light bulbs
in the sample. What can we say about X?
A moment’s reflection will show that, in a
sense, this example is completely parallel to
Example 16.10–think of selecting defective
light bulbs as analogous to tossing heads with
a dishonest coin.
We can use the dishonest-coin principle to infer that the number of defective light bulbs in the sample is a random variable having an approximately normal distribution with a mean of $100 \times 0.20 = 20$ light bulbs and a standard deviation of $\sqrt{100 \times 0.20 \times 0.80} = 4$ light bulbs.
Using these facts, we can draw the following
conclusions:
■ There is about a 68% chance that the number of defective light bulbs in the sample will fall somewhere between 16 and 24.
■ There is about a 95% chance that the number of defective light bulbs in the sample will fall somewhere between 12 and 28.
■ The number of defective light bulbs in the sample is practically guaranteed (about a 99.7% chance) to fall somewhere between 8 and 32.
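A quick simulation can make these statements concrete. The sketch below assumes a hypothetical day's output of exactly 100,000 bulbs with exactly 20,000 defective, draws many random samples of 100 without replacement, and records how often the defective count lands in each band. The seed and trial count are arbitrary choices, so the observed shares will be near, not equal to, the rule's figures.

```python
import random

POPULATION = 100_000  # bulbs produced in a day
DEFECTIVE = 20_000    # assumed: exactly 20% of them defective
SAMPLE_SIZE = 100
TRIALS = 10_000

rng = random.Random(2010)  # fixed seed so the run is reproducible
counts = []
for _ in range(TRIALS):
    # Label bulbs 0..99,999 and treat labels below 20,000 as the defective ones.
    sample = rng.sample(range(POPULATION), SAMPLE_SIZE)
    counts.append(sum(1 for bulb in sample if bulb < DEFECTIVE))

BANDS = {"68%": (16, 24), "95%": (12, 28), "99.7%": (8, 32)}
observed = {}
for label, (lo, hi) in BANDS.items():
    observed[label] = sum(lo <= c <= hi for c in counts) / TRIALS
    print(f"{label} band [{lo}, {hi}]: observed in {observed[label]:.1%} of samples")
```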
Probably the most important point here is that
each of the preceding facts can be rephrased
in terms of sampling errors (Chapter 13). For
example, say we had 24 defective light bulbs
in the sample; in other words, 24% of the
sample (24 out of 100) are defective light
bulbs. If we use this statistic to estimate the
percentage of defective light bulbs overall,
then the sampling error would be 4%
(because the estimate is 24% and the value
of the parameter is 20%).
By the same token, if we had 16 defective light bulbs in the sample, the sampling error would be −4%. Coincidentally, the standard deviation is $\sigma = 4$ light bulbs, or 4% of the sample. (We computed it in Example 16.10.) Thus, we can rephrase our previous assertions about sampling errors as follows:
■ When estimating the proportion of defective light bulbs coming out of the assembly line by using a sample of 100 light bulbs, there is a 68% chance that the sampling error will fall somewhere between −4% and 4%.
■ When estimating the proportion of defective light bulbs coming out of the assembly line by using a sample of 100 light bulbs, there is a 95% chance that the sampling error will fall somewhere between −8% and 8%.
■ When estimating the proportion of defective light bulbs coming out of the assembly line by using a sample of 100 light bulbs, there is a 99.7% chance that the sampling error will fall somewhere between −12% and 12%.
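The rephrasing above amounts to dividing the standard deviation of the count by the sample size and multiplying by 1, 2, or 3. A minimal sketch follows; `sampling_error_bands` is a hypothetical helper, and it reports half-widths in percentage points.

```python
import math


def sampling_error_bands(n: int, p: float) -> dict[str, float]:
    """Half-width of each 68-95-99.7 band for the sampling error, in percentage points."""
    sigma = math.sqrt(n * p * (1 - p))  # SD of the count (dishonest-coin principle)
    se_pct = 100 * sigma / n            # the same SD as a percentage of the sample
    return {"68%": se_pct, "95%": 2 * se_pct, "99.7%": 3 * se_pct}


for label, half_width in sampling_error_bands(100, 0.20).items():
    print(f"{label}: sampling error between -{half_width:.0f}% and {half_width:.0f}%")
```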
Example 16.12 Sampling with Larger Samples
Suppose that we have the same assembly
line as in Example 16.11, but this time we are
going to take a really big sample of n = 1600
light bulbs. Before we even count the number
of defective light bulbs in the sample, let’s see
how much mileage we can get out of the
dishonest-coin principle. The standard
deviation for the distribution of defective light
bulbs in the sample is $\sigma = \sqrt{1600 \times 0.2 \times 0.8} = \sqrt{256} = 16$,
which just happens to be exactly 1% of the sample (16/1600 = 1%). This means that when we estimate the proportion of defective light bulbs coming out of the assembly line using this sample, we can put fairly tight bounds on the sampling error:
■ We can say with some confidence (68%) that the sampling error will fall somewhere between −1% and 1%.
■ We can say with a lot of confidence (95%) that the sampling error will fall somewhere between −2% and 2%.
■ We can say with tremendous confidence (99.7%) that the sampling error will fall somewhere between −3% and 3%.
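Comparing this example with Example 16.11 shows the effect of sample size: the standard deviation as a fraction of the sample is $\sqrt{p(1-p)/n}$, so a sample 16 times larger gives error bands one quarter as wide. The sketch below assumes p = 0.20 in both cases; `standard_error_pct` is an illustrative name.

```python
import math


def standard_error_pct(n: int, p: float) -> float:
    """Standard deviation of the sample proportion, in percentage points."""
    return 100 * math.sqrt(p * (1 - p) / n)


for n in (100, 1600):
    se = standard_error_pct(n, 0.20)
    print(f"n = {n:>4}: 68% band ±{se:.0f}%, 95% band ±{2 * se:.0f}%, 99.7% band ±{3 * se:.0f}%")
```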
Example 16.13 Measuring the Margin of Error of a Poll
In California, school bond measures require a
66.67% vote for approval. Suppose that an
important school bond measure is on the
ballot in the upcoming election. In the most
recent poll of 1200 randomly chosen voters,
744 of the 1200 voters sampled, or 62%,
indicated that they would vote for the school
bond measure. Let’s assume that the poll was
properly conducted and that the 1200 voters
sampled represent an unbiased sample of the
entire population.
What are the chances that the 62% statistic is
the result of sampling variability and that the
actual vote for the bond measure will be
66.67% or more?
Here, we will use a variation of the dishonest-coin principle, with each vote being likened to
a coin toss: A vote for the bond measure is
equivalent to flipping heads, a vote against
the bond measure is equivalent to flipping
tails.
In this analogy, the probability (p) of “heads”
represents the proportion of voters in the
population that support the bond measure: If
p turns out to be 0.6667 or more, the bond
measure will pass. Our problem is that we
don’t know p, so how can we use the
dishonest-coin principle to estimate the mean
and standard deviation of the sampling
distribution?
We start by letting the 62% (0.62) statistic
from the sample serve as an estimate for the
actual value of p in the formula for the
standard deviation given by the dishonest-coin principle. (Even though we know that this is only a rough estimate for p, it turns out to give us a good estimate for the standard deviation $\sigma$.)
Using p = 0.62 and the dishonest-coin
principle, we get
$\sigma = \sqrt{n \cdot p \cdot (1 - p)} = \sqrt{1200 \times 0.62 \times 0.38} \approx 16.8$
votes. This number represents the
approximate standard deviation for the
number of “heads” (i.e., voters who will vote
for the school bond measure) in the sample.
If we express this number as a percentage of
the sample size, we can say that the standard
deviation represents approximately 1.4% of
the sample (16.8/1200 = 0.014).
The standard deviation of the sampling distribution of the proportion of voters in favor of the measure (that is, the standard deviation of the count, expressed as a percentage of the entire sample) is called the standard error. (For our example, we found above that the standard error is approximately 1.4%.)
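The standard-error calculation for the poll can be reproduced in a few lines. This sketch follows the text in plugging the sample proportion into the dishonest-coin formula as a stand-in for the unknown p; the variable names are illustrative.

```python
import math

n = 1200        # voters sampled
in_favor = 744  # voters in the sample who support the measure

p_hat = in_favor / n                         # 0.62, a stand-in for the unknown p
sigma = math.sqrt(n * p_hat * (1 - p_hat))   # SD of the number of "heads" (yes votes)
standard_error = sigma / n                   # the same SD as a fraction of the sample

print(f"p_hat = {p_hat:.2f}, sigma = {sigma:.1f} votes, standard error = {standard_error:.1%}")
```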
In sampling and public opinion polls, it is
customary to express the information about
the population in terms of confidence
intervals, which are themselves based on
standard errors: A 95% confidence interval is
given by two standard errors below and
above the statistic obtained from the sample,
and a 99.7% confidence interval is given by
going three standard errors below and above
the sample statistic.
For the school bond measure, a 95% confidence interval is 62% plus or minus 2 × 1.4% = 2.8%. This means that we can say with 95% confidence (we would be right approximately 95 out of 100 times) that the actual vote for the bond measure will fall somewhere between 59.2% (62 − 2.8) and 64.8% (62 + 2.8), and thus that the bond measure will lose.
Now take a 99.7% confidence interval of 62% plus or minus 3 × 1.4% = 4.2%, that is, from 57.8% to 66.2%: it is almost certain that the actual vote will turn out somewhere in that range. Even in the most optimistic scenario (66.2%), the vote will not reach the 66.67% needed to pass the bond measure.
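Both confidence intervals, and the comparison with the two-thirds threshold, can be sketched as follows. The helper `confidence_interval` is a hypothetical name; it uses the unrounded standard error (about 1.401%), so its endpoints may differ from the hand calculation in later decimal places.

```python
import math


def confidence_interval(p_hat: float, n: int, k: int) -> tuple[float, float]:
    """Sample proportion plus or minus k standard errors (k = 2 for 95%, k = 3 for 99.7%)."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - k * se, p_hat + k * se


THRESHOLD = 2 / 3  # 66.67% needed to pass the bond measure
for label, k in (("95%", 2), ("99.7%", 3)):
    lo, hi = confidence_interval(0.62, 1200, k)
    verdict = "could pass" if hi >= THRESHOLD else "falls short"
    print(f"{label} CI: {lo:.1%} to {hi:.1%} -> {verdict}")
```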