Transcript Figure 15.2

Chapter 15
Sections 1-5
Bell Ringer
24.
Event A occurs with probability 0.1. Event B occurs with probability 0.6. If A and B are independent, then
a.
P(A and B) = 0.70.
b.
P(A or B) = 0.64.
c.
P(A and B) = 0.64.
d.
P(A or B) = 0.70.
25.
An event A will occur with probability 0.5. An event B will occur with probability 0.6. The probability that both A
and B will occur is 0.1. The conditional probability of B, given A, is
a.
5/6.
b.
1/5.
c.
1/6.
d.
It cannot be determined from the information given.
26.
An event A will occur with probability 0.5. An event B will occur with probability 0.6. The probability that both A
and B will occur is 0.1. We may conclude
a.
events A and B are independent.
b.
events A and B are disjoint.
c.
either A or B always occurs.
d.
None of the above
I CAN:
Daily Agenda
Remember s and p: statistics come from samples, and parameters come from populations.
As long as we were just doing data analysis, searching for patterns, or summarizing features of our data, the
distinction between population and sample was not important. Now, as we begin to understand what our data
(sample) tell us about a population, it is essential.
The notation we use must reflect this distinction. We write μ (the Greek letter mu) for the population mean and
σ (the Greek letter sigma) for the population standard deviation. These are fixed parameters that are unknown
when we use a sample for inference.
The sample mean is the familiar x̄, the average of the observations in the sample.
The sample standard deviation is denoted by s, the standard deviation of the observations in the sample.
These are statistics that would almost certainly take different values if we chose another sample from the same
population. The sample mean x̄ and sample standard deviation s from a sample or an experiment are estimates
of the mean μ and standard deviation σ of the underlying population.
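The distinction can be illustrated with a small simulation. The sketch below is only an illustration, assuming a made-up population of 10,000 values (not data from the chapter): the parameters μ and σ are computed once from the whole population, while the statistics x̄ and s change from one sample to the next. The names `population` and `sample_statistics` are hypothetical.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical population (an assumption for illustration only).
population = [rng.gauss(25, 7) for _ in range(10_000)]

# Parameters: fixed numbers that describe the entire population.
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)


def sample_statistics(sample_size):
    """Draw one simple random sample and return its statistics (x_bar, s)."""
    sample = rng.sample(population, sample_size)
    return statistics.fmean(sample), statistics.stdev(sample)


# Statistics: they take different values for different samples.
x_bar_1, s_1 = sample_statistics(10)
x_bar_2, s_2 = sample_statistics(10)

print(f"parameters:  mu = {mu:.2f}, sigma = {sigma:.2f}")
print(f"sample 1:    x_bar = {x_bar_1:.2f}, s = {s_1:.2f}")
print(f"sample 2:    x_bar = {x_bar_2:.2f}, s = {s_2:.2f}")
```

In real inference the population list is not available, which is why μ and σ stay unknown and x̄ and s serve as their estimates.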
15.1 Genetic Engineering. Here’s an idea for treating advanced melanoma, the most serious kind of skin
cancer: genetically engineer white blood cells to better recognize and destroy cancer cells, then infuse these cells
into patients. The subjects in a small initial study of this approach were 11 patients whose melanoma had not
responded to existing treatments. One outcome of this experiment was measured by a test for the presence of
cells that trigger an immune response in the body and so may help fight cancer. The mean counts of active cells
per 100,000 cells for the 11 subjects were 3.8 before infusion and 160.2 after infusion. Is each of the boldface
numbers a parameter or a statistic?
15.2 Florida Voters. Florida played a key role in recent presidential elections. Voter registration records in
February 2014 show that 39% of Florida voters are registered as Democrats and 35% as Republicans. (Most of
the others did not choose a party.) To test a random digit dialing device that you plan to use to poll voters for the
2014 Senate elections, you use it to call 250 randomly chosen residential telephones in Florida. Of the registered
voters contacted, 34% are registered Democrats. Is each of the boldface numbers a parameter or a statistic?
15.3 Steroid Use. Researchers surveyed 500 American anabolic androgenic steroid
users, ranging in age from 16 to 62, and found that 98.8% of them were male. The
proportion of all Americans between the ages of 16 and 62 who are male
is 50.0%. The median age at which those surveyed began using steroids was 22. Is each of
the boldface numbers a parameter or a statistic?
Statistical inference uses sample data to draw conclusions about the entire
population.
Because good samples are chosen randomly, statistics such as x̄ computed
from these samples are random variables.
We can describe the behavior of a sample statistic by a probability model that
answers the question, “What would happen if we did this many times?”
Here is an example that will lead us toward the probability ideas most
important for statistical inference.
If x̄ is rarely exactly right and varies from sample to sample, why is it
nonetheless a reasonable estimate of the population mean μ? Here
is one answer: if we keep on taking larger and larger samples, the
statistic x̄ is guaranteed to get closer and closer to the parameter μ.
We have the comfort of knowing that if we can afford to keep on
measuring more subjects, eventually we will estimate the mean
odor threshold of all adults very accurately.
This remarkable fact is called the law of large numbers. It is
remarkable because it holds for any population, not just for some
special class such as Normal distributions.
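A short simulation can make the law of large numbers concrete. The sketch below assumes a deliberately non-Normal population, the rolls of a fair six-sided die (mean μ = 3.5), which is an illustrative choice and not an example from the chapter. It records the running mean x̄ at a few sample sizes; `running_means` is a hypothetical helper.

```python
import random


def running_means(draw, checkpoints):
    """Return {n: mean of the first n draws} for each n in checkpoints."""
    total = 0.0
    means = {}
    for n in range(1, max(checkpoints) + 1):
        total += draw()
        if n in checkpoints:
            means[n] = total / n
    return means


rng = random.Random(2)
MU = 3.5  # population mean of a fair die: (1 + 2 + 3 + 4 + 5 + 6) / 6
checkpoints = {10, 100, 1_000, 10_000, 100_000}
means = running_means(lambda: rng.randint(1, 6), checkpoints)

for n in sorted(means):
    distance = abs(means[n] - MU)
    print(f"n = {n:>7,}: x_bar = {means[n]:.4f}, distance from mu = {distance:.4f}")
```

In any single run the distance from μ need not shrink at every checkpoint; the law describes the long-run tendency as n grows.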
The law of large numbers is the foundation of such business enterprises as
gambling casinos and insurance companies. The winnings (or losses) of a
gambler on a few plays are uncertain—that’s why some people find gambling
exciting. In Figure 15.1, the mean of even 100 observations is not yet very close
to μ.
It is only in the long run that the mean outcome is predictable. The house plays
tens of thousands of times. So the house, unlike individual gamblers, can count
on the long-run regularity described by the law of large numbers. The average
winnings of the house on tens of thousands of plays will be very close to the
mean of the distribution of winnings. Needless to say, this mean guarantees the
house a profit. That’s why gambling can be a business.
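The contrast between a few plays and many plays can be sketched in code. The game here is an assumption for illustration, not one named in the text: a $1 bet on red in American roulette, where 18 of 38 pockets are red, so the gambler's mean winnings are (18 − 20)/38, about −5.3 cents per play.

```python
import random
import statistics


def mean_winnings(n_plays, rng):
    """Gambler's average winnings per $1 bet on red over n_plays spins."""
    # 18 of the 38 pockets are red: win $1 with probability 18/38, else lose $1.
    return statistics.fmean(
        1 if rng.randrange(38) < 18 else -1 for _ in range(n_plays)
    )


rng = random.Random(3)
MU = (18 - 20) / 38  # mean of the distribution of winnings per play

few = [mean_winnings(20, rng) for _ in range(5)]  # five gamblers, 20 plays each
many = mean_winnings(200_000, rng)                # the house's long run

print("five gamblers, 20 plays each:", [f"{m:+.2f}" for m in few])
print(f"200,000 plays: {many:+.4f} per play (mu = {MU:+.4f})")
```

The short sessions scatter widely around μ, while the average over many plays settles near it, which is the regularity the house relies on.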
15.5 Insurance. The idea of insurance is that we all face risks that are unlikely but carry high cost. Think of a
fire or flood destroying your apartment. Insurance spreads the risk: we all pay a small amount, and the insurance
policy pays a large amount to those few of us whose apartments are damaged. An insurance company looks at
the records for millions of apartment owners and sees that the mean loss from apartment damage in a year is μ =
$125 per person. (Most of us have no loss, but a few lose most of their possessions. The $125 is the average
loss.) The company plans to sell renters insurance for $125 plus enough to cover its costs and profit. Explain
clearly why it would be unwise to sell only 12 policies. Then explain why selling thousands of such policies is a
safe business.
The law of large numbers assures us that if we measure enough subjects, the statistic x̄ will
eventually get very close to the unknown parameter μ.
But the odor threshold study in Example 15.2 had just 10 subjects. What can we say about
estimating μ by x̄ from a sample of 10 subjects?
Put this one sample in the context of all such samples by asking, “What would happen if we
took many samples of 10 subjects from this population?” Here’s how to answer this question:
• Take a large number of samples of size 10 from the population.
• Calculate the sample mean x̄ for each sample.
• Make a histogram of the values of x̄.
• Examine the shape, center, and variability of the distribution displayed in the histogram.
In practice it is too expensive to take many samples from a large population such as all adult
U.S. residents. But we can imitate many samples by using software. Using software to imitate
chance behavior is called simulation.
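The four steps above can be sketched as a short simulation. This is a minimal sketch, not the software behind the chapter's figure: it uses μ = 25 and σ = 7, the values quoted for this example, and assumes the population is Normal. The function names and the text histogram are illustrative choices.

```python
import math
import random
import statistics


def simulate_sample_means(mu, sigma, sample_size, n_samples, seed=0):
    """Simulate n_samples SRSs and return the list of their sample means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(sample_size))
        for _ in range(n_samples)
    ]


def print_histogram(values, bin_width=1.0, per_star=5):
    """Print a rough text histogram: one '*' per `per_star` values in a bin."""
    counts = {}
    for v in values:
        left = math.floor(v / bin_width) * bin_width
        counts[left] = counts.get(left, 0) + 1
    for left in sorted(counts):
        print(f"{left:5.1f} | {'*' * (counts[left] // per_star)}")


# Assumed population: Normal with mu = 25 and sigma = 7.
x_bars = simulate_sample_means(mu=25, sigma=7, sample_size=10, n_samples=1000)

print_histogram(x_bars)                                   # shape
print(f"center:      {statistics.fmean(x_bars):.2f}")     # mean of the x_bar values
print(f"variability: {statistics.stdev(x_bars):.3f}")     # sd of the x_bar values
```

A different seed gives slightly different numbers, just as a different set of 1000 samples would.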
FIGURE 15.2
The idea of a sampling distribution: take many samples from the same population, collect the x̄’s from all
the samples, and display the distribution of the x̄’s. The histogram shows the results of 1000 samples.
We can use the tools of data analysis to describe any
distribution. Let’s apply those tools to Figure 15.2. What can we
say about the shape, center, and variability of this distribution?
• Shape: It looks Normal! Detailed examination confirms that the
distribution of x̄ from many samples is very close to Normal.
• Center: The mean of the 1000 x̄’s is 24.95. That is, the
distribution is centered very close to the population mean μ = 25.
• Variability: The standard deviation of the 1000 x̄’s is
2.217, notably smaller than the standard deviation σ = 7 of the
population of individual subjects.
Although these results describe just one simulation of a sampling
distribution, they reflect facts that are true whenever we
use random sampling.
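As a rough check on the simulated variability, assume the standard result that the sampling distribution of x̄ has standard deviation σ/√n. With σ = 7 and n = 10 this gives:

```latex
\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{7}{\sqrt{10}} \approx 2.214
```

This is close to the value 2.217 observed in the 1000 simulated samples.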
15.6 Sampling Distribution versus Population Distribution. The 2012 American Time Use Survey contains
data on how many minutes of sleep per night each of 12,443 survey participants estimated they get. The times
follow the Normal distribution with mean 528.8 minutes and standard deviation 137.2 minutes. An SRS of 100 of
the participants has a mean time of x̄ = 509.23 minutes. A second SRS of size 100 has mean x̄ = 530.32
minutes. After many SRSs, the many values of the sample mean follow the Normal distribution with mean 528.8
minutes and standard deviation 13.72 minutes.
(a) What is the population? What values does the population distribution describe? What is this distribution?
(b) What values does the sampling distribution of x̄ describe? What is the sampling distribution?
Figure 15.2 suggests that when we choose many SRSs from a population, the sampling distribution of
the sample means is centered at the mean of the original population and is less variable (spread out) than
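the distribution of individual observations. Here are the facts: if x̄ is the mean of an SRS of size n drawn from a
large population with mean μ and standard deviation σ, then the sampling distribution of x̄ has mean μ and
standard deviation σ/√n.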
By “large population” we mean that the size of the population is much larger than the size of the sample—say, at
least 20 times as large.
These facts about the mean and the standard deviation of the sampling distribution of x̄ are true
for any population, not just for some special class such as Normal distributions.
They have important implications for statistical inference:
The upshot of all this is that we can trust the sample mean from a large
random sample to estimate the population mean accurately.
If the sample size n is large, the standard deviation of x̄ is small, and almost all
samples will give values of x̄ that lie very close to the true parameter μ.
However, the standard deviation of the sampling distribution gets smaller
only at the rate √n. To cut the standard deviation of x̄ in half, we must take four
times as many observations, not just twice as many.
So very precise estimates (estimates with very small standard deviation) may
be expensive.
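The cost of precision can be worked out directly from the σ/√n relationship. The sketch below uses σ = 7 from the chapter's simulation; the helper names and the target values 1.0 and 0.5 are illustrative assumptions.

```python
import math


def sd_of_sample_mean(sigma, n):
    """Standard deviation of x_bar for an SRS of size n."""
    return sigma / math.sqrt(n)


def sample_size_for_sd(sigma, target_sd):
    """Smallest n for which sigma / sqrt(n) is at most target_sd."""
    return math.ceil((sigma / target_sd) ** 2)


SIGMA = 7  # population standard deviation used in the chapter's simulation

# Each fourfold increase in n cuts the standard deviation of x_bar in half.
for n in (10, 40, 160):
    print(f"n = {n:>3}: sd of x_bar = {sd_of_sample_mean(SIGMA, n):.3f}")

# Halving the target standard deviation quadruples the required sample size.
print(sample_size_for_sd(SIGMA, 1.0))  # 49
print(sample_size_for_sd(SIGMA, 0.5))  # 196
```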
We have described the center and variability of the sampling distribution of a sample mean x̄,
but not its shape. The shape of the sampling distribution depends on the shape of the
population distribution.
In one important case there is a simple relationship between the two distributions: if the
population distribution is Normal, then so is the sampling distribution of the sample mean.
15.8 A Sample of Young Men. A government sample survey plans to measure the LDL (bad) cholesterol level of an
SRS of men aged 20 to 34. The researchers will report the mean x̄ from their sample as an estimate of the mean LDL
cholesterol level μ in this population.
(a) Explain to someone who knows no statistics what it means to say that x̄ is an “unbiased” estimator of μ.
(b) The sample result x̄ is an unbiased estimator of the population truth μ no matter what size SRS the study uses. Explain
to someone who knows no statistics why a large sample gives more trustworthy results than a small sample.
Draw an SRS of size n from any population with mean μ and finite standard deviation σ. The central limit
theorem says that when n is large, the sampling distribution of the sample mean x̄ is approximately Normal:
x̄ is approximately N(μ, σ/√n).
The central limit theorem allows us to use Normal probability calculations to answer questions about sample
means from many observations even when the population distribution is not Normal.
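One way to watch the central limit theorem at work is to simulate sample means from a strongly skewed population. The sketch below assumes an exponential population with mean 1 and standard deviation 1, an illustrative choice, and uses sample skewness (0 for a symmetric shape) as a rough measure of distance from Normal. The helper names are hypothetical.

```python
import random
import statistics


def skewness(values):
    """Sample skewness: average cubed z-score (0 for a symmetric shape)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)


def sample_means(n, n_samples, rng):
    """Means of n_samples SRSs of size n from an exponential population."""
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(n_samples)
    ]


rng = random.Random(4)
results = {}  # n -> (mean, sd, skewness) of the simulated x_bar values
for n in (1, 2, 10, 25):
    x_bars = sample_means(n, 5000, rng)
    results[n] = (
        statistics.fmean(x_bars),
        statistics.stdev(x_bars),
        skewness(x_bars),
    )
    mean, sd, skew = results[n]
    print(f"n = {n:>2}: mean = {mean:.3f}, sd = {sd:.3f}, skewness = {skew:.2f}")
```

As n grows, the mean of the x̄ values stays near 1, their standard deviation shrinks roughly like 1/√n, and the skewness moves toward 0.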
How large a sample size n is needed for x̄ to be close to Normal depends on the
population distribution. More observations are required if the shape of the population
distribution is far from Normal. Here are two examples in which the population is far
from Normal.