Mar. 22 Statistic for the day:
Percent of Americans 18 or older who
believe Martha Stewart’s sentence
should include jail time: 53%
Source: gallup.com
Assignment:
Read Chapter 18
Exercises p329: 1, 2, 5, 6, 8, 10
These slides were created by Tom Hettmansperger and in some cases
modified by David Hunter
Sample percentages: categorical variable
Do you believe Martha Stewart got a fair trial?
Do you believe Martha Stewart’s sentence should
include jail time?
Gallup Poll
              Yes    No    No opinion
Fair trial    66%    27%   7%
Jail time     53%    40%   7%
The Gallup Poll was based on 1005 telephone interviews.
Based on the sample of 1005 we estimate that 53% of
the population of millions believes that Martha Stewart’s
sentence should include jail time.
If we take a new sample of 1005 we will get a new
sample percentage. It will generally not be exactly 53%.
If we take lots of samples of 1005 we will get lots of
sample percentages.
Next we look at the histogram for the percentages.
Histogram of PERCENT, with Normal Curve
[Figure: histogram with normal curve. Horizontal axis: PERCENT, 49 to 59. Vertical axis: Frequency.]
200 percentages based on 200 samples of 1005 each.
Mean = 53% (or .53)
Standard deviation = (57% − 50%) / 4 = 1.75% (or .0175)
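The repeated-sampling idea above can be tried on a computer. This is only a sketch: it assumes the true population percentage is exactly 53% (in a real poll it is unknown), and the function name and seed are made up for illustration.

```python
import random
import statistics

def simulate_sample_proportions(true_p, sample_size, num_samples, seed=0):
    """Draw num_samples simple random samples and return each sample proportion."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    proportions = []
    for _ in range(num_samples):
        # Each respondent answers "yes" with probability true_p.
        yes_count = sum(1 for _ in range(sample_size) if rng.random() < true_p)
        proportions.append(yes_count / sample_size)
    return proportions

# Assumption: the true proportion is 0.53; 200 samples of 1005, as on the slide.
props = simulate_sample_proportions(true_p=0.53, sample_size=1005, num_samples=200)
print("mean of sample proportions:", round(statistics.mean(props), 3))
print("standard deviation:", round(statistics.stdev(props), 4))
```

The mean should land near .53 and the standard deviation in the neighborhood of the value read off the histogram.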
How do we measure and assess the uncertainty
in the sample percentage?
$$\text{margin of error} = \frac{1}{\sqrt{\text{sample size}}}$$
So in our example, if the sample size is 1005,
then the MARGIN OF ERROR is:
$$\frac{1}{\sqrt{\text{sample size}}} = \frac{1}{\sqrt{1005}} = \frac{1}{31.7} = .032$$
Or 3.2%
And we report 53% ± 3.2%
We defined the margin of error to be 2 standard deviations.
We estimated the standard deviation from the histogram to
be .0175. This nearly agrees since 2x.0175 = .035. Pretty close!
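The margin of error rule is easy to compute directly. A minimal sketch; the function name is made up, and the sample sizes 400 and 1600 are hypothetical values added only for comparison.

```python
import math

def margin_of_error(sample_size):
    """Margin of error for a sample percentage: 1 / sqrt(sample size)."""
    return 1 / math.sqrt(sample_size)

# 1005 is the Gallup sample size; 400 and 1600 are hypothetical comparisons.
for n in (400, 1005, 1600):
    print(n, round(margin_of_error(n), 3))
```

For 1005 this gives about .032, matching the hand calculation. Larger samples give smaller margins of error.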
Summary: Gallup Poll
We have a simple random sample from
the population of telephone owners.
The sample size used was 1005.
We find the percentage from our sample.
The MARGIN OF ERROR is 1 divided by
the square root of the sample size.
For 1005 the MARGIN OF ERROR is .032.
Hence we report: PERCENTAGE ± .032
The margin of error does not depend on the
population size, only on the sample size!
Goals
•
To refine the idea of standard deviation (for later
use in a refined margin of error).
•
We also want to relate this to the normal curve.
In the past we:
1. used a sample to get a sample proportion
2. used a formula to get the margin of error
3. reported the sample proportion ± the margin of error
Now we want a formula for the standard deviation.
Then we will use the new standard deviation formula
to calculate a new margin of error.
Formula for estimating the standard
deviation of a sample proportion
(don’t need histogram):
$$\sqrt{\frac{\text{sample proportion} \times (1 - \text{sample proportion})}{\text{sample size}}}$$
$$\sqrt{\frac{.53 \times (1 - .53)}{1005}} = .016$$
If we happen to know the true population proportion we use it
instead of the sample proportion.
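As a check on the arithmetic, the formula can be written as a small function. This is a sketch; the function name is an assumption, not something from the slides.

```python
import math

def proportion_std_dev(proportion, sample_size):
    """Estimated standard deviation of a sample proportion."""
    return math.sqrt(proportion * (1 - proportion) / sample_size)

# Gallup example: sample proportion .53, sample size 1005.
sd = proportion_std_dev(0.53, 1005)
print(round(sd, 3))
```

If the true population proportion is known, pass it in place of the sample proportion.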
Histogram with Normal Curve
1000 percentages each based on a sample of 1005
[Figure: histogram with normal curve. Horizontal axis: sample proportion, 0.466 to 0.594. Vertical axis: Frequency. Annotations: std dev = .016; 4 standard deviations.]
Summary:
1. We take a sample of 1005 phone interviews
2. We estimate the percent of the American public
that thinks that Martha Stewart should go to jail:
53%
3. To assess the uncertainty in the 53% sample
figure, we think of a normal curve of percentages
centered at .530 with standard deviation of
.016.
4. So the normal curve has 95% of its distribution
between .530 – 2x.016 and .530 + 2x.016.
Estimate: 53% (.53), with 50% to 56% (.50 to .56)
as the reasonable interval of values.
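The interval in step 4 can be checked against a normal curve. A sketch using Python's standard library; the variable names are illustrative, and .016 is the rounded standard deviation from the slides.

```python
from statistics import NormalDist

center = 0.53      # sample proportion from the poll
std_dev = 0.016    # standard deviation from the formula (rounded, as on the slide)

low = center - 2 * std_dev
high = center + 2 * std_dev

# Share of a normal curve that lies within 2 standard deviations of its center.
curve = NormalDist(mu=center, sigma=std_dev)
coverage = curve.cdf(high) - curve.cdf(low)

print(f"interval: {low:.3f} to {high:.3f}")
print(f"share of the curve inside the interval: {coverage:.3f}")
```

The share comes out slightly above 95%, which is why "2 standard deviations" works as a rule of thumb.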
What to expect from sample proportions
Facts: fingerprints may be influenced by prenatal hormones.
Most people have more ridges on right hand than left.
People who have more on the left hand are said to have
leftward asymmetry.
Women are more likely to have this trait than men.
The proportion of all men who have this trait is about 15%
In a study of 186 heterosexual and 66 homosexual men
26 (14%) heterosexual men showed the trait and
20 (30%) homosexual men showed the trait
(Reference: Hall, J. A. Y. and Kimura, D. "Dermatoglyphic
Asymmetry and Sexual Orientation in Men", Behavioral
Neuroscience, Vol. 108, No. 6, 1203-1206, Dec 94. )
Is it unusual to observe a sample of 66 men and observe
a sample proportion of 30%?
We now know what the distribution of sample proportions
based on a sample of 66 should look like. We will suppose
that the true proportion in the population of men is 15%.
$$\text{standard deviation} = \sqrt{\frac{.15 \times (1 - .15)}{66}} = .044$$
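Histogram of proportions, with Normal Curve
n = 66, true proportion = .15, standard deviation = .044
[Figure: histogram with normal curve. Horizontal axis: sample proportion, 0.0 to 0.3, with 0.062 and 0.238 marked (2 std devs on either side of 0.15, 4 standard deviations in all) and the homosexual men's sample proportion marked at 0.3. Vertical axis: Frequency.]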
The sample proportion for homosexual men (30%) is too
large to come from the expected distribution of sample
proportions.
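One way to put a number on "too large" is to count how many standard deviations the observed proportion sits above .15. A sketch, assuming (as the slides do) that the true proportion is 15%; the variable names are illustrative.

```python
import math

true_p = 0.15          # proportion of all men with the trait (from the slides)
n = 66                 # number of homosexual men in the study
observed = 20 / n      # 20 of 66 showed the trait, about 30%

std_dev = math.sqrt(true_p * (1 - true_p) / n)
z = (observed - true_p) / std_dev   # distance from .15 in standard deviations

print(f"standard deviation: {std_dev:.3f}")
print(f"expected range (2 std devs): {true_p - 2 * std_dev:.3f} to {true_p + 2 * std_dev:.3f}")
print(f"observed proportion is {z:.1f} standard deviations above {true_p}")
```

The observed proportion is more than 3 standard deviations above .15, well outside the range where about 95% of sample proportions should fall.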
Sample means: measurement variables
Suppose we want to estimate the mean weight at PSU
Histogram of Weight, with Normal Curve
[Figure: histogram with normal curve. Horizontal axis: Weight, 100 to 300. Vertical axis: Frequency.]
Data from stat 100 survey. Sample size 237.
Mean value is 152.5 pounds.
Standard deviation is about (240 – 100)/4 = 35
What is the uncertainty in the mean?
We need a margin of error for the mean.
Suppose we take another sample of 237.
What will the mean be?
Will it be 152.5 again?
Probably not.
Consider what happens if we take 1000 samples
each of size 237 and compute 1000 means.
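The resampling experiment can be imitated on a computer. This is only a sketch under a loud assumption: individual weights are drawn from a normal curve with mean 152.5 and standard deviation 35, since the actual Stat 100 survey data are not available here. The seed and variable names are made up.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the sketch is reproducible

def one_sample_mean(sample_size=237):
    # Assumption: weights are roughly normal, mean 152.5, standard deviation 35.
    weights = [rng.gauss(152.5, 35) for _ in range(sample_size)]
    return sum(weights) / len(weights)

# 1000 samples of size 237, one mean per sample.
means = [one_sample_mean() for _ in range(1000)]

print("mean of the 1000 means:", round(statistics.mean(means), 1))
print("standard deviation of the 1000 means:", round(statistics.stdev(means), 2))
```

The means cluster tightly around 152.5, with far less spread than the individual weights.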
Histogram of 1000 means with normal curve, based on samples of size 237
[Figure: histogram with normal curve. Horizontal axis: Weight, 145 to 160. Vertical axis: Frequency.]
Standard deviation is about
(157 – 148)/4 = 9/4 = 2.25
Formula for estimating the standard deviation
of the sample mean (don’t need histogram)
Just like in the case of proportions, we would
like to have a simple formula to find the
standard deviation of the mean without having
to resample a lot of times.
Suppose we have the standard deviation of the
original sample. Then the standard deviation
of the sample mean is:
$$\frac{\text{standard deviation of the data}}{\sqrt{\text{sample size}}}$$
So in our example of weights:
The standard deviation of the sample is about 35.
Hence by our formula:
Standard deviation of the mean is 35 divided by
the square root of 237:
35/15.4 = 2.3
(Recall we estimated it to be 2.25)
So the margin of error of the sample mean is
2x2.3 = 4.6
Report 152.5 ± 4.6 or 147.9 to 157.1
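The same calculation in code, as a sketch with illustrative variable names. The unrounded margin comes out near 4.55 rather than 4.6, because the slide rounds 2.27 up to 2.3 before doubling.

```python
import math

sample_mean = 152.5   # mean weight in the Stat 100 sample
sample_sd = 35        # standard deviation of the data, read off the histogram
n = 237               # sample size

sd_of_mean = sample_sd / math.sqrt(n)   # standard deviation of the sample mean
margin = 2 * sd_of_mean                 # margin of error = 2 standard deviations

print(f"standard deviation of the mean: {sd_of_mean:.2f}")
print(f"margin of error: {margin:.2f}")
print(f"interval: {sample_mean - margin:.1f} to {sample_mean + margin:.1f}")
```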
Example: SAT scores
Suppose nationally we know that the SAT has a
mean of 425 points and a standard deviation of 120 points.
Draw by hand a picture of what you expect the distribution
of sample means based on samples of size 100 to look like.
Sample means have a normal distribution
mean 425
standard deviation 120/10 = 12
So draw a bell shaped curve, centered at 425, with 95%
of the bell between 425 – 24 = 401 and 425 + 24 = 449
Normal Curve of SAT means based on samples of 100
[Figure: normal curve, mean = 425, std dev = 12. Horizontal axis: sample mean, 390 to 460, with a span of 4 std devs marked. Vertical axis: Frequency.]
A sample of 100 SATs with a mean of 460 would be very
unusual. A sample of 100 with a mean of 440 would not be
unusual.
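The "unusual or not" judgment can be written as a rule: flag a sample mean that falls more than 2 standard deviations from 425. A sketch; the function name and the 2-standard-deviation cutoff as a hard rule are illustrative choices.

```python
import math

def is_unusual(sample_mean, pop_mean=425, pop_sd=120, n=100):
    """Flag a sample mean more than 2 standard deviations from the population mean."""
    sd_of_mean = pop_sd / math.sqrt(n)   # 120 / 10 = 12 for samples of 100
    return abs(sample_mean - pop_mean) > 2 * sd_of_mean

for m in (440, 460):
    print(m, "unusual" if is_unusual(m) else "not unusual")
```

A mean of 460 is nearly 3 standard deviations above 425, while 440 is only 1.25 standard deviations above it.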