Sampling distribution of


Chapter 5
Sampling Distributions
Chapter 5.1
Sampling Distributions of
sample mean X-bar
Review Chap3: Population versus sample
• Population: The entire group of individuals in which we are interested but usually can’t assess directly.
• Sample: The part of the population we actually examine and for which we do have data.
• A parameter is a number describing a characteristic of the population.
• A statistic is a number describing a characteristic of a sample.
Objectives (chapter 5.1)
Sampling distribution of a sample mean

Sampling distribution of sample mean (x-bar)

For normally distributed populations

The central limit theorem
Question: In high school physics and chemistry experiments, why do we repeat an
experiment multiple times and then take the average as the final result? Repetition
seems only to waste time, energy, and materials. Is that correct?
Simple random sample (SRS)
Data are summarized by statistics
(mean, standard deviation, median,
quartiles, correlation, etc.)
Concerns:
1) Is the sample mean related to the population mean?
2) If yes, what is the relationship? That is, how far from or how close to the
population mean is a sample mean?
Sampling Distribution of sample mean of 10 random digits
(1) Select 10 random digits from Table B, and then take the sample mean;
(2) Repeat this process 4 times for each student from Dr. Chen’s class.
More details with illustration:
1. Using Table B (the random digit table), randomly select a line, for example line 106 in
this case.
2. Take the sample average of the random digits (6, 8, 4, 1, 7, 3, 5, 0, 1, 3). The
sample mean is
sample mean #1 = (6+8+4+1+7+3+5+0+1+3)/10 = 3.8;
Now move on to another set of 10 random digits, (1, 5, 5, 2, 9, 7, 2, 7, 6, 5).
The sample mean is
sample mean #2 = (1+5+5+2+9+7+2+7+6+5)/10 = 4.9;
Repeat this procedure until you have sample mean #4.
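The in-class procedure can also be sketched in code. This is a minimal illustration, assuming Python's pseudo-random generator stands in for Table B; the function names `sample_mean` and `simulate_sample_means` are hypothetical and not part of the course material.

```python
import random

def sample_mean(digits):
    """Return the average of a list of digits."""
    return sum(digits) / len(digits)

def simulate_sample_means(num_samples, sample_size=10, seed=None):
    """Draw `num_samples` samples of random digits 0-9 and return their means.

    Assumption: random.Random replaces the printed random digit table.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(num_samples):
        digits = [rng.randint(0, 9) for _ in range(sample_size)]
        means.append(sample_mean(digits))
    return means

# The two samples worked out above (line 106 of Table B).
mean_1 = sample_mean([6, 8, 4, 1, 7, 3, 5, 0, 1, 3])
mean_2 = sample_mean([1, 5, 5, 2, 9, 7, 2, 7, 6, 5])
print(mean_1, mean_2)

# One student's four sample means, simulated.
print(simulate_sample_means(4, seed=1))
```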
Sampling Distribution of sample mean of 10 random digits
(2)2.2 2.2
(7)2.9 2.9 3.0 3.0 3.0 3.0 3.0
(8)3.1 3.1 3.2 3.2 3.3 3.4 3.4 3.5
(12)3.6 3.6 3.7 3.8 3.8 3.8 3.8 3.8 3.8 3.8 3.9 4.0
(16)4.1 4.1 4.2 4.2 4.2 4.3 4.3 4.3 4.3 4.4 4.4 4.4 4.5 4.5 4.5 4.5
(10)4.6 4.7 4.7 4.7 4.8 4.9 4.9 4.9 4.9 4.9
(8)5.2 5.2 5.3 5.3 5.3 5.3 5.5 5.5
(4)5.9 6.0 6.0 6.0
(4)6.2 6.2 6.3 6.4
(1)6.8
Q: Draw a histogram with classes as:
Class        Counts
(2, 2.5]
(2.5, 3]
(3, 3.5]
(3.5, 4]
(4, 4.5]
(4.5, 5]
(5, 5.5]
(5.5, 6]
(6, 6.5]
(6.5, 7]
Sampling Distribution of sample mean of 10 random digits
Class        Counts
(2, 2.5]     2
(2.5, 3]     7
(3, 3.5]     8
(3.5, 4]     12
(4, 4.5]     16
(4.5, 5]     10
(5, 5.5]     8
(5.5, 6]     4
(6, 6.5]     4
(6.5, 7]     1
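The class counts can be checked by binning the 72 sample means listed earlier into right-closed classes (a, b]. The sketch below is one way to do this; the function name `class_counts` is hypothetical, and the binning works in integer tenths so that boundary values such as 3.0 fall in the intended class.

```python
# The 72 sample means collected in class, copied from the listing above.
sample_means = (
    [2.2] * 2
    + [2.9] * 2 + [3.0] * 5
    + [3.1, 3.1, 3.2, 3.2, 3.3, 3.4, 3.4, 3.5]
    + [3.6, 3.6, 3.7] + [3.8] * 7 + [3.9, 4.0]
    + [4.1] * 2 + [4.2] * 3 + [4.3] * 4 + [4.4] * 3 + [4.5] * 4
    + [4.6] + [4.7] * 3 + [4.8] + [4.9] * 5
    + [5.2] * 2 + [5.3] * 4 + [5.5] * 2
    + [5.9] + [6.0] * 3
    + [6.2, 6.2, 6.3, 6.4]
    + [6.8]
)

def class_counts(values, start=2.0, width=0.5, num_classes=10):
    """Count values in classes (start, start+width], (start+width, start+2*width], ..."""
    start_t = round(start * 10)   # class boundaries in tenths
    width_t = round(width * 10)
    counts = [0] * num_classes
    for v in values:
        tenths = round(v * 10)
        # Ceiling division places a value equal to an upper boundary in the lower class.
        index = -(-(tenths - start_t) // width_t) - 1
        if 0 <= index < num_classes:
            counts[index] += 1
    return counts

counts = class_counts(sample_means)
print(counts)
```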
[Figure: histogram of some sample averages, with a smooth curve showing the sampling distribution of x-bar]
Q: Write a journal entry about how we obtained the sampling distribution of the sample
mean X-bar today, by answering the following questions:
1) How did each student obtain X-bar’s, starting from Table B?
2) How many X-bar’s did we have in total in the class?
3) How do we make a histogram for X-bar? What is the name of the histogram?
4) What did the smooth curve represent?
5) For the smooth curve, what did the horizontal axis and vertical axis represent?
Sampling Distribution
Select 10 random digits from Table B.
[Figure: repeated samples of 10 random digits drawn from the population; the sample means shown are 6 (1st sample), 4.5, and 4.6 (25th sample)]
There is some variability in values of a statistic over different samples.
Population Distribution for 10 random digits
Population distribution of 0-9 random digits
X      0     1     2     3     4     5     6     7     8     9
Prob   1/10  1/10  1/10  1/10  1/10  1/10  1/10  1/10  1/10  1/10
Sampling Distribution of sample mean of 10 random digits
(1) Select 10 random digits from Table B, and then take the sample mean;
(2) Repeat this process 25 times for each student (Spring 2012).
(3) Make a histogram of the sample means from the class, with 1098 X-bar’s. The
probability distribution looks like a Normal distribution.
[Figure: histogram of some sample averages, with a smooth curve showing the sampling distribution of x-bar]
The probability distribution of
a statistic is called its
sampling distribution.
For the histogram:
Center of X-bar = 4.541
SD of X-bar ≈ 0.9
Summary of X-bar:
Min.   1st Qu.  Median  Mean   3rd Qu.  Max.
1.400  3.600    4.400   4.451  5.400    7.800
Sampling Distribution of sample mean of 10 random digits
[Figure: the population of random digits 0-9 and the sampling distribution of X-bar]
Center of X-bar = 4.5
SD of X-bar ≈ 0.9
Mean and standard deviation of a sample mean
For any population with mean µ and standard deviation σ:
The mean, or center, of the sampling distribution of x-bar is equal to
the population mean µ:
mean of x-bar = µ
The standard deviation of the sampling distribution is σ/√n, where n
is the sample size:
SD of x-bar = σ/√n
Sample means are less variable than individual observations.
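As a check on these two formulas, the random-digit population from the class experiment can be worked out directly. The sketch below assumes the digits 0 to 9 are equally likely, as in the population table, and computes the population mean, the population SD, and σ/√n for samples of size 10; the variable names are illustrative.

```python
import math

digits = range(10)                      # population values 0..9, each with probability 1/10
mu = sum(digits) / 10                   # population mean
variance = sum((x - mu) ** 2 for x in digits) / 10
sigma = math.sqrt(variance)             # population SD

n = 10                                  # sample size used in class
sd_xbar = sigma / math.sqrt(n)          # SD of the sampling distribution of x-bar

print(f"mu = {mu}, sigma = {sigma:.4f}, sigma/sqrt(n) = {sd_xbar:.4f}")
```

The result is close to the center (about 4.5) and SD (about 0.9) observed in the class histogram.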
For normally distributed populations
When a variable in a population is normally distributed, the sampling
distribution of x bar for all possible samples of size n is also normally
distributed.
If the population is N(µ, σ), then the sampling distribution of the sample mean is N(µ, σ/√n).
Review: Chapter 1.3, Normal distribution
An important property of a
density curve is that areas
under the curve correspond
to relative frequencies
Example: The National Collegiate Athletic Association (NCAA) requires Division I
athletes to score at least 820 on the combined math and verbal SAT exam to
compete in their first college year. The SAT scores of 2003 were approximately
normal with mean 1026 and standard deviation 209. What proportion of all
students would be NCAA qualifiers (SAT ≥ 820)?
x = 820; µ = 1026; σ = 209
z = (x − µ)/σ = (820 − 1026)/209 = −206/209 ≈ −0.99
Use a calculator to find:
normalcdf(-0.99, E99, 0, 1) ≈ 0.84
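For readers without a graphing calculator, the same area can be found with Python's standard library. This is a sketch that assumes `statistics.NormalDist` as a stand-in for the calculator's `normalcdf`; it does not round z first, so the result differs slightly from the rounded value in the fourth decimal place.

```python
from statistics import NormalDist

sat = NormalDist(mu=1026, sigma=209)   # SAT scores, approximately Normal

z = (820 - 1026) / 209                 # z-score of the NCAA cutoff
p_qualify = 1 - sat.cdf(820)           # area to the right of 820

print(f"z = {z:.2f}, P(SAT >= 820) = {p_qualify:.4f}")
```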
Sampling distribution of a sample mean = distribution of x-bar
Population distribution of X (n = 1)            Sampling distribution of x-bar (n > 1)
Exactly N(µ, σ)                                 Exactly N(µ, σ/√n)
Not exactly Normal, but with mean µ and SD σ    Approximately N(µ, σ/√n) (by the Central Limit Theorem)
Standardize (Z-score of x-bar): Z = (x-bar − µ) / (σ/√n)
Reverse: x-bar = µ + (σ/√n) * Z
Example: Soda Drink
Let X denote the actual volume of soda in a randomly
selected can.
Suppose X ~ N(12 oz, 0.4 oz), and 16 cans are to be selected.
a) The average volume is normally distributed with mean____
and standard deviation___.
b) Find the probability that the sample average is greater than
12.1 oz.
Answer:
Mean of x-bar = 12 oz;
SD of x-bar = 0.4/√16 = 0.1 oz;
P(x-bar > 12.1) = P(Z > 1) = 1 − 0.8413 = 0.1587.
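The soda answer can be reproduced numerically. The sketch below assumes `statistics.NormalDist` from the Python standard library; the variable names are illustrative.

```python
import math
from statistics import NormalDist

mu, sigma, n = 12.0, 0.4, 16

sd_xbar = sigma / math.sqrt(n)            # SD of the sample average
xbar = NormalDist(mu=mu, sigma=sd_xbar)   # exact, because the population is Normal

p_greater = 1 - xbar.cdf(12.1)            # P(x-bar > 12.1)
print(f"SD of x-bar = {sd_xbar:.2f}, P(x-bar > 12.1) = {p_greater:.4f}")
```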
If the population is N(µ, σ), then the sampling distribution of the sample mean is N(µ, σ/√n).
Examples

Diabetes during pregnancy. A patient is classified as having
gestational diabetes if the glucose level is above 140 mg/dl one
hour after a sugary drink. Patient Sheila’s glucose level follows a
Normal distribution with µ = 125 mg/dl, σ = 10 mg/dl.

(a) If a single glucose measurement is made, what is the probability
that Sheila is diagnosed as having gestational diabetes?

(b) If measurements are made instead on three separate days and
the mean result is compared with the criterion 140 mg/dl, what is the
probability that Sheila is diagnosed as having gestational diabetes?
(a) n = 1:
Let X be Sheila’s measured glucose level. P(X > 140)
= P(Z > 1.5) = 0.0668.
(b) n = 3:
If x-bar is the mean of three measurements, then x-bar has a
N(125, 10/√3), or N(125 mg/dl, 5.7735 mg/dl), distribution, and
P(x-bar > 140) = P(Z > 2.60) = 0.0047.
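Both parts of the glucose example use the same calculation with a different n. A small sketch, assuming `statistics.NormalDist` and a hypothetical helper `prob_mean_above`, makes the effect of averaging visible:

```python
import math
from statistics import NormalDist

def prob_mean_above(threshold, mu, sigma, n):
    """P(x-bar > threshold) when the population is N(mu, sigma) and n values are averaged."""
    sd_xbar = sigma / math.sqrt(n)
    return 1 - NormalDist(mu=mu, sigma=sd_xbar).cdf(threshold)

p_single = prob_mean_above(140, mu=125, sigma=10, n=1)   # part (a)
p_three = prob_mean_above(140, mu=125, sigma=10, n=3)    # part (b)

print(f"n=1: {p_single:.4f}   n=3: {p_three:.4f}")
```

Averaging three measurements shrinks the SD from 10 to about 5.77 mg/dl, which is why the chance of a false diagnosis drops so sharply.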
For Normally distributed populations
If the population is N(µ, σ), then the sampling distribution of the sample mean is N(µ, σ/√n).
Concern:
What will happen when the sample size gets bigger and bigger?
Review: Sampling Distribution of sample mean of 10 random digits
[Figure: the population of random digits 0-9 and the sampling distribution of X-bar]
Center of X-bar = 4.5
SD of X-bar ≈ 0.9
Central Limit Theorem (CLT)
[Figure: a population with a strongly skewed distribution (mean µ, SD σ), and the sampling distributions of x-bar for n = 2, n = 10, and n = 25 observations]
For Non-Normally distributed populations
CLT says that:
Even if the population is NOT Normal, but has mean µ and SD σ, when the
sample size is large enough, the sampling distribution of the sample mean
is approximately N(µ, σ/√n).
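The CLT statement can be explored with a small simulation. The sketch below is an illustration only: it assumes an exponential population with mean 1 and SD 1 as the strongly skewed population (the slides do not say which population the figure uses), and `simulate_xbar` is a hypothetical helper name.

```python
import math
import random
import statistics

def simulate_xbar(n, reps=10000, seed=0):
    """Return `reps` sample means, each from n draws of an exponential population.

    Assumption: the exponential population (mean 1, SD 1) plays the role of the
    strongly skewed population.
    """
    rng = random.Random(seed)
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]

for n in (2, 10, 25):
    means = simulate_xbar(n)
    center = statistics.fmean(means)
    spread = statistics.stdev(means)
    # Compare the simulated spread with sigma / sqrt(n), where sigma = 1.
    print(f"n={n:2d}  center={center:.3f}  SD={spread:.3f}  sigma/sqrt(n)={1 / math.sqrt(n):.3f}")
```

Plotting a histogram of `means` for each n would show the skewness fading as n grows, while the center stays near µ and the spread tracks σ/√n.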
IQ scores: population vs. sample
In a large population of adults, the mean IQ is 112 with standard
deviation 20. Suppose 200 adults are randomly selected for a market
research campaign.
The distribution of the sample mean IQ is:
A) Exactly normal, mean 112, standard deviation 20
B) Approximately normal, mean 112, standard deviation 20
C) Approximately normal, mean 112, standard deviation 1.414
D) Approximately normal, mean 112, standard deviation 0.1
Answer: C) Approximately normal, mean 112, standard deviation 1.414
Population: mean 112, SD 20
Sampling distribution for n = 200 is approximately N(112, 1.414), since 20/√200 ≈ 1.414
Examples

Songs on an iPod. An iPod has about 10,000 songs. The
distribution of the play time for these songs is highly skewed.
Assume that the standard deviation for the population is 280
seconds.

(a) What is the standard deviation of the average time when
you take an SRS of 10 songs from this population?

(b) How many songs would you need to sample if you wanted
the standard deviation of x-bar to be 15 seconds?
(a)
The standard deviation is
σ/√10 = 280/√10 ≈ 88.5438
seconds.
(b)
In order to have σ/√n = 280/√n = 15 seconds,
we need √n = 280/15 ≈ 18.667,
so n ≈ (18.667)^2 ≈ 348.4; round up and use n = 349.
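The two iPod answers follow directly from σ/√n. A short sketch of the arithmetic (variable names are illustrative):

```python
import math

sigma = 280                                # population SD in seconds

# (a) SD of the average play time for an SRS of 10 songs
sd_xbar_10 = sigma / math.sqrt(10)

# (b) Smallest n with sigma / sqrt(n) <= 15, found by solving for n and rounding up
target_sd = 15
n_needed = math.ceil((sigma / target_sd) ** 2)

print(f"(a) {sd_xbar_10:.4f} seconds   (b) n = {n_needed}")
```

Rounding up matters here: with n = 348 the SD of x-bar would still be slightly above 15 seconds.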
Example: children’s attitudes toward reading

In the journal Knowledge Quest (Jan/Feb 2002), education
professors at the University of Southern California investigated
children’s attitudes toward reading. One study measured third
through sixth graders’ attitudes toward recreational reading on a
140-point scale. The mean score for this population of children
was 106 with a standard deviation of 16.4.

In a random sample of 36 children from this population,

a) What is the sampling distribution of x-bar?

b) Find P(x-bar < 100).
Answer to Example 4
a) x-bar is approximately N(µ, σ/√n) = N(106, 16.4/√36) = N(106, 2.7333).
b) Standardize (Z-score of x-bar): Z = (x-bar − µ) / (σ/√n) = (100 − 106) / 2.7333 ≈ −2.20
Probability = normalcdf(-E99, -2.20, 0, 1) = 0.0139
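The same answer can be checked in Python. This sketch assumes `statistics.NormalDist` in place of `normalcdf`; because z is not rounded to −2.20 first, the probability differs from 0.0139 only in the fourth decimal place.

```python
import math
from statistics import NormalDist

mu, sigma, n = 106, 16.4, 36

sd_xbar = sigma / math.sqrt(n)                 # 16.4 / 6
z = (100 - mu) / sd_xbar                       # Z-score of x-bar = 100
p_below_100 = NormalDist(mu=mu, sigma=sd_xbar).cdf(100)

print(f"SD of x-bar = {sd_xbar:.4f}, z = {z:.2f}, P(x-bar < 100) = {p_below_100:.4f}")
```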
More Exercises on Chapter 5.1:
1. You are told that the weight of a newborn baby follows a Normal distribution
with mean 7 pounds and SD 0.5 pounds. The average weight of the next 16
newborns in your local hospital is around ______, with SD _____.
What is the probability that the average is between 7.2 and 7.5 pounds?
2. The carbon monoxide in a certain brand of cigarette (in milligrams) follows
a Normal distribution with mean 12 and SD 1.8. For 40 randomly selected
cigarettes,
a) What is the sampling distribution of the sample mean?
b) Find the probability that the average carbon monoxide is between 10 and 13.
3. The amount of time that a drive-through bank teller spends on a customer
follows a Normal distribution with mean 4 minutes and SD 1.5 minutes. For the
next 50 customers, find the probability that the average time spent is more than 5
minutes.
4. The rate of water usage per hour (in thousands of gallons) by a community
follows a Normal distribution with mean 5 and SD 2. For the next 30 hours,
a) What is the sampling distribution of the sample mean?
b) Find the probability that the average rate of usage per hour is less than 4.
Answers:
1. new SD = 0.125, Z(7.2) = 1.6, Z(7.5) = 4, area = 1 − 0.9452 = 0.0548.
2. new SD = 0.285, Z(10) = −7.02, Z(13) = 3.5, area is almost 100%.
3. new SD = 0.212, Z(5) = 4.72, area is almost zero.
4. new SD = 0.365, Z(4) = −2.74, area = 1 − 0.9969 = 0.0031.
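The four answers can be verified with a short script. This is a sketch, assuming `statistics.NormalDist`; the helper names `xbar_sd` and `prob_between` are hypothetical. Small differences from the listed Z-scores come from rounding the SD before dividing.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard Normal distribution

def xbar_sd(sigma, n):
    """SD of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

def prob_between(low, high, mu, sigma, n):
    """P(low < x-bar < high) for a Normal population."""
    sd = xbar_sd(sigma, n)
    return Z.cdf((high - mu) / sd) - Z.cdf((low - mu) / sd)

ex1 = prob_between(7.2, 7.5, mu=7, sigma=0.5, n=16)   # newborn weights
ex2 = prob_between(10, 13, mu=12, sigma=1.8, n=40)    # carbon monoxide
ex3 = 1 - Z.cdf((5 - 4) / xbar_sd(1.5, 50))           # teller time above 5 minutes
ex4 = Z.cdf((4 - 5) / xbar_sd(2, 30))                 # water usage below 4

for label, value in (("1", ex1), ("2", ex2), ("3", ex3), ("4", ex4)):
    print(f"Exercise {label}: {value:.4f}")
```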
EX: 5.7, 5.8, 5.18(a-c), 5.24, 5.21,5.12
Chapter 5.2
Sampling Distributions of
sample proportion p-hat
Review: Sampling proportion p-hat
Sample proportion (p-hat, or relative frequency):
p̂ = (count in the sample) / (total)
Population proportion:
p
Reminder from Chapter 3: Sampling variability
Each time we take a random sample from a population, we are likely to
get a different set of individuals and calculate a different statistic. This
is called sampling variability.
If we take a lot of random samples of the same size from a given
population, the variation from sample to sample—the sampling
distribution—will follow a predictable pattern.
Sampling Distribution of sample proportion of 10 random digits
(1) Select 10 random digits from Table B, and then take the sample proportion of
EVEN numbers;
(2) Repeat this process 4 times for each student from Dr. Chen’s class.
More details with illustration:
1. Based on Table B (random digit table), we randomly select a line, for example line 106 in
this case:
2. Take the sample proportion of EVEN numbers among the random digits (6, 8, 4, 1, 7, 3, 5,
0, 1, 3). Four of the ten digits are even, which gives
sample proportion #1 = 4/10 = 0.4;
Now move on to another set of 10 random digits, (1, 5, 5, 2, 9, 7, 2, 7, 6,
5). Three of the ten digits are even, which gives
sample proportion #2 = 3/10 = 0.3;
Repeat this procedure until you have sample proportion #4.
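The proportion version of the experiment can be sketched the same way as the mean version. The code below assumes Python's pseudo-random generator in place of Table B, counts 0 as an even digit (as the worked example does), and uses hypothetical function names.

```python
import random
import statistics

def proportion_even(digits):
    """Sample proportion of even digits (0 counts as even)."""
    return sum(1 for d in digits if d % 2 == 0) / len(digits)

def simulate_p_hats(num_samples, sample_size=10, seed=None):
    """Return `num_samples` values of p-hat, each from `sample_size` random digits."""
    rng = random.Random(seed)
    return [
        proportion_even([rng.randint(0, 9) for _ in range(sample_size)])
        for _ in range(num_samples)
    ]

# The two samples worked out above.
p_hat_1 = proportion_even([6, 8, 4, 1, 7, 3, 5, 0, 1, 3])
p_hat_2 = proportion_even([1, 5, 5, 2, 9, 7, 2, 7, 6, 5])
print(p_hat_1, p_hat_2)

# Many repetitions: center and spread of the simulated p-hat values.
p_hats = simulate_p_hats(10000, seed=0)
print(f"center = {statistics.fmean(p_hats):.4f}, SD = {statistics.stdev(p_hats):.4f}")
```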
Sampling Distribution of sample proportion of even # of 10 random digits
(1)0.1
(2)0.2 0.2
(5)0.3 0.3 0.3 0.3 0.3
(21)0.4 0.4 0.4 0.4 0.4 0.4 0.4 0.4 0.4 0.4 0.4 0.4 0.4 0.4 0.4 0.4 0.4
0.4 0.4 0.4 0.4
(17)0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5
(17)0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6
(7)0.7 0.7 0.7 0.7 0.7 0.7 0.7
(5)0.8 0.8 0.8 0.8 0.8
(1)0.9
Q: Draw a histogram with classes as: (for line 101-120 in Table B)
Class        Counts
(0, 0.1]
(0.1, 0.2]
(0.2, 0.3]
(0.3, 0.4]
(0.4, 0.5]
(0.5, 0.6]
(0.6, 0.7]
(0.7, 0.8]
(0.8, 0.9]
Sampling Distribution of sample proportion of even # of 10 random digits
Class        Counts
(0, 0.1]     1
(0.1, 0.2]   2
(0.2, 0.3]   5
(0.3, 0.4]   21
(0.4, 0.5]   17
(0.5, 0.6]   17
(0.6, 0.7]   7
(0.7, 0.8]   5
(0.8, 0.9]   1
Q: Write a journal entry about how we obtained the sampling distribution of the
sample proportion p-hat today, by answering the following questions:
1) How did each student obtain p-hat’s from Table B?
2) How many p-hat’s did we have in total in the class?
3) How do we make a histogram for p-hat? What is the name of the histogram?
4) What did the smooth curve represent?
5) For the smooth curve, what did the horizontal axis and vertical axis represent?
[Figure: histogram of some sample proportions, with a smooth curve showing the sampling distribution of p-hat]
Sampling Distribution
Select 10 random digits from Table B and find the sample proportion of even #.
[Figure: repeated samples of 10 random digits (1st sample, 2nd sample, ..., 25th sample) drawn from the population, each giving a sample proportion]
There is some variability in values of a statistic over different samples.
Sampling Distribution of sample proportion of even # of 10 random digits
(1) Select 10 random digits from Table B, and then take the sample proportion of even #.
(2) Repeat this process a lot of times, say 10,000 times.
(3) Make a histogram of these 10,000 sample proportions. The probability distribution looks like a
Normal distribution.
[Figure: histogram of some sample proportions, with a smooth curve showing the sampling distribution of p-hat]
The probability distribution of a statistic is called its sampling distribution.
Center of p-hat = 0.5018
SD of p-hat = 0.1598
Note: n = 10.
SD of p-hat = √(p(1 − p)/n)
Sampling distribution of the sample proportion
The sampling distribution of p̂ is never exactly normal. But as the sample size
increases, the sampling distribution of p̂ becomes approximately normal.
The normal approximation is most accurate for any fixed n when p is close to
0.5, and least accurate when p is near 0 or near 1.
Sampling Distribution of p̂

If data are obtained from an SRS and np > 10 and n(1 − p) > 10, then
the sampling distribution of p̂ has the following form:

For the sample percentage:
p̂ is approximately normal with mean p and standard deviation √(p(1 − p)/n).
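The standard deviation formula and the rule-of-thumb condition are easy to wrap in two small helpers. This is a sketch with hypothetical names (`phat_sd`, `normal_approx_ok`), not part of the course material.

```python
import math

def phat_sd(p, n):
    """SD of the sample proportion: sqrt(p(1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)

def normal_approx_ok(p, n):
    """Rule of thumb from the text: np > 10 and n(1 - p) > 10."""
    return n * p > 10 and n * (1 - p) > 10

# The even-digit experiment: p = 0.5, n = 10.
print(f"{phat_sd(0.5, 10):.4f}", normal_approx_ok(0.5, 10))

# A larger sample, p = 0.53 and n = 400.
print(f"{phat_sd(0.53, 400):.5f}", normal_approx_ok(0.53, 400))
```

For the even-digit experiment the formula gives about 0.158, close to the simulated SD of 0.1598. Note that np = 5 there, so the rule of thumb is not met for n = 10.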
Sampling distribution of a sample proportion = distribution of p̂
p̂ is approximately N(p, √(p(1 − p)/n))
Standardize (Z-score of p̂): Z = (p̂ − p) / √(p(1 − p)/n)
Reverse: p̂ = p + √(p(1 − p)/n) * Z
Example 1
Maureen Webster, who is running for mayor in a large city, claims that she is
favored by 53% of all eligible voters of that city. Assume that this claim is
true. A random sample of 400 registered voters is taken from this city.
Find the population proportion p = _________.
a) What is the sampling distribution of p-hat?
b) What is the probability of getting a sample proportion of less than 49% who
favor Maureen Webster?
c) Find the probability of getting a sample proportion between 50% and
55%.
(a) p-hat is approximately N(0.53, √(0.53 × 0.47/400)) = N(0.53, 0.02495)
(b) Z = (0.49 − 0.53)/0.02495 = −1.60
Pr(Z < −1.60)
= normalcdf(-E99, -1.6, 0, 1)
= 0.0548
(c) Z = (0.50 − 0.53)/0.02495 = −1.20;
Z = (0.55 − 0.53)/0.02495 = 0.80;
Pr(−1.20 < Z < 0.80)
= normalcdf(-1.20, 0.80, 0, 1)
= 0.673
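A numerical check of Example 1, assuming `statistics.NormalDist` in place of `normalcdf`. The Z-scores are not rounded here, so the probabilities agree with the answers above only to about three decimal places.

```python
import math
from statistics import NormalDist

p, n = 0.53, 400

sd_phat = math.sqrt(p * (1 - p) / n)          # SD of p-hat
phat = NormalDist(mu=p, sigma=sd_phat)        # Normal approximation for p-hat

p_below_49 = phat.cdf(0.49)                   # part (b)
p_between = phat.cdf(0.55) - phat.cdf(0.50)   # part (c)

print(f"SD = {sd_phat:.5f}, (b) {p_below_49:.4f}, (c) {p_between:.4f}")
```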
Example 2

The Gallup Organization surveyed 1,252 debit cardholders in the
U.S. and found that 180 had used the debit card to purchase a
product or service on the Internet (Card Fax, November 12, 1999).
Suppose the true percent of debit cardholders in the U.S. that have
used their debit cards to purchase a product or service on the
Internet is 15%.


Calculate p-hat (the sample proportion).
The sample proportion (p-hat) is approximately normal with mean
= ______ and standard deviation = ______.

Find the probability of getting a sample proportion smaller than
14.4%.
ANS: Z = (0.144 − 0.15)/0.01 = −0.6
Pr(Z < −0.6) = normalcdf(-E99, -0.6, 0, 1)
= 0.2743
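Example 2 can be checked the same way. In this sketch (again assuming `statistics.NormalDist`), the SD is kept unrounded at about 0.0101 instead of 0.01, so the probability comes out slightly above 0.2743.

```python
import math
from statistics import NormalDist

count, n = 180, 1252
p = 0.15                                   # assumed true proportion from the example

p_hat = count / n                          # observed sample proportion
sd_phat = math.sqrt(p * (1 - p) / n)       # SD of p-hat under p = 0.15
prob_below = NormalDist(mu=p, sigma=sd_phat).cdf(0.144)

print(f"p-hat = {p_hat:.4f}, SD = {sd_phat:.4f}, P(p-hat < 0.144) = {prob_below:.4f}")
```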
More Exercises on Chapter 5.2:
1. 30% of all autos undergoing an emissions inspection in a city fail
the inspection. Among 200 cars randomly selected in the city, the
percentage of cars that fail the inspection is around_____, with
SD______. Find the probability that the percentage is between 31% and
35%.
2. 60% of all residents in a big city are Democrats. Among 400 residents
randomly selected in the city,
a) What is the sampling distribution of p-hat?
b) Find Pr(sample percentage < 58%).
3. In airport luggage screening it is known that 3% of people have
questionable objects in their luggage. For the next 1600 people, use the
normal approximation to find the probability that at least 4% of the people
have questionable objects.
4. It is known that 60% of mice inoculated with a serum are protected
from a certain disease. If 80 mice are inoculated,
a) What is the sampling distribution of p-hat?
b) Find the probability that at least 70% are protected from the disease.
HWQ: 5.22, 5.23(a,b) 5.73
Summary to Chapter 5
1. If X ~ N(µ, σ) exactly, then
a) What is the mean of X-bar?
b) What is the SD of X-bar?
c) What is the sampling distribution of X-bar? (You need to specify:
What does the curve look like? What is the center/mean? What is the
spread/SD? Is it EXACT, or approximate by the Central Limit Theorem?)
2. If X is NOT normal, but has population mean µ and population SD σ, then
when the sample size is big enough,
a) What is the mean of X-bar?
b) What is the SD of X-bar?
c) What is the sampling distribution of X-bar? (You need to specify:
What does the curve look like? What is the center/mean? What is the
spread/SD? Is it EXACT, or approximate by the Central Limit Theorem?)
3. With population proportion p and sample size n,
a) What is the mean of p-hat?
b) What is the SD of p-hat?
c) What is the sampling distribution of p-hat? (You need to specify:
What does the curve look like? What is the center/mean? What is the
spread/SD? Is it EXACT, or approximate by the Central Limit Theorem?)
Summary to Chapter 5 (Popup Quiz)
1. If X~N(µ, σ) exactly, then
a) what is the mean of X-bar?
b) what is SD of X-bar?
c) what is the sampling distribution of X-bar? Is it EXACT, or
Approximate by Central Limit Theorem?
2. If X is NOT normal, but with population mean µ and population SD σ.
When sample size is big enough,
a) what is the mean of X-bar?
b) what is SD of X-bar?
c) what is the sampling distribution of X-bar? Is it EXACT, or
Approximate by Central Limit Theorem?
3. With population proportion p and sample size n,
a) what is the mean of p-hat?
b) what is SD of p-hat?
c) what is the sampling distribution of p-hat? Is it EXACT, or
Approximate by Central Limit Theorem?
Pop-Up Quiz:
Q: How did we get the sampling distribution of the sample mean X-bar in our in-class exercise?
1) How did each student obtain X-bar’s, starting from Table B?
2) How many X-bar’s did we have in total in the class?
3) What is the name of the histogram?
4) What is the name of the smooth curve?
5) For the smooth curve, what did the horizontal axis and vertical axis represent?
6) What is the math notation for the center of the sampling distribution of the sample mean X-bar?
7) What is the math notation for the spread of the sampling distribution of the sample mean X-bar?
8) If the population distribution is X ~ N(µ, σ), what sampling distribution will the sample
mean X-bar follow?