LectureNotes(chapter7) - University of South Alabama

Sampling distributions
chapter 7 ST210
Nutan S. Mishra
Department of Mathematics and Statistics
University of South Alabama
Useful links
• http://oak.cats.ohiou.edu/~wallacd1/ssample.html
• http://garnet.acns.fsu.edu/~jnosari/05.PDF
• http://www.ruf.rice.edu/~lane/stat_sim/sampling_dist/
Sampling distribution
In chapter 2 we defined a population parameter as a function of all the population values.
Let the population consist of N observations. Then the population mean and population standard deviation are parameters:
$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$
$\sigma = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N}}$
For a given population, the parameters are fixed values.
Sampling distribution
On the other hand, if we draw a sample of size n from a population of size N, then a function of the sample values is called a statistic.
For example, the sample mean and sample standard deviation are sample statistics:
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
$s = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n-1}}$
Since we can draw a large number of samples from the population, the value of a sample statistic varies from sample to sample.
Sampling distribution
Since the value of a sample statistic varies from sample to sample, the statistic itself is a random variable and has a probability distribution.
For example, the sample mean x̄ is a random variable and it has a probability distribution.
Example: Start with a toy example
Let the population consist of 5 students who took a math quiz worth 5 points.
The names of the students and their scores are as follows:
Name of the student   A   B   C   D   E
Score                 2   3   4   4   5
For this population, mean µ = 3.6 and standard deviation σ = 1.02
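The two parameters can be checked with a short Python sketch. The dictionary name `scores` is only an illustrative choice. Note that `pstdev` divides by N, which matches the population formula for σ.

```python
from statistics import mean, pstdev

# Quiz scores of the five students A..E (this is the whole population).
scores = {"A": 2, "B": 3, "C": 4, "D": 4, "E": 5}
values = list(scores.values())

mu = mean(values)       # population mean
sigma = pstdev(values)  # population s.d. (divides by N, not N - 1)

print(f"mu = {mu}, sigma = {sigma:.2f}")
```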
Sampling distribution
Now we repeatedly draw samples of size three from the population of size 5. There are 10 possible samples, listed below.
The population parameters are µ = 3.6 and s.d. σ = 1.02
Sample no.   Sample   Sample values   x̄      s
1            A,B,C    2,3,4           3      1
2            A,B,D    2,3,4           3      1
3            A,B,E    2,3,5           3.33   1.53
4            A,C,D    2,4,4           3.33   1.15
5            A,C,E    2,4,5           3.67   1.53
6            A,D,E    2,4,5           3.67   1.53
7            B,C,D    3,4,4           3.67   .58
8            B,C,E    3,4,5           4      1
9            B,D,E    3,4,5           4      1
10           C,D,E    4,4,5           4.33   .58
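The list of samples can be generated by enumeration rather than by hand. The following is a minimal sketch that assumes sampling without replacement, as in the table. The variable names are illustrative, and `stdev` uses the n - 1 divisor of the sample standard deviation s.

```python
from itertools import combinations
from statistics import mean, stdev

scores = {"A": 2, "B": 3, "C": 4, "D": 4, "E": 5}

# All C(5, 3) = 10 samples of size 3, drawn without replacement.
samples = list(combinations(scores, 3))

rows = []
for names in samples:
    values = [scores[name] for name in names]
    rows.append((",".join(names), values, mean(values), stdev(values)))

for label, values, x_bar, s in rows:
    print(f"{label}  {values}  x_bar = {x_bar:.2f}  s = {s:.2f}")
```

Values are rounded to two decimals only when printed; the stored means and standard deviations keep full precision.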
Sampling distribution
X = score of a student in the math quiz
Population distribution
x      f    P(x)
2      1    .2
3      1    .2
4      2    .4
5      1    .2
Sampling distribution of sample mean
x̄      f    P(x̄)
3      2    .2
3.33   2    .2
3.67   3    .3
4      2    .2
4.33   1    .1
Thus we see that the sample mean x̄ is a new random variable and has a probability distribution.
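The frequencies and probabilities of the sample mean can be tallied directly from the 10 possible samples. This sketch uses exact fractions so that 10/3 and 11/3 are not blurred by rounding. The names `sample_means`, `freq` and `dist` are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

scores = [2, 3, 4, 4, 5]

# Exact sample mean of every one of the 10 samples of size 3.
sample_means = [Fraction(sum(c), 3) for c in combinations(scores, 3)]

freq = Counter(sample_means)
total = len(sample_means)
dist = {x_bar: Fraction(f, total) for x_bar, f in sorted(freq.items())}

for x_bar, p in dist.items():
    print(f"x_bar = {float(x_bar):.2f}  f = {freq[x_bar]}  P = {float(p):.1f}")
```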
Question: What is the mean of this random variable and what is its variance?
Exercise 7.8
Here are some guidelines for solving it:
1. X = teaching experience of a faculty member
2. Write the two columns x and P(x)
3. The total number of samples of size 4 from a population of size 5 is (5 choose 4) = 5
4. List all 5 samples and compute their sample means.
5. Compute the quantities in parts b and c.
Sampling distribution
Let N be the size of the population and n be the size of the sample.
If n/N > .05:
mean of sample mean: $\mu_{\bar{x}} = \mu$
and standard deviation of sample mean: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}}$
And if n/N ≤ .05:
mean of sample mean: $\mu_{\bar{x}} = \mu$
and standard deviation of sample mean: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
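The two cases can be wrapped in one small helper. This is only a sketch: the function name `sd_of_sample_mean` and the convention that `N=None` means "population treated as very large" are assumptions made for illustration. It is applied to the toy population of 5 quiz scores, where n/N = 3/5 is well above .05.

```python
from math import sqrt

def sd_of_sample_mean(sigma, n, N=None):
    """Standard deviation of the sample mean.

    Applies the finite population correction when n/N > .05. N=None is this
    sketch's own convention for a population treated as very large.
    """
    sd = sigma / sqrt(n)
    if N is not None and n / N > 0.05:
        sd *= sqrt((N - n) / (N - 1))
    return sd

sigma = sqrt(1.04)  # toy population 2, 3, 4, 4, 5 (sigma is about 1.02)
with_fpc = sd_of_sample_mean(sigma, n=3, N=5)  # n/N = .6 > .05
without_fpc = sd_of_sample_mean(sigma, n=3)    # correction not applied
print(f"with correction: {with_fpc:.4f}, without: {without_fpc:.4f}")
```

For this small population the correction matters: the corrected value is noticeably smaller than σ/√n.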
Sampling distribution of sample mean
Theorem
Let X be a random variable with population mean µ and population standard deviation σ. If we collect samples of size n, then the new random variable, the sample mean x̄, has mean µ and standard deviation σ/√n.
We can denote them as follows:
mean of x̄: $\mu_{\bar{x}} = \mu$
standard deviation of x̄: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
Sampling distribution of sample mean
mean of x̄: $\mu_{\bar{x}} = \mu$
standard deviation of x̄: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
The standard deviation of the sample mean decreases as the sample size increases.
The mean of the sample mean remains unaffected by a change in sample size.
The sample mean is called an estimator of the population mean, because whenever the population mean is unknown we use the sample mean in its place.
Exercise 7.13
X has a large population with µ = 60 and σ = 10
Assuming n/N ≤ .05, the parameters of the sample mean are
when n = 18
mean of x̄: $\mu_{\bar{x}} = \mu = 60$
standard deviation of x̄: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = 10/\sqrt{18} = 2.36$
when n = 90
mean of x̄: $\mu_{\bar{x}} = \mu = 60$
standard deviation of x̄: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = 10/\sqrt{90} = 1.05$
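The arithmetic of this exercise can be reproduced in a few lines. The dictionary name `results` is illustrative. The sketch also shows the standard deviation of the sample mean shrinking as n grows while its mean stays at 60.

```python
from math import sqrt

mu, sigma = 60, 10  # population parameters from Exercise 7.13

# n/N <= .05 is assumed, so no finite population correction is applied.
results = {n: sigma / sqrt(n) for n in (18, 90)}

for n, sd in results.items():
    print(f"n = {n}: mean of x_bar = {mu}, s.d. of x_bar = {sd:.2f}")
```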
Sampling distribution of sample mean
x̄      P(x̄)
3      .2
3.33   .2
3.67   .3
4      .2
4.33   .1
When we compute the mean and variance from the above table, they are (complete this with the help of chapter 5 slides)
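One way to check an answer is to apply the standard formulas for a discrete random variable, mean = Σ x·P(x) and variance = Σ x²·P(x) - mean², assuming these are the chapter 5 formulas the slide refers to. The sketch below uses the rounded table values, so the results are approximate. The name `table` is illustrative.

```python
from math import sqrt

# Sampling distribution of the sample mean, using the rounded table values.
table = {3: 0.2, 3.33: 0.2, 3.67: 0.3, 4: 0.2, 4.33: 0.1}

mean_x_bar = sum(x * p for x, p in table.items())
var_x_bar = sum(x**2 * p for x, p in table.items()) - mean_x_bar**2
sd_x_bar = sqrt(var_x_bar)

print(f"mean = {mean_x_bar:.2f}, variance = {var_x_bar:.3f}, s.d. = {sd_x_bar:.3f}")
```

The mean equals µ = 3.6. The standard deviation agrees with the n/N > .05 formula, (σ/√n)·√((N-n)/(N-1)), for σ = 1.02, n = 3 and N = 5.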
Sampling distribution of sample mean
We have seen that the distribution of the sample mean x̄ is derived from the distribution of x.
Thus the distribution of x is called the parent distribution.
The next question is to investigate the relationship between the parent distribution and the sampling distribution of x̄.
Sampling distribution of sample mean
Let the distribution of x be normal with mean µ and standard deviation σ. This is equivalent to saying that the parent population is normal with mean µ and standard deviation σ.
If we draw a sample of size n from such a population then
• The mean of x̄, that is $\mu_{\bar{x}}$, is equal to the mean of the population µ.
• The standard deviation of x̄, that is $\sigma_{\bar{x}}$, is equal to σ/√n
• The shape of the distribution of x̄ is normal, whatever the value of n
Sampling distribution of sample mean
If X ~ N(µ, σ) then x̄ ~ N(µ, σ/√n),
where n is the size of the sample drawn from the population.
Central Limit Theorem
For a large sample size, the sampling distribution of x̄ is approximately normal, irrespective of the shape of the population distribution.
What size of the sample is considered to be large?
A sample of size ≥ 30 is considered to be large.
Useful link:
http://www.austin.cc.tx.us/mparker/1342/cltdemos.htm
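A small simulation can make the theorem concrete. The sketch below is an illustration under stated assumptions, not part of the original notes: the parent population is taken to be exponential with mean 1 and s.d. 1 (a right-skewed shape), and the number of repeated samples and the random seed are arbitrary choices.

```python
import random
from math import sqrt
from statistics import mean, pstdev

random.seed(7)  # fixed seed so the run is repeatable

n = 30              # sample size, the "large" threshold quoted above
num_samples = 2000  # number of repeated samples (arbitrary choice)

# Right-skewed parent population: exponential with mean 1 and s.d. 1.
sample_means = [
    mean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

print(f"mean of sample means = {mean(sample_means):.3f} (mu = 1)")
print(f"s.d. of sample means = {pstdev(sample_means):.3f} "
      f"(sigma/sqrt(n) = {1 / sqrt(n):.3f})")
```

A histogram of `sample_means` would look roughly bell-shaped even though the parent distribution is strongly skewed.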
Exercise 7.28
Given that the population distribution is skewed to the left. That is, X is not normally distributed.
a. When n = 400 (i.e. when we repeatedly draw samples of size 400 from the population) and compute the sample mean for all such samples, what would be the distribution of x̄?
Answer: since the sample size is large, by the Central Limit Theorem the distribution of x̄ will be approximately normal, that is
x̄ ~ N(µ, σ/√400)
Sampling distribution of sample mean
If the random sample comes from a normal population, the sampling distribution of the sample mean is normal regardless of the size of the sample.
If the shape of the parent population is not known or not normal, then the distribution of the sample mean is approximately normal whenever n is large (≥ 30). (This is the central limit theorem.)
If the shape of the parent population is not known or not normal and the sample size is small, then we cannot readily say anything about the shape of the sampling distribution.
Estimators
• The sample mean x̄ is an estimator of the population mean µ.
• By this we mean that whenever the value of µ is not available we use x̄ in its place.
• The sample mean x̄ is an unbiased estimator of the population mean µ.
• Unbiased means that, averaged over all possible samples, x̄ equals the true value of µ. In other words, the expected value of x̄ is equal to µ.
Sampling error
• Recall that for a given population the value of µ is fixed, while x̄ is a variable whose value varies from sample to sample.
• When we use x̄ in place of µ, some error is inevitable.
• The difference between x̄ and µ is called the sampling error:
Sampling error = x̄ - µ
• The sampling error occurs purely due to chance, namely the chance of a specific sample being selected.
• Other types of errors may occur in estimation, for example an error in recording a value, or a missing value. Such errors are called non-sampling errors.
Example of sampling error
• Now we repeatedly draw samples of size three from the population of size 5. There are 10 possible samples, listed below.
• The population parameters are µ = 3.6 and s.d. σ = 1.02
Sample no.   Sample   Sample values   x̄      Sampling error = x̄ - µ
1            A,B,C    2,3,4           3      -.6
2            A,B,D    2,3,4           3      -.6
3            A,B,E    2,3,5           3.33   -.27
4            A,C,D    2,4,4           3.33   -.27
5            A,C,E    2,4,5           3.67   .07
6            A,D,E    2,4,5           3.67   .07
7            B,C,D    3,4,4           3.67   .07
8            B,C,E    3,4,5           4      .4
9            B,D,E    3,4,5           4      .4
10           C,D,E    4,4,5           4.33   .73
The last column in the above table gives the error in estimation. That is, if while drawing a sample of size 3 from the given population we get, say, sample number 3, and use the corresponding x̄ value to estimate the population mean µ, then the error in estimation is -.27 units.
Exercise 7.4
The population consists of six numbers:
15, 13, 8, 17, 9, 12
a. Population mean = 12.33
b. Liza selected a sample with n = 4 and values 13, 8, 9, 12. The sample mean = 10.5. Then sampling error = 10.5 - 12.33 = -1.83
c. While calculating the sample mean, Liza mistakenly entered a 6 in place of 9 in the above sample. That is, she entered 13, 8, 6, 12, so a non-sampling error has occurred, and the sample mean is 9.75.
Total error = sampling error + non-sampling error.
Total error = 9.75 - 12.33 = -2.58, out of which -1.83 is the sampling error. Thus non-sampling error = -2.58 - (-1.83) = -.75
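The decomposition of the total error can be verified with a short sketch. The variable names are illustrative. The non-sampling error is obtained as the total error minus the sampling error, exactly as in part c.

```python
from statistics import mean

population = [15, 13, 8, 17, 9, 12]
sample = [13, 8, 9, 12]   # the sample Liza selected
entered = [13, 8, 6, 12]  # what she typed: 6 in place of 9

mu = mean(population)
sampling_error = mean(sample) - mu
total_error = mean(entered) - mu
non_sampling_error = total_error - sampling_error

print(f"sampling error     = {sampling_error:.2f}")
print(f"total error        = {total_error:.2f}")
print(f"non-sampling error = {non_sampling_error:.2f}")
```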
Exercise 7.49
• X = GPA of a student enrolled at a large university
• X ~ N(3.02, .29) (this X represents the characteristics of the whole population of students)
• That is, the average GPA of all the students in the population is 3.02 and the standard deviation is .29.
• We draw a sample of size n = 20 from this population and compute the sample mean x̄.
• We want to find P(x̄ > 3.10) (as asked in part a).
• To compute such a probability we must know the distribution of x̄.
• Although the sample is small, the parent population is normal, hence x̄ ~ N(3.02, .29/√20).
• At this point we convert the probability statement into a probability statement in z using the transformation formula $z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$
• $P(\bar{x} > 3.10) = P\left(z > \frac{3.10 - \mu}{\sigma/\sqrt{n}}\right) = P\left(z > \frac{3.10 - 3.02}{.29/\sqrt{20}}\right)$
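The last step can be finished numerically. This sketch uses `statistics.NormalDist` from the Python standard library in place of a z table, so the probability it prints may differ slightly from a table lookup that rounds z to two decimals (z is about 1.23 here).

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 3.02, 0.29, 20

se = sigma / sqrt(n)            # s.d. of x_bar
z = (3.10 - mu) / se            # z value for x_bar = 3.10
prob = 1 - NormalDist().cdf(z)  # upper-tail area of the standard normal

print(f"se = {se:.4f}, z = {z:.2f}, P(x_bar > 3.10) = {prob:.4f}")
```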
Exercise 7.52
X = time spent studying per week by a college student
X ~ right skewed (8.4, 2.7)
That is, across the population of all college students, the mean time spent studying is 8.4 hrs/week, with a standard deviation of 2.7 hrs, and the distribution is right skewed (i.e. not normal).
If we draw a sample of size n = 45 students from this population and compute the sample mean, then we are asked to find P(8 < x̄ < 9).
To find such a probability we must know the distribution of x̄.
Though the parent distribution is right skewed, the sample size is large, so we apply the CLT to conclude that approximately
x̄ ~ N(8.4, 2.7/√45)
$P(8 < \bar{x} < 9) = P\left(\frac{8 - 8.4}{2.7/\sqrt{45}} < z < \frac{9 - 8.4}{2.7/\sqrt{45}}\right)$
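The two z values and the resulting probability can be computed as follows. As a sketch, it treats x̄ as exactly normal with the CLT parameters, so the printed probability is an approximation. `NormalDist` is from the Python standard library.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 8.4, 2.7, 45

# By the CLT (n = 45 >= 30), x_bar is treated as approximately normal.
x_bar_dist = NormalDist(mu, sigma / sqrt(n))

z_low = (8 - mu) / x_bar_dist.stdev
z_high = (9 - mu) / x_bar_dist.stdev
prob = x_bar_dist.cdf(9) - x_bar_dist.cdf(8)

print(f"z from {z_low:.2f} to {z_high:.2f}, P(8 < x_bar < 9) = {prob:.4f}")
```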
Population and sample proportions
Consider a categorical variable with just two categories.
Let the population size be N, out of which X fall in category I.
Then the population proportion of category I = X/N (denoted by p).
Thus population proportion p = X/N
If we draw a sample of size n from this population and observe that x out of n fall in category I, then the sample proportion of category I = x/n (denoted by p̂).
Thus sample proportion p̂ = x/n
Population and sample proportions
A population consists of 9000 families in a small town. Out of
these, 3600 families have their houses insured.
Then population proportion of house insured families = p =
3600/9000 = .4
Suppose we drew a sample of size 100 from the above
population and observed that 42 families out of 100 have
house insurance. Then the sample proportion of the
house insured families
p̂ = 42/100 = .42
Sampling error = p̂ - p = .42 - .40 = .02
Sampling distribution of p̂
When we draw multiple samples from the population we get different values for p̂. Thus p̂ is a random variable and has a sampling distribution.
mean of p̂: $\mu_{\hat{p}} = p$ and $\sigma_{\hat{p}} = \sqrt{\frac{pq}{n}}$, where q = 1 - p
Central Limit Theorem for sample proportion:
If the sample size n is considerably large, then approximately
$\hat{p} \sim N\left(p, \sqrt{\frac{pq}{n}}\right)$
Here n is considered to be large if np > 5 and nq > 5.
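The mean, standard deviation and large-sample check for p̂ fit in one small helper. The function name `p_hat_distribution` is hypothetical, and the strict "greater than 5" comparison is this sketch's reading of the condition on np and nq. It is applied to the house-insurance example (p = .4, n = 100), assuming the population of 9000 families is large enough that no finite population correction is needed.

```python
from math import sqrt

def p_hat_distribution(p, n):
    """Return (mean of p-hat, s.d. of p-hat, whether n counts as large)."""
    q = 1 - p
    is_large = n * p > 5 and n * q > 5
    return p, sqrt(p * q / n), is_large

# House-insurance example: p = .4, samples of n = 100 families.
mean_p_hat, sd_p_hat, is_large = p_hat_distribution(0.4, 100)
print(f"mean = {mean_p_hat}, s.d. = {sd_p_hat:.4f}, normal approximation ok: {is_large}")
```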
Exercise 7.60
N = 1000, X = 640
Then population proportion p = 640/1000 = .64
n= 40 , x = 24
then sample proportion p̂ = 24/40 = .60
Exercise 7.70
Given population proportion p = .63 and n/N ≤ .05, find $\mu_{\hat{p}}$ and $\sigma_{\hat{p}}$ when a sample of size 100 is drawn.
n = 100
then $\mu_{\hat{p}} = p = .63$
and $\sigma_{\hat{p}} = \sqrt{\frac{pq}{n}} = \sqrt{\frac{.63 \times .37}{100}} = .0483$