7.0 Sampling and Sampling Distribution

7.1 Sampling Methods
• Why?
• Types
• Sampling Frame
• Sampling Plan
7.2 Introduction to Sampling Distribution
Institut Matematik Kejuruteraan, UniMAP
BQT 173
Why sample?
• less time-consuming
• less costly
• less cumbersome
• more practical
Sampling Frame
• listing of items that make up the population
• data sources such as population lists, directories, or
maps
Sampling Plan
• The way a sample is selected
• determines the quantity of information in the sample
• allows you to measure the reliability or goodness of your inference
Types of Samples
• Nonprobability samples
  - Convenience
  - Judgement
• Probability samples
  - Simple random
  - Systematic
  - Stratified
  - Cluster
Convenience Sampling
• Selection of elements is left primarily to the interviewer.
• Easy, inexpensive and convenient to sample.
• Limitation: the sample is not representative of the population.
• Recommended for pretesting questionnaires and for generating ideas, insights or hypotheses.
• E.g.: a survey was conducted by a local TV station involving a small number of housewives, white-collar workers and blue-collar workers. The survey attempted to elicit the respondents' responses towards a particular drama series aired on the channel.
Judgement Sampling
• The population elements are selected based on the judgement of the researcher.
• In the researcher's judgement, the selected elements are representative of the population of interest.
• E.g.: testing consumers' response towards a brand of instant coffee, Indocafe, at a wholesale market.
Simple Random Sampling
Definition:
If a sample of size n is drawn from a population of size N in such a way that every possible sample of size n has the same chance of being selected, the sample obtained is called a simple random sample.
N – number of units in the population
n – number of units in the sample
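The definition above can be sketched in a few lines of Python. This is only an illustration: the frame of 500 unit IDs, the sample size of 25 and the function name `simple_random_sample` are assumptions made for the example, not part of the slides. The standard-library call `random.sample` draws without replacement, so every subset of size n is equally likely.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct units so that every possible sample of size n is equally likely."""
    rng = random.Random(seed)          # seed only makes the run repeatable
    return rng.sample(population, n)   # sampling without replacement

# Hypothetical sampling frame: unit IDs 1..N with N = 500
population_ids = list(range(1, 501))
sample_ids = simple_random_sample(population_ids, n=25, seed=1)
print(sorted(sample_ids))
```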
Simple random sampling:
• Does not have any bias element (every element is treated equally).
• Suitable when the target population is homogeneous in nature (the units have similar characteristics).
• E.g.: canteen operators in primary schools, operators of cyber cafes, etc.
Disadvantages:
• Sampling frames are often out of date and are costly to produce.
• Impractical for a large study area.
Systematic Sampling
• Definition: a sample obtained by randomly selecting one element from the first k elements in the frame and every kth element thereafter is called a 1-in-k systematic sample with a random start.
• k – interval size
• $k = \dfrac{\text{population size}}{\text{sample size}} = \dfrac{N}{n}$
Systematic Sampling
• E.g.:
Suppose there are a total of N = 500 primary school canteen operators in the Klang Valley in 1997 who are registered with the Ministry of Education. We require a sample of n = 25 operators for a particular study.
Step 1: Make sure that the list is in random order (e.g. the names sorted alphabetically).
Step 2: Divide the operators into intervals that each contain k operators, where k = population size / sample size = 500/25 = 20. From every 20 operators, only one is selected to represent that interval.
Step 3: In the first interval only, select a number r at random, say r = 7. The operator with ID no. 7 will be the first sampled unit. The operators selected in the remaining intervals depend on this number.
Step 4: After 7 has been selected, the remaining selections are the operators with ID nos. 27, 47, 67, and so on in steps of k = 20, up to 487.
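The four steps can be sketched as a short Python function. This is a minimal sketch, assuming the frame is numbered 1 to N and that n divides N exactly, as in the canteen-operator example. The function name `systematic_sample` and its parameters are hypothetical.

```python
import random

def systematic_sample(N, n, start=None, seed=None):
    """Return the 1-based IDs of a 1-in-k systematic sample with a random start."""
    k = N // n                      # interval size; assumes n divides N exactly
    if start is None:
        start = random.Random(seed).randint(1, k)   # random start r in 1..k
    if not 1 <= start <= k:
        raise ValueError("start must lie in the first interval, 1..k")
    # One unit per interval: r, r + k, r + 2k, ...
    return [start + i * k for i in range(n)]

# Example from the slide: N = 500, n = 25, random start r = 7
ids = systematic_sample(N=500, n=25, start=7)
print(ids)
```

With the random start fixed at 7, the list begins 7, 27, 47 and ends at 487, one operator from each of the 25 intervals.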
Stratified Sampling
• Definition: a sample obtained by separating the population elements into non-overlapping groups, called strata, and then selecting a random sample from each stratum.
• Used when there is large variation within the population.
• E.g.: lecturers can be categorized as lecturers, senior lecturers, associate professors and professors.
Stratified Sampling
Step 1: Segregate the population units into groups according to their individual characteristics. Each unit appears in exactly one group (stratum); this is what is meant by 'non-overlapping'. Denote the number of units in stratum h as $N_h$ (population stratum size).
Step 2: Obtain a current sampling frame for each stratum.
Step 3: From each frame, select a random sample using one of the methods that have been discussed. Select the units proportionately, i.e. a large stratum should be represented by more units than a small stratum. Denote the number of units sampled from stratum h as $n_h$ (sample stratum size).
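Proportional selection in Step 3 is commonly read as $n_h = n \cdot N_h / N$. The sketch below applies that reading with a simple random sample inside each stratum. The stratum sizes, the total sample size of 40 and the function names are hypothetical values chosen for illustration only.

```python
import random

def proportional_allocation(stratum_sizes, n):
    """Split a total sample of n across strata in proportion to N_h."""
    N = sum(stratum_sizes.values())
    # Rounding can make the allocations add up to slightly more or less than n.
    return {h: round(n * N_h / N) for h, N_h in stratum_sizes.items()}

def stratified_sample(frames, n, seed=None):
    """Draw a simple random sample of size n_h from each stratum frame."""
    rng = random.Random(seed)
    sizes = {h: len(units) for h, units in frames.items()}
    allocation = proportional_allocation(sizes, n)
    return {h: rng.sample(units, allocation[h]) for h, units in frames.items()}

# Hypothetical stratum sizes N_h for the lecturer example
stratum_sizes = {"lecturer": 200, "senior lecturer": 100,
                 "associate professor": 60, "professor": 40}
frames = {h: [f"{h} {i}" for i in range(1, N_h + 1)]
          for h, N_h in stratum_sizes.items()}

allocation = proportional_allocation(stratum_sizes, n=40)
sample = stratified_sample(frames, n=40, seed=2)
print(allocation)
```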
Cluster Sampling
• Definition: a probability sample in which each sampling unit is a collection, or cluster, of elements.
• Advantages:
  - can be applied to a large study area
  - practical and economical
  - cost can be reduced, since the interviewer only needs to stay within a specific area instead of travelling across the whole study area
• Disadvantage: higher sampling error.
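As an illustration of the definition, the sketch below selects whole clusters at random and keeps every element inside them. This is the one-stage form of cluster sampling, which is an assumption here because the slides do not say how elements within a chosen cluster are handled. The district names, element labels and function name are hypothetical.

```python
import random

def cluster_sample(clusters, m, seed=None):
    """Select m whole clusters at random and keep every element in them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), m)   # the sampling unit is the cluster
    return {name: clusters[name] for name in chosen}

# Hypothetical study area divided into four clusters (districts)
clusters = {
    "District A": ["A1", "A2", "A3"],
    "District B": ["B1", "B2"],
    "District C": ["C1", "C2", "C3", "C4"],
    "District D": ["D1", "D2"],
}
selected = cluster_sample(clusters, m=2, seed=3)
print(selected)
```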
7.2 Introduction to Sampling Distribution
• A sampling distribution is the probability distribution of a sample statistic based on all possible simple random samples of the same size from the same population.
7.2.1 Sampling Distribution of the Mean ($\bar{X}$)
Mean of the sampling distribution = mean of the population (mean of the sample mean):
$\mu_{\bar{X}} = \mu$
Variance of the sampling distribution:
$\sigma_{\bar{X}}^2 = \dfrac{\sigma^2}{n}$
Standard deviation of the sampling distribution:
$\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}$
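The mean and variance of the sampling distribution of the mean can be checked by listing every possible sample from a tiny population. The sketch below uses a hypothetical population of four values and samples of size 2 drawn with replacement. That assumption keeps the draws independent, which is the setting in which the variance equals $\sigma^2/n$ exactly; sampling without replacement from a small finite population would need a correction factor.

```python
from itertools import product
from statistics import fmean, pvariance

population = [2, 4, 6, 8]   # hypothetical population, not from the slides
n = 2                        # sample size

# Every possible ordered sample of size n, drawn with replacement
samples = list(product(population, repeat=n))
sample_means = [fmean(s) for s in samples]

mu = fmean(population)            # population mean
sigma2 = pvariance(population)    # population variance

mean_of_means = fmean(sample_means)      # mean of the sampling distribution
var_of_means = pvariance(sample_means)   # variance of the sampling distribution

print(mean_of_means, mu)
print(var_of_means, sigma2 / n)
```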
Central Limit Theorem
• If we are sampling from a population that has an unknown probability distribution, the sampling distribution of the sample mean will still be approximately normal with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$, if the sample size n is large.
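A small simulation can make the theorem concrete. The sketch below is only an illustration under stated assumptions: the population is taken to be exponential with mean 1 and standard deviation 1 (a skewed, clearly non-normal choice), the sample size is 50 and 5000 samples are drawn. None of these values come from the slides.

```python
import random
from statistics import fmean, pstdev

rng = random.Random(0)      # fixed seed so the run is repeatable
n = 50                      # sample size (large)
reps = 5000                 # number of repeated samples

# Assumed population: exponential with mean 1 and standard deviation 1
mu, sigma = 1.0, 1.0

sample_means = [fmean([rng.expovariate(1.0) for _ in range(n)])
                for _ in range(reps)]

clt_mean = fmean(sample_means)   # should be close to mu
clt_sd = pstdev(sample_means)    # should be close to sigma / sqrt(n)

# For a normal distribution about 95% of values lie within 2 standard deviations
se = sigma / n ** 0.5
within_2se = sum(abs(m - mu) <= 2 * se for m in sample_means) / reps

print(clt_mean, clt_sd, se, within_2se)
```

The simulated mean and standard deviation of the sample means should sit near $\mu$ and $\sigma/\sqrt{n}$, and the share of sample means within two standard errors should be close to the normal value of about 0.95.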
Properties and shape of sampling distribution of the sample mean
• n ≥ 30: the sampling distribution of the sample mean is approximately normally distributed,
$\bar{X} \sim N\left(\mu, \dfrac{\sigma^2}{n}\right)$
• n < 30: the sampling distribution of the sample mean is normally distributed if the sample is from a normal population and the variance is known,
$\bar{X} \sim N\left(\mu, \dfrac{\sigma^2}{n}\right)$
• n < 30: t-distribution with n − 1 degrees of freedom if the sample is from a normal population but the variance is unknown,
$T = \dfrac{\bar{X} - \mu}{\sqrt{s^2/n}} \sim t_{n-1}$
The value of Z:
$Z = \dfrac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$
Example
The amount of time required to change the oil and filter of a vehicle is normally distributed with a mean of 45 minutes and a standard deviation of 10 minutes. A random sample of 16 cars is selected.
a) What is the standard error of the sample mean?
b) What is the probability that the sample mean is between 45 and 52 minutes?
c) What is the probability that the sample mean is between 39 and 48 minutes?
d) Find the two values between which the middle 95% of all sample means lie.
X: the amount of time required to change the oil and filter of a vehicle
$X \sim N(45, 10^2)$, $n = 16$
$\bar{X}$: the mean amount of time required to change the oil and filter for the 16 sampled vehicles
$\bar{X} \sim N\left(45, \dfrac{10^2}{16}\right)$
a) The standard error: $\sigma_{\bar{X}} = \dfrac{10}{\sqrt{16}} = 2.5$
b) $P(45 < \bar{X} < 52) = P\left(\dfrac{45 - 45}{2.5} < Z < \dfrac{52 - 45}{2.5}\right) = P(0 < Z < 2.8) = 0.4974$
c) $P(39 < \bar{X} < 48) = P\left(\dfrac{39 - 45}{2.5} < Z < \dfrac{48 - 45}{2.5}\right) = P(-2.4 < Z < 1.2) = 0.4918 + 0.3849 = 0.8767$
d) $P(a < \bar{X} < b) = 0.95$
$P\left(\dfrac{a - 45}{2.5} < Z < \dfrac{b - 45}{2.5}\right) = 0.95$
$P(z_a < Z < z_b) = 0.95$
From table: $z_a = -1.96$, $z_b = 1.96$
$\dfrac{a - 45}{2.5} = -1.96 \Rightarrow a = 40.1$
$\dfrac{b - 45}{2.5} = 1.96 \Rightarrow b = 49.9$
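The table-based answers to the oil-change example can be cross-checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library in place of the printed Z table; the variable names are chosen for this illustration only.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 45, 10, 16            # values from the example

se = sigma / sqrt(n)                 # a) standard error of the sample mean
xbar = NormalDist(mu, se)            # sampling distribution of the sample mean

p_b = xbar.cdf(52) - xbar.cdf(45)    # b) P(45 < Xbar < 52)
p_c = xbar.cdf(48) - xbar.cdf(39)    # c) P(39 < Xbar < 48)
lower = xbar.inv_cdf(0.025)          # d) lower bound of the middle 95%
upper = xbar.inv_cdf(0.975)          # d) upper bound of the middle 95%

print(se, round(p_b, 4), round(p_c, 4), round(lower, 1), round(upper, 1))
```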
Sampling Distribution of the Sample Proportion
$p = \dfrac{X}{N}$ and $\hat{p} = \dfrac{x}{n}$
where
• N = total number of elements in the population;
• X = number of elements in the population that possess a specific characteristic;
• n = total number of elements in the sample; and
• x = number of elements in the sample that possess a specific characteristic
Sampling Distribution of Sample Proportion, p̂, for Infinite Population
• The probability distribution of the sample proportion p̂ is called its sampling distribution. It gives the various values that p̂ can assume and their probabilities.
• For large values of n (n ≥ 30), the sampling distribution of p̂ is very nearly normal.
Mean of the Sample Proportion
• The mean of the sample proportion p̂ is denoted by $\mu_{\hat{p}}$ and is equal to the population proportion p:
$\mu_{\hat{p}} = p$
Standard Deviation of the Sample Proportion
$\sigma_{\hat{p}} = \sqrt{\dfrac{pq}{n}}$
• where p is the population proportion, q = 1 − p, and n is the sample size.
$\hat{p} \sim N\left(p, \dfrac{p(1 - p)}{n}\right)$
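For small values of n:
• the number of elements X in the sample that possess the characteristic is binomially distributed, $X \sim B(n, p)$:
$P\left(\hat{p} = \dfrac{x}{n}\right) = P(X = x) = {}^{n}C_{x}\, p^{x} q^{n-x}$
mean, $E(X) = np$
$\mathrm{Var}(X) = npq$
The value of Z:
$Z = \dfrac{\hat{p} - p}{\sigma_{\hat{p}}}$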
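For small n, the exact distribution of p̂ follows directly from the binomial formula. The sketch below tabulates it for a hypothetical case of n = 10 and p = 0.3, values chosen only for illustration, and confirms that the probabilities sum to 1 and that the mean of p̂ equals p. The function name is hypothetical.

```python
from math import comb

def sample_proportion_pmf(n, p):
    """Return {x/n: P(p_hat = x/n)} for X ~ B(n, p)."""
    q = 1 - p
    return {x / n: comb(n, x) * p**x * q**(n - x) for x in range(n + 1)}

pmf = sample_proportion_pmf(n=10, p=0.3)   # hypothetical n and p

total = sum(pmf.values())                                   # should be 1
mean_p_hat = sum(value * prob for value, prob in pmf.items())  # should equal p

print(total, mean_p_hat, pmf[0.2])
```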
Example 4.2
• The National Survey of Engagement shows that about 87% of freshmen and seniors rate their college experience as “good” or “excellent”. Assume this result is true for the current population of freshmen and seniors. Let p̂ be the proportion of freshmen and seniors in a random sample of 900 who hold this view. Find the mean and standard deviation of p̂.
Solution:
• Let p be the proportion of all freshmen and seniors who rate their college experience as “good” or “excellent”. Then,
p = 0.87 and q = 1 – p = 1 – 0.87 = 0.13
• The mean of the sampling distribution of p̂ is:
$\mu_{\hat{p}} = p = 0.87$
• The standard deviation of p̂ is:
$\sigma_{\hat{p}} = \sqrt{\dfrac{pq}{n}} = \sqrt{\dfrac{(0.87)(0.13)}{900}} \approx 0.011$
Sampling Distribution for the Difference between Two Means
Suppose we have two populations, $X_1$ and $X_2$, which are normally distributed:
$X_1 \sim N(\mu_1, \sigma_1^2)$ and $X_2 \sim N(\mu_2, \sigma_2^2)$
Sampling distributions of $\bar{X}_1$ and $\bar{X}_2$:
$\bar{X}_1 \sim N\left(\mu_1, \dfrac{\sigma_1^2}{n_1}\right)$ and $\bar{X}_2 \sim N\left(\mu_2, \dfrac{\sigma_2^2}{n_2}\right)$
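MEAN
$\mu_{\bar{X}_1 - \bar{X}_2} = E(\bar{X}_1 - \bar{X}_2) = E(\bar{X}_1) - E(\bar{X}_2) = \mu_1 - \mu_2$
VARIANCE (for independent samples)
$\mathrm{Var}(\bar{X}_1 - \bar{X}_2) = \mathrm{Var}(\bar{X}_1) + \mathrm{Var}(-\bar{X}_2) = \mathrm{Var}(\bar{X}_1) + (-1)^2\,\mathrm{Var}(\bar{X}_2) = \dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}$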
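Therefore the distribution of $\bar{X}_1 - \bar{X}_2$ can be written as:
$\bar{X}_1 - \bar{X}_2 \sim N\left(\mu_1 - \mu_2, \dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}\right)$
$Z = \dfrac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \sim N(0, 1)$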
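The slides give no worked example for the difference between two means, so the sketch below uses made-up numbers to show how the result might be applied. The population parameters, sample sizes and function name are all hypothetical; only the formulas for the mean and variance of the difference come from the derivation above, and the samples are assumed to be independent.

```python
from math import sqrt
from statistics import NormalDist

def diff_means_distribution(mu1, sigma1, n1, mu2, sigma2, n2):
    """Sampling distribution of Xbar1 - Xbar2 for independent samples."""
    mean = mu1 - mu2
    sd = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    return NormalDist(mean, sd)

# Hypothetical populations and sample sizes, for illustration only
d = diff_means_distribution(mu1=50, sigma1=6, n1=36, mu2=48, sigma2=8, n2=64)

z = (0 - d.mean) / d.stdev       # standardised value of a zero difference
p_first_larger = 1 - d.cdf(0)    # P(Xbar1 - Xbar2 > 0)

print(d.mean, d.stdev, z, round(p_first_larger, 4))
```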
Sampling Distribution for the Difference between Two Proportions
Now say we have two binomial populations with proportions of successes $p_1$ and $p_2$. For large samples,
$\hat{P}_1 \sim N\left(p_1, \dfrac{p_1(1 - p_1)}{n_1}\right)$ and $\hat{P}_2 \sim N\left(p_2, \dfrac{p_2(1 - p_2)}{n_2}\right)$
Mean:
$\mu_{\hat{P}_1 - \hat{P}_2} = E(\hat{P}_1 - \hat{P}_2) = E(\hat{P}_1) - E(\hat{P}_2) = p_1 - p_2$
Variance:
$\sigma^2_{\hat{P}_1 - \hat{P}_2} = \mathrm{Var}(\hat{P}_1 - \hat{P}_2) = \mathrm{Var}(\hat{P}_1) + (-1)^2\,\mathrm{Var}(\hat{P}_2) = \dfrac{p_1(1 - p_1)}{n_1} + \dfrac{p_2(1 - p_2)}{n_2}$
Using the Central Limit Theorem, the distribution of $\hat{P}_1 - \hat{P}_2$ is
$\hat{P}_1 - \hat{P}_2 \sim N\left(p_1 - p_2, \dfrac{p_1(1 - p_1)}{n_1} + \dfrac{p_2(1 - p_2)}{n_2}\right)$
Example 4.4
• A certain change in a process for the manufacture of component parts was considered. It was found that 75 out of 1500 items from the existing procedure were defective and 80 out of 2000 items from the new procedure were defective. If a random sample of 49 items is taken from the existing procedure and a random sample of 64 items is taken from the new procedure, what is the probability that
a) the proportion of defective items from the new procedure exceeds the proportion of defective items from the existing procedure?
b) the proportions differ by at most 0.015?
c) the proportion of defective items from the new procedure exceeds the proportion of defective items from the existing procedure by at least 0.02?
Solution:
$\hat{P}_N$: the proportion of defective items from the new procedure
$\hat{P}_E$: the proportion of defective items from the existing procedure
$p_N = \dfrac{80}{2000} = 0.04$, $p_E = \dfrac{75}{1500} = 0.05$
$\hat{P}_N \sim N\left(0.04, \dfrac{0.04(0.96)}{64}\right)$, $\hat{P}_E \sim N\left(0.05, \dfrac{0.05(0.95)}{49}\right)$
$\hat{P}_N - \hat{P}_E \sim N\left(0.04 - 0.05, \dfrac{0.05(0.95)}{49} + \dfrac{0.04(0.96)}{64}\right)$
$\hat{P}_N - \hat{P}_E \sim N(-0.01, 0.0016)$
a) $P(\hat{P}_N > \hat{P}_E) = P(\hat{P}_N - \hat{P}_E > 0) = P\left(Z > \dfrac{0 - (-0.01)}{\sqrt{0.0016}}\right) = P(Z > 0.25) = 0.4013$
b) $P(|\hat{P}_N - \hat{P}_E| \le 0.015) = P(-0.015 \le \hat{P}_N - \hat{P}_E \le 0.015) = P\left(\dfrac{-0.015 - (-0.01)}{\sqrt{0.0016}} \le Z \le \dfrac{0.015 - (-0.01)}{\sqrt{0.0016}}\right) = P(-0.125 \le Z \le 0.625) = 0.2838$
c) $P(\hat{P}_N - \hat{P}_E \ge 0.02) = P\left(Z \ge \dfrac{0.02 - (-0.01)}{\sqrt{0.0016}}\right) = P(Z \ge 0.75) = 0.2266$
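The three probabilities in Example 4.4 can be cross-checked with `statistics.NormalDist` from the Python standard library. One assumption is worth flagging: the slide solution rounds the variance to 0.0016 before standardising, so the sketch does the same in order to reproduce those answers. Using the unrounded variance (about 0.00157) would shift the results slightly in the third decimal place. Variable names are chosen for this illustration.

```python
from math import sqrt
from statistics import NormalDist

p_new, n_new = 80 / 2000, 64     # new procedure: p_N = 0.04, sample of 64
p_old, n_old = 75 / 1500, 49     # existing procedure: p_E = 0.05, sample of 49

mean_diff = p_new - p_old
var_diff = p_new * (1 - p_new) / n_new + p_old * (1 - p_old) / n_old

# Rounded to 4 decimals only to match the slide's N(-0.01, 0.0016)
diff = NormalDist(mean_diff, sqrt(round(var_diff, 4)))

p_a = 1 - diff.cdf(0)                        # a) new exceeds existing
p_b = diff.cdf(0.015) - diff.cdf(-0.015)     # b) differ by at most 0.015
p_c = 1 - diff.cdf(0.02)                     # c) new exceeds existing by at least 0.02

print(round(var_diff, 6), p_a, p_b, p_c)
```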