
Lesson 6: Sampling Methods and the Central Limit Theorem
Ka-fu Wong © 2007
ECON1003: Analysis of Economic Data
Lesson6-1
Outline
Point estimate
Why sample the population?
Probability sampling
Choice of sampling method: Sampling straws
Sampling distribution of the sample means
Probability histograms and empirical histograms
Central Limit Theorem
Normal approximation to Binomial
Inferential Statistics
• Making statements about a population by examining sample results
[Diagram] Sample: sample statistics (known) → Inference → Population: population parameters (unknown, but can be estimated from sample evidence)
Inferential Statistics
[Diagram] Inferential Statistics branches into Estimation and Hypothesis Testing
Point Estimates
• Examples of point estimates are
  - the sample mean,
  - the sample standard deviation,
  - the sample variance,
  - the sample proportion.
• A point estimate is one value (a single point) that is used to estimate a population parameter.
Estimating the percentage of Earth covered by water
• Experiment:
  - Paint a dot on your thumb.
  - Catch the globe and report whether the dot on your thumb lands on water.
  - Estimate the percentage of Earth covered by water by the average of all trials.
• Idea: If we draw many observations with replacement, the sample average approaches the population proportion. Code water as 1 and land as 0. The sample average is then an estimate of the proportion of Earth covered by water.
Truth:
Water covers 71% of the Earth's surface.
e.g., http://pao.cnmoc.navy.mil/educate/neptune/trivia/earth.htm
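The globe-catching experiment can be simulated to show the sample average settling near the true proportion. This sketch assumes each catch puts the dot on a uniformly random point of the surface, so any single catch lands on water with probability 0.71, the figure quoted above. The helper name estimate_water_share is hypothetical.

```python
import random

def estimate_water_share(p_water, n_catches, seed=0):
    """Hypothetical helper: simulate n_catches of the globe.

    ASSUMPTION: each catch lands the dot on a uniformly random point, so it is
    water (coded 1) with probability p_water and land (coded 0) otherwise.
    """
    rng = random.Random(seed)
    outcomes = [1 if rng.random() < p_water else 0 for _ in range(n_catches)]
    # The sample average of the 0/1 codes estimates the proportion of water.
    return sum(outcomes) / n_catches

for n in (10, 100, 10_000):
    print(n, estimate_water_share(0.71, n))
```

With only 10 catches the estimate can be far from 0.71. With thousands of catches it is usually within a percentage point or two.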
Why Sample the Population?
• The physical impossibility of checking all items in the population.
• The cost of studying all the items in a population.
• The sample results are usually adequate.
• Contacting the whole population would often be time-consuming.
• The destructive nature of certain tests.
Probability Sampling
• A probability sample is a sample selected such that each item or person in the population being studied has a known likelihood of being included in the sample.
Methods of Probability Sampling
• Simple Random Sample: A sample formulated so that each item or person in the population has the same chance of being included.
• Systematic Random Sampling: The items or individuals of the population are arranged in some order. A random starting point is selected and then every k-th member of the population is selected for the sample.
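The two selection rules above can be sketched in a few lines of Python. The inputs are assumptions for illustration: the population is a hypothetical list of 100 labelled members, and the function names are made up. Python's random.sample draws without replacement, so every subset of size n is equally likely.

```python
import random

def simple_random_sample(population, n, rng):
    # Every subset of size n is equally likely (drawn without replacement).
    return rng.sample(population, n)

def systematic_sample(population, k, rng):
    # Pick a random starting point among the first k items, then take every k-th item.
    start = rng.randrange(k)
    return population[start::k]

rng = random.Random(42)
members = list(range(1, 101))                 # hypothetical population labelled 1..100
print(simple_random_sample(members, 5, rng))
print(systematic_sample(members, 20, rng))    # 5 members, spaced 20 apart
```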
Methods of Probability Sampling
• Stratified Random Sampling: A population is first divided into subgroups, called strata, and a sample is selected from each stratum.
  - Stratification is the process of grouping members of the population into relatively homogeneous subgroups before sampling. The strata should be mutually exclusive: every element in the population must be assigned to exactly one stratum. The strata should also be collectively exhaustive: no population element can be excluded. Random or systematic sampling is then applied within each stratum. This often improves the representativeness of the sample by reducing sampling error.
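A minimal sketch of stratified sampling follows. It assumes a hypothetical population of 30 students tagged by faculty and an equal number of draws (4) from each stratum. Equal allocation is only one option. Allocating draws in proportion to stratum size is another common choice, and the slide does not specify which to use.

```python
import random
from collections import defaultdict

def stratified_sample(population, stratum_of, n_per_stratum, rng):
    # Group members into mutually exclusive, collectively exhaustive strata.
    strata = defaultdict(list)
    for member in population:
        strata[stratum_of(member)].append(member)
    # Draw a simple random sample (without replacement) within each stratum.
    sample = []
    for label in sorted(strata):
        sample.extend(rng.sample(strata[label], n_per_stratum))
    return sample

# Hypothetical population: (id, faculty) pairs; 10 in Arts, 20 in Science.
students = [(i, "Arts" if i % 3 == 0 else "Science") for i in range(30)]
picked = stratified_sample(students, lambda s: s[1], 4, random.Random(7))
print(picked)
```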
Methods of Probability Sampling
• Cluster Sampling: A population is first divided into primary units, then samples are selected from the primary units.
  - Cluster sampling is an example of 'two-stage sampling' or 'multistage sampling': in the first stage a sample of areas is chosen; in the second stage a sample of respondents within those areas is selected.
  - This can reduce travel and other administrative costs. It also means that one does not need a sampling frame for the entire population, but only for the selected clusters.
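The two stages can be sketched as follows. The frame of 6 districts with 50 households each, and the function name, are hypothetical. Both stages here use simple random sampling without replacement, which is one possible design among several.

```python
import random

def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, rng):
    # Stage 1: randomly choose some clusters (e.g. districts).
    chosen = rng.sample(sorted(clusters), n_clusters)
    # Stage 2: randomly choose respondents inside each chosen cluster only,
    # so a sampling frame is needed only for the selected clusters.
    return {c: rng.sample(clusters[c], n_per_cluster) for c in chosen}

# Hypothetical frame: 6 districts with 50 households each.
districts = {f"D{d}": [f"D{d}-H{h}" for h in range(50)] for d in range(6)}
result = two_stage_cluster_sample(districts, 2, 5, random.Random(3))
print(result)
```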
Independent identically distributed (iid)
• "Random draws from any population, with replacement" are independent identically distributed (i.i.d.).
  - Independent: the probability of drawing the current observation does not depend on what has been drawn previously.
  - Identically distributed: each draw has the same probability distribution as the draws made before it and those still to come.
• Most of the results covered in this lesson hold even when we do not have iid observations.
Choice of sampling method
-- "Sampling Straws"
• The choice of sampling method is important.
• A "Sampling Straws" experiment illustrates that some sampling methods can produce a biased estimate of the population parameters.
• The bag contains a total of 12 straws: 4 are 4 inches long, 4 are 2 inches long, and 4 are 1 inch long.
• The population mean length is 2.33 inches (= 4*(1+2+4)/12).
• Randomly draw 4 straws one by one with replacement.
• Compute the sample mean.
• The average of the sample means across experiments is generally larger than 2.33.
Choice of sampling method
-- "Sampling Straws"
• The sampling scheme is biased because the longer straws have a higher chance of being drawn, even if the draw is truly random (say, drawing the first straw you touch).
• The draw may also not be random, because we can feel the length of a straw before we pull it out.
Choice of sampling method
-- "Sampling Straws"
[Figure: the 12 straws, labelled 1 to 12]
• Alternative sampling scheme:
  - Label the straws 1 to 12.
  - Label 12 identical balls 1 to 12.
  - Draw four balls with replacement.
  - Measure the corresponding straws and compute the sample mean.
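The two schemes can be compared by simulation. The sketch assumes that "drawing the first straw you touch" selects a straw with probability proportional to its length. That is one plausible model, not something measured in class. Under that assumption a touched straw has expected length (4·16 + 4·4 + 4·1)/28 = 84/28 = 3 inches. The numbered-ball scheme gives every straw a 1/12 chance, so its sample means average 28/12 ≈ 2.33.

```python
import random

straws = [4] * 4 + [2] * 4 + [1] * 4   # lengths in inches; population mean 28/12 ≈ 2.33

def touch_sample(rng, n=4):
    # ASSUMED model of "draw the first straw you touch":
    # a straw's chance of being drawn is proportional to its length.
    return rng.choices(straws, weights=straws, k=n)

def ball_sample(rng, n=4):
    # Numbered balls 1..12: each straw has chance 1/12 on every draw (with replacement).
    return [straws[rng.randrange(12)] for _ in range(n)]

def average_of_sample_means(sampler, reps, rng):
    means = [sum(s) / len(s) for s in (sampler(rng) for _ in range(reps))]
    return sum(means) / reps

rng = random.Random(2007)
touch_avg = average_of_sample_means(touch_sample, 20_000, rng)
ball_avg = average_of_sample_means(ball_sample, 20_000, rng)
print(round(touch_avg, 2), round(ball_avg, 2))
```

The first number lands near 3 (biased upward) and the second near 2.33 (unbiased).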
Choice of sampling method
-- "Telephone interview"
• Suppose we are interested in estimating the unemployment rate by a phone survey.
1. Interview a group selected based on a random sample of mobile phone numbers.
2. Interview a group selected based on a random sample of residential phone numbers.
3. Interview a group selected based on a random sample of mobile and residential phone numbers.
Which sampling method will yield a good estimate of the population unemployment rate?
Non-Probability Sampling
• In a nonprobability sample, whether an observation is included in the sample is based on the judgment of the person selecting the sample.
• The sampling error is the difference between a sample statistic and its corresponding population parameter.
• Sampling error is almost always nonzero.
Properties of sample means
• Unbiasedness
• Consistency
• Central Limit Theorem
Unbiasedness
• A point estimator θ̂ is said to be an unbiased estimator of the parameter θ if the expected value, or mean, of the sampling distribution of θ̂ is θ:
$$E(\hat{\theta}) = \theta$$
• Examples:
  - The sample mean is an unbiased estimator of μ.
  - The sample variance is an unbiased estimator of σ².
  - The sample proportion is an unbiased estimator of P.
Bias
• Let θ̂ be an estimator of θ.
• The bias in θ̂ is defined as the difference between its mean and θ:
$$\text{Bias}(\hat{\theta}) = E(\hat{\theta}) - \theta$$
• The bias of an unbiased estimator is 0.
Sample mean is an unbiased estimator of μ
• Let x1, …, xn be drawn from the population of mean μ with replacement (i.e., X1, …, Xn are i.i.d.).
• Sample mean m = (x1 + … + xn)/n
$$E(m) = E\left[\frac{x_1 + \cdots + x_n}{n}\right] = \frac{E(x_1) + \cdots + E(x_n)}{n} = \frac{\mu + \cdots + \mu}{n} = \mu$$
• m is an unbiased estimator (with zero bias) of the population mean.
A biased estimator of μ
• Let x1, …, xn be drawn from the population of mean μ with replacement (i.e., X1, …, Xn are i.i.d.).
• Modified sample mean m = (x1 + … + xn + 100)/n
$$E(m) = E\left[\frac{x_1 + \cdots + x_n + 100}{n}\right] = \frac{E(x_1) + \cdots + E(x_n) + 100}{n} = \frac{\mu + \cdots + \mu + 100}{n} = \mu + \frac{100}{n}$$
• m is a biased estimator (with upward bias 100/n) of the population mean.
• m is asymptotically unbiased. That is, the bias approaches zero as n increases.
Consistency
• A consistent estimator is an estimator that converges in probability to the quantity being estimated as the sample size grows.
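Consistency
• Let x1, …, xn be drawn from the population of mean μ with replacement (i.e., X1, …, Xn are i.i.d.).
• Sample mean m = (x1 + … + xn)/n
• E(m) = μ
$$\text{Var}(m) = \text{Var}\left[\frac{x_1 + \cdots + x_n}{n}\right] = \frac{\text{Var}(x_1) + \cdots + \text{Var}(x_n)}{n^2} = \frac{n\,\text{Var}(x)}{n^2} = \frac{\text{Var}(x)}{n}$$
• The variance of the sum equals the sum of the variances because the draws are independent.
• Hence, since m is centred on μ and its variance shrinks to zero, m will approach μ as n increases.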
Consistency
• Let x1, …, xn be drawn from the population of mean μ with replacement (i.e., X1, …, Xn are i.i.d.).
• Modified sample mean m = (x1 + … + xn + 100)/n
• E(m) = μ + 100/n, which converges to μ as n increases.
$$\text{Var}(m) = \text{Var}\left[\frac{x_1 + \cdots + x_n + 100}{n}\right] = \frac{\text{Var}(x_1) + \cdots + \text{Var}(x_n)}{n^2} = \frac{\text{Var}(x)}{n}$$
• Adding the constant 100 does not change the variance.
• Hence, m will approach μ as n increases: a biased estimator can still be consistent.
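Both results can be checked by simulation. The sketch assumes a population of the five values 0, 1, 2, 3, 4, which has mean μ = 2 and variance 2 (the same population used in later examples). The function names are illustrative. For each n it records three quantities over 1,000 samples:
- the average of the sample means;
- the average bias of the modified estimator, which should be near 100/n;
- the variance of the sample means, which should be near 2/n.

```python
import random
from statistics import mean, pvariance

population = [0, 1, 2, 3, 4]   # assumed population: mu = 2, Var(x) = 2
MU = 2

def sample_mean(xs):
    return sum(xs) / len(xs)

def shifted_mean(xs):
    # The modified estimator (x1 + ... + xn + 100) / n from the slides
    return (sum(xs) + 100) / len(xs)

rng = random.Random(1)
results = {}
for n in (10, 100, 400):
    plain, shifted = [], []
    for _ in range(1000):
        xs = rng.choices(population, k=n)   # n draws with replacement
        plain.append(sample_mean(xs))
        shifted.append(shifted_mean(xs))
    results[n] = (
        mean(plain),            # near MU: unbiased
        mean(shifted) - MU,     # near 100 / n: the bias shrinks with n
        pvariance(plain),       # near 2 / n: the variance shrinks with n
    )
    print(n, [round(v, 4) for v in results[n]])
```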
Most Efficient Estimator
• Suppose there are several unbiased estimators of θ.
• The most efficient estimator, or the minimum variance unbiased estimator, of θ is the unbiased estimator with the smallest variance.
• Let θ̂1 and θ̂2 be two unbiased estimators of θ, based on the same number of sample observations. Then,
  - θ̂1 is said to be more efficient than θ̂2 if
$$\text{Var}(\hat{\theta}_1) < \text{Var}(\hat{\theta}_2)$$
  - The relative efficiency of θ̂1 with respect to θ̂2 is the ratio of their variances:
$$\text{Relative Efficiency} = \frac{\text{Var}(\hat{\theta}_2)}{\text{Var}(\hat{\theta}_1)}$$
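As an illustration (not from the slides), take two unbiased estimators of μ for the population 0, 1, 2, 3, 4 with n = 2 draws with replacement:
- θ̂2 = x1, which ignores the second draw;
- θ̂1 = m, the sample mean.

Enumerating all 25 equally likely samples gives Var(x1) = σ² = 2 and Var(m) = σ²/n = 1. The relative efficiency of m with respect to x1 is therefore 2 (= n).

```python
from itertools import product
from statistics import pvariance

population = [0, 1, 2, 3, 4]   # mu = 2, sigma^2 = 2
n = 2
samples = list(product(population, repeat=n))   # all 25 equally likely ordered samples

first_draw = [s[0] for s in samples]            # theta_hat_2 = x1 (ignores the other draw)
sample_means = [sum(s) / n for s in samples]    # theta_hat_1 = m, the sample mean

# Both estimators are unbiased: their averages over all samples equal mu = 2.
print(sum(first_draw) / len(samples), sum(sample_means) / len(samples))

var_x1 = pvariance(first_draw)     # equals sigma^2 = 2
var_m = pvariance(sample_means)    # equals sigma^2 / n = 1
print(var_x1, var_m, var_x1 / var_m)   # relative efficiency of m w.r.t. x1 is n = 2
```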
Sampling Distribution of the Sample Means
• The sampling distribution of the sample mean is a probability distribution consisting of all possible sample means of a given sample size selected from a population.
EXAMPLE 1
• The law firm of Hoya and Associates has five partners. At their weekly partners meeting each reported the number of hours they billed clients for their services last week.
  Partner          Hours
  1. Dunn           22
  2. Hardy          26
  3. Kiers          30
  4. Malinowski     26
  5. Tillman        22
• The population mean is 25.2 hours.
$$\mu = \frac{22 + 26 + 30 + 26 + 22}{5} = 25.2$$
Example 1
• If two partners are selected randomly, how many different samples are possible?
This is the combination of 5 objects taken 2 at a time. That is:
$${}_5C_2 = \frac{5!}{2!\,(5-2)!} = 10$$
There are a total of 10 different samples.
Example 1 (continued)
  Partners   Total   Mean
  1,2        48      24
  1,3        52      26
  1,4        48      24
  1,5        44      22
  2,3        56      28
  2,4        52      26
  2,5        48      24
  3,4        56      28
  3,5        52      26
  4,5        48      24
EXAMPLE 1 (continued)
• Organize the sample means into a frequency distribution.
  Sample Mean (RV)   Frequency   Relative Frequency (probability)
  22                 1           1/10
  24                 4           4/10
  26                 3           3/10
  28                 2           2/10
• The mean of the sample means is 25.2 hours.
$$\text{Mean of sample means} = \frac{22(1) + 24(4) + 26(3) + 28(2)}{10} = 25.2$$
The mean of the sample means is exactly equal to the population mean.
Example 1
• Population variance:
$$\sigma^2 = \frac{(22-25.2)^2 + (26-25.2)^2 + \cdots + (22-25.2)^2}{5} = 8.96$$
• Variance of the sample means:
$$\frac{(1)(22-25.2)^2 + (4)(24-25.2)^2 + (3)(26-25.2)^2 + (2)(28-25.2)^2}{1+4+3+2} = 3.36$$
• The variance of the sample means is smaller than the population variance:
3.36/8.96 = 0.375 < 1/2
• The ratio differs from 1/2 (i.e., 1/n) because the partners are sampled without replacement. For sampling without replacement from a population of size N, the ratio is (N - n)/(n(N - 1)) = (5 - 2)/(2 × 4) = 3/8 = 0.375.
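The whole example can be reproduced by enumerating the 10 possible pairs. The sketch uses Python's itertools.combinations for sampling without replacement. It uses statistics.pvariance, which divides by the number of items, to match the calculations above. The last line checks the finite-population factor (N - n)/(n(N - 1)).

```python
from itertools import combinations
from statistics import pvariance

hours = {"Dunn": 22, "Hardy": 26, "Kiers": 30, "Malinowski": 26, "Tillman": 22}
values = list(hours.values())
N, n = len(values), 2

# Every sample of 2 distinct partners (sampling without replacement).
sample_means = [sum(pair) / n for pair in combinations(values, n)]

mean_of_means = sum(sample_means) / len(sample_means)   # 25.2
pop_var = pvariance(values)                              # 8.96
means_var = pvariance(sample_means)                      # 3.36
print(len(sample_means), mean_of_means)
print(pop_var, means_var, means_var / pop_var)
# Finite-population factor for the ratio: (N - n) / (n * (N - 1)) = 3/8
print((N - n) / (n * (N - 1)))
```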
Example
• Suppose we had a uniformly distributed population containing equal proportions (hence equally probable instances) of (0, 1, 2, 3, 4). If you were to draw a very large number of random samples from this population, each of size n = 2, the possible combinations of drawn values and their sums are:
  Sum   Combinations
  0     0,0
  1     0,1  1,0
  2     1,1  2,0  0,2
  3     1,2  2,1  3,0  0,3
  4     1,3  3,1  2,2  4,0  0,4
  5     1,4  4,1  3,2  2,3
  6     3,3  4,2  2,4
  7     3,4  4,3
  8     4,4
Note that this is sampling with replacement.
Example
• Population mean = mean of sample means
$$\text{Population mean} = \frac{0+1+2+3+4}{5} = 2$$
$$\text{Mean of sample means} = \frac{(1)(0) + (2)(0.5) + \cdots + (1)(4)}{25} = 2$$
• Variance of sample means = population variance / sample size
$$\text{Population variance} = \frac{(0-2)^2 + \cdots + (4-2)^2}{5} = 2$$
$$\text{Variance of sample means} = \frac{(1)(0-2)^2 + \cdots + (1)(4-2)^2}{25} = 1$$
  Mean   Combinations
  0.0    0,0
  0.5    0,1  1,0
  1.0    1,1  2,0  0,2
  1.5    1,2  2,1  3,0  0,3
  2.0    1,3  3,1  2,2  4,0  0,4
  2.5    1,4  4,1  3,2  2,3
  3.0    3,3  4,2  2,4
  3.5    3,4  4,3
  4.0    4,4
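Because all 25 ordered draws are equally likely, the table and both identities can be checked exactly. This sketch uses fractions.Fraction, so the results are exact rational numbers rather than floating-point approximations.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [0, 1, 2, 3, 4]
pairs = list(product(population, repeat=2))        # 25 ordered draws, with replacement
means = [Fraction(a + b, 2) for a, b in pairs]

dist = Counter(means)                               # how many pairs give each mean
mean_of_means = sum(means) / len(means)
var_of_means = sum((m - mean_of_means) ** 2 for m in means) / len(means)

pop_mean = Fraction(sum(population), len(population))
pop_var = sum((x - pop_mean) ** 2 for x in population) / len(population)

for m in sorted(dist):
    print(float(m), dist[m])
print(mean_of_means, pop_mean)          # both 2
print(var_of_means, pop_var / 2)        # both 1
```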
Probability Histograms
• In a probability histogram, the area of each bar represents the chance of a value occurring as a result of the random (chance) process.
• Empirical histograms (from observed data) for a process converge to the probability histogram as the number of observations grows.
Examples of empirical histogram
• Roll a fair die: 50 and 200 times
[Figure: two empirical histograms of die rolls, "50 times" and "200 times"; horizontal axis DIE (1 to 6), vertical axis Percent (0 to 30)]
The empirical histogram will approach the probability histogram as the number of draws increases.
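The convergence described above can be seen numerically. This sketch simulates a fair die; the function name is hypothetical. For each number of rolls it reports the largest gap between an empirical relative frequency and the probability 1/6. The gap tends to shrink as the number of rolls grows, although any single run can fluctuate.

```python
import random
from collections import Counter

def relative_frequencies(n_rolls, seed=0):
    # Empirical relative frequency of each face after n_rolls of a fair die.
    rng = random.Random(seed)
    counts = Counter(rng.randint(1, 6) for _ in range(n_rolls))
    return {face: counts[face] / n_rolls for face in range(1, 7)}

for n in (50, 200, 100_000):
    freqs = relative_frequencies(n)
    # Largest gap between the empirical and the probability histogram (1/6 per face)
    print(n, max(abs(f - 1 / 6) for f in freqs.values()))
```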
Empirical histogram #1
Two balls in the bag:
Draw 1 ball 1000 times with replacement. Plot a relative frequency histogram (empirical probability histogram).
[Figure: relative frequency histogram of the 1000 draws, bar height about 0.5]
The empirical histogram looks like the population distribution!
What is the probability of getting a red ball in any single draw? 0.5
Empirical histogram #2
5 balls in the bag:
Draw 1 ball 1000 times with replacement. Plot a relative frequency histogram (empirical probability histogram).
[Figure: relative frequency histogram of the 1000 draws, bar heights about 0.6 and 0.4]
The empirical histogram looks like the population distribution!
What is the probability of getting a red ball in any single draw? 0.6
Empirical histogram #3
5 balls in the bag, numbered 0, 1, 2, 3, 4:
Draw 1 ball 1000 times with replacement. Plot a relative frequency histogram (empirical probability histogram).
[Figure: relative frequency histogram over the values 0 to 4, each bar about 0.2]
The empirical histogram looks like the population distribution!
What is the probability of getting a "three" in any single draw? 0.2
What is the expected value (i.e., population mean) of a single draw?
$$0.2(0) + 0.2(1) + \cdots + 0.2(4) = 2$$
$$\text{Variance} = 0.2(-2)^2 + 0.2(-1)^2 + \cdots + 0.2(2)^2 = 2$$
Empirical histogram #3 continued
5 balls in the bag, numbered 0, 1, 2, 3, 4:
Draw 2 balls 1000 times with replacement. Compute the sample mean. Plot a relative frequency histogram (empirical probability histogram) of the 1000 sample means.
  Mean   Combinations
  0.0    0,0
  0.5    0,1  1,0
  1.0    1,1  2,0  0,2
  1.5    1,2  2,1  3,0  0,3
  2.0    1,3  3,1  2,2  4,0  0,4
  2.5    1,4  4,1  3,2  2,3
  3.0    3,3  4,2  2,4
  3.5    3,4  4,3
  4.0    4,4
All combinations are equally likely.
[Figure: relative frequency histogram of the 1000 sample means; horizontal axis 0 to 4 in steps of 0.5, vertical axis 0 to 0.2]
Empirical histogram #3 continued
5 balls in the bag, numbered 0, 1, 2, 3, 4:
Draw 2 balls 1000 times with replacement. Compute the sample mean. Plot a relative frequency histogram (empirical probability histogram) of the 1000 sample means.
[Figure: relative frequency histogram of the 1000 sample means; horizontal axis 0 to 4, vertical axis 0 to 0.2]
What is the probability of getting a sample mean of 2.5 in any single draw of two balls? 0.16
What is the expected sample mean of a single draw?
$$0.04(0) + 0.08(0.5) + \cdots + 0.04(4) = 2$$
$$\text{Variance of sample mean} = 0.04(-2)^2 + 0.08(-1.5)^2 + \cdots + 0.04(2)^2 = 1$$
Distribution of sample means for different sample sizes and different population distributions
• http://www.ruf.rice.edu/~lane/stat_sim/sampling_dist/index.html
• http://www.kuleuven.ac.be/ucs/java/index.htm and choose basic and distribution of mean.
• http://faculty.vassar.edu/lowry/central.html
Central Limit Theorem #1
5 balls in the bag, numbered 0, 1, 2, 3, 4:
Draw n (n > 30) balls 1000 times with replacement. Compute the sample mean. Plot a relative frequency histogram (empirical probability histogram) of the 1000 sample means.
The Central Limit Theorem says
1. The empirical histogram looks like a normal density.
2. Expected value (mean of the normal distribution) = original population mean = 2.
3. Variance of the sample means = variance of the original population / n = 2/n.
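The three claims can be checked with a short simulation. The sketch assumes n = 36 draws per sample (any n > 30 would do) and 1000 samples, as on the slide. It prints the mean and variance of the 1000 sample means. It also prints the share of sample means lying within one standard error, √(2/n), of 2, which should be roughly 0.68 if the histogram is close to normal.

```python
import random
from statistics import mean, pvariance

population = [0, 1, 2, 3, 4]   # population mean 2, population variance 2
n, reps = 36, 1000             # assumed n = 36 (> 30) draws per sample, 1000 samples

rng = random.Random(6)
sample_means = [sum(rng.choices(population, k=n)) / n for _ in range(reps)]

se = (2 / n) ** 0.5            # standard error sqrt(sigma^2 / n)
share_within_one_se = sum(abs(m - 2) < se for m in sample_means) / reps

print(round(mean(sample_means), 3))        # should be near 2
print(round(pvariance(sample_means), 4))   # should be near 2 / 36, about 0.056
print(share_within_one_se)                 # roughly 0.68 if close to normal
```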
Central Limit Theorem #2
Some unknown number of numbered balls in the bag: 0, 1, 2, 3, 4, ?, ?
We know only that the population mean is μ and the variance is σ².
Draw n (n > 30) balls 1000 times with replacement. Compute the sample mean. Plot a relative frequency histogram (empirical probability histogram) of the 1000 sample means.
The Central Limit Theorem says
1. The empirical histogram looks like a normal density.
2. Expected value (mean of the normal distribution) = μ.
3. Variance of the sample means = σ²/n.
Confidence interval #1
Some unknown number of numbered balls in the bag: 0, 1, 2, 3, 4, ?, ?
We know only that the population mean is μ and the variance is σ².
The Central Limit Theorem says
1. The empirical histogram looks like a normal density.
2. Expected value (mean of the normal distribution) = μ.
3. Variance of the sample means = σ²/n.
What is the probability that the sample mean of a randomly drawn sample lies within μ ± σ/√n?
0.6826
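The 0.6826 above is the standard normal probability of landing within one standard deviation of the mean. Here that means within one standard error, σ/√n, of μ. It can be computed from the error function in Python's math module. The exact value rounds to 0.6827; the slide's 0.6826 comes from doubling the normal-table entry 0.3413.

```python
from math import erf, sqrt

def normal_cdf(z):
    # Standard normal CDF expressed with the error function.
    return 0.5 * (1 + erf(z / sqrt(2)))

# P(mu - sigma/sqrt(n) < sample mean < mu + sigma/sqrt(n)) = P(-1 < Z < 1)
p_within_one_se = normal_cdf(1) - normal_cdf(-1)
print(round(p_within_one_se, 4))   # 0.6827
```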
Central Limit Theorem
• For a population with a mean μ and a variance σ², the sampling distribution of the means of all possible samples of size n generated from the population will be approximately normally distributed.
• The mean of the sampling distribution is equal to μ and the variance is equal to σ²/n.
The population distribution:
$$X \sim (\mu, \sigma^2)$$
The sample mean of n observations:
$$\bar{X}_n \sim N(\mu, \sigma^2/n)$$
Central Limit Theorem: Sums
• For a large number of random draws, with replacement, the distribution of the sum approximately follows the normal distribution.
  - The mean of the normal distribution is n × (expected value of one random draw).
  - The SD for the sum (SE) is
$$\sqrt{n}\,\sigma$$
  - This holds even if the underlying population is not normally distributed.
Central Limit Theorem: Averages
• For a large number of random draws, with replacement, the distribution of the average = (sum)/n approximately follows the normal distribution.
  - The mean for this normal distribution is the expected value of one random draw.
  - The SD for the average (SE) is
$$\frac{\sigma}{\sqrt{n}}$$
  - This holds even if the underlying population is not normally distributed.
Law of large numbers
• The sample mean converges to the population mean as n gets large.
• For a large number of random draws from any population, with replacement, the distribution of the average = (sum)/n approximately follows the normal distribution.
  - The mean for this normal distribution is the expected value of one random draw.
  - The SD for the average (SE) is
$$\frac{\sigma}{\sqrt{n}}$$
  - The SD for the average tends to zero as n increases.
  - This holds even if the underlying population is not normally distributed.
Central Limit Theorem Simulation
Effect of Sample Size
Regardless of the underlying population, the larger the sample size, the more nearly normal is the distribution of all possible sample means.
Central Limit Theorem
• If a population follows the normal distribution, the sampling distribution of the sample mean will also follow the normal distribution.
• To determine the probability a sample mean falls within a particular region, use:
$$z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
Central Limit Theorem
• If the population does not follow the normal distribution, but the sample has at least 30 observations, the sample means will approximately follow the normal distribution.
• To determine the probability a sample mean falls within a particular region, use:
$$z = \frac{\bar{X} - \mu}{s/\sqrt{n}}$$
Example 2
• Suppose the mean selling price of a gallon of gasoline in the United States is $1.30. Further, assume the distribution is positively skewed, with a standard deviation of $0.28. What is the probability of selecting a sample of 35 gasoline stations and finding the sample mean within $0.08 of the population mean ($1.30)?
Example 2 (continued)
• The first step is to find the z-values corresponding to $1.22 (= 1.30 - 0.08) and $1.38 (= 1.30 + 0.08). These are the two points within $0.08 of the population mean.
$$z = \frac{\bar{X} - \mu}{s/\sqrt{n}} = \frac{\$1.38 - \$1.30}{\$0.28/\sqrt{35}} = 1.69$$
$$z = \frac{\bar{X} - \mu}{s/\sqrt{n}} = \frac{\$1.22 - \$1.30}{\$0.28/\sqrt{35}} = -1.69$$
Example 2 (continued)
• Next we determine the probability of a z-value between -1.69 and 1.69. It is:
$$P(-1.69 \le z \le 1.69) = 2(0.4545) = 0.9090$$
• We would expect about 91 percent of the sample means to be within $0.08 of the population mean.
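The same calculation can be done without a table. The sketch evaluates the standard normal CDF through math.erf, a standard identity rather than something from the slides, and uses the numbers from Example 2. It reproduces z ≈ ±1.69 and a probability of about 0.909.

```python
from math import erf, sqrt

def normal_cdf(z):
    # Standard normal CDF via the error function.
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, s, n = 1.30, 0.28, 35           # population mean, standard deviation, sample size
se = s / sqrt(n)                    # standard error of the sample mean

z_high = (1.38 - mu) / se           # about 1.69
z_low = (1.22 - mu) / se            # about -1.69
prob = normal_cdf(z_high) - normal_cdf(z_low)
print(round(z_low, 2), round(z_high, 2), round(prob, 3))   # probability about 0.909
```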
Sampling Distribution of Sample Proportion
If a random sample of size n is taken from a population, then the sampling distribution of the sample proportion p̂ is:
• Approximately normal, if n is large.
• Has mean μ_p̂ = p.
• Has standard deviation
$$\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$$
It is approximately normal because the sample proportion is a simple average of zeros and ones from different trials.
The Normal Approximation to the Binomial revisited
• The normal distribution (a continuous distribution) yields a good approximation of the binomial distribution (a discrete distribution) for large values of n.
• The normal probability distribution is generally a good approximation to the binomial probability distribution when nπ and n(1 - π) are both greater than 5.
Recall for the binomial experiment (the trials are iid):
• There are only two mutually exclusive outcomes (success or failure) on each trial.
• A binomial distribution results from counting the number of successes.
• Each trial is independent.
• The probability is fixed from trial to trial, and the number of trials n is also fixed.
The Normal Approximation to the Binomial revisited
Recoding: Failure as 0 and success as 1.
• x/n is simply the proportion of successes and hence the simple average of the outcomes from the n trials.
• x/n will be approximately normal according to the CLT.
• Hence x (= n × x/n) will also be approximately normal according to the CLT.
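To see how good the approximation is, the sketch compares exact binomial probabilities P(x ≤ k) with their normal approximation. It uses an assumed case of n = 40 and π = 0.3, so nπ = 12 and n(1 - π) = 28, both above 5. It adds a continuity correction of 0.5, a common refinement that the slides do not mention. math.comb requires Python 3.8 or later.

```python
from math import comb, erf, sqrt

def normal_cdf(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

def binom_cdf(k, n, p):
    # Exact P(X <= k) for X ~ Binomial(n, p); p is the success probability (pi on the slide)
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))

def normal_approx_cdf(k, n, p):
    # CLT: X = n * (x/n) is approximately N(n p, n p (1 - p)).
    # The +0.5 is a continuity correction (an addition, not from the slides).
    mean, sd = n * p, sqrt(n * p * (1 - p))
    return normal_cdf((k + 0.5 - mean) / sd)

n, p = 40, 0.3                      # assumed example: n*p = 12, n*(1 - p) = 28
for k in (8, 12, 16):
    exact, approx = binom_cdf(k, n, p), normal_approx_cdf(k, n, p)
    print(k, round(exact, 4), round(approx, 4))
```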
Chi-square distribution:
• If z1, z2, z3, …, zn are i.i.d. standard normal variables, then
$$X^2 = z_1^2 + z_2^2 + z_3^2 + \cdots + z_n^2$$
has a χ²(n) distribution, with n degrees of freedom.
Sample Variance
• Let x1, x2, . . . , xn be a random sample from a population. The sample variance is
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$$
• The square root of the sample variance is called the sample standard deviation.
• The sample variance is different for different random samples from the same population.
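Python's statistics.variance also uses the n - 1 divisor, while statistics.pvariance divides by n. The sketch writes the formula out directly (the function name is illustrative) and checks it against the library. It uses the five billed-hours figures from Example 1, where s² = 44.8/4 = 11.2, compared with the population variance of 8.96 computed there.

```python
from statistics import pvariance, stdev, variance

def sample_variance(xs):
    # s^2 = sum of squared deviations from the sample mean, divided by n - 1
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

hours = [22, 26, 30, 26, 22]            # billed hours from Example 1
print(sample_variance(hours))           # 44.8 / 4 = 11.2 (up to float rounding)
print(variance(hours), stdev(hours))    # library version, also divides by n - 1
print(pvariance(hours))                 # divides by n instead: 8.96
```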
Sampling Distribution of Sample Variances
• The sampling distribution of s² has mean σ²:
$$E(s^2) = \sigma^2$$
• If the population distribution is normal, then
$$\text{Var}(s^2) = \frac{2\sigma^4}{n-1}$$
• If the population distribution is normal, then
$$\frac{(n-1)s^2}{\sigma^2}$$
has a χ² distribution with n – 1 degrees of freedom.
The Chi-square Distribution
• The chi-square distribution is a family of distributions, depending on degrees of freedom:
  - d.f. = n – 1
[Figure: chi-square densities for d.f. = 1, d.f. = 5, and d.f. = 15; horizontal axis χ² from 0 to 28]
• Text Table 7 contains chi-square probabilities.
Degrees of Freedom (df)
Idea: Number of observations that are free to vary after the sample mean has been calculated
Example: Suppose the mean of 3 numbers is 8.0
Let X1 = 7
Let X2 = 8
What is X3?
If the mean of these three values is 8.0,
then X3 must be 9
(i.e., X3 is not free to vary)
Here, n = 3, so degrees of freedom = n – 1 = 3 – 1 = 2
(2 values can be any numbers, but the third is not free to vary
for a given mean)
Chi-square Example
• A commercial freezer must hold a selected temperature with little variation. Specifications call for a standard deviation of no more than 4 degrees (a variance of 16 degrees²).
• A sample of 14 freezers is to be tested.
• What is the upper limit (K) for the sample variance such that the probability of exceeding this limit, given that the population standard deviation is 4, is less than 0.05?
Finding the Chi-square Value
$$\chi^2 = \frac{(n-1)s^2}{\sigma^2}$$
is chi-square distributed with (n – 1) = 13 degrees of freedom.
• Use the chi-square distribution with area 0.05 in the upper tail:
χ²(13) = 22.36 (α = .05 and 14 – 1 = 13 d.f.)
[Figure: chi-square density with upper-tail probability α = .05 to the right of χ²(13) = 22.36]
Chi-square Example
χ²(13) = 22.36 (α = .05 and 14 – 1 = 13 d.f.)
So:
$$P(s^2 > K) = P\left(\frac{(n-1)s^2}{16} > \chi^2(13)\right) = 0.05$$
or
$$\frac{(n-1)K}{16} = 22.36 \qquad (\text{where } n = 14)$$
so
$$K = \frac{(22.36)(16)}{(14-1)} = 27.52$$
If s² from the sample of size n = 14 is greater than 27.52, there is strong evidence to suggest the population variance exceeds 16.
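The arithmetic for K, and what it means, can be checked in Python. The sketch takes the table value 22.36 as given. It then assumes freezer temperatures are normally distributed with σ = 4 and simulates many samples of 14 to confirm that s² exceeds K roughly 5% of the time. The mean temperature does not affect s², so 0 is used. The seed and number of repetitions are arbitrary choices.

```python
import random

def sample_variance(xs):
    # s^2 with the n - 1 divisor
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

n, sigma2 = 14, 16
chi2_crit = 22.36                    # table value: upper 5% point of chi-square(13)
K = chi2_crit * sigma2 / (n - 1)
print(round(K, 2))                   # 27.52

# Monte Carlo check, ASSUMING normally distributed temperatures with sigma = 4
rng = random.Random(14)
reps = 10_000
exceed = sum(
    sample_variance([rng.gauss(0, 4) for _ in range(n)]) > K for _ in range(reps)
)
print(exceed / reps)                 # should be close to 0.05
```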
Lesson 6: Sampling Methods and the Central Limit Theorem
- END -