Confidence Interval on a Proportion - SI-35-02



CONCEPTS OF ESTIMATION
As its name suggests, the objective of estimation is to
determine the approximate value of a population parameter
on the basis of a sample statistic.
An estimator of a population parameter is a random variable
that is a function of the sample data.
An estimate is a specific value of this random variable,
calculated from an observed sample. We can use sample data to
estimate a population parameter in two ways:
1. Point estimator
2. Interval estimator
IE - 2333
SWN
1
Definition 1 A point estimator draws inferences about a
population by estimating the value of an unknown
parameter using a single value or point.
Definition 2 An interval estimator draws inferences about a
population by estimating the value of an
unknown parameter using an interval that is
likely to include the value of the population
parameter.
ESTIMATING THE POPULATION MEAN, WHEN
THE POPULATION VARIANCE IS KNOWN
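Suppose that a population has unknown mean $\mu$ and known
variance $\sigma^2$. A random sample of size $n$ is taken from this
population, say $X_1, X_2, \ldots, X_n$. The sample mean $\bar{X}$ is a
reasonable point estimator for the unknown mean $\mu$. A
$100(1-\alpha)$ percent confidence interval on $\mu$ can be obtained
by considering the sampling distribution of the sample mean $\bar{X}$:
$$E(\bar{X}) = \mu \quad \text{and} \quad \operatorname{Var}(\bar{X}) = \frac{\sigma^2}{n}$$
Therefore, the distribution of the statistic
$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$
is a standard normal distribution (exactly so if the population is
normal, and approximately so for large $n$ by the central limit theorem).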


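$$P\left(-z_{\alpha/2} \le Z \le z_{\alpha/2}\right) = 1 - \alpha,$$
so that
$$P\left(-z_{\alpha/2} \le \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \le z_{\alpha/2}\right) = 1 - \alpha,$$
$$P\left(\bar{X} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha.$$
This equation says that, with repeated sampling from this
population, the proportion of values of $\bar{X}$ for which $\mu$ falls
between $\bar{x} - z_{\alpha/2}\,\sigma/\sqrt{n}$ and $\bar{x} + z_{\alpha/2}\,\sigma/\sqrt{n}$
is equal to $1 - \alpha$.
This interval is called the confidence interval estimator of $\mu$.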
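The repeated-sampling interpretation above can be checked with a small simulation. The sketch below is illustrative only: the function name `coverage`, the population values (mean 50, standard deviation 4), the sample size, and the number of trials are all assumptions chosen for the example, not values from the slides.

```python
import random
from statistics import NormalDist, fmean


def coverage(mu, sigma, n, alpha, trials, seed=0):
    """Fraction of simulated samples whose z-interval contains mu.

    Hypothetical helper: draws `trials` samples of size n from a
    normal population with known sigma and counts how often
    xbar +/- z_{alpha/2} * sigma / sqrt(n) covers the true mean.
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)  # upper alpha/2 point
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / trials


# Assumed illustrative population; the result should be close to 0.95.
observed = coverage(mu=50.0, sigma=4.0, n=25, alpha=0.05, trials=4000)
print(observed)
```

The observed fraction fluctuates around $1 - \alpha$ from run to run; it is the long-run proportion, not any single interval, that the confidence level describes.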
Definition : If $\bar{x}$ is the sample mean of a random sample of size
$n$ from a population with known variance $\sigma^2$,
a $100(1-\alpha)$% confidence interval on $\mu$ is given by
$$\bar{x} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}},$$
where $z_{\alpha/2}$ is the upper $\alpha/2$ percentage point of the
standard normal distribution.
Note :
The probability $(1-\alpha)$ is called the confidence level.
$\bar{x} - z_{\alpha/2}\,\sigma/\sqrt{n}$ is called the lower confidence limit (LCL).
$\bar{x} + z_{\alpha/2}\,\sigma/\sqrt{n}$ is called the upper confidence limit (UCL).
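As a rough illustration of the definition above, the following sketch computes the LCL and UCL using only the Python standard library. The function name `z_interval` and the numbers (sample mean 10, known standard deviation 2, sample size 16) are assumptions made up for the example.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar, sigma, n, alpha=0.05):
    """Two-sided confidence interval on the mean, variance known.

    Hypothetical helper implementing
    xbar -/+ z_{alpha/2} * sigma / sqrt(n).
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)  # upper alpha/2 point
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width


# Assumed example values, not taken from the slides.
lcl, ucl = z_interval(xbar=10.0, sigma=2.0, n=16, alpha=0.05)
print(round(lcl, 2), round(ucl, 2))
```

With these assumed inputs the half-width is about 0.98, so the 95% interval runs from roughly 9.02 to 10.98.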
SELECTING THE SAMPLE SIZE TO ESTIMATE
A POPULATION MEAN
One of the most common questions asked of statisticians is:
how large should the sample taken in a survey be?
The answer to this question depends on three factors:
1. the parameter to be estimated
2. the desired confidence level of the interval estimator
3. the maximum error of estimation, where error of
estimation is the absolute difference between the point
estimator and the parameter; for example, the point
estimator of $\mu$ is $\bar{x}$, so in that case:
error of estimation $= |\bar{x} - \mu|$
The maximum error of estimation is also called the error
bound and is denoted B.
Suppose the parameter of interest in an experiment is the
population mean $\mu$. The confidence interval estimator
(assuming a normal population, with the population variance
known) is : $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$
If we want to estimate $\mu$ to within a certain specified bound
B, we will want the confidence interval estimator to be : $\bar{x} \pm B$
As a consequence, we have :
$$z_{\alpha/2} \frac{\sigma}{\sqrt{n}} = B,$$
solving for n, we get the following result :
$$n = \left(\frac{z_{\alpha/2}\,\sigma}{B}\right)^2$$
(sample size necessary to estimate $\mu$ to within a bound B)
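The sample-size formula above can be sketched in a few lines. The helper name `sample_size_mean` and the inputs are illustrative assumptions; the only addition to the formula is rounding up, since a sample size must be a whole number and rounding down would give an error bound slightly larger than B.

```python
from math import ceil
from statistics import NormalDist


def sample_size_mean(sigma, bound, alpha=0.05):
    """Smallest n with z_{alpha/2} * sigma / sqrt(n) <= bound.

    Hypothetical helper implementing n = (z_{alpha/2} * sigma / B)^2,
    rounded up to the next integer.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return ceil((z * sigma / bound) ** 2)


# Assumed example: sigma = 2, estimate the mean to within B = 0.5.
n_required = sample_size_mean(sigma=2.0, bound=0.5, alpha=0.05)
print(n_required)
```

For these assumed values the formula gives about 61.5, so 62 observations are needed. Halving B roughly quadruples the required sample size, because n grows with the square of 1/B.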
CONFIDENCE INTERVAL ON A PROPORTION
It is often necessary to construct a confidence interval on a
proportion. Suppose that a random sample of size n has
been taken from a large (possibly infinite) population and
that X ($\le n$) observations in this sample belong to a class of
interest.
Then $\hat{P} = X/n$ is a point estimator of the proportion p of the
population that belongs to this class.
If X ~ BIN(n, p), we know that the sampling distribution of $\hat{P}$
is approximately normal with mean p and variance $p(1-p)/n$.
Thus,
$$Z = \frac{\hat{P} - p}{\sqrt{\dfrac{p(1-p)}{n}}} \approx \text{NOR}(0, 1)$$
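To construct the confidence interval on p, note
that :
$$P\left(-z_{\alpha/2} \le Z \le z_{\alpha/2}\right) \approx 1 - \alpha$$
so that :
$$P\left(-z_{\alpha/2} \le \frac{\hat{P} - p}{\sqrt{\dfrac{p(1-p)}{n}}} \le z_{\alpha/2}\right) \approx 1 - \alpha$$
$$P\left(\hat{P} - z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}} \le p \le \hat{P} + z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}\right) \approx 1 - \alpha$$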
Definition : Confidence Interval on a Proportion
If $\hat{p}$ is the proportion of observations in a random sample of
size n that belong to a class of interest, then an approximate
$100(1-\alpha)$ percent confidence interval on the proportion p of
the population that belongs to this class, is :
$$\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \le p \le \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
where $z_{\alpha/2}$ is the upper $\alpha/2$ percentage point of the standard
normal distribution.
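A minimal sketch of the interval in the definition above, using only the standard library. The function name `proportion_interval` and the counts (40 items of interest in a sample of 100) are assumptions for illustration. Because the interval rests on the normal approximation to the binomial, it is less reliable when n is small or the sample proportion is near 0 or 1.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(x, n, alpha=0.05):
    """Approximate two-sided confidence interval on a proportion.

    Hypothetical helper implementing
    p_hat -/+ z_{alpha/2} * sqrt(p_hat * (1 - p_hat) / n).
    """
    if not 0 <= x <= n:
        raise ValueError("x must satisfy 0 <= x <= n")
    p_hat = x / n
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# Assumed example: 40 of 100 sampled items belong to the class.
p_lo, p_hi = proportion_interval(x=40, n=100, alpha=0.05)
print(round(p_lo, 3), round(p_hi, 3))
```

For the assumed counts, the sample proportion is 0.40 and the 95% interval is roughly 0.304 to 0.496.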
CHOICE OF SAMPLE SIZE
Since $\hat{P}$ is the point estimator of p, we can define the error in
estimating p by $\hat{P}$ as $B = |p - \hat{P}|$.
If we set $B = z_{\alpha/2}\sqrt{p(1-p)/n}$ and solve for n, the appropriate sample
size is
$$n = \left(\frac{z_{\alpha/2}}{B}\right)^2 p(1-p) \qquad (*)$$
The sample size from equation (*) will always be a maximum
for p = 0.5, that is, $p(1-p) \le 0.25$.
In other words, we are at least $100(1-\alpha)$ percent confident
that the error in estimating p by $\hat{P}$ is less than B if the sample
size is
$$n = \left(\frac{z_{\alpha/2}}{B}\right)^2 (0.25)$$
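Equation (*) and its conservative version can be compared numerically. In the sketch below, the name `sample_size_proportion`, the bound B = 0.05, and the planning value p = 0.2 are illustrative assumptions; the result is rounded up because a sample size must be an integer.

```python
from math import ceil
from statistics import NormalDist


def sample_size_proportion(bound, alpha=0.05, p=0.5):
    """Sample size from n = (z_{alpha/2} / B)^2 * p * (1 - p).

    Hypothetical helper. The default p = 0.5 gives the conservative
    (largest) sample size, since p * (1 - p) <= 0.25.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return ceil((z / bound) ** 2 * p * (1 - p))


# Assumed example: error bound B = 0.05 at 95% confidence.
n_conservative = sample_size_proportion(bound=0.05)       # p = 0.5
n_planned = sample_size_proportion(bound=0.05, p=0.2)     # assumed prior guess
print(n_conservative, n_planned)
```

With these assumed inputs the conservative choice requires 385 observations, while a planning value of p = 0.2 requires 246. The conservative size is safe when nothing is known about p in advance.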
CONFIDENCE INTERVAL ON THE MEAN OF A NORMAL
DISTRIBUTION, VARIANCE UNKNOWN
When sample sizes are small ($n < 30$), we must use another
procedure. The usual assumption is that the population is
normally distributed.
This leads to confidence intervals based on the t distribution.
Let $X_1, X_2, \ldots, X_n$ be a random sample from a normal
distribution with unknown mean $\mu$ and unknown variance $\sigma^2$.
The sampling distribution of the statistic
$$T = \frac{\bar{X} - \mu}{S / \sqrt{n}}$$
is the t distribution with (n-1) degrees of freedom.
$$P\left(-t_{\alpha/2,\,n-1} \le T \le t_{\alpha/2,\,n-1}\right) = 1 - \alpha$$
$$P\left(\bar{X} - t_{\alpha/2,\,n-1}\frac{S}{\sqrt{n}} \le \mu \le \bar{X} + t_{\alpha/2,\,n-1}\frac{S}{\sqrt{n}}\right) = 1 - \alpha$$
Definition : Confidence Interval on the mean of a Normal
Distribution, Variance unknown
If $\bar{x}$ and $s$ are the mean and standard deviation of a random
sample from a normal distribution with unknown variance $\sigma^2$,
then a $100(1-\alpha)$ percent confidence interval on $\mu$ is given by
$$\bar{x} - t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}$$
where $t_{\alpha/2,\,n-1}$ is the upper $\alpha/2$ percentage point of the t
distribution with (n-1) degrees of freedom.
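The t interval in the definition above can be sketched as follows. The Python standard library has no t quantile function, so this example takes the critical value as an input read from a t table; the data set, the function name `t_interval`, and the constant name `T_025_9` are assumptions for illustration. If SciPy is available, `scipy.stats.t.ppf(1 - alpha / 2, n - 1)` would return the same critical value.

```python
from math import sqrt
from statistics import mean, stdev


def t_interval(sample, t_crit):
    """Two-sided confidence interval on the mean, variance unknown.

    Hypothetical helper implementing xbar -/+ t * s / sqrt(n), where
    t_crit is the upper alpha/2 point of the t distribution with
    n - 1 degrees of freedom, supplied by the caller.
    """
    n = len(sample)
    xbar = mean(sample)
    s = stdev(sample)  # sample standard deviation (divisor n - 1)
    half_width = t_crit * s / sqrt(n)
    return xbar - half_width, xbar + half_width


# Made-up measurements (n = 10), assumed to come from a normal population.
data = [9.8, 10.2, 10.4, 9.6, 10.0, 10.1, 9.9, 10.3, 9.7, 10.0]
T_025_9 = 2.262  # table value: upper 0.025 point, 9 degrees of freedom
t_lo, t_hi = t_interval(data, T_025_9)
print(round(t_lo, 3), round(t_hi, 3))
```

For this made-up sample the mean is 10.0 and s is about 0.258, giving a 95% interval of roughly 9.815 to 10.185. The same data with the normal value 1.96 in place of 2.262 would give a narrower interval, which is why the t distribution matters for small n.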
CONFIDENCE INTERVAL ON THE VARIANCE
OF A NORMAL DISTRIBUTION
Suppose that we wish to find a confidence interval estimate
for the variance $\sigma^2$ of a normal population.
If $X_1, X_2, \ldots, X_n$ is a random sample of size n from this normal
population and if $S^2$ is the sample variance, then $S^2$ is a
reasonable point estimator of $\sigma^2$.
Furthermore, $S^2$ is used in finding the confidence interval for
$\sigma^2$. If the population is normal, the sampling distribution of :
$$X^2 = \frac{(n-1)S^2}{\sigma^2}$$
is chi-square with (n-1) degrees of freedom.
To develop the confidence interval :
$$P\left(\chi^2_{1-\alpha/2,\,n-1} \le X^2 \le \chi^2_{\alpha/2,\,n-1}\right) = 1 - \alpha$$
so that :
$$P\left(\chi^2_{1-\alpha/2,\,n-1} \le \frac{(n-1)S^2}{\sigma^2} \le \chi^2_{\alpha/2,\,n-1}\right) = 1 - \alpha$$
This last equation can be rearranged as :
$$P\left(\frac{(n-1)S^2}{\chi^2_{\alpha/2,\,n-1}} \le \sigma^2 \le \frac{(n-1)S^2}{\chi^2_{1-\alpha/2,\,n-1}}\right) = 1 - \alpha$$
Definition : Confidence Interval on the Variance of a Normal
distribution.
If $s^2$ is the sample variance from a random sample of n
observations from a normal distribution with unknown
variance $\sigma^2$, then a $100(1-\alpha)$ percent confidence interval on
$\sigma^2$ is :
$$\frac{(n-1)s^2}{\chi^2_{\alpha/2,\,n-1}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{1-\alpha/2,\,n-1}}$$
where $\chi^2_{\alpha/2,\,n-1}$ and $\chi^2_{1-\alpha/2,\,n-1}$ are the upper and the
lower $\alpha/2$ percentage points of the chi-square distribution with
(n-1) degrees of freedom respectively.
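The variance interval above can be sketched with table values for the chi-square percentage points, since the standard library has no chi-square quantile function. The name `variance_interval` and the inputs (sample variance 4.0 from n = 10 observations) are assumptions for illustration. If SciPy is available, `scipy.stats.chi2.ppf(1 - alpha / 2, n - 1)` and `scipy.stats.chi2.ppf(alpha / 2, n - 1)` would give the upper and lower points.

```python
def variance_interval(s2, n, chi2_upper, chi2_lower):
    """Two-sided confidence interval on the variance of a normal population.

    Hypothetical helper implementing
    (n-1) s^2 / chi2_{alpha/2, n-1} <= sigma^2
                                    <= (n-1) s^2 / chi2_{1-alpha/2, n-1}.
    chi2_upper and chi2_lower are the upper and lower alpha/2 points,
    supplied by the caller from a chi-square table.
    """
    if not chi2_upper > chi2_lower > 0:
        raise ValueError("expected chi2_upper > chi2_lower > 0")
    return (n - 1) * s2 / chi2_upper, (n - 1) * s2 / chi2_lower


# Assumed example: s^2 = 4.0, n = 10, 95% confidence (9 degrees of freedom).
CHI2_025_9 = 19.023  # table value: upper 0.025 point
CHI2_975_9 = 2.700   # table value: lower 0.025 point
v_lo, v_hi = variance_interval(4.0, 10, CHI2_025_9, CHI2_975_9)
print(round(v_lo, 2), round(v_hi, 2))
```

For the assumed inputs the interval is roughly 1.89 to 13.33. Note that the larger chi-square value produces the lower limit, and that the interval is not symmetric about $s^2$ because the chi-square distribution is skewed.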
ONE SIDED CONFIDENCE INTERVALS
A $100(1-\alpha)$ percent lower-confidence interval on $\sigma^2$ is
given by :
$$\frac{(n-1)s^2}{\chi^2_{\alpha,\,n-1}} \le \sigma^2$$
The $100(1-\alpha)$ percent upper-confidence interval is :
$$\sigma^2 \le \frac{(n-1)s^2}{\chi^2_{1-\alpha,\,n-1}}$$
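The one-sided bounds use the full $\alpha$ in a single tail rather than $\alpha/2$ in each. A short sketch under the same illustrative assumptions as before (sample variance 4.0, n = 10, table values for 9 degrees of freedom; the function names are hypothetical):

```python
def variance_lower_bound(s2, n, chi2_alpha):
    """Lower bound: (n-1) s^2 / chi2_{alpha, n-1} <= sigma^2."""
    return (n - 1) * s2 / chi2_alpha


def variance_upper_bound(s2, n, chi2_one_minus_alpha):
    """Upper bound: sigma^2 <= (n-1) s^2 / chi2_{1-alpha, n-1}."""
    return (n - 1) * s2 / chi2_one_minus_alpha


# Assumed example: s^2 = 4.0, n = 10, alpha = 0.05.
CHI2_05_9 = 16.919  # table value: upper 0.05 point, 9 degrees of freedom
CHI2_95_9 = 3.325   # table value: lower 0.05 point, 9 degrees of freedom
lower = variance_lower_bound(4.0, 10, CHI2_05_9)
upper = variance_upper_bound(4.0, 10, CHI2_95_9)
print(round(lower, 2), round(upper, 2))
```

For these assumed inputs the 95% lower bound is about 2.13 and the 95% upper bound is about 10.83. Each is a separate one-sided statement; together they do not form a 95% two-sided interval.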