of estimation for x MEAN is given by se = s/ n where


Estimation Procedures
Point Estimation
Confidence Interval Estimation
Three Properties of Point Estimators
1. Unbiasedness
2. Consistency
3. Efficiency
Estimate Number   Error (red estimates)   Error (green estimates)
1                 +6                      0
2                 +8                      0
3                 -10                     0
4                 +2                      -1
5                 -6                      0
The estimates in green are more efficient
(smaller standard error) but the estimates
in red are unbiased (their errors average to zero)
The Sampling Distribution of
xMEAN for ‘large’ samples
[Figure: sampling distribution of xMEAN; horizontal axis: xMEAN]
The standard error (s.e.) of estimation
for xMEAN is given by
s.e. = σ/√n
where σ is the population standard
deviation and n is the sample size
s.e. = σ/√n
Q. Why is the standard error (s.e.)
directly related to σ, the population standard deviation?
A. If the population is more varied
(dispersed) it is more difficult to locate
the ‘typical’ value
In which case are you likely to predict
the population mean more accurately?
1. The age distribution of all students in
English schools, or
2. The age distribution of all students in
English sixth form colleges?
s.e. = s /n
Q. Why is the s.e. inversely related to the
sample size?
A. The larger the n, the more
‘representative’ the sample is of the
population and hence the smaller
sampling error
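As a rough illustration of the two questions above, the sketch below evaluates s.e. = σ/√n for a few values. The function name `standard_error` and the particular numbers are illustrative assumptions, not part of the slides.

```python
import math


def standard_error(sigma: float, n: int) -> float:
    """Standard error of the sample mean: sigma / sqrt(n)."""
    if n <= 0:
        raise ValueError("sample size n must be positive")
    return sigma / math.sqrt(n)


# Baseline: sigma = 16, n = 36 (illustrative values).
se_base = standard_error(16, 36)
# A more dispersed population (sigma doubled) doubles the s.e.
se_more_spread = standard_error(32, 36)
# A sample four times as large halves the s.e.
se_bigger_sample = standard_error(16, 144)

print(round(se_base, 3), round(se_more_spread, 3), round(se_bigger_sample, 3))
```

Because n enters through a square root, halving the standard error takes four times as many observations.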
Confidence Interval (CI)
Sometimes, it is possible and
convenient to predict, with a certain
amount of confidence in the
prediction, that the true value of the
parameter lies within a specified
interval.
Such an interval is called a
Confidence Interval (CI)
The statement ‘[μL, μH] is the 95%
CI of μ’ is to be interpreted as follows:
with 95% chance an interval constructed
in this way contains the population
mean μ, and with 5% chance it
does not.
Two points to appreciate about the
CI
A. The larger the standard error, the
longer the CI, ceteris paribus
B. The higher the level of
confidence, the longer the CI,
ceteris paribus
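[Figure: standard normal curve. The area shaded orange, between z = -2.33 and z = +2.33, is approximately 98% of the whole.]
[Figure: standard normal curve. The area shaded orange, between z = -1.96 and z = +1.96, is approximately 95% of the whole.]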
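The cut-off points in these figures can be checked without a printed z table. The sketch below uses Python's standard-library `statistics.NormalDist`; the helper name `z_critical` is an assumption introduced here for illustration.

```python
from statistics import NormalDist


def z_critical(confidence: float) -> float:
    """Return z such that P(-z < Z < z) = confidence for a standard normal Z."""
    if not 0 < confidence < 1:
        raise ValueError("confidence must lie strictly between 0 and 1")
    tail = (1 - confidence) / 2  # area left in each tail
    return NormalDist().inv_cdf(1 - tail)


for level in (0.90, 0.95, 0.98):
    print(level, round(z_critical(level), 3))
```

The 98% value comes out as roughly 2.326, which printed tables usually round to 2.33.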
Example 1 (Confidence Interval
for the population mean):
Suppose that the result of
sampling yields the following:
xMEAN = 25; n = 36. Use this
information to construct a 95%
CI for μ, given that σ = 16
Since n > 24, we can say that xMEAN
is approximately Normal(μ, σ²/36).
Standardisation means that (xMEAN
- μ)/(σ/6) is approximately z.
Now find the two symmetric points
around 0 in the z table such that the
area between them is 0.95. The answer is
z = ±1.96.
Now solve
(xMEAN - μ)/(σ/6) = ±1.96, i.e.
(25 - μ)/(16/6) = ±1.96, to get
two values of μ = 19.77 and μ =
30.23. Thus, the 95% CI for μ is
[19.77, 30.23]
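The calculation in this example can be reproduced with a short sketch. It assumes a known population standard deviation and a large enough sample for the normal approximation; the function name `mean_ci_known_sigma` is hypothetical.

```python
import math
from statistics import NormalDist


def mean_ci_known_sigma(x_mean: float, sigma: float, n: int, confidence: float):
    """Confidence interval for the population mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    half_width = z * sigma / math.sqrt(n)               # z times the standard error
    return x_mean - half_width, x_mean + half_width


# Data from the example: sample mean 25, n = 36, sigma = 16, 95% confidence.
low, high = mean_ci_known_sigma(25, 16, 36, 0.95)
print(f"[{low:.2f}, {high:.2f}]")
```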
Question: How is the length of the CI
related to the standard error?
Answer: Ceteris paribus, the length of the CI
is directly related to the standard error
Example 2 (Confidence Interval
for the population mean):
Suppose that the result of
sampling yields the following:
xMEAN = 25; n = 36. Use this
information to construct a 95%
CI for μ, given that σ = 32
Now solve
(xMEAN - μ)/(σ/6) = ±1.96, i.e.
(25 - μ)/(32/6) = ±1.96, to get
two values of μ = 14.55 and μ =
35.45. Thus, the 95% CI for μ is
[14.55, 35.45]
Compare with the 95% CI
[19.77, 30.23] for σ = 16
Question: How is the length of the CI
related to the level of confidence?
Answer: Ceteris Paribus, the CI
will be longer the higher the level
of confidence.
Example 3 (Confidence Interval
for the population mean):
Suppose that the result of
sampling yields the following:
xMEAN = 25; n = 36. Use this
information to construct a 90%
CI for μ, given that σ = 16
Solve
(xMEAN - μ)/(σ/6) = ±1.645, i.e.
(25 - μ)/(16/6) = ±1.645, to get
two values of μ = 20.61 and μ =
29.39. Thus, the 90% CI for μ is
[20.61, 29.39]
Compare with the 95% CI
[19.77, 30.23]
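The two comparisons made in Examples 1 to 3 come down to the length of the interval, 2 × z × σ/√n. The sketch below tabulates that length for the three examples; the helper `ci_length` and the dictionary labels are assumptions made for illustration.

```python
import math
from statistics import NormalDist


def ci_length(sigma: float, n: int, confidence: float) -> float:
    """Length of the CI for the mean with known sigma: 2 * z * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return 2 * z * sigma / math.sqrt(n)


lengths = {
    "Example 1 (sigma=16, 95%)": ci_length(16, 36, 0.95),
    "Example 2 (sigma=32, 95%)": ci_length(32, 36, 0.95),
    "Example 3 (sigma=16, 90%)": ci_length(16, 36, 0.90),
}

for label, length in lengths.items():
    print(f"{label}: length = {length:.2f}")
```

Doubling σ doubles the length at a fixed confidence level, while lowering the confidence level from 95% to 90% shortens it.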
Some Procedural Problems in
Parametric Analysis
1. The sample size n is ‘small’
The CLT cannot be relied on! To do any kind of
parametric analysis we need the
population to be normally distributed
Case 1: The population standard
deviation σ is known
Theory: If X is normal(μ, σ²) then
xMEAN is also normal(μ, σ²/n)
Example 4 (Confidence Interval
for the population mean with
small samples):
Suppose that the result of
sampling from a normal
population with σ = 4 yields the
following:
xMEAN = 25; n = 18. Use this
information to construct the 90% CI
for μ.
Since X is normal(μ, 4²), xMEAN is
also normal(μ, 4²/18)
(xMEAN - μ)/(4/√18) = ±1.645
(25 - μ)/(4/√18) = ±1.645
μ = 26.55, or μ = 23.45
The required CI is [23.45, 26.55]
1. The sample size n is ‘small’
Case 2: The population standard deviation σ
is unknown
Theory: If X is normal(μ, σ²) then xMEAN is
also normal(μ, σ²/n) with σ unknown
Theory: If xMEAN is normal(μ, σ²/n) with σ
unknown, then (xMEAN - μ)/(s/√n)
has a t-distribution with (n-1) degrees of
freedom, where s is the sample standard deviation:
s ≡ √(Σfi(xi - xMEAN)²/(n-1)) for grouped
data
s ≡ √(Σ(xi - xMEAN)²/(n-1)) for raw data
Example 5 (Confidence Interval
for the population mean):
Suppose that the result of
sampling from a normal
population yields the following:
xMEAN = 25; n = 18. Use this information
to construct a 95% CI for μ, given that the sample variance s²
= 16
First, note that as σ is unknown, we use s = 4
for σ.
But since n < 24, we can only say that
(xMEAN - μ)/(s/√n) has a t-distribution with 17 degrees
of freedom.
Now find from the t-distribution table the
two symmetric values of t such that the
area in between them is 0.95.
The answer is t = ±2.11. Now
solve
(xMEAN - μ)/(s/√18) = ±2.11
(25 - μ)/(4/√18) = ±2.11
to get two values μL = 23.01 and
μH = 26.99. Thus the 95% CI for μ
is [23.01, 26.99].
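A minimal sketch of the t-based interval follows, using the stated data (n = 18, s² = 16, so s = 4 and the standard error is 4/√18). Python's standard library has no t-distribution quantile function, so the critical value 2.11 for 17 degrees of freedom is passed in as read from the table; the function name `mean_ci_t` is hypothetical.

```python
import math


def mean_ci_t(x_mean: float, s: float, n: int, t_crit: float):
    """CI for the mean when sigma is unknown.

    t_crit is the two-sided critical value for (n - 1) degrees of freedom,
    supplied by the caller from a t table.
    """
    half_width = t_crit * s / math.sqrt(n)
    return x_mean - half_width, x_mean + half_width


s = math.sqrt(16)  # sample standard deviation from the sample variance s^2 = 16
low, high = mean_ci_t(25, s, 18, t_crit=2.11)
print(f"[{low:.2f}, {high:.2f}]")
```

The only differences from the known-σ case are that s replaces σ and the t value replaces the z value.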
2. The population standard
deviation (σ) is unknown but the
sample size is ‘large’:
We estimate σ by either of two
estimates: s, the sample standard deviation with divisor (n-1), or the version with divisor N,
√(Σ(xi - xMEAN)²/N) for raw data,
and
√(Σfi(xi - xMEAN)²/N) for grouped
data
Then we proceed as in Example 1
above.
The Sampling Distribution of the
Sample proportion (p)
Suppose that the population proportion
π = 0.6 and consider the following
statistical process
Sample Number   Value of p
1               0.48
2               0.54
3               0.65
...             ...
100             0.5
[Figure sequence: density of the sample proportion p, centred on the population proportion π; horizontal axis: p (sample proportion), vertical axis: density.]
This is the distribution of p provided
nπ and n(1 - π) are ≥ 5
As n gets larger and larger, the distribution gets more compact
around the mean value (π)
[Figure: the distribution of the sample proportion
(p) for three sample sizes:
n1 < n2 < n3]
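The narrowing of the distribution as n grows can be seen in a small simulation. The sketch below draws repeated samples from a population with proportion 0.6; the sample sizes, the number of repetitions, the seed, and the function name `simulate_sample_proportions` are all illustrative assumptions.

```python
import random
import statistics


def simulate_sample_proportions(pi: float, n: int, num_samples: int, seed: int = 0):
    """Draw num_samples samples of size n; return the sample proportion of each."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        sum(rng.random() < pi for _ in range(n)) / n
        for _ in range(num_samples)
    ]


pi = 0.6
for n in (25, 100, 400):
    props = simulate_sample_proportions(pi, n, num_samples=2000)
    # The mean of the proportions stays near pi; their spread falls as n rises.
    print(n, round(statistics.mean(props), 3), round(statistics.stdev(props), 3))
```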
Properties of p
1. p is an unbiased estimator of the
population proportion π:
E(p) = π
2. The standard error of p (s.e.p) is
given by s.e.p = √{π(1-π)/n}
Since s.e.p shrinks towards zero as n grows, p is a consistent
estimator of π
Example 1 (Confidence Interval
for the population proportion):
Suppose that the result of
sampling yields the following:
p = 0.4; n = 36.
Use this information to construct a
98% CI for π.
First, we do the validity check.
This requires nπ ≥ 5 as well as n(1-π) ≥ 5.
Because we don’t know what π is, we
use p in the place of π.
Since p = 0.4 and n = 36, np = 14.4 and n(1-p) = 21.6, so the validity
check is satisfied.
We can therefore say that p is
approximately N(π, σ²/36) where σ² = π(1-π).
Standardisation means that (p-π)/(σ/6) is
approximately z.
Now find the two symmetric points
around 0 in the z table such that the area between them
is 0.98. The answer is
z = ±2.33.
Now solve
(p-π)/(σ/6) = ±2.33
(0.4-π)/(σ/6) = ±2.33
In this expression we do not know what
π is, so we don’t know what σ is.
We use 0.4 as a point estimator for π
and calculate an estimate for σ, s* = √(0.4 × 0.6) ≈
0.49
(0.4-π)/(0.49/6) = ±2.33 to get
two values πL = 0.21 and πH =
0.59.
Thus the 98% CI for π is [0.21,
0.59]
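The steps of this example (validity check, estimated standard error, critical value, interval) can be sketched as follows. The function name `proportion_ci` is an assumption, and the sample proportion stands in for the unknown population proportion throughout, as in the example.

```python
import math
from statistics import NormalDist


def proportion_ci(p_hat: float, n: int, confidence: float):
    """Normal-approximation CI for a population proportion."""
    # Validity check, using the sample proportion in place of the unknown one.
    if n * p_hat < 5 or n * (1 - p_hat) < 5:
        raise ValueError("need n*p >= 5 and n*(1-p) >= 5 for the normal approximation")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # estimated standard error of p
    return p_hat - z * se, p_hat + z * se


# Data from the example: p = 0.4, n = 36, 98% confidence.
low, high = proportion_ci(0.4, 36, 0.98)
print(f"[{low:.2f}, {high:.2f}]")
```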