Chapter 2 - Confidence Interval / Estimation
Chapter 2
Statistical Inference
Estimation
- Confidence interval estimation for mean and proportion
- Determining sample size
Hypothesis Testing
- Test for one and two means
- Test for one and two proportions
Statistical Inference
Statistical inference is the process of drawing conclusions from data
using statistical methods. It is concerned with drawing conclusions about
the characteristics of a population based on the information
contained in a sample. Since populations are characterized by
numerical descriptive measures called parameters,
statistical inference is concerned with making inferences about
population parameters.
ESTIMATION
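In estimation, two terms should be understood first: estimator
and estimate. An estimator is the sample statistic (the rule or
formula, such as X̄) used to estimate a population parameter; an
estimate is the particular value the estimator takes for a given sample.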
An estimate of a population parameter may be expressed in
two ways: point estimate and interval estimate.
Point Estimate
A point estimate of a population parameter is a single value
of a statistic. For example, the sample mean x̄ is a point
estimate of the population mean μ. Similarly, the sample
proportion p̂ is a point estimate of the population
proportion p.
Interval estimate
An interval estimate is defined by two numbers, between
which a population parameter is said to lie.
For example, a < μ < b is an interval estimate of the
population mean μ. It indicates that the population mean is
greater than a but less than b.
Point estimators
Choosing the right point estimator for a parameter
depends on the properties of the estimator itself. Four
properties are usually considered when judging whether an
estimator is a good (best) estimator.
The properties are:
Unbiased
Consistent
Efficient
Sufficient
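To make the first property concrete, here is a small Python simulation sketch (standard library only). The population N(0, 2²), the sample size n = 5 and the function name `average_variance_estimates` are illustrative assumptions, not part of the notes. It averages two estimators of σ² over many samples: dividing the sum of squares by n (`statistics.pvariance`) and dividing by n − 1 (`statistics.variance`). An unbiased estimator should average close to the true σ² = 4; the divide-by-n version settles near (n − 1)σ²/n = 3.2 instead.

```python
import random
from statistics import fmean, pvariance, variance


def average_variance_estimates(n=5, reps=10_000, sigma=2.0, seed=0):
    """Average two variance estimators over `reps` normal samples of size n.

    The population N(0, sigma^2) and all defaults are illustrative
    assumptions; the true variance is sigma**2.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    divide_by_n, divide_by_n_minus_1 = [], []
    for _ in range(reps):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        divide_by_n.append(pvariance(sample))         # sum of squares / n
        divide_by_n_minus_1.append(variance(sample))  # sum of squares / (n - 1)
    return fmean(divide_by_n), fmean(divide_by_n_minus_1)


biased_avg, unbiased_avg = average_variance_estimates()
print(f"divide by n:     {biased_avg:.3f}")
print(f"divide by n - 1: {unbiased_avg:.3f}  (true variance = 4)")
```

This is why the sample variance s² is normally computed with n − 1 in the denominator.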
Confidence Interval
• A confidence interval is a range of values constructed from sample
data so that the population parameter is likely to lie within that
range with a specified probability.
• The specified probability is called the level of confidence.
• It states how much confidence we have that the interval
contains the true population parameter. The confidence level is
denoted by $(1-\alpha)100\%$.
• Example: a 95% level of confidence means that if 100
confidence intervals were constructed, each based on a
different sample from the same population, we would expect
about 95 of the intervals to contain the population mean.
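The repeated-sampling interpretation in the last bullet can be checked with a short Python simulation sketch. It assumes a normal population with hypothetical values μ = 50 and a known σ = 10, samples of size n = 36, and uses 1,000 intervals instead of 100 so the observed proportion is more stable. `NormalDist().inv_cdf` supplies z_{α/2} in place of the printed table; the function name `count_covering_intervals` is only illustrative.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def count_covering_intervals(num_intervals=1000, n=36, mu=50.0, sigma=10.0,
                             conf=0.95, seed=1):
    """Build num_intervals z-intervals from independent samples and count
    how many contain the true mean mu (sigma is treated as known)."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{alpha/2}, about 1.96
    half_width = z * sigma / sqrt(n)
    hits = 0
    for _ in range(num_intervals):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits


hits = count_covering_intervals()
print(f"{hits} of 1000 intervals contain mu ({hits / 10:.1f}%)")
```

Any single interval either contains μ or it does not; the 95% describes the long-run proportion of intervals that do.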
To compute a confidence interval for μ, we consider
two situations:
i. We use sample data to estimate μ with X̄, and the
population standard deviation σ is known.
ii. We use sample data to estimate μ with X̄, and the
population standard deviation σ is unknown. In this
case, we substitute the sample standard deviation
(s) for the population standard deviation (σ).
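Example 2.1:
Find a 95% confidence interval for a population mean for these
values:
a) $n = 36$, $\bar x = 13.3$, $s^2 = 3.42$
b) $n = 64$, $\bar x = 2.73$, $s^2 = 0.1047$
a) 1st Step: $(1-\alpha)100\% = 95\% \Rightarrow 1-\alpha = 0.95 \Rightarrow \alpha = 0.05 \Rightarrow \alpha/2 = 0.025$
2nd Step: Find $z_{\alpha/2}$ from the table (page 26): $z_{0.025} = 1.96$
3rd Step: Use the formula, with $s = \sqrt{3.42} = 1.8493$ (n = 36 ≥ 30, so s replaces σ):
$$CI = \bar x \pm z_{\alpha/2}\frac{s}{\sqrt{n}} = 13.3 \pm 1.96\left(\frac{1.8493}{\sqrt{36}}\right) = 13.3 \pm 0.6041 = (12.6959,\ 13.9041)$$
The 95% confidence interval for the mean lies between 12.6959 and
13.9041.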
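The three steps above can be sketched in Python (standard library only). `NormalDist().inv_cdf` stands in for the z table, and the helper names `z_critical` and `mean_ci_z` are hypothetical. Because the example gives the sample variance s², its square root is passed as the standard deviation. Part (b) is computed the same way.

```python
from math import sqrt
from statistics import NormalDist


def z_critical(conf):
    """z_{alpha/2} for a (1 - alpha) confidence level, e.g. 0.95 -> about 1.96."""
    alpha = 1 - conf
    return NormalDist().inv_cdf(1 - alpha / 2)


def mean_ci_z(xbar, sd, n, conf):
    """xbar +/- z_{alpha/2} * sd / sqrt(n); sd is sigma, or s when n >= 30."""
    margin = z_critical(conf) * sd / sqrt(n)
    return xbar - margin, xbar + margin


# Example 2.1: the data give s^2, so take the square root first.
ci_a = mean_ci_z(13.3, sqrt(3.42), 36, 0.95)
ci_b = mean_ci_z(2.73, sqrt(0.1047), 64, 0.95)
print(f"(a) ({ci_a[0]:.4f}, {ci_a[1]:.4f})")
print(f"(b) ({ci_b[0]:.4f}, {ci_b[1]:.4f})")
```

The exact z value (1.95996...) and the table value 1.96 agree to four decimal places in the limits for part (a).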
Example 2.2:
The mean and standard deviation of the maximum loads
supported by a sample of 60 cables are 11.09 tons and
0.73 tons, respectively. Find a 95% confidence interval for the mean of the
maximum loads of all cables produced by the company.
Example 2.3:
The brightness of a television picture tube can be evaluated by
measuring the amount of current required to achieve a particular
brightness level. A random sample of 10 tubes gave a sample
mean of 317.2 microamps and a sample standard deviation of
15.7 microamps. Find (in microamps) a 99% confidence interval
estimate for the mean current required to achieve a particular
brightness level.
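Solution:
$\bar x = 317.2$, $s = 15.7$, $n = 10 < 30$ (σ is unknown and the sample is small, so the t distribution is used).
For a 99% CI: $(1-\alpha)100\% = 99\% \Rightarrow 1-\alpha = 0.99 \Rightarrow \alpha = 0.01 \Rightarrow \alpha/2 = 0.005$
From the t distribution table: $t_{\alpha/2,\,n-1} = t_{0.005,\,9} = 3.250$
Hence the 99% CI is
$$\bar x \pm t_{0.005,\,9}\frac{s}{\sqrt{n}} = 317.2 \pm 3.250\left(\frac{15.7}{\sqrt{10}}\right) = (301.0645,\ 333.3355) \text{ microamps}$$
Thus, we are 99% confident that the mean current required to
achieve a particular brightness level is between 301.0645 and
333.3355 microamps.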
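For small samples with σ unknown, only the critical value changes. The Python standard library has no t distribution, so this sketch takes t_{α/2, n−1} as an argument read from a t table, exactly as in the solution; the function name `mean_ci_t` is hypothetical.

```python
from math import sqrt


def mean_ci_t(xbar, s, n, t_crit):
    """xbar +/- t_{alpha/2, n-1} * s / sqrt(n), with t_crit read from a t table."""
    margin = t_crit * s / sqrt(n)
    return xbar - margin, xbar + margin


# Example 2.3: n = 10 < 30 and sigma unknown; t_{0.005, 9} = 3.250 from the table.
low, high = mean_ci_t(317.2, 15.7, 10, 3.250)
print(f"({low:.4f}, {high:.4f}) microamps")
```

If SciPy is available, `scipy.stats.t.ppf(0.995, df=9)` gives the same critical value (about 3.2498) without a printed table.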
Exercise 2.1:
Taking a random sample of 35 individuals waiting to be
serviced by the teller, we find that the mean waiting
time was 22.0 min and the standard deviation was 8.0
min. Using a 90% confidence level, estimate the mean
waiting time for all individuals waiting in the service
line.
Answer : [19.7757, 24.2243]
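Confidence Interval Estimates for the Difference Between Two
Population Means, $\mu_1 - \mu_2$
i) If the variances $\sigma_1^2$ and $\sigma_2^2$ are known:
$$(\bar X_1 - \bar X_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
ii) If the population variances $\sigma_1^2$ and $\sigma_2^2$ are unknown, the
following formulas may be used, depending on the sample sizes and
the assumption about the population variances.
Large samples ($n_1 \ge 30$, $n_2 \ge 30$):
- Unequal variances ($\sigma_1^2 \ne \sigma_2^2$): $(\bar X_1 - \bar X_2) \pm z_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
- Equal variances ($\sigma_1^2 = \sigma_2^2$): $(\bar X_1 - \bar X_2) \pm z_{\alpha/2}\,S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$, where $S_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}$
Small samples ($n_1 < 30$, $n_2 < 30$):
- Unequal variances ($\sigma_1^2 \ne \sigma_2^2$): $(\bar X_1 - \bar X_2) \pm t_{\alpha/2,\,v}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$, where $v = \dfrac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1-1} + \frac{(s_2^2/n_2)^2}{n_2-1}}$
- Equal variances ($\sigma_1^2 = \sigma_2^2$): $(\bar X_1 - \bar X_2) \pm t_{\alpha/2,\,v}\,S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$, where $S_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}$ and $v = n_1 + n_2 - 2$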
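The two small-sample cases can be sketched in Python as follows (standard library only; all function names are hypothetical). The t critical values are passed in from a t table, as elsewhere in this chapter. As a check, the pooled formula reproduces the stated answer to Exercise 2.2, and the unequal-variance formula reproduces the stated answer to Exercise 2.7 when v ≈ 22.36 is rounded down to 22, a common conservative convention.

```python
from math import sqrt


def pooled_sd(s1, n1, s2, n2):
    """S_p = sqrt(((n1 - 1)s1^2 + (n2 - 1)s2^2) / (n1 + n2 - 2))."""
    return sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))


def welch_df(s1, n1, s2, n2):
    """Degrees of freedom v for the unequal-variance case."""
    a, b = s1**2 / n1, s2**2 / n2
    return (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))


def diff_ci_pooled(x1, s1, n1, x2, s2, n2, t_crit):
    """Equal variances, small samples; t_crit = t_{alpha/2, n1 + n2 - 2}."""
    margin = t_crit * pooled_sd(s1, n1, s2, n2) * sqrt(1 / n1 + 1 / n2)
    return (x1 - x2) - margin, (x1 - x2) + margin


def diff_ci_unpooled(x1, s1, n1, x2, s2, n2, t_crit):
    """Unequal variances, small samples; t_crit = t_{alpha/2, v}, v from welch_df."""
    margin = t_crit * sqrt(s1**2 / n1 + s2**2 / n2)
    return (x1 - x2) - margin, (x1 - x2) + margin


# Exercise 2.2 (equal variances): v = 17 + 20 - 2 = 35, t_{0.025,35} = 2.0301 from a table.
pooled_ci = diff_ci_pooled(82, 8, 17, 76, 6, 20, 2.0301)

# Exercise 2.7 (unequal variances): v is about 22.36, rounded down to 22,
# and t_{0.01,22} = 2.508 from a table.
v = welch_df(10.6, 15, 6.1, 15)
unpooled_ci = diff_ci_unpooled(400.9, 10.6, 15, 367.2, 6.1, 15, 2.508)

print(f"Exercise 2.2: ({pooled_ci[0]:.4f}, {pooled_ci[1]:.4f})")
print(f"Exercise 2.7: v = {v:.2f}, ({unpooled_ci[0]:.4f}, {unpooled_ci[1]:.4f})")
```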
Example 2.3:
Two machines are used to fill plastic bottles with liquid laundry
detergent. The standard deviations of fill volume are known to be
0.10 and 0.15 fluid ounce for the two machines, respectively.
Two random samples, $n_1 = 14$ bottles from machine 1 and $n_2 = 12$
bottles from machine 2, are selected, and the sample mean fill volumes
are $\bar x_1 = 30.5$ and $\bar x_2 = 29.4$ fluid ounces.
Construct a 90% confidence interval on the mean difference in fill
volumes. Interpret the results.
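Solution:
$(1-\alpha)100\% = 90\% \Rightarrow 1-\alpha = 0.90 \Rightarrow \alpha = 0.1 \Rightarrow \alpha/2 = 0.05$, so $z_{0.05} = 1.6449$
Machine 1: $\bar x_1 = 30.5$, $\sigma_1 = 0.10$, $n_1 = 14$
Machine 2: $\bar x_2 = 29.4$, $\sigma_2 = 0.15$, $n_2 = 12$
$$(\bar x_1 - \bar x_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = (30.5 - 29.4) \pm z_{0.05}\sqrt{\frac{0.10^2}{14} + \frac{0.15^2}{12}}$$
$$= 1.1 \pm 1.6449(0.0509) = (1.0163,\ 1.1837)$$
We are 90% confident that the mean difference in fill volumes
lies between 1.0163 and 1.1837 fluid ounces. Since the whole interval
lies above zero, machine 1 appears to fill bottles with more detergent,
on average, than machine 2.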
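A minimal Python sketch of the known-variance formula, applied to the bottle-filling data (standard library only; the function name `diff_ci_known_var` is hypothetical, and `NormalDist().inv_cdf` replaces the z table):

```python
from math import sqrt
from statistics import NormalDist


def diff_ci_known_var(x1, sigma1, n1, x2, sigma2, n2, conf):
    """(x1 - x2) +/- z_{alpha/2} * sqrt(sigma1^2/n1 + sigma2^2/n2)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # about 1.6449 for 90%
    margin = z * sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    d = x1 - x2
    return d - margin, d + margin


low, high = diff_ci_known_var(30.5, 0.10, 14, 29.4, 0.15, 12, 0.90)
print(f"({low:.4f}, {high:.4f}) fluid ounces")
```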
Exercise 2.2:
Seventeen male undergraduate students and 20 female undergraduate
students are randomly selected from the Faculty of Mechanical
Engineering. Their results for Test 2 of SSM 3763 are as follows:
Male: $\bar X_M = 82$, $S_M = 8$
Female: $\bar X_F = 76$, $S_F = 6$
Assume that both populations are normally distributed and have
equal population variances. Construct a 95% confidence
interval for the difference in the two means.
Answer : [1.3217, 10.6783]
Example 2.4:
According to a poll, 40% of working women say that they feel
stress at work. The poll was based on a random sample of
1502 working women aged 30 and above. Construct a 95%
confidence interval for the corresponding population
proportion.
Solution:
Let p be the proportion of all working women aged 30 and
above who feel stress at work, and let $\hat p$ be
the corresponding sample proportion. From the given
information,
$n = 1502$, $\hat p = 0.40$, $\hat q = 1 - \hat p = 1 - 0.40 = 0.60$
Hence, the 95% CI is
$$\hat p \pm z_{\alpha/2}\sqrt{\frac{\hat p \hat q}{n}} = 0.40 \pm z_{0.025}\sqrt{\frac{(0.4)(0.6)}{1502}} = 0.40 \pm 1.96(0.01264069)$$
$$= (0.375,\ 0.425) \text{ or } 37.5\% \text{ to } 42.5\%$$
Thus, we can state with 95% confidence that the proportion of
all working women aged 30 and above who feel stress at work
is between 37.5% and 42.5%.
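The large-sample interval for one proportion can be sketched in Python like this (standard library only; `proportion_ci` is a hypothetical name, and `NormalDist().inv_cdf` replaces the z table):

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(p_hat, n, conf):
    """p_hat +/- z_{alpha/2} * sqrt(p_hat * (1 - p_hat) / n)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Example 2.4: n = 1502 working women, p_hat = 0.40
low, high = proportion_ci(0.40, 1502, 0.95)
print(f"({low:.4f}, {high:.4f})")
```

Passing `28 / 70` and `70` instead gives the interval asked for in Exercise 2.3.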
Exercise 2.3
In a random sample of 70 automobiles registered in a
certain state, 28 of them were found to have emission
levels that exceed a state standard. Find a 95%
confidence interval for the proportion of automobiles in
the state whose emission levels exceed the standard.
Answer : [0.2852, 0.5148]
Example 2.5:
Two separate surveys were carried out to investigate whether
or not the users of the Plus highway were in favour of raising the
speed limit on highways. Of the 250 car drivers interviewed,
220 were in favour of raising the speed limit, while of the 200
motorists interviewed, 180 were in favour of raising the speed
limit. Find a 95% confidence interval for the difference in
proportions between the car drivers and motorists who are in
favour of raising the speed limit.
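Solution:
$\hat p_c = \frac{220}{250} = 0.88$, $\hat p_m = \frac{180}{200} = 0.90$
Hence, the 95% CI is
$$(\hat p_c - \hat p_m) \pm z_{\alpha/2}\sqrt{\frac{\hat p_c \hat q_c}{n_c} + \frac{\hat p_m \hat q_m}{n_m}} = (0.88 - 0.90) \pm z_{0.025}\sqrt{\frac{(0.88)(0.12)}{250} + \frac{(0.90)(0.10)}{200}}$$
$$= -0.02 \pm 1.96(0.02954) = (-0.0779,\ 0.0379)$$
We are 95% confident that the difference between the proportions of car
drivers and motorists who are in favour of raising the speed
limit lies between -0.0779 and 0.0379. Since the interval contains
zero, the data do not indicate a difference between the two proportions.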
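A Python sketch of the difference-of-proportions interval (standard library only; `two_proportion_ci` is a hypothetical name). The standard error is carried unrounded (about 0.0295); rounding it to 0.03 before multiplying by z would widen the limits slightly, to about (−0.0788, 0.0388).

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ci(x1, n1, x2, n2, conf):
    """(p1 - p2) +/- z_{alpha/2} * sqrt(p1 q1 / n1 + p2 q2 / n2)."""
    p1, p2 = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    d = p1 - p2
    return d - z * se, d + z * se


# Example 2.5: 220 of 250 car drivers vs 180 of 200 motorists
low, high = two_proportion_ci(220, 250, 180, 200, 0.95)
print(f"({low:.4f}, {high:.4f})")
```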
Exercise 2.4
In a test of the effect of dampness on electric
connections, 100 electric connections were tested
under damp conditions and 150 were tested under dry
conditions. Twenty of the damp connections failed and
only 10 of the dry ones failed. Find a 90% confidence
interval for the difference between the proportions of
connections that fail when damp as opposed to dry.
Answer : [0.0591, 0.207]
Error of estimation and choosing the sample size
When we estimate a parameter, all we have is the estimated value
computed from the n measurements contained in the sample. Two
questions usually arise:
(i) How far will our estimate lie from the true value of the
parameter?
(ii) How many measurements should be included in the
sample?
The distance between an estimate and the estimated parameter is
called the error of estimation.
For example, since about 95% of estimates lie within 1.96 standard
deviations of the true value of the parameter, we would
expect the error of estimation to be less than 1.96 standard
deviations of the estimator, with probability approximately
equal to 0.95.
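Let $B$ (also written $E$) be the largest error of estimation we are
willing to accept with confidence $(1-\alpha)100\%$. For the mean, with σ
known, $B = z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$, so the required sample size is
$$n = \left(\frac{z_{\alpha/2}\,\sigma}{B}\right)^2,$$
where n is rounded up to the next whole number.
For a proportion, $B = z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}$, so
$$n = \frac{z_{\alpha/2}^2\,p(1-p)}{B^2},$$
again rounded up to the next whole number.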
Example 2.6:
The college president asks the statistics teacher to estimate the
average age of the students at their college. The statistics teacher
would like to be 99% confident that the estimate is
accurate to within 1 year. From a previous study, the standard
deviation of the ages is known to be 3 years.
How large a sample is necessary?
Solution:
B 1, s 3, confidence coefficient 99%, thus 1 0.99
From the table,
Z0.005 2.5758
Z
2
s
3
B: Z 0.005
1
n
n
2.5758
3
1
n
n 59.71 60 student
0.01,
2
0.005
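Both sample-size formulas fit in a short Python sketch (standard library only; the function names are hypothetical). `math.ceil` performs the "round up to the next whole number" step, and `NormalDist().inv_cdf` replaces the z table. When no prior estimate of p is available, p = 0.5 is a common conservative choice because it maximizes p(1 − p).

```python
from math import ceil
from statistics import NormalDist


def z_critical(conf):
    """z_{alpha/2} for a (1 - alpha) confidence level."""
    return NormalDist().inv_cdf(1 - (1 - conf) / 2)


def sample_size_mean(sigma, bound, conf):
    """n = (z_{alpha/2} * sigma / B)^2, rounded up."""
    return ceil((z_critical(conf) * sigma / bound) ** 2)


def sample_size_proportion(p, bound, conf):
    """n = z_{alpha/2}^2 * p * (1 - p) / B^2, rounded up."""
    return ceil(z_critical(conf) ** 2 * p * (1 - p) / bound**2)


print(sample_size_mean(3, 1, 0.99))    # Example 2.6
print(sample_size_mean(8, 1.5, 0.95))  # Exercise 2.5
print(sample_size_proportion(0.5, 0.05, 0.95))
```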
Exercise 2.5:
The diameter of a two-year-old Sentang tree is normally
distributed with a standard deviation of 8 cm. How many
trees should be sampled if it is required to estimate the
mean diameter to within ± 1.5 cm with 95% confidence?
Answer : 110 trees
EXERCISES
Exercise 2.6
A tire manufacturer wishes to investigate the tread life
of its tires. A sample of 10 tires driven 50,000 miles
revealed a sample mean of 0.32 inches of tread
remaining with a standard deviation of 0.09 inches.
Construct a 95 percent confidence interval for the
population mean. Would it be reasonable for the
manufacturer to conclude that after 50,000 miles the
population mean amount of tread remaining is 0.30
inches?
Answer : [0.2556, 0.3844]
Exercise 2.7
Resin-based composites are used in restorative dentistry.
The surface hardness of specimens cured for 40 seconds with
constant power was compared with that of specimens cured for
40 seconds with exponentially increasing power. Fifteen specimens
were cured with each method. Those cured with constant power
had an average surface hardness (in N/mm) of 400.9 with a
standard deviation of 10.6. Those cured with exponentially
increasing power had an average surface hardness of 367.2 with
a standard deviation of 6.1. Find a 98% confidence interval for
the difference in mean hardness between specimens cured by the
two methods.
Answer: [25.7804, 41.6196]
Exercise 2.8
The wedding ceremony for a couple, Jamie and Robbin, will be
held in Menara Kuala Lumpur. A survey has been carried out
to determine the proportion of people who will come to the
ceremony. Of 250 people invited, 180 agreed to
attend the ceremony. Find a 90% confidence interval estimate
for the proportion of all people who will attend the ceremony.
Answer : [0.6733, 0.7767]