Statistical Inference

Chapter 7
Statistical Inference and Sampling
Normal Curve for Population
• Individual observations, X's, follow a normal distribution with mean μ and standard deviation σ. The following figure portrays the shape of the normal population.
[Figure: normal curve for the population, centered at μ on the x-axis]
That is, X is a normal random variable. The corresponding standard normal variable Z can be obtained by the following.
$Z = \frac{X - \mu}{\sigma}$
Examples on Normal Curve for Population
• The estimated miles-per-gallon ratings of a class of trucks are normally distributed with a mean of 12.8 and a standard deviation of 3.2. What is the probability that one of these trucks selected at random would get between 13 and 15 miles per gallon?
[Figure: normal curve with μ = 12.8 and the area between X = 13 and X = 15 shaded; on the Z axis the same area lies between z1 and z2, to the right of 0]
P(13 ≤ X ≤ 15) = ?
$z_1 = \frac{X - \mu}{\sigma} = \frac{13 - 12.8}{3.2} = 0.0625 \approx 0.06$
So, P(0 ≤ z ≤ z1) = 0.0239. That is, the area from the mean to z1 = 0.0239.
$z_2 = \frac{X - \mu}{\sigma} = \frac{15 - 12.8}{3.2} = 0.6875 \approx 0.69$
So, P(0 ≤ z ≤ z2) = 0.2549. That is, the area from the mean to z2 = 0.2549.
P(z1 ≤ z ≤ z2) = ? That is, the area from z1 to z2 = ?
So, the area from z1 to z2 = 0.2549 − 0.0239 = 0.2310
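The table lookup above can be cross-checked numerically. The following is a minimal sketch using Python's standard-library `statistics.NormalDist`; the variable names are illustrative and not from the slides. Because it does not round z to two decimals, its result is close to, but not exactly, the table-based 0.2310.

```python
from statistics import NormalDist

# Truck mileage: normal with mean 12.8 mpg and standard deviation 3.2 mpg.
mpg = NormalDist(mu=12.8, sigma=3.2)

# P(13 <= X <= 15) is the difference of the two cumulative probabilities.
prob_13_to_15 = mpg.cdf(15) - mpg.cdf(13)

# z-scores as on the slide, before rounding to two decimals.
z1 = (13 - 12.8) / 3.2
z2 = (15 - 12.8) / 3.2

print(f"z1 = {z1:.4f}, z2 = {z2:.4f}, P(13 <= X <= 15) = {prob_13_to_15:.4f}")
```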
Examples on Normal Curve for Population
• The examination committee of the American Society for Quality passes 40% of those that take the exam. If the scores follow a normal distribution with an average score of 75 and a standard deviation of 16, what is the minimum passing score?
[Figure: normal curve with mean 75 and the upper 40% of the area shaded to the right of the unknown score X; on the Z axis the same 40% lies to the right of the unknown z]
P(X ≥ ?) = 40% = 0.40, which is the same as P(z ≥ ?) = 40% = 0.40
The area from the mean to z = 0.50 − 0.40 = 0.10
So, z = 0.26 [From Normal Dist. Table]
$0.26 = \frac{X - 75}{16}$
X = 75 + (16)(0.26)
X = 75 + 4.16 = 79.16
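The reverse lookup (from an area to a score) can also be sketched in code. This example assumes Python's standard-library `NormalDist.inv_cdf`; the names are illustrative. It uses the exact quantile rather than the rounded table value z = 0.26, so the cutoff it prints is slightly below the slide's 79.16.

```python
from statistics import NormalDist

scores = NormalDist(mu=75, sigma=16)

# 40% pass, so the cutoff leaves 60% of the scores below it.
pass_rate = 0.40
cutoff = scores.inv_cdf(1 - pass_rate)

# Same calculation via the z-score: X = mu + z * sigma.
z = NormalDist().inv_cdf(1 - pass_rate)
cutoff_from_z = 75 + z * 16

print(f"z = {z:.4f}, minimum passing score = {cutoff:.2f}")
```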
Estimation
• Statistical estimation is the process of estimating a parameter of a population from a corresponding sample statistic.
• Example: Usually population means (μ) are unknown and have to be estimated from sample means (X-bar).
• Two Approaches to Statistical Estimation
Point estimate: A single value that represents the best estimate of the population value. For example, the sample mean (X-bar) is the best point estimate for the population mean (μ). Similarly, the sample standard deviation (s) is the best point estimate for the population standard deviation (σ). That is, μ is estimated by X-bar, and σ is estimated by s.
Interval estimation: Builds on the point estimate to arrive at a range of values that we are confident contains the population parameter. The range of values is called a confidence interval. For example, the confidence interval for the population mean (μLL ≤ μ ≤ μUL) can be estimated from the sample mean.
[Figure: X-bar at the center of an interval running from μLL to μUL]
Note that μLL and μUL are equidistant from X-bar, and are estimated from X-bar.
Distribution of X-bar
• X-bar is a random variable, because different samples drawn from the same population on a specific characteristic will result in different values of X-bar.
• Since the sample mean, X-bar, is used to estimate the population mean, μ, we need to understand how X-bar behaves. That is, if we observe values of X-bar indefinitely, where will they center and how will they spread out?
• For large samples, X-bar is approximately normally distributed regardless of the shape of the sampled population. That is, if we observe values of X-bar indefinitely and plot these values in a graph, we will obtain an approximately normal curve. If the population itself is normal, X-bar is normally distributed for any sample size.
• The distribution of X-bar is based on the Central Limit Theorem. The Central Limit Theorem states that when obtaining large samples (generally n > 30) from any population with a finite standard deviation, the sample mean, X-bar, will follow an approximately normal distribution.
• The probability distribution of X-bar is called the sampling distribution of X-bar.
Sampling Distribution of the Sample Mean
• The mean of the distribution of X-bar is denoted by μX-bar and equals μ. That is, $\mu_{\bar{X}} = \mu$.
• The standard deviation of the distribution (denoted by σX-bar) equals σ/SQRT(n). That is, $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$. The standard deviation of the distribution is called the standard error.
[Figure: X-bar follows a normal distribution, centered at μ with a standard deviation of σ/√n]
• The corresponding standard normal variable Z of X-bar can be obtained by the following.
$Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$
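The center and spread of X-bar stated above can be checked by simulation. The sketch below is only an illustration: the exponential population, its mean of 10, the sample size of 36, the seed and all variable names are assumptions chosen for the demonstration, not values from the slides. It draws many samples and compares the mean and standard deviation of the observed sample means with μ and σ/√n.

```python
import random
import statistics

random.seed(7)  # fixed seed so the run is repeatable

# Hypothetical skewed population: exponential with mean 10 (its sigma is also 10).
mu, sigma = 10.0, 10.0
n = 36             # sample size, above the n > 30 rule of thumb
num_samples = 5000

# One X-bar per simulated sample.
sample_means = [
    statistics.fmean(random.expovariate(1 / mu) for _ in range(n))
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.pstdev(sample_means)
standard_error = sigma / n ** 0.5

print(f"mean of X-bar: {mean_of_means:.3f} (mu = {mu})")
print(f"sd of X-bar:   {sd_of_means:.3f} (sigma/sqrt(n) = {standard_error:.3f})")
```

With these settings the two printed pairs should agree closely, although the exact digits depend on the seed.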
Normal Curves for Population and Sample Mean
Population (mean = μ, standard deviation = σ): X = value from this population. Assumes the individual observations follow a normal distribution.
Random sample (mean = X-bar, standard deviation = s): X-bar follows a normal distribution, centered at μ with a standard deviation of σ/√n.
[Figure: two normal curves centered at μ, one for X with standard deviation σ and one for X-bar with μX-bar = μ and σX-bar = σ/√n]
Example of Normal Population and Sampling Distribution of Mean
• The life span of Good Old Everglo Bulbs follows a normal distribution with a mean life of 400 hours and a standard deviation of 30 hours.
a) What percentage of bulbs sold would you expect to last more than 445 hours?
b) What is the probability that 4 bulbs selected at random will have an average life span of more than 445 hours?
[Figure: population curve with μ = 400, σ = 30 and the area P(X > 445) shaded; sampling-distribution curve with μX-bar = μ = 400, σX-bar = 30/√4 and the area P(X-bar > 445) shaded]
a) For a single bulb:
$Z = \frac{X - \mu}{\sigma} = \frac{445 - 400}{30} = 1.5$
P(X > 445) = P(Z > 1.5) = 0.5 − 0.4332 = 0.0668, that is, about 6.68% of bulbs.
b) For the mean of 4 bulbs:
$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{445 - 400}{30/\sqrt{4}} = \frac{45}{15} = 3.0$
P(X-bar > 445) = P(Z > 3.0) = 0.5 − 0.4987 = 0.0013
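The contrast between a single observation and a sample mean can be reproduced in a few lines. This is a sketch using Python's standard-library `NormalDist`, with illustrative variable names; the only change between parts (a) and (b) is the standard deviation passed to the distribution.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 400, 30, 4

# (a) One bulb: X ~ Normal(mu, sigma).
single_bulb = NormalDist(mu, sigma)
p_single = 1 - single_bulb.cdf(445)

# (b) Mean of n bulbs: X-bar ~ Normal(mu, sigma / sqrt(n)).
mean_of_four = NormalDist(mu, sigma / sqrt(n))
p_mean = 1 - mean_of_four.cdf(445)

print(f"P(X > 445)     = {p_single:.4f}")
print(f"P(X-bar > 445) = {p_mean:.4f}")
```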
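Confidence Intervals (CI) for Population Mean
• According to the distribution of X-bar, the mean of all possible values of X-bar gives the population mean. Then why estimate the population mean? Because in practice we observe only one sample, and therefore only one value of X-bar, not all possible values.
[Figure: sampling distribution of X-bar, centered at μX-bar = μ with standard error σX-bar = σ/√n]
• The CI for μ builds on the sample mean to arrive at a range of values that, at a stated level of confidence, will include the population mean. The boundaries of these values are called confidence limits. There are two confidence limits: the lower limit and the upper limit.
[Figure: sampling distribution with standard error σX-bar = σ/√n, with μLL and μUL marked on either side of X-bar]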
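Confidence Intervals (CI) for Population Mean
• How can we obtain μLL and μUL?
[Figure: normal curve with standard error σX-bar = σ/√n, μLL and μUL on either side of X-bar, and −zα/2 and +zα/2 on the Z axis]
We know that, $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$
For μLL, $+z_{\alpha/2} = \frac{\bar{X} - \mu_{LL}}{\sigma/\sqrt{n}}$, so $\mu_{LL} = \bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
For μUL, $-z_{\alpha/2} = \frac{\bar{X} - \mu_{UL}}{\sigma/\sqrt{n}}$, so $\mu_{UL} = \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
Therefore, $\text{CI for } \mu = \bar{X} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$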
Confidence Intervals (CI) for Population Mean
• How to obtain Z values?
The values (−z) and (+z) are equidistant from the center of the curve. The area from (−z) to (+z) is called the confidence level (CL).
The significance level equals (1 − CL) and is denoted by α (alpha). We can obtain Z values if we know either the significance level or the confidence level.
Confidence Level + Significance Level = 1
[Figure: normal curve with the confidence level as the central area between −zα/2 and +zα/2 and the significance level split between the two tails; μLL and μUL on either side of X-bar]
To obtain the Z value, we need to know the area from the center of the curve to the Z value. This area equals (CL/2). Use the Normal Distribution Table to obtain the Z value.
• When the population standard deviation, σ, is known, the standardized X-bar follows the standard normal (Z) distribution. Therefore, we use the following to calculate the CI for the population mean when σ is known.
$\text{CI for } \mu = \bar{X} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
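The table lookup for the Z value can be mirrored in code. The helper below is a hypothetical name introduced for illustration; it assumes Python's standard-library `NormalDist.inv_cdf` and converts a confidence level into the corresponding zα/2.

```python
from statistics import NormalDist

def z_critical(confidence_level):
    """Return z_{alpha/2} for a two-sided interval with the given confidence level."""
    alpha = 1 - confidence_level
    # The area to the left of +z is CL + alpha/2, which equals 1 - alpha/2.
    return NormalDist().inv_cdf(1 - alpha / 2)

for cl in (0.90, 0.95, 0.99):
    print(f"CL = {cl:.2f}: z = {z_critical(cl):.3f}")
```

Printed tables usually round these values to two decimals, so small differences from a table lookup are expected.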
Examples on CI for Population Mean When σ Is Known
• A random sample of 100 observations is obtained from a normally distributed population with a standard deviation of 10. What is a 95% confidence interval for the mean of the population if the sample mean is 40?
[Figure: standard normal curve with central area 0.95, that is 0.475 on each side of the center, between −Zα/2 and Zα/2 = 1.96]
X-bar = 40, n = 100, σ = 10, Zα/2 = 1.96
$95\%\ \text{CI for } \mu = \bar{X} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 40 \pm 1.96\left(\frac{10}{\sqrt{100}}\right) = (38.04, 41.96)$
Examples on CI for Population Mean When σ Is Known
• Find the 90% confidence interval for the mean of a normally distributed population using the following data. Assume a standard deviation of 5.
49 50 43 65 52 45 60 38 62
[Figure: standard normal curve with central area 0.90, that is 0.45 on each side of the center, between −Zα/2 and Zα/2 = 1.65]
X-bar = 464/9 = 51.56, n = 9, σ = 5, Zα/2 = 1.65
$90\%\ \text{CI for } \mu = \bar{X} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 51.56 \pm 1.65\left(\frac{5}{\sqrt{9}}\right) = (48.81, 54.31)$
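CI for Population Mean When σ Is Unknown
• When σ is unknown, (1) the standardized X-bar follows a t distribution with n − 1 degrees of freedom instead of the standard normal (Z) distribution, and (2) σ is estimated by the sample standard deviation, s, so the standard error is estimated by s/√n.
[Figure: sampling distribution of X-bar with estimated standard error s/√n, μLL and μUL on either side of X-bar, and −tα/2,n−1 and +tα/2,n−1 on the t axis]
We know that, $t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$
For μLL, $+t_{\alpha/2,\,n-1} = \frac{\bar{X} - \mu_{LL}}{s/\sqrt{n}}$, so $\mu_{LL} = \bar{X} - t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}$
For μUL, $-t_{\alpha/2,\,n-1} = \frac{\bar{X} - \mu_{UL}}{s/\sqrt{n}}$, so $\mu_{UL} = \bar{X} + t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}$
Therefore, $\text{CI for } \mu = \bar{X} \pm t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}$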
CI for Population Mean When σ Is Unknown (Cont.)
• However, when the sample size is large (n ≥ 30), t values get closer to z values. Also, not all t values are available in the table when the degrees of freedom exceed 30. Therefore, for convenience's sake, when n ≥ 30 and σ is unknown, we use the z distribution instead. That is,
$\text{CI for } \mu = \bar{X} \pm t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}$ (σ is unknown and n < 30)
$\text{CI for } \mu = \bar{X} \pm z_{\alpha/2}\frac{s}{\sqrt{n}}$ (σ is unknown and n ≥ 30)
• How to obtain t values?
We need two parameters:
(1) The area to the right of the t value (α)
(2) Degrees of Freedom = n − 1.
[Figure: t distribution with area α to the right of tα]
Examples on How to Obtain t Values
• For a t distribution with 20 degrees of freedom, what is the t value such that the following are true?
• 10% of the area under the t distribution is to the right of the t value.
t0.10, 20 = 1.325
[Figure: t curve with area 0.10 to the right of t0.10, 20]
• 90% of the area under the t distribution is to the right of the t value (equivalently, 10% is to the left).
t0.90, 20 = −t0.10, 20 = −1.325
[Figure: t curve with area 0.90 to the right and 0.10 to the left of −t0.10, 20 = t0.90, 20]
• 5% of the area under the t distribution is to the left of the t value.
−t0.05, 20 = −1.725
[Figure: t curve with area 0.05 to the left of −t0.05, 20 and area 0.05 to the right of t0.05, 20]
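The table values above can be approximated without a t table. Python's standard library has no t quantile function, so the sketch below integrates the t density numerically (Simpson's rule) and searches for the critical value by bisection. Both function names, the step count and the search range of 0 to 50 are assumptions made for this illustration; in practice a statistics library would normally be used instead.

```python
import math

def t_right_tail(x, df, steps=1000):
    """Area under the t density to the right of x (x >= 0), using Simpson's rule."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    pdf = lambda u: coef * (1 + u * u / df) ** (-(df + 1) / 2)
    h = x / steps                      # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 - total * h / 3         # 0.5 is the area to the right of 0

def t_critical(alpha, df):
    """t value with area alpha to its right, found by bisection on [0, 50]."""
    if alpha > 0.5:                    # symmetry: t_{alpha} = -t_{1 - alpha}
        return -t_critical(1 - alpha, df)
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_right_tail(mid, df) > alpha:
            lo = mid                   # too much area on the right: move right
        else:
            hi = mid
    return (lo + hi) / 2

for alpha in (0.10, 0.90, 0.05):
    print(f"t_({alpha:.2f}, 20) = {t_critical(alpha, 20):.3f}")
```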
Examples on CI for Population Mean When σ Is Unknown
• A random sample of size 20 is selected from a normally distributed population. The sample mean is 50 and the sample standard deviation is 10. Find a 90% confidence interval for the population mean.
[Figure: t curve with central area 0.90, that is 0.45 on each side of the center, and α/2 = 0.05 in each tail; t0.05, 19 = 1.729]
X-bar = 50, n = 20, s = 10, tα/2, n-1 = 1.729
$90\%\ \text{CI for } \mu = \bar{X} \pm t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}} = 50 \pm 1.729\left(\frac{10}{\sqrt{20}}\right) = (46.13, 53.87)$
Examples on CI for Population Mean When σ Is Unknown
• Find the 95% confidence interval for the mean of a normally distributed population using the following data.
49 50 43 65 52 45 60 38 62
[Figure: t curve with central area 0.95, that is 0.475 on each side of the center, and α/2 = 0.025 in each tail; t0.025, 8 = 2.306]
The sample standard deviation is computed from the squared deviations:
X     X-bar    (X − X-bar)²
49    51.56      6.55
50    51.56      2.43
43    51.56     73.27
65    51.56    180.63
52    51.56      0.19
45    51.56     43.03
60    51.56     71.23
38    51.56    183.87
62    51.56    108.99
Sum            670.22
$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{670.22}{8} = 83.78$
$s = \sqrt{s^2} = \sqrt{83.78} = 9.15$
X-bar = 464/9 = 51.56, n = 9, s = 9.15, tα/2, n-1 = 2.306
$95\%\ \text{CI for } \mu = \bar{X} \pm t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}} = 51.56 \pm 2.306\left(\frac{9.15}{\sqrt{9}}\right) = (44.53, 58.59)$
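The table of squared deviations and the resulting interval can be reproduced from the raw data. This sketch assumes the standard library's `fmean` and `stdev` (which divides by n − 1) and takes the critical value 2.306 from the t table as in the example. It carries unrounded intermediate values, so the lower limit comes out near 44.52 rather than the slide's 44.53.

```python
from math import sqrt
from statistics import fmean, stdev

data = [49, 50, 43, 65, 52, 45, 60, 38, 62]
n = len(data)

x_bar = fmean(data)
sum_sq_dev = sum((x - x_bar) ** 2 for x in data)   # numerator of s^2
s = stdev(data)                                    # sqrt(sum_sq_dev / (n - 1))

t_crit = 2.306   # t_{0.025, 8}, read from a t table as in the example
margin = t_crit * s / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(f"sum of squared deviations = {sum_sq_dev:.2f}, s = {s:.2f}")
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```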
Margin of Error, E, and Determination of the Sample Size
The general formula for constructing a CI is:
CI = statistic ± (critical value) × (standard error of the statistic)
$\text{CI for } \mu = \bar{X} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
CI = statistic ± (Margin of Error)
$E = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
Solving for n:
$\sqrt{n} = \frac{z_{\alpha/2}\,\sigma}{E}$, so $n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2$
Sample Size for Unknown σ:
(1) σ is estimated by s.
(2) σ is approximated by (H − L)/4, where H and L are the highest and lowest values.
Examples on Determination of the Sample Size
• A national retail association wants to estimate the average amount of dollars lost each month due to theft in its member stores. Past records show that the highest and lowest dollar amounts lost due to theft were $1325 and $25, respectively. If it wants to be 95% confident that the error in its estimate is no more than $100, how many stores would need to be included in the sample to produce an estimate of the desired accuracy?
[Figure: standard normal curve with central area 0.95, that is 0.475 on each side of the center, between −Zα/2 and Zα/2 = 1.96]
n = ?, E = 100, Zα/2 = 1.96, σ ≈ (H − L)/4 = (1325 − 25)/4 = 325
$n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2 = \left(\frac{(1.96)(325)}{100}\right)^2 = 40.58 \approx 41$
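The sample-size calculation, including the rounding up to a whole store, can be sketched as below. `required_sample_size` is a hypothetical helper name; the z value 1.96 is the table value used in the example.

```python
from math import ceil

def required_sample_size(z, sigma, margin_of_error):
    """Smallest whole n with z * sigma / sqrt(n) <= margin_of_error."""
    return ceil((z * sigma / margin_of_error) ** 2)

highest, lowest = 1325, 25
sigma_estimate = (highest - lowest) / 4    # range rule of thumb: (H - L) / 4
n = required_sample_size(z=1.96, sigma=sigma_estimate, margin_of_error=100)

print(f"sigma is approximated by {sigma_estimate}, required n = {n}")
```

The result is always rounded up, because rounding down would leave the margin of error slightly above the target.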
Examples on Determination of the Sample Size When σ Is Unknown
• A national retail association wants to estimate the average amount of dollars lost each month due to theft in its member stores. Nine of its member stores lost the following dollar amounts last month. If it wants to be 95% confident that the error in its estimate is no more than $5, how many stores would need to be included in the sample to produce an estimate of the desired accuracy?
49 50 43 65 52 45 60 38 62
n = ?, E = 5, Zα/2 = 1.96, σ = ?
σ is estimated by the sample standard deviation, s:
X     X-bar    (X − X-bar)²
49    51.56      6.55
50    51.56      2.43
43    51.56     73.27
65    51.56    180.63
52    51.56      0.19
45    51.56     43.03
60    51.56     71.23
38    51.56    183.87
62    51.56    108.99
Sum            670.22
$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{670.22}{8} = 83.78$
$s = \sqrt{s^2} = \sqrt{83.78} = 9.15$
σ ≈ s = 9.15
$n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2 = \left(\frac{(1.96)(9.15)}{5}\right)^2 = 12.86 \approx 13$
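When σ is estimated from a pilot sample, the two steps (estimate s, then solve for n) can be chained. This sketch assumes the standard library's `stdev` and uses illustrative variable names; it keeps s unrounded, so the intermediate value is about 12.87 rather than 12.86, and the rounded-up answer is unchanged.

```python
from math import ceil
from statistics import stdev

losses = [49, 50, 43, 65, 52, 45, 60, 38, 62]   # dollar amounts from the nine stores
s = stdev(losses)                               # sample standard deviation (n - 1 divisor)

z, margin_of_error = 1.96, 5
n_raw = (z * s / margin_of_error) ** 2
n_required = ceil(n_raw)                        # round up to a whole store

print(f"s = {s:.2f}, n = {n_raw:.2f}, rounded up to {n_required}")
```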
