Prob. Review IV


South Dakota
School of Mines & Technology
Introduction to
Probability & Statistics
Industrial Engineering
Estimation
Point Estimates
Overview
- Point versus interval estimates
- Estimators of a population mean
- Estimators of a population proportion
- Estimators of a population variance
- Estimating other parameters
  - Method of Moments
  - Maximum Likelihood Estimates
Statistics

Descriptive Statistics
are used to summarize a collection of data in a clear and understandable way.
Ex: Histograms, box plots, stem-and-leaf plots.

Inferential Statistics
are used to draw inferences about a population from
a sample.
Ex: 10 subjects perform a task after 3 hours of
training. They score 12 points higher than 10 subjects
who perform the same task with no training. Is the
difference real or could it be due to chance?
Point Estimators
Suppose we have a new type of light bulb and we wish to test the bulbs for mean time to burn out. We select 10 bulbs at random and test each one until it burns out.
Point Estimators

   i     X_i
   1    1569
   2    1483
   3    1540
   4    1630
   5    1680
   6    1575
   7    1782
   8    1661
   9    1580
  10    1462

$\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{1569 + 1483 + \cdots + 1462}{10} = 1{,}596.2$
Point Estimators

   i     X_i
   1    1739
   2    1567
   3    1643
   4    1490
   5    1517
   6    1674
   7    1831
   8    1462
   9    1536
  10    1571

$\bar{X} = \frac{1739 + 1567 + \cdots + 1571}{10} = 1{,}603.0$
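The two point estimates can be checked with a few lines of Python. This is only a sketch: the variable names are illustrative, and the data are the two samples of 10 bulbs listed in the tables. It shows that two samples from the same population give different point estimates of the mean.

```python
from statistics import mean

# Burn-out times from the two samples of 10 bulbs in the tables.
sample_1 = [1569, 1483, 1540, 1630, 1680, 1575, 1782, 1661, 1580, 1462]
sample_2 = [1739, 1567, 1643, 1490, 1517, 1674, 1831, 1462, 1536, 1571]

# The sample mean is the point estimator of the population mean.
xbar_1 = mean(sample_1)
xbar_2 = mean(sample_2)

print(xbar_1, xbar_2)
```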
Interval Estimates
Suppose our light bulbs have some underlying distribution $f(x)$ with finite mean $\mu$ and variance $\sigma^2$. Regardless of the distribution, recall from the central limit theorem that, for large $n$, approximately
$\bar{X} \sim N(\mu, \sigma/\sqrt{n})$
(mean $\mu$, standard deviation $\sigma/\sqrt{n}$).
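Interval Estimates
Recall that for a standard normal distribution,
[Figure: standard normal density centered at 0, with area $1-\alpha$ between $-z_{\alpha/2}$ and $z_{\alpha/2}$ and area $\alpha/2$ in each tail]
$1 - \alpha = P(-z_{\alpha/2} \le Z \le z_{\alpha/2})$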
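Interval Estimates
But $\bar{X} \sim N(\mu, \sigma/\sqrt{n})$, so
$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$
Then,
$1 - \alpha = P\left(-z_{\alpha/2} \le \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \le z_{\alpha/2}\right)$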
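Interval Estimates
$1 - \alpha = P\left(-z_{\alpha/2} \le \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \le z_{\alpha/2}\right)$
$= P\left(-z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \bar{x} - \mu \le z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$
$= P\left(-\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le -\mu \le -\bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$
$= P\left(\bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \ge \mu \ge \bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$
$= P\left(\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$
In words, we are $100(1-\alpha)\%$ confident that the true mean lies within the interval
$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$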
Example
Suppose we know that the variance of the bulbs is given by $\sigma^2 = 10{,}000$. A sample of 25 bulbs yields a sample mean of 1,596. Then a 90% confidence interval is given by
$1{,}596 \pm 1.645\frac{100}{\sqrt{25}}$
$1{,}596 \pm 32.9$
or $1{,}563.1 < \mu < 1{,}628.9$
[Figure: number line showing the interval from 1,563.1 to 1,628.9, centered at 1,596 with half-width 32.9]
32.9 is called the precision (E) of the interval and is given by
$E = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
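The interval and its precision can be reproduced numerically. The sketch below assumes Python's standard-library `statistics.NormalDist` for the normal quantile; the function name `z_interval` and its parameters are illustrative and not part of the lecture.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence):
    """Two-sided confidence interval for the mean when sigma is known."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # z_{alpha/2}
    precision = z * sigma / sqrt(n)              # E = z_{alpha/2} * sigma / sqrt(n)
    return xbar - precision, xbar + precision, precision

# Light bulb example: sigma^2 = 10,000, n = 25, sample mean 1,596, 90% confidence.
low, high, E = z_interval(xbar=1596, sigma=100, n=25, confidence=0.90)
print(round(low, 1), round(high, 1), round(E, 1))
```

Calling the same function with `confidence=0.99` gives the wider interval with half-width of about 51.5.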
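Interpretation
Either the mean is in the confidence interval or it is not. A 90% confidence interval says that if we construct 100 such intervals from independent samples, we would expect 90 to contain the true mean $\mu$ and 10 not to.
[Figure: confidence intervals from several different samples, centered at sample means such as 1,596, 1,578, 1,612 and 1,584]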
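A small simulation can illustrate this long-run interpretation. The sketch below rests on assumptions that are not in the lecture: bulb lifetimes are taken to be normal with a hypothetical true mean of 1,600 and $\sigma = 100$, and the function name `coverage` is illustrative. It counts how often a 90% interval built from a fresh sample of 25 contains the true mean.

```python
import random
from math import sqrt
from statistics import NormalDist

def coverage(mu, sigma, n, confidence, trials, seed=0):
    """Fraction of simulated z-intervals that contain the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    half_width = z * sigma / sqrt(n)
    hits = 0
    for _ in range(trials):
        # Draw one sample of size n and compute its mean.
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / trials

# Hypothetical population: mu = 1600, sigma = 100 (assumed for illustration).
rate = coverage(mu=1600, sigma=100, n=25, confidence=0.90, trials=2000)
print(rate)
```

The printed fraction should land close to 0.90. Any single interval either contains $\mu$ or does not.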
A Word on Confidence Intervals
Suppose instead of 90% confidence, we wish to be 99% confident the mean is in the interval. Then
$1{,}596 \pm 2.575\frac{100}{\sqrt{25}}$
$1{,}596 \pm 51.5$
That is, all we have done is widen the interval so that we are more confident that the true mean is in the interval.
90% Confidence: 1,563.1 to 1,628.9 (1,596 ± 32.9)
99% Confidence: 1,544.5 to 1,647.5 (1,596 ± 51.5)
Sample Sizes
Suppose we wish to compute the sample size required in order to have a specified precision. In this case, suppose we wish to determine the sample size required in order to estimate the true mean within ± 20 hours with 90% confidence.
Recall the precision is given by
$E = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
Solving for $n$ gives
$n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2$
For our example,
$n = \left(\frac{1.645(100)}{20}\right)^2 = 67.65$, which we round up to $n = 68$.
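The sample-size formula is easy to wrap in a helper. This is a minimal sketch, assuming the standard-library `NormalDist` for $z_{\alpha/2}$; the function name `required_sample_size` is illustrative. The result is rounded up because a fractional bulb cannot be sampled and rounding down would miss the target precision.

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(sigma, precision, confidence):
    """Smallest n with z_{alpha/2} * sigma / sqrt(n) <= precision."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    return ceil((z * sigma / precision) ** 2)

# Estimate the mean within +/- 20 hours with 90% confidence, sigma = 100.
n = required_sample_size(sigma=100, precision=20, confidence=0.90)
print(n)
```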
Interval Estimates (σ unknown)
Confidence Intervals (σ unknown)
Suppose we do not know the true variance of the population, but we can estimate it with the sample variance.
$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n - 1}$
For large samples ($n > 30$), replace $\sigma^2$ with $s^2$ and compute the confidence interval as before.
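As a quick check of the shortcut formula for $s^2$, the sketch below applies it to the first sample of 10 bulbs from the point-estimate slides and compares the result with the standard-library `statistics.variance`. Reusing that sample here is a choice made for illustration only.

```python
from statistics import variance

# First sample of 10 bulbs from the point-estimate example.
sample = [1569, 1483, 1540, 1630, 1680, 1575, 1782, 1661, 1580, 1462]

n = len(sample)
xbar = sum(sample) / n

# Shortcut form: (sum of squares - n * xbar^2) / (n - 1).
s2_shortcut = (sum(x * x for x in sample) - n * xbar ** 2) / (n - 1)

# Library value, computed from squared deviations about the mean.
s2_library = variance(sample)

print(s2_shortcut, s2_library)
```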
Confidence Intervals (σ unknown)
For small samples we need to replace the standard normal, $N(0,1)$, with the t-distribution. Specifically,
$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} \sim t_{n-1}$
Confidence Interval (σ unknown)
[Figure: $t_{n-1}$ density centered at 0, with area $1-\alpha$ between $-t_{n-1,\alpha/2}$ and $t_{n-1,\alpha/2}$ and area $\alpha/2$ in each tail]
$1 - \alpha = P\left(-t_{n-1,\alpha/2} \le \frac{\bar{x} - \mu}{s/\sqrt{n}} \le t_{n-1,\alpha/2}\right)$
Miracle 17b occurs (the same rearrangement used when $\sigma$ is known), giving the interval
$\bar{x} \pm t_{n-1,\alpha/2}\frac{s}{\sqrt{n}}$
Example
Suppose in our light bulb example, we wish to estimate an interval for the mean with 90% confidence. A sample of 25 bulbs yields a sample mean of 1,596 and a sample variance of 10,000.
$\bar{x} \pm t_{n-1,\alpha/2}\frac{s}{\sqrt{n}}$
$1{,}596 \pm 1.711\frac{100}{\sqrt{25}}$
$1{,}596 \pm 34.2$
Example
Note that lack of knowledge of $\sigma$ gives a slightly bigger confidence interval (we know less, therefore we feel less confident about the same size interval).
σ known: 1,563.1 to 1,628.9 (1,596 ± 32.9)
σ unknown: 1,561.8 to 1,630.2 (1,596 ± 34.2)
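The t-based interval can be sketched in the same way. Python's standard library has no t quantile function, so this example takes the critical value as an input and uses the table value $t_{24,0.05} = 1.711$ quoted in the example. The function name `t_interval` is illustrative.

```python
from math import sqrt

def t_interval(xbar, s, n, t_crit):
    """Confidence interval for the mean when sigma is estimated by s.

    t_crit is the table value t_{n-1, alpha/2}, supplied by the caller.
    """
    half_width = t_crit * s / sqrt(n)
    return xbar - half_width, xbar + half_width, half_width

# Light bulb example: n = 25, sample mean 1,596, s^2 = 10,000, 90% confidence.
low, high, E = t_interval(xbar=1596, s=100, n=25, t_crit=1.711)
print(round(low, 1), round(high, 1), round(E, 1))
```

The half-width of about 34.2 is slightly larger than the 32.9 obtained with the normal critical value 1.645, which is the price of estimating $\sigma$ from the sample.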
A Final Word
Note that on the t-distribution chart, as $n$ becomes larger,
$t_{n-1,\alpha/2} \to z_{\alpha/2}$
hence, for larger samples ($n > 30$) we can replace the t-distribution with the standard normal.
Estimates for Proportions
Estimating a Proportion
Suppose we sample 100 circuit boards and find that 8 are defective. We would like to make an inference about the true proportion defective given a sample proportion defective of $\hat{p} = 0.08$.
Recall that for a large sample ($n > 30$) the binomial may be approximated by the normal distribution. We also know that the mean of the binomial is $np$ and the variance is $npq$, where $q = 1 - p$.
$x \sim N(np, npq)$
(here the second argument is the variance).
Estimating a Proportion
$x \sim N(np, npq)$
Now if $\hat{p} = x/n$, then
$n\hat{p} \sim N(np, npq)$, or $\frac{n\hat{p} - np}{\sqrt{npq}} \sim N(0, 1)$
Dividing through by $n$ and replacing $pq$ by $\hat{p}\hat{q}$ gives
$\frac{\hat{p} - p}{\sqrt{\hat{p}\hat{q}/n}} \sim N(0, 1)$
so that
$1 - \alpha = P\left(-z_{\alpha/2} \le \frac{\hat{p} - p}{\sqrt{\hat{p}\hat{q}/n}} \le z_{\alpha/2}\right)$
Miracle 21c occurs (the same rearrangement used for the mean), giving the interval
$\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}\hat{q}/n}$
Example
Returning to our circuit board example, suppose a sample of 100 boards yields 8% defective. Compute a 90% confidence interval for the true but unknown proportion defective.
$\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}\hat{q}/n}$
$0.08 \pm 1.645\sqrt{0.08(0.92)/100}$
$0.035 < p < 0.125$
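The circuit board interval can be reproduced with a short function. This is a sketch of the large-sample normal approximation only, assuming the standard-library `NormalDist` for $z_{\alpha/2}$; the function name `proportion_interval` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(defects, n, confidence):
    """Large-sample (normal-approximation) interval for a proportion."""
    p_hat = defects / n
    q_hat = 1.0 - p_hat
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    half_width = z * sqrt(p_hat * q_hat / n)
    return p_hat - half_width, p_hat + half_width

# 8 defective boards out of 100, 90% confidence.
low, high = proportion_interval(defects=8, n=100, confidence=0.90)
print(round(low, 3), round(high, 3))
```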
Interval Estimates (variance)
Estimator for a Variance
Suppose in a sample of 25 light bulbs, we compute a sample variance of 10,000. We would now like to make an inference about the true but unknown population variance $\sigma^2$. If the underlying distribution is normal, then the distribution of the (suitably scaled) sample variance is chi-square.
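Estimator for a Variance
$(n - 1)\frac{s^2}{\sigma^2} \sim \chi^2_{n-1}$
[Figure: $\chi^2_{n-1}$ density with area $1-\alpha$ between $\chi^2_{n-1,\alpha/2}$ and $\chi^2_{n-1,1-\alpha/2}$ and area $\alpha/2$ in each tail]
Here $\chi^2_{n-1,\alpha/2}$ denotes the point with area $\alpha/2$ to its left.
$1 - \alpha = P\left(\chi^2_{n-1,\alpha/2} \le (n - 1)\frac{s^2}{\sigma^2} \le \chi^2_{n-1,1-\alpha/2}\right)$
Miracle 21c occurs (invert each inequality and solve for $\sigma^2$), giving
$\frac{(n - 1)s^2}{\chi^2_{n-1,1-\alpha/2}} \le \sigma^2 \le \frac{(n - 1)s^2}{\chi^2_{n-1,\alpha/2}}$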
Example
Suppose in our sample of 25 light bulbs we compute a sample variance of 10,000. Compute a 90% confidence interval for the true variance.
$\frac{(n - 1)s^2}{\chi^2_{n-1,1-\alpha/2}} \le \sigma^2 \le \frac{(n - 1)s^2}{\chi^2_{n-1,\alpha/2}}$
$\frac{24(10{,}000)}{36.415} \le \sigma^2 \le \frac{24(10{,}000)}{13.848}$
$6{,}591 < \sigma^2 < 17{,}331$
Note that the confidence interval for $\sigma^2$ is not symmetric about the point estimate 10,000.
[Figure: number line showing the interval from 6,591 to 17,331 with the point estimate 10,000 off-center]
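The variance interval can be sketched as below. The standard library has no chi-square quantile function, so the critical values are passed in. The values used here are the usual table entries for 24 degrees of freedom, 13.848 (5% of the area to the left) and 36.415 (95% of the area to the left), and the function name `variance_interval` is illustrative.

```python
def variance_interval(s2, n, chi2_lower, chi2_upper):
    """Confidence interval for sigma^2, assuming a normal population.

    chi2_lower and chi2_upper are the table values chi^2_{n-1, alpha/2}
    and chi^2_{n-1, 1-alpha/2}, supplied by the caller.
    """
    df = n - 1
    # The larger critical value produces the lower bound, and vice versa.
    return df * s2 / chi2_upper, df * s2 / chi2_lower

# 25 bulbs, sample variance 10,000, 90% confidence (table values for 24 df).
low, high = variance_interval(s2=10_000, n=25, chi2_lower=13.848, chi2_upper=36.415)
print(round(low), round(high))
```

The bounds lie at unequal distances from the point estimate of 10,000, which reflects the skewness of the chi-square distribution.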
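Summary
To make probabilistic statements about a parameter, use the distribution shown, given the stated condition:

Parameter                          Distribution            Given
$\mu$ ($\sigma$ known)             $N(0,1)$                $\sigma^2$
$\mu$ ($\sigma$ unknown)           $t_{n-1}$               normal
$\mu$ ($\sigma$ unknown)           $N(0,1)$                $n \gg 30$
$\sigma^2$                         $\chi^2_{n-1}$          normal
$\sigma_1^2$ given $\sigma_2^2$    $F_{n_1-1,\,n_2-1}$     normal
$p$                                $N(0,1)$                $n \gg 30$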
Method of Moments
Recall from Data Analysis, we had three measures for the failure time data:
$\bar{X} = 19.1$
$s^2 = 302.76$
[Figure: histogram "Power Supply Failure Times", frequency (0 to 20) versus time class (0-10, 10-20, 20-30, 30-40, 40-50, 50-60, 60-70, 70-80)]
Is the underlying distribution exponential?
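Method of Moments
Recall that for the exponential distribution
$\mu = 1/\lambda$ and $\sigma^2 = 1/\lambda^2$
and that for our data $\bar{X} = 19.1$ and $s^2 = 302.76$. If $E[\bar{X}] = \mu$ and $E[s^2] = \sigma^2$, then
$1/\lambda = 19.1 \Rightarrow \hat{\lambda} = 0.0524$
or
$1/\lambda^2 = 302.76 \Rightarrow \hat{\lambda} = 0.0575$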
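Both moment estimates follow from one line of arithmetic each. The sketch below uses the summary statistics quoted on the slide, and the variable names are illustrative. It shows that matching the first moment and matching the second moment give slightly different estimates of $\lambda$.

```python
from math import sqrt

# Summary statistics for the power supply failure times.
xbar = 19.1
s2 = 302.76

# Match the mean:     1 / lambda   = xbar  ->  lambda = 1 / xbar
lam_from_mean = 1.0 / xbar

# Match the variance: 1 / lambda^2 = s2    ->  lambda = 1 / sqrt(s2)
lam_from_variance = 1.0 / sqrt(s2)

print(round(lam_from_mean, 4), round(lam_from_variance, 4))
```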
Maximum Likelihood Estimates
Discrete Case
Suppose we have hypothesized a discrete distribution from which our data are drawn and which has some unknown parameter $\theta$. Let $p_\theta(x)$ denote the probability mass function for this distribution. The likelihood function is
$L(\theta) = p_\theta(x_1)\,p_\theta(x_2) \cdots p_\theta(x_n)$
Discrete Case
Since $L(\theta)$ is just the joint probability of the observed data, we want to choose some $\hat{\theta}$ which maximizes this joint probability mass function.
$L(\hat{\theta}) \ge L(\theta)$ for all possible $\theta$
Continuous Case
Suppose we have a set of nine observations $x_1, x_2, \ldots, x_9$ which have an underlying exponential distribution (in this case with rate parameter $\lambda = 2.0$).
0.053  0.112  0.178  0.255  0.347  0.458  0.602  0.805  1.151
Our objective is to estimate the true but unknown parameter $\lambda$. The likelihood function is
$L(\lambda) = f_\lambda(x_1)\,f_\lambda(x_2) \cdots f_\lambda(x_n)$
MLE (Exponential)
$L(\lambda) = f_\lambda(x_1)\,f_\lambda(x_2) \cdots f_\lambda(x_n)$
$= \lambda e^{-\lambda x_1} \cdot \lambda e^{-\lambda x_2} \cdots \lambda e^{-\lambda x_n}$
$= \lambda^n e^{-\lambda \sum x_i}$
MLE (Exponential)
[Figure: "Likelihood", plot of L(lambda) (vertical axis, 0.00 to 0.25) against lambda (horizontal axis, 0.0 to 4.0)]
MLE (Exponential)
We can use the plot to graphically solve for the best estimate of $\lambda$. Alternatively, we can find the maximum analytically by using calculus. Specifically,
$\frac{\partial L(\lambda)}{\partial \lambda} = 0$
Log Likelihood
The natural log is a monotonically increasing function. Consequently, maximizing the log of the likelihood function is the same as maximizing the likelihood function itself.
$\mathrm{Ln}(\theta) = \ln L(\theta)$
MLE (Exponential)
$L(\lambda) = f_\lambda(x_1)\,f_\lambda(x_2) \cdots f_\lambda(x_n) = \lambda^n e^{-\lambda \sum x_i}$
$\mathrm{Ln}(\lambda) = n\ln(\lambda) - \lambda \sum x_i$
$\frac{\partial\,\mathrm{Ln}(\lambda)}{\partial \lambda} = n\frac{\partial \ln(\lambda)}{\partial \lambda} - \sum x_i = \frac{n}{\lambda} - \sum x_i = 0$
$\frac{n}{\lambda} = \sum x_i$
MLE (Exponential)

     i      X_i
     1    0.053
     2    0.112
     3    0.178
     4    0.255
     5    0.347
     6    0.458
     7    0.602
     8    0.805
     9    1.151
   Sum =  3.961
 X-bar =  0.440

$\hat{\lambda} = \frac{1}{\bar{x}} = \frac{1}{0.440} \approx 2.27$
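The closed-form estimate can be compared with a direct numerical maximization of the likelihood. The sketch below uses the nine observations from the table. The grid from 0.01 to 4.00 in steps of 0.01 is an arbitrary choice that mirrors the range of the likelihood plot.

```python
from math import exp

# The nine observations from the table.
data = [0.053, 0.112, 0.178, 0.255, 0.347, 0.458, 0.602, 0.805, 1.151]
n = len(data)
total = sum(data)

def likelihood(lam):
    """L(lambda) = lambda^n * exp(-lambda * sum(x_i)) for exponential data."""
    return lam ** n * exp(-lam * total)

# Closed-form maximum likelihood estimate: n / sum(x_i) = 1 / x-bar.
lam_hat = n / total

# Graphical-style check: evaluate L on a grid and keep the maximizer.
grid = [i / 100 for i in range(1, 401)]
lam_grid = max(grid, key=likelihood)

print(round(lam_hat, 3), lam_grid)
```

The grid maximizer agrees with $1/\bar{x}$ to within the grid spacing, which is what reading the peak off the plot amounts to.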
Experimental Data
Suppose we wish to make some estimates on time to fail for a new power supply. 40 units are randomly selected and tested to failure. Failure times are recorded as follows:

 2.7   6.4  13.9  34.9  14.9   7.1  11.1   3.8  25.8  18.3
32.2  21.0  24.1  59.9   2.1  51.8  19.6  41.6  27.7  10.2
 1.0   9.4  16.0   1.6   4.5   5.8   5.1  46.1  29.8  12.9
22.5  17.1   0.5  73.8  12.0  37.9   3.3   7.9   8.6  14.7

$\bar{X} = 19.1$
Failure Data
[Figure: "Likelihood", plot of L(lambda) against lambda for lambda from 0.00 to 0.10; the vertical-axis tick labels all read 0.0]
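With 40 observations the likelihood values themselves are extremely small, so working with the log-likelihood is more practical. The sketch below uses only the summary statistics $n = 40$ and $\bar{X} = 19.1$ and therefore assumes $\sum x_i = n\bar{X} = 764$. The grid over 0.001 to 0.100 mirrors the range of the plot, and the variable names are illustrative.

```python
from math import exp, log

# Summary statistics for the 40 failure times (sum assumed to be n * x-bar).
n = 40
xbar = 19.1
total = n * xbar

def log_likelihood(lam):
    """Ln(lambda) = n * ln(lambda) - lambda * sum(x_i) for exponential data."""
    return n * log(lam) - lam * total

# Evaluate on a grid covering the range shown in the plot.
grid = [i / 1000 for i in range(1, 101)]
best = max(grid, key=log_likelihood)

# The raw likelihood at the maximum is tiny, hence the log scale.
print(best, log_likelihood(best), exp(log_likelihood(best)))
```

The grid maximizer sits next to the closed-form estimate $1/\bar{X} \approx 0.0524$.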