Transcript DevStat8e_06_02

6 Point Estimation
Copyright © Cengage Learning. All rights reserved.
6.2 Methods of Point Estimation
Methods of Point Estimation
The definition of unbiasedness does not in general indicate
how unbiased estimators can be derived.
We now discuss two “constructive” methods for obtaining
point estimators: the method of moments and the method
of maximum likelihood.
By constructive we mean that the general definition of each
type of estimator suggests explicitly how to obtain the
estimator in any specific problem.
Methods of Point Estimation
Although maximum likelihood estimators are generally
preferable to moment estimators because of certain
efficiency properties, they often require significantly more
computation than do moment estimators.
It is sometimes the case that these methods yield unbiased
estimators.
The Method of Moments
The Method of Moments
The basic idea of this method is to equate certain sample
characteristics, such as the mean, to the corresponding
population expected values.
Then solving these equations for unknown parameter
values yields the estimators.
The Method of Moments
Definition
Let X1, . . . , Xn be a random sample from a pmf or pdf f(x). For k = 1, 2, 3, . . . , the kth population moment, or kth moment of the distribution f(x), is E(X^k). The kth sample moment is (1/n) Σ Xi^k.
Thus the first population moment is E(X) = μ, and the first sample moment is Σ Xi/n = X̄.
The second population and sample moments are E(X^2) and Σ Xi^2/n, respectively. The population moments will be functions of any unknown parameters θ1, θ2, . . . .
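As a quick numerical illustration (not part of the text), the sample moments can be computed directly from data. The sketch below is a minimal Python example; the sample values are made up for illustration.

```python
import numpy as np

# Hypothetical sample of n = 8 observations (illustrative values only)
x = np.array([2.1, 3.4, 1.8, 2.9, 4.0, 2.5, 3.1, 2.2])

n = len(x)
m1 = np.sum(x) / n       # first sample moment: the sample mean x-bar
m2 = np.sum(x**2) / n    # second sample moment: (1/n) * sum of x_i^2

print(m1, m2)
```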
The Method of Moments
Definition
Let X1, X2, . . . , Xn be a random sample from a distribution with pmf or pdf f(x; θ1, . . . , θm), where θ1, . . . , θm are parameters whose values are unknown. Then the moment estimators θ̂1, . . . , θ̂m are obtained by equating the first m sample moments to the corresponding first m population moments and solving for θ1, . . . , θm.
The Method of Moments
If, for example, m = 2, E(X) and E(X^2) will be functions of θ1 and θ2. Setting E(X) = (1/n) Σ Xi (= X̄) and E(X^2) = (1/n) Σ Xi^2 gives two equations in θ1 and θ2. The solution then defines the estimators.
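For concreteness, here is a hedged sketch of the m = 2 case under a normal model (this particular choice of model is an illustration, not part of this slide): using E(X) = μ and E(X^2) = μ^2 + σ^2, the two moment equations give μ̂ = X̄ and σ̂^2 = (1/n) Σ Xi^2 – X̄^2. The data below are simulated.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=200)  # simulated data; true mu = 5, sigma = 2

m1 = x.mean()        # first sample moment
m2 = np.mean(x**2)   # second sample moment

# Moment equations: m1 = mu, m2 = mu^2 + sigma^2
mu_hat = m1
sigma2_hat = m2 - m1**2

print(mu_hat, sigma2_hat)
```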
Example 12
Let X1, X2, . . . , Xn represent a random sample of service times of n customers at a certain facility, where the underlying distribution is assumed exponential with parameter λ.
Since there is only one parameter to be estimated, the estimator is obtained by equating E(X) to X̄. Since E(X) = 1/λ for an exponential distribution, this gives 1/λ = X̄ or λ = 1/X̄. The moment estimator of λ is then λ̂ = 1/X̄.
Maximum Likelihood Estimation
Maximum Likelihood Estimation
The method of maximum likelihood was first introduced by
R. A. Fisher, a geneticist and statistician, in the 1920s.
Most statisticians recommend this method, at least when
the sample size is large, since the resulting estimators
have certain desirable efficiency properties.
Example 15
A sample of ten new bike helmets manufactured by a
certain company is obtained. Upon testing, it is found that
the first, third, and tenth helmets are flawed, whereas the
others are not.
Let p = P(flawed helmet), i.e., p is the proportion of all such
helmets that are flawed.
Define (Bernoulli) random variables X1, X2, . . . , X10 by
Xi = 1 if the ith helmet is flawed, Xi = 0 otherwise.
Example 15
cont’d
Then for the obtained sample, X1 = X3 = X10 = 1 and the
other seven Xi’s are all zero.
The probability mass function of any particular Xi is p^xi (1 – p)^(1 – xi), which becomes p if xi = 1 and 1 – p when xi = 0.
Now suppose that the conditions of various helmets are
independent of one another.
This implies that the Xi’s are independent, so their joint
probability mass function is the product of the individual
pmf’s.
Example 15
cont’d
Thus the joint pmf evaluated at the observed Xi’s is
f(x1, . . . , x10; p) = p(1 – p)p · · · p = p^3(1 – p)^7        (6.4)
Suppose that p = .25. Then the probability of observing the
sample that we actually obtained is (.25)^3(.75)^7 = .002086.
If instead p = .50, then this probability is
(.50)^3(.50)^7 = .000977. For what value of p is the obtained
sample most likely to have occurred? That is, for what
value of p is the joint pmf (6.4) as large as it can be? What
value of p maximizes (6.4)?
Example 15
cont’d
Figure 6.5(a) shows a graph of the likelihood (6.4) as a
function of p. It appears that the graph reaches its peak
above p = .3 = the proportion of flawed helmets in the
sample.
Figure 6.5(a): Graph of the likelihood (joint pmf) (6.4) from Example 15
Example 15
cont’d
Figure 6.5(b) shows a graph of the natural logarithm of
(6.4); since ln[g(u)] is a strictly increasing function of g(u),
finding u to maximize the function g(u) is the same as
finding u to maximize ln[g(u)].
Figure 6.5(b): Graph of the natural logarithm of the likelihood
Example 15
cont’d
We can verify our visual impression by using calculus to
find the value of p that maximizes (6.4).
Working with the natural log of the joint pmf is often easier
than working with the joint pmf itself, since the joint pmf is
typically a product so its logarithm will be a sum.
Here
ln[f(x1, . . . , x10; p)] = ln[p^3(1 – p)^7] = 3 ln(p) + 7 ln(1 – p)        (6.5)
Example 15
cont’d
Thus
(d/dp){ln[f(x1, . . . , x10; p)]} = (d/dp)[3 ln(p) + 7 ln(1 – p)] = 3/p + 7/(1 – p) · (–1)
[the (–1) comes from the chain rule in calculus].
Example 15
cont’d
Equating this derivative to 0 and solving for p gives
3(1 – p) = 7p, from which 3 = 10p and so p = 3/10 = .30 as
conjectured.
That is, our point estimate is p̂ = .30. It is called the
maximum likelihood estimate because it is the parameter
value that maximizes the likelihood (joint pmf) of the
observed sample.
In general, the second derivative should be examined to
make sure a maximum has been obtained, but here this is
obvious from Figure 6.5.
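To double-check the calculus numerically (an illustrative sketch, not part of the text), one can evaluate the likelihood p^3(1 – p)^7 over a fine grid of candidate p values and locate the maximizer:

```python
import numpy as np

p = np.linspace(0.001, 0.999, 9981)   # grid of candidate p values
likelihood = p**3 * (1 - p)**7        # joint pmf (6.4) for the observed sample

p_hat = p[np.argmax(likelihood)]
print(p_hat)                          # approximately 0.30, matching the calculus
```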
Example 15
cont’d
Suppose that rather than being told the condition of every
helmet, we had only been informed that three of the ten
were flawed.
Then we would have the observed value of a binomial
random variable X = the number of flawed helmets.
The pmf of X is
b(x; 10, p) = (10 choose x) p^x (1 – p)^(10 – x)        x = 0, 1, . . . , 10
For x = 3, this becomes
b(3; 10, p) = (10 choose 3) p^3 (1 – p)^7
The binomial coefficient (10 choose 3) is irrelevant to the maximization, so again p̂ = .30.
Maximum Likelihood Estimation
The likelihood function tells us how likely the observed
sample is as a function of the possible parameter values.
Maximizing the likelihood gives the parameter values for
which the observed sample is most likely to have been
generated—that is, the parameter values that “agree most
closely” with the observed data.
Example 16
Suppose X1, X2, . . . , Xn is a random sample from an exponential distribution with parameter λ. Because of independence, the likelihood function is a product of the individual pdf's:
f(x1, . . . , xn; λ) = (λe^(–λx1)) · · · (λe^(–λxn)) = λ^n e^(–λΣxi)
The natural logarithm of the likelihood function is
ln[f(x1, . . . , xn; λ)] = n ln(λ) – λΣxi
Example 16
cont’d
Equating (d/dλ)[ln(likelihood)] to zero results in
n/λ – Σxi = 0, or λ = n/Σxi = 1/x̄.
Thus the mle is λ̂ = 1/X̄; it is identical to the method of moments estimator [but it is not an unbiased estimator, since E(1/X̄) ≠ 1/E(X̄)].
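As an illustrative check (simulated data; the seed and true λ are assumptions), the closed-form mle λ̂ = 1/x̄ can be compared against a brute-force maximization of the log-likelihood n ln(λ) – λΣxi over a grid:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.exponential(scale=2.0, size=50)  # simulated sample; true lambda = 0.5
n = len(x)

lambda_hat = 1 / x.mean()                # closed-form mle, n / sum(x_i)

# brute-force check: maximize the log-likelihood over a grid of lambda values
lam = np.linspace(0.01, 5, 5000)
loglik = n * np.log(lam) - lam * x.sum()
lambda_grid = lam[np.argmax(loglik)]

print(lambda_hat, lambda_grid)           # the two values should nearly agree
```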
Example 17
Let X1, . . . , Xn be a random sample from a normal distribution. The likelihood function is
f(x1, . . . , xn; μ, σ^2) = (2πσ^2)^(–n/2) exp[–Σ(xi – μ)^2 / (2σ^2)]
so
ln[f(x1, . . . , xn; μ, σ^2)] = –(n/2) ln(2πσ^2) – Σ(xi – μ)^2 / (2σ^2)
Example 17
cont’d
To find the maximizing values of μ and σ^2, we must take the partial derivatives of ln(f) with respect to μ and σ^2, equate them to zero, and solve the resulting two equations. Omitting the details, the resulting mle's are
μ̂ = X̄        σ̂^2 = Σ(Xi – X̄)^2 / n
The mle of σ^2 is not the unbiased estimator, so two different principles of estimation (unbiasedness and maximum likelihood) yield two different estimators.
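A brief sketch contrasting the two estimators of σ^2 on simulated data (the sample, seed, and true parameter values are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(loc=10.0, scale=3.0, size=25)   # simulated normal sample

mu_hat = x.mean()                              # mle of mu: the sample mean
sigma2_mle = np.sum((x - mu_hat)**2) / len(x)  # mle of sigma^2: divisor n
s2_unbiased = x.var(ddof=1)                    # unbiased estimator S^2: divisor n - 1

print(mu_hat, sigma2_mle, s2_unbiased)
```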
Estimating Functions of Parameters
Estimating Functions of Parameters
In Example 17, we obtained the mle of σ^2 when the underlying distribution is normal. The mle of σ = √(σ^2), as well as many other mle's, can be easily derived using the following proposition.
Proposition
The Invariance Principle
Let θ̂1, θ̂2, . . . , θ̂m be the mle's of the parameters θ1, θ2, . . . , θm. Then the mle of any function h(θ1, θ2, . . . , θm) of these parameters is the function h(θ̂1, θ̂2, . . . , θ̂m) of the mle's.
Example 20
Example 17 continued…
In the normal case, the mle's of μ and σ^2 are
μ̂ = X̄        σ̂^2 = Σ(Xi – X̄)^2 / n
To obtain the mle of the function h(μ, σ^2) = √(σ^2) = σ, substitute the mle's into the function:
σ̂ = √(σ̂^2) = [Σ(Xi – X̄)^2 / n]^(1/2)
The mle of σ is not the sample standard deviation S, though they are close unless n is quite small.
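Continuing the same illustrative simulation used above (seed and parameter values are assumptions), the invariance principle lets us get σ̂ simply by taking the square root of σ̂^2, and we can compare it with S:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(loc=10.0, scale=3.0, size=25)   # simulated normal sample

sigma2_mle = np.sum((x - x.mean())**2) / len(x)
sigma_mle = np.sqrt(sigma2_mle)  # invariance: mle of sigma = sqrt(mle of sigma^2)
s = x.std(ddof=1)                # sample standard deviation S (divisor n - 1)

print(sigma_mle, s)              # close, but not equal, for moderate n
```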
Large Sample Behavior of the MLE
Large Sample Behavior of the MLE
Although the principle of maximum likelihood estimation
has considerable intuitive appeal, the following proposition
provides additional rationale for the use of mle’s.
Proposition
Under very general conditions on the joint distribution of the sample, when the sample size n is large, the maximum likelihood estimator of any parameter θ is approximately unbiased [E(θ̂) ≈ θ] and has variance that is either as small as or nearly as small as can be achieved by any estimator. Stated another way, the mle θ̂ is approximately the MVUE of θ.
Large Sample Behavior of the MLE
Because of this result and the fact that calculus-based
techniques can usually be used to derive the mle’s (though
often numerical methods, such as Newton’s method, are
necessary), maximum likelihood estimation is the most
widely used estimation technique among statisticians.
Many of the estimators used in the remainder of the book
are mle’s. Obtaining an mle, however, does require that the
underlying distribution be specified.