Transcript Chapter 7

Chapter 7. Statistical Estimation
and Sampling Distributions
7.1 Point Estimates
7.2 Properties of Point Estimates
7.3 Sampling Distributions
7.4 Constructing Parameter Estimates
7.5 Supplementary Problems
7.1 Point Estimates
7.1.1 Parameters
• Parameters
  – In statistical inference, the term parameter is used to denote a quantity θ, say, that is a property of an unknown probability distribution.
  – For example, the mean, variance, or a particular quantile of the probability distribution.
  – Parameters are unknown, and one of the goals of statistical inference is to estimate them.
Figure 7.1 The relationship between a point
estimate and an unknown parameter θ
Figure 7.2 Estimation of the population mean
by the sample mean
7.1.2 Statistics
• Statistics
  – In statistical inference, the term statistic is used to denote a quantity that is a property of a sample.
  – Statistics are functions of a random sample: for example, the sample mean, sample variance, or a particular sample quantile.
  – Statistics are random variables whose observed values can be calculated from a set of observed data.
Examples:
  sample mean: X̄ = (X_1 + X_2 + ⋯ + X_n) / n
  sample variance: S² = Σ_{i=1}^n (X_i − X̄)² / (n − 1)
7.1.3 Estimation
• Estimation
  – A procedure of "guessing" properties of the population from which data are collected.
  – A point estimate of an unknown parameter θ is a statistic θ̂ that represents a "guess" at the value of θ.
• Example 1 (Machine breakdowns)
  – How to estimate P(machine breakdown due to operator misuse)?
• Example 2 (Rolling mill scrap)
  – How to estimate the mean and variance of the probability distribution of % scrap? (A code sketch for both examples follows below.)
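As an illustration, here is a minimal Python sketch of point estimates for the two examples above, assuming hypothetical breakdown counts and % scrap values (these numbers are not from the textbook):

```python
# Minimal sketch (hypothetical numbers): point estimates for the two examples.
import numpy as np

# Example 1: suppose 12 of 46 recorded breakdowns were due to operator misuse.
n_breakdowns, n_misuse = 46, 12
p_hat = n_misuse / n_breakdowns          # point estimate of P(misuse)

# Example 2: suppose these are observed % scrap values from the rolling mill.
scrap = np.array([20.0, 22.5, 19.8, 21.4, 23.1, 20.7])
mu_hat = scrap.mean()                    # point estimate of the mean
sigma2_hat = scrap.var(ddof=1)           # sample variance S^2 (divisor n - 1)

print(p_hat, mu_hat, sigma2_hat)
```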
7.2 Properties of Point Estimates
7.2.1. Unbiased Estimates (1/5)
• Definitions
  – A point estimate θ̂ for a parameter θ is said to be unbiased if
    E(θ̂) = θ.
  – If a point estimate is not unbiased, then its bias is defined to be
    bias = E(θ̂) − θ.
7.2.1. Unbiased Estimates (2/5)
• Point estimate of a success probability
  – If X ~ B(n, p), then p̂ = X / n is an unbiased point estimate of p:
    since E(X) = np,
    E(p̂) = E(X / n) = (1/n) E(X) = (1/n) np = p,
    which means that p̂ is an unbiased estimate.
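A minimal simulation sketch of this unbiasedness (the values of n and p below are arbitrary choices, not from the textbook):

```python
# Check empirically that p_hat = X/n is unbiased for a binomial success probability.
import numpy as np

rng = np.random.default_rng(0)
n, p = 40, 0.3
X = rng.binomial(n, p, size=100_000)     # many independent B(n, p) draws
p_hat = X / n
print(p_hat.mean())                      # close to p = 0.3, illustrating E(p_hat) = p
```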
7.2.1. Unbiased Estimates (3/5)
• Point estimate of a population mean
  If X_1, X_2, ..., X_n are sample observations from a probability distribution with mean μ, then the sample mean μ̂ = X̄ is an unbiased point estimate of the population mean μ,
  since E(X_i) = μ, 1 ≤ i ≤ n, so that
  E(μ̂) = E(X̄) = E( (1/n) Σ_{i=1}^n X_i ) = (1/n) Σ_{i=1}^n E(X_i) = (1/n) n μ = μ.
7.2.1. Unbiased Estimates (4/5)
• Point estimate of a population variance
  If X_1, X_2, ..., X_n are sample observations from a probability distribution with variance σ², then the sample variance
  σ̂² = S² = Σ_{i=1}^n (X_i − X̄)² / (n − 1)
  is an unbiased point estimate of the population variance σ².
7.2.1. Unbiased Estimates (5/5)
Σ_{i=1}^n (X_i − X̄)² = Σ_{i=1}^n ((X_i − μ) − (X̄ − μ))²
                      = Σ_{i=1}^n (X_i − μ)² − 2(X̄ − μ) Σ_{i=1}^n (X_i − μ) + n(X̄ − μ)²
                      = Σ_{i=1}^n (X_i − μ)² − n(X̄ − μ)²
Note that E(X_i) = μ, E[(X_i − μ)²] = Var(X_i) = σ²,
and E(X̄) = μ, E[(X̄ − μ)²] = Var(X̄) = σ²/n.
Therefore
E(S²) = (1/(n − 1)) E[ Σ_{i=1}^n (X_i − X̄)² ]
      = (1/(n − 1)) E[ Σ_{i=1}^n (X_i − μ)² − n(X̄ − μ)² ]
      = (1/(n − 1)) ( n σ² − n · σ²/n )
      = (1/(n − 1)) (n − 1) σ² = σ²
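A minimal simulation sketch of this result (sample size and σ² below are arbitrary choices): averaging S² over many samples illustrates that the divisor n − 1 gives an unbiased estimate, while the divisor n underestimates σ².

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma2 = 10, 4.0
samples = rng.normal(loc=5.0, scale=np.sqrt(sigma2), size=(200_000, n))
s2_unbiased = samples.var(axis=1, ddof=1)    # divisor n - 1
s2_biased = samples.var(axis=1, ddof=0)      # divisor n
print(s2_unbiased.mean())                    # close to 4.0
print(s2_biased.mean())                      # close to 4.0 * (n-1)/n = 3.6
```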
7.2.2. Minimum Variance Estimates (1/4)
• Which is the better of two unbiased point estimates?
[Figure: probability density functions of θ̂_1 and θ̂_2, both centered at the parameter θ; θ̂_2 is more concentrated about θ than θ̂_1.]
7.2.2. Minimum Variance Estimates (2/4)
SinceVar (ˆ1 Var (ˆ2 ),ˆ2 is a better point estimate thanˆ1.
This can be written
P(| ˆ1 | )P(| ˆ2  | )for any value of   0.
Probability density
function of ˆ
2
Probability density
function of ˆ
1
 
NIPRL

 
 

 
7.2.2. Minimum Variance Estimates (3/4)
• An unbiased point estimate whose variance is smaller than that of any other unbiased point estimate is called the minimum variance unbiased estimate (MVUE).
• Relative efficiency
  – The relative efficiency of an unbiased point estimate θ̂_1 to an unbiased point estimate θ̂_2 is Var(θ̂_2) / Var(θ̂_1).
• Mean squared error (MSE)
  – MSE(θ̂) = E((θ̂ − θ)²)
  – How is it decomposed?
  – Why is it useful?
7.2.2. Minimum Variance Estimates (4/4)
[Figure: probability density functions of two biased point estimates θ̂_1 and θ̂_2; the bias of each estimate is the distance between the center of its distribution and θ.]
Example: two independent measurements
  X_A ~ N(C, 2.97) and X_B ~ N(C, 1.62)
Point estimates of the unknown C:
  Ĉ_A = X_A and Ĉ_B = X_B
They are both unbiased estimates since
  E[Ĉ_A] = C and E[Ĉ_B] = C.
The relative efficiency of Ĉ_A to Ĉ_B is
  Var(Ĉ_B) / Var(Ĉ_A) = 1.62 / 2.97 = 0.55.
Let us consider a new estimate
  Ĉ = p Ĉ_A + (1 − p) Ĉ_B.
Then this estimate is unbiased since
  E[Ĉ] = p E[Ĉ_A] + (1 − p) E[Ĉ_B] = C.
What is the optimal value of p that results in Ĉ having the smallest possible mean squared error (MSE)?
The variance of Ĉ is given by
  Var(Ĉ) = p² Var(Ĉ_A) + (1 − p)² Var(Ĉ_B) = p² σ_1² + (1 − p)² σ_2².
Differentiating with respect to p yields
  d/dp Var(Ĉ) = 2p σ_1² − 2(1 − p) σ_2².
The value of p that minimizes Var(Ĉ):
  d/dp Var(Ĉ) |_{p = p*} = 0  ⇒  p* = (1/σ_1²) / (1/σ_1² + 1/σ_2²)
Therefore, in this example,
  p* = (1/2.97) / (1/2.97 + 1/1.62) = 0.35.
The variance of Ĉ is
  Var(Ĉ) = 1 / (1/2.97 + 1/1.62) = 1.05.
The relative efficiency of Ĉ_B to Ĉ is
  Var(Ĉ) / Var(Ĉ_B) = 1.05 / 1.62 = 0.65.
In general, assuming that we have n independent and unbiased estimates θ̂_i, i = 1, ..., n, with variances σ_i², i = 1, ..., n, respectively, for a parameter θ, we can form the unbiased estimator
  θ̂ = Σ_{i=1}^n (θ̂_i / σ_i²) / Σ_{i=1}^n (1/σ_i²).
The variance of this estimator is
  Var(θ̂) = 1 / Σ_{i=1}^n (1/σ_i²).
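A minimal sketch of this inverse-variance-weighted combination, applied to the two-measurement example above (the helper function and the observed values x_A, x_B are illustrative assumptions, not from the textbook):

```python
import numpy as np

def combine(estimates, variances):
    """Combine independent unbiased estimates, weighting each by 1/variance."""
    estimates = np.asarray(estimates, float)
    weights = 1.0 / np.asarray(variances, float)
    theta_hat = np.sum(weights * estimates) / np.sum(weights)
    var_theta_hat = 1.0 / np.sum(weights)
    return theta_hat, var_theta_hat

# Hypothetical observed measurements x_A and x_B of the same quantity C.
theta_hat, var_hat = combine([10.2, 9.7], [2.97, 1.62])
print(theta_hat, var_hat)    # var_hat is about 1.05, as computed above
```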
Mean squared error (MSE):
Let us consider a point estimate θ̂. Then the mean squared error is defined by
  MSE(θ̂) = E[(θ̂ − θ)²].
Moreover, notice that
  MSE(θ̂) = E[(θ̂ − θ)²]
          = E[(θ̂ − E[θ̂] + E[θ̂] − θ)²]
          = E[(θ̂ − E[θ̂])² + 2(θ̂ − E[θ̂])(E[θ̂] − θ) + (E[θ̂] − θ)²]
          = E[(θ̂ − E[θ̂])²] + (E[θ̂] − θ)²   (the cross term has expectation zero)
          = Var(θ̂) + bias².
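A minimal numerical check of the decomposition MSE = Var + bias², using the biased variance estimator with divisor n as an example (population parameters below are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
n, mu, sigma2 = 8, 0.0, 2.0
samples = rng.normal(mu, np.sqrt(sigma2), size=(300_000, n))
est = samples.var(axis=1, ddof=0)            # biased estimator of sigma^2

mse = np.mean((est - sigma2) ** 2)
var = est.var()
bias = est.mean() - sigma2
print(mse, var + bias ** 2)                  # the two values nearly agree
```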
7.3 Sampling Distributions
7.3.1 Sample Proportion (1/2)
• If X ~ B(n, p), then the sample proportion p̂ = X/n has the approximate distribution
  p̂ ~ N( p, p(1 − p)/n ),
  since E(p̂) = p and
  Var(X) = np(1 − p)  ⇒  Var(p̂) = Var(X/n) = p(1 − p)/n.
7.3.1 Sample Proportion (2/2)
• Standard error of the sample proportion
  The standard error of the sample proportion is defined as
  s.e.(p̂) = √( p(1 − p)/n ),
  but since p is usually unknown, we replace p by the observed value p̂ = x/n to obtain
  s.e.(p̂) = √( p̂(1 − p̂)/n ) = (1/n) √( x(n − x)/n ).
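A minimal sketch of this calculation for hypothetical counts (x successes out of n trials; the numbers are illustrative):

```python
import numpy as np

x, n = 14, 60
p_hat = x / n
se = np.sqrt(p_hat * (1 - p_hat) / n)    # equivalently (1/n)*sqrt(x*(n-x)/n)
print(p_hat, se)
```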
7.3.2 Sample Mean (1/3)
• Distribution of Sample Mean
If X_1, ..., X_n are observations from a population with mean μ and variance σ², then the Central Limit Theorem says that, approximately,
  μ̂ = X̄ ~ N( μ, σ²/n ).
7.3.2 Sample Mean (2/3)
• Standard error of the sample mean
The standard error of the sample mean is defined as
  s.e.(X̄) = σ / √n,
but since σ is usually unknown, we may safely replace σ, when n is large, by the observed value s:
  s.e.(X̄) = s / √n.
7.3.2 Sample Mean (3/3)
If n = 20, the probability that μ̂ = X̄ lies within ±σ/4 of μ is
  P( μ − σ/4 ≤ X̄ ≤ μ + σ/4 ) ≈ P( μ − σ/4 ≤ N(μ, σ²/20) ≤ μ + σ/4 )
                              = P( −√20/4 ≤ N(0, 1) ≤ √20/4 )
                              = Φ(1.12) − Φ(−1.12) = 0.7372
[Figure: probability density function of μ̂ = X̄ when n = 20; about 74% of the probability lies between μ − σ/4 and μ + σ/4.]
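A minimal sketch that reproduces this probability with the standard normal CDF (here z = √20/4 is kept unrounded, so the result is approximately 0.737):

```python
import numpy as np
from scipy.stats import norm

z = np.sqrt(20) / 4              # about 1.12
print(norm.cdf(z) - norm.cdf(-z))
```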
7.3.3 Sample Variance (1/2)
• Distribution of Sample Variance
If X_1, ..., X_n are normally distributed with mean μ and variance σ², then the sample variance S² has the distribution
  S² ~ σ² χ²_{n−1} / (n − 1).
Theorem: if X_i, i = 1, ..., n, is a sample from a normal population having mean μ and variance σ², then X̄ and S² are independent random variables, with X̄ being normal with mean μ and variance σ²/n and (n − 1)S²/σ² being chi-square with n − 1 degrees of freedom.
(proof)
Let Y_i = X_i − μ. Then
  Σ_{i=1}^n (Y_i − Ȳ)² = Σ_{i=1}^n Y_i² − n Ȳ²,
or equivalently,
  Σ_{i=1}^n (X_i − X̄)² = Σ_{i=1}^n (X_i − μ)² − n(X̄ − μ)².
Dividing this equation by σ², we get
  Σ_{i=1}^n ((X_i − μ)/σ)² = Σ_{i=1}^n ((X_i − X̄)/σ)² + ( √n (X̄ − μ)/σ )².
Cf. Let X and Y be independent chi-square random variables
with m and n degrees of freedom respectively. Then,
Z=X+Y is a chi-square random variable with m+n degrees of
freedom.
In the previous equation,
  ( √n (X̄ − μ)/σ )² ~ χ²_1  and  Σ_{i=1}^n ((X_i − μ)/σ)² ~ χ²_n.
Therefore,
  Σ_{i=1}^n ((X_i − X̄)/σ)² = (n − 1) S² / σ² ~ χ²_{n−1}.
7.3.3 Sample Variance (2/2)
• t-statistic
  X̄ ~ N( μ, σ²/n )  ⇒  √n (X̄ − μ)/σ ~ N(0, 1),
  and also
  S ~ σ √( χ²_{n−1} / (n − 1) ),
  so that
  √n (X̄ − μ) / S = [ √n (X̄ − μ)/σ ] / [ S/σ ] ~ N(0, 1) / √( χ²_{n−1} / (n − 1) ) ~ t_{n−1}.
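A minimal simulation sketch of this result (μ, σ, and n below are arbitrary choices): the statistic √n(X̄ − μ)/S computed from normal samples behaves like a t distribution with n − 1 degrees of freedom.

```python
import numpy as np
from scipy.stats import t

rng = np.random.default_rng(3)
n, mu, sigma = 5, 10.0, 2.0
samples = rng.normal(mu, sigma, size=(200_000, n))
t_stat = np.sqrt(n) * (samples.mean(axis=1) - mu) / samples.std(axis=1, ddof=1)

# Compare an empirical quantile with the corresponding t_{n-1} quantile.
print(np.quantile(t_stat, 0.975), t.ppf(0.975, df=n - 1))
```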
7.4 Constructing Parameter Estimates
7.4.1 The Method of Moments (1/3)
• Method of moments point estimate for One Parameter
  If a data set consists of observations x_1, ..., x_n from a probability distribution that depends upon one unknown parameter θ, the method of moments point estimate θ̂ of the parameter is found by solving the equation x̄ = E(X).
7.4.1 The Method of Moments (2/3)
• Method of moments point estimates for Two Parameters
  For two unknown parameters θ_1 and θ_2, the method of moments point estimates are found by solving the equations x̄ = E(X) and s² = Var(X).
7.4.1 The Method of Moments (3/3)
• Examples
  – For example, with normally distributed data, since E(X) = μ and Var(X) = σ² for N(μ, σ²), the method of moments gives μ̂ = x̄ and σ̂² = s².
  – Suppose that the data observations 2.0 2.4 3.1 3.9 4.5 4.8 5.7 9.9 are obtained from a U(0, θ) distribution. Then x̄ = 4.5375 and E(X) = θ/2, so θ̂ = 2 × 4.5375 = 9.075 (see the sketch after this list).
  – What if the distribution is exponential with parameter λ?
7.4.2 Maximum Likelihood Estimates (1/4)
• Maximum Likelihood Estimate for One Parameter
If a data set consists of observations x_1, ..., x_n from a probability distribution f(x, θ) depending upon one unknown parameter θ, the maximum likelihood estimate θ̂ of the parameter is found by maximizing the likelihood function
  L(x_1, ..., x_n, θ) = f(x_1, θ) × ⋯ × f(x_n, θ).
7.4.2 Maximum Likelihood Estimates (2/4)
• Example
- If x_1, ..., x_n are a set of Bernoulli observations, with f(1, p) = p and f(0, p) = 1 − p, i.e. f(x_i, p) = p^{x_i} (1 − p)^{1 − x_i}.
- The likelihood function is
  L(p; x_1, ..., x_n) = Π_{i=1}^n p^{x_i} (1 − p)^{1 − x_i} = p^x (1 − p)^{n − x},
  where x = x_1 + ⋯ + x_n, and the m.l.e. p̂ is the value that maximizes this.
- The log-likelihood is ln(L) = x ln(p) + (n − x) ln(1 − p), and
  d ln(L)/dp = x/p − (n − x)/(1 − p) = 0  ⇒  p̂ = x/n.
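A minimal sketch that maximizes this Bernoulli log-likelihood numerically and checks the result against the closed form p̂ = x/n (the data vector is made up for illustration):

```python
import numpy as np
from scipy.optimize import minimize_scalar

data = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 1])
x, n = data.sum(), data.size

def neg_log_lik(p):
    # negative log-likelihood: -(x ln p + (n - x) ln(1 - p))
    return -(x * np.log(p) + (n - x) * np.log(1 - p))

res = minimize_scalar(neg_log_lik, bounds=(1e-6, 1 - 1e-6), method="bounded")
print(res.x, x / n)            # both close to 0.6
```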
7.4.2 Maximum Likelihood Estimates (3/4)
• Maximum Likelihood Estimates for Two Parameters
  For two unknown parameters θ_1 and θ_2, the maximum likelihood estimates θ̂_1 and θ̂_2 are found by maximizing the likelihood function
  L(θ_1, θ_2; x_1, ..., x_n) = f(x_1; θ_1, θ_2) × ⋯ × f(x_n; θ_1, θ_2).
7.4.2 Maximum Likelihood Estimates (4/4)
• Example
  The normal distribution has
  f(x, μ, σ²) = (1/√(2πσ²)) exp( −(x − μ)²/(2σ²) ),
  so the likelihood is
  L(x_1, ..., x_n, μ, σ²) = Π_{i=1}^n f(x_i, μ, σ²) = (1/(2πσ²))^{n/2} exp( −Σ_{i=1}^n (x_i − μ)²/(2σ²) ),
  so that the log-likelihood is
  ln(L) = −(n/2) ln(2πσ²) − Σ_{i=1}^n (x_i − μ)²/(2σ²).
  Then
  d ln(L)/dμ = Σ_{i=1}^n (x_i − μ)/σ²,
  d ln(L)/d(σ²) = −n/(2σ²) + Σ_{i=1}^n (x_i − μ)²/(2σ⁴).
  Setting the two equations to zero gives
  μ̂ = x̄,  σ̂² = Σ_{i=1}^n (x_i − μ̂)²/n = Σ_{i=1}^n (x_i − x̄)²/n.
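A minimal sketch of these normal MLEs on illustrative data (the values are arbitrary); note that the MLE of the variance uses divisor n, unlike the unbiased sample variance S²:

```python
import numpy as np

x = np.array([4.9, 5.3, 4.7, 5.8, 5.1, 4.6, 5.4])
mu_hat = x.mean()
sigma2_mle = x.var(ddof=0)     # divisor n (maximum likelihood estimate)
s2 = x.var(ddof=1)             # divisor n - 1 (unbiased sample variance)
print(mu_hat, sigma2_mle, s2)
```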
7.4.3 Examples (1/6)
• Glass Sheet Flaws
  At a glass manufacturing company, 30 randomly selected sheets of glass are inspected. If the distribution of the number of flaws per sheet is taken to be a Poisson distribution, how should the parameter λ be estimated?
  – The method of moments:
    E(X) = λ  ⇒  λ̂ = x̄
7.4.3 Examples (2/6)
• The maximum likelihood estimate:
  f(x_i, λ) = e^{−λ} λ^{x_i} / x_i!, so that the likelihood is
  L(x_1, ..., x_n, λ) = Π_{i=1}^n f(x_i, λ) = e^{−nλ} λ^{(x_1 + ⋯ + x_n)} / (x_1! × ⋯ × x_n!).
  The log-likelihood is therefore
  ln(L) = −nλ + (x_1 + ⋯ + x_n) ln(λ) − ln(x_1! × ⋯ × x_n!),
  so that
  d ln(L)/dλ = −n + (x_1 + ⋯ + x_n)/λ = 0  ⇒  λ̂ = x̄.
7.4.3 Examples (3/6)
• Example 26: Fish Tagging and Recapture
  Suppose a fisherman wants to estimate the fish stock N of a lake, and that 34 fish have been tagged and released back into the lake. If, over a period of time, the fisherman catches 50 fish and 9 of them are tagged, then an intuitive estimate of the total number of fish is
  N̂ = (34 × 50) / 9 ≈ 189.
  This assumes that the proportion of tagged fish in the lake is roughly equal to the proportion of the fisherman's catch that is tagged.
7.4.3 Examples (4/6)
Under the assumption that all the fish are equally likely to be caught, the distribution of the number of tagged fish X in the fisherman's catch of 50 fish is a hypergeometric distribution with r = 34, n = 50, and N unknown.
  Hypergeometric: P(X = x | N, n, r) = C(r, x) C(N − r, n − x) / C(N, n),
  where C(a, b) denotes the binomial coefficient "a choose b", and
  E(X) = nr/N = (50 × 34)/N.
Since there is only one observation x, so that x̄ = x, equating E(X) = nr/N = x gives
  N̂ = (50 × 34)/9 ≈ 188.89.
7.4.3 Examples (5/6)
Under a binomial approximation, in this case the success probability p = r/N is estimated to be
  p̂ = x/n = 9/50.
Hence
  N̂ = r/p̂ = (34 × 50)/9.
• Example 36: Bee Colonies
  The data on the proportion of worker bees that leave a colony with a queen bee are
  0.28 0.32 0.09 0.35 0.45 0.41 0.06 0.16 0.16 0.46 0.35 0.29 0.31.
  If the entomologist wishes to model this proportion with a beta distribution, how should the parameters be estimated?
7.4.3 Examples (6/6)
Beta: f(x | a, b) = Γ(a + b)/(Γ(a)Γ(b)) · x^{a−1} (1 − x)^{b−1},  0 < x < 1, a > 0, b > 0,
  E(X) = a/(a + b),  Var(X) = ab / ((a + b)²(a + b + 1)).
With x̄ = 0.3007 and s² = 0.01966, the point estimates â and b̂ are the solutions to the equations
  a/(a + b) = 0.3007  and  ab / ((a + b)²(a + b + 1)) = 0.01966,
which are â = 2.92 and b̂ = 6.78.
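A minimal sketch that solves these two moment equations in closed form, using the sample moments quoted above; rearranging Var(X) = E(X)(1 − E(X))/(a + b + 1) gives a + b = x̄(1 − x̄)/s² − 1:

```python
xbar, s2 = 0.3007, 0.01966
total = xbar * (1 - xbar) / s2 - 1      # a + b
a_hat = xbar * total
b_hat = (1 - xbar) * total
print(a_hat, b_hat)                     # approximately 2.92 and 6.78
```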
MLE for U(0, θ)
• For some distributions, the MLE cannot be found by differentiation; you have to look at the curve of the likelihood function itself.
• For U(0, θ), the MLE is θ̂ = max{X_1, ..., X_n}, since the likelihood L(θ) = θ^{−n} for θ ≥ max{x_1, ..., x_n} (and 0 otherwise) is decreasing in θ (see the sketch below).
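A minimal sketch that evaluates the U(0, θ) likelihood on a grid for made-up data and confirms it is maximized at θ̂ = max(x):

```python
import numpy as np

x = np.array([1.3, 4.2, 2.8, 3.6, 0.9])
thetas = np.linspace(0.1, 10, 1000)
# L(theta) = theta^(-n) if theta >= max(x), else 0
lik = np.where(thetas >= x.max(), thetas ** (-x.size), 0.0)
print(x.max(), thetas[np.argmax(lik)])   # both near 4.2
```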