DATA ANALYSIS - DCU School of Computing
DATA ANALYSIS
Module Code: CA660
Lecture Block 6: Alternative
estimation methods and their implementation
MAXIMUM LIKELIHOOD ESTIMATION
• Recall general points: Estimation, definition of the Likelihood
function for a vector of parameters θ and a set of values x.
Find the most likely value of θ = maximise the Likelihood fn.
Also defined the Log-likelihood (Support fn. S(θ)) and its derivative,
the Score, together with the Information content per observation,
which for a single-parameter likelihood is given by

$I(\theta) = E\left[\left(\frac{\partial}{\partial\theta}\log L(\theta \mid x)\right)^{2}\right] = -E\left[\frac{\partial^{2}}{\partial\theta^{2}}\log L(\theta \mid x)\right]$
• Why MLE? (Need to know underlying distribution).
Properties: Consistency; sufficiency; asymptotic efficiency (linked
to variance); unique maximum; invariance and, hence most
convenient parameterisation; usually MVUE; amenable to
conventional optimisation methods.
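As a quick check on the two expressions for the Information above, the sketch below (a minimal illustration, not from the slides) evaluates both expectations exactly for a single Bernoulli(p) observation, where the information is known to be 1/(p(1−p)):

```python
# Fisher information of one Bernoulli(p) observation:
# log L(p | x) = x*log(p) + (1 - x)*log(1 - p), with x in {0, 1}.
# Both definitions of I(p) should equal 1/(p*(1-p)).

def score_sq_expectation(p):
    # E[(d/dp log L)^2]; the score is x/p - (1-x)/(1-p)
    return p * (1 / p) ** 2 + (1 - p) * (-1 / (1 - p)) ** 2

def neg_curvature_expectation(p):
    # -E[d^2/dp^2 log L]; the second derivative is -x/p^2 - (1-x)/(1-p)^2
    return p * (1 / p ** 2) + (1 - p) * (1 / (1 - p) ** 2)

p = 0.3
analytic = 1 / (p * (1 - p))   # the known closed form
```

Both expectations reduce algebraically to 1/p + 1/(1−p) = 1/(p(1−p)), illustrating the equality of the two definitions for this model.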
VARIANCE, BIAS & CONFIDENCE
• Variance of an Estimator - usual form
$\hat{\sigma}^2_{\hat\theta} = \frac{1}{k}\sum_{i=1}^{k}(\hat\theta_i - \bar{\hat\theta})^2$ or $\hat{\sigma}^2 = \frac{1}{k-1}\sum_{i=1}^{k}(\hat\theta_i - \bar{\hat\theta})^2$
for k independent estimates
• For a large sample, the variance of the MLE can be approximated by
$\hat{\sigma}^2_{\hat\theta} = \frac{1}{n I(\theta)}$
can also estimate empirically, using re-sampling* techniques.
• Variance of a linear function (of several estimates) – (common
need in genomics analysis, e.g. heritability), in risk analysis
• Recall Bias of the Estimator $= E(\hat\theta) - \theta$;
then the Mean Square Error is defined to be $MSE = E(\hat\theta - \theta)^2$, which
expands to $E\{[\hat\theta - E(\hat\theta)] + [E(\hat\theta) - \theta]\}^2 = \sigma^2_{\hat\theta} + [E(\hat\theta) - \theta]^2$
so we have the basis for C.I. and tests of hypothesis.
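The MSE expansion can be verified numerically: for any distribution of estimates, E(θ̂ − θ)² equals the variance plus the squared bias. A toy check with a made-up four-value estimator distribution:

```python
# Numerical check of MSE = variance + bias^2 for a toy discrete estimator.
# theta_hat takes each value in `vals` with equal probability; true theta = 2.0.
vals = [1.5, 2.0, 2.2, 2.7]
theta = 2.0

mean = sum(vals) / len(vals)                           # E(theta_hat)
var = sum((v - mean) ** 2 for v in vals) / len(vals)   # sigma^2_theta_hat
bias = mean - theta                                    # E(theta_hat) - theta
mse = sum((v - theta) ** 2 for v in vals) / len(vals)  # E[(theta_hat - theta)^2]
# mse equals var + bias**2 up to floating-point rounding
```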
COMMONLY-USED METHODS of obtaining
MLE
• Analytical - solving $\frac{dL}{d\theta} = 0$ or $\frac{dS}{d\theta} = 0$ when simple solutions exist
• Grid search or likelihood profile approach
• Newton-Raphson iteration methods
• EM (expectation and maximisation) algorithm
N.B. Log-likelihood used, because:
• maximum at the same θ value as the Likelihood
• easier to compute
• close relationship between statistical properties of MLE and Log-likelihood
MLE Methods in outline
Analytical : - recall Binomial example earlier
Score: $\frac{dS(\theta)}{d\theta} = \frac{x}{\theta} - \frac{n-x}{1-\theta} = 0 \;\Rightarrow\; \hat\theta = \frac{x}{n}$
• Example: For the Normal, the MLEs of the mean and variance (taking
derivatives w.r.t. mean and variance separately) are equivalent to the
sample mean and the actual variance (i.e. $\frac{1}{N}\sum_i (x_i - \bar{x})^2$)
- unbiased if the mean is known, biased if not.
• Invariance : One-to-one relationships preserved
• Used: when MLE has a simple solution
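The binomial case above can be confirmed numerically: the score vanishes at θ̂ = x/n, which is also where the support peaks. A minimal sketch (x = 30, n = 100 are illustrative counts, not from the slides):

```python
import math

# Binomial support S(theta) = x*log(theta) + (n-x)*log(1-theta);
# Score = x/theta - (n-x)/(1-theta), which vanishes at theta_hat = x/n.
x, n = 30, 100          # illustrative counts

def score(theta):
    return x / theta - (n - x) / (1 - theta)

def support(theta):
    return x * math.log(theta) + (n - x) * math.log(1 - theta)

theta_hat = x / n       # analytical MLE, 0.3
```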
MLE Methods in outline contd.
Grid Search – Computational
Plot likelihood or log-likelihood vs parameter. Various features
• Relative Likelihood =Likelihood/Max. Likelihood (ML set =1).
Peak of R.L. can be visually identified /sought algorithmically. e.g.
$S(\theta) = \mathrm{Log}\left[\theta^{20}(1-\theta)^{80} + \theta^{80}(1-\theta)^{20}\right]$
Plot likelihood over the parameter space range $0 \le \theta \le 1$ - gives 2 peaks,
symmetric, at $\hat\theta = 0.2$ and $\hat\theta = 0.8$ (likelihood profile for e.g. the well-known
mixed linkage analysis problem, or for a similar example of
populations following known proportion splits).
If we now constrain $0 \le \theta \le 0.5$, the MLE solution is unique, $\hat\theta = 0.2$ =
R.F. between genes (possible mixed linkage phase).
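The two-peak profile above can be reproduced with a brute-force grid search; a minimal sketch (the 0.001 grid resolution is an arbitrary choice):

```python
import math

# S(theta) = log[ theta^20 (1-theta)^80 + theta^80 (1-theta)^20 ]:
# two symmetric peaks on (0, 1); constraining theta <= 0.5 leaves
# a unique maximum near 0.2, as on the slide.
def support(t):
    return math.log(t ** 20 * (1 - t) ** 80 + t ** 80 * (1 - t) ** 20)

grid = [i / 1000 for i in range(1, 1000)]        # 0.001-step grid on (0, 1)
unconstrained = max(grid, key=support)           # lands on one of the two peaks
theta_hat = max((t for t in grid if t <= 0.5), key=support)   # unique: ~0.2
```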
MLE Methods in outline contd.
• Graphic/numerical Implementation - take an initial estimate of θ. The direction
of search is determined by evaluating the likelihood on both sides of θ;
the search takes the direction giving an increase, because we are looking for a maximum.
Initial search increments are large, e.g. 0.1; then, when the likelihood
change starts to decrease or becomes negative, stop and refine the
increment.
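The coarse-to-fine search described above might be sketched as follows, using a binomial support as a stand-in objective (x = 30, n = 100 are made-up counts):

```python
import math

# Coarse-to-fine grid search: evaluate on a coarse grid, then repeatedly
# narrow the bracket around the best point and refine the increment.
def refine_search(support, lo, hi, rounds=4, points=11):
    for _ in range(rounds):
        grid = [lo + (hi - lo) * i / (points - 1) for i in range(points)]
        best = max(grid, key=support)              # uphill: keep the largest
        width = (hi - lo) / (points - 1)           # current increment
        lo, hi = max(lo, best - width), min(hi, best + width)
    return best

# Stand-in example: binomial support with illustrative counts x=30, n=100,
# whose analytical maximum is at 0.3.
theta_hat = refine_search(
    lambda p: 30 * math.log(p) + 70 * math.log(1 - p), 0.01, 0.99)
```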
Issues:
• Multiple peaks – can miss the global maximum; computationally
intensive; see e.g.
http://statgen.iop.kcl.ac.uk/bgim/mle/sslike_1.html
• Multiple Parameters - grid search. Interpretation of Likelihood
profiles can be difficult, e.g.
http://blogs.sas.com/content/iml/2011/10/12/maximumlikelihood-estimation-in-sasiml/
Example in outline
• Data e.g. used to show a linkage relationship (non-independence) between
e.g. a marker and a given disease gene, or (e.g. between sex and purchase) of
computer games.
Escapes = individuals who are susceptible, but show no disease phenotype
under experimental conditions (express interest but no purchase record).
So define α, θ as the proportion of escapes and the R.F. respectively.
1 − α is the penetrance for the disease trait, or of purchasing, i.e.
P{individual with susceptible genotype has disease phenotype} or
P{individual of given sex and interested who actually buys}.
Purpose of expt. - typically to estimate the R.F. θ between marker and gene, or the
proportion of a sex that purchases.
• Use: Support function = Log-Likelihood. Often quite complex, e.g. for the above
example, might have
$S(\alpha, \theta) = k_1\ln(1-\alpha) + k_2\ln(\alpha) + k_3\ln(\theta) + k_4\ln(1-\theta)$
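A two-parameter grid search over a support of this form might look as follows. The counts k1..k4 are invented for illustration (not the slide's data), chosen so that the maximum lands near the grid-search result quoted later, (α̂, θ̂) = (0.02, 0.22):

```python
import math

# Two-parameter grid search over
# S(alpha, theta) = k1*ln(1-alpha) + k2*ln(alpha) + k3*ln(theta) + k4*ln(1-theta).
# k1..k4 below are hypothetical counts for illustration only.
k1, k2, k3, k4 = 98, 2, 22, 78

def support(a, t):
    return (k1 * math.log(1 - a) + k2 * math.log(a)
            + k3 * math.log(t) + k4 * math.log(1 - t))

grid = [i / 100 for i in range(1, 100)]          # 0.01-step grid on (0, 1)
a_hat, t_hat = max(((a, t) for a in grid for t in grid),
                   key=lambda p: support(*p))    # joint maximiser
```

Because this support is separable in α and θ, the joint maximum simply combines the two marginal maxima, k2/(k1+k2) and k3/(k3+k4).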
Example contd.
• Setting 1st derivatives (Scores) w.r.t. α = 0 and w.r.t. θ = 0.
• Expected value of the Score w.r.t. α is zero (see analogies in classical
sampling/hypothesis testing). Similarly for θ. Here, however, there is no simple
analytical solution, so we cannot solve directly for either.
• Using grid search, the likelihood reaches its maximum at e.g. $\hat\alpha = 0.02$, $\hat\theta = 0.22$
• In general, this type of experiment tests H0: Independence between the
factors (marker and gene), (sex and purchase) ($\theta = 0.5$)
• and H0: no escapes ($\alpha = 0$)
Uses Likelihood Ratio Test statistics (M.L.E. $\chi^2$ equivalent).
MLE Methods in outline contd.
Newton-Raphson Iteration
Have Score $= \frac{dS(\theta)}{d\theta} = 0$ from previously. N-R consists of replacing the Score by the linear
terms of its Taylor expansion, so if $\theta''$ is a solution and $\theta'$ = 1st guess:

$\frac{dS(\theta')}{d\theta} + (\theta'' - \theta')\,\frac{d^2 S(\theta')}{d\theta^2} = 0
\quad\Rightarrow\quad
\theta'' = \theta' - \frac{dS(\theta')/d\theta}{d^2 S(\theta')/d\theta^2}$

Repeat with $\theta''$ replacing $\theta'$. Each iteration fits a parabola to the Likelihood Fn.
• Problems - Multiple peaks, zero Information, extreme estimates
• Multiple parameters – need matrix notation, where the Score matrix e.g. has
elements = derivatives of $S(\alpha, \theta)$ w.r.t. $\alpha$ and $\theta$ respectively. Similarly, the
Information matrix has terms of the form
$-E\left[\frac{\partial^2}{\partial\alpha^2}S(\alpha,\theta)\right]$, $-E\left[\frac{\partial^2}{\partial\alpha\,\partial\theta}S(\alpha,\theta)\right]$, etc.
Estimates are $\theta'' = \theta' + N^{-1} I^{-1}(\theta')\,S(\theta')$
[Figure: Log-likelihood, i.e. $S(\theta)$, with successive (1st, 2nd) N-R parabola fits]
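The single-parameter N-R update can be sketched for the binomial support (x = 30, n = 100 are illustrative counts; the analytical answer is x/n = 0.3):

```python
# Newton-Raphson on the score: theta_new = theta - S'(theta)/S''(theta),
# for the binomial support with illustrative counts x = 30, n = 100.
x, n = 30, 100

def score(p):            # dS/dp
    return x / p - (n - x) / (1 - p)

def score_deriv(p):      # d2S/dp2, negative near the maximum
    return -x / p ** 2 - (n - x) / (1 - p) ** 2

theta = 0.5              # initial guess
for _ in range(50):
    step = score(theta) / score_deriv(theta)
    theta -= step
    if abs(step) < 1e-10:   # converged
        break
```

For this concave support, the iteration settles on 0.3 almost immediately; with multiple peaks or near-zero Information (the problems noted above), the same loop can diverge or stall.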
MLE Methods in outline contd.
Expectation-Maximisation Algorithm - Iterative; for incomplete data.
(Much genomic, financial and other data fit this situation, e.g. linkage analysis
with marker genotypes of F2 progeny. Usually 9 categories are observed for the 2-locus,
2-allele model, but 16 categories = complete info., while 14 give info. on linkage.
Some are hidden, but if the linkage parameter is known, expected frequencies can be
predicted and the complete data restored using expectation.)
• Steps: (1) Expectation estimates the statistics of the complete data, given the
observed incomplete data.
• (2) Maximisation uses the estimated complete data to give the MLE.
• Iterate until convergence (no further change).
E-M contd.
Implementation
• Initial guess, $\theta'$, chosen (e.g. = 0.25, say, = R.F.).
• Taking this as “true”, the complete data is estimated by distributional
statements, e.g. P(individual is recombinant, given observed genotype) for
R.F. estimation.
• MLE estimate $\theta''$ computed. This, for R.F., is the sum of recombinants / N.
• Thus the MLE, for $f_i$ the observed count of genotype class $G_i$, is
$\theta'' = \frac{1}{N}\sum_i f_i\, P(R \mid G_i)$
• Convergence: $\theta'' = \theta'$, or $|\theta'' - \theta'| <$ tolerance (0.00001).
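The E-M loop above can be sketched on the classic genetic-linkage multinomial (the standard textbook example, used here as an assumed stand-in for the slide's F2 data): four categories with probabilities (1/2 + t/4, (1−t)/4, (1−t)/4, t/4), where the first observed count mixes a 1/2 component with a t/4 component.

```python
# E-M sketch on the classic genetic-linkage multinomial.
# Category probabilities: (1/2 + t/4, (1-t)/4, (1-t)/4, t/4).
y = [125, 18, 20, 34]                      # observed (incomplete) counts
t = 0.25                                   # initial guess, as on the slide

for _ in range(500):
    # E-step: expected part of y[0] attributable to its t/4 component
    z2 = y[0] * (t / 4) / (0.5 + t / 4)
    # M-step: complete-data MLE of t (t-categories over t- plus (1-t)-categories)
    t_new = (z2 + y[3]) / (z2 + y[1] + y[2] + y[3])
    if abs(t_new - t) < 1e-9:              # convergence tolerance
        t = t_new
        break
    t = t_new
```

For these counts the iteration converges to t ≈ 0.627, and each pass alternates exactly the two steps on the slide: restore the complete data by expectation, then maximise.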
LIKELIHOOD : C.I. and H.T.
• Likelihood Ratio Tests – c.f. with $\chi^2$.
• Principal Advantage of G is Power, as unknown parameters are involved in the
hypothesis test.
Have: Likelihood of θ taking a value $\theta_A$ which maximises
it, i.e. its MLE, and the likelihood under H0: $\theta = \theta_N$ (e.g. $\theta_N = 0.5$)
• Form of L.R. Test Statistic
$G = -2\,\mathrm{Log}\,\frac{L(\theta_N \mid x)}{L(\theta_A \mid x)}$ or, conventionally, $G = 2\,\mathrm{Log}\,\frac{L(\theta_A \mid x)}{L(\theta_N \mid x)}$
- choose; the latter is easier to interpret.
• Distribution of G ~ approx. $\chi^2$ (d.o.f. = difference in dimension of
parameter spaces for $L(\theta_A)$, $L(\theta_N)$)
• Goodness of Fit: notation as for $\chi^2$, G ~ $\chi^2_{n-1}$:
$G = 2\sum_{i=1}^{n} O_i \,\mathrm{Log}\,\frac{O_i}{E_i}$
• Independence: $G = 2\sum_{i=1}^{r}\sum_{j=1}^{c} O_{ij}\,\mathrm{Log}\,\frac{O_{ij}}{E_{ij}}$
- notation again as for $\chi^2$
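The goodness-of-fit G statistic can be computed directly and compared with the Pearson chi-square on the same counts; the observed counts below are made up for illustration:

```python
import math

# Goodness-of-fit G statistic, G = 2 * sum O_i * Log(O_i / E_i),
# alongside the Pearson chi-square on the same (made-up) counts.
def g_stat(obs, exp):
    return 2 * sum(o * math.log(o / e) for o, e in zip(obs, exp) if o > 0)

def pearson(obs, exp):
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))

obs = [60, 40]
exp = [50, 50]           # H0: theta = 0.5 with n = 100
G = g_stat(obs, exp)     # refer to chi-square with 1 d.o.f.
X2 = pearson(obs, exp)
```

Here G ≈ 4.03 and the Pearson statistic is 4.0, illustrating how close the two usually are; both exceed the 5% chi-square cut-off of 3.84 on 1 d.o.f.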
Likelihood C. I.’s – graphical method
• Example: Consider the following Likelihood function: $L(\theta) = (1-\theta)^a \theta^b$,
where θ is the unknown parameter and a, b are observed counts.
• For 4 data sets observed:
A: (a,b) = (8,2), B: (a,b) = (16,4), C: (a,b) = (80,20), D: (a,b) = (400,100)
• Likelihood estimates can be plotted vs. possible parameter values, with
MLE = peak value,
e.g. MLE $\hat\theta = 0.2$, with $L_{max} = 0.0067$ for A and $L_{max} \approx 4.5\times10^{-5}$ for B, etc.
Set A: $\mathrm{Log}\,L_{max} - \mathrm{Log}\,L = \mathrm{Log}(0.0067) - \mathrm{Log}(0.00091) = 2$ gives the 95% C.I.,
so θ = (0.035, 0.496), corresponding to L = 0.00091, is the 95% C.I. for A.
Similarly, manipulating this expression, the Likelihood value corresponding to the
95% confidence interval is given as $L = (7.389)^{-1} L_{max}$ (since $e^2 \approx 7.389$).
Note: Usually plot Log-likelihood vs parameter, rather than Likelihood.
As the sample size increases, the C.I. becomes narrower and more symmetric.
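The graphical interval for data set A can be reproduced numerically by keeping every θ whose likelihood exceeds the cut-off L_max/e². With this exact cut-off the upper end comes out near 0.51, slightly above the slide's rounded 0.496:

```python
import math

# Likelihood interval for L(theta) = (1-theta)^a * theta^b, data set A:
# keep all theta with L(theta) >= Lmax / e^2 (= Lmax / 7.389).
a, b = 8, 2

def lik(t):
    return (1 - t) ** a * t ** b

theta_hat = b / (a + b)                  # MLE = 0.2
L_max = lik(theta_hat)                   # ~0.0067
cutoff = L_max / math.exp(2)             # ~0.00091

grid = [i / 10000 for i in range(1, 10000)]
inside = [t for t in grid if lik(t) >= cutoff]
ci = (min(inside), max(inside))          # lower end ~0.035, upper end ~0.51
```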
Maximum Likelihood Benefits
• Strong estimator properties – sufficiency, efficiency, consistency,
non-bias etc., as before
• Good Confidence Intervals
Coverage probability realised and intervals meaningful
• MLE a good estimator:
MSE consistent: $\lim_{n\to\infty} E(\hat\theta - \theta)^2 = 0$
Absence of Bias: $E(\hat\theta) = \theta$ - does not “stand alone” – minimum variance important
Asymptotically Normal: $\dfrac{\hat\theta - \theta}{\sigma_{\hat\theta}} \sim N(0, 1)$ as $n \to \infty$
Precise for large samples - inferences valid, ranges realistic