DATA ANALYSIS - DCU School of Computing
DATA ANALYSIS
Module Code: CA660
Lecture Block 6: Alternative
estimation methods and their implementation
MAXIMUM LIKELIHOOD ESTIMATION
• Recall general points: Estimation, definition of the Likelihood
function for a vector of parameters θ and a set of values x.
Find the most likely value of θ = maximise the Likelihood fn.
Also defined the Log-likelihood (Support fn. S(θ)) and its derivative,
the Score, together with the Information content per observation,
which for a single-parameter likelihood is given by

$I(\theta) = E\left[\left(\frac{\partial}{\partial\theta}\log L(\theta \mid x)\right)^{2}\right] = -E\left[\frac{\partial^{2}}{\partial\theta^{2}}\log L(\theta \mid x)\right]$
• Why MLE? (Need to know underlying distribution).
Properties: Consistency; sufficiency; asymptotic efficiency (linked
to variance); unique maximum; invariance and, hence most
convenient parameterisation; usually MVUE; amenable to
conventional optimisation methods.
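As a quick check on the two expressions for the Information above, the sketch below (a minimal illustration, not from the slides) evaluates both expectations exactly for a single Bernoulli(p) observation, where the information is known to be 1/(p(1−p)):

```python
# Fisher information of one Bernoulli(p) observation:
# log L(p | x) = x*log(p) + (1 - x)*log(1 - p), with x in {0, 1}.
# Both definitions of I(p) should equal 1/(p*(1-p)).

def score_sq_expectation(p):
    # E[(d/dp log L)^2]; the score is x/p - (1-x)/(1-p)
    return p * (1 / p) ** 2 + (1 - p) * (-1 / (1 - p)) ** 2

def neg_curvature_expectation(p):
    # -E[d^2/dp^2 log L]; the second derivative is -x/p^2 - (1-x)/(1-p)^2
    return p * (1 / p ** 2) + (1 - p) * (1 / (1 - p) ** 2)

p = 0.3
analytic = 1 / (p * (1 - p))   # the known closed form
```

Both expectations reduce algebraically to 1/p + 1/(1−p) = 1/(p(1−p)), illustrating the equality of the two definitions for this model.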
VARIANCE, BIAS & CONFIDENCE
• Variance of an Estimator - usual form
$\hat{\sigma}^2_{\hat\theta} = \frac{1}{k}\sum_{i=1}^{k}(\hat\theta_i - \bar{\hat\theta})^2$ or $\hat{\sigma}^2 = \frac{1}{k-1}\sum_{i=1}^{k}(\hat\theta_i - \bar{\hat\theta})^2$
for k independent estimates
• For a large sample, the variance of the MLE can be approximated by
$\hat{\sigma}^2_{\hat\theta} = \frac{1}{n I(\theta)}$
can also estimate empirically, using re-sampling* techniques.
• Variance of a linear function (of several estimates) – (common
need in genomics analysis, e.g. heritability), in risk analysis
• Recall Bias of the Estimator $= E(\hat\theta) - \theta$;
then the Mean Square Error is defined to be $MSE = E(\hat\theta - \theta)^2$, which
expands to $E\{[\hat\theta - E(\hat\theta)] + [E(\hat\theta) - \theta]\}^2 = \sigma^2_{\hat\theta} + [E(\hat\theta) - \theta]^2$
so we have the basis for C.I. and tests of hypothesis.
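The MSE expansion can be verified numerically: for any distribution of estimates, E(θ̂ − θ)² equals the variance plus the squared bias. A toy check with a made-up four-value estimator distribution:

```python
# Numerical check of MSE = variance + bias^2 for a toy discrete estimator.
# theta_hat takes each value in `vals` with equal probability; true theta = 2.0.
vals = [1.5, 2.0, 2.2, 2.7]
theta = 2.0

mean = sum(vals) / len(vals)                           # E(theta_hat)
var = sum((v - mean) ** 2 for v in vals) / len(vals)   # sigma^2_theta_hat
bias = mean - theta                                    # E(theta_hat) - theta
mse = sum((v - theta) ** 2 for v in vals) / len(vals)  # E[(theta_hat - theta)^2]
# mse equals var + bias**2 up to floating-point rounding
```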
COMMONLY-USED METHODS of obtaining
MLE
• Analytical - solving $\frac{dL}{d\theta} = 0$ or $\frac{dS}{d\theta} = 0$ when simple solutions exist
• Grid search or likelihood profile approach
• Newton-Raphson iteration methods
• EM (expectation and maximisation) algorithm
N.B. Log-likelihood used, because:
• maximum at the same θ value as the Likelihood
• easier to compute
• close relationship between statistical properties of MLE and Log-likelihood
MLE Methods in outline
Analytical : - recall Binomial example earlier
Score: $\frac{dS(\theta)}{d\theta} = \frac{x}{\theta} - \frac{n-x}{1-\theta} = 0 \;\Rightarrow\; \hat\theta = \frac{x}{n}$
• Example: For the Normal, the MLEs of the mean and variance (taking
derivatives w.r.t. mean and variance separately) are equivalent to the
sample mean and the actual variance (i.e. $\frac{1}{N}\sum_i (x_i - \bar{x})^2$)
- unbiased if the mean is known, biased if not.
• Invariance : One-to-one relationships preserved
• Used: when MLE has a simple solution
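The binomial case above can be confirmed numerically: the score vanishes at θ̂ = x/n, which is also where the support peaks. A minimal sketch (x = 30, n = 100 are illustrative counts, not from the slides):

```python
import math

# Binomial support S(theta) = x*log(theta) + (n-x)*log(1-theta);
# Score = x/theta - (n-x)/(1-theta), which vanishes at theta_hat = x/n.
x, n = 30, 100          # illustrative counts

def score(theta):
    return x / theta - (n - x) / (1 - theta)

def support(theta):
    return x * math.log(theta) + (n - x) * math.log(1 - theta)

theta_hat = x / n       # analytical MLE, 0.3
```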
MLE Methods in outline contd.
Grid Search – Computational
Plot likelihood or log-likelihood vs parameter. Various features
• Relative Likelihood =Likelihood/Max. Likelihood (ML set =1).
Peak of R.L. can be visually identified /sought algorithmically. e.g.
$S(\theta) = \mathrm{Log}\left[\theta^{20}(1-\theta)^{80} + \theta^{80}(1-\theta)^{20}\right]$
Plot likelihood over the parameter space range $0 \le \theta \le 1$ - gives 2 peaks,
symmetric, at $\hat\theta = 0.2$ and $\hat\theta = 0.8$ (likelihood profile for e.g. the well-known
mixed linkage analysis problem, or for a similar example of
populations following known proportion splits).
If we now constrain $0 \le \theta \le 0.5$, the MLE solution is unique, $\hat\theta = 0.2$ =
R.F. between genes (possible mixed linkage phase).
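The two-peak profile above can be reproduced with a brute-force grid search; a minimal sketch (the 0.001 grid resolution is an arbitrary choice):

```python
import math

# S(theta) = log[ theta^20 (1-theta)^80 + theta^80 (1-theta)^20 ]:
# two symmetric peaks on (0, 1); constraining theta <= 0.5 leaves
# a unique maximum near 0.2, as on the slide.
def support(t):
    return math.log(t ** 20 * (1 - t) ** 80 + t ** 80 * (1 - t) ** 20)

grid = [i / 1000 for i in range(1, 1000)]        # 0.001-step grid on (0, 1)
unconstrained = max(grid, key=support)           # lands on one of the two peaks
theta_hat = max((t for t in grid if t <= 0.5), key=support)   # unique: ~0.2
```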
MLE Methods in outline contd.
• Graphic/numerical Implementation - take an initial estimate of θ. The direction
of search is determined by evaluating the likelihood on both sides of θ;
the search takes the direction giving an increase, because we are looking for a maximum.
Initial search increments are large, e.g. 0.1; then, when the likelihood
change starts to decrease or becomes negative, stop and refine the
increment.
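The coarse-to-fine search described above might be sketched as follows, using a binomial support as a stand-in objective (x = 30, n = 100 are made-up counts):

```python
import math

# Coarse-to-fine grid search: evaluate on a coarse grid, then repeatedly
# narrow the bracket around the best point and refine the increment.
def refine_search(support, lo, hi, rounds=4, points=11):
    for _ in range(rounds):
        grid = [lo + (hi - lo) * i / (points - 1) for i in range(points)]
        best = max(grid, key=support)              # uphill: keep the largest
        width = (hi - lo) / (points - 1)           # current increment
        lo, hi = max(lo, best - width), min(hi, best + width)
    return best

# Stand-in example: binomial support with illustrative counts x=30, n=100,
# whose analytical maximum is at 0.3.
theta_hat = refine_search(
    lambda p: 30 * math.log(p) + 70 * math.log(1 - p), 0.01, 0.99)
```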
Issues:
• Multiple peaks – can miss the global maximum; computationally
intensive; see e.g.
http://statgen.iop.kcl.ac.uk/bgim/mle/sslike_1.html
• Multiple Parameters - grid search. Interpretation of Likelihood
profiles can be difficult, e.g.
http://blogs.sas.com/content/iml/2011/10/12/maximumlikelihood-estimation-in-sasiml/
Example in outline
• Data e.g. used to show a linkage relationship (non-independence) between
e.g. a marker and a given disease gene, or (e.g. between sex and purchase) of
computer games.
Escapes = individuals who are susceptible, but show no disease phenotype
under experimental conditions (express interest but no purchase record).
So define α, θ as the proportion of escapes and the R.F. respectively.
1 − α is the penetrance for the disease trait, or of purchasing, i.e.
P{individual with susceptible genotype has disease phenotype} or
P{individual of given sex and interested who actually buys}.
Purpose of expt. - typically to estimate the R.F. θ between marker and gene, or the
proportion of a sex that purchases.
• Use: Support function = Log-Likelihood. Often quite complex, e.g. for the above
example, might have
$S(\alpha, \theta) = k_1\ln(1-\alpha) + k_2\ln(\alpha) + k_3\ln(\theta) + k_4\ln(1-\theta)$
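A two-parameter grid search over a support of this form might look as follows. The counts k1..k4 are invented for illustration (not the slide's data), chosen so that the maximum lands near the grid-search result quoted later, (α̂, θ̂) = (0.02, 0.22):

```python
import math

# Two-parameter grid search over
# S(alpha, theta) = k1*ln(1-alpha) + k2*ln(alpha) + k3*ln(theta) + k4*ln(1-theta).
# k1..k4 below are hypothetical counts for illustration only.
k1, k2, k3, k4 = 98, 2, 22, 78

def support(a, t):
    return (k1 * math.log(1 - a) + k2 * math.log(a)
            + k3 * math.log(t) + k4 * math.log(1 - t))

grid = [i / 100 for i in range(1, 100)]          # 0.01-step grid on (0, 1)
a_hat, t_hat = max(((a, t) for a in grid for t in grid),
                   key=lambda p: support(*p))    # joint maximiser
```

Because this support is separable in α and θ, the joint maximum simply combines the two marginal maxima, k2/(k1+k2) and k3/(k3+k4).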
Example contd.
• Setting 1st derivatives (Scores) w.r.t. α = 0 and w.r.t. θ = 0.
• Expected value of the Score w.r.t. α is zero (see analogies in classical
sampling/hypothesis testing). Similarly for θ. Here, however, there is no simple
analytical solution, so we cannot solve directly for either.
• Using grid search, the likelihood reaches its maximum at e.g. $\hat\alpha = 0.02$, $\hat\theta = 0.22$
• In general, this type of experiment tests H0: Independence between the
factors (marker and gene), (sex and purchase) ($\theta = 0.5$)
• and H0: no escapes ($\alpha = 0$)
Uses Likelihood Ratio Test statistics (M.L.E. $\chi^2$ equivalent).
MLE Methods in outline contd.
Newton-Raphson Iteration
Have Score $= \frac{dS(\theta)}{d\theta} = 0$ from previously. N-R consists of replacing the Score by the linear
terms of its Taylor expansion, so if $\theta''$ is a solution and $\theta'$ = 1st guess:

$\frac{dS(\theta')}{d\theta} + (\theta'' - \theta')\,\frac{d^2 S(\theta')}{d\theta^2} = 0
\quad\Rightarrow\quad
\theta'' = \theta' - \frac{dS(\theta')/d\theta}{d^2 S(\theta')/d\theta^2}$

Repeat with $\theta''$ replacing $\theta'$. Each iteration fits a parabola to the Likelihood Fn.
• Problems - Multiple peaks, zero Information, extreme estimates
• Multiple parameters – need matrix notation, where the Score matrix e.g. has
elements = derivatives of $S(\alpha, \theta)$ w.r.t. $\alpha$ and $\theta$ respectively. Similarly, the
Information matrix has terms of the form
$-E\left[\frac{\partial^2}{\partial\alpha^2}S(\alpha,\theta)\right]$, $-E\left[\frac{\partial^2}{\partial\alpha\,\partial\theta}S(\alpha,\theta)\right]$, etc.
Estimates are $\theta'' = \theta' + N^{-1} I^{-1}(\theta')\,S(\theta')$
[Figure: Log-likelihood, i.e. $S(\theta)$, with successive (1st, 2nd) N-R parabola fits]
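The single-parameter N-R update can be sketched for the binomial support (x = 30, n = 100 are illustrative counts; the analytical answer is x/n = 0.3):

```python
# Newton-Raphson on the score: theta_new = theta - S'(theta)/S''(theta),
# for the binomial support with illustrative counts x = 30, n = 100.
x, n = 30, 100

def score(p):            # dS/dp
    return x / p - (n - x) / (1 - p)

def score_deriv(p):      # d2S/dp2, negative near the maximum
    return -x / p ** 2 - (n - x) / (1 - p) ** 2

theta = 0.5              # initial guess
for _ in range(50):
    step = score(theta) / score_deriv(theta)
    theta -= step
    if abs(step) < 1e-10:   # converged
        break
```

For this concave support, the iteration settles on 0.3 almost immediately; with multiple peaks or near-zero Information (the problems noted above), the same loop can diverge or stall.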
MLE Methods in outline contd.
Expectation-Maximisation Algorithm - Iterative; for incomplete data.
(Much genomic, financial and other data fit this situation, e.g. linkage analysis
with marker genotypes of F2 progeny. Usually 9 categories are observed for the 2-locus,
2-allele model, but 16 categories = complete info., while 14 give info. on linkage.
Some are hidden, but if the linkage parameter is known, expected frequencies can be
predicted and the complete data restored using expectation.)
• Steps: (1) Expectation estimates the statistics of the complete data, given the
observed incomplete data.
• (2) Maximisation uses the estimated complete data to give the MLE.
• Iterate until convergence (no further change).
E-M contd.
Implementation
• Initial guess, $\theta'$, chosen (e.g. = 0.25, say, = R.F.).
• Taking this as “true”, the complete data is estimated by distributional
statements, e.g. P(individual is recombinant, given observed genotype) for
R.F. estimation.
• MLE estimate $\theta''$ computed. This, for R.F., is the sum of recombinants / N.
• Thus the MLE, for $f_i$ the observed count of genotype class $G_i$, is
$\theta'' = \frac{1}{N}\sum_i f_i\, P(R \mid G_i)$
• Convergence: $\theta'' = \theta'$, or $|\theta'' - \theta'| <$ tolerance (0.00001).
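The E-M loop above can be sketched on the classic genetic-linkage multinomial (the standard textbook example, used here as an assumed stand-in for the slide's F2 data): four categories with probabilities (1/2 + t/4, (1−t)/4, (1−t)/4, t/4), where the first observed count mixes a 1/2 component with a t/4 component.

```python
# E-M sketch on the classic genetic-linkage multinomial.
# Category probabilities: (1/2 + t/4, (1-t)/4, (1-t)/4, t/4).
y = [125, 18, 20, 34]                      # observed (incomplete) counts
t = 0.25                                   # initial guess, as on the slide

for _ in range(500):
    # E-step: expected part of y[0] attributable to its t/4 component
    z2 = y[0] * (t / 4) / (0.5 + t / 4)
    # M-step: complete-data MLE of t (t-categories over t- plus (1-t)-categories)
    t_new = (z2 + y[3]) / (z2 + y[1] + y[2] + y[3])
    if abs(t_new - t) < 1e-9:              # convergence tolerance
        t = t_new
        break
    t = t_new
```

For these counts the iteration converges to t ≈ 0.627, and each pass alternates exactly the two steps on the slide: restore the complete data by expectation, then maximise.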
LIKELIHOOD : C.I. and H.T.
• Likelihood Ratio Tests – c.f. with $\chi^2$.
• Principal Advantage of G is Power, as unknown parameters are involved in the
hypothesis test.
Have: Likelihood of θ taking a value $\theta_A$ which maximises
it, i.e. its MLE, and the likelihood under H0: $\theta = \theta_N$ (e.g. $\theta_N = 0.5$)
• Form of L.R. Test Statistic
$G = -2\,\mathrm{Log}\,\frac{L(\theta_N \mid x)}{L(\theta_A \mid x)}$ or, conventionally, $G = 2\,\mathrm{Log}\,\frac{L(\theta_A \mid x)}{L(\theta_N \mid x)}$
- choose; the latter is easier to interpret.
• Distribution of G ~ approx. $\chi^2$ (d.o.f. = difference in dimension of
parameter spaces for $L(\theta_A)$, $L(\theta_N)$)
• Goodness of Fit: notation as for $\chi^2$, G ~ $\chi^2_{n-1}$:
$G = 2\sum_{i=1}^{n} O_i \,\mathrm{Log}\,\frac{O_i}{E_i}$
• Independence: $G = 2\sum_{i=1}^{r}\sum_{j=1}^{c} O_{ij}\,\mathrm{Log}\,\frac{O_{ij}}{E_{ij}}$
- notation again as for $\chi^2$
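The goodness-of-fit G statistic can be computed directly and compared with the Pearson chi-square on the same counts; the observed counts below are made up for illustration:

```python
import math

# Goodness-of-fit G statistic, G = 2 * sum O_i * Log(O_i / E_i),
# alongside the Pearson chi-square on the same (made-up) counts.
def g_stat(obs, exp):
    return 2 * sum(o * math.log(o / e) for o, e in zip(obs, exp) if o > 0)

def pearson(obs, exp):
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))

obs = [60, 40]
exp = [50, 50]           # H0: theta = 0.5 with n = 100
G = g_stat(obs, exp)     # refer to chi-square with 1 d.o.f.
X2 = pearson(obs, exp)
```

Here G ≈ 4.03 and the Pearson statistic is 4.0, illustrating how close the two usually are; both exceed the 5% chi-square cut-off of 3.84 on 1 d.o.f.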
Likelihood C. I.’s – graphical method
• Example: Consider the following Likelihood function: $L(\theta) = (1-\theta)^a \theta^b$,
where θ is the unknown parameter and a, b are observed counts.
• For 4 data sets observed:
A: (a,b) = (8,2), B: (a,b) = (16,4), C: (a,b) = (80,20), D: (a,b) = (400,100)
• Likelihood estimates can be plotted vs. possible parameter values, with
MLE = peak value,
e.g. MLE $\hat\theta = 0.2$, with $L_{max} = 0.0067$ for A and $L_{max} \approx 4.5\times10^{-5}$ for B, etc.
Set A: $\mathrm{Log}\,L_{max} - \mathrm{Log}\,L = \mathrm{Log}(0.0067) - \mathrm{Log}(0.00091) = 2$ gives the 95% C.I.,
so θ = (0.035, 0.496), corresponding to L = 0.00091, is the 95% C.I. for A.
Similarly, manipulating this expression, the Likelihood value corresponding to the
95% confidence interval is given as $L = (7.389)^{-1} L_{max}$ (since $e^2 \approx 7.389$).
Note: Usually plot Log-likelihood vs parameter, rather than Likelihood.
As the sample size increases, the C.I. becomes narrower and more symmetric.
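The graphical interval for data set A can be reproduced numerically by keeping every θ whose likelihood exceeds the cut-off L_max/e². With this exact cut-off the upper end comes out near 0.51, slightly above the slide's rounded 0.496:

```python
import math

# Likelihood interval for L(theta) = (1-theta)^a * theta^b, data set A:
# keep all theta with L(theta) >= Lmax / e^2 (= Lmax / 7.389).
a, b = 8, 2

def lik(t):
    return (1 - t) ** a * t ** b

theta_hat = b / (a + b)                  # MLE = 0.2
L_max = lik(theta_hat)                   # ~0.0067
cutoff = L_max / math.exp(2)             # ~0.00091

grid = [i / 10000 for i in range(1, 10000)]
inside = [t for t in grid if lik(t) >= cutoff]
ci = (min(inside), max(inside))          # lower end ~0.035, upper end ~0.51
```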
Maximum Likelihood Benefits
• Strong estimator properties – sufficiency, efficiency, consistency,
non-bias etc., as before
• Good Confidence Intervals
Coverage probability realised and intervals meaningful
• MLE a good estimator:
MSE consistent: $\lim_{n\to\infty} E(\hat\theta - \theta)^2 = 0$
Absence of Bias: $E(\hat\theta) = \theta$ - does not “stand alone” – minimum variance important
Asymptotically Normal: $\dfrac{\hat\theta - \theta}{\sigma_{\hat\theta}} \sim N(0, 1)$ as $n \to \infty$
Precise for large samples - inferences valid, ranges realistic