MLE-EM2 - TIGP Bioinformatics Program
Download
Report
Transcript MLE-EM2 - TIGP Bioinformatics Program
Maximum Likelihood Estimates
and the EM Algorithms II
Henry Horng-Shing Lu
Institute of Statistics
National Chiao Tung University
[email protected]
http://tigpbp.iis.sinica.edu.tw/courses.htm
1
Part 1
Computation Tools
2
Include Functions in R
source(“file path”)
Example
In MME.R:
In R:
3
Part 2
Motivation Examples
4
Example 1 in Genetics (1)
Two linked loci with alleles A and a, and B
and b
A, B: dominant
a, b: recessive
A double heterozygote AaBb will produce
gametes of four types: AB, Ab, aB, ab
A
A
A
a
B
A
a
B
b
a
B
b
b
b
a
B
F ( Female)
1- r’
r’ (female recombination fraction)
M (Male)
1-r
r (male recombination fraction)
5
Example 1 in Genetics (2)
r and r’ are the recombination rates for
male and female
Suppose the parental origin of these
heterozygote is from the mating
of AABB aabb. The problem is to estimate r
and r’ from the offspring of selfed
heterozygotes.
Fisher, R. A. and Balmukand, B. (1928). The
estimation of linkage from the offspring of
selfed heterozygotes. Journal of Genetics, 20,
79–92.
http://en.wikipedia.org/wiki/Genetics
http://www2.isye.gatech.edu/~brani/isyebayes/ba
nk/handout12.pdf
6
Example 1 in Genetics (3)
MALE
F
E
M
A
L
E
AB
(1-r)/2
ab
(1-r)/2
aB
r/2
Ab
r/2
AB
(1-r’)/2
AABB
(1-r) (1-r’)/4
aABb
(1-r) (1-r’)/4
aABB
r (1-r’)/4
AABb
r (1-r’)/4
ab
(1-r’)/2
AaBb
(1-r) (1-r’)/4
aabb
(1-r) (1-r’)/4
aaBb
r (1-r’)/4
Aabb
r (1-r’)/4
aB
r’/2
AaBB
(1-r) r’/4
aabB
(1-r) r’/4
aaBB
r r’/4
AabB
r r’/4
Ab
r’/2
AABb
(1-r) r’/4
aAbb
(1-r) r’/4
aABb
r r’/4
AAbb
r r’/4
7
Example 1 in Genetics (4)
Four distinct phenotypes: A*B*, A*b*, a*B* and
a*b*.
A*: the dominant phenotype from (Aa, AA, aA).
a*: the recessive phenotype from aa.
B*: the dominant phenotype from (Bb, BB, bB).
b* : the recessive phenotype from bb.
A*B*: 9 gametic combinations.
A*b*: 3 gametic combinations.
a*B*: 3 gametic combinations.
a*b*: 1 gametic combination.
Total: 16 combinations.
8
Example 1 in Genetics (5)
Let (1 r )(1 r '), then
2
P( A * B*)
,
4
1
P( A * b*) P(a * B*)
,
4
P(a * b*)
4
.
9
Example 1 in Genetics (6)
Hence, the random sample of n from the offspring
of selfed heterozygotes will follow a multinomial
distribution:
2 1 1
Multinomial n;
,
,
, .
4
4
4 4
We know that (1 r )(1 r '), 0 r 1/ 2, and 0 r ' 1/ 2.
So, 1 1/ 4.
10
Example 1 in Genetics (7)
Suppose that we observe the data of
y = (y1, y2, y3, y4) = (125, 18, 20, 24),
which is a random sample from
2 1 1
Multinomial n;
,
,
, .
4
4
4 4
Then the probability mass function is
n!
2 y1 1 y2 y3 y4
g ( y, )
(
) (
)
( ) .
y1 ! y2 ! y3 ! y4 ! 4
4
4
11
Maximum Likelihood Estimate (MLE)
Likelihood:
Maximize likelihood: Solve the score
equations, which are setting the first
derivates of likelihood to be zeros.
Under regular conditions, the MLE is
consistent, asymptotic efficient and
normal!
More:
http://en.wikipedia.org/wiki/Maximum_lik
elihood
12
MLE for Example 1 (1)
n!
2 y1 1 y2 y3 y4
L( )
(
) (
)
( )
y1 ! y2 ! y3 ! y4 ! 4
4
4
Likelihood
MLE:
ˆMLE max L( ) ˆMLE max log L( )
( ) logL( )
log(
n!
2
1
) y log(
) ( y2 y3 ) log(
) y4 log( )
y1 ! y2 ! y3 ! y4 !
4
4
4
y y3 y4
y
d
log L( ) 1 2
0
d
2 1
( y1 y2 y3 y4 ) 2 ( y1 2 y2 2 y3 y4 ) 2 y4 0
A
B
C
13
MLE for Example 1 (2)
MLE
B B2 4 AC
2A
Checking:
d ( )
2
d ˆ
2
(1)
0?
MLE
(2)
1/ 4 ˆMLE 1?
(3)
Compare log L(ˆMLE )?
14
Part 3
Numerical Solutions for
the Score Equations of MLEs
15
A Banach Space
A Banach space B is a vector space over
the field K such that
1. x 0 x B.
2.
x 0 x B x 0.
3.
rx r x r K ,x B.
4.
x y x y x, y B.
5.
Every Cauchy sequence x ( k ) of B
converges in B (i.e., B is complete).
(http://en.wikipedia.org/wiki/Banach_space)
16
Lipschitz Continuous
A closed subset A B and mapping
F : A A.
1.
2.
F is Lipschitz continuous on A with
0 L if F ( x) F ( y) L x y .
F is a contraction mapping on A if F is
Lipschitz continuous and 0 L 1.
(http://en.wikipedia.org/wiki/Lipschitz_continuous)
17
Fixed Point Theorem
If F is a contraction mapping on A if F is
Lipschitz continuous and 0 L 1.
1. F has an unique fixed point s A such
that F ( s ) s.
(0)
x
A , x( k ) F ( x( k -1) ), k=1,2,…
2. initial
x
3.
k
s.
k -t
L
(k )
sx
x(t 1) x(t ) , 0 t k.
1- L
(k )
(http://en.wikipedia.org/wiki/Fixed-point_theorem)
(http://www.math-linux.com/spip.php?article60)
18
Applications for MLE (1)
log L( )
( )
0, i.e.
0 ( ) ' 0 0
F ( ) ( ) ' is a contraction mapping
(0) : initial
(1)
F ( ) '( )
(0)
(0)
(0)
( k ) F ( ( k 1) ) '( ( k 1) ) ( k 1)
k
Then, ( k ) s s.t. F ( s) s
19
Applications for MLE (2)
?
lim
yx
F ( x) F ( y )
x y
F '( x) ''( x) 1 L 1.
Optimal ?
20
Parallel Chord Method (1)
Parallel chord method is also called simple
iteration.
(k) ( k 1) '( ( k 1) )
0 '( ( k 1) )
1
(k )
( k 1)
21
Parallel Chord Method (2)
s
'( (0) )
'( (1) )
(0)
(1)
(2)
22
Plot the Parallel Chord Method by R
23
Define Functions for Example 1 in R
We will define some functions and
variables for finding the MLE in Example 1
by R
24
Parallel Chord Method by R (1)
25
Parallel Chord Method by R (2)
26
Parallel Chord Method by C/C++
27
Newton-Raphson Method (1)
'(s) '( ( k 1) ) (S ( k 1) ) ''( ( k 1) ) 0
( k 1)
'(
)
( k 1)
s
''( ( k 1) )
(k )
( k 1)
'(
)
( k 1)
( k 1)
G (
)
''( ( k 1) )
http://math.fullerton.edu/mathews/n2003
/Newton'sMethodMod.html
http://en.wikipedia.org/wiki/Newton%27s
_method
28
Newton-Raphson Method (2)
s
'( (0) )
'( (1) )
(0)
(1)
(2)
29
Plot the Newton-Raphson Method by R
30
Newton-Raphson Method by R (1)
31
Newton-Raphson Method by R (2)
32
Newton-Raphson Method by C/C++
33
Halley’s Method
The Newton-Raphson iteration function is
f ( x)
g ( x) x
f '( x)
It is possible to speed up convergence by
using more expansion terms than the
Newton-Raphson method does when the
object function is very smooth, like the
method by Edmond Halley (1656-1742):
f ( x)
f ( x) f ''( x)
g ( x) x
1
f '( x)
2( f ( x)) 2
1
34
(http://math.fullerton.edu/mathews/n2003/Halley'sMethodMod.html)
Halley’s Method by R (1)
35
Halley’s Method by R (2)
36
Halley’s Method by C/C++
37
Bisection Method (1)
Assume that f C[a, b] and that there exists
a number r [a, b] such that f (r ) 0 .
If f (a ) and f (b) have opposite signs, and cn
represents the sequence of midpoints
generated by the bisection process, then
ba
r cn n 1
n 1, 2,...
2
and the sequence cn converges to r.
That is, lim cn r .
n
(http://en.wikipedia.org/wiki/Bisection_method )
38
Bisection Method (2)
1
39
Plot the Bisection Method by R
40
Bisection Method by R (1)
> fix(Bisection)
41
Bisection Method by R (2)
42
Bisection Method by R (3)
43
Bisection Method by C/C++ (1)
44
Bisection Method by C/C++ (2)
45
Secant Method
1
''( ( k 1) )
slop ''( )
(k )
'(
) '(
( k 2)
( k 1)
( k 2)
( k 1)
)
(http://en.wikipedia.org/wiki/Secant_method )
(http://math.fullerton.edu/mathews/n2003/Secant
MethodMod.html )
46
Secant Method by R (1)
>fix(Secant)
47
Secant Method by R (2)
48
Secant Method by C/C++
49
Secant-Bracket Method
The secant-bracket method is also called
the regular falsi method.
'( A) '( B)
'(C ) '( A)
A B
CA
B '( A) A '( B)
C
'( A) '( B)
S
A
C
B
50
Secant-Bracket Method by R (1)
>fix(RegularFalsi)
51
Secant-Bracket Method by R (2)
52
Secant-Bracket Method by R (3)
53
Secant-Bracket Method by C/C++ (1)
54
Secant-Bracket Method by C/C++ (1)
55
Fisher Scoring Method
Fisher scoring method replaces ''( )
by E( ''( ( k 1) )) I ( ( k 1) ) where I is the
Fisher information matrix when the
parameter may be multivariate.
(k )
56
Fisher Scoring Method by R (1)
> fix(Fisher)
57
Fisher Scoring Method by R (2)
58
Fisher Scoring Method by C/C++
59
Order of Convergence
Order of convergence is p if
lim sup
k
x( k 1) s
x( k ) s
p
c 0c
and c<1 for p=1.
(http://en.wikipedia.org/wiki/Order_of_convergence)
( k 1)
(k )
ln
x
s
c
p
ln
x
s as k .
Note:
Hence, we can use regression to estimate p.
60
Theorem for Newton-Raphson Method
If I [a, b] , F : I I , F is a contraction
mapping F ( x) 0 x I then p=1 and
1
lim
k
x( k 1) s
x
(k )
s
F '( s) 1
If I [a, b] , ''' exists, ''( s) 0, '( ) 0
has a simple zero, then I1 [s , s ], >0,
such that G() of the Newton-Raphson
method is a contraction mapping and p=2.
1
61
Find Convergence Order by R (1)
R=Newton(y1, y2, y3, y4, initial)
#Newton method can be substitute for different method
temp=log(abs(R$iteration-R$phi));
y=temp[2:(length(temp)-1)]
x=temp[1:(length(temp)-2)]
lm(y~x)
62
Find Convergence Order by R (2)
63
Find Convergence Order by R (3)
64
Find Convergence Order by C/C++
65
Exercises
Write your own programs for those
examples presented in this talk.
Write programs for those examples
mentioned at the following web page:
http://en.wikipedia.org/wiki/Maximum_li
kelihood
Write programs for the other examples
that you know.
66
More Exercises (1)
Example 3 in genetics: The observed
data are (nO, nA, nB, nAB) = (176, 182,
60, 17) ~ Multinomial(r^2, p^2+2pr,
q^2+2qr, 2pq), where p, q, and r fall in
[0,1] such that p+q+r = 1. Find the
MLEs for p, q, and r.
67
More Exercises (2)
Example 4 in the positron emission tomography
(PET): The observed data are n*(d)
~Poisson(λ*(d)), d = 1, 2, …, D, and
B
* (d ) p(b, d ) (b).
.
b 1
The values of p(b,d) are known and the
unknown parameters are λ(b), b = 1, 2, …, B.
Find the MLEs for λ(b), b = 1, 2, …, B.
68
More Exercises (3)
Example 5 in the normal mixture: The observed
data xi, i = 1, 2, …, n, are random samples from
the following probability density function:
K
f ( xi )
K
k 1
k
k 1
k
Normal ( k , ),
2
k
1, and 0 k 1 for all k .
Find the MLEs for the following parameters:
(1,..., K , 1,..., K , 1,..., K ).
69