Transcript MLE

Maximum Likelihood Estimates
and the EM Algorithms I
Henry Horng-Shing Lu
Institute of Statistics
National Chiao Tung University
[email protected]
1
Part 1
Computation Tools
2
Computation Tools
R (http://www.r-project.org/): good for
statistical computing
 C/C++: good for fast computation and
large data sets
 More:
http://www.stat.nctu.edu.tw/subhtml/sourc
e/teachers/hslu/course/statcomp/links.htm

3
The R Project
R is a free software environment for
statistical computing and graphics. It
compiles and runs on a wide variety of
UNIX platforms, Windows and MacOS.
 Similar to the commercial software of Splus.
 C/C++, Fortran and other codes can be
linked and called at run time.
 More: http://www.r-project.org/

4
Download R from
http://www.r-project.org/
5
Choose one Mirror Site of R
6
Choose the OS System
7
Select the Base of R
8
Download the Setup Program
9
Install R
Double click R-icon to
install R
10
Execute R
Interactive
command
window
11
Download Add-on Packages
12
Choose a Mirror Site
Choose a
mirror site
close to you
13
Select One Package to Download
Choose one
package to
download, like
“rgl” or
“adimpro”.
14
Load Packages

There are two methods to load packages:
Method 1:
Click from the
menu bar
Method 2:
Type “library(rgl)”
in the command
window
15
Help in R (1)

What is the loaded library?

help(rgl)
16
Help in R (2)

How to search functions for key words?


help.search(“key words”)
It will show all functions has the key words.
help.search(“3D plot”)
17
Help in R (3)

How to find the illustration of function?


?function name
It will show the usage, arguments, author,
reference, related functions, and examples.
?plot3d
18
R Operators (1)

Mathematic operators:



+, -, *, /, ^
Mod: %%
sqrt, exp, log, log10, sin, cos, tan, …
19
R Operators (2)

Other operators:







:
%*%
<, >, <=, >=
==, !=
&, &&, |, ||
~
<-, =
sequence operator
matrix algebra
inequality
comparison
and, or
formulas
assignment
20
Algebra, Operators and Functions
> 1+2
[1] 3
> 1>2
[1] FALSE
> 1>2 | 2>1
[1] TRUE
> A = 1:3
> A
[1] 1 2 3
> A*6
[1] 6 12 18
> A/10
[1] 0.1 0.2 0.3
> A%%2
[1] 1 0 1
> B = 4:6
> A*B
[1] 4 10 18
> t(A)%*%B
[1]
[1] 32
> A%*%t(B)
[1] [2] [3]
[1] 4 5 6
[2] 8 10 12
[3] 12 15 18
> sqrt(A)
[1] 1.000 1.1414 1.7320
> log(A)
[1] 0.000 0.6931 1.0986
> round(sqrt(A), 2)
[1] 1.00 1.14 1.73
> ceiling(sqrt(A))
[1] 1 2 2
> floor(sqrt(A))
[1] 1 1 1
> eigen(A%*%t(B))
$values
[1] 3.20e+01 8.44e-16 -4.09e-16
$vectors
[1]
[2]
[3]
[1,] -0.2673 0.3112 -0.2353
[2,] -0.5345 -0.8218 -0.6637
[3,] -0.8018 0.4773 0.7100
21
Variable Types
Item
Descriptions
Vector
X=c(10.4,5.6,3.1,6.4) or Z=array(data_vector,
dim_vector)
Matrices
X=matrix(1:8,2,4) or Z=matrix(rnorm(30),5,6)
Factors
Statef=factor(state)
Lists
pts = list(x=cars[,1], y=cars[,2])
Data Frames
data.frame(cbind(x=1, y=1:10),
fac=sample(LETTERS[1:3], 10, repl=TRUE))
Functions
name=function(arg_1,arg_2,…) expression
Missing
Values
NA or NAN
22
Define Your Own Function (1)
Use "fix(myfunction)"
# a window will show up
 function(parameter){
statements;
return (object);
# if you want to return some values
}
 Save the document
 Use "myfunction(parameter)" in R

23
Define Your Own Function (2)

Example: Find all the factors of an integer
24
Define Your Own Function (3)
When you leave the program, remember to save
the work space for the next use, or the function
you defined will disappear after you close R
project.
25
Read and Write Files
Write Data to a TXT File
 Write Data to a CSV File
 Read TXT and CSV Files
 Demo

26
Write Data to a TXT File

Usage:
write(x, file, …)
> X = matrix(1:6, 2, 3)
>X
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
> write(t(X), file = "d:/out1.txt", ncolumns = 3)
> write(X, file = "d:/out2.txt", ncolumns = 3)
d:/out1.txt
1 3 5
2 4 6
d:/out2.txt
1 2 3
4 5 6
27
Write Data to a CSV File

Usage:
write.table(x, file = "foo.csv", …)
d:/out1.csv
d:/out2.csv
> X = matrix(1:6, 2, 3)
>X
1,2
1,3,5
[,1] [,2] [,3]
3,4
2,4,6
[1,] 1 3 5
5,6
[2,] 2 4 6
> write.table(t(X), file = "d:/out1.csv", sep = ",", col.names
= FALSE, row.names = FALSE)
> write.table(X, file = "d:/out2.csv", sep = ",", col.names =
FALSE, row.names = FALSE)
28
Read TXT and CSV Files

Usage:
read.table(file, ...)
> X = read.table(file = "d:/out1.txt")
>X
V1 V2 V3
1 1 3 5
2 2 4 6
> Y = read.table(file = "d:/out1.csv", sep = ",", header =
FALSE)
>Y
V1 V2
1 1 2
2 3 4
3 5 6
29
Demo (1)

Practice for read file and basic analysis
> Data = read.table(file = "d:/01.csv", header = TRUE, sep
= ",")
> Data
Y
X1
X2
[1,] 2.651680 13.808990 26.75896
[2,] 1.875039 17.734520 37.89857
[3,] 1.523964 19.891030 26.03624
[4,] 2.984314 15.574260 30.21754
[5,] 10.423090 9.293612 28.91459
[6,] 0.840065 8.830160 30.38578
[7,] 8.126936 9.615875 32.69579
01.csv
30
Demo (2)

Practice for read file and basic analysis
> mean(Data$Y)
[1] 4.060727
> boxplot(Data$Y)
> boxplot(Data)
31
Part 2
Motivation Examples
32
Example 1 in Genetics (1)

Two linked loci with alleles A and a, and B
and b



A, B: dominant
a, b: recessive
A double heterozygote AaBb will produce
gametes of four types: AB, Ab, aB, ab
A
A
a
B
b
1/2
1/2
A
B
A
a
a
B
b
b
1/2
b
a
1/2
B
33
Example 1 in Genetics (2)

Probabilities for genotypes in gametes
No Recombination
Recombination
Male
1-r
r
Female
1-r’
r’
A
A
a
B
b
1/2
1/2
A
B
A
a
a
B
b
1/2
a
1/2
b
AB
ab
aB
Ab
Male
(1-r)/2
(1-r)/2
r/2
r/2
Female
(1-r’)/2
(1-r’)/2
r’/2
r’/2
b
B
34
Example 1 in Genetics (3)
Fisher, R. A. and Balmukand, B. (1928).
The estimation of linkage from the offspring
of selfed heterozygotes. Journal of
Genetics, 20, 79–92.
 More:
http://en.wikipedia.org/wiki/Genetics
http://www2.isye.gatech.edu/~brani/isyeba
yes/bank/handout12.pdf

35
Example 1 in Genetics (4)
MALE
F
E
M
A
L
E
AB
(1-r)/2
ab
(1-r)/2
aB
r/2
Ab
r/2
AB
(1-r’)/2
AABB
(1-r) (1-r’)/4
aABb
(1-r) (1-r’)/4
aABB
r (1-r’)/4
AABb
r (1-r’)/4
ab
(1-r’)/2
AaBb
(1-r) (1-r’)/4
aabb
(1-r) (1-r’)/4
aaBb
r (1-r’)/4
Aabb
r (1-r’)/4
aB
r’/2
AaBB
(1-r) r’/4
aabB
(1-r) r’/4
aaBB
r r’/4
AabB
r r’/4
Ab
r’/2
AABb
(1-r) r’/4
aAbb
(1-r) r’/4
aABb
r r’/4
AAbb
r r’/4
36
Example 1 in Genetics (5)










Four distinct phenotypes:
A*B*, A*b*, a*B* and a*b*.
A*: the dominant phenotype from (Aa, AA, aA).
a*: the recessive phenotype from aa.
B*: the dominant phenotype from (Bb, BB, bB).
b*: the recessive phenotype from bb.
A*B*: 9 gametic combinations.
A*b*: 3 gametic combinations.
a*B*: 3 gametic combinations.
a*b*: 1 gametic combination.
Total: 16 combinations.
37
Example 1 in Genetics (6)

Let   (1  r )(1  r ') , then
2 
P( A * B*) 
4
1
P( A * b*)  P(a * B*) 
4
P(a * b*) 

4
38
Example 1 in Genetics (7)

Hence, the random sample of n from the
offspring of selfed heterozygotes will follow
a multinomial distribution:
 2   1 1  
Multinomial  n;
,
,
, 
4
4
4 4

We know that
  (1  r )(1  r '), 0  r  1/ 2, and 0  r '  1/ 2
So
1/ 4    1
39
Example 1 in Genetics (8)

Suppose that we observe the data of
y   y1, y2 , y3 , y4   125,18,20,24
which is a random sample from
 2   1 1  
Multinomial  n;
,
,
, 
4
4
4 4

Then the probability mass function is
n!
2   y 1 y  y  y
g ( y,  ) 
(
) (
)
( )
2
1
y1 ! y2 ! y3 ! y4 !
4
4
3
4
4
40
Estimation Methods

Frequentist Approaches:
http://en.wikipedia.org/wiki/Frequency_probability
Method of Moments Estimate (MME)
http://en.wikipedia.org/wiki/Method_of_moments
_%28statistics%29
Maximum Likelihood Estimate (MLE)
http://en.wikipedia.org/wiki/Maximum_likelihood

Bayesian Approaches:
http://en.wikipedia.org/wiki/Bayesian_probability
41
Method of Moments Estimate (MME)
Solve the equations when population
moments are equal to sample moments:
 'k  m 'k for k = 1, 2, …, t, where t is
the number of parameters to be estimated.
 MME is simple.
 Under regular conditions, the MME is
consistent!
 More:
http://en.wikipedia.org/wiki/Method_of_mo
ments_%28statistics%29

42
MME for Example 1
y1 1 
2 
E (Y1 )  n
 y1  ˆ1  4(  ) 
4
n 2

y2 
1
E (Y2 )  n
 y2  ˆ2  1  4
ˆ1  ˆ2  ˆ3  ˆ4
4
n 
  ˆMME 
y
1
4
E (Y3 )  n
 y3  ˆ3  1  4 3 
4
n 

4 y4


E (Y4 )  n  y4
 ˆ4 
4
n


Note: MME can’t assure ˆMME [1/ 4,1]!
43
MME by R
> MME <- function(y1, y2, y3, y4){
n = y1+y2+y3+y4;
phi1 = 4.0*(y1/n-0.5);
phi2 = 1-4*y2/n;
phi3 = 1-4*y3/n;
phi4 = 4.0*y4/n;
phi = (phi1+phi2+phi3+phi4)/4.0;
print("By MME method");
return(phi); # print(phi);
}
> MME(125, 18, 20, 24)
[1] "By MME method"
[1] 0.5935829
44
MME by C/C++
45
Maximum Likelihood Estimate (MLE)
Likelihood:
 Maximize likelihood: Solve the score
equations, which are setting the first
derivates of likelihood to be zeros.
 Under regular conditions, the MLE is
consistent, asymptotic efficient and normal!
 More:
http://en.wikipedia.org/wiki/Maximum_likel
ihood

46
Example 2 (1)

We toss an unfair coin 3 times and the
random variable is
1, if the ith trial is head;
Xi  
0, if the ith trial is tail.
If p is the probability of tossing head, then
1 with probability p;
Xi  
0 with probability 1- p.
47
Example 2 (2)

The distribution of “# of tossing head”:
# of tossing head
( x1 , x2 , x3 )
probability
0
(0,0,0)
(1-p)3
1
(1,0,0) (0,1,0) (0,0,1)
3p(1-p)2
2
(0,1,1) (1,0,1) (1,1,0)
3p2(1-p)
3
(1,1,1)
p3
48
Example 2 (3)

Suppose we observe the toss of 1 heads
and 2 tails, the likelihood function becomes
3
L( p | x1 , x2 , x3 )    p(1  p)2 , where 0  p  1
 2
One way to maximize this likelihood
function is by solving the score equation,
which sets the first derivative to be zero:
 3
2
2
2
p
(1

p
)

3(1

p
)

6
p
(1

p
)

9
p
 12 p  3 = 0
 
p  2 
49
Example 2 (4)

The solution of p for the score equation is
1/3 or 1.

One can check that p=1/3 is the maximum
point. (How?)

Hence, the MLE of p is 1/3 for this example.
50
MLE for Example 1 (1)

Likelihood
n!
2   y1 1   y2  y3  y4
L( ) 
(
) (
)
( )
y1 ! y2 ! y3 ! y4 ! 4
4
4

MLE:
ˆMLE  max L( )  ˆMLE  max log L( )


n!
2 
( )  logL( )  log(
)  y1 log(
)
y1 ! y2 ! y3 ! y4 !
4
1

 ( y2  y3 ) log(
)  y4 log( )
4
4
51
MLE for Example 1 (2)
y2  y3 y4
y1
d
d
l   
log L( ) 


0
d
d
2   1

( y1  y2  y3  y4 ) 2  ( y1  2 y2  2 y3  y4 )  2 y4  0
A
MLE
B
C
 B  B2  4 AC

2A
52
MLE for Example 1 (3)

Checking:

d 2 ( )
1. d 2
 ˆ
 0?
MLE

2. 1/ 4  ˆMLE  1?

3. Compare log L(ˆMLE ) ?
53
Use R to find MLE (1)
>
>
>
+
#MLE
y1 = 125; y2 = 18; y3 = 20; y4 = 24
f <- function(phi){
((2.0+phi)/4.0)^y1 * ((1.0-phi)/4.0)^(y2+y3) *
(phi/4.0)^y4
+}
> plot(f, 1/4, 1, xlab = expression(varphi), ylab = "likelihood
function multipling a constant")
> optimize(f, interval = c(1/4, 1), maximum = T)
$maximum
[1] 0.5778734
$objective
[1] 7.46944e-82
54
Use R to find MLE (2)
55
Use C/C++ to find MLE (1)
56
Use C/C++ to find MLE (2)
57
Exercises
Write your own programs for those
examples presented in this talk.
 Write programs for those examples
mentioned at the following web page:
http://en.wikipedia.org/wiki/Maximum_likel
ihood
 Write programs for the other examples that
you know.

58
More Exercises (1)

Example 3 in genetics:
The observed data are
 nO , nA , nB , nAB   176,182, 60,17 
~ Multinomial  r 2 , p 2  2 pr , q 2  2qr , 2 pq 
where p , q , and r fall in [0,1]
such that p  q  r  1
Find the likelihood function and score
equations for p, q, and r.
59
More Exercises (2)

Example 4 in the positron emission
tomography (PET):
The observed data are
n*  d  ~ Poisson   *  d   , d  1, 2, , D
and
B
 * (d )   p(b, d ) (b).
b 1
The values of p b, d  are known and the
unknown parameters are  b , b  1, 2, , B.
 Find the likelihood function and score
equations for  b , b  1, 2, , B.

60
More Exercises (3)

Example 5 in the normal mixture:
The observed data X i , i  1, 2,, n are random
samples from the following probability
density function:
K
f ( xi ) ~   k Normal ( k ,  ),
k 1

2
k
K

k 1
k
 1, and 0   k  1 for all k.
Find the likelihood function and score
equations for the following parameters:
(1,...,  K , 1,..., K , 1,...,  K ).
61