Transcript: Lecture 2/14/2006

Maximum Likelihood - "Frequentist" inference
• x₁, x₂, ..., xₙ ~ iid N(μ, σ²)
  f(x | μ, σ²) = (1 / (√(2π) σ)) · e^(−(x − μ)² / (2σ²))
• Joint pdf for the whole random sample:
  f(x₁, x₂, ..., xₙ | μ, σ²) = f(x₁ | μ, σ²) f(x₂ | μ, σ²) ··· f(xₙ | μ, σ²)
• The likelihood function is the joint pdf of the fixed, observed sample, viewed as a function of the parameters:
  l(μ, σ | x₁, x₂, ..., xₙ) = f(x₁ | μ, σ) f(x₂ | μ, σ) ··· f(xₙ | μ, σ)
• Maximum likelihood estimates of the model parameters μ and σ² are the values that maximize the likelihood function for the observed sample:
  μ̂ = Σᵢ xᵢ / n
  σ̂² = Σᵢ (xᵢ − μ̂)² / n
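As a side note (not on the original slide), these closed-form estimates are easy to verify numerically in R; the sample x below is simulated and the optim() call is just an illustrative sketch:

# Sketch: maximum likelihood for a normal sample (simulated data)
set.seed(1)
x <- rnorm(50, mean = 2, sd = 3)
n <- length(x)

# Closed-form ML estimates from above
mu.hat     <- sum(x) / n                 # same as mean(x)
sigma2.hat <- sum((x - mu.hat)^2) / n    # note the 1/n denominator

# Numerical check: maximize the log-likelihood directly
# (theta[2] is log(sigma^2), so the search stays in the valid range)
negloglik <- function(theta)
  -sum(dnorm(x, mean = theta[1], sd = exp(theta[2] / 2), log = TRUE))
fit <- optim(c(0, 0), negloglik)
c(fit$par[1], exp(fit$par[2]))           # close to c(mu.hat, sigma2.hat)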
Sampling Distributions
• x₁, x₂, ..., xₙ ~ iid N(μ, σ²),  f(x | μ, σ²) = (1 / (√(2π) σ)) · e^(−(x − μ)² / (2σ²))
  μ̂ = x̄ = Σᵢ xᵢ / n
  σ̂² = s² = Σᵢ (xᵢ − x̄)² / (n − 1)
  t* = x̄ / (s / √n)
•"Sample Statistics" is a numerical summary of a random sample
(e.g. x , s 2 , t * ). As functions of random variables, they are also
random variables.
•"Sampling Distribution" is the probability distribution (statistical
model) of "Sample Statistics" - can be derived from the
probability distribution of experimental outcomes
  x̄ ~ N(μ, σ²/n)
  (n − 1) s² / σ² ~ χ²_(n−1)
  t* | μ = 0 ~ t_(n−1)
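These facts can also be seen by simulation; the sketch below (not from the slides) draws repeated samples under μ = 0 and compares the simulated t* values to the theoretical t density with n − 1 degrees of freedom:

# Sketch: sampling distribution of t* under mu = 0 (simulated)
set.seed(2)
n <- 10
tstar <- replicate(10000, {
  x <- rnorm(n, mean = 0, sd = 2)    # true mu = 0
  mean(x) / (sd(x) / sqrt(n))        # t* = x-bar / (s / sqrt(n))
})
hist(tstar, breaks = 50, freq = FALSE, main = "Sampling distribution of t*")
curve(dt(x, df = n - 1), add = TRUE, lwd = 2)   # theoretical t density, n-1 df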
"Frequentist" inference
•Assume that the parameters in the model describing the probability of the experimental outcome are unknown, but fixed values
•Given a random sample of experimental outcomes (data), we make inference (i.e. make probabilistic statements) about the values of the underlying parameters based on the sampling distributions of the parameter estimates and other "sample statistics"
•Since model parameters are not random variables, these statements are somewhat contrived. For example, we do not talk about P(μ > 0), but about P(t > t* | μ = 0).
•However, for simple situations this works just fine and the arguments are mostly philosophical
Bayesian Inference
• Assumes parameters are random variables - the key difference
• Inference is based on the posterior distribution given data
• Prior Distribution: defines prior knowledge or ignorance about the parameter
• Posterior Distribution: prior belief modified by data
Prior:
  p(μ)
Likelihood:
  l(x₁, ..., xₙ | μ)
Posterior:
  f(μ | x₁, ..., xₙ) = l(x₁, ..., xₙ | μ) p(μ) / D(x₁, ..., xₙ)
Bayesian Inference
Prior distribution of μ
  Prior: μ | μ₀, τ² ~ N(μ₀, τ²)

Data model given μ
  Likelihood: x₁, x₂, ..., xₙ | μ, σ² ~ N(μ, σ²)

Posterior distribution of μ given data (Bayes theorem)
  Posterior: μ | μ₀, τ², σ², x₁, ..., xₙ ~ N( ((1/τ²) μ₀ + (n/σ²) x̄) / (1/τ² + n/σ²), (1/τ² + n/σ²)⁻¹ )

[Figure: prior, likelihood, and posterior densities of μ plotted against LogRatio; the shaded area of the posterior marks P(μ > 0 | data).]
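As a sketch (not on the original slide), the posterior update above can be computed directly; the data and the prior/model parameters below are illustrative:

# Sketch: conjugate normal posterior for mu with known sigma^2
posterior.normal <- function(x, mu0, tau2, sigma2) {
  n         <- length(x)
  prec      <- 1 / tau2 + n / sigma2                  # posterior precision
  post.mean <- (mu0 / tau2 + n * mean(x) / sigma2) / prec
  post.var  <- 1 / prec
  c(mean   = post.mean,
    var    = post.var,
    p.gt.0 = pnorm(0, post.mean, sqrt(post.var), lower.tail = FALSE))  # P(mu > 0 | data)
}

# Example: vague prior centered at 0, four log-ratio measurements, sigma^2 = 1
posterior.normal(c(0.4, 0.9, 0.1, 0.6), mu0 = 0, tau2 = 4, sigma2 = 1)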
Bayesian Estimation
• Bayesian point-estimate is the expected value of the parameter under its
posterior distribution given data
  Posterior: μ | μ₀, τ², σ², x₁, ..., xₙ ~ N( ((1/τ²) μ₀ + (n/σ²) x̄) / (1/τ² + n/σ²), (1/τ² + n/σ²)⁻¹ )

  E[μ | μ₀, τ², σ², x₁, ..., xₙ] = ((1/τ²) μ₀ + (n/σ²) x̄) / (1/τ² + n/σ²)
• In some cases the expectation of the posterior distribution can be difficult to assess - it is easier to find the value of the parameter that maximizes the posterior distribution given data - the Maximum a Posteriori (MAP) estimate
• Since the denominator of the posterior distribution in Bayes' theorem is constant in the parameter, this is equivalent to maximizing the product of the likelihood and the prior pdf
Posterior:
  f(μ | x₁, ..., xₙ) = l(x₁, ..., xₙ | μ) p(μ) / D(x₁, ..., xₙ)
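A minimal sketch of MAP estimation (illustrative values, not from the lecture): maximize the log-likelihood plus the log-prior, which for this conjugate normal model coincides with the closed-form posterior mean:

# Sketch: MAP estimate by maximizing log-likelihood + log-prior
set.seed(3)
x <- rnorm(20, mean = 1, sd = 1)
mu0 <- 0; tau <- 2; sigma <- 1       # assumed prior and model parameters

log.posterior <- function(mu)        # log-posterior up to the constant D(x1, ..., xn)
  sum(dnorm(x, mu, sigma, log = TRUE)) + dnorm(mu, mu0, tau, log = TRUE)

map <- optimize(log.posterior, interval = c(-10, 10), maximum = TRUE)$maximum
map
# Closed-form posterior mean (= mode for the normal model), for comparison:
(mu0 / tau^2 + length(x) * mean(x) / sigma^2) / (1 / tau^2 + length(x) / sigma^2)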
Alternative prior for the normal model
• Degenerate (improper) uniform prior for μ, assuming that any prior value is equally likely - this is clearly unrealistic, since we usually know more than that
Prior:
  p(μ) ∝ 1
Posterior:
  f(μ | x₁, ..., xₙ) = l(x₁, ..., xₙ | μ) p(μ) / P(x₁, ..., xₙ) = const · l(x₁, ..., xₙ | μ)
• The MAP estimate for μ is then identical to the maximum likelihood estimate
• Bayesian point-estimation and maximum likelihood are therefore very closely related
Hierarchical Bayesian Models and Empirical Bayes Inference
MOTIVATION
•xᵢⱼ ~ ind N(μⱼ, σⱼ²), where i = 1, ..., n indexes the replicated observations and j = 1, ..., T indexes the genes
•Each gene has its own mean and variance
•Usually n is small in comparison to T
•We want to use information from all genes to estimate the variance of the individual gene measurements
Hierarchical Bayesian Models and Empirical Bayes Inference
SOLUTION
•Postulate a "hierarchical" Bayesian model in which the individual variances of the different genes are assumed to be generated by a single distribution
  Level 2: 1/σⱼ² | d₀, s₀² ~ (1 / (d₀ s₀²)) χ²_(d₀)
  Level 1: xᵢⱼ | μⱼ, σⱼ² ~ N(μⱼ, σⱼ²)
•Estimate the parameters of this distribution (d₀, s₀²) using the Empirical Bayes approach
•Estimate the individual genes' variances using Bayesian estimation, with the prior parameters obtained by Empirical Bayes
  s̃ᵢ² = E[σᵢ² | σ̂ᵢ², d₀, s₀²] = (d₀ s₀² + (n − 1) σ̂ᵢ²) / (d₀ + n − 1)
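The shrinkage formula is easy to apply directly; the sketch below (not from the slides) uses made-up per-gene variances and prior values similar to the EFitLMAD output shown later:

# Sketch: shrinking per-gene sample variances towards the prior s0^2
shrink.var <- function(sigma2.hat, n, d0, s02)
  (d0 * s02 + (n - 1) * sigma2.hat) / (d0 + n - 1)

sigma2.hat <- c(0.001, 0.02, 0.5)          # hypothetical per-gene sample variances
shrink.var(sigma2.hat, n = 4, d0 = 4.5, s02 = 0.035)
# Small variances are pulled up towards s0^2, large ones are pulled down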
Hierarchical Bayesian Models and Empirical Bayes Inference
•Test the hypothesis μⱼ = 0 by calculating the modified (moderated) t-statistic
  t* = μ̂ⱼ / (s̃ⱼ / √n),  t* | μⱼ = 0 ~ t_(d₀+n−1)
•Limma operates on linear models
  yⱼ = X βⱼ + εⱼ,  ε₁ⱼ, ..., εₙⱼ ~ N(0, σⱼ²)
  and the Empirical Bayes estimation is applied to estimate σⱼ² for each gene
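A minimal sketch of this workflow in R, matching the object names used on the next slide; 'exprs.mat' (a genes-by-arrays matrix of log-ratios) and the two-group design are assumptions, not the lecture's actual data:

# Sketch: per-gene linear models with Empirical Bayes moderation in limma
library(limma)

group  <- factor(c("ctrl", "ctrl", "nic", "nic"))   # hypothetical two-group design
design <- model.matrix(~ group)                     # design matrix X

FitLMAD  <- lmFit(exprs.mat, design)    # ordinary least-squares fit per gene
EFitLMAD <- eBayes(FitLMAD)             # adds moderated t, s2.prior, df.prior, ...

EFitLMAD$s2.prior                       # estimated s0^2
EFitLMAD$df.prior                       # estimated d0
topTable(EFitLMAD, coef = 2)            # top genes by moderated t-statistic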
Effects of using Empirical Bayes modifications
> attributes(FitLMAD)
$names
 [1] "coefficients"     "stdev.unscaled"   "sigma"            "df.residual"      "cov.coefficients"
 [6] "pivot"            "method"           "design"           "genes"            "Amean"

$class
[1] "MArrayLM"
attr(,"package")
[1] "limma"

> attributes(EFitLMAD)
$names
 [1] "coefficients"     "stdev.unscaled"   "sigma"            "df.residual"      "cov.coefficients"
 [6] "pivot"            "method"           "design"           "genes"            "Amean"
[11] "df.prior"         "s2.prior"         "var.prior"        "proportion"       "s2.post"
[16] "t"                "p.value"          "lods"             "F"                "F.p.value"

$class
[1] "MArrayLM"
attr(,"package")
[1] "limma"
Effects of using Empirical Bayes modifications
  s̃ᵢ² = E[σᵢ² | σ̂ᵢ², d₀, s₀²] = (d₀ s₀² + (n − 1) σ̂ᵢ²) / (d₀ + n − 1)

  t* = μ̂ / (s̃ᵢ / √n) ~ t_(d₀+n−1)

> EFitLMAD$s2.prior
[1] 0.03466463
> EFitLMAD$df.prior
[1] 4.514814

[Figure: side-by-side panels comparing the Simple and EmpiricalBayes analyses.]
Effects of using Empirical Bayes modifications
•Empirical Bayes "inflates" the variances of the low-variability genes
•This reduces the proportion of "false positives" resulting from low variance estimates
•It biases the chance of being called differentially expressed towards genes with higher observed differential expression
•It has been shown to improve the overall proportion of true positives among the genes pronounced significant
•"Stein effect" - individually we cannot improve over the simple t-test, but by looking at all genes at the same time, it turns out that this method works better
> AnovadB$s2.prior
[1] 0.0363576
> AnovadB$df.prior
[1] 5.134094

[Figure: per-gene difference, EmpiricalBayes-Simple.]
Effects of using Empirical Bayes modifications
> AnovadB$s2.prior
[1] 0.0363576
> AnovadB$df.prior
[1] 5.134094

[Figure: Two-sample vs Paired t-test. -log10(EmpiricalBayesPvalue) plotted against FitLMAD$coefficients[, "Nic"]; the paired t-test p-values are compared with the two-sample t-test p-values.]
Effects of using Empirical Bayes modifications
> AnovadB$s2.prior
[1] 0.0363576
> AnovadB$df.prior
[1] 5.134094

[Figure: -log10(EmpiricalBayesPvalue) + log10(SimplePvalue) plotted against FitLMAD$coefficients[, "Nic"].]