Bayesian Models for Gene Expression
with DNA Microarray Data
Joseph G. Ibrahim, Ming-Hui Chen, and Robert J. Gray
Presented by Yong Zhang
Goals:
1) To build a model for comparing normal and tumor tissues
and to find the genes that best distinguish between the
tissue types.
2) To develop model assessment techniques for assessing
the fit of a class of competing models.
Outline
General Model
Gene Selection Algo.
Prior Distributions
L measure (model assessment)
Example
Data structure
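x: the expression level for a given gene
c0: the threshold value at which a gene is considered not expressed
Let p = P(x = c0); then
$$x = \begin{cases} c_0 & \text{with probability } p, \\ c_0 + y & \text{with probability } 1 - p, \end{cases}$$
where y is the continuous part of x.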
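The mixture structure above can be illustrated with a short simulation. This is only a sketch: it assumes that x equals c0 with probability p and c0 + y otherwise, with y lognormal (the slides later name a lognormal model for the continuous part). The function name `sample_expression` and all parameter values are hypothetical and chosen purely for illustration.

```python
import random

def sample_expression(p, mu, sigma, c0, rng):
    """Draw one expression level x from the assumed mixture.

    With probability p the gene is 'not expressed' and x sits at the
    threshold c0; otherwise x = c0 + y with log(y) ~ N(mu, sigma^2).
    """
    if rng.random() < p:
        return c0
    y = rng.lognormvariate(mu, sigma)  # continuous part, always > 0
    return c0 + y

# Hypothetical parameter values, for illustration only.
rng = random.Random(0)
draws = [sample_expression(0.3, 1.0, 0.5, 20.0, rng) for _ in range(10000)]

# Fraction of draws sitting exactly at the threshold; should be near p.
frac_at_threshold = sum(x == 20.0 for x in draws) / len(draws)
```

The point mass at c0 is what distinguishes this from an ordinary lognormal model: a fixed share of observations carries no information about the continuous part.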
• j = 1, 2 indexes the tissue type (normal vs. tumor)
• i = 1, 2, …, n_j indexes the ith individual
• g = 1, …, G indexes the gth gene
• $x_{jig}$: the gene expression mixture random variable for the
jth tissue type, the ith individual, and the gth gene
The General Model
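• Assume $\log(y_{jig}) \mid \mu_{jg}, \sigma_{jg}^2 \sim N(\mu_{jg}, \sigma_{jg}^2)$, where $y_{jig}$ is the continuous part of $x_{jig}$
• $\delta_{jig} = 1(x_{jig} = c_0)$
• $p_{jg} = P(x_{jig} = c_0) = P(\delta_{jig} = 1)$
• $\theta = (\mu, \sigma^2, p)$
• Data $D = (x_{111}, \ldots, x_{2,n_2,G}, \delta)$
• Likelihood function for $\theta$:
$$L(\theta \mid D) = \prod_{j=1}^{2} \prod_{i=1}^{n_j} \prod_{g=1}^{G} p_{jg}^{\delta_{jig}} (1 - p_{jg})^{1 - \delta_{jig}} \left[ \frac{1}{\sqrt{2\pi}\, \sigma_{jg}\, y_{jig}} \exp\left\{ -\frac{(\log y_{jig} - \mu_{jg})^2}{2 \sigma_{jg}^2} \right\} \right]^{1 - \delta_{jig}}$$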
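The likelihood for a single gene in a single tissue type can be sketched in code as follows. This is a minimal illustration, assuming the mixture of a point mass at c0 and a shifted lognormal continuous part; the function name `log_likelihood` and the example values are hypothetical, not taken from the paper.

```python
import math

def log_likelihood(x, c0, p, mu, sigma2):
    """Log-likelihood of the observations x for one gene and one tissue type.

    Assumes x_i = c0 with probability p, and otherwise x_i = c0 + y_i
    with log(y_i) ~ N(mu, sigma2).
    """
    total = 0.0
    for xi in x:
        if xi == c0:
            # delta = 1: the observation sits at the threshold.
            total += math.log(p)
        else:
            # delta = 0: lognormal density of the continuous part y.
            log_y = math.log(xi - c0)
            total += math.log(1.0 - p)
            total += (-0.5 * math.log(2.0 * math.pi * sigma2)
                      - log_y
                      - (log_y - mu) ** 2 / (2.0 * sigma2))
    return total

# One observation at the threshold and one with continuous part y = 1.
ll = log_likelihood([20.0, 21.0], c0=20.0, p=0.5, mu=0.0, sigma2=1.0)
```

Summing this quantity over both tissue types and all genes gives the full log-likelihood under the same assumptions.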
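In order to find which genes best discriminate
between the normal and tumor tissues, let
$$\Psi_{jg} = E(x_{jig}) = c_0 p_{jg} + (1 - p_{jg}) \left( c_0 + e^{\mu_{jg} + \sigma_{jg}^2 / 2} \right)$$
denote the mean expression level of the gth gene in the jth tissue type. Then we set
$$\xi_g = \Psi_{2g} / \Psi_{1g}$$
so that we can use $\xi_g$ to judge whether the gth gene is expressed differently in the two tissue types.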
Prior Distributions
•
• $\sigma_{jg}^2 \sim$ Inverse Gamma$(a_{j0}, b_{j0})$
• $\mu_{j0} \sim N(m_{j0}, v_{j0}^2)$, j = 1, 2
•
• $b_{j0} \sim$ Gamma$(q_{j0}, t_{j0})$
• $e_{jg} \sim N(u_{j0}, k_{j0} w_{j0}^2)$
Gene Selection Algo.
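1) For each gene, compute $\xi_g$ and $\gamma_g = P(\xi_g > 1 \mid D)$.
2) Select a "threshold" value, say $r_0$, to decide
which genes are different. If $\gamma_g \ge r_0$ or $\gamma_g \le 1 - r_0$, the gth gene is declared different.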
3) Once the gth gene is declared different,
set $\mu_{1g} \ne \mu_{2g}$; otherwise set $\mu_{1g} = \mu_{2g} \equiv \mu_g$, where
$\mu_g$ is treated as unknown.
Gene Selection Algo.
4) Create several submodels using several values of $r_0$.
5) Use the L measure to decide which submodel is the best one
(the one with the smallest L measure).
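The thresholding steps of the algorithm can be sketched as below. This is an illustration under stated assumptions, not the authors' implementation: it assumes posterior draws of a per-gene discrimination statistic ξ_g are already available from MCMC, and it assumes the selection rule is based on the posterior probability γ_g = P(ξ_g > 1 | D), with a gene declared different when γ_g ≥ r0 or γ_g ≤ 1 − r0. The function `select_genes`, the gene names, and the draws are all hypothetical.

```python
def select_genes(xi_draws, r0):
    """Return the genes declared 'different' at threshold r0.

    xi_draws maps each gene to a list of posterior draws of xi_g.
    gamma_g is estimated as the fraction of draws exceeding 1
    (an assumed selection statistic).
    """
    selected = []
    for gene, draws in xi_draws.items():
        gamma = sum(d > 1.0 for d in draws) / len(draws)
        if gamma >= r0 or gamma <= 1.0 - r0:
            selected.append(gene)
    return selected

# Hypothetical posterior draws for three genes.
xi_draws = {
    "gene_a": [1.8, 2.1, 1.5, 1.9],  # clearly above 1
    "gene_b": [1.2, 1.1, 1.3, 0.9],  # mostly above 1
    "gene_c": [0.4, 0.5, 0.6, 0.3],  # clearly below 1
}

# Each value of r0 defines one candidate submodel (step 4).
submodels = {r0: select_genes(xi_draws, r0) for r0 in (0.7, 0.9)}
```

A larger r0 demands stronger posterior evidence, so it declares fewer genes different; the resulting submodels would then be compared with the L measure as in step 5.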
The properties of this approach
1) Model the gene expression level as a mixture random variable.
2) Use a lognormal model for the continuous part of the mixture.
3) Use the L measure statistic for evaluating models.
L measure for model assessment
• It relies on the notion of an imaginary replicate experiment.
• Let $z = (z_{111}, \ldots, z_{2,n_2,G})$ denote the future values
from a replicate experiment.
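The L measure is the expected squared Euclidean
distance between x and z,
$$L(x) = E\left[ (z - x)'(z - x) \mid D \right] = \sum_{j,i,g} \mathrm{Var}(z_{jig} \mid D) + \sum_{j,i,g} (\mu_{jig} - x_{jig})^2,$$
where $\mu_{jig} = E(z_{jig} \mid D)$.
A more general version is
$$L(x, \nu) = \sum_{j,i,g} \mathrm{Var}(z_{jig} \mid D) + \nu \sum_{j,i,g} (\mu_{jig} - x_{jig})^2, \quad 0 \le \nu \le 1,$$
which reduces to L(x) when $\nu = 1$.
The right-hand side of the last formula can be computed from
MCMC output.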
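As a rough illustration of how such a quantity might be estimated from MCMC output, the sketch below takes posterior predictive draws of z and computes the sum of predictive variances plus a weighted sum of squared deviations of the predictive means from x. It assumes the weighted form with a weight ν between 0 and 1, where ν = 1 gives the expected squared Euclidean distance; the function name `l_measure` and the toy numbers are hypothetical.

```python
def l_measure(x, z_draws, nu=1.0):
    """Monte Carlo estimate of the L measure.

    x       : observed values, flattened into one list
    z_draws : list of posterior predictive draws of z, each the same
              length as x
    nu      : weight on the squared-bias term (assumed 0 <= nu <= 1)
    """
    n_draws = len(z_draws)
    total_var = 0.0
    total_bias2 = 0.0
    for k, xk in enumerate(x):
        vals = [z[k] for z in z_draws]
        mean = sum(vals) / n_draws
        # Predictive variance of component k, estimated from the draws.
        total_var += sum((v - mean) ** 2 for v in vals) / n_draws
        # Squared distance between predictive mean and observed value.
        total_bias2 += (mean - xk) ** 2
    return total_var + nu * total_bias2

# Toy example with two observations and two predictive draws.
x_obs = [1.0, 2.0]
z_draws = [[0.0, 2.0], [2.0, 4.0]]
l_full = l_measure(x_obs, z_draws)          # nu = 1
l_half = l_measure(x_obs, z_draws, nu=0.5)  # down-weighted bias term
```

Among competing submodels, the one with the smallest value would be preferred, since a small value means the replicate data are both close to the observations and predicted with little variability.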
Computational Algo. (MCMC)
1.
• For steps 1–4 and 6, generating from the conditional
posterior distributions is straightforward.
• For step 5, we can use the adaptive rejection sampling
algorithm (Gilks and Wild, 1992), because the corresponding
conditional posterior densities are log-concave.
Discussion
• The model development and prior distributions in this paper
can be easily extended to handle three or more tissue types.
• More general classes of priors
• The gene selection criteria