Module Networks
Discovering Regulatory Modules and their
Condition Specific Regulators from Gene
Expression Data
Cohen Jony
Outline
The Problem
Regulators
Module Networks
Learning Module Networks
Results
Conclusion
The Problem
Inferring regulatory networks from gene expression data.
Regulators
Regulation types
Regulators example
This is an example of a regulating module.
Known solution: Bayesian Networks
The problem:
Too many variables and too little data cause statistical noise to lead to spurious dependencies, resulting in models that significantly overfit the data.
From Bayesian To Module
Module Networks
We assume that we are given a domain of random variables X = {X1, ..., Xn}.
We use Val(Xi) to denote the domain of values of the variable Xi.
A module set C is a set of formal variables M1, ..., MK. All the variables assigned to a module share the same CPD.
Note that all the variables in a module must have the same domain of values!
Module Networks
A module network template T = (S, θ) for C defines, for each module Mj in C:
1) a set of parents PaMj ⊆ X;
2) a conditional probability template (CPT) P(Mj | PaMj), which specifies a distribution over Val(Mj) for each assignment in Val(PaMj).
We use S to denote the dependency structure encoded by {PaMj : Mj in C} and θ to denote the parameters required for the CPTs {P(Mj | PaMj) : Mj in C}.
Module Networks
A module assignment function for C is a function A : X → {1, ..., K} such that A(Xi) = j only if Val(Xi) = Val(Mj).
A module network is defined by both the module network
template and the assignment function.
Example
In our example, we have
three modules M1, M2, and
M3.
PaM1 = Ø, PaM2 = {MSFT}, and PaM3 = {AMAT, INTL}.
In our example, we have
that A(MSFT) = 1, A(MOT) =
2, A(INTL) = 2, and so on.
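The template and assignment function can be made concrete with a small sketch. The class names are illustrative, the CPTs are left as placeholders, and the AMAT assignment is invented to complete the slide's "and so on":

```python
from dataclasses import dataclass

@dataclass
class Module:
    name: str
    parents: list          # names of parent variables PaMj, drawn from X
    cpt: object = None     # shared CPT P(Mj | PaMj), one per module

@dataclass
class ModuleNetwork:
    modules: dict          # module name -> Module (the template T = (S, theta))
    assignment: dict       # variable name -> module name (the function A)

    def parents_of(self, var):
        # Every variable assigned to a module shares that module's
        # parents and CPT.
        return self.modules[self.assignment[var]].parents

# The stock example from the slides: MSFT regulates M2,
# AMAT and INTL regulate M3.
net = ModuleNetwork(
    modules={
        "M1": Module("M1", parents=[]),
        "M2": Module("M2", parents=["MSFT"]),
        "M3": Module("M3", parents=["AMAT", "INTL"]),
    },
    assignment={"MSFT": "M1", "MOT": "M2", "INTL": "M2", "AMAT": "M1"},
)

print(net.parents_of("MOT"))   # MOT inherits M2's parents: ['MSFT']
```

Note that INTL is itself assigned to M2 while serving as a parent of M3: a variable may regulate one module while being a member of another.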
Learning Module Networks
The iterative learning procedure attempts to search for the
model with the highest score by using the expectation
Maximization (EM) algorithm.
An important property of the EM algorithm is that each iteration
is guaranteed to improve the likelihood of the model, until
convergence to a local maximum of the score.
Each iteration of the algorithm consists of two steps:
M-step
E-step
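The iteration can be sketched as follows. The two inner routines are stand-ins invented for the sketch: a module's "program" is reduced to the mean profile of its genes, and the likelihood to a unit-variance Gaussian score, so that the M-step, E-step, and convergence test are visible in a runnable form:

```python
def learn_module_network(genes, data, n_modules, n_iter=20):
    """Alternate M- and E-steps until the gene-to-module assignment is stable."""
    # Start from an arbitrary initial partition (contiguous blocks).
    assignment = {g: i * n_modules // len(genes) for i, g in enumerate(genes)}
    for _ in range(n_iter):
        # M-step: fit one regulation program per module.
        programs = {j: learn_regulation_program(
                        [g for g in genes if assignment[g] == j], data)
                    for j in range(n_modules)}
        # E-step: move each gene to the module whose program
        # predicts its expression profile best.
        new = {g: max(range(n_modules),
                      key=lambda j: module_log_likelihood(programs[j], g, data))
               for g in genes}
        if new == assignment:    # each iteration improves the score,
            break                # so this converges to a local maximum
        assignment = new
    return assignment, programs

# Stand-ins for the real tree-learning and scoring routines.
def learn_regulation_program(members, data):
    if not members:
        return None
    n = len(data[members[0]])
    return [sum(data[g][t] for g in members) / len(members) for t in range(n)]

def module_log_likelihood(program, gene, data):
    if program is None:
        return float("-inf")
    return -sum((x - m) ** 2 for x, m in zip(data[gene], program))

# Two co-expressed pairs of genes separate into two modules.
data = {"g1": [1.0, 1.1], "g2": [0.9, 1.0], "g3": [-1.0, -1.1], "g4": [-0.9, -1.0]}
assignment, _ = learn_module_network(list(data), data, n_modules=2)
print(assignment)
```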
Learning Module Networks cont.
In the M-step, the procedure is given a partition of the genes into modules and learns the best regulation program (regression tree) for each module.
The regulation program is learned via a combinatorial search over the space of trees.
The tree is grown from the root to its leaves. At any given node, the query which best partitions the gene expression into two distinct distributions is chosen, until no such split exists.
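A single node-level step of that search might look like the following sketch. Each candidate query (regulator, threshold) is scored by the reduction in squared error it achieves, an illustrative stand-in for the Bayesian score the paper actually uses; the regulator name is likewise illustrative:

```python
def sum_sq(values):
    """Sum of squared deviations from the group mean (0 for an empty group)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def best_split(expression, regulators, queries):
    """Choose the query (regulator, threshold) that best partitions the
    module's expression values into two distinct distributions.

    expression: one value per instance (e.g. per microarray)
    regulators: regulator name -> its value per instance
    queries:    candidate (regulator, threshold) pairs
    Returns (query, gain); query is None if no split improves the fit.
    """
    base = sum_sq(expression)
    best, best_gain = None, 0.0
    for reg, thr in queries:
        lo = [e for e, r in zip(expression, regulators[reg]) if r <= thr]
        hi = [e for e, r in zip(expression, regulators[reg]) if r > thr]
        gain = base - (sum_sq(lo) + sum_sq(hi))
        if gain > best_gain:
            best, best_gain = (reg, thr), gain
    return best, best_gain

# Four instances: the module is high exactly when the regulator is high,
# so splitting on the regulator at 0 removes almost all the variance.
expr = [2.0, 2.1, -2.0, -1.9]
regs = {"HAP4": [1.0, 0.9, -1.0, -0.8]}
print(best_split(expr, regs, [("HAP4", 0.0)]))
```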
Learning Module Networks cont.
In the E-step , given the inferred regulation programs, we
determine the module whose associated regulation program
best predicts each gene’s behavior.
We test the probability of a gene’s measured expression
values in the dataset under each regulatory program,
obtaining an overall probability that this gene’s expression
profile was generated by this regulation program.
We then select the module whose program gives the gene’s
expression profile the highest probability, and re-assign the
gene to this module.
We take care not to assign a regulator gene to a module in
which it is also a regulatory input.
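The reassignment step, including the guard that keeps a regulator out of any module whose program uses it as input, can be sketched as follows (module names, gene names, and scores are illustrative):

```python
def reassign(genes, modules, log_prob):
    """Move each gene to the module whose regulation program gives its
    expression profile the highest probability.

    modules:  module id -> set of regulator genes used by its program
    log_prob: function (module id, gene) -> log P(gene's profile | program)
    A gene is never assigned to a module that uses it as a regulatory input.
    """
    new_assignment = {}
    for g in genes:
        legal = [j for j, regulators in modules.items() if g not in regulators]
        new_assignment[g] = max(legal, key=lambda j: log_prob(j, g))
    return new_assignment

# Toy example: "HAP4" fits module M1 best, but M1 uses HAP4 as a
# regulator, so HAP4 falls back to the best legal module, M2.
scores = {("M1", "HAP4"): -1.0, ("M2", "HAP4"): -3.0,
          ("M1", "CYC1"): -1.0, ("M2", "CYC1"): -2.0}
modules = {"M1": {"HAP4"}, "M2": set()}
print(reassign(["HAP4", "CYC1"], modules, lambda j, g: scores[(j, g)]))
```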
Bayesian score
When the priors satisfy the assumptions above, the Bayesian score decomposes into local module scores:

score(S, A : D) = Σ_{j=1..K} score_Mj(PaMj, X^j : D)

where X^j denotes the set of variables assigned to module Mj.
Bayesian score cont.
score_Mj(U, X : D) = log ∫ L_j(U, X, θMj : D) P(θMj | Sj = U) dθMj + log P(Sj = U) + log P(Aj = X)

Where L_j(U, X, θMj : D) is the likelihood function.
Where P(θMj | Sj = U) is the parameter prior.
Where Sj = U denotes that we chose a structure where U are the parents of module Mj.
Where Aj = X denotes that A is such that X^j = X.
Assumptions
Let P(A), P(S | A), P(θ | S, A) be assignment, structure, and parameter priors.
P(θ | S, A) satisfies parameter independence if

P(θ | S, A) = Π_{j=1..K} P(θ_{Mj|PaMj} | S, A)

P(θ | S, A) satisfies parameter modularity if

P(θ_{Mj|U} | S1, A) = P(θ_{Mj|U} | S2, A)

for all structures S1 and S2 such that Pa^{S1}_Mj = Pa^{S2}_Mj = U.
Assumptions
P(θ, S | A) satisfies assignment independence if

P(θ | S, A) = P(θ | S) and P(S | A) = P(S).

P(S) satisfies structure modularity if

P(S) = Π_{j=1..K} ρj(Sj)

where Sj denotes the choice of parents for module Mj, and ρj is a distribution over the possible parent sets for module Mj.
P(A) satisfies assignment modularity if

P(A) ∝ Π_{j=1..K} αj(Aj)

where Aj is the choice of variables assigned to module Mj, and {αj : j = 1, ..., K} is a family of functions from 2^X to the positive reals.
Assumptions - Explanations
Parameter independence, parameter modularity, and
structure modularity are the natural analogues of standard
assumptions in Bayesian network learning.
Parameter independence implies that P(θ | S, A) is a product
of terms that parallels the decomposition of the likelihood, with
one prior term per local likelihood term Lj.
Parameter modularity states that the prior for the parameters
of a module Mj depends only on the choice of parents for Mj
and not on other aspects of the structure.
Structure modularity implies that the prior over the structure S
is a product of terms, one per each module.
Assumptions - Explanations
These two assumptions are new to module networks.
Assignment independence: makes the priors on the parents
and parameters of a module independent of the exact set of
variables assigned to the module.
Assignment modularity: implies that the prior on A is
proportional to a product of local terms, one corresponding to
each module.
Thus, the reassignment of one variable from one module Mi to another Mj does not change our preferences on the assignment of variables in modules other than Mi and Mj.
Experiments
The network learning procedure was evaluated on synthetic
data, gene expression data, and stock market data.
The data consisted solely of continuous values. As all of the
variables have the same domain, the definition of the module
set reduces simply to a specification of the total number of
modules.
Beam search was used as the search algorithm, with a lookahead of three splits to evaluate each operator.
As a comparison, Bayesian networks were used with
precisely the same structure learning algorithm, simply
treating each variable as its own module.
Synthetic data
The synthetic data was generated by a known module
network.
The generating model had 10 modules and a total of 35
variables that were a parent of some module. From the
learned module network, 500 variables were selected,
including the 35 parents.
This procedure was run for training sets of various sizes
ranging from 25 instances to 500 instances, each repeated 10
times for different training sets.
Synthetic data - results
Generalization to unseen test data was measured by the likelihood ascribed by the learned model to 4500 unseen instances.
As expected, models learned with larger training sets do
better; but, when run using the correct number of 10 modules,
the gain of increasing the number of data instances beyond
100 samples is small.
Models learned with a larger number of modules had a wider
spread for the assignments of variables to modules and
consequently achieved poor performance.
Synthetic data – results cont.
Log-likelihood per instance assigned to held-out data.
For all training set sizes,
except 25, the model with 10
modules performs the best.
Synthetic data – results cont.
Fraction of variables assigned to the largest 10 modules.
Models learned using 100,
200, or 500 instances and up
to 50 modules assigned 80%
of the variables to 10
modules.
Synthetic data – results cont.
Average percentage of correct parent-child relationships recovered.
The total number of parent-child
relationships in the generating
model was 2250.
The procedure recovers 74% of
the true relationships when
learning from a dataset of size
500 instances.
Synthetic data – results cont.
As the variables begin fragmenting over a large number of
modules, the learned structure contains many spurious
relationships.
Thus in domains with a modular structure, statistical noise is
likely to prevent overly detailed learned models such as
Bayesian networks from extracting the commonality between
different variables with a shared behavior.
Gene Expression Data
Expression data which measured the response of yeast to
different stress conditions was used.
The data consists of 6157 genes and 173 experiments.
2355 genes that varied significantly in the data were selected, and a module network was learned over these genes.
A Bayesian network was also learned over this data set.
Candidate regulators
A set of 466 candidate regulators was compiled from SGD
and YPD.
The set included both transcription factors and signaling proteins that may have transcriptional impact, as well as genes described as similar to such regulators.
Global regulators, whose regulation is not specific to a small set of genes or processes, were excluded.
Gene Expression results
The figure demonstrates that module networks generalize much better than Bayesian networks to unseen data for almost all choices of the number of modules.
Biological validity
Biological validity of the learned module network with 50
modules was tested.
The enriched annotations reflect the key biological processes
expected in our dataset.
For example, the “protein folding” module contains 10 genes, 7
of which are annotated as protein folding genes. In the whole
data set, there are only 26 genes with this annotation. Thus,
the p-value of this annotation, that is, the probability of
choosing 7 or more genes in this category by choosing 10
random genes, is less than 10^-12.
42 modules, out of 50, had at least one significantly enriched
annotation with a p-value less than 0.005.
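The quoted p-value is the upper tail of a hypergeometric distribution, and can be checked directly from the counts on the slide (2355 genes in total, 26 annotated, 7 of the module's 10 genes annotated):

```python
from math import comb

def enrichment_pvalue(N, K, n, k):
    """P(at least k annotated genes when drawing n of N, K of which are annotated)."""
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, n + 1)) / comb(N, n)

p = enrichment_pvalue(N=2355, K=26, n=10, k=7)
print(p)   # just below 1e-12, consistent with the slide's bound
```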
Biological validity Cont.
The enrichment of both the HAP4 motif and STRE, recognized by Hap4 and Msn4 respectively, supports their inclusion in the module's regulation program.
Lines represent 500 bp of genomic sequence located upstream of the start codon of each of the genes; colored boxes represent the presence of cis-regulatory motifs located in these regions.
Stock Market Data
NASDAQ stock prices for 2143 companies, covering 273
trading days.
stock → variable, instance → trading day.
The value of the variable is the log of the ratio between that
day’s and the previous day’s closing stock price.
As potential controllers, the 250 of the 2143 stocks whose average trading volume was the largest across the dataset were selected.
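The per-day value defined above is simply a log price ratio; as a sketch:

```python
from math import log

def daily_value(close_today, close_yesterday):
    """Log of the ratio between today's and yesterday's closing price."""
    return log(close_today / close_yesterday)

print(round(daily_value(105.0, 100.0), 4))   # a 5% rise -> 0.0488
```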
Stock Market Data
Cross validation is used to
evaluate the generalization
ability of different models.
Module networks perform
significantly better than
Bayesian networks in this
domain.
Stock Market Data
Module networks compared with Autoclass
Significant enrichment for 21 annotations, covering a wide variety of sectors, was found.
In 20 of the 21 cases, the enrichment was far more significant in the modules learned using module networks than in those learned by AutoClass.
Conclusions
The results show that learned module networks have much
higher generalization performance than a Bayesian network
learned from the same data.
Parameter sharing between variables in the same module allows each parameter to be estimated from a much larger sample. This lets us learn dependencies that would be considered too weak based on the statistics of single variables (these are well-known advantages of parameter sharing).
An interesting aspect of the method is that it determines automatically which variables have shared parameters.
Conclusions
The assumption of shared structure significantly restricts the
space of possible dependency structures, allowing us to learn
more robust models than those learned in a classical
Bayesian network setting.
In a module network, a spurious correlation would have to arise
between a possible parent and a large number of other
variables before the algorithm would introduce the
dependency.
Overview of Module Networks
Literature
Reference: Discovering Regulatory Modules and their
Condition Specific Regulators from Gene Expression Data.
By: Eran Segal, Michal Shapira, Aviv Regev, Dana Pe’er,
David Botstein, Daphne Koller & Nir Friedman.
Bibliography:
P. Cheeseman, J. Kelly, M. Self, J. Stutz, W. Taylor, and D. Freeman.
Autoclass: a Bayesian classification system. In ML ’88. 1988.
THE END