Dynamic Bayesian Networks
Sushmita Roy
[email protected]
Computational Network Biology
Biostatistics & Medical Informatics 826
Computer Sciences 838
https://compnetbiocourse.discovery.wisc.edu
Oct 18th 2016
Goals for today
• Dynamic Bayesian Network
• How does a DBN differ from a static BN?
• Learning a Dynamic Bayesian Network
• DBN with structure priors
• Application of DBN to a phospho-proteomics time course
– Evaluation and insights
Dynamic Bayesian Networks
• A Dynamic Bayesian network (DBN) is a
Bayesian network that can model
temporal/sequential data
• DBN is a Bayes net for dynamic processes
• A DBN also has a graph structure and
conditional probability distributions
• The DBN specifies how observations at a
future time point may arise from previous
time points.
Notation
• Assume we have a time course with T time points
specifying activity of p different variables
• Let X^t = {X_1^t, …, X_p^t} denote the set of random variables at time t
• A DBN over these variables defines the joint distribution P(X), where X = {X^1, …, X^T}
• A DBN, like a BN, has a directed acyclic graph G
and parameters Θ
• G typically specifies the dependencies between
time points
– In addition we need to specify dependence (if any) at
t=0
A DBN for p variables and T time points
[Figure: a grid of variables X_i^t, one row per variable i = 1, …, p and one column per time point t = 1, …, T; arrows connect variables at each time point to variables at the next. The dependency at the first time point is specified separately. X^2 denotes the set of variables at time t=2.]
Stationarity assumption in a dynamic Bayesian network
The stationarity assumption states that the dependency structure and parameters do not change with t.
Due to this assumption, we only need to specify the dependencies between the variables at two consecutive time points, t and t+1.
[Figure: under stationarity, the full grid of variables X_i^t for t = 1, …, T collapses to a two-slice network specifying only the dependencies between X_1^t, …, X_p^t and X_1^{t+1}, …, X_p^{t+1}.]
Computing the joint probability distribution in
a DBN
The joint probability distribution can be factored into a product of conditional distributions across time and variables:

P(X) = P(X^1) ∏_{t=1}^{T-1} ∏_{i=1}^{p} P(X_i^{t+1} | Pa(X_i^{t+1}))

Here the graph G encodes the dependency structure between variables at consecutive time points, the parameters Θ specify the form of the conditional distributions, and the parents Pa(X_i^{t+1}) of each variable X_i^{t+1} are defined by the graph G.
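As a concrete instance of this factorization, take p = 2 variables where each variable's parent is the other variable at the previous time point (an illustrative parent assignment) and T = 3; the product then unrolls to:

```latex
P(\mathbf{X}) = P(X_1^1, X_2^1)\,
  \underbrace{P(X_1^2 \mid X_2^1)\, P(X_2^2 \mid X_1^1)}_{t = 1 \rightarrow 2}\,
  \underbrace{P(X_1^3 \mid X_2^2)\, P(X_2^3 \mid X_1^2)}_{t = 2 \rightarrow 3}
```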
Learning problems in DBNs
• Parameter learning: Given known temporal
dependencies between random variables
estimate the parameters from observed
measurements
• Structure learning: Given data, learn both the
graph structure and parameters
– Complexity of learning depends upon the order of
the model
An example DBN
• Let us consider a simple example of two
regulators, B and C and one target gene, A
• Assume their expression takes on values H, L and
NC (for high, low and no-change in expression)
• A’s expression level depends upon regulator B
and C’s expression level
• B and C mutually regulate each other
• Let X_A^t denote the random variable representing the expression level of gene A at time t
DBN for a three node network
[Figure: the two-slice DBN (left) and the collapsed network over A, B and C (right), in which B and C regulate A and each other.]
Specifying the parameters of the DBN for a
three node network
Each of these conditional distributions, P(X_A^{t+1} | X_B^t, X_C^t), P(X_B^{t+1} | X_C^t) and P(X_C^{t+1} | X_B^t), specifies a distribution over {H, L, NC} given the state of the parent variable(s).
Specifying the parameters of the DBN

P(X_B^{t+1} | X_C^t) (and identically P(X_C^{t+1} | X_B^t)):

X_C^t \ X_B^{t+1}    H      L      NC
H                    0.5    0.1    0.4
L                    0.4    0.4    0.2
NC                   0.25   0.25   0.5
Specifying the parameters of the DBN

P(X_A^{t+1} | X_B^t, X_C^t):

X_B^t   X_C^t    H      L      NC
H       H        0.8    0.1    0.1
H       L        0.2    0.2    0.6
H       NC       0.2    0.1    0.7
L       H        0.2    0.2    0.6
L       L        0.1    0.8    0.1
L       NC       0.1    0.3    0.6
NC      H        0.2    0.1    0.7
NC      L        0.05   0.2    0.75
NC      NC       0.1    0.1    0.8
Parameter estimation: Estimating these numbers
Assume the following CPDs for the three variables.

P(X_B^{t+1} | X_C^t) and P(X_C^{t+1} | X_B^t) (identical in this example):

parent    H      L      NC
H         0.5    0.1    0.4
L         0.4    0.4    0.2
NC        0.25   0.25   0.5

P(X_A^{t+1} | X_B^t, X_C^t):

B       C        H      L      NC
H       H        0.8    0.1    0.1
H       L        0.2    0.2    0.6
H       NC       0.2    0.1    0.7
L       H        0.2    0.2    0.6
L       L        0.1    0.8    0.1
L       NC       0.1    0.3    0.6
NC      H        0.2    0.1    0.7
NC      L        0.05   0.2    0.75
NC      NC       0.1    0.1    0.8
Computing the probability distribution of an
observation
Suppose we are given a new observation time course (one row per variable):

T=0   T=1   T=2
NC    L     H
NC    H     H
NC    NC    H

Assume P(NC)=0.5 and P(H)=P(L)=0.25 for all variables at T=0.
Using the DBN from the previous slides, what is the probability of this time course?
First we plug in the formula at the time-point level:

P(X) = P(X^0) P(X^1 | X^0) P(X^2 | X^1)

Next, we look at the graph structure of the DBN to further decompose these terms.
Computing the probability distribution of an
observation
T=0   T=1   T=2
NC    L     H
NC    H     H
NC    NC    H

Assume P(NC)=0.5 and P(H)=P(L)=0.25 at T=0.

Using the graph structure of the DBN, each term further decomposes over variables:

P(X^{t+1} | X^t) = P(X_A^{t+1} | X_B^t, X_C^t) P(X_B^{t+1} | X_C^t) P(X_C^{t+1} | X_B^t)
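To make the calculation concrete, here is a minimal Python sketch that evaluates the probability of this time course under the CPDs from the preceding slides. The assignment of the three rows of the observation to A, B and C is our assumption, as is the row/column orientation of the CPD tables.

```python
# Minimal sketch: probability of the observed time course under the example DBN.
# The mapping of observation rows to variables (A, B, C) is an assumption;
# the CPD numbers are the ones from the slides.

# P(X) at T=0: same marginal for every variable
p0 = {"H": 0.25, "L": 0.25, "NC": 0.5}

# P(X_B^{t+1} | X_C^t) and P(X_C^{t+1} | X_B^t): parent state -> child distribution
cpd_single = {
    "H":  {"H": 0.5,  "L": 0.1,  "NC": 0.4},
    "L":  {"H": 0.4,  "L": 0.4,  "NC": 0.2},
    "NC": {"H": 0.25, "L": 0.25, "NC": 0.5},
}

# P(X_A^{t+1} | X_B^t, X_C^t)
cpd_A = {
    ("H", "H"):   {"H": 0.8,  "L": 0.1, "NC": 0.1},
    ("H", "L"):   {"H": 0.2,  "L": 0.2, "NC": 0.6},
    ("H", "NC"):  {"H": 0.2,  "L": 0.1, "NC": 0.7},
    ("L", "H"):   {"H": 0.2,  "L": 0.2, "NC": 0.6},
    ("L", "L"):   {"H": 0.1,  "L": 0.8, "NC": 0.1},
    ("L", "NC"):  {"H": 0.1,  "L": 0.3, "NC": 0.6},
    ("NC", "H"):  {"H": 0.2,  "L": 0.1, "NC": 0.7},
    ("NC", "L"):  {"H": 0.05, "L": 0.2, "NC": 0.75},
    ("NC", "NC"): {"H": 0.1,  "L": 0.1, "NC": 0.8},
}

# Observed time course over T=0,1,2 (row-to-variable assignment assumed)
A = ["NC", "L", "H"]
B = ["NC", "H", "H"]
C = ["NC", "NC", "H"]

# P(X^0): variables are independent at the first time point
prob = p0[A[0]] * p0[B[0]] * p0[C[0]]

# P(X^{t+1} | X^t) factorizes over variables by the graph structure
for t in range(2):
    prob *= cpd_A[(B[t], C[t])][A[t + 1]]   # A <- {B, C}
    prob *= cpd_single[C[t]][B[t + 1]]      # B <- C
    prob *= cpd_single[B[t]][C[t + 1]]      # C <- B

print(prob)
```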
Parameter estimation in DBNs
• The parameter estimation approach differs depending on the form of the CPD
• Assume the variables are discrete; then we need to estimate the entries of the CPD tables
Parameter estimation example for three node
DBN
Need to estimate this table: P(X_B^{t+1} | X_C^t)

Suppose we had a training time course (one row per variable):

T=0   T=1   T=2   T=3   T=4
NC    L     L     L     NC
NC    H     H     NC    L
NC    NC    H     H     L

To compute these probabilities, we need to look at the joint assignments of {X_B^{t+1}, X_C^t} for all 0 ≤ t ≤ 3.

What is P(X_B^{t+1}=H | X_C^t=L)?
What is P(X_B^{t+1}=NC | X_C^t=L)?
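A minimal Python sketch of this counting procedure follows. The assignment of training rows to variables is our assumption (here the first row is taken as C and the second as B); with any other assignment only the data lines change.

```python
from collections import Counter

# Sketch: maximum-likelihood estimation of P(X_B^{t+1} | X_C^t) by counting
# joint assignments in a single training time course.
B = ["NC", "H", "H", "NC", "L"]   # assumed: X_B at T=0..4 (second row above)
C = ["NC", "L", "L", "L", "NC"]   # assumed: X_C at T=0..4 (first row above)

pair_counts = Counter()    # counts of (X_C^t, X_B^{t+1})
parent_counts = Counter()  # counts of X_C^t
for t in range(len(C) - 1):
    pair_counts[(C[t], B[t + 1])] += 1
    parent_counts[C[t]] += 1

def p(b_next, c_now):
    """MLE of P(X_B^{t+1}=b_next | X_C^t=c_now); assumes c_now was observed."""
    return pair_counts[(c_now, b_next)] / parent_counts[c_now]

print(p("H", "L"))   # P(X_B^{t+1}=H  | X_C^t=L)
print(p("NC", "L"))  # P(X_B^{t+1}=NC | X_C^t=L)
```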
Structure learning in DBNs
• We need to learn the dependency structure
between two consecutive time points
• We may also want to learn within time point
connectivity
• Structure search algorithms used for BNs can be used with a simple extension:
– parents of a node can come from the previous or the current time step
DBN with score-based search
• The score of a DBN is a function of the data likelihood and a graph prior:

Score(G) = log P(D | G) + log P(G)

– D: the data, a collection of time courses
– P(G): the graph prior; this can be uniform, or can encode some form of model complexity
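Because the likelihood decomposes per variable, score-based search can be run one node at a time. Below is a minimal Python sketch that exhaustively scores candidate parent sets for one target variable with a BIC-style score (transition log-likelihood minus a complexity penalty). The scoring function, the exhaustive search, and the data are illustrative, not the specific algorithm of any one paper.

```python
from collections import Counter
from itertools import combinations
from math import log

def family_score(data, target, parents, n_states=3):
    """BIC-style score for one variable's transitions given a parent set."""
    joint, marginal = Counter(), Counter()
    T = len(data[target])
    for t in range(T - 1):
        pa = tuple(data[p][t] for p in parents)   # parent states at time t
        joint[(pa, data[target][t + 1])] += 1     # target state at time t+1
        marginal[pa] += 1
    # maximum-likelihood log P(transitions | parents)
    ll = sum(n * log(n / marginal[pa]) for (pa, _), n in joint.items())
    # penalize model complexity: one free distribution per parent configuration
    n_params = (n_states ** len(parents)) * (n_states - 1)
    return ll - 0.5 * log(T - 1) * n_params

def best_parent_set(data, target, candidates, d_max=2):
    """Exhaustively score all parent sets with at most d_max parents."""
    return max((family_score(data, target, pa), pa)
               for k in range(d_max + 1)
               for pa in combinations(candidates, k))

data = {"A": ["NC", "L", "H", "H", "L"],    # hypothetical training series
        "B": ["NC", "H", "H", "NC", "L"],
        "C": ["NC", "L", "L", "L", "NC"]}
print(best_parent_set(data, "B", ("A", "B", "C")))
```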
Goals for today
• Dynamic Bayesian Network
• How does a DBN differ from a static BN?
• Learning a Dynamic Bayesian Network
• DBN with structure priors
• Application of DBN to a phospho-proteomics time course
– Evaluation and insights
Bayesian Inference of Signaling Network
Topology in a Cancer Cell Line (Hill et al 2012)
• Protein signaling networks are important in many diseases
– The networks can differ between normal and diseased cell types
• But the structure of the network remains incomplete
• Temporal activity of proteins of interest can be measured over time and used to infer the network structure
• Build on prior knowledge of signaling networks to learn a better, predictive network
• BNs are limiting because they do not model time
Applying DBNs to infer signaling network topology
Hill et al., Bioinformatics 2012
Application of DBNs to signaling networks
• Dataset description
– Phospho-protein levels of 20 proteins
– Eight time points
– Four growth conditions
• Use known signaling network as a graph prior
• Estimate CPDs as conditional regularized
Gaussians
• Assume a first-order Markov model
– X^t depends on X^{t-1}
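The CPDs here are conditional Gaussians with regularized estimation; a minimal sketch of that idea is ridge regression of each protein at time t on its candidate parents at t-1. The exact estimator and regularization in the paper may differ, and the data and parameter choices below are hypothetical.

```python
import numpy as np

# Sketch: fit a conditional Gaussian for one protein by ridge regression of
# X_i^t on candidate parents at t-1. X is (T time points) x (p proteins).
def fit_conditional_gaussian(X, i, parents, alpha=1.0):
    Z = X[:-1, parents]                        # parent values at t-1
    y = X[1:, i]                               # target values at t
    Z = np.column_stack([np.ones(len(Z)), Z])  # add intercept column
    # ridge estimate: (Z'Z + alpha*I)^{-1} Z'y
    beta = np.linalg.solve(Z.T @ Z + alpha * np.eye(Z.shape[1]), Z.T @ y)
    resid = y - Z @ beta
    return beta, resid.var()                   # weights and noise variance

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 20))                   # hypothetical: 8 time points, 20 proteins
beta, var = fit_conditional_gaussian(X, i=0, parents=[1, 2], alpha=1.0)
print(beta, var)
```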
Integrating prior signaling network into the
DBN
• A Bayesian approach to graph learning:

P(G | D) ∝ P(D | G) P(G)

where P(D | G) is the data likelihood and P(G) is the graph prior.

• The graph prior is encoded as (following Mukherjee & Speed 2008):

P(G) ∝ exp(λ f(G))

where λ is the prior strength and f(G) measures graph features.

• Here f(G) = -|E(G)\E*|: the negative of the number of edges in the graph G, E(G), that are not in the prior set E*
• This prior does not promote new edges, but penalizes edges that are not in the prior
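A minimal Python sketch of this prior, with a hypothetical edge set E* and a hypothetical graph G:

```python
# Mukherjee & Speed-style prior: f(G) = -|E(G) \ E*|, P(G) ∝ exp(lambda * f(G)).
def log_prior(edges_G, edges_prior, lam):
    """Unnormalized log P(G): penalize edges of G absent from the prior set E*."""
    n_unexpected = len(set(edges_G) - set(edges_prior))
    return lam * (-n_unexpected)

E_star = {("B", "A"), ("C", "A"), ("B", "C"), ("C", "B")}  # hypothetical E*
G = {("B", "A"), ("C", "A"), ("A", "C")}                   # ("A","C") not in E*
print(log_prior(G, E_star, lam=3.0))   # -> -3.0: one penalized edge
```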
Calculating posterior probabilities of edges
• For each edge e, we need to calculate its posterior probability:

P(e | D) = Σ_{G : e ∈ G} P(G | D)

• Although this is intractable in general, this work makes some assumptions:
– Allow edges only forward in time
• The learning problem then decomposes into smaller per-variable problems that can be solved by variable selection
– Assume P(G) factorizes over individual edges
– To compute the posterior probability, the sum goes over all possible parent sets
• Assume a node can have no more than dmax parents
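Under these assumptions, a minimal Python sketch of the decomposed computation: enumerate parent sets of one node up to size dmax, weight each by its exponentiated family score, and sum the weights of the sets containing each candidate parent. The scorer below is a hypothetical stand-in for the actual likelihood-plus-prior score.

```python
from itertools import combinations
from math import exp

# Sketch: posterior probability of each edge j -> i by summing over candidate
# parent sets of node i only (edges point forward in time, prior factorizes).
def edge_posteriors(i, candidates, family_score, d_max=3):
    weights = {}   # parent set -> unnormalized posterior weight
    for k in range(d_max + 1):
        for pa in combinations(candidates, k):
            weights[pa] = exp(family_score(i, pa))
    Z = sum(weights.values())
    # P(edge j -> i | D) = normalized weight of parent sets containing j
    return {j: sum(w for pa, w in weights.items() if j in pa) / Z
            for j in candidates}

# toy scorer (hypothetical): prefers two-parent sets that include "B"
toy_score = lambda i, pa: -abs(len(pa) - 2) + (1.0 if "B" in pa else 0.0)
print(edge_posteriors("A", ["B", "C", "D"], toy_score, d_max=2))
```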
Results on simulated data
• 20 variables, 4 time courses, 8 time points
• The prior network had 54 extra edges and was missing 10 of the ground-truth edges
• Results are not sensitive to the prior values
[Figure panels: sensitivity to the choice of hyperparameter; sensitivity to a noisy prior graph]
Inferred signaling network using a DBN
[Figure: the inferred signaling network and the prior network. The prior also had self-loops, which are not shown.]
Figure S3: Network prior. Existing biology is captured and integrated during modeling using a prior probability distribution on graphs P(G) ∝ exp(λf(G)), with f(G) = -|E(G)\E*|, where E(G) is the set of edges contained in G and E* is a set of a priori expected edges. The graph shows edge set E*. Edges represent interactions through time. Each node also has a self-loop edge (i.e. an edge starting and finishing at the same node; these are not displayed). The edge set includes expected indirect edges which operate via components not included in our study.
Using the DBN to make predictions
• Although many edges were expected, several edges were unexpected
• Select novel edges based on posterior probability and test them using inhibitors
• For example, if an edge from X to Y was inferred and X is a regulator of Y, then inhibition of X should affect the level of Y
• Example edges tested:
– MAPKp to STAT3p(S727), inferred with high probability (0.98)
• Apply MEKi, an inhibitor of MAPK signaling, and measure MAPKp and STAT3p post-inhibition
– AKTp to p70S6Kp, AKTp to MEKp and AKTp to cJUNp
Experimental validation of links
Add the MEK/MAPK inhibitor and measure MAPK and STAT3.
MAPK is significantly inhibited (P-value 5x10^-4).
STAT3 is also inhibited (P-value 3.3x10^-4).
Success is measured by the difference in the levels of the targets as a function of the levels of the inhibitors.
Take away points
• Network dynamics can be defined in multiple ways
• We have seen two ways to capture network dynamics
• Skeleton network-based approaches
– The universe of networks is fixed
– Nodes switch on or off
– No assumption or model of how the network changes over time
• Dynamic Bayesian network
– A type of probabilistic graphical model
– Describes how the system transitions from one state to another
– Assumes that the dependency between t-1 and t is the same for all
time points
• Application to cancer signaling data
– DBNs are powerful for capturing the dynamics
– However, the prior was important to learn an accurate network
References
• N. Friedman, K. Murphy, and S. Russell, "Learning the structure of dynamic probabilistic networks," in Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence (UAI'98). San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1998, pp. 139-147. [Online]. Available: http://portal.acm.org/citation.cfm?id=2074111
• S. M. Hill, Y. Lu, J. Molina, L. M. Heiser, P. T. Spellman, T. P. Speed, J. W. Gray, G. B. Mills, and S. Mukherjee, "Bayesian inference of signaling network topology in a cancer cell line," Bioinformatics (Oxford, England), vol. 28, no. 21, pp. 2804-2810, Nov. 2012. [Online]. Available: http://dx.doi.org/10.1093/bioinformatics/bts514