File - Department of Computer Science

Download Report

Transcript File - Department of Computer Science

Attractor Detection
and
Control of Boolean Networks
Tatsuya Akutsu
Bioinformatics Center
Institute for Chemical Research
Kyoto University
Contents
Boolean Network
 Attractor Detection



Definition and Algorithms
Control of Boolean Network

Definition and DP algorithm
Integer Programming-based Approach
 PBN and its Control
 Conclusion

Acknowledgment





Tamura Takeyuki, Morihiro Hayashida [Kyoto U.]
Masaki Yamamoto [Kwansei Gakuin U.]
Wai-Ki Ching, Shuqin Zhang, Xi Chen [U. Hong
Kong]
Michael Ng [Hong Kong Baptist U.]
Avraham A. Melkman [Ben-Gurion University of
the Negev]
Boolean Network
Boolean Network


Mathematical model of genetic networks
node⇔gene


Regulation rules



State of node: 1 (active) / 0 (inactive)
Boolean function (AND, OR, NOT …)
Edge from y to x ⇔ y directly controls x
Synchronized update

Almost the same as digital circuits (with
clocks)
[Kauffman, The Origin of Order, 1993]
Example of Boolean Network
Boolean Network
State Transition Table
A
B
C
A’ = B
B’ = A and C
C’ = not A
time t
time t+1
A B C
A’ B’ C’
0
0
0
0
1
1
1
1
0
0
1
1
0
0
1
1
0
0
1
1
0
0
1
1
0
1
0
1
0
1
0
1
INPUT
0
0
0
0
0
1
0
1
1
1
1
1
0
0
0
0
OUTPUT
Example of state transition:
111 ⇒ 110 ⇒ 100 ⇒ 000 ⇒ 001 ⇒ 001 ⇒ 001 ⇒ 。。。
Why Boolean Networks ?

Criticism that BN is too simplified

Unless simplified, difficult for theoretical analysis,
inference, and control



Maybe useful for qualitative analyses
One of most simple non-linear models


though complex models can be used for simulation
Negative results on BN suggest negative results on
more general (non-linear) models
Almost the same as digital circuits

Theories and techniques in computer science can
be utilized
Our Focus: Time Complexity

Many problems for BN are NP-hard



NP-hard means that there is no polynomial time
algorithm (unless P=NP)
It will take O(2n) time or more if we use
naïve methods
But, we want to solve much better

Because we can solve the cases of



n=300 for O(1.1n)
n=600 for O(1.05n)
Important for coping with large-scale
networks
Attractor Detection
Attractor (1)



Steady state
Different attractors ⇔
Different cell types
Example


011 ⇒ 101 ⇒ 010 ⇒
101 ⇒ 010 ⇒…
111 ⇒ 110 ⇒ 100 ⇒
000 ⇒ 001 ⇒ 001 ⇒
001 ⇒ …
State Transition Table
time t
time t+1
A B C
A’ B’ C’
0
0
0
0
1
1
1
1
0
0
1
1
0
0
1
1
0
0
1
1
0
0
1
1
0
1
0
1
0
1
0
1
INPUT
0
0
0
0
0
1
0
1
1
1
1
1
0
0
0
0
OUTPUT
Attractor (2)
time t
time t+1
A B C
A’ B’ C’
0
0
0
0
1
1
1
1
0
0
1
1
0
0
1
1
0
0
1
1
0
0
1
1
0
1
0
1
0
1
0
1
INPUT
0
0
0
0
0
1
0
1
1
1
1
1
0
0
0
0
OUTPUT
111
011
110
100
000
001
101
010
N-K Model (Kauffman Network)


N: Number of nodes (We use n instead of N)
K: Indegree



Indegree = the number of input edges = the number
of genes directly affecting node v
Each node has (maximum or average) indegree K
Boolean function assigned to each node is randomly
selected
indegree=2
v
indegree=3
v
Distribution of Attractors in N-K Model

Classical conjecture


O( n )
Some results suggest that this conjecture may
not be true



The number of attractors is
Superpolynomial growth ( > nγ for any γ) of the number
of attractors
(Samuelsson & Troein, PRL, 2003)
Superpolynomial growth of the average size of attractors
(Drossel et al., PRL, 2005)
No conclusive result is known
Singleton Attractor (or Point Attractor)


Biological interpretation of attractors
 Different attractors ⇔ Different cell types
Point attractor
 Attractor with period 1
 Corresponding to a steady state
 Definition: v(t )  (v1 (t ), , vn (t ))
satisfying
v (t  1)  v (t )

(or,
v (1)  v (0) )
Attractor Detection
 Input: Boolean Network
 Output: Point Attractor (if any)
Previous Works and Our Works

Around

O(2n ) time is enough
since there are 2n global states
Several heuristics, but no theoretical guarantee
[Irons, Pysica D, 2006], [Devloo et al., Bull. Math. Biol. 2003], …

Detection of a singleton attractor is NP-hard
[Akutsu et al., GIW 1998]

We developed algorithms with average case
theoretical bounds
[Zhang et al., EURASIP JBSB 2007]

We developed algorithms for singleton attractor detection

O(1.587 n ) time algorithm for AND-OR BNs
[Melkman, Tamura & Akutsu, 2010]

O(1.799 n ) time algorithm for nested canalyzing BNs
[Akutsu, Melkman, Tamura & Yamamoto, 2011]
Reduction from BN-ATTRACTOR to SAT

Detection of Singleton Attractor with Max.
Indegree K  (K+1)-SAT (Boolean SATisfiability problem)
vj
vk
vi
Basic Idea of Our Algorithms
Assigning x=0 eliminates three nodes
 Assigning x=1 eliminates two nodes
n
f
(
n
)

f
(
n

2
)

f
(
n

3
)
f
(
n
)

1
.
325
⇒
⇒

⇒ need additional work using SAT ⇒ O(1.587 n )
y
z
OR
0
0
OR
OR
0
1
x
OR
OR
w
OR
1
Summary of Attractor Detection Algorithms
Singleton Attractors
K=2
K=3
Recursive
(Ave.
Time)
O(1.19n)
O(1.27n)
SAT based
(detection)
O(1.323n)
O(1.474n)
Our
algorithms
(detection)
O((1.323-δ)n)
(δ=0.00004)
Cyclic Attractors
AND/OR of
literals
(any K)
Canalyzing
(any K)
AND/OR of literals
(Planar, any K)
N/A
N/A
N/A
O(1.587n) O(1.799n)
O((1+ε)n)
(Recursive, Average Case)
K=2
K=3
K=4
K=5
period=2
O(1.57n)
O(1.70n)
O(1.78n)
O(1.83n)
period=3
O(1.72n)
O(1.86n)
O(1.92n)
O(1.95n)
Control of Boolean Network
Control Theory for Biological Systems

One of the main targets of Systems Biology



Though control theory is well established for linear
systems, biological systems have non-linear
components
May lead to new drugs and treatment methods
Introduction of 4 genes turns normal cells into
induced pluripotent stem cells (iPS cells)
制御
Control
がん細胞
Cancer
Cell
正常細胞
Normal
Cell
Definition of BN-Control


Input

Internal nodes: v1 ,…, vn

Initial state:
v0
External nodes: u1 ,…, um
Desired state:
vM
BN
Output

Sequence of states of external nodes: u(0), u(1), …, u(M)

v(0)=v0, v(M)=vM
(leading to the desired state at time M)
[Akutsu et al.,
J. Theo. Biol. 2007]
BN-Control: Related Works

Datta et al. defined a problem of control of PBN
(Probabilistic Extension of BN) and proposed a dynamic
programming based method [Machine Learning, 52:169-191, 2003]


They also proposed various extensions
But, their method must handle 2n×2n matrices

BN-Control (also PBN-Control) is NP-hard
BN-Control can be solved in polynomial time if the
network has a tree structure
[Akutsu et al., JTB 2007]

Practical approach based on Model Checking/SAT

[Langmund & Jha, APBC 2008, JBCB 2009]

Theoretical studies using Semi-Tensor Product
[Cheng, 2009, 2010, …]
Dynamic Programming for Control of BN


BN version of the algorithm by Datta et al.
DP table: D[b1 , b2 ,, bn , t ]


takes 1 if there is a control seq. leading to the
target state
can be computed by
1, if [b1 ,, bn ]  v M
D[b1 , b2 ,, bn , M ]  
0, otherwise
1, if there is (c, x) such that

D[b1 , b2 ,  , bn , t  1]  
D[c1, ,cn ,t ]  1 and c  f (b, x)
 0, otherwise
Illustration of DP Algorithm
D[1,1,1, 2] =1
D[0,0,0, 2] = 0
D[b1 , b2 ,  , bn , t  1] 
1, if there is (c, x) such that


D[c1, ,cn ,t ]  1 and c  f (b, x)
 0, otherwise
u1=1, u2=1
D[0,1,1, 3] = 1
DP
Computation
But, the size of
DP table is
exponential
Integer Linear ProgrammingBased Approach
Integer Programming

Linear Programming (LP)


Maximize (or minimize) an objective linear function under
constraints of linear inequalities
Integer Linear Programming (ILP)



LP + constraints that specified variables must take integer value
Several efficient solvers: CPLEX, Gurobi
Used for solving various NP-hard problems
maxize
2x  3y
subject to
100 x  50 y  500
100 x  20 y  100
x  0,
y  0, x, y : integers
ILP Representation of Boolean Functions

Variables: either 0 or 1
(i.e., integer between 0 and 1)

zAND
 x AND y
z  x, z  y , z  x  y  1

z  x OR y
OR
z  x, z  y, z  x  y

NOT
z  NOT x
z 1  x
We applied this methodology to BN-control.
[Akutsu et al., IEEE CDC 2009]
Result on Attractor Detection



Data: randomly generated BNs

with cases of indegree=2 and indegree=3

n: #nodes
3GHz Xeon CPU + ILOG CPLEX
Result: quite fast if indegree=2
Result on BN-Control

Data: randomly generated BNs



with cases of indegree=2 and indegree=3
n: #internal nodes, m: #external nodes, M: #steps
Result: fast if indegree=2
but, not so fast if indegree=3
PBN and its Control
Probabilistic Boolean Network (PBN)
[Shmulevich et al., 2002]


Multiple control rules (boolean functions) for each
node
Control rule is selected randomly at each t
according to a given probability distribution



Almost equivalent to Dynamic Bayesian Network
Pros: Capable of noise. Can be modeled as Markov
process.
Cons:Not scalable since it takes O(2n) or more time for
almost all problems on PBN
B
A(t+1) = B(t) AND C(t) with Prob.=0.6
A
C
A(t+1) = B(t) OR (NOT C(t)) with Prob.=0.4
Example of PBN
State Transition Diagram
PBN
(only for half of nodes)
One of 4(=2×1×2) BNs is randomly selected at each time setp
BN vs. PBN


BN: 1 outgoing edge
PBN: multiple outgoing edges (with probabilities)
101
101
0.1
0.3 0.2 0.4
BN1 BN2 BN3
001
BN
001
011
101
PBN
BN4
110
PBN-CONTROL: Model




Probabilistic Boolean network (PBN, an extension of Boolean
network)
v(t )  (v1 (t ), , vn (t ))  {0,1}n
Global state at time t:
Probabilistic regulation rule is given as a 2n×2n matrix A
A can be controlled by m boolean variables u(t )  (u1 (t ), , um (t ))
Pr( v (t  1)  w | v(t )  x )  A(u(t )) w, x

Cost functions


Ct(v, u): cost for applying control u for global state v at time t
C(v):
cost for final global state v
[Datta et al., Machine Learning, 2003]
PBN-CONTROL: Problem and Algorithm

Problem:


Given initial state v(0), control rule A(u(t)), target time M ,
and cost functions,
Find a first control action u(0) minimizing
M


E C ( v( M ))   Ct ( v(t ), u(t )) 
t 0



s.t.
Pr( v (t  1) | v (t ))  A(u(t )) v ( t 1), v ( t )
Can be solved by dynamic programming
J M* ( v)  C ( v)
111


*
*
J t ( v)  min m Ct ( v, u)   A(u) v ,x  J t 1 (x)
u{0 ,1}
x 000


[Datta et al., Machine Learning, 2003]
Hardness Results



Control of BN is NP-complete [Akutsu et al., JTB 07]
Integer linear programming (ILP)-based
[Akutsu et al., IEEE CDC 09]
method for control of BN
p
Control of PBN is harder than NP (  2 -hard)
 Such technique as ILP, SAT cannot be
utilized
[Chen et al., BIBM 2010]
PSPACE
 2p
Control of BN
ILP
SAT
NP
?
Control of PBN
Conclusion
Conclusion

Boolean network



Attractor Detection/Enumeration



NP-hard
Integer Linear Programming-based Approach


NP-hard
Much better than naïve O(2n) bound for several cases
Control of Boolean Networks


A discrete model of a genetic network
Similar to digital circuits
Simple, Flexible for modifications/extensions
Control of Probabilistic Boolean Networks


p -hard ⇒ SAT or IP cannot be utilized
2
Future Work

Development of Non-trivial Algorithms for
 Periodic Attractor Detection
 In progress
 Control of Boolean Network
 Break O(2n) bound !
 Control of PBN
p
 How to cope with  2 -hardness

Development of Hybrid Model/Theory Combining
Boolean and Linear Models
Thank you !