Motif Mining from Gene Regulatory Networks

Based on the publications of Uri Alon’s group
…presented by Pavlos Pavlidis
Tartu University, December 2005
Gene Regulatory Networks
• From Wikipedia
A gene regulatory network is a collection of DNA
segments in a cell which interact with each other and
with other substances in the cell, thereby governing the
rates at which genes in the network are transcribed into
mRNA.
• From DOE
Gene regulatory networks (GRNs) are the on-off
switches and rheostats…dynamically orchestrate the
level of expression for each gene….
Why can networks regulate gene
expression?
• U. Alon and his group stress the
importance of the building blocks of the
network.
• These building blocks are called motifs
Motifs
• They are also called n-node subgraphs
in a directed graph
(The work has also been extended to
undirected graphs)
• They are characterized by the number n
of nodes and the relations between
them (directed edges)
The 13 different 3-node subgraphs
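The count of 13 can be checked by brute force. The sketch below assumes that "different" means non-isomorphic, that the three nodes must be connected when edge direction is ignored, and that self-loops are excluded. The function names are illustrative only.

```python
from itertools import permutations, product

NODES = (0, 1, 2)
# The six possible directed edges between three nodes (no self-loops).
POSSIBLE_EDGES = [(a, b) for a in NODES for b in NODES if a != b]


def is_weakly_connected(edges):
    """True if all three nodes are linked when edge direction is ignored."""
    reached = {0}
    changed = True
    while changed:
        changed = False
        for a, b in edges:
            # An edge with exactly one endpoint reached extends the component.
            if (a in reached) != (b in reached):
                reached.update((a, b))
                changed = True
    return len(reached) == 3


def canonical_form(edges):
    """Smallest relabelled edge list; identical for isomorphic graphs."""
    return min(
        tuple(sorted((p[a], p[b]) for a, b in edges))
        for p in permutations(NODES)
    )


def three_node_subgraph_classes():
    """Collect one canonical form per class of connected 3-node digraphs."""
    classes = set()
    for mask in product((0, 1), repeat=len(POSSIBLE_EDGES)):
        edges = [e for e, keep in zip(POSSIBLE_EDGES, mask) if keep]
        if is_weakly_connected(edges):
            classes.add(canonical_form(edges))
    return classes


print(len(three_node_subgraph_classes()))  # 13
```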
Feed Forward Loop
It rapidly regulates the
production of Z
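As a rough illustration, a feed-forward loop can be counted in a list of regulatory edges as follows. This is a sketch that assumes the usual X, Y, Z labelling (X regulates Y, and both X and Y regulate Z) and that edges are given as (regulator, target) pairs. It counts every triple that contains these three edges, even if extra edges exist among the same nodes, and it is not the algorithm used by any particular tool.

```python
def count_feed_forward_loops(edges):
    """Count node triples (x, y, z) with edges x->y, y->z and x->z."""
    targets = {}
    for regulator, target in set(edges):
        if regulator != target:  # ignore self-regulation
            targets.setdefault(regulator, set()).add(target)

    count = 0
    for x, x_targets in targets.items():
        for y in x_targets:
            for z in targets.get(y, ()):
                # z is regulated by y; it closes the loop if x regulates it too.
                if z in x_targets:
                    count += 1
    return count


# Hypothetical toy network: one feed-forward loop plus an unrelated edge.
toy_edges = [("X", "Y"), ("Y", "Z"), ("X", "Z"), ("Z", "W")]
print(count_feed_forward_loops(toy_edges))  # 1
```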
Which motifs are they interested in?
• Not necessarily the biologically significant ones
– They don’t know a priori whether a motif is
biologically significant
• The statistically significant ones, which they can calculate
– The probability P that a randomized
network contains the same number or more
instances of a particular motif must be smaller
than a cutoff. Here the cutoff is 0.01.
Randomized Network
• A randomized network is not completely
randomized.
It preserves some properties of the real network:
• The same number of nodes as in the real
network
• For each node, the numbers of
incoming and outgoing edges equal those in
the real network.
Representation of the network as a matrix M
[Figure: a matrix of 0s and 1s with rows labelled Operon 1 to Operon 18 and columns labelled Operon 1, Operon 2, Operon 3, …]
Mij: 1 if operon j produces a TF which regulates operon i
Example: a 1 marks the cell where operon 2 regulates operon 11
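The matrix convention can be sketched in a few lines. The helper name and the single example edge (operon 2 regulating operon 11, taken from the figure) are only for illustration. Operons are numbered from 1 as on the slide, while Python list indices start at 0.

```python
def build_regulation_matrix(n_operons, regulations):
    """Return M with M[i][j] = 1 if operon j produces a TF regulating operon i.

    `regulations` holds (regulator, target) pairs using 1-based operon
    numbers; the returned nested list is indexed from 0.
    """
    M = [[0] * n_operons for _ in range(n_operons)]
    for regulator, target in regulations:
        M[target - 1][regulator - 1] = 1
    return M


# The single edge called out in the figure: operon 2 regulates operon 11.
M = build_regulation_matrix(18, [(2, 11)])

# Row sums give incoming edges per operon, column sums give outgoing edges.
in_degree = [sum(row) for row in M]
out_degree = [sum(col) for col in zip(*M)]

print(M[11 - 1][2 - 1])   # 1
print(in_degree[11 - 1])  # 1
print(out_degree[2 - 1])  # 1
```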
Randomization: Randomly select two cells which are 1, e.g. A(1,3) and B(2,1).
If the crossed cells A’(1,1) and B’(2,3) are both 0, then swap: set A and B to 0 and A’ and B’ to 1
Goal: The randomized network must have the same column sums
and row sums as the real network
Columns: The number of outgoing edges
Rows: The number of incoming edges
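A minimal sketch of the swap rule above follows. The number of swap attempts and the `seed` parameter are assumptions made for illustration, since the slides do not say how many swaps are performed. Like the slide's example, this version does not forbid swaps that land on the diagonal.

```python
import random


def randomize_by_swaps(M, n_attempts, seed=None):
    """Return a copy of the 0/1 matrix M randomized by repeated swaps.

    Each successful swap keeps every row sum (incoming edges) and every
    column sum (outgoing edges) unchanged.
    """
    rng = random.Random(seed)
    R = [row[:] for row in M]
    ones = [(i, j) for i, row in enumerate(R) for j, v in enumerate(row) if v]
    if len(ones) < 2:
        return R
    for _ in range(n_attempts):
        a, b = rng.sample(range(len(ones)), 2)
        (r1, c1), (r2, c2) = ones[a], ones[b]
        # Swap only if both crossed cells A'(r1, c2) and B'(r2, c1) are 0.
        # This also rejects pairs that share a row or a column.
        if R[r1][c2] == 0 and R[r2][c1] == 0:
            R[r1][c1] = R[r2][c2] = 0
            R[r1][c2] = R[r2][c1] = 1
            ones[a], ones[b] = (r1, c2), (r2, c1)
    return R


# Hypothetical 4-operon network, for illustration only.
M_real = [
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 1, 0, 0],
]
M_rand = randomize_by_swaps(M_real, n_attempts=200, seed=1)

print([sum(row) for row in M_rand] == [sum(row) for row in M_real])
print([sum(c) for c in zip(*M_rand)] == [sum(c) for c in zip(*M_real)])
```

Both printed comparisons are true by construction, because a swap moves one 1 within each of the two affected rows and each of the two affected columns.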
One more requirement:
If we are looking for n-node subgraphs, then the number of (n-1)-node
subgraphs must be the same in the real and randomized networks.
This is done to avoid assigning high significance to a structure only
because it includes a highly significant substructure.
Significance of a motif
• Three requirements
– P < 0.01
P was estimated (or bounded) by using 1000
randomized networks.
– The number of times it appears in the real network
with distinct sets of nodes is at least U = 4.
– The number of appearances in the real network is
significantly larger than in the randomized networks:
Nreal – Nrand > 0.1Nrand (Why??).
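The three requirements can be combined into a single check, sketched below. It makes three assumptions: P is estimated as the fraction of randomized networks with at least as many instances as the real one, Nrand is the mean count over the randomized networks, and `n_real` already counts appearances with distinct sets of nodes. The function name, parameter names and example counts are made up for illustration.

```python
from statistics import mean


def motif_is_significant(n_real, n_rand_counts,
                         p_max=0.01, min_uniqueness=4, min_excess=0.1):
    """Apply the three requirements to one candidate motif.

    n_real        -- appearances in the real network (distinct node sets)
    n_rand_counts -- appearances in each randomized network
    """
    # Empirical P: share of randomized networks with >= n_real instances.
    p_value = sum(c >= n_real for c in n_rand_counts) / len(n_rand_counts)
    n_rand = mean(n_rand_counts)  # assumption: Nrand is the mean count
    return (
        p_value < p_max
        and n_real >= min_uniqueness
        and n_real - n_rand > min_excess * n_rand
    )


# Made-up counts for 1000 randomized networks.
print(motif_is_significant(40, [7] * 1000))   # True
print(motif_is_significant(3, [0] * 1000))    # False: fewer than U = 4
print(motif_is_significant(10, [10] * 1000))  # False: P is not below 0.01
```

With 1000 randomized networks the smallest nonzero estimate of P is 0.001, which is why the slide speaks of P being estimated or bounded.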
What did they find?
• In biological systems such as E. coli or
S. cerevisiae, only certain types of
motifs are statistically significant.
• When they studied other systems such as:
Food webs: a database of seven ecosystem food webs
Neuronal networks: the neural system of C. elegans
WWW
other kinds of motifs were statistically significant.
FFL (feed-forward loop)
SIM (single-input module)
DOR (dense overlapping regulons)
FFL
• Biological Example
– the L-arabinose utilization system:
– Crp is the general transcription factor and
AraC the specific transcription factor.
The real model
FFL
• Coherent
• Incoherent
• Important for the speed of response
Software
mDraw
Network visualization tool
(mfinder and network motifs visualization tool
embedded)