Department of Physics Prof. Ferdinando Bersani group


Gene expression Network dynamics: from
microarray data to gene-gene connectivity
reconstruction. Reconstruction of
c-MYC proto-oncogene regulated genetic
network
G. C. Castellani, D. Remondini, N. Intrator, B. O’Connell, J. M. Sedivy
Centro L. Galvani Biofisica Bioinformatica e Biocomplessità,
Università Bologna and Physics Department, Bologna
Institute for Brain and Neural Systems, Brown University, Providence, RI
Gene expression Network dynamics: from microarray
data to gene-gene connectivity reconstruction.
Reconstruction of
c-MYC proto-oncogene regulated genetic network
• Gene significance
• Temporal structure
• Gene clustering
• Model validation
Complex Network Theory and its application
to cellular networks
Complex network theory is a rapidly growing field of
contemporary interdisciplinary research.
Its applications range from Mathematics to Physics to Biology.
The classical mathematical theory, Random Graphs, was developed (1957-1960)
by Erdős and Rényi.
Some physical problems related to this approach are:
percolation, Bose-Einstein condensation and the Simon problem.
Recent applications to Biology focus on
neural networks, immune networks, protein folding, proteomics and
genomics, mainly on the large-scale organization of biological networks.
One of the most recent theories shown to have
promising applications in the Biological Sciences is the so-called
Theory of Complex Networks, which has been applied to protein-protein
interaction and metabolic networks (Jeong and Barabasi).
Classical Random Graphs
A random graph is a set of nodes and edges connecting them.
The edges and the nodes they attach to are chosen
randomly with a certain probability p.
It has been demonstrated that there exists a critical probability p_c for the
appearance of a giant cluster (phase transition): p_c ~ 1/N.
Another Erdős-Rényi result is that the degree (connectivity) distribution
(the number of edges of each node) follows Poisson statistics.
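The two Erdős-Rényi results above can be checked numerically. The following is a minimal sketch and not part of the original analysis: the function names, the graph size N = 1000 and the mean degrees c = 0.5 and c = 3 are illustrative assumptions. Each edge is drawn independently with probability p = c/N, and the fraction of nodes in the largest cluster is measured below and above the transition at c = 1.

```python
import numpy as np

def random_graph(n, p, rng):
    """Symmetric boolean adjacency matrix; each edge present with probability p."""
    upper = np.triu(rng.random((n, n)) < p, k=1)  # strict upper triangle, no self-loops
    return upper | upper.T

def largest_component_fraction(adj):
    """Fraction of nodes in the largest connected cluster (depth-first search)."""
    n = adj.shape[0]
    seen = np.zeros(n, dtype=bool)
    best = 0
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        stack, size = [start], 0
        while stack:
            node = stack.pop()
            size += 1
            # Unvisited neighbours are marked at once so each node is pushed only once.
            for neighbour in np.flatnonzero(adj[node] & ~seen):
                seen[neighbour] = True
                stack.append(neighbour)
        best = max(best, size)
    return best / n

rng = np.random.default_rng(0)
n_nodes = 1000  # assumed size, for illustration only
for c in (0.5, 3.0):  # mean degree below and above the critical value c = 1
    graph = random_graph(n_nodes, c / n_nodes, rng)
    degrees = graph.sum(axis=1)
    print(f"c={c}: mean degree={degrees.mean():.2f}, "
          f"largest cluster fraction={largest_component_fraction(graph):.3f}")
```

For c below 1 the largest cluster stays a small fraction of the graph, while for c above 1 it should span most of the nodes; the degrees should scatter around c, as expected for Poisson statistics.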
Extension to Random Graph Theory
In recent years considerable effort has gone into
further analysis of the statistics of random graphs.
The major results are summarized by the so-called “Small World”
and “Scale Free” graphs.
The “Small World” graphs interpolate between regular lattices and
random graphs.
The “Scale Free” networks are created by two simple rules:
network growth and preferential attachment (the most connected
nodes are the most probable sites of attachment).
Both models give a non-Poisson degree distribution: power law
$$P(k) \sim k^{-\gamma}$$
$$P(k) \sim (k + k_0)^{-\gamma}\, e^{-(k + k_0)/k_c}$$
Moreover, distributions of this type have been observed in real networks
such as the Internet, the C. elegans brain and metabolic networks,
with exponent 2 < γ < 3 and
various values of the exponential cutoff kc and of k0
Inadequacy of complete connectivity
Complete connectivity, as well as random connectivity, is not
biologically plausible. Connectivity changes as a consequence of
developmental changes (i.e. learning, ageing) appear more appropriate.
Comparison between experimental and theoretical results on the number of
virgin cells during the lifespan. The number of stable states (which we identify
with memory capacity and with memory cells) increases as a function of age.
We found similar results (an increase in the number of stable states through connectivity
changes) also for the BCM model, but the biological interpretation is less clear.
The John Sedivy Lab at Brown University has designed a new generation of microarrays that
cover approximately one half of the whole rat genome (roughly 9000 genes).
The array construction aims at precise targeting of the proto-oncogene c-MYC.
This gene encodes a transcriptional regulator that is correlated with a wide array of
human malignancies, cellular growth and cell cycle progression.
The database consists of 81 arrays obtained by hybridisation with a rat fibroblast cell line.
The gene expression measurements were performed in triplicate for better statistical
significance. The complete data set is divided into three separate experiments, each of which
addresses a specific problem.
Experiment 1: Comparison of different cell lines in which c-MYC is expressed at various levels
(null, moderate, over-expressed). This experiment can reveal the total number of genes that
respond to a sustained loss of c-Myc as well as those genes that respond to c-MYC overexpression.
Experiment 2: Analysis of cell lines that over-express c-Myc following stimulation with
Tamoxifen (a drug that has been used to treat both advanced and early stage breast cancer). These
data were collected during a 16 hour time course. This experiment reveals the kinetics of the
response to Myc activation and may lead to the identification of the early-responding genes.
Experiment 3: Analysis of the time course of induction with Tamoxifen performed in the
presence of Cycloheximide (a protein synthesis inhibitor). This experiment reveals a subset of
direct transcriptional targets of c-Myc.
Our approach to the determination of the c-MYC
regulated network can be summarized in 3 points
1) List of genes based on significance analysis over time
points between MYC and control and within time points
(between groups and within groups (time)).
2) Time translation matrix calculated on microarrays treated
with Tamoxifen and not treated
- T and NT raw data
The resulting time translation matrix will be used to
reconstruct the connectivity matrix between genes
3) Model validation for determination of the error model
Significance Analysis
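$$d(i)_{\text{between}} = \frac{x_{NT}(i) - x_{T}(i)}{s(i) + s_0}, \qquad s(i) = \sqrt{\frac{\sigma^2_{NT}(i)}{n_1} + \frac{\sigma^2_{T}(i)}{n_1}}$$
$$d(i)_{\text{within}} = \frac{x_{T/NT}(t+1) - x_{T/NT}(t)}{s(i) + s_0}, \qquad s(i) = \sqrt{\frac{\sigma^2_{T/NT}(t+1)}{n_1} + \frac{\sigma^2_{T/NT}(t)}{n_1}}$$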
s0 is an appropriate regularizing factor.
Interesting genes are chosen as the union of the
genes selected with the above methods
• With this SA we obtain 776 significant genes (p<0.05)
if we require significance at 1 time point
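A minimal sketch of the two statistics above, under stated assumptions: x is taken as the mean over replicates, both groups have the same number of replicates n1 (three here, as in the triplicate measurements), the variances are sample variances, and s0 is supplied by the user. The function name, the toy numbers and the value of s0 are hypothetical; the source does not say how s0 or the p-values were obtained.

```python
import numpy as np

def regularized_d(group_a, group_b, s0):
    """Regularized difference statistic, one value per gene.

    group_a, group_b: arrays of shape (genes, replicates).
    For d_between pass (NT, T) at one time point;
    for d_within pass one condition at times t+1 and t.
    """
    n1 = group_a.shape[1]  # assumed equal in both groups
    diff = group_a.mean(axis=1) - group_b.mean(axis=1)
    s = np.sqrt(group_a.var(axis=1, ddof=1) / n1
                + group_b.var(axis=1, ddof=1) / n1)
    return diff / (s + s0)

# Toy data: 2 genes, 3 replicates each (hypothetical numbers).
nt = np.array([[1.0, 2.0, 3.0], [5.0, 5.1, 4.9]])
t = np.array([[4.0, 5.0, 6.0], [5.0, 4.9, 5.1]])
d_between = regularized_d(nt, t, s0=0.5)
print(d_between)
```

The regularizing term s0 keeps genes with very small measured variance from producing arbitrarily large scores.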
Step 2: Linear “Markov” Model
The selected genes are used for step 2 of our analysis:
$$x(t) = A\,x(t-1)$$
Here x(t) is the vector of gene expressions at time t and A is the unknown
matrix that we estimate from the time course (0, 2, 4, 8, 16 hours) of
microarray data (T and NT separately, giving An and At).
This is a so-called inverse problem because the matrix is
recovered from time-dependent data.
-> By appropriate thresholding of the A’s we can recover the
connectivity matrix between the genes.
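The source does not state how A is estimated. The following is a minimal sketch, assuming a least-squares fit through the Moore-Penrose pseudo-inverse (which gives the minimum-norm solution when there are more genes than time points) and treating the sampled time points as consecutive steps. The function names, the toy matrix and the threshold value are hypothetical.

```python
import numpy as np

def estimate_transition_matrix(x):
    """Least-squares A such that x[:, t] ~ A @ x[:, t-1].

    x: array of shape (genes, time_points).
    """
    past, future = x[:, :-1], x[:, 1:]
    return future @ np.linalg.pinv(past)

def connectivity_from_threshold(a, threshold):
    """Binary gene-gene connectivity: 1 where |A_ij| exceeds the threshold."""
    connectivity = (np.abs(a) > threshold).astype(int)
    np.fill_diagonal(connectivity, 0)  # ignore self-interaction
    return connectivity

# Toy example: 3 genes evolving under a known matrix (hypothetical values).
a_true = np.array([[0.9, 0.1, 0.0],
                   [0.0, 0.8, 0.2],
                   [0.1, 0.0, 0.7]])
trajectory = [np.array([1.0, 2.0, 3.0])]
for _ in range(5):
    trajectory.append(a_true @ trajectory[-1])
x = np.column_stack(trajectory)  # shape (3 genes, 6 time points)

a_hat = estimate_transition_matrix(x)
links = connectivity_from_threshold(a_hat, threshold=0.05)
print(links)
```

With only five time points and hundreds of genes the real problem is strongly underdetermined, so the recovered matrix depends on the chosen regularization; the toy case is fully determined only because it has more time points than genes.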
Network topology
[Figure: network topology, panels "No Tamoxifen" and "With Tamoxifen"]
Model validation
The different models (data preprocessing, modeling of gene
dynamics, clustering techniques) have been validated
mathematically by means of
- residual analysis (errors)
The residuals are small. The Markov matrix we used is
not the original one (computed over 5 time steps) but the validated one:
we compute the matrix on 4 time steps and validate it
on the subsequent one by comparison with the real data.
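The hold-out check described above can be sketched as follows. This is an illustration under assumptions, not the authors' code: the matrix is fitted by pseudo-inverse least squares on the first four time points, the fifth is predicted from the fourth, and the residual is the difference from the measured values. The function name and the toy data are hypothetical.

```python
import numpy as np

def holdout_residual(x):
    """Fit A on all but the last time point, then predict the last one.

    x: array of shape (genes, time_points).
    Returns the fitted matrix and the residual (measured minus predicted).
    """
    train = x[:, :-1]
    a = train[:, 1:] @ np.linalg.pinv(train[:, :-1])
    predicted = a @ x[:, -2]
    return a, x[:, -1] - predicted

# Toy trajectory: 3 genes, 5 time points, generated by a known matrix.
a_known = np.array([[0.9, 0.1, 0.0],
                    [0.0, 0.8, 0.2],
                    [0.1, 0.0, 0.7]])
states = [np.array([1.0, 2.0, 3.0])]
for _ in range(4):
    states.append(a_known @ states[-1])
series = np.column_stack(states)

a_fit, residual = holdout_residual(series)
print(np.abs(residual).max())
```

In this noise-free toy case the residual is essentially zero; with real microarray data its size gives an estimate of the model error.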
Changing databases
To better understand the results, both in terms
of network topology and of connectivity distribution, we generated 2
databases:
1) One small database with those genes that were without any doubt
affected by Tamoxifen (50 genes)
2) One larger database with all the genes that give 2 P out of 3
experiments, i.e. those genes for which we have good measurements
(3444 genes)
50 genes database
[Figure: panels NT and T]
Results
For each of the 50 genes, we computed the connectivity and the
clustering coefficient, which expresses whether the gene is connected to highly
connected or poorly connected genes.
The treatment with Tamoxifen causes a
decrease in clustering in the network, so the network seems to
become “less scale free”. This is confirmed by the network
clustering coefficient:
N Overall graph clustering coefficient: 0.840
T Overall graph clustering coefficient: 0.241
The individual connectivity and clustering changes are summarized
in this table: Table
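A minimal sketch of per-gene connectivity and clustering for a binary, symmetric connectivity matrix. It assumes the standard local clustering coefficient (the fraction of a node's neighbour pairs that are themselves linked) and takes the overall value as the plain average over nodes. The source does not say which definition or software produced the coefficients reported above, so the function name and the toy graph are illustrative only.

```python
import numpy as np

def degree_and_clustering(adj):
    """Degree and local clustering coefficient of every node.

    adj: symmetric 0/1 adjacency matrix with a zero diagonal.
    """
    adj = np.asarray(adj, dtype=int)
    degree = adj.sum(axis=1)
    # diag(A^3) counts closed walks of length 3; each triangle is counted twice per node.
    triangles = np.diagonal(adj @ adj @ adj) / 2
    pairs = degree * (degree - 1) / 2  # number of neighbour pairs
    clustering = np.divide(triangles, pairs,
                           out=np.zeros(len(degree)), where=pairs > 0)
    return degree, clustering

# Toy graph: a triangle (nodes 0, 1, 2) plus node 3 attached to node 0.
toy = np.array([[0, 1, 1, 1],
                [1, 0, 1, 0],
                [1, 1, 0, 0],
                [1, 0, 0, 0]])
degree, clustering = degree_and_clustering(toy)
print(degree, clustering, clustering.mean())
```

Nodes with fewer than two neighbours are assigned a coefficient of zero here, which is one common convention.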
The 3444 genes database
This large database is used in order to obtain better statistics and
possibly a distribution fit
[Figure: connectivity distributions for N and T]
Clearly these distributions are not Poisson and seem to be
power laws with an exponential tail
Fitting the distributions
We fitted the distribution with a generalized power law:
$$P(k) \sim (k + k_0)^{-\gamma}\, e^{-(k + k_0)/k_c}$$
[Figure: fitted distributions for N and T]
N: k0 ≈ 0.4, kc ≈ 4.8, γ ≈ 2.5
T: k0 ≈ 1.04, kc ≈ 3.66, γ ≈ 2.3
Network Structure (3444 genes)
N Overall graph clustering coefficient: 0.902
T Overall graph clustering coefficient: 0.893
From these results and from the fit parameters it seems that the N
network is less scale free, but these results are strongly affected by
noise
We have looked at the individual connectivity and clustering coefficients and
their variation between N and T.
The results are encouraging: among the genes whose connectivity
changed significantly there are c-MYC targets
Network Structure (3444 genes)
As an example we report some connectivity changes in c-Myc target
genes
2379 | rc_AI178135_at | complement component 1, q subcomponent binding protein | 3 | 272
2796 | U09256_at | transketolase | 13 | 39
2772 | U02553cds_s_at | protein tyrosine phosphatase, non-receptor type 16 | 133 | 146
390 | D10853_at | phosphoribosyl pyrophosphate amidotransferase | 0 | 7
933 | M58040_at | transferrin receptor | 1 | 27
Conclusions
Connectivity is a very important parameter for both Physical and
Biological systems. Connectivity (coupling) changes are the basis of
phase transitions and developmental changes (ageing, learning and
response to external stimuli)
We have tested the hypothesis that a treatment with Tamoxifen, which
in these engineered cells leads to c-MYC activation, can be related to
connectivity changes between genes
Our results show that within the framework of scale-free networks
there are changes in gene-gene connectivity.
The connectivity distributions of N and T are far from Poisson, with
parameters similar to those found for other systems that
show a scale-free distribution with an exponential tail.
Conclusions
One clear result is that the global gene degree (connectivity) follows a
power-law distribution both without and with Tamoxifen. This result
seems to indicate that this type of behaviour is very general
If we look at the individual gene connectivity, or at the smaller
database, we observe significant changes induced by
the treatment. For example, the clustering coefficient changes and
some c-MYC targets show connectivity and clustering coefficient
changes
These results need to be confirmed and further analyzed but, to our
knowledge, this is the first attempt to monitor the network
connectivity changes induced by c-MYC activation in comparison
with a basal level
Conclusions
The Markov approach to gene-gene connectivity
reconstruction is not new (Maritan 2001), but we have introduced
matrix validation, rigorous data discretization and normalization, which
can improve the model robustness
One point that needs further analysis is the correlation between
connectivity change and c-MYC targets. Our method is not a
significance test: it can only help to view gene activity as the result of
interactions between genes at the previous time step
Finally, we will further improve the model robustness by time
reshuffling and test its predictive performance