NETWORK ANALYSIS: A FOCUS ON GENETIC NETWORKS

Pasquale Lucio Scandizzo, Alessandra Imperiali
Paper prepared for presentation at the 16th ICABR Conference – 128th EAAE Seminar
“The Political Economy of the Bioeconomy:
Biotechnology and Biofuel”
Ravello, Italy, June 24-27, 2012
25/06/12
Outline
Introduction
Literature and review
Data and variables
Empirical analysis
Concluding remarks
Introduction
Aim of this paper
• Investigate the characteristics of genetic networks.
• Account for the role of traits that are desirable for modern agriculture.
• Explore the implications of a model in which desirable traits depend not only on
the properties of individual genes, but also on their connections and on the
architecture of the network.
Introduction
Key Questions
• Is biotechnology research in agriculture making
substantial progress along new lines?
• Are there new paradigms for research in the sector?
• What is the role of network theory?
• Does it promise to yield revolutionary results?
Introduction
Overview of results
• Part 1
• Consistent with empirical research on large networks, our findings point
to a scale-free relationship between the number of links among genes and
the number of genes in each co-expression network: a percentage increase
in the number of nodes is associated with a significant, proportional or
more than proportional percentage increase in the number of links.
• This relationship can be interpreted as the result of information
exchanges, i.e. as a relationship between the information contained in
the genes and the information they exchange.
• Part 2
• Our findings point to a positive, but less than proportional, scale-free
relationship between the number of co-expressed genes and the number of
genes in each co-expression network.
Literature and review
Figure: an overview of a rice coexpression network with 4,495 genes and 32,544 edges (Pearson’s correlation r ≥ 0.93).
Source: A. Fukushima et al., “Characterizing gene coexpression modules in Oryza sativa based on a graph-clustering approach”, 2009.
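Assuming, as is usual for a correlation-based network, that the rice network is undirected with no self-loops, the two counts quoted in the figure already determine its mean degree, 2E/N, and its density, 2E/(N(N-1)). The short Python sketch below does only this arithmetic; the function names are illustrative and not taken from the cited study.

```python
def mean_degree(n_nodes: int, n_edges: int) -> float:
    """Average degree of a simple undirected graph: each edge adds 1 to the degree of both endpoints."""
    return 2 * n_edges / n_nodes


def density(n_nodes: int, n_edges: int) -> float:
    """Share of all n(n-1)/2 possible gene pairs that are actually linked."""
    return 2 * n_edges / (n_nodes * (n_nodes - 1))


# Counts quoted for the rice network (Fukushima et al., 2009)
genes, edges = 4495, 32544
print(f"mean degree: {mean_degree(genes, edges):.2f}")  # mean degree: 14.48
print(f"density: {density(genes, edges):.4f}")          # density: 0.0032
```

So each gene is linked to roughly 14 to 15 others on average, while only about 0.3% of all possible gene pairs pass the r ≥ 0.93 cutoff: the network is very sparse.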
Literature and review
From multiplicity of characters to co-expression
Over the past 15 years, intense research has been directed towards biological networks,
a natural field of application for network theory.
These studies have produced remarkable progress in understanding the topological
and chemical structure of genes, and promise major improvements in agricultural crops.
Genes can now be modified and recombined into the cells of living organisms to
improve crop productivity or to make crops more resistant to stress, disease and
chemical treatments (Steven D. Tanksley and Susan R. McCouch, 1997).
This technique, known as recombinant DNA technology, has allowed scientists to
carry out highly advanced and innovative procedures on genes and DNA.
Literature and review
From multiplicity of characters to co-expression
• Recombinant DNA (rDNA) consists of DNA sequences resulting from laboratory
methods that bring together genetic material from multiple sources.
• Scientists succeeded in isolating genes responsible for the main adaptive and
improvement traits, and in determining their chemical structure and function.
• This knowledge was then used to exploit the potential of wild and cultivated
germplasm resources for improving agricultural crops.
• After decades spent disassembling nature, which provided a wealth of knowledge
about individual components, scientists developed a theory of complexity in which
nothing happens in isolation and most characteristics of living beings derive from
the interactions among their constituents.
Literature and review
From multiplicity of characters to co-expression
Understanding and unraveling the interactions and the orchestrated activity of many
interacting components constitutes a major goal for biologists of the genome era.
Network approaches are used to integrate various types of genomic data in order to
increase the reliability of predicted interactions.
• An increasingly important method for identifying interacting gene sets is the construction
of gene co-expression networks, in which traits result from the cooperative expression
(co-expression) of genes organized according to the network topology.
• Co-expression networks include genes involved in related biological pathways, which are
expressed cooperatively to carry out their functions.
• A co-expression network is constructed by measuring the tendency of m transcripts to exhibit
similar expression patterns across a set of n microarrays (P. Ficklin and F. Alex Feltus, 2011).
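The construction just described can be sketched in a few lines of Python with NumPy: arrange the data as an m × n matrix (one row per transcript, one column per microarray), compute the Pearson correlation between every pair of rows, and keep the pairs above a cutoff as edges. The function name and the toy data are hypothetical; the 0.93 cutoff simply echoes the rice network shown earlier.

```python
import numpy as np


def coexpression_edges(expr: np.ndarray, threshold: float) -> list[tuple[int, int]]:
    """Gene pairs (i, j), i < j, whose Pearson correlation across arrays is >= threshold.

    expr has shape (m, n): one row per transcript, one column per microarray.
    """
    corr = np.corrcoef(expr)                        # (m, m) correlations between rows
    rows, cols = np.triu_indices_from(corr, k=1)    # each unordered pair once, no diagonal
    keep = corr[rows, cols] >= threshold
    return list(zip(rows[keep].tolist(), cols[keep].tolist()))


# Hypothetical toy data: 4 transcripts measured on 6 arrays
expr = np.array([
    [1, 2, 3, 4, 5, 6],
    [2, 4, 6, 8, 10, 12],  # same pattern as transcript 0, doubled: r = 1
    [6, 5, 4, 3, 2, 1],    # mirror image of transcript 0: r = -1
    [1, 3, 2, 5, 4, 6],    # similar but noisier: r ≈ 0.89 with transcripts 0 and 1
], dtype=float)

print(coexpression_edges(expr, 0.93))  # [(0, 1)]
print(coexpression_edges(expr, 0.80))  # [(0, 1), (0, 3), (1, 3)]
```

The sketch keeps only positive co-expression; thresholding |r| instead would also link transcript 2 to transcripts 0 and 1. Which convention to use is a modelling choice that the individual studies make differently.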
Literature and review
Gene co-expression network
• In gene co-expression networks, each gene corresponds to a node.
• Two genes are connected by an edge if their expression values are highly
correlated.
• Defining a “high” correlation is not straightforward:
– One can use statistical significance…
– Alternatively, one can choose a threshold parameter using the scale-free
topology criterion.
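A simplified sketch of the scale-free topology criterion, in the spirit of the fit index used in weighted gene co-expression network analysis (WGCNA): for a candidate threshold, build the network, compute the empirical degree distribution P(k), regress log10 P(k) on log10 k, and read the R² of that line as a measure of how scale-free the network looks. The function names, the 0.8 cutoff and the rule of taking the smallest qualifying threshold are assumptions for illustration; WGCNA itself, for instance, bins the degrees before fitting, which this sketch does not.

```python
import numpy as np


def scale_free_fit(corr: np.ndarray, threshold: float) -> float:
    """R^2 of a straight-line fit of log10 P(k) on log10 k for the thresholded network.

    An edge joins genes i != j when corr[i, j] >= threshold.
    Returns NaN when fewer than 3 distinct nonzero degrees are available.
    """
    adj = corr >= threshold
    np.fill_diagonal(adj, False)             # no self-loops
    degrees = adj.sum(axis=1)
    degrees = degrees[degrees > 0]           # log k is undefined for isolated genes
    ks, counts = np.unique(degrees, return_counts=True)
    if len(ks) < 3:
        return float("nan")
    x = np.log10(ks)
    y = np.log10(counts / counts.sum())      # empirical P(k)
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (slope * x + intercept)
    return 1.0 - resid.var() / y.var()


def pick_threshold(corr: np.ndarray, candidates, min_fit: float = 0.8):
    """Smallest candidate threshold whose fit reaches min_fit (an assumed cutoff), else None."""
    for t in sorted(candidates):
        fit = scale_free_fit(corr, t)
        if not np.isnan(fit) and fit >= min_fit:
            return t
    return None


# Hypothetical 7-gene network: a hub (gene 0) linked to genes 1-4, plus edges 1-5 and 2-6.
# Degrees: one gene with k=4, two with k=2, four with k=1, so counts = 4/k lie on a line.
toy = np.eye(7)
for i, j in [(0, 1), (0, 2), (0, 3), (0, 4), (1, 5), (2, 6)]:
    toy[i, j] = toy[j, i] = 1.0
print(scale_free_fit(toy, 0.5))        # close to 1.0
print(pick_threshold(toy, [2.0, 0.5]))  # 0.5
```

Taking the smallest threshold that passes the criterion keeps as many edges as possible while the degree distribution still looks scale-free; stricter thresholds discard more information.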
Literature and review
From multiplicity of characters to co-expression
By exploring several large databases describing the topology of large networks,
Albert and Barabasi (1999) found that, for large k, the degree distribution follows a power law:
$$P(k) \sim k^{-\gamma}$$
where k is the degree of a node, i.e. the number of edges incident with it, and P(k) is the
probability that a node chosen uniformly at random has degree k. The exponent γ usually
lies between 2 and 3.
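The exponent γ of such a power law is best estimated by maximum likelihood rather than by fitting a straight line to a log-log histogram, which is known to be biased. Below is a minimal Python sketch of the standard estimator described by Clauset, Shalizi and Newman (2009), γ̂ = 1 + n / Σ ln(x_i / x_min), with the usual x_min - 1/2 approximation for integer data such as degrees. The function name is hypothetical, x_min is assumed to be known, and the check uses synthetic Pareto samples rather than any network from the papers reviewed here.

```python
import numpy as np


def powerlaw_exponent(samples, x_min: float, discrete: bool = False) -> float:
    """Maximum-likelihood estimate of gamma in P(x) ~ x**(-gamma) for the tail x >= x_min.

    Continuous data: gamma = 1 + n / sum(ln(x_i / x_min)).
    Integer data such as node degrees: approximate by using x_min - 0.5 in place of x_min.
    """
    x = np.asarray(samples, dtype=float)
    x = x[x >= x_min]                          # fit the tail only
    ref = x_min - 0.5 if discrete else x_min
    return 1.0 + x.size / np.log(x / ref).sum()


# Check on synthetic data: classical Pareto samples with density proportional to x**(-2.5).
# NumPy's pareto() draws Lomax samples; 1 + sample, times x_min, gives the classical Pareto.
rng = np.random.default_rng(42)
x_min, gamma_true = 1.0, 2.5
samples = x_min * (1.0 + rng.pareto(gamma_true - 1.0, size=100_000))
print(round(powerlaw_exponent(samples, x_min), 2))  # close to 2.5
```

For a real degree sequence one would call it with `discrete=True`, and x_min would itself have to be chosen, for example by the goodness-of-fit procedure proposed in the same paper.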
Following this approach, the literature indicates that the intricate interwoven
relationships that govern cellular functions follow a universal law. They are
“scale-free, modular, hierarchical, small worlds of short paths and their
connections are highly clustered” (Albert-Laszlo Barabasi and Zoltan N.
Oltvai, 2004).
Literature and review
Figure: distribution of connections per node in the yeast coexpression network.
Source: V. van Noort et al., “The yeast coexpression network has a small-world, scale-free architecture and can be explained by a simple model”, 2004.
Data and variables
Meta-analysis
• In the paper we discuss recombinant DNA techniques and their application to
agricultural crops, focusing on the regulatory mechanisms of gene interactions.
• We aim to study the interactions underlying expressed traits in three plant
species: Arabidopsis thaliana, maize (Zea mays) and rice (Oryza sativa).
• We selected 57 studies aimed at identifying gene co-expression networks in
these species by examining the co-expression patterns of genes over a large
number of experimental conditions.
• From these studies we collected both the data and the results reported for
101 networks.
Empirical analysis
Estimates
Aim: Identify the correspondence between the underlying genes and the observed traits.
• We analyzed the effect of the number of genes (N) on two network characteristics:
the number of edges (L) and the number of co-expressed genes (C).
• Using the data assembled from the studies, we estimated two different relations by
means of ordinary least squares (OLS):
$$L_i = \beta_0 + \beta_1 N_i + \beta_2 X_i' + e_i$$
$$C_i = \alpha_0 + \alpha_1 N_i + \alpha_2 X_i' + v_i$$
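A minimal sketch of how relations of this kind can be estimated by OLS with NumPy. It assumes, consistent with the elasticity reading of the results (the slides do not state the functional form), that the number of links and the number of genes enter in logarithms, and it runs on synthetic data generated inside the block, not on the 101 networks of the meta-analysis. The function and variable names, and the "true" elasticity of 1.1, are hypothetical.

```python
import numpy as np


def ols(y: np.ndarray, regressors: np.ndarray):
    """OLS with an intercept. Returns (coefficients, t-statistics, R-squared).

    Coefficient order: intercept first, then the columns of `regressors`.
    Standard errors are the conventional homoskedastic ones.
    """
    X = np.column_stack([np.ones(len(y)), regressors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, p = X.shape
    sigma2 = resid @ resid / (n - p)                          # residual variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    tss = (y - y.mean()) @ (y - y.mean())                     # total sum of squares
    return beta, beta / se, 1.0 - resid @ resid / tss


# Hypothetical synthetic sample of 101 networks (NOT the meta-analysis data):
# log links depend on log genes with elasticity 1.1, plus one 0/1 dummy.
rng = np.random.default_rng(1)
log_genes = rng.uniform(np.log(50), np.log(5000), size=101)
dummy = rng.integers(0, 2, size=101).astype(float)
log_links = 0.5 + 1.1 * log_genes - 0.7 * dummy + rng.normal(0.0, 0.3, size=101)

beta, t_stats, r2 = ols(log_links, np.column_stack([log_genes, dummy]))
print(beta.round(2), t_stats.round(1), round(r2, 2))
```

The estimated slope on log genes should come out close to 1.1; in a log-log specification that slope is read directly as the percentage change in links per 1% change in genes.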
Empirical analysis
Variables used for our estimates
The number of genes
The number of links
The number of co-expressed genes
Dummy Arabidopsis
Dummy seed development
Dummy photosynthesis
Dummy metabolic pathways
Dummy signaling pathways
Dummy stress response
Dummy biosynthesis
Empirical analysis
The Estimates
Dependent variable: number of links; specifications (1) to (3)
• Nodes: (1) 1.13*** (7.65); (2) 0.99*** (7.35); (3) 1.37*** (6.26)
• Dummy Arabidopsis: 0.99* (1.75); -1.73* (-1.83)
• Dummy Biosynthesis: no estimate reported
• Dummy Metabolic Pathways: no estimate reported
• Dummy Photosynthesis: 2.30** (3.43)
• Dummy Seed Development: 1.57* (1.82); -0.41 (-0.40)
• Dummy Signaling Pathways: 1.48** (2.23)
• Dummy Stress Response: 0.16 (-1.27)
• N: (1) 41; (2) 41; (3) 19
• R-squared: (1) 0.61; (2) 0.69; (3) 0.82
Empirical analysis
The Estimates
Dependent variable: number of co-expressed genes; specifications (1) to (3)
• Nodes: (1) 0.60*** (5.44); (2) 0.79*** (10.72); (3) 0.65*** (5.82)
• Dummy Arabidopsis: 0.33 (0.85)
• Dummy Biosynthesis: -0.83* (-1.73)
• Dummy Metabolic Pathways: -0.75** (-2.17); -0.53** (-2.81)
• Dummy Photosynthesis: -0.30 (-0.60)
• Dummy Seed Development: no estimate reported
• Dummy Signaling Pathways: -1.24*** (-3.08)
• Dummy Stress Response: -1.25*** (-3.35); -0.90*** (-3.42); -0.83* (-1.73)
• N: (1) 54; (2) 101; (3) 54
• R-squared: (1) 0.56; (2) 0.61; (3) 0.55
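If, as the elasticity reading in the results overview suggests, both relations are estimated in logarithms (the slides do not state the functional form explicitly, so this is an assumption), each Nodes coefficient b is an elasticity: multiplying the number of genes by a factor r multiplies the outcome by r**b. The Python sketch below applies this to the Nodes coefficients in the two tables; the function name is hypothetical.

```python
def growth_factor(elasticity: float, node_ratio: float) -> float:
    """Multiplicative change in Y when N is multiplied by node_ratio, if log Y = a + b log N."""
    return node_ratio ** elasticity


# Nodes coefficients reported in the two tables, specifications (1) to (3)
node_coefficients = {
    "links": [1.13, 0.99, 1.37],
    "co-expressed genes": [0.60, 0.79, 0.65],
}
for outcome, coefs in node_coefficients.items():
    factors = ", ".join(f"{growth_factor(b, 2.0):.2f}" for b in coefs)
    print(f"doubling the genes multiplies {outcome} by: {factors}")
```

Under this reading, doubling the number of genes roughly doubles or more than doubles the number of links (factors of about 2.0 to 2.6), but multiplies the number of co-expressed genes by only about 1.5 to 1.7, which is the "less than proportional" pattern stated in the conclusions.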
Concluding remarks
Conclusion
• Our analysis confirms the existence of a scale-free relationship of the kind found to be
ubiquitous in complex networks.
• We also found that, as the number of genes in the biological networks considered
increases, the number of co-expressed genes increases less than proportionally.
• This finding suggests that a research strategy aimed at identifying relevant co-expression
clusters may be more successful than one aimed at identifying single traits or groups of
traits and their corresponding gene determinants.
• Although this conclusion seems to hold for stress response, metabolic pathways and
biosynthesis, which have a smaller influence on the number of edges, the intensity of seed
development appears instead to be associated with an increase in the connectivity of the
network.