
“An Extension of Weighted Gene
Co-Expression Network Analysis to
Include Signed Interactions”
Michael Mason
Department of Statistics, UCLA
Contents
• Here we consider the application of a
generalized WGCNA that keeps track of the
sign of the co-expression information.
• Standard unsigned networks are based on
$s_{ij} = |\mathrm{cor}(x_i, x_j)|$
• Here we focus on signed networks based on
$s_{ij} = \frac{\mathrm{cor}(x_i, x_j) + 1}{2}$
General Framework Of Network
Construction
Step 1: Define a Gene Co-expression Similarity
Step 2: Define a Family of Adjacency Functions
Step 3: Determine the AF Parameters
Step 4: Define a Measure of Node Dissimilarity
Step 5: Identify Network Modules (Clustering)
Step 6: Find Biologically Interesting Modules
Step 7: Find Key Genes in Interesting Modules
Adjacency Functions: Hard and Soft
Thresholding
• A network can be represented by an adjacency
matrix, $A = [a_{ij}]$, that encodes how a pair of nodes is
connected.
– A is a symmetric matrix with entries in [0,1]
– For unweighted networks, hard thresholding is applied to
S to yield A: $a_{ij} = 1$ if $s_{ij} > \tau$, else $a_{ij} = 0$.
– For weighted networks, soft thresholding is applied:
$a_{ij} = s_{ij}^{\beta}$, so that $0 \le a_{ij} \le 1$.
– Both types of adjacency functions can be applied to
unsigned and signed co-expression similarity measures.
In this analysis we employ soft thresholding.
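The two adjacency functions above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the analysis: the function names, the example matrix S, and the values of tau and beta are all hypothetical.

```python
import numpy as np

def hard_threshold(S, tau):
    """Unweighted adjacency: a_ij = 1 if s_ij > tau, else 0."""
    return (np.asarray(S, dtype=float) > tau).astype(float)

def soft_threshold(S, beta):
    """Weighted adjacency: a_ij = s_ij ** beta."""
    return np.asarray(S, dtype=float) ** beta

# Hypothetical 3x3 similarity matrix with entries in [0, 1].
S = np.array([[1.0, 0.9, 0.5],
              [0.9, 1.0, 0.2],
              [0.5, 0.2, 1.0]])

A_hard = hard_threshold(S, tau=0.7)   # tau chosen only for illustration
A_soft = soft_threshold(S, beta=6)    # beta chosen only for illustration
```

Soft thresholding keeps the ranking of the similarities while pushing weak ones towards 0, whereas hard thresholding discards everything except whether a similarity exceeds tau.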
Defining a co-expression similarity measure
that keeps track of the sign
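Unsigned networks are based on the
absolute value of the correlation.
$s_{ij} = |\mathrm{cor}(x_i, x_j)|$
[Plot: unsigned $s_{ij}$ as a function of $\mathrm{cor}(x_i, x_j)$]
Signed networks preserve sign
information from the correlation.
$s_{ij} = \frac{\mathrm{cor}(x_i, x_j) + 1}{2}$
[Plot: signed $s_{ij}$ as a function of $\mathrm{cor}(x_i, x_j)$]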
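A small sketch of the two similarity measures follows. It assumes that cor is the Pearson correlation and that the expression data are stored with samples in rows and genes in columns; both are assumptions, and the function names and toy data are hypothetical.

```python
import numpy as np

def unsigned_similarity(X):
    """s_ij = |cor(x_i, x_j)|; columns of X are genes (assumed layout)."""
    return np.abs(np.corrcoef(X, rowvar=False))

def signed_similarity(X):
    """s_ij = (cor(x_i, x_j) + 1) / 2; columns of X are genes."""
    return (np.corrcoef(X, rowvar=False) + 1.0) / 2.0

# Toy data: gene 1 is perfectly correlated with gene 0,
# gene 2 is perfectly anti-correlated with gene 0.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X = np.column_stack([x, 2.0 * x + 1.0, -x])

S_unsigned = unsigned_similarity(X)
S_signed = signed_similarity(X)
```

In the toy data the anti-correlated pair receives the maximal similarity in the unsigned measure but the minimal similarity in the signed one, which is exactly the distinction the two formulas encode.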
Generalized Connectivity
• A gene’s connectivity (also known as degree) equals the row
sum of the adjacency matrix. Intuitively for unweighted
networks this is the number of direct neighbors a gene has.
• For our signed networks, the connectivity of the i-th gene
measures the extent of positive correlations with the other
genes in the network.
$k_i = \sum_{j} a_{ij}$
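As a quick illustration, the connectivity can be computed as a row sum. The sketch below assumes that self-connections are excluded from the sum, which is a common convention but is not stated on the slide; the function name and example matrices are hypothetical.

```python
import numpy as np

def connectivity(A, exclude_self=True):
    """Row sums of the adjacency matrix, k_i = sum_j a_ij."""
    A = np.array(A, dtype=float)       # copy so the input is not modified
    if exclude_self:
        np.fill_diagonal(A, 0.0)       # assumption: ignore a_ii
    return A.sum(axis=1)

# Unweighted example: node 0 is linked to nodes 1 and 2.
A_unweighted = np.array([[1, 1, 1],
                         [1, 1, 0],
                         [1, 0, 1]])
k_unweighted = connectivity(A_unweighted)   # counts direct neighbors

# Weighted example: connectivity is a sum of connection strengths.
A_weighted = np.array([[1.0, 0.5, 0.25],
                       [0.5, 1.0, 0.0],
                       [0.25, 0.0, 1.0]])
k_weighted = connectivity(A_weighted)
```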
For high powers of beta, signed weighted
networks exhibit approximate scale free
topology
• Scale-free topology refers to the frequency
distribution of the connectivity k: $p(k) \sim k^{-\lambda}$
• p(k) = proportion of nodes that have connectivity k
[Figure: Frequency Distribution of Connectivity. x-axis: Connectivity k; y-axis: Frequency]
How to check Scale Free Topology?
• Idea: log-transform p(k) and k and inspect the scatter plot; under
scale-free topology the points fall close to a straight line.
• The $R^2$ index of a linear model fit can be used to quantify
scale-free topology.
• In our cancer and mouse embryonic stem cell applications, we
find $R^2 = 0.97$ and $0.94$ for β = 12 and 22, respectively.
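The check described above can be sketched as follows. The binning of the connectivity into equal-width bins and the use of base-10 logarithms are assumptions made for illustration; the function names are hypothetical and the example input is an exact power law, not data from the applications.

```python
import numpy as np

def connectivity_histogram(k, n_bins=10):
    """Bin connectivities; return bin centers and p(k) for non-empty bins."""
    counts, edges = np.histogram(np.asarray(k, dtype=float), bins=n_bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    keep = (counts > 0) & (centers > 0)     # the log needs positive values
    return centers[keep], counts[keep] / counts.sum()

def fit_power_law(k_values, p_values):
    """Regress log10 p(k) on log10 k; return (slope, R^2)."""
    log_k = np.log10(k_values)
    log_p = np.log10(p_values)
    slope, intercept = np.polyfit(log_k, log_p, 1)
    fitted = slope * log_k + intercept
    ss_res = np.sum((log_p - fitted) ** 2)
    ss_tot = np.sum((log_p - log_p.mean()) ** 2)
    return slope, 1.0 - ss_res / ss_tot

# Sanity check on an exact power law p(k) = k^(-2).
k_grid = np.arange(1.0, 11.0)
slope, r2 = fit_power_law(k_grid, k_grid ** -2.0)

# Binning step on a tiny hypothetical connectivity vector.
centers, p = connectivity_histogram([1, 1, 1, 2, 2, 3], n_bins=3)
```

For real data, the output of connectivity_histogram would be passed to fit_power_law; a negative slope together with an R^2 close to 1 indicates approximate scale-free topology.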
The scale free topology criterion for
choosing the parameter values of an
adjacency function.
A) CONSIDER ONLY THOSE PARAMETER VALUES THAT
RESULT IN APPROXIMATE SCALE FREE TOPOLOGY
B) SELECT THE PARAMETERS THAT RESULT IN THE
HIGHEST MEAN NUMBER OF CONNECTIONS
• Criterion A is motivated by the finding that many
biological networks (including metabolic networks, gene
co-expression networks, protein-protein interaction networks
and cellular networks) have been found to exhibit a scale-free
topology.
• Criterion B leads to high power for detecting modules
(clusters of genes) and hub genes.
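Criteria A and B can be combined into a simple selection rule, sketched below. The R^2 cutoff of 0.8 is an assumption, the names PowerSummary and choose_power are hypothetical, and the numbers in the example are made up purely to show the mechanics; they are not results from the cancer or stem cell data.

```python
from typing import NamedTuple, Optional, Sequence

class PowerSummary(NamedTuple):
    beta: int       # soft-thresholding power
    r2: float       # scale-free fit index for this power
    mean_k: float   # mean connectivity for this power

def choose_power(summaries: Sequence[PowerSummary],
                 r2_cutoff: float = 0.8) -> Optional[PowerSummary]:
    """Criterion A: keep powers with r2 >= cutoff.
    Criterion B: among those, take the highest mean connectivity."""
    admissible = [s for s in summaries if s.r2 >= r2_cutoff]
    if not admissible:
        return None
    return max(admissible, key=lambda s: s.mean_k)

# Made-up values for illustration only.
summaries = [
    PowerSummary(beta=2, r2=0.30, mean_k=40.0),
    PowerSummary(beta=6, r2=0.75, mean_k=12.0),
    PowerSummary(beta=10, r2=0.88, mean_k=4.0),
    PowerSummary(beta=14, r2=0.93, mean_k=1.5),
]
best = choose_power(summaries)
```

Because raising similarities in [0,1] to a higher power can only shrink them, mean connectivity falls as β grows, so in practice this rule tends to pick the smallest power that passes criterion A.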
Trade-off between criterion A and
criterion B when varying the power β in
signed cancer network
Trade-off between criterion A and criterion B
when varying the power β in signed mouse
embryonic stem cell network
How to measure distance in a network?
• Biological Answer: look at shared neighbors with
the topological overlap matrix.
– Intuition: if 2 people share the same friends they are
close in a social network
– In an unsigned network negatively correlated genes
are treated as friends while in the signed network they
are treated as enemies.
– Two genes have high topological overlap if they share
(positively correlated) friends
Topological Overlap leads to a network
distance measure (Ravasz et al 2002)
$\mathrm{TOM}_{ij} = \frac{\sum_{u} a_{iu} a_{uj} + a_{ij}}{\min(k_i, k_j) + 1 - a_{ij}}$
$\mathrm{DistTOM}_{ij} = 1 - \mathrm{TOM}_{ij}$
• Generalized in Zhang and Horvath (2005) to the
case of weighted networks.
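The topological overlap formula can be sketched directly in matrix form. The sketch assumes that the diagonal of the adjacency matrix is set to 0 (so the sum over u skips i and j and the connectivity excludes self-connections) and that the diagonal of the TOM is set to 1; both are conventions assumed here, and the function names and example networks are hypothetical.

```python
import numpy as np

def topological_overlap(A):
    """TOM_ij = (sum_u a_iu a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij)."""
    A = np.array(A, dtype=float)            # copy; input is left untouched
    np.fill_diagonal(A, 0.0)                # assumption: no self-connections
    k = A.sum(axis=1)                       # connectivity k_i
    shared = A @ A                          # sum over u of a_iu * a_uj
    denom = np.minimum.outer(k, k) + 1.0 - A
    tom = (shared + A) / denom
    np.fill_diagonal(tom, 1.0)              # assumption: TOM_ii = 1
    return tom

def tom_dissimilarity(A):
    """DistTOM_ij = 1 - TOM_ij."""
    return 1.0 - topological_overlap(A)

# Fully connected triangle: every pair shares its one remaining neighbor.
triangle = np.array([[0, 1, 1],
                     [1, 0, 1],
                     [1, 1, 0]])
tom_triangle = topological_overlap(triangle)

# Path 0 - 1 - 2: nodes 0 and 2 are not linked but share neighbor 1.
path = np.array([[0, 1, 0],
                 [1, 0, 1],
                 [0, 1, 0]])
tom_path = topological_overlap(path)
dist_path = tom_dissimilarity(path)
```

The same function applies to weighted adjacency matrices with entries in [0,1], which is the generalized case.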
SIMPLE TOM example
• In this simple example $\mathrm{TOM}_{1,2}$
reduces to a.
• If $\mathrm{cor}(x_1, x_u) = \mathrm{cor}(x_u, x_2) = -1$,
then in an unsigned network
$\mathrm{TOM}_{1,2} = 1$, while in a signed
network $\mathrm{TOM}_{1,2} = 0$.
Application: comparing Signed to
Unsigned Networks using brain
cancer data described in
Horvath S, Zhang B, Carlson M, Lu KV, Zhu S, Felciano RM, Laurance MF, Zhao W,
Shu Q, Lee Y, Scheck AC, Liau LM, Wu H, Geschwind DH, Febbo PG, Kornblum
HI, Cloughesy TF, Nelson SF, Mischel PS (2006) "Analysis of Oncogenic Signaling
Networks in Glioblastoma Identifies ASPM as a Novel Molecular Target", PNAS,
November 14, 2006, vol. 103, no. 46, 17402-17407
Preservation of Modules between Unsigned
and Signed Methods in Brain Cancer
Unsigned Network
Signed Network
Message: no difference between signed and unsigned analysis
Analysis of Networks in Mouse ESC
data described in Ivanova et al
Preservation of Large Modules between Unsigned
and Signed Methods in Mouse embryonic stem cells.
Signed network exhibits 4 additional modules
Unsigned Network
Signed Network
Gene significance
Definition
• Differential gene expression test between control
and knockout
– Control: mouse microarray samples treated with empty
virus
– Knockout: microarray samples treated with an Oct4 RNAi
(Oct4 is of major biological importance in ES pluripotency)
• Individual gene significance = t-test statistic
– Note that the t-test keeps track of the sign
Goal: To relate gene significance to intramodular
connectivity
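A per-gene t-statistic of this kind can be sketched as follows. The slide does not say which t-test variant was used, so the pooled-variance (Student) two-sample statistic is an assumption, as is the sign convention (knockout minus control). The function name and the toy data are hypothetical.

```python
import numpy as np

def gene_significance(control, knockout):
    """Two-sample pooled-variance t-statistic per gene.

    Rows are samples, columns are genes (assumed layout).
    Positive values mean higher expression in the knockout group.
    """
    control = np.asarray(control, dtype=float)
    knockout = np.asarray(knockout, dtype=float)
    n1, n2 = control.shape[0], knockout.shape[0]
    mean_diff = knockout.mean(axis=0) - control.mean(axis=0)
    v1 = control.var(axis=0, ddof=1)        # sample variances
    v2 = knockout.var(axis=0, ddof=1)
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    return mean_diff / np.sqrt(pooled * (1.0 / n1 + 1.0 / n2))

# Toy data: gene 0 goes up in the knockout, gene 1 goes down.
control = np.array([[1.0, 5.0], [2.0, 6.0], [3.0, 7.0]])
knockout = np.array([[4.0, 2.0], [5.0, 3.0], [6.0, 4.0]])
gs = gene_significance(control, knockout)
```

Keeping the sign of the statistic, rather than its absolute value, lets up-regulated and down-regulated genes be distinguished when gene significance is related to intramodular connectivity.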
Absolute Mean Significance Increases Once
New Modules are Found via Signed WGCNA
Unsigned
Signed
Message: signed networks allowed us to split large modules
into smaller, biologically more significant modules
Behind the Scenes: Brown Module is Hidden
within Turquoise
Unsigned
Signed
Signed WGCNA shows influence of known
pluripotency transcription factors
• Separated into their own module, both the connectivity and
relative gene significance of the TFs increase.
Brown Module
• Shows Oct4 is a highly connected hub and it is
highly significant in this module.
• This module could not have been detected in an
unsigned network.
• Note that the signed intramodular connectivity is
a biologically important screening variable.
• Biological importance of the module is verified by 2-fold
enrichment of Oct4 and Nanog binding.
Conclusion
• Signed weighted gene co-expression network
analysis is a robust extension of unsigned
WGCNA, preserving large modules while finding
new and biologically interesting modules, thus
facilitating a systems-level understanding of gene
and/or protein interactions.
Acknowledgement
Biostatistics/Bioinformatics
• Steve Horvath
• Qing Zhou
• Peter Langfelder