Hybrid Modeling of Gene Regulatory Networks through Clustering
Principal Component Analysis based Methodologies
for Analyzing Time-Course Microarray Data
Sudhakar Jonnalagadda and Rajagopalan Srinivasan
Dept. of Chemical and Biomolecular Engineering
National University of Singapore
PCA-based technique for
• Clustering genes
• Finding distinct clusters
• Identifying differentially expressed genes
Motivation
• Time-course microarray experiments provide a large amount of data on dynamic changes in cells
• A large number of genes are measured
– Multivariate data
• Answering different biological questions requires different data mining techniques
• Challenge: Can we develop generalized tools that are applicable to several data mining problems?
PCA Modeling
$$X_{n \times t} = Z P^T + E = z_1 p_1^T + z_2 p_2^T + \dots + z_k p_k^T + E$$
X: expression data (genes × assays); Z: gene score vectors; P^T: principal components (PCs); E: residual
[Figure: PCA model schematic (X = Z P^T); PC 1 and PC 2 profiles; scores plot of expression data, Scores on PC 1 (66.20%) vs. Scores on PC 2 (17.05%)]
• A few PCs are sufficient to model the data adequately
• Removes noise from the data
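The PCA model above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `pca_model`, the column mean-centering, and the toy data are not from the poster, and the poster's own preprocessing is not specified.

```python
import numpy as np

def pca_model(X, k):
    """Fit a k-component PCA model X ~ Z P^T + E via SVD.

    X is a (genes x assays) matrix. Column mean-centering is an
    ASSUMPTION of this sketch, not something stated in the poster.
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:k].T                       # loadings (PCs), shape (assays, k)
    Z = Xc @ P                         # gene score vectors, shape (genes, k)
    E = Xc - Z @ P.T                   # residual not captured by the k PCs
    explained = s**2 / np.sum(s**2)    # fraction of variance per PC
    return mean, P, Z, E, explained[:k]

# Hypothetical toy data: 200 "genes" x 8 "assays" built from
# two latent patterns plus a small amount of noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
patterns = rng.normal(size=(2, 8))
X = latent @ patterns + 0.05 * rng.normal(size=(200, 8))

mean, P, Z, E, explained = pca_model(X, k=2)
print("variance explained by 2 PCs:", explained.sum())
```

On data like this, two PCs capture nearly all of the variance, and the residual E holds mostly noise, which is the sense in which a few PCs model the data and remove noise.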
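Gene Clustering
• Group genes into clusters so as to minimize the sum of
– the normalized distance between each gene and the cluster centroid within the PCA model (T²)
– the orthogonal distance to the PCA model (Q)
$$\min \sum_{i=1}^{C} \sum_{x \in C_i} \left( \frac{T_x^2}{T_{0.95}} + \frac{Q_x}{Q_{0.95}} \right)$$
where T_{0.95} and Q_{0.95} are the 95% limits used to normalize T² and Q
[Figure: T² and Q distances of a gene relative to a cluster's PCA model]
Comparing Clusters
Model each cluster using PCA and measure the similarity of the models using the PCA similarity factor
$$S_{PCA}^{\lambda}(A, B) = \frac{\sum_{i=1}^{l} \sum_{j=1}^{l} \lambda_i^A \lambda_j^B \cos^2 \theta_{ij}}{\sum_{i=1}^{l} \lambda_i^A \lambda_i^B}$$
[Figure: angles θ11, θ12, θ21, θ22 between the PCs of models A and B]
Identifying DEG
Model the control data and project the treatment data onto the model. Compare the scores to find differentially expressed genes
$$Z_i = Z_i^{(1)} - Z_i^{(2)}$$
$$MD_i^2 = (Z_i - \bar{Z}) \, S^{-1} \, (Z_i - \bar{Z})^T$$
[Figure: scores plot highlighting Histone family protein H1f0 and Heat-shock protein]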
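As a rough illustration of the gene clustering criterion (a normalized T² distance within each cluster's PCA model plus a normalized orthogonal distance Q), the sketch below scores genes against per-cluster models and assigns each gene to the cheapest cluster. It shows a single assignment step only, not the full iterative procedure. The helper names, the toy data, and the use of empirical 95th percentiles as stand-ins for the 95% limits are all assumptions of this example.

```python
import numpy as np

def t2_and_q(X, model):
    """T^2 (distance within the PCA model) and Q (squared orthogonal
    distance to the model) for each row of X."""
    Xc = X - model["mean"]
    Z = Xc @ model["P"]                         # scores in the PC subspace
    t2 = np.sum(Z**2 / model["lam"], axis=1)    # variance-scaled score distance
    resid = Xc - Z @ model["P"].T               # part outside the PC subspace
    q = np.sum(resid**2, axis=1)
    return t2, q

def fit_cluster_model(members, k):
    """PCA model of one cluster with k PCs."""
    mean = members.mean(axis=0)
    _, s, Vt = np.linalg.svd(members - mean, full_matrices=False)
    model = {"mean": mean, "P": Vt[:k].T,
             "lam": s[:k]**2 / (len(members) - 1)}
    t2, q = t2_and_q(members, model)
    # ASSUMPTION: empirical 95th percentiles stand in for the 95% limits.
    model["t2_lim"] = np.percentile(t2, 95)
    model["q_lim"] = np.percentile(q, 95)
    return model

def assign(X, models):
    """Assign each gene to the cluster with the smallest combined cost."""
    costs = []
    for m in models:
        t2, q = t2_and_q(X, m)
        costs.append(t2 / m["t2_lim"] + q / m["q_lim"])
    return np.argmin(np.column_stack(costs), axis=1)

# Hypothetical toy data: two clusters stretched along different directions.
rng = np.random.default_rng(1)
n, d = 100, 6
dir_a, dir_b = np.eye(d)[0], np.eye(d)[1]
A = rng.normal(size=(n, 1)) * 3 * dir_a + 0.1 * rng.normal(size=(n, d))
B = 2.0 + rng.normal(size=(n, 1)) * 3 * dir_b + 0.1 * rng.normal(size=(n, d))
X = np.vstack([A, B])
truth = np.repeat([0, 1], n)

models = [fit_cluster_model(A, k=1), fit_cluster_model(B, k=1)]
labels = assign(X, models)
accuracy = np.mean(labels == truth)
print("fraction assigned to the generating cluster:", accuracy)
```

A full clustering loop would alternate between refitting the cluster models and reassigning genes until the labels stop changing; that loop is omitted here to keep the example short.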
Results: clustering genes
Methods compared: PCA clustering, k-means clustering, GK clustering
Artificial Data 1
• Data: all clusters need two PCs to model
• PCA and GK clustering correctly identify the clusters
Artificial Data 2
• Data: clusters A, B and C need 3, 2 and 2 PCs to model, respectively
• Only PCA clustering correctly identifies the clusters
Yeast cell-cycle data
• Data: 384 cell-cycle regulated genes in 5 clusters (Early G1, Late G1, S, G2 and M); all clusters need two PCs to model
• PCA and k-means identify homogeneous clusters
• GK method finds only four clusters, which are not homogeneous
Results: Finding Distinct Clusters
Case Study: Yeast cell-cycle Data
• Expression data for ~6000 genes at 17 time points
• 384 genes found to be cell-cycle regulated
• Clusters reported: 5
– Early G1, Late G1, S, G2, M
Result:
• NEPSI correctly predicts 5 clusters
• Clusters enriched with similarly expressed genes
• Clusters are distinct from other clusters
[Figure: NEPSI index vs. number of clusters (1 to 10)]
            Early G1   Late G1   S       G2      M
Early G1    1          0.183     0.435   0.441   0.233
Late G1     0.183      1         0.262   0.308   0.521
S           0.435      0.262     1       0.467   0.362
G2          0.441      0.308     0.467   1       0.329
M           0.233      0.521     0.362   0.329   1
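To make the cluster-comparison step concrete (model each cluster with PCA, then compute a PCA similarity factor between the models), here is a minimal sketch. It assumes the eigenvalue-weighted form of the similarity factor, in which the squared cosines between PCs are weighted by the corresponding eigenvalues; the function names and toy data are hypothetical.

```python
import numpy as np

def pca_loadings(X, l):
    """First l PCs (as columns) and their eigenvalues for data X."""
    Xc = X - X.mean(axis=0)
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    lam = s[:l]**2 / (X.shape[0] - 1)
    return Vt[:l].T, lam

def similarity_factor(PA, lamA, PB, lamB):
    """Eigenvalue-weighted PCA similarity factor between two models.

    ASSUMPTION: this weighted form is used; an unweighted variant
    (mean of cos^2 terms) also exists.
    """
    cos2 = (PA.T @ PB) ** 2    # cos^2(theta_ij) between PC i of A and PC j of B
    num = np.sum(np.outer(lamA, lamB) * cos2)
    den = np.sum(lamA * lamB)
    return num / den

# Hypothetical toy data: A and B vary along mutually orthogonal directions.
rng = np.random.default_rng(2)
basis = np.linalg.qr(rng.normal(size=(8, 8)))[0]    # orthonormal columns
scale = np.array([3.0, 1.5])
A = (rng.normal(size=(150, 2)) * scale) @ basis[:, :2].T
B = (rng.normal(size=(150, 2)) * scale) @ basis[:, 2:4].T
A = A + 0.05 * rng.normal(size=A.shape)
B = B + 0.05 * rng.normal(size=B.shape)

PA, lamA = pca_loadings(A, l=2)
PB, lamB = pca_loadings(B, l=2)
s_aa = similarity_factor(PA, lamA, PA, lamA)
s_ab = similarity_factor(PA, lamA, PB, lamB)
print("S(A, A) =", s_aa, " S(A, B) =", s_ab)
```

A model compared with itself gives a factor of 1, while models spanning orthogonal directions give a value near 0, so low off-diagonal values indicate distinct clusters.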
[Figure: Silhouette, Dunn's and Davies-Bouldin indices vs. number of clusters (1 to 10)]
[Figure: expression of genes over time in the Early G1, Late G1, S, G2 and M clusters, showing gene activation and gene repression]
Source: Cho et al. (1998) A genome-wide transcriptional analysis of the mitotic cell cycle. Mol. Cell, 2, 65-73.
Results: Finding DEG
Case Study: Mouse data
• Characterization of the role of HSF1 in mammalian cells
• Time-course expression data were collected for 9468 genes at 8 time points in WT (control) and HSF1 KO (treatment) mice
• Several mouse genes (homologues of human genes) bound by HSF1 are differentially expressed in KO mice
• However, several genes that are not bound by HSF1 are induced in both WT and KO mice
• Conclusion: HSF1 does not regulate all the heat-induced genes in mammalian cells
Result:
• PCA identifies 288 differentially expressed genes
– 78 of them were previously reported as differentially expressed
– Novel genes show differential expression in wild-type and mutant mice
• PCA identified 4 (out of 9) mouse homologues of human genes that are both bound by HSF1 and induced in WT mice but not activated in HSF1 KO mice
• 13 (out of 15) mouse homologues of human genes that are not bound by HSF1 are found to be similarly expressed in both WT and KO mice
• Conclusions:
– PCA correctly identifies differentially expressed genes
– Results support that HSF1 does not regulate all the heat-induced genes in mammalian cells
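The DEG procedure described earlier (model the control data with PCA, project the treatment data onto that model, and compare the scores) might look roughly like the sketch below. Several details are assumptions of this example rather than the authors' stated method: the score difference is taken as control minus treatment, its covariance is estimated from all genes, and genes are ranked by squared Mahalanobis distance instead of being tested against a specific threshold. The function name and toy data are hypothetical.

```python
import numpy as np

def deg_scores(X_control, X_treat, k):
    """Squared Mahalanobis distance of each gene's score difference."""
    mean = X_control.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_control - mean, full_matrices=False)
    P = Vt[:k].T                          # PCs of the control model
    Z1 = (X_control - mean) @ P           # control scores
    Z2 = (X_treat - mean) @ P             # treatment projected onto control model
    D = Z1 - Z2                           # ASSUMPTION: control minus treatment
    Dc = D - D.mean(axis=0)
    S = np.atleast_2d(np.cov(D, rowvar=False))
    md2 = np.einsum("ij,jk,ik->i", Dc, np.linalg.inv(S), Dc)
    return md2

# Hypothetical toy data: 300 "genes" x 8 time points; the first five
# genes respond differently under treatment.
rng = np.random.default_rng(3)
latent = rng.normal(size=(300, 2))
patterns = rng.normal(size=(2, 8))
X_control = latent @ patterns + 0.05 * rng.normal(size=(300, 8))
X_treat = X_control + 0.05 * rng.normal(size=(300, 8))
X_treat[:5] += 3.0 * patterns[0]

md2 = deg_scores(X_control, X_treat, k=2)
top5 = np.argsort(md2)[::-1][:5]          # genes with the largest distances
print("top-ranked genes:", sorted(top5.tolist()))
```

In practice a cutoff on the distance would replace the fixed top-five ranking used here; the choice of cutoff is outside what the poster text specifies.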
Source: Trinklein, N.D. et al. (2004) The role of heat shock transcription factor 1 in the genome-wide regulation of the mammalian heat shock response. Mol. Biol. Cell, 15, 1254-1262.