Functional annotation and
network reconstruction through
cross-platform integration of
microarray data
X. J. Zhou et al. 2005
Challenges in microarray
data analysis
• Integration of multiple microarray data sets.
– Different platforms, e.g. cDNA arrays, Affymetrix arrays
– Alternative experimental parameters
• Identification of functionally related genes which
do not have similar expression patterns.
• Reconstruction of transcriptional regulatory
networks.
– It is difficult to elucidate the cooperativity between
transcription factors (TFs) because the changes in their
expression are often subtle
and their activities are often controlled at levels other
than expression.
Data pre-processing
• Classify the 618 expression profiles
into 39 data sets. A data set contains
a set of expression profiles measured
under relevant conditions.
– 19 cDNA data sets from SMD
– 4 Affymetrix data sets from GEO
– 16 data sets from Rosetta
19 SMD data sets
• Alpha factor release
• cdc15 block release
• DTT Exposure
• Elutriation
• Forkhead regulation
• Gamma radiation
• Menadione exposure
• DNA damage (MMS) response
• Nitrogen depletion
• Nutrition limitation
• Osmotic shock
• SIR proteins (Chromatin Silencing)
• Sorbitol effects
• H2O2 response
• Heat shock
• Heat steady
• CellCycle Factor
• YPD Stationary phase
• Zinc homoeostasis
Corresponding to 19 SMD subcategories
4 GEO data sets
• Aging
• Chitin synthesis
• Fermentation time course
• Ume6 regulon
16 Rosetta data sets
• Cell cycle control
• Cell wall organization
• Chromatin assembly
• Ion homeostasis
• Nucleotide metabolism
• Organelle biogenesis
• Perception of external stimulus
• Protein biosynthesis
• Protein degradation
• Protein metabolism
• Protein phosphorylation
• Protein transport
• Pseudohyphal growth
• Steroid metabolism
• Amino Acid Starvation
• MAPK pathway
Classification is based on the Gene Ontology (GO) biological
process categories of the deleted genes.
The idea: 2nd-order
expression correlation
• 1st-order expression correlation
– Correlation of expression patterns from
one data set
– For each pair of genes, computing this correlation in
each of the n data sets gives a vector of length n
(the 1st-order expression correlation profile).
• 2nd-order expression correlation
– Correlation between the 1st-order expression
correlation profiles of two gene pairs
An example
The overall expression similarity between the two gene pairs is
not significantly high. However, their 1st-order expression
correlation profiles exhibit high correlation, that is, the four
genes have high 2nd-order expression correlation.
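
The two definitions can be sketched in a few lines of Python. This is only an illustration: it assumes Pearson correlation as the similarity measure and uses random toy matrices, and the function names are made up for this sketch rather than taken from the paper.

```python
import numpy as np

def first_order_profile(datasets, gene_a, gene_b):
    """Correlation of two genes inside each data set.

    datasets: list of 2-D arrays (genes x conditions) sharing the same row order.
    Returns a vector of length n, where n is the number of data sets.
    """
    return np.array([np.corrcoef(d[gene_a], d[gene_b])[0, 1] for d in datasets])

def second_order_correlation(profile_1, profile_2):
    """Correlation between two 1st-order correlation profiles."""
    return np.corrcoef(profile_1, profile_2)[0, 1]

# Toy input (assumption): 5 data sets, 4 genes, 10 conditions each.
rng = np.random.default_rng(0)
datasets = [rng.normal(size=(4, 10)) for _ in range(5)]

profile_ab = first_order_profile(datasets, 0, 1)  # gene pair (0, 1)
profile_cd = first_order_profile(datasets, 2, 3)  # gene pair (2, 3)
r2 = second_order_correlation(profile_ab, profile_cd)
```

Two gene pairs can have a high `r2` even when the four genes are not co-expressed with each other, which is the situation the example describes.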
Clustering functionally
related genes
• Procedure
– Identification of doublets
• A doublet is a pair of genes that is tightly
co-expressed in multiple data sets.
– Clustering of doublets based on their
1st-order expression correlation profiles
• Results
– 72 of the top 100 tightest clusters are
functionally homogeneous.
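
A minimal sketch of the doublet-identification step follows. The slides do not give the cut-offs, so `min_corr` and `min_datasets` are hypothetical parameters, and Pearson correlation is assumed. The returned profiles are what a clustering step would then group.

```python
from itertools import combinations
import numpy as np

def find_doublets(datasets, min_corr=0.8, min_datasets=2):
    """Return {(i, j): profile} for gene pairs co-expressed in several data sets.

    min_corr and min_datasets are illustrative thresholds, not values from the paper.
    """
    n_genes = datasets[0].shape[0]
    # corr[k] is the gene-by-gene correlation matrix of data set k.
    corr = np.stack([np.corrcoef(d) for d in datasets])
    doublets = {}
    for i, j in combinations(range(n_genes), 2):
        profile = corr[:, i, j]  # 1st-order correlation profile of the pair
        if np.sum(profile >= min_corr) >= min_datasets:
            doublets[(i, j)] = profile
    return doublets

# Toy input (assumption): genes 0 and 1 track each other in both data sets,
# gene 2 does not follow either of them.
base = np.arange(6.0)
d1 = np.array([base, 2 * base + 1, [3, 1, 4, 1, 5, 9]])
d2 = np.array([base ** 2, base ** 2 + 1, [2, 7, 1, 8, 2, 8]])
doublets = find_doublets([d1, d2])
```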
Gene function prediction
• A prediction of function is made for a
doublet only if it is in a tight cluster
that includes at least three doublets
and in which all remaining doublets
share the same function.
• 79 functions are assigned to 67
unknown genes. Some have been
verified by experimental studies.
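
The prediction rule can be written as a small function. This sketch assumes each doublet in a cluster carries a single function label (or `None` if unknown); the gene names and labels below are hypothetical placeholders.

```python
def predict_function(cluster, target, min_doublets=3):
    """Predict a function for `target` from the other doublets in its cluster.

    cluster: dict mapping doublet -> function label (None if unknown).
    Returns the shared label, or None when the rule does not apply.
    """
    if target not in cluster or len(cluster) < min_doublets:
        return None
    # Labels of all remaining doublets in the cluster.
    others = {label for doublet, label in cluster.items() if doublet != target}
    # Predict only if the remaining doublets agree on one known function.
    if len(others) == 1 and None not in others:
        return next(iter(others))
    return None

# Hypothetical cluster: two annotated doublets and one unannotated doublet.
cluster = {
    ("geneA", "geneB"): "ribosome biogenesis",
    ("geneC", "geneD"): "ribosome biogenesis",
    ("geneE", "geneX"): None,
}
prediction = predict_function(cluster, ("geneE", "geneX"))
```

Requiring unanimity among the remaining doublets makes the rule conservative: a single disagreeing or unannotated neighbour suppresses the prediction.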
Reconstruction of regulatory networks
• For each transcription module, a 1st-order
average expression correlation profile (a vector
with the same length as the number of data sets)
is calculated. The profile of a module can be
interpreted as the activity profile of the
transcription factor(s) that regulate the module.
– A transcription module is defined to be a set of genes
that are regulated by the same transcription factor(s)
based on genome-wide location data, and are
coexpressed in multiple data sets.
– 60 transcription modules (TMs) are identified.
• A 2nd-order expression correlation is calculated
for two activity profiles of transcription factors, to
measure the cooperativity between the two
transcription factors.
– 34 pairs show high 2nd-order correlation.
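
As a rough sketch of these two steps: the module profile below is taken to be the mean pairwise Pearson correlation among the module's genes in each data set, which is one plausible reading of "1st-order average expression correlation profile". The function names and toy data are assumptions made for illustration.

```python
import numpy as np

def module_activity_profile(datasets, module_genes):
    """Mean pairwise correlation among a module's genes, one value per data set.

    module_genes: list of row indices (at least two genes).
    """
    upper = np.triu_indices(len(module_genes), k=1)  # each gene pair once
    profile = []
    for d in datasets:
        corr = np.corrcoef(d[module_genes])
        profile.append(corr[upper].mean())
    return np.array(profile)

def cooperativity(profile_a, profile_b):
    """2nd-order correlation between two module activity profiles."""
    return np.corrcoef(profile_a, profile_b)[0, 1]

# Toy input (assumption): 5 data sets, 6 genes, 8 conditions each.
rng = np.random.default_rng(1)
datasets = [rng.normal(size=(6, 8)) for _ in range(5)]

activity_a = module_activity_profile(datasets, [0, 1, 2])  # module of TF A
activity_b = module_activity_profile(datasets, [3, 4, 5])  # module of TF B
score = cooperativity(activity_a, activity_b)
```

A high `score` would flag the two transcription factors as candidates for cooperative regulation; the cut-off for "high" is not specified in the slides.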
Clustering of modules
Annotation of TFs
• The function of a TF is predicted
based on two lines of evidence:
– The functions of known genes in its
target module
– The functions of known genes in other
modules in the same module cluster
• TF GAT3 is predicted to play a role in
mitotic and meiotic cell cycles.