GESSICA – Gene ExpreSSion and Interactions Cluster Analysis


MATISSE - Modular Analysis for Topology of Interactions and Similarity SEts
http://acgt.cs.tau.ac.il/matisse
Igor Ulitsky and Ron Shamir
Identification of Functional Modules using Network Topology and High-Throughput Data. BMC Systems Biology 1:8 (2007).
Microarray data analysis
• Input: expression levels
of (all) genes in several
conditions
• Analysis methods:
• Clustering (CLICK)
• Biclustering (SAMBA)
• Extraction of regulatory
networks
Protein interaction network analysis
• Input: Network with nodes=proteins/genes, edges=interactions
• Analysis methods:
• Global properties
• Motif content analysis
• Complex extraction
• Cross-species comparison
Integrated analysis
• Combined support for low quality data
• Joint visualization
• Statistics of known pathways
• Detection of “hot spots”
MATISSE
• Identify sets of genes (modules) that
• Have highly correlated expression patterns
• Induce connected subgraphs in the
interaction network
Figure legend: Interaction; High Similarity
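The two module criteria listed above can be checked on toy data. The sketch below is only an illustration: the gene names, edges, expression values, and the use of mean pairwise Pearson correlation as the similarity measure are all assumptions, not the scoring used by MATISSE.

```python
from collections import deque
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def is_connected(genes, edges):
    """True if the genes induce a connected subgraph of the network."""
    genes = set(genes)
    if not genes:
        return False
    adj = {g: set() for g in genes}
    for u, v in edges:
        if u in genes and v in genes:
            adj[u].add(v)
            adj[v].add(u)
    start = next(iter(genes))
    seen = {start}
    queue = deque([start])
    while queue:  # breadth-first search inside the induced subgraph
        u = queue.popleft()
        for v in adj[u] - seen:
            seen.add(v)
            queue.append(v)
    return seen == genes


def mean_similarity(genes, expr):
    """Mean pairwise correlation; needs at least two genes."""
    pairs = list(combinations(sorted(genes), 2))
    if not pairs:
        raise ValueError("need at least two genes")
    return sum(pearson(expr[a], expr[b]) for a, b in pairs) / len(pairs)


# Hypothetical toy network and expression profiles.
edges = [("A", "B"), ("B", "C"), ("C", "D")]
expr = {
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [2.0, 4.0, 6.0, 8.5],
    "C": [1.1, 2.1, 2.9, 4.2],
    "D": [4.0, 3.0, 2.0, 1.0],
}
candidate = {"A", "B", "C"}
candidate_ok = is_connected(candidate, edges) and mean_similarity(candidate, expr) > 0.9
```

In this toy network the set {A, C} is co-expressed but not connected, because its only link runs through B, so it would not qualify as a module on its own.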
MATISSE workflow
• Seed generation
• Greedy optimization
• Significance filtering
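The greedy optimization step can be pictured with a toy example. This is a minimal sketch under stated assumptions, not the published algorithm: the score (sum of pairwise similarities minus a threshold), the single "add a neighbor" move, the threshold value, and all data are hypothetical, and seed generation and significance filtering are not shown.

```python
def grow_module(seed, adj, sim, threshold=0.5):
    """Greedily add the network neighbor that most improves a toy score.

    adj: {gene: set of neighbor genes} (interaction network)
    sim: {frozenset({a, b}): similarity} for every gene pair
    Only neighbors of the current module are considered, so the
    module stays connected as it grows.
    """
    module = set(seed)
    while True:
        frontier = {v for u in module for v in adj[u]} - module
        best, best_gain = None, 0.0
        for v in sorted(frontier):
            # Gain of adding v: its similarity to each member, minus the threshold.
            gain = sum(sim[frozenset((v, u))] - threshold for u in module)
            if gain > best_gain:
                best, best_gain = v, gain
        if best is None:  # no neighbor improves the score
            return module
        module.add(best)


# Hypothetical path network A - B - C - D with made-up similarities.
adj = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
sim = {
    frozenset(pair): s
    for pair, s in [
        (("A", "B"), 0.9), (("A", "C"), 0.8), (("B", "C"), 0.85),
        (("A", "D"), -0.7), (("B", "D"), -0.6), (("C", "D"), -0.5),
    ]
}
module = grow_module({"A"}, adj, sim)
```

Starting from the seed A, the sketch adds B and then C, and stops at D because D is anti-correlated with the rest of the module.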
Advantages of MATISSE
• No need for confidence estimation on
individual measurements
• Works even when only a fraction of the
genes have expression patterns
• Can handle any similarity data, not only
expression
• Produces connected modules
• No need to specify the number of
modules
Osmotic shock response of S.
cerevisiae
• Network of 6,246 genes and 65,990
protein-protein and protein-DNA
interactions
• 133 experimental conditions – response of
perturbed strains to osmotic shock
(O’Rourke and Herskowitz, 2004)
• 2,000 genes filtered based on variation
criterion
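The slide does not state which variation criterion was used. As one plausible reading, a sketch that keeps the k genes with the highest variance across conditions could look like this (function name and data are hypothetical):

```python
from statistics import pvariance


def top_variable_genes(expr, k):
    """Return the k genes with the largest variance across conditions.

    expr: {gene: list of expression levels, one per condition}
    """
    ranked = sorted(expr, key=lambda g: pvariance(expr[g]), reverse=True)
    return ranked[:k]


# Toy expression matrix with three genes and four conditions.
expr = {
    "g1": [1.0, 1.0, 1.0, 1.0],  # flat profile
    "g2": [0.0, 5.0, 0.0, 5.0],  # strongly varying
    "g3": [1.0, 2.0, 1.0, 2.0],  # mildly varying
}
kept = top_variable_genes(expr, 2)
```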
GO and promoter analysis
Subnetwork  Size  Front
1           120   119
2           120   118
3           120   118
5           120   112
6           120   99
7           120   107
8           114   85
11          120   114
14          120   102
15          120   96
16          89    61
17          120   109
18          87    59
20          46    35
Enriched GO terms                                 P-value
processing of 20S pre-rRNA                        < 0.001
rRNA processing                                   < 0.001
35S primary transcript processing                 < 0.001
ribosomal large subunit assembly and maintenance  0.019
rRNA modification                                 < 0.001
ribosome biogenesis                               0.029
translational elongation                          < 0.001
processing of 20S pre-rRNA                        < 0.001
rRNA processing                                   0.03
35S primary transcript processing                 0.011
ribosomal large subunit assembly and maintenance  0.019
ribosomal large subunit biogenesis                < 0.001
signal transduction during filamentous growth     0.01
conjugation with cellular fusion                  < 0.001
transcription from RNA polymerase III promoter    < 0.001
transcription from RNA polymerase I promoter      0.006
ergosterol biosynthesis                           < 0.001
hexose transport                                  0.019
chromatin remodeling                              0.05
pseudohyphal growth                               0.01
response to stress                                < 0.001
ubiquitin-dependent protein catabolism            0.047
nuclear mRNA splicing, via spliceosome            < 0.001
ubiquitin-dependent protein catabolism            < 0.001
response to stress                                < 0.001
mitochondrial electron transport                  < 0.001
nuclear mRNA splicing, via spliceosome            0.012
pyridoxine metabolism                             0.045
TFs    P-Value
Fhl1   4.82E-16
Rap1   2.89E-11
Sfp1   2.98E-08
Fhl1   1.03E-05
Ste12  5.41E-13
Dig1   5.41E-13
Msn2   3.17E-04
Msn4   1.82E-12
Rpn4   6.44E-06
Msn4   1.74E-03
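The slides do not say how the GO and transcription factor p-values were computed. A common choice for this kind of enrichment test is the hypergeometric tail, sketched below on made-up numbers; it is offered as an assumption about the general approach, not as the procedure used here.

```python
from math import comb


def hypergeom_pvalue(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes in the background
    K: background genes carrying the annotation
    n: genes in the module
    k: module genes carrying the annotation
    """
    upper = min(K, n)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


# Toy numbers: 10 background genes, 4 annotated, module of 5 containing all 4.
p = hypergeom_pvalue(N=10, K=4, n=5, k=4)
```

With these toy numbers the only way to reach four annotated genes is to pick all four plus one of the six unannotated genes, giving p = 6/252.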
Pheromone response subnetwork (figure; legend: Back, Front)
Proteolysis subnetwork (figure; legend: Back, Front)
Performance comparison
Bar chart: % of modules with category enrichment at p < 10^-3, for MATISSE, Co-Clustering, CLICK and Random, across GO-Process, GO-Compartment, MIPS Phenotypes and KEGG Pathways
Performance comparison (2)
Bar chart: % of annotations with enrichment at p < 10^-3 in modules, for MATISSE, Co-Clustering, CLICK and Random, across GO-Process, GO-Compartment, MIPS Phenotypes and KEGG Pathways
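The two measures plotted in the performance comparison can be computed from a table of enrichment p-values. The sketch below assumes a hypothetical layout (module name mapped to category p-values) and uses made-up values, not the results from the slides.

```python
def percent_modules_enriched(pvals, alpha=1e-3):
    """% of modules with at least one category enriched at p < alpha."""
    hits = sum(
        1 for cats in pvals.values() if any(p < alpha for p in cats.values())
    )
    return 100.0 * hits / len(pvals)


def percent_annotations_enriched(pvals, all_categories, alpha=1e-3):
    """% of categories enriched at p < alpha in at least one module."""
    enriched = {
        c for cats in pvals.values() for c, p in cats.items() if p < alpha
    }
    return 100.0 * len(enriched & set(all_categories)) / len(all_categories)


# Made-up p-values for four hypothetical modules.
pvals = {
    "m1": {"rRNA processing": 1e-5, "hexose transport": 0.019},
    "m2": {"chromatin remodeling": 0.05},
    "m3": {"response to stress": 2e-4, "rRNA processing": 5e-4},
    "m4": {},
}
categories = [
    "rRNA processing", "hexose transport", "chromatin remodeling",
    "response to stress", "ergosterol biosynthesis",
]
pct_modules = percent_modules_enriched(pvals)
pct_annotations = percent_annotations_enriched(pvals, categories)
```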
Human cell cycle
• Constructed a network with 6,000 nodes, 25,000 edges
• HPRD
• BIND
• Y2H studies
• SPIKE
• HeLa cell cycle time series (Whitfield ’02)
• Produced subnetworks enriched with all
the phases of the cell cycle
M phase subnetwork
Extensions of MATISSE
• CEZANNE
• Utilizes confidence-based networks
• Extracts subnetworks that are connected with
high confidence and co-expressed
• Applied to 11 studies of gene expression
in the blood
• Not yet implemented in the MATISSE
application
Extensions of MATISSE
• DEGAS
• Utilizes case-control expression data
• Identifies dysregulated pathways – areas in
the network in which many genes are
dysregulated in most of the cases
• Beta version implemented in the MATISSE
software
• Ulitsky, Karp and Shamir RECOMB 2008
Difficulties with prior
approaches
• In case-control data, gene pattern correlation can
be due to diverse non-disease related factors
• Patients are different
• Genetic background
• Other diseases/confounding factors
• Disease grade
• Current methods assume that the same genes are
dysregulated in all the patients
• A weaker assumption – a lot of dysregulated
genes appear in the same dysregulated pathway
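The weaker assumption can be written as a simple check on a candidate gene set: most cases should have several dysregulated genes inside it, even if no single gene is dysregulated in every case. The parameters k and l, the function name, and the patient data below are hypothetical, and the connectivity requirement on the network is left out.

```python
def covers_cases(genes, dysregulated, k, l):
    """True if all but at most l cases have >= k dysregulated genes in `genes`.

    dysregulated: {case id: set of genes dysregulated in that case}
    """
    genes = set(genes)
    uncovered = sum(
        1 for case_genes in dysregulated.values() if len(genes & case_genes) < k
    )
    return uncovered <= l


# Made-up cases: each patient has a different pair of dysregulated genes.
dysregulated = {
    "p1": {"A", "B"},
    "p2": {"B", "C"},
    "p3": {"A", "C"},
    "p4": {"X"},
}
pathway_ok = covers_cases({"A", "B", "C"}, dysregulated, k=2, l=1)
single_gene_ok = covers_cases({"A"}, dysregulated, k=1, l=1)
```

Here no single gene is dysregulated in all patients, yet the set {A, B, C} contains two dysregulated genes in three of the four cases.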
HD down-regulated
• The pathway down-regulated in
Huntington’s disease (HD)
• Enriched with:
• HD modifiers
• HD relevant genes
• Calcium signalling
Clear outlier
Huntingtin
Extensions of MATISSE
• Identification of modules correlated with
external parameters
• Numerical parameters: Age, tumor grade etc.
• Logical parameters: Gender, tumor type
• Identifies subnetworks with genes that
are both
• Correlated with the clinical parameter
• Correlated with one another
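The first of the two criteria can be sketched as a filter on each gene's correlation with the clinical parameter. This is an illustration only: the threshold, the gene names, and the data are made up, and a logical parameter such as gender would have to be encoded numerically (for example as 0/1) before use. The surviving genes would then still need to be tested for correlation with one another.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def genes_tracking_parameter(expr, param, min_r=0.8):
    """Genes whose expression correlates with the parameter at r >= min_r."""
    return sorted(g for g, v in expr.items() if pearson(v, param) >= min_r)


# Hypothetical data: four samples with patient age as the numerical parameter.
age = [30, 40, 50, 60]
expr = {
    "g1": [1.0, 1.9, 3.2, 4.1],
    "g2": [5.0, 1.0, 4.0, 2.0],
    "g3": [2.0, 2.9, 4.1, 5.2],
}
selected = genes_tracking_parameter(expr, age)
```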
MATISSE tool capabilities
• MATISSE algorithm execution
• Dynamic subnetwork layout
• Customized node/edge highlighting
• Dynamic expression matrix viewer
• Module annotation
• TANGO – Gene Ontology
• Annotations with custom datasets
• Calculation of different coefficients based
on network/expression