Module 3 Lab: Cytoscape and Enrichment Map

Canadian Bioinformatics Workshops
www.bioinformatics.ca
In collaboration with Cold Spring Harbor Laboratory & New York Genome Center
Module 3 Lab: Cytoscape & Enrichment Map
Jüri Reimand
Network Analysis Workflow
1. Collect genomics data (e.g. mRNA expression)
2. Normalize and score (e.g. compute differential expression)
3. Generate gene list
4. Learn about underlying cellular mechanism using pathway and network analysis
5. Visualize and identify interesting pathways and networks
6. Drill down to understand molecular mechanism
7. Publish model explaining data
• A specific example of this workflow: Cline et al., "Integration of biological networks and gene expression data using Cytoscape", Nature Protocols, 2, 2366-2382 (2007).
Pathways vs Networks
Pathways
- Detailed, high-confidence consensus
- Biochemical reactions
- Small-scale, fewer genes
- Distilled from decades of literature
Networks
- Simplified cellular logic, noisy
- Abstractions: directed or undirected
- Large-scale, genome-wide
- Constructed by integrating omics data
Networks
• Represent relationships of biological molecules
– Physical, regulatory, genetic, functional interactions
• Useful for discovering relationships in large data sets
– Easier to explore than large tables in a spreadsheet
• Visualize multiple data types together
– Discover interesting patterns
• Network analysis
– Finding sub-networks with certain properties (densely connected, coexpressed, frequently mutated, clinical characteristics)
– Finding paths between nodes (or other network “motifs”)
– Finding central nodes in network topology (“hub” genes)
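As a small illustration of "finding paths between nodes", the sketch below runs a breadth-first search over an undirected network stored as a Python adjacency dictionary. It is a minimal sketch, not how Cytoscape implements path finding; the toy network is hypothetical and only loosely modelled on RAS signalling genes.

```python
from collections import deque


def add_edge(adjacency, a, b):
    """Add an undirected edge a-b to an adjacency dictionary."""
    adjacency.setdefault(a, set()).add(b)
    adjacency.setdefault(b, set()).add(a)


def shortest_path(adjacency, source, target):
    """Return one shortest path from source to target as a list of nodes,
    or None if target is unreachable (unweighted breadth-first search)."""
    if source == target:
        return [source]
    previous = {source: None}  # node -> node it was first reached from
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour in previous:
                continue  # already visited
            previous[neighbour] = node
            if neighbour == target:
                # Walk back through the predecessors to rebuild the path.
                path = [target]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None


# Hypothetical toy network of physical interactions.
network = {}
for a, b in [("KRAS", "RAF1"), ("RAF1", "MAP2K1"), ("MAP2K1", "MAPK1"),
             ("KRAS", "PIK3CA"), ("PIK3CA", "AKT1")]:
    add_edge(network, a, b)

print(shortest_path(network, "KRAS", "MAPK1"))  # ['KRAS', 'RAF1', 'MAP2K1', 'MAPK1']
```

Breadth-first search finds a path with the fewest edges; weighted interactions would need a different algorithm such as Dijkstra's.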
Network basics (i): nodes and edges
A simple mapping
• one molecule - node
• one interaction - edge
A more realistic mapping
• Cell localization
• Cell cycle
• Cell type
• Taxonomy
• Physiologically relevant
• Edges - diverse relationships
Critical: what do nodes and edges mean?
Node (molecule)
• Gene
• Protein
• Transcript
• Drug
• MicroRNA
• …
Edge (interaction)
• Genetic interaction
• Physical protein interaction
• Co-expression
• Signaling interaction
• Metabolic reaction
• DNA-binding
Directed or undirected?
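One way to make "what do nodes and edges mean?" explicit is to store a molecule type on every node and an interaction kind plus a direction flag on every edge. This is a minimal sketch using Python dataclasses; the class names and the example molecules and interactions are hypothetical illustrations, not Cytoscape's internal data model.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Edge:
    """One interaction; `kind` records what the edge means."""
    source: str
    target: str
    kind: str       # e.g. "physical", "co-expression", "DNA-binding"
    directed: bool  # e.g. DNA-binding is directed, co-expression is not


@dataclass
class Network:
    node_types: dict = field(default_factory=dict)  # node name -> molecule type
    edges: list = field(default_factory=list)

    def add_node(self, name, molecule_type):
        self.node_types[name] = molecule_type

    def add_edge(self, source, target, kind, directed):
        for name in (source, target):
            if name not in self.node_types:
                raise KeyError(f"unknown node {name!r}; add it first")
        self.edges.append(Edge(source, target, kind, directed))

    def neighbours(self, name):
        """Nodes reachable in one step, respecting edge direction."""
        result = set()
        for e in self.edges:
            if e.source == name:
                result.add(e.target)
            elif not e.directed and e.target == name:
                result.add(e.source)
        return result


net = Network()
net.add_node("TP53", "protein")
net.add_node("CDKN1A", "gene")
net.add_node("MDM2", "protein")
net.add_edge("TP53", "CDKN1A", "DNA-binding", directed=True)
net.add_edge("TP53", "MDM2", "physical", directed=False)
print(sorted(net.neighbours("TP53")))    # ['CDKN1A', 'MDM2']
print(sorted(net.neighbours("CDKN1A")))  # []
```

The directed DNA-binding edge is followed only from TP53 to CDKN1A, while the undirected physical edge can be traversed either way.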
Network Representations
Network basics (ii): layout and visual attributes
• Use layout to interpret the network
– Local relationships: guilt-by-association, dense clusters
– Global relationships
• Visualise multiple types of data
Automatic network layout
Force-directed network layout
• Force-directed: nodes repel each other and edges pull connected nodes together
• Works well for networks of up to about 500 nodes
– Bigger networks give "hairballs": reduce the number of edges
• Advice: try a force-directed layout first, or a hierarchical layout for tree-like networks
• Tips for better-looking networks
– Manually adjust the layout
– Load the network into a drawing program (e.g. Illustrator) and adjust labels
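The "nodes repel and edges pull" idea can be sketched as a simplified Fruchterman-Reingold layout, one well-known force-directed algorithm. This is a minimal illustration, not the algorithm behind any particular Cytoscape layout; the parameter names and defaults (`iterations`, `width`, the 0.97 cooling factor) are assumptions chosen for the example.

```python
import math
import random


def force_directed_layout(nodes, edges, iterations=200, width=1.0, seed=0):
    """Simplified Fruchterman-Reingold layout for a non-empty list of nodes.

    Every pair of nodes repels with force k^2/d; every edge pulls its two
    endpoints together with force d^2/k. The maximum step ("temperature")
    shrinks every iteration so the layout settles.
    """
    nodes = list(nodes)
    rng = random.Random(seed)
    pos = {n: [rng.uniform(0, width), rng.uniform(0, width)] for n in nodes}
    k = width * math.sqrt(1.0 / len(nodes))  # ideal edge length
    temperature = width / 10
    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Repulsion between all pairs of nodes (O(n^2) per iteration).
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
                d = max(math.hypot(dx, dy), 1e-9)
                f = k * k / d
                disp[a][0] += dx / d * f
                disp[a][1] += dy / d * f
                disp[b][0] -= dx / d * f
                disp[b][1] -= dy / d * f
        # Attraction along edges.
        for a, b in edges:
            dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
            d = max(math.hypot(dx, dy), 1e-9)
            f = d * d / k
            disp[a][0] -= dx / d * f
            disp[a][1] -= dy / d * f
            disp[b][0] += dx / d * f
            disp[b][1] += dy / d * f
        # Move each node, capping the step at the current temperature.
        for n in nodes:
            dx, dy = disp[n]
            length = max(math.hypot(dx, dy), 1e-9)
            step = min(length, temperature)
            pos[n][0] += dx / length * step
            pos[n][1] += dy / length * step
        temperature *= 0.97  # cool down
    return {n: tuple(p) for n, p in pos.items()}


layout = force_directed_layout(["A", "B", "C", "D"],
                               [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")])
for node, (x, y) in sorted(layout.items()):
    print(node, round(x, 2), round(y, 2))
```

The all-pairs repulsion step grows quadratically with the number of nodes, which hints at why naive force-directed layouts become slow on large networks.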
Force-directed layout in action (animation): http://sydney.edu.au/engineering/it/~aquigley/avi/spring.avi
Dealing with 'hairballs': zoom or filter
[Figure: zooming in on and focusing a large network onto the yeast PKC (cell wall integrity) pathway: Wsc1/2/3, Mid2, Slg1, Rho1, Pkc1, Bck1, Mkk1/2, Slt2, Swi4/6, Rlm1 and Bni1 (polarity). Edge types: synthetic lethal, transcription factor regulation, protein-protein interaction. Node colours: up-regulated and down-regulated gene expression.]
Visual Features
• Node and edge attributes
– Represent properties of genes and interactions
– Types: text (string), integer, float, Boolean, list
• Visual attributes
– Node and edge visual properties
– Colour, shape, size, borders, opacity...
[Figure labels: attribute color, arrow shape, edge shape, node shape]
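Mapping an attribute to a visual property can be sketched as a lookup table from attribute values to visual values, similar in spirit to a discrete mapping in a Cytoscape style. This is a minimal sketch; the function name, node table and shape/colour strings are hypothetical examples.

```python
def discrete_mapping(node_attributes, column, lookup, default):
    """Map each node's value in `column` to a visual value via `lookup`.

    Nodes whose value is missing or absent from `lookup` get `default`.
    """
    return {
        node: lookup.get(attrs.get(column), default)
        for node, attrs in node_attributes.items()
    }


# Hypothetical node table mixing attribute types (string, Boolean, integer).
nodes = {
    "TP53": {"molecule": "protein", "mutated": True, "degree": 12},
    "CDKN1A": {"molecule": "gene", "mutated": False, "degree": 3},
    "MIR34A": {"molecule": "microRNA", "mutated": False, "degree": 2},
}

shapes = discrete_mapping(nodes, "molecule",
                          {"protein": "ELLIPSE", "gene": "RECTANGLE"},
                          default="DIAMOND")
borders = discrete_mapping(nodes, "mutated", {True: "red", False: "grey"}, "grey")
print(shapes)   # {'TP53': 'ELLIPSE', 'CDKN1A': 'RECTANGLE', 'MIR34A': 'DIAMOND'}
print(borders)  # {'TP53': 'red', 'CDKN1A': 'grey', 'MIR34A': 'grey'}
```

Because each mapping reads a different attribute column, several data types can be shown on the same node at once.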
What Have We Learned?
• Networks help discover relationships in large data sets
• Networks help integrate several datasets or data types
• It is important to understand the meaning of nodes and edges
• Avoid hairballs by focusing the analysis
• Automatic layout is required to visualize networks
• Visual attributes enable multiple types of data to be shown at once, which helps reveal their relationships
Part II
Cytoscape - Network Visualization
Cytoscape is
• an open source software platform
• for visualizing complex networks and integrating these with any type of attribute data
• extensible: many apps are available for various problem domains, including bioinformatics, social network analysis, and the semantic web
Manipulate Networks
Automatic Layout
Filter/Query
Interaction Database Search
The Cytoscape App Store
http://apps.cytoscape.org
Pathway analysis
Gene expression analysis
Complex detection
Literature mining
Network motif search
Pathway comparison
Introduction to Cytoscape (3.1.0)
save your session
save image
Results panel
Control panel
Table panel
Navigate through the network (3.1.0)
In the Network panel, move the blue square to navigate through the network.
Try different Cytoscape layouts (3.1.0)
Circular layout
yFiles Circular
yFiles Organic
Change visual Features (3.1.0)
Style panel: colors, shapes, sizes, brushes, transparency, …
Load "Your Favorite Network": File > Import > Network > File …
Load "Your Favorite Expression Dataset": File > Import > Table > File …
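Outside Cytoscape, the two imports above amount to reading an edge list and joining a node table onto it by gene name. This is a minimal sketch, assuming the network is a tab-delimited SIF file (source, interaction type, one or more targets per line) and the expression table has hypothetical columns `gene` and `log2FC`.

```python
import csv
import io


def read_sif(handle):
    """Parse a tab-delimited Simple Interaction Format (SIF) network.

    Each line: source, interaction type, then one or more targets.
    A line with a single name declares an isolated node.
    """
    nodes, edges = set(), []
    for line in handle:
        fields = line.rstrip("\r\n").split("\t")
        if not fields[0]:
            continue  # skip blank lines
        source = fields[0]
        nodes.add(source)
        if len(fields) >= 3:
            interaction = fields[1]
            for target in fields[2:]:
                nodes.add(target)
                edges.append((source, interaction, target))
    return nodes, edges


def read_node_table(handle, key_column, value_column):
    """Read a tab-delimited table with a header; return {key: float(value)}."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row[key_column]: float(row[value_column]) for row in reader}


sif = io.StringIO("KRAS\tpp\tRAF1\tPIK3CA\nRAF1\tpp\tMAP2K1\nTP53\n")
table = io.StringIO("gene\tlog2FC\nKRAS\t1.5\nRAF1\t-0.8\nMAPK1\t2.0\n")
nodes, edges = read_sif(sif)
expression = read_node_table(table, "gene", "log2FC")
# Only rows whose key matches a network node become node attributes.
attached = {n: expression[n] for n in nodes if n in expression}
print(sorted(attached.items()))  # [('KRAS', 1.5), ('RAF1', -0.8)]
```

Rows for genes absent from the network (MAPK1 here) are simply not attached, which is worth checking when identifiers differ between the two files.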
Map expression values to node colours using a continuous mapper
Expression data mapped to node colours
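A continuous mapper interpolates colours between a few anchor points. This is a minimal sketch of a blue-white-red gradient, assuming a symmetric range of -2 to 2 (for example log fold changes); the range and colours are illustrative defaults, not Cytoscape's.

```python
def continuous_colour(value, vmin=-2.0, vmax=2.0,
                      low=(0, 0, 255), mid=(255, 255, 255), high=(255, 0, 0)):
    """Map a number to a hex colour on a low-mid-high gradient.

    Values outside [vmin, vmax] are clamped; the midpoint of the range maps
    to `mid`. Requires vmin < vmax.
    """

    def lerp(a, b, t):
        return round(a + (b - a) * t)

    value = max(vmin, min(vmax, value))
    centre = (vmin + vmax) / 2
    if value <= centre:
        t = (value - vmin) / (centre - vmin)
        rgb = [lerp(lo, m, t) for lo, m in zip(low, mid)]
    else:
        t = (value - centre) / (vmax - centre)
        rgb = [lerp(m, hi, t) for m, hi in zip(mid, high)]
    return "#{:02X}{:02X}{:02X}".format(*rgb)


for v in (-2.0, 0.0, 2.0, 5.0):
    print(v, continuous_colour(v))
```

Clamping keeps extreme values from dominating the scale; down-regulated genes come out blue and up-regulated genes red.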
Fine-tuning network layout
• Move, zoom/pan, rotate, scale, align
Create Subnetwork
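Creating a subnetwork keeps a chosen set of nodes and the edges among them. This is a minimal sketch that expands a seed selection to its first neighbours and then takes the induced subnetwork; the edge list and gene names are hypothetical.

```python
def first_neighbours(edges, seeds):
    """Return the seed nodes plus every node sharing an edge with a seed."""
    seeds = set(seeds)
    selected = set(seeds)
    for a, b in edges:
        if a in seeds:
            selected.add(b)
        if b in seeds:
            selected.add(a)
    return selected


def induced_subnetwork(edges, selected):
    """Keep only the edges whose two endpoints are both in `selected`."""
    return [(a, b) for a, b in edges if a in selected and b in selected]


edges = [("KRAS", "RAF1"), ("RAF1", "MAP2K1"), ("MAP2K1", "MAPK1"),
         ("KRAS", "PIK3CA"), ("PIK3CA", "AKT1")]
keep = first_neighbours(edges, ["RAF1"])
print(sorted(keep))                     # ['KRAS', 'MAP2K1', 'RAF1']
print(induced_subnetwork(edges, keep))  # [('KRAS', 'RAF1'), ('RAF1', 'MAP2K1')]
```

Focusing on a seed's neighbourhood in this way is one practical way to avoid hairballs.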
Network Filtering
Select nodes with at least 20 interactions
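The filter above is a degree threshold. This is a minimal sketch of the same selection on an edge list; the star-shaped network and gene names are hypothetical.

```python
from collections import Counter


def nodes_with_min_degree(edges, min_degree=20):
    """Return nodes with at least `min_degree` incident edges.

    Each (a, b) edge adds one to the degree of both endpoints, so a
    self-loop (a, a) counts twice and duplicate edges are counted separately.
    """
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return {node for node, d in degree.items() if d >= min_degree}


# Hypothetical star-shaped network: one hub linked to 25 partners.
edges = [("HUB", f"GENE{i}") for i in range(25)]
print(nodes_with_min_degree(edges))  # {'HUB'}
```

Whether duplicate or self-loop edges should count toward degree depends on how the network was built, so it is worth deciding explicitly.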
Interaction Database Search
Query the BioGRID database for interactions of the cancer gene KRAS
What have we learned?
• Cytoscape is a useful, free software tool for network visualisation and analysis
• It provides basic network features
– Automated layouts
– Mapping of genomic attributes to visual attributes
• Apps are available to extend its functionality to diverse analyses
Enrichment Map
• A Cytoscape app to visualize and interpret results of pathway enrichment analysis
• Enriched pathways are visualized as networks
• Edges connect pathways with many shared genes
[Figure: pathway network; node = pathway, edge = number of overlapping genes]
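The idea of "edges between pathways with many shared genes" can be sketched by scoring every pair of gene sets with a similarity coefficient and keeping pairs above a cutoff. EnrichmentMap offers coefficients such as Jaccard and overlap; the sketch below uses the overlap coefficient with a cutoff of 0.5 purely as an illustrative choice, and the pathway gene sets are hypothetical.

```python
from itertools import combinations


def overlap_coefficient(a, b):
    """|A & B| / min(|A|, |B|); 1.0 when the smaller set lies inside the larger."""
    return len(a & b) / min(len(a), len(b))


def jaccard(a, b):
    """|A & B| / |A | B|."""
    return len(a & b) / len(a | b)


def enrichment_map_edges(gene_sets, similarity=overlap_coefficient, cutoff=0.5):
    """Return (pathway1, pathway2, score, n_shared) for every pair of
    non-empty gene sets whose similarity is at least `cutoff`."""
    edges = []
    for p, q in combinations(sorted(gene_sets), 2):
        score = similarity(gene_sets[p], gene_sets[q])
        if score >= cutoff:
            edges.append((p, q, score, len(gene_sets[p] & gene_sets[q])))
    return edges


# Hypothetical enriched pathways and their genes.
gene_sets = {
    "apoptosis": {"BAX", "BCL2", "CASP3", "CASP9", "TP53"},
    "caspase activation": {"CASP3", "CASP9", "APAF1"},
    "DNA replication": {"MCM2", "MCM3", "PCNA", "POLA1"},
}
for p, q, score, shared in enrichment_map_edges(gene_sets):
    print(p, "|", q, round(score, 2), shared)  # apoptosis | caspase activation 0.67 2
```

The choice of coefficient matters: the same pair scores 0.67 with the overlap coefficient but only 0.33 with Jaccard, so Jaccard at the same cutoff would drop this edge.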
Pathway enrichment results as a table (GO terms ranked by p-value):
GO.id | GO.name | p.value | cover | cover.rat | Deg.mdn | Deg.iqr
GO:0042330 | taxis | 2.18E-06 | 23 | 0.056930693 | 54.94499375 | 9.139238998
GO:0006935 | chemotaxis | 2.18E-06 | 23 | 0.060209424 | 54.94499375 | 9.139238998
GO:0002460 | adaptive immune response based on somatic recombination | 7.10E-05 | 25 | 0.111111111 | 57.32306955 | 16.97054864
GO:0002250 | adaptive immune response | 7.10E-05 | 25 | 0.111111111 | 57.32306955 | 16.97054864
GO:0002443 | leukocyte mediated immunity | 0.000419328 | 23 | 0.097046414 | 58.27890582 | 15.58333739
GO:0019724 | B cell mediated immunity | 0.000683758 | 20 | 0.114285714 | 57.84161096 | 15.03496347
GO:0030099 | myeloid cell differentiation | 0.000691589 | 24 | 0.089219331 | 62.22171598 | 10.35284833
GO:0002252 | immune effector process | 0.000775626 | 31 | 0.090116279 | 58.27890582 | 23.86214773
GO:0050764 | regulation of phagocytosis | 0.000792138 | 8 | 0.2 | 53.54786293 | 5.742849971
GO:0050766 | positive regulation of phagocytosis | 0.000792138 | 8 | 0.216216216 | 53.54786293 | 5.742849971
GO:0002449 | lymphocyte mediated immunity | 0.00087216 | 22 | 0.101851852 | 57.84161096 | 16.13171132
GO:0019838 | growth factor binding | 0.000913285 | 15 | 0.068181818 | 83.0405088 | 10.58734852
GO:0051258 | protein polymerization | 0.00108876 | 17 | 0.080952381 | 57.97543252 | 17.31639968
GO:0005789 | endoplasmic reticulum membrane | 0.001178198 | 18 | 0.036072144 | 64.02284752 | 12.05209158
GO:0016064 | immunoglobulin mediated immune response | 0.001444464 | 19 | 0.113095238 | 58.27890582 | 15.58333739
GO:0007507 | heart development | 0.001991562 | 26 | 0.052313883 | 84.02538284 | 18.60761304
GO:0009617 | response to bacterium | 0.002552999 | 10 | 0.027173913 | 52.75249873 | 23.23104637
GO:0030100 | regulation of endocytosis | 0.002658555 | 11 | 0.099099099 | 56.38041132 | 16.02486889
GO:0002526 | acute inflammatory response | 0.002660742 | 24 | 0.103004292 | 57.80098769 | 24.94311116
GO:0045807 | positive regulation of endocytosis | 0.002903401 | 9 | 0.147540984 | 54.94499375 | 6.769909171
GO:0002274 | myeloid leukocyte activation | 0.002969661 | 7 | 0.077777778 | 54.94499375 | 16.07042339
GO:0008652 | amino acid biosynthetic process | 0.003502921 | 7 | 0.017241379 | 45.19797271 | 31.18248579
GO:0050727 | regulation of inflammatory response | 0.004999055 | 7 | 0.084337349 | 54.94499375 | 7.737346076
GO:0002253 | activation of immune response | 0.00500146 | 23 | 0.116161616 | 60.29679989 | 18.41103376
GO:0002684 | positive regulation of immune system process | 0.006581245 | 27 | 0.111570248 | 60.29679989 | 22.05051447
GO:0050778 | positive regulation of immune response | 0.006581245 | 27 | 0.113924051 | 60.29679989 | 22.05051447
GO:0019882 | antigen processing and presentation | 0.007244488 | 7 | 0.029661017 | 54.94499375 | 16.58797889
GO:0002682 | regulation of immune system process | 0.007252134 | 29 | 0.099656357 | 61.05645008 | 22.65935206
GO:0050776 | regulation of immune response | 0.007252134 | 29 | 0.102112676 | 61.05645008 | 22.65935206
GO:0043086 | negative regulation of enzyme activity | 0.008017022 | 9 | 0.040723982 | 53.28031076 | 17.48904224
GO:0006909 | phagocytosis | 0.008106069 | 10 | 0.080645161 | 55.66270253 | 12.47536747
GO:0002573 | myeloid leukocyte differentiation | 0.008174948 | 10 | 0.092592593 | 62.86577216 | 9.401887596
GO:0006959 | humoral immune response | 0.008396095 | 16 | 0.044568245 | 55.05654091 | 18.94209565
GO:0046649 | lymphocyte activation | 0.009044401 | 29 | 0.059917355 | 61.92213317 | 21.03553355
GO:0030595 | leukocyte chemotaxis | 0.009707319 | 7 | 0.101449275 | 56.33116709 | 6.945510559
GO:0006469 | negative regulation of protein kinase activity | 0.010782155 | 7 | 0.046357616 | 52.22863516 | 12.58524145
GO:0051348 | negative regulation of transferase activity | 0.010782155 | 7 | 0.04516129 | 52.22863516 | 12.58524145
GO:0007179 | transforming growth factor beta receptor signaling pathway | 0.012630825 | 13 | 0.071038251 | 83.49440788 | 12.63256309
GO:0005520 | insulin-like growth factor binding | 0.012950071 | 9 | 0.097826087 | 81.41963394 | 7.528247832
GO:0042110 | T cell activation | 0.013410548 | 20 | 0.064516129 | 59.77891783 | 26.06174863
GO:0002455 | humoral immune response mediated by circulating immunoglobulin | 0.016780163 | 10 | 0.125 | 54.70766244 | 14.2572143
GO:0005830 | cytosolic ribosome (sensu Eukaryota) | 0.016907351 | 8 | 0.01843318 | 61.68933284 | 7.814673781
[Figure: Zoom of CNS-Development cluster. Pathway labels: cell projection organization, neuron migration, cell morphogenesis, cerebral cortex cell migration, cell motility (stricter cluster), neurite development, CNS neuron differentiation, brain development, axonogenesis, CNS development, projection neuron axonogenesis]
Creating an Enrichment Map with data from g:Profiler
1. Go to the g:Profiler website: http://biit.cs.ut.ee/gprofiler/
2. Copy all genes from the tutorial file MCF7_24hr_topgenes.txt and paste them into the Query box.
3. In Options, check "Significant only" and "No electronic GO annotations".
4. Set the Output type to "Generic Enrichment Map (TAB)".
5. Click "Show advanced options".
6. Set "Max size of functional category" to 1000 and "Min size" to 5.
7. Set "Min size of gene list and functional category overlap (Q&T)" to 2.
8. Set "Significance threshold" to "Benjamini-Hochberg FDR".
9. Choose GO biological process, molecular function, KEGG and Reactome from the color legend.
10. For "Download g:Profiler data as gmt", select "name".
11. Click "g:Profile!" to run the analysis.
12. Download the result file: "Download data in Generic Enrichment Map (GEM) format".
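Step 8 asks g:Profiler to control the false discovery rate with the Benjamini-Hochberg procedure (g:Profiler's default is its own g:SCS method). This is a minimal sketch of the standard BH adjustment, for intuition about the FDR values in the output; the p-values below are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (FDR), in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


p = [0.01, 0.04, 0.03, 0.005]
print([round(q, 3) for q in benjamini_hochberg(p)])  # [0.02, 0.04, 0.04, 0.02]
```

Each p-value is scaled by the number of tests divided by its rank, then capped so that a term never gets a smaller FDR than a term with a smaller p-value.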
g:Profiler files for Enrichment Map
• Two files are required:
A. Tabular text file with enriched processes & pathways
B. GMT file with all processes & pathways and their associated genes
TAB file – Enrichment Map
• Process ID
• Process name
• P-value, FDR
• Phenotype: positive 1; negative -1
• Genes common to the input list and the process/pathway
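To inspect the enrichment file outside Cytoscape, it can be read row by row in the column order listed above. This is a minimal sketch, assuming a header line followed by tab-separated rows with comma-separated genes; the header names and example rows are hypothetical.

```python
import csv
import io


def read_gem(handle):
    """Parse a Generic Enrichment Map (GEM) table into a list of dicts.

    Assumed layout: a header line, then tab-separated rows in the order
    ID, name, p-value, FDR, phenotype, genes (genes comma-separated).
    """
    rows = csv.reader(handle, delimiter="\t")
    next(rows, None)  # skip the header line (assumed to be present)
    records = []
    for fields in rows:
        if len(fields) < 6:
            continue  # skip blank or incomplete lines
        term_id, name, p_value, fdr, phenotype, genes = fields[:6]
        records.append({
            "id": term_id,
            "name": name,
            "p_value": float(p_value),
            "fdr": float(fdr),
            "phenotype": int(phenotype),  # +1 positive, -1 negative
            "genes": [g.strip() for g in genes.split(",") if g.strip()],
        })
    return records


example = io.StringIO(
    "GO.ID\tDescription\tp.Val\tFDR\tPhenotype\tGenes\n"
    "GO:0006915\tapoptotic process\t0.0001\t0.002\t1\tBAX,CASP3,TP53\n"
    "GO:0006260\tDNA replication\t0.01\t0.08\t1\tMCM2,PCNA\n"
)
records = read_gem(example)
print([r["id"] for r in records if r["fdr"] <= 0.05])  # ['GO:0006915']
```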
GMT file – Enrichment Map
http://biit.cs.ut.ee/gprofiler/gmt/gprofiler_hsapiens.NAME.gmt.zip
• Process ID
• Process name
• Gene list
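A GMT file stores one gene set per line: an ID, a name, then the member genes, all tab-separated. This is a minimal sketch of a reader for that layout; the two example gene sets are hypothetical and much smaller than real ones.

```python
import io


def read_gmt(handle):
    """Parse a GMT file into {set_id: (name, set_of_genes)}.

    Each tab-separated line holds a gene-set ID, its name or description,
    then the member genes, one per column.
    """
    gene_sets = {}
    for line in handle:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        set_id, name = fields[0], fields[1]
        gene_sets[set_id] = (name, {g for g in fields[2:] if g})
    return gene_sets


example = io.StringIO(
    "GO:0006915\tapoptotic process\tBAX\tCASP3\tTP53\n"
    "GO:0006260\tDNA replication\tMCM2\tPCNA\n"
)
sets = read_gmt(example)
print(sorted(sets["GO:0006915"][1]))  # ['BAX', 'CASP3', 'TP53']
```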
Getting the Enrichment Map app
A. Cytoscape App Store
http://apps.cytoscape.org/apps/enrichmentmap
B. In Cytoscape software
Apps > App manager > Search > Install
Constructing an Enrichment Map
A. Set the input file format to "Generic" (the format of g:Profiler results)
B. Select the GMT file from the file system (this file contains the gene lists of pathways)
C. Select the enrichments file from the file system
D. Set equal P-value and Q-value cutoffs (g:Profiler only provides corrected P-values, which serve as Q-values)
Click Build!
Enrichment Map – a network of pathways
Nodes – pathways, processes, functions
Edges – between pathways with many shared genes
Node size – number of genes in pathway
Node color – enrichment strength
Edge weight – number of genes shared
What are the major functional themes?
[Figure: themes labelled apoptosis, proliferation, protein degradation, mitochondrial processes, tissue development]
Edge definition determines granularity
• Edges between pathways are defined by the % of common genes
• Increasing edge stringency reveals finer granularity of functional themes
[Figure labels: mitochondrial processes, protein degradation, apoptosis]
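The effect of edge stringency on granularity can be sketched by grouping pathways into connected components ("themes") at different similarity cutoffs, here with a union-find structure. This is a minimal illustration; the pathway names and similarity scores are hypothetical.

```python
def themes(pathways, scored_edges, cutoff):
    """Group pathways into connected components, using only edges whose
    similarity is >= cutoff. Returns sorted groups, largest first."""
    parent = {p: p for p in pathways}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, score in scored_edges:
        if score >= cutoff:
            parent[find(a)] = find(b)  # merge the two components

    groups = {}
    for p in pathways:
        groups.setdefault(find(p), []).append(p)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: (-len(g), g))


pathways = ["apoptosis", "caspase activation",
            "mitochondrion organization", "respiratory chain"]
scored = [("apoptosis", "caspase activation", 0.8),
          ("caspase activation", "mitochondrion organization", 0.4),
          ("mitochondrion organization", "respiratory chain", 0.9)]
for cutoff in (0.3, 0.5):
    print(cutoff, themes(pathways, scored, cutoff))
```

At the looser cutoff all four pathways form one theme; raising the cutoff to 0.5 removes the weakest edge and splits them into an apoptosis theme and a mitochondrial theme.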
Mapping attributes to network
Node labels are mapped to pathway IDs by default; they can be switched to pathway names.
[Figure: the same map labelled with pathway IDs and with pathway names]
Compare two pathway enrichment analyses
g:Profiler analyses of MCF7_12hr_topgenes.txt and MCF7_24hr_topgenes.txt
• 1st dataset maps to node fill colour
• 2nd dataset maps to node border colour
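Comparing two analyses boils down to asking which pathways pass the cutoff in each dataset. This is a minimal sketch, assuming each analysis is given as a mapping from pathway ID to FDR; the IDs, values and 0.05 cutoff are hypothetical.

```python
def compare_enrichments(first, second, cutoff=0.05):
    """Split pathways by whether they pass the FDR cutoff in each analysis.

    `first` and `second` map pathway ID -> FDR value.
    """
    sig1 = {p for p, q in first.items() if q <= cutoff}
    sig2 = {p for p, q in second.items() if q <= cutoff}
    return {
        "both": sig1 & sig2,
        "first only": sig1 - sig2,
        "second only": sig2 - sig1,
    }


first_analysis = {"P1": 0.01, "P2": 0.2, "P3": 0.03}   # e.g. 12 h results
second_analysis = {"P1": 0.001, "P2": 0.04}            # e.g. 24 h results
for group, ids in compare_enrichments(first_analysis, second_analysis).items():
    print(group, sorted(ids))
```

In a two-dataset map, pathways in "both" would roughly correspond to nodes with a coloured fill and a coloured border, while the other groups colour only one of the two.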
Mapping further gene sets to Enrichment Map
E.g. which processes are associated with known cancer genes (CANCER_GENES)?
A. Select the GMT file from the file system (this file contains the gene lists used in the original Enrichment Map)
B. Select another GMT file from the file system (this file contains the gene lists mapped on top of the original Enrichment Map)
Post-Analysis: new edges link further gene lists to existing pathways
Purple edges – pathway genes in this analysis are significantly enriched in known cancer genes
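Deciding whether a pathway is "significantly enriched in known cancer genes" needs an overlap test; a one-sided hypergeometric test is one common choice. This is a minimal sketch; the universe size, set sizes and overlap in the example are hypothetical numbers.

```python
from math import comb


def hypergeometric_overlap_p(universe_size, pathway_size, signature_size, overlap):
    """One-sided P(X >= overlap) for X ~ Hypergeometric.

    X is the number of genes shared by a pathway of `pathway_size` genes and a
    signature set (e.g. known cancer genes) of `signature_size` genes, both
    drawn from `universe_size` genes.
    """
    total = comb(universe_size, signature_size)
    upper = min(pathway_size, signature_size)
    tail = sum(
        comb(pathway_size, k) * comb(universe_size - pathway_size, signature_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 20,000 genes, a 50-gene pathway, 500 cancer genes,
# and 10 genes shared (about 1.25 would be expected by chance).
p = hypergeometric_overlap_p(20000, 50, 500, 10)
print(f"{p:.2e}")
```

A pathway whose p-value passes the chosen cutoff would receive a new edge to the signature gene set, as with the purple edges above.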
What have we learned?
• Enrichment Map uses network visualization to summarize results of pathway enrichment analysis
• Edges connect pathways with shared genes, and edge stringency determines the granularity of maps
• Two sets of pathway enrichment results can be compared
• Post-analysis links further gene sets to an existing map
Tips for publishable Enrichment Maps
• Manually curate clusters of connected pathways as "functional themes"
– e.g. by picking the most general process/pathway in the group
– Check the genes involved
• Review small groups and singleton nodes
– Many probably belong to larger groups and are safe to remove
– Some interesting singletons may be the only representatives!
• Assign further visual attributes using your omics data
• Export as PDF and finalize in graphics software (e.g. Adobe Illustrator)
• Sometimes less is more
Cytoscape Tips & Tricks
• Network views
– When you open a large network, no view is shown by default
– To improve interactive performance, Cytoscape uses "Levels of Detail"
• Some visual attributes are only apparent when you zoom in
• The levels for various attributes can be changed in the preferences
• To see what things will look like at full detail: View > Show Graphics Details
• Sessions
– Sessions save pretty much everything: networks, properties, visual styles, screen sizes
– A session saved on a large screen may need some resizing when opened on your laptop
• Memory
– Cytoscape uses lots of it
– It doesn't like to let go of it
– An occasional restart helps when working with large networks
– Destroy views when you don't need them
– Java has no good way to get the memory setting right at startup
• Since version 2.7, Cytoscape does a much better job of "guessing" good default memory sizes than previous versions
• CytoscapeConfiguration directory
– Located in your user/home directory
– Your defaults and any apps downloaded from the App Store go here
– If things get really messed up, deleting (or renaming) this directory can give you a "clean slate"
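Renaming is gentler than deleting because the old settings can be restored. This is a minimal sketch, assuming the directory sits directly in the home folder as described above and that Cytoscape is closed while it runs; the function name and the backup naming scheme are hypothetical.

```python
from datetime import datetime
from pathlib import Path


def reset_cytoscape_config(home=None):
    """Rename <home>/CytoscapeConfiguration to a timestamped backup.

    Returns the backup path, or None if the directory does not exist.
    Run only while Cytoscape is closed.
    """
    home = Path(home) if home is not None else Path.home()
    config = home / "CytoscapeConfiguration"
    if not config.is_dir():
        return None
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    backup = home / f"CytoscapeConfiguration.backup-{stamp}"
    config.rename(backup)
    return backup
```

To undo the reset, rename the backup directory back to CytoscapeConfiguration (after removing the freshly created one).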
We are on a Coffee Break & Networking Session