Integrative Bioinformatics using Cytoscape (and R2)
Human Genetics
(Bio)Chemistry versus Molecular Biology
…some basic concepts
(Bio)Chemistry
• Concentrations
• Molecular structures
• Reaction equations
• Quantitative
• Defined experimental setup
Molecular Biology
• Regulation
• Large biomolecules
• Large scale processes
• Qualitative
• Complex experimental setup (by necessity!)
Molecular Biology: New techniques
Integrative Bioinformatics needed
(Deep)Sequencing – Arrays – Proteomics
• Quantitative analysis
– handling large datasets
– statistics
• Capturing complexity
– integration
– graphs
• Integrative Bioinformatics: Integrated Bioinformaticians!
Integrative Bioinformatics: An example
Integrative Bioinformatics: What they did
1. Sequence the genome; assign gene function using protein sequence and structural similarities (Bonneau et al., 2004; Ng et al., 2000).
2. Perturb cells: environmental factors; knockouts (Baliga et al., 2004; Kaur et al., 2006; Kottemann et al., 2005).
3. Measure changes: microarrays (Baliga et al., 2004; Kaur et al., 2006; Whitehead et al., 2006).
4. Integrate diverse data (mRNA levels, evolutionarily conserved associations among proteins, metabolic pathways, cis-regulatory motifs, etc.) with the cMonkey algorithm to reduce data complexity and identify subsets of genes that are coregulated in certain environments (biclusters) (Reiss et al., 2006).
5. Using the machine learning algorithm Inferelator, construct a dynamic network model for the influence of changes in environmental factors (EFs) and transcription factors (TFs) on the expression of coregulated genes (Bonneau et al., 2006).
6. Explore the network with Gaggle, a framework for data integration and software interoperability, to formulate and then experimentally test hypotheses that drive additional iterations of steps 2–6 (Shannon et al., 2006).
Integrative Bioinformatics: Their framework
Integrative Bioinformatics: Results
Goes to show that:
1. Aggregate
• Combine data from different sources
2. Search/Visualize
• Filter
3. Analyze/Feedback
• Algorithms
Need for adaptable software
Goal: Facilitate ideas
Cytoscape - Network Visualization and Analysis
• Freely available (open source, Java) software, easily extensible (Plugin API)
• Visualizing networks (e.g. molecular interaction networks)
• Analyzing networks with gene expression profiles and other cell state data (GO, proteomics, …)
• Used in several hundred analyses in recent literature
• Continuity guaranteed
An example Cytoscape work-flow
Cytoscape Workflow
1. Load Networks (Import network data into Cytoscape)
2. Load Attributes (Get data about networks into Cytoscape)
3. Analyze and Visualize Networks
4. Prepare for Publication
• A specific example of this workflow:
– Cline, et al. “Integration of biological networks and gene expression data using Cytoscape”, Nature Protocols, 2, 2366–2382 (2007).
Networks as graphs
• A Network is a collection of
– Nodes (or vertices)
– Edges connecting nodes (directed or undirected, weighted, multiple edges, self-edges)
• Nodes can represent proteins, genes, metabolites, or groups of these (e.g. complexes) - any sort of object
• Edges can be either physical or functional interactions, activators, regulators, reactions - any sort of relation
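The node/edge model above can be sketched in a few lines of plain Python. This is only an illustration: the class name `Network`, its methods and the example identifiers are assumptions made for the sketch, not part of Cytoscape.

```python
from collections import defaultdict


class Network:
    """Tiny multigraph: nodes plus edges that may be directed, weighted,
    parallel (multiple edges) or self-edges."""

    def __init__(self):
        self.nodes = set()
        self.edges = []                    # (source, target, interaction, weight, directed)
        self.adjacency = defaultdict(set)  # node -> neighbouring nodes

    def add_edge(self, source, target, interaction, weight=1.0, directed=False):
        self.nodes.update((source, target))
        self.edges.append((source, target, interaction, weight, directed))
        self.adjacency[source].add(target)
        if not directed:
            # An undirected edge is reachable from both ends.
            self.adjacency[target].add(source)

    def neighbors(self, node):
        return sorted(self.adjacency[node])


# Hypothetical example identifiers.
net = Network()
net.add_edge("ProteinA", "ProteinB", "physical")                  # undirected
net.add_edge("TF1", "GeneX", "activates", directed=True)          # directed
net.add_edge("ProteinA", "ProteinB", "coexpression", weight=0.8)  # multiple edge
net.add_edge("ProteinA", "ProteinA", "physical")                  # self-edge
```

Keeping the edge list separate from the adjacency sets lets parallel edges of different interaction types coexist while neighbour lookups stay cheap.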
Cytoscape Workflow, step 1: Load Networks (Get network data into Cytoscape)
Creating a network
Free-format Text and Excel Files
• Specify Input File
• Define Columns
• Text Parsing Options
• Preview
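What such a table import does can be approximated outside Cytoscape as below. This is a minimal sketch with Python's `csv` module; the function name `read_edge_table`, the column names and the default interaction type are assumptions for illustration only.

```python
import csv
import io


def read_edge_table(handle, source_col, target_col, interaction_col=None,
                    delimiter="\t", default_interaction="pp"):
    """Turn a delimited table with a header row into
    (source, interaction, target) edges."""
    edges = []
    for row in csv.DictReader(handle, delimiter=delimiter):
        interaction = row[interaction_col] if interaction_col else default_interaction
        edges.append((row[source_col], interaction, row[target_col]))
    return edges


# Hypothetical input; in practice this would be an open text file.
table = io.StringIO(
    "geneA,geneB,type,score\n"
    "YDR382W,YDL130W,pp,0.9\n"
    "YFL039C,YCL040W,pp,0.7\n"
)
edges = read_edge_table(table, "geneA", "geneB", "type", delimiter=",")
print(edges[:1])  # preview the first parsed edge
```

Choosing the source, target and interaction columns by name mirrors the "Define Columns" step; the delimiter argument plays the role of the text parsing options.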
Pathways: plenty of resources
http://pathguide.org: over 240 pathway databases
All kinds of network data…
• Physical interactions
– Protein – Protein interactions
– Protein – DNA interactions
– Metabolic interactions
• Functional interactions
– Co-expression relations
– Genetic interactions
– Knockout/siRNA – targets
Pre-formatted Network Files
• Cytoscape supports many popular file formats:
– SIF (Simple Interaction Format)
– GML (Graph Markup Language)
– XGMML (eXtensible Graph Markup and Modeling Language)
– BioPAX (Biological Pathway Data)
– PSI-MI 1 & 2.5 (Proteomics Standards Initiative)
– SBML Level 2 (Systems Biology Markup Language)
• Available for download from data sources (URLs, web services, formatted table files)
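SIF is the simplest of these formats: each line holds a source node, an interaction type and one or more target nodes. A minimal reader might look like the sketch below. It assumes whitespace-delimited names without embedded spaces, and the pairing of the example yeast identifiers is illustrative.

```python
def parse_sif(text):
    """Parse SIF lines of the form: source <interaction> target [target ...]."""
    nodes, edges = set(), []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue                  # skip blank lines
        source = fields[0]
        nodes.add(source)             # a line with only a name is an isolated node
        if len(fields) >= 3:
            interaction = fields[1]
            for target in fields[2:]:
                nodes.add(target)
                edges.append((source, interaction, target))
    return nodes, edges


sif = """YDR382W pp YDL130W YFL039C
YFL039C pp YCL040W YHR179W
"""
nodes, edges = parse_sif(sif)
```

Here `pp` denotes a protein-protein interaction; one line with several targets expands into one edge per target.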
Internet Databases
• Cytoscape version 2.6 web service clients: import networks directly from several trusted internet resources
– IntAct (EMBL-EBI)
– PathwayCommons (collection of data resources)
– NCBI Entrez Gene
– Many more will be included...
Interaction Database Search
Import
Visualize and Analyze
Cytoscape Workflow, step 2: Load Attributes (Get data about networks into Cytoscape)
What are Attributes?
• Any data that describes or provides details about the nodes and edges in the network
– Gene Expression Data
– Mass Spectrometry Data
– Protein Structure Information
– Gene Ontology (GO) terms
– Interaction Confidence Values, etc.
• Cytoscape supports multiple data types
– Numbers (integers, floats)
– Text (strings)
– Logical (booleans)
– Lists…
Attribute Management
• Select Attributes for Display
• Node or Edge ID
• String and floating-point attribute types
• Specific Attribute Tabs
Load Attributes: Import Attribute Files
• Map data about Networks onto Networks.
• Attributes can be loaded in many of the same ways as networks.
– Import pre-formatted attribute files
– Import formatted text or Excel files
– Create attributes manually in attribute editor
– Load attributes from web services
– ID mapping through node attributes
ID Mapping
• Mapping identifiers from one source to another is a major challenge
• Multiple levels of IDs, e.g. probe -> gene -> peptide -> protein
• Cytoscape provides ID mapping through the BioMart web service of EBI to convert the IDs
• Not perfect but sufficient
• Additional mapping mechanisms underway
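Why multi-level mapping is awkward shows up even in a toy version: one probe can map to several genes, one gene to several proteins, and some IDs do not map at all. The sketch below chains plain lookup tables. It does not call BioMart, and all identifiers and the function name `map_ids` are hypothetical.

```python
def map_ids(identifier, *tables):
    """Follow an identifier through successive mapping tables.
    Each table maps one ID to a list of IDs, so the result can fan out
    or come back empty."""
    current = {identifier}
    for table in tables:
        current = {mapped for key in current for mapped in table.get(key, [])}
    return sorted(current)


# Hypothetical lookup tables, e.g. exported from a mapping service.
probe_to_gene = {"probe_1": ["GENE_A"], "probe_2": ["GENE_A", "GENE_B"]}
gene_to_protein = {"GENE_A": ["PROT_A1", "PROT_A2"], "GENE_B": ["PROT_B"]}

print(map_ids("probe_2", probe_to_gene, gene_to_protein))
print(map_ids("probe_9", probe_to_gene, gene_to_protein))  # unmapped -> []
```

Returning a list rather than a single ID makes the one-to-many and no-match cases explicit, which is where "not perfect but sufficient" tends to bite.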
Cytoscape Workflow, step 3: Analyze and Visualize Networks
Visual Data Integration
1. Network Data
YDR382W pp YDL130W
YDR382W pp YFL039C
YFL039C pp YCL040W
YFL039C pp YHR179W
2. Attribute Data (ExpressionValue)
YCL040W = 0.542
YDL130W = -0.123
YDR382W = -0.058
YFL039C = 0.192
YHR179W = 0.078
VizMapper (combines 1 and 2)
VizMapper
• List of Visual Styles
• Default Visual Style Editor
• List of Visual Attributes
• List of Data Attributes
• Mapping definition
Types of mappings
• Continuous
– Continuous Data mapped to Continuous Visual Attributes (e.g. gene expression levels mapped to node color)
– Continuous Data mapped to Discrete Visual Attributes (e.g. p-value categories mapped to node shape)
• Discrete
– Discrete (categorical) Data to Discrete Visual Attributes (e.g. GO annotation mapped to node shape)
– Discrete Data mapped to Continuous Visual Attributes (e.g. multiple GO terms mapped to pie coloring)
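The two most common cases, continuous data to a colour gradient and categorical data to a shape, can be sketched as follows. This only illustrates the idea and is not how VizMapper is implemented. The blue-white-red gradient, the value range and the category names are assumptions; the expression values are those from the Visual Data Integration slide.

```python
def continuous_color(value, low=-1.0, high=1.0):
    """Linearly interpolate blue (low) -> white (midpoint) -> red (high)."""
    mid = (low + high) / 2.0
    value = max(low, min(high, value))      # clamp to the mapped range
    if value <= mid:
        t = (value - low) / (mid - low)     # 0 at low, 1 at midpoint
        rgb = (t, t, 1.0)
    else:
        t = (value - mid) / (high - mid)    # 0 at midpoint, 1 at high
        rgb = (1.0, 1.0 - t, 1.0 - t)
    return "#" + "".join(f"{round(c * 255):02X}" for c in rgb)


def discrete_shape(category, table, default="ellipse"):
    """Look up a visual value for a categorical attribute."""
    return table.get(category, default)


expression = {"YCL040W": 0.542, "YDL130W": -0.123, "YDR382W": -0.058,
              "YFL039C": 0.192, "YHR179W": 0.078}
node_colors = {node: continuous_color(v) for node, v in expression.items()}

# Hypothetical categories and shapes.
shapes = {"transcription factor": "diamond", "kinase": "triangle"}
```

Clamping keeps outliers from producing invalid colours, and the default shape covers nodes that lack an annotation.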
Network Filtering
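In its simplest form, filtering selects nodes on an attribute and keeps only the edges between the selected nodes. The sketch below assumes that reading; the function name, the cutoff and the pairing of the example edges are illustrative, with identifiers and values reused from the Visual Data Integration slide.

```python
def filter_network(edges, attributes, predicate):
    """Keep nodes whose attribute value satisfies the predicate, plus the
    edges whose two endpoints are both kept (the induced sub-network)."""
    kept = {node for node, value in attributes.items() if predicate(value)}
    kept_edges = [(s, i, t) for s, i, t in edges if s in kept and t in kept]
    return kept, kept_edges


edges = [("YDR382W", "pp", "YDL130W"), ("YDR382W", "pp", "YFL039C"),
         ("YFL039C", "pp", "YCL040W"), ("YFL039C", "pp", "YHR179W")]
expression = {"YCL040W": 0.542, "YDL130W": -0.123, "YDR382W": -0.058,
              "YFL039C": 0.192, "YHR179W": 0.078}

# Hypothetical cutoff: keep nodes with a positive expression value.
nodes, sub_edges = filter_network(edges, expression, lambda v: v > 0)
```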
Several Layout Algorithms
• Spring-embedded
• Circular
• Hierarchical
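Of the three, the circular layout is the easiest to write down: nodes are spread at equal angles on a circle. The sketch below assumes that plain definition (alphabetical ordering, fixed radius) and is not Cytoscape's implementation.

```python
import math


def circular_layout(nodes, radius=100.0):
    """Place nodes at equal angular steps on a circle around the origin."""
    ordered = sorted(nodes)
    step = 2 * math.pi / len(ordered) if ordered else 0.0
    return {node: (radius * math.cos(i * step), radius * math.sin(i * step))
            for i, node in enumerate(ordered)}


positions = circular_layout(["YCL040W", "YDL130W", "YDR382W", "YFL039C"])
```

Spring-embedded and hierarchical layouts additionally use the edges; a circular layout depends only on the node order.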
Linkout
• Nodes and Edges act as hyperlinks to external databases.
• User-configurable URLs
• Collection of the biological results for the publication
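A user-configurable link boils down to a URL template with a placeholder for the node or edge identifier. The sketch below assumes a `%ID%` placeholder and uses a made-up example.org address; neither is taken from the slides.

```python
from urllib.parse import quote


def linkout_url(template, identifier):
    """Fill the placeholder in a user-configured URL template with a node ID."""
    return template.replace("%ID%", quote(identifier, safe=""))


# Hypothetical template; any database URL with a placeholder would do.
template = "https://example.org/gene?query=%ID%"
print(linkout_url(template, "YFL039C"))
```

Percent-encoding the identifier keeps IDs with spaces or slashes from breaking the link.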
Cytoscape Workflow, step 4: Prepare for Publication
Prepare for Publication
• Fine tune the Figures
• Manual Layout manipulation options (align, scale, rotate)
• Manually override visual styles
– place labels, change colors, etc.
Finalizing the Figures
• Publication Quality Graphics in several formats
– PDF, EPS, SVG, PNG, JPEG, and BMP
• Export Session to HTML for Web
Cytoscape: So what?
The big pro-Cytoscape argument: EXTENSIBLE
• Plugins, Plugins, Plugins
– In our case enabled extended array data analysis
Cytoscape is Extensible
• Cytoscape is open source and free software
• A plugin interface allows any programmer to write their own extensions to Cytoscape
• Plugins represent the primary biological analysis mechanism in Cytoscape
• Plugins are distributed from a central Cytoscape database and can be installed while running
• Several dozen plugins currently available (www.cytoscape.org/plugins/index.php)
Hello World Plugin
http://cytoscape.org/cgi-bin/moin.cgi/Hello_World_Plugin
http://cytoscape.org/cgi-bin/moin.cgi/Developer_Homepage
Extending the workflow through plugins
Graph based integration and analysis of molecular biological data
Integrative Bioinformatics in our group
• Aggregate data: 18000+ Affymetrix arrays
– Tumor series
– Public data
– Experiments (manipulate cell lines; lentiviral library)
• Search/Visualize/Selection: R2
– Statistical cutoffs
– Correlations: R2
– Clinical data coupling
• Analysis/Feedback: R2 and Cytoscape
– Known Interactions
– Transcription Factor binding
Integrative Bioinformatics in our group
[Architecture diagram. Components: Patient data; GEO arrays; HGServer; R2 array analysis interface; Statistical analysis Perl module; DB; Cytoscape webstart; AMC Plugin; Cytoscape interface; External data sources; Array data: Tumor and Experiments; Canonical paths; Algorithms]
Array data analysis: R2
Mainly work by Jan Koster
R2 interface: Demo
Timeseries in R2 / Cytoscape (Demo)
Timeseries in R2
Integration with Cytoscape through webstart
Timeseries in Cytoscape: Visualization
Timeseries in Cytoscape: Aggregate data
Timeseries in Cytoscape: Search/Filter
Timeseries in Cytoscape: TF (green) and partners (red)
Timeseries in Cytoscape: Filtering
Timeseries in Cytoscape: Coloring, layout
Summarizing:
1. Aggregate
• Combine NOTCH3 knockout data with TF and PPI data
2. Search/Visualize
• Layout timeseries/Find downstream targets
3. Analyze/Feedback
• Identify MSX1/Knockout in new experiment
More Plugin Examples
• BiNGO (Enriched GO categories found in the sub-network)
• WikiPathways (Visualize curated pathways)
• MCODE (Putative protein complexes)
• GenePro (Protein-Protein interaction cluster visualization)
• jActiveModules (Search for significant sub-networks)
• NetworkAnalyzer (Statistical analysis of networks)
• Agilent Literature Search (Network creation)
• CyGoose (Gaggle communication)
• See http://cytoscape.org/plugins for many more
Timeseries and BiNGO: Aggregate
Timeseries and BiNGO: Analyze
GOlorize plug-in (Pasteur)
• Node placement on the basis of both the connection structure (the edges) and the class structure (GO)
• A modification of the classic force-directed layout algorithm
• Beyond GO classes, other class information can be used through attributes (e.g. active modules, complexes)
GOlorize plug-in interface
• Default settings for the class attractive force and separation factor
• Class-directed network layout
Example: genetic interaction network (Garcia et al. Bioinformatics 2007)
• Standard Spring-embedded layout algorithm in Cytoscape
• Spring-embedded layout algorithm with GO colour-coding
• Final results of the GOlorize layout algorithm in Cytoscape
Find Network Clusters - MCODE Plugin
• Network clusters are highly interconnected sub-networks that may also be partly overlapping
• Clusters in a protein-protein interaction network have been shown to represent protein complexes and parts of biological pathways
• Clusters in a protein similarity network represent protein families
• Network clustering is available through the MCODE Cytoscape plugin
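To give a feel for what "highly interconnected" means, the sketch below extracts a k-core: the part of a network in which every node keeps at least k neighbours. This is a simpler, related notion and not the MCODE algorithm itself, which weights nodes by local neighbourhood density and grows clusters from seed nodes. The toy network is hypothetical.

```python
def k_core(adjacency, k):
    """Return the nodes of the k-core: repeatedly drop nodes with fewer
    than k remaining neighbours until none are left to drop."""
    # Copy the neighbour sets and ignore self-edges.
    remaining = {node: set(nbrs) - {node} for node, nbrs in adjacency.items()}
    changed = True
    while changed:
        changed = False
        for node in [n for n, nbrs in remaining.items() if len(nbrs) < k]:
            for nbr in remaining.pop(node):
                if nbr in remaining:
                    remaining[nbr].discard(node)
            changed = True
    return set(remaining)


# Hypothetical undirected network: a 4-clique (A-D) with a tail (E, F).
adjacency = {
    "A": {"B", "C", "D"}, "B": {"A", "C", "D"}, "C": {"A", "B", "D"},
    "D": {"A", "B", "C", "E"}, "E": {"D", "F"}, "F": {"E"},
}
print(sorted(k_core(adjacency, 3)))
```

With k = 3 the tail is peeled away and only the densely connected clique remains, which is the kind of region a clustering plugin would report as a putative complex.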
Network Clustering (Bader & Hogue, BMC Bioinformatics 2003 4(1):2)
• 7000 Yeast interactions among 3000 proteins
• Labelled clusters: Proteasome 26S, Ribosome, Proteasome 20S, RNA Splicing, RNA Pol core
Find Network Motifs - NetMatch plugin
• A network motif is a sub-network that occurs significantly more often than by chance alone
• Input: query and target networks, optional node/edge labels
• Output: topological query matches as subgraphs of target network
• Supports: subgraph matching, node/edge labels, label wildcards, approximate paths
• http://alpha.dmi.unict.it/~ctnyu/netmatch.html
Finding query sub-networks: Query and Results (Ferro et al. Bioinformatics 2007)
Finding Signaling Pathways
• Potential signaling pathways from plasma membrane to nucleus via cytoplasm
• Signaling pathway example: Ras; MAP Kinase Cascade (Raf-1, Mek, MAPK); TFs; Nucleus - Growth Control, Mitogenesis
• NetMatch query and NetMatch results
• Shortest path between subgraph matches
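The shortest path between two matched nodes can be found with a plain breadth-first search. The sketch below is generic and is not NetMatch code. The edge directions in the toy network are an assumption based on the order of the signaling example, and the "Other" node is hypothetical.

```python
from collections import deque


def shortest_path(adjacency, start, goal):
    """Breadth-first search; returns one shortest path as a list of nodes,
    or None when the goal cannot be reached."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk back through the predecessors to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nbr in adjacency.get(node, ()):
            if nbr not in previous:
                previous[nbr] = node
                queue.append(nbr)
    return None


# Directed toy network using the names from the signaling example.
adjacency = {"Ras": ["Raf-1", "Other"], "Raf-1": ["Mek"], "Mek": ["MAPK"],
             "MAPK": ["TFs"], "Other": []}
print(shortest_path(adjacency, "Ras", "TFs"))
```

Breadth-first search is sufficient here because all edges count equally; weighted interactions would call for Dijkstra's algorithm instead.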
Find Active Subnetworks
• Active modules are sub-networks that show differential expression over user-specified conditions or time-points
– Microarray gene-expression attributes
– Mass-spectrometry protein abundance
• Method
– Calculate z-score/node, ZA score/subgraph, correct for random expression data sampling
– Score over multiple experimental conditions
– Simulated annealing-based search method is used to find the high scoring networks
Ideker T, Ozier O, Schwikowski B, Siegel AF. Bioinformatics. 2002;18 Suppl 1:S233-40
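The scoring part of the method can be sketched for a single condition as below, following the description in Ideker et al. 2002: each p-value becomes a z-score, a sub-network of k nodes gets the aggregate score sum(z_i)/sqrt(k), and that score is calibrated against random node sets of the same size. The p-values, function names and sample count are assumptions; the multi-condition scoring and the simulated annealing search are left out.

```python
import random
from statistics import NormalDist, mean, stdev


def z_from_p(p):
    """Convert a p-value to a z-score: small p-values give large z."""
    return NormalDist().inv_cdf(1.0 - p)


def aggregate_z(nodes, z):
    """Aggregate z-score of a sub-network with k nodes: sum(z_i) / sqrt(k)."""
    return sum(z[n] for n in nodes) / len(nodes) ** 0.5


def corrected_score(nodes, z, samples=1000, seed=0):
    """Calibrate against random node sets of the same size (Monte Carlo)."""
    rng = random.Random(seed)
    pool = sorted(z)
    background = [aggregate_z(rng.sample(pool, len(nodes)), z)
                  for _ in range(samples)]
    return (aggregate_z(nodes, z) - mean(background)) / stdev(background)


# Hypothetical p-values for six genes under one condition.
p_values = {"g1": 0.001, "g2": 0.01, "g3": 0.5, "g4": 0.8, "g5": 0.3, "g6": 0.02}
z = {gene: z_from_p(p) for gene, p in p_values.items()}
print(corrected_score(["g1", "g2", "g6"], z))
```

Dividing by sqrt(k) keeps scores of differently sized sub-networks comparable, and the background correction removes the advantage that any randomly chosen node set would have.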
Finding active modules
jActiveModules plug-in
• Input: interaction network and p-values for gene expression values over several conditions
• Output: significant sub-networks that show differential expression over one or several conditions
Ideker T et al. Science 2001; Bioinformatics 2002
Cerebral: Cellular location and expression data
Concluding
• Cytoscape is a proven, valuable tool for integrative bioinformatics
• Easily extensible: well suited to answer new biological research questions
• Analyses can be tedious for biologists; it is up to bioinformaticians to translate these into simple workflows
• Therefore: bioinformaticians, integrate into wet-lab research groups!
Some notes…
• Plugin lifetime
– Maintenance
– Interoperability
• Visualization issues…
– Standard biologist layouts
– Fancy visuals
Cytoscape 3.0 aims to solve these issues (amongst others)
Availability
• Cytoscape:
– http://cytoscape.org
– [email protected]
– [email protected]
• R2
– Available shortly through http://humangenetics-amc.nl
– Keep yourself posted on
http://groups.google.com/group/r2-announce