Canadian Bioinformatics Workshops
www.bioinformatics.ca
Module 7 – Part III
Pathway and Network Analysis
Lincoln Stein
Why Pathway Analysis?
• Dramatic data size reduction: thousands of genes => dozens of pathways.
• Increase statistical power by reducing the number of hypotheses tested.
• Find meaning in the “long tail” of rare cancer mutations.
• Tell biological stories.
Module 7 – Pathway & Network Analysis
bioinformatics.ca
What is Pathway/Network Analysis?
• Any analytic technique that makes use of biological
pathway or molecular network information to gain
insights into a tumor or other biological system.
• A rapidly evolving field.
• Many approaches.
Pathways vs Networks
Ingredients You Will Need
1. A list of altered genes, proteins, RNAs, etc.
2. A source of pathways or networks.
Pathway Databases
• Advantages:
– Usually curated.
– Biochemical view of biological processes.
– Cause and effect captured.
– Human-interpretable visualizations.
• Disadvantages:
– Sparse coverage of genome.
– Different databases disagree on boundaries of pathways.
KEGG
Reactome
• Hand-curated pathways in human.
• Rigorous curation standards – every reaction traceable to
primary literature.
• Automatically projected pathways to non-human species.
• 1522 human pathways; 7327 human proteins.
• Features:
– Google-map style reaction diagrams with overlays;
– Find pathways containing your gene list;
– Calculate gene overrepresentation in pathways;
– Find corresponding pathways in other species.
• Open access.
Networks
• Pathways capture only the “well understood” portion
of biology.
• Networks cover less well understood relationships:
– Genetic interactions
– Physical interaction
– Coexpression
– GO term sharing
– Adjacency in pathways
Network Databases
• Can be built automatically or via curation.
• More extensive coverage of biological systems.
• Relationships and underlying evidence more
tentative.
• Popular sources of curated networks:
– BioGRID – Curated interactions from literature; 529,000
genes, 167,000 interactions.
– IntAct – Curated interactions from literature; 60,000 genes,
203,000 interactions.
– MINT – Curated interactions from literature; 31,000 genes,
83,000 interactions.
Pathway Commons
Types of Pathway/Network Analysis
1) Enrichment of Fixed Gene Sets
• Covered in last module.
• Most popular form of pathway/network analysis.
• Advantages:
– Easy to perform.
– Many good end-user tools.
– Statistical model well worked out.
• Disadvantages:
– Many possible gene sets.
– Gene sets are heavily overlapping; need to sort through lists of enriched gene sets!
– “Bags of genes” obscure regulatory relationships among them.
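A minimal sketch of the statistics behind fixed gene set enrichment, assuming the common one-sided hypergeometric (Fisher-style) test. The function name and all numbers are hypothetical; real tools also correct for testing many gene sets.

```python
from math import comb


def hypergeom_enrichment_p(universe_size, pathway_size, list_size, overlap):
    """P(X >= overlap) when list_size genes are drawn without replacement
    from a universe that contains pathway_size pathway genes."""
    max_overlap = min(pathway_size, list_size)
    total = comb(universe_size, list_size)
    # math.comb returns 0 when k > n, so impossible terms drop out.
    tail = sum(
        comb(pathway_size, k) * comb(universe_size - pathway_size, list_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical example: 20,000 genes, a 100-gene pathway,
# 200 altered genes, 8 of which fall in the pathway.
p = hypergeom_enrichment_p(20000, 100, 200, 8)
print(f"enrichment p-value = {p:.3g}")
```

With many gene sets tested, each p-value would still need a multiple-testing correction before it is interpreted.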
2) De Novo Subnetwork Construction & Clustering
• Apply list of altered {genes,proteins,RNAs} to a biological
network.
• Identify “topologically unlikely” configurations.
– E.g. a subset of the altered genes are closer to each other on
the network than you would expect by chance.
• Extract clusters of these unlikely configurations.
• Annotate the clusters.
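One way to test for a “topologically unlikely” configuration is a permutation test on network distance. This is a sketch only: the ring-shaped toy network, the gene names and the function names are all hypothetical, and a real analysis would also control for node degree.

```python
import random
from collections import deque
from itertools import combinations


def shortest_path_lengths(adjacency, source):
    """Breadth-first search: hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def mean_pairwise_distance(adjacency, genes):
    """Average hop count over all gene pairs (assumes a connected network)."""
    genes = list(genes)
    dists = {g: shortest_path_lengths(adjacency, g) for g in genes}
    pair_lengths = [dists[a][b] for a, b in combinations(genes, 2)]
    return sum(pair_lengths) / len(pair_lengths)


def proximity_p_value(adjacency, altered, n_permutations=1000, seed=0):
    """Fraction of random gene sets at least as tightly packed as the altered set."""
    rng = random.Random(seed)
    nodes = sorted(adjacency)
    observed = mean_pairwise_distance(adjacency, altered)
    hits = 0
    for _ in range(n_permutations):
        sample = rng.sample(nodes, len(altered))
        if mean_pairwise_distance(adjacency, sample) <= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_permutations + 1)


# Hypothetical toy network: 12 genes arranged in a ring.
n = 12
toy_network = {f"g{i}": [f"g{(i - 1) % n}", f"g{(i + 1) % n}"] for i in range(n)}
observed, p = proximity_p_value(toy_network, ["g0", "g1", "g2"])
print(f"mean distance = {observed:.2f}, permutation p = {p:.3f}")
```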
Reactome FI Network
• Figure: curated pathway databases and uncurated interaction evidence are combined by machine learning into the Reactome Functional Interaction Network (~11,000 proteins; 270,000 interactions).
• “Disease genes” are extracted and clustered into disease “modules” (1030).
• Source: Wu et al., “A human functional protein interaction network and its application to cancer data analysis”, Genome Biology, 2010.
Reactome FI Network
• 10,956 proteins (9,542 genes).
• 209,988 FIs.
• ~50% coverage of genome.
• False (+) rate < 1%.
• False (-) rate ~80%.
• Figure: 5% of network shown.
>200 Recurrently Mutated Genes in 52 Pancreatic Cancers
• Figure: chart with two axes, #Sets (0 to 50) and %Sets (0 to 100).
Christina Yung
Functional Interaction Clustering Reveals Modular Structure
• Module 0: ERBB, FGFR, EGFR signaling, Axon guidance
• Module 1: Hedgehog, TGFβ signaling
• Module 2: p53 signaling
• Module 3: Wnt & Cadherin signaling
• Module 4: Translation
• Module 5: Axon guidance
• Module 6: Ca2+ signaling
• Module 7: ECM, focal adhesion, integrin signaling
• Module 8: MHC class II antigen presentation
• Module 9: Rho GTPase signaling
• Module 10: Spliceosome
Pancreatic Modules After Hierarchical Clustering
• Figure: heatmap with axes Patient Samples and Modules; groups labelled Tumour type 1 to Tumour type 4.
Popular Network Clustering Algorithms
• GeneMANIA
– “Birds of a feather” principle.
– Very useful for finding genes that are related to an experimentally
defined set.
• HotNet
– Finds “hot” clusters by modelling the propagation of heat across a metallic lattice.
– Avoids ascertainment bias on unusually well-annotated genes.
• HyperModules Cytoscape App
– Find network clusters that correlate with clinical characteristics.
• Reactome FI Network Cytoscape App
– Offers multiple clustering and correlation algorithms (including
HotNet, PARADIGM and survival correlation analysis)
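To make the heat-propagation idea concrete, here is a generic random-walk-with-restart diffusion on a toy network. It is a stand-in for illustration only, not the HotNet algorithm or any of the apps listed above; the network, the `restart` parameter and the function name are hypothetical.

```python
def diffuse_heat(adjacency, initial_heat, restart=0.4, n_iterations=100):
    """Each step, every node passes (1 - restart) of its heat to its
    neighbours in equal shares and regains restart * its initial heat."""
    heat = {node: initial_heat.get(node, 0.0) for node in adjacency}
    for _ in range(n_iterations):
        new_heat = {node: restart * initial_heat.get(node, 0.0) for node in adjacency}
        for node, neighbours in adjacency.items():
            if not neighbours:
                continue  # isolated nodes have nowhere to send heat
            share = (1.0 - restart) * heat[node] / len(neighbours)
            for neighbour in neighbours:
                new_heat[neighbour] += share
        heat = new_heat
    return heat


# Hypothetical linear network A-B-C-D-E with mutations ("heat") on A and C.
toy_network = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C", "E"],
    "E": ["D"],
}
heat = diffuse_heat(toy_network, {"A": 1.0, "C": 1.0})
for gene, value in sorted(heat.items(), key=lambda item: -item[1]):
    print(f"{gene}: {value:.3f}")
```

Unmutated genes that sit between mutated ones (B here) end up warmer than distant ones (E), which is the intuition behind looking for “hot” subnetworks.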
3) Pathway-Based Modeling
• Apply list of altered {genes,proteins,RNAs} to biological
pathways.
• Preserve detailed biological relationships.
• Attempt to integrate multiple molecular alterations
together to yield lists of altered pathway activities.
• Pathway modeling shades into Systems Biology.
Types of Pathway-Based Modeling
• Partial differential equations/Boolean models, e.g. CellNetAnalyzer
– Mostly suited for biochemical systems (metabolomics)
• Network flow models, e.g. NetPhorest, NetworKIN
– Mostly suited for kinase cascades (phosphorylation info)
• Transcriptional regulatory network-based reconstruction
methods, e.g. ARACNe (expression arrays)
• Probabilistic graph models (PGMs), e.g. PARADIGM
– Most general form of pathway modeling for cancer analysis at
this time.
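As a small illustration of the Boolean style of pathway model, the sketch below simulates a two-node negative feedback loop. It assumes a deliberately simplified rule set (TP53 switches MDM2 on, MDM2 switches TP53 off) with synchronous updates; it does not use CellNetAnalyzer or any other tool named above, and the function names are hypothetical.

```python
def step(state):
    """One synchronous update of the two-node feedback loop."""
    return {
        "TP53": not state["MDM2"],  # assumed rule: MDM2 represses TP53
        "MDM2": state["TP53"],      # assumed rule: TP53 induces MDM2
    }


def trajectory(state, n_steps):
    """List of states visited, starting from the given state."""
    states = [state]
    for _ in range(n_steps):
        state = step(state)
        states.append(state)
    return states


start = {"TP53": True, "MDM2": False}
for t, s in enumerate(trajectory(start, 4)):
    print(t, s)
```

Under these rules the system never settles; it cycles through four states, the kind of qualitative behaviour a Boolean model can expose without any kinetic parameters.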
PARADIGM
• Figure: model of the MDM2 and TP53 genes. Each is drawn as a chain of nodes (gene, RNA, protein, active protein); the diagram also shows Apoptosis and edge weights w1 to w7.
• Observed data types shown: Mutation, CNV, mRNA Change, Mass spec.
• Adapted from Vaske, Benz et al. Bioinformatics 26:i237 (2010).
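The flavour of a probabilistic graph model can be shown with a toy gene -> RNA -> protein chain, solved by brute-force enumeration. This is far simpler than PARADIGM and is not its model or code: the binary states, the `link` and `sensor` probabilities and the function names are all hypothetical.

```python
from itertools import product


def bernoulli(p_high, value):
    """Probability of a binary value given the chance of it being high (1)."""
    return p_high if value == 1 else 1.0 - p_high


def posterior_protein_high(cnv_obs=None, mrna_obs=None, link=0.8, sensor=0.9):
    """P(protein high | observations) for the chain gene -> RNA -> protein.

    link:   chance a node copies the state of its parent (assumed value).
    sensor: chance an assay reports the true hidden state (assumed value).
    """
    weight = {0: 0.0, 1: 0.0}
    for gene, rna, protein in product((0, 1), repeat=3):
        p = 0.5  # flat prior on the gene-level state
        p *= bernoulli(link if gene else 1.0 - link, rna)
        p *= bernoulli(link if rna else 1.0 - link, protein)
        if cnv_obs is not None:
            p *= sensor if cnv_obs == gene else 1.0 - sensor
        if mrna_obs is not None:
            p *= sensor if mrna_obs == rna else 1.0 - sensor
        weight[protein] += p
    return weight[1] / (weight[0] + weight[1])


print("no data:          ", posterior_protein_high())
print("CNV gain only:    ", posterior_protein_high(cnv_obs=1))
print("CNV gain, mRNA up:", posterior_protein_high(cnv_obs=1, mrna_obs=1))
```

Each extra data type shifts the belief about the unobserved protein level, which is how this class of model integrates multiple molecular alterations into one pathway-level estimate.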
PARADIGM Applied to GBM Data
PARADIGM: Good & Bad News
• Bad News
– Distributed in source code form & hard to compile.
– No pre-formatted pathway models available.
– Scant documentation.
– Takes a long time to run.
• Good News
– Reactome Cytoscape app supports PARADIGM (alpha testing).
– Includes Reactome-based pathway models.
– We have improved performance; working on further
improvements.
Pathway/Network Database URLs
• KEGG
– http://www.genome.jp/kegg
• BioGRID
– http://www.thebiogrid.org
• Reactome
– http://www.reactome.org
• Pathway Commons
– http://www.pathwaycommons.org
• WikiPathways
– http://wikipathways.org
De Novo Network Construction & Clustering
• GeneMANIA
– http://www.genemania.org
• HotNet
– http://compbio.cs.brown.edu/projects/hotnet/
• HyperModules
– http://apps.cytoscape.org/apps/hypermodules
• Reactome Cytoscape FI App
– http://apps.cytoscape.org/apps/reactomefis
Pathway Modeling
• CellNetAnalyzer
– http://www.ebi.ac.uk/research/saez-rodriguez/software
• NetPhorest/NetworKIN
– http://netphorest.info, http://networkin.info
• ARACNe
– http://wiki.c2b2.columbia.edu/califanolab/index.php/Software/ARACNE
• PARADIGM
– http://paradigm.five3genomics.com/
We are on a Coffee Break &
Networking Session