fl dlbcl - Center for Cancer Systems Biology
CBIO243: Principles of Cancer
Systems Biology
Sylvia Plevritis, PhD
Course Director
Melissa Ko
Teaching Assistant
Fuad Nijim
CCSB Program Manager
March 31, 2014
Goals of CBIO243
• Introduce major principles of cancer systems
biology that integrate experimental and
computational biology.
• Gain familiarity with methods to analyze high-dimensional and highly multiplexed data in
order to synthesize biologically and clinically
relevant insights and generate hypotheses for
functional testing.
Biological
Sciences:
• Cancer Biology,
• Hematology,
• Immunology,
• Genetics,
• etc.
Computational
Sciences:
• Bioinformatics,
• Engineering,
• Computer Science,
• Physics,
• Statistics,
• etc.
Components of Cancer Systems Biology
Approach: Integrative Analysis
Cancer Research Goal:
Drug Targets
Drug Resistance
Combination Therapies
Tumor Evolution
Cancer Drivers
Metastasis
Tumor Heterogeneity
Cancer Stem Cells
EMT
Personalized Medicine
Biomarkers
Other ______
Experimental
Sciences:
Sequencing
Methylation
Gene Expression
CNV
TMA
Proteomics
Single Cell Analysis
LCM, Sorted Cells
Drug Screening
Other ______
_______
Computational
Sciences:
Statistical
Regression
Machine Learning
Bayesian Analysis
Boolean Analysis
ODE/PDE
Network Reconstruction
Pathway Analysis
Other _____
________
Functional Validation
Topics Covered
• Basic principles of molecular biology of cancer
• Experimental high-throughput technologies
• Design of perturbation studies, including drug screening
• Overview of publicly available datasets, including GEO, TCGA, CCLE, and ENCODE
• Online biocomputational tools, including selected accessible tools from the NCI Center for Bioinformatics
• Network reconstruction from genomic data
• Application of systems biology to identifying drug targets
• Application of systems biology to personalized medicine
Grading
• Weekly paper review/class participation (30%)
• Project Presentations (20%)
• Final Project Report (50%): 6-7 page written
report and oral presentation demonstrating
the understanding of key concepts in cancer
systems biology research.
Weekly Reading Review
• Summarize objective/hypothesis, the data, the
controls, results and the published
interpretations.
• Discuss whether the authors' conclusions
were justified, and suggest improved analyses
and/or future research.
• Describe relevance to cancer systems biology,
and any gaps in training that need to be filled to fully understand
the paper.
First Reading Assignment
• Chuang, H.-Y., Lee, E., Liu, Y.-T., Lee, D., &
Ideker, T. (2007). Network-based
classification of breast cancer metastasis.
Molecular Systems Biology.
• Akavia, U. D., Litvin, O., Kim, J., Sanchez-Garcia, F., Kotliar, D., Causton, H. C.,
Pochanard, P., et al. (2010). An Integrated
Approach to Uncover Drivers of Cancer. Cell,
143(6), 1005–1017.
Background Material
• Overview of Cancer
– Hanahan D, Weinberg RA. Hallmarks of Cancer:
The Next Generation, Cell 144(5), 2011.
• Overview of Molecular Biology
– Kimball’s Biology Pages
– http://home.comcast.net/~john.kimball1/BiologyPages
Background Material
• Visualization of Genomic Data
– Schroeder MP, et al, Visualizing multidimensional
cancer genomics data, Genome Medicine, 5:9, 2013
• Overview of Programming
– R/Bioconductor
• http://www.r-project.org/
• www.cyclismo.org/tutorial/R/
– Python
• http://www.python.org/
• https://developers.google.com/edu/python/
Center for Cancer Systems Biology
(ccsb.stanford.edu)
• Monthly Seminar Series
– GENOMIC BIOMARKERS OF CANCER PREVENTION AND TREATMENT
– Friday April 11th at 11 am (Alway Building, Room M114): Andrea Bild, Department of Pharmacology and Toxicology, University of Utah
• Annual Symposium (Friday October 17, 2014)
• R25T Training Grant
– Two year postdoctoral training fellowship
Cancer as a Complex System
Pienta et al, Ecological Therapy for Cancer: Defining Tumors Using an Ecosystem Paradigm
Suggests New Opportunities for Novel Cancer Treatments, Translational Oncology, 2008,
1(4):158-164.
Multiscale View of Cancer
• Genes and proteins
• Complex signaling and regulatory networks
• Multiple cellular processes
• Micro-environment
• Host systems
• Environmental factors
• Population dynamics
[Figure: time axis of progression: Initiation, Progression, Metastasis, Recurrence]
Hallmarks of Cancer
Hanahan, D., & Weinberg, R. A. (2011). Hallmarks of Cancer: The Next Generation. Cell, 144(5), 646–674.
http://www.cell.com/image/S0092-8674(11)00127-9?imageId=gr2&imageType=hiRes
Network types
• Protein-protein
• Protein-DNA
• miRNA-RNA
• Transcriptional (expression) networks
• Signaling networks
Sachs et al. http://www.sciencemag.org/content/308/5721/523.full
The Multiscale Challenge
• Many components and interactions of
the “cancer system” are known
• How global dynamics and phenotypic
properties arise from local
interactions is not well known
http://circ.ahajournals.org/content/123/18/1996/F5.expansion.html
Goals of Cancer Systems
Biology Research
• To derive a comprehensive understanding of
cancer’s complexity by integrating diverse
information to:
– Identify cellular networks and cell-cell interactions
that drive cancer initiation and progression
– Identify potential therapeutic targets and mechanisms
of action
Principles in
Cancer Systems Biology Research
• Cancer networks are dynamic and respond to
genetic variants, epigenetics and the
microenvironment
• Tumors may not be a random collection of
malignant cells but cells that may be related
through processes of developmental biology
Cancer Systems Biology
The Past
Experimentation
Computation
Cancer Systems Biology
The Present
Experimentation
Computation
Cancer Systems Biology
The Future
Experimentation
Computation
Objective: Identify genes and networks
differentially expressed in lymphoma
transformation
[Figure: FL and DLBCL]
• Glas et al. “Gene expression profiling in follicular lymphoma to assess clinical aggressiveness
and to guide the choice of treatment.” Blood 2005
– 24 paired samples (12 FL/12 DLBCL)
– 88 FL/DLBCL arrays
» 30 DLBCL
» 40 FL-transforming (FL_t)
» 18 FL-non-transforming (FL_nt)
Identify differentially expressed genes
• Average Fold Change (AFC)
– Pro: Easy
– Con: Does not account for variance
• p-value, based on t-test statistic
– Pro: Easy, accounts for variance
– Con: Does not account for the problem of multiple hypothesis testing
[Figure: volcano plot of -Log10(p-value) versus Log2(Average Fold Change)]
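The contrast between average fold change and a t-test-based statistic can be illustrated on a toy example. The sketch below is only an illustration: the gene names and expression values are made up, and it reports the Welch t statistic rather than a p-value, to stay within the Python standard library. Both hypothetical genes have the same log2 fold change, but only the low-variance gene has a large t statistic.

```python
import math
import statistics

def log2_fold_change(group_a, group_b):
    """log2 of the ratio of mean expression (group_b over group_a)."""
    return math.log2(statistics.mean(group_b) / statistics.mean(group_a))

def welch_t(group_a, group_b):
    """Welch t statistic: difference in means scaled by its standard error."""
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    se = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (statistics.mean(group_b) - statistics.mean(group_a)) / se

# Hypothetical linear-scale expression values for two genes (not real data).
genes = {
    "steady_gene": {"FL": [10, 11, 9, 10], "DLBCL": [20, 21, 19, 20]},
    "noisy_gene": {"FL": [2, 25, 3, 10], "DLBCL": [5, 60, 4, 11]},
}

results = {}
for name, groups in genes.items():
    results[name] = (
        log2_fold_change(groups["FL"], groups["DLBCL"]),
        welch_t(groups["FL"], groups["DLBCL"]),
    )
    print(f"{name}: log2FC={results[name][0]:.2f}, t={results[name][1]:.2f}")
```

Ranking by fold change alone would treat the two genes as equally interesting; the t statistic separates them because it divides the difference in means by its standard error.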
Significance Analysis of Microarrays (SAM)
Addresses the problem of multiple hypothesis testing:
Suppose we measure 10,000 genes and nothing changes.
At the 1% significance level, about 100 genes would be selected as differentially
expressed, but all would be false positives.
SAM corrects for this by computing the False Discovery Rate, based on permutation testing.
[Figure: plot of observed versus expected]
http://www-stat.stanford.edu/~tibs/
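The multiple-testing arithmetic and the idea of a permutation-based False Discovery Rate can be sketched as follows. This is a simplified illustration, not the SAM algorithm itself: it uses a plain absolute difference in means as the statistic, a fixed threshold, and simulated data, all of which are assumptions made here for the example.

```python
import random
import statistics

def mean_difference(values, labels):
    """Absolute difference in mean expression between the two label groups."""
    group0 = [v for v, lab in zip(values, labels) if lab == 0]
    group1 = [v for v, lab in zip(values, labels) if lab == 1]
    return abs(statistics.mean(group1) - statistics.mean(group0))

def permutation_fdr(expression, labels, threshold, n_permutations=100, seed=0):
    """Estimate FDR as (mean number of genes called under shuffled labels)
    divided by (number of genes called with the real labels)."""
    rng = random.Random(seed)
    observed_calls = sum(
        mean_difference(row, labels) > threshold for row in expression
    )
    null_calls = []
    for _ in range(n_permutations):
        shuffled = labels[:]
        rng.shuffle(shuffled)
        null_calls.append(
            sum(mean_difference(row, shuffled) > threshold for row in expression)
        )
    expected_false = statistics.mean(null_calls)
    fdr = min(1.0, expected_false / observed_calls) if observed_calls else 0.0
    return observed_calls, expected_false, fdr

# Arithmetic from the slide: 10,000 null genes tested at the 1% level.
expected_false_positives = 10_000 * 0.01  # about 100 genes expected by chance

# Simulated data (an assumption for illustration): 200 genes, 6 + 6 samples,
# of which the first 20 genes are truly shifted in the second group.
rng = random.Random(1)
labels = [0] * 6 + [1] * 6
expression = []
for gene in range(200):
    shift = 3.0 if gene < 20 else 0.0
    expression.append([rng.gauss(0, 1) + shift * lab for lab in labels])

calls, expected_false, fdr = permutation_fdr(expression, labels, threshold=1.5)
print(calls, round(expected_false, 2), round(fdr, 3))
```

Shuffling the sample labels breaks any real association between expression and group, so the number of genes that still pass the threshold estimates how many calls are expected by chance.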
GOminer
• Identify enrichment in Gene Ontology (GO) terms,
based on a hierarchy describing biological process, cellular
component, and molecular function
Genes significantly differentially expressed in compact vs. non-compact
tumors are related to cell death, cell-to-cell signaling and interaction, cellular
assembly and organization, DNA replication, and cellular movement
http://discover.nci.nih.gov/gominer/
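One common way to score enrichment of a GO term in a list of differentially expressed genes is a one-sided hypergeometric (Fisher-type) test. The sketch below is a generic illustration of that test, not the GoMiner implementation; the function name and all counts are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(population, annotated, selected, overlap):
    """P(X >= overlap) when `selected` genes are drawn without replacement from
    `population` genes, of which `annotated` carry the GO term."""
    total = comb(population, selected)
    upper = min(annotated, selected)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total

# Hypothetical counts (not from the lecture): 10,000 genes measured, 200 of them
# annotated to a GO term, 100 genes called differentially expressed, 10 overlap.
p_value = hypergeom_enrichment_p(10_000, 200, 100, 10)
print(f"enrichment p-value = {p_value:.2e}")
```

With these counts only about 2 annotated genes would be expected in the list by chance, so an overlap of 10 gives a small p-value. Testing many GO terms at once raises the same multiple hypothesis testing issue as testing many genes.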
Gene set enrichment analysis
(GSEA)
• Evaluate enrichment of curated gene
sets, such as
– Pathways
– Genes that share a motif
– Genes at a similar chromosomal
location
– Computationally predicted gene
sets
– Your own favorite list of genes
• Evaluating related genes together
adds statistical power
• http://broad.mit.edu/gsea
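The core of the enrichment calculation can be sketched as a running sum over a ranked gene list. This is a minimal, unweighted version written for illustration; the GSEA software additionally weights each step by the gene's correlation with the phenotype and assesses significance by permutation, neither of which is shown here. Gene names are placeholders.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum statistic: step up at genes in the set, step down
    otherwise; return the signed maximum deviation from zero."""
    hits = [gene in gene_set for gene in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    running, best = 0.0, 0.0
    for is_hit in hits:
        running += 1.0 / n_hits if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best

# Hypothetical ranking, most up-regulated gene first; names are placeholders.
ranked = ["g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8", "g9", "g10"]
top_set = {"g1", "g2", "g3"}
bottom_set = {"g8", "g9", "g10"}
es_top = enrichment_score(ranked, top_set)        # positive: set sits at the top
es_bottom = enrichment_score(ranked, bottom_set)  # negative: set sits at the bottom
print(es_top, es_bottom)
```

Because the score depends on where the whole set falls in the ranking, a set of genes with individually modest changes can still score strongly, which is the sense in which evaluating related genes together adds statistical power.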
GSEA on Lymphoma Data
• Myc targets up-regulated, in agreement with Myc up-regulation found by SAM
• GSEA detects ~200 sets of differentially expressed genes at low FDR
– Many metabolic pathways up-regulated in DLBCL
– Myc target genes significant
• In general, GSEA produces many “generic” gene sets
– many metabolic
– many a consequence of aggressive phenotype
– no graphical view of pathways
Overlay expression levels on canonical pathways
IPA, Ingenuity Pathway Analysis (www.ingenuity.com)
[Figure: canonical pathway with expression in DLBCL versus FL; legend: UP, DOWN]
Cellular assembly
& organization network
Cellular assembly
& organization network
• Expand network
using
interactions from
the literature
• Visualization
using cellular
localization
IPA links to literature
Protein-protein Interaction Networks
Protein-protein interaction networks
http://string-db.org
String-db.org - example
• DNA repair genes
BARD1, BLM, BRCA1, BRIP1, DCLRE1A, DCLRE1B, DDX11, DNA2L, EXO1, FANCG,
FANCL, FEN1, GMNN, ING2, MLH3, MSH2, MSH5, MSH6, PARP2, PCNA,
POLD3, POLE, POLE2, PRIM2A, RAD51A, RAD54B, RECQL4, RFC3, RFC4, RPA2,
TOPBP1, TREX1, UNG, USP1
Inferring Gene
Regulatory Networks
Useful non-technical review:
“Computational methods for discovering
gene networks from expression data”
Lee & Tzou
Single gene focus is limiting
[Figure: expression of gene A across individuals (FL, DLBCL); color scale: induced, repressed]
Gene interaction is more powerful
[Figure: expression of gene A and gene B across individuals (FL, DLBCL); A UP, B DOWN; color scale: induced, repressed]
Interaction of gene clusters
[Figure: expression of Module X and Module Y across individuals (FL, DLBCL); X UP, Y DOWN; color scale: induced, repressed]
Inferring Gene Regulation
[Figure: expression matrix of genes (gene1, gene2, ..., geneN) by samples, with genes grouped into Module1, Module2, Module3]
Inferring Gene Regulation
[Figure: average expression of each module (Mod1, Mod3, Mod6, Mod8) across samples]
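Collapsing a gene-by-sample matrix into one average profile per module can be sketched as below. The module assignments are taken as given (how genes are clustered into modules is not shown), and the gene names, module names, and values are hypothetical.

```python
from statistics import mean

def module_averages(expression, modules):
    """Average the expression of each module's genes, sample by sample."""
    averages = {}
    for module_name, gene_names in modules.items():
        rows = [expression[gene] for gene in gene_names]
        averages[module_name] = [mean(column) for column in zip(*rows)]
    return averages

# Hypothetical log-expression values for four genes across three samples.
expression = {
    "gene1": [1.0, 2.0, 3.0],
    "gene2": [3.0, 2.0, 1.0],
    "gene3": [0.0, 0.0, 6.0],
    "gene4": [2.0, 4.0, 0.0],
}
modules = {"Module1": ["gene1", "gene2"], "Module2": ["gene3", "gene4"]}

averages = module_averages(expression, modules)
print(averages)
```

Each module is then represented by a single row, which reduces the number of variables that later steps (regulator search, survival modeling) need to consider.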
Key Idea of Regulatory Module Networks
• Look for a set of regulatory factors that, in combination,
predict a gene’s expression level
• Regulatory factors can include:
– mRNA level of regulatory proteins (transcription factors, signal transduction proteins, mRNA binding proteins, chromatin modification factors, …)
– Genotypic factors (SNPs, CNVs)
– Epigenetic factors (methylation status)
– TF binding (measured by ChIP-seq)
– …
• Factors that robustly predict a target’s expression across
different experiments are inferred to be its regulators
Segal et al., Nature Genetics 2003
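The idea of searching for a regulator whose level predicts a module's expression can be sketched with a single threshold split. This is a deliberately reduced illustration, assuming one regulator and one split chosen by squared error; it is not the procedure of Segal et al., and the regulator names and values are hypothetical.

```python
from statistics import mean

def sse(values):
    """Sum of squared deviations from the group mean."""
    center = mean(values)
    return sum((v - center) ** 2 for v in values)

def split_error(regulator_values, target_values, threshold):
    """Error when the target is predicted by its mean within the
    'regulator low' and 'regulator high' sample groups."""
    low = [t for r, t in zip(regulator_values, target_values) if r <= threshold]
    high = [t for r, t in zip(regulator_values, target_values) if r > threshold]
    if not low or not high:
        return float("inf")
    return sse(low) + sse(high)

def best_regulator(candidates, target_values):
    """Return (error, regulator name, threshold) for the best single split."""
    best = (float("inf"), None, None)
    for name, values in candidates.items():
        for threshold in sorted(set(values))[:-1]:
            error = split_error(values, target_values, threshold)
            if error < best[0]:
                best = (error, name, threshold)
    return best

# Hypothetical data: module expression across six samples, two candidate regulators.
module_expression = [0.1, 0.2, 0.0, 2.1, 1.9, 2.0]
candidates = {
    "regulatorA": [0.5, 0.4, 0.6, 3.0, 3.2, 2.9],  # tracks the module
    "regulatorB": [1.0, 3.0, 2.0, 1.5, 2.5, 0.5],  # unrelated to the module
}
error, name, threshold = best_regulator(candidates, module_expression)
print(name, threshold, round(error, 3))
```

Repeating such splits within each resulting group would give a tree of regulators acting in combination; checking that the chosen regulator still predicts the module in independent experiments addresses the robustness requirement stated above.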
Computationally Derived Regulatory Module
A group of co-expressed genes is driven by a computationally derived
transcriptional regulatory program, derived from a candidate list of
regulators.
[Figure: regulatory program (Gene A On/Off, Gene B On/Off) and module genes (On/Off)]
Segal E et al, Nature Genetics 2003.
Core module network
of FL transformation
Gentles A et al, Blood 2009
LPS = 1.14 * ModuleA + 0.72 * GFL3027 - 1.35 * GFL2738
Integration with survival data
• Module A is the single most predictive feature of survival by Cox
regression (bad prognosis in FL)
• Define a linear predictor of survival:
– LPS=1.14*ModuleA + 0.72*GFL3027 – 1.35*GFL2738
Bad Part: ESC like
expression
Good Part: TGFB
signaling
Gentles A et al, Blood 2009
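Applying the linear predictor to individual patients can be sketched as follows. The coefficients are the ones stated on the slide; the patient values are made up, and splitting patients at the median score is an assumption used here only to illustrate how a continuous score might be turned into two groups.

```python
from statistics import median

# Coefficients as given on the slide (Gentles A et al, Blood 2009).
LPS_WEIGHTS = {"ModuleA": 1.14, "GFL3027": 0.72, "GFL2738": -1.35}

def linear_predictor_score(features):
    """LPS = 1.14*ModuleA + 0.72*GFL3027 - 1.35*GFL2738."""
    return sum(weight * features[name] for name, weight in LPS_WEIGHTS.items())

# Hypothetical patients with made-up feature values (not data from the study).
patients = {
    "patient1": {"ModuleA": 1.0, "GFL3027": 0.5, "GFL2738": -0.5},
    "patient2": {"ModuleA": -1.0, "GFL3027": 0.0, "GFL2738": 1.0},
    "patient3": {"ModuleA": 0.0, "GFL3027": 1.0, "GFL2738": 0.0},
}
scores = {pid: linear_predictor_score(f) for pid, f in patients.items()}

# Assumption: split at the median score to form two groups for comparison.
cutoff = median(scores.values())
groups = {pid: "high LPS" if s > cutoff else "low LPS" for pid, s in scores.items()}
print(scores, groups)
```

The two groups could then be compared with survival curves, which is the kind of view summarized under "Survival based on LPS".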
Survival based on LPS
Gentles A et al, Blood 2009
DATABASES
• TCGA
• CCLE
• ENCODE
The Cancer Genome Atlas (TCGA)
• Phase I: Initiated in 2005 by the National Cancer
Institute and National Human Genome Research
Institute to catalog genetic mutations causing
cancer, using genome sequencing; focused on
GBM, lung and ovarian cancer
• Phase II: Expanded to 20-25 different cancer
types, complement genome sequencing with
genomic characterization, including gene
expression profiling, copy number variation, DNA
methylation, miRNA
TCGA: Cancer measured at multiple scales
– mRNA & miRNA expression
– Copy number
– DNA Methylation
– Mutation (NGS)
– Pathology images
– Medical Images
– Treatment
– Survival Outcome
TCGA Cancer Types
[Figure: bar chart of Number of Patients with Samples (axis 0 to 1000) by TCGA cancer type]
TCGA Organization
TSS: Tissue Source Sites
BCR: Biospecimen Core Resources
DCC: Data Coordinating Center
GCC: Genome Characterization Centers
GSC: Genome Sequencing Centers
CGHub: Cancer Genomics Hub
GDACs: Genome Data Analysis Centers
Major TCGA Publications
• Comprehensive molecular characterization of human colon and rectal cancer. Nature. 487 (7407):330-337, 2012.
– Mutations in ARID1A, SOX9, FAM123B/WTX, IGF2; mutations in WNT pathway
• Comprehensive genomic characterization of squamous cell lung cancers. Nature. 489 (7417):519-525, 2012.
• Comprehensive molecular portraits of human breast tumors. Nature. 490 (7418):61-70, 2012.
– Mutations in ESR1, GATA3, FOXA1, XBP1, and cMYB.
• Integrated genomic analyses of ovarian carcinoma. Nature. 474 (7353):609-615, 2011.
– Mutations in TP53 occurred in 96% of the cases studied; mutations in BRCA1 and BRCA2 occurred in 21% of the cases
• An integrated genomic analysis identifies clinically relevant subtypes of glioblastoma characterized by abnormalities in PDGFRA, IDH1, EGFR and NF1. Cancer Cell. 17 (1):98-110, 2010.
• Identification of a CpG Island Methylator Phenotype that Defines a Distinct Subgroup of Glioma. Cancer Cell. 17 (5):510-522, 2010.
• Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature. 455 (7216):1061-1068, 2008.
– Mutations in NF1, ERBB2, TP53, PIK3R1
UCSC Cancer Browser – Chromosome View
https://genome-cancer.ucsc.edu
UCSC Cancer Browser – Gene View
Cancer Browser – Survival Analysis
Cancer Cell Line Encyclopedia (CCLE)
• The Cancer Cell Line Encyclopedia (CCLE) project is a collaboration between the Broad Institute and Novartis to conduct a genetic and pharmacologic characterization of a large panel of human cancer cell lines.
• Aims to link distinct drug responses to genomic patterns and to translate cell line integrative genomics into cancer patient stratification.
• Provides public access to analysis and visualization of DNA copy number, mRNA expression and mutation data for about 1000 cell lines.
http://www.broadinstitute.org/ccle/home
Cellular Information Processing
ENCODE
http://genome.ucsc.edu/ENCODE/index.html
ENCODE
Summary
• Basic principles of molecular biology of cancer
• Experimental high-throughput technologies
• Design of perturbation studies, including drug screening
• Overview of publicly available datasets, including GEO, TCGA, CCLE, and ENCODE
• Online biocomputational tools, including selected accessible tools from the NCI Center for Bioinformatics
• Network reconstruction from genomic data
• Application of systems biology to identifying drug targets
• Application of systems biology to personalized medicine