CancerBrowser_COAT2012

The Cancer Genome Browser
Sofie Salama
COAT-PhD Summer School 2012
The Cancer Genome Browser
• OUTLINE
– Slide show to introduce the Cancer Genomics Browser
• What’s there?
• How to visualize the data?
• Tools
– Live Demo
• Basic setup
• Breast cancer data
– Using signatures
– Microarray vs RNA-Seq
– Comparing across datasets
• GBM data
– Genesets
– What genes correlate with phenotypes?
– Playtime!
UCSC Genome Browser
• Base level to full genome display capability
• ENCODE
• Human sequence variation
• Whole genome association studies
• Human genetic and disease-related genome annotation
https://genome.ucsc.edu
Large-scale Medical Genomics Datasets
Visualizing high-throughput cancer genomics data raises new issues: data security and access control, sample cohorts, multiple analytes, and clinical and phenotypic information.
https://genome-cancer.ucsc.edu
UCSC Cancer Genomics Browser
• Simultaneously display patient genomic and clinical data from a cohort of samples
• Base level to full genome display capability
• Multiple studies
• Growing list of published studies, including public-tier TCGA data
• Integrated with popular UCSC Genome Browser and its vast store of genomic information
Zhu J et al. Nature Methods. 2009
Sanborn JZ et al. Nucleic Acids Res. 2010
New UCSC Cancer Browser Portal
genome-cancer.ucsc.edu
User Interface: A portal to display high throughput data sets
Teresa Swatloski, Brian Craft, Mary Goldman
User Interface Features
• help menu
• resize panels
• select dataset to view
• link to tumor image browser
• link to human genome browser
• view in chromosome mode
• view in gene mode
• user sign in
• toggle on/off RefSeq genes
• position or gene search bar
• configure genesets
• configure genomic signatures
Teresa Swatloski, Brian Craft, Mary Goldman
Dataset selection showing TCGA breast cancer data
TCGA breast cancer datasets
• Gene expression, copy number, DNA methylation, RPPA, Paradigmlite
• TCGA clinical data
Teresa Swatloski
Genomic and phenotypic data heatmaps
[Screenshot: TCGA glioblastoma multiforme (GBM) copy number Gistic2 estimate • N=538. Controls: Heatmap, Box Plot, Proportions, Adjust. Panels: genomic data, Copy Number (Gistic2); clinical data, Features.]
Individual dataset layout
[Screenshot: TCGA glioblastoma multiforme (GBM) copy number Gistic2 estimate • N=538. Controls: Heatmap, Box Plot, Proportions, Adjust. Axes: samples; genomic locations / genes. Panels: genomic data, Copy Number (Gistic2); clinical data, Features.]
Genomics Heatmap
[Screenshot: TCGA glioblastoma multiforme (GBM) copy number Gistic2 estimate • N=538. Controls: Heatmap, Box Plot, Proportions, Adjust. Axis: samples. Color legend: amplification, deletion.]
Clinical Heatmap
• Multiple clinical features
• Clinical data encoded in color
[Screenshot: samples with clinical features sample_type (Primary solid tumor, Solid tissue normal, Metastatic) and days_to_last_followup.]
Sample sorting determined by clinical data
[Screenshot: TCGA glioblastoma multiforme (GBM) copy number Gistic2 estimate • N=538. Controls: Heatmap, Box Plot, Proportions, Adjust. Panels: Copy Number (Gistic2); Features. Axis: samples.]
• Sample (i.e. vertical) order is determined by the clinical data on the right
• Samples are always sorted by clinical features
• Ties are broken using subsequent clinical features
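The sorting rule in the bullets above can be sketched as a multi-key sort. This is only an illustration: the sample records and values are made up, `sort_samples` is a hypothetical helper, and the browser's actual ordering of categorical values (here plain alphabetical) is an assumption.

```python
# Hypothetical sample records; the feature names come from the slides,
# the values are invented for illustration.
samples = [
    {"id": "s1", "sample_type": "Primary solid tumor", "days_to_last_followup": 300},
    {"id": "s2", "sample_type": "Metastatic", "days_to_last_followup": 120},
    {"id": "s3", "sample_type": "Primary solid tumor", "days_to_last_followup": 45},
    {"id": "s4", "sample_type": "Solid tissue normal", "days_to_last_followup": 80},
]


def sort_samples(samples, features):
    """Sort by the first clinical feature; break ties with each subsequent one."""
    return sorted(samples, key=lambda s: tuple(s[f] for f in features))


ordered = sort_samples(samples, ["sample_type", "days_to_last_followup"])
print([s["id"] for s in ordered])
```

Swapping the order of the feature list changes which feature drives the primary sort and which one only breaks ties. Missing clinical values are not handled in this sketch.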
Zoom in to See Individual Sample
drag zoom
slider
Individual Dataset Control
• click to show dataset detail
• heatmap view
• box plot summary view
• proportions summary view
• remove dataset
• adjust display coloring
• genomic heatmap
• configuration window for clinical variables, sample subgrouping and statistics
• clinical heatmap
Teresa Swatloski
Summary Views
Heatmap View - Amplified / Deleted Regions
Box Plot Summary View
Proportions Summary View
DNA Copy Number Profile Summary View
[Figure: TCGA CNV profiles for glioblastoma multiforme, breast carcinoma, and lung squamous cell. Gene labels shown next to the glioblastoma multiforme profile: EGFR, CDKN2A, CDKN2B.]
Genes View Mode
“Genes” Configuration
Currently displayed gene list
1
Three ways to add a gene list
2
3
Type or copy and paste user-defined genes
Teresa Swatloski
Genes view to see the PAM50 intrinsic gene expression subtypes in TCGA Breast data
[Screenshot labels: Basal, LuA, LuB, Her2-like, Normal-like]
PAM50: Parker et al., Journal of Clinical Oncology (2009)
Same thing with RNA-Seq Data
[Screenshot labels: Her2, Basal, Tumor, LumA, LumB, Solid normal]
Online statistical tests compare two subgroups
[Screenshot: TCGA glioblastoma multiforme (GBM) copy number Gistic2 estimate • N=538. Controls: Heatmap, Box Plot, Proportions, Adjust. Panels: Copy Number (Gistic2); Features. Annotations: subgroup samples, p values.]
Sample subgroup configuration
• click to view detail and use the variable to subgroup samples
• variables used in defining subgroups
• “Active Feature List” area
• subgroup 1
• subgroup 2
• perform statistical tests to compare subgroup 1 and subgroup 2
Compare subgroups using the summary view
• EGFR amplification in GBM occurs largely in the non-CpG island DNA methylator samples (non-G-CIMP)
• Methylator samples in GBM are largely proneural by gene expression, and come from younger patients with better survival
Evaluate Genomic Signature on the Browser
A. Enter signature as an algebraic expression
B. Computed signatures online -> approximate prediction
Evaluate Genomic Signature on the Browser
• A 21-gene signature predicts the rate of recurrence at 10 years in ER+ patients treated with TAM (Paik 2004)
• Online approximation of the genomic signature: higher score -> higher likelihood of recurrence; lower score -> lower likelihood of recurrence
Evaluate Genomic Signature on the Browser
• Browser view of ER+ patients in a preoperative chemotherapy study dataset
• Signature score correlates with pathCR. The paradox: an ER+ patient who is more likely to have recurrent disease in 10 years when treated with TAM is also more likely to respond to chemotherapy
Genomic Signature Configuration
Current signatures
1
Three ways to add a genomic signature
2
3
Enter signature as an algebraic expression, such as: +TP53 - 0.25*ERBB2
Teresa Swatloski
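To make the algebraic-expression idea concrete, here is a minimal sketch of how a signature such as the slide's TP53 / ERBB2 example could be turned into per-gene weights and scored for one sample. It is not the browser's own parser: `parse_signature` and `score` are hypothetical helpers, only signed terms of the form `coefficient*GENE` are handled, and the expression values are invented.

```python
import re

# One signed term: optional "coefficient *" followed by a gene symbol.
TERM = re.compile(r"([+-])\s*(?:(\d+(?:\.\d+)?)\s*\*\s*)?([A-Za-z][A-Za-z0-9]*)")


def parse_signature(expr):
    """Turn a linear signature like '+TP53 - 0.25*ERBB2' into {gene: weight}."""
    expr = expr.replace("\u2013", "-").strip()  # accept an en dash typed as minus
    if expr[0] not in "+-":
        expr = "+" + expr
    weights = {}
    for sign, coef, gene in TERM.findall(expr):
        weight = float(coef) if coef else 1.0
        weights[gene] = weights.get(gene, 0.0) + (weight if sign == "+" else -weight)
    return weights


def score(weights, expression):
    """Weighted sum of one sample's expression values."""
    return sum(w * expression[gene] for gene, w in weights.items())


weights = parse_signature("+TP53 - 0.25*ERBB2")
sample = {"TP53": 2.0, "ERBB2": 4.0}  # made-up expression values
print(weights, score(weights, sample))
```

Scoring every sample this way yields one value per sample, which is the kind of quantity that can then be displayed or used to order samples alongside the clinical features.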
User Support
[email protected]
Mary Goldman, Teresa Swatloski
Web API
Create a URL that specifies a view in the Cancer Browser
•base: https://genome-cancer.ucsc.edu/hgHeatmap/#?
•data track(s): comma-separated gene names
•display mode
•gene list: comma-separated gene names
•chromosomal position
•genomic signature: e.g. +TP53-0.25*ERBB2
Examples
•dataset=vijver2002&pos=chr2:123767566-chr2:187943340
•dataset=ucsfNeveCGH&displayas=geneset&gene_list=TP53,ERBB2
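The two examples above can be assembled programmatically. The sketch below simply concatenates `key=value` pairs onto the base URL without percent-encoding, to mirror the literal form shown on the slide; `build_view_url` is a hypothetical helper, and only parameter names that appear in the examples (`dataset`, `pos`, `displayas`, `gene_list`) are used.

```python
BASE = "https://genome-cancer.ucsc.edu/hgHeatmap/#?"


def build_view_url(**params):
    """Join key=value pairs with '&' after the base URL, in the order given."""
    query = "&".join(f"{key}={value}" for key, value in params.items())
    return BASE + query


# Chromosomal position view of one dataset.
print(build_view_url(dataset="vijver2002", pos="chr2:123767566-chr2:187943340"))

# Geneset view with a comma-separated gene list.
genes = ["TP53", "ERBB2"]
print(build_view_url(dataset="ucsfNeveCGH", displayas="geneset", gene_list=",".join(genes)))
```

Whether the browser expects special characters to be escaped is not stated on the slide, so check the documentation before relying on this form for values such as signature expressions.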
Documentation
https://genome-cancer.soe.ucsc.edu/proj/site/help
Brian Craft, Mary Goldman
User Account and Security
Brian Craft
cgData: Cancer Genomic data specification
• Gene expression, copy number, RPPA, DNA methylation, siRNA viability, phenotypes, clinical data
• Support large-scale genomic data repository
- Currently supports Cancer Browser
- Plan to support automated data analysis pipeline
• “Solve” (address) common data linking problem
• Meta data tracking
• Once data is in this specification, it is ingested automatically into the UCSC Cancer Browser
Kyle Ellrott
Cancer Browser Updates
• Current improved version launched January 2012
• Monthly data freeze
• Latest freeze data viewable on the Cancer Browser within a few days
• July 2012: added ability to download processed datasets and improved user interface for clinical features, subgrouping and statistics
Data freeze 2012-02-28 summary (sample number)
Summary
• Simultaneously display patient genomic and clinical data from a cohort of samples
• Multiple studies data visualization
• Base level to full genome, and genesets display capability
• cgData data repository driven
• Monthly data freeze and version control
• User account
• Project-specific access control
• Single sign-on portal
• Provide web API for linking
[email protected]
[Diagram labels: DCC, Firehose; converter; UCSC cgData Repository; UCSC Cancer Genomics Browser; cBio; PARADIGM pathway analysis; Clinical Predictors (TopModel); Mutation call comparison]
UCSC Next-gen Sequencing Data Analysis
Bam files
• DNA-seq (bambam, bridget)
• mutation, allele-specific copy number, structural rearrangement
• Combined RNA/DNA analysis
• RNA editing
Acknowledgment
UCSC Cancer Genomics Group
Brian Craft
Teresa Swatloski
Mary Goldman
Kyle Ellrott
Erich Weiler
Chris Wilks
Singer Ma
Christopher Szeto
Sofie Salama
Mia Grifford
Sam Ng
Ted Goldstein
Dan Carlin
Daniel Zerbino
Melissa Cline
Mark Diekhans
Josh Stuart
David Haussler
Collaborators
The Cancer Genome Atlas
Stand Up To Cancer
Intl. Cancer Genomics Consortium
ISPY consortium
MSKCC
LINCS consortium
Christopher Benz, Buck Institute
Laura Esserman, UCSF
Joe Gray, OHSU
Eric Collisson, UCSF
Gordon Mills, MDACC
Rachel Schiff, BCM
Funding Agencies
NCI/NIH, NHGRI
American Association for Cancer Research
cgData Packages
• clinical data 1 (FFPE, timepoint), with meta-data
• genomic data A (CNV), with meta-data
• clinical data 2 (patient, age,..), with meta-data
• genomic data B (RPPA), with meta-data
Data files: most likely your data files
Meta-data: need to add meta data file
cgData Packages
clinical data 1 (FFPE, timepoint): TCGA-01-ABCD-01A
clinical data 2 (patient, age,..): TCGA-01-ABCD
idMap (TCGA BRCA): patient, sample, aliquot
genomic data A (CNV)
genomic data B (RPPA)
Aliquot identifiers: TCGA-01-ABCD-01A-EG, TCGA-01-ABCD-01A-JH
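The linking problem that the idMap addresses can be sketched with the example identifiers from this slide. The code below assumes, based only on those examples, that a parent identifier is obtained by dropping the last dash-separated field; `parent_id`, `build_id_map`, and `ancestors` are hypothetical helpers and not part of cgData.

```python
def parent_id(identifier):
    """Drop the last dash-separated field (rule inferred from the slide's IDs)."""
    head, _, _ = identifier.rpartition("-")
    return head


def build_id_map(aliquots):
    """Build a child -> parent map covering aliquot -> sample -> patient."""
    id_map = {}
    for aliquot in aliquots:
        sample = parent_id(aliquot)
        patient = parent_id(sample)
        id_map[aliquot] = sample
        id_map[sample] = patient
    return id_map


def ancestors(id_map, identifier):
    """Return the identifier followed by each of its parents in turn."""
    chain = [identifier]
    while chain[-1] in id_map:
        chain.append(id_map[chain[-1]])
    return chain


id_map = build_id_map(["TCGA-01-ABCD-01A-EG", "TCGA-01-ABCD-01A-JH"])
print(ancestors(id_map, "TCGA-01-ABCD-01A-EG"))
```

With such a map, genomic data keyed by aliquot can be joined to clinical data keyed by sample or by patient by walking up the chain until an identifier present in the clinical file is found.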
cgData Packages
clinical data 1 (FFPE, timepoint)
clinical data 2 (patient, age,..)
Most likely your data files
Need to add meta data file
idMap (TCGA BRCA)
genomic data A (CNV)
assembly (hg18)
Identifiers used in data files
genomic data B (RPPA)
probeMap B (antibody)
probeMap B
parent-child relationships
Most likely already in UCSC cgData library