Tumor Genome Sequencing
Xiaole Shirley Liu
STAT115, STAT215, BIO298, BIST512
Cancer
• Cancer will affect 1 in 2 men and 1 in 3 women in
the United States, and the number of new cases of
cancer is set to nearly double by the year 2050.
• Cancer is a genetic disease caused by mutations
in the DNA
• Clinically tumors can look the same but most
differ genetically.
Different Sequencing Approaches
• Capture-seq ($400-600)
– Could focus on well-known mutations
• Exome-seq ($700-2K)
– All the exons in genes; promoters and lncRNA genes?
• RNA-seq ($500-2K)
– Expression and mutations together, miss anything?
• Whole genome sequencing ($3-4K)
– Majority of mutations non-coding, function unknown
– Better at detecting structural changes (translocations,
fusions)
– Cost-vs-benefit balance
Two Major Cancer Genome Projects
• TCGA: The Cancer Genome Atlas (US)
– > 30 cancer types and > 10K tumor samples
– Primary tumors, fewer death events
– Genome, transcriptome, DNA methylome, proteomics
– Rigorous tumor sample QC, consistent profiling
platform
• ICGC: International Cancer Genome
Consortium (11 countries)
– 20 cancer types * 500 tumor samples each
Tumor Gene Expression
• Microarrays or RNA-seq
• Data analysis?
• Differential expression between cancer and
normal
• Cluster the tumor samples into sub-types
– Consensus clustering: sampling genes or tumors, get
robust clustering
• Predict patient outcome (survival or recurrence)
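A minimal sketch of the consensus clustering idea above, assuming tumors are resampled (not genes) and that average-linkage hierarchical clustering is the base method. The function names and the toy expression values are illustrative only. Each resample is clustered, and the consensus matrix records how often two tumors land in the same cluster when both are sampled.

```python
import random

def hierarchical_labels(points, k):
    """Average-linkage agglomerative clustering, stopped when k clusters remain."""
    clusters = [[i] for i in range(len(points))]

    def dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

    def linkage(c1, c2):
        return sum(dist(points[i], points[j]) for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > k:
        # Merge the closest pair of clusters.
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        a, b = min(pairs, key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels

def consensus_matrix(points, k, n_resamples=50, frac=0.8, seed=0):
    """Fraction of resamples in which tumors i and j co-cluster, given both were sampled."""
    rng = random.Random(seed)
    n = len(points)
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    for _ in range(n_resamples):
        idx = rng.sample(range(n), max(k, int(frac * n)))  # subsample the tumors
        labels = hierarchical_labels([points[i] for i in idx], k)
        for a, ia in enumerate(idx):
            for b, ib in enumerate(idx):
                sampled[ia][ib] += 1
                together[ia][ib] += labels[a] == labels[b]
    return [[together[i][j] / sampled[i][j] if sampled[i][j] else 0.0
             for j in range(n)] for i in range(n)]

# Toy data (made up): each tuple is one tumor's expression of two genes.
tumors = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
consensus = consensus_matrix(tumors, k=2)
```

Entries near 1 or 0 indicate stable sub-types; intermediate values flag tumors whose assignment depends on the resample.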
Break
Survival Analysis
• Do patients receiving the treatment live longer?
• Are smokers more likely to have cancer recurrence?
• Censored data: the value of a measurement or
observation is only partially known
– Some patients left the study
– Study concluded
Survival Without Censoring
Survival With Censoring
Kaplan Meier Curve
• More individuals in each group and better separation
of the groups both give a smaller p-value
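A minimal sketch of the Kaplan-Meier estimate with censoring, assuming each patient is recorded as a follow-up time plus an event flag (1 = death observed, 0 = censored). The function name and the toy data are illustrative only. At each death time the survival estimate is multiplied by (1 - deaths / number at risk); censored patients leave the risk set without lowering the curve.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time where at least one death occurs."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set after t.
        n_at_risk -= removed
    return curve

# Toy data (made up): five patients, two of them censored.
curve = kaplan_meier(times=[2, 3, 3, 5, 8], events=[1, 0, 1, 1, 0])
```

Here the curve steps down at times 2, 3 and 5; the patient censored at time 8 contributes to the risk set but never produces a step.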
Log Rank Test
Log Rank Test
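A minimal sketch of the two-group log-rank test in its standard textbook form, not taken from the slides. It assumes the same (time, event) encoding as a Kaplan-Meier analysis; the function name and toy data are illustrative. At each death time the observed deaths in group A are compared with the number expected if both groups shared one hazard, and the summed difference is referred to a chi-square distribution with 1 degree of freedom.

```python
import math

def logrank_test(times_a, events_a, times_b, events_b):
    """Return (chi-square statistic, p-value) for the two-group log-rank test."""
    records = [(t, e, 0) for t, e in zip(times_a, events_a)]
    records += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in records if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n = sum(1 for u, _, _ in records if u >= t)               # at risk, both groups
        n_a = sum(1 for u, _, g in records if u >= t and g == 0)  # at risk, group A
        d = sum(e for u, e, _ in records if u == t)               # deaths at t, both groups
        d_a = sum(e for u, e, g in records if u == t and g == 0)  # deaths at t, group A
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = observed_minus_expected ** 2 / variance
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Toy data (made up): group A dies early, group B dies late, no censoring.
chi2, p = logrank_test([1, 2, 3, 4, 5], [1] * 5, [10, 11, 12, 13, 14], [1] * 5)
```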
More Variables
• 50-gene signature?
• Logistic regression:
– Estimate odds ratio: ratio of odds
– Linear combination of all the genes to separate
outcome (0, 1).
• Cox Regression
– Estimate hazard ratio: ratio of incidence rates
– Models the effect of covariates on the hazard rate but
leaves the baseline hazard rate unspecified
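The two models above can be written out as follows. These are the standard textbook forms, not taken from the slides; the symbols $x_1, \dots, x_p$ (gene covariates), $\beta_j$ (coefficients), $p$ (probability of outcome 1) and $h_0(t)$ (baseline hazard) are notation assumed here.

```latex
% Logistic regression: the log-odds of the binary outcome are linear in the covariates
\log\frac{p}{1-p} = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p,
\qquad \mathrm{OR}_j = e^{\beta_j}

% Cox regression: covariates scale an unspecified baseline hazard h_0(t)
h(t \mid x) = h_0(t)\,\exp(\beta_1 x_1 + \dots + \beta_p x_p),
\qquad \mathrm{HR}_j = \frac{h(t \mid x_j + 1)}{h(t \mid x_j)} = e^{\beta_j}
```

Because $h_0(t)$ cancels in the ratio, the hazard ratio for a covariate can be estimated without ever specifying the baseline hazard.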
Use Cox Regression to Separate
Two Groups by Gene Signature
Caution About Gene
Signature’s Predictive Power
Break
Mutations in the Tumor Genome
• Help us identify important genes for
tumorigenesis and cancer progression
• Drivers – a.k.a. gatekeepers, mutations that cause
and accelerate cancers
• Passengers – accidental by-products of
thwarted DNA-repair mechanisms
• Recurrent mutations on genes or pathways are
likely drivers
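One way to make "recurrent" concrete is a binomial test against a background mutation rate. This is a simplified sketch, not a published driver-detection method: it assumes a single uniform per-base background rate across all genes and tumors, and the function name and example numbers are hypothetical.

```python
import math

def recurrence_p_value(n_tumors, n_mutated, gene_length_bp, background_rate_per_bp):
    """P(at least n_mutated of n_tumors carry a mutation in the gene by chance)."""
    # Chance that one tumor has at least one background mutation in this gene.
    p_hit = 1 - (1 - background_rate_per_bp) ** gene_length_bp
    # Upper tail of Binomial(n_tumors, p_hit).
    return sum(
        math.comb(n_tumors, k) * p_hit ** k * (1 - p_hit) ** (n_tumors - k)
        for k in range(n_mutated, n_tumors + 1)
    )

# Hypothetical numbers: a 1,500 bp gene mutated in 12 of 100 tumors,
# with an assumed background rate of 1 mutation per megabase.
p_value = recurrence_p_value(100, 12, 1500, 1e-6)
```

A uniform background rate overcalls long genes and genes in highly mutable regions, so a real analysis would use a gene-specific background.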
High Throughput Driver Detection
• Differential gene expression
• Copy number aberration (CNA) or variation
(CNV) using CGH, tiling or SNP arrays
Comparative genomic hybridization (CGH)
GISTIC
• G-score: combines the frequency of occurrence and the
amplitude of the aberration
• Statistical significance evaluated by permutation
• FDR adjustment for multiple hypothesis testing
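The three steps above can be sketched in simplified form. This is an illustration of the idea, not the GISTIC implementation. It assumes a samples x markers matrix of log2 copy ratios, an arbitrary amplification threshold of 0.1, a null built by permuting marker positions within each sample, and Benjamini-Hochberg for the FDR step; all function names are illustrative.

```python
import random

def g_scores(cn, threshold=0.1):
    """Per-marker score: (fraction of samples amplified) * (their mean amplitude)."""
    n_samples = len(cn)
    scores = []
    for m in range(len(cn[0])):
        amps = [row[m] for row in cn if row[m] > threshold]
        freq = len(amps) / n_samples
        mean_amp = sum(amps) / len(amps) if amps else 0.0
        scores.append(freq * mean_amp)
    return scores

def permutation_p_values(cn, n_perm=200, seed=0, threshold=0.1):
    """Compare each observed score with scores from marker-shuffled data."""
    rng = random.Random(seed)
    observed = g_scores(cn, threshold)
    null = []
    for _ in range(n_perm):
        # Shuffle marker positions independently within each sample.
        shuffled = [rng.sample(row, len(row)) for row in cn]
        null.extend(g_scores(shuffled, threshold))
    return [(1 + sum(1 for x in null if x >= g)) / (1 + len(null)) for g in observed]

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (q-values)."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    q = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        q[i] = running_min
    return q

# Toy data (made up): 8 samples x 20 markers, marker 5 amplified in every sample.
cn = [[0.0] * 20 for _ in range(8)]
for row in cn:
    row[5] = 1.0
q_values = benjamini_hochberg(permutation_p_values(cn))
```

In the toy data only marker 5 ends up with a small q-value, because no shuffle lines up the amplification across all eight samples.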
GATK
• https://www.broadinstitute.org/gatk/guide/best-practices
• FASTQ -> BAM
• BAM -> VCF
• Annotate
MAF and VCF Formats
• VCF (Variant Call Format, common for GWAS and germline
variants) and MAF (Mutation Annotation Format, used by TCGA)
• Both can annotate somatic mutations and germline
variants
• Both are tab-delimited text files
• Columns: CHROM, POS, ID (SNP id, gene symbol, or ENTREZ
gene id), REF (reference sequence), ALT (altered sequence),
QUAL (quality score), FILTER (PASS, or the failed filters,
e.g. “q10;s50”: quality below 10; fewer than 50% of samples
have data), INFO (allele counts, total counts, number of
samples with data, somatic or not, validated, etc.)
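A minimal sketch of reading one VCF data line into the columns listed above, assuming the eight fixed tab-delimited fields in that order. The function name and the example record are hypothetical. Header lines, per-sample genotype columns, and MAF files are not handled.

```python
def parse_vcf_line(line):
    """Split one tab-delimited VCF data line into its eight fixed columns."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, var_id, ref, alt, qual, flt, info = fields[:8]
    info_dict = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        # Flags such as SOMATIC carry no value; store them as True.
        info_dict[key] = value if value else True
    return {
        "CHROM": chrom,
        "POS": int(pos),
        "ID": var_id,
        "REF": ref,
        "ALT": alt.split(","),                          # several ALT alleles are comma-separated
        "QUAL": None if qual == "." else float(qual),   # "." means missing
        "FILTER": flt.split(";"),                       # "PASS" or failed filters, e.g. q10;s50
        "INFO": info_dict,
    }

# Hypothetical record, written only to exercise the parser.
record = parse_vcf_line("1\t12345\t.\tC\tT\t50\tPASS\tDP=40;AC=1;SOMATIC")
```

INFO values are left as strings here; a fuller parser would convert them using the types declared in the file header.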
Example of a Cancer Genome
Mutations Profile
• Circos Plot: how messed up a cancer genome is
Total alterations affecting protein-coding genes in selected tumors
Vogelstein et al, Science 2013
Somatic Mutation Frequency
in 3K Tumor-Normal Pairs
• Typical tumors: median 45 mutations / tumor
• More mutations in tumors of tissues exposed to the
outside environment (e.g. skin, lung)
Break
TS vs Oncogenes, GoF vs LoF
• Tumor suppressors vs oncogenes
• Gain of Function (GoF) or Loss of Function
(LoF) mutations
– Phenotypes
• How to tell?
– From mutation patterns
– From expression patterns
– Functional studies
• Some genes can be both TS and oncogenes
Mutation Rate Heterogeneity
• Mutation rate correlated with replication timing,
gene expression, and gene length
• Tumor evolution and selection
Lawrence et al, Nature 2013
Recurrent Mutations
• Known
• Novel, with clear cancer association
• Novel
Lawrence et al, Nature 2014
How Much Should We Sequence?
• Need ~200 patients for 20% mutation rate, ~550 pts for
10%, ~1200 pts for 5% mutation rate.
• Most driver mutations have been found, pressing need in
basic cancer research to study their function
• Biggest surprise: mutations on chromatin regulators
– > 50% new and strong cancer driver genes
– Oncogenes: DNMT3A, IDH1
– Tumor Suppressor: MLL, ATRX, ARID1A, SNF5
– Both: EZH2
• Sequencing metastasized or drug resistant tumors might
yield insights on tumor progression
Resources
• MSKCC cBioPortal
– Graphical interface for experimental biologists
• Broad Firehose
– API for accessing processed TCGA data
• UCSC CGHub
– API for accessing raw and processed cancer data
• Sanger COSMIC
– Catalog of Somatic Mutations in Cancer
• Many also provide software tools
Summary
• Different sequencing approaches
• Gene Expression, tumor sub-typing
• Survival analysis: KM vs Cox Regression
• Different mutation types and distributions
• Gain or loss of function mutations
• Tumor suppressor vs oncogenes
Acknowledgement
• Aleksandar Milosavljevic
• Kristin Sainani
• Linda Staub & Alexandros Gekenidis
• Yin Bun Cheung, Paul Yip
• John Pack
• Cheng Li
• Xujun Wang
• Peng Jiang