Transcript Slides

Molecular Biology of Cancer AND
Cancer Informatics (omics)
David Boone
Outline
Molecular biology of cancer
• What is cancer and how is it different from other diseases?
• Major players in tumor initiation (understand one before many)
– Oncogenes
– Tumor suppressors
• Types of mutations
Cancer informatics (omics)
• The many: informatics
– Arrays
– Sequencing
– How to read a heatmap
• Personalized medicine/cancer genomics WORKS
– Examples from breast cancer
• Summary
Goals
• Understand why cancer treatment is difficult.
• Learn how changes in the genome and/or expression can affect cell proliferation/death.
– Looking at a signaling schematic, be able to identify what a certain mutation might do.
• Define oncogenes and tumor suppressors.
• Define two ways of gathering ‘omics’ data.
• Be able to analyze heatmaps and Kaplan-Meier curves.
• Describe how informatics is impacting personalized medicine.
How is cancer different from a
bacterial infection?
• Cancer is a term used to describe 100s or 1000s of distinct neoplastic disorders caused by your own cells growing out of control.
[Figure: normal tissue, cancer, and metastasis]
Metastasis: spreading of cancer from one organ to another.
The many steps of metastasis occur through clonal selection and the accumulation of different mutations (Fidler 2003, Nature Reviews Cancer).
Structure = function
The Central Dogma of Molecular Biology: DNA → RNA → Protein
• Structure is very important for
– replication
– transcription/translation
• DNA carries the info to create proteins
• 23 chromosome pairs
• 3.2 billion base pairs
• ~20,000 protein-coding genes (coding sequence is ~1.5% of the genome)
Question
• If all of our cells (mostly) have identical DNA
sequences that are densely packed in nuclei
then how do we have the different types of
cells and organs necessary for human life?
Regulation of gene expression
[Figure: the central dogma. In RNA, T’s are replaced with U’s; the translation start and translation stop codons are marked.]
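The figure's two steps (transcription, then translation from a start codon to a stop codon) can be sketched in a few lines of Python. This is a minimal illustration, assuming the input is the coding strand of a made-up gene and ignoring splicing and regulation; the codon table is the standard genetic code written in its usual compact form.

```python
from itertools import product

# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... (bases in U, C, A, G order).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("UCAG", repeat=3), AMINO_ACIDS)}

def transcribe(coding_dna: str) -> str:
    """mRNA has the coding-strand sequence, with T's replaced by U's."""
    return coding_dna.upper().replace("T", "U")

def translate(mrna: str) -> str:
    """Translate from the first AUG (start codon) up to, not including, the first stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon: no protein
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # UAA, UAG, or UGA
            break
        protein.append(amino_acid)
    return "".join(protein)

dna = "GGATGAAAGCATTTTAAGC"  # made-up coding-strand sequence
mrna = transcribe(dna)
print(mrna)             # GGAUGAAAGCAUUUUAAGC
print(translate(mrna))  # MKAF
```

The leading "GG" and trailing "GC" stand in for untranslated sequence: translation only begins at AUG and ends at the UAA stop codon.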
Important genes in cancer
• Oncogenes: genes that encode proteins that have the potential to cause cancer.
– Like a gas pedal, they make cells divide more frequently or survive when they shouldn’t.
– Turned on by activating mutations, amplifications, and overexpression.
– ex. c-Myc, IGF1R, Ras (growth factors and their receptors, signaling molecules, transcription factors)
[Figure: signaling schematic ending in cell survival]
• IGF1R is an important oncogene in breast cancer (BC).
– Can you pick out any other potential oncogenes?
Important genes in cancer
• Tumor suppressors: genes that encode proteins that prevent tumor development.
– Like a brake pedal, they prevent proliferation and initiate cell death if there are problems like DNA damage. They keep proliferation and oncogenes in check.
– They stop the cell cycle, induce apoptosis, repair DNA, etc.
– Turned off by inactivating mutations, deletions, and lack of expression.
– ex. p53, BRCA1/2, PTEN, RB
Myc (oncogene) is one of the most frequent amplifications in BC.
p53 (tumor suppressor) is one of the most frequently mutated or lost genes in BC.
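The gas-pedal/brake-pedal analogy can be made concrete with a toy Boolean model of a signaling schematic, the kind of reasoning the goal "identify what a certain mutation might do" asks for. The pathway below (growth factor → IGF1R → Ras → Myc → proliferation, with p53 as the brake) is a hypothetical simplification for illustration only, not a real signaling map; the `Cell` class and every rule in it are assumptions.

```python
from dataclasses import dataclass

@dataclass
class Cell:
    """Toy cell state; every rule in this model is a simplified, hypothetical assumption."""
    growth_factor: bool = False            # external mitogenic signal (e.g. IGF-1 acting on IGF1R)
    ras_activating_mutation: bool = False  # oncogene stuck "on"
    p53_functional: bool = True            # tumor suppressor (the brake)
    dna_damage: bool = False

def proliferates(cell: Cell) -> bool:
    # Gas pedal: growth factor -> IGF1R -> Ras -> Myc -> proliferation.
    igf1r_active = cell.growth_factor
    # An activating Ras mutation turns Ras on even without an upstream signal.
    ras_active = igf1r_active or cell.ras_activating_mutation
    myc_active = ras_active
    # Brake pedal: functional p53 halts the cell cycle when DNA is damaged.
    brake_applied = cell.p53_functional and cell.dna_damage
    return myc_active and not brake_applied

print(proliferates(Cell()))                                     # False: no signal
print(proliferates(Cell(growth_factor=True)))                   # True: normal, signal-driven division
print(proliferates(Cell(ras_activating_mutation=True)))         # True: gas pedal stuck down
print(proliferates(Cell(growth_factor=True, dna_damage=True)))  # False: p53 applies the brake
print(proliferates(Cell(growth_factor=True, dna_damage=True, p53_functional=False)))  # True: brake lost
```

The last line shows why losing a tumor suppressor is dangerous even though it does not drive division by itself: a damaged cell that should have stopped keeps dividing.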
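Genetic alterations (not all mutations are equal)
• Point mutations
[Figure: a frameshift mutation shown at the DNA, RNA, and protein levels; labeled amino acids K (lysine) and A (alanine), and * (stop codon).]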
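Why not all point mutations are equal can be shown by translating a short, made-up gene before and after each kind of change. This sketch assumes the sequence starts at the start codon and reads the DNA codons directly (the mRNA would have U in place of T); the gene, the positions, and the helper names `substitute` and `insert` are hypothetical.

```python
from itertools import product

# Standard genetic code for DNA codons, ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)}

def translate(coding_dna: str) -> str:
    """Read codons in frame from position 0; '*' marks the stop codon where translation ends."""
    protein = []
    for i in range(0, len(coding_dna) - 2, 3):
        amino_acid = CODON_TABLE[coding_dna[i:i + 3]]
        protein.append(amino_acid)
        if amino_acid == "*":
            break
    return "".join(protein)

def substitute(seq: str, pos: int, base: str) -> str:
    """Point mutation: replace the single base at pos."""
    return seq[:pos] + base + seq[pos + 1:]

def insert(seq: str, pos: int, base: str) -> str:
    """Single-base insertion at pos: every downstream codon shifts."""
    return seq[:pos] + base + seq[pos:]

normal = "ATGAAAGCATTTGGCTAA"  # made-up gene: ATG AAA GCA TTT GGC TAA
variants = {
    "normal": normal,
    "silent (GCA->GCG)": substitute(normal, 8, "G"),
    "missense (AAA->GAA)": substitute(normal, 3, "G"),
    "nonsense (AAA->TAA)": substitute(normal, 3, "T"),
    "frameshift (+C)": insert(normal, 3, "C"),
}
for name, seq in variants.items():
    print(f"{name:22} {translate(seq)}")
# Expected: MKAFG*, MKAFG*, MEAFG*, M*, MQSIWL
```

The silent change leaves the protein intact, the missense change swaps one amino acid (K to E), the nonsense change truncates the protein, and the one-base insertion scrambles every downstream amino acid and runs past the original stop.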
Genomic alterations
• Point mutations
• Duplications or amplifications (figure: multiple copies of Myc)
• Deletions (figure: loss of p53)
• Insertions/inversions
• Translocations
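The larger alterations in this list can be pictured by treating a chromosome as an ordered list of gene segments. This is a minimal sketch, assuming toy chromosomes (`toy_a`, `toy_b`) whose gene order is invented; MYC and TP53 are used only as labels echoing the figure, and all function names are hypothetical.

```python
from typing import List, Tuple

Chromosome = List[str]  # a chromosome as an ordered list of (hypothetical) gene segments

def amplify(chrom: Chromosome, gene: str, extra_copies: int) -> Chromosome:
    """Duplication/amplification: extra tandem copies placed next to the gene."""
    i = chrom.index(gene)
    return chrom[:i + 1] + [gene] * extra_copies + chrom[i + 1:]

def delete(chrom: Chromosome, gene: str) -> Chromosome:
    """Deletion: the segment carrying the gene is lost."""
    i = chrom.index(gene)
    return chrom[:i] + chrom[i + 1:]

def insert(chrom: Chromosome, pos: int, segment: Chromosome) -> Chromosome:
    """Insertion: extra material is added at pos."""
    return chrom[:pos] + segment + chrom[pos:]

def invert(chrom: Chromosome, start: int, end: int) -> Chromosome:
    """Inversion: the segment chrom[start:end] is flipped end to end."""
    return chrom[:start] + chrom[start:end][::-1] + chrom[end:]

def translocate(a: Chromosome, b: Chromosome, break_a: int, break_b: int) -> Tuple[Chromosome, Chromosome]:
    """Reciprocal translocation: the segments beyond each breakpoint swap chromosomes."""
    return a[:break_a] + b[break_b:], b[:break_b] + a[break_a:]

toy_a = ["A", "B", "MYC", "C"]
toy_b = ["D", "TP53", "E", "F"]
print(amplify(toy_a, "MYC", 2))         # ['A', 'B', 'MYC', 'MYC', 'MYC', 'C']
print(delete(toy_b, "TP53"))            # ['D', 'E', 'F']
print(insert(toy_a, 1, ["X"]))          # ['A', 'X', 'B', 'MYC', 'C']
print(invert(toy_a, 0, 3))              # ['MYC', 'B', 'A', 'C']
print(translocate(toy_a, toy_b, 2, 1))  # (['A', 'B', 'TP53', 'E', 'F'], ['D', 'MYC', 'C'])
```

Note that a translocation can place a gene next to new neighbors without changing its sequence, which is one way its expression can change.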
Expression alterations outside of the
genome
• Overexpression
– ex. Myc (high mitogenic signaling results in high expression of unmutated Myc).
• Epigenetic silencing
– ex. methylation
Now we know a lot about individual
genes, but how do we study global
genetic or expression changes?
• Microarrays
– Hybridization-based (complementary base pairing)
– Comparative Genomic Hybridization (CGH) or SNP arrays for known DNA variants and copy number changes
– mRNA expression arrays for RNA expression
• Next-gen sequencing
– Sequence-based
– Can be used for DNA or RNA
– Can detect mutations, copy number changes, and expression differences
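Hybridization-based detection rests on complementary base pairing: a probe binds the strand whose sequence is its reverse complement. A minimal sketch, assuming an idealized exact-match rule (real array binding tolerates some mismatches and depends on conditions); the probe, target, and function names are made up for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Pair A<->T and C<->G, then reverse so the result also reads 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def hybridizes(probe: str, target: str) -> bool:
    """Idealized exact-match model: the probe binds if its reverse complement occurs in the target."""
    return reverse_complement(probe) in target.upper()

target = "GGCTAGCATTACCGA"               # made-up labeled sample sequence
probe = reverse_complement("CATTACC")   # probe designed against a known stretch
print(probe)                            # GGTAATG
print(hybridizes(probe, target))        # True
print(hybridizes("AAAAAAA", target))    # False
```

This is also why arrays only see what they were designed for: a sequence with no matching probe produces no signal.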
Microarrays
Used for:
1) Gene expression
2) Copy number changes
• Whole-transcript expression array: covers 28,869 well-annotated genes with 764,885 distinct probes.
• Affymetrix (affy) 6.0 SNP chip: 906,000 SNPs and ~1,000,000 copy-number probes.
ADVANTAGES:
• Relatively cheap
• Analyze all known genes
• Analysis is relatively easy
DISADVANTAGES:
• Cannot detect novel genes, mutations, or copy number changes
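Copy number from CGH or SNP arrays is commonly reported as a log2 ratio of tumor to reference signal; against a diploid reference that is log2(copies / 2), so an extra copy is positive and a lost copy is negative. A minimal sketch, assuming a pure tumor sample with no normal-cell contamination; the ±0.3 calling cutoff and the copy numbers are illustrative assumptions, not a standard.

```python
import math

def log2_ratio(tumor_copies: float, reference_copies: float = 2.0) -> float:
    """log2(tumor / reference) copy-number ratio; a diploid reference has 2 copies."""
    if tumor_copies == 0:
        return float("-inf")  # homozygous deletion in this idealized, noise-free model
    return math.log2(tumor_copies / reference_copies)

def call_copy_number(ratio: float, threshold: float = 0.3) -> str:
    """Classify a log2 ratio; the +/-0.3 cutoff is an illustrative assumption."""
    if ratio > threshold:
        return "gain/amplification"
    if ratio < -threshold:
        return "loss/deletion"
    return "neutral"

# Hypothetical copy numbers in a pure tumor sample.
for gene, copies in [("MYC", 8), ("TP53", 1), ("GAPDH", 2)]:
    ratio = log2_ratio(copies)
    print(f"{gene}: copy number {copies} -> log2 ratio {ratio:+.2f} ({call_copy_number(ratio)})")
```

With 8 copies of MYC the ratio is +2.00, losing one of two TP53 copies gives -1.00, and a normal diploid gene gives 0.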
Next-gen sequencing
• The Human Genome Project took ~13 years and cost ~3 billion dollars to sequence 3 billion bases.
• Now, in a few weeks and for less than a few thousand dollars, we can sequence a genome.
Advantages:
– No limitation on novel detections
– Simultaneously discover expression or copy number changes AND mutations
Disadvantages:
– Expensive
– Analysis is complex and difficult (but not for iBRIC scholars!)
The sequencing revolution
[Figure: Human Genome Project, 15 years and $3,000,000,000; now 45 genomes in 1 day for $45,000; the $1000 genome.]
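The figure's numbers support a quick back-of-the-envelope comparison. This uses only the slides' own rounded values (3 billion bases, $3 billion, 45 genomes for $45,000) and ignores that the Human Genome Project also had to build the reference itself, and that a per-genome run cost leaves out instruments and analysis.

```python
HGP_COST_USD = 3_000_000_000               # Human Genome Project cost (slide value)
GENOME_BASES = 3_000_000_000               # ~3 billion bases (slide rounding)
BATCH_COST_USD, BATCH_GENOMES = 45_000, 45  # 45 genomes in 1 day for $45,000

hgp_cost_per_base = HGP_COST_USD / GENOME_BASES           # dollars per base
modern_cost_per_genome = BATCH_COST_USD / BATCH_GENOMES   # dollars per genome
fold_reduction = HGP_COST_USD / modern_cost_per_genome    # how many times cheaper

print(f"HGP: ${hgp_cost_per_base:.2f} per base")         # HGP: $1.00 per base
print(f"Now: ${modern_cost_per_genome:,.0f} per genome")  # Now: $1,000 per genome
print(f"~{fold_reduction:,.0f}-fold cheaper")             # ~3,000,000-fold cheaper
```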
The Cancer Genome Atlas (TCGA)
• The Cancer Genome Atlas (TCGA) is a comprehensive and coordinated effort to accelerate the understanding of the molecular basis of cancer through the application of genome analysis technologies, including large-scale genome sequencing.
• 33 cancers
• 275 million dollars
• 11,000 patients
• 2,700 publications since it started in 2006
• RNA expression (RNAseq and microarray)
• Exome sequencing (mutation)
• Whole genome (for some)
• Copy number
• Methylation
• miRNA expression
ENCODE (Encyclopedia of DNA Elements): the human genome project of our generation
Build a comprehensive parts list of functional
elements in the human genome.
HOW DO WE MAKE SENSE OF IT
ALL?!?!
Pattern recognition/Logic Problems
[Figure: personalized medicine (image: Bayer)]
Breast cancer: IT WORKS!!
• Historically classified based on tumor size,
nodal involvement, invasion, histology, etc.
• More recently, genomics and transcriptomics have provided clues to what drives tumor initiation and progression, and they have the potential to divide patients into separate groups that might respond to different therapies.
Gene Expression Patterns Reveal Novel Breast Cancer Subtypes
Gene expression patterns of 85 experimental samples representing 78 carcinomas, three benign tumors, and four normal tissues, analyzed by hierarchical clustering using the 476 cDNA intrinsic clone set.
Finding Patterns!!!
[Figure: heatmap with genes as rows and patients as columns, colored from low to high expression; tumors cluster into basal-like, HER2+, normal-like, luminal A, and luminal B groups, with ER− and ER+ groups marked.]
Sørlie T et al. PNAS 2001;98:10869-10874
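A heatmap like this comes from two steps: center each gene so that color means "low" or "high" relative to that gene, then cluster samples by similarity so that similar tumors sit side by side. Below is a minimal pure-Python sketch of one common choice (average-linkage hierarchical clustering with 1 - Pearson correlation as the distance); it is not the exact software or settings used by Sørlie et al. The gene names are only labels and the expression values are made up.

```python
from itertools import combinations
from statistics import mean, median

def pearson(x, y):
    """Pearson correlation between two equal-length profiles."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def median_center(matrix):
    """Express each gene (row) relative to its own median: < 0 is 'low', > 0 is 'high'."""
    return [[v - median(row) for v in row] for row in matrix]

def average_linkage(profiles):
    """Agglomerative clustering with distance 1 - Pearson r; returns clusters in merge order."""
    clusters = {i: [i] for i in range(len(profiles))}

    def distance(c1, c2):
        # Average linkage: mean distance over all cross-cluster pairs.
        return mean(1 - pearson(profiles[i], profiles[j]) for i in c1 for j in c2)

    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda p: distance(clusters[p[0]], clusters[p[1]]))
        clusters[a] = clusters[a] + clusters.pop(b)  # merge the two closest clusters
        merges.append(sorted(clusters[a]))
    return merges

# Rows = genes, columns = 4 patients (made-up values).
genes = ["ESR1", "GATA3", "KRT5", "KRT17"]
expression = [
    [9, 8, 2, 1],
    [8, 9, 1, 2],
    [1, 2, 9, 8],
    [2, 1, 8, 9],
]
patients = [list(col) for col in zip(*expression)]  # one profile per patient
print(average_linkage(patients))     # [[0, 1], [2, 3], [0, 1, 2, 3]]
print(median_center(expression)[0])  # [4.0, 3.0, -3.0, -4.0]
```

Patients 0 and 1 merge first, then 2 and 3: the two groups are the "patterns" a heatmap makes visible. Clustering genes works the same way on the rows.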
Overall and relapse-free survival analysis of the 49 breast cancer patients, uniformly treated in a prospective study, based on different gene expression classifications.
[Figure: Kaplan-Meier survival curves by subtype]
Personalized medicine WORKS!!!!
Sørlie T et al. PNAS 2001;98:10869-10874
©2001 by National Academy of Sciences
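Survival curves like these are Kaplan-Meier estimates: at each time an event occurs, survival is multiplied by the fraction of patients still at risk who did not have the event, and censored patients (lost to follow-up or event-free at study end) leave the risk set without causing a drop. A minimal sketch of the estimator; the follow-up times are invented, and formally comparing two curves (e.g. with a log-rank test) is not shown.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate: S(t) = product over event times t_i <= t of (1 - d_i / n_i).

    times  -- follow-up time for each patient
    events -- 1 if the event (e.g. death or relapse) was observed, 0 if censored
    Returns (time, survival) pairs at each time where at least one event occurred.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)  # n_i: patients still followed just before time t_i
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        same_time = [e for tt, e in data[i:] if tt == t]  # all patients with this time
        deaths = sum(same_time)                            # d_i
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= len(same_time)  # events and censored patients both leave the risk set
        i += len(same_time)
    return curve

# Invented follow-up (years) for 6 patients; 0 = censored.
curve = kaplan_meier([2, 3, 3, 5, 7, 8], [1, 1, 0, 1, 0, 1])
print([(t, round(s, 3)) for t, s in curve])  # [(2, 0.833), (3, 0.667), (5, 0.444), (8, 0.0)]
```

Computing one curve per subtype and plotting them together gives the kind of figure shown above; a curve that stays higher means better survival in that group.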
Gene expression use in the clinic
• Mammaprint
– Approved by the FDA
– 70-gene signature for patients with node-negative, ER+ tumors
– Compared to conventional classification in ~300 patients, 87 would have been treated differently: 67 were classified high risk by conventional methods and given chemo, but were classified low risk by Mammaprint. After 10 years of follow-up, Mammaprint was more accurate.
• Oncotype
• PAM50
All are good for deciding chemo vs. endocrine therapy, but we still need classifications for ER− tumors.
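Subtype tests in the PAM50 style work, broadly, as nearest-centroid classifiers: each subtype has a reference expression profile (centroid), and a tumor is assigned to the subtype its profile correlates with best. A minimal sketch using Spearman rank correlation; the 4-gene centroids, the patient profile, and all function names are invented for illustration and are NOT the real PAM50 genes, centroid values, or implementation.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def nearest_centroid(profile, centroids):
    """Assign the subtype whose centroid is most correlated with the tumor's profile."""
    scores = {name: spearman(profile, centroid) for name, centroid in centroids.items()}
    return max(scores, key=scores.get), scores

# Invented 4-gene centroids; NOT real PAM50 values.
centroids = {
    "luminal-like": [2.0, 1.5, -1.0, -1.5],
    "basal-like": [-2.0, -1.5, 1.5, 2.0],
    "HER2-like": [-0.5, 0.0, 2.5, -1.0],
}
subtype, scores = nearest_centroid([1.8, 1.0, -0.5, -2.0], centroids)
print(subtype)  # luminal-like
```

Using ranks rather than raw values makes the assignment less sensitive to overall shifts in expression between labs or platforms.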
Summary
• Cancer is different from other diseases because it is really 100s or 1000s of diseases and is your own cells gone bad.
• Oncogenes
– cause cancer
– hyperactive
– mutation, amplification
• Tumor suppressors: prevent cancer
– prevent cell cycle progression
– induce cell death
– DNA repair
– lost or turned off
– mutation, deletion, methylation
• Different types of alterations
– somatic mutation
– amplification
– deletion
– methylation
• Global analysis (omics)
– Microarray
– Sequencing
• Personalized medicine
– Biomarker
– How to read a heatmap and KM curve
Sample preparation
• Lyse cells
– WGS: DNA
– ChIPseq: DNA (histone modifications/TF binding)
– RNAseq: RNA → cDNA
Library preparation (DNA or cDNA)
Cluster generation and sequencing (Illumina sequencing by synthesis)
Data analysis
• RNAseq additionally may require 1) transcript assembly and 2) estimation of abundance.
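Estimation of abundance usually starts by converting raw read counts into a unit that corrects for transcript length (longer transcripts collect more reads) and sequencing depth. One common unit is transcripts per million (TPM). A minimal sketch with made-up counts and lengths; real quantification tools also correct for effective transcript length and for reads compatible with several isoforms, which this omits.

```python
def tpm(counts, lengths_bp):
    """Transcripts per million: normalize each count by transcript length, then by the total."""
    rates = [c / (length / 1000) for c, length in zip(counts, lengths_bp)]  # reads per kilobase
    total = sum(rates)
    return [r * 1e6 / total for r in rates]  # TPM values always sum to one million

counts = [100, 300, 600]       # made-up read counts for three transcripts
lengths = [1000, 3000, 2000]   # made-up transcript lengths (bp)
print(tpm(counts, lengths))    # [200000.0, 200000.0, 600000.0]
```

The first two transcripts get the same TPM despite a 3-fold difference in reads, because the second is 3 times longer.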
Problem
• After performing RNAseq analysis on a
matched pair of tumor and normal tissue from
a single patient you find that a novel gene is
expressed 4 times higher in the tumor sample.
Your collaborator concludes that this gene is transcribed at higher rates in tumors.
– What are alternative interpretations? Think of
caveats to RNAseq.
RNAseq caveats
• RNAseq is a steady-state measurement.
– RNA abundance is the result of both transcription and degradation.
• RNAseq measures the average expression in the population.
– It doesn’t tell you expression in individual cells; instead, it shows expression in the pool of cells, which may include different cell types.
• RNA is temporally and spatially regulated.
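The first two caveats can be made concrete for the 4-fold result in the problem above. Assuming a simple one-compartment model (an assumption for illustration): RNA is made at rate k_syn and each molecule decays at rate k_deg, so dR/dt = k_syn - k_deg·R and, at steady state, R = k_syn / k_deg. A 4-fold higher level therefore fits 4-fold faster transcription or 4-fold slower degradation equally well. Separately, bulk RNAseq reports a weighted average over cell types, so a change in tissue composition alone can produce the same 4-fold difference. All rates, fractions, and levels below are made up.

```python
def steady_state(k_syn: float, k_deg: float) -> float:
    """Steady state of dR/dt = k_syn - k_deg * R, i.e. R* = k_syn / k_deg."""
    return k_syn / k_deg

def bulk_expression(fractions, per_cell_type_levels):
    """Bulk RNAseq reports a weighted average over the cell types in the sample."""
    return sum(f * level for f, level in zip(fractions, per_cell_type_levels))

# Caveat 1: the same 4-fold change from two different mechanisms (made-up rates).
baseline = steady_state(k_syn=10.0, k_deg=1.0)
more_transcription = steady_state(k_syn=40.0, k_deg=1.0)
slower_decay = steady_state(k_syn=10.0, k_deg=0.25)
print(more_transcription / baseline, slower_decay / baseline)  # 4.0 4.0

# Caveat 2: per-cell expression is identical; only the cell-type mix differs.
# Cell type 1 does not express the gene; cell type 2 expresses it at level 10.
levels = [0.0, 10.0]
normal_tissue = bulk_expression([0.9, 0.1], levels)
tumor_tissue = bulk_expression([0.6, 0.4], levels)
print(tumor_tissue / normal_tissue)  # 4.0
```

None of these three scenarios requires any tumor cell to transcribe the gene faster, which is why the collaborator's conclusion needs more evidence.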
Other sequencing caveats: aligning to the ‘reference genome’
• 38th main alignment
• 20th main release
• Still have regions of unknown location
Other sequencing caveats: repetitive regions and shared exons
• Up to 50% of the human genome is non-unique.
• This is a problem for aligning.
• Many isoforms share exons.
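Why repeats are a problem for aligning can be shown with a naive exact-match search: a read from a repeated segment matches several places equally well, so its true origin is ambiguous. The same ambiguity arises for RNAseq reads that fall in an exon shared by several isoforms. A minimal sketch, assuming a made-up reference and exact matching only (real aligners use genome indexes and tolerate mismatches); the function names are hypothetical.

```python
from typing import List

def exact_hits(read: str, reference: str) -> List[int]:
    """All 0-based positions where the read matches the reference exactly (naive scan)."""
    k = len(read)
    return [i for i in range(len(reference) - k + 1) if reference[i:i + k] == read]

def mapping_status(read: str, reference: str) -> str:
    hits = exact_hits(read, reference)
    if not hits:
        return "unmapped"
    return "unique" if len(hits) == 1 else f"multi-mapped ({len(hits)} locations)"

# Made-up reference in which the segment GATTACA occurs twice (a repeat).
repeat = "GATTACA"
reference = "CCGTA" + repeat + "TTGCC" + repeat + "AGG"
print(exact_hits(repeat, reference))        # [5, 17]
print(mapping_status(repeat, reference))    # multi-mapped (2 locations)
print(mapping_status("CATTGC", reference))  # unique
```

Longer reads help: a read that spans from a repeat into unique flanking sequence, like "CATTGC" here, can be placed unambiguously.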