ppt - people.vcu.edu


Scenario 6
Distinguishing different types of leukemia to target treatment
Acute Myeloid Leukemia (AML) vs. Acute Lymphoblastic Leukemia (ALL)
Golub, T. R., et al. 1999. Molecular classification of
cancer: class discovery and class prediction by gene
expression monitoring. Science 286:531-7.
[Figure slides: AML and ALL panels with items numbered 1 to 5, ending with a combined AML + ALL panel]
Spotted Microarray Process
[Figure: control (CTRL) and test (TEST) samples]
Microarray Platforms
• Spotted arrays
  – Inserts from cDNA libraries, PCR products, or oligonucleotides
  – Probed with labeled RNA or cDNA from 2 samples
• Affymetrix GeneChip arrays
  – 25-mer oligonucleotides synthesized on a glass wafer
  – Probed with labeled RNA or cDNA from a single sample
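Affymetrix Synthesis of Ordered Oligonucleotide Arrays
[Figure: light shone through a mask deprotects selected sites on the substrate; a nucleotide (T) couples only to the deprotected (HO) sites; a second mask exposes a different set of sites for coupling the next nucleotide (C); the cycle repeats to build probes such as CATAT, AGCTG, and TTCCG]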
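The mask-and-light cycle can be sketched as a toy simulation in Python. This is an illustrative model under stated assumptions, not Affymetrix's actual mask design: nucleotides are offered in a fixed A, C, G, T order (an assumption), each step's mask exposes exactly the sites whose next needed base is the one being offered, and the function name `synthesize` is hypothetical. The example probes are the ones shown in the figure.

```python
from typing import Dict, List, Set, Tuple

# Assumed fixed order in which nucleotides are offered (hypothetical).
BASES = "ACGT"


def synthesize(targets: Dict[str, str]) -> Tuple[Dict[str, str], List[Set[str]]]:
    """Toy model of light-directed, mask-based oligonucleotide synthesis.

    `targets` maps each probe site to the sequence it should carry.
    Returns the sequence built at each site and the list of masks, where a
    mask is the set of sites exposed to light (deprotected) in one step.
    """
    built = {site: "" for site in targets}
    masks: List[Set[str]] = []

    def unfinished(site: str) -> bool:
        return len(built[site]) < len(targets[site])

    while any(unfinished(s) for s in targets):
        for base in BASES:
            # The mask lets light through only where `base` is the next base needed.
            exposed = {s for s in targets
                       if unfinished(s) and targets[s][len(built[s])] == base}
            if not exposed:
                continue  # skip a mask that would expose nothing
            masks.append(exposed)
            # Light removes the protecting group; the offered base couples only there.
            for s in exposed:
                built[s] += base
    return built, masks


built, masks = synthesize({"site1": "CATAT", "site2": "AGCTG", "site3": "TTCCG"})
print(built)       # each site now carries its target probe
print(len(masks))  # number of mask/light steps used
```

Because each pass over A, C, G, T extends every unfinished probe by at least one base, 25-mers need at most 4 × 25 = 100 mask steps under this scheme, however many different probes the array carries.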
Affymetrix GeneChip® Probe Array
Affymetrix GeneChip® Probe Arrays
[Figure: a 1.28 cm GeneChip® probe array divided into 24 µm probe cells; in a hybridized probe cell, single-stranded, fluorescently labeled DNA target (*) is bound to the oligonucleotide probes]
Each probe cell or feature contains
millions of copies of a specific
oligonucleotide probe
Over 250,000 different probes
complementary to genetic
information of interest
Image of Hybridized Probe Array
[Figure: BGT108_DukeUniv]
Affymetrix Probe Tiling Strategy
The presence or absence of each gene is determined by a panel of 20 perfect-match and 20 mismatch (control) oligonucleotides (25-mers).
Sample output:
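A probe pair's design and a simple per-gene summary can be sketched in Python. Assumptions: each mismatch (MM) probe is its perfect-match (PM) 25-mer with the central (13th) base replaced by its complement, and a gene's signal is summarized as the mean PM - MM difference over its probe pairs, in the spirit of an "average difference" value. The helper names are hypothetical, and the vendor software's actual presence/absence calls use further statistics not modeled here.

```python
from statistics import mean
from typing import Sequence

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def mismatch_probe(pm: str) -> str:
    """Return the MM partner of a 25-mer PM probe (assumed: central base complemented)."""
    if len(pm) != 25:
        raise ValueError("expected a 25-mer probe")
    mid = len(pm) // 2  # index 12, i.e. the 13th base
    return pm[:mid] + COMPLEMENT[pm[mid]] + pm[mid + 1:]


def average_difference(pm: Sequence[float], mm: Sequence[float]) -> float:
    """Mean of PM - MM over a gene's probe pairs (e.g. 20 pairs).

    The MM signal estimates non-specific hybridization, so PM - MM
    approximates the gene-specific signal.
    """
    if len(pm) != len(mm):
        raise ValueError("PM and MM intensities must be paired")
    return mean(p - m for p, m in zip(pm, mm))


pm_probe = "ACGTACGTACGTACGTACGTACGTA"
print(mismatch_probe(pm_probe))
print(average_difference([120.0, 340.0, 95.0], [60.0, 140.0, 85.0]))
```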
Data Analysis
[Figure: scatter plot of Sample 2 vs. Sample 1 signal (light units), built up over several slides]
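In a plot of Sample 2 against Sample 1, genes expressed equally in both samples fall on the diagonal, and distance from the diagonal can be measured as a log ratio. A minimal sketch with NumPy; the 2-fold cutoff and the floor applied to low or negative intensities are illustrative assumptions, not values from the slides, and the function names are hypothetical.

```python
import numpy as np


def log2_ratios(sample1, sample2, floor: float = 1.0) -> np.ndarray:
    """Per-gene log2(sample2 / sample1).

    Intensities below `floor` (including negative background-corrected values)
    are raised to `floor` so the ratio and logarithm stay defined.
    """
    s1 = np.maximum(np.asarray(sample1, dtype=float), floor)
    s2 = np.maximum(np.asarray(sample2, dtype=float), floor)
    return np.log2(s2 / s1)


def off_diagonal(sample1, sample2, fold: float = 2.0) -> np.ndarray:
    """Indices of genes whose signals differ by at least `fold` in either direction."""
    r = log2_ratios(sample1, sample2)
    return np.flatnonzero(np.abs(r) >= np.log2(fold))


s1 = [100.0, 100.0, 100.0, 5.0]
s2 = [100.0, 250.0, 40.0, 5.0]
print(off_diagonal(s1, s2))
```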
http://www-genome.wi.mit.edu/cgi-bin/cancer/datasets.cgi
Near the bottom of the page, the entry “Molecular classification of cancer: class discovery and class prediction by gene expression monitoring” provides the paper, data tables, and supplemental figures.
• Measured the expression of 6817 human genes using
Affymetrix arrays.
• Initially examined 27 ALL and 11 AML samples. Each
ALL or AML specimen was used to prepare labeled RNA
that was apparently hybridized with a single chip.
• “Samples were subjected to a priori quality control
standards regarding the amount of labeled RNA and the
quality of the scanned microarray image.” Eight of 80
leukemia samples were discarded.
• The signal strength from each chip was apparently
normalized to that of the other chips by multiplying every
value in the chip by the multiplication factor listed in the
“Rescaling factors” table on the web.
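The rescaling step amounts to one multiplication per chip. A minimal NumPy sketch, assuming expression values are stored as a genes × chips array and the factors are a vector with one entry per chip (in practice they would be read from the "Rescaling factors" table). The helper `mean_matching_factors`, which derives factors by matching each chip's mean to a reference chip, is a hypothetical illustration of how such factors could arise, not the authors' documented procedure.

```python
import numpy as np


def rescale(expr, factors) -> np.ndarray:
    """Multiply every value on chip j (column j of a genes x chips array) by factors[j]."""
    expr = np.asarray(expr, dtype=float)
    factors = np.asarray(factors, dtype=float)
    if expr.shape[1] != factors.shape[0]:
        raise ValueError("need exactly one rescaling factor per chip")
    return expr * factors[np.newaxis, :]  # broadcast each factor down its column


def mean_matching_factors(expr, reference: int = 0) -> np.ndarray:
    """Hypothetical: factors that give every chip the same mean as the reference chip."""
    chip_means = np.asarray(expr, dtype=float).mean(axis=0)
    return chip_means[reference] / chip_means


expr = np.array([[1.0, 2.0, 4.0],
                 [3.0, 6.0, 12.0]])   # 2 genes x 3 chips
factors = mean_matching_factors(expr)
print(factors)
print(rescale(expr, factors))
```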
Experimental Goals
1. “Class Prediction”: Determine whether an unknown sample belongs to a predefined class.
   – Find a set of genes whose expression is high in AML and low in ALL, or vice versa.
   – Measure the expression of these genes in unknown samples and use the measurements as a class predictor.
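P(g,c) = a correlation measure indicating the degree to which expression of a given gene in the set of samples correlates with assignment to either class (AML or ALL):

$$P(g,c) = \frac{m_1(g) - m_2(g)}{s_1(g) + s_2(g)} \quad \text{or} \quad \frac{m_2(g) - m_1(g)}{s_1(g) + s_2(g)}$$

where $m_1(g)$, $s_1(g)$ and $m_2(g)$, $s_2(g)$ are the mean and standard deviation of the expression of gene $g$ in class 1 and class 2. The measure is a signal-to-noise ratio rather than a Pearson correlation coefficient: the two forms differ only in sign, which shows the class with the higher mean, and a large magnitude indicates a gene strongly correlated with the class distinction.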
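The statistic above can be computed for every gene at once. A NumPy sketch, assuming a genes × samples array and a Boolean label per sample (True for class 1, here AML). Using the sample standard deviation (`ddof=1`) and selecting an equal number of top genes in each direction (25 + 25, chosen to match the 50 genes of Figure 3b) are assumptions of this sketch, and the function names are hypothetical.

```python
import numpy as np


def signal_to_noise(expr, is_class1) -> np.ndarray:
    """P(g,c) = (m1(g) - m2(g)) / (s1(g) + s2(g)) for every gene (row).

    `expr` is genes x samples; `is_class1` holds one Boolean per sample.
    Genes with zero spread in both classes give inf or nan.
    """
    expr = np.asarray(expr, dtype=float)
    is_class1 = np.asarray(is_class1, dtype=bool)
    c1, c2 = expr[:, is_class1], expr[:, ~is_class1]
    m1, m2 = c1.mean(axis=1), c2.mean(axis=1)
    s1, s2 = c1.std(axis=1, ddof=1), c2.std(axis=1, ddof=1)
    return (m1 - m2) / (s1 + s2)


def informative_genes(p, n_per_class: int = 25):
    """Row indices of the n_per_class largest and n_per_class smallest P(g,c)."""
    order = np.argsort(p)
    return order[::-1][:n_per_class], order[:n_per_class]


expr = [[1.0, 2.0, 3.0, 10.0, 11.0, 12.0],   # higher in class 2
        [5.0, 5.0, 6.0, 5.0, 6.0, 5.0]]      # no class difference
is_aml = [True, True, True, False, False, False]
p = signal_to_noise(expr, is_aml)
print(p)
```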
Figure 2. Neighborhood analysis: ALL vs. AML. For the 38 leukemia samples in the initial dataset, the plot shows the number of genes within various 'neighborhoods' of the ALL/AML class distinction, together with curves showing the 5% and 1% significance levels for the number of genes within corresponding neighborhoods of randomly permuted class distinctions (see notes 16 and 17 in the paper). Genes more highly expressed in ALL than in AML are shown in the left panel; those more highly expressed in AML than in ALL are shown in the right panel. Note the large number of genes highly correlated with the class distinction. In the left panel (higher in ALL), the number of genes with correlation P(g,c) > 0.30 was 709 for the AML-ALL distinction, but the median was 173 genes for random class distinctions. P(g,c) = 0.30 is the point where the observed data intersect the 1% significance level, meaning that only 1% of random neighborhoods contain as many points as the observed neighborhood around the AML-ALL distinction. Similarly, in the right panel (higher in AML), 711 genes with P(g,c) > 0.28 were observed, whereas a median of 136 genes is expected for random class distinctions.
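The logic of the neighborhood analysis, counting genes above a P(g,c) threshold for the real class labels and comparing that count with counts for randomly permuted labels, can be sketched as follows. This is a simplified, one-sided version (genes higher in class 1 only) on synthetic data; the threshold, the number of permutations, and the random seed are assumptions, and notes 16 and 17 of the paper describe the actual procedure.

```python
import numpy as np


def snr(expr: np.ndarray, is_class1: np.ndarray) -> np.ndarray:
    """P(g,c) = (m1 - m2) / (s1 + s2) for each gene (row)."""
    c1, c2 = expr[:, is_class1], expr[:, ~is_class1]
    return (c1.mean(axis=1) - c2.mean(axis=1)) / (
        c1.std(axis=1, ddof=1) + c2.std(axis=1, ddof=1))


def neighborhood_test(expr, is_class1, r: float = 0.30, n_perm: int = 1000, seed: int = 0):
    """Compare #genes with P(g,c) > r for the real labels against permuted labels.

    Returns (observed count, median count under permutation,
    fraction of permutations with at least as many genes).
    """
    expr = np.asarray(expr, dtype=float)
    is_class1 = np.asarray(is_class1, dtype=bool)
    rng = np.random.default_rng(seed)
    observed = int((snr(expr, is_class1) > r).sum())
    null = np.array([(snr(expr, rng.permutation(is_class1)) > r).sum()
                     for _ in range(n_perm)])
    return observed, float(np.median(null)), float((null >= observed).mean())


# Synthetic example: 200 genes x 20 samples, 50 genes shifted up in class 1.
rng = np.random.default_rng(1)
labels = np.array([True] * 10 + [False] * 10)
data = rng.normal(size=(200, 20))
data[:50, labels] += 3.0
print(neighborhood_test(data, labels))
```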
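Votes are cast in favor of either AML or ALL for each informative gene. The magnitude of each vote is given by $w_i v_i$, where

$$v_i = x_i - \frac{m_{AML} + m_{ALL}}{2}$$

$x_i$ is the expression of gene $i$ in the sample being classified, $m_{AML}$ and $m_{ALL}$ are that gene's mean expression in the AML and ALL training samples, and $w_i$ is a weighting factor that reflects how well the gene is correlated with the class distinction. The sign of $w_i v_i$ determines which class receives the vote.
The class with the larger total vote wins (either ALL or AML). The prediction strength is

$$PS = \frac{V_{win} - V_{lose}}{V_{win} + V_{lose}}$$

where $V_{win}$ and $V_{lose}$ are the vote totals for the winning and losing classes; a prediction is accepted only if PS > 0.3.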
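The voting scheme and prediction strength can be sketched for a single unknown sample. Assumptions of this sketch: the weights are signed values of P(g,c) oriented as AML minus ALL, so a positive vote counts toward AML and a negative one toward ALL; ties are reported as no call; the function names are hypothetical.

```python
import numpy as np


def weighted_vote(x, m_aml, m_all, w):
    """Return (winning class, prediction strength) for one sample.

    All arguments are arrays over the informative genes: x is the sample's
    expression, m_aml / m_all are the class means from the training samples,
    and w is assumed to be P(g,c) computed as AML minus ALL.
    """
    x, m_aml, m_all, w = (np.asarray(a, dtype=float) for a in (x, m_aml, m_all, w))
    votes = w * (x - (m_aml + m_all) / 2)  # w_i * v_i for each gene
    v_aml = votes[votes > 0].sum()         # total weight of votes for AML
    v_all = -votes[votes < 0].sum()        # total weight of votes for ALL
    if v_aml == v_all:
        return None, 0.0                   # tie (including no votes at all)
    v_win, v_lose = max(v_aml, v_all), min(v_aml, v_all)
    winner = "AML" if v_aml > v_all else "ALL"
    return winner, float((v_win - v_lose) / (v_win + v_lose))


def predict(x, m_aml, m_all, w, threshold: float = 0.3):
    """Accept the winning class only when PS exceeds the threshold."""
    winner, ps = weighted_vote(x, m_aml, m_all, w)
    return (winner if ps > threshold else "uncertain"), ps


m_aml, m_all, w = [10.0, 10.0], [0.0, 0.0], [1.0, 1.0]
print(predict([9.0, 8.0], m_aml, m_all, w))   # both genes vote AML
print(predict([5.5, 4.6], m_aml, m_all, w))   # split vote, low PS
```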
Figure 3b. Genes distinguishing ALL from AML. The 50 genes most highly correlated with the ALL/AML class
distinction are shown. Each row corresponds to a gene, with the columns corresponding to expression levels in
different samples. Expression levels for each gene are normalized across the samples such that the mean is 0 and the
standard deviation is 1. Expression levels greater than the mean are shaded in red, and those below the mean are
shaded in blue. The scale indicates standard deviations above or below the mean. The top panel shows genes more highly
expressed in ALL; the bottom panel shows genes more highly expressed in AML. Note that while these genes as a
group appear correlated with class, no single gene is uniformly expressed across the class, illustrating the value of a
multi-gene prediction method.
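The per-gene normalization used for the heat map (mean 0, standard deviation 1 across samples) can be sketched with NumPy. The red/blue mapping is a plain-text stand-in for the figure's color scale, and the function names and the population standard deviation (`ddof=0`) are assumptions.

```python
import numpy as np


def standardize_rows(expr, ddof: int = 0) -> np.ndarray:
    """Scale each gene (row) to mean 0 and standard deviation 1 across samples."""
    expr = np.asarray(expr, dtype=float)
    mean = expr.mean(axis=1, keepdims=True)
    sd = expr.std(axis=1, ddof=ddof, keepdims=True)
    return (expr - mean) / sd


def shade(z: float) -> str:
    """Red above the gene's mean, blue below it (white exactly at the mean)."""
    return "red" if z > 0 else "blue" if z < 0 else "white"


z = standardize_rows([[1.0, 2.0, 3.0],
                      [10.0, 10.0, 16.0]])
print(np.round(z, 2))
print([[shade(v) for v in row] for row in z])
```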
Supplementary fig. 2. Expression levels of predictive genes in independent dataset. The expression levels of the 50
genes most highly correlated with the ALL-AML distinction in the initial dataset were determined in the independent
dataset. Each row corresponds to a gene, with the columns corresponding to expression levels in different samples.
The expression level of each gene in the independent dataset is shown relative to the mean of expression levels for
that gene in the initial dataset. Expression levels greater than the mean are shaded in red, and those below the mean
are shaded in blue. The scale indicates standard deviations above or below the mean. The top panel shows genes
more highly expressed in ALL; the bottom panel shows genes more highly expressed in AML.
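For the independent dataset, each gene is scaled against the initial dataset rather than against itself, so expression in new samples is read on the scale learned from the initial samples. A minimal sketch: the caption states that values are shown relative to the initial dataset's mean, and taking the standard deviation from the initial dataset as well is an assumption based on the caption's standard-deviation scale. The function name is hypothetical.

```python
import numpy as np


def standardize_against(reference, new, ddof: int = 0) -> np.ndarray:
    """Express each gene in `new` as standard deviations from its mean in `reference`.

    Both arrays are genes x samples with the same genes in the same row order;
    only the reference (initial) dataset supplies the mean and SD.
    """
    ref = np.asarray(reference, dtype=float)
    mean = ref.mean(axis=1, keepdims=True)
    sd = ref.std(axis=1, ddof=ddof, keepdims=True)
    return (np.asarray(new, dtype=float) - mean) / sd


initial = [[0.0, 2.0]]            # one gene, two initial samples: mean 1, SD 1
independent = [[1.0, 3.0, -1.0]]  # the same gene in three new samples
print(standardize_against(initial, independent))
```

Reusing the initial dataset's statistics means the independent samples do not influence their own scaling.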
Experimental Goals
1. “Class Prediction”
2. “Class Discovery”: Determine whether a group of samples can be divided into two or more classes based only on measurement of their gene expression.
   – Employs “self-organizing maps.”
   – Must address two requirements: constructing algorithms that cluster the samples by gene expression, and determining whether the class assignments produced by the algorithm are meaningful.