
Classification and Diagnostic
Prediction of Cancers using Gene
Expression Profiling and Artificial
Neural Networks
JAVED KHAN ET AL.
NATURE MEDICINE – Volume 7 – Number 6 – JUNE 2001
The Small, Round Blue Cell
Tumors (SRBCTs) of Childhood




Four categories – neuroblastoma (NB), rhabdomyosarcoma (RMS), non-Hodgkin lymphoma (NHL; represented here by Burkitt lymphoma, BL) and the Ewing family of tumors (EWS).
Similar in appearance on routine histology.
However, accurate diagnosis is essential, as treatment options, responses to therapy and prognoses vary.
No single test can precisely distinguish SRBCTs – diagnosis relies on a combination of immunohistochemistry, cytogenetics, interphase fluorescence in situ hybridization and reverse transcription PCR.
Gene Expression Profiling
using cDNA Microarrays.




Microarrays measure the activities of several
thousand genes simultaneously.
They can be used for cancer classification.
This can improve therapeutic decisions for
cancer patients by diagnosing cancer types
with improved accuracy,
even for cancers belonging to several
diagnostic categories, such as the SRBCTs.
Artificial Neural Networks
(ANNs) – put to the task.




Modeled on the structure and behavior of neurons in
the human brain.
Can be trained to recognize and categorize complex
patterns.
Pattern recognition is achieved by adjusting the
weights of the ANN through a process of error
minimization, i.e. learning from experience.
ANNs were applied to decipher gene-expression
signatures of SRBCTs and then used for diagnostic
classification.
Error Minimization
Mean squared error: E = (1/N) Σi (ti − oi)², where ti is the target and oi the network output for sample i.
Summed square error: E = ½ Σi (ti − oi)².
Network Architecture and
Parameters







Due to the limited amount of calibration data and the fact that four
output nodes are needed, the network architecture was limited to
linear perceptrons.
10 input nodes were used, representing the 10 PCA components
described later on.
4 output nodes, modeled by the sigmoid function.
Calibration is performed using JETNET, with learning rate η = 0.7 and
momentum coefficient α = 0.3.
The learning rate is decreased by a factor of 0.99 after each
iteration.
Initial weight values are chosen randomly from [−r, r], where r =
0.1/max(Fi) and Fi is the number of nodes connecting to node i.
Weight values are updated after every 10 samples (see the sketch below).
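
A minimal sketch of this architecture in Python with NumPy (a hypothetical reconstruction for illustration, not the original JETNET code; the bias term is omitted for brevity):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(0)

    n_inputs, n_outputs = 10, 4            # 10 PCA components in, 4 tumor categories out
    fan_in = n_inputs                      # Fi: nodes connecting to each output node
    r = 0.1 / fan_in                       # initial weights drawn from [-r, r]
    W = rng.uniform(-r, r, size=(n_outputs, n_inputs))

    def forward(x):
        # Linear perceptron: a weighted sum of the inputs, squashed by a sigmoid
        return sigmoid(W @ x)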
Back-propagation



Minimizing by gradient descent is the least sophisticated method, but in
many cases a sufficient one.
It amounts to updating the weights according to the back-propagation
learning rule:
Δwij = −η ∂Et/∂wij (the delta rule), where η is the learning rate.
The partial derivative ∂Et/∂wij represents a sensitivity factor, determining the
direction of search in weight space for the synaptic weights.
….continue

A momentum term is often added to stabilize the learning:
Δw(t) = −η ∂Et/∂w + α Δw(t−1), where α < 1 is the momentum coefficient.
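
A sketch of one training epoch under these rules, assuming a sigmoid perceptron trained on the summed square error (`grad_sse` and `train_epoch` are illustrative names, not the paper's code):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def grad_sse(W, X, T):
        # dE/dW for E = 1/2 * sum (o - t)^2 with o = sigmoid(W @ x):
        # the local error delta_j = (o_j - t_j) * o_j * (1 - o_j)
        O = sigmoid(X @ W.T)               # outputs, shape (n_samples, 4)
        delta = (O - T) * O * (1.0 - O)
        return delta.T @ X                 # gradient, shape (4, 10)

    def train_epoch(W, dW_prev, eta, X, T, alpha=0.3, batch=10):
        """One epoch: weights updated after every 10 samples, with momentum."""
        for s in range(0, len(X), batch):
            dW = -eta * grad_sse(W, X[s:s+batch], T[s:s+batch]) + alpha * dW_prev
            W, dW_prev = W + dW, dW
        return W, dW_prev, eta * 0.99      # learning rate decays by 0.99 per epoch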
Calibration and validation of
the ANN Models.





cDNA microarrays containing 6567 genes:
63 training samples, comprised of 13 EWS and 10
RMS from tumor biopsies and 10 EWS, 10 RMS, 12
NB and 8 BL from cell lines.
25 test samples, comprised of 5 EWS, 5 RMS and 4 NB
from tumors and 1 EWS, 2 NB and 3 BL from cell lines,
plus 5 non-SRBCT samples (to test the ability to
reject a diagnosis).
Filtering for a minimal level of expression
reduced the number of genes to 2308.
Principal Component Analysis (PCA) further reduced
dimensionality (a sketch follows below).
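
A sketch of this reduction step using scikit-learn's PCA, assuming the expression matrix has one row per sample (the library choice is an assumption, not the authors'):

    import numpy as np
    from sklearn.decomposition import PCA

    X = np.random.rand(63, 2308)       # placeholder: 63 samples x 2308 filtered genes

    pca = PCA(n_components=10)         # keep the 10 dominant components
    X_reduced = pca.fit_transform(X)   # (63, 10) inputs for the ANN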
….continue




10 dominant PCA components per sample were
used as inputs….
and four outputs – (EWS, RMS, NB, BL).
A three-fold cross-validation procedure was used
and 3750 ANN models were produced (Figure 1).
There was no sign of “over-training” of the models, which would
show up as a rise in the summed square error for the
validation set with increasing iterations (epochs); see Figure 2.
The Artificial Neural Network
1. Quality filtering.
2. PCA.
3. 25 test samples are set aside and the 63 training
samples are randomly partitioned into 3 groups.
4. One group is reserved for validation and the
other two are used for calibration.
5. For each model the calibration was optimized
with 100 iterative cycles (epochs).
6. This was repeated using each of the three
groups for validation.
7. The samples were again randomly partitioned
and the entire training process repeated. For
each selection of a validation group one
model was calibrated, resulting in a total of
3750 trained models (a sketch of steps 3–7 follows below).
8. Once the models were calibrated, they were
used to rank the genes according to their
importance for classification.
9. The entire process was repeated using only the
top-ranked genes.
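
A skeleton of steps 3–7 in Python; `calibrate` is a hypothetical stand-in for the 100-epoch training routine described earlier:

    import numpy as np

    def cross_validate(X, y, calibrate, n_shuffles=1250, seed=0):
        """Steps 3-7: repeated random 3-fold partitioning, one model per fold."""
        rng = np.random.default_rng(seed)
        models = []
        for _ in range(n_shuffles):              # 1250 random repartitionings
            folds = np.array_split(rng.permutation(len(X)), 3)
            for k in range(3):                   # each group serves once as validation
                cal = np.concatenate([folds[j] for j in range(3) if j != k])
                models.append(calibrate(X[cal], y[cal], epochs=100))
        return models                            # 1250 * 3 = 3750 trained models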
….continue Validation





Each validation sample is then passed through 1250
models and hence 1250 predictions for each validation
sample are produced.
Each ANN model gives a number between 0 (not this
cancer type) and 1 (this cancer type) as an output for
each cancer type.
The average for all model outputs for every validation
sample is then computed (denoted the average
committee vote).
Each sample is classified as belonging to the cancer
type corresponding to the largest committee vote.
Using these ANN models, all 63 training samples were
correctly classified to their respective categories.
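
A sketch of the committee vote, assuming a hypothetical `predict(model, x)` that returns the four outputs in [0, 1]:

    import numpy as np

    CATEGORIES = ["EWS", "RMS", "NB", "BL"]

    def committee_classify(models, x, predict):
        votes = np.array([predict(m, x) for m in models])   # shape (n_models, 4)
        avg = votes.mean(axis=0)                            # average committee vote
        return CATEGORIES[int(np.argmax(avg))], avg         # largest vote wins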
Optimization of Genes used
for Classification.



The contribution of each gene to the
classification by the ANN models was then
assessed.
Feature extraction was performed in a model-dependent
way due to the relatively small number of samples.
This was achieved by monitoring the
sensitivity of classification to a change in
the expression level of each gene, using the
3750 previously calibrated models.


The sensitivity Sk of the outputs (o) with respect to each of the
2308 input variables (xk) is defined as:
Sk = (1/Ns)(1/No) Σs Σj |∂oj/∂xk|,
where Ns is the number of samples (63) and No is
the number of outputs (4). The procedure for
computing Sk involves a committee of 3750 models (a single-model sketch follows below).
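
A sketch of the sensitivity computation for a single model, approximating ∂oj/∂xk by finite differences; it assumes a hypothetical `predict` that maps a 2308-gene expression vector through the PCA projection and the ANN (the paper's exact derivative computation may differ):

    import numpy as np

    def sensitivity(predict, X, n_outputs=4, eps=1e-4):
        """Sk = average over samples and outputs of |d o_j / d x_k|."""
        n_samples, n_genes = X.shape             # e.g. 63 samples x 2308 genes
        S = np.zeros(n_genes)
        for x in X:
            for k in range(n_genes):
                xp, xm = x.copy(), x.copy()
                xp[k] += eps
                xm[k] -= eps
                # central-difference estimate of the four partial derivatives
                d = (predict(xp) - predict(xm)) / (2 * eps)
                S[k] += np.abs(d).sum()
        return S / (n_samples * n_outputs)       # divide by Ns * No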
….continue



In this way genes were ranked according to
their significance for classification, and the
classification error rate using increasing
numbers of these ranked genes was
determined (a sketch follows below).
The classification error rate reached a minimum
of 0% at 96 genes.
Using only these 96 genes, the ANN models
were recalibrated and again all
63 samples were correctly classified.
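
A sketch of this ranking-and-scanning loop; `evaluate_error`, which would recalibrate and classify using only the selected genes, is a hypothetical helper:

    import numpy as np

    def rank_and_scan(S, X, y, evaluate_error, ks=(8, 16, 32, 64, 96, 128, 256)):
        """Rank genes by sensitivity S, then measure error with the top k only."""
        order = np.argsort(S)[::-1]              # most sensitive genes first
        for k in ks:
            err = evaluate_error(X[:, order[:k]], y)
            print(f"top {k:4d} genes -> error rate {err:.1%}")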
Assessing the Quality of
Classification - Diagnoses.


The aim of diagnosis is to be able to reject
test samples which do not belong to any of
the four categories.
To do this, a distance dc from a sample to the
ideal vote for each cancer type c was
calculated:
dc = √( ½ Σi (oi − δi,c)² )
….continue





Where c is the cancer type, oi is the average committee vote for cancer type i,
and δi,c is unity if i corresponds to cancer type c and zero otherwise.
The distance is normalized such that the distance between two ideal
samples belonging to different disease categories is unity.
Based on the validation set, an empirical probability distribution of distances
for each cancer type was generated.
The empirical probability distributions are built using each ANN model
independently.
Thus, the number of entries in each distribution is given by 1250 multiplied
by the number of samples belonging to the cancer type (a sketch of the distance follows below).
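
A sketch of the distance computation; the factor ½ inside the square root is what normalizes the distance between two ideal samples of different types to unity:

    import numpy as np

    def distance(o, c, n_types=4):
        """dc between an output vector o and the ideal vote for cancer type c."""
        ideal = np.zeros(n_types)
        ideal[c] = 1.0                           # delta_{i,c}
        return np.sqrt(0.5 * np.sum((o - ideal) ** 2))

    # Two ideal samples of different types are at distance exactly 1:
    print(distance(np.array([1.0, 0.0, 0.0, 0.0]), c=1))   # -> 1.0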
….continue




For a given test sample it is thus possible to reject
possible classifications based on these
probability distributions.
Hence, for each disease category a cutoff distance
from the ideal sample was defined, within which a
sample of that category is expected to fall.
The distance given by the 95th percentile of the
probability distribution was chosen.
This is the basis of diagnosis: a sample that falls
outside the cutoff distance cannot be confidently
diagnosed (a sketch follows below).
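
A sketch of the rejection rule, assuming a hypothetical `dists` mapping each cancer type to its validation distances pooled over models:

    import numpy as np

    def build_cutoffs(dists):
        # 95th percentile of each empirical distance distribution
        return {c: np.percentile(d, 95) for c, d in dists.items()}

    def diagnose(o, cutoffs, distance):
        c = int(np.argmax(o))                    # category with the largest vote
        if distance(o, c) > cutoffs[c]:
            return None                          # outside the cutoff: reject
        return c                                 # confident diagnosis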
Diagnostic Classification and
Hierarchical Clustering.




The diagnostic capabilities of all 3750 ANN models were then
tested using the 25 blinded test samples.
A sample is classified to a diagnostic category if it receives the
highest vote for that category; because this classifier has only
four possible outputs, all samples are classified to one of the
four categories.
If a sample falls outside the 95th percentile of the probability
distribution of distances between samples and their ideal output
(for example, for EWS the ideal output is EWS = 1, RMS = NB = BL = 0), its
diagnosis is rejected.
Using the 3750 ANN models calibrated with the 96 genes, 100%
correct classification was achieved for the 20 SRBCT test samples, and
furthermore all 5 non-SRBCT samples were excluded from
the four diagnostic categories, since they fell outside the
95th percentile.
….continue

Hierarchical clustering using the 96 genes
identified from the ANN models correctly
clustered all 20 of the test samples.
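
A sketch of this clustering step with SciPy, where `X96` is a placeholder matrix of samples restricted to the 96 top-ranked genes (linkage method and metric are assumptions):

    import numpy as np
    from scipy.cluster.hierarchy import fcluster, linkage

    X96 = np.random.rand(88, 96)    # placeholder: 88 samples x 96 ANN-ranked genes
    Z = linkage(X96, method="average", metric="correlation")
    labels = fcluster(Z, t=4, criterion="maxclust")   # cut the tree into 4 clusters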