Defense talk


Diagnosis of Ovarian Cancer
Based on Mass Spectrum of Blood Samples
Hong Tang
Committee:
Eugene Fink
Lihua Li
Dmitry B. Goldgof
Outline
• Introduction
• Previous work
• Feature selection
• Experiments
Motivation
Early cancer detection is critical
for successful treatment.
Five-year survival rate for ovarian cancer:
• Early stage: 90%
• Late stage: 35%
80% of cases are diagnosed at a late stage.
Motivation
Desired features of
cancer detection:
• Early detection
• High accuracy
• Low cost
Mass spectrum
We can detect some early-stage cancers
by analyzing the blood mass spectrum.
[Figure: blood mass spectrum; intensity (log scale, 10^-4 to 10^2) vs. ratio of molecular weight to electrical charge (0 to 20,000)]
Blood → Mass spectrum → Data mining → Results
Initial work
• Vlahou et al. (2001): Manual diagnosis
of bladder cancer based on mass spectra
• Petricoin et al. (2002): Application of
clustering to mass spectra for ovarian cancer diagnosis
Later work
Decision trees
Adam et al. (2002): 96% accuracy for prostate cancer
Qu et al. (2002): 98% accuracy for prostate cancer
Clustering
Petricoin et al. (2002): 80% accuracy for prostate cancer
Neural networks
Poon et al. (2003): 91% accuracy for liver cancer
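Feature selection
[Figure: intensity vs. ratio of molecular weight to electrical charge for cancer and healthy samples]
Statistical difference: $|\mu_1 - \mu_2| / \sqrt{\sigma_1^2 + \sigma_2^2}$,
where $\mu_1, \sigma_1$ and $\mu_2, \sigma_2$ are the mean and standard deviation
of the intensity at a given point for the cancer and healthy samples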
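A minimal Python sketch of the statistical-difference score, assuming each spectrum is a list of intensities sampled at the same points and using population standard deviations (statistics.pstdev). The function name statistical_difference and the zero-spread guard are illustrative assumptions, not taken from the original work.

```python
import math
from statistics import mean, pstdev

def statistical_difference(cancer, healthy):
    """Return |mu1 - mu2| / sqrt(sigma1^2 + sigma2^2) for every spectrum point.

    cancer, healthy: lists of spectra; each spectrum is a list of intensities
    sampled at the same points (an assumption about the data layout).
    """
    scores = []
    for i in range(len(cancer[0])):
        a = [spectrum[i] for spectrum in cancer]   # intensities at point i, cancer group
        b = [spectrum[i] for spectrum in healthy]  # intensities at point i, healthy group
        spread = math.sqrt(pstdev(a) ** 2 + pstdev(b) ** 2)
        # Assumed guard: a point with zero spread in both groups scores 0.
        scores.append(abs(mean(a) - mean(b)) / spread if spread > 0 else 0.0)
    return scores
```

For example, statistical_difference([[1.0, 5.0], [3.0, 5.0]], [[1.0, 1.0], [3.0, 3.0]]) returns [0.0, 3.0]: the first point has equal means, and the second separates the groups by three combined standard deviations.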
Feature selection
[Figure: intensity vs. ratio of molecular weight to electrical charge for cancer and healthy samples]
Window size: minimal distance between selected points
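One plausible reading of the window constraint, sketched in Python: rank points by their statistical-difference score and greedily keep the highest-scoring ones whose indices are at least `window` apart. The greedy strategy and the name select_features are assumptions, not necessarily the exact procedure used in this work.

```python
def select_features(scores, n_features, window):
    """Greedily pick up to n_features point indices with the highest scores,
    keeping every pair of selected indices at least `window` apart.

    With window = 1 any two distinct points qualify, so the constraint is off.
    """
    # Indices ordered from highest to lowest score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    selected = []
    for i in order:
        if len(selected) == n_features:
            break
        # Skip points that fall inside the window of an already selected point.
        if all(abs(i - j) >= window for j in selected):
            selected.append(i)
    return sorted(selected)
```

For example, select_features([0.1, 0.9, 0.8, 0.2, 0.7], 2, 2) returns [1, 4], because point 2 is too close to point 1; with window 1 the same call returns [1, 2].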
Data sets
Number of cases:
Data set   Cancer   Healthy
1          100      116
2          100      116
3          162      91
Learning algorithms
• Decision trees (C4.5)
• Support vector machines (SVMFu)
• Neural networks (Cascor 1.2)
Control variables
• Number of features, 1–64
• Window size, 1–1024
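The two control variables can be tuned by an exhaustive sweep. This sketch assumes power-of-two grids within the stated ranges and a hypothetical evaluate(n_features, window) callback that runs feature selection, trains one of the learners, and returns its accuracy; neither the grids nor the callback is taken from the original experiments.

```python
from itertools import product

# Assumed grids: powers of two within the ranges listed above.
FEATURE_COUNTS = [2 ** k for k in range(7)]   # 1, 2, 4, ..., 64
WINDOW_SIZES = [2 ** k for k in range(11)]    # 1, 2, 4, ..., 1024

def best_control_values(evaluate, feature_counts=FEATURE_COUNTS,
                        window_sizes=WINDOW_SIZES):
    """Return (accuracy, n_features, window) with the highest accuracy.

    evaluate(n_features, window) is a hypothetical callback returning the
    accuracy of a learner trained on the points selected with those values.
    """
    best = None
    for n_features, window in product(feature_counts, window_sizes):
        accuracy = evaluate(n_features, window)
        # Keep the first combination that reaches the highest accuracy.
        if best is None or accuracy > best[0]:
            best = (accuracy, n_features, window)
    return best
```

With a toy callback such as lambda n, w: 1.0 - abs(n - 8) / 100 - abs(w - 4) / 1000, the sweep returns (1.0, 8, 4).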
Best control values
Decision trees
Data set   Number of features   Window size   Accuracy
1          4                    1             82%
2          8                    4             94%
3          8                    64            99%
Best control values
Support vector machines
Data set   Number of features   Window size   Accuracy
1          32                   16            83%
2          4                    2             94%
3          16                   8             99%
Best control values
Neural networks
Data set   Number of features   Window size   Accuracy
1          32                   256           82%
2          32                   1             96%
3          16                   2             99%
Learning curve
Data set 1
[Figure: accuracy (%) vs. training size (up to 250) for decision trees, SVM, and neural networks]
Learning curve
Data set 2
[Figure: accuracy (%) vs. training size (0 to 250) for decision trees, SVM, and neural networks]
Learning curve
Data set 3
[Figure: accuracy (%) vs. training size (0 to 250) for decision trees, SVM, and neural networks]
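A learning curve of this kind can be produced by training on random subsets of increasing size and measuring accuracy on the held-out cases. The sketch assumes one random train/test split per size and a hypothetical fit(train_x, train_y) callback that returns a predict function; the original evaluation protocol may differ.

```python
import random

def learning_curve(cases, labels, train_sizes, fit, seed=0):
    """Return [(train_size, accuracy_percent), ...].

    For each size, a random subset of that size is used for training and the
    remaining cases for testing. fit(train_x, train_y) is a hypothetical
    callback that returns a predict(x) function.
    """
    rng = random.Random(seed)  # fixed seed makes the splits reproducible
    indices = list(range(len(cases)))
    curve = []
    for size in train_sizes:
        if not 0 < size < len(cases):
            raise ValueError("training size must leave at least one test case")
        rng.shuffle(indices)
        train, test = indices[:size], indices[size:]
        predict = fit([cases[i] for i in train], [labels[i] for i in train])
        correct = sum(predict(cases[i]) == labels[i] for i in test)
        curve.append((size, 100.0 * correct / len(test)))
    return curve
```

Wrapping each learner (decision trees, SVM, neural networks) as a fit callback and calling learning_curve with sizes such as [50, 100, 150, 200] yields one curve per learner.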
Main results
Automated detection of ovarian cancer by
analyzing the mass spectrum of the blood
• Identification of the most informative
points of the mass-spectrum curves
• Experimental comparison of decision
trees, SVM and neural networks
Future work
• Experiments with other data sets
• Other methods for feature selection
• Combining with genetic algorithms