What is Knowledge Discovery?

Knowledge Discovery
in Biomedicine
Limsoon Wong
Institute for Infocomm Research
Plan
• Knowledge discovery in brief
• Eg 1: Optimizing treatment of childhood ALL
• Eg 2: Predicting survivals of patients with
DLBC lymphoma
• Concluding remarks
Copyright © 2004 by Limsoon Wong
Knowledge Discovery
in Brief
What is Knowledge Discovery?
Jonathan’s blocks
Jessica’s blocks
Whose block
is this?
Jonathan’s rules : Blue or Circle
Jessica’s rules : All the rest
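The two rules on this slide can be written as a tiny classifier. The sketch below is only illustrative: the `Block` type, its `colour` and `shape` fields, and the example blocks are assumptions, since the slide shows the blocks as a picture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    # Hypothetical description of a block; the slide only shows pictures.
    colour: str
    shape: str


def owner(block: Block) -> str:
    """Apply the slide's rule: blue or circle is Jonathan's, all the rest Jessica's."""
    if block.colour == "blue" or block.shape == "circle":
        return "Jonathan"
    return "Jessica"


# Made-up example blocks, not taken from the slide.
examples = [Block("blue", "square"), Block("red", "circle"), Block("red", "square")]
for b in examples:
    print(b, "->", owner(b))
```

Knowledge discovery is the reverse task: given only labelled blocks, recover a rule like `owner` automatically.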
What is Knowledge Discovery?
Question: Can you explain how?
Steps of Knowledge Discovery
• Training data gathering
• Feature generation
– k-grams, colour, texture, domain know-how, ...
• Feature selection
– Entropy, χ², CFS, t-test, domain know-how, ...
• Feature integration
– SVM, ANN, PCL, CART, C4.5, kNN, ...
Some classifiers/learning methods
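The steps above can be sketched end to end on toy data. The example below picks one option from each list (a t-test style score for feature selection, kNN for feature integration). The data, function names, and parameter values are all assumptions made for illustration, not the methods or settings used in the studies that follow.

```python
import math
from collections import Counter
from statistics import mean, variance


def t_score(values_a, values_b):
    """Absolute Welch t-statistic of one feature between two classes."""
    se = math.sqrt(variance(values_a) / len(values_a)
                   + variance(values_b) / len(values_b))
    return abs(mean(values_a) - mean(values_b)) / se if se > 0 else 0.0


def select_features(X, y, k):
    """Feature selection: keep the k feature indices with the largest t-score."""
    labels = sorted(set(y))
    assert len(labels) == 2, "this sketch handles two classes only"
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, lab in zip(X, y) if lab == labels[0]]
        b = [row[j] for row, lab in zip(X, y) if lab == labels[1]]
        scores.append((t_score(a, b), j))
    return [j for _, j in sorted(scores, reverse=True)[:k]]


def knn_predict(X_train, y_train, x, features, k=3):
    """Feature integration: majority vote of the k nearest training samples."""
    def dist(row):
        return math.sqrt(sum((row[j] - x[j]) ** 2 for j in features))
    nearest = sorted(zip(X_train, y_train), key=lambda pair: dist(pair[0]))[:k]
    return Counter(lab for _, lab in nearest).most_common(1)[0][0]


# Training data gathering: made-up samples with three already-generated features.
# Only feature 0 separates class "A" from class "B".
X_train = [
    [1.0, 5.0, 0.3],
    [1.2, 3.0, 0.9],
    [0.9, 4.0, 0.1],
    [3.0, 4.5, 0.8],
    [3.2, 3.5, 0.2],
    [2.9, 5.0, 0.5],
]
y_train = ["A", "A", "A", "B", "B", "B"]

selected = select_features(X_train, y_train, k=1)
print("selected features:", selected)
print("prediction:", knn_predict(X_train, y_train, [3.1, 3.0, 0.9], selected))
```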
Knowledge Discovery for
Optimizing Treatment
of Childhood ALL
Image credit: Yeoh et al, 2002
Childhood ALL
• Major subtypes: T-ALL, E2A-PBX, TEL-AML, BCR-ABL, MLL genome rearrangements, Hyperdiploid>50
• Diff subtypes respond differently to same Tx
• Over-intensive Tx
– Development of secondary cancers
– Reduction of IQ
• Under-intensive Tx
– Relapse
• The subtypes look similar
• Conventional diagnosis
– Immunophenotyping
– Cytogenetics
– Molecular diagnostics
• Unavailable in most ASEAN countries
Single-Test Platform of
Microarray & Knowledge Discovery
training data collection
feature integration
Image credit: Affymetrix
Copyright © 2004 by Jinyan Li and Limsoon Wong
Impact
Conventional Tx:
• intermediate intensity to all
⇒ 10% suffer relapse
⇒ 50% suffer side effects
⇒ costs US$150m/yr
Our optimized Tx:
• high intensity to 10%
• intermediate intensity to 40%
• low intensity to 50%
• costs US$100m/yr
• High cure rate of 80%
• Less relapse
• Fewer side effects
• Save US$51.6m/yr
Knowledge Discovery for
Predicting Survival of
Patients with DLBC
Lymphoma
Image credit: Rosenwald et al, 2002
Diffuse Large B-Cell Lymphoma
• DLBC lymphoma is the most common type of lymphoma in adults
• Can be cured by anthracycline-based chemotherapy in 35 to 40 percent of patients
⇒ DLBC lymphoma comprises several diseases that differ in responsiveness to chemotherapy
• Intl Prognostic Index (IPI)
– age, “Eastern Cooperative Oncology Group” performance status, tumor stage, lactate dehydrogenase level, sites of extranodal disease, ...
• Not good for stratifying DLBC lymphoma patients for therapeutic trials
⇒ Use gene-expression profiles to predict outcome of chemotherapy?
Knowledge Discovery from Gene
Expression of “Extreme” Samples
[Workflow diagram: 240 samples (7399 genes) → “extreme” sample selection → 47 short-term survivors + 26 long-term survivors → knowledge discovery from gene expression → 84 genes; evaluated on 80 samples]
T is long-term if S(T) < 0.3
T is short-term if S(T) > 0.7
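The threshold rule on this slide (long-term if S(T) < 0.3, short-term if S(T) > 0.7) can be sketched as below. How the risk score S(T) is computed is not given here, so the function takes it as input. The label for scores between the two thresholds is an assumption, as the slide does not name that group.

```python
def survival_class(risk_score: float, low: float = 0.3, high: float = 0.7) -> str:
    """Map a risk score S(T) to a survival class using the slide's thresholds."""
    if risk_score < low:
        return "long-term"
    if risk_score > high:
        return "short-term"
    # Assumed label: the slide does not say what to call the middle range.
    return "undetermined"


# Made-up scores for illustration only.
for score in (0.1, 0.5, 0.9):
    print(score, "->", survival_class(score))
```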
Kaplan-Meier Plot for 80 Test Cases
p-value of log-rank test: < 0.0001
Risk score thresholds: 0.7, 0.5, 0.3
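For readers unfamiliar with the plot type, a Kaplan-Meier curve is the product-limit estimate of the survival probability over time. The sketch below is a minimal pure-Python version. The follow-up times and censoring flags are made up, not the study data.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of survival probability.

    times:  follow-up time of each patient
    events: 1 if death was observed at that time, 0 if the patient was censored
    Returns a list of (time, survival probability) at each observed death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Made-up follow-up data: six patients, two of them censored.
toy_times = [1, 2, 2, 3, 4, 5]
toy_events = [1, 1, 0, 1, 0, 1]
for t, s in kaplan_meier(toy_times, toy_events):
    print(f"t={t}: survival={s:.3f}")
```

Censored patients leave the at-risk set without lowering the curve, which is why the estimate differs from a simple fraction of patients still alive.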
Improvement Over IPI
(A) IPI low,
p-value = 0.0063
(B) IPI intermediate,
p-value = 0.0003
Merit of “Extreme” Samples
(A) W/o sample selection (p =0.38)
(B) With sample selection (p=0.009)
No clear difference in overall survival among the 80 samples in the validation
group of the DLBCL study if no training sample selection is conducted
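P-values for comparing two survival curves are commonly obtained with a log-rank test, the test named on the Kaplan-Meier slide. Whether every p-value quoted in these slides comes from that test is an assumption. The sketch below is a minimal two-group log-rank test in pure Python, run on made-up data.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test. Returns (chi-square statistic, p-value).

    events: 1 if death was observed, 0 if censored.
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e == 1})

    obs_minus_exp = 0.0  # observed minus expected deaths in group A
    var = 0.0            # hypergeometric variance, summed over death times
    for t in event_times:
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)
        n_b = sum(1 for u, _, g in data if u >= t and g == 1)
        d_a = sum(1 for u, e, g in data if u == t and e == 1 and g == 0)
        d_b = sum(1 for u, e, g in data if u == t and e == 1 and g == 1)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            var += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    if var == 0:
        return 0.0, 1.0
    chi2 = obs_minus_exp ** 2 / var
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Made-up groups: group A dies early, group B dies late, no censoring.
chi2, p = logrank_test([1, 2, 3, 4, 5], [1] * 5, [6, 7, 8, 9, 10], [1] * 5)
print(f"chi2={chi2:.2f}, p={p:.4f}")
```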
Knowledge Discovery for
A Few Other
Biomedical
Applications
Predict Epitopes,
Find Vaccine Targets
• Vaccines are often the
only solution for viral
diseases
• Finding & developing
effective vaccine targets
(epitopes) is a slow and
expensive process
• Develop systems to recognize
protein peptides that bind
MHC molecules
• Develop systems to recognize
hot spots in viral antigens
Recognize Functional Sites,
Help Scientists
• Effective recognition of
initiation, control, &
termination of biological
processes is crucial to
speeding up & focusing
scientific expts
• Data mining of bio seqs
to find rules to
recognize & understand
functional sites
Dragon’s 10x
reduction of
TSS recognition
false positives
Understand Proteins,
Fight Diseases
• Understanding function
& role of protein needs
organised info on
interaction pathways
• Such info is often
reported in scientific
papers but is seldom
found in structured dbs
• Knowledge extraction
system to process free text
• extract protein names
• extract interactions
Benefits of Bioinformatics
• To the patient:
– Better drug, better treatment
• To the pharma:
– Save time, save cost, make more $
• To the scientist:
– Better science
References
• A. Yeoh et al, “Classification, subtype discovery, and prediction of outcome in pediatric acute lymphoblastic leukemia by gene expression profiling”, Cancer Cell, 1:133-143, 2002
• A. Rosenwald et al, “The use of molecular profiling to predict survival after chemotherapy for diffuse large B-cell lymphoma”, NEJM, 346:1937-1947, 2002
• H. Liu et al, “Selection of patient samples and genes for outcome prediction”, Proc. CSB2004, pages 382-392
Any Questions?
To be presented
10/10/04, 8.30-10.00am
Raffles Convention Centre
NHG-IBM Symposium