Classification of microarray
gene expression data using
support vector machines (SVM)
A presentation on the topic
For CIS 595 Bioinformatics course
by Despina Kontos
Spring 2003 – Temple University
Overview…
• What are microarray gene expression
data?
• What are Support Vector Machines?
• How can we use them to analyze these
gene expression data?
CLASSIFICATION EXPERIMENTS !!!
Microarrays…
• What are they anyway??
Gene expression levels in a tissue or
cell under varying environmental conditions
Microarrays…
• From a machine learning point of view…
[Figure: data matrix of genes g-1 … g-n versus experiments ex-1 … ex-m]
Function classification
Tissue classification
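In machine-learning terms the matrix is a table with one row per gene and one column per experiment. The minimal NumPy sketch below assumes that orientation and uses random numbers in place of real expression levels (the names `expression`, `gene_examples` and `sample_examples` are illustrative). Function classification treats each gene's row as a feature vector; tissue classification treats each experiment's column as one.

```python
import numpy as np

# Hypothetical toy matrix: n genes (rows g-1..g-n) x m experiments (columns ex-1..ex-m).
# The values are random stand-ins for expression levels, not real data.
rng = np.random.default_rng(0)
n_genes, n_experiments = 6, 4
expression = rng.normal(size=(n_genes, n_experiments))

# Function classification: one example per gene, described by its
# expression profile across all experiments (one row = one feature vector).
gene_examples = expression              # shape (n_genes, n_experiments)

# Tissue classification: one example per experiment/sample, described by
# the expression levels of all genes (one column = one feature vector).
sample_examples = expression.T          # shape (n_experiments, n_genes)

print(gene_examples.shape, sample_examples.shape)  # (6, 4) (4, 6)
```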
Support Vector Machines (SVM)
• Linear classifiers
• Attempt to avoid overfitting by finding the optimal
hyperplane that separates the data
HOW???
By maximizing the margin.
[Figure: separating hyperplane and its margin; the closest points are labeled Support Vectors]
Introduced by V. Vapnik and co-workers in 1995
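Maximizing the margin means choosing, among all hyperplanes w · x + b = 0 that separate the two classes, the one whose closest training point is farthest away. The sketch below (toy 2-D data and the helper name `geometric_margin` are assumptions for illustration) measures that distance for two separating lines. For this data the diagonal line is the maximum-margin hyperplane, and the two points that attain its margin, (1, 1) and (-1, -1), are its support vectors.

```python
import numpy as np

def geometric_margin(w, b, X, y):
    """Smallest value of y_i * (w . x_i + b) / ||w|| over the data.
    It is positive only if the hyperplane w . x + b = 0 separates the classes."""
    return np.min(y * (X @ w + b)) / np.linalg.norm(w)

# Hypothetical 2-D data: two points per class (+1 / -1).
X = np.array([[1.0, 1.0], [2.0, 2.0], [-1.0, -1.0], [-2.0, -2.0]])
y = np.array([1, 1, -1, -1])

# Two hyperplanes (lines) that both separate the classes.
w_a, b_a = np.array([1.0, 0.0]), 0.0  # the line x1 = 0
w_b, b_b = np.array([1.0, 1.0]), 0.0  # the line x1 + x2 = 0

print(geometric_margin(w_a, b_a, X, y))  # 1.0
print(geometric_margin(w_b, b_b, X, y))  # about 1.414 (sqrt(2)): the larger margin
```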
Support Vector Machines (SVM)
• And what about datasets that are not linearly separable??
Map the data into a higher-dimensional space and classify
linearly there (theorem!!)
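A one-dimensional toy case with hypothetical data shows why the mapping helps: points near zero belong to one class and points far from zero to the other, so no threshold on x separates them, but after the assumed feature map φ(x) = (x, x²) a straight line does.

```python
import numpy as np

# Hypothetical 1-D data: points near 0 are class -1, points far from 0 are class +1.
x = np.array([-3.0, -2.0, -0.5, 0.0, 0.5, 2.0, 3.0])
y = np.array([1, 1, -1, -1, -1, 1, 1])

# In 1-D a linear classifier is a threshold t with a direction s: predict +1 if s * (x - t) > 0.
# Try every threshold between (and beyond) neighboring points; none fits all labels.
thresholds = np.concatenate([[x.min() - 1], (x[:-1] + x[1:]) / 2, [x.max() + 1]])
separable_1d = any(
    np.array_equal(np.where(s * (x - t) > 0, 1, -1), y)
    for t in thresholds
    for s in (1, -1)
)
print(separable_1d)  # False

def phi(x):
    """Assumed feature map: send each scalar x to the 2-D point (x, x**2)."""
    return np.column_stack([x, x ** 2])

# In the mapped space the line x**2 = 1 separates the classes:
# sign(w . phi(x) + b) with w = (0, 1) and b = -1.
w, b = np.array([0.0, 1.0]), -1.0
pred = np.sign(phi(x) @ w + b)
print(np.array_equal(pred, y))  # True
```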
Support Vector Machines (SVM)
Some mathematical
formulations…
We need ONLY
the support vectors for
computations!!
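For reference, one standard way to write SVM training is the soft-margin dual problem below. The symbols (N training pairs (x_i, y_i) with y_i ∈ {-1, +1}, penalty C, kernel K, bias b) are notation introduced here, not taken from the slide. Training points with α_i = 0 drop out of the decision function; the remaining ones (α_i > 0) are the support vectors, which is why only they are needed. For a plain linear SVM, K(x_i, x_j) = x_i · x_j.

```latex
\max_{\alpha}\ \sum_{i=1}^{N} \alpha_i
  - \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j \, K(\mathbf{x}_i, \mathbf{x}_j)
\quad \text{subject to} \quad 0 \le \alpha_i \le C, \qquad \sum_{i=1}^{N} \alpha_i y_i = 0

f(\mathbf{x}) = \operatorname{sign}\Big( \sum_{i:\, \alpha_i > 0} \alpha_i y_i \, K(\mathbf{x}_i, \mathbf{x}) + b \Big)
```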
We can use KERNEL
functions to avoid explicit
computations in the
higher-dimensional space
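The kernel trick in miniature: for 2-D inputs the degree-2 polynomial kernel K(x, z) = (x · z)² equals an ordinary dot product after the explicit map φ(x) = (x1², √2·x1·x2, x2²), so an SVM can work with K directly and never build φ. The kernel choice and the helper names `poly2_kernel` and `phi` are illustrative assumptions; any valid kernel plays the same role.

```python
import numpy as np

def poly2_kernel(x, z):
    """Degree-2 homogeneous polynomial kernel K(x, z) = (x . z)**2."""
    return float(np.dot(x, z)) ** 2

def phi(x):
    """Explicit feature map for 2-D input whose dot product matches poly2_kernel:
    (x1**2, sqrt(2) * x1 * x2, x2**2)."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

x = np.array([1.0, 2.0])
z = np.array([3.0, 1.0])

# Same quantity computed two ways; the kernel never builds the 3-D vectors.
print(poly2_kernel(x, z))              # 25.0, since x . z = 3 + 2 = 5
print(float(np.dot(phi(x), phi(z))))   # approximately 25.0 (floating-point rounding)
```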
Some experiments…
M. P. S. Brown, W. N. Grundy, D. Lin, N. Cristianini, C. W. Sugnet, T. S. Furey, M. Ares Jr. and D. Haussler, "Knowledge-based analysis of microarray gene expression data by using support vector machines", Proc. Natl. Acad. Sci. USA, 97(1), pp. 262-267, 2000.
Classification of gene function from microarray data using SVM
• 2,476 genes
• 79 DNA hybridization experiments
• 6 gene function families
SVM gave the best classification
among the methods compared!!!
[Figure: expression matrix of genes g-1 … g-n versus experiments ex-1 … ex-m; each gene labeled with a function family F1, F2, F3, …]
Function Classification
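Several function families can be handled by training one binary SVM per family (members vs. all other genes) and assigning each gene to the family whose SVM scores it highest. The sketch below is a hedged illustration of that one-vs-rest scheme, not the authors' code: the data are synthetic (3 hypothetical families, 10 experiments), the kernel is linear for brevity, and the solver is Pegasos-style stochastic sub-gradient descent, swapped in because it fits in a few lines. `train_linear_svm`, `one_vs_rest` and `predict` are hypothetical names.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos-style stochastic sub-gradient descent on the soft-margin objective
    lam/2 * ||w||^2 + mean(max(0, 1 - y_i * w . x_i)), with labels y in {+1, -1}.
    A constant 1 feature is appended so the last weight acts as the bias
    (it is regularized too, a simplification)."""
    rng = np.random.default_rng(seed)
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    w = np.zeros(Xb.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            t += 1
            eta = 1.0 / (lam * t)
            violated = y[i] * (Xb[i] @ w) < 1   # inside the margin or misclassified
            w *= 1.0 - eta * lam                # step from the ||w||^2 term
            if violated:
                w += eta * y[i] * Xb[i]         # hinge-loss sub-gradient step
    return w

def one_vs_rest(X, labels, n_classes):
    """Train one binary SVM per class: members (+1) vs. all other genes (-1)."""
    return np.array([train_linear_svm(X, np.where(labels == k, 1.0, -1.0))
                     for k in range(n_classes)])

def predict(W, X):
    """Assign each gene to the class whose SVM gives the largest score."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return np.argmax(Xb @ W.T, axis=1)

# Hypothetical data: 3 function families x 20 genes, 10 experiments each.
# Genes in family k are up-regulated in experiments 3k, 3k+1 and 3k+2.
rng = np.random.default_rng(1)
n_families, genes_per_family, n_experiments = 3, 20, 10
labels = np.repeat(np.arange(n_families), genes_per_family)
X = rng.normal(scale=0.3, size=(labels.size, n_experiments))
for k in range(n_families):
    X[labels == k, 3 * k:3 * k + 3] += 3.0

W = one_vs_rest(X, labels, n_families)
accuracy = np.mean(predict(W, X) == labels)
print(f"training accuracy: {accuracy:.2f}")
```

Training accuracy on cleanly separated synthetic profiles says little about real data; with real expression data one would hold out genes to measure performance.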
More experiments…
T. Furey, N. Cristianini, N. Duffy, D. Bednarski, M. Schummer and D. Haussler, "Support Vector Machine Classification and Validation of Cancer Tissue Samples Using Microarray Expression Data", Bioinformatics, 2000.
Gene expression data from tissue
• 97,802 DNA clones
• 31 tissue samples: ovarian cancer, normal ovarian, normal non-ovarian
[Figure: expression matrix of experiments ex-1 … ex-m versus genes g-1 … g-n; each tissue sample labeled Cancer or Not Cancer]
Tissue (Cancer) Classification
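With only 31 tissue samples, leave-one-out cross-validation is a natural way to estimate how well a classifier generalizes: train on 30 samples, test on the held-out one, and repeat for every sample. The sketch below applies it to synthetic data shaped like this setting; the random numbers, the gene count and the helpers `train_kernel_svm` and `kernel` are assumptions, not the ovarian dataset or the authors' software. It solves the SVM dual by simple coordinate descent, adds a constant to the linear kernel in place of an explicit bias, and counts how many training points end up as support vectors (α_i > 0).

```python
import numpy as np

def train_kernel_svm(K, y, C=1.0, sweeps=50):
    """Dual coordinate descent on the soft-margin SVM dual without an explicit
    bias term: minimize 1/2 a^T Q a - sum(a) with 0 <= a_i <= C and
    Q_ij = y_i y_j K_ij. Returns the dual coefficients alpha."""
    alpha = np.zeros(len(y))
    for _ in range(sweeps):
        for i in range(len(y)):
            grad = y[i] * np.dot(alpha * y, K[:, i]) - 1.0
            alpha[i] = np.clip(alpha[i] - grad / K[i, i], 0.0, C)
    return alpha

def kernel(A, B):
    """Linear kernel plus a constant 1, which plays the role of the bias."""
    return A @ B.T + 1.0

# Hypothetical data: 31 samples x 50 genes; 15 "cancer" (+1), 16 "not cancer" (-1).
# Cancer samples over-express genes 0-4. Random numbers, not real measurements.
rng = np.random.default_rng(2)
y = np.array([1.0] * 15 + [-1.0] * 16)
X = rng.normal(scale=0.5, size=(31, 50))
X[y > 0, :5] += 3.0

# Leave-one-out: train on 30 samples, predict the held-out one, repeat 31 times.
K_all = kernel(X, X)
correct = 0
for j in range(len(y)):
    train = np.arange(len(y)) != j
    alpha = train_kernel_svm(K_all[np.ix_(train, train)], y[train])
    score = np.dot(alpha * y[train], K_all[train, j])  # only alpha_i > 0 terms contribute
    correct += int(np.sign(score) == y[j])
loo_accuracy = correct / len(y)

alpha_full = train_kernel_svm(K_all, y)
n_support = int(np.sum(alpha_full > 1e-8))
print(f"leave-one-out accuracy: {loo_accuracy:.2f}, support vectors: {n_support} of {len(y)}")
```

The full kernel matrix is computed once and sliced per fold, since every fold reuses the same pairwise entries.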
Conclusions
• Microarray gene expression data are a very useful
source of biological information (…but expensive to obtain!!)
• SVMs are a new and very promising classification approach
• Much research remains to be done on biological
information processing using techniques developed in
fields such as machine learning, data mining, etc.
Additional resources…
E. Osuna, R. Freund, and F. Girosi. Support vector machines: Training and applications.
A.I. Memo, MIT A.I. Lab, 1996
N. Cristianini. ICML'01 tutorial, 2001
http://www.kernel-machines.org/
http://research.microsoft.com/users/jplatt/svm.html
http://www.isis.ecs.soton.ac.uk/resources/svminfo/
THANK YOU!!!!!