Mult-lecture


Multidimensional Analysis
If you are comparing more than two conditions (for example, 10 types of
cancer) or looking at a time series (cell cycle or progression of cancer),
you are dealing with a multidimensional problem.
Example: 6000 genes in 10 patients
• 6000 points in 10-dimensional space (gene view)
• 10 points in 6000-dimensional space (patient view)
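The two views are the same expression matrix read along different axes. A minimal sketch, assuming a toy matrix with one row per gene and one column per patient (all values and the variable names are made up for illustration):

```python
# Toy expression matrix: one row per gene, one column per patient.
# The values are invented for illustration only.
expression = [
    [2.1, 0.3, 1.8],  # gene A measured in 3 patients
    [0.5, 0.4, 0.6],  # gene B
    [1.0, 3.2, 0.9],  # gene C
    [4.0, 4.1, 3.9],  # gene D
]

# Gene view: each gene is a point with one coordinate per patient.
gene_view = expression

# Patient view: transpose, so each patient is a point with one coordinate per gene.
patient_view = [list(column) for column in zip(*expression)]

print(f"gene view: {len(gene_view)} points in {len(gene_view[0])}-dimensional space")
print(f"patient view: {len(patient_view)} points in {len(patient_view[0])}-dimensional space")
```

With 6000 genes and 10 patients the same transpose turns 6000 points in 10 dimensions into 10 points in 6000 dimensions.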
Reduction of dimensions:
• Principal Component Analysis (PCA)
• Clustering
• Correspondence Analysis
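As an illustration of dimension reduction, the sketch below finds only the first principal component of a small patient-view data set, using power iteration in plain Python. The function name, the toy data, and the choice of power iteration are assumptions made for this example; in practice a linear-algebra library would normally be used.

```python
import math

def first_principal_component(points, iterations=200):
    """Return (direction, scores) for the first principal component.

    points: list of samples, each a list of feature values.
    """
    n, d = len(points), len(points[0])
    # Centre every feature (column) on zero.
    means = [sum(p[j] for p in points) / n for j in range(d)]
    centred = [[p[j] - means[j] for j in range(d)] for p in points]

    # Power iteration on X^T X without forming the d x d matrix:
    # repeatedly apply v <- X^T (X v) and renormalise.
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        xv = [sum(row[j] * v[j] for j in range(d)) for row in centred]
        w = [sum(centred[i][j] * xv[i] for i in range(n)) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variance left to explain
            break
        v = [x / norm for x in w]

    # Project each sample onto the component to get a 1-D coordinate.
    scores = [sum(row[j] * v[j] for j in range(d)) for row in centred]
    return v, scores

# Made-up data: 4 patients, 3 genes. Genes 0 and 1 vary together, gene 2 is constant.
patients = [[1.0, 1.0, 5.0], [2.0, 2.0, 5.0], [3.0, 3.0, 5.0], [4.0, 4.0, 5.0]]
direction, scores = first_principal_component(patients)
print(direction)
print(scores)
```

Each patient is reduced from three coordinates to a single score along the direction of greatest variance. Note that power iteration can stall if the starting vector happens to be orthogonal to the leading component.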
Patient view
Classification
Class 1: patients surviving 5 years after breast cancer surgery
Class 2: patients dead within 5 years of breast cancer surgery
Other classifiers
• Neural Networks
• Support Vector Machines
• Other classifiers from statistical literature
Issues in building a classifier
• Feature selection: a selected subset of genes (for example, the genes
ranked highest by a t-test) may classify better than the full gene set
• Independent validation: you must test the classifier on samples that were
not used for feature selection or for building the classifier (training
set / test set split, or leave-one-out cross-validation)
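The two issues above interact: if genes are selected on all samples before cross-validation, the held-out sample has already influenced the classifier. A minimal sketch of leave-one-out cross-validation with t-test feature selection repeated inside every fold follows. The nearest-centroid classifier, the Welch form of the t-statistic, the function names, and the toy data are all assumptions chosen for illustration, not the lecture's own method.

```python
import math
from statistics import mean, variance

def t_statistic(a, b):
    """Welch t-statistic between two lists of values (each needs >= 2 values)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0

def select_genes(samples, labels, k):
    """Indices of the k genes with the largest |t| between classes 1 and 2."""
    scores = []
    for g in range(len(samples[0])):
        a = [s[g] for s, y in zip(samples, labels) if y == 1]
        b = [s[g] for s, y in zip(samples, labels) if y == 2]
        scores.append((abs(t_statistic(a, b)), g))
    return [g for _, g in sorted(scores, reverse=True)[:k]]

def nearest_centroid(train, train_labels, genes, sample):
    """Predict the class whose mean profile on the chosen genes is closest."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_labels)):
        members = [s for s, y in zip(train, train_labels) if y == label]
        dist = sum((sample[g] - mean(m[g] for m in members)) ** 2 for g in genes)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def leave_one_out_accuracy(samples, labels, k):
    correct = 0
    for i in range(len(samples)):
        train = samples[:i] + samples[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        # Genes are chosen from the training samples only; the held-out
        # sample takes no part in selection or in building the classifier.
        genes = select_genes(train, train_labels, k)
        if nearest_centroid(train, train_labels, genes, samples[i]) == labels[i]:
            correct += 1
    return correct / len(samples)

# Made-up data: 8 patients x 3 genes; only gene 0 separates the classes.
samples = [
    [5.1, 1.0, 2.0], [4.9, 2.0, 1.0], [5.3, 1.5, 2.5], [5.0, 2.5, 1.5],
    [1.0, 1.2, 2.2], [1.2, 2.1, 1.1], [0.9, 1.4, 2.4], [1.1, 2.4, 1.6],
]
labels = [1, 1, 1, 1, 2, 2, 2, 2]
print(leave_one_out_accuracy(samples, labels, k=1))
```

The sketch needs at least three samples per class so that every training fold still has two per class for the variance estimate.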
Promoter Analysis
• Genes that pass the significance test are clustered
and their corresponding promoter regions
extracted.
• Regions are searched for potential transcription
factor binding sites that they have in common
• Saco-patterns looks for exactly identical patterns
• Gibbs sampler allows for degeneracy of patterns
with weight matrix description
• TRANSFAC is a database of known transcription
factor binding sites.
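To make the weight matrix description concrete, the sketch below builds a log-odds weight matrix from a handful of aligned sites and scans a sequence for its best match. It illustrates only the matrix representation of a degenerate pattern, not the Gibbs sampling search itself and not how Saco-patterns works. The site sequences, the uniform background of 0.25, the pseudocount, and the function names are assumptions for this example.

```python
import math

def weight_matrix(sites, pseudocount=1.0, background=0.25):
    """Log-odds weight matrix from equal-length, already aligned sites."""
    total = len(sites) + 4 * pseudocount
    matrix = []
    for pos in range(len(sites[0])):
        column = [site[pos] for site in sites]
        matrix.append({
            base: math.log2((column.count(base) + pseudocount) / total / background)
            for base in "ACGT"
        })
    return matrix

def best_match(matrix, sequence):
    """Return (score, offset) of the highest-scoring window in sequence."""
    width = len(matrix)
    return max(
        (sum(matrix[i][sequence[start + i]] for i in range(width)), start)
        for start in range(len(sequence) - width + 1)
    )

# Invented aligned sites; the fourth position is degenerate (C, G, or A).
sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TGAATCA"]
pwm = weight_matrix(sites)

promoter = "CCGTTGACTCAGGA"  # made-up promoter fragment
score, offset = best_match(pwm, promoter)
print(offset, round(score, 3))
```

An exact-match search would accept only one of these four sites at a time, whereas the matrix scores all of them highly and still penalises windows that differ at the conserved positions.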
Patterns can be assessed based on their overrepresentation in the cluster
relative to a background set.
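One common way to quantify overrepresentation is a hypergeometric tail probability: the chance of seeing at least the observed number of pattern-containing promoters in the cluster if the cluster were a random draw from the background. The sketch below assumes the cluster is a subset of the background set; the function name and the counts are hypothetical, and the lecture does not say which statistic is actually used.

```python
from math import comb

def overrepresentation_p(cluster_hits, cluster_size, background_hits, background_size):
    """P(X >= cluster_hits) when cluster_size promoters are drawn at random
    from background_size promoters, background_hits of which contain the pattern."""
    total = comb(background_size, cluster_size)
    upper = min(cluster_size, background_hits)
    favourable = sum(
        comb(background_hits, k)
        * comb(background_size - background_hits, cluster_size - k)
        for k in range(cluster_hits, upper + 1)
    )
    return favourable / total

# Hypothetical counts: the pattern occurs in 8 of 20 cluster promoters,
# but in only 300 of 6000 promoters overall (about 1 expected by chance).
p = overrepresentation_p(8, 20, 300, 6000)
print(p)
```

A small value suggests the pattern is enriched in the cluster. When many candidate patterns are tested, the resulting p-values need a multiple-testing correction.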