Lecture - Weizmann Institute of Science

SPIN (Sorting Points Into Neighborhoods) Algorithm
ELTEM Neurogenomics Course
Biozentrum – Basel
December 1, 2006
Presented by Tal Shay and Assif Yitzhaky
Weizmann Institute of Science
Rehovot, Israel
D. Tsafrir, I. Tsafrir, L. Ein-Dor, O. Zuk, D.A. Notterman, and E. Domany. Sorting points into neighborhoods (SPIN): data analysis and visualization by ordering distance matrices. Bioinformatics 2005, 21: 2301-2308.
SPIN
• A new method for mining gene expression data
• The philosophy behind SPIN: feel and play with the data
What is sorting?
One-dimensional ordering of a set of objects according to a particular trait, e.g. height or time.
Distance matrices
[Figure: an expression matrix (genes × samples) and the two distance matrices derived from it: a genes distance matrix (genes × genes) and a samples distance matrix (samples × samples). Color scale runs from low to high distance.]
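To make the relationship between the three matrices concrete, here is a minimal Python sketch that derives both distance matrices from a toy expression matrix. The slides do not specify a metric, so Euclidean distance is assumed here purely for illustration; the function name `distance_matrix` and the values are hypothetical.

```python
import math

def distance_matrix(vectors):
    """Pairwise Euclidean distances (assumed metric) between equal-length vectors."""
    return [[math.dist(u, v) for v in vectors] for u in vectors]

# Toy expression matrix (made-up values): 4 genes (rows) x 3 samples (columns).
expression = [
    [2.0, 2.1, 8.0],
    [1.9, 2.0, 7.8],
    [5.0, 5.2, 0.5],
    [5.1, 4.9, 0.7],
]

genes_distance = distance_matrix(expression)                # 4 x 4: genes vs. genes
samples_distance = distance_matrix(list(zip(*expression)))  # 3 x 3: samples vs. samples

for row in genes_distance:
    print(" ".join(f"{d:5.2f}" for d in row))
```

Transposing the expression matrix (`zip(*expression)`) is all it takes to switch from comparing genes to comparing samples.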
Multiple clusters
[Figure: the points plotted on their first two principal components (PCA 1 vs. PCA 2), and their distance matrix shown unsorted and after SPIN sorting.]
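Sorting a distance matrix means applying one permutation to both its rows and its columns, so that similar points end up next to each other and clusters appear as low-distance blocks along the diagonal. A minimal sketch, with a hypothetical `reorder` helper and toy data; the ordering is chosen by hand here rather than computed by SPIN.

```python
def reorder(dist, order):
    """Return D' with D'[i][j] = D[order[i]][order[j]] (same permutation on rows and columns)."""
    return [[dist[i][j] for j in order] for i in order]

# Toy data: four points on a line forming two tight clusters, stored in interleaved order.
positions = [0.0, 10.0, 0.5, 10.5]
dist = [[abs(a - b) for b in positions] for a in positions]

# Placing cluster members next to each other exposes two low-distance blocks on the diagonal.
sorted_dist = reorder(dist, [0, 2, 1, 3])
for row in sorted_dist:
    print(row)
```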
Decode this matrix
[Figure: a distance matrix; color scale from small to large distances.]
Q: How many objects?
Q: What are their shapes?
Q: What is the relative orientation?
SPIN Interface Layout
• Distance matrix
• Expression matrix
• Scores & diagnostics
• SPC, dendrogram, and PCA analysis
• Cluster buttons
• Transpose
• Sorting buttons
Side-To-Side on a circle:
[Figure: points arranged on a circle, and their distance matrix sorted by Side to side and by Neighborhood.]
• Side to side: penalizes blue points (small distances) far from the main diagonal.
• Neighborhood: penalizes red points (large distances) near the main diagonal.
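The two criteria can be illustrated with a small, self-contained Python sketch. `sts_sort` is a simplified iteration in the spirit of Side to side: it scores every item against a centered position vector and re-sorts until the order stops changing, which pushes dissimilar items toward opposite ends. `neighborhood_cost` scores an ordering with a Gaussian weight around the main diagonal, so large distances placed next to each other cost the most. Both functions, the weight `exp(-(i - j)**2 / sigma)`, and the `sigma` parameter are illustrative assumptions, not the published SPIN implementation.

```python
import math

def sts_sort(dist, max_iter=100):
    """Sketch of a Side-to-Side-style ordering (assumption, not the SPIN code).

    x is a centered, increasing position vector. Each item gets the score
    s_i = sum_j D[i][j] * x[rank(j)]; items far from the currently high-ranked
    items score high and are moved to the low end. Repeat until stable.
    """
    n = len(dist)
    x = [k - (n - 1) / 2 for k in range(n)]
    order = list(range(n))
    for _ in range(max_iter):
        rank = {item: k for k, item in enumerate(order)}
        score = [sum(dist[i][j] * x[rank[j]] for j in range(n)) for i in range(n)]
        new_order = sorted(range(n), key=lambda i: -score[i])  # highest score first
        if new_order == order:
            break
        order = new_order
    return order

def neighborhood_cost(dist, order, sigma=1.0):
    """Weighted sum of reordered distances (assumed Gaussian weight).

    The weight exp(-(i - j)**2 / sigma) is largest near the main diagonal,
    so large distances between neighbors in the ordering cost the most.
    """
    n = len(order)
    return sum(
        dist[order[i]][order[j]] * math.exp(-((i - j) ** 2) / sigma)
        for i in range(n)
        for j in range(n)
    )

# Toy data: points on a line, stored in scrambled order.
positions = [0, 5, 1, 4, 2, 3]
dist = [[abs(a - b) for b in positions] for a in positions]

order = sts_sort(dist)
print([positions[i] for i in order])
print(neighborhood_cost(dist, list(range(6))), neighborhood_cost(dist, order))
```

For points on a line, a good ordering is monotone in position (either direction), and it should also have a lower neighborhood cost than the scrambled input order.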
‘Electronic Microdissection’
Two-Way Sorted Expression Matrix
[Figure: two-way sorted expression matrix, genes (rows) by samples (columns); the samples fall into groups labeled Normal Liver, Liver Met., Primary Tumor, Polyp, and Normal Colon.]
Important: 97% of genes that appear significantly overexpressed in metastasis vs. carcinoma are liver-specific and irrelevant to cancer!
Identify ‘clean’ metastasis samples
[Figure: sorted samples labeled normal liver, normal lung, normal colon, adenomas, carcinomas, liver metastasis, and lung metastasis.]
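A two-way sorted expression matrix reorders the genes (rows) using the genes distance matrix and the samples (columns) using the samples distance matrix. The Python sketch below shows only that bookkeeping. The sorter is a hypothetical stand-in (a greedy nearest-neighbor chain), not SPIN, and Euclidean distance is assumed; any function that maps a distance matrix to an ordering could be passed as `sorter` instead.

```python
import math

def distance_matrix(vectors):
    """Pairwise Euclidean distances (assumed metric)."""
    return [[math.dist(u, v) for v in vectors] for u in vectors]

def greedy_chain_order(dist):
    """Hypothetical stand-in sorter, NOT SPIN: start at the item with the largest
    total distance (a likely end point) and keep stepping to the nearest unvisited item."""
    n = len(dist)
    current = max(range(n), key=lambda i: sum(dist[i]))
    order, unvisited = [current], set(range(n)) - {current}
    while unvisited:
        current = min(unvisited, key=lambda j: dist[current][j])
        order.append(current)
        unvisited.remove(current)
    return order

def two_way_sort(expr, sorter=greedy_chain_order):
    """Order genes (rows) and samples (columns) independently, then reindex the matrix."""
    gene_order = sorter(distance_matrix(expr))
    samples = [list(col) for col in zip(*expr)]
    sample_order = sorter(distance_matrix(samples))
    sorted_expr = [[expr[g][s] for s in sample_order] for g in gene_order]
    return sorted_expr, gene_order, sample_order

# Toy expression matrix: 4 genes x 4 samples, with interleaved gene and sample groups.
expression = [
    [1, 9, 1, 9],
    [9, 1, 9, 1],
    [1, 8, 1, 8],
    [8, 1, 8, 1],
]
sorted_expr, gene_order, sample_order = two_way_sort(expression)
for row in sorted_expr:
    print(row)
```

After sorting, genes with similar profiles sit in adjacent rows and similar samples sit in adjacent columns, so the interleaved groups become contiguous blocks.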
SPIN versus clustering
• SPIN rearranges points in a way that reflects the shape of their arrangement.
• Clustering aims to assign points to categories; points within a cluster are more “similar” to each other than to points outside the cluster.
• SPIN rearranges the points and the user has to identify the clusters. Clustering does this for you in a formal way.
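Because SPIN only orders the points, marking cluster boundaries is left to the user. One crude way to do that along an already-sorted order is to cut wherever two consecutive items are far apart. The helper name `split_sorted_order` and the `threshold` value below are hypothetical and not part of SPIN.

```python
def split_sorted_order(dist, order, threshold):
    """Cut a sorted ordering into groups wherever two consecutive items are
    farther apart than `threshold` (a hypothetical, user-chosen cutoff)."""
    groups = [[order[0]]]
    for prev, cur in zip(order, order[1:]):
        if dist[prev][cur] > threshold:
            groups.append([])  # large jump between neighbors: start a new group
        groups[-1].append(cur)
    return groups

# Toy data: points on a line that are already in sorted order.
positions = [0.0, 0.4, 0.9, 7.0, 7.3, 15.0]
dist = [[abs(a - b) for b in positions] for a in positions]
print(split_sorted_order(dist, list(range(6)), threshold=2.0))  # [[0, 1, 2], [3, 4], [5]]
```

The result depends entirely on the chosen cutoff, which is exactly the judgment a formal clustering method makes for you.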
Software availability
• SPIN is available upon request.
• Email [email protected]
Exercise Time
The EXCEL
Shift the header line one column to the left.