72x48 Poster Template - Florida International University

Download Report

Transcript 72x48 Poster Template - Florida International University

WEIGHTED CONCENSUS CLUSTERING FOR PPI NETWORK
Yi Zhang, Erliang Zeng, Tao Li, Giri Narasimhan – Florida International University
The 8TH International Conference On Machine Learning And Application 2009
Introduction and Challenge
Consensus Clustering Algorithm
Introduction
• Proteins are central components of cell machinery and
life; Protein-Protein Interaction (PPI) networks are
important sources related to biological processes and
complex metabolic functions of the cell.
Challenge [1]
• Data quality: each experimental and Insilco methods
used to compute interactions, have their own strengths
and weaknesses and all are believed to be noisy;
• Partition the PPI network using classical graph
partitioning or clustering schemes is inherently difficult
due to its charactics;
• Some proteins are believed to be multi-functional.
Input:
P  ( P , P ,, P )
1
T clusterings :
Clustering
P
t
:
Base Clustering
Similarity Matrix
Repeated
Bisection
Betweenness
Based
Direct K-way
Partitioning
PPI
Data
Multilevel-way
Partition
Clustering
Coefficient
Based
Special
Clustering
Similarity Matrix
Similarity Measure:
1.Clustering coefficient-based [1]
SCC vi , v j   CCvi  CCv j  CC 'vi CC 'v j
2nv
CC v  
kv kv  1
2.Betweenness-based [1]
S vi , v j   1 
Base Clustering
~
t
2
M ij ( P ) 
t

min~ M  H H

T
n
1 T
1
min* J   d ( P t , P* )    M ij ( P t )  M ij ( P* )
P
T t 1
T t 1 i , j 1
2.Direct k-way Partitioning;
3.Multilevel k-way Partitioning;
2
 M P 
T
t
NMF Formulation [4]
ij
t 1
4.Spectral Clustering [3].
TEMPLATE DESIGN © 2008
www.PosterPresentations.com
min J  w Aw  2b w  const.
NMF Formulation [4]
bi
U
2
0 0
~
T
Min
M

HH
H is cluster indicator

0 0 H 0
1 0
D  diag ( H T H )  diag (n1 ,, nk )
H
1 0
2
~
1 0
Min
M  HH T
H H  D , H 0

0 1
Symmetric

NMF
0 1
2
T
Min
~T ~
~
~ T
MH DH
~
H H  I , H , D 0
Weighted Consensus Clustering [5]
W  w1 , w2 ,, wT  , wi  0,
Connectivity Matrix:
1 T
M ij   wi M ij ( P t )
T t 1
~
        P 
~
~
 H M P H
uv
T
i
Soft Clustering [5]
M ( P*)  HH T
~
s.t.  wi  1, wi  0
uv
2
Min M  M ( P*)
1
1

0

0
0
0

0
T
References
i 1
ij
Cluster Indicator
T
This can be solved using quadratic programming
• Where: A  Tr M P i M P j 
M Pi M
~

Two Step Iterating:
~
• Solve for H while fixing w
– Using NMF-based method
~
• Solve for w while fixing H , the optimization problem
becomes:
T
T
Metis [2]

U
1
~
M ij 
T
~ ~ ~ ~T ~ ~T
MH  HH HH
Optimization Procedure [5]
2
Min M  M ( P*)
where:
T
~T ~ ~
max
Tr  H M H 
~
w, H


1 ( i , j )C k ( P t )
0 Otherwise
~
~~
~

 Tr MM  2 H
w,H
t
k
Weights for Partition:
Base Clustering:
1.Repeated bisections;
2
~ T
~
Objective Function:
SPij
SPmax
T
t
1
Connectivity Matrix:
Concensus
Clustering
Weighted
Consensus
Clustering
(WCC)
2
P  (C , C ,, C )
t
Diagram of Consensus Clustering Approach [1]
Our
Contribution!!
Objective Function [5]:
• H is not exactly orthogonal.
• Suppose a protein has a posterior distribution as
this protein is clustered
0.96,0,0.04,0,,0
into one cluster.
• Suppose another protein has a posterior
distribution as 0.48,0.48,0.04,0,,0
this protein is clustered into two clusters.
Evaluation [1,6,7,8]
j
uv
1. S. Asur, D. Ucar, and S. Parthasarathy. An ensemble framework
for clustering proteincprotein interaction networks. Bioinformatics,
23(13):i29–i40, 2007.
2. http://glaros.dtc.umn.edu/gkhome/views/metis.
3. U.von Luxburg. A tutorial on spectral clustering. Statistics and
Computing, 17(4), August, 2007.
4. T. Li and Chris Ding. The Relationships among Various
Nonnegative Matrix Factorization Methods for Clustering. In
Proceedings of the IEEE International Conference on Data Mining
(ICDM), pp 60–63, 2006.
5. T. Li and C. Ding. Weighted consensus clustering. In Proceedings
of 2008 SIAM International Conference on Data Mining (SDM), 2008.
6. M. E. J. Newman and M. Girvan. Finding and evaluating
community structure in networks, The American Physical Society,
E69, 026113, 2004.
7. The gene ontology consortium online database
http://db.yeastgenome.org/cgi-bin/GO/goTermFinder.
8. A. Strehl, J. Ghosh and C. Cardie. Cluster Ensembles – A
Knowledge Reuse framework for Combining Multiple Partitions,
Journal of Machine Learning Research, 3:583-617, 2002.
Contact Information
T
w
i 1
i
1
ECS 318
11200 SW 8th STREET
MIAMI, FL, 33199, U.S.A
PHONE: 1.305.348.1218
FAX:
1.305.348.3549
EMAIL: [email protected]