Transcript Methods

Gene-Disease Associations Based on Networks
Presenter: Li Jinjin
Contents
1. Background
2. Heterogeneous network
3. Methods
Background
• Correctly identifying associations between genes and diseases has long been a goal in biology.
• Identifying these associations has contributed to improving medical care and to understanding gene functions and interactions.
• Clinical diseases are characterized by distinct phenotypes. To identify disease genes, the relationship between genes and phenotypes must be taken into account.
Background
Problems
[Figure: diagram with the labels Gene, Disease, Association, and Phenotype]
Constructing the heterogeneous network
• Gene network based on HPRD
[Figure: gene network with nodes g1 to g7; adjacency matrix A_G]
Constructing the heterogeneous network
• Phenotype network using MimMiner
[Figure: phenotype network with nodes p1, p2, p4, p5; adjacency matrix A_P]
Constructing the heterogeneous network
• Gene-phenotype network based on OMIM
[Figure: bipartite network linking genes g1 to g7 with phenotypes p1, p2, p4, p5; adjacency matrix B]
Constructing the heterogeneous network
A_G (n × n), A_P (m × m), B (n × m)
$$A = \begin{bmatrix} A_G & B \\ B^T & A_P \end{bmatrix}$$
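The block structure above can be assembled directly with NumPy. This is a minimal sketch on made-up toy matrices (3 genes, 2 phenotypes); the entries are hypothetical and are not taken from HPRD, MimMiner, or OMIM.

```python
import numpy as np

# Hypothetical toy data: n = 3 genes, m = 2 phenotypes.
A_G = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)  # gene-gene network (n x n)
A_P = np.array([[0, 1],
                [1, 0]], dtype=float)     # phenotype-phenotype network (m x m)
B = np.array([[1, 0],
              [0, 0],
              [0, 1]], dtype=float)       # gene-phenotype links (n x m)

# Stack the blocks into the (n + m) x (n + m) heterogeneous adjacency matrix.
A = np.block([[A_G, B],
              [B.T, A_P]])

print(A.shape)
```

Because A_G and A_P are symmetric and the lower-left block is the transpose of B, the combined matrix A is symmetric as well.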
Methods
Katz
CATAPULT
Methods
CIPHER
GeneWalker
Prince
RWRH
Methods
• Katz has been successfully applied to link prediction in social networks.
• CATAPULT is a supervised learning method. Its features are derived from hybrid walks through the heterogeneous network.
Katz
[Figure: example graph with nodes g1 to g6 and its 6 × 6 adjacency matrix A]
Katz
[Figure: the example graph with the matrix powers A^2, A^3 = AAA, ...; entry (i, j) of A^l counts the walks of length l between g_i and g_j]
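To illustrate how powers of an adjacency matrix count walks, here is a small sketch on a hypothetical path graph g1 - g2 - g3 - g4. This graph is an assumption chosen for readability; it is not the six-node graph from the slide.

```python
import numpy as np

# Hypothetical path graph g1 - g2 - g3 - g4 (not the graph from the slide).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])

A2 = A @ A    # entry (i, j): number of walks of length 2 from g_i to g_j
A3 = A2 @ A   # entry (i, j): number of walks of length 3 from g_i to g_j

print(A2)
print(A3)
```

For example, A3[0, 3] counts the single length-3 walk g1 → g2 → g3 → g4, and the diagonal of A2 holds the node degrees, since every length-2 walk from a node back to itself goes out along one edge and returns along it.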
Katz
• How to get the similarity matrix?
$$S_{ij} = \sum_{l=1}^{k} \beta_l \,(A^l)_{ij}, \qquad \beta_l \ge 0$$
• Katz measure (with $\beta_l = \beta^l$):
$$S^{\mathrm{katz}} = \sum_{l \ge 1} \beta^l A^l = (I - \beta A)^{-1} - I, \qquad \beta < \frac{1}{\lVert A \rVert_2}$$
Small values of k (k = 3 or k = 4) are known to yield competitive performance in the task of recommending similar nodes.
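The truncated sum and the closed form can be compared numerically. The sketch below assumes a hypothetical four-node path graph and picks β as half of 1/‖A‖₂ so that the series converges; the function names are illustrative only.

```python
import numpy as np

def katz_truncated(A, beta, k):
    """Sum of beta^l * A^l for l = 1..k."""
    S = np.zeros(A.shape, dtype=float)
    power = np.eye(A.shape[0])
    for l in range(1, k + 1):
        power = power @ A          # power now holds A^l
        S += beta**l * power
    return S

def katz_closed_form(A, beta):
    """(I - beta*A)^{-1} - I, valid when beta < 1 / ||A||_2."""
    I = np.eye(A.shape[0])
    return np.linalg.inv(I - beta * A) - I

# Hypothetical path graph g1 - g2 - g3 - g4.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

beta = 0.5 / np.linalg.norm(A, 2)   # spectral norm, so beta * ||A||_2 = 0.5

S_k3 = katz_truncated(A, beta, k=3)
S_inf = katz_closed_form(A, beta)
print(np.abs(S_inf - S_k3).max())   # contribution of walks longer than 3
```

With this choice of β each extra power of A is damped by at least a factor of 0.5, which is one way to see why a small k already captures most of the score.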
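Katz on the heterogeneous network
Adjacency matrix of the heterogeneous network:
$$A = \begin{bmatrix} A_G & B \\ B^T & A_P \end{bmatrix}, \qquad B = \begin{bmatrix} B_{HS} & B_S \end{bmatrix}, \qquad A_P = \begin{bmatrix} A_{P_{HS}} & 0 \\ 0 & A_{P_S} \end{bmatrix}$$
• $A_G$: the gene-gene network
• $B$: the bipartite network between genes and phenotypes
• $A_{P_{HS}}$: the similarity matrix of human diseases
• $A_{P_S}$: the similarity matrix of phenotypes of other species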
Katz on the heterogeneous network
• Katz similarity measure specialized to A:
$$S^{\mathrm{Katz}}(A)_{ij} = \sum_{l=1}^{k} \beta^l (A^l)_{ij}$$
• For k = 3, the similarities between gene nodes and human disease nodes are denoted by $S^{\mathrm{Katz}}_{HS}(A)$:
$$S^{\mathrm{Katz}}_{HS}(A) = \beta B_{HS} + \beta^2 \left( A_G B_{HS} + B_{HS} A_{P_{HS}} \right) + \beta^3 \left( B B^T B_{HS} + A_G^2 B_{HS} + A_G B_{HS} A_{P_{HS}} + B_{HS} A_{P_{HS}}^2 \right)$$
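The k = 3 block expression can be checked against a direct computation of βA + β²A² + β³A³. The sketch below uses random toy matrices; the sizes, the random entries, and β = 0.1 are all assumptions made only for this check.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m_hs, m_s = 4, 3, 2   # hypothetical numbers of genes, human and other-species phenotypes

def random_symmetric(size):
    """Random symmetric matrix with a zero diagonal (toy similarity network)."""
    M = rng.random((size, size))
    M = (M + M.T) / 2
    np.fill_diagonal(M, 0.0)
    return M

A_G = random_symmetric(n)
A_PHS = random_symmetric(m_hs)
A_PS = random_symmetric(m_s)
B_HS = rng.integers(0, 2, size=(n, m_hs)).astype(float)
B_S = rng.integers(0, 2, size=(n, m_s)).astype(float)

# Assemble B, A_P and the full heterogeneous adjacency matrix A.
B = np.hstack([B_HS, B_S])
A_P = np.block([[A_PHS, np.zeros((m_hs, m_s))],
                [np.zeros((m_s, m_hs)), A_PS]])
A = np.block([[A_G, B],
              [B.T, A_P]])

beta = 0.1
# Direct computation: truncated Katz sum on A, then the gene x human-disease block.
S_full = sum(beta**l * np.linalg.matrix_power(A, l) for l in range(1, 4))
S_block = S_full[:n, n:n + m_hs]

# Block formula for k = 3.
S_formula = (beta * B_HS
             + beta**2 * (A_G @ B_HS + B_HS @ A_PHS)
             + beta**3 * (B @ B.T @ B_HS + A_G @ A_G @ B_HS
                          + A_G @ B_HS @ A_PHS + B_HS @ A_PHS @ A_PHS))

print(np.allclose(S_block, S_formula))
```

The block formula avoids forming powers of the full (n + m) × (n + m) matrix, which matters when only the gene-to-human-disease scores are needed.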
CATAPULT
• How to train a biased SVM?
T: the number of bootstraps
A: the set of positive gene-phenotype pairs
U: the set of unlabeled gene-phenotype pairs
n+: the number of examples in A
Step 1: Draw a bootstrap sample U_t ⊆ U of size n+.
Step 2: Train a linear classifier θ_t using the positive training examples A and U_t as negative examples.
Step 2: Training classifier
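Steps 1 and 2 can be sketched as a loop over T bootstraps. This is only an illustration under stated assumptions: the classifier is a plain unweighted hinge-loss linear model trained by subgradient descent, used as a stand-in for the biased SVM named on the slide, and the feature vectors are synthetic. The function names and hyperparameters are hypothetical.

```python
import numpy as np

def train_linear_hinge(X, y, epochs=200, lr=0.1, reg=0.01):
    """Stand-in linear classifier: subgradient descent on the
    L2-regularized hinge loss, with labels y in {-1, +1}."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        active = y * (X @ w + b) < 1   # examples violating the margin
        grad_w = reg * w - (y[active, None] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def train_bagged_classifiers(X_pos, X_unl, T, rng):
    """Repeat steps 1 and 2 for T bootstraps; returns a list of (w, b)."""
    n_pos = len(X_pos)
    models = []
    for _ in range(T):
        # Step 1: bootstrap sample U_t of size n+ from the unlabeled set.
        idx = rng.integers(0, len(X_unl), size=n_pos)
        U_t = X_unl[idx]
        # Step 2: positives A labeled +1, U_t treated as negatives (-1).
        X = np.vstack([X_pos, U_t])
        y = np.concatenate([np.ones(n_pos), -np.ones(n_pos)])
        models.append(train_linear_hinge(X, y))
    return models

rng = np.random.default_rng(0)
X_pos = rng.normal(loc=2.0, scale=0.5, size=(20, 2))    # synthetic positives
X_unl = rng.normal(loc=-2.0, scale=0.5, size=(100, 2))  # synthetic unlabeled pairs
models = train_bagged_classifiers(X_pos, X_unl, T=5, rng=rng)
print(len(models))
```

Drawing U_t with the same size as A keeps each training set balanced, so no single classifier is dominated by the much larger unlabeled pool.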
Step 3: For any x ∈ U \ U_t, update: