Transcript: Methods for Gene-Disease Associations Based on Networks
Presenter: Li Jinjin (李金金)
Contents
1. Background
2. Heterogeneous network
3. Methods
Background
Correctly identifying associations between genes and diseases has long been a goal in biology.
Identifying these associations has contributed to better medical care and to a better understanding of gene functions and interactions.
Clinical diseases are characterized by distinct phenotypes. To identify disease genes, the relationship between genes and phenotypes must therefore be taken into account.
Background
Problems
[Figure: gene-disease association and gene-phenotype relationship]
Constructing the heterogeneous network
Gene network based on HPRD
[Figure: gene network over genes g1 to g7, with adjacency matrix $A_G$]
Constructing the heterogeneous network
Phenotype network using MimMiner
[Figure: phenotype network over phenotypes p1, p2, p4, p5, with adjacency matrix $A_P$]
Constructing the heterogeneous network
Gene-phenotype network based on OMIM
[Figure: bipartite network linking genes g1 to g7 with phenotypes p1, p2, p4, p5, with association matrix $B$]
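Constructing the heterogeneous network
$$A = \begin{pmatrix} A_G & B \\ B^{T} & A_P \end{pmatrix}$$
where $A_G$ is $n \times n$, $A_P$ is $m \times m$ and $B$ is $n \times m$ ($n$ genes, $m$ phenotypes).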
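A minimal sketch of how the heterogeneous adjacency matrix can be assembled from $A_G$, $A_P$ and $B$, assuming each matrix is stored as a plain Python list of rows. The function name `build_heterogeneous` and the toy 3-gene, 2-phenotype matrices are hypothetical and are not taken from the real databases.

```python
from typing import List

Matrix = List[List[float]]


def transpose(m: Matrix) -> Matrix:
    """Return the transpose of a rectangular matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def build_heterogeneous(ag: Matrix, ap: Matrix, b: Matrix) -> Matrix:
    """Assemble A = [[AG, B], [B^T, AP]].

    ag: n x n gene-gene network
    ap: m x m phenotype-phenotype network
    b:  n x m gene-phenotype associations
    """
    n, m = len(ag), len(ap)
    if any(len(row) != n for row in ag) or any(len(row) != m for row in ap):
        raise ValueError("AG must be n x n and AP must be m x m")
    if len(b) != n or any(len(row) != m for row in b):
        raise ValueError("B must be n x m")
    top = [list(ag[i]) + list(b[i]) for i in range(n)]       # gene rows: [AG | B]
    b_t = transpose(b)
    bottom = [list(b_t[j]) + list(ap[j]) for j in range(m)]  # phenotype rows: [B^T | AP]
    return top + bottom


# Hypothetical toy network: 3 genes, 2 phenotypes.
ag = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
ap = [[0, 0.8], [0.8, 0]]
b = [[1, 0], [0, 0], [0, 1]]
A = build_heterogeneous(ag, ap, b)
for row in A:
    print(row)
```

Because $A_G$ and $A_P$ are symmetric, the assembled $A$ is symmetric as well, so it can be treated as the adjacency matrix of a single undirected graph whose first $n$ nodes are genes and last $m$ nodes are phenotypes.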
Methods
Katz
CATAPULT
Methods
CIPHER
GeneWalker
PRINCE
RWRH
Methods
Katz has been applied successfully to link prediction in social networks.
CATAPULT is a supervised learning method whose features are derived from hybrid walks through the heterogeneous network.
Katz
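Example gene network over g1 to g6 and its adjacency matrix (rows and columns ordered g1, ..., g6):
$$A = \begin{pmatrix} 0 & 0 & 1 & 0 & 1 & 0 \\ 0 & 0 & 1 & 1 & 0 & 1 \\ 1 & 1 & 0 & 1 & 1 & 1 \\ 0 & 1 & 1 & 0 & 0 & 1 \\ 1 & 0 & 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 1 & 0 & 0 \end{pmatrix}$$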
Katz
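Powers of $A$ count walks: $(A^{l})_{ij}$ is the number of walks of length $l$ from $g_i$ to $g_j$.
$$A^{2} = \begin{pmatrix} 2 & 1 & 1 & 1 & 1 & 1 \\ 1 & 3 & 2 & 2 & 1 & 2 \\ 1 & 2 & 5 & 2 & 1 & 2 \\ 1 & 2 & 2 & 3 & 1 & 2 \\ 1 & 1 & 1 & 1 & 2 & 1 \\ 1 & 2 & 2 & 2 & 1 & 3 \end{pmatrix}$$
$A^{3} = AAA,\ A^{4},\ A^{5},\ \ldots$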
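To check the walk-counting interpretation, the short sketch below recomputes $A^2$ and $A^3$ for the six-gene example using only the Python standard library. It assumes the rows and columns are ordered g1 to g6; `matmul` and `matpow` are helper names introduced here for illustration.

```python
from typing import List

Matrix = List[List[int]]

# Adjacency matrix of the six-gene example (rows/columns assumed ordered g1..g6).
A: Matrix = [
    [0, 0, 1, 0, 1, 0],
    [0, 0, 1, 1, 0, 1],
    [1, 1, 0, 1, 1, 1],
    [0, 1, 1, 0, 0, 1],
    [1, 0, 1, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain triple-loop matrix product x @ y."""
    return [[sum(x[i][t] * y[t][j] for t in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def matpow(a: Matrix, l: int) -> Matrix:
    """Return A^l (l >= 1) by repeated multiplication."""
    result = a
    for _ in range(l - 1):
        result = matmul(result, a)
    return result


A2 = matpow(A, 2)
A3 = matpow(A, 3)  # A^3 = AAA
for row in A2:
    print(row)  # (A^2)[i][j] = number of length-2 walks from g_i to g_j
```

Under the assumed ordering, the diagonal of $A^2$ equals each gene's degree (g3 has five neighbours), and $(A^3)_{11} = 2$ because g1 lies on exactly one triangle, g1-g3-g5, which can be walked in two directions.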
Katz
How to get the similarity matrix?
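$$S_{ij} = \sum_{l=1}^{k} \beta_l \,(A^{l})_{ij}$$
where $\beta_l$ is the weight given to walks of length $l$.
Katz measure:
$$S^{\mathrm{katz}} = \sum_{l=1}^{\infty} \beta^{l} A^{l} = (I - \beta A)^{-1} - I,$$
i.e. $\beta_l = \beta^{l}$ with $\beta > 0$ and $\beta < 1/\lVert A \rVert_2$, which guarantees that the series converges.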
Small values of k (k=3 or k=4) are known to yield
competitive performance in the task of
recommending similar nodes.
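A minimal standard-library sketch of both forms: `katz_truncated` sums $\beta^l A^l$ up to $l = k$ (the small-$k$ variant), and `katz_closed_form` evaluates $(I - \beta A)^{-1} - I$ with a hand-written Gauss-Jordan inverse. The function names and the value $\beta = 0.1$ are assumptions for illustration; $\beta = 0.1$ is safe for the six-gene example because no gene has more than five neighbours, so $\beta\,\rho(A) \le 0.5 < 1$.

```python
from typing import List

Matrix = List[List[float]]


def identity(n: int) -> Matrix:
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    return [[sum(x[i][t] * y[t][j] for t in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def katz_truncated(a: Matrix, beta: float, k: int) -> Matrix:
    """S = sum_{l=1}^{k} beta^l A^l, i.e. Katz weights beta_l = beta^l cut off at k."""
    n = len(a)
    s = [[0.0] * n for _ in range(n)]
    power = identity(n)
    for l in range(1, k + 1):
        power = matmul(power, a)  # power now holds A^l
        weight = beta ** l
        for i in range(n):
            for j in range(n):
                s[i][j] += weight * power[i][j]
    return s


def inverse(m: Matrix) -> Matrix:
    """Gauss-Jordan inversion with partial pivoting (fine for small dense matrices)."""
    n = len(m)
    eye = identity(n)
    aug = [[float(v) for v in m[i]] + eye[i] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def katz_closed_form(a: Matrix, beta: float) -> Matrix:
    """S_katz = (I - beta*A)^{-1} - I; valid only when the series converges."""
    n = len(a)
    eye = identity(n)
    inv = inverse([[eye[i][j] - beta * a[i][j] for j in range(n)] for i in range(n)])
    return [[inv[i][j] - eye[i][j] for j in range(n)] for i in range(n)]


# Six-gene example network (rows/columns assumed ordered g1..g6).
A = [
    [0, 0, 1, 0, 1, 0],
    [0, 0, 1, 1, 0, 1],
    [1, 1, 0, 1, 1, 1],
    [0, 1, 1, 0, 0, 1],
    [1, 0, 1, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
]
beta = 0.1
S3 = katz_truncated(A, beta, 3)     # truncated at k = 3
S_katz = katz_closed_form(A, beta)  # infinite sum
print(S3[0][2], S_katz[0][2])
```

With a large cut-off, for example `katz_truncated(A, beta, 60)`, the truncated sum matches the closed form up to rounding error, which is a quick check that both formulas describe the same measure; a small $k$ keeps only the short walks.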
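Katz on the heterogeneous network
Adjacency matrix of the heterogeneous network:
$$A = \begin{pmatrix} A_G & B \\ B^{T} & A_P \end{pmatrix}, \quad B = \begin{pmatrix} B_{Hs} & B_S \end{pmatrix}, \quad A_P = \begin{pmatrix} A_{PHs} & 0 \\ 0 & A_{PS} \end{pmatrix}$$
$A_G$: the gene-gene network
$B$: the bipartite network between genes and phenotypes, split into $B_{Hs}$ (genes to human diseases) and $B_S$ (genes to phenotypes of other species)
$A_{PHs}$: the similarity matrix of human diseases
$A_{PS}$: the similarity matrix of phenotypes of other species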
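Katz on the heterogeneous network
Katz similarity measure specialized to $A$:
$$S^{\mathrm{Katz}}(A)_{ij} = \sum_{l=1}^{k} \beta^{l} (A^{l})_{ij}$$
With $k = 3$, the similarities between gene nodes and human disease nodes, denoted $S^{\mathrm{Katz}}_{Hs}(A)$, form the gene-row, human-disease-column block of this sum:
$$S^{\mathrm{Katz}}_{Hs}(A) = \beta B_{Hs} + \beta^{2}\left(A_G B_{Hs} + B_{Hs} A_{PHs}\right) + \beta^{3}\left(B B^{T} B_{Hs} + A_G^{2} B_{Hs} + A_G B_{Hs} A_{PHs} + B_{Hs} A_{PHs}^{2}\right)$$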
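The three-term expansion can be checked numerically: build a small multi-species network, take the gene-row, human-disease-column block of $\sum_{l=1}^{3} \beta^l A^l$, and compare it with the formula. The sketch below uses the standard library only; the toy blocks (3 genes, 2 human diseases, 1 phenotype of another species), $\beta = 0.2$ and all helper names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    return [[sum(x[i][t] * y[t][j] for t in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def transpose(m: Matrix) -> Matrix:
    return [list(c) for c in zip(*m)]


def add(*ms: Matrix) -> Matrix:
    """Element-wise sum of equally shaped matrices."""
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*ms)]


def scale(c: float, m: Matrix) -> Matrix:
    return [[c * v for v in row] for row in m]


def assemble(ag, b_hs, b_s, ap_hs, ap_s) -> Matrix:
    """A = [[AG, B_Hs, B_S], [B_Hs^T, AP_Hs, 0], [B_S^T, 0, AP_S]]."""
    h, s = len(ap_hs), len(ap_s)
    gene_rows = [ag[i] + b_hs[i] + b_s[i] for i in range(len(ag))]
    hs_rows = [r + ap_hs[j] + [0.0] * s for j, r in enumerate(transpose(b_hs))]
    s_rows = [r + [0.0] * h + ap_s[j] for j, r in enumerate(transpose(b_s))]
    return gene_rows + hs_rows + s_rows


def katz_truncated(a: Matrix, beta: float, k: int) -> Matrix:
    """sum_{l=1}^{k} beta^l A^l."""
    total, power = scale(beta, a), a
    for l in range(2, k + 1):
        power = matmul(power, a)
        total = add(total, scale(beta ** l, power))
    return total


def s_hs_katz3(ag, b_hs, b_s, ap_hs, beta) -> Matrix:
    """Block expansion of the gene x human-disease similarities for k = 3."""
    b = [b_hs[i] + b_s[i] for i in range(len(ag))]  # B = [B_Hs  B_S]
    t2 = add(matmul(ag, b_hs), matmul(b_hs, ap_hs))
    t3 = add(matmul(matmul(b, transpose(b)), b_hs),
             matmul(matmul(ag, ag), b_hs),
             matmul(matmul(ag, b_hs), ap_hs),
             matmul(b_hs, matmul(ap_hs, ap_hs)))
    return add(scale(beta, b_hs), scale(beta ** 2, t2), scale(beta ** 3, t3))


# Hypothetical toy blocks.
ag = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
b_hs = [[1, 0], [0, 0], [0, 1]]
b_s = [[0], [1], [0]]
ap_hs = [[0, 0.5], [0.5, 0]]
ap_s = [[0.0]]
beta, n, h = 0.2, 3, 2

A = assemble(ag, b_hs, b_s, ap_hs, ap_s)
block = [row[n:n + h] for row in katz_truncated(A, beta, 3)[:n]]
formula = s_hs_katz3(ag, b_hs, b_s, ap_hs, beta)
print(max(abs(x - y) for r1, r2 in zip(block, formula) for x, y in zip(r1, r2)))
```

The zero blocks between human diseases and other-species phenotypes are why $B_S$ appears only through $B B^T$ in the $\beta^3$ term.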
CATAPULT
How to train a biased SVM?
$T$: the number of bootstraps
$A$: the set of positive gene-phenotype pairs
$U$: the set of unlabeled gene-phenotype pairs
$n_+$: the number of examples in $A$
Step 1: Draw a bootstrap sample $U_t \subset U$ of size $n_+$.
Step 2: Train a linear classifier $\theta_t$ using $A$ as positive training examples and $U_t$ as negative examples.
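Steps 1 and 2 can be sketched as a bagging loop. The slides do not specify the solver, so the classifier below is an assumption: a linear SVM trained by stochastic subgradient descent on a class-weighted hinge loss, where the larger cost `c_pos` on positives and the smaller `c_neg` on the noisy "negatives" drawn from $U$ is what makes the SVM biased. All function names, default costs and learning-rate settings are hypothetical, and Step 3 is not implemented here.

```python
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Classifier = Tuple[List[float], float]  # (weights w, bias b): score(x) = w.x + b


def score(theta: Classifier, x: Vector) -> float:
    w, b = theta
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_biased_linear_svm(pos: List[Vector], neg: List[Vector], c_pos: float = 1.0,
                            c_neg: float = 0.1, lam: float = 0.01, lr: float = 0.01,
                            epochs: int = 100, seed: int = 0) -> Classifier:
    """Approximately minimise lam/2 ||w||^2 + sum_i c_i * max(0, 1 - y_i (w.x_i + b))
    by stochastic subgradient descent, with c_i = c_pos for positives and c_neg
    for the unlabeled examples treated as negatives."""
    rng = random.Random(seed)
    data = [(x, 1.0, c_pos) for x in pos] + [(x, -1.0, c_neg) for x in neg]
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y, c in data:
            margin = y * score((w, b), x)
            w = [wi * (1.0 - lr * lam) for wi in w]  # gradient step on the L2 term
            if margin < 1.0:                          # hinge active: move x to its side
                w = [wi + lr * c * y * xi for wi, xi in zip(w, x)]
                b += lr * c * y
    return w, b


def bagging_biased_svm(positives: List[Vector], unlabeled: List[Vector], T: int,
                       seed: int = 0) -> List[Tuple[Classifier, List[int]]]:
    """Repeat Steps 1-2 T times; return each classifier theta_t with the indices of U_t."""
    rng = random.Random(seed)
    n_plus = len(positives)
    rounds = []
    for t in range(T):
        # Step 1: bootstrap sample U_t of size n+ drawn from U (with replacement).
        idx = [rng.randrange(len(unlabeled)) for _ in range(n_plus)]
        u_t = [unlabeled[i] for i in idx]
        # Step 2: positives A versus U_t treated as negatives.
        theta_t = train_biased_linear_svm(positives, u_t, seed=seed + t)
        rounds.append((theta_t, idx))
    return rounds


# Hypothetical 2-D feature vectors for gene-phenotype pairs.
positives_A = [(2.0, 2.0), (3.0, 1.0), (1.0, 3.0), (2.5, 2.5)]
unlabeled_U = [(-2.0, -2.0), (-3.0, -1.0), (-1.0, -3.0), (-2.5, -2.0), (-1.0, -1.5)]
rounds = bagging_biased_svm(positives_A, unlabeled_U, T=5)
for theta, idx in rounds:
    print(idx, score(theta, (3.0, 3.0)))
```

Keeping the indices of each $U_t$ makes it easy to tell which unlabeled pairs were left out of a given round, which is the set that Step 3 operates on.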
CATAPULT
How to train a biased SVM?
Step 2: Training the classifier
CATAPULT
How to train a biased SVM?
Step 3: For any $x \in U \setminus U_t$, update: