Infinite Hidden Relational Models
Zhao Xu (1), Volker Tresp (2), Kai Yu (2), Shipeng Yu, and Hans-Peter Kriegel (1)
(1) University of Munich, Germany
(2) Siemens Corporate Technology, Munich, Germany
Information & Communications, Intelligent Autonomous Systems
Motivation
• Relational learning is an object-oriented approach to
representation and learning that clearly distinguishes between
entities (e.g., objects), relationships, and their respective attributes;
it is an area of growing interest in machine learning
• Learned dependencies encode probabilistic constraints in the
relational domain
• Many relational learning approaches involve extensive structural
learning, which makes relational learning somewhat tricky to apply in practice
• The goal of this work is an easy-to-apply generic system that
relaxes the need for extensive structural learning
• In the infinite hidden relational model (IHRM) we introduce for each
entity an infinite-dimensional latent variable, whose state is
determined by a Dirichlet process
• The resulting representation is a network of interacting DPs
© Siemens AG, CT IC 4
Work on DPs in Relational Learning
• C. Kemp, T. L. Griffiths, and J. B. Tenenbaum (2004). Discovering latent
classes in relational data. Technical Report AI Memo 2004-019.
• C. Kemp, J. B. Tenenbaum, T. L. Griffiths, T. Yamada, and N. Ueda (2006).
Learning systems of concepts with an infinite relational model. In Proc. AAAI 2006.
• Z. Xu, V. Tresp, K. Yu, S. Yu, and H.-P. Kriegel (2005). Dirichlet enhanced
relational learning. In Proc. 22nd ICML, 1004-1011. ACM Press.
• Z. Xu, V. Tresp, K. Yu, and H.-P. Kriegel (2006). Infinite hidden relational
models. In Proc. 22nd UAI.
• P. Carbonetto, J. Kisynski, N. de Freitas, and D. Poole (2005). Nonparametric
Bayesian logic. In Proc. 21st UAI.
Ground Network With an Image Structure
[Figure: ground network with an image (grid) structure, built from entity-attribute nodes A and relational-attribute nodes R]
• Ground Network
• A: entity attributes
• R: relational attributes (e.g., exists, does not exist)
• Limitations
• Attributes locally predict the probability of a relational attribute
• Given the parent attributes, all relational attributes are independent
• To obtain non-local dependencies, structural learning might be involved
Ground Network With an Image Structure and
Latent Variables: The HRM
[Figure: the grid-structured ground network of attribute nodes A and relational nodes R, extended with latent variable nodes Z]
• Z: latent variable
• Information can now flow
through the network of latent
variables
• In an IHRM, Z can be thought
of as representing unknown
attributes (such as a cluster
attribute)
• Note that in image processing, Z would correspond to the true
pixel value, A to a noisy measurement, and R would encode
constraints between neighboring pixel values
A Recommendation System
[Figure, first panel: users and items with entity attributes A, linked by relational attributes R, without latent variables]
• A relational attribute (like) only
depends on the attributes of
the user and the item
• If both attributes are weak,
we’re stuck
[Figure, second panel: the same network of users and items with a latent variable Z attached to each entity]
• A relational attribute (like) only
depends on the states of the
latent variables of user and
item
• If entity attributes are weak, other known relations are
exploited, i.e., we exploit collaborative information
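
The dependence of a relation on the latent states alone can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `predict_like`, `q_user`, `q_item`, and `phi` are hypothetical, and all numbers are made up. Given a distribution over the latent state of a user and of an item, the probability of "like" is obtained by summing over all pairs of states.

```python
def predict_like(q_user, q_item, phi):
    """Probability that R = "like", marginalized over latent states.

    q_user[k] : probability that the user's latent variable Z is in state k
    q_item[l] : probability that the item's latent variable Z is in state l
    phi[k][l] : P(like | user state k, item state l)
    All names and values here are illustrative assumptions.
    """
    return sum(
        q_user[k] * q_item[l] * phi[k][l]
        for k in range(len(q_user))
        for l in range(len(q_item))
    )


# Two user states and two item states (made-up numbers).
phi = [[0.9, 0.2],
       [0.1, 0.7]]
q_user = [0.8, 0.2]  # e.g. informed by the user's other known relations
q_item = [0.5, 0.5]  # weak item attributes: the state stays uncertain

p = predict_like(q_user, q_item, phi)
```

Even when the attributes say little about an entity, its other known relations can sharpen the distribution over its latent state, which is where the collaborative effect enters.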
The Hidden Relational Model (HRM)
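[Figure: diagram of the HRM for users and items. Each user and each item has a latent variable Z and attributes A; the relation R depends on the latent variables of both. The parameters (multinomial with Dirichlet priors) are drawn from the base distributions G0u, G0b, and G0m.]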
• A relational attribute (like) only
depends on the states of the
latent variables of user and
item
• If entity attributes are weak, other known relations are
exploited, i.e., we exploit collaborative information
Three Base
Distributions
For a DP model, the number of states becomes infinite; the prior distribution
for the mixing weights π is denoted as
π ~ Stick(α0)
With DP priors, the HRM becomes the Infinite Hidden Relational Model (IHRM)
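
As a rough illustration of the stick-breaking prior, the sketch below draws a truncated set of mixing weights using only the Python standard library. The function name `truncated_stick_breaking`, the truncation level, and the value of the concentration parameter are assumptions chosen for the example; the slides do not specify them.

```python
import random


def truncated_stick_breaking(alpha0, num_states, rng=random):
    """Draw mixing weights from a stick-breaking prior, truncated to num_states.

    Each break takes a Beta(1, alpha0) fraction of the remaining stick;
    the last state receives whatever is left, so the weights sum to one.
    """
    weights = []
    remaining = 1.0
    for _ in range(num_states - 1):
        fraction = rng.betavariate(1.0, alpha0)
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    weights.append(remaining)
    return weights


rng = random.Random(0)  # fixed seed, only to make the sketch reproducible
pi = truncated_stick_breaking(alpha0=5.0, num_states=20, rng=rng)
```

A larger concentration parameter spreads the mass over more states, so the truncation level has to grow with it; otherwise most of the mass ends up in the final leftover state.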
Inference in the IHRM
1. Gibbs sampler derived from the Chinese restaurant process (CRP)
representation (Kemp et al. 2004, 2006; Xu et al. 2006)
2. Gibbs sampler derived from finite approximations to the stick-breaking
representation
   1. Dirichlet multinomial allocation
   2. Truncated Dirichlet process (TDP)
3. Two mean field approximations based on those procedures
4. A memory-based empirical approximation (EA)
(Items 2, 3, and 4 are in Xu et al. 2006, submitted)
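
For the first option, the prior part of a CRP-based Gibbs step can be sketched as below. This shows only the standard CRP seating probabilities; in a full sampler these weights would be multiplied by the likelihood of the entity's attributes and relations under each state and then renormalized. The function name `crp_assignment_probs` and the example numbers are assumptions for illustration.

```python
def crp_assignment_probs(cluster_sizes, alpha0):
    """Prior probabilities for seating one more entity under a CRP.

    cluster_sizes[k] is the number of other entities already in state k.
    Returns len(cluster_sizes) + 1 probabilities; the last entry is the
    probability of opening a new state.
    """
    total = sum(cluster_sizes) + alpha0
    probs = [n / total for n in cluster_sizes]
    probs.append(alpha0 / total)
    return probs


# Two occupied states with 3 and 1 entities (made-up numbers).
probs = crp_assignment_probs([3, 1], alpha0=1.0)
```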
Generative Model with CRP
Generative Model with Truncated DP
Inference (1): Gibbs Sampling with CRP
Inference (2): Gibbs Sampling with Truncated DP
Inference (3): Mean Field with Truncated DP
Experimental Analysis on
Movie Recommendation (1)
• Task description
• To predict whether a user likes a movie given attributes of users and
movies, as well as known ratings of users.
• Data set: MovieLens
• Model
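[Figure: IHRM for movie recommendation. Entity classes User and Movie with latent variables Zu and Zm and their attributes; the relation Like (R) connects them. Base distributions G0u, G0b, G0m and concentration parameters α0u, α0m.]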
Experimental Analysis on
Movie Recommendation (2)
• Result
Method    Prediction Accuracy (%)                 Time (s)   #Compu   #Compm
          given5   given10   given15   given20
GS-TDP    65.71    66.47     66.99     68.33      23497      67       41
MF-TDP    65.06    65.38     66.54     67.69      1014       9        6
EA        63.91    64.10     64.55     64.55      386        ---      ---

Note: for GS-TDP and MF-TDP, α0 = 100
943 users, 1680 movies
Experimental Analysis on
Gene Function Prediction (1)
• Task description
• To predict functions of genes given information on the gene level and the protein level, as well as interactions between genes.
• Data set: KDD Cup 2001
• Model
Experimental Analysis on
Gene Function Prediction (2)
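[Figure: IHRM for gene function prediction. The entity class Gene (latent variable Zg, with gene attributes) is linked to other genes by Interact (Rg,g), to Phenotype (Zp) by Observe (Rg,p), to Complex (Zc) by Belong (Rg,c), to Function (Zf) by Have (Rg,f), to Motif (Zm) by Contain (Rg,m), and to Structural Category (Zcl) by Form (Rg,cl).]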
Experimental Analysis on
Gene Function Prediction (3)
• An example gene
Attribute             Value
Gene ID               G234070
Essential             Non-Essential
Structural Category   1, ATPases
                      2, Motorproteins
Complex               Cytoskeleton
Phenotype             Mating and sporulation defects
Motif                 PS00017
Chromosome            1
Function              1, Cell growth, cell division and DNA synthesis
                      2, Cellular organization
                      3, Cellular transport and transport mechanisms
Experimental Analysis on
Gene Function Prediction (4)
• Results
Algorithm         Accuracy (%)   #Compgene
GS-TDP            89.46          15
MF-TDP            91.96          742
EA                93.18          ---
KDD Cup winner    93.63          ---
Experimental Analysis on
Gene Function Prediction (5)
• Results
The importance of a variety of relationships in function prediction of genes

Relationship          Prediction Accuracy (%)       Importance
                      (without the relationship)
Complex               91.13                         197
Interaction           92.14                         100
Structural Category   92.61                         55
Phenotype             92.71                         45
Attributes of Gene    93.08                         10
Motif                 93.12                         6
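
The slides do not say how the Importance column is computed. The listed values are, however, consistent with one simple reading: the drop in accuracy when a relationship is removed, relative to the drop for Interaction, scaled to 100. The check below is a sketch under that assumption and under the further assumption that the reference accuracy is the 93.18% reported for EA on the preceding results slide.

```python
# Accuracy (%) with one relationship removed, as listed above.
accuracy_without = {
    "Complex": 91.13,
    "Interaction": 92.14,
    "Structural Category": 92.61,
    "Phenotype": 92.71,
    "Attributes of Gene": 93.08,
    "Motif": 93.12,
}

# ASSUMPTION: the reference is the 93.18% accuracy reported for EA;
# the slides do not state this explicitly.
full_accuracy = 93.18
reference_drop = full_accuracy - accuracy_without["Interaction"]

# Importance as the accuracy drop relative to the Interaction drop, times 100.
importance = {
    name: round(100 * (full_accuracy - acc) / reference_drop)
    for name, acc in accuracy_without.items()
}
```

Under these assumptions the computed values match the Importance column for all six rows.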
Experimental Analysis on
Clinical Data (1)
• Task description
• To predict future procedures for patients given attributes of
patients and procedures, as well as the prescribed procedures and
diagnoses of patients.
• Model
Experimental Analysis on
Clinical Data (2)
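[Figure: IHRM for the clinical data. Entity classes Patient (Zpa), Procedure (Zpr), and Diagnosis (Zdg), each with attributes; the relation Take (Rpa,pr) links patients and procedures, and the relation Make (Rpa,dg) links patients and diagnoses. Base distributions G0pa, G0pr, G0dg, G0pa,pr, G0pa,dg and concentration parameters α0pa, α0pr, α0dg.]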
Experimental Analysis on
Clinical Data (3)
• Results
[Figure: ROC curves for predicting procedures, averaged over all patients]
[Figure: ROC curves for predicting procedures, considering only patients whose prime complaint is a circulatory problem]
E1: one-sided CF; E2: two-sided CF; E3: full model; E4: no hidden; E5: content-based BN
Conclusion
• The IHRM is a new nonparametric hierarchical Bayes model for
relational modeling
• Advantages
• Reducing the need for extensive structural learning
• Expressive ability via coupling between heterogeneous relationships
• The model itself decides on the optimal number of states for the latent
variables
• Scaling:
• # of entities times # of occupied states times # of known relations
• Note: default relations (example: by default there is no relation) can often be
treated as unknown and drop out
• Conjugacy can be exploited
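
To illustrate what exploiting conjugacy can look like for multinomial parameters with Dirichlet priors, the sketch below updates a Dirichlet prior with observed counts and reads off the predictive probabilities in closed form. The function names, the prior, and the counts are hypothetical and serve only as an example.

```python
def dirichlet_posterior(prior, counts):
    """Posterior Dirichlet parameters after observing multinomial counts."""
    return [a + n for a, n in zip(prior, counts)]


def predictive(params):
    """Posterior predictive probability of each outcome."""
    total = sum(params)
    return [a / total for a in params]


# A relation with two outcomes ("exists", "does not exist") between one
# pair of latent states; prior and counts are made-up numbers.
prior = [1.0, 1.0]
counts = [7, 1]
posterior = dirichlet_posterior(prior, counts)
probs = predictive(posterior)
```

Because the update is a simple addition of counts, the parameters never have to be sampled explicitly and can be integrated out.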
A memory-based empirical approximation
• First, we assume the number of components to be equal to the number of
entities in the corresponding entity class
• Then, in the training phase, each entity contributes to its own
class only
• Based on this simplification, the parameters of the attributes
and relations can be learned very efficiently. Note that this
approximation can be interpreted as relational memory-based
learning
• To predict a relational attribute, we assume that only the states
of the latent variables involved in the relation are unknown
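
The slides give no formulas for this approximation, so the following is only one possible reading, reduced to a single entity class (users) and binary relations. Each training user acts as its own component; a new user's known ratings weight the components, and the prediction is the weighted average of the components' smoothed estimates. The function names, the Beta(a, a) smoothing, and the toy data are all assumptions made for this sketch.

```python
def smoothed(rating, a=0.5):
    """Posterior mean of P(like) from at most one observation, Beta(a, a) prior."""
    if rating is None:  # relation unknown for this training user
        return 0.5
    return (rating + a) / (1.0 + 2.0 * a)


def predict_memory_based(train, known, target):
    """Predict P(like) of item `target` for a new user.

    train  : list of dicts item -> 0/1, one dict per training user; each
             training user acts as its own mixture component
    known  : dict item -> 0/1 with the new user's known ratings
    target : item whose rating is to be predicted
    """
    weights = []
    for ratings in train:
        w = 1.0  # uniform prior over components
        for item, r in known.items():
            p = smoothed(ratings.get(item))
            w *= p if r == 1 else 1.0 - p
        weights.append(w)
    norm = sum(weights)
    return sum(
        w / norm * smoothed(ratings.get(target))
        for w, ratings in zip(weights, train)
    )


# Toy data: one training user likes everything, the other likes nothing.
train = [
    {"m1": 1, "m2": 1, "m3": 1},
    {"m1": 0, "m2": 0, "m3": 0},
]
p = predict_memory_based(train, known={"m1": 1, "m2": 1}, target="m3")
```

In this toy case the new user agrees with the first training user on both known ratings, so that component dominates and the predicted probability of liking "m3" rises well above one half.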