Hierarchical Relational Models for
Document Networks
Jonathan Chang and David Blei
Facebook and Princeton University
The Annals of Applied Statistics, 2010
Images and some text are from the original paper.
Presented by
Haojun Chen
Introduction
• Network data have attracted considerable research interest in
machine learning and applied statistics.
• Previous work focused only on the network structure and
ignored the attributes of the nodes.
For example, in a citation network of articles, the text and abstracts
of the documents should also be used to uncover the latent structure
in the data.
• In this paper, the Relational Topic Model (RTM) is developed
for network data; it accounts for both links and node
attributes.
Data Example for RTM
Graphical Model for RTM
Generative Process for RTM
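The generative process can be sketched in code. The following is a minimal illustration, assuming the usual description of RTM: each document draws topic proportions from a Dirichlet, each word draws a topic and then a vocabulary term as in LDA, and each pair of documents draws a binary link from a link probability function of their topic assignments. The function names, the fixed document length, and the choice of a sigmoid link are assumptions made for illustration, not part of the slides.

```python
import math
import random


def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [g / total for g in draws]


def generate_corpus(num_docs, doc_len, alpha, beta, eta, nu, seed=0):
    """Sample documents and links following an RTM-style generative process.

    beta[k] is topic k's distribution over the vocabulary. eta and nu
    parameterize a sigmoid link function (an illustrative choice).
    """
    rng = random.Random(seed)
    num_topics = len(alpha)
    vocab = range(len(beta[0]))
    docs, zbars = [], []
    for _ in range(num_docs):
        theta = sample_dirichlet(alpha, rng)  # per-document topic proportions
        z = rng.choices(range(num_topics), weights=theta, k=doc_len)
        words = [rng.choices(vocab, weights=beta[k])[0] for k in z]
        docs.append(words)
        # Empirical topic frequencies of this document's word assignments.
        zbars.append([z.count(k) / doc_len for k in range(num_topics)])
    links = {}
    for d in range(num_docs):
        for e in range(d + 1, num_docs):
            # Link score depends on the element-wise product of the two
            # documents' topic frequencies.
            score = nu + sum(h * a * b for h, a, b in zip(eta, zbars[d], zbars[e]))
            p_link = 1.0 / (1.0 + math.exp(-score))
            links[(d, e)] = 1 if rng.random() < p_link else 0
    return docs, zbars, links
```

Documents whose words come from similar topics get a higher link score when the entries of eta are positive, which is what ties the text to the network structure.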
Link Probability Function
• Four link probability functions:
Φ: CDF of the normal distribution
∘: Hadamard (element-wise) product
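The four functions can be written out as a short sketch. It assumes the forms given in the original paper (sigmoid, exponential, probit, and a normal-style function of the squared difference), each applied to the mean topic assignments of two documents. The names `zd`, `ze`, `eta`, and `nu` are illustrative labels for the two documents' mean topic vectors, the coefficient vector, and the intercept.

```python
import math


def hadamard(a, b):
    """Element-wise (Hadamard) product of two equal-length vectors."""
    return [x * y for x, y in zip(a, b)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normal_cdf(x):
    """CDF of the standard normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def link_sigmoid(zd, ze, eta, nu):
    return 1.0 / (1.0 + math.exp(-(dot(eta, hadamard(zd, ze)) + nu)))


def link_exponential(zd, ze, eta, nu):
    # Only a valid probability when the exponent is non-positive.
    return math.exp(dot(eta, hadamard(zd, ze)) + nu)


def link_probit(zd, ze, eta, nu):
    return normal_cdf(dot(eta, hadamard(zd, ze)) + nu)


def link_normal(zd, ze, eta, nu):
    # Penalizes the squared element-wise difference between the documents.
    diff = [x - y for x, y in zip(zd, ze)]
    return math.exp(-dot(eta, hadamard(diff, diff)) - nu)
```

The first three functions reward documents that share topics through the Hadamard product, while the last one decays with the distance between the two topic vectors.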
Model Inference, Estimation and Prediction
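• Variational inference for the latent variables
(topic proportions and topic assignments)
• Maximum likelihood estimation of the model parameters
(topics and link function parameters)
• Prediction
– Link prediction from words
– Word prediction from links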
Empirical Results
• Data summary
• Three experiments
– Evaluating the predictive distribution
– Automatic link suggestion
– Modeling spatial data
Evaluating Predictive Distribution (1/2)
Lower is Better
Evaluating Predictive Distribution (2/2)
Automatic Link Suggestion (1/3)
• Citation suggestion
Suggest citations given an abstract
• Cora dataset; the number of topics is set to 10
• RTM improves precision over LDA+Regression by 80%
on the first 20 documents retrieved from the model
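The evaluation metric here is precision over the top 20 retrieved documents. A minimal sketch of how such a score could be computed is given below, assuming each candidate document has already been assigned a predicted link probability. The helper names and the dictionary of scores are hypothetical and not taken from the paper.

```python
def rank_candidates(scores):
    """Return candidate document ids sorted by descending predicted score."""
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [doc for doc, _ in ordered]


def precision_at_k(ranked_docs, true_links, k=20):
    """Fraction of the top-k retrieved documents that are actually cited."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc in ranked_docs[:k] if doc in true_links)
    return hits / k
```

For a new abstract, the scores would come from the fitted model's link probabilities against every document in the corpus, and `true_links` would hold the held-out citations.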
Automatic Link Suggestion (2/3)
Automatic Link Suggestion (3/3)
Modeling Spatial Data (1/4)
• Local news data: 51 documents, one document per state
• The number of topics is set to 5
• Words are ranked by the following score:
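The score formula itself is not reproduced in this transcript. The sketch below assumes the TF-IDF-inspired term score that is often used to rank topic words: score(k, w) = beta[k][w] * (log beta[k][w] - (1/K) * sum over k' of log beta[k'][w]). Treat both the formula and the helper names as assumptions. Here `beta[k][w]` is the probability of word `w` under topic `k`, and all entries must be strictly positive.

```python
import math


def term_scores(beta):
    """Score each word in each topic, down-weighting words common to all topics."""
    num_topics = len(beta)
    vocab_size = len(beta[0])
    # Average log-probability of each word across topics.
    mean_log = [
        sum(math.log(beta[k][w]) for k in range(num_topics)) / num_topics
        for w in range(vocab_size)
    ]
    return [
        [beta[k][w] * (math.log(beta[k][w]) - mean_log[w]) for w in range(vocab_size)]
        for k in range(num_topics)
    ]


def top_words(beta, vocab, n=3):
    """Return the n highest-scoring vocabulary terms for each topic."""
    result = []
    for row in term_scores(beta):
        order = sorted(range(len(row)), key=lambda w: row[w], reverse=True)
        result.append([vocab[w] for w in order[:n]])
    return result
```

A word that is equally probable under every topic gets a score of zero, so the ranking favors words that distinguish one topic from the others.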
Modeling Spatial Data (2/4)
• Each color depicts a single topic. Each state’s color intensity indicates
the magnitude of that topic’s component.
• Corresponding words associated with each topic are given in the table.
[Figure panels: RTM, LDA]
Modeling Spatial Data (3/4)
[Figure panels: RTM, LDA]
Modeling Spatial Data (4/4)
[Figure panels: RTM, LDA]
Discussion
• Relational Topic Model (RTM) is a hierarchical model of
networks and per-node attribute data.
• It is demonstrated qualitatively and quantitatively that
RTM is an effective and useful mechanism for analyzing and
using network data.