Towards Ontology Learning from Folksonomies
Jie Tang*, Ho-fung Leung#, Qiong Luo+,
Dewei Chen*, and Jibin Gong*
*Dept. of Computer Science and Technology, Tsinghua University
#Dept. of Computer Science and Engineering, The Chinese U. of Hong Kong
+Dept. of Computer Science, Hong Kong U. of Science and Technology
July 14, 2009
1
Motivation
• The Semantic Web aims to provide a Web environment in which each Web document is annotated with machine-readable metadata (e.g., concepts from an ontology).
– Manual annotation tools, e.g., Protégé (Noy et al., IS’01)
– Automatic annotation methods using machine learning, e.g., iASA (Tang et al., JoDS’05) and TCRF (Tang et al., ISWC’06)
• Folksonomies provide another way to annotate the Web, but a completely free one.
– This freedom also poses a big challenge for reliability and consistency, due to the lack of terminological control.
• This work aims to learn an ontology from folksonomies.
2
Motivating Example
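[Figure: an example tag hierarchy rooted at "Things". Tags shown: web, Web2.0, web2, social web, semanticweb, semweb, Data mining, clustering, ontologyfolksonomy, ontology, folksonomy, tag, foaf, and tag-clustering. "Merge" marks groups of synonymous tags to be merged, e.g., Web2.0 / web2 / social web and semanticweb / semweb.]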
3
Motivating Example
[Figure: the same example tag hierarchy as on the previous slide.]
Several key challenges:
• How to define this problem in a principled way?
• How to model the synonym/hypernym/homonym relations between tags?
• How to construct the hierarchical ontology according to the modeling results?
4
Our Solution
1. Use topics to model tags and documents.
2. Define four divergence measures to estimate the difference between tags.
3. Present an algorithm to construct the hierarchical structure from the tags.
[Figure: tags (Web2.0, web2, social web, ontology, ..., tag, foaf) linked to the documents they annotate (e.g., "Ontology-based ...").]
5
Outline
• Related Work
• Our Approach
– Modeling Folksonomy
– Divergence Estimation
– Hierarchical Structure Construction
• Experiments
• Conclusion & Future Work
6
Previous Work
Ontology learning from text
• WebOntEx (Han and Elmasri, 2003);
• Protégé plug-in (Buitelaar et al., 1999);
• (Maedche and Staab, 2001; Sleeman et al., 2003); etc.
Folksonomy integration
• Learning synonym/hypernym relations between tags (Li et al., 2007);
• Clustering tags (Specia and Motta, 2007);
• Learning hierarchical relations between tags (Zhou et al., 2007);
• Non-taxonomic relations (Mori et al., 2006); etc.
Topic models
• PLSI (Hofmann, 1999); LDA (Blei et al., 2003); Author-topic model (Steyvers et al., 2004); etc.
[Figure: tags (Web2.0, web2, social web, ontology, ..., tag, foaf) linked to the documents they annotate (e.g., "Ontology-based ...").]
7
How to model tags and documents?
• Input: Assume that a tag $t_i$ is used to annotate multiple documents and that a document $d$ contains a vector $\mathbf{w}_d$ of $N_d$ words. Then a set of tags with their annotated documents can be represented as
[Figure: tags (Web2.0, web2, social web, ontology, ..., tag, foaf) linked to the documents they annotate (e.g., "Ontology-based ...").]
• Modeling: How should each document and each tag be represented, and how should the relationship between documents and tags be characterized?
[Figure: Tag-Topic (TT) models, connecting words and tags through topics.]
9
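The input on this slide pairs each document's word vector $\mathbf{w}_d$ with the tags that annotate it, and each tag $t_i$ annotates several documents. A minimal sketch of that data structure, assuming a plain in-memory dict; the document ids, words, and tags are hypothetical, and `tag_to_documents` is an illustrative helper, not part of the authors' system.

```python
from collections import defaultdict

# Hypothetical corpus: document id -> (word vector w_d, tags annotating d).
docs = {
    "d1": (["generative", "model", "clustering", "documents", "terms"],
           ["data mining", "clustering", "probabilistic model"]),
    "d2": (["ontology", "learning", "from", "folksonomies"],
           ["ontology", "folksonomy"]),
}

def tag_to_documents(docs):
    """Invert document -> tags into tag -> set of document ids it annotates."""
    index = defaultdict(set)
    for doc_id, (_words, tags) in docs.items():
        for t in tags:
            index[t].add(doc_id)
    return dict(index)

index = tag_to_documents(docs)
n_d = {d: len(words) for d, (words, _tags) in docs.items()}  # N_d for each document
print(index)
print(n_d)
```

Keeping both directions (document to tags, tag to documents) is convenient because a topic model reads documents, while the later divergence steps compare tags.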
Generative Story of Tagging
• Generative process
[Figure: an example document titled "Latent Dirichlet Co-clustering", whose abstract reads "We present a generative model for clustering documents and terms. Our model is a four-level hierarchical Bayesian model. We present efficient inference techniques based on Markov Chain Monte Carlo. We report results in document modeling, document and terms clustering …", annotated with the tags "Data mining", "clustering", and "probabilistic model". Topics IR, NLP, ML, and DM each have a word distribution P(w|z), e.g., DM: mining 0.23, clustering 0.19, classification 0.17, …; ML: model 0.23, learning 0.19, boost 0.17, ….]
10
Tag-Topic (TT) Models
• Generative process:
[Figure: graphical model of the TT model, relating words, tags, and topics.]
11
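The slide's generative process is only shown as a figure. Because the slides cite the author-topic model (Steyvers et al., 2004), one plausible reading, which is an assumption and not the confirmed TT definition, is: for each word, pick one of the document's tags uniformly, draw a topic from that tag's topic distribution, then draw the word from the topic. The sketch below samples a document that way; `generate_document`, `theta`, and `phi` are hypothetical names, and the topic-word weights reuse the DM/ML numbers from the previous slide.

```python
import random

def generate_document(doc_tags, theta, phi, n_words, rng):
    """Sample the words of one document under an author-topic-style TT sketch.

    doc_tags: non-empty list of tags annotating the document
    theta:    dict tag -> list of topic probabilities (length K)
    phi:      list of K dicts, each mapping word -> probability under that topic
    """
    words, assignments = [], []
    for _ in range(n_words):
        tag = rng.choice(doc_tags)                               # 1. pick one of the document's tags uniformly
        z = rng.choices(range(len(phi)), weights=theta[tag])[0]  # 2. draw a topic from that tag's distribution
        vocab = list(phi[z])
        w = rng.choices(vocab, weights=[phi[z][v] for v in vocab])[0]  # 3. draw a word from the topic
        words.append(w)
        assignments.append((tag, z))
    return words, assignments

# Hypothetical parameters: topic 0 ~ "DM", topic 1 ~ "ML". Word lists are truncated
# to the words shown on the slide; random.choices renormalizes the weights.
phi = [
    {"mining": 0.23, "clustering": 0.19, "classification": 0.17},
    {"model": 0.23, "learning": 0.19, "boost": 0.17},
]
theta = {
    "data mining": [0.9, 0.1],
    "clustering": [0.7, 0.3],
    "probabilistic model": [0.2, 0.8],
}
doc_tags = ["data mining", "clustering", "probabilistic model"]
words, assignments = generate_document(doc_tags, theta, phi, 8, random.Random(0))
print(list(zip(words, assignments)))
```

Inference would run this story backwards (e.g., with Gibbs sampling) to estimate each tag's topic distribution, which is what the later divergence measures consume.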
Topic Smoothing
• The new objective function combines the log-likelihood of the tag-topic (TT) model with a smoothing term.
12
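The objective's formula is not preserved here. One common way to build "log-likelihood plus smoothing term" is graph regularization: subtract a weighted penalty when related tags (e.g., tags that co-occur) have dissimilar topic distributions. The sketch below assumes exactly that form, $O = \log L_{TT} - \lambda \sum_{(i,j)} w_{ij} \lVert \theta_i - \theta_j \rVert^2$, together with the author-topic-style likelihood from the TT sketch. The penalty, the weight `lam`, and all helper names are hypothetical, not the paper's definitions.

```python
import math

def tt_log_likelihood(docs, theta, phi):
    """Log-likelihood of (tags, words) documents under the author-topic-style sketch:
    p(w | d) = (1 / |T_d|) * sum_{t in T_d} sum_z theta[t][z] * phi[z][w]."""
    ll = 0.0
    for doc_tags, words in docs:
        for w in words:
            p = sum(theta[t][z] * phi[z].get(w, 0.0)
                    for t in doc_tags for z in range(len(phi))) / len(doc_tags)
            ll += math.log(p)  # assumes every word has non-zero probability
    return ll

def smoothing_term(theta, edges):
    """Hypothetical smoothing term: weighted squared Euclidean distance between
    the topic distributions of related tags; edges are (tag_i, tag_j, weight)."""
    return sum(wt * sum((a - b) ** 2 for a, b in zip(theta[i], theta[j]))
               for i, j, wt in edges)

def objective(docs, theta, phi, edges, lam):
    """Higher is better: fit the data while keeping related tags' topics close."""
    return tt_log_likelihood(docs, theta, phi) - lam * smoothing_term(theta, edges)

phi = [{"mining": 0.6, "clustering": 0.4}, {"model": 0.5, "learning": 0.5}]
theta = {"data mining": [0.9, 0.1], "probabilistic model": [0.2, 0.8]}
docs = [(["data mining", "probabilistic model"], ["mining", "model"])]
edges = [("data mining", "probabilistic model", 1.0)]  # e.g. the two tags co-occur
print(objective(docs, theta, phi, edges, lam=0.5))
```

With `lam = 0` the objective reduces to the plain TT log-likelihood; larger values trade data fit for smoother topic distributions across related tags.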
Divergence Estimation
• Tag divergence
• Hypernym-divergence
• Merging-divergence
• Keep-divergence
[Formulas not preserved; their annotations read "estimated topic distribution" and "posterior probability derived from the topic modeling results".]
13
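The divergence formulas are missing, but the annotations indicate they are computed from the estimated topic distributions of tags. A minimal sketch, assuming tag divergence is measured with KL divergence or its symmetric Jensen-Shannon variant; this choice and the topic distributions below are hypothetical, not the paper's stated definitions.

```python
import math

def kl(p, q, eps=1e-12):
    """KL(p || q) between two topic distributions; eps avoids log(0) and division by zero."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)

def js(p, q):
    """Jensen-Shannon divergence: a symmetric variant of KL, bounded by ln 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Hypothetical topic distributions over three topics.
theta = {
    "Web2.0": [0.70, 0.25, 0.05],
    "web2":   [0.68, 0.27, 0.05],
    "foaf":   [0.20, 0.70, 0.10],
}
print(js(theta["Web2.0"], theta["web2"]))  # small: candidates for merging
print(js(theta["Web2.0"], theta["foaf"]))  # larger: better kept apart
```

A symmetric measure fits the merge decision (synonymy has no direction), whereas KL's asymmetry could, under this assumption, serve a directional hypernym test.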
Hierarchical Structure Construction
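[Figure: a two-step construction procedure (Step 1 and Step 2). Annotations note that terms correspond to a divergence and that a penalty is applied to the complexity of the generated hierarchy.]
14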
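The two construction steps are not preserved, so the following is only a hypothetical greedy procedure in the same spirit: each tag receives the cheapest of three actions (MERGE, SUBORDINATE, KEEP), and a per-edge penalty stands in for the hierarchy-complexity penalty. Visiting frequent tags first (on the assumption that they are more general), using Jensen-Shannon for merging and KL(child || parent) for subordination, and every threshold are assumptions made for illustration.

```python
import math

def kl(p, q, eps=1e-12):
    """KL(p || q); eps avoids log(0) and division by zero."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)

def js(p, q):
    """Symmetric Jensen-Shannon divergence."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def build_hierarchy(tags, freq, theta, merge_thr=0.01, keep_cost=0.3, edge_penalty=0.05):
    """Greedy sketch: visit tags from most to least frequent and give each one the
    cheapest of three actions: MERGE, SUBORDINATE, or KEEP (new child of Top)."""
    groups = {}   # representative tag -> set of tags merged into it
    parent = {}   # representative tag -> parent representative, or "Top"
    for t in sorted(tags, key=lambda x: -freq[x]):
        if groups:
            # MERGE if t is almost indistinguishable from an existing node.
            rep, d_merge = min(((r, js(theta[t], theta[r])) for r in groups), key=lambda x: x[1])
            if d_merge < merge_thr:
                groups[rep].add(t)
                continue
            # SUBORDINATE under the node whose topics best cover t's topics,
            # paying edge_penalty for the extra edge (a crude complexity penalty).
            par, d_sub = min(((r, kl(theta[t], theta[r])) for r in groups), key=lambda x: x[1])
            if d_sub + edge_penalty < keep_cost:
                groups[t] = {t}
                parent[t] = par
                continue
        # KEEP: no good match, so t becomes a new top-level node.
        groups[t] = {t}
        parent[t] = "Top"
    return groups, parent

theta = {
    "web":        [0.40, 0.40, 0.20],
    "Web2.0":     [0.70, 0.25, 0.05],
    "web2":       [0.68, 0.27, 0.05],
    "foaf":       [0.20, 0.70, 0.10],
    "clustering": [0.05, 0.05, 0.90],
}
freq = {"web": 50, "Web2.0": 30, "clustering": 20, "web2": 10, "foaf": 8}
groups, parent = build_hierarchy(list(theta), freq, theta)
print(groups)
print(parent)
```

With these made-up numbers, web2 is merged into Web2.0, Web2.0 and foaf are placed under web, and clustering stays at the top level. Raising `edge_penalty` pushes more tags to KEEP, which is one simple way to trade coverage against hierarchy complexity.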
Data Sets and Evaluation Measures
• Data sets
– PAPER: 4,841 papers and their associated tags (8,071 unique tags; 37,010 tag assignments in total) from CiteULike
– MOVIE: 4,009 movies and their tags (18,559 unique tags; 142,498 tag assignments in total) from IMDB
• Evaluation Measures
– Accuracy (against the Open Directory Project (ODP) or human judgement)
– Case study
• Baseline
– Hierarchical clustering
16
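The slide does not say how accuracy is computed against ODP or human judgement. A minimal sketch, assuming accuracy is the share of learned (child, parent) edges that the reference confirms; the edges below and the helper `relation_accuracy` are hypothetical.

```python
def relation_accuracy(predicted, gold):
    """Share of predicted (child, parent) edges that the reference hierarchy confirms."""
    predicted = set(predicted)
    if not predicted:
        return 0.0
    return len(predicted & set(gold)) / len(predicted)

# Hypothetical learned edges and reference judgements.
predicted = [("Web2.0", "web"), ("foaf", "web"), ("clustering", "web")]
gold = [("Web2.0", "web"), ("foaf", "web"), ("owl", "semanticweb")]
print(relation_accuracy(predicted, gold))
```

This is edge-level precision; it does not reward reference edges the method misses, so a recall-style measure would be needed to capture coverage.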
Accuracy Performance
17
Case Study—Movie
[Figure: tag hierarchies learned from the MOVIE data set, in three panels: "By clustering", "By TT", and "By TT with smoothing". Under a Top node, tags are connected by Merging, Subordinate, or Keep operations. Tags shown include cartoon, looney tunes, bugs bunny, daffy-duck, duck, donald duck, chicken, cat, cartoon cat, dog cartoon, mouse, mickey, minnie, cat versus mouse, tom and jerry, pig, porky pig, bird, bear, gambling, gambler, wager, money, royalty, deception, automobile, automobile race, horse racing, and horse race.]
18
Case Study—Paper
[Figure: tag hierarchies learned from the PAPER data set, in two panels: "By TT" and "By TT with smoothing". Under a Top node, synonymous tags are merged, e.g., ontologies / ontology, semantic / semantics, and semanticweb / semantic-web / semantic_web. Other tags shown include clustering, hierarchical-clustering, link-analysis, graph, graphs, web-graph, web, image, imaging, medical-imaging, image-processing, radiology, rdf, owl, integration, mapping, indexing, knowledge, knowledgediscovery, systems, system, systembiology, taxonomy, and ir.]
19
Conclusion
• Formalize a novel problem of ontology learning from
folksonomies.
• Exploit a probabilistic topic model to model the tags
and their annotated documents and propose four
divergence measures.
• Present an algorithm to construct the hierarchical
structure from tags.
• Experimental results on two different types of real-world data sets show that our method can effectively learn the ontological hierarchy from social tags.
21
Future Work
• Discover non-taxonomic relationships between tags
• Ontology learning from noisy tags
• Incremental ontology learning from the dynamic
tagging space
• Applications:
– Personalized tag recommendation
– Social tagging—guiding the tagging process
–…
22
Thanks!
Q&A
Homepage: http://keg.cs.tsinghua.edu.cn/persons/tj/
Resources will be made openly available soon at:
http://arnetminer.org/resources
23