Presenter: Lung-Hao Lee (李龍豪)
April 22, 2010 @ Room 319
Introduction
Related Work
Using Clickthrough Data to Label Images
◦ Direct image labelling
◦ Transitive (inherited) image labelling
Experiments and Analysis
Conclusions
2
User judgments in content classification
◦ Classifying and tagging images based on the
collective wisdom of users, derived from their
interactions with search engines
3
Content-based image retrieval (CBIR)
◦ Colour, shape, texture, …
Analysis of surrounding content
◦ Automatic surrounding-text labelling
e.g. Google image search
Object labelling in consensual systems
◦ Flickr, del.icio.us, …
4
Extract clickthrough data for images (.gif, .jpg,
.png) from raw web logs
An image is labelled with a search term if it was
selected at least twice from the results page of an
image search on that term
[Figure: query “apple” → click on an image (*.gif / *.jpg / *.png) → label “apple”]
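The direct labelling rule on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the presenters' implementation: the `(query, clicked_url)` record shape, the `direct_labels` helper, the lower-casing of queries and the example.com URLs are all assumptions made for the example. Only the three file extensions and the threshold of two selections come from the slide.

```python
from collections import Counter
from urllib.parse import urlparse

IMAGE_EXTENSIONS = (".gif", ".jpg", ".png")  # extensions named on the slide
MIN_CLICKS = 2  # slide: an image must be selected at least 2 times


def direct_labels(clicks, min_clicks=MIN_CLICKS):
    """Return {(image_url, query): click_count} for sufficiently clicked images.

    `clicks` is an iterable of (query, clicked_url) pairs. This record shape
    is an assumption; the raw web-log format is not given in the slides.
    """
    counts = Counter()
    for query, url in clicks:
        path = urlparse(url).path.lower()
        if path.endswith(IMAGE_EXTENSIONS):
            # Normalising the query case is an assumption of this sketch.
            counts[(url, query.strip().lower())] += 1
    return {pair: n for pair, n in counts.items() if n >= min_clicks}


# Hypothetical log entries, for illustration only.
clicks = [
    ("apple", "http://example.com/img/apple.jpg"),
    ("Apple", "http://example.com/img/apple.jpg"),
    ("apple", "http://example.com/img/pear.png"),   # clicked only once
    ("apple", "http://example.com/fruit.html"),     # not an image
]
labels = direct_labels(clicks)
```

Here only `apple.jpg` receives the label “apple”, since it is the only image clicked twice for that query. The stored count is the number of clicks behind each label.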
5
Label HTML or PDF pages in the same way
Propagate the label onto images contained within
the Web page
Irrelevant images (“non-content”) were filtered out
as follows:
◦ Too small: images with width or height under 50 pixels
◦ Advertisements: based on lists of advertisement sources
◦ Repeated
◦ Logos: anything matching *logo* was excluded
◦ Aspect ratio: images that were too narrow were excluded
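The filtering rules above could be combined with label propagation roughly as follows. This is a sketch under stated assumptions rather than the method actually used: the `PageImage` record, the `AD_HOSTS` list, and the `MAX_ASPECT_RATIO` value are hypothetical (the slide gives no aspect-ratio threshold and no advertisement list), the 50-pixel limit is read as "smaller than 50", and "repeated" is interpreted here as the same image URL appearing again on the page, with the first occurrence kept.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

MIN_SIDE_PX = 50                 # from the slide (read as "smaller than 50")
MAX_ASPECT_RATIO = 4.0           # hypothetical; the slide gives no number
AD_HOSTS = {"ads.example.com"}   # hypothetical advertisement source list


@dataclass(frozen=True)
class PageImage:
    url: str
    width: int
    height: int


def is_content(image, seen_urls):
    """Apply the five non-content rules; True means the image is kept."""
    if image.width < MIN_SIDE_PX or image.height < MIN_SIDE_PX:
        return False                                  # too small
    if urlparse(image.url).netloc.lower() in AD_HOSTS:
        return False                                  # advertisement source
    if image.url in seen_urls:
        return False                                  # repeated image
    if "logo" in image.url.lower():
        return False                                  # matches *logo*
    ratio = max(image.width, image.height) / min(image.width, image.height)
    return ratio <= MAX_ASPECT_RATIO                  # reject too-narrow images


def transitive_labels(page_label, images):
    """Propagate a page's label onto the images that pass the filter."""
    seen, labelled = set(), []
    for image in images:
        if is_content(image, seen):
            labelled.append((image.url, page_label))
        seen.add(image.url)
    return labelled


# Hypothetical images found on a page that was labelled "apple".
page_images = [
    PageImage("http://example.com/apple.jpg", 400, 300),
    PageImage("http://example.com/site_logo.png", 200, 100),
    PageImage("http://example.com/dot.gif", 10, 10),
    PageImage("http://ads.example.com/banner.jpg", 468, 60),
    PageImage("http://example.com/rule.png", 600, 60),
    PageImage("http://example.com/apple.jpg", 400, 300),
]
result = transitive_labels("apple", page_images)
```

With these inputs only the first `apple.jpg` entry inherits the label; each of the other images is rejected by one of the rules.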
[Figure: query “apple” → click]
6
The six methods comprise:
◦ Direct Labelling
◦ Transitive Labelling
◦ Google Image Search (GIS)
◦ Flickr
◦ Getty Images
◦ Google Image Labeler (GIL)
7
Web logs from the University of Teesside
School (March 2006 to the present)
71 distinct query terms were selected as the
set of all “included terms”
Direct labelling
◦ 405 image/label pairs
Transitive labelling
◦ 445 unique image/label pairs
(260 were content and 185 were non-content)
8
For each of the 71 search terms, a search was
submitted to the site
The first 14 returned images (or fewer if not
available) were selected
Google Image Search (GIS): 966 image/label pairs
Flickr: 958 image/label pairs
Getty Images: 931 image/label pairs
9
Google Image Labeler (GIL): 988 image/label pairs
◦ 4 participants played approximately 200 rounds each of the game
◦ Recording image URLs and any “off-limit” and “included” tags
10
No pair was ranked fewer than 7 times and over
93% of pairs had over 10 rankings
Likert scale:
◦ 0: “not sure” ; 1: “not relevant”
◦ 2: “partly relevant” ; 3: “fully relevant”
11
Precision = #3 / (#1 + #2 + #3)
Direct labelling is 0.8444
Transitive labelling is 0.6441
GIS, GIL, Getty and Flickr are 0.8006, 0.5921,
0.4493 and 0.6293 respectively
12
Partial Precision = (#2 + #3) /(#1 + #2 + #3)
Direct labelling is 0.9294
Transitive labelling is 0.8051
GIS, GIL, Getty and Flickr are 0.9094, 0.8839,
0.6909 and 0.819 respectively
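The two measures can be computed directly from the rating counts. In the sketch below, #1, #2 and #3 are read as the number of ratings at Likert levels 1, 2 and 3, with level 0 (“not sure”) left out of the denominator as in the formulas. That reading, the helper names and the sample ratings are assumptions for illustration, not data from the study.

```python
from collections import Counter


def rating_counts(ratings):
    """Count ratings at levels 1, 2 and 3; level 0 ("not sure") is ignored."""
    counts = Counter(ratings)
    return counts[1], counts[2], counts[3]


def precision(n1, n2, n3):
    """Precision = #3 / (#1 + #2 + #3)."""
    total = n1 + n2 + n3
    if total == 0:
        raise ValueError("no ratings at levels 1-3")
    return n3 / total


def partial_precision(n1, n2, n3):
    """Partial precision = (#2 + #3) / (#1 + #2 + #3)."""
    total = n1 + n2 + n3
    if total == 0:
        raise ValueError("no ratings at levels 1-3")
    return (n2 + n3) / total


# Hypothetical ratings for one labelling method.
ratings = [3, 3, 3, 2, 1, 0, 3, 2, 3, 3]
n1, n2, n3 = rating_counts(ratings)
p = precision(n1, n2, n3)            # 6 / 9
pp = partial_precision(n1, n2, n3)   # 8 / 9
```

Partial precision is never lower than precision for the same counts, because it also credits “partly relevant” ratings. This matches the pattern in the reported figures.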
13
For the transitive labelling method
Accuracy = #1 / (#1 + #2 + #3)
14
For direct labelling
“selection weight” is the number of clicks
from which the label was derived
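One way the selection weight might be used is to check how precision changes when only labels backed by at least a given number of clicks are kept. The sketch below is an assumed analysis, not one confirmed by the slides: the `precision_by_min_weight` helper, the tuple layout and the sample counts are all hypothetical.

```python
def precision_by_min_weight(rated_pairs, thresholds):
    """Precision (#3 / (#1 + #2 + #3)) over pairs with weight >= threshold.

    `rated_pairs` is an iterable of (selection_weight, n1, n2, n3) tuples,
    where n1..n3 are rating counts at Likert levels 1..3 for one image/label
    pair. This layout is an assumption of the sketch.
    """
    result = {}
    for threshold in thresholds:
        n1 = n2 = n3 = 0
        for weight, a, b, c in rated_pairs:
            if weight >= threshold:
                n1, n2, n3 = n1 + a, n2 + b, n3 + c
        total = n1 + n2 + n3
        result[threshold] = n3 / total if total else None
    return result


# Hypothetical rated pairs: (selection weight, #1, #2, #3).
rated_pairs = [
    (2, 3, 2, 5),
    (5, 1, 1, 8),
    (9, 0, 1, 9),
]
by_weight = precision_by_min_weight(rated_pairs, thresholds=[2, 5, 9, 20])
```

A threshold that no pair reaches maps to `None` rather than raising, so a sweep over many thresholds does not fail on the sparse high end.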
15
The use of direct labelling demonstrates that implicit
relevance feedback derived from clickthrough can be
used to improve relevance, while transitive labelling
shows promise as an alternative method for achieving
the same goal
16
Thank you very much
Questions & Answers
17