Information Extraction from Multimedia
Content on the Social Web
Stefan Siersdorfer
L3S Research Centre, Hannover, Germany
Meta Data and Visual Data on the Social Web
Meta Data:
• Tags
• Titles, Descriptions
• Timestamps
• Geo-Tags
• Comments
• Numerical Ratings
• Users and Social Links
Visual Data:
• Photos
• Videos
How to exploit combined information from visual
data and meta data?
Example 1: Photos in Flickr
Example 2: Videos in YouTube
Social Web Environments as Graph Structure
[Figure: example graph with users (User 1-3), videos (Video 1-3), a group, and tags as interconnected nodes]
Entities (Nodes):
• Resources (Videos, Photos)
• Users
• Tags
• Groups
Relationships (Edges):
• User-User: Contacts, Friendship
• User-Resource: Ownership, Favorite Assignment, Rating
• User-Group: Membership
• Resource-Resource: Visual Similarity, Meta Data Similarity
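To make the graph structure concrete, here is a minimal sketch in Python using the networkx library (all node names, edge types, and weights are made-up illustrations):

    import networkx as nx

    # Minimal folksonomy graph with typed nodes and typed edges,
    # mirroring the entities and relationships listed above.
    G = nx.Graph()
    for node, kind in [("user1", "user"), ("user2", "user"),
                       ("video1", "resource"), ("video2", "resource"),
                       ("tag1", "tag"), ("group1", "group")]:
        G.add_node(node, kind=kind)

    G.add_edge("user1", "user2", rel="contact")                          # user-user
    G.add_edge("user1", "video1", rel="ownership")                       # user-resource
    G.add_edge("user2", "group1", rel="membership")                      # user-group
    G.add_edge("video1", "video2", rel="visual_similarity", weight=0.7)  # resource-resource
    G.add_edge("video1", "tag1", rel="tag_assignment")                   # resource-tag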
User Feedback on the Social Web
• Numeric Ratings, Favorite Assignments
• Comments
• Clicks/Views
• Contacts, Friendships
• Community Tagging
• Blog Entries
• Upload of Content
How can we exploit the community feedback?
Outline
• Part 1: Photos on the Social Web
 1.1) Photo Attractiveness
 1.2) Generating Photo Maps
 1.3) Sentiment in Photos
• Part 2: Videos on the Social Web
 Video Tagging
Part 1: Photos on the Social Web
1.1) Photo Attractiveness *
* Stefan Siersdorfer, Jose San Pedro
Ranking and Classifying Attractiveness of Photos in Folksonomies
18th International World Wide Web Conference, WWW 2009, Madrid, Spain
Attractiveness of Images
Which factors influence the human perception of attractiveness?
[Example photos: landscape, portrait, flower]
Attractiveness Visual Features
Human visual perception is mainly influenced by:
• Color distribution
• Coarseness
These are complex concepts conveying multiple orthogonal aspects, so different low-level features need to be considered.
Attractiveness Visual Features
Color Features:
• Brightness, Contrast: mean and variance of luminance and RGB values
• Colorfulness, Naturalness, Saturation: intensity of the colors (saturation is 0 for grey-scale images)
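To make the color features concrete, a minimal sketch of such global color statistics in Python with NumPy and Pillow (the exact feature definitions in the WWW'09 paper may differ):

    import numpy as np
    from PIL import Image

    # Global color statistics of the kind listed above (illustrative definitions).
    img = np.asarray(Image.open("photo.jpg").convert("RGB"), dtype=np.float64) / 255.0

    luminance = 0.299 * img[..., 0] + 0.587 * img[..., 1] + 0.114 * img[..., 2]
    brightness = luminance.mean()   # mean luminance
    contrast = luminance.std()      # spread of luminance values

    # HSV-style saturation per pixel: 0 for grey pixels, 1 for pure colors.
    mx, mn = img.max(axis=-1), img.min(axis=-1)
    saturation = ((mx - mn) / np.maximum(mx, 1e-8)).mean()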
Visual Features
Coarseness:
• Resolution + Acutance
• Sharpness: of critical importance for the final appearance of photos [Savakis 2000]
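Acutance/sharpness is commonly approximated by edge energy; a widely used proxy (not necessarily the measure of [Savakis 2000]) is the variance of the Laplacian:

    import numpy as np
    from scipy.ndimage import laplace

    def sharpness(grey: np.ndarray) -> float:
        # Higher variance of the Laplacian response = more edges/fine detail.
        return float(laplace(grey.astype(np.float64)).var())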
Textual Features
• We consider user-generated meta data
• Correlation of topics with image appeal (ground truth: favorite assignments)
• Tags seem appropriate to capture this information
Attractiveness of Photos
Community-based models for classifying/ranking images according to their appeal [WWW'09].
[Figure: system pipeline. From the Flickr photo stream, three kinds of input are extracted: community feedback (#views, #comments, #favorites, ..., reflecting a photo's interestingness), content (visual features), and metadata (textual features, e.g. tags such as "cat, fence, house"). A model generator produces classification & regression models for attractiveness.]
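A rough sketch of the model-generator step, assuming precomputed feature vectors and favorite counts; the file names and the labeling rule (median split on favorites) are illustrative, not taken from the paper:

    import numpy as np
    from sklearn.svm import LinearSVC

    # X: combined visual + textual feature vectors, one row per photo;
    # favorites: community feedback used to derive attractiveness labels.
    X = np.load("photo_features.npy")           # hypothetical file
    favorites = np.load("favorite_counts.npy")  # hypothetical file

    # Label photos above the median favorite count as "attractive".
    y = (favorites > np.median(favorites)).astype(int)

    model = LinearSVC().fit(X, y)
    scores = model.decision_function(X)  # usable for ranking photos by appeal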
Experiments
1.2) Generating Photo Maps *
*Work and illustrations from
David Crandall, Lars Backstrom, Dan Huttenlocher, Jon Kleinberg,
Mapping the World's Photos,
18th International World Wide Web Conference, WWW 2009, Madrid, Spain
Outline: Photo Maps
• Use geo-location, tags, and visual features of photos to
  - Identify popular locations and landmarks
  - Estimate the location of photos
  - Find representative images
Spatial Clustering
[Figure: world map of photo clusters labeled with distinctive tags such as "tatemodern", "eiffel", "paris", "louvre", "london", "trafalgarsquare"]
• Each data point corresponds to the (longitude, latitude) of an image
• Mean shift clustering is applied to obtain a hierarchical cluster structure
• The most distinctive popular tags are used as cluster labels
  (distinctiveness: # photos with tag in cluster / # photos with tag in overall set)
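A minimal sketch of the clustering and labeling steps with scikit-learn (bandwidth value and file name are illustrative):

    import numpy as np
    from sklearn.cluster import MeanShift

    # coords: one (longitude, latitude) pair per photo (hypothetical file).
    coords = np.load("photo_coords.npy")

    # Bandwidth sets the spatial scale; rerunning mean shift at several
    # bandwidths yields the hierarchical structure described above.
    labels = MeanShift(bandwidth=0.1).fit_predict(coords)

    def distinctiveness(n_tag_in_cluster: int, n_tag_overall: int) -> float:
        # Label score: fraction of a tag's photos that fall into this cluster.
        return n_tag_in_cluster / n_tag_overall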
Estimating Location of Photos without tags
• Train SVMs on clusters
  - Positive examples: photos inside the cluster
  - Negative examples: photos outside the cluster
• Feature representation
  - Tags
  - Visual features (SIFT)
• Best performance for the combination of tags and SIFT features (see the sketch below)
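A sketch of the per-cluster SVM setup (feature files and helper names are hypothetical):

    import numpy as np
    from sklearn.svm import LinearSVC

    # X: tag features + quantized SIFT features per photo;
    # cluster_of: cluster id of each training photo (hypothetical files).
    X = np.load("photo_features.npy")
    cluster_of = np.load("photo_cluster_ids.npy")

    # One classifier per cluster: positives inside, negatives outside.
    models = {c: LinearSVC().fit(X, (cluster_of == c).astype(int))
              for c in np.unique(cluster_of)}

    def predict_cluster(x):
        # Assign a photo without geo-tags to the highest-scoring cluster.
        return max(models, key=lambda c: models[c].decision_function([x])[0])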
Finding Representative Images
Construct a weighted graph:
- Edge weights based on visual similarity of images (using SIFT features)
- Use graph clustering (e.g. spectral clustering) to identify tightly connected components
- Choose a representative image from such a tightly connected component (sketch below)
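A minimal sketch of this selection step with scikit-learn (the similarity matrix and the cluster count are placeholders):

    import numpy as np
    from sklearn.cluster import SpectralClustering

    # S: symmetric pairwise visual-similarity matrix of a location's photos
    # (e.g. from matched SIFT features); hypothetical file.
    S = np.load("similarity_matrix.npy")

    labels = SpectralClustering(n_clusters=5, affinity="precomputed").fit_predict(S)

    # Pick, from the largest tightly connected group, the photo with the
    # highest total similarity to the rest of the group.
    group = np.where(labels == np.bincount(labels).argmax())[0]
    representative = group[S[np.ix_(group, group)].sum(axis=1).argmax()]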
[Figure, Example 1: photo map of Europe]
[Figure, Example 2: photo map of New York]
1.3) Sentiment in Photos *
* Stefan Siersdorfer, Jonathon Hare, Enrico Minack, Fan Deng
Analyzing and Predicting Sentiment of Images on the Social Web
18th ACM Multimedia Conference (MM 2010), Florence, Italy
Sentiment Analysis of Images
• Data: more than 500,000 Flickr photos
Image Features:
• Global Color Histogram: a color is present somewhere in the image
• Local Color Histogram: a color is present at a particular location
• SIFT Visual Terms: black/white patterns, invariant to rotation and scaling
Image Sentiment
• SentiWordNet provides sentiment values for terms
  - e.g. (pos, neg, obj) = (0.875, 0.0, 0.125) for the term "good"
  - used for obtaining sentiment categories
  - provides the training set + ground truth for experiments
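A minimal sketch of how photo-level sentiment labels might be derived from tag-level scores; the tiny lexicon and the aggregation rule are illustrative, the paper's exact procedure may differ:

    # Hypothetical mini-lexicon: term -> (pos, neg, obj), as in SentiWordNet.
    lexicon = {"good": (0.875, 0.0, 0.125), "broken": (0.0, 0.625, 0.375)}

    def photo_sentiment(tags):
        # Aggregate tag-level scores into a photo-level sentiment category.
        pos = sum(lexicon.get(t, (0.0, 0.0, 1.0))[0] for t in tags)
        neg = sum(lexicon.get(t, (0.0, 0.0, 1.0))[1] for t in tags)
        if pos > neg:
            return "pos"
        if neg > pos:
            return "neg"
        return "neutral"

    print(photo_sentiment(["good", "sunset"]))  # -> "pos"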
Which are the most discriminative visual terms?
• Use the Mutual Information measure to determine these features (formula below)
• Probabilities (estimated through counting in the image corpus):
  - P(t): probability that visual term t occurs in an image
  - P(c): probability that an image has sentiment category c ("pos" or "neg")
  - P(t,c): probability that an image is in category c and has visual term t
• Intuition: "Terms that have high co-occurrence with a category are more characteristic for that category."
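Written out, the standard mutual information between term occurrence and category membership over these probabilities (presumably the formula shown on the slide) is:

MI(t, c) = \sum_{e_t \in \{0,1\}} \sum_{e_c \in \{0,1\}} P(e_t, e_c) \, \log \frac{P(e_t, e_c)}{P(e_t) \, P(e_c)}

where e_t and e_c indicate presence/absence of the visual term and of the category, respectively.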
Most Discriminative Features
Most discriminative visual features: extracted using the Mutual Information measure [ACM MM'10]
Part 2: Videos on the Social Web *
* Stefan Siersdorfer, Jose San Pedro, Mark Sanderson
Content Redundancy in YouTube and its Application to Video Tagging
ACM Transactions on Information Systems (TOIS), 2011
Stefan Siersdorfer, Jose San Pedro, Mark Sanderson
Automatic Video Tagging using Content Redundancy
32nd ACM SIGIR Conference, Boston, USA, 2009
Near-duplicate Video Content
YouTube: the most important video sharing environment
  - [SIGCOMM'07]: 85M videos, 65k new videos per day, 100M downloads per day; traffic to/from YouTube = 10% / 20% of the Web total
Redundancy: 25% of the videos are near-duplicates
Can we use redundancy to obtain richer video annotations?
→ Automatic tagging
Automatic Tagging
What is it good for?
• Additional information → better user experience
• Richer feature vectors for ...
  - Automatic data organization (classification and clustering)
  - Video search
  - Knowledge extraction (→ creating ontologies)
Overlap Graph
[Figure: Videos 1-5 with overlapping content segments, and the resulting overlap graph connecting them]
Neighbor-based Tagging (1): Idea
[Figure: overlap graph in which Video 4 carries the original tags A and B; its overlapping neighbors (Videos 1-3) carry tags such as A, B, C, E, F, and tags F and E are automatically generated for Video 4]
• Video 4 contains the original tags A, B; tags F, E are obtained from its neighbors
• Criteria for automatic tagging:
  - Prefer tags used by many neighbors
  - Prefer tags from neighbors with a strong link
Neighbor-based Tagging (2): Formal
Given: a directed overlap graph G_O = (V_O, E_O) with weights

w(v_i, v_j) = \frac{|v_i \cap v_j|}{|v_j|}

(the weights correspond to the overlap between two videos).
Relevance of tag t for video v_i:

rel(t, v_i) = \sum_{(v_j, v_i) \in E_O} I(t, v_j) \, w(v_j, v_i)

where the sum runs over all neighbors of v_i and I(t, v_j) is an indicator function (1 if v_j carries tag t, 0 otherwise).
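A minimal sketch of this weighted tag voting (the toy graph and tag sets are made up):

    from collections import defaultdict

    # edges[(j, i)] = w(v_j, v_i): fraction of v_j's content shared with v_i.
    edges = {("v1", "v4"): 0.6, ("v2", "v4"): 0.3, ("v3", "v4"): 0.8}
    tags = {"v1": {"A", "B", "C"}, "v2": {"B", "E", "F"}, "v3": {"A", "E"}}

    def tag_relevance(video):
        # rel(t, v_i) = sum over neighbors v_j of I(t, v_j) * w(v_j, v_i)
        rel = defaultdict(float)
        for (j, i), w in edges.items():
            if i == video:
                for t in tags.get(j, ()):
                    rel[t] += w
        return dict(rel)

    print(tag_relevance("v4"))
    # e.g. {'A': 1.4, 'B': 0.9, 'C': 0.6, 'E': 1.1, 'F': 0.3}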
Neighbor-based Tagging (3)
Apply additional smoothing for redundant regions:

rel(t, v) = \sum_{X \in \mathcal{P}(N(v))} \sum_{i=0}^{k(X)-1} \alpha^i \cdot \frac{\left| v \cap \bigcap_{x \in X} x \,-\, \bigcup_{u \in N(v) - X} u \right|}{|v|}

where \mathcal{P}(N(v)) ranges over the subsets of v's neighbors, the set expression inside |...| is the overlap region of v shared with exactly the neighbors in X, k(X) is the number of neighbors in X carrying tag t, and \alpha is a smoothing factor.
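Following the reconstruction above, a brute-force sketch in Python, with videos modeled as sets of fingerprinted segments (exponential in the number of neighbors, so only suitable for small neighborhoods):

    from itertools import combinations

    def smoothed_relevance(v, neighbors, has_tag, alpha=0.5):
        # v: set of fingerprinted segments; neighbors: list of segment sets;
        # has_tag[j]: whether neighbor j carries tag t.
        total = 0.0
        idx = range(len(neighbors))
        for r in range(1, len(neighbors) + 1):
            for X in combinations(idx, r):
                # Region of v shared with exactly the neighbors in X.
                region = set(v)
                for j in X:
                    region &= neighbors[j]
                for j in idx:
                    if j not in X:
                        region -= neighbors[j]
                # Geometric weights damp regions covered by many tagged neighbors.
                k = sum(1 for j in X if has_tag[j])
                total += sum(alpha ** i for i in range(k)) * len(region) / len(v)
        return total

    # Segment 2 is shared by both tagged neighbors and is counted 1.5x, not 2x.
    print(smoothed_relevance({1, 2, 3, 4}, [{1, 2}, {2, 3}], [True, True]))  # 0.875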
TagRank
• Also takes transitive relationships into account
• PageRank-like weight propagation
rel(t, v_i) = TR(v_i, t) = \sum_{(v_j, v_i) \in E_O} TR(v_j, t) \, w(v_j, v_i)

or, in matrix form, as an eigenvector equation:

TR(t) = \begin{pmatrix}
  w(v_1, v_1) & w(v_1, v_2) & \cdots & w(v_1, v_n) \\
  w(v_2, v_1) & w(v_2, v_2) & \cdots & w(v_2, v_n) \\
  \vdots & \vdots & \ddots & \vdots \\
  w(v_n, v_1) & w(v_n, v_2) & \cdots & w(v_n, v_n)
\end{pmatrix}^T \cdot \begin{pmatrix}
  TR(v_1, t) \\ TR(v_2, t) \\ \vdots \\ TR(v_n, t)
\end{pmatrix}

with start vector TR(t) = (I(t, v_1), \ldots, I(t, v_n))^T.
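A minimal power-iteration sketch of this propagation on a toy graph; mixing the start vector back in with a damping factor is a standard PageRank-style stabilization and may differ from the paper's exact iteration:

    import numpy as np

    # W[j, i] = w(v_j, v_i): directed overlap weights (toy values).
    W = np.array([[0.0, 0.0, 0.6],
                  [0.0, 0.0, 0.3],
                  [0.0, 0.0, 0.0]])

    # Start vector: indicator of which videos originally carry tag t.
    start = np.array([1.0, 1.0, 0.0])
    tr = start.copy()

    # Propagate tag weight along overlap edges, PageRank-style.
    d = 0.85
    for _ in range(50):
        tr = d * (W.T @ tr) + (1 - d) * start

    print(tr)  # relevance of tag t for each video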
Applications of Extended Tag Representation
• Use the relevance values rel(t, v_i) to construct enriched feature vectors for videos: combine original tags with new tags weighted by their relevance values
• Automatic annotation: use thresholding to select the most relevant tags for a given video (sketch below)
  - Manual assessment of the tags shows their relevance
• Data organization:
  - Clustering and classification experiments (ground truth: YouTube categories of the videos)
  - Improved performance through the enriched feature representation
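A trivial sketch of the thresholding step (the threshold value is illustrative):

    def select_tags(rel, threshold=0.8):
        # Keep neighbor-derived tags whose relevance exceeds the threshold.
        return {t for t, score in rel.items() if score >= threshold}

    print(select_tags({"A": 1.4, "B": 0.9, "C": 0.6, "F": 0.3}))  # {'A', 'B'}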
Summary
• The Social Web contains visual information (photos, videos) and meta data (tags, time stamps, social links, spatial information, ...)
• A large variety of users provide explicit and implicit feedback in social web environments (ratings, views, favorite assignments, comments, content of uploaded material)
• Visual information and annotations can be combined to obtain enhanced feature representations
• Visual information can help to establish links between resources such as videos (application: information propagation)
• Feature representations in combination with community feedback can be used for machine learning (applications: classification, mapping)
References
Stefan Siersdorfer, Jose San Pedro, Mark Sanderson: Content Redundancy in YouTube and its Application to Video Tagging. ACM Transactions on Information Systems (TOIS), 2011.
Stefan Siersdorfer, Jonathon Hare, Enrico Minack, Fan Deng: Analyzing and Predicting Sentiment of Images on the Social Web. 18th ACM Multimedia Conference (MM 2010), Florence, Italy.
Stefan Siersdorfer, Jose San Pedro, Mark Sanderson: Automatic Video Tagging using Content Redundancy. 32nd ACM SIGIR Conference, Boston, USA, 2009.
Stefan Siersdorfer, Jose San Pedro: Ranking and Classifying Attractiveness of Photos in Folksonomies. 18th International World Wide Web Conference (WWW 2009), Madrid, Spain.
David Crandall, Lars Backstrom, Dan Huttenlocher, Jon Kleinberg: Mapping the World's Photos. 18th International World Wide Web Conference (WWW 2009), Madrid, Spain.