Transcript Slides

Efficient Summarization Framework for Multi-Attribute Uncertain Data
Jie Xu, Dmitri V. Kalashnikov, Sharad Mehrotra
The Summarization Problem
[Figure: an uncertain data set of objects O1, O2, ..., On, each with attributes such as face (e.g. Jeff, Kate), location (e.g. LA), and visual concepts (e.g. water, plant, sky). An extractive summary selects a subset of the objects (e.g. O1, O8, O11, O25); an abstractive summary generates a description such as "Kate Jeff wedding at LA".]
Summarization Process
Step 1: model the information in each object of the dataset (what information does this image contain?).
Step 2: extract the best subset of objects as the summary. By which metrics?
- Coverage (Agrawal, WSDM'09; Li, WWW'09; Liu, SDM'09; Sinha, WWW'11)
- Diversity (Vee, ICDE'08; Ziegler, WWW'05)
- Quality (Sinha, WWW'11)
Existing Techniques
- Image: Kennedy et al. WWW'08; Simon et al. ICCV'07.
- Customer review: Hu et al. KDD'04; Inouye et al. SocialCom'11; Ly et al. CoRR'11; Li et al. WWW'09; Sinha et al. WWW'11.
- Doc/micro-blog: Liu et al. SDM'09.
These techniques do not consider information in multiple attributes, and they do not deal with uncertain data.
Challenges
Design a summarization framework for:
- Multi-attribute data: face tags, visual concepts, event, time, location.
- Uncertain/probabilistic data: attribute values produced by automated data processing (e.g. vision analysis) are probabilistic, e.g. visual concepts with P(sky) = 0.7, P(people) = 0.9.
Limitations of existing techniques - 1
Existing techniques typically model and summarize a single information dimension:
- summarize only information about visual content (Kennedy et al. WWW'08; Simon et al. ICCV'07);
- summarize only information about review content (Hu et al. KDD'04; Ly et al. CoRR'11).
What information is in the image?
Elemental IUs (information units) correspond to single attribute values: {sky}, {plant}, ..., {Kate}, {Jeff}, {wedding}, {12/01/2012}, {Los Angeles}.
Is that all? Even more information comes from combinations of attribute values:
- Intra-attribute IUs, e.g. {Kate, Jeff}, {sky, plant}, ...
- Inter-attribute IUs, e.g. {Kate, LA}, {Kate, Jeff, wedding}, ...
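The three IU types above can be enumerated mechanically from an object's attribute values. A minimal sketch, assuming objects are dicts mapping attribute names to value lists (function names and the data layout are illustrative, not the paper's API):

```python
from itertools import combinations

def elemental_ius(obj):
    """Elemental IUs: one information unit per attribute value."""
    return [frozenset([v]) for values in obj.values() for v in values]

def intra_attribute_ius(obj):
    """Intra-attribute IUs: pairs of values of the same attribute."""
    return [frozenset(pair)
            for values in obj.values()
            for pair in combinations(sorted(values), 2)]

def inter_attribute_ius(obj):
    """Inter-attribute IUs: pairs of values from different attributes."""
    attrs = list(obj.items())
    return [frozenset([v1, v2])
            for i, (_, vs1) in enumerate(attrs)
            for _, vs2 in attrs[i + 1:]
            for v1 in vs1 for v2 in vs2]

photo = {"face": ["Kate", "Jeff"], "location": ["LA"], "event": ["wedding"]}
```

Higher-order IUs such as {Kate, Jeff, wedding} would be built the same way from larger combinations.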
Are all information units interesting?
Is {Liyan, Ling} interesting? Yes from my perspective, because they are both my close friends.
Is {Sharad, Mike} an interesting intra-attribute IU? Yes, they often have coffee together and appear frequently in other photos.
But are all of the 2^n combinations of people interesting? Shall we select a summary that covers all this information? Well, probably not! I don't care about person X and person Y who happen to be together in the photo of this large group.
Mine for interesting information units
[Figure: objects O1, O2, ..., On with face tags viewed as transactions T1, ..., Tn, e.g. T1 = {Jeff, Kate}, T2 = {Tom}, T4 = {Jeff, Kate, Tom}, T5 = {Jeff, Kate}, ..., Tn = {Jeff, Kate}.]
A modified item-set mining algorithm extracts the frequent and correlated item sets, e.g. {Jeff, Kate}, ..., {Kate, Tom}, ...
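The mining step can be approximated by a plain frequent-pair count over the face-tag transactions. The paper's modified algorithm also measures correlation, which this sketch omits; all names here are hypothetical:

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(transactions, min_support):
    """Count co-occurring face-tag pairs; keep pairs appearing in at
    least min_support transactions (frequency only, no correlation)."""
    counts = Counter()
    for t in transactions:
        for pair in combinations(sorted(t), 2):
            counts[pair] += 1
    return {pair for pair, c in counts.items() if c >= min_support}

# Toy transactions matching the figure.
tags = [{"Jeff", "Kate"}, {"Tom"}, {"Jeff", "Kate", "Tom"},
        {"Jeff", "Kate"}, {"Kate", "Tom"}]
```

With min_support = 3 only {Jeff, Kate} survives; lowering it to 2 also admits {Kate, Tom}.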
Mine for interesting information units (cont.)
[Figure: the same objects O1, ..., On with face tags {Jeff, Kate}, {Jeff}, {Jeff, Kate, Tom}, {Kate, Tom}, {Jeff, Kate}, ...]
Interesting IUs can also be mined from social context (e.g. Jeff is a friend of Kate; Tom is a close friend of the user), yielding IUs such as {Jeff, Kate} and {Tom}.
Limitation of existing techniques – 2
Existing techniques cannot handle probabilistic attributes. When an attribute value is uncertain (e.g. P(Jeff) = 0.6 in one object and P(Jeff) = 0.8 in another), we are not sure whether an object in the summary covers an IU in another object of the dataset.
Deterministic Coverage Model --- Example
[Figure: a summary selected from the dataset covers 8 of the 14 information units of the objects, so Coverage = 8/14.]
Probabilistic Coverage Model
Expected coverage is the ratio of the expected amount of information covered by the summary S to the expected amount of total information in the dataset. We simplify the model so that it can be computed efficiently:
- It can be computed in polynomial time.
- The function is submodular.
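One way such an expected-coverage ratio can be computed in polynomial time, assuming IU occurrences are independent across objects (a simplifying assumption; the data layout here is illustrative, not the paper's exact model):

```python
def p_covered(iu, objects, probs):
    """P(at least one object in `objects` contains `iu`),
    assuming independence across objects."""
    p_none = 1.0
    for o in objects:
        p_none *= 1.0 - probs.get((o, iu), 0.0)
    return 1.0 - p_none

def expected_coverage(summary, dataset, ius, probs):
    """Expected information covered by the summary,
    over the expected total information in the dataset."""
    covered = sum(p_covered(iu, summary, probs) for iu in ius)
    total = sum(p_covered(iu, dataset, probs) for iu in ius)
    return covered / total if total else 0.0

# probs[(object_id, iu)] = probability the object contains the IU.
probs = {(1, "Jeff"): 0.6, (2, "Jeff"): 0.8}
```

Both sums run over |IUs| x n terms, so the whole computation is polynomial, matching the claim on this slide.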
Optimization Problem for Summarization
Parameters:
- dataset O = {o1, o2, ..., on}
- positive number K (the summary size)
Finding the summary of size K with maximum expected coverage is NP-hard. We developed an efficient greedy algorithm to solve it.
Basic Greedy Algorithm
Initialize S = empty set. Then repeat until |S| = K:
- For each object o in O \ S, compute Cov(S ∪ {o}, O).
- Select the object o* with maximum coverage and add it to S.
Two bottlenecks motivate our optimizations:
- Cov is expensive to compute for a single object (addressed by the object-level optimization).
- There are too many Cov computations per iteration (addressed by the iteration-level optimization).
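The greedy loop can be sketched with a generic coverage function; here plain set cover stands in for expected coverage, and all names are illustrative:

```python
def greedy_summary(dataset, k, coverage):
    """Basic greedy: at each step add the object maximizing coverage(S)."""
    summary = []
    remaining = list(dataset)
    while len(summary) < k and remaining:
        best = max(remaining, key=lambda o: coverage(summary + [o]))
        summary.append(best)
        remaining.remove(best)
    return summary

# Stand-in coverage: number of distinct IUs covered by the summary.
def union_size(objs):
    return len(set().union(*objs)) if objs else 0

photos = [{"sky", "plant"}, {"plant"}, {"Kate", "Jeff"}]
```

For submodular objectives like the expected coverage above, this greedy scheme is the classic (1 - 1/e)-approximation, but each iteration evaluates coverage for every remaining object, which the next two slides optimize.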
Efficiency optimization – Object-level
Reduce the time required to compute the coverage for one object. Instead of directly computing and optimizing the coverage in each iteration, compute the gain of adding one object o to the summary S:
gain(S, o) = Cov(S ∪ {o}, O) − Cov(S, O)
Updating gain(S, o) is much more efficient.
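For a deterministic, set-style coverage the gain reduces to counting the IUs of o not yet covered, and the covered set is updated incrementally rather than recomputing Cov from scratch. A sketch of that idea (names illustrative):

```python
def greedy_with_gains(objects, k):
    """Object-level optimization sketch: maintain the set of covered IUs
    and compute gain(S, o) = |o - covered| instead of Cov(S + [o])."""
    covered, summary = set(), []
    remaining = list(objects)
    while len(summary) < k and remaining:
        gains = [len(o - covered) for o in remaining]
        i = gains.index(max(gains))          # object with maximum gain
        summary.append(remaining.pop(i))
        covered |= summary[-1]               # incremental update
    return summary

objs = [{"a", "b"}, {"b", "c"}, {"d"}]
```

The probabilistic version follows the same pattern, with per-IU coverage probabilities updated in place of the covered set.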
Submodularity of Coverage
Expected coverage Cov(S, O) is submodular: for S ⊆ T and any object o,
Cov(S ∪ {o}, O) − Cov(S, O) ≥ Cov(T ∪ {o}, O) − Cov(T, O)
Efficiency optimization – Iteration-level
Reduce the number of object-level computations (i.e. of gain(S, o)) in each iteration of the greedy process. While traversing the objects in O \ S, we maintain:
- the maximum gain seen so far, gain*;
- an upper bound Upper(S, o) on gain(S, o), which can be updated in constant time.
By definition gain(S, o) ≤ Upper(S, o), and by submodularity gains only shrink as S grows, so we can prune an object o whenever Upper(S, o) < gain*.
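This pruning is the standard lazy-greedy pattern: a stale gain is a valid upper bound by submodularity, so objects are kept in a max-heap and re-evaluated only when their bound might still win. A sketch with set cover standing in for expected coverage (names illustrative):

```python
import heapq

def lazy_greedy(dataset, k, coverage):
    """Lazy greedy: an object's gain can only shrink as S grows, so a
    previously computed gain serves as the upper bound Upper(S, o)."""
    summary, cov_s = [], coverage([])
    heap = [(-float("inf"), i) for i in range(len(dataset))]  # (-bound, idx)
    heapq.heapify(heap)
    while len(summary) < k and heap:
        while True:
            _, i = heapq.heappop(heap)
            gain = coverage(summary + [dataset[i]]) - cov_s  # fresh gain
            if not heap or gain >= -heap[0][0]:  # beats all remaining bounds
                summary.append(dataset[i])
                cov_s += gain
                break
            heapq.heappush(heap, (-gain, i))     # tighten this object's bound
    return summary

def union_size(objs):
    return len(set().union(*objs)) if objs else 0

data = [{"a", "b"}, {"b"}, {"c", "d", "e"}]
```

Objects whose bound never rises above gain* are pruned without ever recomputing their gain, which is where the iteration-level savings come from.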
Experiment -- Datasets
- Facebook Photo Set: 200 photos uploaded by 10 Facebook users. Attributes: face, visual concept, event, time.
- Review Dataset: reviews about 10 hotels from TripAdvisor; each hotel has about 250 reviews on average. Attributes: rating, facets.
- Flickr Photo Set: 20,000 photos from Flickr. Attributes: visual, event, time.
Experiment – Quality
[Results figure omitted in the transcript.]

Experiment – Efficiency
The basic greedy algorithm without optimization runs for more than 1 minute.
[Results figure omitted in the transcript.]
Summary
Developed a new extractive summarization framework that:
- handles multi-attribute data;
- handles uncertain/probabilistic data;
- generates high-quality summaries;
- is highly efficient.