In Situ Evaluation of Entity Retrieval and Opinion Summarization


In Situ Evaluation of Entity Ranking and Opinion Summarization
using www.findilike.com
Kavita Ganesan & ChengXiang Zhai
University of Illinois at Urbana-Champaign
What is findilike?
• Preference-driven search engine
– Currently works in the hotels domain
– Finds & ranks hotels based on user preferences:
  Structured: price, distance
  Unstructured: “friendly service”, “clean”, “good views”
  (based on existing user reviews) → UNIQUE
• Beyond search: support for analysis of hotels
– Opinion summaries
– Tag cloud visualization of reviews
…What is findilike?
• Developed as part of PhD work – a new system
  (Opinion-Driven Decision Support System, UIUC, 2013)
• Tracked ~1000 unique users from Jan–Aug ’13
– Working on speed & reaching out to more users
DEMO
2 components that can be evaluated through natural user interaction:
1. Summarization of reviews – generating short phrases summarizing key opinions (Ganesan et al. 2010, 2012)
2. Ranking entities based on unstructured user preferences – Opinion-Based Entity Ranking (Ganesan & Zhai 2012)
Evaluation of entity ranking
• Retrieval: interleave the results of two ranking models (e.g., Base and DirichletLM) using balanced interleaving (T. Joachims, 2002)
• A click indicates a preference between the two rankings
(A simplified sketch of balanced interleaving follows.)
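Purely for illustration, here is a minimal Java sketch of balanced interleaving and its click-credit rule as described by Joachims (2002); the class and method names are ours, not findilike's actual code.

  import java.util.*;

  /** Minimal sketch of balanced interleaving (Joachims, 2002).
   *  Names are illustrative, not findilike's actual implementation. */
  public class BalancedInterleaving {

      /** Merge rankings A and B so the user sees one list biased toward neither. */
      public static List<String> interleave(List<String> a, List<String> b, Random rnd) {
          List<String> merged = new ArrayList<>();
          Set<String> seen = new HashSet<>();
          boolean aFirst = rnd.nextBoolean();          // coin flip breaks pointer ties
          int ka = 0, kb = 0;
          while (ka < a.size() && kb < b.size()) {
              if (ka < kb || (ka == kb && aFirst)) {
                  if (seen.add(a.get(ka))) merged.add(a.get(ka));
                  ka++;
              } else {
                  if (seen.add(b.get(kb))) merged.add(b.get(kb));
                  kb++;
              }
          }
          return merged;
      }

      /** Credit rule: k is the smallest prefix of A or B containing the lowest
       *  clicked result; count the clicks falling in each ranking's top-k.
       *  Returns >0 if A is preferred, <0 if B, 0 for a tie / no clicks. */
      public static int compare(List<String> a, List<String> b,
                                List<String> merged, Set<String> clicked) {
          int lowest = -1;
          for (int i = 0; i < merged.size(); i++)
              if (clicked.contains(merged.get(i))) lowest = i;
          if (lowest < 0) return 0;                    // no clicks: no preference
          String doc = merged.get(lowest);
          int ra = a.indexOf(doc), rb = b.indexOf(doc);
          int k = Math.min(ra < 0 ? Integer.MAX_VALUE : ra + 1,
                           rb < 0 ? Integer.MAX_VALUE : rb + 1);
          return Integer.compare(clicksInPrefix(a, k, clicked),
                                 clicksInPrefix(b, k, clicked));
      }

      private static int clicksInPrefix(List<String> r, int k, Set<String> clicked) {
          int c = 0;
          for (int i = 0; i < Math.min(k, r.size()); i++)
              if (clicked.contains(r.get(i))) c++;
          return c;
      }
  }

Aggregating compare() over many queries yields exactly the CA vs. CB counts shown in the next slide.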
Snapshot of pairwise comparison results for entity ranking
(Algorithms compared: DirichletLM (DLM), Base, PL2; each cell is a number of queries)

A     B     | CA > CB     CB > CA     CA = CB > 0   CA = CB = 0 | Total
            | (A better)  (B better)  (Tie)                     |
DLM   Base  |    30          35            2             5      |   72
PL2   Base  |    10          28            3             7      |   48
…
Snapshot of pairwise comparison results for entity ranking (takeaways)
• DLM vs. Base (30 vs. 35 queries): Base model better, but DLM not too far behind
• PL2 vs. Base (10 vs. 28 queries): Base model better; PL2 not too good
Evaluation of review summarization
• Randomly mix the top N phrases from two algorithms (ALGO1, ALGO2)
• Monitor clickthrough on a per-entity basis
• More clicks on phrases from Algo1 vs. Algo2 → Algo1 better
(A sketch of this scheme follows.)
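A minimal Java sketch of this mixing-and-tallying scheme; all names here are hypothetical, not the actual findilike code.

  import java.util.*;

  /** Hypothetical sketch of the phrase click test: mix top-N phrases from two
   *  summarizers, remember each phrase's source, and tally clicks per entity. */
  public class SummaryClickTest {
      private final Map<String, String> source = new HashMap<>();  // phrase -> "ALGO1"/"ALGO2"
      private final Map<String, int[]> tally = new HashMap<>();    // entity -> {algo1, algo2} clicks

      /** Build the mixed phrase list shown for one entity.
       *  (Phrases produced by both algorithms are not deduplicated in this sketch.) */
      public List<String> mixTopN(List<String> algo1, List<String> algo2, int n, Random rnd) {
          List<String> mixed = new ArrayList<>();
          for (String p : algo1.subList(0, Math.min(n, algo1.size()))) { source.put(p, "ALGO1"); mixed.add(p); }
          for (String p : algo2.subList(0, Math.min(n, algo2.size()))) { source.put(p, "ALGO2"); mixed.add(p); }
          Collections.shuffle(mixed, rnd);   // hide which algorithm produced each phrase
          return mixed;
      }

      /** Record one click on a phrase for a given entity (hotel). */
      public void onClick(String entityId, String phrase) {
          int[] t = tally.computeIfAbsent(entityId, id -> new int[2]);
          if ("ALGO1".equals(source.get(phrase))) t[0]++; else t[1]++;
      }

      /** Per-entity verdict: +1 if Algo1 got more clicks, -1 if Algo2, 0 for a tie. */
      public int verdict(String entityId) {
          int[] t = tally.getOrDefault(entityId, new int[2]);
          return Integer.compare(t[0], t[1]);
      }
  }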
How to submit a new algorithm?
1. Extend existing code: write Java-based code (sample code provided)
2. Test on the mini test bed: run your implementation against the test data & gold standard; the evaluator (nDCG, ROUGE) reports local performance
3. Submit code: you receive a performance report with online performance from pairwise comparisons, e.g.:

   A     B     | CA > CB (A better) | CB > CA (B better)
   DLM   Base  |        30          |        35
   PL2   Base  |        10          |        28
   …
More information about evaluation…
eval.findilike.com
Thanks! Questions?
Links
• Evaluation: http://eval.findilike.com
• System: http://www.findilike.com
• Related Papers: kavita-ganesan.com
References
• Ganesan, K. A., C. X. Zhai, and E. Viegas. Micropinion Generation: An Unsupervised Approach to Generating Ultra-Concise Summaries of Opinions. In Proceedings of the 21st International Conference on World Wide Web (WWW '12), 2012.
• Ganesan, K. A., and C. X. Zhai. Opinion-Based Entity Ranking. Information Retrieval, vol. 15, issue 2, 2012.
• Ganesan, K. A., C. X. Zhai, and J. Han. Opinosis: A Graph-Based Approach to Abstractive Summarization of Highly Redundant Opinions. In Proceedings of the 23rd International Conference on Computational Linguistics (COLING '10), 2010.
• Joachims, T. Optimizing Search Engines Using Clickthrough Data. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '02), New York, 2002.
Evaluating Review Summarization
Mini Test-bed
• Base code to extend
• Set of sample sentences
• Gold standard summary for those sentences
• ROUGE toolkit to evaluate the results (an illustrative ROUGE-1 sketch follows)
• Data set based on Ganesan et al. 2010
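The mini test-bed ships the ROUGE toolkit itself; purely for intuition, here is an illustrative Java computation of ROUGE-1 recall (clipped unigram overlap with the gold summary), not the toolkit's own code.

  import java.util.*;

  /** Illustrative ROUGE-1 recall between a system summary and a gold summary. */
  public class RougeOne {
      public static double recall(String system, String gold) {
          Map<String, Integer> sys = counts(system), ref = counts(gold);
          int overlap = 0, total = 0;
          for (Map.Entry<String, Integer> e : ref.entrySet()) {
              total += e.getValue();
              // clip each unigram's credit at its count in the system summary
              overlap += Math.min(e.getValue(), sys.getOrDefault(e.getKey(), 0));
          }
          return total == 0 ? 0 : (double) overlap / total;
      }

      private static Map<String, Integer> counts(String text) {
          Map<String, Integer> m = new HashMap<>();
          for (String t : text.toLowerCase().split("\\W+"))
              if (!t.isEmpty()) m.merge(t, 1, Integer::sum);
          return m;
      }
  }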
Evaluating Entity Ranking
Mini Test-bed
• Base code to extend
• Terrier Index of hotel reviews
• Gold standard ranking of hotels
• Code to generate nDCG scores (an illustrative nDCG sketch follows)
• Raw unindexed data set for reference
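For intuition, a minimal nDCG@k sketch in Java; it assumes the gold standard maps each hotel id to a graded relevance score, and the test-bed's actual scorer may differ.

  import java.util.*;

  /** Minimal nDCG@k: DCG of the system ranking over DCG of the ideal ranking. */
  public class Ndcg {
      public static double ndcgAtK(List<String> ranking, Map<String, Double> gold, int k) {
          List<String> ideal = new ArrayList<>(gold.keySet());
          ideal.sort((x, y) -> Double.compare(gold.get(y), gold.get(x)));  // best first
          double idcg = dcg(ideal, gold, k);
          return idcg == 0 ? 0 : dcg(ranking, gold, k) / idcg;
      }

      private static double dcg(List<String> r, Map<String, Double> gold, int k) {
          double s = 0;
          for (int i = 0; i < Math.min(k, r.size()); i++)        // 0-based rank i
              s += gold.getOrDefault(r.get(i), 0.0) / (Math.log(i + 2) / Math.log(2));
          return s;
      }
  }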
Building a new ranking model
• Extend the Weighting Model (since the test-bed ships a Terrier index, this means subclassing Terrier's WeightingModel class; a hedged sketch follows)
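A sketch of what such an extension might look like against the Terrier 3.x API that was current around 2013; the abstract methods and protected statistics fields shown here may differ in other Terrier versions, and the scoring formula is just a toy TF-IDF, not the workshop's baseline.

  import org.terrier.matching.models.WeightingModel;

  /** Hypothetical ranking model extending Terrier's WeightingModel
   *  (3.x-era API; signatures and field names may differ elsewhere). */
  public class MyTfIdf extends WeightingModel {
      private static final long serialVersionUID = 1L;

      @Override
      public String getInfo() { return "MyTfIdf"; }

      /** Score one (term, document) pair. Statistics such as numberOfDocuments,
       *  documentFrequency and averageDocumentLength are protected fields that
       *  Terrier populates before scoring. */
      @Override
      public double score(double tf, double docLength) {
          double idf = Math.log((numberOfDocuments + 1.0) / (documentFrequency + 0.5));
          double normTf = tf / (tf + docLength / averageDocumentLength);
          return keyFrequency * normTf * idf;
      }

      /** Terrier 3.x also declares a 5-argument variant; delegate to the one above. */
      @Override
      public double score(double tf, double docLength,
                          double nT, double fT, double kF) {
          return score(tf, docLength);
      }
  }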