Talk - UCLA Computer Science
Ruirui Li, Ben Kao, Bin Bi, Reynold Cheng, Eric Lo
The University of Hong Kong
Speaker: Ruirui Li
Outline
Motivation
Problem Statement
DQR Model
Experiments & Evaluation
Motivation
Massive amounts of information have emerged on the Internet.
[Chart: number of indexed URLs by Google: 8 billion in 2004, 1 trillion in 2008, and ??? in 2012. Numbers from Google's 2004 annual report and its official blog in 2008.]
Motivation
User activities in the search process.
[Diagram: search intent: "Lodging CIKM"; query: "CIKM living place"; search results; clicks: "CIKM 2012 Hotel", "Maui Hotel".]
Motivation
[Diagram: the same search process, with the goal reversed: mine the user's search intent "Lodging CIKM" from the query "CIKM living place" and the clicks "CIKM 2012 Hotel", "Maui Hotel".]
Motivation
The effectiveness of IR depends on input queries.
Users struggle: translating human thoughts (search intent) into a concise set of keywords (a query) is never straightforward.
[Diagram: the same example: search intent "Lodging CIKM" leads to the query "CIKM living place", search results, and clicks "CIKM 2012 Hotel", "Maui Hotel".]
Motivation
Input queries are short.
Most are composed of only one or two terms.
Number of terms per query:
Terms per query:  1      2      3      4      4+
Proportion:       36.8%  25.2%  17.3%  10.0%  10.7%
62% of queries (36.8% + 25.2%) contain only one or two terms.
Motivation
Short queries lead to two issues.
Issue 1. Ambiguity:
Example: query "jaguar"
Cat
Automobile Brand
NFL Team
Issue 2. Not specific enough:
Example: query "Disney"
Park
Store
Cartoon
Motivation
Most traditional approaches focus on relevance.
1. The queries most relevant to the input query tend to be similar to each other.
2. This generates redundant and monotonic recommendations.
3. Such recommendations provide limited coverage of the recommendation space.
Motivation
A recommender should provide queries that are not only
relevant but also diversified.
With diversified recommendations:
1. We can cover multiple potential search intents of the user.
2. The risk that users won't be satisfied is minimized.
3. Finally, users find desired targets in fewer recommendation cycles.
Problem statement
Input: a query q and an integer m.
Output: a list of recommended queries Y.
Query q → Query recommender → Recommended queries Y (m: the number of recommended queries)
GOAL: At least one query in Y is relevant to the user’s search intent.
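As an interface, the problem can be sketched as follows (the function name and types are illustrative, not from the paper):

```python
from typing import List

def recommend(q: str, m: int) -> List[str]:
    """Given an input query q and an integer m, return a ranked list Y
    of m recommended queries, aiming for at least one of them to be
    relevant to the user's search intent."""
    raise NotImplementedError  # the DQR model fills this in
```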
Problem statement
Five properties:
1. Relevance.
2. Redundancy-free.
3. Diversity.
4. Ranking.
5. Real-time response.
DQR: framework
Offline: mine query concepts from the search log (addresses the redundancy-free issue).
Online: a probabilistic diversification model (addresses the diversity issue).
DQR: offline
Mining query concepts.
The same search intent can be expressed by different queries.
Example: "Microsoft Research Asia", "MSRA", "MS Research Beijing".
A query concept is a set of queries which express the same or similar search intents.
[Diagram: the three queries above grouped into one query concept.]
DQR: online
Greedy strategy: concept selection.
Given the input query, select the best query concept from the concept pool, one at a time, until m concepts have been selected.
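A minimal sketch of this greedy loop, MMR-style (the `rel` and `sim` scoring functions and the trade-off weight `lam` are assumptions standing in for the paper's probabilistic objective on the next slides):

```python
def greedy_select(query, concept_pool, m, rel, sim, lam=0.5):
    """Greedily pick m query concepts: favor relevance to the input
    query, penalize similarity to concepts already selected."""
    selected = []
    candidates = list(concept_pool)
    while candidates and len(selected) < m:
        best = max(
            candidates,
            key=lambda c: rel(c, query)
                          - lam * max((sim(c, s) for s in selected), default=0.0),
        )
        selected.append(best)
        candidates.remove(best)
    return selected
```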
DQR: diversification
Notation:
$c_q$: the query concept to which the input query belongs.
$S$: the query concepts already selected.
$c$: the query concept to be selected.
Objective function:
Favor query concepts that are relevant to the input query.
Penalize query concepts that are relevant to the query concepts already selected.
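One generic form consistent with this description (a sketch; $\mathrm{rel}$, $\mathrm{sim}$, and the trade-off weight $\lambda$ are illustrative stand-ins for the paper's probabilistic estimates, and the max over an empty $S$ is taken as 0):

$$c^{*} = \arg\max_{c \in \mathcal{P} \setminus S} \Big[\, \mathrm{rel}(c, c_q) \;-\; \lambda \max_{c' \in S} \mathrm{sim}(c, c') \,\Big]$$

where $\mathcal{P}$ is the concept pool.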
DQR: diversification
[Equations: the objective function and its estimation.]
DQR: diversification
Click set s: the set of URLs clicked for a query.
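Concept-to-concept relevance can then be estimated from click sets; a minimal sketch using Jaccard overlap (an illustrative stand-in for the paper's probabilistic estimation):

```python
def click_set(concept, clicks):
    """Union of the URLs clicked for any query in the concept.
    `clicks` maps each query to its set of clicked URLs."""
    urls = set()
    for q in concept:
        urls |= clicks.get(q, set())
    return urls

def concept_sim(c1, c2, clicks):
    """Jaccard similarity between the click sets of two concepts."""
    s1, s2 = click_set(c1, clicks), click_set(c2, clicks)
    if not s1 or not s2:
        return 0.0
    return len(s1 & s2) / len(s1 | s2)
```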
DQR: diversification
[Equations: the objective function and its relevance and diversity components.]
Experiments
Datasets:
AOL: search log collected from the AOL search engine, 01 March 2006 to 31 May 2006.
SOGOU: search log collected from the SOGOU search engine, 01 June 2008 to 30 June 2008.
Baseline
There is no gold standard for query recommendation.
Evaluation
User study
12 users, 60 test queries
Evaluation
For a test query q, users judge the recommendations produced by each approach at three relevance levels:
Irrelevant (0 points)
Partially relevant (1 point)
Relevant (2 points)
Evaluation
Three performance metrics:
Relevance
Diversity
Ranking
Relevance
[Charts: relevance results on AOL, at the query level and the concept level.]
Diversity
Metric: Intent-Coverage
It measures the number of unique search intents covered by the top m recommended queries.
Since each intent represents a distinct user search intent, a higher Intent-Coverage indicates a higher probability of satisfying different users.
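A minimal sketch of computing the metric, assuming each recommended query has been labeled by the judges with the search intent it serves:

```python
def intent_coverage(recommendations, intent_of, m):
    """Count the unique search intents covered by the top-m
    recommended queries. `intent_of` maps a query to its
    human-labeled intent (an assumption for this sketch)."""
    return len({intent_of[q] for q in recommendations[:m]})
```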
Diversity
Metric: Intent-Coverage
[Chart: Intent-Coverage results on AOL.]
Ranking
Metric: Normalized Discounted Cumulative Gain (NDCG)
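With graded relevance $rel_i \in \{0, 1, 2\}$ from the user study, one standard form of the metric is:

$$\mathrm{DCG@}m = \sum_{i=1}^{m} \frac{2^{rel_i} - 1}{\log_2(i + 1)}, \qquad \mathrm{NDCG@}m = \frac{\mathrm{DCG@}m}{\mathrm{IDCG@}m}$$

where $\mathrm{IDCG@}m$ is the $\mathrm{DCG@}m$ of the ideal ordering.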
[Chart: NDCG results on AOL.]
Thanks!
Questions
Suggestions
Diversity ranking
Metric:
[Charts: diversity-ranking results on AOL and on SOGOU.]
Motivation
Diversification is especially needed with the rise of mobile devices.
One in seven queries comes from mobile devices.
[Illustration: a 3.5-inch phone screen next to 13.3-, 15.4-, and 17.0-inch laptop screens: the screen is much smaller, with limited space for recommendations.]
Numbers from Global mobile statistics 2012 (mobiThinking)
DQR: clustering
A Hawaii restaurant:
Unlimited tables.
Each table can hold unlimited customers.
Customers arrive in a stream.
Problem: whenever a customer arrives, assign him to a table.
Properties:
Familiar people together.
Unfamiliar people apart.
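A minimal sketch of this restaurant-style streaming assignment (the `sim` function and the compactness `threshold` are assumptions; the paper's exact compactness control is not reproduced here):

```python
def stream_cluster(queries, sim, threshold=0.5):
    """Assign each arriving query ("customer") to the most similar
    existing cluster ("table") if it is similar enough; otherwise
    open a new cluster. A higher threshold yields more compact
    clusters (cf. the compactness control on the next slide)."""
    clusters = []  # each cluster ("table") is a list of queries
    for q in queries:
        best, best_sim = None, 0.0
        for c in clusters:
            s = max(sim(q, member) for member in c)
            if s > best_sim:
                best, best_sim = c, s
        if best is not None and best_sim >= threshold:
            best.append(q)        # seat with familiar people
        else:
            clusters.append([q])  # open a new table
    return clusters
```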
DQR: clustering
[Diagram: a stream of customers being assigned to tables, with compactness control.]
DQR
Extract a representative query from each query concept.
Voting strategy:
Compute a score for each query $q \in C$:
$$\mathrm{vote}(u, q) = \begin{cases} 1 & \text{if } u \text{ submitted } q \text{ at least once;} \\ 0 & \text{otherwise.} \end{cases}$$
A score for each query $q$ is therefore computed as:
$$\mathrm{score}(q) = \sum_{u \in U} \mathrm{vote}(u, q)$$
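A minimal sketch of this voting strategy (assuming `submissions` maps each user $u \in U$ to the set of queries they submitted):

```python
def representative_query(concept, submissions):
    """Pick the query in the concept that the most users voted for:
    vote(u, q) = 1 if user u submitted q at least once, else 0;
    score(q) = sum of vote(u, q) over all users u."""
    def score(q):
        return sum(1 for queries in submissions.values() if q in queries)
    return max(concept, key=score)
```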
Relevance
[Charts: relevance results on SOGOU.]
CIKM
Proc. of the 2012 Int. Conf. on Information and Knowledge Management (CIKM'12), Maui, Hawaii, Oct. 2012, to appear.