Talk - UCLA Computer Science
Ruirui Li, Ben Kao, Bin Bi, Reynold Cheng, Eric Lo
The University of Hong Kong
Speaker: Ruirui Li
Outline
Motivation
Problem Statement
DQR Model
Experiments & Evaluation
Motivation
Massive amounts of information have emerged on the Internet.
[Chart: number of indexed URLs by Google: 8 billion in 2004, 1 trillion in 2008, and ??? in 2012. Numbers from Google's 2004 annual report and its official blog in 2008.]
Motivation
User activities in the search process.
[Diagram: search intent: "Lodging CIKM"; query: "CIKM living place"; search results; clicks: "CIKM 2012 Hotel", "Maui Hotel".]
Motivation
[Diagram: the same search process, with the goal reversed: mine the user's search intent "Lodging CIKM" from the query "CIKM living place" and the clicks "CIKM 2012 Hotel", "Maui Hotel".]
Motivation
The effectiveness of IR depends on input queries.
Users struggle: translating human thoughts (search intent) into a concise set of keywords (a query) is never straightforward.
[Diagram: the same example: search intent "Lodging CIKM" leads to the query "CIKM living place", search results, and clicks "CIKM 2012 Hotel", "Maui Hotel".]
Motivation
Input queries are short.
Most are composed of only one or two terms.
Number of terms per query:
Terms per query:  1      2      3      4      4+
Proportion:       36.8%  25.2%  17.3%  10.0%  10.7%
62% of queries (36.8% + 25.2%) contain only one or two terms.
Motivation
Short queries lead to two issues.
Issue 1. Ambiguity:
Example: query "jaguar"
Cat
Automobile Brand
NFL Team
Issue 2. Not specific enough:
Example: query "Disney"
Park
Store
Cartoon
Motivation
Most traditional approaches focus on relevance.
1. The queries most relevant to the input query tend to be similar to each other.
2. This generates redundant and monotonic recommendations.
3. Such recommendations provide limited coverage of the recommendation space.
Motivation
A recommender should provide queries that are not only
relevant but also diversified.
With diversified recommendations:
1. We can cover multiple potential search intents of the user.
2. The risk that users won't be satisfied is minimized.
3. Finally, users find desired targets in fewer recommendation cycles.
Problem statement
Input: a query q and an integer m.
Output: a list of recommended queries Y.
Query q → Query recommender → Recommended queries Y (m: the number of recommended queries)
GOAL: At least one query in Y is relevant to the user’s search intent.
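As an interface, the problem can be sketched as follows (the function name and types are illustrative, not from the paper):

```python
from typing import List

def recommend(q: str, m: int) -> List[str]:
    """Given an input query q and an integer m, return a ranked list Y
    of m recommended queries, aiming for at least one of them to be
    relevant to the user's search intent."""
    raise NotImplementedError  # the DQR model fills this in
```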
Problem statement
Five properties:
1. Relevance.
2. Redundancy-free.
3. Diversity.
4. Ranking.
5. Real-time response.
DQR: framework
Offline: mine query concepts from the search log (addresses the redundancy-free issue).
Online: a probabilistic diversification model (addresses the diversity issue).
DQR: offline
Mining query concepts.
The same search intent can be expressed by different queries.
Example: "Microsoft Research Asia", "MSRA", "MS Research Beijing".
A query concept is a set of queries which express the same or similar search intents.
[Diagram: the three queries above grouped into one query concept.]
DQR: online
Greedy strategy: concept selection.
Given the input query, select the best query concept from the concept pool, one at a time, until m concepts have been selected.
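A minimal sketch of this greedy loop, MMR-style (the `rel` and `sim` scoring functions and the trade-off weight `lam` are assumptions standing in for the paper's probabilistic objective on the next slides):

```python
def greedy_select(query, concept_pool, m, rel, sim, lam=0.5):
    """Greedily pick m query concepts: favor relevance to the input
    query, penalize similarity to concepts already selected."""
    selected = []
    candidates = list(concept_pool)
    while candidates and len(selected) < m:
        best = max(
            candidates,
            key=lambda c: rel(c, query)
                          - lam * max((sim(c, s) for s in selected), default=0.0),
        )
        selected.append(best)
        candidates.remove(best)
    return selected
```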
DQR: diversification
Notation:
$c_q$: the query concept to which the input query belongs.
$S$: the query concepts already selected.
$c$: the query concept to be selected.
Objective function:
Favor query concepts that are relevant to the input query.
Penalize query concepts that are relevant to the query concepts already selected.
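One generic form consistent with this description (a sketch; $\mathrm{rel}$, $\mathrm{sim}$, and the trade-off weight $\lambda$ are illustrative stand-ins for the paper's probabilistic estimates, and the max over an empty $S$ is taken as 0):

$$c^{*} = \arg\max_{c \in \mathcal{P} \setminus S} \Big[\, \mathrm{rel}(c, c_q) \;-\; \lambda \max_{c' \in S} \mathrm{sim}(c, c') \,\Big]$$

where $\mathcal{P}$ is the concept pool.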
DQR: diversification
[Equations: the objective function and its estimation.]
DQR: diversification
Click set s: the set of URLs clicked for a query.
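Concept-to-concept relevance can then be estimated from click sets; a minimal sketch using Jaccard overlap (an illustrative stand-in for the paper's probabilistic estimation):

```python
def click_set(concept, clicks):
    """Union of the URLs clicked for any query in the concept.
    `clicks` maps each query to its set of clicked URLs."""
    urls = set()
    for q in concept:
        urls |= clicks.get(q, set())
    return urls

def concept_sim(c1, c2, clicks):
    """Jaccard similarity between the click sets of two concepts."""
    s1, s2 = click_set(c1, clicks), click_set(c2, clicks)
    if not s1 or not s2:
        return 0.0
    return len(s1 & s2) / len(s1 | s2)
```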
DQR: diversification
[Equations: the objective function and its relevance and diversity components.]
Experiments
Datasets:
AOL: search log collected from the AOL search engine, 01 March 2006 to 31 May 2006.
SOGOU: search log collected from the SOGOU search engine, 01 June 2008 to 30 June 2008.
Baseline
There is no gold standard for query recommendation.
Evaluation
User study
12 users, 60 test queries
Evaluation
For a test query q, users judge the recommendations produced by each approach at three relevance levels:
Irrelevant (0 points)
Partially relevant (1 point)
Relevant (2 points)
Evaluation
Three performance metrics:
Relevance
Diversity
Ranking
Relevance
[Charts: relevance results on AOL, at the query level and the concept level.]
Diversity
Metric: Intent-Coverage
It measures the number of unique search intents covered by the top m recommended queries.
Since each intent represents a distinct user search intent, a higher Intent-Coverage indicates a higher probability of satisfying different users.
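A minimal sketch of computing the metric, assuming each recommended query has been labeled by the judges with the search intent it serves:

```python
def intent_coverage(recommendations, intent_of, m):
    """Count the unique search intents covered by the top-m
    recommended queries. `intent_of` maps a query to its
    human-labeled intent (an assumption for this sketch)."""
    return len({intent_of[q] for q in recommendations[:m]})
```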
Diversity
Metric: Intent-Coverage
[Chart: Intent-Coverage results on AOL.]
Ranking
Metric: Normalized Discounted Cumulative Gain (NDCG)
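With graded relevance $rel_i \in \{0, 1, 2\}$ from the user study, one standard form of the metric is:

$$\mathrm{DCG@}m = \sum_{i=1}^{m} \frac{2^{rel_i} - 1}{\log_2(i + 1)}, \qquad \mathrm{NDCG@}m = \frac{\mathrm{DCG@}m}{\mathrm{IDCG@}m}$$

where $\mathrm{IDCG@}m$ is the $\mathrm{DCG@}m$ of the ideal ordering.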
[Chart: NDCG results on AOL.]
Thanks!
Questions
Suggestions
Diversity ranking
Metric:
[Charts: diversity-ranking results on AOL and on SOGOU.]
Motivation
Diversification is especially needed with the rise of mobile devices.
One in seven queries comes from mobile devices.
[Illustration: a 3.5-inch phone screen next to 13.3-, 15.4-, and 17.0-inch laptop screens: the screen is much smaller, with limited space for recommendations.]
Numbers from Global mobile statistics 2012 (mobiThinking)
DQR: clustering
A Hawaii restaurant:
Unlimited tables.
Each table can hold unlimited customers.
Customers arrive in a stream.
Problem: whenever a customer arrives, assign him to a table.
Properties:
Familiar people together.
Unfamiliar people apart.
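A minimal sketch of this restaurant-style streaming assignment (the `sim` function and the compactness `threshold` are assumptions; the paper's exact compactness control is not reproduced here):

```python
def stream_cluster(queries, sim, threshold=0.5):
    """Assign each arriving query ("customer") to the most similar
    existing cluster ("table") if it is similar enough; otherwise
    open a new cluster. A higher threshold yields more compact
    clusters (cf. the compactness control on the next slide)."""
    clusters = []  # each cluster ("table") is a list of queries
    for q in queries:
        best, best_sim = None, 0.0
        for c in clusters:
            s = max(sim(q, member) for member in c)
            if s > best_sim:
                best, best_sim = c, s
        if best is not None and best_sim >= threshold:
            best.append(q)        # seat with familiar people
        else:
            clusters.append([q])  # open a new table
    return clusters
```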
DQR: clustering
[Diagram: a stream of customers being assigned to tables, with compactness control.]
DQR
Extract a representative query from each query concept.
Voting strategy:
Compute a score for each query $q \in C$:
$$\mathrm{vote}(u, q) = \begin{cases} 1 & \text{if } u \text{ submitted } q \text{ at least once;} \\ 0 & \text{otherwise.} \end{cases}$$
A score for each query $q$ is therefore computed as:
$$\mathrm{score}(q) = \sum_{u \in U} \mathrm{vote}(u, q)$$
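A minimal sketch of this voting strategy (assuming `submissions` maps each user $u \in U$ to the set of queries they submitted):

```python
def representative_query(concept, submissions):
    """Pick the query in the concept that the most users voted for:
    vote(u, q) = 1 if user u submitted q at least once, else 0;
    score(q) = sum of vote(u, q) over all users u."""
    def score(q):
        return sum(1 for queries in submissions.values() if q in queries)
    return max(concept, key=score)
```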
Relevance
[Charts: relevance results on SOGOU.]
CIKM
Proc. of the 2012 Int. Conf. on Information and Knowledge Management (CIKM'12), Maui, Hawaii, Oct. 2012, to appear.