Recommender Systems - The NSF Cloud and Autonomic


Recommender Systems
Sumir Chandra
The Applied Software Systems Laboratory
Rutgers University
Introduction
 Information overload – how to decide?
 Too many domains, too little experience, too much data
- books, movies, music, websites, articles, etc.
 A system providing recommendations to users based on the opinions/behaviors of others
- efficient attention, better matches, non-obvious connections, keeps users coming back for more …
 E.g. – e-commerce: Reel.com, Levi’s, eBay, Excite; commerce: call centers, direct marketing
Introduction (contd.)
 Data sources: purchase data, browsing & searching data, user feedback, text comments, expert recommendations
 Taxonomy:
- text comments (expert/user reviews)
- attribute-based (this author also wrote …)
- item-to-item correlation (people who bought this item also bought …)
- people-to-people correlation (users like you …)
 Primary transformation: aggregation of recommendations, or good matching between recommender and seeker
Correlations
People-to-people correlation
 Collaborative filtering (CF)
 Connects users to items they may be unaware of
 Assumes the user will prefer what like-minded users prefer, and dislike what dissimilar users prefer
 Object ranking by users: high/low
 CF techniques: majority rules, nearest neighbor, weighted averages (prediction, S.D., covariance), +ve or -ve
Item-to-item correlation
 Based on keywords or features of the object
 Key statistic: (# people who bought A & B) / (# people who bought A)
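The two key statistics above can be sketched in a few lines. This is an illustrative example, not code from the talk: `item_assoc` computes the item-to-item ratio (buyers of A & B over buyers of A), and `predict` is a minimal nearest-neighbor weighted-average prediction, using an assumed similarity of 1 minus the mean absolute rating difference on co-rated items.

```python
def item_assoc(purchases, a, b):
    """Item-to-item statistic: (# people who bought A & B) / (# who bought A)."""
    buyers_a = {u for u, items in purchases.items() if a in items}
    buyers_ab = {u for u in buyers_a if b in purchases[u]}
    return len(buyers_ab) / len(buyers_a) if buyers_a else 0.0

def predict(ratings, user, item):
    """Nearest-neighbor weighted average: each neighbor's rating of `item`
    is weighted by a simple similarity to `user` (1 - mean absolute
    difference on co-rated items; a stand-in for Pearson correlation)."""
    num = den = 0.0
    for other, r in ratings.items():
        if other == user or item not in r:
            continue
        common = set(ratings[user]) & set(r)
        if not common:
            continue
        sim = 1.0 - sum(abs(ratings[user][t] - r[t]) for t in common) / len(common)
        num += sim * r[item]
        den += abs(sim)
    return num / den if den else None

# Toy data: two of the three buyers of A also bought B.
purchases = {"u1": {"A", "B"}, "u2": {"A"}, "u3": {"A", "B"}}
ratings = {"u1": {"x": 1.0, "y": 0.8}, "u2": {"x": 1.0, "y": 0.8, "z": 0.6}}
```

In practice the similarity would be Pearson correlation or cosine similarity, as the slide's mention of S.D. and covariance suggests.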
Design Issues
Technical Design Space
 Content of evaluation: from a single bit to unstructured
textual annotations – trade-off between ease of use and computational overhead
 Explicit/Implicit evaluation: nature of recommendation
 User identity: real names, pseudonyms, anonymous
 Evaluation aggregation: research area – weighted
voting, content analyses, referral chains, etc.
 Evaluation usage: filtering out negatives, sorting of
items according to numeric evaluations, display
Design Issues (contd.)
Domain-Space Characteristics of items evaluated
 Domain to which items belong
 Sheer volume – variable
 Lifetime – rate of gathering and distributing evaluations
 Cost structure – missing a good item, sampling a bad one, costs of incorrect decisions
Domain-Space Characteristics of participants and evaluations
 Set of recommenders
 Recommendation density – do recommenders tend to evaluate many items in common?
 Set of consumers
 Consumer taste variability – taste matching works better for a larger set; personalized aggregation works better when tastes differ
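Recommendation density is not defined formally on the slide; one plausible reading is the average number of items a pair of recommenders has evaluated in common. A minimal sketch under that assumption (the function name and measure are illustrative):

```python
from itertools import combinations

def recommendation_density(evals):
    """Average pairwise overlap of evaluated-item sets.
    evals: {recommender: set of evaluated items}."""
    pairs = list(combinations(evals.values(), 2))
    if not pairs:
        return 0.0
    return sum(len(a & b) for a, b in pairs) / len(pairs)

# Overlaps: (r1,r2) -> 2, (r1,r3) -> 1, (r2,r3) -> 1
evals = {"r1": {"a", "b", "c"}, "r2": {"b", "c"}, "r3": {"c", "d"}}
```

Higher density means correlation-based techniques have more co-rated items to work with.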
Design Issues (contd.)
Social Implications
 Free riders: take but do not give; mandatory participation, monetary incentives; weighted voting to avoid unfair evaluations; discourage the “vote early and often” phenomenon
 Privacy: information vs. privacy; privacy blends; attributed credit for recommendation efforts; blind refereeing as in the peer-review system
 Advertisers: charge recipients through subscription or pay-per-use; advertiser support; charge owners of the evaluated media
Recommender System Types
 Collaborative/social-filtering system – aggregation of consumers’ preferences and recommendations to other users based on similarity in behavioral patterns
 Content-based system – supervised machine learning used to induce a classifier that discriminates between interesting and uninteresting items for the user
 Knowledge-based system – knowledge about users and products used to reason about what meets the user’s requirements, using discrimination trees, decision support tools, case-based reasoning (CBR)
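A content-based system can be sketched with the simplest possible supervised learner. This is an illustrative nearest-centroid classifier over bag-of-words documents, not the method of any particular system on this slide; all names and data here are invented:

```python
from collections import Counter

def centroid(docs):
    """Average bag-of-words vector over a list of token lists."""
    c = Counter()
    for d in docs:
        c.update(d)
    return {t: n / len(docs) for t, n in c.items()}

def score(doc, proto):
    """Dot product of a document's tokens with a centroid."""
    return sum(proto.get(t, 0.0) for t in doc)

def classify(doc, pos_docs, neg_docs):
    """'Interesting' iff the document scores at least as high against
    the positive (interesting) centroid as against the negative one."""
    return score(doc, centroid(pos_docs)) >= score(doc, centroid(neg_docs))

# Toy training data: interesting vs. uninteresting examples.
pos = [["grid", "partitioning"], ["grid", "amr"]]
neg = [["cooking", "recipes"]]
```

Real content-based recommenders typically use richer features (TF-IDF weights) and stronger learners, but the interesting/uninteresting discrimination step is the same shape.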
Content-based Collaborative
Information Filtering
 Research Assistant Agent Project (RAAP), Nagoya Institute of Technology, Japan
 Registration, research profile – bookmark database
 Interesting page -> agent suggestion -> classification -> reconfirm or change
 In parallel, the agent checks for newly classified bookmarks -> recommends to other users -> accept/reject on login
 Text categorization: positive/negative examples; most similar classifier for the candidate class using term weighting, with the TF-IDF scheme from Information Retrieval
Content-based Collaborative
Information Filtering (contd.)
 Relevance feedback – positive/negative prototypes; the similarity measure is sim_t(c, D) = (Q_t^+ · D_t) − (Q_t^− · D_t)
 Feature selection – removal of non-informative terms using Information Gain (IG), based on the probability of a term being present
 Learning to recommend – the agent maintains two matrices: a user-vs-category matrix (for successful classifications) and each user’s confidence factor (0.1 to 1) w.r.t. other users, used to compute correlation
 Circular references avoided – verify that a recommended document is not already registered in the target’s database
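The relevance-feedback similarity above is just two dot products: the document vector against the positive prototype minus the document vector against the negative prototype. A minimal sketch with invented term vectors:

```python
def dot(q, d):
    """Sparse dot product of two {term: weight} vectors."""
    return sum(w * d.get(t, 0.0) for t, w in q.items())

def sim(q_pos, q_neg, d):
    """sim_t(c, D) = (Q_t^+ . D_t) - (Q_t^- . D_t): reward overlap with
    the positive prototype, penalize overlap with the negative one."""
    return dot(q_pos, d) - dot(q_neg, d)

# Illustrative prototypes and document (TF-IDF-style weights, made up).
q_pos = {"grid": 1.0, "amr": 0.5}
q_neg = {"recipes": 1.0}
d = {"grid": 0.8, "recipes": 0.3}
```

A document matching the positive prototype scores high; matching the negative prototype pulls the score down, possibly below zero.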
Knowledge-based Systems
 FindMe technique – knowledge-based similarity retrieval
 User selects a source item -> requests similar items
 “Tweak” application – same, but the candidate set is filtered prior to sorting, leaving only candidates satisfying the tweak
 Car Navigator – conversational interaction/navigation focused around high-level responses
 PickAFlick – multiple task-specific retrieval strategies
 RentMe – query menus set, NLP to generate a database
 Recommender Personal Shopper (RPS) – a domain-independent implementation of the FindMe algorithm
Knowledge-based Systems (contd.)
 Similarity measures – goal-based, with priorities for goals
 Sorting algorithm – metric-based bucket sorting
 Retrieval algorithm – priority-ordered metric constraints, plus tweaks, forming an SQL query
 Product data – creation of a product database in which unique items are associated with sets of features
 Metrics – similarity; directional metrics with preference
 Hybrid system – knowledge-based system with collaborative filtering
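Metric-based bucket sorting can be sketched as follows: rather than fully ordering candidates, each one is dropped into a coarse bucket by its metric value and buckets are read best-first. This is an illustrative sketch of the general idea, not FindMe's actual code:

```python
def bucket_sort(items, metric, n_buckets=5):
    """Place each item into a coarse bucket by its metric value in
    [0, 1), then return items best-bucket first. Within a bucket,
    original order is kept (no fine-grained comparisons needed)."""
    buckets = [[] for _ in range(n_buckets)]
    for it in items:
        m = max(0.0, min(0.999, metric(it)))  # clamp into [0, 1)
        buckets[int(m * n_buckets)].append(it)
    ordered = []
    for b in reversed(buckets):               # highest-metric bucket first
        ordered.extend(b)
    return ordered

# Illustrative similarity scores for three candidate items.
scores = {"a": 0.9, "b": 0.1, "c": 0.5}
order = bucket_sort(["a", "b", "c"], scores.get)
```

The appeal for a retrieval front end is that a handful of buckets is enough to present "best matches first" without paying for a total order over every candidate.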
Recommender Tradeoffs

Knowledge-based
 Pluses: A. No ramp-up required; B. Detailed qualitative preference feedback; C. Sensitive to preference changes
 Minuses: H. Knowledge engineering; I. Suggestion ability is static

Collaborative filtering
 Pluses: D. Can identify niches precisely; E. Domain knowledge not needed; F. Quality improves over time; G. Personalized recommendations
 Minuses: J. Quality dependent on large historical data set; K. Subject to statistical anomalies in data; L. Insensitive to preference changes

Ideal Hybrid
 Pluses: A, B, C, D, F, G
 Minuses: H
ARMaDA Recommender
 No single partitioning scheme performs best for all types of applications and systems
 The optimal partitioning technique depends on input parameters and application runtime state
 Partitioning behavior is characterized by the tuple {partitioner, application, computer system} (PAC)
 PAC quality is characterized by a 5-component metric – communication, load imbalance, data migration, partitioning time, partitioning overhead
 An octant approach characterizes the application/system state
 Adaptive meta-partitioner -> fully dynamic PAC
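The meta-partitioner's selection step can be pictured as a rule table keyed by octant. The sketch below is hypothetical: the three state axes, the octant encoding, and the rule entries are invented for illustration and are not ARMaDA's actual rule base (only the partitioner names come from the slides):

```python
def octant(localized_adaptation, low_activity, comm_dominant):
    """Encode three binary state axes as an octant number 0-7
    (the real characterization uses the application/system state)."""
    return (localized_adaptation << 2) | (low_activity << 1) | comm_dominant

# Illustrative rule base: octant -> partitioner suited to that state.
RULES = {
    octant(1, 1, 0): "G-MISP+SP",
    octant(1, 0, 1): "pBD-ISP",
}

def select_partitioner(state, default="CGD"):
    """Look up the partitioner for the current state; fall back to a
    default scheme when no rule fires."""
    return RULES.get(octant(*state), default)
```

At runtime the application state is re-characterized periodically, so the selected partitioner can change as the PAC tuple evolves.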
Dynamic Characterization – RM-3D Switching Test
 Richtmyer-Meshkov fingering instability in 3 dimensions
 Application trace has 51 time-step iterations
 RM-3D has more localized adaptation and lower activity dynamics
 Depending on the computer system, the RM-3D application resides in octants I and III for most of its execution
 Partitioning schemes pBD-ISP and G-MISP+SP are suited to these octants
 Application trace -> partitioner -> output trace -> simulator -> metric measurements
RM-3D Switching Test (contd.)
Test Runs
 CGD – complete run
 pBD-ISP – complete run
 CGD+pBD-ISP_load (for improved load balance)
0 – 12 -> CGD
13 – 22 -> pBD-ISP
23 – 26 -> CGD
27 – 36 -> pBD-ISP
37 – 48 -> CGD
49 – 51 -> pBD-ISP
 CGD+pBD-ISP_data (for reduced data migration)
0 – 10 -> CGD
11 – 28 -> pBD-ISP
29 – 34 -> CGD
35 – 51 -> pBD-ISP
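The CGD+pBD-ISP_load schedule above amounts to an iteration-to-partitioner lookup. A minimal sketch (interval bounds taken from the slide; the function and table names are illustrative):

```python
# Switching schedule for the CGD+pBD-ISP_load run: (first, last, partitioner).
SCHEDULE_LOAD = [
    (0, 12, "CGD"),
    (13, 22, "pBD-ISP"),
    (23, 26, "CGD"),
    (27, 36, "pBD-ISP"),
    (37, 48, "CGD"),
    (49, 51, "pBD-ISP"),
]

def partitioner_for(step, schedule):
    """Return the partitioner active at a given time-step iteration."""
    for lo, hi, name in schedule:
        if lo <= step <= hi:
            return name
    raise ValueError(f"step {step} outside schedule")
```

In the full system this table would be produced dynamically by the recommender rather than fixed in advance; the fixed schedules here serve to test whether switching helps at all.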
RM-3D Switching Test (contd.)

Metric                      | CGD       | pBD-ISP  | CGD+pBD-ISP_load | CGD+pBD-ISP_data
Avg. max. load imbalance    | 18.9048 % | 37.9821 %| 34.749 %         | 39.3693 %
Avg. avg. data movement     | 127.275   | 18.3137  | 187.431          | 110.216
Avg. avg. intra-level comm. | 1063.43   | 429.804  | 691.608          | 723.569
Avg. avg. inter-level comm. | 451.49    | 0        | 265.882          | 127.667
Avg. max. no. of boxes      | 210.333   | 2.98039  | 16.9804          | 84.8824
Conclusions
 YES! Experimental results conform to theoretical observations
 Recommender systems in ARMaDA can result in performance optimization
 Future work:
- more robust rule set and switching policies
- partitioner/hierarchy optimization at switch points
- integration of the recommender engine within ARMaDA
- partitioner and application characterization research to form the policy rule base