Overview of KDDCUP 2011

Nathan Liu
[email protected]
KDDCUP 2011 Music Recommendation
• KDDCUP is the most prominent data mining competition.
• In recent years, there have been a number of contests related to movie recommendation:
– Netflix 2006: predict future ratings
– KDDCUP 2007: how many ratings and who rated what
– CAMRA 2010: context aware movie recommendation
• KDDCUP 2011 is organized by Yahoo! and provides the first and largest public music ratings dataset.
Yahoo! Music
KDDCUP 2011
• There are three types of items: songs, artists, albums.
• Songs and albums are annotated with genres.
• You are given the date, time, and score of each user's rating of these items.
• Challenges:
– Scale: the largest public ratings dataset to date, with 1 million users, 0.6 million items, and 300 million ratings
– Hierarchical item relations: songs belong to albums, albums belong to artists, and all of them are annotated with genre tags (see the sketch after this slide)
– Rich metadata: over 900 genres
– Fine temporal resolution: no previous challenge provided time in
addition to date.
• For the project, you will be provided with a small subset of the data, and we will hold a mini internal competition to determine which group obtains the best results.
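
A minimal sketch of one way to hold the song/album/artist hierarchy and genre tags in memory. The Item class, its field names, and the toy entries are illustrative assumptions, not the official data format or the provided baseline code.

from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Item:
    item_id: int
    kind: str                        # "song", "album", or "artist"
    parent_id: Optional[int] = None  # album for a song, artist for an album
    genres: set = field(default_factory=set)

def artist_of(items, item_id):
    """Walk up the hierarchy: song -> album -> artist."""
    current = items.get(item_id)
    while current is not None and current.kind != "artist":
        current = items.get(current.parent_id)
    return current.item_id if current is not None else None

# Tiny hypothetical example: one artist, one album, one song.
items = {
    1: Item(1, "artist", genres={"rock"}),
    2: Item(2, "album", parent_id=1, genres={"rock"}),
    3: Item(3, "song", parent_id=2, genres={"rock", "indie"}),
}
print(artist_of(items, 3))  # -> 1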
KDDCUP 2011: Task 1
• The test set consists of held-out ratings from users in the training set. Each rating is time-stamped.
• In the test set, you are given who rated which items at what
time.
• You are asked to predict the rating scores (a minimal factorization sketch follows the references below).
• Closely related to the Netflix competition, but may require modeling time-of-day effects.
• References:
– Koren. Matrix Factorization Techniques for Recommender
Systems. (IEEE Computer 2009)
– Koren. Collaborative Filtering with Temporal Dynamics (KDD’09)
– Xiong. Time-Evolving Collaborative Filtering (SDM’10)
– Liu. Online Evolutionary Collaborative Filtering (RECSYS’10)
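
A minimal sketch of biased matrix factorization trained with SGD, in the spirit of the Koren matrix factorization reference above. The hyperparameters and toy (user, item, rating) triples are illustrative assumptions, not the provided baseline.

import numpy as np

def train_mf(ratings, n_users, n_items, k=20, lr=0.005, reg=0.02, epochs=20):
    rng = np.random.default_rng(0)
    P = rng.normal(scale=0.1, size=(n_users, k))   # user factors
    Q = rng.normal(scale=0.1, size=(n_items, k))   # item factors
    bu = np.zeros(n_users)                         # user biases
    bi = np.zeros(n_items)                         # item biases
    mu = np.mean([r for _, _, r in ratings])       # global mean
    for _ in range(epochs):
        for u, i, r in ratings:
            pred = mu + bu[u] + bi[i] + P[u] @ Q[i]
            err = r - pred
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            # Both updates use the pre-update factors (tuple evaluated first).
            P[u], Q[i] = (P[u] + lr * (err * Q[i] - reg * P[u]),
                          Q[i] + lr * (err * P[u] - reg * Q[i]))
    return mu, bu, bi, P, Q

def predict(model, u, i):
    mu, bu, bi, P, Q = model
    return mu + bu[u] + bi[i] + P[u] @ Q[i]

# Toy usage: (user, item, rating) triples.
ratings = [(0, 0, 90), (0, 1, 30), (1, 0, 70), (1, 2, 100)]
model = train_mf(ratings, n_users=2, n_items=3)
print(predict(model, 0, 2))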
KDDCUP 2011: Task 2
• The test set consists of held-out ratings from users in the training set. Time has been removed.
• In the test set, you are given 6 items for each user.
• You are asked to predict which 3 of the 6 were actually rated by the user (a short selection sketch follows the references below).
• Closely related to KDDCUP 2007 "who rated what" and the CAMRA 2010 weekly recommendation track.
• References:
– Hu. Collaborative Filtering for Implicit Feedback Datasets (ICDM’08)
– Rendle. Bayesian Personalized Ranking from Implicit Feedback (UAI’09)
– Cremonesi. Performance of Recommender Algorithms on Top-N
Recommendation Tasks (RECSYS’10)
– Steck. Training and Testing of Recommender Systems on Data Missing
Not at Random (KDD’10)
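
A minimal sketch of how Task 2 predictions could be produced from any scoring model: for each user, score the 6 candidate items and mark the 3 highest-scoring ones as "rated". The scoring function here is a placeholder assumption; in practice it would come from an implicit-feedback model such as those in the references above.

import numpy as np

def predict_top3(user_candidates, score_fn):
    """user_candidates: dict user_id -> list of 6 item_ids.
    Returns dict user_id -> set of the 3 item_ids predicted as rated."""
    predictions = {}
    for u, items in user_candidates.items():
        scores = np.array([score_fn(u, i) for i in items])
        top3 = np.argsort(-scores)[:3]            # indices of the 3 highest scores
        predictions[u] = {items[j] for j in top3}
    return predictions

# Toy usage with a random scorer standing in for a trained model.
rng = np.random.default_rng(0)
dummy_score = lambda u, i: rng.random()
print(predict_top3({7: [11, 12, 13, 14, 15, 16]}, dummy_score))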
For The Project
• We will extract a subset for you to work on.
• We will provide some basic algorithms.
• You can choose to work on one of the two
tasks.
• The minimum requirement is that you run thorough experiments with the provided algorithms and write a report comparing your findings across the different algorithms.
• There are also new things to try….
Things to Try (1): Ensemble
• Same algorithm, different parameter settings
• Different algorithms
• Stacking (a minimal sketch follows the references below):
– What meta-learner? Gradient boosted decision trees, linear regression
– Any meta-features? Tail vs. head segmentation strategy
• References:
– Bao et al. Stacking Recommendation Engines with Additional Meta-Features (RECSYS'09)
– Jahrer et al. Combining Predictions for Accurate Recommender Systems (KDD'10)
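
A minimal stacking sketch with a linear-regression meta-learner: base recommenders' predictions on a held-out set become features for a second-level model. The base predictions below are random placeholders, and the sizes and values are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
n_holdout = 1000
true_ratings = rng.uniform(0, 100, size=n_holdout)

# Stand-ins for predictions from three base models (e.g. different MF settings).
base_preds = np.column_stack([
    true_ratings + rng.normal(0, 20, n_holdout),
    true_ratings + rng.normal(0, 15, n_holdout),
    true_ratings + rng.normal(0, 25, n_holdout),
])

# Fit least-squares blending weights (with an intercept column).
X = np.column_stack([np.ones(n_holdout), base_preds])
w, *_ = np.linalg.lstsq(X, true_ratings, rcond=None)

# Blend new base-model predictions with the learned weights.
new_preds = np.array([[50.0, 55.0, 48.0]])
blended = np.column_stack([np.ones(len(new_preds)), new_preds]) @ w
print(w, blended)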
Things to Try (2): Exploiting Item Relations and Genres
• From social networks of users to networks of items.
• Combining collaborative filtering with genre-based prediction to alleviate sparseness (a minimal back-off sketch follows the references below).
• References:
– Ma. Recommender Systems with Social Regularization
(WSDM’11)
– Agarwal. Regression based Latent Factor Models (KDD’09)
– Popescul. Probabilistic Models for Unified Collaborative
and Content-based Recommendation in sparse-data
environments (UAI’01)
– Gunawardana. Tied Boltzmann Machines for Cold Start Recommendations (RecSys'08)
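
A minimal sketch of backing off to genre averages for sparse items: the final score blends a collaborative-filtering prediction with the mean rating of the item's genres, weighted by how many ratings the item has. The blending rule and the shrinkage constant are illustrative assumptions, not a method from the references.

import numpy as np

def genre_backoff_predict(cf_pred, item_genres, genre_means, item_count, shrink=50.0):
    """cf_pred: CF model's score for (user, item).
    item_genres: genre tags of the item; genre_means: genre -> mean rating.
    item_count: number of training ratings observed for the item."""
    if item_genres:
        genre_pred = np.mean([genre_means[g] for g in item_genres])
    else:
        genre_pred = cf_pred
    alpha = item_count / (item_count + shrink)   # trust CF more for popular items
    return alpha * cf_pred + (1 - alpha) * genre_pred

# Toy usage: a rarely rated song falls back mostly to its genre average.
print(genre_backoff_predict(80.0, {"rock", "indie"},
                            {"rock": 65.0, "indie": 55.0}, item_count=3))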
Things to Try (3): Temporal Dynamics
• Various possible types of temporal dynamics (a minimal time-of-day bias sketch follows the references below):
– Long term effect: people getting pickier over time
– Short term effect: festival mood
– Time of day effect: day time vs. night time
preference
– Periodicity: every Friday night is party time
• References:
– Koren. Collaborative Filtering with Temporal Dynamics (KDD'09)
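
A minimal sketch of a time-of-day bias term layered on top of any baseline predictor: ratings are bucketed into day/evening/night bins, and each user gets a shrunk per-bin offset. The bin boundaries, shrinkage constant, and toy data are illustrative assumptions, not the method from the reference above.

from collections import defaultdict

def time_bin(hour):
    if 6 <= hour < 18:
        return "day"
    if 18 <= hour < 24:
        return "evening"
    return "night"

def fit_time_bias(ratings, baseline, shrink=10.0):
    """ratings: iterable of (user, item, hour, score) tuples.
    baseline(u, i) is any existing predictor; returns (user, bin) -> offset."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for u, i, hour, r in ratings:
        key = (u, time_bin(hour))
        sums[key] += r - baseline(u, i)
        counts[key] += 1
    return {k: sums[k] / (counts[k] + shrink) for k in sums}

# Toy usage with a constant baseline of 50.
data = [(0, 1, 23, 90), (0, 2, 2, 85), (0, 3, 14, 40)]
bias = fit_time_bias(data, lambda u, i: 50.0)
print(bias)   # user 0 rates higher at night than during the day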