Recommendations at Spotify v4

Music recommendations at Spotify
Erik Bernhardsson
[email protected]
Spotify
- Launched in 2009
- Available in 17 countries
- 20M active users, 5M paying subscribers
- Peak at 5k tracks/s, 1M logged-in users
- 20M tracks
Some applications
Recommendation stuff at Spotify
- Related artists
Recommendation stuff at Spotify, cont…
More!
How can we find music?
Recommendations
- Manual classification
- Feature extraction
- Social media analysis, web scraping, metadata based
- Collaborative filtering
Pandora & Music Genome Project
- Classifies tracks in terms of 400 attributes
- Each track takes 20-30 minutes to classify
- A distance function finds similar tracks
“Subtle use of strings”
“Epic buildup”
“Acid Jazz roots”
“Beats made for dancing”
“Trippy soundscapes”
“Great trombone solo”
…
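The distance-function idea can be sketched as follows. This is only an illustration: the slides do not say which distance Pandora uses, so Euclidean distance, the four made-up attribute scores, and the names `TRACKS`, `euclidean_distance` and `most_similar` are all assumptions.

```python
import math

# Hypothetical attribute scores (0.0 to 1.0) for four attributes per track.
# A real catalogue would have about 400 attributes per track.
TRACKS = {
    "track_a": [0.9, 0.1, 0.8, 0.2],
    "track_b": [0.8, 0.2, 0.7, 0.3],
    "track_c": [0.1, 0.9, 0.2, 0.8],
}


def euclidean_distance(x, y):
    """Straight-line distance between two attribute vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def most_similar(query, tracks, k=2):
    """Return the k track names closest to `query`, nearest first."""
    others = [name for name in tracks if name != query]
    others.sort(key=lambda name: euclidean_distance(tracks[query], tracks[name]))
    return others[:k]


print(most_similar("track_a", TRACKS))
```

Any other distance (cosine, weighted per attribute) would slot into `most_similar` the same way.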
Scraping the web is another approach
Feature extraction
Collaborative filtering
Idea:
- If two movies x, y get similar ratings then they are probably similar
- If a lot of users all listen to tracks x, y, z, then those tracks are probably similar
Collaborative filtering
Get data
… lots of data
Aggregate data
Throw away temporal information and just look at the number of times each user played each track
OK, so now we have a big matrix
… very big matrix
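A minimal sketch of the aggregation step, assuming a play log of (user, track, timestamp) tuples. The log format, the sample data and the helper names are hypothetical; at real scale the matrix would be stored sparsely rather than as nested lists.

```python
from collections import Counter

# Hypothetical play log: (user, track, timestamp) tuples.
PLAY_LOG = [
    ("alice", "track_x", 1001),
    ("alice", "track_x", 1050),
    ("alice", "track_y", 1100),
    ("bob", "track_y", 1200),
    ("bob", "track_z", 1300),
    ("bob", "track_z", 1400),
    ("bob", "track_z", 1500),
]


def aggregate_plays(log):
    """Drop the timestamps and count plays per (user, track) pair."""
    return Counter((user, track) for user, track, _timestamp in log)


def to_dense_matrix(counts):
    """Lay the counts out as a users x tracks matrix (a list of rows)."""
    users = sorted({user for user, _ in counts})
    tracks = sorted({track for _, track in counts})
    matrix = [[counts.get((u, t), 0) for t in tracks] for u in users]
    return users, tracks, matrix


counts = aggregate_plays(PLAY_LOG)
users, tracks, matrix = to_dense_matrix(counts)
for user, row in zip(users, matrix):
    print(user, row)
```

The `Counter` keyed by (user, track) is already a sparse representation: pairs that never occur take no space.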
Supervised collaborative filtering is pretty much matrix completion
Supervised learning: Matrix completion
Supervised: evaluating rec quality
Unsupervised learning
- Trying to estimate the density
- i.e. predict probability of future events
Try to predict the future given the past
How can we find similar items
We can calculate correlation coefficient as an item similarity
- Use something like Pearson, Jaccard, …
- Amazon did this for “customers who bought this also bought”
  - US patent 7113917
- Parallelization is hard though
- Can speed this up using various LSH tricks
  - Twitter: Dimension Independent Similarity Computation (DISCO)
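The two similarity measures named above can be sketched for a pair of tracks as follows. The listener sets and play counts are made-up sample data, and this brute-force version ignores the parallelization and LSH concerns raised in the list.

```python
import math

# Hypothetical data: the set of users who played each track ...
LISTENERS = {
    "track_x": {"u1", "u2", "u3"},
    "track_y": {"u2", "u3", "u4"},
}
# ... and play counts per user (same user order in both lists).
PLAY_COUNTS = {
    "track_x": [5, 3, 0, 1],
    "track_y": [4, 2, 0, 1],
}


def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant sequence
    return cov / math.sqrt(var_x * var_y)


print(jaccard(LISTENERS["track_x"], LISTENERS["track_y"]))
print(pearson(PLAY_COUNTS["track_x"], PLAY_COUNTS["track_y"]))
```

Jaccard only needs to know who listened; Pearson also uses how often.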
Are there other approaches?
Natural Language Processing has a lot of similar problems
…matrix factorization is one idea
Matrix factorization
Matrix factorization
- Want to get user vectors and item vectors
- Assume f latent factors (dimensions) for each user/item
Probabilistic Latent Semantic Analysis (PLSA)
- Hofmann, 1999
- Also called PLSI
PLSA, cont.
+ a bunch of constraints:
PLSA, cont.
Optimization problem: maximize log-likelihood
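The slide formulas did not survive in this transcript. For reference, a common way to write PLSA for user-item data is given below; the notation (u for a user, i for an item, z for a latent factor, n_{ui} for the play count) is an assumption and may differ from the original slides.

```latex
% Model: each user is a mixture over latent factors z
P(i \mid u) = \sum_{z} P(i \mid z)\, P(z \mid u)

% Constraints: all parameters are valid probability distributions
\sum_{i} P(i \mid z) = 1, \qquad
\sum_{z} P(z \mid u) = 1, \qquad
P(i \mid z) \ge 0, \qquad P(z \mid u) \ge 0

% Objective: maximize the log-likelihood of the observed counts
\log L = \sum_{u,i} n_{ui} \log P(i \mid u)
```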
PLSA, cont.
“Collaborative Filtering for Implicit Feedback Datasets”
- Hu, Koren, Volinsky (2008)
“Collaborative Filtering for Implicit Feedback Datasets”, cont.
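The formulas from these slides are missing from the transcript. As a reminder of what the paper proposes, its objective can be written as below, with r_{ui} the observed count, x_u the user vector, y_i the item vector, and α and λ tunable constants.

```latex
% Binary preference and confidence derived from the raw count r_{ui}
p_{ui} = \begin{cases} 1 & r_{ui} > 0 \\ 0 & r_{ui} = 0 \end{cases}
\qquad
c_{ui} = 1 + \alpha\, r_{ui}

% Confidence-weighted squared error with L2 regularization
\min_{x, y} \sum_{u,i} c_{ui} \left( p_{ui} - x_u^{\top} y_i \right)^2
  + \lambda \left( \sum_{u} \lVert x_u \rVert^2 + \sum_{i} \lVert y_i \rVert^2 \right)

% Alternating least squares update for one user, item vectors held fixed
% (Y stacks the item vectors as rows, C^u = \mathrm{diag}(c_{u1}, \dots, c_{un}))
x_u = \left( Y^{\top} C^{u} Y + \lambda I \right)^{-1} Y^{\top} C^{u}\, p(u)
```

The update for y_i is symmetric, with the user vectors held fixed.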
Here is another method we use
What happens each iteration
- Assign all latent vectors small random values
- Perform gradient ascent to optimize log-likelihood
Calculate derivative and do gradient ascent
- Assign all latent vectors small random values
- Perform gradient ascent to optimize log-likelihood
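The two steps can be sketched in code. The transcript does not preserve the model behind this method, so the sketch assumes a softmax model, P(i | u) proportional to exp(a_u · b_i), purely for illustration; the sample counts, learning rate and function names are likewise hypothetical.

```python
import math
import random

# Hypothetical play counts: COUNTS[u][i] = times user u played item i.
COUNTS = [
    [5, 3, 0, 0],
    [4, 0, 1, 0],
    [0, 0, 2, 6],
]


def softmax_probs(user_vec, item_vecs):
    """P(i | u) under the assumed model: softmax of the dot products."""
    scores = [sum(a * b for a, b in zip(user_vec, item)) for item in item_vecs]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def log_likelihood(counts, user_vecs, item_vecs):
    """Sum over observed plays of n_ui * log P(i | u)."""
    total = 0.0
    for u, row in enumerate(counts):
        probs = softmax_probs(user_vecs[u], item_vecs)
        total += sum(n * math.log(p) for n, p in zip(row, probs) if n > 0)
    return total


def gradient_step(counts, user_vecs, item_vecs, rate):
    """One simultaneous gradient-ascent update of all latent vectors."""
    f = len(user_vecs[0])
    user_grads = [[0.0] * f for _ in user_vecs]
    item_grads = [[0.0] * f for _ in item_vecs]
    for u, row in enumerate(counts):
        probs = softmax_probs(user_vecs[u], item_vecs)
        plays = sum(row)
        for i, (n, p) in enumerate(zip(row, probs)):
            residual = n - plays * p  # observed minus expected plays
            for k in range(f):
                user_grads[u][k] += residual * item_vecs[i][k]
                item_grads[i][k] += residual * user_vecs[u][k]
    # Apply all updates only after every gradient has been computed.
    for vecs, grads in ((user_vecs, user_grads), (item_vecs, item_grads)):
        for vec, grad in zip(vecs, grads):
            for k in range(f):
                vec[k] += rate * grad[k]


def train(counts, num_factors=2, iterations=200, rate=0.01, seed=0):
    rng = random.Random(seed)
    # Step 1: assign all latent vectors small random values.
    user_vecs = [[rng.uniform(-0.1, 0.1) for _ in range(num_factors)]
                 for _ in counts]
    item_vecs = [[rng.uniform(-0.1, 0.1) for _ in range(num_factors)]
                 for _ in counts[0]]
    history = [log_likelihood(counts, user_vecs, item_vecs)]
    # Step 2: perform gradient ascent to optimize the log-likelihood.
    for _ in range(iterations):
        gradient_step(counts, user_vecs, item_vecs, rate)
        history.append(log_likelihood(counts, user_vecs, item_vecs))
    return user_vecs, item_vecs, history


user_vecs, item_vecs, history = train(COUNTS)
print("log-likelihood before:", history[0], "after:", history[-1])
```

A fixed learning rate keeps the sketch short; a production version would more likely use an adaptive step size and distribute the gradient computation.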
2D iteration example
Vectors are pretty nice because things are now super fast
- User-item score is a dot product
- Item-item similarity score is a cosine similarity
- Both cases have trivial complexity in the number of factors f: O(f)
Example: item similarity as a cosine of vectors
Two dimensional example for tracks
We can rank all tracks by the user’s vector
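A small two-dimensional sketch of scoring, similarity and ranking with latent vectors. The vector values and the names `USER_VECTOR`, `TRACK_VECTORS` and `rank_tracks` are made up for illustration; real vectors would come out of the factorization.

```python
import math

# Hypothetical two-dimensional latent vectors (f = 2).
USER_VECTOR = [0.9, 0.2]
TRACK_VECTORS = {
    "track_1": [1.0, 0.1],
    "track_2": [0.8, 0.3],
    "track_3": [0.1, 1.0],
}


def dot(x, y):
    """User-item score: O(f) multiplications and additions."""
    return sum(a * b for a, b in zip(x, y))


def cosine(x, y):
    """Item-item similarity: dot product of the normalized vectors."""
    return dot(x, y) / (math.sqrt(dot(x, x)) * math.sqrt(dot(y, y)))


def rank_tracks(user_vec, track_vecs):
    """Order all tracks by their score for this user, best first."""
    return sorted(track_vecs, key=lambda t: dot(user_vec, track_vecs[t]),
                  reverse=True)


print(rank_tracks(USER_VECTOR, TRACK_VECTORS))
print(cosine(TRACK_VECTORS["track_1"], TRACK_VECTORS["track_2"]))
```

Ranking every track this way is linear in the catalogue size, which is why approximate nearest-neighbour methods become attractive at 20M tracks.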
So how do we implement this?
Hadoop at Spotify
One iteration of a matrix factorization algorithm
“Google News personalization: scalable online collaborative filtering”
So now we solved the problem of recommendations right?
Actually what we really want is to apply it to other domains
Radio
- Artist radio: find related tracks
- Optimize ensemble model based on skip/thumbs data
Learning from feedback is actually pretty hard
A/B testing
More applications!!!
Last but not least: we’re hiring!
Thank you