Slide 1 - COW :: Ceng

Recommendation Systems
ARGEDOR
• Introduction
• Sample Data
• Tools
• Cases
Introduction
• Recommender systems reduce information overload by estimating the relevance of items to a user.
Which artist should I listen to, given my preferences? What is the best holiday for me and my family? Which movie should I watch? Which websites will I find interesting? Which book should I buy for my next vacation?
Personalized Recommendation
Collaborative Filtering
• Collaborative: "Tell me what's popular among
my peers"
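A minimal sketch of user-based collaborative filtering, assuming a small in-memory ratings dictionary (users, items, and ratings are made up). User similarity here is cosine similarity over co-rated items, and a missing rating is predicted as the similarity-weighted average of the most similar peers' ratings. Real engines such as Mahout offer their own similarity measures and neighborhood settings.

```python
import math

# Toy ratings: user -> {item: rating}. Illustrative data, not from MovieLens.
ratings = {
    "alice": {"m1": 5, "m2": 3, "m3": 4},
    "bob":   {"m1": 4, "m2": 3, "m3": 5, "m4": 4},
    "carol": {"m1": 1, "m2": 5, "m4": 2},
}

def cosine(u, v):
    """Cosine similarity computed only over the items both users rated."""
    common = u.keys() & v.keys()
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    nu = math.sqrt(sum(u[i] ** 2 for i in common))
    nv = math.sqrt(sum(v[i] ** 2 for i in common))
    return dot / (nu * nv)

def predict(user, item, k=2):
    """Similarity-weighted average of the k most similar peers who rated item."""
    peers = [(cosine(ratings[user], r), r[item])
             for other, r in ratings.items()
             if other != user and item in r]
    peers = sorted(peers, reverse=True)[:k]
    total = sum(s for s, _ in peers)
    if total == 0:
        return None  # nobody comparable rated this item
    return sum(s * r for s, r in peers) / total

print(round(predict("alice", "m4"), 2))  # → 3.19
```

Alice is closer to Bob (who gave m4 a 4) than to Carol (who gave it a 2), so the prediction leans toward Bob's rating.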
Content Based Recommendation
• Content-based: "Show me more of what I've liked"
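A minimal content-based sketch, assuming items are described only by genre sets in the movies.dat style (the genre lists are illustrative). Unseen items are ranked by their best Jaccard overlap with any item the user already liked; Jaccard similarity is an assumption here, not something the slides specify.

```python
# Item -> genre set, in the style of movies.dat (illustrative values).
genres = {
    "Toy Story (1995)": {"Animation", "Children's", "Comedy"},
    "Aladdin (1992)": {"Animation", "Children's", "Comedy", "Musical"},
    "Heat (1995)": {"Action", "Crime", "Thriller"},
}

def jaccard(a, b):
    """Shared genres divided by all genres of the two items."""
    return len(a & b) / len(a | b) if a | b else 0.0

def recommend(liked, n=1):
    """Rank unseen items by their best genre overlap with a liked item."""
    candidates = [t for t in genres if t not in liked]
    scored = {t: max(jaccard(genres[t], genres[l]) for l in liked)
              for t in candidates}
    return sorted(scored, key=scored.get, reverse=True)[:n]

print(recommend({"Toy Story (1995)"}))  # ['Aladdin (1992)']
```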
Knowledge Based Recommendation
• Knowledge-based: "Tell me what fits based on
my needs"
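A knowledge-based recommender matches items against explicitly stated needs instead of rating history. The sketch below assumes a hypothetical holiday catalog and a hypothetical `needs` dictionary of hard constraints (budget, travelling with children, preferred climates).

```python
from dataclasses import dataclass

@dataclass
class Holiday:
    name: str
    price: int             # hypothetical price per person
    family_friendly: bool
    climate: str

CATALOG = [
    Holiday("Beach resort", 900, True, "warm"),
    Holiday("Ski chalet", 1200, True, "cold"),
    Holiday("City nightlife tour", 600, False, "mild"),
]

def fits(h, needs):
    """True if the item satisfies every stated requirement."""
    return (h.price <= needs["max_price"]
            and (h.family_friendly or not needs["with_kids"])
            and h.climate in needs["climates"])

needs = {"max_price": 1000, "with_kids": True, "climates": {"warm", "mild"}}
print([h.name for h in CATALOG if fits(h, needs)])  # ['Beach resort']
```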
Hybrid Models
Hybrid: combinations of various inputs
and/or compositions of different
mechanisms
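One common way to build a hybrid is a weighted blend of the scores produced by separate recommenders. This sketch assumes both score dictionaries are already normalized to [0, 1]; the weight `w=0.7` is an arbitrary illustration, not a recommended value.

```python
def hybrid_score(cf, content, w=0.7):
    """Weighted hybrid: w * collaborative score + (1 - w) * content score."""
    items = cf.keys() | content.keys()
    # A missing score counts as 0, so either recommender can surface an item alone.
    return {i: w * cf.get(i, 0.0) + (1 - w) * content.get(i, 0.0) for i in items}

cf_scores = {"m1": 0.9, "m2": 0.4}
content_scores = {"m2": 0.8, "m3": 0.6}
scores = hybrid_score(cf_scores, content_scores)
print(sorted(scores, key=scores.get, reverse=True))  # ['m1', 'm2', 'm3']
```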
Sample Data: MovieLens 1M dataset
Content Data: Items
movies.dat
MovieID::Title::Genres
1::Toy Story (1995)::Animation|Children's|Comedy
Content data sometimes also records when the content was created
(here, the release year in the title).
(http://www.grouplens.org/datasets/movielens/ )
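A sketch for reading one movies.dat record, assuming the `::`-separated layout shown above. Because the creation time is embedded in the title as a release year, the parser also extracts it with a regular expression, returning `None` when no year is present.

```python
import re

def parse_movie(line):
    """Parse one movies.dat line of the form MovieID::Title::Genres."""
    movie_id, title, genres = line.rstrip("\n").split("::")
    # The title ends with the release year, e.g. "Toy Story (1995)".
    m = re.search(r"\((\d{4})\)\s*$", title)
    year = int(m.group(1)) if m else None
    return int(movie_id), title, genres.split("|"), year

print(parse_movie("1::Toy Story (1995)::Animation|Children's|Comedy"))
# (1, 'Toy Story (1995)', ['Animation', "Children's", 'Comedy'], 1995)
```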
Sample Data: MovieLens 1M dataset
Content Data: Users
users.dat
UserID::Gender::Age::Occupation::Zip-code
1::F::1::10::48067
!! Since this data is anonymized, it contains no user names.
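A similar sketch for users.dat, assuming the layout shown above. Age and Occupation are category codes (their meanings are listed in the dataset's README), so they are kept as integers; the zip code stays a string so leading zeros are not lost.

```python
from typing import NamedTuple

class User(NamedTuple):
    user_id: int
    gender: str
    age: int         # age-group code, not a literal age
    occupation: int  # occupation code
    zip_code: str    # kept as text to preserve leading zeros

def parse_user(line):
    """Parse one users.dat line: UserID::Gender::Age::Occupation::Zip-code."""
    uid, gender, age, occ, zip_code = line.rstrip("\n").split("::")
    return User(int(uid), gender, int(age), int(occ), zip_code)

print(parse_user("1::F::1::10::48067"))
# User(user_id=1, gender='F', age=1, occupation=10, zip_code='48067')
```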
Sample Data: TTNet Music
TTNet Music User Rating Logs
userId,songId,albumId,artistId,timeofaction,ratingValue,channel
2330295,3313069,286068,546697,2013-03-26 15:17:49,0.9,SI
– The rating value is derived by a formula from the user's actions
(listened, downloaded, listened before, etc.).
– For the TTNET music recommendation engine, there are
approximately 1 million unique user action logs per day.
– The logs are stored on a distributed file system and used for
collaborative filtering.
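A sketch for turning these logs into the user-to-item rating dictionary that collaborative filtering needs, using Python's `csv` module. The second sample row is made up, and the rule that the most recent entry wins for a repeated user/song pair is an assumption; the slides do not say how repeats are handled.

```python
import csv
import io
from datetime import datetime

SAMPLE = """userId,songId,albumId,artistId,timeofaction,ratingValue,channel
2330295,3313069,286068,546697,2013-03-26 15:17:49,0.9,SI
2330295,3313069,286068,546697,2013-03-27 09:02:11,0.6,SI
"""  # second row is hypothetical

def load_ratings(f):
    """Build {userId: {songId: ratingValue}}; assumption: latest entry per pair wins."""
    latest = {}  # (user, song) -> (timestamp, rating)
    for row in csv.DictReader(f):
        key = (int(row["userId"]), int(row["songId"]))
        ts = datetime.strptime(row["timeofaction"], "%Y-%m-%d %H:%M:%S")
        if key not in latest or ts > latest[key][0]:
            latest[key] = (ts, float(row["ratingValue"]))
    ratings = {}
    for (user, song), (_, value) in latest.items():
        ratings.setdefault(user, {})[song] = value
    return ratings

print(load_ratings(io.StringIO(SAMPLE)))  # {2330295: {3313069: 0.6}}
```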
User Profiling
Content data
Age: 18
Gender: F
Occupation: 45674
User’s Ratings
Item1:3
Item2:5
User profiling enables similarity metrics to be weighted using both the user's content data and ratings.
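One possible reading of this weighting is a blend of rating similarity and profile similarity. The sketch assumes a hypothetical weight `alpha` and treats age, gender, and occupation as exact-match attributes; the first user mirrors the profile above, and the second user is hypothetical.

```python
import math

def rating_sim(a, b):
    """Cosine similarity over the items both users rated."""
    common = a.keys() & b.keys()
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    na = math.sqrt(sum(a[i] ** 2 for i in common))
    nb = math.sqrt(sum(b[i] ** 2 for i in common))
    return dot / (na * nb)

PROFILE_KEYS = ("age", "gender", "occupation")

def profile_sim(p, q):
    """Fraction of profile attributes on which two users agree."""
    return sum(p[k] == q[k] for k in PROFILE_KEYS) / len(PROFILE_KEYS)

def combined_sim(u, v, alpha=0.8):
    """Hypothetical blend: alpha weights ratings, (1 - alpha) weights the profile."""
    return (alpha * rating_sim(u["ratings"], v["ratings"])
            + (1 - alpha) * profile_sim(u["profile"], v["profile"]))

user_a = {"profile": {"age": 18, "gender": "F", "occupation": 45674},
          "ratings": {"Item1": 3, "Item2": 5}}
user_b = {"profile": {"age": 18, "gender": "F", "occupation": 12},  # hypothetical
          "ratings": {"Item1": 4, "Item2": 4}}
print(round(combined_sim(user_a, user_b), 3))  # → 0.909
```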
Context Awareness
Context: location, weather, mood, time of day, season
Context information is taken into account when generating recommendations
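A minimal sketch of contextual pre-filtering, one common way to use context: interactions are filtered down to the target context before any recommender runs. The three time-of-day buckets and the last two log entries are hypothetical; the first entry reuses the TTNet sample timestamp.

```python
from datetime import datetime

def time_of_day(ts):
    """Map a timestamp to a coarse bucket (hypothetical 3-way split)."""
    if 6 <= ts.hour < 12:
        return "morning"
    if 12 <= ts.hour < 18:
        return "afternoon"
    return "evening"  # 18:00 through 05:59

def prefilter(logs, context):
    """Keep only (user, item, rating) triples logged in the target context."""
    return [(user, item, rating) for user, item, rating, ts in logs
            if time_of_day(ts) == context]

logs = [
    ("u1", "s1", 0.9, datetime(2013, 3, 26, 15, 17, 49)),
    ("u1", "s2", 0.7, datetime(2013, 3, 26, 8, 5)),    # hypothetical
    ("u2", "s1", 0.4, datetime(2013, 3, 26, 22, 30)),  # hypothetical
]
print(prefilter(logs, "afternoon"))  # [('u1', 's1', 0.9)]
```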
Tools
Apache Mahout (http://mahout.apache.org/)
Open Source machine learning library for large
scale applications
– Classification (Complementary Naive Bayes classifier,
random-forest decision-tree-based classifier)
– Clustering (K-Means, Fuzzy K-Means)
– Collaborative filtering: user-based and item-based
recommendations
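Mahout's item-based recommender (for example `GenericItemBasedRecommender` in its Taste collaborative-filtering package) is written in Java; the Python sketch below only illustrates the underlying idea and does not reproduce Mahout's API. Item similarity is cosine over the item columns of the ratings matrix (missing ratings treated as 0), and a prediction is the similarity-weighted average of the user's own ratings. The data is made up.

```python
import math
from collections import defaultdict

ratings = {
    "u1": {"a": 5, "b": 4},
    "u2": {"a": 4, "b": 5, "c": 2},
    "u3": {"b": 2, "c": 5},
}

def item_vectors(ratings):
    """Transpose user -> item ratings into item -> {user: rating} columns."""
    vecs = defaultdict(dict)
    for user, items in ratings.items():
        for item, r in items.items():
            vecs[item][user] = r
    return vecs

def cosine(x, y):
    """Cosine over full item columns; users who skipped an item count as 0."""
    common = x.keys() & y.keys()
    if not common:
        return 0.0
    dot = sum(x[k] * y[k] for k in common)
    return dot / (math.sqrt(sum(v * v for v in x.values()))
                  * math.sqrt(sum(v * v for v in y.values())))

def predict(user, item):
    """Similarity-weighted average of the user's ratings on other items."""
    vecs = item_vectors(ratings)
    pairs = [(cosine(vecs[item], vecs[j]), r)
             for j, r in ratings[user].items() if j != item]
    total = sum(s for s, _ in pairs)
    return sum(s * r for s, r in pairs) / total if total else None

print(round(predict("u1", "c"), 2))  # → 4.3
```

Compared with the user-based variant, item similarities change slowly and can be precomputed, which is one reason item-based recommenders are popular at scale.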
Tools
hadoop.apache.org
Open source framework for distributed storage (the Hadoop Distributed File System, HDFS) and processing.
– Large-scale DBMSs can run on top of the Hadoop file system.
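Processing on Hadoop follows the MapReduce model. The sketch below simulates the map, shuffle, and reduce steps in plain Python on one machine to count log entries per song in the TTNet format; it does not use Hadoop's API, and the last two log lines are hypothetical.

```python
from collections import defaultdict

def mapper(line):
    """Map: emit (songId, 1) for one TTNet-style log line."""
    fields = line.strip().split(",")
    yield fields[1], 1

def reducer(key, values):
    """Reduce: sum the counts collected for one songId."""
    yield key, sum(values)

def run_job(lines):
    """Simulate map -> shuffle (group by key) -> reduce."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)
    result = {}
    for key in sorted(groups):
        for k, total in reducer(key, groups[key]):
            result[k] = total
    return result

LOG = [
    "2330295,3313069,286068,546697,2013-03-26 15:17:49,0.9,SI",
    "1000001,3313069,286068,546697,2013-03-26 16:02:10,0.5,SI",  # hypothetical
    "1000002,4000000,300000,600000,2013-03-26 17:45:00,0.7,SI",  # hypothetical
]
print(run_job(LOG))  # {'3313069': 2, '4000000': 1}
```

On a real cluster the framework performs the shuffle and runs many mappers and reducers in parallel over files stored in HDFS.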
Tools
http://www.neo4j.org/
Open source graph database.
– Storage for highly connected data
– Fast query response for large-scale databases
– Most graph traversal algorithms are implemented.
– Instead of scanning the whole database, queries visit only the
connected parts.
– A collaborative filtering data model can be built with Neo4j.
– Used for ARGEDOR's content-based music recommendation
projects
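The "visit only connected parts" idea can be illustrated without a database: with adjacency lists, a "listeners of this song also listened to" query is a two-hop traversal that never touches unrelated nodes. The graph is hypothetical, and this is plain Python rather than Neo4j's API (in Neo4j the same pattern would be expressed as a Cypher `MATCH` query).

```python
from collections import Counter

# Hypothetical user -> songs edges; u4/s4 are never reached from s1.
listened = {"u1": {"s1", "s2"}, "u2": {"s1", "s2", "s3"},
            "u3": {"s2", "s3"}, "u4": {"s4"}}

# Reverse adjacency: song -> users who listened to it.
listeners = {}
for user, songs in listened.items():
    for s in songs:
        listeners.setdefault(s, set()).add(user)

def also_listened(song):
    """Two hops, song -> users -> songs, counting co-listened songs."""
    counts = Counter()
    for user in listeners.get(song, ()):
        for other in listened[user]:
            if other != song:
                counts[other] += 1
    return counts.most_common()

print(also_listened("s1"))  # [('s2', 2), ('s3', 1)]
```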
Example: Movie Graph DB Relations
Both content data and user actions are stored in the graph DB.
https://github.com/neo4j-examples/cineasts
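A small sketch of a graph that holds both content data and user actions as typed edges, assuming hypothetical relationship names `RATED` and `IN_GENRE` (not necessarily those used in the cineasts example). The query walks user → rated movie → genre → other movies in that genre, so content and behavior are combined in one traversal.

```python
from collections import defaultdict

# Typed edges (source, relationship, target); names and data are hypothetical.
EDGES = [
    ("alice", "RATED", "Toy Story"),
    ("Toy Story", "IN_GENRE", "Animation"),
    ("Aladdin", "IN_GENRE", "Animation"),
    ("Heat", "IN_GENRE", "Action"),
]

out_edges = defaultdict(list)
in_edges = defaultdict(list)
for src, rel, dst in EDGES:
    out_edges[src].append((rel, dst))
    in_edges[dst].append((rel, src))

def same_genre_movies(user):
    """user -RATED-> movie -IN_GENRE-> genre <-IN_GENRE- unseen movie."""
    seen = {m for rel, m in out_edges[user] if rel == "RATED"}
    result = set()
    for movie in seen:
        for rel, genre in out_edges[movie]:
            if rel != "IN_GENRE":
                continue
            for rel2, other in in_edges[genre]:
                if rel2 == "IN_GENRE" and other not in seen:
                    result.add(other)
    return sorted(result)

print(same_genre_movies("alice"))  # ['Aladdin']
```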