Waikato CS 2007 Progress Presentation - Graph-RAT

Music Recommendation
By Daniel McEnnis
Outline
• Sociology of Music Recommendation
• Infrastructure
– Relational Analysis Toolkit
• Description
• Evaluation
– GATE and Review Mining
Why do we like what we like?
• Personal Identity and Music
– Music and Lifestyle Correlations
• Social Associations
• Peer Groups
• Content of the Music and Lyrics
– Culture specific understanding of music
– Social meanings of musical forms
– Ability to understand the lyrics
Social Networks and Music Recommendation
• What is social information?
– Age and personal collections/preferences
– Friends’ musical tastes
– Opinions of local associations or groups
– Local (geographical) opinions about music
– Cultural background of the person
Where is the Data?
• Play-lists, personal music collections, and recorded listening habits
• Social network sites such as Facebook and LiveJournal
• Web sites such as blogs and lists of favorite web pages
• Relationships between these artifacts
What Infrastructure is Needed?
• Toolkit for synthesizing social data
• Text mining tools for analyzing web pages, music reviews, and blogs
• Play-list analyzers
• Content-based music analysis toolkits
Social Toolkit Requirements
• Intuitive Java-Based Graph Toolkit
• Arbitrary multi-valued properties on
nodes
• Social network analysis algorithms
• Efficient back-end processing
• Scripting support for experiments
Relational Analysis Toolkit (RAT)
• Low Level
– Graph
– Actor (Node)
– Link (Arc, Edge)
• High Level
– Collection of algorithms
– Scripting support
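The low-level layer above can be pictured with a minimal sketch. RAT itself is Java-based; Python is used here only for brevity, and every name in the block (`Actor`, `Link`, `Graph`, `add_property`, `out_links`) is an illustrative assumption rather than RAT's actual API.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Actor:
    """A node with a type, an ID, and arbitrary multi-valued properties."""
    kind: str
    ident: str
    properties: dict = field(default_factory=lambda: defaultdict(list))

    def add_property(self, name, value):
        # A property may hold several values, e.g. more than one genre.
        self.properties[name].append(value)


@dataclass
class Link:
    """A directed, typed, weighted arc between two actors."""
    relation: str
    source: Actor
    dest: Actor
    strength: float = 1.0


class Graph:
    """Container for actors and the links between them (hypothetical API)."""

    def __init__(self):
        self.actors = {}  # (kind, ident) -> Actor
        self.links = []   # every Link in the graph

    def add_actor(self, kind, ident):
        # Return the existing actor if this (kind, ident) pair is known.
        return self.actors.setdefault((kind, ident), Actor(kind, ident))

    def add_link(self, relation, source, dest, strength=1.0):
        link = Link(relation, source, dest, strength)
        self.links.append(link)
        return link

    def out_links(self, actor, relation):
        # All links of one relation type that leave the given actor.
        return [l for l in self.links
                if l.source is actor and l.relation == relation]


g = Graph()
alice = g.add_actor("User", "alice")
bob = g.add_actor("User", "bob")
beatles = g.add_actor("Artist", "Beatles")
alice.add_property("genre", "rock")
alice.add_property("genre", "folk")
g.add_link("Knows", alice, bob)
g.add_link("ListensTo", alice, beatles)
```

Typing both actors and links lets a single graph hold users, artists, and the different relations between them, which is what the higher-level algorithms would operate on.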
Exponential Similarity
[Figure: exponential similarity; surviving labels: -1.3k, k, +2, 0]
Degree Centrality
Dijkstra Shortest Paths
• Dijkstra’s shortest path algorithm over
this graph. Closeness measures are
stored in a Path object cached at the
graph object.
• Optimized version used inside
Closeness and Betweenness for
performance reasons.
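As a rough illustration of the idea above, the sketch below runs Dijkstra's algorithm from one actor and derives a closeness score from the resulting distances. It is an assumption-laden stand-in, not RAT's code: the adjacency-dict representation, the function names `dijkstra` and `closeness`, and the particular closeness formula are all chosen for the example, and nothing is cached on a graph object here.

```python
import heapq


def dijkstra(adjacency, source):
    """Return shortest-path costs from source to every reachable node.

    adjacency maps each node to a dict of {neighbour: non-negative cost}.
    """
    dist = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry; a shorter path was already found
        for neighbour, cost in adjacency.get(node, {}).items():
            candidate = d + cost
            if candidate < dist.get(neighbour, float("inf")):
                dist[neighbour] = candidate
                heapq.heappush(queue, (candidate, neighbour))
    return dist


def closeness(adjacency, source):
    """Reachable-node count divided by total distance; 0.0 if nothing is reachable."""
    dist = dijkstra(adjacency, source)
    total = sum(d for node, d in dist.items() if node != source)
    return (len(dist) - 1) / total if total > 0 else 0.0


# Hypothetical "Knows" graph with link costs.
knows = {
    "A": {"B": 1.0, "C": 4.0},
    "B": {"C": 1.0},
    "C": {},
}
print(dijkstra(knows, "A"))
print(closeness(knows, "A"))
```

Closeness and betweenness both need shortest paths from every actor, so reusing or caching one traversal per source is a plausible reason for the optimized variant mentioned in the slide.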
Closeness Centrality
Betweenness Prestige
PageRank
Kleinberg’s HITS
• Generates a set of ‘hubs’ (central
actors) and ‘authorities’ (prestigious
actors).
• Intuitively good hubs (User) point
(Knows) to good authorities (User) and
vice versa.
• Implemented in naïve and optimized
versions.
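The mutual reinforcement between hubs and authorities can be sketched as a naive power iteration. This is a minimal example, not RAT's implementation: the function name `hits`, the edge-list input, and the fixed iteration count are assumptions made for illustration.

```python
import math


def hits(edges, iterations=50):
    """Naive HITS over a list of (source, dest) pairs; returns (hub, auth)."""
    nodes = sorted({n for edge in edges for n in edge})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority: sum of hub scores of the actors pointing at this actor.
        auth = {n: sum(hub[s] for s, d in edges if d == n) for n in nodes}
        # Hub: sum of authority scores of the actors this actor points to.
        hub = {n: sum(auth[d] for s, d in edges if s == n) for n in nodes}
        # Normalise both vectors to unit length so the scores stay bounded.
        a_norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        auth = {n: v / a_norm for n, v in auth.items()}
        hub = {n: v / h_norm for n, v in hub.items()}
    return hub, auth


# Hypothetical "Knows" links between users.
knows = [("A", "C"), ("B", "C"), ("A", "D")]
hub, auth = hits(knows)
print(max(hub, key=hub.get), max(auth, key=auth.get))
```

Each pass rescans the whole edge list for every node, which is the kind of cost an optimized version would avoid by indexing incoming and outgoing links once.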
Clique Definition
Evaluation
• How well can this method recreate a
person’s list of liked music?
• 4% average precision
• 16% average recall
• Standard deviation > 100 for both
– Sometimes it works really well, but often
it doesn’t
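For reference, the two measures above can be computed per user as follows. This is a generic sketch of precision and recall over sets, with a made-up recommendation list and liked list; it does not reproduce the evaluation data or the exact averaging used in the study.

```python
def precision_recall(recommended, liked):
    """Precision and recall of a recommendation list against liked items."""
    recommended, liked = set(recommended), set(liked)
    correct = len(recommended & liked)
    # Precision: share of recommended items the user actually likes.
    precision = correct / len(recommended) if recommended else 0.0
    # Recall: share of liked items that were recommended.
    recall = correct / len(liked) if liked else 0.0
    return precision, recall


# Hypothetical data for a single user.
recommended = ["Beatles", "Metallica", "Monkeys"]
liked = ["Beatles", "Beach Boys"]
precision, recall = precision_recall(recommended, liked)
print(precision, recall)
```

Averaging these per-user values over all users gives figures of the kind reported above; a large spread between users is consistent with a method that works well for some people and poorly for most.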
Weka in RAT
Artist-User
[Table: columns User, Music, Beatles, BeachBoys, Monkeys, Metallica, ListensTo; one block of rows each for Beatles, Beach Boys, Monkeys, and Metallica (users A, B, C); cells are T/F values]
Weka Evaluation
Same data as Ad Hoc algorithm
J48 Classifier
• 1% Precision
• 62% Recall
More coming…
Music Reviews - epinions.com
• Uses GATE part-of-speech analyzer
• Predicting positive/negative reviews
• Useful for tag extraction
• Negation problems
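The negation problem can be illustrated with a toy word-list classifier. This is not the GATE pipeline: the word lists, the `polarity` function, and the rule that a negator flips the next sentiment word are all assumptions made for the example.

```python
# Tiny hypothetical lexicons; a real system would use far larger resources.
POSITIVE = {"good", "great", "catchy"}
NEGATIVE = {"bad", "boring", "dull"}
NEGATORS = {"not", "never", "no"}


def polarity(text, handle_negation):
    """Sum of word polarities; optionally flip the word after a negator."""
    score = 0
    negate = False
    for token in text.lower().replace(".", " ").replace(",", " ").split():
        if token in NEGATORS:
            negate = handle_negation
            continue
        value = (token in POSITIVE) - (token in NEGATIVE)
        if value:
            score += -value if negate else value
            negate = False  # a negator affects only the next sentiment word
    return score


review = "This album is not good."
print(polarity(review, handle_negation=False))
print(polarity(review, handle_negation=True))
```

Without negation handling the review counts as positive because it contains "good"; with the simple flip rule it counts as negative. Real reviews negate in less local ways, which is why this remains a problem.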
Conclusions
• Social information is important for music
recommendation
• RAT has centrality algorithms, but
requires more clustering and learning
algorithms
• Music review mining ready for
integration into the RAT environment
Future Work
• Evaluate with more Weka algorithms
• Implement graph-based clustering algorithms
• Implement other distance measures
• Implement blog and web-page text mining
• Integrate existing content-based methods
• Evaluate results with a user study
Questions?