Predicting Popular Music using data mining and machine learning

MACHINE LEARNING TECHNIQUES FOR MUSIC PREDICTION
S. Grant Lowe
Advisor: Prof. Nick Webb
RESEARCH QUESTIONS
Can we predict the year in which a
song was released?
Can we predict the genre of a song?
Can we identify which attributes are
the strongest in answering these
questions?
BACKGROUND
Hit Song Science
Genre Classification
Year Prediction
APPROACH
Use WEKA
Use the Million Song Dataset
WEKA
Machine Learning Software
Contains Visualization tools and algorithms
for data analysis and modeling
DATA
Million Song Dataset: commercial tracks
from 1922-2011, collected by LabROSA
EARLY CHALLENGES
Data in the wrong format: HDF5 vs. CSV
Lots of missing data!
Almost half of the songs are missing year, a very
important attribute
Many attributes are being ignored because a
majority of the songs are missing data.
ArtistID -> Year?
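
To make the missing-year problem concrete, the following minimal sketch separates rows that carry a usable release year from rows that do not, after the data has been exported to CSV. The CSV layout, the sample values, and the helper name `split_by_year` are assumptions for illustration only; the sketch also assumes that a blank or zero value marks an unknown year.

```python
import csv
import io


def split_by_year(rows, year_field="year"):
    """Separate rows with a usable release year from rows without one."""
    with_year, without_year = [], []
    for row in rows:
        raw = (row.get(year_field) or "").strip()
        # Assumption: a blank, non-numeric, or 0 value means the year is unknown.
        try:
            year = int(float(raw))
        except (ValueError, OverflowError):
            year = 0
        (with_year if year > 0 else without_year).append(row)
    return with_year, without_year


# Tiny made-up sample standing in for a CSV export of the dataset.
SAMPLE_CSV = """track_id,artist_id,year,tempo
T1,A1,1984,120.5
T2,A1,0,98.0
T3,A2,,140.2
T4,A3,2003,101.7
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
labeled, unlabeled = split_by_year(rows)
missing_share = len(unlabeled) / len(rows)
print(f"{len(labeled)} labeled, {len(unlabeled)} missing ({missing_share:.0%})")
```

Only the labeled rows can be used to train or evaluate a year classifier, which is why the share of missing years matters so much here.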
ATTRIBUTES
• The MSD contains 53 descriptive attributes for each song, along with 90
timbre attributes. Attributes were removed if they were not good
indicators of release year or genre, or if they were too closely tied to
what was being classified.
ATTRIBUTE MOTIVATION
• Ranked Descriptive Attributes
  • Loudness (measured in decibels)
  • Duration (in seconds)
  • Tempo (estimated tempo in BPM)
  • Time Signature (estimated beats per bar)
  • Key
  • Mode (major or minor)
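
The slides do not say how the attributes were ranked. One common approach, which WEKA supports through its information-gain attribute evaluator, scores each attribute by how much it reduces uncertainty about the class. The sketch below is a plain-Python illustration of that idea for nominal attributes, not the author's procedure; the function names and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(values, labels):
    """Reduction in class entropy after splitting on a nominal attribute."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[value].append(label)
    total = len(labels)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_attributes(table, labels):
    """Return (attribute, gain) pairs, strongest attribute first."""
    scores = {name: information_gain(col, labels) for name, col in table.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Made-up toy data: four songs, two candidate attributes, decade as the class.
decades = ["1960s", "1960s", "2000s", "2000s"]
toy_table = {
    "time_signature": [4, 4, 4, 4],               # identical for every song
    "mode": ["major", "major", "minor", "minor"],  # separates the classes
}

for name, gain in rank_attributes(toy_table, decades):
    print(f"{name}: {gain:.2f} bits")
```

Numeric attributes such as loudness or tempo would need to be binned before being scored this way.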
• Timbre is the quality of a musical note or sound that distinguishes different types of
musical instruments or voices. It is a complex notion, also referred to as sound
color, texture, or tone quality, and is derived from the shape of a segment's spectro-temporal surface, independently of pitch and loudness.
EARLY RESULTS – DESCRIPTIVE ATTRIBUTES
• Release year discretized into 6 decades: 1960-1970, 1970-1980, etc.
• Baseline (chance selection): 16.67%
• First tests: 6-9% correctly classified
• More recent tests: 25-30%
• Why Random Forest and BayesNet?
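
The decade discretization and the chance baseline can be sketched as follows. This is a minimal illustration, assuming the six bins run from the 1960s through the 2010s and that a boundary year such as 1970 belongs to the later decade; neither detail is stated in the slides, and the function names are hypothetical.

```python
def decade_label(year, first_decade=1960, n_decades=6):
    """Map a release year to a decade label, or None if it falls outside the bins."""
    start = (year // 10) * 10
    last_decade = first_decade + 10 * (n_decades - 1)
    if start < first_decade or start > last_decade:
        return None
    return f"{start}s"


def chance_baseline(n_classes):
    """Expected accuracy when picking one of n equally likely classes at random."""
    return 1.0 / n_classes


years = [1965, 1979, 1980, 1999, 2004, 2011, 1955]
labels = [decade_label(y) for y in years]
print(labels)
print(f"Chance baseline for 6 classes: {chance_baseline(6):.2%}")
```

With six classes the baseline works out to 1/6, which matches the 16.67% figure quoted above.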
EARLY RESULTS
TIMBRE RESULTS
GENRE PREDICTION
• Genres:
  • Classic pop and rock
  • Classical
  • Dance and Electronica
  • Folk
  • Hip-Hop
  • Jazz
  • Metal
  • Pop
  • Rock and Indie
  • Soul and Reggae
GENRE PREDICTION RESULTS
CONCLUSIONS & FUTURE WORK
• Timbre attributes are better than descriptive attributes – Why?
• Taste Profile
• Lyrical/Emotional Content
• Tag Dataset
QUESTIONS?