
The Greek Audio Dataset
Dimos Makris, Katia Lida Kermanidis, and Ioannis Karydis
Dept. of Informatics, Ionian University
Music Information Retrieval
• Musical data
▫ acoustic, i.e. sound recordings
▫ symbolic, i.e. sheet music
▫ information associated with the musical content
 metadata, social tags
• Required for testing
▫ efficiency & effectiveness of the methods
▫ comparison of existing methods
 show improvement
Nature of music
• Highly artistic
▫ ornamentation
▫ personal expression during performance
▫ adaptation
• Local music
▫ numerous differences from pop mainstream
 different instruments & rhythms
• Methods’ application results
▫ not always intuitive
• MIR methods
▫ require all kinds of music
Existing datasets
• Intellectual property issues
The Greek Audio Dataset
• Freely-available collection of Greek musical data
▫ for the purposes of MIR
• Each song contains
▫ audio features
 immediate use in MIR tasks
▫ lyrics of the song
▫ manually annotated mood & genre labels
▫ YouTube link
 further feature extraction
Greek music
• Greek musical tradition
▫ diverse and celebrated
• Greek contemporary music
▫ Greek traditional music
▫ Byzantine music
• Greek traditional (folk) music
▫ combination of songs, tempos and rhythms from a wide range of Greek regions
▫ basis for the modern Greek traditional music scene
Dataset creation process
• Selection of the music tracks
▫ aiming to make the set balanced
• Sources from
▫ personal CD collections
• Audio feature extraction
▫ jAudio
• Lyrics
▫ various sources
• YouTube link selection criteria
▫ number of views, number of responses, best audio quality, audio similarity to the CD
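The link-selection criteria above can be sketched as a simple ranking. All field names and the scoring order here are hypothetical illustrations, not part of the dataset's documented process:

```python
# Hypothetical sketch: pick one YouTube link per track using the slide's
# criteria (audio similarity/quality first, then views, then responses).
def pick_link(candidates):
    # 'quality' stands in for audio similarity to the CD recording (0-1).
    return max(candidates, key=lambda c: (c["quality"], c["views"], c["responses"]))

links = [
    {"url": "a", "views": 10_000, "responses": 40, "quality": 0.7},
    {"url": "b", "views": 5_000, "responses": 90, "quality": 0.9},
]
best = pick_link(links)  # the higher-quality candidate wins
```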
Genre classification
• Greek musical culture oriented tags
▫ Rembetiko, Laiko, Entexno, Modern Laiko, Rock, Hip-Hop/R&B, Pop, Alternative
• Genre assignment
▫ listening tests per song

Class          # of tracks
Rempetiko      65
Laiko          186
Entexno        195
Modern laiko   175
Rock           195
Hip-hop        60
Pop            63
Alternative    61
Mood classification
• 5 annotators per song
• Thayer model - 16 mood taxonomies
▫ valence & arousal
▫ divides the 2-dimensional emotive plane into 4 quadrants, by positive/negative valence and high/low arousal
• Arousal -> linked to energy
▫ moods range from “angry” & “exciting” to “tired” & “serene”
• Valence -> linked to tension
▫ moods range from “sad” & “upset” to “happy” & “content”
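A minimal sketch of the quadrant assignment described above, assuming valence and arousal are encoded as signed values (an assumption; the dataset's actual annotation encoding may differ):

```python
def thayer_quadrant(valence, arousal):
    """Map a (valence, arousal) pair to one of the 4 quadrants of the
    Thayer plane, using positive/negative and high/low sign conventions."""
    v = "positive" if valence >= 0 else "negative"
    a = "high" if arousal >= 0 else "low"
    return f"{v} valence / {a} arousal"

thayer_quadrant(0.6, -0.2)  # -> 'positive valence / low arousal'
```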
GAD content
• 1000 songs
• For each song
▫ its lyrics
▫ a YouTube link
• A total of 277 unique artists
• The accumulated lyrics contain:
▫ 32,024 lines
▫ 143,003 words
▫ 1,397,244 characters
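The line/word/character totals above can be reproduced with a simple counting routine; the sample string here is invented for illustration:

```python
def lyrics_stats(text):
    """Count lines, whitespace-separated words, and characters,
    mirroring the accumulated statistics reported for the lyrics."""
    return len(text.splitlines()), len(text.split()), len(text)

sample = "stixos enas\nstixos dyo"  # made-up two-line lyric
lyrics_stats(sample)  # -> (2, 4, 22)
```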
GAD availability
• http://di.ionio.gr/hilab/gad
• Available in two formats
• HDF5
▫ efficient for handling the heterogeneous types of information
 audio features in variable array lengths
 names as strings
 easy to add new types of features
• CSV
▫ compatible with Weka, RapidMiner & similar data mining platforms
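A short sketch of consuming the two formats; the file names and column/field layout are assumptions for illustration, not the dataset's actual schema:

```python
import csv
import io

# CSV: one row per song, directly loadable into Weka/RapidMiner-style tools.
csv_text = "title,genre,mood\nSong A,Rock,positive/high\n"  # invented sample
rows = list(csv.DictReader(io.StringIO(csv_text)))
genre = rows[0]["genre"]  # -> 'Rock'

# HDF5 (via h5py, not stdlib): variable-length arrays & strings per song.
# with h5py.File("gad.h5") as f:                   # hypothetical file name
#     feats = f["songs/0001/audio_features"][:]    # hypothetical layout
```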
Acoustic features
• Based on timbre, rhythm & pitch.
• Includes derived features
▫ application of meta-features to primary features
• Timbral Texture Features
▫ used to differentiate mixtures of sounds based on their instrumental composition
▫ FFT, MFCC, Spectrum, Method of Moments (MoM)
• Rhythm Features:
▫ used to characterize regularity of rhythm, beat, tempo
▫ Beat, Freq, Beat Histogram
• Pitch Content Features:
▫ describe the distribution of pitches
▫ Linear Predictive Coding (LPC)
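To make the timbral category concrete, here is a dependency-free sketch of one representative timbral feature, the spectral centroid. It is an illustrative example only and not claimed to be among jAudio's exact outputs:

```python
import math

def spectral_centroid(frame, sample_rate):
    """Centre of mass of the magnitude spectrum of one audio frame,
    computed with a naive DFT to stay dependency-free."""
    n = len(frame)
    mags, freqs = [], []
    for k in range(n // 2):
        re = sum(frame[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = -sum(frame[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        mags.append(math.hypot(re, im))
        freqs.append(k * sample_rate / n)
    total = sum(mags)
    return sum(f * m for f, m in zip(freqs, mags)) / total if total else 0.0

# A pure 1 kHz sine sampled at 8 kHz falls exactly on DFT bin 8 of a
# 64-sample frame, so its centroid comes out at ~1000 Hz.
frame = [math.sin(2 * math.pi * 1000 * t / 8000) for t in range(64)]
c = spectral_centroid(frame, 8000)
```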
Future Direction of the Dataset
• Inclusion of user generated tags
▫ from tagging games or web-services
• Increase of labels for mood and genre
▫ more users
• Expansion of the number of songs
▫ include more & latest top-chart songs
• Genres’ refinement
▫ addition of detailed labels with descriptions
• Content balancing
▫ in terms of moods and/or genres
• Inclusion of scores
• Development of programming language wrappers
The Greek Audio Dataset
Thank you for your attention