D13. Folk analysis internal report, April 2013


Computational analysis
on folk music of Cyprus
Internal report
Andreas Neocleous
University of Groningen
University of Cyprus
April 2013
Folk music classification - predict the class label of an unseen folk tune
[1] Ruben Hillewaere, Bernard Manderick, Darrell Conklin, 2009
Objective of the study:
• Compare global feature models and event feature models for the task of folk song classification.
• Draw conclusions on the robustness of each feature model.
A global feature set summarizes a piece as a feature
vector, which can be viewed as a data point in a
feature space.
Folk music classification - predict the class label of an unseen folk tune
[1] Ruben Hillewaere, Bernard Manderick, Darrell Conklin, 2009
Global features:
• The Alicante set of 28 global features, of which 12 were selected.
• 92 features computed by the FANTASTIC program (Feature ANalysis Technology Accessing STatistics) [2], of which 37 were selected.
• The Jesser set of 40 pitch and duration statistics [3].
• The McKay set of 101 global features, developed for the classification of orchestrated MIDI files [4].
Folk music classification - predict the class label of an unseen folk tune
[1] Ruben Hillewaere, Bernard Manderick, Darrell Conklin, 2009
Event features:
Excerpt of the Scottish jig “With a hundred pipers”,
illustrating the difference between global features
and event features.
Folk music classification - predict the class label of an unseen folk tune
[1] Ruben Hillewaere, Bernard Manderick, Darrell Conklin, 2009
Event features:
Classification with n-gram models
Used in probability, communication theory, computational linguistics
1) The probability of a piece $e_1^l = [e_1, \dots, e_l]$ is obtained by computing the joint probability of the individual events in the piece:

$$p(e_1^l) = \prod_{i=1}^{l} p(e_i \mid e_1^{i-1})$$
2) For each class a separate model is built.
3) The predicted class of a piece is the class whose model generates
the piece with the highest probability.
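A minimal sketch of this classification scheme, assuming each piece is already encoded as a sequence of event symbols (e.g. melodic intervals): one smoothed bigram model is estimated per class, and an unseen piece is assigned to the class whose model gives it the highest log-probability. The add-one smoothing and the toy symbol encoding are simplifications, not the exact setup of the paper.

```python
import math
from collections import defaultdict

class BigramClassModel:
    """A per-class bigram model over event symbols with add-one smoothing."""

    def __init__(self, vocab):
        self.vocab = set(vocab)
        self.counts = defaultdict(lambda: defaultdict(int))
        self.context_totals = defaultdict(int)

    def train(self, pieces):
        for piece in pieces:
            for prev, cur in zip(["<s>"] + piece, piece):
                self.counts[prev][cur] += 1
                self.context_totals[prev] += 1

    def log_prob(self, piece):
        # log p(e_1..e_l) = sum_i log p(e_i | e_{i-1}), with add-one smoothing
        total = 0.0
        for prev, cur in zip(["<s>"] + piece, piece):
            num = self.counts[prev][cur] + 1
            den = self.context_totals[prev] + len(self.vocab)
            total += math.log(num / den)
        return total


def classify(piece, models):
    """Return the class whose model generates the piece with the highest probability."""
    return max(models, key=lambda label: models[label].log_prob(piece))


# Toy usage with pieces encoded as interval sequences (hypothetical data)
data = {"jig": [[2, 2, -1, 2], [2, -1, 2, 2]], "reel": [[4, -2, 4, -2], [4, 4, -2, -2]]}
vocab = {s for pieces in data.values() for p in pieces for s in p}
models = {}
for label, pieces in data.items():
    models[label] = BigramClassModel(vocab)
    models[label].train(pieces)
print(classify([2, 2, -1, 2], models))  # expected: "jig"
```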
Folk music classification - predict the class label of an unseen folk tune
[1] Ruben Hillewaere, Bernard Manderick, Darrell Conklin, 2009
Europa-6 collection
Folk music classification - predict the class label of an unseen folk tune
[1] Ruben Hillewaere, Bernard Manderick, Darrell Conklin, 2009
Classification accuracies of the global feature sets on the Europa-6 collection,
obtained by 10-fold cross validation.
With a pentagram model of a linked viewpoint of melodic
interval and duration, the obtained classification accuracy
is 72.7%
Folk music classification - predict the class label of an unseen folk tune
[5] Ruben Hillewaere, Bernard Manderick, Darrell Conklin, 2012
Objective of the study:
• Investigate the performance of three string methods.
• Compare the performance of the string methods with the global feature models and event feature models.
• Draw conclusions on the robustness of each feature model.
String methods rely on a sequential music representation
which views a piece as a string of symbols. A pairwise
similarity measure between the strings is computed and
used to classify unlabeled pieces.
Folk music classification - predict the class label of an unseen folk tune
[5] Ruben Hillewaere, Bernard Manderick, Darrell Conklin, 2012
Excerpt of the Scottish jig “With a hundred pipers”, illustrating the
difference between global features, event features and the string
representation.
Folk music classification - predict the class label of an unseen folk tune
[5] Ruben Hillewaere, Bernard Manderick, Darrell Conklin, 2012
String methods:
(1) Sequence alignment
• Estimation of the minimal cost of transforming one sequence into the other by means of edit operations such as substitution, insertion and deletion.
• Often referred to as “edit distance”, which is in fact the Levenshtein distance.
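As a concrete reference, a minimal sketch of the Levenshtein distance between two symbol sequences, counting unit-cost substitutions, insertions and deletions. The study works on musical event strings rather than the character strings used in this toy example.

```python
def levenshtein(a, b):
    """Minimal number of substitutions, insertions and deletions turning sequence a into b."""
    # prev[j] holds the edit distance between a[:i-1] and b[:j] for the previous row
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,              # deletion of x
                cur[j - 1] + 1,           # insertion of y
                prev[j - 1] + (x != y),   # substitution (free if the symbols match)
            ))
        prev = cur
    return prev[-1]

print(levenshtein("kitten", "sitting"))  # 3
```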
Folk music classification - predict the class label of an unseen folk tune
[5] Ruben Hillewaere, Bernard Manderick, Darrell Conklin, 2012
String methods:
(2) Compression-based distance

$$d(x, y) = \frac{\max\{K(x \mid y),\, K(y \mid x)\}}{\max\{K(x),\, K(y)\}}$$

K(x) is the Kolmogorov complexity of string x.
K(x|y) is the conditional complexity of string x given string y.
The distance expresses how much information is not shared between the two strings relative to the information that they could maximally share.
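Kolmogorov complexity is not computable, so in practice such distances are approximated with a real compressor. The sketch below uses the standard normalized compression distance (NCD) with zlib; this compressor and this NCD form are a common practical stand-in, not necessarily the exact setup of the paper.

```python
import zlib

def c(s: bytes) -> int:
    """Compressed length as a proxy for the Kolmogorov complexity K."""
    return len(zlib.compress(s, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: small when the strings share much information."""
    cx, cy, cxy = c(x), c(y), c(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

# Toy usage on two symbol strings (e.g. interval sequences encoded as text)
print(ncd(b"2 2 -1 2 2 -1 2 2", b"2 2 -1 2 2 -1 2 -1"))   # similar tunes: small distance
print(ncd(b"2 2 -1 2 2 -1 2 2", b"7 -3 5 -3 7 -3 5 0"))   # dissimilar tunes: larger distance
```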
Folk music classification - predict the class label of an unseen folk tune
[5] Ruben Hillewaere, Bernard Manderick, Darrell Conklin, 2012
String methods:
(3) String subsequence kernel (SSK)
Computes a similarity measure between strings based on
the number and form of their common subsequences.
Given any pair of strings, SSK finds all common subsequences of a specified length k, also allowing non-contiguous matches, although these are penalized with a decay factor λ ∈ (0, 1].
Folk music classification - predict the class label of an unseen folk tune
[5] Ruben Hillewaere, Bernard Manderick, Darrell Conklin, 2012
String methods:
(3) String subsequence kernel (SSK)
SSK(k = 2, ‘ismir’, ‘music’) = λ^5 + λ^6, since the only common length-2 subsequences are ‘si’ (spanning 3 symbols in ‘ismir’ and 2 in ‘music’, contributing λ^3 · λ^2) and ‘mi’ (spanning 2 and 4 symbols, contributing λ^2 · λ^4).
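A brute-force sketch of the kernel for short strings, which reproduces the example above: every common length-k subsequence contributes λ^(span in x) · λ^(span in y), where the span is the number of characters between the first and last matched position. This enumeration is exponential and only illustrates the definition; practical SSK implementations use dynamic programming.

```python
from itertools import combinations
from collections import defaultdict

def span_weights(s, k, lam):
    """Map each length-k subsequence of s to the sum of lam**span over all its occurrences."""
    weights = defaultdict(float)
    for idx in combinations(range(len(s)), k):
        u = "".join(s[i] for i in idx)
        weights[u] += lam ** (idx[-1] - idx[0] + 1)
    return weights

def ssk(x, y, k, lam):
    """Brute-force string subsequence kernel between strings x and y."""
    wx, wy = span_weights(x, k, lam), span_weights(y, k, lam)
    return sum(wx[u] * wy[u] for u in wx if u in wy)

lam = 0.5
print(ssk("ismir", "music", 2, lam))   # equals lam**5 + lam**6
print(lam ** 5 + lam ** 6)
```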
Folk music classification - predict the class label of an unseen folk tune
[5] Ruben Hillewaere, Bernard Manderick, Darrell Conklin, 2012
Dance-9 collection:
Folk music classification - predict the class label of an unseen folk tune
[5] Ruben Hillewaere, Bernard Manderick, Darrell Conklin, 2012
Results:
Folk songs
Repeating parts: stanzas
Segmentation

Popular music
+ Professional productions
- Complex structure
- Repetitions of different parts: intro, verse, bridge, chorus

Folk music
+ Similar stanzas
- Inaccurate singing of performers
- Variable tempo throughout the song
- Presence of noise
- Performers forget parts of the lyrics or melody
- Performers switch to speaking
Finding repeating stanzas in folk songs
[6] Bohak C. and Marolt M., 2012
Finding repeating stanzas in folk songs
[6] Bohak C. and Marolt M., 2012
Preprocessing
Detecting vocal pauses
According to signal energy
According to signal envelope
According to relative difference of pitch
Finding repeating stanzas in folk songs
[6] Bohak C. and Marolt M., 2012
Preprocessing
• The input audio signal is mixed from stereo to a single channel.
• The sample rate is reduced to 11025 Hz.
• The amplitude is normalized.
Finding repeating stanzas in folk songs
[6] Bohak C. and Marolt M., 2012
Preprocessing
Detecting vocal pauses
According to signal energy
• A vocal pause is a region where the energy is below an experimentally determined threshold.
• Energy is computed on 200 ms long frames and the threshold is set to ξ1 · E/120, where E is the average energy of the signal.
• Consecutive frames with energy values below the specified threshold are merged into one vocal pause.
• Vocal pauses shorter than ξ2 = 0.7 times the average detected vocal pause length are ignored.
The parameters ξ1 and ξ2 were determined experimentally.
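A minimal numpy sketch of this energy rule, assuming a mono signal array and its sample rate: it frames the signal into 200 ms blocks, thresholds the frame energy at ξ1 · E/120 with E the average energy, merges consecutive below-threshold frames, and drops pauses shorter than ξ2 = 0.7 times the average pause length. The mean-square energy definition and the default ξ1 = 1.0 are assumptions for illustration.

```python
import numpy as np

def detect_vocal_pauses(signal, sr, xi1=1.0, xi2=0.7, frame_sec=0.2):
    """Return (start_frame, end_frame) pairs of vocal pauses, in 200 ms frame units."""
    frame_len = int(frame_sec * sr)
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    energy = np.mean(frames ** 2, axis=1)

    threshold = xi1 * energy.mean() / 120.0      # xi1 * E / 120, E = average energy
    below = energy < threshold

    # merge consecutive below-threshold frames into candidate pauses
    pauses, start = [], None
    for i, flag in enumerate(below):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            pauses.append((start, i))
            start = None
    if start is not None:
        pauses.append((start, len(below)))

    # ignore pauses shorter than xi2 times the average detected pause length
    if pauses:
        avg_len = np.mean([e - s for s, e in pauses])
        pauses = [(s, e) for s, e in pauses if (e - s) >= xi2 * avg_len]
    return pauses
```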
Finding repeating stanzas in folk songs
[6] Bohak C. and Marolt M., 2012
Preprocessing
Detecting vocal pauses
According to signal envelope
• The amplitude envelope of the signal is obtained by filtering the full-wave rectified signal with a 4th-order Butterworth filter with a normalized cutoff frequency of 0.001.
• Vocal pauses are parts of the signal where the envelope falls below the threshold ξ3 = -60 dB.
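A short scipy sketch of the envelope rule: full-wave rectify the signal, low-pass it with a 4th-order Butterworth filter (cutoff 0.001 in scipy's Nyquist-normalized convention, which may differ from the paper's normalization), and mark regions where the envelope falls more than 60 dB below its maximum; the reference level for the -60 dB threshold is an assumption.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def envelope_pause_mask(signal, cutoff=0.001, threshold_db=-60.0):
    """Boolean mask of samples considered vocal pauses according to the amplitude envelope."""
    rectified = np.abs(signal)                        # full-wave rectification
    b, a = butter(4, cutoff)                          # 4th-order low-pass Butterworth
    envelope = filtfilt(b, a, rectified)
    threshold = np.max(envelope) * 10.0 ** (threshold_db / 20.0)  # xi3 = -60 dB re. max
    return envelope < threshold
```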
Finding repeating stanzas in folk songs
[6] Bohak C. and Marolt M., 2012
Preprocessing
Detecting vocal pauses
According to relative difference of pitch
• Detect the fundamental frequency (YIN algorithm [7]).
• Smooth the fundamental frequencies with a low-pass filter.
• Parts of the signal that differ by more than 20 semitones from the average signal frequency are selected as vocal pauses.
• Endings of vocal pauses are used as candidates for stanza beginnings.
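A sketch of the pitch-based rule, assuming a per-frame fundamental-frequency track (e.g. from a YIN implementation) is already available: smooth the track, convert it to semitone deviations from the average frequency, and flag frames deviating by more than 20 semitones. The moving-average smoothing stands in for the low-pass filter and is an assumption.

```python
import numpy as np

def pitch_based_pauses(f0_hz, max_dev_semitones=20.0, smooth_frames=5):
    """Boolean mask of frames treated as vocal pauses, given an f0 track in Hz."""
    f0 = np.asarray(f0_hz, dtype=float)
    f0[f0 <= 0] = np.nan                               # unvoiced frames
    mean_f0 = np.nanmean(f0)
    kernel = np.ones(smooth_frames) / smooth_frames    # simple moving-average smoothing
    smoothed = np.convolve(np.nan_to_num(f0, nan=mean_f0), kernel, mode="same")
    deviation = 12.0 * np.log2(smoothed / mean_f0)     # semitones from the average frequency
    return np.abs(deviation) > max_dev_semitones
```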
Finding repeating stanzas in folk songs
[6] Bohak C. and Marolt M., 2012
Finding candidates for stanza boundaries
• Calculate 12-dimensional chromagrams.
• Define a root-mean-square (RMS) distance between each pair of 12-dimensional chroma vectors:

$$c(a, b) = \sqrt{\frac{\sum_{i=1}^{12} (a_i - b_i)^2}{12}}$$

where c is the distance function between two chroma vectors a and b, and a_i and b_i are the i-th elements of the chroma vectors.
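A direct numpy transcription of this distance, following the RMS reading of the formula above.

```python
import numpy as np

def chroma_rms_distance(a, b):
    """RMS distance c(a, b) between two 12-dimensional chroma vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return np.sqrt(np.sum((a - b) ** 2) / 12.0)

# Toy usage: identical vectors have distance 0
print(chroma_rms_distance(np.ones(12), np.ones(12)))       # 0.0
print(chroma_rms_distance(np.eye(12)[0], np.eye(12)[7]))   # sqrt(2/12)
```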
Finding repeating stanzas in folk songs
[6] Bohak C. and Marolt M., 2012
Finding candidates for stanza boundaries
The defined distance function is used by the Dynamic Time Warping (DTW) algorithm to calculate the total distance between the selected stanzas:

$$c_p(p_1, p_2) = \sum_{l=1}^{L} c\big(p_1(l), p_2(l)\big)$$

where p1 and p2 are candidate stanza beginnings, p1(l) and p2(l) are the corresponding chroma vectors, and the index l runs from the first (1) to the last (L) chroma vector in the selected audio part.
Finding repeating stanzas in folk songs
[6] Bohak C. and Marolt M., 2012
Finding candidates for stanza boundaries
DTW is used to calculate the total distance between two stanza candidates:

$$c_{\min}(d_j) = \mathrm{DTW}(d_0, d_j) = \min c_p(d_0, d_j)$$

where cmin is the minimal cost between parts d0 and dj.
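A compact dynamic-programming sketch of DTW over two chroma sequences, using the RMS chroma distance above as the local cost. The step pattern and path constraints are the simplest textbook form and may differ from the exact variant used in the paper.

```python
import numpy as np

def dtw_distance(x, y):
    """Minimal accumulated RMS-chroma cost of aligning sequences x and y (frames x 12)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n, m = len(x), len(y)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.sqrt(np.sum((x[i - 1] - y[j - 1]) ** 2) / 12.0)  # local RMS distance
            acc[i, j] = cost + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return acc[n, m]
```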
Finding repeating stanzas in folk songs
[6] Bohak C. and Marolt M., 2012
Finding candidates for stanza boundaries
The chroma vectors are circularly shifted up to two
semitones up and down to compensate for the outof-tune
singing. We then select the lowest DTW distance as:
distmin  d 0   0,
distmin  d j  
min
f
d j , f [ 2,2]


cmin  d 0 , d jf 
where
d jf represents a rotation of chroma vectors for the
selected stanza candidate from two semitones downwards
to two semitones upwards in steps of one semitone.
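A small sketch of this transposition search: the candidate's chromagram is circularly rotated by -2..+2 semitones (one chroma bin per semitone) and the lowest alignment cost is kept. The distance function is passed in as a parameter, so the dtw_distance sketch above can be plugged in.

```python
import numpy as np

def min_transposed_distance(d0, dj, distance_fn, max_shift=2):
    """Lowest distance between chromagrams d0 and dj over circular semitone shifts of dj."""
    d0, dj = np.asarray(d0, dtype=float), np.asarray(dj, dtype=float)
    best = np.inf
    for f in range(-max_shift, max_shift + 1):
        shifted = np.roll(dj, f, axis=1)   # rotate the 12 chroma bins by f semitones
        best = min(best, distance_fn(d0, shifted))
    return best
```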
Finding repeating stanzas in folk songs
[6] Bohak C. and Marolt M., 2012
Finding candidates for stanza boundaries
Define a fitness function for scoring the candidate stanza
beginnings ki as:
Finding repeating stanzas in folk songs
[6] Bohak C. and Marolt M., 2012
Finding candidates for stanza boundaries
In the defined fitness function, peaks represent the most
likely stanza beginnings, so all peaks above a global
threshold, corresponding to the average value of the
fitness function, are picked as the actual boundaries
between stanzas.
Finding repeating stanzas in folk songs
[6] Bohak C. and Marolt M., 2012
Music of Cyprus
Fones
• Karpasitissa
• Avgoritissa
• Paphididji
• Lyshiotissa
• Mariniotou
• Tyllirkotissa
• Ishia
• Komitissa
• Akathkiotissa
• Nekalisti
• Pegiotoua

Dances
• Zeimpekikos
• Kartsilamas
• Kalamatianos
• Syrtos
• Arapies

Religious
• A weak category with no sub-categories
Music of Cyprus
Preprocessing
• Fundamental frequency detection (YIN).
• Eliminate noise with an aperiodicity threshold.
• Eliminate silence with a loudness threshold.
• Octave/fifth errors: a common problem of frequency detection algorithms is wrong octave detection, also referred to as octave errors, in which the fundamental frequency is confused with its multiples or other harmonics. To correct these errors, a moving window was applied to detect and correct unexpected melodic jumps in the estimated pitch trajectory.
• Smoothing
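The report does not spell out the exact correction rule, so the sketch below is one plausible realization of the moving-window idea: each pitch value is compared with the median of a local window, and if it lies roughly an octave away it is shifted by whole octaves towards the local median. Window length and tolerance are assumptions.

```python
import numpy as np

def correct_octave_errors(pitch_hz, window=11, tolerance_semitones=3.0):
    """Shift isolated pitch values by whole octaves towards the local median trajectory."""
    pitch = np.asarray(pitch_hz, dtype=float).copy()
    half = window // 2
    for i in range(len(pitch)):
        if pitch[i] <= 0:
            continue                                     # skip unvoiced frames
        local = pitch[max(0, i - half): i + half + 1]
        local = local[local > 0]
        if len(local) == 0:
            continue
        ref = np.median(local)
        # move by octaves (factors of 2) while that brings the value closer to the median
        for _ in range(3):
            jump = 12.0 * np.log2(pitch[i] / ref)
            if jump > 12.0 - tolerance_semitones:
                pitch[i] /= 2.0
            elif jump < -(12.0 - tolerance_semitones):
                pitch[i] *= 2.0
            else:
                break
    return pitch
```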
Music of Cyprus
Preprocessing
Pitch track before pre-processing
Music of Cyprus
Preprocessing
Pitch track after pre-processing
Music of Cyprus
Segmentation
Detection of vocal pauses
Music of Cyprus
Segmentation
Detection of all peaks
Music of Cyprus
Segmentation
Detection of notes based on the difference of the peaks
Music of Cyprus
Repetition
[1] Hillewaere R., Manderick B., Conklin D. Global feature versus event models for folk song classification. 10th International Society for Music Information Retrieval Conference (ISMIR), 2009.
[2] Müllensiefen D. FANTASTIC: Feature ANalysis Technology Accessing STatistics (In a Corpus): Technical Report v0.9, 2009.
[3] Jesser B. Interaktive Melodieanalyse. Peter Lang, Bern, 1991.
[4] McKay C. and Fujinaga I. Automatic genre classification using large high-level musical feature sets. Proceedings of the International Conference on Music Information Retrieval, pp. 525–530, 2004.
[5] Hillewaere R., Manderick B., Conklin D. String methods for folk tune genre classification. 13th International Society for Music Information Retrieval Conference (ISMIR), 2012.
[6] Bohak C. and Marolt M. Finding repeating stanzas in folk songs. 13th International Society for Music Information Retrieval Conference (ISMIR), 2012.
[7] de Cheveigné A. and Kawahara H. YIN, a fundamental frequency estimator for speech and music. The Journal of the Acoustical Society of America, 111(4):1917–1930, 2002.