D6_Electronic_flyer


Intelligent system for the identification of similarities and differences between
the Greek-Cypriot, Greek, Turkish-Cypriot, and Turkish folk music
Activities:
• A web platform was created to inform the community about the project's objectives, news, latest developments and results. http://www.cs.ucy.ac.cy/folk/
• Six meetings were organized by the Project Coordinator and the members of the project consortium for the organization and monitoring of the project.
• Collaboration between Prof. Nicolai Petkov (University of Groningen) and Prof. Andreas Spanias (Arizona State University) was confirmed.
• Frequent meetings were held with the Cypriot music expert and member of our project team, Michalis Terlikkas, who provided a wealth of material collected during his research in previous years.
• Meetings were held with the Cypriot musician Giannis Zavros, at which he performed traditional tunes on the lute, violin and pithkiavli. The folk musician Andreas Gristakkos was recorded, and 20 Greek-Cypriot folk songs were isolated.
• A meeting with Alev Muezzinoglu, a musicologist and musician from the Turkish-Cypriot community, was organized.
• The state of the art in signal processing and computational intelligence techniques for music analysis and classification was surveyed in an internal report and presented to the members of the group.
• We thoroughly studied the development of a system that identifies repeated patterns in the music of Cyprus (COSFIRE).
• We studied and implemented mid-level features, specific to "fones", that describe the melodies based on several mathematical functions.
• A system was developed that identifies the positions in a song where only instruments are present and the positions where a singing voice is present.
• The findings of our studies were presented to the scientific community at several conferences and workshops. The published papers and presentations are listed on the project's website.
Publications:
[1] Neocleous A., Petkov N., Schizas C., "Review of signal processing methods and techniques for feature extraction in music", 5th Cyprus Workshop on Signal Processing and Informatics, Nicosia, Cyprus, 2012.
http://dsp-conferences.info/cyprus%20workshop-2012-index.htm
[2] Neocleous A., Panteli M., Petkov N., Schizas C., "Identification of Similarities between the Turkish Makam Scales and the Cypriot Folk Music", HELINA's (Hellenic Institute of Acoustics) 5th National Conference, Corfu, Greece, 2012.
http://conferences.ionio.gr/acoustics2012/en/
[3] Neocleous A., Panteli M., Petkov N., Schizas C., "Timbre and tonal similarities between the Turkish, Western and Cypriot monophonic songs using machine learning techniques", 3rd International Workshop on Folk Music Analysis, Amsterdam, Netherlands, 2013.
http://www.elab-oralculture.nl/fma2013/
[4] Neocleous A., Petkov N., Schizas C., "Finding repeating stanzas in monophonic folk songs of Cyprus", 6th Cyprus Workshop on Signal Processing and Informatics, 2013.
https://cwspi2012.cs.ucy.ac.cy/
[5] Neocleous A., Panteli M., Ioannou R., Petkov N., Schizas C., "A machine learning approach for clustering western and non-western folk music using low-level and mid-level features", 6th International Workshop on Machine Learning and Music, Prague, Czech Republic, 2013.
https://sites.google.com/site/musicmachinelearning13/
[6] Neocleous A., Petkov N., Schizas C., "Automated Classification in Vocal/Instrumental Parts of Folk Songs", 7th Cyprus Workshop on Signal Processing and Informatics, 2014.
https://cwspi.cs.ucy.ac.cy
[7] Neocleous A., Petkov N., Schizas C., "Automated Segmentation of Folk Songs Using Artificial Neural Networks", 6th International Conference on Neural Computation Theory and Applications, Rome, 2014.
Deliverables:
D1. First Six-Month Progress Report
D2. Second Six-Month Progress Report
D3. Interim Progress Report
D4. Third Six-Month Progress Report
D5. Fourth Six-Month Progress Report
D6. The present electronic leaflet
D14. Computational intelligent system for folk music identification
D15. Application of the new computational intelligent system for folk music identification
• A review of state-of-the-art signal processing techniques for music analysis, supporting the development of an intelligent system for the identification of similarities and differences between Greek-Cypriot, Greek, Turkish-Cypriot and Turkish folk music, was presented at the 5th Cyprus Workshop on Signal Processing and Informatics (2012), Nicosia, Cyprus.
http://dsp-conferences.info/cyprus%20workshop-2012-index.htm
• A study comparing the pitch histograms of 20 Greek-Cypriot folk songs with the pitch histograms of 53 Western and 53 Turkish songs was carried out, and a paper describing the methods and results of this work was accepted at the 6th national conference ACOUSTICS 2012 (http://conferences.ionio.gr/acoustics2012/en/), held in Corfu (8-10 October 2012).
• We studied timbre similarities and dissimilarities between the music of Cyprus and Western music, and between the music of Cyprus and Turkish music, using low-level timbre features. An extended abstract of this work was submitted to the International Workshop on Folk Music Analysis, held in Amsterdam on June 6-7, 2013, and accepted for poster presentation.
http://www.elab-oralculture.nl/fma2013/
• An algorithm that identifies repeating patterns in monophonic vocal songs was developed and presented at the 6th Cyprus Workshop on Signal Processing and Informatics (2012), Nicosia, Cyprus.
https://cwspi2012.cs.ucy.ac.cy/
This research was funded by the research grant ΑΝΘΡΩΠΙΣΤΙΚΕΣ/ΑΝΘΡΩ/0311(ΒΕ)/19 from the Republic of Cyprus through the Cyprus Research Promotion Foundation. The research was also supported by the University of Cyprus.
Identification of Similarities between the Turkish Makam Scales
and the Cypriot Folk Music
Introduction:
In Western music, the octave is divided into 12 equal intervals on a logarithmic frequency scale; the boundaries of these intervals are called semitones. In non-Western scales, and more specifically in makam scales, the octave is divided into 53 equal intervals. The aim of this work is mainly to explore whether there is similarity between Cypriot folk songs and Turkish folk songs in terms of tonality. We used pitch histograms to identify the most common frequencies played in each song we tested. Pitch histograms were created for 20 Cypriot folk songs, 53 Turkish songs and 53 Western songs. A correlation-coefficient distance technique was used to measure the distance between the Cypriot pitch histograms and the Turkish and Western histograms. On the database used for our investigations, 80% of the comparisons between Cypriot and Turkish histograms showed more than 50% similarity.
Pitch histograms alignment to the highest similarity bin:
A system that identifies tonal similarities and dissimilarities between two songs should consider two histograms "very similar" if they are extracted from the same song or melody played in different octaves. To achieve this, the second pitch histogram must be shifted to the bin at which it has the highest similarity with the first pitch histogram. The histograms must be in the logarithmic space so that the frequency distances between tones remain equal while the histogram is shifted.
The distance between two histograms is calculated systematically for all possible alignments by shifting the histogram to be compared from the first bin to the last bin. Figure 2 shows a diagram of the procedure followed for the similarity measure between a Cypriot and a Western/Turkish histogram. After the similarity measure has been computed for every shift up to the last bin, the Western/Turkish histogram is shifted to the bin of highest correlation, and the value of the similarity measure at this bin is taken as the degree of similarity.
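As an illustration, this shift-and-correlate procedure can be sketched in a few lines of Python; the sketch assumes two equal-length log-frequency histograms and, for simplicity, a circular shift:

```python
import numpy as np

def best_shift_similarity(h_ref, h_test):
    # Shift h_test over all bins (circularly, for simplicity) and keep
    # the highest Pearson correlation with h_ref; this maximum is taken
    # as the degree of similarity between the two histograms.
    best = -1.0
    for shift in range(len(h_test)):
        r = np.corrcoef(h_ref, np.roll(h_test, shift))[0, 1]
        if r > best:
            best = r
    return best
```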
Data:
The dataset consisted of a collection of 120 monophonic songs: 20 Cypriot folk songs, 53 Turkish makam songs and 53 Western monophonic songs and solo improvisations. 17 of the Cypriot folk songs were digitally recorded with professional audio equipment, performed by a Cypriot folk musician on his self-made flutes, called "pithkiavli". The 53 makam songs covered six makams: 9 Hicaz, 9 Huseiny, 7 Huzzam, 8 Nihavend, 12 Saba and 8 Ussak. The 53 Western songs consisted of songs and solo improvisations for flute, bassoon, clarinet, oboe and saxophone. The flute pieces were 4 movements from the Partita for solo flute by Bach, 7 pieces for solo flute by Leigh Landy, 12 fantasias for solo flute by Telemann, "Syrinx" by Debussy, "Soliloquy for Solo Flute Op. 44" by Lowell Liebermann, "Image for solo flute" by Bozza, "Danse de la chevre" by Arthur Honegger, "Tango Etude" by Piazzolla, "Daphnis et Chloe" by Ravel and nine solo flute improvisations. Our library also included 3 monophonic bassoon solos, 3 pieces for solo clarinet by Stravinsky, two monophonic oboe solos and seven monophonic saxophone solos. All the Turkish music and all the Western songs were extracted from original audio CDs, while the Western monophonic solos were downloaded from YouTube.
Figure 2: Pitch histograms alignment to the highest similarity bin.
Pitch histogram computation:
One of the major features for tonal and scale similarity between monophonic songs is the pitch histogram of each song [1]. The peaks of a pitch histogram show the most frequent notes played in a song. In the analysis of Western music, a pitch histogram consists of a vector of 12 bins, representing the 12 notes of Western music theory. Each fundamental frequency estimate is rounded to the closest frequency that belongs to a Western note and is stored in the respective bin of the histogram. This is the pitch-class histogram of a song.
The above approach can be applied only to Western music, where there is a strong correlation between consecutive tones. In non-Western music, and particularly in Turkish music, according to Arel theory [2] the octave is divided into 53 logarithmically equal intervals; thus the frequencies of several tones differ from the note frequencies of Western music theory. To overcome the limitation of a 12-bin histogram, Gedik and Bozkurt [3] propose a histogram of higher dimension, in order to capture all possible frequencies of the tones that belong to Turkish makam music.
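To make the two representations concrete, the following sketch folds fundamental-frequency estimates into a pitch-class histogram; with bins_per_octave=12 it gives the Western pitch-class histogram described above, while a larger value (e.g. 53) accommodates makam tones. The 440 Hz reference pitch is an illustrative assumption:

```python
import numpy as np

def pitch_class_histogram(f0_hz, bins_per_octave=12, fref=440.0):
    """Fold fundamental-frequency estimates into a normalised
    pitch-class histogram with bins_per_octave bins per octave."""
    f0_hz = np.asarray(f0_hz, dtype=float)
    f0_hz = f0_hz[f0_hz > 0]                       # drop unvoiced frames
    # distance from the reference pitch in bins, folded into one octave
    bins = np.round(bins_per_octave * np.log2(f0_hz / fref))
    pcs = np.mod(bins, bins_per_octave).astype(int)
    hist = np.bincount(pcs, minlength=bins_per_octave)
    return hist / hist.sum()                       # normalised histogram
```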
Results:
The following tables show the distribution of the distance measure values for the 1060 comparisons between the Cypriot and Western songs and the 1060 comparisons between the Cypriot and Turkish songs, respectively. The distribution of the dissimilarity is printed in the left half of the tables, while the distribution of the similarity is shown in the right half. The output range of the correlation coefficients was 0-1. This range was divided into 10 steps, and a distribution of the distance measure outputs of each comparison was created. The range of each step is shown in the first column of the tables, the second column shows the distribution, and the third column shows the percentage of the population of each cell relative to the overall number of comparisons.
Logarithm pitch histogram computation
It is well known that the frequency distance between semitones follows an exponential relation. The exponential function in Figure 1 shows the derivative of the frequencies of 100 consecutive Western notes; the linear function shows the derivative of the logarithm of the frequencies of the same 100 notes.
In this work, a higher-dimensional vector was used to store the distribution of the pitch estimates of a song. The frequency range 20 Hz - 20 kHz is mapped to a vector of 6000 bins. A lower and an upper limit of 100 Hz and 2000 Hz, respectively, were set as thresholds for the fundamental frequency estimator, assuming that the musical instruments and the songs we chose to analyze would not exceed a fundamental frequency of 2000 Hz. After limiting the lowest and highest possible fundamental frequency estimates, the length of the pitch histogram vector was reduced to 2602 bins.
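The arithmetic behind these numbers can be verified with a small sketch: 6000 logarithmically spaced bins over 20 Hz - 20 kHz give about 602 bins per octave, so the 100 - 2000 Hz range spans roughly 2602 bins:

```python
import numpy as np

# 6000 logarithmically spaced bins over 20 Hz - 20 kHz; restricting the
# f0 range to 100-2000 Hz leaves ~2602 bins in use.
N_BINS, F_LO, F_HI = 6000, 20.0, 20000.0

def log_bin(freq_hz):
    """Map a frequency to its logarithmic bin index in [0, N_BINS)."""
    frac = (np.log2(freq_hz) - np.log2(F_LO)) / (np.log2(F_HI) - np.log2(F_LO))
    return int(frac * N_BINS)

n_used = log_bin(2000.0) - log_bin(100.0)   # ~2602 bins remain in use
```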
Figure 3: The distribution of the output of the distance measure between the Cypriot songs and the Western songs, shown in (light) blue, and the distribution of the output of the distance measure between the Cypriot songs and the Turkish songs, shown in (dark) purple.
Conclusions / Future work
In this work, the pitch histograms of 20 Cypriot folk songs were compared to those of 53 Western and 53 Turkish songs. The correlation-coefficient distance measure was applied between the Cypriot histograms and the Western/Turkish histograms. From the results, we observed that the hypothesis that Cypriot folk music shares similarities with Turkish music was partially confirmed. We only explored potential similarities in tonality. As future work, more distance measure techniques will be added to our algorithms and a larger database will be used in order to obtain more reliable results. Computational intelligence approaches will also be applied.
Figure 1: The derivative of the frequencies of the notes in Western music theory (exponential) and the derivative of the logarithm of the frequencies of the same notes (linear) for 100 notes.
References
[1] G. Tzanetakis, A. Ermolinskyi, and P. Cook, "Pitch Histograms in Audio and Symbolic Music Information Retrieval", Journal of New Music Research, vol. 32, no. 2, pp. 143-152, 2002.
[2] Arel, H.S., Türk Musikisi Nazariyatı Dersleri (1930).
[3] Gedik A. and Bozkurt B., "Pitch-frequency histogram-based music information retrieval for Turkish music", Signal Processing, 90, 1049-1063 (2010).
Timbre and Tonal Similarities Between Turkish, Western and Cypriot Monophonic
Songs Using Machine Learning Techniques
Introduction:
This work explores timbre and tonal similarities between Turkish, Western and Cypriot songs. Little scientific information exists on the possible influence of Eastern musical culture on the folk music of Cyprus. Kallinikos [1] and Tobolis [2] identify and report similarities between Cypriot folk music and Byzantine music, and Tobolis provides transcriptions and descriptions of the majority of Cypriot folk tunes in both Byzantine and Western notation. This work explores possible similarities and dissimilarities between the folk music of Cyprus and Western/Turkish music in a low-level and mid-level feature space using machine learning techniques.
Musical instruments (pictured): lute, tamboucha, violin, pithkiavli.
Data:
127 monophonic songs:
37: Cypriot folk songs performed by the musicians Andreas Gristakkos and Giannis Zavros
47: Western songs and solo improvisations
43: Turkish makam songs
Features:
Timbre Features
1. Zero crossing rate
2. Spectral centroid
3. Roll off
4. Entropy
5. Mel frequency cepstrum coefficients
(13 coefficients)
Methods:
[Flowchart] Audio database → feature extraction (timbre features) and pitch track extraction → pitch histogram computation → alignment of the histogram to the tonic → 14 tonal features (peak locations and amplitudes). The models (neural networks, k-nearest neighbours and support vector machines) were trained on Turkish and Western songs and validated on Turkish, Western and Cypriot songs.
Tonal features: from pitch histograms aligned to the highest peak, 7 peak locations and amplitudes are chosen.
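A sketch of this tonal feature extraction, using scipy's peak finder, might look as follows; it assumes the histogram is already aligned and contains at least seven peaks:

```python
import numpy as np
from scipy.signal import find_peaks

def tonal_features(hist, n_peaks=7):
    """From a pitch histogram aligned to its highest peak, take the
    n_peaks highest peaks and return their locations and amplitudes
    (14 values in total)."""
    locs, props = find_peaks(hist, height=0)
    order = np.argsort(props["peak_heights"])[::-1][:n_peaks]
    locs = locs[order]
    amps = props["peak_heights"][order]
    return np.concatenate([locs, amps]).astype(float)
```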
Results:

Tonal features (K-NN confusion matrix; rows: actual class, columns: predicted class):

            Predicted Western   Predicted Turkish
Western            10                  3
Turkish             0                 11
Cypriot            15                 22

Timbre features (K-NN confusion matrix):

            Predicted Western   Predicted Turkish
Western            13                  0
Turkish             1                 10
Cypriot             8                 29

Tonal and timbre features (K-NN confusion matrix):

            Predicted Western   Predicted Turkish
Western            13                  0
Turkish             0                 11
Cypriot            30                  7
Conclusions:
Only the models that used both timbre and tonal features were able to completely discriminate Western from Turkish music. These models classified the majority of Cypriot music as Western music. The timbre analysis showed that Cypriot music is more likely to share similarities with Turkish music, while the tonal analysis showed no significant similarities between Cypriot and Western or Turkish music.
References
[1] Kallinikos, T. (1951). Kypriaki laiki mousa.
[2] Tobolis S. (1980). Traditional Cyprian songs and dances.
A machine learning approach for clustering western and non-western folk music
using low-level and mid-level features
Introduction:
The aim of this work is to explore timbre and tonal similarities between the folk music of Cyprus, Western music and non-Western music, especially from Eastern Mediterranean countries, using a computational approach. Models with k-means and self-organizing maps (SOM) were created for k = 1, 2, ..., 10 clusters in order to inspect the robustness of the results. We used the elbow method [1] to identify the optimal number of clusters for this problem. We built models using three different feature sets: the first consisted of low-level features, the second of mid-level features, and the third of both low-level and mid-level features.
Data:
242 monophonic songs:
37 Greek-Cypriot folk songs performed by the musicians Andreas Gristakkos and Giannis Zavros
40 Turkish-Cypriot songs performed by the Turkish-Cypriot musician Enver Kavaz
43 Turkish makam songs
40 religious and folk songs from Iran
47 Western songs and solo improvisations
12 songs from Syria
24 folk, religious and improvisational recordings from other Arab countries, including Yemen, Oman, Saudi Arabia, Egypt, and Morocco
Methods:
We used a frame-based approach to segment the audio signal into frames of 30 ms. For each frame, 25 low-level features were extracted, and for each feature vector we computed the mean and the standard deviation; these values are used as global features describing the audio signals. The low-level features used are the following: 1) zero crossing rate, 2) spectral centroid, 3) spectral brightness, 4) spectral spread, 5) spectral skewness, 6) spectral kurtosis, 7) spectral roll-off, 8) spectral entropy, 9) spectral flatness, 10) spectral roughness, 11) spectral regularity, 12) spectral inharmonicity, 13-25) 13 MFCC coefficients. A set of 13 mid-level features was also extracted, gathering information from the pitch histograms. The pitch histograms are computed for a range of two octaves with a bin resolution of 1200 per octave. Histogram peaks are detected and each histogram is aligned to the peak of highest amplitude. From the aligned histogram, we extract the 7 highest peaks and use their locations and amplitudes to describe the tonality of each song. Note that the location of the highest peak always corresponds to the first bin of the aligned histogram and is thus omitted. We clustered the data with SOM and the k-means algorithm using the Euclidean distance, for k = 2, 3, ..., 10. The two methods gave similar results, so we present the results of the k-means algorithm. The elbow method was applied in order to decide the optimal number of clusters. For each cluster map we compute the quantization error:
J(k) = Σ_{j=1..k} Σ_{x∈C_j} ||x − μ_j||²,

where the μ_j are the cluster prototypes and the x are the data points. Then we consider the function

R(k) = J(1) · k^(−2/d),

where d is the number of dimensions. The function R(k) simulates the behaviour of the quantization error for a data set that is uniformly distributed in the domain occupied by the real data set. Next, we consider the function

D(k) = R(k) − J(k).

The optimal number of clusters k is chosen where the function D(k) reaches its maximum value. Figure 1 shows the plot of this function, where we observe that the optimal number of clusters for this problem is 7.

[Flowchart] Audio database → pitch track extraction and segmentation into frames → pitch histogram computation, alignment to the tonic and extraction of 14 features (peak locations and amplitudes); feature extraction (low-level and mid-level features) → clustering (k-means, self-organizing maps), iterating k = k + 1 → computation of the quantization error J(k).

Figure 1: The function D(k). The optimal number of clusters for this problem is 7.
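As a concrete illustration, this elbow procedure can be sketched with scikit-learn's k-means, whose inertia_ attribute is exactly the quantization error J(k). The forms of R(k) and D(k) below are reconstructed from the description above rather than taken from the authors' code:

```python
import numpy as np
from sklearn.cluster import KMeans

def elbow_optimal_k(X, k_max=10):
    """Pick the number of clusters k that maximises D(k) = R(k) - J(k),
    where J(k) is the k-means quantization error and R(k) = J(1) * k**(-2/d)
    approximates the error expected for uniformly distributed data."""
    d = X.shape[1]
    J = {k: KMeans(n_clusters=k, n_init=10).fit(X).inertia_
         for k in range(1, k_max + 1)}
    D = {k: J[1] * k ** (-2.0 / d) - J[k] for k in range(1, k_max + 1)}
    return max(D, key=D.get)
```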
Results:
Table 1 shows the results of the clustering using the k-means algorithm with k = 7 clusters and the Euclidean distance; both the low-level and the mid-level feature sets were used. We observe that Arabic music and music from Iran were grouped in the same cluster, while music from Turkey and Syria were grouped together in another cluster. Western music was separated into two clusters, and the music from Cyprus was also separated into two clusters: one contains the Greek-Cypriot music and the other the Turkish-Cypriot music. The models built with low-level features alone yielded similar results, while the models built with mid-level features alone were not able to create a meaningful cluster map.
Table 1: Results of the k-means algorithm using the low-level and mid-level features (rows: music tradition; columns: number of songs per cluster).

            Cluster 1  Cluster 2  Cluster 3  Cluster 4  Cluster 5  Cluster 6  Cluster 7
Arabic           0         18          1          0          4          1          0
Iran             0         30          9          0          0          0          1
Syria            0          2          8          0          2          0          0
Gr Cypriot       0          0          0          0          6         27          4
Turkey           0          7         34          0          0          0          3
West             1          2          5          0         21          0         17
T Cypriot        0          0          4         31          0          0          4
Conclusions:
This work focused on the identification of similarities and dissimilarities between Western and non-Western music. Tonal and timbre features were used for our analysis. The results indicate that the features employed in this study are able to capture particularities of each music tradition. Limitations of this work include the different number of recordings per music tradition and the use of global features that bypass timbre and tonal nuances that evolve throughout the time series. It is left to future work to model these explicitly and to expand the feature set and the music dataset.
Reference
[1] R. Tibshirani, G. Walther, T. Hastie: Estimating the number of clusters in a data set via the gap statistic. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 63(2), 411-423 (2001).
Automated Segmentation of Folk Songs
Using Artificial Neural Networks
Introduction:
Two different systems are introduced that perform automated audio annotation and segmentation of Cypriot folk songs into meaningful musical information. The first system consists of three artificial neural networks (ANNs) using low-level timbre features; the outputs of the three networks classify an unknown song as "monophonic" or "polyphonic". The second system employs one ANN using the same feature set; it takes as input a polyphonic song and identifies the boundaries of the instrumental and vocal parts. For the "monophonic - polyphonic" classification, a precision of 0.88 and a recall of 0.78 were achieved. For the "vocal - instrumental" classification, a precision of 0.85 and a recall of 0.83 were achieved. From the obtained results we concluded that the low-level timbre features were able to capture the characteristics of the audio signals, and that the specific ANN structures were suitable for this classification problem and outperformed classical statistical methods.
Methods:
The main idea of our method is illustrated in Figure 1. The first system takes as input an unknown song and predicts whether it is monophonic or polyphonic. The second system takes a polyphonic song and predicts the boundaries of the parts in which only instruments are performing (instrumental parts) and the parts in which a singing voice is present (vocal parts). Each audio signal was segmented into a sequence of overlapping audio frames of length 2048 samples (46 ms), overlapping by 512 samples (12 ms). For each of these audio frames we extracted the following audio features: zero crossing rate, spectral centroid and MFCC (13 coefficients). The training set was chosen in such a way that all the musical instruments of interest for the classification were present. The positions of the vocal and instrumental parts were manually annotated in the training data. For the first system the
mean and the standard deviation values of each feature are calculated and are used to build
three feed-forward ANNs. Each of them has 20 neurons in the hidden layer and was trained
for 200 epochs. The ANNs were built using monophonic songs for the first class and
polyphonic songs for the second class. The difference between the three ANNs is that the
instrument that is performing in the monophonic songs is different for each network. This
system classifies an unknown song into the class “monophonic” or “polyphonic”. Both
systems 1 and 2 require audio frame segmentation and feature extraction.
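For concreteness, the frame segmentation and feature summarisation could be sketched as follows; librosa here is an assumed stand-in for the original feature extractor, with frame and hop sizes taken from the text:

```python
import numpy as np
import librosa

def song_features(path, frame_length=2048, hop_length=512):
    """Frame-level features (zero-crossing rate, spectral centroid,
    13 MFCCs) summarised by their mean and standard deviation."""
    y, sr = librosa.load(path, sr=None, mono=True)
    zcr = librosa.feature.zero_crossing_rate(y, frame_length=frame_length,
                                             hop_length=hop_length)
    cen = librosa.feature.spectral_centroid(y=y, sr=sr,
                                            n_fft=frame_length,
                                            hop_length=hop_length)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13,
                                n_fft=frame_length, hop_length=hop_length)
    feats = np.vstack([zcr, cen, mfcc])            # one row per feature
    return np.concatenate([feats.mean(axis=1), feats.std(axis=1)])
```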
Classification Into “Vocal” or “Instrumental”
For every audio frame of a song the ANN gives a prediction value in the range 0 to 1. One example of the output of the ANN is shown in Figure 2 with a continuous black line. The vocal parts and the instrumental parts were annotated manually. Even though in this example most of the output values correspond to the correct class (if we set a threshold), some of the frames are misclassified.
In a first step we divide the frame sequence into groups of 100 frames each and compute their mean, as shown with red dots in Figure 2. These values are then converted into binary values using an appropriate threshold, calculated as the mean of the mean values and shown with a green line in the same figure. In this example the threshold is 0.6. The mean values that fall above the threshold are classified as "instrumental", while the values that fall below the threshold are classified as "vocal".
Further processing was needed in order to correct additional misclassifications. One example of a misclassified sample is encircled in Figure 2: the encircled output value exceeds the threshold and the system wrongly classifies that position as instrumental. To solve such misclassification problems, we introduced the following rule: each sample of the quantized vector is tested against the classes of the frames around it. For the classification of a sample to be considered true, the class of the previous frame has to be the same as the class of the following two frames. Regardless of the classification of the tested frame, after we apply this rule the classification may change.
Figure 2: Black continuous line shows the output of the ANN for a chunk of 30 seconds of a
polyphonic song. Red dots show the mean values for groups of 100 frames each. Blue
continuous line shows the binary quantization of the mean values with respect to the
threshold. Yellow circle shows an example of a misclassified value.
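The grouping, thresholding and neighbour-correction steps can be condensed into a short sketch (a minimal reconstruction of the rule above; behaviour at the very ends of the sequence is an assumption):

```python
import numpy as np

def postprocess(frame_outputs, group=100):
    """Average the ANN outputs over groups of 100 frames, binarise with
    the mean-of-means threshold, then flip any group whose neighbours
    (previous and two following) all agree on the other class."""
    frame_outputs = np.asarray(frame_outputs, dtype=float)
    n = len(frame_outputs) // group
    means = np.array([frame_outputs[i*group:(i+1)*group].mean()
                      for i in range(n)])
    labels = (means > means.mean()).astype(int)   # 1=instrumental, 0=vocal
    for i in range(1, n - 2):
        ctx = (labels[i-1], labels[i+1], labels[i+2])
        if len(set(ctx)) == 1 and labels[i] != ctx[0]:
            labels[i] = ctx[0]                    # correct isolated outliers
    return labels
```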
Evaluation and results
Figure 1: (a) System 1 takes as an input an unknown song and classifies the song as
“monophonic” or “polyphonic”. (b) System 2 takes as an input a polyphonic song and
segments it into “vocal” and “instrumental” parts.
Classification Into “Monophonic” or “Polyphonic”
The validation set contained 74 songs with a total duration of 230 minutes; 46 songs were monophonic and the remaining 28 were polyphonic. For the "monophonic - polyphonic" classification, we call a prediction a "false positive" when a song is annotated as "monophonic" and the prediction of the system is "polyphonic". We present our results in terms of precision and recall. The ANNs achieved a precision of 0.88 and a recall of 0.78; SVMs gave a precision of 0.85 and a recall of 0.81, and Bayesian statistics a precision of 0.71 and a recall of 0.69.
For the "vocal - instrumental" classification we call a prediction a false positive if a part of the signal was annotated as "vocal" but the prediction of the system was "instrumental". Figure 3 shows an example of how we define the terms false positive, false negative and true positive for this classification problem. The audio signal is plotted with a black continuous line. The red vertical lines indicate the limits in the audio signal where only instruments are performing, while the green vertical lines indicate an example of the limits predicted by our system. We call a false positive the duration of the signal where the ground truth is annotated as "vocal" and the prediction was "instrumental". The ANNs achieved a precision of 0.85 and a recall of 0.83; SVMs gave a precision of 0.86 and a recall of 0.82, and Bayesian statistics a precision of 0.76 and a recall of 0.72.
For the classification into the two classes "monophonic" or "polyphonic" we built three ANNs using the mean and the standard deviation of each feature; in total, 32 features were used to train the ANNs. The first ANN, called "male vocal - polyphonic", is trained with 720 seconds of monophonic male singing performances (first class) and 115 seconds of polyphonic music (second class). The second ANN, called "female vocal - polyphonic", is trained with 720 seconds of monophonic female singing performances (first class) and the same 115 seconds of polyphonic music (second class). The third ANN, called "pithkiavli - polyphonic", is trained with 600 seconds of monophonic performances on the instrument "pithkiavli" (first class) and 115 seconds of polyphonic music (second class). The output target for the polyphonic music was set to 1, and the output target for the classes "female vocal", "male vocal" and "pithkiavli" was set to 0.
The classification is done with the following procedure: an unknown song is represented numerically by a vector of 32 features that is fed to the three ANNs "male vocal - polyphonic", "female vocal - polyphonic" and "pithkiavli - polyphonic". We quantize the outputs of the models to 0 or 1 using a threshold of 0.5. We classify a song as "monophonic" if the binary output of at least two models is 0; otherwise the song is classified as "polyphonic".
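In sketch form, the voting scheme reads as follows; models is assumed to hold the three trained networks, each exposing a predict method that returns a value in [0, 1]:

```python
def classify_mono_poly(feature_vec, models, thr=0.5):
    """Voting over the three binary networks (male vocal-, female vocal-
    and pithkiavli-vs-polyphonic); the target 1 denotes polyphonic."""
    votes = [int(m.predict(feature_vec) > thr) for m in models]
    # "monophonic" if at least two of the three binary outputs are 0
    return "monophonic" if votes.count(0) >= 2 else "polyphonic"
```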
Figure 3: The interpretation of the terms "false positive", "true positive" and "false negative".
Automated Identification of Repeating Melodies
Introduction:
Repetition of melodic patterns is an important aspect of music, and the identification of such patterns is an essential step in musicological analysis. The verse of a song is an example of such a repeating pattern in popular music; in folk music, repeating patterns are called stanzas. In this work we focus our experiments and the validation of our system on monophonic songs of Eastern Mediterranean folk music. A melodic pattern is called a "motif" if it is frequently repeated in a song; in music theory, a "motif" can be defined as the smallest melody with important thematic identity. The identification of such motifs leads to an explicit understanding of the musical structure of a piece.
In Fig. 1a the black continuous line shows the raw sound pressure values of a song in normalized units over time, for a duration of 52 seconds. The vertical lines indicate the positions in the signal where musical events of interest, e.g. motifs, were manually annotated. In this example we illustrate four motifs. The sequence in which the motifs repeat is often denoted with capital letters: here, the melody of the first motif is repeated at every odd motif and is labelled "A", while the letter "B" is given to the second melody, which is repeated at every even motif.
Response of a COSFIRE filter
For each point of the prototype we compute a distance measurement with a Gaussian function whose sigma is a linear function of the distance from the centre of the filter. We introduce a parameter "alpha", which allows frequency tolerance between the prototype and the tested signals. The parameter alpha takes a value in the range 0-1; this value is added to the standard deviation for every point away from the centre of the COSFIRE filter. Therefore, the frequency at the centre of the filter has less tolerance in the distance function than the outermost points. The response of the proposed COSFIRE detector is computed by shifting the Gaussian distance values to the centre of the filter; finally, we take the geometric mean of these distance values. In Fig. 3 we present an example of how the responses are affected by different values of alpha. In the upper plot, a training motif and a test motif are presented; we show the prototype of the 43rd COSFIRE filter in the training motif and the response area in the test motif. The prototype used as input for the configuration of the filter is shown in red, and the response area in the test motif is shown in green. In the second plot of the same figure we present the responses of the filter, restricted to the response area, for three different values of the alpha parameter. A higher value of alpha, i.e. a = 1, returns a higher response, thus raising the tolerance in the degree of similarity between the prototype and the test signals.
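A sketch of this computation for a single filter follows; sigma0, the base standard deviation at the filter centre, is an assumed parameter that the text does not specify:

```python
import numpy as np

def cosfire_response(prototype_f0, test_f0, alpha=0.5, sigma0=1.0):
    """Score each point's frequency deviation with a Gaussian whose
    standard deviation grows linearly (by alpha) with the distance from
    the filter centre, then take the geometric mean of the scores."""
    n = len(prototype_f0)                 # typically 121 frames
    centre = n // 2
    offsets = np.abs(np.arange(n) - centre)
    sigma = sigma0 + alpha * offsets      # more tolerance away from centre
    scores = np.exp(-0.5 * ((test_f0 - prototype_f0) / sigma) ** 2)
    return scores.prod() ** (1.0 / n)     # geometric mean of the scores
```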
Figure 1: Example of repeating melodies in a song. (a) Audio signal. (b) Pitch track.
We describe a novel algorithm for the automated identification of repeating melodic patterns in monophonic (i.e. with a single melody) vocal folk tunes, and for their segmentation and labeling. We propose a method that identifies and segments an audio file into musical events, such as the start and end points of repeating patterns in vocal monophonic melodies, using only one feature: the fundamental frequency vector. The method is based on a modification of the COSFIRE approach [1], a novel technique that was introduced for trainable visual pattern recognition. Our approach uses special adaptive filters that are trained with the pitch trajectories.
Methods:
Our methodology is as follows. First, we use an automated method for segmenting the audio signal into possible motifs. After the segmentation is done, we divide a motif, called the training motif, into groups of 121 consecutive and overlapping frames separated by 20 frames. We introduce the term "prototype" to describe one such group of 121 frames. In Fig. 2 we illustrate an example where the encircled part of the signal, located in motif 1, is one such prototype. We use the values of the 121 fundamental frequencies to configure a COSFIRE filter for each such prototype. Typically, a motif consists of around 100 prototypes; for instance, the first motif shown in Fig. 2 consists of 106 prototypes. Next, we apply these
filters to another motif that we call the test motif. The responses of the filters will return
values in the range 0-1. Then we binarize their responses in such a way that an output 1 of a
given COSFIRE filter indicates the presence of a frame group in the test motif that is similar
to the frame group from the training motif that has been used to configure the concerned
COSFIRE filter. For instance, the COSFIRE filter that has been configured with the prototype
shown selected in Fig. 2 will detect the presence of a similar frame group in the response
area. If the COSFIRE filters generate binarized outputs of value 1 for the test motif in the
same order as for the training motif, we conclude that the test motif is a repetition of the
training motif. In practice, we apply a given COSFIRE filter not to the whole test motif but
only to a part of it that spans a fraction of 0.2 of the test motif around the same relative
position as the one of the frame group that has been used to configure that COSFIRE filter.
Figure 2: A prototype is used to construct a COSFIRE filter. The response area is shown.
Configuration of a COSFIRE filter
We configure the proposed COSFIRE filter using as input the fundamental frequencies and point positions of a group of 121 frames of the training motif around a frame called the centre of the COSFIRE filter. We call this group of 121 frames a prototype.
Each COSFIRE filter is applied to a particular segment of the test motif, not to the whole signal. The idea is that if the training and test motifs are similar, we should expect a high response at the corresponding position relative to the centre of the filter. First we normalize the time axis of the training motif to the range 0-1 by dividing the temporal value of each position by the total duration. We then multiply this number by the duration of the test motif to find the corresponding position. Every COSFIRE filter is applied at 200 points around this relative position.
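This restricted application can be sketched as follows; filt stands for one configured filter (e.g. the cosfire_response sketch above bound to its prototype), and rel_pos is the prototype's normalized position within the training motif:

```python
def apply_at_relative_position(filt, test_f0, rel_pos, span=200, width=121):
    """Apply one configured COSFIRE filter only around the position of
    the test motif corresponding to the prototype's relative position in
    the training motif, scanning `span` points around it."""
    centre = int(rel_pos * len(test_f0))          # map [0, 1] to test motif
    best = 0.0
    for start in range(max(0, centre - span // 2),
                       min(len(test_f0) - width, centre + span // 2)):
        best = max(best, filt(test_f0[start:start + width]))
    return best
```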
Figure 3: (a) A training motif and a test motif. The prototype used to construct the COSFIRE
filter is shown with red in the training motif. Respectively the response area is shown with
green in the test motif. (b) The response of a COSFIRE filter for three different values of the
alpha parameter.
Classification into "similar" / "dissimilar"
The output of every COSFIRE filter is binarized with a threshold: responses above the threshold are binarized as positive responses. It is possible to get more than one response when applying a filter. After validating the entire test signal with all of the configured COSFIRE filters, we sum the number of positive responses and normalize it by the total number of filters. If the normalized value is above a certain threshold, the test motif is considered similar.
Dynamic time warping
We applied the dynamic time warping (DTW) algorithm to the data and computed the cost function for pairs of motifs. The output of the cost function was used to classify the tested motifs as "similar" or "dissimilar". The results of the DTW in terms of precision and recall were comparable with those of the COSFIRE method; the major disadvantage of DTW is its long execution time.
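For reference, the DTW cost used in this comparison can be sketched with the classic dynamic-programming recurrence (a textbook version, not the authors' implementation):

```python
import numpy as np

def dtw_cost(a, b):
    """Dynamic-time-warping cost between two pitch tracks; a pair of
    motifs is classified as similar when the cost falls below a
    threshold."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            D[i, j] = d + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m] / (n + m)   # length-normalised accumulated cost
```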
Results
Nine monophonic folk songs were used to validate the COSFIRE and DTW methods; in total, 123 motifs were tested and validated. In Table 1 we present the data and their properties. The database contains monophonic folk songs of Cyprus; each song was performed by a different person at a different location and time. Some of the songs appear to have the same name, although the content of the melodies is different.
We present the results for the two methods, COSFIRE and DTW, in terms of precision (P) and recall (R). The different values of P and R for COSFIRE were obtained by changing the parameter alpha; we used 17 different values of the alpha parameter in the range 0.1-0.9. P decreases and R increases with increasing values of alpha. The harmonic mean of precision and recall reaches a maximum at a recall R of 82.7% and a precision P of 86.8% for alpha = 0.5.
For the DTW method the classification was made by setting a threshold on the cost function of the DTW output; we used 17 different values of the threshold in the range 0-0.8. The harmonic mean of precision and recall reaches a maximum at a recall R of 79.6% and a precision P of 87.2% for a threshold of 0.4. The maximum harmonic mean for COSFIRE is higher than that of DTW, with values of 0.847 and 0.8325, respectively. From these values we conclude that COSFIRE performs better than DTW.
The major difference between the two methods, COSFIRE and DTW, is the time they take to process a given song: the total execution time of DTW for the nine songs was 4123 seconds, while COSFIRE took 274 seconds for the same songs.
Reference
[1] G. Azzopardi and N. Petkov, Trainable COSFIRE filters for keypoint detection and pattern recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013.
System for folk music identification
Introduction:
We present a prototype of the application tool with a graphical
interface, built under the project "Intelligent system for the
identification of similarities and differences between the
Greek-Cypriot, Greek, Turkish-Cypriot, and Turkish folk music".
The application interacts through buttons, figures and
explanations of its functionality. Its main purpose is the
analysis of the folk music of Cyprus and of Western and
non-Western music. As soon as we finalize and complete all of
the functionalities, the application will be uploaded to the
project's website. With this interface, we give researchers the
possibility to use our algorithms, validate our system and draw
their own conclusions.
Segmentation - Explore songs
The second option is "Explore songs". A plot of the waveform of
the song is shown at the top; the vocal parts are shown in black
and the instrumental parts in yellow. Furthermore, a number of
buttons appear dynamically, according to the number of
vocal/instrumental parts; each button plays the respective vocal
or instrumental part.
Tonal analysis - Explore songs
In the option "Explore songs", two additional pop-up menus
appear. The user can choose two songs from our database. These
songs are processed, and the audio signal, the pitch track and
the pitch histogram of each song appear in the first two boxes;
the third box shows a comparison between the two histograms.
Home tab
Initially the application starts with the "Home" tab, as shown
in the figure. The title is shown together with an explanation
of the project. The user presses the "Next" button to proceed to
the main view.
Segmentation - Results
In the option "Results" we present our results and conclusions
about how the system works in terms of precision and recall.
The procedure by which the true positives, false positives and
false negatives are calculated is also explained.
Main view
In the main view, five buttons appear at the top, as shown in
the figure. The "Home" tab returns to the initial view; the
following four tabs are "Segmentation", "COSFIRE", "Tonal" and
"Timbre Similarities". Each of these tabs is based on published
work of ours. The work on COSFIRE is not yet published, although
we intend to submit our work and findings to an academic
journal. So far, we have implemented the "Segmentation" and
"Tonal" tabs.
Tonal analysis
In the "Tonal" tab we present the work published in:
Neocleous A., Panteli M., Petkov N., Schizas C., "Identification
of Similarities between the Turkish Makam Scales and the
Cypriot Folk Music", HELINA's 5th National Conference, Corfu,
Greece, 2012. In the main view, in a similar manner to the
"Segmentation" tab, the user can explore the database, process
each song separately, load a song from a database and see our
results.
Segmentation
This work has been accepted as: Neocleous A., Petkov N.,
Schizas C., "Automated Segmentation of Folk Songs Using
Artificial Neural Networks", 6th International Conference on
Neural Computation Theory and Applications, Rome, 2014, where it
will be presented orally. When the "Segmentation" button is
pressed, a pop-up menu appears with two options, "Process one
song" and "Demo".
Tonal analysis - Show database
In the option "Show database" we present the contents of the
database used for this experiment: 20 Cypriot, 53 Western and
53 makam monophonic songs. 17 of the Cypriot songs were recorded
specifically by the members of our group to be used for our
investigations.
Segmentation - Process one song
The first option is "Process one song", where the user is asked
to load a song into the system, as shown in the figure. The song
is analyzed, and the system reports whether it is "monophonic"
or "polyphonic". Moreover, if the song is polyphonic, the system
proceeds with further analysis, such as the identification of
vocal and instrumental parts.
Tonal analysis – Results
In the option “Results” we present our results as explained in:
Neocleous A., Panteli M., Petkov N., Schizas C., “Identification
of Similarities between the Turkish Makam Scales and the
Cypriot Folk Music”, HELINA's (Hellenic Institute of Acoustics)
5th National conference, Corfu, Greece, 2012.