Melody Recognition with Learned Edit Distances

Download Report

Transcript Melody Recognition with Learned Edit Distances

Melody Recognition with
Learned Edit Distances
Amaury Habrard
Laboratoire d’Informatique Fondamentale CNRS
Université Aix-Marseille
José Manuel Iñesta, David Rizo
Dept. Software and Computing Systems,
Universidad de Alicante
Marc Sebban
Laboratoire Hubert Curien UMR CNRS 5516
Outline
Introduction
 Symbolic music encoding
 Edit distance and learning
 Experiments
 Conclusions

Introduction: task

Given an interpretation of a monophonic melody
retrieve the name of the song from a set of
known prototypes using a similarity measure
d(q, p1)=9
p1
d(q, p2)=5
p2
Query q
d(q, p2)=7
Which song?
pn
Approaches to solve the task



Geometric methods [Alloupis et al. 2006,
Lëmstrom et al. 2003]
Language models [Doraisamy et al. 2003]
Edit distance-based methods [Mongeau and
Sankoff 1990, Rizo and Iñesta 2003, Gratchen
et al. 2005]

These methods seem to perform the best
Edit distance between strings

String coding of a melody:
Different possible encodings for pitch and
rhythm
 E.g. intervals from tonic mod 12

C Major
7, 7, 9, 7, 0, 11,
7, 7
Similarity between musical
sequences

Definition

Given an edit script e = e1 . . . en as a sequence of edit
operations ei = (bi | ai) to transform a input data X into
an output Y
• ei = (ai, bi) ∈ (Σ ∪ {λ}) x (Σ ∪ {λ})

Cost of edit script

Distance between X and Y
d(X, Y ) = mine ∈S(X,Y ) π(e)
Tree representation
 Based on the logarithmic nature of music notation
 Each tree level is a subdivision of the upper level
whole
4 beats
half
2+2
quarter
4×1
eighth
8×½
..............
.......
....
 Leaf labels can be any pitch magnitude
 Rests are coded the same way as notes
 Duration is implicitly coded in the tree structure
 Finally labels are propagated bottom-up selecting most important note
Similarity of melodies

Use tree-edit distance

Currently using Selkow distance
Musical meaning of edit costs

Editing costs depend on the note and its context
in terms of music knowledge
• E.g. deleting a non diatonic note has lower cost than
deleting a note of the scale

Find the best editing cost matrix

Current approaches in music applications:
•
•
•
•

Unit costs: it does not reflect musical concept
Costs fixed manually based on musical expertise
Brute force: learning time
Genetic systems: not understandable costs
Our proposal:
• Probabilistic framework: learns matrix of edit probabilities,
keeps understandability
Stochastic Edit Similarity (1/3)
Learning the parameters of an edit
distance requires the use of an inductive
principle. In the context of probabilistic
machines, the maximization of the
likelihood is often used.
 Solution: to learn the edit parameters in a
probabilistic framework.

Stochastic Edit Similarity (2/3)

Definition
Given an edit script e = e1 . . . en as a sequence
of edit operations ei = (bi | ai) to transform a
input data X into an output Y
 a probabilistic edit script has a probability πs(e)
such that:

Stochastic Edit Similarity (3/3)

Definition
p(Y |X ) is the sum of the probabilities of all
edit scripts transforming X in Y.
 Let S (Y | X ) be the set of all scripts that
enable the emission of Y given X.

How to learn p(ei)?
• Stochastic conditional transducer
•Strings: Oncina and Sebban (2006)
•Trees: Bernard, Habrard and Sebban (2006)
Experiments

Corpus
8-12 bar fragments of 20 worldwide well
known tunes
 For each song 20 different interpretations by
5 different players
 Total: 420 prototypes, 20 classes

Experiments

Two experiments

Compare classification performance of the
probabilistic approach to:
• Fixed-costs systems
• Edit costs learned with genetic algorithms

Compare understandability of learned matrix
to:
• Edit costs learned with genetic algorithms
Genetic system


Conventional genetic system
Objective of the algorithm:


Costs represented by a 13x13 matrix




find the best editing costs to be used in the classical
edit distance
13 = 12 (pitches = | Σ | ) + 1 (λ)
No e(λ, λ) operation
168 editing costs
Individuals are represented by chromosomes
with 168 genes

Genes \in [0,2] C |R
Genetic system

Fitness function

Average precision-at-n for all prototypes of learning
set, using leave-one-out
• precision-at-n: number of relevant hits among the n closest
melodies, n being the number of prototypes in the learning
set with the same class as the query


Classification using 1-NN
JGAP library used, with default setup

Stop criterion: fitness function stabilization (around
100 generations)
Experiment 1: Melody
classification accuracy

Goal: to identify a melody given an
interpretation query


Strings and trees
3-fold cross validation



Train with 2 folds
Test with the remaining fold using leave-one-out
Classified using 1-NN
Experiment 1: Melody
classification accuracy
Both learned schemes improves success rates over fixed
costs systems, more evident on trees
Experiment 2: Analysis of the
Learned Edit Matrices
Strings
Trees
Genetic
By changing the contrast of the image the costs
represent the piano keyboard
Stochastic
Tree costs learned
Conclusions

Algorithms for edit distance learning applied


Improve results over fixed-cost approaches
Probabilistic and genetic approaches compared


Probabilistic more adequate to explain by means of
cost matrices the musical variations and/or noise
Maybe, with a bigger corpus the genetic approach
can generate more understandable cost matrices
• But also the probabilistic approach will improve
Future works
Consider more pitch representations such
as contour, high definition contour, pitch
classes, and intervals to evaluate their
adequacy to the melody matching
problem
 Work on polyphonic music


By means of trees [Rizo et al. 2008]