Melody Recognition with Learned Edit Distances
Melody Recognition with
Learned Edit Distances
Amaury Habrard
Laboratoire d’Informatique Fondamentale CNRS
Université Aix-Marseille
José Manuel Iñesta, David Rizo
Dept. Software and Computing Systems,
Universidad de Alicante
Marc Sebban
Laboratoire Hubert Curien UMR CNRS 5516
Outline
Introduction
Symbolic music encoding
Edit distance and learning
Experiments
Conclusions
Introduction: task
Given an interpretation of a monophonic melody
retrieve the name of the song from a set of
known prototypes using a similarity measure
Query q: which song?
d(q, p1) = 9  →  p1
d(q, p2) = 5  →  p2
...
d(q, pn) = 7  →  pn
Approaches to solve the task
Geometric methods [Aloupis et al. 2006,
Lemström et al. 2003]
Language models [Doraisamy et al. 2003]
Edit distance-based methods [Mongeau and
Sankoff 1990, Rizo and Iñesta 2003, Grachten
et al. 2005]
These methods seem to perform the best
Edit distance between strings
String coding of a melody:
Different possible encodings for pitch and
rhythm
E.g. intervals from the tonic, mod 12
C major example: 7, 7, 9, 7, 0, 11, 7, 7
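The encoding above can be sketched in a few lines. This is a minimal illustration, not the paper's code; the tonic is assumed to be given as a MIDI note number.

```python
# Sketch of the pitch encoding described above: each note is coded as its
# interval from the tonic, modulo 12. Function and variable names are
# illustrative, not from the paper.

TONIC_C = 60  # MIDI note number for middle C

def encode_from_tonic(midi_pitches, tonic=TONIC_C):
    """Code each pitch as semitones above the tonic, mod 12."""
    return [(p - tonic) % 12 for p in midi_pitches]

# The C-major example from the slide (G G A G C B G G):
melody = [67, 67, 69, 67, 72, 71, 67, 67]
print(encode_from_tonic(melody))  # [7, 7, 9, 7, 0, 11, 7, 7]
```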
Similarity between musical sequences
Definition
Given an edit script e = e1 ... en, a sequence of edit
operations transforming an input string X into an
output string Y, where
• ei = (ai, bi) ∈ (Σ ∪ {λ}) × (Σ ∪ {λ}) \ {(λ, λ)}
Cost of an edit script: π(e) = Σi c(ei)
Distance between X and Y:
d(X, Y) = min e ∈ S(X,Y) π(e), where S(X, Y) is the set
of all scripts transforming X into Y
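The minimization over all edit scripts is computed by the standard dynamic program. A minimal sketch, with unit costs as a placeholder for a learned cost matrix:

```python
# Edit distance d(X, Y) = min over scripts of the summed operation costs,
# computed by dynamic programming. cost() here uses unit costs; in the
# paper's setting it would be replaced by the learned cost matrix.

def edit_distance(x, y, cost=lambda a, b: 0 if a == b else 1):
    """cost(a, b): substitution cost; cost(a, None) / cost(None, b):
    deletion / insertion cost (None plays the role of lambda)."""
    n, m = len(x), len(y)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + cost(x[i - 1], None)
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + cost(None, y[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + cost(x[i - 1], None),          # deletion
                d[i][j - 1] + cost(None, y[j - 1]),          # insertion
                d[i - 1][j - 1] + cost(x[i - 1], y[j - 1]),  # substitution
            )
    return d[n][m]

print(edit_distance([7, 7, 9, 7], [7, 9, 7, 0]))  # 2.0
```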
Tree representation
Based on the logarithmic nature of music notation
Each tree level is a subdivision of the upper level
whole (4 beats) → half (2+2) → quarter (4×1) → eighth (8×½) → ...
Leaf labels can be any pitch magnitude
Rests are coded the same way as notes
Duration is implicitly coded in the tree structure
Finally, labels are propagated bottom-up, selecting the most important note at each node
Similarity of melodies
Use tree-edit distance
Currently using Selkow distance
Musical meaning of edit costs
Editing costs depend on the note and its context
in terms of music knowledge
• E.g. deleting a non-diatonic note has a lower cost than
deleting a note of the scale
Find the best editing cost matrix
Current approaches in music applications:
• Unit costs: do not reflect musical concepts
• Costs fixed manually based on musical expertise
• Brute force: prohibitive learning time
• Genetic systems: costs are not understandable
Our proposal:
• Probabilistic framework: learns matrix of edit probabilities,
keeps understandability
Stochastic Edit Similarity (1/3)
Learning the parameters of an edit
distance requires the use of an inductive
principle. In the context of probabilistic
machines, the maximization of the
likelihood is often used.
Solution: learn the edit parameters in a
probabilistic framework.
Stochastic Edit Similarity (2/3)
Definition
Given an edit script e = e1 ... en, a sequence
of edit operations ei = (bi | ai) transforming an
input X into an output Y,
a probabilistic edit script has a probability πs(e)
such that:
πs(e) = Πi p(ei)
Stochastic Edit Similarity (3/3)
Definition
Let S(Y | X) be the set of all scripts that
enable the emission of Y given X.
p(Y | X) is the sum of the probabilities of all
edit scripts transforming X into Y:
p(Y | X) = Σ e ∈ S(Y|X) πs(e)
How to learn p(ei)?
• Stochastic conditional transducer
• Strings: Oncina and Sebban (2006)
• Trees: Bernard, Habrard and Sebban (2006)
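Given learned operation probabilities, p(Y | X) can be evaluated with a forward dynamic program over a memoryless transducer, summing over all scripts instead of minimizing. A sketch with toy parameters (the probability tables are assumptions standing in for what the EM-based transducer learning would estimate):

```python
# Sketch: p(Y|X) for a memoryless conditional transducer, computed as the
# sum over all edit scripts via a forward recursion. p_sub, p_del, p_ins
# and p_end are toy stand-ins for learned parameters.

def forward_prob(x, y, p_sub, p_del, p_ins, p_end):
    """alpha[i][j] = probability of having emitted y[:j] from x[:i]."""
    n, m = len(x), len(y)
    a = [[0.0] * (m + 1) for _ in range(n + 1)]
    a[0][0] = 1.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0:            # deletion (lambda | x[i-1])
                a[i][j] += a[i - 1][j] * p_del.get(x[i - 1], 0.0)
            if j > 0:            # insertion (y[j-1] | lambda)
                a[i][j] += a[i][j - 1] * p_ins.get(y[j - 1], 0.0)
            if i > 0 and j > 0:  # substitution (y[j-1] | x[i-1])
                a[i][j] += a[i - 1][j - 1] * p_sub.get((x[i - 1], y[j - 1]), 0.0)
    return a[n][m] * p_end

# Toy parameters over a two-symbol alphabet:
p_sub = {("a", "a"): 0.6, ("a", "b"): 0.1, ("b", "b"): 0.6, ("b", "a"): 0.1}
p_del = {"a": 0.1, "b": 0.1}
p_ins = {"a": 0.05, "b": 0.05}
print(forward_prob("ab", "ab", p_sub, p_del, p_ins, p_end=0.1))
```

The dominant term is the all-substitutions script; the forward sum also accounts for every insertion/deletion path that emits the same output.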
Experiments
Corpus
8-12 bar fragments of 20 well-known tunes
from around the world
For each song 20 different interpretations by
5 different players
Total: 420 prototypes, 20 classes
Experiments
Two experiments
Compare classification performance of the
probabilistic approach to:
• Fixed-costs systems
• Edit costs learned with genetic algorithms
Compare understandability of learned matrix
to:
• Edit costs learned with genetic algorithms
Genetic system
Conventional genetic system
Objective of the algorithm: find the best editing
costs to be used in the classical edit distance
Costs represented by a 13×13 matrix
• 13 = 12 (pitches = |Σ|) + 1 (λ)
• No e(λ, λ) operation
• 13 × 13 − 1 = 168 editing costs
Individuals are represented by chromosomes
with 168 genes
Genes ∈ [0, 2] ⊂ ℝ
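The chromosome-to-matrix mapping can be sketched directly from the counts above. Names are illustrative, not from the paper:

```python
# Sketch of the chromosome representation: 168 genes in [0, 2] filled into
# a 13x13 edit-cost matrix over 12 pitch classes plus lambda, with the
# (lambda, lambda) cell excluded.

import random

ALPHABET = list(range(12)) + ["lambda"]  # 12 pitch classes + lambda

def chromosome_to_matrix(genes):
    """Map 168 genes to a 13x13 cost matrix without (lambda, lambda)."""
    assert len(genes) == 13 * 13 - 1 == 168
    it = iter(genes)
    return {
        (a, b): next(it)
        for a in ALPHABET for b in ALPHABET
        if not (a == "lambda" and b == "lambda")
    }

genes = [random.uniform(0.0, 2.0) for _ in range(168)]
costs = chromosome_to_matrix(genes)
print(len(costs))  # 168
```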
Genetic system
Fitness function
Average precision-at-n for all prototypes of learning
set, using leave-one-out
• precision-at-n: number of relevant hits among the n closest
melodies, n being the number of prototypes in the learning
set with the same class as the query
Classification using 1-NN
JGAP library used, with default setup
Stop criterion: fitness function stabilization (around
100 generations)
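The fitness computation described above can be sketched as follows. `dist` is a placeholder for the edit distance under the candidate cost matrix; the 1-D toy data at the end is only for illustration:

```python
# Sketch of the fitness function: leave-one-out average precision-at-n,
# where n is the number of other prototypes sharing the query's class.

def precision_at_n(prototypes, labels, dist):
    """Average precision-at-n over all prototypes, leave-one-out."""
    scores = []
    for i, q in enumerate(prototypes):
        # n = prototypes in the learning set with the query's class
        n = sum(1 for j, l in enumerate(labels) if j != i and l == labels[i])
        if n == 0:
            continue
        ranked = sorted(
            (j for j in range(len(prototypes)) if j != i),
            key=lambda j: dist(q, prototypes[j]),
        )
        hits = sum(1 for j in ranked[:n] if labels[j] == labels[i])
        scores.append(hits / n)
    return sum(scores) / len(scores)

# Toy check with 1-D "melodies" and absolute-difference distance:
protos = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
labs = ["A", "A", "A", "B", "B", "B"]
print(precision_at_n(protos, labs, lambda a, b: abs(a - b)))  # 1.0
```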
Experiment 1: Melody classification accuracy
Goal: to identify a melody given an
interpretation query
Strings and trees
3-fold cross validation
Train with 2 folds
Test with the remaining fold using leave-one-out
Classified using 1-NN
Experiment 1: Melody classification accuracy
Both learned schemes improve success rates over fixed-cost
systems, most evidently on trees
Experiment 2: Analysis of the Learned Edit Matrices
[Figures: learned edit-cost matrices for strings and trees]
Genetic: by changing the contrast of the image, the learned costs
reveal the structure of the piano keyboard
Stochastic: learned tree costs
Conclusions
Algorithms for edit distance learning applied
Improve results over fixed-cost approaches
Probabilistic and genetic approaches compared
The probabilistic approach is better suited to explaining
the musical variations and/or noise by means of its cost matrices
Perhaps with a bigger corpus the genetic approach
could generate more understandable cost matrices
• But the probabilistic approach would also improve
Future work
Consider more pitch representations, such
as contour, high-definition contour, pitch
classes, and intervals, to evaluate their
adequacy for the melody matching
problem
Work on polyphonic music
By means of trees [Rizo et al. 2008]