Comparison methods
Download
Report
Transcript Comparison methods
Retrieval Methods for
QBSH (Query By Singing/Humming)
J.-S. Roger Jang (張智星)
[email protected]
http://mirlab.org/jang
Multimedia Information Retrieval Lab
CSIE Dept, National Taiwan University
Retrieval Methods for QBSH
Goal
Find the most similar melody in the database
Challenges
Robust pitch tracking for various acoustic inputs
Input from mobile devices
Input at a noisy karaoke box
Comparison methods should be able to deal with…
Key variations in users’ input (for instance, due to gender
difference)
Tempo variations in users’ input
Reasonable response time, e.g., 5 seconds
Evaluation of QBSH Methods
Two criteria for evaluating QBSH methods
Efficiency: How fast is the system?
Can it deal with a music database of size 100K?
Effectiveness: How accurate is the system?
Top-10 recognition rates (RR) for n queries:
• RR = (1+0+0+1+1…)/n
Top-10 mean reciprocal rank (MRR) for n queries:
• MRR = (1/3+1/inf+1/4+1/2+1/5…)/n
True positive and true negative rates to deal with outof-vocabulary (OOV) problem
Examples of RR and MRR
Test specs
Test result
Query set: 10 clips
DB size: 100
No OOV
GT (groundtruth) of the
query set are within DB
GT ranking: [1 3 8 4 9
21 2 5 8 2]
Top-5 RR
(1+1+0+1+0+0+1+1+0+1
)/10 = 6/10 = 60%
Top-5 MRR
Quiz
candidate!
(1/1+1/3+1/∞+1/4+1/∞+1
/∞+1/2+1/5+1/∞+1/2)/10
= 0.2783
Types of QBSH Approaches
Categories of approaches to QBSH
Histogram/statistics-based
Note vs. note
Edit distance
Frame vs. note
HMM
Frame vs. frame
Linear scaling, DTW, recursive alignment
Linear Scaling (LS)
Concept
Scale the query linearly to match the candidates
Assumption
Uniform tempo variation
Rest handling
Quiz
candidate!
Cut leading and trailing zeros (silence)
All the other zeros (rests) are replaced with the
previous non-zero pitch
Linear Scaling
Scale the query pitch linearly to match the
candidates
Target pitch in database
Compressed by 0.5
Compressed by 0.75
Original input pitch
Stretched by 1.25
Stretched by 1.5
Original pitch
Best match
Strength and Weakness of LS
Strength
Typical mapping path
One-shot for dealing
with key transposition
Efficient and effective
Indexing methods
available
Quiz
Weakness
candidate!
Cannot deal with nonuniform tempo
variations
Shorten or Lengthen a Pitch Vector
Given a pitch vector y of length m, how to
shorten or lengthen it to length n?
x2=interp1(1:m, y, linspace(1, m, n));
Examples
m=7, n=13
m=7, n=9
Quiz
candidate!
Distance Function for LS
Commonly used distance function for LS
Normalized Lp-norm
x1 x2 xn
Lp ( x )
n
p
Characteristics
p
p
1/ p
Quiz
candidate!
Usually p=1 or 2 for LS
Normalization to get rid of length variations
Key Transposition in LS
How to find the best transposed query that has
the smallest distance from the database items:
Best transposition
Query
sˆ arg min L p (q s r )
s
Database item
In practice…
Transposed query
ˆ
p 2 s mean(q r ) mean(q ) mean(r )
p 1 sˆ median(q r ) median(q ) median(r )
Example of Linear Scaling via L1 Norm
linScaling01.m
Semitones
Database and input pitch vectors
70
60
Database pitch
Input pitch
50
0
50
100
150
200
250
300
350
Semitones
Database and scaled pitch vectors
70
60
Database pitch
Scaled pitch
50
0
50
100
150
200
250
300
350
Normalized distance
Distance
4
2
0
0.5
0.6
0.7
0.8
0.9
1
1.1
Scaling factor
1.2
1.3
1.4
1.5
Linear Scaling via L1 and L2 Norm
linScaling02.m
Semitones
Database and input pitch vectors
70
60
Database pitch
Input pitch
50
Semitones
0
50
100
150
200
250
300
350
60
Database and scaled pitch vectors
Database pitch
Scaled pitch via L1 norm
50
Scaled pitch via L2 norm
70
0
50
100
150
200
250
300
350
Distances
Normalized distances via L1 & L2 norm
5
L1 norm
L2 norm
0
0.5
0.6
0.7
0.8
0.9
1
1.1
Scaling factor
1.2
1.3
1.4
1.5
DTW (Dynamic Time Warping)
About DTW
DTW introduction
DTW for QBSH
#1 method for task 2 in QBSH/MIREX 2006
RA (Recursive Alignment)
Characteristics
Combine characteristics
of LS & DTW
#1 method for task 1 in
QBSH/MIREX 2006
A typical mapping path
Modified Edit Distance
Note segmentation
Modified edit distance
d i, j
d i 1, j w(a, ) (deletion)
d i , j 1 w(, b j ) (insertion)
min d i 1, j 1 w(ai , b j ) (replacement ) ,
{d
w(ai k 1 ,...., ai , b j ), 2 k i} (consolidation)
i k , j 1
{d i 1, j k w(ai , b j k 1 ,...., b j ), 2 k j} ( fragmentation)