Comparison methods

Download Report

Transcript Comparison methods

Retrieval Methods for
QBSH (Query By Singing/Humming)
J.-S. Roger Jang (張智星)
[email protected]
http://mirlab.org/jang
Multimedia Information Retrieval Lab
CSIE Dept, National Taiwan University
Retrieval Methods for QBSH
Goal
Find the most similar melody in the database
Challenges
Robust pitch tracking for various acoustic inputs
Input from mobile devices
Input at a noisy karaoke box
Comparison methods should be able to deal with…
Key variations in users’ input (for instance, due to gender
difference)
Tempo variations in users’ input
Reasonable response time, e.g., 5 seconds
Evaluation of QBSH Methods
Two criteria for evaluating QBSH methods
Efficiency: How fast is the system?
Can it deal with a music database of size 100K?
Effectiveness: How accurate is the system?
Top-10 recognition rates (RR) for n queries:
• RR = (1+0+0+1+1…)/n
Top-10 mean reciprocal rank (MRR) for n queries:
• MRR = (1/3+1/inf+1/4+1/2+1/5…)/n
True positive and true negative rates to deal with outof-vocabulary (OOV) problem
Examples of RR and MRR
Test specs
Test result
Query set: 10 clips
DB size: 100
No OOV
GT (groundtruth) of the
query set are within DB
GT ranking: [1 3 8 4 9
21 2 5 8 2]
Top-5 RR
(1+1+0+1+0+0+1+1+0+1
)/10 = 6/10 = 60%
Top-5 MRR
Quiz
candidate!
(1/1+1/3+1/∞+1/4+1/∞+1
/∞+1/2+1/5+1/∞+1/2)/10
= 0.2783
Types of QBSH Approaches
Categories of approaches to QBSH
Histogram/statistics-based
Note vs. note
Edit distance
Frame vs. note
HMM
Frame vs. frame
Linear scaling, DTW, recursive alignment
Linear Scaling (LS)
Concept
Scale the query linearly to match the candidates
Assumption
Uniform tempo variation
Rest handling
Quiz
candidate!
Cut leading and trailing zeros (silence)
All the other zeros (rests) are replaced with the
previous non-zero pitch
Linear Scaling
Scale the query pitch linearly to match the
candidates
Target pitch in database
Compressed by 0.5
Compressed by 0.75
Original input pitch
Stretched by 1.25
Stretched by 1.5
Original pitch
Best match
Strength and Weakness of LS
Strength
Typical mapping path
One-shot for dealing
with key transposition
Efficient and effective
Indexing methods
available
Quiz
Weakness
candidate!
Cannot deal with nonuniform tempo
variations
Shorten or Lengthen a Pitch Vector
Given a pitch vector y of length m, how to
shorten or lengthen it to length n?
x2=interp1(1:m, y, linspace(1, m, n));
Examples
m=7, n=13
m=7, n=9
Quiz
candidate!
Distance Function for LS
Commonly used distance function for LS
Normalized Lp-norm
  x1  x2    xn
Lp ( x ) 

n

p
Characteristics
p
p
1/ p




Quiz
candidate!
Usually p=1 or 2 for LS
Normalization to get rid of length variations
Key Transposition in LS
How to find the best transposed query that has
the smallest distance from the database items:
Best transposition
Query


sˆ  arg min L p (q  s  r )
s
Database item
In practice…
Transposed query
 


ˆ
p  2  s  mean(q  r )  mean(q )  mean(r )
 


p  1  sˆ  median(q  r )  median(q )  median(r )
Example of Linear Scaling via L1 Norm
linScaling01.m
Semitones
Database and input pitch vectors
70
60
Database pitch
Input pitch
50
0
50
100
150
200
250
300
350
Semitones
Database and scaled pitch vectors
70
60
Database pitch
Scaled pitch
50
0
50
100
150
200
250
300
350
Normalized distance
Distance
4
2
0
0.5
0.6
0.7
0.8
0.9
1
1.1
Scaling factor
1.2
1.3
1.4
1.5
Linear Scaling via L1 and L2 Norm
linScaling02.m
Semitones
Database and input pitch vectors
70
60
Database pitch
Input pitch
50
Semitones
0
50
100
150
200
250
300
350
60
Database and scaled pitch vectors
Database pitch
Scaled pitch via L1 norm
50
Scaled pitch via L2 norm
70
0
50
100
150
200
250
300
350
Distances
Normalized distances via L1 & L2 norm
5
L1 norm
L2 norm
0
0.5
0.6
0.7
0.8
0.9
1
1.1
Scaling factor
1.2
1.3
1.4
1.5
DTW (Dynamic Time Warping)
About DTW
DTW introduction
DTW for QBSH
#1 method for task 2 in QBSH/MIREX 2006
RA (Recursive Alignment)
Characteristics
Combine characteristics
of LS & DTW
#1 method for task 1 in
QBSH/MIREX 2006
A typical mapping path
Modified Edit Distance
Note segmentation
Modified edit distance
d i, j
d i 1, j  w(a, )              (deletion)


d i , j 1  w(, b j )              (insertion)


 min  d i 1, j 1  w(ai , b j )              (replacement ) ,
 {d
 w(ai k 1 ,...., ai , b j ), 2  k  i}    (consolidation)
 i k , j 1
{d i 1, j k  w(ai , b j k 1 ,...., b j ), 2  k  j}    ( fragmentation)