Transcript qbt

Query by Tapping
敲擊選歌
J.-S. Roger Jang (張智星)
Multimedia Information Retrieval Lab
CS Dept., Tsing Hua Univ., Taiwan
http://mirlab.org/jang
2016/3/27
1
Query by Tapping
Goal:
Music search based on uses’ tapping (at notes’
onsets) over the microphone/keyboard
Characteristics
Only note duration is used for comparison, note
pitch is discarded.
A hard task for human to recognize (which is
different from query by singing/humming)
Try this…
-2-
Query by Tapping
Goal:
Music search based on uses’ tapping (at notes’
onsets) over the microphone/keyboard
Characteristics
Only note duration is used for comparison, note
pitch is discarded.
A hard task for human to recognize (which is
different from query by singing/humming)
Try this…
-3-
Query by Tapping
Challenges:
Users is unlikely to use the same tempo as the
intended song
Users tend to lose notes instead of gaining ones
We have about 13,000 songs in the database
Major approach:
A distance measure based on dynamic
programming
-4-
Feature Extraction via Microphone
Microphone input:
After frame blocking, energy computation,
and thresholding:
-6-
Performance Evaluation of Onset
Detection
simSequence.m
1
Computed
GT
0.8
precision=3/6=0.5
recall=3/5=0.6
f-measure=2pr/(p+r)=0.5455
0.6
0.4
0.2
0
-0.2
-0.4
-0.6
-0.8
-1
0
1
2
3
4
5
6
-7-
Similarity Comparison with Songs in
Database
A fast method based on IOI ratios
Compute the IOI ratios for both query and db IOI
vectors
Compute the Euclidean distance these two ratio
vectors
-8-
Music Note Alignment
t: test (input) IOI vector
r: reference IOI vector
Alignment
by DP
Normalization
r(3)
t(3)
r(2)
t(2)
t(1)
r(1)
t
r
t
r
t
r
-9-
Normalization
Normalization to have
p
q
i 1
j 1
~
~
t
(
i
)


 rq (i)  1000
~
t  round (1000 * t / sum(t ))
~
r  round (1000 * r (1 : q) / sum(r (1 : q)))
q
(Multiplication of 1000 to guarantee high resolution
in fixed-point computation.)

dist (t , r ) 
min
q  p 2~ p  2
~
D( t , ~
rq )
-10-
Dynamic-programming-based Distance
j
t: test IOI vector of length m
r: reference IOI vector of length n
D (i , j )
Recurrent relation:
r(j-1)
D(i, j ) 
r(j-2)
 D(i  1, j  2)  r ( j  2)  r ( j  1)  t (i  1)  1 


min  D(i  1, j  1)  t (i  1)  r ( j  1)

 D(i  2, j  1)  t (i  2)  t (i  1)  r ( j  1)   
2 

r(2)
D(i,1)  0, i  1 ~ m  1
r(1)
t(1) t(2)
t(i-2) t(i-1)
D(1, j )  0, j  1 ~ n  1
i
-11-
Experimental Environment
269 test wave files of tapping clips
9 contributors (7 males, 2 females)
Wave length: 15 seconds
Wave format: PCM, 11025Hz, 8bits, Mono
Start position: Beginning of a song
Environment
Pentium III 800, 256MB RAM
Database
11,744 MIDI files
-12-
Test Results Using Clips of 15 Seconds
Average response time: 3.42
seconds (29.98 notes)
Recognition rates:
Top-1 (top 0.0085%): 15%
Top-10 (top 0.085%): 51%
Top-100 (top 0.85%): 80%
-13-
Error Analysis
Errors analysis of low-ranked clips
Some users cannot tap consistently through 15
seconds
Feature extraction is not robust enough to handle
noisy input.
Some MIDI files are not faithful rendition of the
original tunes.
Users cannot keep up with short consecutive
notes.
-14-
Recog. Rates w.r.t. Tapping Duration
Top-100 and 1000
curves level off after 10
seconds.
Top-100 curve does not
go up monotonically.
Top-1000
Top-100
Top-10
-15-
Demo
No. of MIDI files: 12982
-16-
Partial List of Songs













All I have to do is dream
You are my sunshine
Beautiful Sunday
Do Re Mi
Feelings
A time for us
Love is blue
Let it be me
My way
Love story
More than I can say
Only you
Rain and tears













Rhythm of the rain
Rose Rose I love you
The sound of silence
Unchained melody
We are the world
Yesterday
I just call to say I love you
Close to you
Mr. Lonely
Ben
Hey Jude
Donna Donna
Sealed with a kiss
-17-
Potential Applications
Interactive toys
Beat-tracking training and games
Song retrieval in noisy karaoke bars
-18-
Conclusions
Our MIR system is the first one with queryby-tapping capability.
Rhythm-based search can be used in
conjunction with pitch-contour-based search
to achieve a better recognition rate.
-19-
Future Work
Search scope expansion
How to retrieve MP3 or CD music directly?
Scale-up by hierarchical filtering method
How to deal with database with 100,000 songs?
What if the user tap from anywhere in the middle
of a song?
-20-