Content-Based Retrieval for Music Collections


Comparison of Different Retrieval Models
in Content-Based Music Retrieval
曾元顯、江陳威、白恒瑞、蘇珮君
Presented by Yuen-Hsien Tseng
Digital Media Center
National Taiwan Normal University
• Introduction
• Key Melody Extraction
• Retrieval Models
• Demos
• Experiments
• Conclusions
Introduction
• Desire for Content-based Music Retrieval
– for copyright search
• Industrial interests and economic issues
– for casual users to locate music works
• Entertainment, games, education, life
– for researchers to analyze music collections
• Feedback to music composition theory
• Real-world Applications
– Automatic surveillance of TV or Radio channels for possible
copyright infringement
– Search and buy music via Cell phones (Philips)
– User identification and games (張智星)
• Combining speech and music recognition
Matching Problems in Different Levels
• Audio Signal
– Hi-fidelity Recording
• Search by recording directly from CD (duplicate signal retrieval)
– Noisy or low-quality recording
• Search by recordings from Radio, TV music with cell phones
– Search the same tunes regardless of the same audio
• different pianos, singers (月亮代表我的心, 陶吉吉), instruments
(Canon by Pachelbel, piano vs. violin), orchestras, conductors
• Symbolic Level
– See next page (focus of this work)
• Semantic Level
– Search happy but andante songs
– May need more (manually- or auto-derived) metadata
Major problems
• Vocabulary mismatch: the major failure in IR tasks
– The terms used in the query do not occur in the desired documents
(or they match undesired ones)
• Key mismatch: melodies may be sung in any key
• Pitch error: incorrect perception of the pitch level
• Note insertion and deletion: imperfect recall
• Fragmentation, consolidation, etc.: perception error or different interpretation
• Query formulation: melody, rhythm, chord, etc.
• Result “browsing”: music is for listening, not for reading
Mismatch Examples
• Key mismatch
– happy birthday (by a man)   : 1 1 2 1 4 3
– happy birthday (by a woman) : 5 5 6 5 1 7
• Pitch error
– Correct melody         : 1 1 2 1 4 3
– Query given by a user  : 1 1 2 1 5 3
• Note insertion or deletion
– one user gives: 5 6 5 3 3 3 2 3 4 3
– another gives: 5 6 5 3 3 4 3 2 3 4 3
• Fragmentation
– original          : 3 1 3 1 3 5 5 3 1 3 2
– may be played as  : 3 3 1 3 3 1 3 3 5 5 5 3 1 3 2
Previous Work (1/3)
• Acoustic feature extraction for similarity matching
– loudness, pitch, brightness, bandwidth, harmonicity
– all computed from the music waveform
– search method: mostly query by example
– acoustic features are not familiar to ordinary users
– Ref: Pfeiffer, et al. ['96], Wold, et al. ['96], Foote ['97]
• Hawley [‘90] developed a system:
– allows MIDI keyboard input
– matches tunes from their beginnings
• Ghias et al [‘95] developed a system:
– allows queries from a microphone
– converts into melodic contour of 3 symbols for key-invariance
– approximate matching against 183 songs allowing note deletion,
insertion, and transposition
Previous Work (2/3)
• McNab, et al [‘97]: New Zealand DL’s MELody inDex
– also allows acoustic input by singing, humming, or playing
– melody transcription, pitch encoding, approximate matching
– allows matches by "interval and rhythm", "contour and rhythm", …
– matching 9,400 tunes by dynamic programming takes 21 seconds
– http://www.cs.waikato.ac.nz/~nzdl/meldex
• David Huron [‘93] "Humdrum Toolkit”
– hundreds of music tools including a DP matching tool “simil”
– http://dactyl.som.ohio-state.edu/Humdrum/
• Huron and Kornstaedt's "Themefinder"
– allows searches by pitch, interval, contour, etc.
– http://www.themefinder.org/
• Typke's “Tune Server”
– JavaApplet tune recognizer : convert WAV to Parsons' Code (UDR)
– http://wwwipd.ira.uka.de/tuneserver/ (over 10000 tunes)
Previous Work (3/3)
• More recent update at scholar.google.com
• This work is based on our previous one in SIGIR
1999 with:
– More retrieval models compared
– Experiment re-done and re-confirmed
• Yuen-Hsien Tseng, " Content-Based Retrieval for
Music Collections," Proceedings of the 22nd
International ACM SIGIR Conference on Research
and Development in Information Retrieval - SIGIR
'99, Aug. 15-19, Berkeley, U.S.A., 1999, pp.176-182.
Strategies to Overcome
the Major Problems
• Key-invariant encoding to allow queries in any key
• Error-tolerant retrieval model for random pitch
errors and random note insertion and deletion
• Query suggestion for fragmentation and
consolidation
• Flexible and friendly interface for query formulation
and relevance feedback
• Key melody extraction for improving retrieval and
browsing
Key Melody Extraction
• Key melodies:
– representative fragments of music
– like keywords, they give a quick overview of documents
– memorable parts that people easily recall
– meet most user queries
– reduce response time due to less data
– speed up the process of browsing and selection
• Assumption: key melodies are repeated patterns
– repetition is one of the basic composing rules [Jones, 1974]
• We developed an overlapping repetition identification method
– so far no linear-time algorithm has been found [Crawford & Iliopoulos, 1998]
Key Melody Extraction: The Algorithm
• Step 1: Convert the input string into a LIST of 2-note sequences.
• Step 2: Merge adjacent m-notes into (m+1)-notes according to
the merge and accept rules until no sequences remain to be merged
– Merge rule:
For any two adjacent sequences K1 and K2, if both of their occurring
frequencies are greater than a threshold, they can be merged.
– Accept rule:
If the occurring frequency of K1 is greater than a threshold and K1 did not
merge with its neighbors, then accept K1 as a key melody
• Step 3: Filter and sort the results according to some criteria.
• Complexity : O(n · m), if looking up an occurrence frequency is O(1)
– n : the length of the melody
– m : the length of the longest repeated pattern
Key Melody Extraction: An Example
Assume input string "CCCDCCCECC", threshold = 1, separator = X.
Step 1: create the list of 2-note sequences
L =(CC:5, CC:5, CD:1, DC:1, CC:5, CC:5, CE:1, EC:1, CC:5)
Step 2: merging
After 1st iteration: merge L into L1=(CCC:2, X:0, CCC:2, X:0),
drop : (CC:5, CC:5, CD:1, DC:1, CC:5, CC:5, CE:1, EC:1),
FinalList : (CC:5)
After 2nd iteration: merge L1 into L2 = (X:0),
drop : (X:0, X:0), FinalList : (CC:5, CCC:2)
After 3rd iteration: merge L2 into L3 = ( ),
drop : (X:0),
FinalList : (CC:5, CCC:2)
Step 3 : filtering
the item CC may be removed since it is a substring of CCC.
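To make the merge/accept procedure concrete, here is a minimal Python sketch. It assumes a single monophonic melody given as a string of note symbols and recomputes occurrence frequencies on the fly (the real system looks them up in O(1) time); the positional separators (the X entries above) are omitted for brevity, so spurious merge candidates can appear, but they are discarded by the later frequency check and the Step 3 substring filter. The names key_melodies and freq are illustrative, not from the original system.

```python
def key_melodies(melody, threshold=1):
    """Sketch of repeated-pattern (key melody) extraction; melody is a note string."""
    def freq(pattern):
        # occurrence count of pattern in the whole melody
        return sum(melody[i:i + len(pattern)] == pattern
                   for i in range(len(melody) - len(pattern) + 1))

    # Step 1: the list of all 2-note sequences
    current = [melody[i:i + 2] for i in range(len(melody) - 1)]
    accepted = []

    # Step 2: repeatedly merge adjacent m-notes into (m+1)-notes
    while current:
        longer = []
        for k1, k2 in zip(current, current[1:]):
            if freq(k1) > threshold and freq(k2) > threshold:
                longer.append(k1 + k2[-1])        # merge rule
            elif freq(k1) > threshold:
                accepted.append(k1)               # accept rule (simplified)
        if freq(current[-1]) > threshold:         # the last element can only be accepted
            accepted.append(current[-1])
        current = longer

    # Step 3: drop patterns that are substrings of longer accepted patterns
    accepted = sorted(set(accepted), key=len, reverse=True)
    return [p for p in accepted
            if not any(p != q and p in q for q in accepted)]

print(key_melodies("CCCDCCCECC", threshold=1))    # ['CCC'], as in the example above
```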
Key Melody Extraction: Results
• MIDI files are collected from the Internet
• Each MIDI file contains several tracks
• Each track contains the staff information of one instrument
• Currently only the melody of each track is extracted
                                                   Chinese Pop Music   Classic Music
No. of Songs                                               135                30
No. of Melodies (Tracks)                                  1052               203
Average No. of Notes in Melodies                           746               973
Average No. of Notes in Longest Repeated Melodies          149               135
Ratio of the Key Melody Length                       149/746 = 20%    135/973 = 14%
A Music Retrieval Architecture
[Architecture overview] MIDI collections → melody extraction → melody strings → key melody extraction → key melody strings. Both the melody strings and the key melody strings go through encoding & indexing to build the melody index and the key melody index, which serve query & response. Key melody strings are also converted back to MIDI (MIDI reconstruction → key melody MIDI) for playback, alongside MIDI accessing of the original files. The user interface supports query editing, result browsing, MIDI uploading, relevance feedback, and query expansion & suggestion.
Pitch Profile Encoding
• To allow queries in any key (pitch level)
• Use the following symbols: Pn, Mn, and R
– R : denotes a repetition of the previous note
– P : an ascending note compared to the note that precedes it
– M : a descending note
– n : the number of semitones ascended or descended
• Example: Happy Birthday
– Standard Pitch Notation: C4 C4 D4 C4 F4 E4
– Simplified Notation: 1 1 2 1 4 3
– Pitch Profile Encoding : C4 R P2 M2 P5 M1
• Error ratio is 2 -- a drawback: m random errors caused by pitch errors
or note insertions or deletions in the original string result in at most
2m errors in the encoded string
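A minimal sketch of this encoding, assuming the melody is given as a list of MIDI note numbers (C4 = 60); the function name pitch_profile is illustrative.

```python
def pitch_profile(midi_notes):
    """Encode a note sequence into the key-invariant R / Pn / Mn symbols."""
    symbols = []
    for prev, cur in zip(midi_notes, midi_notes[1:]):
        diff = cur - prev
        if diff == 0:
            symbols.append("R")            # repetition of the previous note
        elif diff > 0:
            symbols.append(f"P{diff}")     # ascending by diff semitones
        else:
            symbols.append(f"M{-diff}")    # descending by -diff semitones
    return symbols

# Happy Birthday opening (C4 C4 D4 C4 F4 E4):
print(pitch_profile([60, 60, 62, 60, 65, 64]))   # ['R', 'P2', 'M2', 'P5', 'M1']
```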
Retrieval Model (Vector Space)
• Vector space model:
– Documents and queries are represented by vectors
– Each element in a vector is determined by an indexing scheme
– The value of each element is determined by a weight scheme
• The similarity between document di and query qj :
Sim(d_i, q_j) = ( Σ_{k=1..T} d_{i,k} · q_{j,k} ) / ( Σ_{k=1..T} q_{j,k} )
where T is the number of index terms, d_{i,k} is the weight of term k in document d_i,
and q_{j,k} is the weight of term k in query q_j
• Using inverted file, the query evaluation can be
computed in sub-linear time
Indexing and Weight Schemes
• N-gram (n-note) indexing scheme:
– An n-gram usually refers to any n consecutive characters
– Here the n-gram method is applied at the note level (thus, n-note indexing)
– In our n-note approach, all m-notes, where 1 <= m <= n, are indexed
• Weighting scheme:
– Shorter m-notes still match the query in the presence of random errors,
while longer m-notes favor documents whose melodies match the query more accurately.
– The weight of each m-note in queries is given by
tf * ( 2(m-1) + 1 )
– The weight of each m-note in documents takes the value of
1 or 0, indicating its presence in the documents or not.
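A minimal sketch combining the n-note indexing, the weighting scheme above, and the vector-space similarity from the previous slide. It assumes each melody is already a list of encoded symbols; the names ngrams, build_index, and search are illustrative, not from the original system.

```python
from collections import Counter, defaultdict

def ngrams(symbols, n):
    """All m-note subsequences for 1 <= m <= n (the n-note indexing scheme)."""
    return [tuple(symbols[i:i + m])
            for m in range(1, n + 1)
            for i in range(len(symbols) - m + 1)]

def build_index(docs, n=3):
    index = defaultdict(set)                      # term -> doc ids (document weight is 1/0)
    for doc_id, symbols in docs.items():
        for term in ngrams(symbols, n):
            index[term].add(doc_id)
    return index

def search(query_symbols, index, n=3):
    q_terms = Counter(ngrams(query_symbols, n))
    norm = sum(tf * (2 * (len(t) - 1) + 1) for t, tf in q_terms.items())
    scores = Counter()
    for term, tf in q_terms.items():
        weight = tf * (2 * (len(term) - 1) + 1)   # query weight: tf * (2(m-1) + 1)
        for doc_id in index.get(term, ()):        # only documents containing the term
            scores[doc_id] += weight
    return [(doc_id, s / norm) for doc_id, s in scores.most_common()]

docs = {"happy_birthday": ["R", "P2", "M2", "P5", "M1"]}
index = build_index(docs)
print(search(["R", "P2", "M2", "P5"], index))     # [('happy_birthday', 1.0)]
```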
Retrieval Model:
Basic Dynamic Programming
• DP algorithm: used for comparison (from Sankoff & Kruskal, 1983)
• d[i, j] = min( d[i-1, j] + w(Q[i], 0),
                 d[i-1, j-1] + w(Q[i], D[j]),
                 d[i, j-1] + w(0, D[j]) )
– d[i, j] : accumulated distance between sequences Q and D
– w(Q[i], D[j]) : cost of substituting Q[i] with D[j]
– w(Q[i], 0) : cost of inserting Q[i]
– w(0, D[j]) : cost of deleting D[j]
– each cell d[i, j] depends on d[i-1, j-1], d[i-1, j], and d[i, j-1]
– Initial conditions:
  d[0, 0] = 0
  d[i, 0] = d[i-1, 0] + w(Q[i], 0) , 1 <= i <= n
  d[0, j] = d[0, j-1] + w(0, D[j]) , 1 <= j <= m
• Complexity: O(n * m), n : length of Q, m : length of D
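A minimal Python sketch of this recurrence, assuming unit insertion/deletion costs and a substitution cost of 0 for identical symbols and 1 otherwise; edit_distance is an illustrative name.

```python
def edit_distance(Q, D):
    """Basic DP: accumulated distance between the whole of Q and the whole of D."""
    n, m = len(Q), len(D)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):                      # d[i, 0] = d[i-1, 0] + w(Q[i], 0)
        d[i][0] = d[i - 1][0] + 1
    for j in range(1, m + 1):                      # d[0, j] = d[0, j-1] + w(0, D[j])
        d[0][j] = d[0][j - 1] + 1
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if Q[i - 1] == D[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # insert Q[i]
                          d[i - 1][j - 1] + sub,   # substitute Q[i] with D[j]
                          d[i][j - 1] + 1)         # delete D[j]
    return d[n][m]

# One pitch error between two encoded melodies gives distance 1:
print(edit_distance(["R", "P2", "M2", "P5", "M1"],
                    ["R", "P2", "M2", "P7", "M1"]))   # 1
```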
Dynamic Programming: Examples
D:   a  d  e  c  d  e  c  f
Q
a    0  1  2  3  4  5  6  7
d    1  0  1  2  2  3  4  5
e    2  1  0  1  2  2  3  4
c    3  2  1  0  1  2  2  3
(Assume the costs of substitution, insertion, and deletion are all 1.)
Fail if the query string is not at the beginning: for Q = a d e c against
D = a c d a d e c f, the query occurs only in the middle of D, so with these
initial conditions the accumulated distance d[n, m] stays large and the match
fails. [DP table omitted]
Sliding-Window Dynamic Programming
• Windowed DP
– Match only document segments of the same length
– Same-length match slides through the document
– Complexity: O(n^2 * m), n : length of Q, m : length of D
Example: matching Q = a d e c against D = a c d a d e c f gives window
scores 3 2 2 0 2; the 0 marks the window where the query occurs exactly.
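A minimal sketch of windowed matching, reusing the edit_distance sketch above: the query is compared against every document segment of the same length, so a match can start anywhere in the document.

```python
def windowed_dp(Q, D):
    """Distance between Q and every length-|Q| window of D (sliding-window DP)."""
    n = len(Q)
    return [edit_distance(Q, D[i:i + n]) for i in range(len(D) - n + 1)]

scores = windowed_dp("adec", "acdadecf")
print(scores.index(min(scores)), min(scores))   # window 3 (0-based) matches exactly: 0
```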
Modified Dynamic Programming
• Change the initial condition
– d[0, 0] = 0
– d[i, 0] = d[i-1, 0] + w(Q[i], 0) , 1 <= i <= n
– d[0, j] = d[0, j-1] + w(0, D[j]) , 1 <= j <= m
  is replaced by d[0, j] = 0 , 1 <= j <= m
Example: Q = a d e c against D = a c d a d e c f. With d[0, j] = 0 the
accumulated distance drops to 0 where the embedded occurrence of "a d e c"
ends (the 7th note of D), so the best match is read from the minimum of the
last row. [DP table omitted]
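A minimal sketch of the modified DP, with the same unit costs as before: d[0, j] = 0 lets a match start anywhere in D, and taking the minimum over the last row lets it end anywhere, so the query is matched as an approximate substring of the document melody. substring_distance is an illustrative name.

```python
def substring_distance(Q, D):
    """Modified DP: best approximate occurrence of Q anywhere inside D."""
    n, m = len(Q), len(D)
    d = [[0] * (m + 1) for _ in range(n + 1)]   # d[0, j] = 0 (modified initial condition)
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + 1               # d[i, 0] unchanged
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if Q[i - 1] == D[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i - 1][j - 1] + sub, d[i][j - 1] + 1)
    return min(d[n])                            # the match may end at any position of D

print(substring_distance("adec", "acdadecf"))   # 0: the query occurs inside the document
```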
Demos
Search Result: an Example
Experiment (1999, 2000)
• Eight retrieval modes are compared:
Retrieval mode   Collection    Pitch Encoding   Indexing
1                Key melody    No               2-note
2                Original      No               2-note
3                Key melody    Yes              2-note
4                Original      Yes              2-note
5                Key melody    Yes              3-note
6                Original      Yes              3-note
7                Key melody    Yes              No, match by WDP
8                Original      Yes              No, match by WDP
Methodology
• Subjects:
– 8 subjects were invited
– 3 of them had a few years of training on piano
– 1 had a year of training on guitar
– 4 had only minimal musical training from school education
• Procedures:
– Each subject is presented a title list of 30 classic music pieces and
135 Chinese pop songs
– Queries are made based on their recall of the chosen songs
– The top 100 results are examined; if a query fails, the cause of the
mismatch is explained and the query is modified accordingly by the subject
– Before the formal experiment, subjects are allowed to practice the
procedure with 2 songs
Queries: for classic music
Song 1 (type I, I; length 10):
  G5 A5 G5 E5 E5 E5 D5 E5 F5 E5
  5 6 5 3 3 3 2 3 4 3
Song 2 (type M, I; length 15):
  C6 C6 D6 D6 C6 C6 A5 A5 A5 A5 A5 G5 A5 B5 A5
  1+ 1+ 2+ 2+ 1+ 1+ 6 6 6 6 6 5 6 7 6
Song 3 (type I, I; length 6; "lucky case"):
  G5 E5 C5 E3 G5 C6
  5 3 1 3 5 1+
Song 4 (type M, I; length 12):
  As5 G5 D5 G5 As5 C6 E6 D6 C6 E6 As5
  5 3 1 3 5 1+ 3+ 2+ 7 1+ 3+ 5
Song 5 (type M, I; length 6):
  As5 A5 As5 D6 C6 As5
  1 7- 1 3 2 1
Song 6 (type M, I; length 6):
  B4 Gs4 Fs4 D4 Fs4 Gs4
  5 3 2 1 2 3
Song 7 (type M, I; length 6):
  Gs6 Gs6 Gs6 Fs6 E6 B6
  3 3 3 2 1 5
Queries: for Chinese pop music
Song 8 (type M, M; lengths 15 and 30; "lucky case"):
  E5 D5 C5 C5 D5 E5 G5 A5 B5 C6 C6 A5 G5 E5 G5
  3 3 2 2 1 1 1 1 2 2 3 3 5 5 6 6 7 7 1+ 1+ 1+ 1+ 6 6 5 5 3 3 5 5
Song 9 (type M, M; lengths 12 and 24):
  C6 C6 C6 C6 A5 G5 A5 F5 F5 G5 A5 D5
  5 5 5 5 5 5 5 5 3 3 2 2 3 3 1 1 1 1 2 2 3 3 6- 6-
Song 10 (type M, M; lengths 14 and 28):
  Fs6 A6 B6 B6 A6 A6 Fs6 E6 D6 E6 Fs6 E6 D6 B5
  3 3 5 5 6 6 6 6 5 5 5 5 3 3 2 2 1 1 2 2 3 3 2 2 1 1 6- 6-
Song 11 (type M, M; lengths 7 and 14):
  E5 G5 E5 E5 D5 E5 E5
  3 3 5 5 3 3 3 3 2 2 3 3 3 3
Song 12 (type I, I; length 13):
  C6 D6 E6 G6 G6 G6 E6 D6 D6 D6 D6 E6 D6
  1 2 3 5 5 5 3 2 2 2 2 3 2
Song 13 (type M, I; length 5; "need query suggestion"):
  G5 D6 D6 E6 C6
  5 2 2 3 1
Song 14 (type M, I; length 10):
  A5 E5 G5 G5 E5 D5 C5 D5 E5 A5
  6 3 5 5 3 2 1 2 3 6
Song 15 (type M, I; length 14):
  C5 C5 C5 C5 C5 D5 C5 D5 D5 D5 D5 D5 G5 A5
  1 1 1 1 1 2 1 2 2 2 2 2 5 6
Song 16 (type M, I; length 11):
  C5 C5 C5 A5 G5 D5 D5 D5 E5 D5 C5
  1 1 1 6 5 2 2 2 3 2 1
Song 17 (type M, I; length 13):
  C7 D7 E7 E7 E7 E7 E7 E7 E7 D7 C7 D7 E7
  1 2 3 3 3 3 3 3 3 2 1 2 3
Song 18 (type M, I; length 8):
  Fs5 Gs5 B5 Cs6 Cs6 B5 B5 B5
  5 6 1+ 2+ 2+ 1+ 1+ 1+
Song 19 (type M, M; length 10):
  C5 C5 D5 D5 E5 E5 F5 F5 A4 A4
  1 1 2 2 3 3 4 4 6- 6-
Song 20 (type M, I; length 14):
  G5 D6 D6 E6 C6 G5 D6 D6 E6 C6 E6 E6 B5 D6
  1 5 5 6 4 1 5 5 6 4 6 6 3 5
Hit Positions
[Table: for each of the 8 retrieval modes (rows) and each of the 20 query
songs (columns), the rank at which the target song was retrieved; ranks
range from 1 to several hundred. Summarized by the pairwise comparison below.]
Comparisons between any 2 modes: number of cases in which mode i (row)
outperforms mode j (column).
[8 × 8 matrix of pairwise win counts]

No. of times at top rank, by mode:
Mode:                1   2   3   4   5   6   7   8
Times at top rank:  12   6   8   2  12   8  14  13
Experiment (2006)
• Ten retrieval modes are compared
• Two CS students re-do the experiment
Retrieval mode   Collection    Pitch Encoding   Indexing
1                Key melody    No               2-note
2                Original      No               2-note
3                Key melody    Yes              2-note
4                Original      Yes              2-note
5                Key melody    Yes              3-note
6                Original      Yes              3-note
7                Key melody    Yes              No, match by WDP
8                Original      Yes              No, match by WDP
9                Key melody    Yes              No, match by MDP
10               Original      Yes              No, match by MDP
Hit Positions
[Table: for each of the 10 retrieval modes and each of the 20 query songs,
the rank at which the target song was retrieved; summarized by the pairwise
comparison below.]
Comparisons between any 2 modes: number of cases in which mode i (row)
outperforms mode j (column).
[10 × 10 matrix of pairwise win counts]

No. of times at top rank, by mode:
Mode:               1   2   3   4   5   6   7   8   9  10
Times at top rank:  7   4   2   2   7   6   9   5  12  10
Search Time Comparison
                  Classics                        Chinese pop songs
Mode      Num. of Doc.   Avg Seconds      Num. of Doc.   Avg Seconds
1-6       203-444        < 1              1183-1941      < 1
7         444            23.9             1941           281
8         203            108.6            1183           X
9         444            2.7              1941           24
10        203            12               1183           88
Results from the experiments
• Key melody search is better than direct melody search
• Longer n is better for n-note indexing
– longer n yields more index terms for the same collection
• Exact pitch encoding is better
– only for the same n-note indexing
– only suitable for skilled users: only 3 queries were given in the
correct key on the first attempt
• Dynamic programming is slightly better, but requires a large amount
of evaluation time
– on average, 89 seconds for 444 classic key melodies, and
536 seconds for 203 classic melodies
– For n-note indexing with inverted files, only 1 or 2 seconds
Number of Index Terms
Collection                      No. of melodies   Terms, 2-note indexing   Terms, 3-note indexing
Classic music                   203               4092                     23482
Key melodies of classic music   444               1287                     4458
Pop music                       1183              8143                     63213
Key melodies of pop music       1941              4720                     18440
Conclusions
• Key-invariant encoding allows queries in any key
• Random pitch errors, note insertion, and note deletion
can be overcome by carefully designed indexing,
weighting, and retrieval schemes without resorting to the much more
time-consuming DP matching
• Burst errors due to fragmentation or other music
factors require query suggestion
– neither DP nor our approach performs well for this case
– 5 out of 13 cases in pop music require query reformulation
due to fragmentation
Future Work
• More types of mismatches to analyze for automatic
query candidate suggestion
• Effective melody extraction for polyphonic music
– a monophonic melody is assumed in this work: no notes occur simultaneously
– it is harder to form a query containing simultaneous notes, so there is
more discrepancy between queries and polyphonic melodies
– Uitdenbogerd, et al. [1998] discussed effective algorithms for
extracting monophonic melodies from polyphonic ones
• Automatic key melody extraction allowing variations
– repeated melodies with small variations enrich the melodies
– Bakhmutova, et al. [1997] presented a semi-automatic
procedure for revealing similar melodies
Related Researches and Projects
• ISMIR since 2000
– International Symposium on Music Information Retrieval
• WOCMAT since 2005
– Workshop On Computer Music and Audio Technology
• Digital archive application
– Data mining in digital music archive
• Free music audio, sound processing tools, and music-related visualization and mining tools
– http://www.music-ir.org/evaluation/tools.html
• Music IR evaluation since 2005
– http://www.music-ir.org/mirexwiki/index.php/Main_Page
– Test collection: music documents, query sets, and judgment
– Major hurdle: copyright issue
Research Opportunities
• Apply language model in text retrieval to MIR
– May need music composition theory
• Music mining
– Key feature extraction
– Summarization
– Clustering (detecting similar inter- or intra-patterns)
– Classification (genre, emotion)
– Annotation
– Aggregated analysis
• Applications
– Games, entertainment, education, culture renovation, and research
– Innovative, multimedia, multimodal, and interdisciplinary