Polyphonic Queries
A Review of Recent Research
by Cory Mckay
Overview
Introduction to polyphonic music queries
Review of five recent systems
Conclusions
Melodic Fragments as Queries
Melodic fragments are a useful way of
searching electronic music databases
Convenient for humans to remember
musical phrases
Easy to enter musical fragments
Monophonic: query by humming
Polyphonic: notation software
Search Requirements
Want searches to return all occurrences of a
given set of notes in a database
May also want records that contain similar sets
of notes
Should be able to deal with incomplete or
partially erroneous queries
Ideally, want to search music stored in
Symbolic formats (MIDI, scores, sketches)
Raw audio data
Monophonic Databases
There has been some success with the special
case of monophonic databases
Andreas Kornstadt’s Themefinder
Roger J. MacNab’s Meledex
Searching polyphonic databases is much
more difficult
Polyphonic Databases: Problems
Notes may begin simultaneously, making it
impossible to outline an unambiguous sequence
of events
Multiple voices, with varying roles and relevance
to particular queries
Hard to deal with both symbolic and raw audio
representations:
Monophonic: can just transcribe audio
Polyphonic: no good transcription system
Current Polyphonic Systems
Widespread systems rely on meta-data
This is no good for searches of melodic
fragments
Currently no widely accepted content-based system
A number of papers on topic have recently
been published
Wiggins et al. (2002)
Designed a new algorithm called SIA(M)ESE
For making transposition-invariant queries
Matches a query even when intervening events in
the searched score separate the musical events
in the query
Assumes that both queries and database files
are in symbolic form and are accurate
This limits generality of algorithm
No implementation of the algorithm given
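The paper gives no implementation, so the following is only an illustrative sketch of transposition-invariant, gap-tolerant point-set matching in the same spirit (function and variable names are mine, not the authors'). Notes are treated as (onset, pitch) points, and a query matches wherever a single translation vector maps its points onto score points, regardless of intervening score events.

from collections import Counter

def best_translations(query, score, top=3):
    # Count, for each candidate (time shift, pitch shift), how many query
    # points it maps onto score points.  A count equal to len(query) is a
    # complete transposition-invariant match that tolerates extra score
    # events between the matched notes.
    votes = Counter()
    for qt, qp in query:
        for st, sp in score:
            votes[(st - qt, sp - qp)] += 1
    return votes.most_common(top)

# Toy example: the query is the opening motif transposed up two semitones,
# and an unrelated score event lies between the matched notes.
score = [(0, 60), (1, 64), (1.5, 70), (2, 67)]
query = [(0, 62), (1, 66), (2, 69)]
print(best_translations(query, score))   # the shift (0, -2) accounts for all 3 query points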
Doraisamy & Rüger (2001)
Polyphonic queries
Partial queries permitted
Symbolic queries
Symbolic records
Pitch and rhythm features
Doraisamy & Rüger (2001)
N-grams are produced by converting
notes into an interval-based representation
and grouping the intervals into subdivisions of
length n using a gliding window (sketched below)
Leads to transposition-invariant data
N-grams work well with monophonic
queries
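A minimal sketch of the interval n-gram idea (illustrative only, not the authors' code): pitches become intervals, and a gliding window of length n over the interval sequence yields transposition-invariant n-grams.

def interval_ngrams(pitches, n=3):
    # Successive pitch differences, then every window of n intervals.
    intervals = [b - a for a, b in zip(pitches, pitches[1:])]
    return [tuple(intervals[i:i + n]) for i in range(len(intervals) - n + 1)]

# The same phrase in two keys yields identical n-grams:
print(interval_ngrams([60, 62, 64, 65, 67]))   # [(2, 2, 1), (2, 1, 2)]
print(interval_ngrams([62, 64, 66, 67, 69]))   # same n-grams from the transposed input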
Doraisamy & Rüger (2001)
To deal with the potential for simultaneous
note onsets in polyphonic music, the system
constructs exhaustive “melodic strings” (sketched below):
Divide piece into overlapping windows of n
adjacent onset times
Find all possible combinations of onsets
within each window
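The windowing step can be illustrated as follows (my own naming, not the published code): notes are grouped by onset time, a window of n adjacent onsets glides over the piece, and every monophonic path through a window, one note per onset, becomes a candidate melodic string.

from itertools import product

def melodic_strings(notes, n=3):
    # notes: list of (onset, pitch); returns every n-note pitch combination
    # obtained by picking one note per onset inside each window.
    by_onset = {}
    for onset, pitch in notes:
        by_onset.setdefault(onset, []).append(pitch)
    onsets = sorted(by_onset)
    strings = []
    for i in range(len(onsets) - n + 1):
        window = [by_onset[t] for t in onsets[i:i + n]]
        strings.extend(product(*window))          # all onset combinations
    return strings

# Two simultaneous notes at the second onset give two candidate strings:
print(melodic_strings([(0, 60), (1, 64), (1, 67), (2, 65)], n=3))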
Doraisamy & Rüger (2001)
Incorporates rhythmic information as well as
intervals into each n-gram window
Done by calculating the ratio of time differences
between adjacent note onsets (see the sketch below):
Ratio_i = (Onset_{i+2} - Onset_{i+1}) / (Onset_{i+1} - Onset_i)
Avoids need to quantize events based on a
predetermined beat duration
By using onsets only, avoids needing to
determine the duration of notes, which can be
difficult to detect in raw audio recordings
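The ratio computation itself is simple; a small illustrative sketch:

def onset_ratios(onsets):
    # Ratio_i = (onset[i+2] - onset[i+1]) / (onset[i+1] - onset[i]):
    # rhythm as the ratio of successive inter-onset intervals, so no beat
    # quantization or note-duration detection is needed.
    return [(onsets[i + 2] - onsets[i + 1]) / (onsets[i + 1] - onsets[i])
            for i in range(len(onsets) - 2)]

print(onset_ratios([0.0, 0.5, 1.0, 2.0]))   # [1.0, 2.0]: steady, then a doubling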
Doraisamy & Rüger (2001)
N-grams converted into text-based
representations to allow the use of text-based
search engines (see the sketch below)
Interval and rhythmic ratio histograms
constructed in order to search for patterns
in each piece
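The paper's exact character mapping is not reproduced here; the sketch below only illustrates the general idea of binning intervals and ratios and mapping the bins to characters, so that each n-gram becomes a "word" an ordinary text search engine can index.

def encode_ngram(intervals, ratios, max_interval=12):
    # Hypothetical encoding: clamp each interval to +/- one octave and map it
    # to a letter; crudely bin each rhythm ratio to a digit.
    chars = []
    for iv in intervals:
        iv = max(-max_interval, min(max_interval, iv))
        chars.append(chr(ord('A') + iv + max_interval))   # 'A'..'Y'
    for r in ratios:
        chars.append(str(min(9, int(round(r)))))          # digit bin 0..9
    return ''.join(chars)

print(encode_ngram([2, 2, 1], [1.0, 2.0]))   # 'OON12'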
Doraisamy & Rüger (2001)
Tested using a database of 3096 MIDI
representations of classical music
Studied effects of varying window sizes, bin
ranges and query lengths
95% success rate with window sizes of 4 onset
times, variable bin ranges and queries involving
50 notes
74% with query lengths of ten notes
65% with queries containing errors
Doraisamy & Rüger (2001)
Evaluation:
Performs well under ideal conditions
Databases or queries containing raw audio not
considered
Only transposition-invariant searches possible
Undesirably long query lengths necessary
Only classical music records tested
Successful in showing the potential utility of n-grams
Most recent work uses a variant of this n-gram
approach
Doraisamy & Rüger (2002)
Monophonic queries
Partial queries permitted
Symbolic queries
Symbolic records
Pitch and rhythm features
Doraisamy & Rüger (2002)
Focused on monophonic queries because
of query by humming
N-grams particularly appropriate for error-prone queries resulting from query by
humming
One or two mistakes only lead to a few
incorrect n-grams among a larger number
of correct ones
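The error-tolerance argument can be checked with a toy example (illustrative only): with trigram windows, one wrong pitch corrupts only the n-grams that overlap it, and the rest of the query still matches.

def trigrams(pitches):
    iv = [b - a for a, b in zip(pitches, pitches[1:])]
    return [tuple(iv[i:i + 3]) for i in range(len(iv) - 2)]

correct = [60, 62, 64, 65, 67, 69, 71, 72, 74, 76, 77, 79, 81, 83, 84]
hummed = list(correct)
hummed[4] = 68                       # one note sung a semitone sharp

correct_set = set(trigrams(correct))
good = sum(g in correct_set for g in trigrams(hummed))
print(f"{good} of {len(trigrams(hummed))} query trigrams still match")   # 8 of 12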
Doraisamy & Rüger (2002)
Used same overall design as previous
system
Includes more sophisticated error models
to test effects of query inaccuracies
Database expanded to include popular
music as well as classical music
80% of the relevant compositions were
returned in the first 15 hits, on average
Doraisamy & Rüger (2002)
Evaluation
N-grams are an effective and error-tolerant
tool for searching polyphonic music with
monophonic queries
Improvements still need to be made
Queries still symbolic, so true applicability of
system to query by humming not tested
Pickens et al. (2002)
Polyphonic queries
Full-length queries only
Audio queries
Symbolic records
Harmonic features
Pickens et al. (2002)
Relies on transcription algorithms to transform
audio queries into symbolic form
Relies on error tolerance of searches to
compensate for transcription errors
Two types of polyphonic transcription systems
used, including a blackboard system
Pickens et al. (2002)
Transcribed queries and database records
analyzed and stored using a harmonic modelling
module
Characterizes pieces by mapping chords to a
probability distribution
Breaks each piece into sequences of independent note sets
Applies smoothing procedure to sets
Markov models created from smoothed sets
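The following is only a schematic sketch of that pipeline under simplifying assumptions (a four-chord lexicon and my own scoring, not the authors' harmonic model): each set of simultaneous notes is mapped to a smoothed probability distribution over chords, and a first-order Markov model is estimated from the sequence of distributions.

import numpy as np

CHORDS = {"C": frozenset({0, 4, 7}), "F": frozenset({5, 9, 0}),
          "G": frozenset({7, 11, 2}), "Am": frozenset({9, 0, 4})}

def chord_distribution(notes, smoothing=0.1):
    # Score each chord by shared pitch classes, then normalize (with
    # add-constant smoothing) into a probability distribution.
    pcs = {n % 12 for n in notes}
    scores = np.array([len(pcs & chord) + smoothing for chord in CHORDS.values()])
    return scores / scores.sum()

def markov_model(note_sets):
    # Soft transition counts between chords, accumulated from the smoothed
    # distributions of adjacent note sets, then row-normalized.
    dists = [chord_distribution(ns) for ns in note_sets]
    k = len(CHORDS)
    counts = np.full((k, k), 1e-3)
    for a, b in zip(dists, dists[1:]):
        counts += np.outer(a, b)
    return counts / counts.sum(axis=1, keepdims=True)

piece = [{60, 64, 67}, {65, 69, 72}, {67, 71, 74}, {60, 64, 67}]   # C F G C triads
print(markov_model(piece).round(2))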
Pickens et al. (2002)
Performing searches:
Scoring function used to compare query
models with each model stored in database
Dissimilarity scores produced
Allows search hits to be ranked
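The paper's exact scoring function is not reproduced here; as one plausible sketch, each record's Markov transition matrix can be compared against the query's with an averaged KL divergence, and records ranked by that dissimilarity (the names and details below are assumptions).

import numpy as np

def dissimilarity(query_T, record_T, eps=1e-9):
    # Mean row-wise KL divergence between two transition matrices.
    q, r = query_T + eps, record_T + eps
    return float(np.mean(np.sum(q * np.log(q / r), axis=1)))

def rank_records(query_T, records):
    # records: dict of name -> transition matrix; lowest score ranks first.
    return sorted(records, key=lambda name: dissimilarity(query_T, records[name]))

rng = np.random.default_rng(0)
records = {"piece_a": rng.dirichlet(np.ones(4), size=4),
           "piece_b": rng.dirichlet(np.ones(4), size=4)}
print(rank_records(records["piece_a"], records))   # "piece_a" ranks first (distance 0)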
Pickens et al. (2002)
Tested with database containing 3150 classical
piano pieces
Queries consisted of full-length audio recordings
On average, searches assigned a rank of
between 2 and 6 to the correct database record
Moderately successful at matching variations of
a piece: on average, 3 of top 5 hits relevant
Pickens et al. (2002)
Evaluation
Can use polyphonic audio queries
Limited by effectiveness of its transcription
systems and error tolerance of the query
system
System only tested on piano music
Full-length queries limit usefulness
Only one feature used
Song, Bae & Yoon (2002)
Monophonic queries
Partial queries permitted
Audio queries
Audio records
Melodic features
Song, Bae & Yoon (2002)
Intended to be used with
query by humming
Avoids disadvantages of automated transcription
by mapping audio data directly to a “mid-level”
melody-based feature set description
Contrasts with the approach of first transcribing
audio data and then extracting features from this
high-level symbolic representation
Song, Bae & Yoon (2002)
“Mid-level” representation produced by
processing audio frames using a five-step process
Song, Bae & Yoon (2002)
Instead of making a definite decision on which
notes were present, as a blackboard system
would have done, vectors of all possible notes
were kept for each audio segment
A DP-matching method was used during
searches so that potentially error-prone patterns
of different lengths could be compared
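A minimal sketch of DP matching against such candidate-note vectors (illustrative, not the authors' exact formulation): an edit-distance-style alignment in which a query note costs nothing if it appears among a segment's candidates, so queries and records of different lengths can still be compared.

def dp_match(query, segments):
    # query: list of pitches; segments: list of candidate-pitch sets per
    # audio segment.  Returns the minimum alignment cost (lower is better).
    n, m = len(query), len(segments)
    INF = float("inf")
    d = [[INF] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0
    for j in range(1, m + 1):
        d[0][j] = j                               # skip extra record segments
    for i in range(1, n + 1):
        d[i][0] = i                               # skip unmatched query notes
        for j in range(1, m + 1):
            sub = 0 if query[i - 1] in segments[j - 1] else 1
            d[i][j] = min(d[i - 1][j - 1] + sub,  # match / substitute
                          d[i - 1][j] + 1,        # query note unmatched
                          d[i][j - 1] + 1)        # extra record segment
    return d[n][m]

query = [60, 62, 64]
segments = [{59, 60}, {62, 63}, {64}, {65, 67}]   # ambiguous candidates per segment
print(dp_match(query, segments))                  # cost 1: only the trailing extra segment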
Song, Bae & Yoon (2002)
Tested by attempting to match 176 hummed
samples to 92 short extracts (15-20 seconds
long) of Korean and Western popular songs
Resulted in exact matches roughly 43% of the
time and a match in the top ten from 69% to
76% of the time (depending on window size)
Search time varied from 3 to 14 seconds
Song, Bae & Yoon (2002)
Evaluation
Performance relatively poor
Only monophonic queries possible
Only tested using a database of short
recordings
This was the only system using audio data for
both queries and database records
Conclusions
No truly viable systems produced yet
Promising approaches have been proposed
Recurring problems:
None of the systems can deal with both symbolic and
audio data
None have been tested with both polyphonic and
monophonic queries
Tend to require long queries to achieve good results
Searches do not allow much flexibility (e.g. must be
transposition invariant)
Conclusions
Difficult to compare systems because they
each use different performance evaluation
metrics
Limited data sets used during testing
Conclusions
Possible improvement: use a greater
number of feature classes
Aside from Doraisamy & Rüger, all
systems discussed limited themselves to
one feature class
Harmonic, melodic, timbral and rhythm-based features could all prove useful
Would allow much more flexible searches