Using Blackboard Systems for
Polyphonic Transcription
A Literature Review
by Cory McKay
Outline
• Intro to polyphonic transcription
• Intro to blackboard systems
• Keith Martin’s work
• Kunio Kashino’s work
• Recent contributions
• Conclusion
Polyphonic Transcription
• Represent an audio signal as a score
• Must segregate notes belonging to different
voices
• Problems: variations of timbre within a
voice, voice crossing, identification of
correct octave
• No successful general-purpose system to
date
Polyphonic Transcription
• Can use simplified models:
– Music for a single instrument (e.g. piano)
– Extract only a given instrument from mix
– Use music which obeys restrictive rules
• Simplified systems have had success rates
of between 80% and 90%
• These rates may be exaggerated, since only
very limited testing suites generally used
Polyphonic Transcription
• Systems to date generally identify only
rhythm, pitch and voice
• Would like systems that also identify other
notated aspects such as dynamics and
vibrato
• Ideal is to have system that can identify and
understand parameters of music that
humans hear but do not notate
Blackboard Systems
• Used in AI for decades but only applied to music
transcription in the early 1990s
• Term “blackboard” comes from notion of a group
of experts standing around a blackboard working
together to solve a problem
• Each expert writes contributions on blackboard
• Experts watch problem evolve on blackboard,
making changes until a solution is reached
Blackboard Systems
• “Blackboard” is a central dataspace
• Usually arranged in hierarchy so that input is at
lowest level and output is at highest
• “Experts” are called “knowledge sources”
• KSs generally consist of a set of heuristics and a
precondition whose satisfaction results in a
hypothesis that is written on blackboard
• Each KS forms hypotheses based on information
from front end of system and hypotheses
presented by other KSs
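
The structure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not code from any of the reviewed systems: the class names, the level names ("partial", "note") and the note_finder knowledge source are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Hypothesis:
    level: str    # hierarchy level, e.g. "partial" or "note" (illustrative names)
    content: str
    source: str   # name of the knowledge source that posted it


@dataclass
class Blackboard:
    """Central dataspace: hypotheses grouped by hierarchy level."""
    levels: dict = field(default_factory=dict)

    def post(self, hyp: Hypothesis) -> None:
        self.levels.setdefault(hyp.level, []).append(hyp)

    def at(self, level: str) -> list:
        return list(self.levels.get(level, []))


@dataclass
class KnowledgeSource:
    name: str
    precondition: Callable[[Blackboard], bool]
    action: Callable[[Blackboard], Optional[Hypothesis]]

    def try_fire(self, board: Blackboard) -> bool:
        """Post a hypothesis if the precondition holds; report whether one was posted."""
        if not self.precondition(board):
            return False
        hyp = self.action(board)
        if hyp is None:
            return False
        board.post(hyp)
        return True


# Hypothetical KS: once partials exist and no note explains them, propose a note.
def has_unexplained_partials(board):
    return len(board.at("partial")) >= 2 and not board.at("note")


def propose_note(board):
    lowest = min(board.at("partial"), key=lambda h: float(h.content))
    return Hypothesis("note", f"fundamental near {lowest.content} Hz", "note_finder")


board = Blackboard()
for freq in ("220.0", "440.0", "660.0"):   # stand-in for front-end output
    board.post(Hypothesis("partial", freq, "front_end"))

note_finder = KnowledgeSource("note_finder", has_unexplained_partials, propose_note)
fired_first = note_finder.try_fire(board)    # precondition holds, so a note is posted
fired_second = note_finder.try_fire(board)   # a note now exists, so the KS stays quiet
```

Keeping the precondition separate from the action is what lets a KS be swapped or added without touching the others.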
Blackboard Systems
• Problem is solved when all KSs are satisfied
with all hypotheses on blackboard to within
a given margin of error
• Eliminates need for global control module
• Each KS can be easily updated and new
KSs can be added with little difficulty
• Combines top-down and bottom-up
processing
Blackboard Systems
• Music has a naturally hierarchical structure
that lends itself well to blackboard systems
• Allow integration of different types of
expertise:
– signal processing KSs at low level
– human perception KSs at middle level
– musical knowledge KSs at upper level
Blackboard Systems
• Limitation: giving upper level KSs too much
specialized knowledge and influence limits
generality of transcription systems
• Ideal system would not use knowledge above the
level of human perception and the most
rudimentary understanding of music
• Current trend is to increase significance of upper-level
musical KSs in order to increase success rate
Keith Martin (1996 a)
• “A Blackboard System for Automatic
Transcription of Simple Polyphonic Music”
• Used a blackboard system to transcribe a four-voice
Bach chorale with appropriate segregation
of voices
• Limited input signal to synthesized piano
performances
• Gave system only rudimentary musical
knowledge, although choice of Bach chorale
allowed the use of generally unacceptable
assumptions by lower level KSs
Keith Martin (1996 a)
• Front-end system used short-time Fourier
transform on input signal
• Equivalent to a filter bank that is a gross
approximation of the way the human cochlea
processes auditory signals
• Blackboard system fed sets of associated
onset times, frequencies and amplitudes
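
A short-time Fourier transform front end of this kind can be sketched as follows. This is a minimal pure-Python illustration, not Martin's implementation: the Hann window, the frame and hop sizes, the 8 kHz sample rate and the function names are all assumptions chosen for the example, and only the strongest frequency per frame is extracted.

```python
import cmath
import math


def stft_magnitudes(signal, frame_size, hop):
    """Return one magnitude spectrum (bins 0..frame_size//2) per frame."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size)
              for n in range(frame_size)]          # Hann window
    frames = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame_size)]
        spectrum = []
        for k in range(frame_size // 2 + 1):
            # Direct DFT of the windowed frame at bin k.
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n in range(frame_size))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


def peak_frequency(spectrum, sample_rate, frame_size):
    """Frequency in Hz of the strongest bin in one frame."""
    k = max(range(len(spectrum)), key=spectrum.__getitem__)
    return k * sample_rate / frame_size


SAMPLE_RATE = 8000   # assumed value for the example
FRAME_SIZE = 256
tone = [math.sin(2 * math.pi * 500 * n / SAMPLE_RATE) for n in range(1024)]
frames = stft_magnitudes(tone, FRAME_SIZE, hop=128)
peaks = [peak_frequency(f, SAMPLE_RATE, FRAME_SIZE) for f in frames]
```

Each frame index gives a time, each peak bin a frequency and each bin magnitude an amplitude, which is the kind of triple the blackboard is fed.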
Keith Martin (1996 a)
• Knowledge sources made five classes of
hierarchically organized hypotheses:
– “Tracks”
– Partials
– Notes
– Intervals
– Chords
Keith Martin (1996 a)
• Three types of knowledge sources:
– Garbage collection
– Physics
– Musical practice
• Thirteen knowledge sources in all
• Each KS only authorized to make certain
classes of hypotheses
Keith Martin (1996 a)
• KSs with access to upper-level hypotheses can put
“pressure” on KSs with lower-level access to
make certain hypotheses and vice versa
• Example: if the hypotheses have been made that
the notes C and G are present in a beat, a KS with
information about chords might put forward the
hypothesis that there is a C chord, thus putting
pressure on other KSs to find an E or Eb.
• Used a sequential scheduler to coordinate KSs
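
The C and G example, together with a sequential scheduler, might look like the sketch below. It is a toy illustration and not Martin's code: the dictionary-based board, the "expected" and "weak_candidates" fields and both KS functions are assumptions introduced here to show how an upper-level hypothesis can prompt a lower-level search.

```python
def chord_ks(board):
    """If C and G are hypothesized, propose a C chord and ask for its third."""
    notes = board["notes"]
    if {"C", "G"} <= notes and "C chord" not in board["chords"]:
        board["chords"].add("C chord")
        board["expected"] |= {"E", "Eb"} - notes   # top-down "pressure"
        return True
    return False


def note_search_ks(board):
    """Promote weak front-end candidates that a higher level has asked for."""
    wanted = board["expected"] & board["weak_candidates"]
    if wanted:
        board["notes"] |= wanted
        board["expected"] -= wanted
        board["weak_candidates"] -= wanted
        return True
    return False


def run_sequential(board, sources, max_passes=10):
    """Call each KS in a fixed order; stop after a pass in which none fires."""
    for _ in range(max_passes):
        fired = [ks(board) for ks in sources]
        if not any(fired):
            break
    return board


board = {
    "notes": {"C", "G"},              # confident note hypotheses
    "chords": set(),
    "expected": set(),                # notes requested by upper-level KSs
    "weak_candidates": {"E", "A"},    # low-confidence notes from the front end
}
run_sequential(board, [chord_ks, note_search_ks])
```

In this run the weak E is promoted because the chord hypothesis asked for it, while the weak A, which nothing requested, stays a candidate.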
Keith Martin (1996 b)
• “Automatic Transcription of Simple
Polyphonic Music: Robust Front End
Processing”
• Previous system often misidentified octaves
• Attempted to improve performance by
shifting octave identification task from a
top-down process to a bottom-up process
Keith Martin (1996 b)
• Proposes the use of log-lag correlograms in front
end
• Models the inner hair cells in the cochlea with a
bank of filters
• Determines pitch by measuring the periodic
energy in each filter channel as a function of lag
• Correlograms now basic unit fed to blackboard
system
• No definitive results as to which approach is better
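
The idea of measuring periodic energy as a function of lag can be illustrated with a plain autocorrelation. The sketch below is a deliberate simplification and not a log-lag correlogram: it uses a single channel with no cochlear filter bank and a linear lag axis, and the sample rate, test tone and function names are assumptions made for the example.

```python
import math


def autocorrelation(signal, max_lag):
    """Unnormalized autocorrelation r[lag] for lag = 0..max_lag."""
    n = len(signal)
    # The same number of terms is summed at every lag so values are comparable.
    return [sum(signal[i] * signal[i + lag] for i in range(n - max_lag))
            for lag in range(max_lag + 1)]


def strongest_period(signal, min_lag, max_lag):
    """Lag (in samples) with the highest autocorrelation in [min_lag, max_lag]."""
    r = autocorrelation(signal, max_lag)
    return max(range(min_lag, max_lag + 1), key=r.__getitem__)


SAMPLE_RATE = 8000   # assumed value for the example
F0 = 200.0
# Harmonic test tone: fundamental plus two weaker overtones.
tone = [sum(math.sin(2 * math.pi * F0 * h * n / SAMPLE_RATE) / h for h in (1, 2, 3))
        for n in range(800)]
lag = strongest_period(tone, min_lag=20, max_lag=60)
pitch_hz = SAMPLE_RATE / lag
```

Because all harmonics of a note repeat at the period of the fundamental, the autocorrelation peak falls at the fundamental's lag, which is why a lag-based front end is less prone to the octave errors mentioned above.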
Kashino, Nakadai, Kinoshita and
Tanaka (1995)
• “Application of Bayesian Probability Networks to
Music Scene Analysis”
• Work slightly preceded that of Martin
• Used test patterns involving more than one
instrument
• Uses principles of stream segregation from
auditory scene analysis
• Implements more high-level musical knowledge
• Uses Bayesian network instead of Martin’s simple
scheduler to coordinate KSs
Kashino, Nakadai, Kinoshita and
Tanaka (1995)
• Knowledge sources used:
– Chord transition dictionary
– Chord-note relation
– Chord naming rules
– Tone memory
– Timbre models
– Human perception rules
• Used very specific instrument timbres and musical
rules, so has limited general applicability
Kashino, Nakadai, Kinoshita and
Tanaka (1995)
• Tone memory: frequency components of
different instruments played with different
parameters
• Found that the integration of tone memory
with the other KSs greatly improved
success rates
Kashino, Nakadai, Kinoshita and
Tanaka (1995)
• Bayesian networks well known for finding good
solutions despite noisy input or missing data
• Often used in implementing learning methods that
trade off prior belief in a hypothesis against its
agreement with current data
• Therefore seem to be a good choice for
coordinating KSs
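
The trade-off between prior belief and agreement with data is Bayes' rule. The sketch below shows a single update for one binary hypothesis, not a Bayesian network, and all of the probabilities in it are hypothetical numbers chosen for illustration.

```python
def posterior(prior, likelihood_if_true, likelihood_if_false):
    """Bayes' rule for a binary hypothesis given one observation."""
    evidence = prior * likelihood_if_true + (1.0 - prior) * likelihood_if_false
    return prior * likelihood_if_true / evidence


# Hypothetical prior: before looking at the spectrum, a chord-level KS
# believes an E is sounding with probability 0.30.
prior_e = 0.30

# The front end reports a peak near E. Assumed likelihoods: such a peak shows
# up 80% of the time when E really sounds and 10% of the time when it does not.
belief_after_peak = posterior(prior_e, 0.80, 0.10)

# A second, independent observation with the same likelihoods updates it again.
belief_after_two = posterior(belief_after_peak, 0.80, 0.10)
```

With these numbers the belief rises from 0.30 to about 0.77 after one supporting observation and to about 0.96 after two, so a weak prior is overridden only as evidence accumulates.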
Kashino, Nakadai, Kinoshita and
Tanaka (1995)
• No experimental comparisons of this
approach and Martin’s simple scheduler
• Only used simple test patterns rather than
real music
Kashino and Hagita (1996)
• “A Music Scene Analysis System with the MRF-Based
Information Integration Scheme”
• Suggests replacing Bayesian networks with
Markov Random Field hypothesis network
• Successful in correcting two most common
problems in previous system:
– Misidentification of instruments
– Incorrect octave labelling
Kashino and Hagita (1996)
• MRF-based networks use simulated annealing to
converge to a low-energy state
• MRF approach enables information to be
integrated on a multiply connected hypothesis
network
• Bayesian networks only allow singly connected
networks
• Could now deal with two kinds of transition
information within a single hypothesis network:
– chord transitions
– note transitions
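
Simulated annealing on a small multiply connected network can be sketched as follows. This is a generic Metropolis-style annealer on a toy energy function, not the scheme of Kashino and Hagita: the four binary nodes, the costs, the edge list (which contains the loop 0-1-2-0) and the cooling schedule are all assumptions made for the example.

```python
import math
import random

# Hypothetical network: four hypothesis nodes, each labelled 0 or 1.
# The edges contain a loop (0-1-2-0), so the graph is multiply connected.
UNARY = [(0.2, 1.0), (0.9, 0.4), (0.3, 0.8), (0.6, 0.5)]   # cost of label 0 / 1
EDGES = [(0, 1), (1, 2), (2, 0), (2, 3)]
DISAGREE_COST = 0.5


def energy(labels):
    """Per-node data cost plus a penalty for every edge whose ends disagree."""
    data = sum(UNARY[i][lab] for i, lab in enumerate(labels))
    smooth = sum(DISAGREE_COST for a, b in EDGES if labels[a] != labels[b])
    return data + smooth


def anneal(start, steps=2000, t_start=2.0, t_end=0.01, seed=0):
    """Simulated annealing with geometric cooling; returns the best state seen."""
    rng = random.Random(seed)
    current, best = list(start), list(start)
    for step in range(steps):
        temp = t_start * (t_end / t_start) ** (step / (steps - 1))
        candidate = list(current)
        flip = rng.randrange(len(candidate))
        candidate[flip] = 1 - candidate[flip]        # propose flipping one label
        delta = energy(candidate) - energy(current)
        # Always accept improvements; accept worse states with a probability
        # that shrinks as the temperature falls.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current = candidate
            if energy(current) < energy(best):
                best = list(current)
    return best


start = [1, 1, 1, 1]
result = anneal(start)
```

Occasionally accepting uphill moves early on is what lets the search escape local minima before it settles into a low-energy labelling.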
Kashino and Hagita (1996)
• Instrument and octave identification errors
corrected, but some new errors introduced
• Overall, performed roughly 10% better than
Bayesian-based system at transcribing a three-part
arrangement of “Auld Lang Syne”
• Still only had a recognition rate of 71.7%
Kashino and Murase (1998)
• Shifts some work away from blackboard system
by feeding it higher-level information
• Simplifies and mathematically formalizes notion
of knowledge sources
• Switches back to Bayesian network
• Perhaps not truly a blackboard system anymore
• Has very good recognition rate
• Scalability of system is seriously compromised by
new approach
Kashino and Murase (1998)
• Uses adaptive template matching
• Implemented using a bank of filters
arranged in parallel and a number of
templates corresponding to particular notes
played by particular instruments
• The correlation between the outputs of the
filters is calculated and a match is then
made to one of the templates
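
The matching step can be illustrated with a normalized correlation between an observed filter-bank profile and a set of stored templates. The sketch below leaves out the adaptive part and the filter bank itself, and it is not the authors' formulation: the template values, the (instrument, note) keys and the function names are hypothetical.

```python
import math


def normalized_correlation(a, b):
    """Cosine similarity between two equal-length magnitude vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_template(observation, templates):
    """Return (name, score) of the template most correlated with the observation."""
    scores = {name: normalized_correlation(observation, t)
              for name, t in templates.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]


# Hypothetical filter-bank energy profiles for three (instrument, note) templates.
TEMPLATES = {
    ("piano", "C4"): [1.0, 0.6, 0.3, 0.1, 0.0, 0.0],
    ("flute", "C4"): [1.0, 0.1, 0.05, 0.0, 0.0, 0.0],
    ("violin", "G4"): [0.0, 1.0, 0.2, 0.7, 0.1, 0.4],
}
observed = [0.9, 0.5, 0.35, 0.1, 0.05, 0.0]
match, score = best_template(observed, TEMPLATES)
```

Every observation is compared against every template, so the cost grows with the number of instruments and notes covered, and templates with similar profiles become hard to tell apart.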
Kashino and Murase (1998)
• Achieved recognition rate of 88.5% on real
recordings of piano, violin and flute
• Including templates for many more instruments
could make adaptive template matching intractable
• Particularly a problem for instruments with
– Similar frequency spectra
– A great deal of spectral variation from note to note
Hainsworth and Macleod (2001)
• “Automatic Bass Line Transcription from
Polyphonic Music”
• Wanted to be able to extract a single given
instrument from an arbitrary musical signal
• Contrast to previous approaches of using
recordings of only one instrument or a set of
pre-defined instruments
Hainsworth and Macleod (2001)
• Chose to work with bass
– Can filter out high frequencies
– Notes usually fairly steady
• Used simple mathematical relations to trim
hypotheses rather than a true blackboard
system
• Had a 78.7% success rate on a Miles Davis
recording
Bello and Sandler (2000)
• “Blackboard Systems and Top-Down Processing
for the Transcription of Simple Polyphonic
Music”
• Return to a true blackboard system
• Based on Martin’s implementation, using a
conventional scheduler
• Refines knowledge sources and adds high-level
musical knowledge
• Implements one of knowledge sources as a neural
network
Bello and Sandler (2000)
• The chord recognizer KS is a feedforward network
• Trained using spectrograms of different chords
of a piano
• Trained network is fed a spectrogram and outputs
possible chords
• Can therefore output more than one hypothesis at
each iteration
• Gives other KSs more information and allows
parallel exploration of solution space
Bello and Sandler (2000)
• Could automatically retrain network to recognize
spectrograms of other instruments with no manual
modifications needed
• Preliminary testing showed tendency to
misidentify octaves and make incorrect
identification of note onsets
• These problems could potentially be corrected by
signal processing system that feeds blackboard
system
Conclusions
• Bass transcription system and more recent
work of Kashino useful for specific
applications, but limited potential for
general transcription purposes
• True blackboard approach scales well and
appears to hold the most potential for
general-purpose polyphonic transcription
Conclusions
• Use of adaptive learning in knowledge
sources seems promising
• Interchangeable modules could be
automatically trained to specialize in
different areas
• Could have semi-automatic transcription,
where user chooses correct modules and
system performs transcription using them