On Recognizing Music Using HMM

On Recognizing Music Using
HMM
Following the path carved by
Speech Recognition Pioneers
Outline
Aim of this project
HMM Speech Recognition Paradigm
Structure of musical tones
Designing an HMM-based Music
Recognizer using HTK
Aim of this project
Recognize different types of steady-state
musical instruments
Piano, Guitar, Flute, Trumpet (String and
Wind)
Not Drums, Cymbals, Gongs (Percussion)
Design this recognizer based on
methods used in Speech Recognition
HMM Speech Recognition
Paradigm
Different types of systems
Isolated word based
Phoneme based
Discrete or continuous
Feature Analysis Options
Linear Prediction Analysis
Filterbank Analysis
HMM topology definition
Initialization and training of the HMM
Recognition and Evaluation
Types of systems
Phoneme based recognizer
A set of sounds that is sufficient to
compose speech in a language, each
modeled using an HMM
Not relevant to music
Isolated word based recognizer
Each vocabulary word is modeled using an HMM
We treat each instrument as a word in a music
vocabulary, and hope to recognize it
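The isolated-word idea above, one HMM per instrument and the best-scoring model wins, can be sketched as follows. This is a minimal illustration only: it assumes discrete observation symbols, and the two toy models (`MODELS`, `TRANS`, and all their probabilities) are made-up values, not trained parameters.

```python
import math

def forward_log_likelihood(obs, start, trans, emit):
    """Return log P(obs | model) for a discrete-observation HMM (scaled forward pass)."""
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    log_like = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate one step through the transition matrix, then emit obs[t].
            alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                     for j in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # the model cannot produce this sequence
        log_like += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_like

def recognize(obs, models):
    """Pick the model name with the highest log-likelihood for obs."""
    return max(models, key=lambda name: forward_log_likelihood(obs, *models[name]))

# Hypothetical toy models: (start, transition, emission) over 3 discrete symbols.
TRANS = [[0.7, 0.3], [0.0, 1.0]]
MODELS = {
    "flute": ([1.0, 0.0], TRANS, [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]]),
    "piano": ([1.0, 0.0], TRANS, [[0.1, 0.1, 0.8], [0.1, 0.8, 0.1]]),
}

print(recognize([0, 0, 1, 1], MODELS))
```

With these toy numbers the sequence `[0, 0, 1, 1]` is scored higher by the "flute" model, since its first state favors symbol 0.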
Discrete or Continuous
System
Concerns the visible observations emitted by
an HMM - discrete symbols or continuous
signals?
Continuous Model
Each emitting state has a probability density
function over observations, so as to capture the details of a signal
Discrete model
The emitted observations are limited to a set of
distinct symbols
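The difference between the two model types can be sketched in a few lines. This is an illustrative assumption, not the deck's implementation: the continuous case is shown as a single diagonal-covariance Gaussian, and the discrete case as nearest-codeword vector quantization; the function names and the codebook are hypothetical.

```python
import math

def gaussian_log_pdf(x, mean, var):
    """Continuous model: log density of a diagonal-covariance Gaussian at vector x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def quantize(x, codebook):
    """Discrete model: map vector x to the index of its nearest codeword."""
    def sq_dist(k):
        return sum((xi - ci) ** 2 for xi, ci in zip(x, codebook[k]))
    return min(range(len(codebook)), key=sq_dist)

# Hypothetical 2-D feature vector and a 2-entry codebook.
feature = [0.9, 1.1]
codebook = [[0.0, 0.0], [1.0, 1.0]]

symbol = quantize(feature, codebook)                            # discrete symbol index
score = gaussian_log_pdf(feature, [1.0, 1.0], [0.5, 0.5])       # continuous log density
print(symbol, score)
```

The discrete path throws away everything except the symbol index, while the continuous path keeps a graded score for how well the vector fits the state.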
Feature Analysis
Linear Prediction Analysis
A transfer function that models the shape of the
vocal tract
Models how voice is produced
Filterbank Analysis
Use the Fourier Transform to decompose waves into
sine wave components
Similar to the mechanism of the cochlea in the ear
Models how voice is heard
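Filterbank analysis as described above can be sketched as a Fourier transform followed by summing the power in each frequency band. This is a simplified sketch under stated assumptions: it uses a naive DFT, equal-width rectangular bands, and no windowing, whereas practical front ends typically use an FFT, a window function, and perceptually spaced filters. All names here are hypothetical.

```python
import cmath
import math

def power_spectrum(frame):
    """Power at each non-negative frequency bin, computed with a naive DFT."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))) ** 2
            for k in range(n // 2 + 1)]

def filterbank_energies(frame, num_bands):
    """Log energy in num_bands equal-width bands of the power spectrum."""
    spec = power_spectrum(frame)
    width = len(spec) / num_bands
    energies = []
    for b in range(num_bands):
        lo, hi = int(b * width), int((b + 1) * width)
        # Small floor keeps the logarithm finite for empty bands.
        energies.append(math.log(sum(spec[lo:hi]) + 1e-10))
    return energies

# A pure tone at DFT bin 5 of a 64-sample frame lands in the lowest band.
frame = [math.sin(2 * math.pi * 5 * t / 64) for t in range(64)]
energies = filterbank_energies(frame, 4)
print(energies.index(max(energies)))
```

Each frame of audio becomes one short vector of band energies, which is the feature vector the HMM observes.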
Initializing and Training the
HMM
Viterbi Algorithm
Use it to generate the MPE (Most Probable
Explanation) of a training sound, i.e. find
which feature vector belongs to which state
Update the observation probability distribution of
each state from the vectors assigned to it, and the
state transition matrix by counting how often
vectors in one state are followed by each state
Repeat until convergence
Baum-Welch Algorithm
Use it to find the probability that each vector
belongs to each state
Does not give a definite assignment, but smooths the
transitions between states
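The contrast between the two algorithms above, a hard state assignment versus per-state probabilities, can be sketched as follows. This is a minimal sketch assuming short sequences (no log-domain arithmetic or scaling) and a precomputed table `b[t][s]` giving the likelihood of frame `t` under state `s`; the example numbers are made up. Only the alignment step is shown, not the parameter re-estimation.

```python
def viterbi(start, trans, b):
    """Most probable state path (hard alignment of frames to states)."""
    n = len(start)
    delta = [start[s] * b[0][s] for s in range(n)]
    back = []
    for t in range(1, len(b)):
        prev = delta
        # For each target state j, remember the best predecessor state.
        ptr = [max(range(n), key=lambda i: prev[i] * trans[i][j]) for j in range(n)]
        delta = [prev[ptr[j]] * trans[ptr[j]][j] * b[t][j] for j in range(n)]
        back.append(ptr)
    state = max(range(n), key=lambda s: delta[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]

def state_posteriors(start, trans, b):
    """Forward-backward: P(state at frame t | all frames), the soft alignment
    that Baum-Welch re-estimation is built on."""
    n, T = len(start), len(b)
    alpha = [[start[s] * b[0][s] for s in range(n)]]
    for t in range(1, T):
        alpha.append([sum(alpha[-1][i] * trans[i][j] for i in range(n)) * b[t][j]
                      for j in range(n)])
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(trans[i][j] * b[t + 1][j] * beta[t + 1][j] for j in range(n))
                   for i in range(n)]
    gamma = []
    for t in range(T):
        w = [alpha[t][s] * beta[t][s] for s in range(n)]
        total = sum(w)
        gamma.append([x / total for x in w])
    return gamma

# Hypothetical 2-state left-to-right model and 4 frames of emission likelihoods.
start = [1.0, 0.0]
trans = [[0.6, 0.4], [0.0, 1.0]]
b = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]
print(viterbi(start, trans, b))
print(state_posteriors(start, trans, b))
```

For these numbers the Viterbi path is `[0, 0, 1, 1]`, while the posteriors give each frame a fractional share of both states, which is what smooths the boundary between them.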
Design and Justification of the
HMM Music Recognizer
Structure of musical tones
Simpler Structure
Model information to consider
Design
Modeled on the isolated word based system
Semi-Continuous System: Tied Mixture System
Filterbank Analysis

Increase the number of dimensions in feature analysis
(typically 13 in speech)
Left-to-right 2-state HMM
Training is the same as in speech
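Two pieces of the design above, the left-to-right topology and the tied-mixture emissions, can be sketched as follows. This is an illustration under assumptions: `stay_prob` and the Gaussian parameters are arbitrary, and the last state is given a self-loop of probability 1 for simplicity (HTK model definitions instead add non-emitting entry and exit states). In a tied-mixture system all states share one pool of Gaussians and differ only in their mixture weights.

```python
import math

def left_right_transitions(num_states, stay_prob=0.6):
    """Transition matrix where each state may only stay or move one step right."""
    trans = [[0.0] * num_states for _ in range(num_states)]
    for s in range(num_states):
        if s + 1 < num_states:
            trans[s][s] = stay_prob
            trans[s][s + 1] = 1.0 - stay_prob
        else:
            trans[s][s] = 1.0  # simplification: the last state absorbs
    return trans

def gaussian_pdf(x, mean, var):
    """Density of a diagonal-covariance Gaussian at feature vector x."""
    log_p = sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
                for xi, m, v in zip(x, mean, var))
    return math.exp(log_p)

def tied_mixture_likelihood(x, shared_gaussians, state_weights):
    """Emission likelihood of x for one state: state-specific weights over a
    pool of Gaussians shared by every state."""
    return sum(w * gaussian_pdf(x, mean, var)
               for w, (mean, var) in zip(state_weights, shared_gaussians))

# Hypothetical shared pool of two 1-D Gaussians and per-state weights.
shared = [([0.0], [1.0]), ([5.0], [1.0])]
weights = {"state_1": [0.9, 0.1], "state_2": [0.1, 0.9]}

print(left_right_transitions(2))
print(tied_mixture_likelihood([0.0], shared, weights["state_1"]))
```

Because the Gaussians are shared, adding feature dimensions grows one common pool rather than every state's private parameters, which matters when training data is scarce.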
Results
Implementation
Planned to implement the above model using HTK
Could not find enough training samples (need $$$ to
buy them)
Pending Questions
What should the dimension size be in feature
analysis?
The 2-state model is very coarse; what is a good
HMM structure?
Automatic structure learning
Summary
Outlined the HMM Speech Recognition
Paradigm
Outlined a feasible method for
recognizing music based on this
technique
Outlined further questions
THANK YOU!
Q&A