An Algorithm for Audio Key
Finding (Using Low-Dimensional
Spaces)
Samir Sharma
2/8/2011
Ozgur Izmirli
Associate Professor, Computer Science
Connecticut College
New London, Connecticut
http://cs.conncoll.edu/izmirli/
Works on Audio Key Finding
• An Algorithm for Audio Key Finding
– 6th International Conference on Music Information
Retrieval (ISMIR2005), London, UK.
– Ranked first in the audio key finding session of Music
Information Retrieval Evaluation Exchange (MIREX2005)
• Audio Key Finding Using Low-Dimensional Spaces
– 7th International Conference on Music Information
Retrieval (ISMIR2006), Victoria, Canada.
– Extended the previous work to determine how compactly
key information can be represented without sacrificing
accuracy of decision
Audio Key Finding Algorithm
• Composed of three stages
1) chroma template calculation
2) chroma summary calculation
3) key estimation
Chroma Template Calculation
• Purpose: to have an ideal set of reference
chroma patterns for each of the 24 possible
keys
• templates serve as prototypes to which
information from audio input is compared
Chroma Template Calculation
• templates are created from recordings of piano
– University of Iowa Musical Instrument Samples
• a single known note per file
• a total of 51 notes from A1 to B5
– spectral analysis
• 50% overlapping windows in time
• use Hann windows in time to reduce sidelobes in
frequency domain
• 4096-point FFT
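A minimal sketch of the spectral analysis step, assuming NumPy, a 44.1 kHz sample rate, and that the spectra of the overlapping frames are averaged (the slides give only the overlap, window type, and FFT size). The function name `average_spectrum` and the synthetic sine input are illustrative stand-ins for the real piano samples.

```python
import numpy as np

def average_spectrum(signal, n_fft=4096):
    """Average magnitude spectrum over Hann-windowed frames with 50% overlap."""
    hop = n_fft // 2                      # 50% overlap between consecutive frames
    window = np.hanning(n_fft)            # Hann window reduces spectral sidelobes
    if len(signal) < n_fft:               # zero-pad short inputs to one full frame
        signal = np.pad(signal, (0, n_fft - len(signal)))
    starts = range(0, len(signal) - n_fft + 1, hop)
    frames = np.stack([signal[s:s + n_fft] * window for s in starts])
    return np.abs(np.fft.rfft(frames, axis=1)).mean(axis=0)

# Synthetic stand-in for a piano sample: one second of a 261.63 Hz sine (C4).
sr = 44100                                # assumed sample rate
t = np.arange(sr) / sr
spectrum = average_spectrum(np.sin(2 * np.pi * 261.63 * t))
peak_hz = np.argmax(spectrum) * sr / 4096  # frequency of the strongest bin
```

At these settings each FFT bin is about 10.8 Hz wide, so `peak_hz` lands within one bin of the true fundamental.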
Chroma Template Calculation
• sample time waveform of C4 note
Chroma Template Calculation
• sample spectrum of C4 note
Chroma Template Calculation
• Pitch Distribution Profile (PDP)
– incorporated into the calculation of templates to
approximate the distribution of pitches expected in
a given key
– use composite PDP formed from Temperley's profile [1] and
the diatonic profile
[1] Temperley, “The Cognition of Basic Musical Structures”, 2001
Chroma Template Calculation
• C major example
[Figure: C major example, with pitch classes labeled 0 to 11]
Chroma Template Calculation
• PDP is invariant under transposition so the
profiles for other keys can be obtained by
rotation
• a PDP is also created for the minor keys
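The rotation property can be sketched with a plain binary diatonic profile. This is only an illustration: the composite profile used in the slides also folds in Temperley's values, which are not reproduced here, and the helper name `transpose` is hypothetical.

```python
import numpy as np

# Binary diatonic profile for C major, indexed by pitch class with C = 0.
# Illustrative only: the slides use a composite of this and Temperley's profile.
c_major = np.zeros(12)
c_major[[0, 2, 4, 5, 7, 9, 11]] = 1.0

def transpose(profile, tonic):
    """Rotate a profile so that its tonic lands on pitch class `tonic`."""
    return np.roll(profile, tonic)

g_major = transpose(c_major, 7)   # G is 7 semitones above C
```

Rotating by 7 moves the scale degrees to G, A, B, C, D, E, F#, so one stored profile per mode is enough to generate all 12 keys of that mode.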
Chroma Template Calculation
• templates are created by weighted sums of the
spectra from individual notes
– spectra are weighted by the profile element for the
corresponding chroma
– spectra also weighted to account for registral
distribution of notes (?)
• result is a 12-element chroma template vector;
one vector for each of the 24 keys
Chroma Template Calculation
[Equations: chroma template sums, one for the major scale and one for the minor scale]
$X_i$: spectrum of the $i$-th note (A1 is $i = 0$, B5 is $i = 50$); $f(i) = 1 - 0.14\, i^{0.5}$
$P_e$: profile weighting ($M$: major, $m$: minor);
$Y$: binning function
$C_n$: chroma template vector for the $n$-th (scale/chroma) pair (starts at ‘A’)
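One possible reading of the symbols above, sketched in NumPy: each note spectrum is scaled by the profile weight of its chroma relative to the tonic and by f(i), the scaled spectra are summed, and the sum is binned into 12 chroma values. The exact form of the original equations is not recoverable from the transcript, so the order of operations, the nearest-semitone binning rule, the 44.1 kHz sample rate, and all function names here are assumptions.

```python
import numpy as np

SR, N_FFT = 44100, 4096                     # assumed analysis settings
freqs = np.fft.rfftfreq(N_FFT, d=1 / SR)    # centre frequency of each FFT bin

def bin_to_chroma(spectrum):
    """Assumed binning function: sum magnitudes into 12 chroma bins, A = 0."""
    valid = freqs > 27.5                                  # skip DC and very low bins
    pitch = np.round(12 * np.log2(freqs[valid] / 55.0))   # semitones above A1 (55 Hz)
    chroma = np.zeros(12)
    np.add.at(chroma, pitch.astype(int) % 12, spectrum[valid])
    return chroma

def register_weight(i):
    """f(i) = 1 - 0.14 * i**0.5 from the slide: higher notes get less weight."""
    return 1.0 - 0.14 * np.sqrt(i)

def chroma_template(note_spectra, profile, tonic):
    """Weighted sum of note spectra for one key, then chroma binning.

    `profile` is indexed relative to the tonic; `tonic` is a chroma index (A = 0).
    Note i = 0 is A1, so note i has chroma i % 12.
    """
    total = np.zeros_like(note_spectra[0])
    for i, spectrum in enumerate(note_spectra):
        total += profile[(i - tonic) % 12] * register_weight(i) * spectrum
    return bin_to_chroma(total)

# Toy input: 51 "note spectra" that are empty except A4 (i = 36) and C5 (i = 39).
note_spectra = [np.zeros(len(freqs)) for _ in range(51)]
for i in (36, 39):
    f0 = 55.0 * 2 ** (i / 12)
    note_spectra[i][np.argmin(np.abs(freqs - f0))] = 1.0

diatonic = np.zeros(12)
diatonic[[0, 2, 4, 5, 7, 9, 11]] = 1.0      # binary major profile, illustrative
c_major_template = chroma_template(note_spectra, diatonic, tonic=3)  # C is chroma 3 when A = 0
```

With real piano spectra every note contributes harmonics as well as its fundamental, which is what makes the templates differ from the bare profile.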
Chroma Summary Calculation
• Purpose: to develop a vector representation of
the input audio signal
– vector captures the spectral information of the
input audio signal so that it can be compared with
set of templates
– assumption: musical pieces input to this algorithm
start in the same key as designated by the
composer
Chroma Summary Calculation
• starting point of music within input audio
signal is determined by comparing signal
energy to a threshold
• frame in which music starts is referred to as
the pivot point
• all analysis of the input audio signal begins
from this pivot point
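A rough sketch of pivot detection, assuming non-overlapping frames, mean-square energy, and an arbitrary threshold of 1e-3; the slides do not state the frame layout or the threshold value, and `find_pivot_frame` is a hypothetical name.

```python
import numpy as np

def find_pivot_frame(signal, frame_len=4096, threshold=1e-3):
    """Index of the first frame whose mean energy exceeds `threshold`, or None."""
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    energy = np.mean(frames ** 2, axis=1)        # mean-square energy per frame
    above = np.flatnonzero(energy > threshold)
    return int(above[0]) if above.size else None

# Three frames of silence followed by two frames of a 440 Hz tone.
silence = np.zeros(3 * 4096)
tone = 0.5 * np.sin(2 * np.pi * 440 * np.arange(2 * 4096) / 44100)
pivot = find_pivot_frame(np.concatenate([silence, tone]))
```

Here `pivot` is the index of the first tone frame; a fully silent input yields `None`.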
Chroma Summary Calculation
0) initialize window size to be one frame
1) compute spectral information of the audio signal
within that window
2) bin the spectrum to create a summary chroma
vector of length 12
3) IF maximum window size has been reached
• done
ELSE
• increase window size and go to 1)
→ Result is a sequence of summary chroma vectors,
one for each window size
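The loop above can be sketched as a running mean, assuming the window grows by one frame per step and that each frame has already been reduced to a 12-element chroma vector by a linear binning step (so averaging chroma vectors matches binning the averaged spectrum). The growth schedule and the name `chroma_summaries` are assumptions.

```python
import numpy as np

def chroma_summaries(frame_chroma, pivot=0, max_frames=None):
    """Summary chroma vectors for windows of 1, 2, ..., N frames from the pivot."""
    frames = np.asarray(frame_chroma, dtype=float)[pivot:]
    if max_frames is not None:
        frames = frames[:max_frames]             # cap at the maximum window size
    counts = np.arange(1, len(frames) + 1)[:, None]
    return np.cumsum(frames, axis=0) / counts    # row k-1 = mean over first k frames

# Toy frames: chroma 0 twice, then chroma 4, then chroma 7.
summaries = chroma_summaries(np.eye(12)[[0, 0, 4, 7]])
```

Each row of `summaries` corresponds to one window size, so later rows reflect more of the piece.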
Key Estimation
• For each chroma summary vector, do:
1) calculate correlation coefficient with each of 24
precalculated chroma templates
2) pick the key associated with the template that
produced the maximum correlation coefficient
3) calculate confidence level as difference between
two highest correlation values divided by highest
correlation value
→ Result is a key estimate and confidence value
for each window
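The three steps can be sketched for a single summary vector as follows. The toy templates are just the 12 rotations of a binary diatonic profile, not the 24 piano-derived templates, and `estimate_key` is a hypothetical name. The confidence is only meaningful when the highest correlation is positive.

```python
import numpy as np

def estimate_key(summary, templates):
    """Return (index of best template, confidence) for one summary chroma vector."""
    corr = np.array([np.corrcoef(summary, t)[0, 1] for t in templates])
    order = np.argsort(corr)[::-1]               # template indices, best first
    best, second = corr[order[0]], corr[order[1]]
    confidence = (best - second) / best          # gap between top two, relative to top
    return int(order[0]), float(confidence)

# Toy templates: 12 rotations of a binary diatonic profile (illustrative only).
diatonic = np.zeros(12)
diatonic[[0, 2, 4, 5, 7, 9, 11]] = 1.0
templates = np.array([np.roll(diatonic, k) for k in range(12)])

# A summary that is a scaled, offset copy of template 5 correlates perfectly with it.
key, conf = estimate_key(2 * templates[5] + 1, templates)
```

Correlation ignores overall level and offset, so the scaled copy still matches template 5 exactly.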
Key Estimation
• Define set of plausible keys as those having
been chosen at least once in the individual
window estimates
• For each plausible key, sum its confidence values
over all windows in which it was chosen
• Key estimate is the one that has the maximum
total confidence
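A small sketch of the aggregation, assuming each window has already produced a (key, confidence) pair; the key labels and confidence values below are made up for illustration and `overall_key` is a hypothetical name.

```python
from collections import defaultdict

def overall_key(window_estimates):
    """Pick the key with the largest summed confidence.

    `window_estimates` is an iterable of (key, confidence) pairs, one per window.
    Only keys chosen at least once appear, which matches the set of plausible keys.
    """
    totals = defaultdict(float)
    for key, confidence in window_estimates:
        totals[key] += confidence
    return max(totals, key=totals.get)

# Illustrative values only.
estimates = [("C major", 0.30), ("A minor", 0.05), ("C major", 0.20), ("G major", 0.40)]
winner = overall_key(estimates)
```

G major has the single most confident window, but C major wins on total confidence (0.50 against 0.40).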
MIREX Evaluation
• Music Information Retrieval Evaluation
Exchange
– contest in 2005 for evaluation and comparison of
algorithms in many areas of music information
retrieval
– audio key finding contest used a dataset of 1252
pieces
• 96 pieces were available for testing before the contest
– resulting score for this algorithm was 89.55%
• scores of the submitted algorithms ranged from 79.1% to 89.55%
Dimensionality Reduction
• Purpose: determine the minimum number of
dimensions for audio key finding without
sacrificing accuracy of decision
– 12-element vector significantly reduces
dimensionality but question remains how much
further this can be reduced
Dimensionality Reduction
• approach is to start with 12-dimensional PCP
chroma representation and explore effects of
dimensionality reduction
– uses principal component analysis (PCA) to reduce
dimensionality
Principal Component Analysis
• can be used to find patterns in a set of data
• these patterns can then be used to compress the
data (i.e. to more compactly represent the data
with little to no loss)
• project the data onto a new set of coordinate
axes
– axes are the eigenvectors of the covariance matrix
of the data
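A compact sketch of PCA as described here, using the eigenvectors of the covariance matrix directly. The random input is a placeholder for real chroma data, and `pca_project` is a hypothetical name.

```python
import numpy as np

def pca_project(data, m):
    """Project the rows of `data` onto the first m principal axes."""
    centered = data - data.mean(axis=0)
    cov = np.cov(centered, rowvar=False)          # covariance between columns
    eigvals, eigvecs = np.linalg.eigh(cov)        # eigh returns ascending eigenvalues
    axes = eigvecs[:, np.argsort(eigvals)[::-1][:m]]  # keep the m largest
    return centered @ axes

rng = np.random.default_rng(0)
data = rng.normal(size=(25, 12))                  # placeholder for 25 chroma vectors
projected = pca_project(data, 3)
```

The columns of `projected` are ordered by decreasing variance, so truncating to the first m columns keeps the directions that carry the most spread in the data.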
Dimensionality Reduction
• PCA applied to the summary chroma vector
and the 24 chroma templates = 25 total vectors
• 25 data points are projected onto the first m
axes
– m is the level of dimensionality reduction
• key estimate is the key whose template point is
closest to the summary vector point in the new
m-dimensional space
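The decision in the reduced space might look like the sketch below, which computes PCA through an SVD of the centered data (equivalent to using the covariance eigenvectors). The slides say only that the confidence is calculated using distances, so the relative-gap formula here is an assumption, as are the random placeholder templates and the name `key_in_reduced_space`.

```python
import numpy as np

def key_in_reduced_space(summary, templates, m):
    """Index of the template nearest to `summary` after PCA to m dimensions."""
    points = np.vstack([templates, summary])      # 24 templates + 1 summary = 25 points
    centered = points - points.mean(axis=0)
    # Rows of vt are the principal axes (eigenvectors of the covariance matrix).
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    reduced = centered @ vt[:m].T
    dists = np.linalg.norm(reduced[:-1] - reduced[-1], axis=1)
    nearest, runner_up = np.sort(dists)[:2]
    # Assumed confidence: relative gap between the two closest templates.
    confidence = (runner_up - nearest) / runner_up if runner_up > 0 else 0.0
    return int(np.argmin(dists)), float(confidence)

rng = np.random.default_rng(0)
templates = rng.normal(size=(24, 12))             # placeholder templates
summary = templates[7] + 0.01 * rng.normal(size=12)   # noisy copy of template 7
key, conf = key_in_reduced_space(summary, templates, m=4)
```

Because the PCA is refit for each summary vector together with the templates, the axes shift slightly from one input to the next.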
– confidence value calculated using distances
Evaluation
• evaluated using a collection of 152 pieces
Conclusion
• Izmirli proposes a relatively straightforward
approach for audio key finding
– accuracy is due in large part to how well the
composite pitch distribution profiles reflect the key
of the musical piece in question
– placed first in MIREX 2005
• Extension to reducing the dimensionality of
the space in which the decision is made
– uses principal component analysis (PCA)