Content-based
retrieval of audio
Francois Thibault
MUMT 614B
McGill University
Overview
Need effective ways to browse by content
through audio databases of growing sizes
Using descriptive sound parameters or
query by example systems
Determine similarity to query in order to
rank search results by relevance
(AudioGoogle)
Feature selection is the sinews of war…
Cheng Yang Approach (1)
Audio files preprocessed to identify local
peaks in signal power (n = 100-200/min)
Spectrogram computed using STFT of
2048 samples with Hamming window of
1024 samples and overlap factor of 2
Spectral vector extracted around each
peak makes up (n, 180, k<<2048) feature
space (200-2000Hz range only)
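A minimal sketch of this preprocessing step, not Yang's actual code. It assumes that "overlap factor of 2" means a hop of half the window (512 samples) and that the 1024-sample Hamming window is zero-padded to a 2048-point FFT. The function names and the simple local-maximum peak picker are hypothetical.

```python
import numpy as np

def frame_signal(signal, window_size=1024, hop=512):
    """Cut the signal into overlapping frames (hop = window / 2 is an assumption)."""
    n_frames = 1 + (len(signal) - window_size) // hop
    return np.stack([signal[i * hop : i * hop + window_size] for i in range(n_frames)])

def power_peaks(frames):
    """Indices of frames whose short-time power is a strict local maximum."""
    power = np.sum(frames ** 2, axis=1)
    interior = power[1:-1]
    is_peak = (interior > power[:-2]) & (interior > power[2:])
    return np.flatnonzero(is_peak) + 1

def peak_features(signal, sample_rate, window_size=1024, fft_size=2048,
                  low_hz=200.0, high_hz=2000.0):
    """Spectral vectors (200-2000 Hz band only) taken at each power peak."""
    frames = frame_signal(signal, window_size, window_size // 2)
    windowed = frames * np.hamming(window_size)
    # rfft zero-pads each windowed frame up to fft_size points
    spectrogram = np.abs(np.fft.rfft(windowed, n=fft_size, axis=1))
    freqs = np.fft.rfftfreq(fft_size, d=1.0 / sample_rate)
    band = (freqs >= low_hz) & (freqs <= high_hz)
    return spectrogram[power_peaks(frames)][:, band]
```

The result has one row per detected peak and one column per frequency bin in the band, which is only a two-dimensional slice of the feature space described above.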
Yang Approach (2)
Given an example query, compute
the feature vector for the query and
look for similar audio in database
Compute minimum distance
between query and database
feature sets saving time using
dynamic programming techniques
(use results from previous pairs)
Linearity filtering to favor time-scaled versions
over matches with orientation disagreement
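The matching step can be illustrated with a generic dynamic-programming alignment (dynamic time warping). This is a stand-in for the idea of reusing results from previous pairs, not Yang's exact algorithm, and it omits linearity filtering. The function names are hypothetical.

```python
import numpy as np

def alignment_cost(query, candidate):
    """Minimum cumulative distance aligning two sequences of feature vectors."""
    n, m = len(query), len(candidate)
    # Euclidean distance between every query vector and every candidate vector
    local = np.linalg.norm(query[:, None, :] - candidate[None, :, :], axis=2)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # each cell reuses the best of the three previously solved sub-alignments
            best_previous = min(cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1])
            cost[i, j] = local[i - 1, j - 1] + best_previous
    return float(cost[n, m])

def rank_database(query, database):
    """Database keys sorted from closest to farthest match."""
    return sorted(database, key=lambda name: alignment_cost(query, database[name]))
```

Because a query vector may align with several consecutive candidate vectors, a slower rendition of the same sequence still gets a low cost.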
Yang’s Results
Use database of 120
song excerpts (~1min)
Good performance with
varying tempos, audio
quality, performance
variations
Poor performance with
transposed versions
Slow response, improved
with indexing schemes
Jonathan Foote Approach
Calculate feature vectors of audio examples of
desired classes (12 MFCCs plus energy)
Supervised training of a quantization tree (partitions
feature space into regions with maximally different class
populations)
Parameterized data is quantized using the tree
for subsequent retrieval (creates template)
To retrieve similar audio content, template is
constructed for query audio, compared with
corpus templates using cosine distance
measure
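A minimal sketch of the retrieval step, assuming the tree has already been trained and that each frame of a sound has been mapped to a leaf index. The `leaf_ids` input, the histogram normalization, and the function names are illustrative assumptions, and the distance is taken as one minus the cosine similarity.

```python
import numpy as np

def make_template(leaf_ids, n_leaves):
    """Template = normalized histogram of how often each tree leaf was reached."""
    counts = np.bincount(leaf_ids, minlength=n_leaves).astype(float)
    return counts / counts.sum()

def cosine_distance(a, b):
    """One minus the cosine of the angle between two templates."""
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(query_template, corpus):
    """Corpus keys sorted from most to least similar to the query."""
    return sorted(corpus, key=lambda name: cosine_distance(query_template, corpus[name]))
```

The cosine measure depends only on the shape of the histogram, so sounds of different lengths can be compared directly.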
Foote’s Results
Good way of measuring subjective
qualities of sound, without using targeted
features
Not as accurate as other techniques using
psycho-acoustic knowledge in finding
similar timbres (e.g. instruments)
Sensitive to pitch (will often return different
timbres of same pitch)
Erling Wold et al. Approach (1)
Implemented several approaches in Muscle Fish
software
More particularly, specify explicit perceptual
features (loudness, pitch, brightness, bandwidth,
harmonicity)
Statistics of corresponding acoustic correlates
calculated for entire sample (mean, variance,
autocorrelation) form a-vector
For a training set, the mean vector and
covariance matrix are computed from the examples and
become the system's model
Wold Approach (2)
Use a weighted Euclidean distance for
classification and similarity measurements
Distance compared to threshold to decide if
objects belong to the same class (optional)
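A minimal sketch of the class model and distance, assuming the weights are the inverse variances taken from the diagonal of the training covariance matrix. Using the full inverse covariance would be another option. The function names and the threshold value are hypothetical.

```python
import numpy as np

def train_class_model(examples):
    """examples: array of shape (n_sounds, n_features). Returns (mean, covariance)."""
    mean = examples.mean(axis=0)
    cov = np.cov(examples, rowvar=False)
    return mean, cov

def weighted_distance(vector, mean, cov):
    """Euclidean distance with each feature scaled by its training variance.

    Assumes every feature has non-zero variance in the training set.
    """
    variances = np.diag(cov)
    return float(np.sqrt(np.sum((vector - mean) ** 2 / variances)))

def in_class(vector, mean, cov, threshold):
    """Optional class membership test: distance compared to a threshold."""
    return weighted_distance(vector, mean, cov) <= threshold
```

Scaling by the variance keeps a feature with a large numeric range, such as pitch in Hz, from dominating the distance.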
Wold Approach (3)
Segmentation is required beforehand,
achieved using same features, detecting
strong discrepancies
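One simple way to detect such discrepancies, shown only as an illustration: mark a boundary wherever the feature vector changes by more than a threshold between consecutive frames. The function name, the Euclidean measure, and the threshold are assumptions, not details from Wold et al.

```python
import numpy as np

def segment_boundaries(frames, threshold):
    """Frame indices where the feature vector jumps by more than threshold.

    frames: array of shape (n_frames, n_features), one feature vector per frame.
    """
    jumps = np.linalg.norm(np.diff(frames, axis=0), axis=1)
    # jump k is between frames k and k + 1, so the new segment starts at k + 1
    return (np.flatnonzero(jumps > threshold) + 1).tolist()
```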
Wold and Foote comparison
Takeaway: Wold has shown that it is possible to use
statistical methods for flexible classification