Pitch Recognition with Wavelets
1.130 Final Presentation
by Stephen Geiger
What is pitch recognition?
Well, what is pitch? . . .
How HIGH or LOW a sound is
Which note?
Perceived Frequency
Relationship Between Pitch and Frequency
Pitch corresponds to Fundamental Frequency
For Example:
For Middle C:
Frequency = 262 Hz
MATLAB CODE:
fs = 22050;                   % Sampling frequency.
f = 262;                      % Fundamental freq of Middle C.
t = 0:1/fs:1;                 % Time range of 0 to 1 seconds.
sound(cos(2*pi*f*t)/2, fs);   % Make some noise!
For an A Scale:
A  = 220*2^(0/12)  = 220 Hz
A# = 220*2^(1/12)  = 233 Hz
B  = 220*2^(2/12)  = 247 Hz
C  = 220*2^(3/12)  = 262 Hz
C# = 220*2^(4/12)  = 277 Hz
D  = 220*2^(5/12)  = 294 Hz
D# = 220*2^(6/12)  = 311 Hz
E  = 220*2^(7/12)  = 330 Hz
F  = 220*2^(8/12)  = 349 Hz
F# = 220*2^(9/12)  = 370 Hz
G  = 220*2^(10/12) = 392 Hz
G# = 220*2^(11/12) = 415 Hz
A  = 220*2^(12/12) = 440 Hz
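The table follows one rule: each semitone step multiplies the frequency by 2^(1/12). A minimal sketch of that rule, in the same MATLAB style as the slides (the variable names are chosen here for illustration):

```matlab
% Sketch: equal-tempered chromatic scale from 220 Hz up one octave to 440 Hz.
fA = 220;                       % Starting frequency in Hz (the A below Middle C).
n = 0:12;                       % Number of semitone steps above the starting note.
freqs = fA * 2.^(n/12);         % Each semitone multiplies the frequency by 2^(1/12).
names = {'A','A#','B','C','C#','D','D#','E','F','F#','G','G#','A'};
for k = 1:numel(n)
    fprintf('%-2s = %3.0f Hz\n', names{k}, freqs(k));   % Rounded to the nearest Hz.
end
```

Twelve steps give a factor of exactly 2, which is why the last entry is one octave above the first.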
An Octave Up:
For C5:
Frequency = 524 Hz
MATLAB CODE:
fs = 22050;                   % Sampling frequency.
f = 524;                      % Fundamental freq of C5.
t = 0:1/fs:1;                 % Time range of 0 to 1 seconds.
sound(cos(2*pi*f*t)/2, fs);   % Make some noise!
A Sum with 2 Frequencies:
Frequency = 262 Hz
and
Frequency = 524 Hz
MATLAB CODE:
fs = 22050;                   % Sampling frequency.
f1 = 262;                     % Fundamental freq of Middle C.
f2 = 524;                     % Fundamental freq of C5.
t = 0:1/fs:1;                 % Time range of 0 to 1 seconds.
% Sum of both tones; the octave is added at a quarter of the amplitude.
sound((cos(2*pi*f1*t) + ...
       0.25*cos(2*pi*f2*t))/2, fs);
[Figure: Freq. in a Piano - Middle C (x-axis: Frequency, Hz)]
[Figure: FFT of an Oboe Middle C (x-axis: Frequency, Hz)]
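Spectra like these can be outlined with a short FFT. The sketch below is only an illustration: it assumes the synthetic two-frequency signal from the earlier slide instead of the piano or oboe recordings (which are not part of this transcript), and all variable names are chosen here.

```matlab
% Sketch: magnitude spectrum of the 262 Hz + 524 Hz test signal.
fs = 22050;                               % Sampling frequency.
t  = 0:1/fs:1-1/fs;                       % Exactly 1 second, so FFT bins are 1 Hz apart.
x  = cos(2*pi*262*t) + 0.25*cos(2*pi*524*t);

N     = length(x);
X     = abs(fft(x));                      % Magnitude spectrum.
freqs = (0:N-1) * fs / N;                 % Frequency of each FFT bin, in Hz.
half  = 1:floor(N/2);                     % Keep only the bins below Nyquist.

[~, idx] = max(X(half));                  % Strongest component.
fPeak = freqs(idx);
fprintf('Strongest component at %g Hz\n', fPeak);
% plot(freqs(half), X(half)); xlabel('Frequency, Hz');   % Optional plot.
```

For a single note the strongest peak is often, though not always, the fundamental; real instruments can have partials that are stronger than the fundamental.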
Mono vs. Poly
Monophonic
one note at a time
(e.g. trumpet)
Polyphonic
multiple notes at a time
(e.g. piano, orchestra)
Creates a problem for
pitch recognition.
(especially octaves!)
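One way to see why octaves are especially troublesome, assuming idealized notes whose partials sit at exact integer multiples of the fundamental (real instruments deviate slightly from this): every partial of the upper note lands on a partial of the lower note. The sketch below checks this for the two C's used earlier; the variable names are illustrative.

```matlab
% Idealized harmonic series (assumption: partials at exact integer multiples).
fC4 = 262;  fC5 = 524;                 % Fundamentals from the slides, in Hz.
partialsC4 = fC4 * (1:10);             % First 10 partials of Middle C.
partialsC5 = fC5 * (1:5);              % First 5 partials of C5.

% True wherever a C5 partial coincides with a Middle C partial.
hidden = ismember(partialsC5, partialsC4);
fprintf('C5 partials that coincide with Middle C partials: %d of %d\n', ...
        sum(hidden), numel(hidden));
```

Under this assumption the upper note adds no new frequencies to the spectrum; it only changes the strength of partials that are already present.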
Some Existing Methods
Time Domain – Pitch Period estimation
With wavelets.
With auto-correlation function.
Freq. Domain – Find Fundamental
Auditory Scene Analysis
Blackboard Systems
Neural Networks
Perceptual Models
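As an illustration of the time-domain family listed above, here is a minimal sketch of pitch-period estimation with the auto-correlation function. It is not the method developed in this presentation; the 50 to 1000 Hz search range and the synthetic test signal are assumptions made for the example.

```matlab
% Sketch: estimate the pitch period from the auto-correlation peak.
fs = 22050;                                   % Sampling frequency.
t  = 0:1/fs:0.1;                              % 0.1 seconds of signal.
x  = cos(2*pi*262*t) + 0.25*cos(2*pi*524*t);  % Middle C plus its octave.

N = length(x);
r = real(ifft(abs(fft(x, 2*N)).^2));          % Auto-correlation via FFT; r(1) is lag 0.

minLag = floor(fs/1000);                      % Shortest period searched (1000 Hz).
maxLag = ceil(fs/50);                         % Longest period searched (50 Hz).
[~, i] = max(r(minLag+1 : maxLag+1));         % Strongest peak in the search range.
lag  = minLag + i - 1;                        % Pitch period in samples.
fEst = fs / lag;                              % Estimated fundamental in Hz.
fprintf('Estimated fundamental: %.1f Hz\n', fEst);
```

The estimate is limited to whole-sample lags, so it lands near, not exactly on, 262 Hz.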
What applications are there?
Transcription of Music
Modeling of Musical Instruments
Speech Analysis
Besides, it's an Interesting Problem
My Work . . .
A Novel Wavelet Approach
Based on an observation made by
Jeremy Todd, that:
For a piano playing these notes, a CWT
could be used to identify a ‘G’
with certain scale/wavelet combinations.
Even with some polyphony!
Finding a G in a C Scale
[Figure panels: Original Signal; CWT @ Specific “Scale”]
The Continuous Wavelet Transform
Definition of a CWT:
$$C(a,b) = \int f(t)\, \frac{1}{\sqrt{a}}\, \psi\!\left(\frac{t-b}{a}\right) dt$$
Where:
a = scaling factor
b = shift factor
f(t) = function we start with
ψ(t) = Mother wavelet
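The definition can be evaluated numerically at one scale by sampling the scaled wavelet and sliding it across the signal. The sketch below is a plain discretization and not MATLAB's Wavelet Toolbox `cwt`. It assumes a second-derivative-of-Gaussian mother wavelet (sign-flipped and unnormalized), a scale measured in samples, and a hypothetical scale value of 19.

```matlab
% Sketch: C(a,b) at one scale a, for every shift b, by direct discretization.
fs = 22050;                             % Sampling frequency, as in the slides.
t  = 0:1/fs:0.5;                        % Half a second of signal.
x  = cos(2*pi*262*t);                   % f(t): Middle C as a pure tone.

% Assumed mother wavelet: 2nd derivative of a Gaussian, sign-flipped, unnormalized.
psi = @(u) (1 - u.^2) .* exp(-u.^2/2);

a = 19;                                 % Hypothetical scale, in samples.
n = -5*a:5*a;                           % Wavelet support: +/- 5 scale widths.
w = psi(n/a) / sqrt(a);                 % (1/sqrt(a)) * psi((t-b)/a), sampled.

% This wavelet is symmetric, so sliding it over all shifts b is a convolution.
% The constant factor dt = 1/fs from the integral is omitted.
C = conv(x, w, 'same');                 % One coefficient per shift b (per sample).
```

Values near the two ends of `C` are affected by the signal boundary, since the wavelet only partly overlaps the signal there.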
What is Scale?
LOW SCALE
Compressed Wavelet
Lots of Detail
High Frequency
(You are here)
HIGH SCALE
Stretched Wavelet
Coarse Features
Low Frequency
(And here)
Gaussian 2nd Order Wavelet
Initial Work
Took an empirical approach.
Ran a number of CWTs at varying scales and looked at the results.
Picked out a CWT scale for each note in the C scale.
Finding Notes in a C Scale
[Figure panels: Original signal, then CWTs at scales 594, 530, 472, 446, 394, 722, 642, 606]
Finding Notes w/ Polyphony
[Figure panels: Original signal, then CWTs at scales 594, 530, 472, 446, 394, 722, 642, 606]
More Complex Polyphony
[Figure panels: Original signal, then CWTs at scales 594, 530, 472, 446, 394, 722, 642, 606]
Testing with different timbre
[Figure panels: Original signal, then CWTs at scales 594, 530, 472, 446, 394, 722, 642, 606]
Why does this work?
The scale parameter
in the CWT affects
frequency response.
However, our “scales” that
work don’t seem to follow
a clear pattern.
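The link between scale and frequency response can be made concrete for one assumed wavelet. For ψ(u) = (1 - u²)·exp(-u²/2), the magnitude response is proportional to ω²·exp(-ω²/2), which peaks at ω = √2. With the scale a measured in samples, the peak therefore sits at √2·fs/(2π·a) Hz, so doubling the scale halves the peak frequency. The sketch below checks this numerically. The wavelet shape, the scale value, and the scale convention are all assumptions, so the numbers are not directly comparable to the scales reported in these slides.

```matlab
% Sketch: peak frequency of a scaled 2nd-derivative-of-Gaussian wavelet.
fs  = 22050;                                  % Sampling frequency.
psi = @(u) (1 - u.^2) .* exp(-u.^2/2);        % Assumed mother wavelet.

a = 100;                                      % Hypothetical scale, in samples.
w = psi((-10*a:10*a)/a) / sqrt(a);            % Sampled, scaled wavelet.

Nfft  = 2^16;                                 % Zero-padded FFT length.
W     = abs(fft(w, Nfft));                    % Magnitude frequency response.
freqs = (0:Nfft-1) * fs / Nfft;               % Frequency of each bin, in Hz.

[~, idx]     = max(W(1:Nfft/2));              % Largest response below Nyquist.
fPeakNumeric = freqs(idx);
fPeakTheory  = sqrt(2) * fs / (2*pi*a);       % Closed-form peak for this wavelet.
fprintf('Peak response: %.2f Hz (numeric), %.2f Hz (theory)\n', ...
        fPeakNumeric, fPeakTheory);
```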
Training Algorithm
Again, took an empirical approach.
Ran CWTs at varying scales
on sample files, each containing one note.
Picked out scales where:
maximum of the CWT for
one note >> maximum for the other notes
(and collected results).
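The selection rule above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the code used for this project: the notes are synthetic sinusoids instead of recorded sample files, the wavelet is an unnormalized second derivative of a Gaussian, the candidate scales 5 to 40 are hypothetical, and "much greater than" is scored as the ratio of one note's CWT maximum to the largest maximum among the other notes.

```matlab
% Sketch: for each note, pick the scale where its CWT maximum most exceeds the others.
fs = 22050;  t = 0:1/fs:0.25;
noteFreqs = [262 294 330 349 392 440 494 524];   % C scale, rounded Hz (assumed notes).
psi    = @(u) (1 - u.^2) .* exp(-u.^2/2);        % Assumed mother wavelet.
scales = 5:40;                                   % Hypothetical candidate scales.

nNotes = numel(noteFreqs);  nScales = numel(scales);
peak = zeros(nNotes, nScales);                   % Max |CWT| per note and scale.
for s = 1:nScales
    a = scales(s);
    w = psi((-5*a:5*a)/a) / sqrt(a);             % Sampled wavelet at scale a.
    for k = 1:nNotes
        x = cos(2*pi*noteFreqs(k)*t);            % One "sample file": a single note.
        peak(k, s) = max(abs(conv(x, w, 'valid')));   % 'valid' avoids edge effects.
    end
end

bestScale = zeros(1, nNotes);  bestRatio = zeros(1, nNotes);
for k = 1:nNotes
    others = peak;  others(k, :) = [];           % Maxima of all the other notes.
    ratio  = peak(k, :) ./ max(others, [], 1);   % How far note k stands out.
    [bestRatio(k), i] = max(ratio);
    bestScale(k) = scales(i);
end
disp([noteFreqs; bestScale; bestRatio]);         % Rows: note, chosen scale, ratio.
```

With pure sinusoids, the notes at the two ends of the range stand out most easily, while the middle notes win by small margins only. That is one reason a ratio threshold would be needed before trusting a chosen scale.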
Results of Training Algorithm
...
Longer C Scale –
Trained on 3 Octaves of Notes
*From Right Hand of Prelude in C, Op. 28 No. 1
A Fragment by Chopin*
Training on a ‘Real’ Guitar
Only able to find 5 of 8 pitches for the C Scale
training case (with a limited attempt).
Results on a test file were not completely
accurate.
Expected to be a more difficult case than a
piano.
Could merit a more thorough try.
Entire 88 Keys on a Piano
Work in progress.
It takes a long time to run many
CWT’s on 88 different sound files.
Initial results able to
identify notes 70-88.
Frequency Response Revisited
Frequency Response of a 2nd Order Gaussian Wavelet
Resulting Scales for 22 Piano Notes
[Plot: SCALE (axis 0 to 2500) vs. NOTE NUMBER (axis 0 to 22)]
Resulting Scales for 8 Sinusoidal Notes
[Plot: SCALE (axis 0 to 14000) vs. NOTE NUMBER (axis 0 to 8)]
Conclusions
The novel wavelet approach isn’t perfect.
Requiring “training” is a handicap.
Most likely not suited to sources with
varying timbre (e.g. guitar, voice).
Some interesting results.
The mechanism of detection could be
further investigated and better understood.