
Computer Science 121
Scientific Computing
Winter 2012
Chapter 13
Sounds and Signals
Background: Sounds and Signals
• Recall transducer view of computer: convert
input signal into numbers.
• Signal: a quantity that changes over time
– Body temperature
– Air pressure (sound)
– Electrical potential on skin (electrocardiogram)
– Seismological disturbances
• We will study audio signals (sounds), but the
same issues apply across a broad range of
signal types.
13.1 Basics of Computer Sound
>> [x, fs, bits] = wavread('FH.wav');
>> size(x)
ans = 41777 1
>> [max(x) min(x)]
ans = 0.9922 -1.0000
>> fs
fs = 11025
>> bits
bits = 8
>> sound(x, fs)
• x contains the sound waveform (signal) –
essentially, voltage levels representing
transduced air pressure on microphone.
• fs is the sampling frequency – how many times
per second (hertz, Hz) we measured the
voltage.
• bits is the number of bits used to represent
each sample.
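As a quick check on these three quantities, the sketch below takes the values printed in the session above (41777 samples, fs = 11025, bits = 8) and works out how long the recording lasts and how many distinct values a sample can take. The names nSamples, duration, and levels are illustrative and do not appear in the original session.

```matlab
% Values copied from the wavread session above
nSamples = 41777;          % size(x, 1)
fs       = 11025;          % samples per second
bits     = 8;              % bits per sample

duration = nSamples / fs;  % length of the recording in seconds (about 3.8 s)
dt       = 1 / fs;         % time between two successive samples, in seconds
levels   = 2^bits;         % number of distinct values one sample can take
```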
Questions
• Why does the sound waveform range from -1 to +1?
– These values are essentially arbitrary. One nice
feature of a ±x representation is that zero means
silence.
• What role does the sampling frequency play in the
quality of the sound?
– The more samples per second, the closer the sound is
to a “perfect” recording.
• What happens if we double (or halve) the sampling
frequency at playback, and why?
• What is it about the waveform that determines the
sound we're hearing (which vowel), and the
speaker's voice?
– Most of this information is encoded in the
frequencies that make up the waveform –
roughly, the differences between locations of
successive peaks – and not in the actual
waveform values themselves.
– We can do some useful processing on the
“raw” waveform, however – e.g., count
syllables:
Syllable Counting by Smoothing and
Peak-Picking
function res = syllables(x, fs)
% SYLLABLES(X, FS) counts syllables in speech waveform X by peak-picking
% on smoothed rectified signal. FS is sampling rate.

% how much higher a peak must be than its neighbors
DIFF = .001;

% size of moving-average "window" around each point, empirically determined
winsize = fix(fs / 20);

% rectify signal
x = abs(x);

% create smoothed signal from rectified
y = zeros(1, fix(length(x)/winsize));
for i = winsize:winsize:length(x)-winsize
    y(fix(i/winsize)) = mean(x(i-winsize+1:i+winsize));
end
plot(y)
hold on

% pick peaks in smoothed
peaks = find((y(2:end-1)-y(1:end-2))>DIFF & (y(2:end-1)-y(3:end))>DIFF) + 1;
plot(peaks, y(peaks), 'ro')
res = length(peaks);
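The peak-picking line is dense, so here is a small sketch that applies the same expression to a made-up five-point "smoothed" signal. The data in y are hypothetical, and left and right are illustrative names for the two differences that the original computes inline.

```matlab
% Toy "smoothed" signal with two clear bumps (hypothetical data)
y    = [0 1 0 2 0];
DIFF = .001;                       % same threshold as in syllables

% A point counts as a peak if it exceeds both neighbors by more than DIFF.
% The comparison runs over interior points 2..end-1, hence the "+ 1".
left  = y(2:end-1) - y(1:end-2);   % rise from the left neighbor
right = y(2:end-1) - y(3:end);     % drop to the right neighbor
peaks = find(left > DIFF & right > DIFF) + 1;   % indices 2 and 4

res = length(peaks);               % two peaks, i.e. two "syllables"
```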
13.2 Perception and Generation of
Sound
• Sound is the perception of small, rapid
vibrations in air pressure on the ear.
• Simplest model of sound is a function P(t)
expressing pressure P at time t:
P(t) = A sin(2πft + φ)
where A = amplitude (roughly, loudness)
f = frequency (cycles per second)
φ = phase (roughly, starting point)
• This is the equation for a pure musical tone
(just one pitch)
– The inverse of frequency is the period, T = 1/f
(the time between successive peaks).
– E.g., whistling a musical scale.
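The relation between frequency and period can be checked numerically. The sketch below assumes a 500 Hz tone and a 10 kHz sampling frequency; both values are chosen for illustration, and samplesPerPeriod is an illustrative name.

```matlab
f  = 500;                    % frequency in Hz (cycles per second)
T  = 1 / f;                  % period in seconds: 0.002 s between peaks

FS = 10000;                  % assumed sampling frequency in Hz
samplesPerPeriod = FS / f;   % 20 samples span one full cycle
```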
13.2 Perception and Generation of
Sound (ignore textbook)
• Most real sounds are complicated mixtures of
many frequencies (no pure tones in nature).
• Still, we can learn some basic concepts by
experimenting with pure tones:
>> FS = 10000;               % sampling frequency
>> f = 500;                  % sound frequency
>> A = 1.0;                  % amplitude
>> t = linspace(0,1,FS);     % 1 sec at 10 kHz
>> Pt = A * sin(2*pi*f*t);   % ignore phase
>> Pt = A * sin(2*pi*f*t);
>> plot(t, Pt)
>> xlim([0 .01])             % plot from 0 to .01 sec
Multiplying the frequency by k gives us k times as many cycles
in the same amount of time:
>> Pt = A * sin(2*pi*3*f*t); % k = 3
>> plot(t, Pt), xlim([0 .01])
Multiplying the amplitude by a number between 0 and 1 reduces
the loudness (volume) of the sound:
>> Pt = 0.5 * A * sin(2*pi*3*f*t); % half the amplitude
>> plot(t, Pt), xlim([0 .01])
>> ylim([-1 1])              % keep Y axis scaling
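One detail of the session above: linspace(0,1,FS) includes both endpoints, so its samples are spaced 1/(FS-1) apart rather than exactly 1/FS. The difference is far too small to hear in these examples. The sketch below is one possible alternative that spaces the samples exactly 1/FS apart and wraps the tone formula in a helper. The helper name tone and the variables PtFast and PtSoft are hypothetical.

```matlab
FS = 10000;                  % sampling frequency (Hz)
t  = (0:FS-1) / FS;          % FS samples, exactly 1/FS apart, covering 1 second

% tone is a hypothetical helper: pure tone of frequency f and amplitude A
tone = @(f, A) A * sin(2*pi*f*t);

Pt     = tone(500, 1.0);     % the original 500 Hz tone
PtFast = tone(3*500, 1.0);   % k = 3: three times as many cycles per second
PtSoft = tone(3*500, 0.5);   % same pitch at half the amplitude

% sound(Pt, FS) would play one second of the 500 Hz tone
```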
13.3 Synthesizing Complex Sounds
(ignore textbook)
• Any sound can (in principle) be expressed as the
sum of a set of pure tones of various frequencies,
amplitudes, and phases.
• People are (arguably) insensitive to phase
distinctions, so we will ignore phase here.
• Consider a sound containing a 500 Hz component and a
1200 Hz component at half the amplitude:
>> FS = 10000;
>> t = linspace(0, 1, FS);
>> f = 500;
>> A = 1.0;
>> Pt = A * sin(2*pi*f*t);
>> f2 = 1200;
>> A2 = 0.5;
>> Pt2 = A2 * sin(2*pi*f2*t);
>> Pt3 = Pt + Pt2;
>> plot(t, Pt3), xlim([0 .01])
• More generally, we have the formula
P(t) = Σ_{i=1}^{n} A_i sin(2π f_i t + φ_i)
• With all φ_i typically set to zero.
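The general formula can be sketched for any number of components by storing the frequencies and amplitudes in vectors. The version below sets every phase to zero and reuses the 500 Hz and 1200 Hz example. The names components and P are illustrative, and this is only one of several ways to vectorize the sum.

```matlab
FS = 10000;
t  = linspace(0, 1, FS);     % same time axis as the two-tone example

f = [500 1200];              % component frequencies f_i (Hz)
A = [1.0 0.5];               % component amplitudes A_i (all phases zero)

% f(:) * t is an n-by-FS matrix whose row i holds f_i * t.
% diag(A) scales row i by A_i, so each row is one pure tone.
components = diag(A) * sin(2*pi * (f(:) * t));

% Summing down the columns adds the tones sample by sample
P = sum(components, 1);

plot(t, P), xlim([0 .01])
```

Adding a third component only requires appending one entry to f and one to A.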
13.4 Transducing and Recording Sound
• Convert sound pressure to voltage, then digitize voltage
into N discrete values in interval [xmin, xmax], by
sampling at frequency Fs.
• This is done by an analog-to-digital (A/D) converter.
• Another device must pre-amplify sound to match input
expectations of the A/D converter.
• N is typically a power of 2, so we can use bits to
express sampling precision (minimum 8 for decent
quality). Mapping the voltage onto these N values is
called quantization.
• For MATLAB, xmin = -1.0, xmax = +1.0.
• Various things can go wrong if we don't choose these
values wisely....
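To make quantization concrete, here is a minimal sketch of one common scheme: round each value to the nearest of N = 2^bits evenly spaced levels. The rounding rule is an assumption, since the slides do not say how the converter rounds, and the names half, k, and xq are illustrative.

```matlab
bits = 8;
half = 2^(bits-1);                  % 128 steps on each side of zero

x = linspace(-1, 1, 1001);          % stand-in for an analog signal in [xmin, xmax]

k  = round(x * half);               % nearest integer level
k  = max(min(k, half - 1), -half);  % keep k in -128 .. 127, i.e. N = 256 values
xq = k / half;                      % quantized signal on the -1 .. +1 scale

% Away from the top of the range the error is at most half a step
inRange  = x < 1 - 1/half;
maxError = max(abs(xq(inRange) - x(inRange)));
```

Under this scheme the largest representable value is 127/128 ≈ 0.9922 and the smallest is -1, which is consistent with the max and min printed in the wavread session in 13.1.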
Figure 13.5.
A segment of the sound “OH” transduced to voltage. Both panels show the
analog and digital signals, in volts and in A/D units, against time (ms).
Top (appropriate preamplification): The preamplifier has been set
appropriately so that the analog voltage signal takes up a large fraction
of the A/D voltage range. The digitized signal closely resembles the
analog signal even though the A/D conversion is set to 8 bits.
Bottom (preamplification too low): The preamplifier has been set too low.
Consequently, there is effectively only about 3 bits of resolution in the
digitized signal; most of the range is unused.
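The "about 3 bits" figure can be reproduced with a small experiment. The sketch below quantizes a stand-in sine wave at two gain settings, using the same nearest-level rounding assumed for an 8-bit converter. The gains 0.75 and 4/128 are chosen to roughly match the ±96 and ±4 A/D-unit ranges in the figure, and quantize is a hypothetical helper.

```matlab
bits = 8;
half = 2^(bits-1);                  % 8-bit converter: levels -128 .. 127
t    = (0:999) / 1000;
s    = sin(2*pi*5*t);               % stand-in waveform with peak value 1

% hypothetical helper: nearest-level quantizer returning integer A/D units
quantize = @(x) max(min(round(x * half), half - 1), -half);

kGood = quantize(0.75 * s);         % signal reaches about +/-96 A/D units
kLow  = quantize((4/half) * s);     % signal reaches only about +/-4 A/D units

levelsGood = numel(unique(kGood));  % many distinct levels in use
levelsLow  = numel(unique(kLow));   % only the levels -4 .. 4 are used
effectiveBitsLow = log2(levelsLow); % a little over 3 bits
```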
Figure 13.6.
Clipping of a signal (right) when the preamplifier has
been set too high, so that the signal is outside of the
−5 to 5 V range of the A/D converter.
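Clipping is easy to imitate in code. The sketch below assumes a converter that simply saturates at ±5 V (the range given in the caption) and feeds it a hypothetical over-amplified sine wave that peaks at 7.5 V. The variable names are illustrative.

```matlab
t = (0:999) / 1000;
s = sin(2*pi*5*t);                         % stand-in waveform with peak value 1

vRange = 5;                                % A/D accepts -5 V to +5 V
v      = 7.5 * s;                          % hypothetical over-amplified signal

vClipped = max(min(v, vRange), -vRange);   % converter saturates at the limits
fractionClipped = mean(abs(v) > vRange);   % share of samples that were flattened

plot(t, v, t, vClipped), xlim([0 .4])
```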
13.5 Aliasing and the Sampling
Frequency
• Someone has an alias when they use more than one
name (representation)
• In the world of signals, this means having more than
one representation of an analog signal, because of
inadequate sampling frequency
• Familiar visual aliasing from the movies (when the frame
rate, typically 24 frames per second, is too slow):
– Wagon wheel / propeller going backwards
– Scan lines appearing on computer screen
• Inadequate Fs can result in aliasing for sounds too....
Figure 13.8.
Aliasing. A set of samples marked as circles. The three sine waves
plotted (amplitude against time in units of ∆T) are of different
frequencies, but all pass through the same samples. The aliased
frequencies are F + m/∆T, where m is any integer and ∆T is the
sampling interval. The sine waves shown are m = 0, m = 1, and m = 2.
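The claim in the caption follows from the periodicity of the sine: at the sample times t = n∆T, sin(2π(F + m/∆T)n∆T) = sin(2πFn∆T + 2πmn) = sin(2πFn∆T), because mn is an integer. The sketch below checks this numerically. The values Fs = 1000 Hz and F = 100 Hz are assumptions chosen for illustration.

```matlab
Fs = 1000;                        % assumed sampling frequency (Hz), so dT = 1/Fs
F  = 100;                         % assumed base frequency (Hz)
n  = 0:19;                        % sample indices
t  = n / Fs;                      % sample times n * dT

s0 = sin(2*pi*(F + 0*Fs)*t);      % m = 0
s1 = sin(2*pi*(F + 1*Fs)*t);      % m = 1
s2 = sin(2*pi*(F + 2*Fs)*t);      % m = 2

% At the sample times the three waves agree up to rounding error
worst = max([max(abs(s1 - s0)), max(abs(s2 - s0))]);
```

Between the sample times the three waves differ, which is why the samples alone cannot tell them apart.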
• Nyquist's Theorem tells us that Fs should be at least
twice the maximum frequency Fmax we wish to
reproduce.
• Intuitively, we need two values to represent a single
cycle: one for peak, one for valley.
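A short experiment shows what happens when the rule is broken. With Fs = 10000 Hz, frequencies up to 5000 Hz can be represented. The sketch below samples a 7000 Hz tone anyway and compares it with a 3000 Hz tone (Fs − 7000). All values are assumptions chosen for illustration.

```matlab
Fs = 10000;                    % sampling frequency (Hz), so Fs/2 = 5000 Hz
n  = 0:49;                     % sample indices
t  = n / Fs;                   % sample times

sHigh = sin(2*pi*7000*t);      % 7000 Hz lies above Fs/2
sLow  = sin(2*pi*3000*t);      % 3000 Hz = Fs - 7000 lies below Fs/2

% The samples of the 7000 Hz tone equal those of the 3000 Hz tone
% with the sign flipped, so the sum is zero up to rounding error
mismatch = max(abs(sHigh + sLow));
```

Played back at Fs, the sampled 7000 Hz tone would therefore sound like a 3000 Hz tone.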
Aliasing in the Time Domain