
Chapter 12
Sound
Multimedia Systems
Key Points
Sound is a complex mixture of physical
and psychological factors, which is difficult
to model accurately.
Sounds can be characterized by their
waveforms, which plot amplitude against
time.
CD quality sound is sampled at 44.1 kHz,
using a sample size of 16 bits. Multimedia
productions may have to use lower
sampling rates and smaller sample sizes.

Key Points
The quality of digitized sound can be
improved by dithering — adding a small
quantity of noise to randomize the
quantization error.
Software can provide the functions of a
recording studio, including multi-track
recording, mixing and effects, on a
desktop computer.
The most vexatious aspect of recording is
getting the levels right.
Audio filters are used to remove noise
and unwanted frequency components.

Key Points



Digital versions of established effects, such as
reverb and envelope shaping, are used to alter
the quality of sounds. Digital technology permits
new kinds of alteration, including time
stretching and pitch alteration.
Speech data can be compressed using
established technology, including µ-law and A-law
companding and ADPCM.
MPEG-1 Layer 3 audio (MP3) is a lossy method
of audio compression that uses a psychoacoustical model to determine which
information to discard.
Key Points
Each of the three major platforms has its
own sound file format: AIFF for MacOS,
WAV for Windows, and AU for Unix.
RealAudio is used for streaming audio.
MIDI (the Musical Instrument Digital
Interface) provides a standard for
controlling digital instruments and
communicating between them and
computers running sequencer programs.
When sound is combined with video,
synchronization must be established and
maintained.

The Nature of Sound

All sounds are produced by the conversion of
energy into vibrations in the air or some other
elastic medium.

Examples: tuning forks and guitars.
The tines of a good tuning fork vibrate cleanly at
a single frequency; most other sound sources
vibrate in more complicated ways.
A single note is composed of several
components at frequencies that are multiples
of the fundamental frequency of the note.
Harmonics



The spectrum of a single note from a musical
instrument usually has a set of peaks at
(approximately) harmonic ratios.
That is, if the fundamental frequency is f, there
are peaks at f, and also at (about) 2f, 3f, 4f, etc.
The pitch of a note refers to the fundamental
frequency with which the source of the tone
resonates.
Frequency Spectrum

Percussive sounds and most natural
sounds do not even have a single
identifiable fundamental frequency, but
can still be decomposed into a collection
of frequency components.

Frequency spectrum: the relative amplitudes of a sound's
frequency components
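A minimal sketch in Python (standard library only) of the idea above: it builds a note from a fundamental f plus partials at 2f and 3f, then measures the relative amplitudes of its frequency components with a naive discrete Fourier transform. The 100 Hz fundamental, the amplitudes and the function names are illustrative assumptions; practical tools compute spectra with an FFT rather than this quadratic-time loop.

```python
import cmath
import math


def harmonic_tone(f0, amplitudes, sample_rate, n_samples):
    """Sum of sine partials at f0, 2*f0, 3*f0, ... with the given amplitudes."""
    return [
        sum(a * math.sin(2 * math.pi * (k + 1) * f0 * n / sample_rate)
            for k, a in enumerate(amplitudes))
        for n in range(n_samples)
    ]


def magnitude_spectrum(samples):
    """Naive DFT magnitudes for bins 0 .. N/2 - 1, scaled so a unit sine gives 1.0."""
    n = len(samples)
    return [
        2 * abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(samples))) / n
        for k in range(n // 2)
    ]


# Hypothetical note: 100 Hz fundamental with weaker 2nd and 3rd harmonics.
tone = harmonic_tone(100, [1.0, 0.5, 0.25], sample_rate=1600, n_samples=160)
spectrum = magnitude_spectrum(tone)
bin_hz = 1600 / 160  # spacing between DFT bins: 10 Hz
peaks = [round(k * bin_hz) for k, m in enumerate(spectrum) if m > 0.1]
print(peaks)  # [100, 200, 300]
```

Because 160 samples at 1600 Hz span exactly ten periods of the fundamental, every harmonic falls exactly on a bin; with other lengths some energy leaks into neighbouring bins.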
The Nature of Sound

The human ear is able to detect
frequencies in the range between 20 Hz
and 20 kHz.

The upper limit decreases with increasing age.
We can display the waveform of any
sound by plotting its amplitude against
time.
Figs. 12.1-12.7 show some waveforms for a range of types of
sound.

Speech




The speaker repeats “Feisty teenager” twice, then a
more distant voice responds.
The phrase is spoken faster and with more emphasis the second time.
The recording was made in the open air, so there is background
noise.
Speech can be compressed by removing the silences.
Feisty teenager
Instruments

Figs. 12.2-5
Didgeridoo
Boogie-woogie
Violin, cello and piano
Men grow cold...
Water sounds
A trickling stream
The sea
Stereophony
One of the most useful illusions in sound
perception is stereophony.
The brain identifies the source of a sound on
the basis of the differences in intensity and
phase between the signals received from
the left and right ears.

Digitizing Sound

Sampling

The selection of the sampling rate
If the upper limit of hearing is 20 kHz, the Sampling Theorem
requires a minimum rate of 40 kHz.
The sampling rate of audio CDs is 44.1 kHz.
22.05 kHz is commonly used for audio on the Internet.
11.025 kHz is used for speech.
DAT (digital audio tape): 48 kHz.
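The relationship between these rates and the highest frequency each can carry follows directly from the Sampling Theorem. A minimal sketch; the function names are hypothetical:

```python
def min_sample_rate(highest_freq_hz):
    """Sampling Theorem: the lower bound on the rate is twice the highest frequency."""
    return 2 * highest_freq_hz


def nyquist_limit(sample_rate_hz):
    """Highest frequency a given sampling rate can represent: half the rate."""
    return sample_rate_hz / 2


print(min_sample_rate(20_000))  # 40000
for use, rate in [("audio CD", 44_100), ("Internet", 22_050),
                  ("speech", 11_025), ("DAT", 48_000)]:
    # e.g. "speech: 11025 Hz sampling keeps frequencies up to 5512.5 Hz"
    print(f"{use}: {rate} Hz sampling keeps frequencies up to {nyquist_limit(rate):g} Hz")
```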
Sampling

How does sampling work in a computer system?
Sound is normally sampled by the sound card.
Digital audio inputs are uncommon, so the analogue line output of a DAT or CD player
is re-digitized by the sound card.
If the sampling rates are incompatible, the sound must be re-sampled.
Jitter: the intervals between samples
drift.
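When a recording's rate does not match the target rate (for example 48 kHz DAT material destined for 44.1 kHz CD audio), sample values must be recomputed at new instants. The sketch below uses linear interpolation, which is only one, rather crude, way to re-sample; it is an illustration, not how any particular sound card or editor does it. A proper converter also low-pass filters the signal when lowering the rate, to avoid aliasing. The function name is hypothetical.

```python
def resample_linear(samples, src_rate, dst_rate):
    """Re-sample by linear interpolation between neighbouring input samples.

    Crude on purpose: no anti-aliasing filter is applied when downsampling.
    """
    if not samples:
        return []
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate      # position measured in input samples
        j = int(pos)
        frac = pos - j
        nxt = samples[j + 1] if j + 1 < len(samples) else samples[j]
        out.append(samples[j] * (1 - frac) + nxt * frac)
    return out


print(resample_linear([0, 10, 20, 30], src_rate=2, dst_rate=4))
# [0.0, 5.0, 10.0, 15.0, 20.0, 25.0, 30.0, 30.0]
one_second_dat = [0.0] * 48_000
print(len(resample_linear(one_second_dat, 48_000, 44_100)))  # 44100
```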
Sampling

If the sampling rate is 40 kHz, any inaudible
components above 20 kHz will manifest as aliasing when the
signal is reconstructed.

A filter is used to remove any frequencies higher than half
the sampling rate before the signal is sampled.
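The folding that produces aliasing can be sketched as follows. A 25 kHz component (inaudible) sampled at 40 kHz yields exactly the same sample values as a 15 kHz tone, so after reconstruction it would be heard at 15 kHz; this is why the filter must remove it before sampling. The function name is hypothetical.

```python
import math


def alias_frequency(freq_hz, sample_rate_hz):
    """Frequency at which a sampled sinusoid of freq_hz reappears on reconstruction."""
    f = freq_hz % sample_rate_hz        # f and f + k*fs give identical samples
    return min(f, sample_rate_hz - f)   # the upper half folds back below fs/2


fs = 40_000
above = [math.cos(2 * math.pi * 25_000 * n / fs) for n in range(8)]
below = [math.cos(2 * math.pi * 15_000 * n / fs) for n in range(8)]
print(alias_frequency(25_000, fs))                            # 15000
print(all(abs(a - b) < 1e-9 for a, b in zip(above, below)))   # True
```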
Digitizing Sound

Quantization

CD audio uses 16-bit samples, giving 65,536 (2^16)
quantization levels.
Undersampling a pure sine wave: with too few quantization levels, an
analogue signal will be coarsely approximated
by samples that jump between just a few quantized
values.

Dithering: a small amount of random noise is added to
the analogue signal before sampling, which randomizes the quantization error.
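The effect of dithering can be sketched numerically, assuming values are measured in units of one quantization step and the dither is uniform noise spanning plus or minus half a step (one common choice among several). Without dither, a steady level of 0.3 of a step always rounds to the same value and disappears; with dither, individual samples are noisier but their average preserves the original level. The names and numbers are hypothetical.

```python
import math
import random

LEVELS_16_BIT = 2 ** 16  # 65536 quantization levels for CD audio


def quantize_lsb(value):
    """Round a value, measured in quantization steps (LSBs), to the nearest level."""
    return math.floor(value + 0.5)


def quantize_with_dither(value, rng):
    """Add rectangular (uniform) dither of up to half a step before rounding."""
    return quantize_lsb(value + rng.uniform(-0.5, 0.5))


rng = random.Random(1)  # fixed seed so the run is repeatable
signal = 0.3            # a level smaller than one quantization step
plain = [quantize_lsb(signal) for _ in range(10_000)]
dithered = [quantize_with_dither(signal, rng) for _ in range(10_000)]

print(LEVELS_16_BIT)                  # 65536
print(sum(plain) / len(plain))        # 0.0: the quiet signal is lost entirely
print(sum(dithered) / len(dithered))  # close to 0.3: the error became random noise
```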
Figures: quantization; undersampling a pure sine wave; dithering; sampling and dithering on the frequency spectrum.
Processing Sound
Modern multi-track recording studio
There is presently no single sound
application that has de facto standard status.
MIDI sequencing
Multi-track recording
Video editing packages include some
integrated sound editing and processing
facilities.

Recording and Importing Sound



Sampling rate and sample size
If the level of the signal is too low, the
resulting recording will be quiet.
If the level is too high, clipping will occur.



Fig. 12.10
Gain control can be used to alter the level.
Automatic gain control
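A sketch of why levels matter, assuming 16-bit signed samples and taking 0 dBFS as the full-scale value 32768 (conventions for full scale vary). Raising the gain of a quiet take works until samples exceed the 16-bit range, at which point they are clipped to the limit. The function names are hypothetical, and this applies a fixed gain; automatic gain control adjusts the gain continuously and is not shown.

```python
import math

INT16_MIN, INT16_MAX = -32768, 32767


def apply_gain(samples, gain):
    """Scale 16-bit samples; anything beyond the 16-bit range is clipped."""
    return [max(INT16_MIN, min(INT16_MAX, round(s * gain))) for s in samples]


def peak_dbfs(samples):
    """Peak level relative to full scale (assumed 32768); -inf for silence."""
    peak = max((abs(s) for s in samples), default=0)
    return 20 * math.log10(peak / 32768) if peak else float("-inf")


quiet = [1000, -2000, 1500]
print(peak_dbfs(quiet))       # about -24.3 dBFS: the recording will be quiet
print(apply_gain(quiet, 8))   # [8000, -16000, 12000]
print(apply_gain(quiet, 20))  # [20000, -32768, 30000]: one sample clipped
```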
Sound Editing and Effects



Interface: timeline
Tracks
Creation of loops



Very short loops are needed to create voices for the
electronic musical instruments known as samplers.
Longer loops are used in certain styles of dance
music.
Post-production



Correct defects, enhance quality, and modify the
character of sounds.
Premiere’s effects plug-in format is widely used.
Professional level: Cubase VST, DigiDesign ProTools
Removal of unwanted noise
Noise gate

Eliminates all samples whose value falls
below a specified threshold.
The user can specify a minimum time that must elapse
before a sequence of low-amplitude samples
counts as silence, and a similar limit before a
sequence whose values exceed the threshold
counts as sound.
This prevents the gate being turned on or off
by transient glitches (brief bursts of electromagnetic interference).
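One way to realize the noise gate just described, sketched in Python for a complete recording held in memory (an editing application's situation, not a real-time gate). Samples are classified as loud or quiet against the threshold; loud bursts shorter than min_sound are treated as glitches, and only quiet stretches lasting at least min_silence samples are silenced. Measuring the minimum times in samples rather than seconds, and the function names, are assumptions of this sketch.

```python
def runs(flags):
    """Group a list of booleans into [flag, start, length] runs."""
    result = []
    for i, flag in enumerate(flags):
        if result and result[-1][0] == flag:
            result[-1][2] += 1
        else:
            result.append([flag, i, 1])
    return result


def noise_gate(samples, threshold, min_silence, min_sound):
    """Silence quiet stretches, ignoring brief excursions across the threshold.

    min_silence and min_sound are lengths in samples (an assumption of this sketch).
    """
    loud = [abs(s) >= threshold for s in samples]
    # Loud bursts shorter than min_sound count as glitches, i.e. as quiet.
    for is_loud, start, length in runs(loud):
        if is_loud and length < min_sound:
            loud[start:start + length] = [False] * length
    # Quiet stretches of at least min_silence samples are silenced; shorter
    # dips inside real sound are left alone.
    out = list(samples)
    for is_loud, start, length in runs(loud):
        if not is_loud and length >= min_silence:
            out[start:start + length] = [0] * length
    return out


recording = [1, 2, 1, 50, 1, 2, 60, 70, 5, 80, 90, 2, 1, 1]
print(noise_gate(recording, threshold=10, min_silence=3, min_sound=2))
# [0, 0, 0, 0, 0, 0, 60, 70, 5, 80, 90, 0, 0, 0]
```

The isolated 50 is removed as a glitch, while the short dip to 5 between louder samples passes through because it is shorter than min_silence.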

Noise Gate

Since a noise gate leaves the speaker’s words (and the noise mixed with
them) untouched, the background noise will cut in and out
as he speaks.




Noise combined with signal
Noise gate: all-or-nothing filtering
Low-pass, high-pass, notch filters
Specialized filters


De-esser: removes the sibilance that results
from speaking or singing into a microphone placed too
close to the performer.
Click repairer

Removes clicks from recordings taken from damaged or dirty
vinyl records.

A single effect may be used in different ways
depending on the values of its parameters.

Reverb effect
Small delay and low reflectivity: inside a small
room
Longer reverb times: concert hall or stadium
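The two reverb settings can be illustrated with a single feedback comb filter, y[n] = x[n] + g·y[n − D], where the delay D stands in for the distance to reflecting surfaces and the feedback g for their reflectivity. This is a deliberately crude model: practical digital reverbs combine several comb and all-pass filters to build a dense tail. Names and parameter values are hypothetical.

```python
import math


def comb_echo(samples, delay, feedback, tail):
    """Feedback comb filter: y[n] = x[n] + feedback * y[n - delay].

    delay is in samples; feedback (0 <= feedback < 1) plays the role of
    reflectivity; `tail` zero samples are appended so the echoes can decay.
    """
    x = list(samples) + [0.0] * tail
    y = []
    for n, s in enumerate(x):
        echo = feedback * y[n - delay] if n >= delay else 0.0
        y.append(s + echo)
    return y


def decay_time_seconds(delay, feedback, sample_rate):
    """Time for the echoes to fall by 60 dB, i.e. to 1/1000 of their amplitude."""
    return math.log(0.001) / math.log(feedback) * delay / sample_rate


print(comb_echo([1.0], delay=2, feedback=0.5, tail=6))
# [1.0, 0.0, 0.5, 0.0, 0.25, 0.0, 0.125]
print(decay_time_seconds(441, 0.3, 44_100))    # roughly 0.06 s: small room
print(decay_time_seconds(4_410, 0.7, 44_100))  # roughly 1.9 s: large hall
```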
Graphic Equalization

Transforms spectrum of a sound using a bank
of filters, each controlled by its own slider and
each affecting a fairly narrow band of
frequencies.
Envelope Shaping



Changing the outline (envelope) of a waveform
Allows the user to draw a new envelope around the
waveform, altering its attack and decay and
introducing arbitrary fluctuations of amplitude.
Fader: a specialized version of envelope
shaping


Allows the volume to be gradually increased and decreased
Tremolo

Causes the amplitude to oscillate periodically from zero to its
maximum value
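Fades and tremolo are both envelope shaping: each sample is multiplied by a gain that varies over time. A minimal sketch, assuming linear fade ramps and a full-depth tremolo whose gain follows a raised cosine from 0 to 1 (real effects usually let the depth and shape be adjusted). Function names are hypothetical.

```python
import math


def fade(samples, fade_in, fade_out):
    """Linear fade-in over the first fade_in samples, fade-out over the last fade_out."""
    n = len(samples)
    out = []
    for i, s in enumerate(samples):
        gain = 1.0
        if fade_in and i < fade_in:
            gain = min(gain, i / fade_in)
        if fade_out and i >= n - fade_out:
            gain = min(gain, (n - 1 - i) / fade_out)
        out.append(s * gain)
    return out


def tremolo(samples, rate_hz, sample_rate):
    """Make the amplitude oscillate periodically between zero and its maximum."""
    return [s * 0.5 * (1 - math.cos(2 * math.pi * rate_hz * i / sample_rate))
            for i, s in enumerate(samples)]


print(fade([1.0] * 5, fade_in=2, fade_out=2))  # [0.0, 0.5, 1.0, 0.5, 0.0]
print(tremolo([1.0] * 5, rate_hz=1, sample_rate=4))  # gain 0 -> 0.5 -> 1 -> 0.5 -> 0
```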

Time stretching and pitch alteration are
two closely related effects.
With analogue recordings, changing the duration can only be achieved by
altering the speed at which the recording is played back, and
this alters the pitch.
With digital sound, the duration can be
changed without altering the pitch by inserting
or removing samples.

Similarly, the
pitch can be altered without affecting the duration.
Time stretching is required when sound is being
synchronized to video or another sound.
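Time stretching without changing pitch can be sketched with plain overlap-add (OLA), a swapped-in illustration rather than the method of any particular program: windowed frames are taken from the input at one spacing and laid down in the output at another, so short segments are effectively repeated or dropped while the waveform inside each frame, and hence its pitch, is kept. Plain OLA introduces audible phasing artifacts; refinements such as WSOLA or the phase vocoder align the frames more carefully. The frame size, hop and names are assumptions.

```python
import math


def ola_time_stretch(samples, factor, frame=1024, hop_out=256):
    """Stretch duration by `factor` (2.0 = twice as long) using plain overlap-add.

    Hann-windowed frames are read every hop_out / factor input samples and
    written every hop_out output samples.
    """
    hop_in = hop_out / factor
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame) for i in range(frame)]
    n_frames = max(1, int((len(samples) - frame) / hop_in) + 1)
    out_len = (n_frames - 1) * hop_out + frame
    out = [0.0] * out_len
    weight = [0.0] * out_len  # summed window values at each output sample
    for k in range(n_frames):
        start_in = round(k * hop_in)
        start_out = k * hop_out
        for i in range(frame):
            j = start_in + i
            s = samples[j] if j < len(samples) else 0.0
            out[start_out + i] += s * window[i]
            weight[start_out + i] += window[i]
    # Normalize by the accumulated window so overlapping frames keep the level.
    return [o / w if w > 1e-9 else 0.0 for o, w in zip(out, weight)]


sr = 8_000
tone = [math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]  # 1 s at 440 Hz
longer = ola_time_stretch(tone, 2.0)
print(len(tone), len(longer))  # 8000 14848: about twice as long, less roughly one frame
```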
Compression
3 minutes of CD-quality stereo: about 30 MBytes uncompressed (31,752,000 bytes)
Huffman coding
Run-length coding: applied to runs of silence
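Assuming CD-quality parameters (44.1 kHz, 16-bit samples, two channels), the uncompressed size of a three-minute recording can be worked out directly, and a toy run-length coder shows how stretches of silence can be collapsed into a single count. The function names and the ("silence", length) token format are hypothetical; with a threshold above zero the coder treats near-silence as silence, which makes it lossy.

```python
def uncompressed_bytes(seconds, sample_rate=44_100, bits=16, channels=2):
    """Size of raw PCM audio; defaults are CD-quality stereo."""
    return seconds * sample_rate * (bits // 8) * channels


def run_length_encode(samples, threshold=0):
    """Replace each run of (near-)silent samples with a ("silence", length) token."""
    out, run = [], 0
    for s in samples:
        if abs(s) <= threshold:
            run += 1
        else:
            if run:
                out.append(("silence", run))
                run = 0
            out.append(s)
    if run:
        out.append(("silence", run))
    return out


print(uncompressed_bytes(3 * 60))  # 31752000
print(run_length_encode([5, 0, 0, 0, 0, -3, 1, 0, 0]))
# [5, ('silence', 4), -3, 1, ('silence', 2)]
```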

Speech Compression

Telephone companies, 1960s







Companding: compressing/expanding
Non-linear quantization: Fig. 12.11
G.711: µ-law (North America and Japan; also used by Sun) and
A-law
ADPCM: adaptive differential pulse code
modulation
DPCM: differential pulse code modulation
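Two of these ideas in miniature, as a sketch rather than a conforming G.711 or ADPCM codec. The first part applies the continuous µ-law curve with µ = 255 (the G.711 standard approximates this curve with linear segments) before an 8-bit uniform quantizer, which gives small amplitudes finer steps than plain linear quantization. The second part is DPCM: it transmits quantized differences from the previous reconstructed sample; ADPCM would additionally adapt the step size to the signal. All names, the step size and the sample values are hypothetical.

```python
import math

MU = 255  # the mu value used by G.711 mu-law


def mu_law_compress(x):
    """Continuous mu-law curve for x in [-1, 1]: small values get more resolution."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_law_expand(y):
    """Inverse of mu_law_compress."""
    return math.copysign(((1 + MU) ** abs(y) - 1) / MU, y)


def quantize(v, bits=8):
    """Uniform quantizer on [-1, 1] with 2**bits levels."""
    half = 2 ** (bits - 1)
    return max(-half, min(half - 1, round(v * half))) / half


def dpcm_encode(samples, step):
    """Quantize the difference between each sample and the decoder's last output."""
    codes, predicted = [], 0
    for s in samples:
        code = round((s - predicted) / step)
        codes.append(code)
        predicted += code * step  # track exactly what the decoder will reconstruct
    return codes


def dpcm_decode(codes, step):
    out, predicted = [], 0
    for code in codes:
        predicted += code * step
        out.append(predicted)
    return out


x = 0.01  # a quiet sample, as a fraction of full scale
linear = quantize(x)
companded = mu_law_expand(quantize(mu_law_compress(x)))
print(abs(linear - x), abs(companded - x))  # companding gives the smaller error

samples = [0, 10, 25, 30, 28, 20]
codes = dpcm_encode(samples, step=4)
print(codes)                       # small differences instead of full samples
print(dpcm_decode(codes, step=4))  # each value within step/2 of the original
```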
Linear Predictive Coding

Uses a mathematical model of the state of the vocal tract as its
representation of speech

2.4 kbps, with machine-like quality
Perceptually Based Compression

Threshold of hearing
minimum level at which a sound can be heard



Fig. 12.12, the threshold of hearing
Very low- or high-frequency sounds must be much
louder than a mid-range tone to be heard.
Psycho-acoustical model



Mathematical description of aspects of the way the
ear and brain perceive sounds
Loud tones can obscure softer tones that occur at the
same time.
The effect depends on the relative frequencies of the two tones.
Masking

A modification of the threshold-of-hearing curve in the region of a
loud tone
Fig. 12.13: the threshold is raised in the neighborhood of the masking
tone.
The raised portion, or masking curve, is non-linear and
asymmetrical, rising faster than it falls.
Any sound that lies within the masking curve will be inaudible,
even though it rises above the unmodified threshold of hearing.
Because masking hides noise as well as some components of
the signal, quantization noise can be masked.
Where a masking sound is present, the signal can be quantized
relatively coarsely, using fewer bits than would otherwise be
needed, because the resulting quantization noise can be hidden
under the masking curve.
Compression
Use a bank of filters to split the signal into
bands of frequencies; 32 bands are
commonly used.
The average signal level in each band is
calculated, and using these values and a
psycho-acoustical model, a masking level
for each band is computed.

MPEG Audio
Three layers:
Layer 1: 192 kbps for each channel
Layer 2: 128 kbps for each channel
Layer 3: 64 kbps for each channel
MP3 = MPEG-1 Layer 3,
with a compression ratio of about 10:1
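The ratios implied by these bit rates can be checked against uncompressed CD audio, which carries 44,100 × 16 = 705,600 bits per second per channel. A small sketch; the names are hypothetical:

```python
CD_KBPS_PER_CHANNEL = 44_100 * 16 / 1000  # 705.6 kbps of 16-bit, 44.1 kHz PCM


def compression_ratio(kbps_per_channel):
    """How many times smaller than CD-quality PCM a given per-channel bit rate is."""
    return CD_KBPS_PER_CHANNEL / kbps_per_channel


for layer, kbps in [(1, 192), (2, 128), (3, 64)]:
    print(f"Layer {layer}: {kbps} kbps per channel, about {compression_ratio(kbps):.1f}:1")
# Layer 1: about 3.7:1, Layer 2: about 5.5:1, Layer 3: about 11.0:1
```

Layer 3 at 64 kbps per channel works out to about 11:1, consistent with the roughly 10:1 figure quoted for MP3.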

Formats

AIFF for MacOS
WAV for Windows
AU for Unix
Each can store audio data at a variety of
commonly used sampling rates and sample
sizes.
Each supports uncompressed or compressed
data with a range of compressors.

Streaming Audio
Sound is delivered over a network and
played as it arrives without having to be
stored on user’s machine first.
Because of the lower bandwidth required by
audio, streaming is more successful for
sound than it is for video.
Real Networks’ RealAudio
Streaming QuickTime
Play on demand

MIDI
The Musical Instrument Digital Interface
A standard protocol for communicating
between electronic instruments, such as
synthesizers, samplers, and drum machines.
MIDI allowed instruments to be controlled
automatically by devices that could be
programmed to send out sequences of
MIDI instructions.

MIDI Messages




An instruction that controls some aspect of the
performance of an instrument
A status byte gives the type of message,
followed by one or two data bytes giving the values of its parameters.
Examples: Note On, Note Off, Key Pressure
Running status: the status byte may be omitted when it is the same as in the previous message.
MIDI data is transmitted as 10-bit packets: each byte is sent with a start bit and a stop bit.
The MIDI message Note On is followed by two data bytes, as is the Note Off message.
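The message layout can be sketched by building the bytes directly. This assumes the standard MIDI 1.0 conventions: a Note On status byte is 0x90 plus the channel number (0-15), Note Off is 0x80 plus the channel, data bytes range over 0-127, and the serial link runs at 31,250 bits per second with 10 bits per byte on the wire. The helper names are hypothetical.

```python
NOTE_OFF, NOTE_ON = 0x80, 0x90
MIDI_BAUD = 31_250          # bits per second on a MIDI 1.0 cable
BITS_PER_BYTE_ON_WIRE = 10  # start bit + 8 data bits + stop bit


def channel_message(kind, channel, key, velocity):
    """Status byte (message type + channel 0-15) followed by two 7-bit data bytes."""
    if not (0 <= channel <= 15 and 0 <= key <= 127 and 0 <= velocity <= 127):
        raise ValueError("channel must be 0-15 and data bytes 0-127")
    return [kind | channel, key, velocity]


def with_running_status(messages):
    """Omit a status byte whenever it repeats the previous one (running status)."""
    out, last_status = [], None
    for status, *data in messages:
        if status != last_status:
            out.append(status)
            last_status = status
        out.extend(data)
    return out


def transmission_ms(n_bytes):
    """Time to send n_bytes over a MIDI cable, in milliseconds."""
    return n_bytes * BITS_PER_BYTE_ON_WIRE * 1000 / MIDI_BAUD


chord = [channel_message(NOTE_ON, 0, key, 100) for key in (60, 64, 67)]  # C, E, G
stream = with_running_status(chord)
print([hex(b) for b in stream])      # ['0x90', '0x3c', '0x64', '0x40', '0x64', '0x43', '0x64']
print(transmission_ms(len(stream)))  # 2.24
```

With running status the three-note chord takes 7 bytes instead of 9. Many devices also send Note On with velocity 0 in place of Note Off, so that long passages can share a single status byte.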
General MIDI and QuickTime
General MIDI specifies 128 standard
voices (Table 12.1).
Drum machines and percussion samplers



There is no guarantee that identical
sounds will be generated for each name
by different instruments.


Drum kits, Table 12.2
A good sampler may use high quality samples
of the corresponding real instruments.
QuickTime: MIDI-like functionality
MIDI Software

MIDI sequencing programs





Capture and editing functions equivalent to those of
video editing software.
Multiple tracks
Composition
Music can be captured as it is played from MIDI
controllers attached to a computer via a MIDI
interface.
Punch in

The start and end point of a defective passage are
marked, the sequencer starts playing before the
beginning, and then switches to record mode,
allowing a new version of the passage to be recorded
to replace the original.
Sequencers




Sequencers can quantize timing during recording, fitting the
length of notes to exact sixteenth notes,
eighth-note triplets, or whatever duration is
specified.
Most programs allow music to be entered using
classical music notation.
Some programs allow printed sheet music to be scanned, and
perform optical character recognition to
transform the music into MIDI.
The opposite transformation, from MIDI to a
printed score, is also often provided, enabling
transcriptions of performed music to be made
automatically.
Piano-roll interface, Fig. 12.14
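Quantizing amounts to snapping each recorded time to the nearest point on a grid. A minimal sketch, assuming times are measured in beats (quarter notes), so a sixteenth-note grid is 1/4 beat and an eighth-note-triplet grid is 1/3 beat; exact fractions avoid rounding drift. The same function could be applied to note lengths. The names and the sample timings are hypothetical.

```python
from fractions import Fraction

SIXTEENTH = Fraction(1, 4)       # in beats, assuming a quarter-note beat
EIGHTH_TRIPLET = Fraction(1, 3)


def quantize_times(times_in_beats, grid):
    """Snap each time to the nearest multiple of grid (ties go to the even multiple)."""
    return [round(Fraction(t) / grid) * grid for t in times_in_beats]


played = [0.02, 0.27, 0.49, 0.81, 1.03]  # a slightly uneven performance
print([float(t) for t in quantize_times(played, SIXTEENTH)])
# [0.0, 0.25, 0.5, 0.75, 1.0]
print([float(t) for t in quantize_times([0.02, 0.3, 0.7, 0.98], EIGHTH_TRIPLET)])
# [0.0, 0.3333333333333333, 0.6666666666666666, 1.0]
```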
Major limitations of MIDI


Impossibility of representing vocals
MIDI can be transformed into audio.
The reverse transformation is sometimes
supported, although it is more difficult to
implement.

Computer Sequencing Software
Music Notation Software
Combining Sound and Picture



Voice-overs should match the picture they describe,
music will often be related to edits, and natural sounds
will be associated with events on screen.
Synchronization, timecode
If sound and video are physically independent,
synchronization will sometimes be lost.


Audio and video data streams must carry the equivalent of
timecode, so that their synchronization can be checked.
Audio and video played from a local hard disk


For short clips, it is possible to load the entire sound track into
memory before playback begins.
This is impractical for movies. For these, it is normal to
interleave the audio and video.
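Interleaving can be sketched as merging two streams of timestamped chunks into one sequence in presentation order, so that a player can read both from disk in a single pass; the timestamps play the role of timecode. The 40 ms video frame period (25 frames per second), the 100 ms audio chunks and the names are hypothetical, and real container formats add headers, indexes and chunk sizes that are not shown.

```python
import heapq


def interleave(*streams):
    """Merge streams of (timestamp_ms, kind, index) tuples, each already in time order."""
    return list(heapq.merge(*streams))


video = [(t, "video", i) for i, t in enumerate(range(0, 200, 40))]   # 25 frames/s
audio = [(t, "audio", i) for i, t in enumerate(range(0, 200, 100))]  # 100 ms chunks
for chunk in interleave(video, audio):
    print(chunk)
# (0, 'audio', 0), (0, 'video', 0), (40, 'video', 1), (80, 'video', 2),
# (100, 'audio', 1), (120, 'video', 3), (160, 'video', 4)
```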