Digital Media

Digital Sound
Dr. Kairui Chen
GGC
What is sound?
• Conversion of energy into vibrations in the air
or some other elastic medium
– Vocal cords
– Tuning fork
– Guitar strings
Waveforms
• Sounds change over time (or sound is a
function of time)
– e.g. speech changes constantly
• Frequency spectrum – relative amplitudes of
the frequency components
– alters as sound changes
• Waveform is a plot of amplitude against time
– Provides a graphical view of characteristics of a
changing sound
– Can identify syllables of speech, rhythm of music,
quiet and loud passages, etc
Frequency of Sound Wave
• Refers to the number of complete back-and-forth cycles of vibrational motion of the
medium particles per unit of time
• Unit for frequency: Hz (Hertz)
• 1 Hz = 1 cycle/second
Frequency
[Figure: a waveform that completes 2 cycles in 1 second]
Frequency = 2 Hz (i.e., 2 cycles/second)
Frequency
[Figure: a waveform that completes 4 cycles in 1 second]
Frequency = 4 Hz (i.e., 4 cycles/second)
Higher frequency than the previous waveform.
Frequency
• Sound frequency often referred to as pitch of
the sound.
– Higher pitch -> higher frequency
– Lower pitch -> lower frequency
• Range of human hearing: roughly 20 Hz to
20 kHz; it varies from person to person and the upper limit falls
as we age
Sound Intensity
• Sound intensity:
– an objective measurement
– can be measured with auditory devices
– in decibels (dB)
• 0 dB:
– Threshold of hearing
– minimum sound pressure level at which humans can hear a
sound at a given frequency
– does NOT mean zero sound intensity
– does NOT mean absence of sound wave
• about 120 dB:
– threshold of pain
– sound intensity that is 10^12 times greater than at 0 dB
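The relationship between decibels and intensity ratios can be checked with the standard definition of intensity level, dB = 10 × log10(I / I0), where I0 is the intensity at the threshold of hearing. The sketch below is only illustrative; the function names are not from the slides.

```python
import math

def intensity_level_db(ratio):
    """Level in dB of a sound whose intensity is `ratio` times the 0 dB reference."""
    return 10 * math.log10(ratio)

def ratio_from_db(level_db):
    """Inverse: how many times more intense a sound is than the 0 dB reference."""
    return 10 ** (level_db / 10)

print(intensity_level_db(1))   # 0.0 -> the reference intensity itself, not "no sound"
print(ratio_from_db(120))      # 1000000000000.0 -> 10^12 times the reference
```

This is why 0 dB does not mean zero intensity: it corresponds to a ratio of 1, i.e. a sound exactly at the reference level.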
A Single Tone Sound: A Simple Sine Wave Waveform
[Figure: a single sine wave waveform, representing a single tone]
Adding Sound Waves
Most sound sources vibrate in complex ways leading to sounds with components at
several different frequencies.
[Figure: a single sine wave waveform (a single tone) plus a second single sine wave waveform (a second single tone) gives a more complex waveform (a more complex sound)]
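Adding two tones can be sketched numerically: each tone is a list of sine samples, and the combined sound is their sample-by-sample sum. The frequencies, amplitudes, and the 100 samples/second rate below are arbitrary values chosen for illustration.

```python
import math

def sine_wave(frequency_hz, amplitude, duration_s, sample_rate_hz):
    """Samples of a single tone: amplitude * sin(2*pi*f*t)."""
    n = int(duration_s * sample_rate_hz)
    return [amplitude * math.sin(2 * math.pi * frequency_hz * i / sample_rate_hz)
            for i in range(n)]

def add_waves(a, b):
    """Add two equal-length waveforms sample by sample."""
    return [x + y for x, y in zip(a, b)]

tone_1 = sine_wave(2, 1.0, 1.0, 100)   # a 2 Hz tone
tone_2 = sine_wave(4, 0.5, 1.0, 100)   # a 4 Hz tone at half the amplitude
combined = add_waves(tone_1, tone_2)   # a more complex waveform
```

Plotting `combined` against time would show a shape that is no longer a simple sine wave, even though it contains only two frequency components.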
Digitizing Sound
Suppose we want to digitize this sound wave:
[Figure: the sound wave to be digitized]
Effects of Sampling Rate
[Figure: the original waveform, then the same waveform sampled at 10 Hz and at 20 Hz]
Effects of Sampling Rate
Higher sampling rate:
• The reconstructed wave looks closer to the
original wave
• More sample points, more data to record, and
thus a larger file size
Estimate Thresholds of Sampling Rate Based on Human Hearing
Let's consider these two factors:
1. Human hearing range
2. A rule called Nyquist's theorem
Nyquist Theorem
We must sample at least 2 points in each sound
wave cycle to be able to reconstruct the sound
wave satisfactorily.
Sampling rate of the audio ≥ twice the highest
audio frequency (called the Nyquist rate)
The higher the pitch of the audio, the higher the
sampling rate it needs
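A small experiment can show what goes wrong below the Nyquist rate. In this sketch (the 10 Hz sampling rate and the 6 Hz and 4 Hz tones are illustrative choices), a 6 Hz tone would need at least 12 samples per second; sampled at only 10 Hz, its samples are indistinguishable from those of an inverted 4 Hz tone.

```python
import math

def nyquist_rate(max_frequency_hz):
    """Minimum sampling rate for a signal whose highest frequency is max_frequency_hz."""
    return 2 * max_frequency_hz

def sample_sine(frequency_hz, sample_rate_hz, n_samples):
    """Sample sin(2*pi*f*t) at regular intervals of 1 / sample_rate_hz seconds."""
    return [math.sin(2 * math.pi * frequency_hz * n / sample_rate_hz)
            for n in range(n_samples)]

rate = 10                                  # samples per second
undersampled = sample_sine(6, rate, 10)    # 6 Hz needs nyquist_rate(6) = 12 Hz
lower_tone = sample_sine(4, rate, 10)      # 4 Hz is adequately sampled at 10 Hz

# Every 6 Hz sample equals the negated 4 Hz sample, so the two tones
# cannot be told apart from the samples alone.
same_samples = all(abs(a + b) < 1e-9 for a, b in zip(undersampled, lower_tone))
print(nyquist_rate(6), same_samples)       # 12 True
```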
Choosing Sampling Rate: Example 1
If we consider the human ear's most sensitive frequency range
(2,000 Hz to 5,000 Hz), what is the lowest sampling
rate that may be used and still satisfies the Nyquist Theorem?
A. 11,025 Hz AM Radio Quality/Speech
B. 22,050 Hz Near FM Radio Quality (high-end multimedia)
C. 44,100 Hz CD Quality
D. 48,000 Hz DAT (digital audio tape) Quality
E. 96,000 Hz DVD-Audio Quality
F. 192,000 Hz DVD-Audio Quality
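One way to reason about a question like this is to double the highest frequency of interest and then pick the smallest listed rate that is at least that large. The helper below is a hypothetical sketch of that reasoning, using the six rates listed above.

```python
STANDARD_RATES_HZ = [11_025, 22_050, 44_100, 48_000, 96_000, 192_000]

def lowest_adequate_rate(max_frequency_hz, rates=STANDARD_RATES_HZ):
    """Smallest listed sampling rate that is at least twice max_frequency_hz."""
    required = 2 * max_frequency_hz
    for rate in sorted(rates):
        if rate >= required:
            return rate
    return None  # none of the listed rates is high enough

print(lowest_adequate_rate(5_000))    # 11025 (needs at least 10,000 Hz)
print(lowest_adequate_rate(20_000))   # 44100 (needs at least 40,000 Hz)
```

The second call uses the upper limit of human hearing (about 20,000 Hz) stated earlier in the slides.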
Choosing Sampling Rate: Example 2
Given the human hearing range (20 Hz to 20,000
Hz) and the Nyquist Theorem, why do you think
the sampling rate (44,100 Hz) for CD-quality audio is reasonable?
Sampling Rate Examples
• 11,025 Hz AM Radio Quality/Speech
• 22,050 Hz Near FM Radio Quality (high-end
multimedia)
• 44,100 Hz CD Quality
• 48,000 Hz DAT (digital audio tape) Quality
• 96,000 Hz DVD-Audio Quality
• 192,000 Hz DVD-Audio Quality
Digitization: Quantization
• Each of the discrete amplitude samples obtained from the
sampling step is mapped and rounded to the nearest value on a
scale of discrete levels.
• The number of levels in the scale is expressed as a bit depth: the
exponent in a power of 2 (a bit depth of n gives 2^n levels).
– More levels: more accurate mapping, better quality, but larger file size
– Fewer levels: less accurate mapping, worse quality, but smaller file size
– The bit depth of digital audio is also referred to as its resolution.
– For digital audio, higher resolution means higher bit depth.
• 8-bit audio allows 2^8 = 256 possible levels in the scale
– only use if some distortion is acceptable, e.g. voice communication
• CD-quality audio is 16-bit (i.e., 2^16 = 65,536 possible levels)
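Quantization can be sketched as rounding each sample to the nearest of 2^n evenly spaced levels. The example assumes samples lie in the range -1.0 to 1.0 and that the levels are evenly spaced across it; both are simplifying assumptions made here, not details given in the slides.

```python
def quantize(sample, bit_depth):
    """Round a sample in [-1.0, 1.0] to the nearest of 2**bit_depth evenly spaced levels.

    Returns (level index, quantized value).
    """
    levels = 2 ** bit_depth
    step = 2.0 / (levels - 1)                 # spacing between adjacent levels
    index = round((sample + 1.0) / step)      # nearest level, counted from the bottom
    index = max(0, min(levels - 1, index))    # clamp out-of-range samples
    return index, -1.0 + index * step

index_8, value_8 = quantize(0.3, 8)       # 256 levels
index_16, value_16 = quantize(0.3, 16)    # 65,536 levels

error_8 = abs(value_8 - 0.3)
error_16 = abs(value_16 - 0.3)
print(index_8, error_8 > error_16)        # 166 True
```

The rounding error is at most half a step, so each extra bit of depth roughly halves the worst-case error.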
Digital Audio File Size
• File size of uncompressed digital audio is
determined by:
– Sampling rate (r);
– Bit depth (s);
– Number of channels;
• Mono: single channel;
• Stereo: two channels;
• Multiple channels;
– Duration of the audio in seconds (t);
Let's estimate the file size of a 1-minute CD-quality audio file
1-minute CD-Quality Audio
• Sampling rate = 44100 Hz
(i.e., 44,100 samples/second)
• Bit depth = 16
(i.e., 16 bits/sample)
• Stereo
(i.e., 2 channels: left and right channels)
File Size of 1-min CD-quality Audio
• 1 minute = 60 seconds
• Total number of samples
= 60 seconds × 44,100 samples/second
= 2,646,000 samples
• Total number of bits required for this many samples
= 2,646,000 samples × 16 bits/sample
= 42,336,000 bits
This is for one channel.
• Total bits for two channels
= 42,336,000 bits/channel × 2 channels
= 84,672,000 bits
File Size of 1-min CD-quality Audio
84,672,000 bits
= 84,672,000 bits / (8 bits/byte)
= 10,584,000 bytes
= 10,584,000 bytes / (1024 bytes/KB)
≈ 10,336 KB
= 10,336 KB / (1024 KB/MB)
≈ 10 MB
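The same estimate can be reproduced in a few lines. The function name is illustrative; the arithmetic follows the steps above (bits, then bytes, then KB and MB using 1024 as the divisor).

```python
def uncompressed_audio_size_bytes(sampling_rate_hz, bit_depth, channels, duration_s):
    """Size of uncompressed audio: samples/second x bits/sample x channels x seconds, in bytes."""
    total_bits = sampling_rate_hz * bit_depth * channels * duration_s
    return total_bits / 8

size_bytes = uncompressed_audio_size_bytes(44_100, 16, 2, 60)
size_kb = size_bytes / 1024
size_mb = size_kb / 1024
print(size_bytes, round(size_kb), round(size_mb, 2))   # 10584000.0 10336 10.09
```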
General Strategies to Reduce Digital Media File Size
• Reduce sampling rate
• Reduce bit depth
• Apply compression
• For digital audio, these can also be options:
– reduce the number of channels
– shorten the duration of the audio
Reduce Sampling Rate
• Sacrifices the fidelity of the digitized audio
• Need to weigh the quality against the file size
• Need to consider:
– human perception of the audio
(e.g., How perceptible is the loss of quality at a lower
sampling rate?)
– how the audio is used
• music: may need higher sampling rate
• short sound clips such as explosion and looping ambient
background noise: may work well with lower sampling rate
Effect of Sampling Rate on File Size
File size = duration × sampling rate × bit depth ×
number of channels
• File size is reduced in the same proportion as
the reduction of the sampling rate
• Example: Reducing the sampling rate from
44,100 Hz to 22,050 Hz will reduce the file size
by half.
Effect of Bit Depth on File Size
File size = duration × sampling rate × bit depth ×
number of channels
• File size is reduced in the same proportion as
the reduction of the bit depth
• Example: Reducing the bit depth from 16-bit
to 8-bit will reduce the file size by half.
Most Common Choices of Bit Depth
• 8-bit
– usually sufficient for speech
– in general, too low for music
• 16-bit
– minimum bit depth for music
• 24-bit
• 32-bit
Effect of Number of Channels on File Size
File size = duration × sampling rate × bit depth ×
number of channels
• File size is reduced in the same proportion as
the reduction of the number of channels
• Example: Reducing the number of channels
from 2 (stereo) to 1 (mono) will reduce the file
size by half.
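Because file size is a product of the four factors, halving any one of them halves the size, and the reductions multiply when combined. A short sketch (the 60-second CD-quality clip is just an example):

```python
def file_size_bits(duration_s, sampling_rate_hz, bit_depth, channels):
    """File size = duration x sampling rate x bit depth x number of channels."""
    return duration_s * sampling_rate_hz * bit_depth * channels

base = file_size_bits(60, 44_100, 16, 2)          # CD quality, stereo
half_rate = file_size_bits(60, 22_050, 16, 2)     # sampling rate halved
half_depth = file_size_bits(60, 44_100, 8, 2)     # bit depth halved
mono = file_size_bits(60, 44_100, 16, 1)          # channels halved
all_three = file_size_bits(60, 22_050, 8, 1)      # every reduction applied

print(base / half_rate, base / half_depth, base / mono, base / all_three)
# 2.0 2.0 2.0 8.0
```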
Digital Sound Editing
• Software: Audacity (a tutorial and hands-on
activity will be given in class).
• Timeline divided into tracks
• Sound on each track displayed as a waveform
• 'Scrub' over part of a track e.g. to find pauses
• Cut and paste, drag and drop
• May combine many tracks from different
recordings (mix-down)
Effects and Filters
• Noise gate: remove hiss from music
• Low pass and high pass filters
• Notch filter: removes a single narrow frequency
band
• De-esser: removes the sibilance
• Click repairer: removes clicks from recordings
taken from old vinyl records
• Reverb: echo effect
• etc
Audio File Compression
• Lossless
• Lossy
– gets rid of some data, but human perception is
taken into consideration so that the data removed
causes the least noticeable distortion
– e.g. MP3 (good compression ratio while preserving
perceptibly high audio quality)
Compression
• In general, lossy methods required because of
complex and unpredictable nature of audio data
• CD quality, stereo, 3-minute song requires over
25 Mbytes
– Data rate exceeds bandwidth of dial-up Internet
connection
• Difference in the way we perceive sound and
image means different approach from image
compression is needed
Companding
• Non-linear quantization
• Higher quantization levels
spaced further apart than
lower ones
• Quiet sounds represented
in greater detail than loud
ones
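As an illustration of non-linear quantization, the sketch below uses the mu-law curve, one well-known companding function from telephony; the slides do not name a specific curve, so this choice (and mu = 255) is an assumption. Samples are assumed to lie between -1.0 and 1.0.

```python
import math

MU = 255  # assumption: the value used by mu-law telephony; not specified in the slides

def compress(x, mu=MU):
    """Mu-law compression: stretches quiet samples, squeezes loud ones."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def expand(y, mu=MU):
    """Inverse of compress: recovers the original sample value."""
    return math.copysign(((1 + mu) ** abs(y) - 1) / mu, y)

quiet, loud = 0.01, 0.9
print(compress(quiet) > 0.2)    # True: a 1% amplitude uses over 20% of the output range
print(compress(loud) < 1.0)     # True: loud samples are squeezed into what remains
```

If the compressed values are then quantized with evenly spaced levels, the levels end up closely spaced for quiet sounds and further apart for loud ones once expanded, which is the behaviour described above.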
ADPCM
• Differential Pulse Code Modulation
– Similar to video inter-frame compression
– Compute a predicted value for next sample, store
the difference between prediction and actual
value
• Adaptive Differential Pulse Code Modulation
– Dynamically vary step size used to store quantized
differences
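The core idea of differential coding can be sketched with the simplest possible predictor: assume the next sample equals the previous one and store only the difference. This is an illustrative choice of predictor; the adaptive step size of ADPCM and the quantization of the differences are not shown.

```python
def dpcm_encode(samples):
    """Store each sample as its difference from the prediction (the previous sample)."""
    differences = []
    prediction = 0
    for sample in samples:
        differences.append(sample - prediction)
        prediction = sample
    return differences

def dpcm_decode(differences):
    """Rebuild the samples by adding each difference to the running prediction."""
    samples = []
    prediction = 0
    for difference in differences:
        prediction += difference
        samples.append(prediction)
    return samples

samples = [1000, 1004, 1009, 1011, 1008]
encoded = dpcm_encode(samples)
print(encoded)                           # [1000, 4, 5, 2, -3]
print(dpcm_decode(encoded) == samples)   # True
```

After the first value, the differences are much smaller than the samples themselves, so they can be stored in fewer bits.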
Perceptually-Based Compression
• Identify and discard data that doesn't affect
the perception of the signal
– Needs a psycho-acoustical model, since ear and
brain do not respond to sound waves in a
simple way
• Threshold of hearing – sounds too quiet to
hear
• Masking – sound obscured by some other
sound
The Threshold of Hearing
Masking
Compression Algorithm
• Split signal into bands of frequencies using filters
– Commonly use 32 bands
• Compute masking level for each band, based on
its average value and a psycho-acoustical model
– i.e. approximate masking curve by a single value for
each band
• Discard signal if it is below masking level
• Otherwise quantize using the minimum number
of bits that will mask quantization noise
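The last two steps can be sketched as a toy bit-allocation routine. All numbers here are hypothetical: a real encoder derives the masking level of each band from a psycho-acoustical model, and the "about 6 dB of noise reduction per bit" figure is a common rule of thumb rather than something stated in the slides.

```python
import math

DB_PER_BIT = 6.02  # rule of thumb: each extra bit lowers quantization noise by about 6 dB

def allocate_bits(band_levels_db, masking_levels_db):
    """Per band: 0 bits if the signal is below its masking level, otherwise just
    enough bits to push the quantization noise below the masking level."""
    allocation = []
    for level, mask in zip(band_levels_db, masking_levels_db):
        if level < mask:
            allocation.append(0)   # masked: the band is discarded
        else:
            allocation.append(max(1, math.ceil((level - mask) / DB_PER_BIT)))
    return allocation

band_levels = [60.0, 20.0, 45.0, 10.0]     # hypothetical signal level per band (dB)
masking_levels = [30.0, 25.0, 40.0, 15.0]  # hypothetical masking level per band (dB)
print(allocate_bits(band_levels, masking_levels))   # [5, 0, 1, 0]
```

Only four bands are used to keep the example short; the slides mention that 32 bands are common.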
MP3
• MPEG Audio, Layer 3
• Three layers of audio compression in MPEG-1
(MPEG-2 essentially identical)
• From Layer 1 to Layer 3, the encoding process increases in
complexity and the data rate for the same quality
decreases
– e.g. Same quality 192kbps at Layer 1, 128kbps at
Layer 2, 64kbps at Layer 3
• 10:1 compression ratio at high quality
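A rough check of the compression ratio: compare the data rate of uncompressed CD-quality stereo with the data rate of the compressed stream. The 128 kbps figure below is an assumption (64 kbps for each of two channels, with 1 kbps taken as 1,000 bits per second).

```python
def uncompressed_bit_rate_bps(sampling_rate_hz=44_100, bit_depth=16, channels=2):
    """Bits per second of uncompressed audio (defaults: CD quality, stereo)."""
    return sampling_rate_hz * bit_depth * channels

def compression_ratio(uncompressed_bps, compressed_bps):
    """How many times smaller the compressed stream is."""
    return uncompressed_bps / compressed_bps

cd_rate = uncompressed_bit_rate_bps()            # 1,411,200 bits per second
ratio = compression_ratio(cd_rate, 128_000)      # assumed 128 kbps stereo MP3
print(cd_rate, round(ratio, 1))                  # 1411200 11.0
```

Under these assumptions the ratio comes out at about 11:1, in line with the roughly 10:1 figure quoted above.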
AAC
• Advanced Audio Coding
• Defined in MPEG-2 standard, extended and
incorporated into MPEG-4
• Not backward compatible with earlier
standards
• Higher compression ratios and lower bit rates
than MP3
• Subjectively better quality than MP3 at the
same bit rate
Audio Formats
• Platform-specific file formats
– AIFF (Mac), WAV (Windows), AU (Unix)
• Multimedia formats used as 'container
formats' for sound compressed with different
codecs
– QuickTime, Windows Media, RealAudio
• MP3 has its own file format, but MP3 data can
be included as audio tracks in QuickTime
movies and SWFs
MIDI
• Musical Instrument Digital Interface
• Instructions about how to produce music, which can be
interpreted by suitable hardware and/or software
– cf. vector graphics as drawing instructions
• Standard protocol for communicating between
electronic instruments (synthesizers, samplers, drum
machines)
• Allows instruments to be controlled by hardware or
software sequencers
MIDI and Computers
• MIDI interface allows computer to send MIDI data
to instruments
• Store MIDI sequences in files, exchange them
between computers, incorporate into multimedia
• Computer can synthesize sounds on a sound card,
or play back samples from disk in response to
MIDI instructions
– Computer becomes primitive musical instrument
(quality of sound inferior to dedicated instruments)
MIDI Messages
• Instructions that control some aspect of the
performance of an instrument
• Status byte – indicates type of message
• 2 data bytes – values of parameters
– e.g. Note On + note number (0..127) + key velocity
• Running status – omit status byte if it is the
same as preceding one
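The byte layout can be sketched for Note On messages. In the MIDI protocol the Note On status byte is 0x90 plus the channel number (0 to 15), followed by two data bytes; the helper names below are illustrative only.

```python
NOTE_ON = 0x90  # Note On status; the low 4 bits carry the channel number (0..15)

def note_on(channel, note, velocity):
    """Build a 3-byte Note On message: status byte, note number, key velocity."""
    if not (0 <= channel <= 15 and 0 <= note <= 127 and 0 <= velocity <= 127):
        raise ValueError("channel must be 0..15; note and velocity must be 0..127")
    return bytes([NOTE_ON | channel, note, velocity])

def with_running_status(messages):
    """Concatenate messages, omitting a status byte that repeats the preceding one."""
    stream = bytearray()
    last_status = None
    for message in messages:
        status, data = message[0], message[1:]
        if status != last_status:
            stream.append(status)
            last_status = status
        stream.extend(data)
    return bytes(stream)

# A C major chord (notes 60, 64, 67) on channel 0 with key velocity 100.
chord = [note_on(0, 60, 100), note_on(0, 64, 100), note_on(0, 67, 100)]
stream = with_running_status(chord)
print(stream.hex(" "))   # 90 3c 64 40 64 43 64
```

With running status the three messages take 7 bytes instead of 9.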
Common Audio File Types
• .wav
– Originally created by: IBM, Microsoft
– File info & compression: compressed, uncompressed
– Platforms: Windows
• .mp3
– Acronym for: MPEG audio layer 3
– Originally created by: Moving Picture Experts Group
– File info & compression: good compression rate with perceivably high quality sound
– Platforms: cross-platform
• .mov
– Acronym for: QuickTime movie
– Originally created by: Apple
– File info & compression: not just for video; supports audio track and a MIDI track; a variety of sound compressors; files can be streamed; "Fast Start" technology
– Platforms: cross-platform; requires QuickTime player
Common Audio File Types
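• .aiff
– Acronym for: Audio Interchange File Format
– Originally created by: Apple
– File info & compression: compressed, uncompressed
– Platforms: Mac, Windows
• .au, .snd
– Originally created by: Sun
– File info & compression: compressed
– Platforms: Sun, Unix, Linux
• .ra, .rm
– Acronym for: Real Audio
– Originally created by: Real Systems
– File info & compression: compressed; can be streamed with Real Server
– Platforms: cross-platform; requires Real player
• .wma
– Acronym for: Windows Media Audio
– Originally created by: Microsoft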
Choosing an Audio File Type
Determined by the intended use
• File size limitation
• Intended audience
• Whether as a source file
File Size Limitations
• Is your audio used on the Web?
– file types that offer high compression
– streaming audio file types
Intended Audience
• What is the equipment that your audience will
use to listen to your audio?
• If they are listening on computers, what are
their operating systems?
– cross-platform vs. single platform
Whether as a Source File
If you are keeping the file for future editing,
choose a file type:
• uncompressed
• allows lossless compression
References
1. Digital Multimedia, Nigel Chapman and Jenny Chapman.
2. Digital Media Primer, Yue-Ling Wong.