Sound and Music for Video Games

Download Report

Transcript Sound and Music for Video Games

Sound and Music for Video
Games
Technology Overview
Roger Crawfis
Ohio State University
Overview
• Fundamentals of Sound
• Psychoacoustics
• Interactive Audio
• Applications
What is sound?
• Sound is the sensation perceived by the
sense of hearing
• Audio is acoustic, mechanical, or
electrical frequencies corresponding to
normally audible sound waves
Dual Nature of Sound
• Transfer of sound and physical stimulation
of ear
• Physiological and psychological
processing in ear and brain
(psychoacoustics)
Transmission of Sound
• Requires a medium with elasticity and
inertia (air, water, steel, etc.)
• Movements of air molecules result in the
propagation of a sound wave
Longitudinal Motion of Air
Wavefronts and Rays
Reflection of Sound
Absorption of Sound
• Some materials readily absorb the energy
of a sound wave
• Examples: carpet, curtains at a movie
theater
Refraction of Sound
Diffusion of Sound
• Not analogous to diffusion of light
• Naturally occurring diffusions of sounds
typically affect only a small subset of
audible frequencies
• Nearly full diffusion of sound requires a
reflection phase grating (Schroeder
Diffuser)
The Inverse-Square Law
(Attenuation)

I = W / (4πr²)

I is the sound intensity in W/cm^2
W is the sound power of the source in W
r is the distance from the source in cm
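As a quick sketch, the attenuation formula above can be computed directly (the function name is illustrative, not from the slides):

```python
import math

def sound_intensity(power_w: float, distance_cm: float) -> float:
    """Inverse-square law: intensity (W/cm^2) of a point source
    radiating power_w watts, measured distance_cm centimeters away."""
    return power_w / (4.0 * math.pi * distance_cm ** 2)

# Doubling the distance quarters the intensity.
ratio = sound_intensity(1.0, 200.0) / sound_intensity(1.0, 100.0)
print(ratio)  # ~0.25
```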
The Skull
• Occludes wavelengths “small” relative to
the skull
• Causes diffraction around the head (helps
amplify sounds)
• Wavelengths much larger than the skull
are not affected (explains how low
frequencies are not directional)
The Pinna
Ear Canal and Skull
• (A) Dark line – ear canal only
• (B) Dashed line – ear canal and skull
diffraction
Auditory Area (20Hz-20kHz)
Spatial Hearing
• Ability to determine direction and
distance from a sound source
• Not fully understood process
• However, some cues have been identified
as useful
The “Duplex” Theory of
Localization
• Interaural Intensity Differences (IIDs)
• Interaural Arrival-Time Differences (ITDs)
Interaural Intensity Difference
• The skull produces a sound shadow
• Intensity difference results from one ear being
shadowed and the other not
• The IID does not apply to frequencies below
1000Hz (wavelengths similar to or larger than
the size of the head)
• Sound shadowing can result in up to ~20dB
drops for frequencies >=6000Hz
• The Inverse-Square Law can also affect intensity
Head Rotation or Tilt
• Rotation or tilt can
alter interaural
spectrum in
predictable manner
Interaural Arrival-Time
Difference
• Perception of phase
difference between
ears caused by
arrival-time delay
(ITD)
• Ear closest to sound
source hears the
sound before the
other ear
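To make the arrival-time cue concrete, here is a small sketch using Woodworth's classic spherical-head approximation of the ITD (the head radius and speed of sound are assumed values, not from the slides):

```python
import math

SPEED_OF_SOUND = 343.0   # m/s in air at ~20 C (assumed)
HEAD_RADIUS = 0.0875     # m, a typical assumed head radius

def itd_seconds(azimuth_deg: float) -> float:
    """Woodworth's approximation of the interaural time difference
    for a source at azimuth_deg (0 = straight ahead, 90 = to one side)."""
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (theta + math.sin(theta))

print(itd_seconds(0.0))   # 0.0, no delay for a frontal source
print(itd_seconds(90.0))  # roughly 0.00066 s, near the maximum ITD
```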
Digital Sound
• Remember that sound is an analogue
process (like vision).
• Computers need to deal with digital
processes (like digital images).
• Many similar properties between computer
imagery and computer sound processing.
Class or Semantics
• Sample
• Stream
• Music
• Tracks
• MIDI Sounds
Sound for Games
• Stereo doesn’t cut it anymore – you need positional
audio.
• Positional audio increases immersion
• The Old: Vary volume as position changes
• The New: Head-Related Transfer Functions (HRTFs) for
3D positional audio with 2-4 speakers
• Games use:
– Dolby 5.1: requires lots of speakers
– Creative’s EAX: “environmental audio”
– Aureal’s A3D: good positional audio
– DirectSound3D: Microsoft’s answer
– OpenAL: open, cross-platform API
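The “old” approach of varying volume with position can be sketched as a constant-power stereo pan (the function name and pan convention are illustrative):

```python
import math

def pan_gains(pan: float):
    """Constant-power stereo gains for pan in [-1 (left), +1 (right)]."""
    angle = (pan + 1.0) * math.pi / 4.0   # map pan to [0, pi/2]
    return math.cos(angle), math.sin(angle)

left, right = pan_gains(0.0)
print(left, right)            # about 0.707 each at center
print(left**2 + right**2)     # total power stays constant at 1.0
```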
Audio Basics
• Has two fundamental physical properties
– Frequency (the pitch of the wave – oscillations per second
(Hertz))
– Amplitude (the loudness or strength of the wave - decibels)
(Diagram: a waveform annotated with its amplitude and frequency)
Sampling
• A sound wave is “sampled”
– measurements of amplitude taken at a “fast” rate
– results in a stream of numbers
(Diagram: a waveform sampled at regular intervals, amplitude vs. time in ms)
Data Rates for Sound
• Human ear can hear frequencies between ??
and ??.
• Must sample at twice the highest frequency.
– Assume stereo (two channels)
– Assume 44.1kHz sampling rate (CD sampling rate)
– Assume 2 bytes per channel per sample
– How much raw data is required to record 3 minutes of
music?
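A worked answer to the question above, under the stated assumptions (stereo, 44.1kHz, 2 bytes per channel per sample):

```python
def raw_audio_bytes(seconds, rate_hz=44100, channels=2, bytes_per_sample=2):
    """Raw (uncompressed) PCM size in bytes for the given duration."""
    return seconds * rate_hz * channels * bytes_per_sample

three_minutes = raw_audio_bytes(3 * 60)
print(three_minutes)                  # 31752000 bytes
print(three_minutes / (1024 * 1024))  # roughly 30 MB
```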
Waveform Sampling:
Quantization
• Quantization introduces noise
• Examples: 16, 12, 8, 6, 4 bit music
• 16, 12, 8, 6, 4 bit speech
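A minimal sketch of uniform quantization, showing that the worst-case error (quantization noise) grows as the bit depth drops:

```python
import math

def quantize(x: float, bits: int) -> float:
    """Round x in [-1, 1] onto a uniform grid with 2**(bits-1) steps."""
    levels = 2 ** (bits - 1)
    return round(x * levels) / levels

# One cycle of a sine wave; fewer bits means larger worst-case error.
signal = [math.sin(2 * math.pi * k / 64) for k in range(64)]
for bits in (16, 8, 4):
    err = max(abs(s - quantize(s, bits)) for s in signal)
    print(bits, err)
```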
Limits of Human Hearing
• Time and Frequency
– Events longer than 0.03 seconds are resolvable in time;
shorter events are perceived as features in frequency
– 20 Hz < Human Hearing < 20 kHz
(for those under 15 or so)
– “Pitch” is PERCEPTION related to FREQUENCY
– Human Pitch Resolution is about 40 - 4000 Hz
Limits of Human Hearing
• Amplitude or Power???
– “Loudness” is PERCEPTION related to POWER,
not AMPLITUDE
– Power is proportional to (integrated) square of signal
– Human Loudness perception range is about 120 dB,
where +10 dB
= 10 x power
≈ 3.16 x amplitude
– Waveform shape is of little consequence. Energy
at each frequency, and how that changes in time,
is the most important feature of a sound.
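The power/amplitude distinction follows from the decibel definitions (10·log10 for power ratios, 20·log10 for amplitude ratios), sketched here:

```python
import math

def db_from_power_ratio(p2: float, p1: float) -> float:
    """Decibels from a power ratio: dB = 10 * log10(P2/P1)."""
    return 10.0 * math.log10(p2 / p1)

def db_from_amplitude_ratio(a2: float, a1: float) -> float:
    """Decibels from an amplitude ratio: dB = 20 * log10(A2/A1)."""
    return 20.0 * math.log10(a2 / a1)

print(db_from_power_ratio(10.0, 1.0))               # 10 dB for 10x power
print(db_from_amplitude_ratio(math.sqrt(10), 1.0))  # 10 dB for ~3.16x amplitude
```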
Limits of Human Hearing
• Waveshape or Frequency Content??
– Here are two waveforms with identical power spectra, and which
are (nearly) perceptually identical:
(Figure: Wave 1 and Wave 2, different waveshapes with the same
magnitude spectrum)
Limits of Human Hearing
• Masking in Amplitude, Time, and Frequency
– Masking in Amplitude: Loud sounds ‘mask’ soft ones.
Example: Quantization Noise
– Masking in time: A soft sound just before a louder
sound is more likely to be heard than if it is just after.
Example (and reason): Reverb vs. “Preverb”
– Masking in Frequency: Loud ‘neighbor’ frequency
masks soft spectral components. Low sounds
mask higher ones more than high masking low.
Limits of Human Hearing
• Masking in Amplitude
• Intuitively, a soft sound will not be heard if
there is a competing loud sound. Reasons:
– Gain controls in the ear
stapedes reflex and more
– Interaction (inhibition) in the cochlea
– Other mechanisms at higher levels
Limits of Human Hearing
• Masking in Time
– In the time range of a few milliseconds:
– A soft event following a louder event tends to
be grouped perceptually as part of that louder
event
– If the soft event precedes the louder event, it
might be heard as a separate event (become
audible)
Limits of Human Hearing
• Masking in Frequency
Only one component in this spectrum is
audible because of frequency masking
Sampling Rates
• For Cheap Compression, Look at
Lowering the Sampling Rate First
• 44.1kHz 16 bit = CD Quality
• 8kHz 8 bit MuLaw = Phone Quality
• Examples:
– Music: 44.1, 32, 22.05, 16, 11.025kHz
– Speech: 44.1, 32, 22.05, 16, 11.025, 8kHz
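Phone-quality audio survives 8-bit quantization because μ-law companding boosts quiet samples before they are quantized. A sketch of the standard μ-law curve (μ = 255, as in North American telephony):

```python
import math

MU = 255.0  # standard mu-law parameter

def mulaw_encode(x: float) -> float:
    """Compress x in [-1, 1] with the mu-law curve (before quantization)."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mulaw_decode(y: float) -> float:
    """Invert the mu-law curve."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

x = 0.1
print(mulaw_encode(x))                # quiet samples are boosted (> 0.1)
print(mulaw_decode(mulaw_encode(x)))  # round trip recovers ~0.1
```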
Views of Digital Sound
• Two (mainstream) views of sound
and their implications for compression
• 1) Sound is Perceived
– The auditory system doesn’t hear everything present
– Bandwidth is limited
– Time resolution is limited
– Masking in all domains
• 2) Sound is Produced
– “Perfect” model could provide perfect compression
Production Models
• Build a model of the sound production system,
then fit the parameters
– Example: If signal is speech, then a well-parameterized
vocal model can yield the highest quality
and compression ratio
• Benefits:
Highest possible compression
• Drawbacks: Signal source(s) must be
assumed, known, or identified
MIDI and Other ‘Event’ Models
• Musical Instrument Digital Interface
– Represents Music as Notes and Events
– and uses a synthesis engine to “render” it.
• An Edit Decision List (EDL) is another
example.
– A history of source materials, transformations, and
processing steps is kept. Operations can be
undone or recreated easily. Intermediate non-parametric
files are not saved.
Event Based Compression
• A Musical Score is a very compact
representation of music
• Benefits:
– Highest possible compression
• Drawbacks:
– Cannot guarantee the “performance”
– Cannot assure the quality of the sounds
– Cannot make arbitrary sounds
Event Based Compression
• Enter General MIDI
– Guarantees a base set of instrument sounds,
– and a means for addressing them,
– but doesn’t guarantee any quality
• Better Yet, Downloadable Sounds
– Download samples for instruments
– Benefits:
Does more to guarantee quality
– Drawbacks: Samples aren’t reality
Event Based Compression
• Downloadable Algorithms
– Specify the algorithm, the synthesis engine runs it,
and we just send parameter changes
– Part of “Structured Audio” (MPEG4)
• Benefits:
– Can upgrade algorithms later
– Can implement scalable synthesis
• Drawbacks:
– Different algorithm for each class of sounds
(but can always fall back on samples)
Compressed Audio Formats

Name                  Extension    Ownership
AIFF (Mac)            .aif, .aiff  Public
AU (Sun/Next)         .au          Public
CD audio (CDDA)       N/A          Public
MP3                   .mp3         MPEG Audio Layer-III
Windows Media Audio   .wma         Proprietary (Microsoft)
QuickTime             .qt          Proprietary (Apple)
RealAudio             .ra, .ram    Proprietary (Real Networks)
WAV                   .wav         Public
To be continued …
• Stop here
• Sound Group Technical Presentations.
• Suggested Topics:
– Compression
– Controlling the Environment
– ToolKit I features
– ToolKit II features
– Examples and Demos
Environmental Effects
• Obstruction/Occlusion
• Reverberation
• Doppler Shift
• Atmospheric Effects
Obstruction
• Same as sound shadowing
• Generally approximated by a ray test and
a low pass filter
• High frequencies should get shadowed
while low frequencies diffract
Occlusion
• A completely blocked sound
• Example: A sound that penetrates a closed
door or a wall
• The sound will be muffled (low pass filter)
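The muffling described above is often approximated with a cheap recursive low-pass filter; a minimal sketch (the coefficient value is illustrative):

```python
def one_pole_lowpass(samples, alpha=0.1):
    """One-pole low-pass filter: y[n] = y[n-1] + alpha * (x[n] - y[n-1]).
    Smaller alpha removes more high frequencies (stronger muffling)."""
    out, y = [], 0.0
    for x in samples:
        y += alpha * (x - y)
        out.append(y)
    return out

# Alternating +1/-1 is the highest representable frequency: strongly attenuated.
high = [1.0, -1.0] * 8
muffled = one_pole_lowpass(high, alpha=0.1)
print(max(abs(v) for v in muffled))  # well below the input amplitude of 1.0
```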
Reverberation
• Effects from sound reflection
• Similar to echo
• Static reverberation
• Dynamic reverberation
Static Reverberation
• Relies on the “closed container”
assumption
• Parameters used to specify approximate
environment conditions (decay, room size,
etc.)
• Example: Creative’s EAX via Microsoft DirectSound3D
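Static reverb engines are commonly built from Schroeder-style feedback comb filters; a minimal sketch of one comb stage (the delay and feedback values are illustrative):

```python
def comb_reverb(dry, delay_samples=8, feedback=0.5):
    """Feedback comb filter: y[n] = x[n] + feedback * y[n - delay]."""
    out = []
    for n, x in enumerate(dry):
        echo = feedback * out[n - delay_samples] if n >= delay_samples else 0.0
        out.append(x + echo)
    return out

# A single impulse produces a geometrically decaying train of echoes.
impulse = [1.0] + [0.0] * 32
tail = comb_reverb(impulse)
print(tail[0], tail[8], tail[16], tail[24])  # 1.0 0.5 0.25 0.125
```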
Dynamic Reverberation
• Calculation of reflections off of surfaces
taking into account surface properties
• Typically diffusion and diffraction ignored
• “Wave Tracing”
• Example: Aureal A3D 2.0
Comparison
• Static Reverberation less expensive
computationally, simple to implement
• Dynamic Reverberation very expensive
computationally, difficult to implement, but
potentially superior results
Doppler Shift
• Change in frequency due to velocity
• Very susceptible to temporal aliasing
• The faster the update rate the better
• Requires dedicated hardware
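The frequency change can be sketched with the standard Doppler formula for a moving source and listener (343 m/s is an assumed speed of sound):

```python
def doppler_frequency(f_source, source_speed, listener_speed=0.0, c=343.0):
    """Observed frequency for a source approaching the listener at
    source_speed (m/s) and a listener approaching the source at
    listener_speed (m/s); negative speeds mean moving away."""
    return f_source * (c + listener_speed) / (c - source_speed)

print(doppler_frequency(440.0, 34.3))   # pitch rises on approach (~489 Hz)
print(doppler_frequency(440.0, -34.3))  # pitch falls moving away (~400 Hz)
```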
Atmospheric Effects
• Attenuate high frequencies faster than low
frequencies
• Moisture in air increases this effect