Sound
YouTube
Some material from Heim, Chapter 13
Learning outcomes
• Describe the basics of human hearing
• Explain the difference between visual and auditory interaction
• Describe the classes and subclasses of sound output and the
attributes of each
• Describe the classes and subclasses of sound input and
recognition and the attributes of each
Hearing
• Provides information about environment:
distances, directions, objects etc.
• Physical apparatus:
• outer ear – protects inner ear and amplifies sound
• middle ear – transmits sound waves as vibrations to inner ear
• inner ear – chemical transmitters are released and cause impulses in auditory nerve
• Sound
• pitch – sound frequency
• loudness – amplitude
• timbre – type or quality
the human 1
Sound is vibration
http://www.hsc.csu.edu.au/ipt/mm_systems/3288/digitising_sound_answers.htm
Timbre is harmonic structure
• A sine wave is all energy on the ‘first harmonic’ or
‘fundamental’ frequency (sounds like O)
• Other shapes of sound wave come from a distribution of energy
into other multiples of the fundamental
http://hyperphysics.phy-astr.gsu.edu/hbase/audio/geowv.html
http://www.sfu.ca/sonic-studio/handbook/Triangle_Wave.html
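The idea that timbre is a distribution of energy across harmonics can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard Fourier series for a triangle wave (odd harmonics only, weights falling off as 1/k² with alternating sign); the function names are made up for this example and are not from the slides.

```python
import math

def harmonic_wave(t, fundamental_hz, amplitudes):
    """Sum sine partials at time t; amplitudes[k-1] is the weight of harmonic k."""
    return sum(a * math.sin(2 * math.pi * k * fundamental_hz * t)
               for k, a in enumerate(amplitudes, start=1))

def triangle_amplitudes(n_harmonics):
    """Harmonic weights for a triangle wave (assumed textbook Fourier series)."""
    amps = []
    for k in range(1, n_harmonics + 1):
        if k % 2 == 0:
            amps.append(0.0)  # even harmonics carry no energy
        else:
            sign = (-1) ** ((k - 1) // 2)  # +, -, +, - on harmonics 1, 3, 5, 7
            amps.append(sign * 8 / (math.pi ** 2 * k ** 2))
    return amps

# A pure sine: all energy on the fundamental.
sine_amps = [1.0]
# A triangle wave: the same pitch, but energy spread over odd harmonics.
tri_amps = triangle_amplitudes(99)
```

Both waves have the same fundamental, so they have the same pitch; only the harmonic weights differ, and that difference is what is heard as timbre.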
Hearing (cont)
• Humans can hear frequencies from 20 Hz to 15 kHz
• less accurate at distinguishing high frequencies than low
• higher frequencies disappear as you get older
• Auditory system filters sounds
• can attend to sounds over background noise
• for example, the cocktail party phenomenon
• hearing aids disrupt this filtering
• Hearing is involuntary
• A sudden sound ‘grabs’ attention before we think
• And some sounds are harder to ignore (e.g. a baby crying)
• ‘Listening’ is voluntary (largely)
• Whether we choose to process the meaning, especially if the
sound is language (although something like hearing your name is
pretty well involuntary)
What if…
• You are in a noisy environment
• Night clubbing
• Phone call/text message?
• Your hearing is below average
• You are deaf
Sound versus Visual
Sound exists in time and over space,
vision exists in space and over time.
(Gaver, 1989)
- Sound is only there when it is
playing/made
- Vision is there until it is replaced
Sound Interaction
• Computer Output/Generation (input to human)
• Non speech
• Music
• Audio Icons and Earcons
• Speech
• Computer Input/Recognition
• Speech
• Non speech
• Environmental
• Music
Computer Output: Music
• Can be pre-recorded or generated
• Movies
• Games
• Immersive experiences
• Activates your brain in a different way
from language
• Acts almost entirely independently from
hand-to-eye processing
Generating music
• Exciting area for artists
• Everything from pseudo real to completely abstract
• There are jazz generators whose output only skilled listeners can
tell apart from real musicians.
• Serato – DJ software (www.serato.com)
• Auckland company doing fantastic things
• Several UOA grads there
Auditory Icons and Earcons
• The difference between these two is subtle
• Auditory icons: emphasis on ‘natural’ sounds and metaphor with
real world
• e.g. sound of filling a bottle with water to match moving a
large file
• Earcons: ‘Artificial’ sounds (generated)
• e.g. more abstract metaphorical relationship to action or
purely a convention (like corporate colour schemes)
• Examples: Windows hardware sounds (fail, insert, remove)
Auditory Icons and Earcons
• Redundant Encoding
• It aids memory by adding additional associations.
• Can alert without interrupting (or at least leaves the visual field
clear)
• An alternative communications channel.
• Positive/Negative Feedback
• Auditory alarms might be crucial to the safe operation of
computer-operated machinery or mission-critical environments
• Too many alarms
• Annoying
• Ignored
Using Sound in Interaction Design
• Learnability of the mapping between the icon and the
object represented
• “Oink” and “bow wow” have high articulatory
directness (low distance between ‘appearance’ and
function [or denotation])
• A swishing sound accompanying a paintbrush tool also
has high articulatory directness
• A system beep, on the other hand, carries no
information about what it denotes. We may quickly
learn to associate it with an error, and its
square-wave structure is somewhat unpleasant, so it
suits an error better than feedback on success.
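To hear the difference between a harsh square-wave beep and a softer sine tone, the two can be generated with Python's standard library. This is a minimal sketch, not how any particular operating system produces its beep; the sample rate, amplitude, and function names are assumptions chosen for the example.

```python
import math
import struct
import wave

SAMPLE_RATE = 44100  # samples per second (assumed; the common CD-quality rate)

def tone_samples(freq_hz, duration_s, shape="sine", amplitude=0.5):
    """Return float samples in [-amplitude, amplitude] for a sine or square tone."""
    n = round(duration_s * SAMPLE_RATE)
    samples = []
    for i in range(n):
        s = math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
        if shape == "square":
            # A square wave keeps only the sign of the sine, so it jumps
            # between two levels instead of varying smoothly.
            s = 1.0 if s >= 0 else -1.0
        samples.append(amplitude * s)
    return samples

def write_wav(path, samples):
    """Write the samples as a mono 16-bit PCM WAV file."""
    frames = struct.pack("<%dh" % len(samples),
                         *(int(s * 32767) for s in samples))
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(frames)
```

For example, write_wav("error.wav", tone_samples(440, 0.2, "square")) and write_wav("ok.wav", tone_samples(440, 0.2, "sine")) give two short tones of the same pitch that can be compared by ear; the file names are arbitrary.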
Can you remember earcons?
• How many?
• How often do you hear them?
• Can you intuitively tell what these mean?
• On, Off, Sleep, Misrecognized, Disambiguate
Speech Output
• Eyes free operation
• Alternative output channel
• Good for checking your essays
• Navigation is hard
• Backtracking
• Finding location of a particular thing
Speech Output
• Recorded
• Menu choices for telephone systems
• Books or other multimedia experiences
• Generated (‘text-to-speech’, TTS)
• Synthesizer built into Office
• See http://office.microsoft.com/en-nz/powerpoint-help/using-the-speaktext-to-speech-feature-HA102066711.aspx
• Google Translate has a nice one too (better, I think)
• Can give pronunciation rules (the Google one sounds
British to me, see also http://www.bell-labs.com/project/tts/sable.html)
• Still sounds a little artificial
• The best synthesizers use a physical model of the tongue and
breath to give natural flow between phonemes
Sound Input
• Speech
• Environmental
• Music
Speech Recognition
• Two distinct applications:
• Transaction
• Transcription
• Transaction
• Telephone menu systems
• Choose from a limited number of options, works ok
• Automatic speech recognition (ASR)
• Built into operating systems
• Siri (iPhone) and Android are more or less usable
• This is a triumph of Artificial Intelligence
• Very difficult, ongoing research problem
• Not just about recognizing phonemes but also about finding the ‘right’
interpretation (helped, for example, by statistical word-triple (trigram)
frequencies, but better if the AI is ‘deeper’)
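How word-triple frequencies help pick the ‘right’ interpretation can be sketched with a toy example. This is a rough illustration only: the tiny corpus, the add-one smoothing, and the function names are all assumptions made for the example, and real recognisers use far larger models.

```python
import math
from collections import Counter

def trigrams(words):
    """All consecutive word triples, with start and end padding."""
    padded = ["<s>", "<s>"] + words + ["</s>"]
    return [tuple(padded[i:i + 3]) for i in range(len(padded) - 2)]

def train(sentences):
    """Count how often each word triple occurs in the corpus."""
    counts = Counter()
    for sentence in sentences:
        counts.update(trigrams(sentence.lower().split()))
    return counts

def score(sentence, counts):
    """Average add-one smoothed log frequency of the sentence's triples.

    Higher means the word sequence looks more like the corpus.
    """
    total = sum(counts.values())
    vocab = len(counts) + 1  # distinct triples seen, plus one for 'unseen'
    triples = trigrams(sentence.lower().split())
    return sum(math.log((counts[t] + 1) / (total + vocab))
               for t in triples) / len(triples)

def best_interpretation(candidates, counts):
    """Pick the candidate transcription with the highest score."""
    return max(candidates, key=lambda c: score(c, counts))

# Toy corpus (hypothetical); two candidates that sound alike.
corpus = [
    "it is hard to recognize speech",
    "we want to recognize speech with computers",
    "computers can recognize speech",
]
candidates = [
    "it is hard to wreck a nice beach",
    "it is hard to recognize speech",
]
```

With this corpus, best_interpretation(candidates, train(corpus)) prefers the second candidate, because its word triples have been seen before while most of the first candidate's have not.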
Searching Speech and Audio
• Sound files do not afford easy opportunities for indexing and
searching
• Speech recognition can be used to transcribe speech files and
create transcripts that can be searched like any other text file
• So long as recognition accuracy is ok, which it isn't at the moment
• Tune identification apps
• Hum a bit of the tune and it tells you what it is! (e.g. SoundHound)
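Once speech has been transcribed, searching it is ordinary text search. The sketch below assumes a recogniser that returns each word with its start time in seconds; that output format, the file names, and the function names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

def build_index(transcripts):
    """Map each word to the (file, start time) pairs where it was recognised.

    transcripts: {filename: [(start_seconds, word), ...]}
    """
    index = defaultdict(list)
    for filename, words in transcripts.items():
        for start, word in words:
            key = word.lower().strip(".,!?")  # normalise case and punctuation
            index[key].append((filename, start))
    return index

def search(index, query):
    """Return sorted (file, start time) hits for a single query word."""
    return sorted(index.get(query.lower(), []))

# Hypothetical recogniser output for two audio files.
transcripts = {
    "lecture.wav": [(0.0, "Sound"), (0.4, "is"), (0.6, "vibration."), (12.5, "sound")],
    "memo.wav": [(3.2, "vibration")],
}
index = build_index(transcripts)
```

Keeping the start time with each word means a hit can jump straight to the right point in the audio, which addresses the navigation problem that sound files otherwise have. The quality of the search is only as good as the recognition accuracy.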
Summary
• Describe the basics of human hearing
• Explain the difference between visual and auditory interaction
• Sound is transitory
• Describe the classes and subclasses of sound output and the attributes of each
• Non speech
• Music
• Earcons
• Speech
• Describe the classes and subclasses of sound input and recognition and the
attributes of each
• Speech
• Transaction
• Transcription