Second International Conference on Intelligent Interactive Technologies and
Multimedia (IITM 2013), 09-11 March 2013, Allahabad, India
Speech Processing for Persons with
Moderate Sensorineural Hearing Impairment
Prem C. Pandey
EE Dept., IIT Bombay
pcpandey @ ee.iitb.ac.in
www.ee.iitb.ac.in/~pcpandey, www.ee.iitb.ac.in/~spilab
09 March 2013
Outline
A. Speech & Hearing
B. Noise Suppression
S. K. Waddi, P. C. Pandey, N. Tiwari
Speech Enhancement Using Spectral Subtraction and Cascaded
Median Based Noise Estimation for Hearing Impaired Listeners
(Proc. NCC 2013, Delhi, 15-17 Feb. 2013, Paper 3.2_2_1569696063)
C. Reducing the Effect of Increased Spectral Masking
N. Tiwari, P. C. Pandey, P. N. Kulkarni
Real-time Implementation of Multi-band Frequency Compression for
Listeners with Moderate Sensorineural Impairment
(Proc. Interspeech 2012, Portland, Oregon, 9-13 Sept. 2012, Paper 689)
Speech Production Mechanism
Excitation source & filter model
• Excitation: voiced/unvoiced
glottal, frication
• Filtering: vocal tract filter
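The excitation-and-filter view above can be illustrated with a minimal sketch: an impulse train stands in for voiced glottal excitation, and a cascade of two-pole resonators stands in for the vocal tract filter. The function names, the 12 kHz sampling rate, the 120 Hz pitch, and the formant frequencies and bandwidths are illustrative assumptions, not values from the talk.

```python
import math

def voiced_excitation(n_samples, f0_hz, fs_hz):
    """Glottal excitation idealized as a unit impulse train at the pitch period."""
    period = round(fs_hz / f0_hz)
    return [1.0 if i % period == 0 else 0.0 for i in range(n_samples)]

def formant_resonator(x, freq_hz, bw_hz, fs_hz):
    """Two-pole resonator standing in for one vocal tract resonance (formant)."""
    r = math.exp(-math.pi * bw_hz / fs_hz)  # pole radius set by the bandwidth
    a1 = 2.0 * r * math.cos(2.0 * math.pi * freq_hz / fs_hz)
    a2 = -r * r
    y, y1, y2 = [], 0.0, 0.0
    for sample in x:
        # y[n] = x[n] + a1*y[n-1] + a2*y[n-2]
        y0 = sample + a1 * y1 + a2 * y2
        y.append(y0)
        y1, y2 = y0, y1
    return y

fs = 12000                                   # assumed sampling rate (Hz)
source = voiced_excitation(1200, 120, fs)    # 0.1 s of voiced excitation
vowel = source
# Hypothetical formant (frequency, bandwidth) pairs in Hz, roughly vowel-like.
for f, bw in [(700, 80), (1200, 100), (2600, 150)]:
    vowel = formant_resonator(vowel, f, bw, fs)
```

Replacing the impulse train with random noise would give a comparable sketch of unvoiced (frication) excitation passed through the same filter.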
Speech segments
• Words • Syllables • Phonemes • Sub-phonemic segments
Phonemes: basic speech units
• Vowels: Pure vowels, Diphthongs
• Consonants: Semivowels, Stops, Fricatives, Affricates, Nasals
[Figure: panels for /aba/, /ada/, /apa/, /aga/]
Phonemic features
• Modes of excitation
• Glottal
Unvoiced (aspiration, constriction at the glottis)
Voiced (vibration of vocal cords)
• Frication
Unvoiced (constriction in vocal tract)
Voiced (constriction in vocal tract & glottal vibration)
• Movement of articulators
• Continuant (steady-state vocal tract configuration): vowels, nasal stops, fricatives
• Non-continuant (changing vocal tract): diphthongs, semivowels, oral stops (plosives)
• Place of articulation (place of maximum constriction in vocal tract)
Bilabial, Labio-dental, Linguo-dental, Alveolar, Palatal, Velar, Glottal
• Changes in voicing frequency (F0)
Supra-segmental features
• Intonation • Rhythm
Hearing Mechanism
Peripheral auditory system
• External ear (sound collection)
• Pinna
• Auditory canal
• Middle ear (impedance matching)
• Ear drum
• Middle ear bones
• Inner ear (analysis and transduction): cochlea
• Auditory nerve (transmission of neural impulses)
Central auditory system
Information processing & interpretation
Auditory system
Tonotopic map of cochlea
Hearing impairment
Types of hearing losses
• Conductive loss
• Central loss
• Sensorineural loss
• Functional loss
Sensorineural hearing loss
• Elevated hearing thresholds
Reduced intelligibility as speech components are inaudible
• Reduced dynamic range & loudness recruitment (abnormal loudness growth)
Distortion of loudness relationship among speech components
• Increased temporal masking
Poor detection of acoustic events
• Increased spectral masking (due to widening of auditory filters)
• Reduced frequency selectivity
• Reduced ability to sense spectral shapes of speech sounds
>> Poor intelligibility and degraded perception of speech
Signal processing in hearing aids
Currently available
• Frequency selective amplification
Improves audibility but may not improve intelligibility in presence of noise
• Automatic volume control
• Multichannel dynamic range compression (settable attack time, release time,
and compression ratios)
Compresses the natural dynamic range into the reduced dynamic range
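As a rough illustration of the compression parameters listed above, the sketch below shows a static input/output curve for one channel and a level estimator with separate attack and release coefficients. The threshold, ratio, gain, and smoothing coefficients are hypothetical values chosen for the example; an actual hearing aid sets them per channel from the listener's audiogram.

```python
def compressor_output_db(level_db, threshold_db=50.0, ratio=3.0, gain_db=20.0):
    """Static curve: linear gain below threshold, 1/ratio dB per dB above it."""
    if level_db <= threshold_db:
        return level_db + gain_db
    return threshold_db + gain_db + (level_db - threshold_db) / ratio

def smooth_level(levels_db, attack=0.5, release=0.05):
    """Track the input level, reacting fast to rises (attack) and slowly to falls (release)."""
    estimate = float(levels_db[0])
    tracked = []
    for level in levels_db:
        coeff = attack if level > estimate else release
        estimate += coeff * (level - estimate)
        tracked.append(estimate)
    return tracked

# A 40 dB range of inputs (40..80 dB) maps to a 20 dB range of outputs (60..80 dB).
quiet_out = compressor_output_db(40.0)
loud_out = compressor_output_db(80.0)
```

With these example settings, inputs above the threshold are squeezed by the compression ratio, which is how the natural dynamic range is fitted into the listener's reduced dynamic range.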
Under Investigation
• Improvement of consonant-to-vowel ratio (CVR): for reducing the effects of
increased temporal masking
• Techniques for reducing the effects of increased spectral masking: Binaural
dichotic presentation, Spectral contrast enhancement, Multi-band frequency
compression
• Noise suppression
Analog Hearing Aids
Pre-amp → AVC → Selectable Freq. Response → Amp.
Programmable Digital Hearing Aids
Pre-amp → AVC → Multi-band Amplitude Compression & Freq. Response → Amp.
Major Problems
• Noisy environment & reverberation
• Distortions due to multiband amplitude compression
• Poor speech perception due to increased spectral & temporal masking
• Visit to audiologist for change of settings
Proposed Hearing Aids (with user selectable settings)
Pre-amp → AVC → Noise Suppression
→ Processing for Reducing the Effects of Increased Spectral Masking
→ Processing for Reducing the Effects of Increased Temporal Masking
→ Multi-band Amplitude Compression & Freq. Response → Amp.
Our Research Objectives
Developing techniques for improving speech perception by listeners
with moderate-to-severe sensorineural loss
• Reduction of effects of increased spectral masking
Binaural aids: Binaural dichotic presentation using comb filters for spectral
splitting
Monaural aids: Multiband frequency compression
• Reduction of effects of increased temporal masking
Enhancement of transient parts (weak & short but perceptually important)
• Noise Suppression
Implementation of the techniques using a low-power DSP chip for
real-time operation and with acceptable signal delay (< 60 ms)
P. C. Pandey (EE Dept, IIT Bombay): "Speech Processing for Persons with Moderate Sensorineural Hearing Impairment",
Plenary talk, Second International Conference on Intelligent Interactive Technologies and Multimedia (IITM 2013), 09-11
March 2013, Allahabad, India
Abstract
Our objective is to develop techniques for improving speech perception by listeners with moderate-to-severe
sensorineural loss and to implement these techniques using a low-power DSP chip for real-time operation and with
acceptable signal delay (< 60 ms). Here we present two techniques to reduce the adverse effects of increased spectral
masking associated with sensorineural loss. The first technique reduces the effects of noise in the listening
environment and the second one reduces the effects of increased intra-speech spectral masking.
A spectral subtraction technique is presented for real-time speech enhancement in the aids used by hearing impaired
listeners. For reducing computational complexity and memory requirement, it uses a cascaded-median based estimation
of the noise spectrum without voice activity detection. The technique is implemented and tested for satisfactory real-time operation, with sampling frequency of 12 kHz, processing using window length of 30 ms with 50% overlap, and
noise estimation by 3-frame 4-stage cascaded-median, on a 16-bit fixed-point DSP processor with on-chip FFT hardware.
Enhancement of speech with different types of additive stationary and non-stationary noise resulted in SNR advantage of
4 – 13 dB.
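The noise estimation and subtraction steps described above can be sketched as follows. This is one plausible reading of "3-frame 4-stage cascaded-median", not the implementation from the paper: each stage collects p values, emits their median to the next stage, and the last stage's output becomes the noise estimate, approximating a median over p^q = 81 frames while storing only p × q = 12 values. The class and function names, the over-subtraction factor `alpha`, and the spectral floor `beta` are assumptions made for the example.

```python
from statistics import median

FS_HZ = 12000                          # sampling frequency from the abstract
FRAME_LEN = FS_HZ * 30 // 1000         # 30 ms window -> 360 samples
FRAME_SHIFT = FRAME_LEN // 2           # 50% overlap -> 180 samples (15 ms)

class CascadedMedian:
    """Approximate a median over p**q values using q stages of p-point medians."""

    def __init__(self, p=3, q=4, initial=0.0):
        self.p = p
        self.buffers = [[] for _ in range(q)]
        self.estimate = initial        # held until the last stage emits a value

    def update(self, value):
        for buf in self.buffers:
            buf.append(value)
            if len(buf) < self.p:
                return self.estimate   # this stage is not full yet
            value = median(buf)        # stage output feeds the next stage
            buf.clear()
        self.estimate = value          # last stage fired: refresh the estimate
        return self.estimate

def spectral_subtract(magnitudes, noise, alpha=2.0, beta=0.01):
    """Magnitude spectral subtraction for one frame, with a spectral floor."""
    return [max(x - alpha * d, beta * d) for x, d in zip(magnitudes, noise)]

# Usage for a single frequency bin: steady noise of 5.0 with speech-like bursts.
tracker = CascadedMedian(p=3, q=4)
history = [tracker.update(v) for v in [5.0, 5.0, 100.0] * 27]
```

In a full system there would be one such tracker per frequency bin, updated once per frame. With a 15 ms frame shift, 81 frames correspond to roughly 1.2 s of signal, so the estimate in this sketch refreshes about once every 1.2 s without needing a voice activity detector.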
Widening of auditory filters in persons with sensorineural hearing impairment leads to increased spectral masking and
degraded speech perception. Multi-band frequency compression of the complex spectral samples using pitch-synchronous processing has been reported to increase speech perception by persons with moderate sensorineural loss.
It is shown that implementation of multi-band frequency compression using fixed-frame processing along with least-squares error based signal estimation reduces the processing delay and the speech output is perceptually similar to that
from pitch-synchronous processing. The processing is implemented on a 16-bit fixed-point DSP processor and real-time
operation is achieved using about one-tenth of its computing capacity.
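The core idea of multi-band frequency compression, moving the spectral samples of each band towards the band center, can be sketched on a single frame's spectrum as below. The equal-width bands, the rounding-based bin mapping, the superposition of samples that land on the same bin, and the compression factor are all assumptions made for illustration; the abstract does not specify them, and the windowing, FFT, and least-squares resynthesis stages are omitted.

```python
def multiband_compress(spectrum, n_bands=2, factor=0.5):
    """Compress each band of a (half-)spectrum towards its center by `factor`."""
    n_bins = len(spectrum)
    if n_bins % n_bands != 0:
        raise ValueError("sketch assumes equal-width bands")
    width = n_bins // n_bands
    out = [0] * n_bins
    for k, value in enumerate(spectrum):
        band = k // width
        center = band * width + (width - 1) / 2.0
        # Map bin k to a position closer to the band center.
        k_new = int(round(center + factor * (k - center)))
        out[k_new] += value            # superpose samples sharing a target bin
    return out

# A flat 16-bin spectrum: each 8-bin band is squeezed into its central 4 bins.
flat = [1] * 16
compressed = multiband_compress(flat, n_bands=2, factor=0.5)
```

The same function accepts complex spectral samples, since it only adds values; the band edges are left empty, which concentrates the spectral energy of each band into a narrower region.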