Wideband Codecs for Enhanced Voice Quality
Ensuring optimum wideband speech quality in converged
VoIP/mobile applications/services
Claude Gravel
VP Engineering
VoiceAge Corporation
Contents
• Introduction
• Why Wideband Speech?
• Deployment Challenges
• AMR-WB Alleviates These Challenges
• Market Momentum / Conclusions / Demo
VoiceAge Corporation – who are we?
Business: Low bit rate audio compression technologies research, IPR licensing and optimized implementations development
Headquarters: Montreal, Canada
Technologies:
• AMR: 3GPP, CableLabs narrowband voice codec
• AMR-WB: 3GPP, ITU-T, CableLabs wideband voice codec
• VMR-WB: 3GPP2, CableLabs wideband voice codec
• AMR-WB+: 3GPP, DVB-H audio codec
Achievements: Won every international audio compression standard for which VoiceAge competed in the last 10 years at 3GPP, 3GPP2, ITU, ETSI, TIA, CableLabs
Implementations: World-class optimized implementations and proprietary solutions on multiple O/S and processors/platforms (including TI- & ARM-based systems)
Deployment: More than 2B mobile phones and over 500M PCs currently use VoiceAge’s technologies
International Standards Using ACELP®
Speech Synthesis Model
Used in CELP/ACELP® Speech Coding
1 = air from lungs
2 = vocal cords (periodicity)
3 = vocal tract articulators (including jaw, lips, tongue, velum)
[Figure: block diagram with, in order, innovative excitation c(n) (1), long-term prediction (2), signal v(n), short-term prediction (3), and synthesized speech ŝ(n)]
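A minimal Python sketch of the three-stage synthesis chain above; it is not the ACELP algorithm itself. It assumes a single-tap long-term predictor and a fixed all-pole short-term filter. The function names, the intermediate signal name u, and the pitch lag, gain and filter coefficient are illustrative choices, not values taken from the slides.

```python
from typing import List, Sequence


def long_term_prediction(c: Sequence[float], pitch_lag: int, pitch_gain: float) -> List[float]:
    """Stage 2 (periodicity): u(n) = c(n) + g * u(n - T)."""
    u: List[float] = []
    for n, sample in enumerate(c):
        past = u[n - pitch_lag] if n >= pitch_lag else 0.0
        u.append(sample + pitch_gain * past)
    return u


def short_term_prediction(u: Sequence[float], lpc: Sequence[float]) -> List[float]:
    """Stage 3 (vocal tract): s(n) = u(n) - sum_k a_k * s(n - k)."""
    s: List[float] = []
    for n, sample in enumerate(u):
        acc = sample
        for k, a_k in enumerate(lpc, start=1):
            if n - k >= 0:
                acc -= a_k * s[n - k]
        s.append(acc)
    return s


def synthesize(c: Sequence[float], pitch_lag: int, pitch_gain: float,
               lpc: Sequence[float]) -> List[float]:
    """Chain the stages: excitation -> long-term -> short-term -> speech."""
    return short_term_prediction(long_term_prediction(c, pitch_lag, pitch_gain), lpc)


# Stage 1 stand-in: a single excitation pulse (hypothetical input).
speech = synthesize([1.0, 0.0, 0.0, 0.0, 0.0, 0.0], pitch_lag=2, pitch_gain=0.5, lpc=[-0.5])
print(speech)
```

The pulse reappears every pitch_lag samples with decaying amplitude, and the short-term filter smooths each repetition. This is the sense in which the model separates periodicity from vocal tract shaping.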
Speech Signal
– Basically, same synthesis model for everyone
– So, speech has a “universal” structure or signature
[Figure: waveform of the word “voiceage” (1.25 sec), with close-ups of segments lasting 45 to 180 ms]
Purely Voiced
• quasi periodic
• high energy
• more low frequency energy
• strongly correlated
Voiced fricative
• quasi periodic + noise
• lower energy
Unvoiced
• non periodic
• low energy
• uncorrelated
• more high frequency energy
Transient
• variable energy
• fast spectral evolution
What is Wideband Communication?
• Delivers double the audio signal bandwidth
• Substantially increases captured speech information
• Enables digital end-to-end packet-based services to deliver much better speech communication quality than traditional PSTN circuit-switched telephony
• VoIP quality differentiator
[Figure: signal power vs. frequency range]
An Emerging Opportunity to Deliver Vastly Improved Speech Quality
Signal Bandwidth
Wideband speech:
• Below 200 Hz: increased naturalness, presence, and comfort
• Above 3400 Hz: increased intelligibility and fricative differentiation
[Figure: voiced segment and unvoiced segment]
Typical Speech Signal Acoustics
[Figure: spectrogram of the sentence “Everyone looked extremely confused about the news”; time axis 0 to 3.0 s, frequency axis 0 to 7000 Hz; narrowband range 200 - 3400 Hz marked against wideband range 50 - 7000 Hz]
• Upper band added by wideband: improved voice quality & intelligibility (e.g., s & f differentiation)
• Lower band added by wideband: improved speech naturalness, presence and comfort
Wideband telephony covers much more speech signal information
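As a quick check of the "double the bandwidth" figure, the two frequency ranges quoted on the slides can be compared directly. The sketch below uses only the 200 - 3400 Hz and 50 - 7000 Hz ranges from the slides. The 8 kHz and 16 kHz sampling rates in the comments are the usual telephony values and are not stated in the slides; the function names are illustrative.

```python
from typing import Tuple

NARROWBAND_HZ: Tuple[int, int] = (200, 3400)   # traditional telephony band
WIDEBAND_HZ: Tuple[int, int] = (50, 7000)      # wideband telephony band


def bandwidth(band: Tuple[int, int]) -> int:
    """Width of a frequency band in Hz."""
    low, high = band
    return high - low


def min_sampling_rate(band: Tuple[int, int]) -> int:
    """Nyquist lower bound: twice the highest frequency in the band."""
    return 2 * band[1]


ratio = bandwidth(WIDEBAND_HZ) / bandwidth(NARROWBAND_HZ)
print(bandwidth(NARROWBAND_HZ), bandwidth(WIDEBAND_HZ), ratio)

# Narrowband needs at least 6800 samples/s (8 kHz is the usual choice);
# wideband needs at least 14000 samples/s (16 kHz is the usual choice).
print(min_sampling_rate(NARROWBAND_HZ), min_sampling_rate(WIDEBAND_HZ))
```

With these ranges the wideband signal spans 6950 Hz against 3200 Hz, slightly more than double.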
Why Wideband Speech Now?
• Improved intelligibility, naturalness and presence
– Reduces listener fatigue
– Improves hands-free/speakerphone sound quality
– Improves speaker and speech recognition
• High-quality low-bit-rate wideband codecs
– G.722.2/AMR-WB at ~7–24 kbps
– No need to increase network capacity to deliver better quality sound
• Wideband capable devices are available now
– Wideband audio microphones and device acoustics more affordable
• Rising user awareness of enhanced sound quality
– Wideband teleconferencing
– Wideband enterprise/ASP IP telephony
– Wireless/VoIP multimedia services
Speech Coding Technology, Network/Device Capabilities and Market
Demand are Converging Towards Pervasive Wideband Communications
Voice Processing -- Key for Speech Quality
[Figure: block diagram of a VoIP voice path, with the following labeled blocks]
• Analog Domain, PCM I/F
• Voice Processing (Digital Communications Domain): Echo Canceller, Noise Suppressor, Speech Codec, VAD, CNG, DTX, PLC, Variable/Multi-Rate Switching, Jitter Buffer
• Control & Management: Voice MIBs, System MIBs, SNMP, Call Processing, Signaling Protocol
• Packet / De-Packet [RTP]
• UDP, TCP, IP, MAC Layer, Physical Layer
Codec choice impacts network cost and interoperability, and is a major contributor to the listener quality experience
Speech Coding Attributes
Bit rate
• As low as possible
Delay
• As little as possible
Quality
• As high as possible
Complexity
• As algorithmically simple as possible to constrain platform processing and memory requirements and reduce battery consumption in mobile devices
Robustness
• Effective operation under background noise and channel impairment conditions
Standards compliance
• Open, tested and interoperable solutions
As required by specific applications. Difficult to attain all of these often divergent objectives at the same time.
VoIP Speech Quality Challenges
• Missing packets
– Due to network congestion or transmission errors
– Wireless networks are more prone to losing packets
• Packet delay
– Due to network congestion or transmission errors
– Real-time communication can’t wait too long for packets or retransmission
• Transcoding
– Needed when end-devices and network equipment support incompatible speech/audio coding technologies, e.g., when traversing diverse networks such as across fixed/mobile environments
– Increases system costs, adds delays and introduces audio quality impairments
• Background noise
– Reduces intelligibility and comfort level of conversations
– Ambient office/workplace/household noise
– Street/car noise in mobile applications
Speech Processing Techniques for
Improving VoIP Voice Quality
• Missing packet impairments can be mitigated through…
– Sending additional data to help preserve information
• FEC/Repetition of frames
• Works well for sporadic packet losses but not so well for bursts of lost packets
• Increases transmitted bit rate to send redundant information
[Figure: frames f(n-2) … f(n+4) mapped over time onto packets p(n-1) … p(n+4). A simple forward error correction scheme based on repeating the previous frame in each packet]
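The repetition scheme in the figure can be sketched in a few lines of Python. This is an illustrative model only: the packet layout (a dict with seq, primary and redundant fields) and the function names are assumptions made for the example, not a real RTP payload format.

```python
from typing import Dict, List, Optional

Packet = Dict[str, object]


def packetize_with_repetition(frames: List[str]) -> List[Packet]:
    """Each packet carries its own frame plus a copy of the previous frame."""
    packets: List[Packet] = []
    for n, frame in enumerate(frames):
        previous = frames[n - 1] if n > 0 else None
        packets.append({"seq": n, "primary": frame, "redundant": previous})
    return packets


def recover_frames(received: List[Packet], total: int) -> List[Optional[str]]:
    """Rebuild the frame sequence; None marks a frame that could not be recovered."""
    frames: List[Optional[str]] = [None] * total
    for packet in received:
        frames[packet["seq"]] = packet["primary"]
    # Fill a gap from the redundant copy carried by the following packet.
    for packet in received:
        seq = packet["seq"]
        if seq > 0 and frames[seq - 1] is None and packet["redundant"] is not None:
            frames[seq - 1] = packet["redundant"]
    return frames


sent = packetize_with_repetition(["f0", "f1", "f2", "f3", "f4", "f5"])

# Sporadic loss: packet 2 is dropped, but packet 3 carries a copy of f2.
single_loss = recover_frames([p for p in sent if p["seq"] != 2], total=6)

# Burst loss: packets 2 and 3 are dropped, so f2 has no surviving copy.
burst_loss = recover_frames([p for p in sent if p["seq"] not in (2, 3)], total=6)
print(single_loss, burst_loss)
```

The two cases mirror the slide: an isolated loss is fully repaired, while a two-packet burst still leaves one frame missing, and every packet is roughly twice the size.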
Speech Processing Techniques for
Improving VoIP Voice Quality
• Missing packet impairments can be mitigated through …(cont’d)
– Packet loss concealment (PLC)
• Techniques used by the decoder to estimate parameter values for
missing frames based on the characteristics of preceding frames
• Can be improved by classifying frames and repeating or adjusting
parameters based on heuristics driven by the classes of the frames
preceding the missing frame(s)
– Extrapolate missing frame parameters as a function of the
expected frame class (e.g., voiced/unvoiced, stops, nasals, …)
– E.g., for voiced frames, repeat the pitch parameters
– Objective: limit abrupt changes in energy that can cause
annoying clicks
• Late packet arrival processing can also be leveraged to benefit from
some of the information in a packet that arrives too late
– Can benefit PLC methods as applied to subsequent delayed or
lost packets
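A minimal sketch of parameter-repetition concealment along the lines described above. It assumes each frame is reduced to three hypothetical parameters (a voiced flag, a pitch lag and a gain) and that a lost frame arrives as None. Real decoders such as AMR-WB use a richer parameter set and frame classification; the 0.5 attenuation factor is an illustrative value.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class FrameParams:
    voiced: bool
    pitch_lag: int   # in samples; only meaningful for voiced frames
    gain: float


def conceal(frames: List[Optional[FrameParams]], attenuation: float = 0.5) -> List[FrameParams]:
    """Replace each missing frame (None) with parameters extrapolated from the previous frame."""
    output: List[FrameParams] = []
    last = FrameParams(voiced=False, pitch_lag=0, gain=0.0)
    for frame in frames:
        if frame is None:
            frame = replace(
                last,
                # Voiced class: repeat the pitch parameters; otherwise no pitch.
                pitch_lag=last.pitch_lag if last.voiced else 0,
                # Decay the energy gradually to avoid abrupt changes (clicks).
                gain=last.gain * attenuation,
            )
        last = frame
        output.append(frame)
    return output


received = [FrameParams(True, 60, 1.0), None, None, FrameParams(True, 62, 0.8)]
concealed = conceal(received)
print([(f.pitch_lag, f.gain) for f in concealed])
```

Because each concealed frame becomes the reference for the next one, the gain decays progressively over a run of lost frames instead of dropping to silence at once.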
Speech Processing Techniques for
Improving VoIP Voice Quality
• Missing packet impairments can be mitigated through…(cont’d)
– Frame Interleaving
• Each packet contains non-contiguous frames to lower the overall
impact on the reconstructed speech signal of a lost packet
• Introduces delays which may make it unsuitable for real-time speech
communication
• Works well for audio streaming
[Figure: frames f0 … f8 over time, interleaved into packet 1 (f0, f3, f6), packet 2 (f1, f4, f7) and packet 3 (f2, f5, f8)]
I.e., loss of packet 2 leads to non-contiguous missing frames which are easier to compensate for in the decoder through PLC
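The interleaving pattern in the figure (nine frames spread over three packets) can be reproduced with a short Python sketch. The function names and the use of None for a lost packet are assumptions made for the example.

```python
from typing import List, Optional


def interleave(frames: List[str], depth: int) -> List[List[str]]:
    """Packet i carries frames i, i + depth, i + 2 * depth, ..."""
    return [frames[i::depth] for i in range(depth)]


def deinterleave(packets: List[Optional[List[str]]], depth: int, total: int) -> List[Optional[str]]:
    """Restore frame order; frames from a lost packet (None) stay None."""
    frames: List[Optional[str]] = [None] * total
    for i, packet in enumerate(packets):
        if packet is None:
            continue
        for j, frame in enumerate(packet):
            frames[i + j * depth] = frame
    return frames


frames = [f"f{n}" for n in range(9)]
packets = interleave(frames, depth=3)
print(packets)

# Lose the second packet (index 1): the gaps land on f1, f4 and f7.
after_loss = deinterleave([packets[0], None, packets[2]], depth=3, total=9)
print(after_loss)
```

Each missing frame has intact neighbours on both sides, which is what makes concealment easier. The cost is that the receiver must wait for all three packets before it can play the frames out in order, which is the delay the slide warns about.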
Speech Processing Techniques for
Improving VoIP Voice Quality
• Network congestion, which can lead to delayed or dropped
packets, can be alleviated by lowering the average
communication bit rate …
– VAD/DTX/CNG
• Using Voice Activity Detection (VAD), Discontinuous Transmission (DTX) and Comfort Noise Generation (CNG) capabilities to limit the bandwidth consumed during silent periods of a conversation
– Adaptive codecs
– Source controlled
» Optimal selection of the bit rate and coding scheme based on
active speech
– Network controlled
» Adapt the bit rate to make best use of varying available
bandwidth
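A rough sketch of how VAD and DTX interact, assuming a simple energy-threshold VAD. Standardized VADs, including the one in AMR-WB, are considerably more elaborate. The threshold, the SID (silence descriptor) update interval and the frame labels are hypothetical values chosen for the example.

```python
from typing import List, Sequence


def frame_energy(samples: Sequence[float]) -> float:
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in samples) / len(samples)


def dtx_schedule(frames: List[Sequence[float]], threshold: float, sid_interval: int = 8) -> List[str]:
    """Decide per frame what to transmit.

    SPEECH  : active speech, send a full coded frame
    SID     : silence, send a small comfort-noise update for the decoder's CNG
    NO_DATA : silence, send nothing
    """
    decisions: List[str] = []
    silent_run = 0
    for samples in frames:
        if frame_energy(samples) >= threshold:   # VAD: speech is active
            silent_run = 0
            decisions.append("SPEECH")
        else:                                    # DTX: mostly stay silent
            decisions.append("SID" if silent_run % sid_interval == 0 else "NO_DATA")
            silent_run += 1
    return decisions


loud = [1.0, -1.0, 1.0, -1.0]
quiet = [0.0, 0.0, 0.0, 0.0]
schedule = dtx_schedule([loud, loud, quiet, quiet, quiet], threshold=0.01, sid_interval=2)
print(schedule)
```

Only the SPEECH frames cost the full codec bit rate, so the average transmitted rate falls roughly in proportion to the share of silence in the conversation.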
Transcoder-Free Network Design for
Fixed/Mobile Convergence
Improving VoIP Speech Quality
Mitigating the main issues impacting VoIP speech quality
Missing packets / Delayed packets
• Proper network engineering with integrated QoS mechanisms (in closed systems)
• Choosing the best speech coding/processing technology (adaptive, enhanced voice quality, robust and extensible)
• Improved packet loss concealment
– Late packet arrival processing
– Time scale modification
• Adaptive jitter buffering
Transcoding
• Transcoder-free network design to avoid increased system costs, delays and audio quality impairments
• Leverage seamlessly interoperable standards-proven codecs
Background noise
• Choose codecs that can readily accommodate background noise suppression algorithms
• Proven noise suppression in standards selection & characterization testing results
Why AMR-WB/G.722.2
• AMR-WB/G.722.2 is the right wideband codec for
network convergence
– Very robust
• Supports dynamic adaptation to mobile network conditions
• Includes built-in efficient packet loss concealment
• Performs well even with high bit error rates
– Multi-rate codec delivers very good quality even at bit rates
comparable to those of narrowband (~12 kbps)
• No need for potentially costly and time-consuming network
capacity upgrades
– Supports VAD/DTX/CNG for enhanced efficiency
– Low-complexity encoder and decoder
– Standardized in 3GPP, ITU-T & CableLabs PacketCable 2.0
– Can interoperate transcoder free across mobile/IP networks
• Eliminates latency, impairments, costs
Subjective NB-WB Quality Comparison
NB-WB Voice Quality as a Function of Bit Rate
Ericsson Review, No. 3, 2006
AMR-WB/G.722.2 Greatly Improves Perceived Voice Quality
AMR-WB Subjective Testing Results
AMR-WB/G.722.2 Characterization Test
Clean Condition Test (English Language)
[Figure: bar chart of MOS (scale 1.0 to 5.0) under No Tandem and Self-Tandem conditions at -26 dBov for G.722 @ 64 kbps, G.722 @ 48 kbps, and G.722.2 @ 8.85, 12.65, 18.25 and 23.05 kbps]
AMR-WB/G.722.2 Delivers Excellent Wideband Speech Quality
Even at Low Bit Rates (e.g., MOS at 8.85 kbps exceeds G.722 at 48 kbps)
AMR-WB CPU efficiency
• AMR-WB/G.722.2 performance on widely deployed communications device processors shows the codec’s relatively low complexity
Mode  Bit rate   ARM 9E (MHz)       TI C55x (MIPS)     TI C64x (MIPS)
      (kbps)     Encoder  Decoder   Encoder  Decoder   Encoder  Decoder
0      6.6       39       11        19.67    4.88      22.15    5.94
1      8.85      34       9         21.24    4.35      23.75    5.00
2     12.65      39       8         24.64    4.20      26.98    4.81
3     14.25      41       8         27.02    4.30      29.36    4.85
4     15.85      41       8         27.23    4.39      29.58    4.88
5     18.25      42       8         28.20    4.55      30.68    4.95
6     19.85      43       8         29.33    4.61      32.10    4.98
7     23.05      43       8         29.13    4.83      31.76    5.05
8     23.85      43       9         26.64    5.21      29.97    5.40
Supported by most commonly used communications processors
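The mode table also lends itself to a small worked example of network-controlled rate adaptation. The sketch below assumes the 20 ms AMR-WB frame duration, which is not stated on the slides, and counts speech payload bits only, ignoring RTP/UDP/IP header overhead. The function names and the selection rule (highest mode that fits the available bandwidth) are illustrative.

```python
from typing import List

# Bit rates of AMR-WB modes 0..8 from the table above, in bit/s.
AMR_WB_MODES_BPS: List[int] = [6600, 8850, 12650, 14250, 15850, 18250, 19850, 23050, 23850]
FRAME_MS = 20  # assumed frame duration in milliseconds


def bits_per_frame(rate_bps: int, frame_ms: int = FRAME_MS) -> int:
    """Speech bits carried by one coded frame at the given bit rate."""
    return rate_bps * frame_ms // 1000


def select_mode(available_bps: int) -> int:
    """Pick the highest mode whose bit rate fits; fall back to mode 0."""
    best = 0
    for mode, rate in enumerate(AMR_WB_MODES_BPS):
        if rate <= available_bps:
            best = mode
    return best


for mode, rate in enumerate(AMR_WB_MODES_BPS):
    print(mode, rate, bits_per_frame(rate))

print(select_mode(13000))  # mode chosen when about 13 kbit/s is available
```

A real rate controller would also weigh packet loss and channel quality reports rather than bandwidth alone.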
The Standard Solution Advantage
• Open, collaborative and competitive process
• Requirements specifically address target
applications
• Published algorithms and source code
– Permits wider and more effective scrutiny
– Clearer intellectual property ownership
• Rigorous comparative testing under diverse
conditions
– Background noise types and levels
– Spoken languages
– Speaker types
– Various network impairments
Interoperable, Open and Fully Tested
Ensures that the best technologies are chosen
Interoperability between Fixed/Mobile
Network Services
Transcoder-free Interoperability in Fixed/Mobile Convergence
• 3GPP – Wi-Fi/WiMAX – ITU-T interoperability
• AMR-WB / G.722.2 end-to-end across networks
• No need for transcoding at media gateways
• Improves on service quality end to end
• Reduces network delays and equipment complexity
• Lowers network costs (equipment costs and licensing)
Growing Market Momentum
Chipset / Silicon Vendors
• VeriSilicon
• Texas Inst.
• Freescale
• Renesas
• ST Micro
• …
Terminal Device Manufacturers
• Nokia
• Sony-Ericsson
• Motorola
• Samsung
• Panasonic
• NEC
• CounterPath
• Polycom
• Mobiles, Softphones, VoIP terminals, Conferencing terminals…
Network Equipment Vendors
• Nokia
• Ericsson
• AudioCodes
• Gateways, ATA/MTA, Softswitches, …
Codec Developers
• VoiceAge
• Others…
Network Operators / Service Providers
• T-Mobile Trial
• Wireless Operators
• Cablecos
• VoIP ASPs
• …
Test Set Vendors
• Ixia
• Tektronix
• GL Comms
• NetHawk
• Many others
Accelerating Adoption of AMR-WB/G.722.2 leads to
Happy Consumers and a Wealthy Telecom Service Value Chain
Successful Ericsson/T-Mobile Trial
[Figure: distribution of participant ratings, > 90% positive. Rating categories: Extremely Good, Good, Quite Good, Nice to Have, Quite Bad, Bad, Extremely Bad. Values shown: 35%, 36%, 11%, 11%, 4%, 2%, 3%. Source: Ericsson Review, No. 3, 2006]
• 150 consumers participated for 4 weeks in Germany, April/May 2006; the trial confirmed earlier lab MUSHRA tests
– More than 90% perceived better voice quality & clarity
– Felt a greater sense of privacy, discretion & comfort due to improved voice quality & intelligibility
– Could more easily place & complete calls in environments with high background noise
– Business users highly valued voice quality for improving communication, reducing expenses & giving a positive impression
• Ericsson anticipates positive outcomes for operators
– More mobile traffic, i.e., more calls for longer durations
– Can offer enhanced services for conferencing, personalized ringback signals, automatic voice recognition, voice mail …
– Can cut costs, e.g., by reducing cost of acquiring new subscribers, reducing helpdesk costs
Wideband Speech Communications
An Evolutionary Migration
• Wideband speech coding is consistent with
narrowband codecs
– Bit rates comparable to narrowband codecs
– Similar robustness techniques to handle packet losses
and delays can be used
– Low-complexity implementations available for all popular
communications processor types
– While vastly improving perceived voice quality
• Strategically deploying wideband capability in
terminal and network equipment enables evolution
to wideband speech communications
– Compatible with existing network infrastructure
– No forklift replacements needed … a graceful
evolutionary migration, not a disruptive revolution
Conclusions
Speech communications are rapidly moving to end-to-end digital
packets over all networks – wired and wireless – towards
fixed/mobile convergence
• Provides an opportunity to vastly improve communications quality through
widescale deployment of wideband speech
– Efficient codecs, devices with wideband acoustics and processing are
already available
• Many benefits but also some challenges to consistently delivering high-quality
voice end to end in real-world deployments
• Enhanced speech coding and processing techniques have been developed to
help overcome these challenges
• The selection of standards-based advanced wideband speech coding
technologies such as AMR-WB/G.722.2 is one of the fundamental steps towards
improving voice quality between diverse devices and converging networks
• Adoption of AMR-WB/G.722.2 in the telecom service delivery value chain is
growing – wideband speech quality has been shown to be highly preferred by
consumers
Are your devices, systems, solutions, services ready?
Hear the rich sound of wideband
Wideband Demo
Wideband Codecs for Enhanced Voice Quality
Thank you!
[email protected]
www.voiceage.com
Come and talk to VoiceAge at Booth #107