Phoneme Speech Synthesis (cont)

Frequency Domain Analysis/Synthesis
•Concerned with reproducing the frequency spectrum of the
speech waveform
•Less concerned with amplitude variation over time (i.e., the time domain)
•A mathematical model of the frequency spectrum is stored
and used to control an electronic model of the human vocal
tract (as opposed to the time-domain approach, which digitizes
the speech waveform sample by sample through direct
analog-to-digital conversion)
•Two methods employed:
•Linear Predictive Coding (LPC)
•Formant analysis/synthesis
Frequency Domain Analysis/Synthesis
Linear Predictive Coding (LPC)
•E.g., the Speak & Spell educational toy by Texas Instruments
•The speech waveform is digitized with an ADC using SPCM; the
waveform is then analyzed to extract the frequency, intensity,
and other vocal-tract variables needed to
reconstruct it mathematically.
•The extracted speech data are then coded into a series of
linear equation parameters, called LPC codes, that model
the frequency characteristics of the spoken waveform.
•The synthesizer circuit is designed as a model of the
human vocal tract.
Frequency Domain Analysis/Synthesis
Linear Predictive Coding (LPC) (cont)
•Synthesizer circuit can be divided into 3 major sections:
•Excitation source
•Multistage digital filter
•DAC
Frequency Domain Analysis/Synthesis
Linear Predictive Coding (LPC) (cont)
Excitation source
•Periodic pulse generator
•Emulates the action of the vocal cords by producing periodic (voiced)
sound frequencies
•The rate at which the vocal cords vibrate determines the pitch of the
synthesized sound
•White noise generator
•Produces unvoiced sounds (those caused by air turbulence
in the vocal cavity) by generating a random frequency pattern that
results in a hissing type of noise
•Electronic switch
•Voiced and unvoiced sounds are combined by electronically
switching between the two sound generators
•An amplifier amplifies the selected sound and passes it through the
multistage digital filter circuit.
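The excitation source described above can be sketched in a few lines of Python. This is a minimal illustration, not the circuit used in any particular chip: the function name `excitation`, its parameters, and the choice of uniform random noise are all assumptions made for the example.

```python
import random

def excitation(num_samples, voiced, pitch_period, amplitude=1.0, seed=0):
    """Return a list of excitation samples (hypothetical helper).

    voiced=True  -> periodic pulse train, one pulse every `pitch_period` samples
    voiced=False -> white noise, uniform random values in [-1, 1]
    The `voiced` flag plays the role of the electronic switch, and
    `amplitude` plays the role of the amplifier.
    """
    rng = random.Random(seed)
    samples = []
    for n in range(num_samples):
        if voiced:
            value = 1.0 if n % pitch_period == 0 else 0.0
        else:
            value = rng.uniform(-1.0, 1.0)
        samples.append(amplitude * value)
    return samples
```

A shorter `pitch_period` places the pulses closer together, which corresponds to a higher pitch.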
Frequency Domain Analysis/Synthesis
Linear Predictive Coding (LPC) (cont)
Multistage digital filter
•Shapes (modulates) the excitation signal the same way the
throat, tongue, teeth and lips modulate vocal cavity sounds
DAC
•Converts the digital speech signal to an analog signal
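One common way to realize such a filter in software is a direct-form all-pole recursion, in which each output sample is the excitation sample plus a weighted sum of previous outputs. The sketch below assumes that form; the slides do not say which filter structure the hardware uses, and the name `all_pole_filter` is hypothetical.

```python
def all_pole_filter(excitation, coeffs):
    """Shape an excitation signal with an all-pole (recursive) filter.

    Computes y[n] = e[n] + sum(coeffs[k] * y[n - 1 - k]) for each sample.
    `coeffs` are the filter coefficients; their values here are illustrative.
    """
    history = [0.0] * len(coeffs)  # previous outputs, most recent first
    output = []
    for e in excitation:
        y = e + sum(a * h for a, h in zip(coeffs, history))
        output.append(y)
        history = ([y] + history)[:len(coeffs)]
    return output
```

For example, a single coefficient of 0.5 turns one input pulse into a decaying response, which is the basic mechanism by which the filter imposes resonances on the excitation.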
Frequency Domain Analysis/Synthesis
Linear Predictive Coding (LPC) (cont)
•The LPC codes control the following circuit functions:
•Pitch of the voiced sounds
•Selection between voiced and unvoiced sounds
•Amplitude of the excitation signal
•Control of the digital filter by giving the filter coefficients
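The four control functions listed above map naturally onto one record per short segment (frame) of speech. The following sketch assumes such a frame layout and drives a small software synthesizer with it; the `LpcFrame` fields, the function `synthesize`, and the direct-form filter are illustrative assumptions, not the format of any specific LPC chip.

```python
import random
from dataclasses import dataclass
from typing import List

@dataclass
class LpcFrame:
    voiced: bool          # selects pulse generator (True) or noise generator (False)
    pitch_period: int     # samples between pulses; used only when voiced
    gain: float           # amplitude of the excitation signal
    coeffs: List[float]   # digital filter coefficients

def synthesize(frames, samples_per_frame, seed=0):
    """Generate output samples frame by frame (hypothetical sketch)."""
    rng = random.Random(seed)
    history, out, n = [], [], 0
    for frame in frames:
        order = len(frame.coeffs)
        history = (history + [0.0] * order)[:order]  # resize filter memory
        for _ in range(samples_per_frame):
            if frame.voiced:
                e = 1.0 if n % frame.pitch_period == 0 else 0.0
            else:
                e = rng.uniform(-1.0, 1.0)
            y = frame.gain * e + sum(a * h for a, h in zip(frame.coeffs, history))
            out.append(y)
            history = ([y] + history)[:order]
            n += 1
    return out
```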
Frequency Domain Analysis/Synthesis
Linear Predictive Coding (LPC) (cont)
•Weakness: It can take several minutes on a large computer
just to convert a few seconds of speech to the required LPC
format
•Advantages:
•Once coded, the LPC data rate required to reproduce speech is
less than 2,400 bps (10 seconds of speech can be stored in
less than 2.9 KB of memory)
•Retains all the pitch and accent characteristics
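As a quick check of the storage figure, the memory needed is the data rate times the duration, divided by 8 bits per byte. The sketch below assumes a rate of 2,400 bps, which is the rate that matches roughly 2.9 KB for 10 seconds; the function name is hypothetical.

```python
def storage_bytes(bit_rate_bps, seconds):
    """Bytes needed to store `seconds` of speech coded at `bit_rate_bps`."""
    return bit_rate_bps * seconds / 8

# 2,400 bps for 10 seconds is 24,000 bits, i.e. 3,000 bytes (about 2.9 KB
# when 1 KB is taken as 1,024 bytes).
ten_seconds = storage_bytes(2400, 10)
```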
Frequency Domain Analysis/Synthesis
Formant Analysis/ Synthesis
•Similar to LPC (based on the frequency spectrum found in natural
speech, and uses the same synthesizer circuit)
•Formant analysis/synthesis attempts to generate speech by
reconstructing the formants.
•Formant: any of several frequency regions of relatively
great intensity in a sound spectrum, which together
determine the characteristic quality of a vowel sound
•Formant frequencies shift constantly to produce
different sounds as you speak. The formant frequency
characteristics of a spoken waveform can be digitally coded
and used to control the frequency generators and filters in
an electronic synthesizer to reproduce the original speech
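A single formant is often modeled in software as a two-pole resonator tuned to the formant's center frequency and bandwidth. The sketch below assumes that model; the slides do not specify the circuit, and the name `formant_resonator` and its parameters are hypothetical.

```python
import math

def formant_resonator(signal, formant_hz, bandwidth_hz, sample_rate_hz):
    """Pass `signal` through a two-pole resonator centered on one formant.

    y[n] = x[n] + 2*r*cos(theta)*y[n-1] - r*r*y[n-2], where theta sets the
    resonant frequency and r (below 1) sets the bandwidth.
    """
    r = math.exp(-math.pi * bandwidth_hz / sample_rate_hz)
    theta = 2.0 * math.pi * formant_hz / sample_rate_hz
    b1 = 2.0 * r * math.cos(theta)
    b2 = -r * r
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = x + b1 * y1 + b2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out
```

Changing `formant_hz` from one stretch of speech to the next corresponds to the shifting formant frequencies described above; several resonators would be combined to model several formants.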
Frequency Domain Analysis/Synthesis
Formant Analysis/ Synthesis (cont)
•The formants of the original speech can be coded and synthesized one
word at a time.
•Individual words are stored and played back to produce
connected speech. This is called the stored-word, or dictionary, method
•Weakness: the vocabulary is fixed and limited by the memory
available.
•Advantage: less complex and more economical.
Phoneme Speech Synthesis
•Most phoneme synthesizers are really LPC synthesizers
•A phoneme synthesizer can be divided into three major sections:
•Lookup ROM
-Translates a phoneme code into a set of LPC parameters that are
applied to the excitation sources and digital filter
-The LPC parameters control which excitation source is selected,
its pitch, and the filter settings required to produce the
given phoneme.
•Excitation source
•Multistage digital filter
•A phoneme speech synthesizer can be used in one of two ways:
•direct phoneme synthesis
•text-to-speech synthesis
Phoneme Speech Synthesis (cont)
Direct Phoneme Synthesis
•The phoneme codes for a given phrase must be determined by the programmer.
•This sequence of codes is called a phoneme string and is usually stored
as part of a speech subroutine in RAM or ROM
•The subroutine is executed when the programmed phrase must be
spoken. For example, a robot might be programmed to say “low
voltage” when its battery needs recharging. The subroutine is
executed when the voltage-sensing circuit detects the low-voltage
condition.
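The robot example can be sketched as a stored phoneme string plus a subroutine that sends it to the synthesizer when the battery check fails. Everything specific here is an assumption for illustration: the code values in `LOW_VOLTAGE_STRING` are made up (real codes depend on the synthesizer chip), and `send_code` stands in for whatever routine writes one code to the hardware.

```python
# Hypothetical phoneme codes for "low voltage"; not taken from any real chip.
LOW_VOLTAGE_STRING = [0x01, 0x02, 0x00, 0x03, 0x02, 0x01, 0x04, 0x05, 0x06]

def speak(phoneme_string, send_code):
    """Speech subroutine: pass each phoneme code to the synthesizer in order."""
    for code in phoneme_string:
        send_code(code)

def check_battery(voltage, threshold, send_code):
    """Speak the stored phrase if the sensed voltage is below the threshold."""
    if voltage < threshold:
        speak(LOW_VOLTAGE_STRING, send_code)
        return True
    return False
```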
Phoneme Speech Synthesis (cont)
Direct Phoneme Synthesis (cont)
•Developing a phoneme string:
•Determine the phoneme symbols required for the words
within the phrase.
•Provide pauses between syllables and words as needed for timing
and rhythm
•Provide intonation for the individual words as well as the entire
phrase
•Convert the phoneme symbol string to a phoneme code string
•Execute the phoneme string, listen to the result, and modify it
accordingly.
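The symbol-to-code conversion step in the list above amounts to a table lookup. The sketch below assumes a small, made-up symbol set in which `PA` marks a pause; the symbols, the code values in `PHONEME_CODES`, and the function name are hypothetical and would be replaced by the table for the actual synthesizer.

```python
# Hypothetical symbol table; "PA" is a pause inserted for timing and rhythm.
PHONEME_CODES = {
    "PA": 0x00, "L": 0x01, "OW": 0x02, "V": 0x03,
    "T": 0x04, "IH": 0x05, "JH": 0x06,
}

def to_code_string(symbols):
    """Convert a phoneme symbol string to a phoneme code string."""
    return [PHONEME_CODES[s] for s in symbols]

# "low voltage" written as symbols, with a pause between the two words.
low_voltage_symbols = ["L", "OW", "PA", "V", "OW", "L", "T", "IH", "JH"]
low_voltage_codes = to_code_string(low_voltage_symbols)
```

An unknown symbol raises `KeyError`, which is a convenient signal during the listen-and-modify step that the symbol string needs correcting.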
Phoneme Speech Synthesis (cont)
Text to Speech Conversion
•Phrases are entered into a computer from a keyboard, and the
computer performs the code conversion. Since most computers represent
letters and symbols in ASCII code, the program's task reduces to
converting ASCII codes to phoneme codes
•Example of usage: speech aids for people who are blind or unable to speak
•Three ways written text can be converted to a phoneme code string:
•word lookup
•morpheme lookup
•phoneme lookup
Phoneme Speech Synthesis (cont)
Text to Speech Conversion (cont)
Word Lookup
•Also known as the dictionary method
•The software looks for the ASCII representation of a space to divide
the phrase into individual words. Each word is
compared with the dictionary until a match is found.
•If there is a match, the lookup table produces the phoneme code string
required to pronounce the word.
•Phoneme code strings are passed sequentially to a phoneme
synthesizer for immediate speech reproduction, or stored temporarily in
a phoneme memory buffer for subsequent playback
•Weakness:
•Less flexible and needs a large memory
•A large dictionary requires too much search time
•Abbreviations, misspelled words, and unusual words might never be found.
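A minimal sketch of word lookup: split the phrase at spaces, look each word up in a dictionary, and collect the phoneme strings. The two dictionary entries and their phoneme symbols are hypothetical; a real system would hold thousands of entries and emit numeric codes.

```python
# Hypothetical dictionary mapping words to phoneme symbol strings.
WORD_DICTIONARY = {
    "LOW": ["L", "OW"],
    "VOLTAGE": ["V", "OW", "L", "T", "IH", "JH"],
}

def words_to_phonemes(phrase):
    """Return (phoneme buffer, list of words that were not found)."""
    phonemes = []   # acts as the phoneme memory buffer
    missing = []
    for word in phrase.upper().split(" "):
        if not word:
            continue  # skip empty pieces caused by repeated spaces
        entry = WORD_DICTIONARY.get(word)
        if entry is None:
            missing.append(word)
        else:
            phonemes.extend(entry)
    return phonemes, missing
```

The `missing` list makes the weakness concrete: any word that is not in the dictionary, such as an abbreviation or a misspelling, produces no speech at all.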
Phoneme Speech Synthesis (cont)
Text to Speech Conversion (cont)
Morpheme Lookup
•A morpheme is any word or word segment that conveys meaning.
•Examples: sun in sundown, ortho in orthopedic, blue in blueberry.
•Works like the word lookup system in that the morphs are stored in
memory
•Weakness:
•Text must be dissected and analyzed to produce the appropriate
morph string.
•Requires a relatively large amount of computer time and is
inefficient (the software must look at all the possible ways that a given
word can be broken up in order to find the respective morphs).
Phoneme Speech Synthesis (cont)
Text to Speech Conversion (cont)
Morpheme Lookup (cont)
•Advantage
•More flexible than word lookup. Only 8,000 or so
morphs (for English) need to be stored to obtain a very large
vocabulary. New and unusual words rarely need to be added to
the dictionary, since in most cases they consist of existing
morphs.
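The search over all possible ways of breaking up a word can be sketched recursively: try every prefix that is a stored morph, then split the remainder the same way. The morph set below is a tiny hypothetical sample, and `split_into_morphs` is an illustrative name, not part of any described system.

```python
# Hypothetical morph store; a real one would hold several thousand entries.
MORPHS = {"sun", "down", "blue", "berry", "ortho", "pedic"}

def split_into_morphs(word):
    """Return every way to break `word` into a sequence of stored morphs."""
    if word == "":
        return [[]]  # one way to split nothing: the empty sequence
    results = []
    for end in range(1, len(word) + 1):
        head = word[:end]
        if head in MORPHS:
            for rest in split_into_morphs(word[end:]):
                results.append([head] + rest)
    return results
```

Because every prefix is tried at every position, the work grows quickly with word length, which is why this method costs more computer time than plain word lookup.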
Phoneme Speech Synthesis (cont)
Text to Speech Conversion (cont)
Phoneme Lookup
•Most efficient and flexible
•Also known as letter-to-phoneme lookup because the software
attempts to convert each individual text letter or symbol to its
corresponding phoneme
•A system developed by the Naval Research Laboratory (NRL) uses
production rules to convert written text into phonemes:
IF <left context> (text characters) <right context> THEN <phoneme>
•# Context must be one or more vowels
•: Context must be zero or more consonants
•! Context must be a non-alphanumeric character (e.g., space,
punctuation mark, mathematical symbol)
Phoneme Speech Synthesis (cont)
Text to Speech Conversion (cont)
Phoneme Lookup (cont)
•E.g.: IF #: (AL)! THEN UH, L
•In this example, the left context #: means that, reading from left
to right, AL must be preceded by one or more vowels followed by zero
or more consonants
•The right context is a single exclamation mark (!), so AL must be
followed by a non-alphanumeric character
•Therefore the word FICTIONAL, for example, satisfies
IF #: (AL)! THEN UH, L
because AL is preceded by the vowels IO and the consonant N, and is
followed by the end of the word
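The context test in such a rule can be sketched by translating each context symbol into a regular expression. This is an illustration under stated assumptions, not the NRL implementation: the vowel set is taken as A, E, I, O, U (treating Y as a consonant is a choice made for this sketch), the text is padded with spaces so that the start and end of a word count as non-alphanumeric, and all names are hypothetical.

```python
import re

VOWELS = "AEIOU"                      # assumption: Y treated as a consonant
CONSONANTS = "BCDFGHJKLMNPQRSTVWXYZ"
CONTEXT_PATTERNS = {
    "#": f"[{VOWELS}]+",              # one or more vowels
    ":": f"[{CONSONANTS}]*",          # zero or more consonants
    "!": "[^A-Za-z0-9]",              # one non-alphanumeric character
}

def context_to_regex(context):
    """Translate a context string such as '#:' into a regular expression."""
    return "".join(CONTEXT_PATTERNS.get(ch, re.escape(ch)) for ch in context)

def rule_matches(text, position, left, target, right):
    """True if `target` occurs at `position` in `text` with both contexts satisfied."""
    padded = " " + text.upper() + " "
    start = position + 1              # shift for the leading pad character
    end = start + len(target)
    if padded[start:end] != target:
        return False
    left_ok = re.search("(?:" + context_to_regex(left) + ")$", padded[:start])
    right_ok = re.match(context_to_regex(right), padded[end:])
    return left_ok is not None and right_ok is not None
```

With these definitions, `rule_matches("FICTIONAL", 7, "#:", "AL", "!")` checks the rule from the example: the AL at index 7 is preceded by IO and N and followed by the end of the word.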