Speech Slides
Machine Translation
Machine translation is one of the earliest uses of
AI
Two approaches:
Traditional approach using grammars, rewrite
rules, and lexicons
May be shallow or deep translation
May translate directly between two languages, or
from one language into interlingua and then into
second language
Statistical machine translation
Shallow Translation
Transfer model: Keep a database of translation
rules or examples. When rule matches, translate
directly
Could operate on lexical, syntactic, or semantic
level
Example: technical manuals (Siemens)
Doesn't deal with context
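A minimal sketch of a lexical-level transfer model, assuming a small hand-written rule table. The entries in RULES and the function name translate are hypothetical, not taken from any real system. Each rule is applied directly when it matches, with no use of surrounding context:

```python
# Hypothetical rule database: source word sequence -> target word sequence.
RULES = {
    ("good", "morning"): ["buenos", "días"],
    ("thank", "you"): ["gracias"],
}

def translate(words, rules=RULES, max_len=3):
    """Greedy longest-match lookup; unmatched words pass through unchanged."""
    out = []
    i = 0
    while i < len(words):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(words) - i), 0, -1):
            key = tuple(words[i:i + n])
            if key in rules:
                out.extend(rules[key])
                i += n
                break
        else:
            # No rule matched at this position: copy the word as-is.
            out.append(words[i])
            i += 1
    return out

print(translate("good morning and thank you".split()))
# ['buenos', 'días', 'and', 'gracias']
```

Because each rule fires on its own, the word "and" is left untranslated and nothing checks agreement between neighbouring phrases.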
Deep Translation
One method uses language independent
representation of information – interlingua
Three problems:
Knowledge representation
Parsing into that representation
Generation from representation
An alternative approach translates directly from one
language to another
Statistical Machine Translation
Translation model is learned from a bilingual
corpus
One system:
Break the original sentence into phrases
Choose a corresponding phrase in the target
language
Choose a permutation of the phrases
Select the most probable translation
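The four steps above can be sketched by brute force on a toy example. PHRASE_TABLE and BIGRAM are hypothetical tables with made-up probabilities (a real system would learn them from a bilingual corpus), and scoring word order with a bigram table is an assumption made here for illustration:

```python
from itertools import permutations, product
from math import prod

# Hypothetical phrase table: P(target phrase | source phrase).
PHRASE_TABLE = {
    "white": {"blanca": 0.7, "blanco": 0.3},
    "house": {"casa": 0.9, "hogar": 0.1},
}
# Hypothetical target-language bigram scores; unseen pairs get a small default.
BIGRAM = {("casa", "blanca"): 0.6, ("blanca", "casa"): 0.1}

def order_score(phrases):
    """Score a target phrase order using the toy bigram table."""
    return prod(BIGRAM.get(pair, 0.01) for pair in zip(phrases, phrases[1:]))

def best_translation(source_phrases):
    """Try every phrase choice and every permutation; keep the most probable."""
    best, best_p = None, 0.0
    options = [PHRASE_TABLE[f].items() for f in source_phrases]
    for choice in product(*options):
        p_trans = prod(p for _, p in choice)
        for perm in permutations([e for e, _ in choice]):
            p = p_trans * order_score(perm)
            if p > best_p:
                best, best_p = perm, p
    return " ".join(best), best_p

translation, probability = best_translation(["white", "house"])
print(translation)  # casa blanca
```

Enumerating every permutation is only workable for very short sentences.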
Efficiency
Instead of examining all permutations (n!), use the
concept of distortion
Distortion d_i is the number of words that the
phrase f_i has moved with respect to f_{i-1}: positive if
moved to the right, negative if moved to the left
Find a probability distribution for d
Each distortion is independent of the others
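A small sketch of the distortion idea. It assumes d_i = start(f_i) - end(f_{i-1}) - 1 with 1-based word positions and end(f_0) = 0, which is one common way to make the definition above concrete. The table P_DISTORTION is hypothetical. Since the distortions are treated as independent, the probability of an ordering is the product of the individual P(d_i):

```python
from math import prod

# Hypothetical distortion distribution P(d); values are made up for illustration.
P_DISTORTION = {0: 0.6, 1: 0.15, -1: 0.15, 2: 0.05, -2: 0.05}

def distortions(spans):
    """spans[i] = (start, end) word positions (1-based, inclusive) of phrase f_i."""
    ds = []
    prev_end = 0  # assumed convention: the phrase before the first ends at 0
    for start, end in spans:
        ds.append(start - prev_end - 1)
        prev_end = end
    return ds

def distortion_probability(spans, p_d=P_DISTORTION, floor=1e-6):
    """Product of independent P(d_i); distortions not in the table get a small floor."""
    return prod(p_d.get(d, floor) for d in distortions(spans))

# Five phrases; the third and fourth are swapped relative to source order.
spans = [(1, 3), (4, 4), (6, 6), (5, 5), (7, 11)]
print(distortions(spans))  # [0, 0, 1, -2, 1]
```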
Efficiency (cont'd)
Still exponential over the number of phrases
Use beam search with a heuristic that estimates
probability to find a nearly most probable translation
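A minimal beam search sketch under simplifying assumptions: hypotheses are ranked only by the log probability accumulated so far (no estimate of future cost), distortion is measured in phrase positions rather than words, and there is no language model. The names beam_search, phrase_table and distortion_logp are hypothetical:

```python
import math

def beam_search(source_phrases, phrase_table, distortion_logp, beam_width=3):
    """Build the translation one phrase at a time, keeping the best few hypotheses."""
    # Hypothesis: (log probability, covered source indices, last index used, output).
    beam = [(0.0, frozenset(), -1, ())]
    for _ in range(len(source_phrases)):
        candidates = []
        for logp, covered, last, out in beam:
            for i, f in enumerate(source_phrases):
                if i in covered:
                    continue
                d = i - last - 1  # simplified distortion, in phrase positions
                for e, p in phrase_table[f].items():
                    score = logp + math.log(p) + distortion_logp(d)
                    candidates.append((score, covered | {i}, i, out + (e,)))
        # Prune: keep only the beam_width highest-scoring partial translations.
        candidates.sort(key=lambda h: h[0], reverse=True)
        beam = candidates[:beam_width]
    best = beam[0]
    return list(best[3]), best[0]

# Toy, made-up probabilities.
table = {"the": {"la": 0.6, "el": 0.4}, "house": {"casa": 0.9, "hogar": 0.1}}
words, logp = beam_search(["the", "house"], table, lambda d: -abs(d), beam_width=2)
print(words)  # ['la', 'casa']
```

The beam examines at most beam_width hypotheses per step, so it can miss the true optimum; this is why the result is only nearly most probable.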
Speech Recognition
Challenges:
Little segmentation, unlike written text
Coarticulation: sound at end of word runs into
sound at beginning of next word
Homophones
Solutions:
Find acoustic model: P(sound_1:t | word_1:t)
Find language model: P(word_1:t)
Use HMM and Viterbi algorithm
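A compact sketch of the Viterbi algorithm on a toy HMM. The hidden states stand in for words and the observations for acoustic symbols. All labels and probabilities are hypothetical. The transition table plays the role of the language model and the emission table the role of the acoustic model:

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden state sequence for the observations."""
    # best[t][s] = (probability of the best path ending in s at time t, previous state)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        row = {}
        for s in states:
            prob, prev = max((best[-1][r][0] * trans_p[r][s], r) for r in states)
            row[s] = (prob * emit_p[s][obs], prev)
        best.append(row)
    # Backtrack from the most probable final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for row in reversed(best[1:]):
        state = row[state][1]
        path.append(state)
    path.reverse()
    return path

states = ["w1", "w2"]                       # hypothetical hidden labels
start_p = {"w1": 0.6, "w2": 0.4}
trans_p = {"w1": {"w1": 0.7, "w2": 0.3}, "w2": {"w1": 0.4, "w2": 0.6}}
emit_p = {"w1": {"x": 0.5, "y": 0.4, "z": 0.1},
          "w2": {"x": 0.1, "y": 0.3, "z": 0.6}}
print(viterbi(["x", "y", "z"], states, start_p, trans_p, emit_p))
# ['w1', 'w1', 'w2']
```

For long inputs, a practical version would sum log probabilities instead of multiplying raw probabilities, to avoid underflow.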
Acoustical Processing
Sample analog signal: the sampling rate and quantization
factor matter
Divide into frames
Extract features from each frame:
Use Fourier Transform to measure acoustic energy
at about a dozen frequencies
Compute the mel frequency cepstral coefficient
(MFCC) for each frequency
Yields thirteen features
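A rough sketch of the first two steps only: framing a sampled signal and measuring energy at a dozen frequencies with a direct discrete Fourier transform. It stops short of the mel-scale and cepstral steps, so it does not produce MFCCs. The function names and the frame and hop sizes are illustrative assumptions:

```python
import cmath
import math

def split_frames(samples, frame_len, hop):
    """Split the signal into overlapping frames of frame_len samples."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def band_energies(frame, n_bins=12):
    """Squared DFT magnitude at bins 1..n_bins (bin k = k cycles per frame)."""
    n = len(frame)
    energies = []
    for k in range(1, n_bins + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))
        energies.append(abs(coeff) ** 2)
    return energies

# Test signal: a sine wave with exactly 4 cycles per 64-sample frame.
signal = [math.sin(2 * math.pi * 4 * t / 64) for t in range(128)]
frames = split_frames(signal, frame_len=64, hop=32)
energies = band_energies(frames[0])
print(len(frames))                         # 3
print(energies.index(max(energies)) + 1)   # 4
```

A production front end would use a fast Fourier transform and a windowing function rather than this direct sum.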
Processing (cont'd)
Each phone has an onset, middle, and end
The phone models are strung together to form a
pronunciation model for each word
A word's pronunciation model can cover dialect variation
and coarticulation: “tomato” vs. “tomahto”
Language model can be an n-gram model learned
from a corpus of text
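A minimal sketch of a bigram (n = 2) language model estimated by counting, with no smoothing. The three-sentence corpus is made up, and the function names are hypothetical. It shows how a language model can separate homophones such as "two" and "too" that an acoustic model cannot tell apart:

```python
from collections import Counter

def train_bigram(sentences):
    """Count bigrams and their left contexts; "<s>" marks the sentence start."""
    contexts, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split()
        contexts.update(words[:-1])
        bigrams.update(zip(words, words[1:]))
    return contexts, bigrams

def sentence_probability(sentence, contexts, bigrams):
    """Unsmoothed maximum-likelihood estimate of P(word_1:t)."""
    words = ["<s>"] + sentence.split()
    p = 1.0
    for prev, word in zip(words, words[1:]):
        if contexts[prev] == 0:
            return 0.0  # unseen context: probability is zero without smoothing
        p *= bigrams[(prev, word)] / contexts[prev]
    return p

corpus = ["i have two cats", "i have two dogs", "me too"]  # toy corpus
contexts, bigrams = train_bigram(corpus)
print(sentence_probability("i have too cats", contexts, bigrams))  # 0.0
```

With the same counts, "i have two cats" gets probability 2/3 × 1 × 1 × 1/2 = 1/3, so the recognizer would prefer "two" here.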