Speech Slides


Machine Translation


Machine translation is one of the earliest uses of
AI
Two approaches:

Traditional approach using grammars, rewrite
rules, and lexicons



May be shallow or deep translation
May translate directly between two languages, or
from one language into an interlingua and then into a
second language
Statistical machine translation
Shallow Translation




Transfer model: Keep a database of translation
rules or examples. When a rule matches, translate
directly
Can operate at the lexical, syntactic, or semantic
level
Example: technical manuals (Siemens)
Doesn't deal with context
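A transfer model can be sketched as a lookup table of rules applied directly to the input. The rule table, the English to Spanish pairs, and the longest-match-first policy below are hypothetical choices for illustration, not the Siemens system. Note how "the" becomes "el" unless a longer rule happens to cover it: the rules never look at wider context.

```python
# Hypothetical English -> Spanish transfer rules (source words -> target words).
RULES = {
    ("the", "door"): ["la", "puerta"],
    ("open",): ["abra"],
    ("press",): ["pulse"],
    ("the",): ["el"],
    ("button",): ["botón"],
}
MAX_RULE_LEN = max(len(src) for src in RULES)

def transfer_translate(words):
    """Apply the longest matching rule at each position; copy unmatched words unchanged."""
    out, i = [], 0
    while i < len(words):
        for n in range(min(MAX_RULE_LEN, len(words) - i), 0, -1):
            src = tuple(words[i:i + n])
            if src in RULES:
                out.extend(RULES[src])
                i += n
                break
        else:
            out.append(words[i])  # no rule matched this word: pass it through
            i += 1
    return out
```

For example, `transfer_translate("open the door".split())` yields `["abra", "la", "puerta"]`, while `["the", "box"]` yields `["el", "box"]` because no rule covers "box".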
Deep Translation


One method uses a language-independent
representation of information: an interlingua
Three problems:




Knowledge representation
Parsing into that representation
Generation from representation
Alternative approach: translate directly from one language
to another
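The three problems of the interlingua method map onto three pieces of code: a representation (a frame of concept symbols), a parser into it, and a generator out of it. This is a toy sketch under heavy assumptions: the concept names, lexicons, and the single "the X verb the Y" sentence pattern are all hypothetical.

```python
# Hypothetical lexicons mapping words to and from language-independent concepts.
EN_TO_CONCEPT = {"dog": "DOG", "ball": "BALL", "sees": "SEE"}
CONCEPT_TO_ES = {"DOG": "el perro", "BALL": "la pelota", "SEE": "ve"}

def parse_english(sentence):
    """Parsing: map 'the X <verb> the Y' onto an interlingua frame (only this pattern)."""
    words = sentence.lower().split()
    if len(words) != 5 or words[0] != "the" or words[3] != "the":
        raise ValueError("unsupported sentence pattern")
    return {"predicate": EN_TO_CONCEPT[words[2]],
            "agent": EN_TO_CONCEPT[words[1]],
            "patient": EN_TO_CONCEPT[words[4]]}

def generate_spanish(frame):
    """Generation: realize the frame as a Spanish subject-verb-object sentence."""
    return " ".join(CONCEPT_TO_ES[frame[role]] for role in ("agent", "predicate", "patient"))
```

The appeal of this design is that adding a language means writing one parser and one generator for it, rather than a separate direct translator for every language pair.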
Statistical Machine Translation


Translation model is learned from a bilingual
corpus
One system:




Break the original sentence into phrases
Choose a corresponding phrase in the target
language
Choose a permutation of the phrases
Select the most probable translation
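The steps of this system can be sketched by brute force. Assumptions: the source sentence arrives already segmented into phrases, the phrase table is hypothetical, and "most probable" is scored here as phrase probabilities times a tiny hypothetical bigram language model over the target words. Trying every permutation costs n! orderings, which is the cost the efficiency slides address.

```python
from itertools import permutations, product

# Hypothetical phrase table: source phrase -> [(target phrase, P(target | source))]
PHRASE_TABLE = {
    "white": [("blanca", 0.7), ("blanco", 0.3)],
    "house": [("casa", 0.9), ("hogar", 0.1)],
}
# Hypothetical target-language bigram probabilities; unseen bigrams get a small default.
BIGRAMS = {("<s>", "casa"): 0.5, ("casa", "blanca"): 0.6,
           ("<s>", "blanca"): 0.05, ("blanca", "casa"): 0.05}
DEFAULT_BIGRAM = 0.01

def lm_score(words):
    """Probability of the target word sequence under the bigram model."""
    score, prev = 1.0, "<s>"
    for w in words:
        score *= BIGRAMS.get((prev, w), DEFAULT_BIGRAM)
        prev = w
    return score

def translate(source_phrases):
    """Try every phrase choice and every permutation; keep the most probable."""
    best, best_score = None, 0.0
    for choice in product(*(PHRASE_TABLE[p] for p in source_phrases)):
        phrase_prob = 1.0
        for _, p in choice:
            phrase_prob *= p
        for order in permutations([target for target, _ in choice]):
            words = " ".join(order).split()
            score = phrase_prob * lm_score(words)
            if score > best_score:
                best, best_score = words, score
    return best, best_score
```

With these numbers, `translate(["white", "house"])` prefers `["casa", "blanca"]` (score about 0.189) over the word-for-word order `["blanca", "casa"]`.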
Efficiency




Instead of examining all permutations (n!), use the
concept of distortion
Distortion d_i is the number of words that phrase f_i has
moved with respect to f_{i-1}: positive if moved to the right,
negative if moved to the left
Find a probability distribution for d
Assume each distortion is independent of the others
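Distortion can be computed from where each translated phrase lands in the target sentence. The sketch uses one common formula, d_i = start(f_i) - end(f_{i-1}) - 1, which is 0 when f_i directly follows f_{i-1}. The distribution P(d) below is hypothetical; the independence assumption is what lets the ordering probability be a simple product of per-phrase terms.

```python
# Hypothetical distortion distribution P(d); in practice estimated from a bilingual corpus.
DISTORTION_PROB = {-2: 0.05, -1: 0.1, 0: 0.6, 1: 0.15, 2: 0.05}
UNLISTED_PROB = 0.01  # assumed probability for any distortion not listed above

def distortions(spans):
    """spans[i] = (start, end) target-word positions of phrase f_i, in source-phrase order.
    Returns d_i = start(f_i) - end(f_{i-1}) - 1 for i = 1..n-1."""
    return [spans[i][0] - spans[i - 1][1] - 1 for i in range(1, len(spans))]

def distortion_prob(spans):
    """Independence assumption: multiply P(d_i) over all phrases."""
    p = 1.0
    for d in distortions(spans):
        p *= DISTORTION_PROB.get(d, UNLISTED_PROB)
    return p
```

For two one-word phrases that swap places, `distortions([(1, 1), (0, 0)])` gives `[-2]`, a move to the left; keeping them in order gives `[0]`.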
Efficiency (cont'd)


Still exponential in the number of phrases
Use beam search with a heuristic that estimates
probability to find a nearly most probable translation
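A beam search decoder builds the translation left to right, one source phrase per step, and keeps only the best few partial hypotheses. This is a minimal sketch with hypothetical tables; the heuristic here (the best phrase probability of each still-untranslated source phrase) is an assumed stand-in for the estimate of the remaining probability. Because the beam may discard the true optimum, the result is nearly, not guaranteed, most probable.

```python
import heapq
from math import prod

# Hypothetical phrase table and target bigram model (same toy numbers as a brute-force search).
PHRASE_TABLE = {"white": [("blanca", 0.7), ("blanco", 0.3)],
                "house": [("casa", 0.9), ("hogar", 0.1)]}
BIGRAMS = {("<s>", "casa"): 0.5, ("casa", "blanca"): 0.6,
           ("<s>", "blanca"): 0.05, ("blanca", "casa"): 0.05}
DEFAULT_BIGRAM = 0.01

def future_estimate(remaining_phrases):
    """Heuristic: product of the best phrase probability for each untranslated phrase."""
    return prod(max(p for _, p in PHRASE_TABLE[s]) for s in remaining_phrases)

def beam_translate(source_phrases, beam_width=2):
    # A hypothesis is (score so far, output words, indices of untranslated source phrases).
    beam = [(1.0, ("<s>",), frozenset(range(len(source_phrases))))]
    for _ in source_phrases:  # each step translates exactly one more source phrase
        candidates = []
        for score, out, remaining in beam:
            for i in remaining:
                for target, p in PHRASE_TABLE[source_phrases[i]]:
                    new_score, prev = score * p, out[-1]
                    for w in target.split():
                        new_score *= BIGRAMS.get((prev, w), DEFAULT_BIGRAM)
                        prev = w
                    candidates.append((new_score, out + tuple(target.split()), remaining - {i}))
        # Rank by score times the heuristic estimate for the rest; keep the best few.
        beam = heapq.nlargest(
            beam_width, candidates,
            key=lambda h: h[0] * future_estimate(source_phrases[j] for j in h[2]))
    best = max(beam, key=lambda h: h[0])
    return list(best[1][1:]), best[0]
```

The work per step grows with the beam width rather than with the number of permutations; `beam_translate(["white", "house"])` returns `["casa", "blanca"]` here.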
Speech Recognition

Challenges:




Little segmentation, unlike written text
Coarticulation: sound at end of word runs into
sound at beginning of next word
Homophones
Solutions:



Find acoustic model: P(sound_{1:t} | word_{1:t})
Find language model: P(word_{1:t})
Use HMM and Viterbi algorithm
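The acoustic and language models combine through Bayes' rule: the most likely word sequence maximizes P(word_{1:t} | sound_{1:t}), which is proportional to P(sound_{1:t} | word_{1:t}) P(word_{1:t}). Viterbi finds the most likely hidden-state path in an HMM. The two-phone model below, with hypothetical feature labels C1 and C2 and made-up probabilities, only illustrates the algorithm; a real recognizer's states come from its pronunciation and language models.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most likely hidden-state sequence for the observations, and its probability."""
    # best[t][s]: probability of the most likely path that ends in state s at time t
    best = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda r: best[t - 1][r] * trans_p[r][s])
            best[t][s] = best[t - 1][prev] * trans_p[prev][s] * emit_p[s][observations[t]]
            back[t][s] = prev
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1], best[-1][last]

# Hypothetical toy HMM: hidden phones, observed (quantized) acoustic feature labels.
STATES = ["t", "ow"]
START = {"t": 0.8, "ow": 0.2}
TRANS = {"t": {"t": 0.3, "ow": 0.7}, "ow": {"t": 0.1, "ow": 0.9}}
EMIT = {"t": {"C1": 0.9, "C2": 0.1}, "ow": {"C1": 0.2, "C2": 0.8}}
```

For the observations `["C1", "C1", "C2", "C2"]`, `viterbi(...)` returns the path `["t", "t", "ow", "ow"]` with probability 0.07838208. Real systems usually work with log probabilities to avoid underflow on long utterances.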
Acoustical Processing



Sample analog signal: sampling rate and quantization
factor matter
Divide into frames
Extract features from each frame:



Use Fourier Transform to measure acoustic energy
at about a dozen frequencies
Compute the mel frequency cepstral coefficient
(MFCC) for each frequency
Yields thirteen features
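The front end of this pipeline can be sketched with the standard library: sample and quantize a signal, cut it into frames, and measure energy at a dozen frequencies with single discrete Fourier transform terms. The sampling rate, bit depth, 10 ms frame length, and 300 Hz frequency spacing are all assumed values. The MFCC step itself (mel-scale filter bank, logarithm, cosine transform) is not shown.

```python
import math

SAMPLE_RATE = 8000   # samples per second (assumed)
BITS = 8             # quantization: bits per sample (assumed)
FRAME_LEN = 80       # 80 samples = 10 ms at 8 kHz (assumed frame size)
FREQS = [300 * k for k in range(1, 13)]  # a dozen analysis frequencies in Hz (assumed)

def sample_tone(freq, n_samples, rate=SAMPLE_RATE, bits=BITS):
    """Sample a pure tone at `rate` Hz and round each sample to a signed `bits`-bit level."""
    peak = 2 ** (bits - 1) - 1
    return [round(peak * math.sin(2 * math.pi * freq * n / rate)) for n in range(n_samples)]

def frames(signal, frame_len=FRAME_LEN):
    """Cut the signal into consecutive non-overlapping frames (real front ends often overlap them)."""
    return [signal[i:i + frame_len] for i in range(0, len(signal) - frame_len + 1, frame_len)]

def energy_at(frame, freq, rate=SAMPLE_RATE):
    """Squared magnitude of a single discrete Fourier transform term at `freq` Hz."""
    re = sum(x * math.cos(2 * math.pi * freq * n / rate) for n, x in enumerate(frame))
    im = -sum(x * math.sin(2 * math.pi * freq * n / rate) for n, x in enumerate(frame))
    return re * re + im * im

def frame_features(frame):
    """One energy value per analysis frequency: the input an MFCC step would start from."""
    return [energy_at(frame, f) for f in FREQS]
```

For a 900 Hz test tone of 400 samples, `frames` yields five frames, and the largest of the twelve energies in each frame falls at 900 Hz. An FFT would compute all frequency bins at once; single terms are used here only for readability.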
Processing (cont'd)




Each phone has an onset, middle, and end
The phone models are strung together to form a
pronunciation model for each word
Words can have a coarticulation model: “tomato”
vs. “tomahto”
Language model can be an n-gram model learned
from a corpus of text
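The pieces of this slide can be sketched together: three-state phone models strung into a left-to-right pronunciation model, and an n-gram language model with n = 2. Assumptions: the self-loop probability, the ARPAbet-style phone names, and the tiny training corpus are hypothetical, and the bigram estimates are unsmoothed maximum-likelihood counts, so any unseen bigram gets probability 0.

```python
from collections import Counter

def phone_model(phone):
    """Three states per phone: onset, middle, and end."""
    return [f"{phone}_onset", f"{phone}_mid", f"{phone}_end"]

def pronunciation_model(phones, stay=0.5):
    """String phone models into a left-to-right word model.
    Each state loops on itself with probability `stay` (assumed) or advances;
    from the last state, the remaining 1 - stay is the probability of leaving the word."""
    states = [s for ph in phones for s in phone_model(ph)]
    edges = []
    for i, s in enumerate(states):
        edges.append((s, s, stay))
        if i + 1 < len(states):
            edges.append((s, states[i + 1], 1 - stay))
    return states, edges

def train_bigram(sentences):
    """Unsmoothed maximum-likelihood bigram model P(w | prev) from tokenized sentences."""
    contexts, pairs = Counter(), Counter()
    for words in sentences:
        tokens = ["<s>"] + words + ["</s>"]
        contexts.update(tokens[:-1])              # how often each token starts a bigram
        pairs.update(zip(tokens, tokens[1:]))
    def prob(prev, w):
        return pairs[(prev, w)] / contexts[prev] if contexts[prev] else 0.0
    return prob

def sentence_prob(prob, words):
    """Probability of a whole sentence as a product of bigram probabilities."""
    tokens = ["<s>"] + words + ["</s>"]
    p = 1.0
    for prev, w in zip(tokens, tokens[1:]):
        p *= prob(prev, w)
    return p
```

Two pronunciation variants of "tomato" could be built as `pronunciation_model(["t", "ow", "m", "ey", "t", "ow"])` and the same list with `"aa"` in place of `"ey"`. Trained on two copies of `["recognize", "speech"]` and one of `["wreck", "a", "nice", "beach"]`, the language model scores the first sentence at 2/3 and the similar-sounding second one at 1/3, which is how it helps choose between acoustically close alternatives.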