Soft Computing
Lecture 22
Using NNs in NLP and Speech Recognition
12.12.2005
Agenda
• Introduction to NLP
• Using recurrent NNs to recognize correct sentences
• Example of learning software for searching documents by a query in natural language
Programs based on NLP
• Question-answering systems
• Control by commands in natural language
• Text-to-speech readers
• Translators
• Information search by a query in natural language
• OCR – Optical Character Recognition
• Virtual persons
Main areas of NLP
• Understanding of NL
• Generation of NL
• Analysis and synthesis of speech
Levels of language
• Words, parts of words (lexical level, morphology)
– Structure of words
• Phrases, sentences (Syntax, syntactic level)
– Structure of phrases and sentences
• Sense, meaning of phrases and sentences (semantics, semantic level)
– The meaning here is the one associated with the sentential structure: the juxtaposition of the meanings of the individual words
• Sense, meaning of sentences in context (semantics, discourse level)
– Its domain is intersentential, concerning the way sentences fit into the context of a dialog or text
• Sense as goals, wishes, motivations and so on (pragmatics)
– Deals not just with a particular linguistic context but with the whole realm of human experience
Example
• The following sentences are unacceptable on the basis of syntax, semantics, and pragmatics, respectively:
– John water drink.
– John drinks dirt.
– John drinks gasoline.
• Note that the combination of "drink" and "gasoline" is not in itself unacceptable, as in "People do not drink gasoline" or the metaphorical "Cars drink gasoline."
• It is traditional for linguists to study these levels separately, and for computational linguists to implement them in natural language systems as separate components. Sequential processing is easier and more efficient, but far less effective than an iterated approach.
Syntactic analysis
• A natural language grammar specifies allowable sentence structures in terms of basic syntactic categories such as nouns and verbs, and allows us to determine the structure of a sentence. It is defined in a similar way to a grammar for a programming language, though it tends to be more complex and the notations used are somewhat different. Because of the complexity of natural language, a given grammar is unlikely to cover all syntactically acceptable sentences.
• To parse correct sentences:
– John ate the biscuit.
– The lion ate the schizophrenic.
– The lion kissed John.
• To exclude incorrect sentences:
– Ate John biscuit the.
– Schizophrenic the lion the ate.
– Biscuit lion kissed.
Simple context-free grammar for the previous examples
• sentence --> noun_phrase, verb_phrase.
• noun_phrase --> proper_name.
• noun_phrase --> determiner, noun.
• verb_phrase --> verb, noun_phrase.
• proper_name --> [mary].
• proper_name --> [john].
• noun --> [schizophrenic].
• noun --> [biscuit].
• verb --> [ate].
• verb --> [kissed].
• determiner --> [the].
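A minimal sketch of how these rules can be executed. In the original Prolog DCG notation the rules above run directly; the hand-rolled recursive-descent recognizer below is only a Python illustration, with the lexicon mirroring the slide's terminal rules:

LEXICON = {
    "proper_name": {"mary", "john"},
    "noun": {"schizophrenic", "biscuit"},
    "verb": {"ate", "kissed"},
    "determiner": {"the"},
}

def noun_phrase(words):
    # noun_phrase --> proper_name.  |  noun_phrase --> determiner, noun.
    if words and words[0] in LEXICON["proper_name"]:
        return words[1:]
    if len(words) > 1 and words[0] in LEXICON["determiner"] and words[1] in LEXICON["noun"]:
        return words[2:]
    return None

def verb_phrase(words):
    # verb_phrase --> verb, noun_phrase.
    if words and words[0] in LEXICON["verb"]:
        return noun_phrase(words[1:])
    return None

def sentence(words):
    # sentence --> noun_phrase, verb_phrase.
    rest = noun_phrase(words)
    return rest is not None and verb_phrase(rest) == []

print(sentence("john ate the biscuit".split()))  # True
print(sentence("ate john biscuit the".split()))  # False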
Parsing
Parse tree for "John loves Mary":

sentence
  noun_phrase
    proper_name: John
  verb_phrase
    verb: loves
    noun_phrase
      proper_name: Mary
Parts of speech
Examples of part-of-speech tagging
Experiments showed that adding sub-categorization to the bare category
information improved the performance of the models. For example, an
intransitive verb such as sleep would be placed into a different class from
the obligatorily transitive verb hit. Similarly, verbs that take sentential
complements or double objects such as seem, give or persuade would be
representative of other classes. Fleshing out the sub-categorization
requirements along these lines for lexical items in the training set resulted
in 9 classes for verbs, 4 for nouns and adjectives, and 2 for prepositions.
Recurrent Elman network
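The lecture's network diagram is not reproduced in this transcript. As a stand-in, here is a minimal NumPy sketch of the Elman architecture, in which the hidden layer's previous activation is copied into "context" units and fed back as extra input at every step (layer sizes, initialization and the tanh nonlinearity are assumptions):

import numpy as np

class ElmanCell:
    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.normal(0.0, 0.1, (n_hidden, n_in))       # input weights
        self.W_ctx = rng.normal(0.0, 0.1, (n_hidden, n_hidden))  # recurrent (context) weights
        self.W_out = rng.normal(0.0, 0.1, (n_out, n_hidden))     # output weights
        self.h = np.zeros(n_hidden)  # context units: copy of the previous hidden state

    def step(self, x):
        # The new hidden state depends on the current input and the context units.
        self.h = np.tanh(self.W_in @ x + self.W_ctx @ self.h)
        return self.W_out @ self.h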
Extraction of grammar (DFA) from a learned recurrent Elman network
• The algorithm we use for automata extraction works as follows: after the network is trained (or even during training), we apply a procedure for extracting what the network has learned, i.e., the network's current conception of what DFA it has learned.
• The DFA extraction process includes the following steps (a rough sketch follows the list):
1. clustering of the recurrent network activation space, S, to form DFA states,
2. constructing a transition diagram by connecting these states together with alphabet-labelled arcs,
3. putting these transitions together to make the full digraph, forming loops,
4. reducing the digraph to a minimal representation.
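A rough sketch of the four steps, assuming the hidden activation has been recorded after each input symbol (activations has one more entry, the initial state, than symbols). k-means from scikit-learn stands in for the clustering step, and the final minimization is only named, not implemented:

import numpy as np
from sklearn.cluster import KMeans

def extract_dfa(activations, symbols, n_states):
    # Step 1: cluster the recurrent activation space S into DFA states.
    labels = KMeans(n_clusters=n_states, n_init=10).fit(np.asarray(activations)).labels_
    # Steps 2 and 3: connect consecutive states with arcs labelled by the
    # symbol read between them; loops appear automatically in the digraph.
    transitions = {}
    for s, sym, s_next in zip(labels[:-1], symbols, labels[1:]):
        transitions[(s, sym)] = s_next
    # Step 4: reduce the digraph to a minimal DFA (e.g. Hopcroft's
    # algorithm; omitted here).
    return labels[0], transitions  # start state and transition table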
Acoustic Waves
• Human speech generates a wave
– like a loudspeaker moving
• A wave for the words "speech lab" looks like:
(figure: waveform segmented into s, p, ee, ch, l, a, b, with a zoomed view of the "l" to "a" transition)
Graphs from Simon Arnfield’s web tutorial on speech, Sheffield:
http://lethe.leeds.ac.uk/research/cogn/speech/tutorial/
Acoustic Sampling
• 10 ms frame (ms = millisecond = 1/1000 second)
• ~25 ms window around each frame to smooth signal processing
(figure: overlapping 25 ms analysis windows advanced in 10 ms steps over the signal; the result is a sequence of acoustic feature vectors a1, a2, a3, ...)
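A small framing sketch of the scheme above; the 25 ms window and 10 ms shift are from the slide, while the 16 kHz sample rate is an assumption:

def split_into_frames(signal, rate=16000, win_ms=25, shift_ms=10):
    win = int(rate * win_ms / 1000)      # 400 samples at 16 kHz
    shift = int(rate * shift_ms / 1000)  # 160 samples at 16 kHz
    # Each window is later converted into one acoustic feature vector.
    return [signal[i:i + win] for i in range(0, len(signal) - win + 1, shift)]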
Acoustic Features: Mel Scale Filterbank
• Derive Mel Scale Filterbank coefficients
• Mel scale:
– models non-linearity of human audio perception
– mel(f) = 2595 log10(1 + f / 700)
– roughly linear up to 1000 Hz and logarithmic above
• Filterbank
– collapses the large number of FFT parameters by filtering with ~20 triangular filters spaced on the mel scale
(figure: triangular filters along the frequency axis, producing filterbank coefficients m1, m2, m3, ...)
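A sketch of the mel mapping given above and of spacing ~20 triangular filters evenly on the mel scale (the FFT step and the filter shapes themselves are omitted):

import numpy as np

def mel(f):
    # mel(f) = 2595 log10(1 + f / 700), as on the slide
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_inv(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def filter_edges(f_min, f_max, n_filters=20):
    # Equal spacing in mel is roughly linear below 1 kHz and logarithmic
    # above; returns the edge frequencies of the triangular filters.
    m = np.linspace(mel(f_min), mel(f_max), n_filters + 2)
    return mel_inv(m)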
Phoneme recognition system based on Elman predictive neural networks
• The phrases are available in segmented form, with the speech labeled into a total of 25 phonemes.
• The speech data was parameterized into 12 liftered mel-frequency cepstral coefficients (MFCCs) without delta coefficients. The analysis window is 25 ms and the window shift 10 ms.
• Each phoneme is modeled by one neural network. The architecture of the neural networks, as seen during recognition with the Viterbi algorithm (where the neural network models provide the prediction error as the distortion measure), corresponds to an HMM with 3 states, supposing that the second state models the speech signal while the first and last states act as input and output states, respectively. A sketch of this idea follows.
• Results of experiments: the Elman network gives the best recognition results on the training set in comparison with the HMM.
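A sketch of the recognition idea: each phoneme model predicts the next feature vector, and the squared prediction error serves as the local distortion that the Viterbi search accumulates. The models mapping and its predict method are a hypothetical interface, and the full Viterbi search over state sequences is reduced here to scoring one pre-segmented stretch of speech:

import numpy as np

def segment_distortion(model, features):
    # Accumulate the prediction error over one hypothesized phoneme segment.
    err = 0.0
    for prev, cur in zip(features[:-1], features[1:]):
        err += np.sum((model.predict(prev) - cur) ** 2)
    return err

def classify_segment(models, features):
    # The minimum-distortion phoneme model wins, as in a Viterbi decision.
    return min(models, key=lambda p: segment_distortion(models[p], features))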
Technology for building a learned system for searching documents by sense

(block diagram, components: End User; processing of query; Knowledge Base with a constant part and a variable part; Base Dictionary and Additional Dictionary; dialogue with the teacher; processing of documents; Teacher-administrator; store of documents)
Main principles of the proposed technology
• Orientation towards recognizing semantics with minimal use of knowledge about the syntax of the language,
• Creation of hierarchies of concepts, with horizontal (associative) links between the nodes of these hierarchies, as the result of processing documents,
• Recognition of words and word collocations by maximum resemblance, using neural algorithms.
Kinds of frames
1. a frame coupled directly to a word or a document (the frame-word or the frame-document)
2. a frame associated with a word collocation (composite frame)
3. a frame-concept, which includes links to several other frames, each playing a defined role in the concept
4. a frame-heading describing a concept that is an "exposition" of all concepts and documents coupled to this heading
Slots of a frame
• Parent - link to the parent frame or class (vertical links)
• Owner - list of links to the frame-concepts or composite frames whose structure includes the given frame
• Obj - object participating in the concept
• Subject - subject (or main object) participating in the concept
• Act - operation (action) participating in the concept
• Prop - property participating in the concept
• Equal - list of concept synonyms described in the given frame (horizontal links)
• UnEqual - list of concept antonyms described in the given frame
• Include - list of links to the frames included as constituents of the given concept (vertical links)
Other main parameters of a frame
• Level - level of the frame in the hierarchy
• DocName - index of the filename (path) of the document coupled to the frame
• IndWord - index of the word in the dictionary coupled to the frame
• H - firing threshold of the frame, acting as a neuron
• Role - role of the frame in the concept it enters or can enter: A (operation), O (object), S (subject), P (property), U (undefined), or D (operation determined at analysis time by a special procedure)
• NO - inversion flag of the frame (a data-structure sketch combining these slots and parameters follows)
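A hypothetical rendering of the frame as a Python dataclass; the field names follow the two slides above, while all types and defaults are assumptions:

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Frame:
    parent: Optional["Frame"] = None          # link to the parent frame/class (vertical)
    owner: List["Frame"] = field(default_factory=list)    # enclosing concepts
    obj: Optional["Frame"] = None             # object participating in the concept
    subject: Optional["Frame"] = None         # subject (main object)
    act: Optional["Frame"] = None             # operation (action)
    prop: Optional["Frame"] = None            # property
    equal: List["Frame"] = field(default_factory=list)    # synonyms (horizontal)
    unequal: List["Frame"] = field(default_factory=list)  # antonyms
    include: List["Frame"] = field(default_factory=list)  # constituent frames (vertical)
    level: int = 0                            # level in the hierarchy
    doc_name: Optional[int] = None            # index of the document filename
    ind_word: Optional[int] = None            # index of the word in the dictionary
    h: float = 0.0                            # firing threshold, as a neuron
    role: str = "U"                           # A/O/S/P/U/D
    no: bool = False                          # inversion flag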
Dictionaries
• The basic dictionary, in which words are stored with their roles (essence, operation or property; in other words, noun, verb or adjective)
• The supplemented (dynamic) dictionary, containing words not recognized against the basic dictionary
• The dictionary of special words, which are associated with separators and analyzed as separators.
Steps of analyzing a sentence in the context of learning
1) selection of words (using punctuation marks and spaces),
2) recognition of words by maximum resemblance to words in the dictionary; if the candidate word is not in the fundamental dictionary it is searched for in the supplemented dictionary, and if that also fails it is added to that dictionary (see the sketch after this list),
3) creation of the level-0 frames; the result of this stage is an object-sentence representing a list of frames,
4) replacement of special words in this object by separator marks,
5) processing of the object-sentence by a procedure of recognition and creation of the frames of levels 1 and 2.
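The word-recognition step (2), sketched with difflib's string similarity as a stand-in for the lecture's maximum-resemblance neural matching (the 0.8 cutoff is an assumption):

from difflib import get_close_matches

def recognize_word(word, base_dict, dynamic_dict):
    # Try the fundamental dictionary first, then the supplemented one.
    for dictionary in (base_dict, dynamic_dict):
        match = get_close_matches(word, dictionary, n=1, cutoff=0.8)
        if match:
            return match[0]
    dynamic_dict.append(word)  # unknown word: add it to the dynamic dictionary
    return word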
Steps of analyzing a sentence in the context of query processing
1) selection of words (using punctuation marks and spaces),
2) recognition of words by maximum resemblance to words in the dictionaries; for an unknown word the system asks "what is <new word>?", and the user's answer is processed in the learning context,
3) creation of the level-0 frames; the result of this stage is an object-sentence representing a list of frames,
4) recognition of level-1 or level-2 frames: the word collocations in the knowledge base maximally similar to the recognized phrase (a neural algorithm is used here, i.e. weighted summation of the signals from the words or frames entering the frame, compared against a threshold; see the sketch after this list),
5) search for the frames (level 0) coupled to documents that are associatively linked through Equal links with the recognized phrases,
6) search for frame-documents from the retrieved frames along connections such as Include, Act, Obj, Subject and Prop, from the top downwards,
7) output of the retrieved document names, or of the words included in the structure of the retrieved frames.
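A sketch of the neural matching in step 4: a weighted sum of the signals from the recognized words belonging to a candidate frame, compared with that frame's threshold H. The frame and weight representations are assumptions:

def frame_activation(frame_words, recognized_words, weights):
    # Weighted summation over the frame's constituent words that were recognized.
    return sum(weights[w] for w in frame_words if w in recognized_words)

def fire_frames(frames, recognized_words, weights):
    # A frame is recognized when its weighted input reaches its threshold H.
    return [f for f in frames
            if frame_activation(f["words"], recognized_words, weights) >= f["H"]]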
Steps of learning of the system
• Initial tutoring in recognizing the structure of sentences, by input of sentences of the form "word - @symbol". This step creates the dictionary of special words.
• Initial tutoring. During this step the knowledge base is filled with fundamental concepts from everyday practice or the problem domain, as sentences such as "money - means of payment", "morals - rules of behavior", "kinds of business: trade, production, service", etc.
• Base tutoring. In this step an explanatory dictionary of the problem domain is processed, where the concepts of the area are explained using "-" or corresponding words.
• Information filling. In this step the real documents are processed.