Fundamentals of C - Natural Language Processing Lab., Korea

Chapter 1
Introduction to NLP, CL, and Speech Recognition
Hae-Chang Rim

Speech and Language Processing

1.1 Knowledge in SLP
1.2 Ambiguity
1.3 Models and Algorithms
1.4 Language, Thought and Understanding
1.5 The State of the Art and the Near Future
1.6 Brief History

What should we study?

- What goes into getting computers to perform useful and interesting tasks involving human languages
- Consider HAL, the computer from 2001: A Space Odyssey

What should we study?

Such an artificial agent interacts with humans through language:
- understanding humans via speech recognition and natural language understanding
- communicating with humans via natural language generation and speech synthesis
- replying to humans via information retrieval, information extraction, and inference

Speech & Language Processing

Solving these language-related problems is the concern of three fields, referred to together as speech & language processing:
- Natural Language Processing
- Computational Linguistics
- Speech Recognition & Synthesis

What’s needed?

Categories of linguistic knowledge in SLP:
- phonetics & phonology: the production of speech sounds and the patterns/rules of sounds (phonemes)
- morphology: the shape of words and morphemes, the meaningful components of words, and the behavior of words in context
- syntax: how words are properly ordered and grouped together to make phrases, clauses, and sentences (the structural relationships between words)

What’s needed?

Categories of linguistic knowledge in SLP (cont.):
- semantics: lexical semantics (the meaning of the component words) and compositional semantics (how the components combine to form larger meanings)
- pragmatics: the appropriate use of language in its context of use (background knowledge, beliefs of speaker and hearer, relevant answers); how language is used to accomplish goals
- discourse: structured conversation; the study of linguistic units larger than a single utterance

Why Should You Care?

- All tasks in SLP can be viewed as resolving ambiguity at one of these six levels.

Ambiguity

Consider the spoken sentence “I made her duck.” It has five interpretations:
(1.1) I cooked waterfowl for her.
(1.2) I cooked waterfowl belonging to her.
(1.3) I created the (plaster?) duck she owns.
(1.4) I caused her to quickly lower her head or body.
(1.5) I waved my magic wand and turned her into undifferentiated waterfowl.

Ambiguity

Ambiguities of “I made her duck”:
- duck: verb or noun (morphologically ambiguous) → POS tagging
- her: dative pronoun or possessive pronoun (morphologically or syntactically ambiguous) → syntactic disambiguation
- make: create or cook (semantically ambiguous) → word sense disambiguation
- make: takes a single object (transitive) or two objects (ditransitive) (syntactically ambiguous) → syntactic disambiguation
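
The lexical ambiguities listed above multiply when combined. As a rough sketch, the candidate tag sequences for the sentence can be enumerated as below; a tagger then has to pick one of them. The tiny lexicon, the tag names, and the function name `candidate_taggings` are assumptions made for illustration and are not part of the lecture.

```python
from itertools import product

# Hypothetical toy lexicon: each word maps to the tags it could take.
# The tag names are illustrative assumptions, not a standard tagset.
LEXICON = {
    "I": ["PRON"],
    "made": ["VERB"],
    "her": ["PRON-DATIVE", "PRON-POSSESSIVE"],
    "duck": ["NOUN", "VERB"],
}


def candidate_taggings(words):
    """Return every combination of per-word tags as a list of (word, tag) pairs."""
    options = [LEXICON[w] for w in words]
    return [list(zip(words, tags)) for tags in product(*options)]


if __name__ == "__main__":
    for tagging in candidate_taggings(["I", "made", "her", "duck"]):
        print(" ".join(f"{w}/{t}" for w, t in tagging))
```

With two ambiguous words of two tags each, the sketch lists four candidate sequences; the count grows multiplicatively with sentence length, which is why disambiguation is needed rather than enumeration alone.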

Resolving Ambiguities

- Lexical disambiguation
  - part-of-speech tagging
  - word sense disambiguation
- Syntactic disambiguation
  - e.g., probabilistic parsing
- Speech act interpretation
  - deciding whether a given sentence is a statement or a question

Models and Algorithms

- Models: the formalisms used to capture the various kinds of linguistic facts (knowledge) we need
  - state machines, formal rule systems, logic, etc.
- Algorithms: used to search or manipulate input representations to create the structures that are needed
  - depth-first search, best-first search, etc.
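
To show a model and an algorithm working together, here is a minimal sketch of a nondeterministic finite-state automaton recognized by depth-first search. The toy language (a "b", then two or more "a"s, then "!"), the transition table, and the function name `accepts` are assumptions chosen only for illustration.

```python
# Toy nondeterministic FSA: (state, symbol) -> set of possible next states.
# It accepts b a a ... a ! with at least two a's; state 2 has two choices on "a".
TRANSITIONS = {
    (0, "b"): {1},
    (1, "a"): {2},
    (2, "a"): {2, 3},
    (3, "!"): {4},
}
ACCEPT_STATES = {4}


def accepts(text, start_state=0):
    """Depth-first search over (state, position) pairs of the automaton."""
    agenda = [(start_state, 0)]  # stack of unexplored search states
    while agenda:
        state, position = agenda.pop()  # LIFO order gives depth-first search
        if position == len(text):
            if state in ACCEPT_STATES:
                return True
            continue
        for next_state in TRANSITIONS.get((state, text[position]), ()):
            agenda.append((next_state, position + 1))
    return False
```

Popping from the end of the agenda explores the most recently added alternative first; taking items from the front instead would turn the same code into a breadth-first search.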

Models in SLP

- State machines: formal models that consist of states, transitions among states, and an input representation
  - deterministic/non-deterministic FSAs, FSTs, weighted automata, (hidden) Markov models
- Formal rule systems
  - regular grammars, regular relations, context-free grammars, feature-augmented grammars, and their probabilistic variants
- Algorithms associated with both state machines and formal rule systems
  - search algorithms: in most problems the space to be searched is too large to explore exhaustively, so strategies such as depth-first search, best-first search, and A* are used
  - dynamic programming: redundant computations are avoided by storing and reusing intermediate results
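
As an illustration of how dynamic programming avoids redundant computation, the sketch below computes a minimum edit distance between two strings, caching each subproblem so it is solved only once. The choice of edit distance as the example and the unit cost for every operation are assumptions; the slides do not name a specific algorithm.

```python
from functools import lru_cache


def edit_distance(source, target):
    """Minimum number of insertions, deletions, and substitutions (unit cost each)."""

    @lru_cache(maxsize=None)
    def dist(i, j):
        # dist(i, j): cost of turning source[:i] into target[:j].
        if i == 0:
            return j  # insert the first j target characters
        if j == 0:
            return i  # delete the first i source characters
        substitution = 0 if source[i - 1] == target[j - 1] else 1
        return min(
            dist(i - 1, j) + 1,  # deletion
            dist(i, j - 1) + 1,  # insertion
            dist(i - 1, j - 1) + substitution,  # substitution or match
        )

    return dist(len(source), len(target))
```

Without the cache, the same (i, j) pairs would be recomputed exponentially often; with it, each pair is computed once. The recursive form is meant for short strings such as single words.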

Models in SLP

- Logical formalisms
  - first-order logic (predicate calculus), feature structures, semantic networks, conceptual dependency
- Probability theory: used to solve the many kinds of ambiguity problems (choose the most probable one)
  - each of the other models (state machines, formal rule systems, and logic) can be augmented with probabilities
- Machine learning tools: focus on ways to automatically learn the various representations (automata, rule systems, search heuristics, classifiers)
  - trained on large corpora
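
The idea of choosing the most probable alternative, with probabilities learned from data, can be sketched as follows. The six-item tagged corpus, the tag names, and the function names are invented for illustration; real systems are trained on large corpora.

```python
from collections import Counter, defaultdict

# Tiny hand-made tagged corpus (an assumption for illustration only).
TAGGED_WORDS = [
    ("duck", "NOUN"), ("duck", "NOUN"), ("duck", "VERB"),
    ("her", "POSS"), ("her", "POSS"), ("her", "PRON"),
]


def estimate_tag_probabilities(tagged_words):
    """Estimate P(tag | word) by relative frequency."""
    counts = defaultdict(Counter)
    for word, tag in tagged_words:
        counts[word][tag] += 1
    return {
        word: {tag: n / sum(tag_counts.values()) for tag, n in tag_counts.items()}
        for word, tag_counts in counts.items()
    }


def most_probable_tag(word, probabilities):
    """Choose the tag with the highest estimated probability for word."""
    distribution = probabilities[word]
    return max(distribution, key=distribution.get)
```

This picks a tag for each word in isolation; the probabilistic models named in the slides (for example hidden Markov models) also take the surrounding context into account.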

Language, thought and understanding

- SLP has an AI-ish flavor
  - the effective use of language is intertwined with our general cognitive abilities
- Machines and thinking
  - Turing test (1950): an empirical test in which a computer’s use of language would form the basis for determining whether it could think
  - ELIZA program (1966): an early natural language processing system capable of carrying on a limited form of conversation with a user; it uses simple pattern matching to mimic a psychotherapist
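
The kind of pattern matching ELIZA relies on can be sketched in a few lines. The rules below are hypothetical, written for illustration in the spirit of the program; they are not taken from the original ELIZA script, and the names `RULES` and `respond` are assumptions.

```python
import re

# Hypothetical pattern/response rules; invented for illustration and not
# taken from the original ELIZA program.
RULES = [
    (re.compile(r"\bI am (.+)", re.IGNORECASE), "Why do you say you are {0}?"),
    (re.compile(r"\bI feel (.+)", re.IGNORECASE), "How long have you felt {0}?"),
    (re.compile(r"\bmy (mother|father)\b", re.IGNORECASE), "Tell me more about your {0}."),
]
DEFAULT_RESPONSE = "Please go on."


def respond(utterance):
    """Return the response of the first rule whose pattern matches."""
    cleaned = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(cleaned)
        if match:
            return template.format(*match.groups())
    return DEFAULT_RESPONSE
```

For example, `respond("I am sad.")` returns "Why do you say you are sad?". The program has no model of meaning; it only echoes fragments of the input back inside canned templates.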

Turing Test

[Figure: Turing test setup. A judge exchanges questions and answers, through an interface controlled by the judge, with a human and with a machine. Diagram labels: JUDGE, HUMAN, ‘INTELLIGENT SUBJECT’, MACHINE, QUESTION, ANSWER.]

The goal of the machine is to fool the judge into believing that it is the person. If the machine succeeds at this, then we will conclude that the machine can think.

The state of the art

- Recent commercialization of robust speech recognition systems and the rise of the Web
  - SLP is in the spotlight, with a plethora of exciting possible applications
- Current applications
  - METEO project: broadcast weather reports in English and French (Chandioux, 1976)
  - Babel Fish: a translation system from Systran operating on the AltaVista search engine
  - VOYAGER system: a spoken language interface that can answer a number of different types of questions concerning navigation within a city, as well as provide certain information about hotels, restaurants, and libraries (Zue et al., 1991)

The state of the art

- Current applications (cont.)
  - IEA system: scoring written essays by computer (Landauer et al., 1997)
  - Project LISTEN’s Reading Tutor: helps children learn to read; it uses speech recognition to listen to them read and responds with spoken and graphical feedback (Mostow and Aist, 1999)
  - VITRA system (visual translator): watches a short video clip of a soccer match and provides a natural language report, integrating vision processing and natural language processing (Wahlster, 1989)
  - intelligent communication aids for people with disabilities (Newell et al., 1998; McCoy et al., 1998)

Some brief history

- SLP is interdisciplinary and has different historical threads:
  - computational linguistics in linguistics
  - natural language processing in computer science
  - speech recognition in electrical engineering
  - computational psycholinguistics in psychology

Some brief history

- Foundational insights (1940s and 1950s): intensive work on two paradigms, the automaton and probabilistic or information-theoretic models
  - automaton (Turing, 1936), McCulloch-Pitts neuron (McCulloch and Pitts, 1943)
  - application of probabilistic models of discrete Markov processes to automata for language (Shannon, 1948)
  - finite-state grammar (Chomsky, 1956)
  - noisy channel, decoding, entropy (Shannon)
  - first machine speech recognizer: a statistical system that recognized any of the 10 digits from a single speaker (Bell Labs; Davis et al., 1952)

Some brief history

- 1957-1970: two paradigms, symbolic and stochastic
  - Symbolic paradigm
    - took off from two lines of linguistic research: the work of Chomsky, and work on formal language theory and generative syntax
    - much work on parsing: top-down, bottom-up, and dynamic programming, e.g. Harris’s parser (1962)
    - AI-related work (reasoning and logic, knowledge representation, general problem solver): John McCarthy, Marvin Minsky, Claude Shannon, Newell

Some brief history

- 1957-1970: two paradigms, symbolic and stochastic (cont.)
  - Stochastic paradigm
    - took hold mainly in statistics and electrical engineering
    - Bayesian methods were applied to optical character recognition and authorship attribution (Bledsoe and Browning, 1959; Mosteller and Wallace, 1964)
    - first on-line corpora and on-line dictionaries
      - Brown corpus: a 1 million word collection of samples (Kucera and Francis, 1967; 1979; 1982)
      - DOC: on-line Chinese dialect dictionary

Some brief history

- 1970-1983: four paradigms (stochastic, logic-based, natural language understanding, discourse modeling)
  - Stochastic paradigm
    - played a huge role in the development of speech recognition algorithms, particularly the Hidden Markov Model, the noisy channel, and decoding
    - speech recognition research groups
      - IBM’s T. J. Watson Research Center (Jelinek, Bahl, Mercer)
      - CMU group (Baker)
      - AT&T Bell Labs (Rabiner and Juang)

Some brief history

- 1970-1983: four paradigms (cont.)
  - Logic-based paradigm
    - Q-systems and metamorphosis grammars (Colmerauer, 1970, 1975)
    - Definite Clause Grammars (Pereira and Warren, 1980)
    - Functional grammar (Kay, 1979)
    - LFG and feature structure unification (Bresnan and Kaplan, 1982)

Some brief history

- 1970-1983: four paradigms (cont.)
  - Natural language understanding paradigm
    - SHRDLU system, which simulated a robot embedded in a world of toy blocks by accepting natural language text commands (Winograd, 1972)
    - research on conceptual knowledge representation, such as scripts, plans, goals, and human memory organization (Schank and his colleagues, 1972, 1975, 1979)
    - network-based semantics (Quillian, 1968; Rumelhart, 1975; Fillmore, 1968; Simmons, 1973)
    - LUNAR QA system (Woods, 1973)

Some brief history

- 1970-1983: four paradigms (cont.)
  - Discourse modeling paradigm
    - focused on four key areas in discourse
      - study of substructure in discourse (Grosz, 1977)
      - study of discourse focus (Sidner, 1983)
      - study of automatic reference resolution (Hobbs, 1978)
      - study of the BDI (belief-desire-intention) framework and speech acts (Perrault and Allen, 1980; Cohen and Perrault, 1979)

Some brief history

- 1983-1993: Empiricism and Finite State Models Redux
  - Finite-state models
    - finite-state phonology and morphology (Kaplan and Kay, 1981)
    - finite-state models of syntax (Church, 1980)
  - Return of empiricism
    - the rise of probabilistic models throughout speech and language processing
      - probabilistic methods and data-driven approaches spread into POS tagging, parsing, and attachment disambiguation
    - connectionist approaches

Some brief history

- 1994-1999: the field comes together
  - probabilistic and data-driven models had become quite standard throughout natural language processing
  - increases in the speed and memory of computers had allowed commercial exploitation of a number of SLP subareas: speech recognition, and spelling and grammar checking
  - SLP algorithms began to be applied to Augmentative and Alternative Communication (AAC)
  - the rise of the Web emphasized the need for language-based information retrieval and information extraction

Summary

- A good way to understand the concerns of speech and language processing research is to consider what it would take to create an intelligent agent like HAL from 2001: A Space Odyssey.
- Speech and language technology relies on formal models, or representations, of knowledge of language at the six levels of phonology and phonetics, morphology, syntax, semantics, pragmatics, and discourse.

Summary

- The foundations of speech and language technology lie in computer science, linguistics, mathematics, electrical engineering, and psychology.
- The critical connection between language and thought has placed speech and language processing technology at the center of debate over intelligent machines.
- Revolutionary applications of speech and language processing are currently in use around the world.
- Recent advances in speech recognition and the creation of the World-Wide Web will lead to many more applications.

Bibliographical and historical notes

- NLP-related conferences
  - ACL/EACL/NAACL
  - COLING
  - IJCNLP
  - ANLP (Applied Natural Language Processing)
  - EMNLP (Empirical Methods in Natural Language Processing)
- IR-related conferences
  - SIGIR
  - AIRS
  - TREC
- NLP-related journals
  - Computational Linguistics
  - Natural Language Engineering

Bibliographical and historical notes

- Speech-related conferences
  - ICSLP (International Conference on Spoken Language Processing)
  - EUROSPEECH
  - IEEE ICASSP (IEEE International Conference on Acoustics, Speech, and Signal Processing)
- Speech-related journals
  - Speech Communication
  - Computer Speech and Language
  - IEEE Transactions on Pattern Analysis and Machine Intelligence

Bibliographical and historical notes

- AI-related conferences
  - AAAI (American Association for Artificial Intelligence)
  - IJCAI (International Joint Conference on Artificial Intelligence)
- AI-related journals
  - Artificial Intelligence
  - Computational Intelligence
  - IEEE Transactions on Intelligent Systems
  - Journal of Artificial Intelligence Research

Bibliographical and historical notes

- Cognitive Science-related workshops
  - DARPA Speech and Natural Language Processing Workshop
  - ARPA Workshop on Human Language Technology
- Cognitive Science-related journal
  - Cognitive Science

Bibliographical and historical notes

- Textbooks
  - Foundations of Statistical Natural Language Processing (Manning and Schütze, 1999)
  - Statistical Language Learning (Charniak, 1993)
  - Natural Language Understanding (Allen, 1995)
  - Natural Language Processing in Lisp/Prolog (Gazdar and Mellish, 1989)
  - Readings in Natural Language Processing (Grosz et al., 1986)