Transcript: Lecture Slides (PPT)

CSC 594 Topics in AI: Applied Natural Language Processing
Fall 2009/2010
1. Introduction
What is NLP?

Natural Language Processing (NLP) is a field in
Artificial Intelligence (AI) devoted to creating
computers that use natural language as input
and/or output.
NLP involves other disciplines…
• Linguistics
  – NLP is also called “Computational Linguistics”
• Psychology
• Mathematics and Statistics
• Information Theory
• Computer Science
Machines that Can Speak
• HAL 9000 in “2001: A Space Odyssey”
Machines that Can Speak (cont.)
• C3PO in Star Wars
• KITT in Knight Rider
But Still Sci-Fi…
“However, as 2001 approached it became clear that
2001's predictions in computer technology were far
fetched. Natural language, lip reading, planning
and plain common sense in computers were still the
stuff of science fiction.”
HAL 9000 - Wikipedia
NLP is AI-complete
• “The most difficult problems in AI manifest
themselves in human language phenomena.”
• Use of language is the touchstone of intelligent
behavior.
Test for Intelligence – Turing Test
• Alan Turing (1950) proposed a test of a machine's capability to perform human-like conversation.
• A human judge engages in a natural language conversation with two other parties, one a human and the other a machine; if the judge cannot reliably tell which is which, then the machine is said to pass the test.
Early Conversational Programs
• ELIZA (by Joseph Weizenbaum), 1966
  – A (simulated) psychotherapist
  – No real understanding; simple pattern-matching to respond to user input → canned responses

    (my ?x depresses me)  (why does your ?x depress you)  5
    (life ?x)             (why do you say it ?x)          3
    (I could ?x)          (you could ?x)                  2
    (because ?x)          (that is a good reason)         3
    (?x)                  (tell me more)                  0
>> Hello.
How do you do. Please state your problem.
>> Men are all alike.
In what way ?
>> They're alwas bugging us about something or other.
I'm not sure I understand you fully.
>> They're always bugging us about something or other.
Can you think of a specific example ?
>> Well, my boyfriend made me come here.
Your boyfriend made you come here ?
>> He says I'm depressed much of the time.
I am sorry to hear that you are depressed.
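Below is a minimal sketch of this style of pattern matching in Python. The patterns are illustrative stand-ins modeled on the rules above, not Weizenbaum's actual script:

import re

# Ordered (pattern, response) pairs; "?x" becomes a regex capture group.
# Illustrative rules only; ELIZA's real script was larger and ranked.
RULES = [
    (r"my (.+) depresses me", "Why does your {0} depress you?"),
    (r"i could (.+)",         "You could {0}."),
    (r"because (.+)",         "That is a good reason."),
    (r"(.+)",                 "Tell me more."),   # catch-all
]

def respond(utterance):
    text = utterance.lower().strip(" .!?")
    for pattern, template in RULES:
        m = re.fullmatch(pattern, text)
        if m:
            return template.format(*m.groups())

print(respond("My job depresses me."))  # Why does your job depress you?
print(respond("Men are all alike."))    # Tell me more.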
Recent Conversational Programs:
Loebner Prize & Chatterbots/Chatbots (1)
• In 1990, Hugh Loebner started an annual Turing
Test competition.
• Conversational Programs are nowadays called
Chatterbots (or Chatbots).
• $100,000 to the first bot that judges cannot
distinguish from a real human in a Turing test that
includes text, visual, and auditory input.
• The Prize dissolves once the $100,000 prize is
won.
Loebner Prize & Chatterbots/Chatbots (2)
• Nobody has won the prize yet.
Real-World Applications of NLP
• NLP can be stand-alone applications or components embedded in other systems.
• Major NLP applications include:
  – Machine translation
  – Question answering
  – Summarization
  – Conversational agents
Commercial World
• Lots of exciting stuff going on… Example: Powerset
Source: Jurafsky & Martin “Speech and Language Processing”
1. Machine Translation (MT)
• One of the very earliest pursuits in computer
science (after WWII).
• Basic approaches:
  – Inter-lingual (rule-based)
  – Direct translation (corpus-based) → more popular these days
• Example: Google Translate
  – MT engine (based on the SYSTRAN system developed in the EC).
2. Question Answering
• Finds an answer (not a document) to a question
typed in as a natural language sentence (not
keywords).
• Most systems can only answer simple, Trivial Pursuit-style questions.
• Example: Ask.com
• Some search engines perform limited, phrase-based Q&A, e.g. Google
3. Text Summarization
• Create a summary of a text or texts.
• Many difficult problems, including:
– Paraphrases
– Anaphora (e.g.“it”, “they”)
4. Analyzing Web Documents
• Recently there have been many NLP applications which analyze (not just retrieve) weblogs, discussion forums, message boards, user groups, and other forms of user-generated media:
  – Product marketing information
  – Political opinion tracking
  – Social network analysis
  – Buzz analysis (what's hot, what topics are people talking about right now)
Source: Jurafsky & Martin “Speech and Language Processing”
5. NLP in IR
• Query expansion
  – Add synonyms and related words to the query terms to improve search results (a sketch follows below).
  – Example: Google AdWords tool
• NOTE: Stemming is NOT an NLP technique!!!
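Here is a minimal sketch of synonym-based query expansion with WordNet through NLTK (assumes nltk is installed along with its wordnet data; this illustrates the idea, not Google's actual mechanism):

from nltk.corpus import wordnet as wn  # may require nltk.download('wordnet')

def expand_query(terms):
    # Add WordNet synonyms of each term to the original query terms
    expanded = set(terms)
    for term in terms:
        for synset in wn.synsets(term):
            for lemma in synset.lemmas():
                expanded.add(lemma.name().replace("_", " ").lower())
    return sorted(expanded)

print(expand_query(["car", "cheap"]))
# includes e.g. 'automobile', 'auto', 'inexpensive', ...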
[Figure omitted; source: Marti Hearst, i256, at UC Berkeley]
NLP Tasks
• These NLP applications require several NLP analyses (a sketch of the first few follows this list):
– Word tokenization
– Sentence boundary detection
– Part-of-speech (POS) tagging
• to identify the part-of-speech (e.g. noun, verb) of each word
– Named Entity (NE) recognition
• to identify proper nouns (e.g. names of person, location,
organization; domain terminologies)
– Parsing
• to identify the syntactic structure of a sentence
– Semantic analysis
• to derive the meaning of a sentence
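A minimal sketch of the first few of these analyses with NLTK (assumes nltk is installed along with its tokenizer, tagger, and NE-chunker data; tag sets and NE labels vary by model):

import nltk

text = "John ate the cake in Chicago."

# Sentence boundary detection, then word tokenization
sentence = nltk.sent_tokenize(text)[0]
tokens = nltk.word_tokenize(sentence)

# Part-of-speech tagging
tagged = nltk.pos_tag(tokens)  # e.g. [('John', 'NNP'), ('ate', 'VBD'), ...]

# Named Entity recognition over the tagged tokens
print(nltk.ne_chunk(tagged))   # marks 'John' (PERSON) and 'Chicago' (GPE)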
Different Levels of Language Analysis
• Phonology
– Speech audio signal to phonemes
• Morphology (see the lemmatization sketch after this list)
– Inflection (e.g. “I”, “my”, “me”; “eat”, “eats”, “ate”, “eaten”)
– Derivation (e.g. “teach”, “teacher”; “nominate”, “nominee”)
• Syntax
– Part-of-speech (noun, verb, adjective, preposition, etc.)
– Phrase structure (e.g. noun phrase, verb phrase)
• Semantics
– Meaning of a word (e.g. “book” as a bound volume or an
accounting ledger) or a sentence
• Discourse
– Meaning and inter-relation between sentences
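For the morphology level, a small lemmatization sketch using NLTK's WordNet lemmatizer (assumes the wordnet data is installed); it maps inflected forms back to a base form:

from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()
# pos='v' for verbs, pos='n' for nouns
print(lemmatizer.lemmatize("ate", pos="v"))    # eat
print(lemmatizer.lemmatize("eats", pos="v"))   # eat
print(lemmatizer.lemmatize("books", pos="n"))  # book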
Why is NLP so hard..?
• Understanding natural languages is hard …
because of inherent ambiguity
• Engineering NLP systems is also hard …
because of:
– Huge amount of data resources needed (e.g.
grammar, dictionary, documents to extract
statistics from)
– Computational complexity (intractable) of
analyzing a sentence
Ambiguity (1)
“Get the cat with the gloves.”
Source: Marti Hearst, i256, at UC Berkeley
Ambiguity (2)
Find at least 5 meanings of this sentence:
“I made her duck”
1. I cooked waterfowl for her benefit (to eat)
2. I cooked waterfowl belonging to her
3. I created the (plaster?) duck she owns
4. I caused her to quickly lower her head or body
5. I waved my magic wand and turned her into undifferentiated waterfowl
Source: Jurafsky & Martin “Speech and Language Processing”
Ambiguity (3)
Some ambiguous headlines:
• Juvenile Court to Try Shooting Defendant
• Teacher Strikes Idle Kids
• Kids Make Nutritious Snacks
• Bush Wins on Budget, but More Lies Ahead
• Hospitals are Sued by 7 Foot Doctors
Source: Marti Hearst, i256, at UC Berkeley
Ambiguity is Pervasive
• Phonetics: all of the following sound like “I made her duck”:
  – I mate or duck
  – I'm eight or duck
  – Eye maid; her duck
  – Aye mate, her duck
  – I maid her duck
  – I'm aid her duck
  – I mate her duck
  – I'm ate her duck
  – I'm ate or duck
  – I mate or duck
Source: Jurafsky & Martin “Speech and Language Processing”
• Lexical category (part-of-speech)
– “duck” as a noun or a verb
• Lexical Semantics (word meaning)
– “duck” as an animal or a plaster duck statue
• Compound nouns
– e.g. “dog food”, “Intelligent design scores …”
• Syntactic ambiguity
  – “I saw a man on the hill with a telescope”
• [But semantics can sometimes help disambiguate]
  – “I saw a man on the hill with a hat”
Dealing with Ambiguity
Four possible approaches:
1. Formal approaches: tightly coupled interaction among processing levels; knowledge from other levels can help decide among choices at ambiguous levels.
2. Pipeline processing that ignores ambiguity as it occurs and hopes that other levels can eliminate incorrect structures.
3. Probabilistic approaches based on making the most likely choices.
4. Don't do anything; maybe it won't matter.
Source: Jurafsky & Martin “Speech and Language Processing”
The Bottom Line
• Complete NL Understanding (thus general
intelligence) is impossible.
• But we can make incremental progress.
• We have also had successes in limited domains.
• [But NLP is costly: a lot of work and resources are needed, and the return is sometimes not worth it.]
Statistical Approaches to NLP
• Get large text collections (corpora)
• Compute statistics over the words in those collections
• Surprising result:
  – Getting more data is better than fine-tuning algorithms! (Banko & Brill '01)
Source: Marti Hearst, i256, at UC Berkeley
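A minimal sketch of computing word statistics over a corpus in Python (the filename corpus.txt is a placeholder for any large plain-text collection):

import re
from collections import Counter

# Read a corpus and count word frequencies
with open("corpus.txt", encoding="utf-8") as f:      # placeholder filename
    words = re.findall(r"[a-z']+", f.read().lower())

counts = Counter(words)
total = len(words)

# The ten most frequent words with their relative frequencies
for word, n in counts.most_common(10):
    print(word, n, n / total)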
Sentence Analysis
“John ate the cake”

Syntactic structure:
  [S [NP “John”] [VP [V “ate”] [NP “the cake”]]]

Semantic structure:
  (ACTION ingest
    (ACTOR John-1)
    (PATIENT food))
Syntactic Parsing
• The process of deriving the phrase structure of a sentence is called “parsing”.
• The structure (often represented by a context-free parse tree) is based on the grammar.

Grammar:
  R0: S → NP VP
  R1: NP → Det N
  R2: VP → VG NP
  R3: VG → V
  R4: NP → “John”
  R5: V → “ate”
  R6: Det → “the”
  R7: N → “cake”

Parse tree for “John ate the cake”:
  [S [NP “John”] [VP [VG [V “ate”]] [NP [Det “the”] [N “cake”]]]]
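The slide's grammar can be run through NLTK's chart parser; a minimal sketch (NLTK stands in here for the course's own parser):

import nltk

# Rules R0-R7 from the slide, in NLTK's CFG notation
grammar = nltk.CFG.fromstring("""
    S -> NP VP
    NP -> Det N
    VP -> VG NP
    VG -> V
    NP -> 'John'
    V -> 'ate'
    Det -> 'the'
    N -> 'cake'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse(["John", "ate", "the", "cake"]):
    print(tree)
# (S (NP John) (VP (VG (V ate)) (NP (Det the) (N cake))))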
Parsing Algorithms
• Top-down parsing: (top-down) derivation
• Bottom-up parsing
• Chart parsing
• Earley's algorithm: most efficient, O(n³)
• Left-corner parsing: an optimization of Earley's
• and lots more…
(Bottom-up) Chart Parsing
“John ate the cake”   (positions: 0 John 1 ate 2 the 3 cake 4)

Grammar: S → NP VP, NP → Det N, VP → VG NP, VG → V,
NP → “John”, V → “ate”, Det → “the”, N → “cake”

Trace (edges written as ⟨start, end, dotted rule⟩):
  (1) shift 1 “John”:  ⟨0,1, NP → “John” •⟩
  (2) shift 2:         ⟨0,1, S → NP • VP⟩
  (3) shift 1 “ate”:   ⟨1,2, V → “ate” •⟩
  (4) shift 2:         ⟨1,2, VG → V •⟩
  (5) shift 2:         ⟨1,2, VP → VG • NP⟩
  (6) shift 1 “the”:   ⟨2,3, Det → “the” •⟩
  (7) shift 2:         ⟨2,3, NP → Det • N⟩
  (8) shift 1 “cake”:  ⟨3,4, N → “cake” •⟩
  (9) reduce:          ⟨2,4, NP → Det N •⟩
 (10) reduce:          ⟨1,4, VP → VG NP •⟩
 (11) reduce:          ⟨0,4, S → NP VP •⟩
Earley’s Algorithm
“John ate the cake”   (positions: 0 John 1 ate 2 the 3 cake 4; same grammar as above, with start rule S′ → S)

Trace (states written as ⟨start, end, dotted rule⟩):
  init:                ⟨0,0, S′ → • S⟩
  (1) predictor:       ⟨0,0, S → • NP VP⟩
  (2) scanner “John”:  ⟨0,1, S → NP • VP⟩
  (3) predictor:       ⟨1,1, VP → • VG NP⟩
  (4) predictor:       ⟨1,1, VG → • V⟩
  (5) scanner “ate”:   ⟨1,2, VG → V •⟩
  (6) completor:       ⟨1,2, VP → VG • NP⟩
  (7) predictor:       ⟨2,2, NP → • Det N⟩
  (8) scanner “the”:   ⟨2,3, NP → Det • N⟩
  (9) scanner “cake”:  ⟨2,4, NP → Det N •⟩
 (10) completor:       ⟨1,4, VP → VG NP •⟩
 (11) completor:       ⟨0,4, S → NP VP •⟩
 (12) completor:       ⟨0,4, S′ → S •⟩
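Below is a compact from-scratch Earley recognizer in Python that reproduces the predictor/scanner/completor behavior traced above (a teaching sketch, not the course's parser; it recognizes but does not build trees):

# Grammar: nonterminal -> list of right-hand sides; terminals are quoted.
GRAMMAR = {
    "S'":  [("S",)],
    "S":   [("NP", "VP")],
    "NP":  [("Det", "N"), ('"John"',)],
    "VP":  [("VG", "NP")],
    "VG":  [("V",)],
    "V":   [('"ate"',)],
    "Det": [('"the"',)],
    "N":   [('"cake"',)],
}

def is_terminal(sym):
    return sym.startswith('"')

def recognize(words):
    n = len(words)
    # chart[i]: states (lhs, rhs, dot, start) whose edge ends at position i
    chart = [set() for _ in range(n + 1)]
    chart[0].add(("S'", ("S",), 0, 0))
    for i in range(n + 1):
        agenda = list(chart[i])
        while agenda:
            lhs, rhs, dot, start = agenda.pop()
            if dot < len(rhs) and is_terminal(rhs[dot]):
                # scanner: the dot is before a terminal; try to match input
                if i < n and rhs[dot] == '"%s"' % words[i]:
                    chart[i + 1].add((lhs, rhs, dot + 1, start))
            elif dot < len(rhs):
                # predictor: expand the nonterminal after the dot
                for prod in GRAMMAR[rhs[dot]]:
                    state = (rhs[dot], prod, 0, i)
                    if state not in chart[i]:
                        chart[i].add(state)
                        agenda.append(state)
            else:
                # completor: advance states that were waiting for this lhs
                for l2, r2, d2, s2 in list(chart[start]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        state = (l2, r2, d2 + 1, s2)
                        if state not in chart[i]:
                            chart[i].add(state)
                            agenda.append(state)
    return ("S'", ("S",), 1, 0) in chart[n]

print(recognize(["John", "ate", "the", "cake"]))  # True
print(recognize(["John", "ate", "cake", "the"]))  # False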
Demo using my CF parser
Probabilistic Parsing
• For ambiguous sentences, we'd like to know which parse tree is more likely than the others.
• So we must assign a probability to each parse tree… but how?
• The probability of a parse tree t is the product of the probabilities of the rules used in it:

  p(t) = ∏_{r ∈ t} p(r)

  where r ranges over the rules used in t, and p(r) is estimated from an (annotated) corpus.
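A minimal sketch of this product in Python; the rule probabilities here are made-up numbers for illustration (real values would be relative frequencies counted from a treebank):

# Hypothetical rule probabilities (would be estimated from an annotated corpus)
RULE_PROB = {
    'S -> NP VP':   1.0,
    'NP -> Det N':  0.6,
    'NP -> "John"': 0.4,
    'VP -> VG NP':  1.0,
    'VG -> V':      1.0,
    'V -> "ate"':   1.0,
    'Det -> "the"': 1.0,
    'N -> "cake"':  1.0,
}

def tree_probability(rules_used):
    # p(t) = product of p(r) over the rules r used in the tree t
    p = 1.0
    for r in rules_used:
        p *= RULE_PROB[r]
    return p

t = ['S -> NP VP', 'NP -> "John"', 'VP -> VG NP', 'VG -> V',
     'V -> "ate"', 'NP -> Det N', 'Det -> "the"', 'N -> "cake"']
print(tree_probability(t))  # 0.4 * 0.6 = 0.24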
Partial Parsing
• Parsing fails when the coverage of the grammar is not complete, but it's almost impossible to write out all legal syntax (without also accepting ungrammatical sentences).
• We'd like to at least get pieces even when full parsing fails.
• Why not abandon full parsing and aim for partial parsing from the start? A common form is chunking, sketched below.
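A minimal chunking sketch with NLTK's regular-expression chunker, using a deliberately simple NP pattern:

import nltk

# An NP chunk: optional determiner, any adjectives, then one or more nouns
chunker = nltk.RegexpParser("NP: {<DT>?<JJ>*<NN.*>+}")

tagged = [("John", "NNP"), ("ate", "VBD"), ("the", "DT"), ("cake", "NN")]
print(chunker.parse(tagged))
# (S (NP John/NNP) ate/VBD (NP the/DT cake/NN))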
Semantic Analysis (1)
• Derive the meaning of a sentence.
• Often applied to the result of syntactic analysis:

  “John ate the cake.”
      NP    V    NP

  ((action INGEST)   ; syntactic verb
   (actor JOHN-01)   ; syntactic subj
   (object FOOD))    ; syntactic obj

• To do semantic analysis, we need a (semantic) dictionary (e.g. WordNet, http://www.cogsci.princeton.edu/~wn/).
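A minimal sketch of consulting WordNet through NLTK (assumes the wordnet data is installed); each synset is one sense of the word:

from nltk.corpus import wordnet as wn

for synset in wn.synsets("duck")[:4]:
    print(synset.name(), "-", synset.definition())
# Prints noun senses (the bird, the meat, ...) and verb senses
# (to lower the head or body quickly, ...); output abridged.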
Semantic Analysis (2)
• Semantics is a double-edged sword…
  – Can resolve syntactic ambiguity:
    “I saw a man on the hill with a telescope”
    “I saw a man on the hill with a hat”
  – But introduces semantic ambiguity:
    “She walked towards the bank”
• But in human sentence processing, we seem to resolve both types of ambiguities simultaneously (and in linear time as well)…
Demo using my Unification parser