
An Introduction to Natural Language Syntax
Rajat Mohanty
[email protected]
CS-460/IT-632
Department of Computer Science and Engineering
Indian Institute of Technology, Bombay
13 Jan 2006
Outline
Grammatical Analysis
Finite State Grammar
Phrase Structure Grammar
Transformational Grammar
Natural Language Phenomena
A Ubiquitous Task for NLP
Sequence labeling can be carried out at different levels of written text:
Words
Phrases
Sentences
Paragraphs
Names for Labeling Tasks
Words: Part-of-speech (POS) tagging
Phrases: Chunking
Sentences: Parsing
Paragraphs: Co-reference annotation
Example (Words: POS Tagging)
<s> The dispute shows clearly the global power
of Japan's financial titans.</s>
<s>[ The/DT dispute/NN ] shows/VBZ
clearly/RB [ the/DT global/JJ power/NN ]
of/IN [ Japan/NNP 's/POS financial/JJ
titans/NNS ]
./.
</s>
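The slash notation above pairs every word with its tag. A minimal Python sketch of reading it back, assuming whitespace-separated `word/TAG` tokens and `[` / `]` used only as chunk brackets (the name `parse_tagged` is hypothetical, not part of any tagger's API):

```python
def parse_tagged(text):
    """Split 'word/TAG' tokens into (word, tag) pairs, skipping chunk brackets."""
    pairs = []
    for token in text.split():
        if token in ("[", "]"):
            continue  # brackets delimit chunks; they carry no tag
        # rpartition splits on the last '/', so './.' yields ('.', '.')
        word, _, tag = token.rpartition("/")
        pairs.append((word, tag))
    return pairs


tagged = ("[ The/DT dispute/NN ] shows/VBZ clearly/RB [ the/DT global/JJ power/NN ] "
          "of/IN [ Japan/NNP 's/POS financial/JJ titans/NNS ] ./.")
for word, tag in parse_tagged(tagged):
    print(f"{word}\t{tag}")
```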
Example (Phrases: Chunking)
The dispute
shows clearly
the global power
of
Japan's financial titans
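The bracketed chunks in the tagged line from the previous example can be recovered mechanically. A minimal sketch, assuming the same `[ ... ]` convention; it only recovers the chunks that are actually bracketed, so *shows* and *clearly* come out as separate single-word units rather than the grouping shown above. The function name `chunks` is hypothetical.

```python
def chunks(text):
    """Group the words inside '[ ... ]' into one chunk; other words stand alone."""
    result, current = [], None
    for token in text.split():
        if token == "[":
            current = []                       # open a new chunk
        elif token == "]":
            result.append(" ".join(current))   # close the chunk
            current = None
        else:
            word = token.rpartition("/")[0]    # drop the POS tag
            if current is None:
                result.append(word)
            else:
                current.append(word)
    return result


tagged = ("[ The/DT dispute/NN ] shows/VBZ clearly/RB [ the/DT global/JJ power/NN ] "
          "of/IN [ Japan/NNP 's/POS financial/JJ titans/NNS ] ./.")
print(chunks(tagged))
```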
Example (Sentences: Parsing)
( (S (NP-SBJ The dispute)
(VP shows
(ADVP-MNR clearly)
(NP (NP the global power)
(PP of
(NP (NP Japan 's)
financial titans))))
.))
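Bracketed trees like the one above can be read with a few lines of code. The sketch below is a hand-rolled reader (the names `read_tree` and `leaves` are hypothetical; libraries such as NLTK ship their own tree readers), assuming well-formed brackets and treating the first token after `(` as the node label unless it is another bracket, so the unlabeled outermost bracket gets the empty label `""`:

```python
def read_tree(text):
    """Parse a bracketed tree into (label, children) tuples; leaves are plain strings."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        pos += 1                          # skip "("
        label = ""
        if tokens[pos] not in ("(", ")"):
            label = tokens[pos]           # node label, e.g. NP-SBJ
            pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                          # skip ")"
        return (label, children)

    return node()


def leaves(tree):
    """Collect the words of a tree from left to right."""
    _, children = tree
    words = []
    for child in children:
        words.extend(leaves(child) if isinstance(child, tuple) else [child])
    return words


ptb = """( (S (NP-SBJ The dispute)
    (VP shows
      (ADVP-MNR clearly)
      (NP (NP the global power)
        (PP of
          (NP (NP Japan 's)
            financial titans))))
    .))"""
tree = read_tree(ptb)
print(" ".join(leaves(tree)))
```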
Parse Tree
(S (NP (Det The) (N dispute))
   (VP (V shows)
       (NP (NP (Det the) (JJ global) (N power))
           (PP of Japan's financial titans))))
Example (Sentences: Co-referencing)
( (S (NP-SBJ-1 The banks)
(VP (ADVP-MNR badly)
want
(S (NP-SBJ *-1)
(VP to
(VP break
(PP into
(NP (NP all aspects)
(PP of
(NP the securities business))))))))
What is Grammar?
A theory of language
A theory of competence of a native speaker
(in the context of a natural language)
A finite set of rules
that generates all and only the sentences of a language, and
that assigns an appropriate structural description to each one.
An explicit model of competence
What are the requirements?
An explicit model of competence
Should be able to generate an infinite set of grammatical sentences of the language
Should not generate any ungrammatical ones
Should be able to account for ambiguities (i.e., if a sentence is understood to have two meanings, the grammar should give two different structural descriptions)
If two sentences are understood to have the same meaning, the grammar should give the same structure for both at some level
If two sentences are understood to have different internal relationships, the grammar should assign different structural descriptions
What is Syntax?
Syntax is the study of the combination of
words into phrases, clauses and sentences
Syntax describes how sentences and their
constituents are structured
Grammatical Analysis Techniques
Two main devices
Breaking up a String
Sequential
Hierarchical
Transformational
Labeling the Constituents
Morphological
Categorial
Functional
A grammar may combine any of these
devices for grammatical analysis.
Breaking up and Labeling
Sequential Breaking up
Sequential Breaking up and Morphological Labeling
Sequential Breaking up and Categorial Labeling
Sequential Breaking up and Functional Labeling
Hierarchical Breaking up
Hierarchical Breaking up and Categorial Labeling
Hierarchical Breaking up and Functional Labeling
Sequential Breaking up
That student solved the problems.
that + student + solve + ed + the + problem + s
Sequential Breaking up and Morphological Labeling
That student solved the problems.
that (word) + student (word) + solve (stem) + ed (affix) + the (word) + problem (stem) + s (affix)
Sequential Breaking up and Categorial Labeling
This boy can solve the problem.
this/Det boy/N can/Aux solve/V the/Det problem/N
They called her a taxi.
They/Pron call/V ed/Affix her/Pron a/Det taxi/N
Sequential Breaking up and Functional Labeling
They called her a taxi.
They (Subject) called (Verbal) her (Direct Object) a taxi (Indirect Object)
They (Subject) called (Verbal) her (Indirect Object) a taxi (Direct Object)
Hierarchical Breaking up
Old men and women
[[Old men] and [women]]  (only the men are old)
[Old [[men] and [women]]]  (both the men and the women are old)
Hierarchical Breaking up and Categorial Labeling
Poor John ran away.
(S (NP (A Poor) (N John)) (VP (V ran) (Adv away)))
Hierarchical Breaking up and Functional Labeling
Immediate Constituent (IC) Analysis
Construction types in terms of the function
of the constituents:
Predication (subject + predicate)
Modification (modifier + head)
Complementation (verbal + complement)
Subordination (subordinator + dependent unit)
Coordination (independent unit + coordinator)
Predication
[Birds]subject [fly]predicate
(S (Subject Birds) (Predicate fly))
Modification
[A]modifier [flower]head
John [slept]head [in the room]modifier
(S (Subject John) (Predicate (Head slept) (Modifier in the room)))
Complementation
He [saw]verbal [a lake]complement
(S (Subject He) (Predicate (Verbal saw) (Complement a lake)))
Subordination
John slept [in]subordinator [the room]dependent unit
(S (Subject John)
   (Predicate (Head slept)
              (Modifier (Subordinator in) (Dependent-Unit the room))))
Coordination
[John came in time]independent unit [but]coordinator [Mary was not ready]independent unit
(S (Independent-Unit John came in time) (Coordinator but) (Independent-Unit Mary was not ready))
An Example
In the morning, the sky looked much brighter.
(S (Modifier (Subordinator In)
             (DU (Modifier the) (Head morning)))
   (Head (Subject (Modifier the) (Head sky))
         (Predicate (Verbal looked)
                    (Complement (Modifier much) (Head brighter)))))
(DU = dependent unit)
Hierarchical Breaking up and Categorial / Functional Labeling
Hierarchical breaking up coupled with categorial/functional labeling is a very powerful device.
But some ambiguities demand something more powerful.
E.g., love of God:
Someone loves God
God loves someone
Love of God
Hierarchical Breaking up with Categorial Labeling:
(NP love (PP of God))
Hierarchical Breaking up with Functional Labeling:
(Head love) (Modifier (Sub of) (DU God))
Both readings receive exactly the same analysis, so neither device can tell the two meanings apart.
Types of Generative Grammar
Finite State Model
(sequential)
Phrase Structure Model
(sequential + hierarchical) + (categorial)
Transformational Model
(sequential + hierarchical + transformational)
+ (categorial + functional)
Finite State Model
[Figure: finite-state diagram; only the transition labels THE, OLD and THE survive in the transcript]
The machine begins in the initial state, runs through a sequence of states (producing a word with each transition), and ends in the final state (producing a sentence).
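The slide's diagram is not recoverable from the transcript, so the machine below is a hypothetical toy in the same spirit: word-labelled transitions and a loop on *old*. The state names `q0` to `q4`, the vocabulary, and the function names are assumptions made for illustration.

```python
from collections import deque

# Hypothetical toy machine: (state, word) -> next state.
TRANSITIONS = {
    ("q0", "the"): "q1",
    ("q1", "old"): "q1",    # loop: any number of "old"
    ("q1", "man"): "q2",
    ("q1", "men"): "q3",
    ("q2", "comes"): "q4",
    ("q3", "come"): "q4",
}
START, FINAL = "q0", {"q4"}


def accepts(sentence):
    """Run the machine word by word; accept if it ends in a final state."""
    state = START
    for word in sentence.lower().split():
        state = TRANSITIONS.get((state, word))
        if state is None:            # no transition for this word
            return False
    return state in FINAL


def generate(max_len):
    """List every sentence of at most max_len words, breadth-first."""
    sentences = []
    queue = deque([(START, [])])
    while queue:
        state, words = queue.popleft()
        if state in FINAL:
            sentences.append(" ".join(words))
        if len(words) == max_len:
            continue
        for (src, word), dst in TRANSITIONS.items():
            if src == state:
                queue.append((dst, words + [word]))
    return sentences


print(generate(4))
print(accepts("The old old man comes"), accepts("The old man come"))
```

The loop on `q1` is what lets a finite machine generate infinitely many sentences; `generate` needs a length bound only because it enumerates them.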
Phrase Structure Model
Phrase Structure Grammar (PSG)
A phrase-structure grammar G consists of a four-tuple (V, T, S, P), where
V is a finite set of symbols (the vocabulary)
E.g., N, V, A, Adv, P, NP, VP, AP, AdvP, PP, student, sing, etc.
T is a finite set of terminal symbols: T ⊆ V
E.g., student, sing, etc.
S is a distinguished non-terminal symbol, also called the start symbol: S ∈ V
P is a finite set of production rules
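The four-tuple can be written down directly as a data structure. A minimal sketch, assuming context-free rules with a single symbol on the left (as in the rules used later in these slides); the class name `PSG` and its checks are illustrative, not a standard API:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PSG:
    V: frozenset  # vocabulary: non-terminal and terminal symbols
    T: frozenset  # terminal symbols, T ⊆ V
    S: str        # start symbol, a non-terminal in V
    P: tuple      # production rules as (lhs, (rhs, ...)) pairs

    def __post_init__(self):
        if not self.T <= self.V:
            raise ValueError("T must be a subset of V")
        if self.S not in self.V or self.S in self.T:
            raise ValueError("S must be a non-terminal symbol in V")
        for lhs, rhs in self.P:
            # Assumption: a single non-terminal on the left, as in context-free rules.
            if lhs not in self.V or lhs in self.T:
                raise ValueError(f"left-hand side {lhs!r} must be a non-terminal")
            if any(sym not in self.V for sym in rhs):
                raise ValueError(f"rule {lhs} -> {rhs} uses symbols outside V")


TERMINALS = frozenset({"the", "boy", "ball", "hit"})
NONTERMINALS = frozenset({"S", "NP", "VP", "Det", "N", "V"})
grammar = PSG(
    V=NONTERMINALS | TERMINALS,
    T=TERMINALS,
    S="S",
    P=(("S", ("NP", "VP")), ("NP", ("Det", "N")), ("VP", ("V", "NP")),
       ("Det", ("the",)), ("N", ("boy",)), ("N", ("ball",)), ("V", ("hit",))),
)
print(sorted(grammar.V - grammar.T))  # the non-terminals
```

The constructor rejects a tuple that violates T ⊆ V or places S among the terminals.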
Noun Phrases
John: (NP (N John))
the student: (NP (Det the) (N student))
the intelligent student: (NP (Det the) (AdjP intelligent) (N student))
Noun Phrase
his first five PhD students: (NP (Det his) (Ord first) (Quant five) (N PhD) (N students))
Noun Phrase
the five best students of my class: (NP (Det the) (Quant five) (AP best) (N students) (PP of my class))
Verb Phrases
can sing: (VP (Aux can) (V sing))
can hit the ball: (VP (Aux can) (V hit) (NP the ball))
Verb Phrase
can give a flower to Mary: (VP (Aux can) (V give) (NP a flower) (PP to Mary))
Verb Phrase
may make John the chairman: (VP (Aux may) (V make) (NP John) (NP the chairman))
Verb Phrase
may find the book very interesting: (VP (Aux may) (V find) (NP the book) (AP very interesting))
Prepositional Phrases
in the classroom: (PP (P in) (NP the classroom))
near the river: (PP (P near) (NP the river))
Adjective Phrases
intelligent: (AP (A intelligent))
very honest: (AP (Degree very) (A honest))
fond of sweets: (AP (A fond) (PP of sweets))
Adjective Phrase
very worried that she might have done badly in the assignment:
(AP (Degree very) (A worried) (S’ that she might have done badly in the assignment))
Phrase Structure Rules
The boy hit the ball.
Rewrite Rules:
1. S → NP VP
2. NP → Det N
3. VP → V NP
4. Det → the
5. N → boy, ball
6. V → hit
We interpret each rule X → Y as the instruction "rewrite X as Y".
Derivation
The boy hit the ball.
Sentence
NP + VP                         (1) S → NP VP
Det + N + VP                    (2) NP → Det N
Det + N + V + NP                (3) VP → V NP
The + N + V + NP                (4) Det → the
The + boy + V + NP              (5) N → boy
The + boy + hit + NP            (6) V → hit
The + boy + hit + Det + N       (2) NP → Det N
The + boy + hit + the + N       (4) Det → the
The + boy + hit + the + ball    (5) N → ball
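The derivation above can be replayed mechanically. A minimal sketch, assuming each step rewrites the leftmost occurrence of the rule's left-hand side, which reproduces the order of lines shown on the slide; `derive` and `STEPS` are hypothetical names:

```python
def derive(start, steps):
    """Apply each (lhs, rhs) rule to the leftmost occurrence of lhs; record every line."""
    line = [start]
    history = [" + ".join(line)]
    for lhs, rhs in steps:
        i = line.index(lhs)               # leftmost occurrence; ValueError if absent
        line = line[:i] + list(rhs) + line[i + 1:]
        history.append(" + ".join(line))
    return history


# The rule applications from the slide, in order, with their rule numbers.
STEPS = [
    ("S", ["NP", "VP"]),    # (1)
    ("NP", ["Det", "N"]),   # (2)
    ("VP", ["V", "NP"]),    # (3)
    ("Det", ["the"]),       # (4)
    ("N", ["boy"]),         # (5)
    ("V", ["hit"]),         # (6)
    ("NP", ["Det", "N"]),   # (2)
    ("Det", ["the"]),       # (4)
    ("N", ["ball"]),        # (5)
]

for step in derive("S", STEPS):
    print(step)
```

Applying a rule whose left-hand side is not present in the current line raises `ValueError`, which makes it easy to see that the order of rule applications matters.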
PSG Parse Tree
The boy hit the ball.
(S (NP (Det the) (N boy))
   (VP (V hit) (NP (Det the) (N ball))))
PSG Parse Tree
John wrote those words in the Book of Proverbs.
[Figure: PSG parse tree of this sentence; the transcript preserves only its node labels (S, NP, PropN, VP, V, NP, PP, P) and not how the phrases attach]
Transformational Model
Transformational Grammar
A generative grammar that makes use of all three kinds of breaking up (sequential, hierarchical and transformational) and both kinds of labeling (categorial and functional) is called a Transformational Grammar (Universal Grammar).
Other Grammar Formalisms
Lexical Functional Grammar (LFG)
Generalised Phrase Structure Grammar (GPSG)
Tree Adjoining Grammar (TAG)
Categorial Grammar (CG)
Head-driven Phrase Structure Grammar (HPSG)
Systemic Functional Grammar (SFG)
Levels of Representation in Universal Grammar (UG)
Lexicon
  ↓
D(eep)-Structure
  ↓  Move-α
S(urface)-Structure
  ↙           ↘
PF (phonetic form)    LF (logical form)
Interacting subsystems
UG consists of interacting subsystems
Various subcomponents of the rule system of grammar
Subsystems of Principles
Subcomponents
Subcomponents of the rule system
Lexicon
Syntax
Categorial component
Transformational component
PF-component
LF-component
Principles
Subsystem of Principles
X-bar Theory
Theta-theory
Government
Binding Principles
Case Theory
Control Theory
Issues in Phrase Structure Grammar
Limitation
Overgeneration
Solutions
Subcategorization Restrictions
Selectional Restrictions
Overgeneration
Ungrammaticality
The boy relied on the girl.
*The boy relied the girl.
*The boy relied.
Grammatically sound but semantically odd
*The boy frightens sincerity.
*Sincerity kicked the boy.
Ungrammaticality
Given sentences:
The boy relied on the girl.
*The boy relied the girl.
*The boy relied.
PS Rules:
VP → V (NP) (PP)
NP → Det N
V → rely
Det → the
N → boy | girl
Subcategorization Frame
Specify the categorial class of the lexical
item.
Specify the environment.
Examples:
kick: [V; _ NP]
cry: [V; _ ]
rely: [V; _PP]
put: [V; _ NP PP]
think: [V; _ S`]
Subcategorization Frame
forward: [V; __ NP PP]
e.g., We will be forwarding our new catalogue to you.
invitation: [N; __ PP]
e.g., An invitation to the party
accessible: [A; __ PP]
e.g., A program making science more accessible to young people
Subcategorization Rules
Subcategorization Rule (y stands for a verb of the matching subclass):
V → y / _NP]
V → y / _]
V → y / _PP]
V → y / _NP PP]
V → y / _S`]
Applying Subcategorization Rules
The boy relied on the girl.
1. S → NP VP
2. VP → V (NP) (PP) (S`)…
3. NP → Det N
4. V → rely / _PP]
5. P → on / _NP]
6. Det → the
7. N → boy, girl
*The boy relied the girl.
*The boy relied.
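A minimal sketch of how subcategorization frames filter out the overgenerated VPs, assuming each sentence has already been reduced to the sequence of categories following its verb (a real parser would have to derive this). The names `FRAMES` and `fits_frame` are hypothetical, and the check treats every frame as strict, so optional complements are not modelled.

```python
# Frames transcribed from the slides; "S`" stands for an embedded clause.
FRAMES = {
    "kick": ("NP",),
    "cry": (),
    "rely": ("PP",),
    "put": ("NP", "PP"),
    "think": ("S`",),
    "forward": ("NP", "PP"),
}


def fits_frame(verb, complements):
    """True if the categories following the verb match its frame exactly."""
    frame = FRAMES.get(verb)
    return frame is not None and tuple(complements) == frame


# The three test sentences, reduced to the categories that follow "rely".
for sentence, comps in [("The boy relied on the girl.", ["PP"]),
                        ("The boy relied the girl.", ["NP"]),
                        ("The boy relied.", [])]:
    mark = "" if fits_frame("rely", comps) else "*"
    print(mark + sentence)
```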
Semantically Odd Constructions
Can we exclude these two ill-formed structures?
*The boy frightened sincerity.
*Sincerity kicked the boy.
Subcategorization alone cannot, so a further mechanism is needed.
Selectional Restrictions
Inherent Properties of Nouns:
[+/- ABSTRACT], [+/- ANIMATE]
E.g.,
Sincerity [+ ABSTRACT]
Boy [+ANIMATE]
Lexical information of this type can be used to set up a context-sensitive ‘rewrite rule’.
Selectional Rules
A selectional rule specifies certain selectional
restrictions associated with a verb.
V → y / [±ABSTRACT] __ [±ANIMATE]
V → frighten / [±ABSTRACT] __ [+ANIMATE]
*The boy frightened sincerity.
*Sincerity kicked the boy.
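A minimal sketch of selectional rules as feature checks. The features of *boy* and *sincerity* and the rule for *frighten* follow the slides; the entry for *girl* and the restriction that *kick* needs an [+ANIMATE] subject are assumptions added so that both starred sentences can be tested. All names are hypothetical.

```python
# Inherent features: boy and sincerity follow the slide; girl is an assumed extra entry.
FEATURES = {
    "boy": {"ANIMATE": True, "ABSTRACT": False},
    "girl": {"ANIMATE": True, "ABSTRACT": False},
    "sincerity": {"ANIMATE": False, "ABSTRACT": True},
}

# verb -> (subject requirements, object requirements); {} means unrestricted (±).
SELECTION = {
    "frighten": ({}, {"ANIMATE": True}),  # slide: [±ABSTRACT] __ [+ANIMATE]
    "kick": ({"ANIMATE": True}, {}),      # assumption: kick needs an animate subject
}


def meets(noun, required):
    """True if the noun's inherent features match every required value."""
    return all(FEATURES[noun].get(feature) == value for feature, value in required.items())


def selectionally_ok(subject, verb, obj):
    """Check the subject and the object against the verb's selectional rule."""
    subject_req, object_req = SELECTION[verb]
    return meets(subject, subject_req) and meets(obj, object_req)


for subj, verb, obj in [("boy", "frighten", "sincerity"),
                        ("sincerity", "kick", "boy"),
                        ("sincerity", "frighten", "boy")]:
    print(subj, verb, obj, selectionally_ok(subj, verb, obj))
```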
Nature of Transformation
Topicalization
Topicalized NP
Topicalized PP
Movement
Wh-movement
Relative Pronoun movement
Topicalization
I can solve this problem.
This problem, I can solve.
I can solve *(this problem).   (the sentence is ungrammatical without "this problem")
(S (NP (Pron I)) (VP (Aux can) (V solve) (NP (Det this) (N problem))))
Topicalization
This problem, I can solve.
(S (NP_i (Det this) (N problem))
   (NP (Pron I))
   (VP (Aux can) (V solve) (NP t_i)))
(t_i = trace, co-indexed with the moved phrase)
Topicalization
To John, Mary gave the book.
(S (PP_i (P to) (NP (N John)))
   (NP (N Mary))
   (VP (V gave) (NP (Det the) (N book)) (PP t_i)))
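Topicalization fronts a constituent and leaves a co-indexed trace in its original position. A minimal sketch on word lists rather than trees, assuming the span to be moved is already known; the function name `front` is hypothetical, and it does not check whether the movement is grammatical.

```python
def front(words, start, end, index="i"):
    """Move words[start:end] to the front, leaving a co-indexed trace t_<index> behind.

    Returns a bracketed string such as "[this problem]_i I can solve t_i".
    """
    moved = " ".join(words[start:end])
    remainder = words[:start] + [f"t_{index}"] + words[end:]
    return f"[{moved}]_{index} " + " ".join(remainder)


print(front("I can solve this problem".split(), 3, 5))
print(front("Mary gave the book to John".split(), 4, 6))
```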
Wh-movement
John can solve this problem.
Which problem can John solve?
(S (NP (N John)) (VP (Aux can) (V solve) (NP (Det this) (N problem))))
Wh-movement
[Which problem_i can John solve t_i?]
(S` (Comp (NP_i (Wh-Det which) (N problem)))
    (S (Aux can) (NP (N John)) (VP (V solve) (NP t_i))))
Relative Pronoun Movement
John heard the claim which Bill made.
(S (NP (N John))
   (VP (V heard) (NP (Det the) (N claim_i) (S` …))))
Relative Pronoun Movement
[the claim which_i Bill made t_i].
(NP (Det the) (N claim_i)
    (S` (Comp (NP (Rel-Pron which_i)))
        (S (NP (N Bill)) (VP (V made) (NP t_i)))))
Relative Pronoun Movement
[The problem_i that_i he solved t_i was easy].
(S (NP (Det the) (N problem_i)
       (S` (Comp (NP (Rel-Pron that_i)))
           (S (NP (Pron he)) (VP (V solved) (NP t_i)))))
   (VP (V was) (AP (A easy))))
Parser Output
The problem that he solved was easy.
(S (NP (DT the) (NN problem)
       (SBAR (IN that)
             (S (NP (PRP he)) (VP (VBD solved)))))
   (VP (AUX was) (ADJP (JJ easy))))
X-bar Theory
It tells us how words are combined to make phrases and sentences.
It captures the commonality between different types of phrases, which PS rules cannot.
X-bar Projection
XP (maximal projection) → YP X`
X` (intermediate projection) → X ZP
X (zero projection)
X-bar Projection
XP (X-phrase) → YP (Specifier) X`
X` → X (Head) ZP (Complement)
X-bar Projection
XP → YP (Specifier) X`
X` → X` ZP (Adjunct)
X` → X (Head) ZP (Complement)
X-bar Projection
(NP (NP John’s) (N` (N solution) (PP to the problem)))
X-bar Projection
(NP (Det the)
    (N` (N` (N discussion) (PP of the cricket match))
        (PP in the cabinet meeting)))
X-bar Theory
[Specifier-Head-Complement] (SHC)
[Specifier-Complement-Head] (SCH)
[Head-Complement-Specifier] (HCS)
Every phrase is endocentric. There is a specific relation between the specifier and the head, i.e., the Spec-Head configuration.
C(onstituent)-command
C-command is a structural relation among the terminal and non-terminal nodes in a syntactic tree.
α c-commands β iff:
the first branching node dominating α also dominates β, and
α does not dominate β
[Figure: example tree with nodes A, B, C, D, E, F and G]
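The definition translates into a short tree walk. A minimal sketch: the `Node` class and `c_commands` function are hypothetical names, the test tree is an illustrative assumption (the slide's A to G tree is not recoverable), and α ≠ β is assumed in addition to the two conditions above.

```python
class Node:
    """A tree node that records its parent so we can walk upwards."""

    def __init__(self, label, *children):
        self.label = label
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self

    def dominates(self, other):
        """Proper dominance: other lies somewhere below self."""
        node = other.parent
        while node is not None:
            if node is self:
                return True
            node = node.parent
        return False


def c_commands(alpha, beta):
    """alpha c-commands beta iff alpha does not dominate beta and the first
    branching node dominating alpha also dominates beta (alpha != beta assumed)."""
    if alpha is beta or alpha.dominates(beta):
        return False
    node = alpha.parent
    while node is not None and len(node.children) < 2:
        node = node.parent            # skip non-branching nodes
    return node is not None and node.dominates(beta)


# Illustrative tree: [S [NP the boy] [VP [V hit] [NP the ball]]]
subj = Node("NP", Node("Det:the"), Node("N:boy"))
verb = Node("V:hit")
obj = Node("NP", Node("Det:the"), Node("N:ball"))
s = Node("S", subj, Node("VP", verb, obj))
print(c_commands(subj, obj), c_commands(obj, subj), c_commands(verb, obj))
```

The subject c-commands the object but not vice versa: this asymmetry is what the binding examples later rely on.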
C-command
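[Figure: c-command example tree for an NP headed by "discussion"; the transcript preserves the labels NP, Det, N`, N, PP and P and the words the, discussion, of, the cricket match and meeting, but not the tree's structure]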
Government
α governs β iff
α is a lexical head (or tensed I),
α c-commands β, and
no barrier (VP, NP, PP, AP, or tensed IP) intervenes between α and β
Theta-Theory
Hit: <1,2> (argument structure), <Agent, Patient> (thematic structure)
Smile: <1> (argument structure), <Agent> (thematic structure)
Forward: <1,2,3> (argument structure), <Agent, Theme, Goal> (thematic structure)
Theta-Criterion
Each argument must be assigned one and only one theta-role
Each theta-role must be assigned to one and only one argument
Thematic Roles
The man forwarded the mail to the minister.
forward: [V; __ NP PP]
[Event FORWARD ([Agent THE MAN], [Theme THE MAIL], [Goal TO THE MINISTER])]
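A minimal sketch of the Theta-Criterion as a one-to-one pairing between a verb's theta grid and its arguments. It assumes the arguments are already listed in grid order (subject first), which real role assignment would have to establish from the structure; `THETA_GRIDS` and `assign_roles` are hypothetical names.

```python
# Theta grids from the slides: each verb's roles, in argument order.
THETA_GRIDS = {
    "hit": ("Agent", "Patient"),
    "smile": ("Agent",),
    "forward": ("Agent", "Theme", "Goal"),
}


def assign_roles(verb, arguments):
    """Pair each argument with exactly one role, or fail as the Theta-Criterion requires."""
    grid = THETA_GRIDS[verb]
    if len(arguments) != len(grid):
        raise ValueError(f"Theta-Criterion violated: {verb} takes {len(grid)} "
                         f"argument(s), got {len(arguments)}")
    return dict(zip(grid, arguments))


print(assign_roles("forward", ["the man", "the mail", "the minister"]))
```

A surplus argument (a role-less NP) and a missing argument (an unassigned role) are both rejected by the same length check.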
Binding Principles
A relation called Binding:
α binds β iff
α c-commands β, and
α and β are co-indexed
Rajiv_i likes himself_i.
Binding
(IP (NP (N` (N Rajiv_i)))
    (I` (I Tense AGR …)
        (VP (NP t)
            (V` (V like)
                (NP (N` (N himself_i)))))))
Binding
(IP (NP Rajiv’s brother)
    (I` (I Tense AGR …)
        (VP (NP t)
            (V` (V like)
                (NP (N` (N himself_i)))))))
Binding
Rajiv_i’s brother_j likes himself_*i/j
[Rajiv’s brother] is the antecedent of [himself].
[Rajiv] cannot be the antecedent of [himself].
That is, the sentence cannot mean “Rajiv_i’s brother likes Rajiv_i”.
A particular kind of structural relation holds between [Rajiv’s brother] and [himself], but not between [Rajiv] and [himself].
This structural relation is called C(onstituent)-command.
Binding
For the purpose of interpretation, noun phrases are conveniently divided into three groups:
Anaphors (reflexives and reciprocals)
e.g., myself, yourself, each other, one another, etc.
Pronouns
e.g., he, she, it, we, etc.
R-expressions
e.g., John, Mumbai
Binding Principles
Principle A: An anaphor is bound in its governing category.
Rajiv_i likes himself_i
Principle B: A pronominal is free in its governing category.
Rajiv_i likes him_*i/j
Principle C: An R-expression is always free.
John likes Mary
Examples
We think that nobody likes us.
*We think that nobody likes ourselves.
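A toy checker for Principles A, B and C, built on c-command and co-indexing. It is a sketch under strong simplifying assumptions: the governing category is approximated as the smallest IP containing the NP (government and accessible subjects are ignored), each NP is tagged by hand as anaphor, pronoun or r-expression, and the co-indexing is given as input. The class and function names are hypothetical.

```python
class Node:
    """Tree node with an optional NP type ('anaphor', 'pronoun', 'r-expression') and index."""

    def __init__(self, label, *children, kind=None, index=None):
        self.label, self.kind, self.index = label, kind, index
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self

    def ancestors(self):
        node = self.parent
        while node is not None:
            yield node
            node = node.parent

    def dominates(self, other):
        return any(node is self for node in other.ancestors())

    def walk(self):
        yield self
        for child in self.children:
            yield from child.walk()


def c_commands(a, b):
    """The first branching node above a dominates b, and a does not dominate b."""
    if a is b or a.dominates(b):
        return False
    branching = next((n for n in a.ancestors() if len(n.children) > 1), None)
    return branching is not None and branching.dominates(b)


def governing_category(np):
    """Simplification: the smallest IP containing np."""
    return next((n for n in np.ancestors() if n.label == "IP"), None)


def is_bound(np, root, domain=None):
    """Some co-indexed node (inside domain, if given) c-commands np."""
    return any(other.index is not None and other.index == np.index
               and c_commands(other, np)
               and (domain is None or domain.dominates(other))
               for other in root.walk())


def violations(root):
    """List (principle, word) pairs for NPs that break Principle A, B or C."""
    found = []
    for np in root.walk():
        if np.kind == "anaphor" and not is_bound(np, root, governing_category(np)):
            found.append(("A", np.label))
        elif np.kind == "pronoun" and is_bound(np, root, governing_category(np)):
            found.append(("B", np.label))
        elif np.kind == "r-expression" and is_bound(np, root):
            found.append(("C", np.label))
    return found


def clause(subject, verb, obj):
    """[IP subject [VP verb obj]]"""
    return Node("IP", subject, Node("VP", Node(verb), obj))


def we_think(obj):
    """[IP We_i [VP think [CP that [IP nobody [VP likes obj]]]]]"""
    return Node("IP", Node("We", kind="pronoun", index="i"),
                Node("VP", Node("think"),
                     Node("CP", Node("that"), clause(Node("nobody"), "likes", obj))))


s1 = clause(Node("Rajiv", kind="r-expression", index="i"), "likes",
            Node("himself", kind="anaphor", index="i"))
s2 = clause(Node("Rajiv", kind="r-expression", index="i"), "likes",
            Node("him", kind="pronoun", index="i"))
s3 = clause(Node("Rajiv's brother", Node("Rajiv", kind="r-expression", index="i"),
                 Node("brother"), kind="r-expression", index="j"),
            "likes", Node("himself", kind="anaphor", index="i"))
s4 = we_think(Node("us", kind="pronoun", index="i"))
s5 = we_think(Node("ourselves", kind="anaphor", index="i"))

for s in (s1, s2, s3, s4, s5):
    print(violations(s))
```

In the third sentence *Rajiv* is co-indexed with *himself* but does not c-command it, so Principle A fails, matching the *i reading on the slide.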
Natural Language Phenomena
Agreement
Subject-verb agreement
Agreement in Relative Pronouns (English):
The man who/*which I saw
The book which/*who I saw
Ambiguity
The mayor asked the police to stop drinking after midnight.
Yesterday I saw a crane in the campus.
Negation Scope
John did not deliberately break the glass.
John deliberately did not break the glass.
Quantifier Scope
Every student likes a teacher in the class.
Gapping
John bought a story book and Mary a pen.
Meena was crying because her mother was.
Natural Language Phenomena
Scrambling effect
Slifting
John has robbed the bank, I believe.
Sluicing
John bought something but I don’t know what [John bought t].
Question
Auxiliary Inversion
Wh-fronting
Intonation
Wh-in situ
Control Structures
I compelled John to read this article.
I promised John to read this article.
Thank You