Syntax
Sudeshna Sarkar
25 Aug 2008
1
Some Fundamental Questions
What is Language?
How to define a Language?
What makes a language different from another?
Is there anything common to all languages?
2
Syntax
Syntax: from Greek syntaxis, “setting out together,
arrangement”
Refers to the way words are arranged together, and
the relationship between them.
Distinction:
Prescriptive grammar: how people ought to talk
Descriptive grammar: how they do talk
The goal of syntax is to model the knowledge that
people unconsciously have about the grammar of
their native language
3
The Two Schools
Rationalists
It’s all hardcoded in our brains
Principles and Parameters
Theory
Poverty of Stimulus
Recursion
Empiricists
Just a special kind of pattern
recognition
No different from other
cognitive abilities like vision
Language is a stochastic
phenomenon
4
The Generative Grammar
“The grammatical principles underlying
languages are innate and fixed, and the
differences among the world's languages can
be characterized in terms of parameter
settings in the brain …”
- www.wikipedia.org
Noam Chomsky [1928-]
Courtesy www.chomsky.info
5
I & E Languages
I-Language: Mentally represented system of rules (I for internal)
E-Language: Observable external products of I-language (written text, utterances)
Language: Collective E-language of a very large
group of speakers
Syntax: Study of the I-language from E-language
6
The Chomsky Hierarchy
Grammar   Languages                Automaton                                          Production rules
Type-0    Recursively enumerable   Turing machine                                     No restrictions
Type-1    Context-sensitive        Linear-bounded non-deterministic Turing machine    αAβ → αγβ
Type-2    Context-free             Non-deterministic pushdown automaton               A → γ
Type-3    Regular                  Finite state automaton                             A → aB, A → a
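The production-rule column of the hierarchy can be checked mechanically. The following is a minimal sketch, not part of the lecture: the function name rule_type and the convention that every character is one symbol (uppercase for non-terminals, lowercase for terminals) are assumptions made only for this illustration.

```python
def rule_type(lhs: str, rhs: str) -> int:
    """Return the most restrictive type (3, 2, 1 or 0) whose rule form lhs -> rhs matches.

    Assumed convention: each character is one symbol;
    uppercase = non-terminal, lowercase = terminal.
    """
    if len(lhs) == 1 and lhs.isupper():
        # Type-3 (regular): A -> a  or  A -> aB
        if len(rhs) == 1 and rhs.islower():
            return 3
        if len(rhs) == 2 and rhs[0].islower() and rhs[1].isupper():
            return 3
        # Type-2 (context-free): a single non-terminal on the left
        return 2
    # Type-1 (context-sensitive): alpha A beta -> alpha gamma beta, gamma non-empty
    for i, symbol in enumerate(lhs):
        if symbol.isupper():
            alpha, beta = lhs[:i], lhs[i + 1:]
            if (len(rhs) > len(alpha) + len(beta)
                    and rhs.startswith(alpha) and rhs.endswith(beta)):
                return 1
    # Type-0: no restrictions
    return 0


print(rule_type("A", "aB"))      # 3
print(rule_type("S", "aSb"))     # 2
print(rule_type("aAb", "axyb"))  # 1
print(rule_type("AB", "BA"))     # 0
```

The function only looks at the shape of a single rule. A grammar's type is then the least restrictive type among its rules.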
7
From Formal to Natural Languages
Organizational Unit    Complexity
Word                   Regular
Sounds                 Regular
Sentence               Context-free
Discourse              ??
8
Some Observations on NLs
Constituency: A group of words acts as a single unit –
phrases, clauses etc.
Grammatical Relations: Different words/ phrases are
related to the main verb of the sentence – object,
subject, instrument
Subcategorization and Dependency Relations: Not all
verbs can take all types of arguments – transitive,
intransitive etc.
9
Syntax
Why should you care?
Grammar checkers
Question answering
Information extraction
Machine translation
10
Why NLP is difficult:
Newspaper headlines
Iraqi Head Seeks Arms
Juvenile Court to Try Shooting Defendant
Teacher Strikes Idle Kids
Stolen Painting Found by Tree
Local High School Dropouts Cut in Half
Red Tape Holds Up New Bridges
Clinton Wins on Budget, but More Lies Ahead
Hospitals Are Sued by 7 Foot Doctors
Kids Make Nutritious Snacks
11
Why is NLU difficult? The hidden structure of
language is hugely ambiguous
Tree for: Fed raises interest rates 0.5% in effort to
control inflation (NYT headline 5/17/00)
12
Where are the ambiguities?
13
The bad effects of V/N ambiguities
14
Context-Free Grammars
Capture constituency and ordering
Ordering is easy
What are the rules that govern the ordering of
words and bigger units in the language?
What’s constituency?
How words group into units and how the various
kinds of units behave with respect to one another
15
Constituency
We have NLP classes from 5:30 to 6:30 pm on Tuesday.
On Tuesday we have NLP classes from 5:30 – 6:30 pm.
From 5:30 to 6:30 pm on Tuesday we have NLP classes.
*We have NLP on Tuesday from 5:30 to 6:30 pm classes.
*On we have NLP classes from Tuesday 5:30 to 6:30 pm.
*From 5:30 we have to 6:30 pm on Tuesday NLP classes.
16
17
Phrases
Phrase: Group of words that act as a unit
Noun Phrase NP
– A midsummer night’s dream, My experiments with truth,
The man who knew infinity
Verb Phrase VP
– Gone with the wind, Saving private Ryan
Prepositional Phrases PP
– Of sons and lovers, to sir with love, Beyond the blue
mountains, Into the heart of the mind
18
Modelling the Syntax of English
Let us try CFGs
S → NP VP
I love India.
S → VP
Love your country.
S → Aux NP VP
Do you love your country?
S → Wh-NP VP
Who loves his country?
S → Wh-NP Aux NP VP
Which country do you live in?
19
Phrase Structure Grammar
Context Free Grammars are also called
phrase structure grammars
Phrases are the building blocks of any PSG
(i.e. CFG)
Phrases in turn are defined by CFG (PSG)
20
Is CFG Necessary?
Can we model the syntax of English using Regular
Grammar?
NO! A regular grammar cannot model general recursion (for example, center embedding)
S → NP VP
VP → Verb S
I think that Einstein thought that Newton said …
21
CFG Examples
S -> NP VP
NP -> Det NOMINAL
NOMINAL -> Noun
VP -> Verb
Det -> a
Noun -> flight
Verb -> left
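As a minimal sketch of how these rules generate a sentence, the grammar can be stored as a dictionary and expanded from S. The names GRAMMAR and expand are chosen for this illustration only, and the code relies on each non-terminal having exactly one rule, as in the example above.

```python
# Each non-terminal maps to the right-hand side of its single rule.
GRAMMAR = {
    "S": ["NP", "VP"],
    "NP": ["Det", "NOMINAL"],
    "NOMINAL": ["Noun"],
    "VP": ["Verb"],
    "Det": ["a"],
    "Noun": ["flight"],
    "Verb": ["left"],
}


def expand(symbol):
    """Recursively rewrite symbol until only words (terminals) remain."""
    if symbol not in GRAMMAR:  # a terminal: nothing left to rewrite
        return [symbol]
    words = []
    for child in GRAMMAR[symbol]:
        words.extend(expand(child))
    return words


print(" ".join(expand("S")))  # a flight left
```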
22
CFGs
S -> NP VP
This says that there are units called S, NP, and VP
in this language
That an S consists of an NP followed immediately
by a VP
Doesn’t say that that’s the only kind of S
Nor does it say that this is the only place that NPs
and VPs occur
23
Context Free Grammars
A CFG consists of a tuple (N,T,S,P)
N is a finite set of non-terminal symbols
T is a finite set of terminal symbols
S is the start symbol
P is a finite set of rules of the form X → γ, where X
∈ N and γ ∈ (N ∪ T)*
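The tuple definition maps directly onto a small data structure. This is a sketch under stated assumptions: the class name CFG, its field names, and the problems method are hypothetical, and the check that N and T are disjoint is the usual textbook convention rather than something the definition above states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CFG:
    nonterminals: frozenset  # N
    terminals: frozenset     # T
    start: str               # S
    rules: tuple             # P, as (X, gamma) pairs; gamma is a tuple of symbols

    def problems(self):
        """Return a list of violations of the definition (empty if well-formed)."""
        found = []
        if self.nonterminals & self.terminals:
            found.append("N and T overlap")
        if self.start not in self.nonterminals:
            found.append("start symbol is not in N")
        symbols = self.nonterminals | self.terminals
        for lhs, rhs in self.rules:
            if lhs not in self.nonterminals:
                found.append(f"left-hand side {lhs!r} is not in N")
            for s in rhs:
                if s not in symbols:
                    found.append(f"symbol {s!r} is in neither N nor T")
        return found


g = CFG(
    nonterminals=frozenset({"S", "NP", "VP"}),
    terminals=frozenset({"boys", "play"}),
    start="S",
    rules=(("S", ("NP", "VP")), ("NP", ("boys",)), ("VP", ("play",))),
)
print(g.problems())  # []
```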
24
Phrase Structure Parsing
Phrase structure organizes words into phrases, often
called constituents
This organization is hierarchical
For a given string there is often ambiguity about the
correct phrase structure
This ambiguity often corresponds to semantic
ambiguity
25
26
Simple examples of a CFG
Take the non-terminals = {S, NP, VP, V}
And the terminals = {boys, study, play, books, cricket}
Let the start symbol be S
Let the rule set be
S → NP VP
VP → V
VP → V NP
NP → boys
NP → books
NP → cricket
V → study
V → play
This CFG licenses a
finite number of
trees (sentences)
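Because none of these rules is recursive, the language can be listed exhaustively. The sketch below enumerates every sentence the grammar licenses. The names RULES and expansions are assumptions for this illustration.

```python
from itertools import product

RULES = {
    "S": [["NP", "VP"]],
    "VP": [["V"], ["V", "NP"]],
    "NP": [["boys"], ["books"], ["cricket"]],
    "V": [["study"], ["play"]],
}


def expansions(symbol):
    """Return every word sequence (as a tuple) derivable from symbol."""
    if symbol not in RULES:  # terminal
        return [(symbol,)]
    results = []
    for rhs in RULES[symbol]:
        # Combine every expansion of each right-hand-side symbol, in order.
        parts = [expansions(s) for s in rhs]
        for combo in product(*parts):
            results.append(tuple(w for part in combo for w in part))
    return results


sentences = [" ".join(words) for words in expansions("S")]
print(len(sentences))  # 24
```

The count follows from the rules: 3 choices of subject NP times (2 bare verbs + 2 × 3 verb-object pairs) gives 24. Note that the grammar licenses strings such as "books play boys": it models word order and constituency, not meaning.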
27
Generativity
As with FSAs and FSTs you can view these
rules as either analysis or synthesis machines
Generate strings in the language
Reject strings not in the language
Impose structures (trees) on strings in the
language
28
Derivations
A derivation is a sequence of rules applied to a string
that accounts for that string
Covers all the elements in the string
Covers only the elements in the string
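A derivation can be written out as the sequence of intermediate forms it produces. The following sketch builds a leftmost derivation of "boys play cricket" using the boys/study/play grammar from the earlier slide. The function name leftmost_derivation and the hand-picked list of steps are assumptions for this illustration.

```python
NONTERMINALS = {"S", "NP", "VP", "V"}


def leftmost_derivation(start, steps):
    """Apply each (lhs, rhs) rule to the leftmost non-terminal; record every form."""
    form = [start]
    history = [list(form)]
    for lhs, rhs in steps:
        # Position of the leftmost non-terminal in the current form.
        i = next(k for k, s in enumerate(form) if s in NONTERMINALS)
        if form[i] != lhs:
            raise ValueError(f"expected a rule for {form[i]}, got one for {lhs}")
        form = form[:i] + list(rhs) + form[i + 1:]
        history.append(list(form))
    return history


steps = [
    ("S", ["NP", "VP"]),
    ("NP", ["boys"]),
    ("VP", ["V", "NP"]),
    ("V", ["play"]),
    ("NP", ["cricket"]),
]
for form in leftmost_derivation("S", steps):
    print(" ".join(form))
```

The last form printed contains only words, and it contains exactly the words of the string, which is what "covers all and only the elements" means here.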
29
Derivations as Trees
30
Two views of linguistic structure: 1.
Constituency (phrase structure)
Phrase structure organizes words into
nested constituents.
How do we know what is a constituent?
(Not that linguists don't argue about some
cases.)
Distribution: a constituent behaves as a unit that
can appear in different places:
– John talked [to the children] [about drugs].
– John talked [about drugs] [to the children].
– *John talked drugs to the children about
Substitution/expansion/pro-forms:
– I sat [on the box/right on top of the box/there].
Coordination, regular internal structure, no
intrusion, fragments, semantics, …
31
Two views of linguistic structure: 2.
Dependency structure
Dependency structure shows which words depend
on (modify or are arguments of) which other words.
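The boy put the tortoise on the rug
[Figure: dependency tree rooted at "put"; "boy", "tortoise" and "on" depend on "put"; "The" depends on "boy"; "the" depends on "tortoise"; "rug" depends on "on"; "the" depends on "rug"]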
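A dependency structure is commonly stored as one head index per word. The sketch below encodes "The boy put the tortoise on the rug" that way. The head assignments are a reading of the slide's tree figure, and the names WORDS, HEADS and dependents are assumptions for this illustration.

```python
WORDS = ["The", "boy", "put", "the", "tortoise", "on", "the", "rug"]
# HEADS[i] is the index of the word that WORDS[i] depends on; None marks the root.
HEADS = [1, 2, None, 4, 2, 2, 7, 5]


def dependents(head_index):
    """Return the words that directly depend on WORDS[head_index]."""
    return [WORDS[k] for k, h in enumerate(HEADS) if h == head_index]


root = HEADS.index(None)
print(WORDS[root])       # put
print(dependents(root))  # ['boy', 'tortoise', 'on']
```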
32
Parsing
Parsing is the process of taking a string and a
grammar and returning a (many?) parse tree(s) for
that string
It is completely analogous to running a finite-state
transducer with a tape
It’s just more powerful
– Remember this means that there are languages we can
capture with CFGs that we can’t capture with finite-state
methods
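To make the definition concrete, here is a minimal top-down parser sketch that takes a string and a grammar and returns every parse tree. It uses the boys/study/play grammar from the earlier slide. The names RULES, parse, trees and splits are assumptions for this illustration, and the approach only works for grammars without empty rules or left recursion. It is not a general-purpose algorithm.

```python
from functools import lru_cache

RULES = {
    "S": [("NP", "VP")],
    "VP": [("V",), ("V", "NP")],
    "NP": [("boys",), ("books",), ("cricket",)],
    "V": [("study",), ("play",)],
}


def parse(words, start="S"):
    """Return all parse trees for words; a tree is (label, child, ...) or a word."""
    words = tuple(words)

    @lru_cache(maxsize=None)
    def trees(symbol, i, j):
        """All trees rooted at symbol that cover exactly words[i:j]."""
        if symbol not in RULES:  # terminal: must match a single word
            return (symbol,) if j - i == 1 and words[i] == symbol else ()
        found = []
        for rhs in RULES[symbol]:
            for kids in splits(rhs, i, j):
                found.append((symbol,) + kids)
        return tuple(found)

    def splits(rhs, i, j):
        """All ways to cover words[i:j] with the symbols of rhs, in order."""
        if not rhs:
            return [()] if i == j else []
        first, rest = rhs[0], rhs[1:]
        out = []
        # Leave at least one word for each remaining symbol.
        for k in range(i + 1, j - len(rest) + 1):
            for t in trees(first, i, k):
                for tail in splits(rest, k, j):
                    out.append((t,) + tail)
        return out

    return list(trees(start, 0, len(words)))


print(parse("boys play cricket".split()))
print(parse("play boys".split()))  # []
```

An empty result means the grammar rejects the string. More than one tree in the result would mean the string is structurally ambiguous under the grammar.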
33
Other Options
Regular languages (expressions)
Too weak
Context-sensitive or Turing-equivalent
Too powerful (maybe)
34
Context?
The notion of context in CFGs has nothing to do with
the ordinary meaning of the word context in language.
All it really means is that the non-terminal on the left-hand side of a rule is out there all by itself (free of
context)
A -> B C
Means that
I can rewrite an A as a B followed by a C regardless of the
context in which A is found
Or when I see a B followed by a C I can infer an A regardless
of the surrounding context
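The "regardless of context" reading can be shown in a few lines. In this sketch the rule A -> B C is applied to every A in a sequence of symbols without inspecting its neighbours. The function name rewrite_all and the surrounding symbols x and y are hypothetical.

```python
def rewrite_all(form, lhs, rhs):
    """Apply the context-free rule lhs -> rhs at every occurrence of lhs."""
    out = []
    for symbol in form:
        # The neighbours of `symbol` are never consulted: that is "context-free".
        out.extend(rhs if symbol == lhs else [symbol])
    return out


print(rewrite_all(["x", "A", "y", "A"], "A", ["B", "C"]))
# ['x', 'B', 'C', 'y', 'B', 'C']
```

A context-sensitive rule of the form αAβ → αγβ would instead have to check that A is preceded by α and followed by β before rewriting it.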
35
Key Constituents (English)
Sentences
Noun phrases
Verb phrases
Prepositional phrases
36