Introduction to Computational Natural Language Learning
Linguistics 79400 (Under: Topics in Natural Language Processing)
Computer Science 83000 (Under: Topics in Artificial Intelligence)
The Graduate School of the City University of New York
Fall 2001
William Gregory Sakas
Hunter College, Department of Computer Science
Graduate Center, PhD Programs in Computer Science and Linguistics
The City University of New York
A Very Brief Linguistic History of Modern Syntax I:
Phrase structure rules and transformations (Chomsky 1957, 1965)
Movement and subject-aux inversion transformations:

He will give a bone to the dog.
He will give what to the dog. (base-generated structure)
What will he give t to the dog? (surface structure)
But there is a serious problem: the theory permits too many
transformations, without a good criterion for the learner
to select among them.
(However, see Wexler and Culicover (1980) for an impressive
formal characterization of how to learn transformations.)
A Very Brief Linguistic History of Modern Syntax II:
X-bar theory (Jackendoff, 1977)
X-bar theory specifies the internal relationships between
phrasal components, regardless of what the "main" or head
category of the phrase is.
Roughly, the underlying principles that determine how the
ingredients of a noun phrase, verb phrase, prepositional
phrase (etc.) interact are identical.
Moreover, these principles apply cross-linguistically.
In English the head of a phrase (e.g. a verb or a preposition)
comes before its argument (e.g. an object), whereas in
Japanese, the head follows its argument.
A Very Brief Linguistic History of Modern Syntax III:
Principles and parameters (Chomsky, 1981)
i. X-bar theory
ii. Movement transformations (and of course the notions of
base structure and surface structure)
iii. Natural languages are assumed to share the same innate
universal principles governing i and ii, and to differ only
with respect to their lexicons and the settings of a finite
number of parameters.
Universal Grammar (UG) = principles and parameters.
A natural language (human) grammar = a lexicon and an array
of settings (or values) of the parameters + UG.
Principles and parameters (cont'd)
For example: all languages have subjects of some sort.

Null Subject Parameter (are overt subjects optional?):
on (e.g. Spanish)
off (e.g. English)
Language acquisition is the process of selecting the correct
value of each such parameter for the language the learner
is exposed to.
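
To make this concrete, here is a minimal Python sketch of the
P&P view of a grammar as a finite vector of binary parameter
settings (the two-parameter inventory and its names are
illustrative assumptions, not the inventory proposed in the
literature):

PARAMETERS = ["null_subject", "head_initial"]  # hypothetical inventory

# A grammar (setting aside its lexicon) is one value per parameter.
english  = {"null_subject": False, "head_initial": True}
spanish  = {"null_subject": True,  "head_initial": True}
japanese = {"null_subject": True,  "head_initial": False}

# With n binary parameters the hypothesis space is finite: 2**n grammars.
n = len(PARAMETERS)
print(f"{n} binary parameters -> {2 ** n} candidate grammars")

On this view, acquisition reduces to search through a finite
space: the learner must fix one value per parameter.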
A bit of review from the first lecture:
Why computationally model natural language
acquisition?
Pinker (1979) :
"...it may be necessary to find out how language learning
could work in order for the developmental data to tell
us how is does work." [emphasis mine]
Learnability: Is the learner guaranteed to converge on
the target grammar for every language in a given
domain?
Gold (1967), Wexler and Culicover (1980), Gibson & Wexler (1994), Kanazawa (1994)
An early learnability result (Gold, 1967):
Exposed to (positive) input strings of an arbitrary target
language generated by grammar Gtarg, no learner can be
guaranteed to converge on Gtarg if Gtarg is drawn from any
class in the Chomsky hierarchy (e.g. the context-free
grammars).
Gold’s result is sometimes taken to be strong
evidence for a nativist Universal Grammar.
1) Psycholinguistic research indicates that children learn
grammar based on positive exemplar sentences.
2) Gold proves that Greg, Gcfg, Gcs, and Gre (the regular,
context-free, context-sensitive, and recursively enumerable
classes) can't be learned this way.
Conclude:
some grammatical competence must be in place before
learning commences.
Gold's result is often misapplied, but it has generated much
discussion.
Back to new stuff: Another Learnability result:
All classes of grammars possible within the principles
and parameters framework are learnable because each
class is of finite size.
In fact, a simple Blind Guess Learner is guaranteed to
succeed in the long run for any finite class of grammars.
But is the following not-so-blind learner (it has a parse
test) guaranteed to converge on all possible targets in a
P&P domain?
Error-driven Blind-Guess Learner (no oracle):
1. Randomly hypothesize a current grammar.
2. Consume and attempt to parse a sentence from the
linguistic environment.
3. If the sentence is parsable by the current grammar, go
to 2. Otherwise go to 1.
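
A minimal Python sketch of this loop, assuming the modeler
supplies a finite list of grammars, a sentence source
environment(), and a boolean parse test parses(grammar,
sentence) (all three names are placeholders); a cap on
sentences stands in for "in the limit" convergence:

import random

def blind_guess_learner(grammars, environment, parses, max_sentences=100_000):
    current = random.choice(grammars)          # 1. random initial hypothesis
    for _ in range(max_sentences):             # cap only so the sketch halts
        sentence = environment()               # 2. consume a sentence
        if not parses(current, sentence):      # 3. parse failure is the only
            current = random.choice(grammars)  #    trigger for a new guess
    return current

Note that the target grammar, once guessed, parses every
incoming sentence and so is never abandoned, while any wrong
grammar eventually fails on some sentence (assuming every
sentence of the target appears with nonzero probability).
Hence for a finite class the learner converges with
probability 1 in the limit.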
Feasibility: Is acquisition possible within a reasonable
amount of time and/or with a reasonable amount
of work?
Clark (1994, in press), Niyogi and Berwick (1996), Lightfoot
(1989) (degree-0), Sakas (2000), Tesar and Smolensky (1996),
and many PAC results concerning induction of FSAs
Feasibility measure (Sakas and Fodor, 2001)
Near-linear growth, as the domain of grammars grows, of the
expected number of sentences consumed before a learner
converges on the target grammar.
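
To see why such a measure matters, here is a toy Monte Carlo
sketch (every modeling choice, including the overlap
probability p_overlap and treating grammars as bare indices,
is an assumption for illustration only). It estimates how
many sentences the error-driven blind-guess learner above
consumes before converging, as the number of parameters grows:

import random

def sentences_until_convergence(num_grammars, p_overlap=0.5, rng=random):
    # Toy model: grammars are indices 0..num_grammars-1; index 0 is
    # the target.  A wrong grammar parses a random target sentence
    # with probability p_overlap; the target parses everything, so
    # once guessed it is never abandoned.
    current = rng.randrange(num_grammars)
    consumed = 0
    while current != 0:
        consumed += 1                      # consume one sentence
        if rng.random() >= p_overlap:      # failure (prob. 1 - p_overlap)
            current = rng.randrange(num_grammars)  # blind re-guess
    return consumed

for n_params in range(1, 9):
    num_grammars = 2 ** n_params           # finite P&P-style domain
    runs = [sentences_until_convergence(num_grammars) for _ in range(2000)]
    print(n_params, num_grammars, sum(runs) / len(runs))

Under these toy assumptions, expected consumption grows
roughly linearly in the number of grammars, i.e. exponentially
in the number of parameters: the class is learnable, but a
blind learner is not feasible, which is exactly the gap a
feasibility measure is meant to expose.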