
Introduction to Computational Natural
Language Learning
Linguistics 79400 (Under: Topics in Natural Language Processing)
Computer Science 83000 (Under: Topics in Artificial Intelligence)
The Graduate School of the City University of New York
Fall 2001
William Gregory Sakas
Hunter College, Department of Computer Science
Graduate Center, PhD Programs in Computer Science and Linguistics
The City University of New York
1
Feasibility: Is acquisition possible within a reasonable
amount of time and/or with a reasonable amount
of work?
Clark (1994, in press), Niyogi and Berwick (1996), Lightfoot (1989) (degree-0), Sakas (2000),
Tesar and Smolensky (1996), and many PAC results concerning induction of FSAs
Feasibility measure (Sakas and Fodor, 2001):
near-linear increase of the expected number of
sentences consumed before a learner converges on
the target grammar.
2
[Figure: two plots of the number of sentences consumed before convergence on the target grammar (y-axis) against the number of parameters (x-axis, 0-20). Left panel, "Infeasible": y-axis runs to 35,000. Right panel, "Feasible": y-axis runs to 800. The x-axis stands for whatever feature of the learning model is to be tested as feasible or not feasible; this could be # of parameters, average sentence length, etc.]
3
A Feasibility Case Study:
A three-parameter domain (Gibson and Wexler, 1994)
SV / VS - subject precedes verb / verb precedes subject
VO / OV - verb precedes object / object precedes verb
+V2 / -V2 - verb or aux must / need not be in the second position in the sentence
Sentences are strings of the symbols: S, V, O1, O2, AUX, ADV
Allie will eat the birds → S AUX V O
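To make the size of the domain concrete, here is a minimal Python sketch (not part of the original slides) that enumerates the 2^3 = 8 grammars of this domain as parameter-setting triples:

```python
from itertools import product

# The three binary parameters of the Gibson & Wexler (1994) domain.
PARAMETERS = [
    ("SV", "VS"),    # subject/verb order
    ("VO", "OV"),    # verb/object order
    ("-V2", "+V2"),  # verb-second requirement
]

# Every grammar is one choice of setting per parameter: 2^3 = 8 grammars.
grammars = list(product(*PARAMETERS))
for g in grammars:
    print(" ".join(g))
print(f"{len(grammars)} grammars in the domain")
```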
4
Surprisingly, G&W's simple 3-parameter domain
presents nontrivial obstacles to several learning
strategies, but the space is ultimately learnable
(for example by a "blind-guess learner" - see next slide).
Big question:
How will the learning process scale up in terms of
feasibility as the number of parameters increases?
Two problems for most acquisition strategies:
1) Ambiguity
2) Size of the domain
5
"Size of the domain" feasibility result:
A Blind-guess learner succeeds only after consuming
a number of sentences exponentially correlated with the
number of parameters.
Blind-guess learner (with oracle, no parsing, no error-driven constraint):
1)
consume a sentence from the linguistic environment.
2)
randomly select a setting for each parameter
== randomly pick a grammar out of the domain.
3)
if the settings indicate the target grammar stop, otherwise goto 1.
A simple math result shows that the average number of trials required to
achieve success* (in this case, picking the target grammar) is:
1 / Probability of success.
So if we can figure out the probability of picking the target grammar at step
3, we can model how many inputs the learner will require on average.
* given that each trial has a possible outcome of either success or failure.
For those technically minded, this is a geometric distribution.
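As a sanity check on the 1/p result, here is a small Python simulation (mine, not from the slides) of the blind-guess learner: it draws a grammar uniformly at random from the 2^n grammars on each input sentence and counts trials until it hits the target. The observed mean should hover around 2^n.

```python
import random

def blind_guess_trials(n_params: int, target: int) -> int:
    """Count trials until a uniformly random guess hits the target grammar."""
    trials = 0
    while True:
        trials += 1
        guess = random.randrange(2 ** n_params)  # random setting for each parameter
        if guess == target:
            return trials

n = 10  # keep n small; expected trials grow as 2^n
target = random.randrange(2 ** n)
runs = [blind_guess_trials(n, target) for _ in range(2000)]
print(f"mean trials: {sum(runs) / len(runs):.1f}  (theory: {2 ** n})")
```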
6
Probability of success == probability of picking the target
grammar at step 3, calculated as:
Pr(Success) = 1 / (number of grammars)
            = 1 / 2^(number of parameters)
Applying the math result for the number of trials on average:
1 / (1 / 2^(number of parameters)) = 2^(number of parameters)
If # parameters = 30,
then # grammars = 2^30 = 1,073,741,824.
So, on average, over a billion sentences
would need to be consumed by the blind-guess
learner before attaining the target.
The search space is huge!
7
Cross-language ambiguity

SV VO -V2 (English-like):
S V
S V O
S V O1 O2
S AUX V
S AUX V O
S AUX V O1 O2
ADV S V
ADV S V O
ADV S V O1 O2
ADV S AUX V
ADV S AUX V O
ADV S AUX V O1 O2

SV OV +V2 (German-like):
ADV V S
ADV V S O
ADV V S O2 O1
ADV AUX S V
ADV AUX S O V
ADV AUX S O2 O1 V
S V
S V O
O V S
S V O2 O1
O1 V S O2
O2 V S O1
S AUX V
S AUX O V
O AUX S V
S AUX O2 O1 V
O1 AUX S O2 V
O2 AUX S O1 V

(Highlighting in the original slide indicates a few ambiguous strings: S V, S V O, and S AUX V occur in both languages.)
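The ambiguity is easy to verify mechanically: intersect the two string sets. A minimal Python sketch (the lists are transcribed from this slide):

```python
english_like = {
    "S V", "S V O", "S V O1 O2",
    "S AUX V", "S AUX V O", "S AUX V O1 O2",
    "ADV S V", "ADV S V O", "ADV S V O1 O2",
    "ADV S AUX V", "ADV S AUX V O", "ADV S AUX V O1 O2",
}
german_like = {
    "ADV V S", "ADV V S O", "ADV V S O2 O1",
    "ADV AUX S V", "ADV AUX S O V", "ADV AUX S O2 O1 V",
    "S V", "S V O", "O V S", "S V O2 O1", "O1 V S O2", "O2 V S O1",
    "S AUX V", "S AUX O V", "O AUX S V",
    "S AUX O2 O1 V", "O1 AUX S O2 V", "O2 AUX S O1 V",
}

# Strings licensed by both grammars are ambiguous between them.
print(sorted(english_like & german_like))  # ['S AUX V', 'S V', 'S V O']
```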
8
P&P acquisition:
How to obtain informative feasibility results
studying linguistically interesting domains with
cognitively plausible learning algorithms?
9
One approach -- simulations: create an input space for a
linguistically plausible domain.
(Briscoe (2000), Elman (1990, 1991, 1996), Yang (2000))
But this won't work for large domains.
So, how to answer questions of feasibility
as the number of grammars (exponentially)
scales up?
Answer:
introduce some formal notions in order to abstract
away from the specific linguistic content of the
input data.
10
A formal approach
1) formalize the learning process and
input space
2) use the formalization in a
probabilistic model to empirically test
the learner across a wide range of
learning scenarios
Gives general data on the expected
performance of acquisition algorithms.
Can answer the question:
Given learner L, if the input space
exhibits characteristics x, y and z, is
feasible learning possible?
11
The Triggering Learning Algorithm - TLA (Gibson and Wexler, 1994)
Searches the (huge) grammar space using local heuristics.

repeat until convergence:
    receive a string s from L(Gtarg)
    if s can be parsed by Gcurr, do nothing                [Error-driven]
    otherwise, pick a grammar that differs by
        one parameter value from the current grammar       [SVC]
    if this grammar parses the sentence,
        make it the current grammar, otherwise do nothing  [Greediness]
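Here is a minimal Python sketch of one TLA update step (my rendering, not Gibson and Wexler's code). The grammar is a bit-vector of parameter settings, and parses() is a hypothetical oracle for whether a grammar licenses a string:

```python
import random

def tla_step(g_curr, s, parses, n_params):
    """One error-driven, greedy TLA update on input string s."""
    if parses(g_curr, s):            # Error-driven: no error, no change
        return g_curr
    flip = random.randrange(n_params)
    g_new = g_curr ^ (1 << flip)     # SVC: candidate differs by exactly one parameter
    if parses(g_new, s):             # Greediness: adopt only if it parses s
        return g_new
    return g_curr                    # otherwise keep the current grammar
```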
12
TLA-minus (Sakas and Fodor, 2001)
A simplified TLA used to introduce a way of studying the effects of
cross-language ambiguity through some simple formulas.

repeat until convergence:
    receive a string s from L(Gtarg)
    if s can be parsed by Gcurr, do nothing
    otherwise, pick a grammar at random
        (not necessarily one parameter distant)
    if this grammar parses the sentence,
        make it the current grammar, otherwise do nothing
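Continuing the sketch from the TLA slide (same hypothetical parses() oracle and bit-vector grammars), TLA-minus changes only the guessing step: the candidate is drawn from the whole domain rather than from the one-parameter neighbors.

```python
import random

def tla_minus_step(g_curr, s, parses, n_params):
    """One TLA-minus update: error-driven and greedy, but the guess is unconstrained."""
    if parses(g_curr, s):                    # error-driven: no change without an error
        return g_curr
    g_new = random.randrange(2 ** n_params)  # any grammar in the domain (no SVC)
    if parses(g_new, s):                     # greedy: adopt only if it parses s
        return g_new
    return g_curr
```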
13
Some variables for TLA-minus analysis:
n = number of parameters in the domain
r = number of relevant parameters in the domain
a = number of parameters expressed ambiguously by an arbitrary
    input sentence

An example of n and r:
Suppose there are 5 parameters in the domain under study and the learner
is being exposed to a language for which either setting of parameter 5 (p5)
is "correct."
In this case n = 5 and r = 4, and
there are 2^n = 2^5 = 32 grammars in the domain, and
2^(n-r) = 2^(5-4) = 2^1 = 2 grammars that the learner can consider a successful
convergence.
14
Some variables for TLA-minus analysis (con't):
An example of a:
Suppose a sentence, s, is consumed by a learner, where a successful
parse can be achieved ONLY if p1 and p2 are set to 0 and
1, respectively, but the settings of p3 and p4 don't matter (and of
course, as we said previously, the setting of p5 doesn't matter for the entire target
language).
In this case a = 2. That is, 2 parameters are expressed ambiguously - the
settings of p3 and p4.
15
Some formulas for TLA-minus analysis:
Recall:
# grammars that make up the cluster of legitimate target grammars
= 2^(n-r)
# settings of the a ambiguously expressed parameters
= 2^a
Number of grammars that license an arbitrary sentence s:
= 2^(n-r) * 2^a
= 2^(a+n-r)
** Probability that an arbitrary sentence s from the target
language can be parsed by an arbitrary grammar: **
= (# grammars that can parse s) / (# grammars in the domain)
= (2^(n-r) * 2^a) / 2^n
= 2^(a-r)
16
Some formulas for TLA-minus analysis (con't):
Recall:
Number of grammars that license an arbitrary sentence s = 2^(a+n-r)
Probability that one of those grammars is in the 2^(n-r)-grammar target cluster:
= (# grammars that make up the cluster) /
  (# grammars that can parse the current input)
= 2^(n-r) / 2^(a+n-r)
= 1 / 2^a
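To check these formulas against the running example (n = 5, r = 4, a = 2 from the earlier slides), a small Python computation (mine, using the slide's notation):

```python
n, r, a = 5, 4, 2  # parameters, relevant parameters, ambiguously expressed parameters

target_cluster = 2 ** (n - r)        # grammars that count as success: 2
licensing = 2 ** (a + n - r)         # grammars that license the sentence: 8
pr_parse = licensing / 2 ** n        # Pr(arbitrary grammar parses s) = 2^(a-r) = 0.25
pr_hit = target_cluster / licensing  # Pr(licensing grammar is in target cluster) = 1/2^a

print(target_cluster, licensing, pr_parse, pr_hit)  # 2 8 0.25 0.25
```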
17
[Figure: a tree of choices and outcomes for the TLA-minus; blue is probability, red is action, black is outcome. Node labels include: "Pick G'"; "G' == Gtarget"; "Retain G' and get another sentence."; "Retain Gcurr and get another sentence."; "Sentence [...] and will be retained 'in the limit.' (This is due to the error-driven constraint)"; and "Forget about G'. Keep Gcurr (this is Greediness) and get another sentence."]
18