Class slides 2


Parameter Setting

Compounding Parameter
[Table cross-classifying language families by availability of resultatives and of productive N-N compounds: American Sign Language; Austroasiatic (Khmer); Finno-Ugric; Germanic (German, English); Japanese-Korean; Sino-Tibetan (Mandarin); Tai (Thai); Basque; Afroasiatic (Arabic, Hebrew); Austronesian (Javanese); Bantu (Lingala); Romance (French, Spanish); Slavic (Russian, Serbo-Croatian)]
Developmental Evidence
• Complex predicate properties argued to appear as a group in English children's spontaneous speech (Stromswold & Snyder 1997)
• Appearance of N-N compounding is a good predictor of appearance of verb particle constructions and other complex predicate constructions, even after partialing out contributions of:
– Age of reaching MLU 2.5
– Production of lexical N-N compounds
– Production of adjective-noun combinations
• Correlations are remarkably good
Sample Learning Problems
1. Null-subject parameter
2. V2
3. Long-distance reflexives
4. Scope inversion
5. Argument structure alternations
6. Wh-scope marking
7. Complex predicates/noun-noun compounding
8. Condition C vs. language-specific constraints
9. One-substitution
10. Preposition stranding
11. Disjunction
12. Subjacency parameter
etc.
Classic Parameter Setting
• Language learning is making pre-determined choices based on
‘triggers’ whose form is known in advance
• Challenge I: encoding and identifying reliable triggers
• Challenge II: overgeneralization
• Challenge III: lexically bound parameters
The Concern…
• “From its inception, UG has been regarded as that which makes
acquisition possible. But for lack of a thriving UG-based account
of acquisition, UG has come to be regarded instead as an
irrelevance or even an impediment. It is clearly open to the
taunt: ‘All that innate knowledge, only a few facts to learn, yet
you can’t say how!’” (Fodor & Sakas, 2004, p.8)
[Diagram: learning loop: Input (utterance, corpus) → Analyze Input → Action]
Gibson & Wexler (1994)
• Triggering Learning Algorithm
– Learner starts with random set of parameter values
– For each sentence, attempts to parse sentence using current
settings
– If parse fails using current settings, change one parameter value
and attempt re-parsing
– If re-parsing succeeds, change grammar to new parameter setting
[Diagram: a 2×2 space of parameter values (A: ±, B: ±), with the current grammar state Si]
– Single Value Constraint: at most one parameter value is changed per input sentence
– Greediness Constraint: the new setting is adopted only if re-parsing with it succeeds (see the sketch below)
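A minimal sketch of the Triggering Learning Algorithm as described on this slide, in Python. The grammar representation (a tuple of binary parameter values) and the `parses(grammar, sentence)` oracle are assumptions for illustration, not part of Gibson & Wexler's presentation.

```python
import random

def tla_step(grammar, sentence, parses):
    """One TLA update. `grammar` is a tuple of binary parameter values;
    `parses(grammar, sentence)` is an assumed oracle returning True if the
    grammar can parse the sentence."""
    if parses(grammar, sentence):
        return grammar                      # parse succeeds: keep current settings
    i = random.randrange(len(grammar))      # Single Value Constraint: flip one value
    candidate = tuple(1 - v if j == i else v for j, v in enumerate(grammar))
    if parses(candidate, sentence):         # Greediness: adopt only if re-parsing succeeds
        return candidate
    return grammar

def tla(sentences, parses, n_params=3, steps=10000):
    """Run the TLA from a random starting point over a stream of input sentences."""
    grammar = tuple(random.randint(0, 1) for _ in range(n_params))
    for _ in range(steps):
        grammar = tla_step(grammar, random.choice(sentences), parses)
    return grammar
```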
Gibson & Wexler (1994)
• For an extremely simple 2-parameter space, the
learning task is easy - any starting point, any
destination
• Triggers do not really exist in this model
[Diagram: the 2-parameter word-order space (VO/OV × SV/VS), yielding SVO, VOS, SOV, OVS]
Gibson & Wexler (1994)
• Extending the space to 3-parameters
– There are non-adjacent grammars
– There are local maxima, where current grammar and all
neighbors fail
[Diagram: the 3-parameter space (VO/OV × SV/VS × ±V2); grammars SVO, VOS, SOV, OVS in +V2 and -V2 variants]
Gibson & Wexler (1994)
• Extending the space to 3-parameters
– There are non-adjacent grammars
– There are local maxima, where current grammar and all
neighbors fail
[Diagram: the same 3-parameter space (VO/OV × SV/VS × ±V2)]
String: Adv S V O
Gibson & Wexler (1994)
• Extending the space to 3-parameters
– There are non-adjacent grammars
– There are local maxima, where current grammar and all
neighbors fail
– All local maxima involve impossibility of retracting a +V2
hypothesis
[Diagram: the same 3-parameter space (VO/OV × SV/VS × ±V2)]
String: Adv S V O
Input
utterance
corpus
Analyze Input
Action
Sample Learning Problems
1. Null-subject parameter
2. V2
3. Long-distance reflexives
4. Scope inversion
5. Argument structure alternations
6. Wh-scope marking
7. Complex predicates/noun-noun compounding
8. Condition C vs. language-specific constraints
9. One-substitution
10. Preposition stranding
11. Disjunction
12. Subjacency parameter
etc.
Gibson & Wexler (1994)
• Solutions to local maxima problem
– #1: Initial state is V2: default to [-V2]; S: unset; O: unset
– #2: Extrinsic ordering
[Diagram: the 3-parameter space (VO/OV × SV/VS × ±V2)]
String: Adv S V O
Fodor (1998)
• Unambiguous Triggers
– Local maxima in TLA result from the use of ‘ambiguous triggers’
– If learning only occurs based on unambiguous triggers, local
maxima should be avoided
• Difficulties
– How to identify unambiguous triggers?
– An unambiguous trigger can be parsed only by grammars that include value Pi of parameter P, and by no grammar that includes value Pj.
– A parameter space with 20 binary parameters implies 2^20 (over a million) possible parses for any sentence.
Fodor (1998)
• Ambiguous Trigger
– SVO can be analyzed by at least 5 of the 8 grammars in
G&W’s parameter space



[Diagram: the eight grammars of G&W's space (SVO, VOS, SOV, OVS × ±V2), highlighting those compatible with an SVO string]
Fodor (1998)
• Structural Triggers Learner (STL)
– Parameters are treelets
– Learner attempts to parse input sentences using a supergrammar, which contains treelets for all values of all unset parameters, e.g., 40 treelets for 20 unset binary parameters.
– Algorithm
• #1: Adopt a parameter value/trigger structure if and only if it occurs as a part of
every complete well-formed phrase marker assigned to an input sentence by the
parser using the supergrammar.
• #2: Adopt a parameter value/trigger structure if and only if it occurs as a part of a
unique complete well-formed phrase marker assigned to the input by the parser
using the supergrammar.
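A sketch, in Python, of the two adoption rules just listed, assuming an external parser that returns, for each complete well-formed phrase marker the supergrammar assigns to the input, the set of parameter-value treelets it uses (that parser is the hard part, and is not shown here).

```python
def stl_update(adopted, parse_treelet_sets, variant=1):
    """Structural Triggers Learner update (sketch).
    `parse_treelet_sets`: a list with one set of treelets per complete
    well-formed parse of the current input under the supergrammar.
    Variant 1: adopt treelets occurring in every parse.
    Variant 2 (read here as): adopt treelets only when the parse is unique."""
    if not parse_treelet_sets:
        return adopted                          # unparsable input: no update
    if variant == 1:
        return adopted | set.intersection(*parse_treelet_sets)
    if variant == 2 and len(parse_treelet_sets) == 1:
        return adopted | parse_treelet_sets[0]
    return adopted                              # ambiguous input treated as uninformative
```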
Fodor (1998)
• Structural Triggers Learner
– If a sentence is structurally ambiguous, it is taken to be
uninformative (slightly wasteful, but conservative)
– Unable to take advantage of collectively unambiguous sets of
sentences, e.g. SVO and OVS, which entail [+V2]
– Still unclear (to me) how it manages its parsing task
[Diagram: learning loop: Input (utterance or corpus; grammar or supergrammar) → Generate Parses → Select Parses → Action]
Charles Yang
Competition Model
• 2 grammars - start with even strength, both get credit for
success, one gets punished for failure
• Each grammar is chosen for parsing/production as a function of
its current strength
• Must be that increasing Pi for one grammar decreases Pj for
other grammars
• Is it the case that the presence of some punishment will
guarantee that a grammar will, over time, always fail to survive?
Competition Model
• Upon the presence of an input datum s, the child
– Selects a grammar Gi with the probability pi.
– Analyzes s with Gi.
– Updates competition
• If successful, reward Gi by increasing pi.
• Otherwise, punish Gi by decreasing pi.
• This implies that change only occurs when a selected grammar
succeeds or fails
Competition Model
• Linear reward-penalty scheme (LR-P, Bush & Mosteller, 1951)
– Given an input sentence s, the learner selects a grammar Gi with
probability pi, from the population of N possible grammars.
– If Gi --> s then
• p’i = pi + γ(1 - pi)
• p’j = (1 - γ)pj, for all j ≠ i
– If Gi -/-> s then
• p’i = (1 - γ)pi
• p’j = γ/(N - 1) + (1 - γ)pj, for all j ≠ i
(γ is the learning rate)
Competition Model
• Linear reward-penalty scheme (LR-P, Bush & Mosteller, 1951)
– Given an input sentence s, the learner selects a grammar Gi with
probability pi, from the population of N possible grammars.
– If Gi --> s then
• p’i = pi + γ(1 - pi)
• p’j = (1 - γ)pj, for all j ≠ i
– If Gi -/-> s then
• p’i = (1 - γ)pi
• p’j = γ/(N - 1) + (1 - γ)pj, for all j ≠ i
The value p is a probability for the
entire grammar.
This rule suggests that all grammars
are affected on each trial, not only
the grammar that is currently being
tested. Other discussions in the book
do not clearly make reference to this.
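A sketch of the linear reward-penalty update in Python, following the formulas above (γ is the learning rate; `can_parse` is an assumed stand-in for 'grammar Gi analyzes sentence s').

```python
import random

def lrp_update(p, i, success, gamma=0.01):
    """One L_R-P step over the grammar probabilities p, where grammar i was
    the one selected on this trial."""
    N = len(p)
    new_p = list(p)
    for j in range(N):
        if success:                                  # Gi --> s: reward Gi
            new_p[j] = p[j] + gamma * (1 - p[j]) if j == i else (1 - gamma) * p[j]
        else:                                        # Gi -/-> s: punish Gi
            new_p[j] = (1 - gamma) * p[j] if j == i else gamma / (N - 1) + (1 - gamma) * p[j]
    return new_p

def learn(sentences, can_parse, N, steps=50000, gamma=0.01):
    """Competition over N grammars, starting with even strength."""
    p = [1.0 / N] * N
    for _ in range(steps):
        s = random.choice(sentences)
        i = random.choices(range(N), weights=p)[0]   # select Gi with probability pi
        p = lrp_update(p, i, can_parse(i, s), gamma)
    return p
```

Note that the update touches every grammar's probability on each trial, which is the point made in the annotation above.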
From Grammars to
Parameters
• Number of Grammars problem
– Space with n parameters implies at least 2^n grammars (e.g., 2^40 is ~1 trillion)
– Only one grammar is used at a time, so implies very slow
convergence
• Competition among Parameter Values
– How does this work?
• Each trial involves selection of a vector of parameters
[0, 1, 1, 0, 0, 1, 1, 1, …]
• Success or failure rewards/punishes all parameters, regardless
of their complicity in the outcome of the trial
– Naïve Parameter Learning model (NPL) may reward incorrect parameter values as hitchhikers, or punish correct parameter values as accomplices (sketched below)
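A sketch of the parameter-based version (Naïve Parameter Learning), assuming each parameter keeps an independent weight for its [1] value; the representation is illustrative, not Yang's own notation.

```python
import random

def sample_grammar(weights):
    """Sample a parameter vector; weights[k] is the current P(parameter k = 1)."""
    return [1 if random.random() < w else 0 for w in weights]

def npl_update(weights, grammar, success, gamma=0.02):
    """NPL update (sketch): every sampled value is rewarded or punished together,
    regardless of whether it mattered for this sentence, so incorrect values can
    be rewarded as hitchhikers and correct values punished as accomplices."""
    new_w = []
    for w, v in zip(weights, grammar):
        p_v = w if v == 1 else 1 - w                       # strength of the value used
        p_v = p_v + gamma * (1 - p_v) if success else (1 - gamma) * p_v
        new_w.append(p_v if v == 1 else 1 - p_v)           # store back as P(parameter = 1)
    return new_w
```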
Avoiding Accomplices
• How could ill-placed reward/punishment be avoided?
– Identify which parameters are responsible for success/failure on a
given trial
– Parameters associated with lexical items/treelets
Empirical Predictions
• Hypothesis
Time to settle upon target grammar is a function of
the frequency of sentences that punish the
competitor grammars
• First-pass assumptions
– Learning rate set low, so many occurrences needed to lead
to decisive changes
– Similar amount of input needed to eliminate all competitors
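A toy simulation of this hypothesis under the L_R-P rule, with one target grammar and one competitor that is punished by a fraction `punish_freq` of input sentences; the frequency values below echo the estimates on the surrounding slides (~30% wh-questions, ~7% verb-raising evidence, ~1.2% expletives) and are illustrative only.

```python
import random

def time_to_converge(punish_freq, gamma=0.01, threshold=0.99, max_steps=10**6):
    """Return the number of input sentences until P(target) exceeds threshold.
    Grammar 0 is the target and always succeeds; the competitor fails on a
    `punish_freq` proportion of sentences."""
    p = [0.5, 0.5]
    for t in range(1, max_steps + 1):
        i = 0 if random.random() < p[0] else 1        # select a grammar by strength
        punishing = random.random() < punish_freq
        success = (i == 0) or not punishing
        if success:
            p[i] = p[i] + gamma * (1 - p[i])
            p[1 - i] = (1 - gamma) * p[1 - i]
        else:
            p[i] = (1 - gamma) * p[i]
            p[1 - i] = gamma + (1 - gamma) * p[1 - i]  # N = 2, so gamma/(N-1) = gamma
        if p[0] > threshold:
            return t
    return max_steps

for f in (0.30, 0.07, 0.012):
    print(f, time_to_converge(f))   # higher punishing frequency -> earlier setting
```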
Empirical Predictions
• ±wh-movement
– Any occurrence of overt wh-movement punishes a [-wh-mvt]
grammar
– Wh-questions are highly frequent in input to English-speaking
children (~30% estimate!)
– [±wh-mvt] parameter should be set very early
– This applies to the clear-cut contrast between English and Chinese, but …
• French: [+wh-movement] and lots of wh-in-situ
• Japanese: [-wh-movement] plus scrambling
Empirical Predictions
• Verb-raising
– Reported to be set accurately in speech of French children
(Pierce, 1992)
French: Two Verb Positions
a. Il ne voit pas le canard
he sees not the duck
b. *Il ne pas voit le canard
he not sees the duck
c. *Il veut ne voir pas le canard
he wants to.see not the duck
d. Il veut ne pas voir le canard
he wants not to.see the duck
French: Two Verb Positions
a. Il ne voit pas le canard
he sees not the duck
b. *Il ne pas voit le canard
he not sees the duck
c. *Il veut ne voir pas le canard
he wants to.see not the duck
d. Il veut ne pas voir le canard
he wants not to.see the duck
agreeing (i.e. finite) forms precede pas;
non-agreeing (i.e. infinitive) forms follow pas
French Children’s Speech
• Verb forms: correct
or default (infinitive)
• Verb position
changes with verb
form
finite: 127   infinitive: 119
• Just like adults
(Pierce, 1992)
French Children’s Speech
• Verb forms: correct
or default (infinitive)
• Verb position
changes with verb
form
V-neg: 122   neg-V: 124
• Just like adults
(Pierce, 1992)
French Children’s Speech
• Verb forms: correct
or default (infinitive)
• Verb position
changes with verb
form
        finite   infinitive
V-neg     121          1
neg-V       6        118
• Just like adults
(Pierce, 1992)
Empirical Predictions
• Verb-raising
– Reported to be set accurately in speech of French children (Pierce,
1992)
– Crucial evidence
… verb neg/adv …
– Estimated frequency in adult speech: ~7%
– This frequency is taken as an operational definition of 'sufficiently frequent' for early mastery (early 2s)
Empirical Predictions
• Verb-second
– Classic argument: V2 is mastered very early by
German/Dutch-speaking children (Poeppel & Wexler, 1993; Haegeman, 1995)
– Yang’s challenges
• Crucial input is infrequent
• Claims of early mastery are exaggerated
Two Verb Positions
a. Ich sah den Mann
I saw the man
b. Den Mann sah ich
the man saw I
c. Ich will [den Mann sehen]
I want the man to.see[inf]
d. Den Mann will ich [sehen]
the man want I to.see[inf]
Two Verb Positions
a. Ich sah den Mann
I saw the man
b. Den Mann sah ich
the man saw I
c. Ich will [den Mann sehen]
I want the man to.see[inf]
d. Den Mann will ich [sehen]
the man want I to.see[inf]
agreeing (i.e. finite) verbs appear in second position;
non-agreeing (i.e. infinitive) verbs appear in final position
German Children’s Speech
• Verb forms: correct
or default (infinitive)
• Verb position
changes with verb
form
finite: 208   infinitive: 43
• Just like adults
Andreas, age 2;2
(Poeppel & Wexler, 1993)
German Children’s Speech
• Verb forms: correct
or default (infinitive)
• Verb position
changes with verb
form
V-2: 203   V-final: 48
• Just like adults
Andreas, age 2;2
(Poeppel & Wexler, 1993)
German Children’s Speech
• Verb forms: correct
or default (infinitive)
• Verb position
changes with verb
form
          finite   infinitive
V-2         197          6
V-final      11         37
• Just like adults
Andreas, age 2;2
(Poeppel & Wexler, 1993)
Empirical Predictions
• Cross-language word orders
– Dutch (#1): SVO, XVSO, OVS
– Hebrew (#2): SVO, XVSO, VSO
– English (#3): SVO, XSVO
– Irish (#4): VSO, XVSO
– Hixkaryana (#5): OVS, XOVS
– Order of elimination
• Frequent SVO input quickly eliminates #4 and #5
• Relatively frequent XVSO input eliminates #3
• OVS is needed to eliminate #2 - only ~1.3% of input
Empirical Predictions
• But what about the classic findings by Poeppel & Wexler (etc.)?
– They show mastery of V-raising, not mastery of V-2
– Yang argues that early Dutch shows lots of V-1 sentences, due to
the presence of a Hebrew grammar (based on Hein corpus)
e.g. weet ik niet [know I not]
Empirical Predictions
• Early Argument Drop
– Resuscitates idea that early argument omission in English is
due to mis-set parameter
– Overt expletive subjects (‘there’) ~1.2% frequency in input
Null Subjects
• Child English
– Eat cookie.
• Hyams (1986)
– English children have an Italian setting of null-subject
parameter
– Trigger for change: expletive subjects
• Valian (1991)
– Usage of English children is different from Italian children
(proportion)
• Wang (1992)
– Usage of English children is different from Chinese children
(null objects)
Empirical Predictions
• Early Argument Drop
– Resuscitates idea that early argument omission in English is due to
mis-set parameter
– Wang et al. (1992) argument about Chinese was based on
mismatch in absolute frequencies between Chinese & English
learners
– Yang: if incorrect grammar is used probabilistically, then absolute
frequency match not expected - rather, ratios should match
– Ratio of null-subjects and null-objects is similar in Chinese and
English learners
– Like Chinese, English learners do not produce wh-obj pro V?
Empirical Predictions
• Null Subject Parameter Setting
– Italian environment
• English [- null subject] setting killed off early, due to presence of
large amount of contradictory input
• Italian children should exhibit an adultlike profile very early
– English environment
• Italian [+ null subject] setting killed off more slowly, since
contradictory input is much rarer (expletive subjects)
• The fact that null subjects are rare in the input seems to play no
role
Poverty of the Stimulus
• Structure-dependent auxiliary fronting
– Is [the man who is sleeping] __ going to make it to class?
– *Is [the man who __ sleeping] is going to make it to class?
• Pullum: relevant positive examples exist (in the Wall Street
Journal)
• Yang: even if they do exist, they’re not frequent enough to
account for mastery by children aged 3;2 in Crain & Nakayama’s
experiments (1987)
Other Parameters
• Noun compounding/complex predicates
– English
• Novel N-N compounds punish Romance-style grammars
• Simple data can lead to elimination of competitors
– Spanish
• Lack of productive N-N compounds is irrelevant
• Lack of complex predicate constructions is irrelevant
• How can the English (superset) grammar be excluded?
• Subset problem
Subset Problem
• Subset problem is serious: all grammars are
assumed to be present from the start
– How can the survivor model avoid the subset problem?
[Diagram: language Li nested inside superset language Lj]
Other Parameters
• P-stranding Parameter
– English
• Who are you talking with?
• Positive examples of P-stranding punish non P-stranding
grammars
– Spanish
• ***Quién hablas con?
• Non-occurrence of P-stranding does not punish P-stranding
grammars
• Could anything be learned from the consistent use of pied-piping, or from the absence of P-stranding?
Other Parameters
• Locative verbs & Verb compounding
– John poured the water into the glass.
*John poured the glass with water.
– (*)John filled the water into the glass. <-- ok if [+V compounding]
John filled the glass with water.
– English
• Absence of V-compounding is irrelevant
• Simple examples above do not punish Korean grammar (superset)
• Korean grammar may be punished by more liberal properties elsewhere,
e.g. pile the table with books.
– Korean
• Occurrence of ground verbs in figure frame punishes English grammar
• Occurrence of V-compounds punishes English grammar
Other Parameters
• “Or as PPI” Parameter (Takuya)
– John didn’t eat apples or oranges
– English
• Neither…nor reading punishes Japanese grammar
– Japanese
• Examples of Japanese reading punish English???
Other Parameters
• Classic Subjacency Parameter (Rizzi, 1982)
– English: *What do you know whether John likes ___?
– Italian: ok
– Analysis: Bounding nodes are (i) NP, (ii) CP (It.) / IP (Eng.)
– English
• Input regarding subjacency is consistent with Italian grammar
– Italian
• If wh-island violations occur, this punishes English grammar
• Worry: production processes in English give rise to non-trivial numbers
of wh-island violations.
Triggers
• Triggers
– Unambiguous triggers - if they exist - do not have the role of identifying the
winner as much as punishing the losers
– Distributed triggers - target grammar is identified by conjunction of two
different properties, neither of which is sufficient on its own
• Difficult for Fodor and for Gibson/Wexler - no memory
• Survivor model: also no memory, but distributed trigger works by never punishing
the winner, but separately punishing the losers
[Diagram: learning loop: Input (utterance, corpus) → Analyze Input → Action]
[Diagram: learning loop: Input (utterance or corpus; grammar or supergrammar) → Generate Parses → Select Parses → Action]
[Diagram: learning loop: Input (utterance or corpus; grammar or supergrammar) → Generate Parses → Select Parses (Success / Failure) → Action]
Overgeneralization
• When two values of a parameter both allow analysis of an input
utterance, overgeneralization is a problem.
• Presentation of sentences from a subset grammar will not
punish a superset grammar
Lexical Learning
• Alternative subcategorizations
– John believes Mary.
– John believes that Mary is a spy.
– John understands Mary.
– John understands that Mary is a spy.
– *John hopes Mary.
– John hopes that Mary is not a spy.
Lexical Learning
• It is easy to update lexical subcategorization frequencies because the alternatives are mutually incompatible (see the sketch after this slide)
[but this doesn’t solve the argument structure learning problem entirely]
• This is harder for choices that stand in a superset/subset
relation, since the alternatives are not mutually incompatible
• Can parametric choices be characterized in a similar way?
• Utterance-based learning
– General difficulty in assessing informativeness of input
– Distributional information can be tied to units that are carried
forward in time: lexical items, parameters
– Distributional information can be used reliably when alternatives are
mutually incompatible
• Does this situation change if we move to corpus-based
learning?
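A minimal sketch of the frequency updating just described, using made-up frame labels; each parsed token supplies unambiguous evidence for exactly one frame, which is what makes the counts easy to maintain.

```python
from collections import defaultdict

# Counts of subcategorization frames per verb. The frames of a single token
# are mutually incompatible, so each observation updates exactly one cell.
subcat_counts = defaultdict(lambda: defaultdict(int))

def observe(verb, frame):
    """Record one parsed occurrence, e.g. observe('believe', 'NP')
    or observe('believe', 'that-S')."""
    subcat_counts[verb][frame] += 1

def frame_probs(verb):
    """Relative frequencies of the frames observed so far for `verb`."""
    total = sum(subcat_counts[verb].values())
    return {f: c / total for f, c in subcat_counts[verb].items()} if total else {}

observe('believe', 'NP'); observe('believe', 'that-S'); observe('hope', 'that-S')
print(frame_probs('believe'))   # {'NP': 0.5, 'that-S': 0.5}
```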
[Diagram: learning loop: Input (utterance, corpus) → Analyze Input → Action]
[Summary diagram contrasting the models at each stage]
– Input: utterance vs. corpus; parse vs. trigger
– Analyze Input: generate vs. select; grammar vs. supergrammar; success vs. failure
– Action: reward/punish vs. nothing; informed update; overgeneralization; mutual incompatibility
Questions about Survivor Model
• How to address the Subset Problem?
• How to address the Hitchhiker/Accomplice problem, i.e., improve blame-assignment?
• How to use a realistic model of cross-language
variation?
• Better empirical evidence for multiple grammars in
child speech
Subset Problem
• Problem
– If a [+P] grammar generates a superset of what a [-P] grammar generates, then the [+P] grammar is never punished by presentation of sentences from [-P]
– The fact that the extra sentences generated by [+P] never occur plays no role
• Analysis-by-synthesis (sketched below)
– “How would I have said that?”
– If a comprehender takes the input sentence [meaning] and uses the grammar to re-generate the sentence, feedback is available from the (mis-)match with the original sentence
– If the speaker generates the sentence with the superset [+P] grammar, then the mismatch can be used to punish [+P]
– Generation using the [-P] grammar will never be punished
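A sketch of the analysis-by-synthesis move in the same Python setting as above; the `comprehend` and `generate` functions are assumed placeholders ('how would I have said that?'), not something specified on the slide.

```python
def abs_update(p, i, sentence, comprehend, generate, gamma=0.01):
    """Subset-problem sketch: after understanding `sentence` with grammar i,
    re-generate it from the recovered meaning. A superset [+P] grammar may
    produce a different string for the same meaning and so gets punished;
    a [-P] grammar that reproduces the input is rewarded."""
    meaning = comprehend(i, sentence)           # assumed: sentence -> meaning under Gi
    match = generate(i, meaning) == sentence    # assumed: meaning -> preferred string under Gi
    N = len(p)
    new_p = list(p)
    for j in range(N):
        if match:
            new_p[j] = p[j] + gamma * (1 - p[j]) if j == i else (1 - gamma) * p[j]
        else:
            new_p[j] = (1 - gamma) * p[j] if j == i else gamma / (N - 1) + (1 - gamma) * p[j]
    return new_p
```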
Blame Assignment
• Problem
– All grammars are treated as vectors of parameter values
[0, 0, 1, 0, 1, 1, 1, 0, 1, …]
– Success or failure in parsing any individual trial punishes or
rewards all parameter values in the grammar, regardless of their
contribution to the success/failure (‘hitchhikers’, ‘accomplices’)
– How to update only the relevant parameter values?
• ‘Relevant’ parameter
– Fodor: only treelets that must be used are relevant on any trial
– Survivor model: reward/punish values of only those parameters (treelets) that can be used on any given trial (sketched below)
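A sketch of the survivor-model option just described, reusing the weight representation from the NPL sketch above; `relevant_params` stands in for the parser's record of which parameters (treelets) could actually be used on this trial.

```python
def targeted_update(weights, grammar, success, relevant_params, gamma=0.02):
    """Blame-assignment sketch: reward or punish only the parameter values whose
    treelets figured in this trial's parse attempt; all other weights, the
    would-be hitchhikers and accomplices, are left untouched."""
    new_w = list(weights)
    for k in relevant_params:
        v = grammar[k]
        p_v = weights[k] if v == 1 else 1 - weights[k]
        p_v = p_v + gamma * (1 - p_v) if success else (1 - gamma) * p_v
        new_w[k] = p_v if v == 1 else 1 - p_v
    return new_w
```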
Lexicalized Parameters
• Problem
– It is hard to characterize natural language grammars as a fixed-length vector of parameters [0, 0, 1, 0, 1, 1, 1, 0, 1, …]
– Variation is associated with specific lexical items (e.g., domains for anaphors), whose number varies across languages
• Answer
– Task is to learn properties of lexical items
– Reward/punish properties of lexical items used in the sentence only
• Challenge
– How to handle cases where items compete for realization, e.g., pronoun vs. anaphor?
Other Information Sources
• All parameter setting models
– Syntactic properties are learned only through success or failure in
parsing input sentences
• Non-syntactic cues (e.g., Pinker ‘semantic bootstrapping’)
– Semantic information cues syntactic properties of words
– ‘Linking rules’ for verb argument structure
e.g., [manner-of-motion] -> V NP(figure) PP(ground)
John poured water into the glass
*John poured the glass with water
Role of Probabilistic Information
• Probabilistic information has limited role
– It is used to predict the time-course of hypothesis evaluation
– It contributes to the ultimate likelihood of success in only a very
limited sense
– It does not contribute to the generation of hypotheses - these are
provided by a pre-given parameter space
– By gradually accumulating evidence for or against hypotheses, the
model becomes somewhat robust to noise
• One-grammar-at-a-time models
– Negative evidence (i.e. parse failure) has drastic effect
– Hard to track degree of confidence in a given hypothesis
– Therefore hard to protect against fragility
Statistics as Evidence or as Hypothesis
• Phonological learning
– Age 0-1: developmental change reflects tracking of surface
distribution of sounds?
– Age 1-2: [current frontier] developmental change involves
abstraction over a growing lexicon, leading to more efficient
representations
• Neural Networks etc.
– Statistical generalizations are the hypotheses
• Lexical Access
– Context guides selection among multiple lexical candidates
• Syntactic Learning
– Survivor model: statistics tracks accumulation of evidence for pre-given hypotheses