Class 7: Detecting Triggers - CUNY


Introduction to
Language Acquisition Theory
Janet Dean Fodor
St. Petersburg July 2013
Class 7. How could children detect
the triggers in their input?
1
Preview of today’s class
• Last time: How much parametric information does the input contain? We found unambiguous triggers, which could permit 'squeaky-clean' deterministic P-setting.
• Today: how well can LM extract the info that is there?
• Recap: Input guidance (parametric decoding) extracts much more information per sentence than trial-and-error grammar testing.
• But today we'll see there are practical limits on decoding:
  • Ambiguous inputs can't be exhaustively decoded.
  • Syntactic parsing is not fail-safe.
• So we may have to settle for decoding that is not squeaky clean: a non-deterministic version of the treelet-based LM, which engages in some guesswork.
2
On the trail of the optimal learner (= a child)
First decision: Search or decode?
To set parameters:
• search the domain (trial-and-error)
  • random walk → TLA
  • with memory → Yang's VM
• decode input sentences
  • if ambiguous, guess
  • wait for unambiguous input
    • innate list of unambiguous triggers
    • parser finds unambiguous triggers
3
Recap Class 2: On-line decoding is best
• Wasteful to choose a grammar hypothesis (Gh) to try out before examining the properties of the input sentence (as the domain-search models do).
• Instead, let the parsing process reveal which grammar could work.
• Start with Gc (LM's current grammar). If it succeeds, fine. If it fails, it will fail at a particular place in the parse tree being built. Failure = the next word of the sentence can't be fitted into the tree structure.
• Details of the tree structure there + the incoming word provide clues to what new parameter value (treelet) is needed in order to make them compatible, i.e., to bridge the gap in the parse tree.
• Simple example in Class 2: preposition-stranding.
4
Patching up gaps in parse trees
• FR: Pierre compte souvent ses poulets.
  EN: Peter often counts his chickens.
• In FR, the verb moves up out of VP, into TP. In EN, the Tense moves down onto the verb in VP. (See trees, next slides.)
• Consider a child just beginning to acquire adverbs. She knows what 'souvent/often' means but not how to use it in a sentence.
• The FR learner can parse as far as 'Pierre compte' but is expecting the object right after the verb. She needs a treelet that lets her attach the Adv into the tree before the object.
• An EN learner expects the verb to follow the subject. She needs a treelet that lets her attach the Adv into the tree preceding the verb.
5
Treelets for VP-adverbs
• In GB theory this difference is not due to a word-order parameter. It is due to two different ways in which a verb can acquire tense (e.g., walk+Past → walked).
• Both children need a treelet in which the Adv is left-adjoined to the VP. (Why left? Because this adverb precedes the object in both FR & EN.)
• But this adjunction treelet must fit inside a larger treelet which makes it possible for the verb to acquire tense even though the Adv prevents it from being adjacent to the T head of the clause (TP).
• UG offers two different treelets for achieving this.
6
Parametric treelets for V-raising & Tns-lowering
• French: Verb raising. English: Affix-hopping (= tense lowering). (Treelet diagrams on slide.)
• The French learner will find that only the Verb Raising treelet fills the gap in her parse tree.
• The English learner will find that only the Tns-lowering treelet fills the gap in her parse tree. (A toy sketch of this gap-filling step follows below.)
7
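Here is a minimal, deliberately toy sketch of the learn-by-parsing loop just described (my illustration, not the slides' or Fodor's formalism). A grammar is just a set of treelet names, and each word of a toy "sentence" is tagged with the treelet it needs in order to attach, so a parse failure points directly at the missing parameter value; the names and the tagging scheme are invented stand-ins for what a real parser would compute.

```python
# Toy sketch of on-line decoding ("learning by parsing"); representation is hypothetical.

UG_TREELETS = {"Subj-initial", "VO", "OV", "V-raising", "Tns-lowering"}

def learn_from(sentence, grammar):
    """Parse word by word with the current grammar Gc; adopt a UG treelet at a failure point."""
    for word, needed_treelet in sentence:
        if needed_treelet in grammar:
            continue                          # word attaches: the parse proceeds
        if needed_treelet in UG_TREELETS:     # gap in the tree: UG offers a patching treelet
            grammar.add(needed_treelet)       # set that parameter value and carry on
        else:
            return False                      # nothing bridges the gap: discard this input
    return True

# French-like toy input: the adverb between verb and object can only be attached
# if the verb has raised out of VP, so the failure point reveals "V-raising".
Gc = {"Subj-initial", "VO"}
learn_from([("Pierre", "Subj-initial"), ("compte", "VO"),
            ("souvent", "V-raising"), ("ses poulets", "VO")], Gc)
print(Gc)   # now includes "V-raising" alongside "Subj-initial" and "VO"
```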
The greater efficiency of on-line decoding
compared with domain search models
• With decoding, an input sentence provides a word-by-word guide on how to improve Gc when necessary. This extracts far more info from the input than just succeed/fail feedback on an antecedently selected Gh.
• So this on-line learning-by-parsing approach predicts faster convergence on the target grammar.
• We measure learning speed in terms of how many input sentences are needed for convergence.
• Calculated by Fodor & Teller (2000): for 30 parameters, the TLA needs over a billion more input sentences than a (perfect) decoding learner needs. (An illustrative back-of-the-envelope calculation follows below.)
• Savings due to decoding confirmed in simulation studies, compared with both TLA and VM (Fodor & Sakas, 2004).
8
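To see why the gap is of this order of magnitude (a back-of-the-envelope illustration, not the Fodor & Teller (2000) calculation itself): 30 binary parameters define $2^{30} = 1{,}073{,}741{,}824 \approx 1.07 \times 10^{9}$ candidate grammars. A learner that tests whole grammars one at a time against the input can need a number of sentences on the order of the size of that space, whereas a perfect decoding learner needs only on the order of 30 informative sentences (one unambiguous trigger per parameter), a difference of over a billion.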
Second decision: Within decoding,
how to deal with parametric ambiguity?
Decode input sentences:
• if ambiguous, guess
• wait for unambiguous input
  • innate list of unambiguous triggers
  • parser finds unambiguous triggers
9
For deterministic learning,
LM must wait for unambiguous input
• Deterministic learning is good: it avoids to-and-fro re-setting of parameters; it predicts that children make few errors of commission; it solves the SP-retrenchment problem.
• But rigorous discipline! A deterministic LM must not set parameters based on parametrically ambiguous input. If a sentence is ambiguous, it must be rejected for learning (though the child may comprehend and respond to it).
• In Class 6 we estimated that there might be enough unambiguous input for a deterministic LM to set all the parameters accurately.
• But we stalled on: How could LM know whether an input sentence is parametrically unambiguous?
10
How to recognize unambiguity?
• An innate list of unambiguous triggers? That seems to be what Chomsky's original 'switch-setting' model would have required.
• Each parameter was associated with a 2-way switch, pre-programmed with knowledge of what sentences each setting could license. Presumably via some sort of E-language pattern detector, but that was never specified.
• When an unambiguous sentence is heard, it activates the appropriate pattern detector, thus goes directly to the relevant switch, and flips it if it is set wrong.
• When an ambiguous sentence is heard, it would activate the pattern detectors of two or more switches.
• For deterministic learning, we could assume that conflicting activations cancel each other out, so no switch is re-set.
• Easy! But implausible: huge reliance on innate E-language.
11
Better: Can the parser
detect parametric unambiguity?
• A parallel parsing mechanism could do so: compute all possible parse trees for the sentence, using all possible treelets. If there's more than one parse, it's ambiguous.
• But even adults don't have the capacity to parallel parse.
• Many psycholinguists claim that parsing is only serial (only one analysis is computed). Some assume there's limited parallelism, but not enough for full ambiguity detection.
• Within the capacity of a serial parser: a sentence is unambiguous if the parser finds no choice-points in the parse. (A toy illustration follows below.)
• Good, but: we'll see that many input sentences must be discarded because of choice-points in the parse, even though they are not in fact parametrically ambiguous.
• Maybe a temporary ambiguity only, resolved later in the sentence. E.g. Sue knows Bill (is a librarian).
12
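A toy illustration of this serial-parser check (my sketch; the word-by-word option sets are an invented stand-in for what the parser actually computes at each word):

```python
# Toy sketch: at each incoming word, the parser sees the set of analyses that could
# attach it. Any word with more than one live option is a choice-point, and a
# cautious serial learner then treats the sentence as unusable for parameter
# setting, even if later material would have disambiguated it.

def usable_for_learning(options_per_word):
    """options_per_word: one set of candidate analyses per incoming word."""
    for options in options_per_word:
        if len(options) > 1:
            return False   # choice-point: ambiguous, as far as a serial parser can tell
    return True

# "Sue knows Bill (is a librarian)": at 'Bill' the parser cannot tell direct object
# from embedded subject, even though the continuation later settles the question.
print(usable_for_learning([{"Subj"}, {"V"}, {"Obj", "EmbeddedSubj"}]))   # False
```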
A super-cautious learner may fail
• Everything to the right of a choice-point is off-limits for cautious learning. But it is possible that every trigger for Pi(v) is to the right of a choice-point! If so, Pi(v) can't be set.
• E.g., in every language that's +Subj-initial and –NullSubj, the initial subject is ambiguous between a topic and a non-topic. That blocks learning from any part of the sentence!
• A merely temporary ambiguity: a sentence starts NP-ACC V. It could be NullSubj with no movement of NP or V, or it could be VerbSecond (NP and V both move). But it continues with NP-DAT, which rules out the no-movement analysis.
• Only one potential analysis survives, but the parser can't know that unless it computes both = parallel parsing!
• Conclusion: even if the language domain does contain unambiguous triggers, LM may fail to recognize them. They are discarded unnecessarily, due to lack of parsing power.
13
Unrealistic, in any case, to assume
the parser never makes mistakes
• Even if the learning mechanism aims to be squeaky clean, its front end is a parsing mechanism which is not.
• Even in adult parsing, with a complete and correct grammar, parsing mistakes occur on-line.
• Especially relevant here: there's a tendency to overlook the presence of ambiguity, i.e., to fail to spot a choice-point. (Examples, next slide.)
• Thus a sentence for which the parser reports only analysis A might in reality be ambiguous between A and B.
• That can be bad for comprehension. But a disaster for learning: it could mis-set a parameter.
14
Adults sometimes overlook ambiguity
(1) The horse raced past the barn fell.
    Could mis-trigger the P-value for asyndetic coordination. (→ …past the barn and fell. As in Spanish.)
(2) While Mary was mending the sock fell off her lap.
    Could mis-trigger [+null subject]. (→ …mending the sock, pro fell off her lap.)
(3) Put the frog on the napkin in the box.
    (See scenario on slide.)
    Does PP1 modify the preceding noun, or is it an argument in the VP? On-line, the parser favors VP-attachment (napkin is GOAL). But the 2nd PP then forces reanalysis: napkin modifies frog; box is GOAL.
• In this case, adults typically recover from the garden path.
15
Children overlook ambiguity more than adults
• Trueswell, Sekerina, Hill & Logrip (1999): act-out task, with eye-movements monitored. Age 4;8 – 5;10.
  Put the frog on the napkin in the box.
16
A learner’s parser is not fully reliable
• Most common response by children: put the frog (THEME) onto the spare napkin (GOAL), then put it/them into the box (GOAL-2) – or ignore the box!
• Adult control group: early gaze pattern like the children's (on-line garden path). But their action response is correct: pick up [the frog that is on the napkin] (THEME) and put it into the box (GOAL). This shows they have recovered from the g-path.
• Conclusion: children have the same parsing biases as adults, and are subject to the same garden-path mis-analyses. (As expected if the parsing mechanism is innate.)
• But they are worse than adults at recovering from them – they are inflexible, perhaps due to resource limits.
• A child might deduce asyndetic coordination from such a mis-parse!
17
More arguments for non-deterministic learning
• Determinism is badly impacted by 'noisy' (imperfect) input. Mis-heard sentences could be disruptive, permanently. Not-yet-acquired bits of a sentence may be ignored by the child → effectively, she receives ungrammatical input. (E.g., overlooking finiteness in root clauses.)
• Prosody and semantics can provide valuable hints about novel syntactic structure:
  • Prosody can indicate surface constituent bracketing.
  • Semantics can indicate theta roles, argument structure, agreement features, etc.
• But these are only hints – sometimes they suggest the wrong syntactic structure. E.g. [Jeremy hates] [mean people].
• So a deterministic learner could not afford to accept any help at all from these not-fully-reliable sources.
18
Instead: Parametric decoding with guessing
• So let's consider now a decoding learner that doesn't check for ambiguity. It just makes its best guess on every sentence.
• A non-deterministic treelet-based learner. Different versions are possible. Here: the Parse Naturally STL (Fodor 1998b).
• Assume an innate stock of parametric treelets, from which to select the correct ones for the target language.
• It builds tree structure, applying the usual innate human parsing principles: Minimal Attachment, etc., and priority to more frequent/accessible structures.
• It parses every input sentence with all the treelets accessible, though to different degrees.
• Every time a treelet is used, its activation level automatically rises slightly (just as for lexical items), so it's more accessible in future. Helpful treelets thus become dominant. (A toy sketch of these activation dynamics follows below.)
19
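A minimal runnable sketch of these activation dynamics (my illustration; the treelet names, starting levels, and increment size are assumptions, not the PN-STL's actual specification):

```python
# Toy Parse-Naturally-style activation dynamics: every time a treelet figures in a
# parse, its activation rises slightly, so frequently helpful treelets become the
# more accessible competitors for the serial parser.

activation = {"V-raising": 0.5, "Tns-lowering": 0.5}   # hypothetical starting levels
BUMP = 0.05                                            # assumed increment per use

def most_accessible(competitors):
    """The serial parser tries the higher-activation treelet first."""
    return max(competitors, key=lambda t: activation[t])

def used_in_parse(treelet):
    """Record that a treelet was used in the winning parse of an input sentence."""
    activation[treelet] += BUMP

# A French-like corpus: most inputs end up parsed with the V-raising treelet.
for _ in range(10):
    used_in_parse("V-raising")

print(most_accessible(["V-raising", "Tns-lowering"]))   # V-raising now dominates
```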
Parse Naturally STL – In favor
• When input happens to be unambiguous, the PN-STL finds the right parse (within its resource limits) and sets parameters correctly.
• The PN-STL doesn't discard any input. For an ambiguous input, it guesses one of the grammars that can license it.
• Guesses are linguistically smart, since parsing principles (Minimal Attachment, Minimal Chain Principle) parallel UG economy principles (minimize structure, shortest movement chains).
• It affords an account of default P-values: different innate starting levels of activation of the various treelets. E.g., the VO treelet may start out more active than OV.
• This predicts 'strength markedness', i.e., a non-default value needs more input to become dominant. (Pinker 1979) (A toy illustration follows below.)
20
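A small follow-up to the sketch above, illustrating strength markedness under the same assumed activation scheme (the starting levels and increment are invented, in arbitrary units):

```python
# Strength markedness in toy terms: if the VO treelet starts out more active than OV
# (the assumed default), an OV target language needs several OV-parsed inputs before
# the OV treelet overtakes the default and becomes dominant.

activation = {"VO": 3, "OV": 1}   # hypothetical innate defaults (arbitrary units)
BUMP = 1                          # assumed activation increment per use

inputs_needed = 0
while activation["OV"] <= activation["VO"]:
    activation["OV"] += BUMP      # each OV-parsed input bumps the OV treelet
    inputs_needed += 1

print(inputs_needed)   # 3 in this toy setting: the non-default value needs extra input
```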
Parse Naturally: more advantages
• Predicts a gradual shift to a marked P-setting, with some fluctuation for a while as it runs neck-and-neck with the default value. Compatible with data on the slow time-course of stabilizing some parameter values. (Yang; van Kampen)
• Frequency-sensitivity permits recovery from errors that occurred due to guessing or noisy input.
• Maybe even recovery from superset errors: the treelet for the 'superset' value won't be used → its activation falls below threshold.
• All of this is because PN is just a normal working parser, trying to comprehend each sentence it encounters.
• Unlike VM, its memory store for the relative strengths of the parameters is not an add-on, but an integral, automatic consequence of the parser using the treelets.
21
Parse Naturally: General assessment
• PN-STL extracts as much parametric information as possible from every input sentence (within the limit of employing a merely serial parser).
• It has flexibility in the face of real-life imperfect input and parsing errors (unlike the deterministic treelet model).
• It employs no learning mechanism at all, other than what must be available for sentence processing in any case.
• It makes few demands on evolution – just an innate mental collection of bits of tree structure that humans can deploy.
• BUT: it sacrifices precision. Its reliability needs to be evaluated. (No simulation data yet.)
• Open questions at this time: Can it obey SP without massive retrenchment? Can it make safe use of conditioned triggers? These require certainty about other parameters.
22
Precision or flexibility?
• Competing demands unresolved. More research needed.
• Awaiting your contributions!
REMINDER: WRITING ASSIGNMENT
• Prepare a question, to ask in class on Friday, about some aspect of language learnability that we haven't touched on, or haven't dealt with satisfactorily. Indicate why it is important or of interest.
• Hand in a written copy to me on Wednesday (to include in your grade). Keep a copy so that you can ask it in class on Friday.
23