Class 4 - CUNY


Introduction to
Language Acquisition Theory
Janet Dean Fodor
St. Petersburg July 2013
Class 4. Poverty of the stimulus:
Missing positive evidence
Can learners rely on positive evidence?
 On Friday, we observed the almost total absence of direct
negative evidence for learners (= evidence about what is
not in the language), such as usable corrective feedback.
 Unclear whether indirect negative evidence can step in to
distinguish grammatical/ungrammatical sentences. If not,
it follows that a grammar must be acquired on the basis of
positive evidence (= evidence of what is in the language).
 Hearing a sentence is positive evidence of its
grammaticality. Especially valuable: the ‘triggers’ for
target language values of UG-provided parameters.
 Today, we will see that much positive evidence is missing
also. And some positive ‘evidence’ is misleading. This has
further implications for how much UG must contribute.
2
Various imperfections of the input
INPUT | KNOWLEDGE ATTAINED
MISSING INPUT
a. Finite number of sentences | Infinite number of sentences
b. Short simple sentences | Complex constructions, unbounded recursion
c. Little info about ungrammaticality | Can distinguish grammatical from ungrammatical
MISLEADING INPUT
d. Slips of the tongue; misperceived sentences | Correct generalizations; these forms not generated
e. Idioms and peripheral forms | Acquired as special forms; not generalized
f. Ambiguous input, structurally indeterminate | Correct structures acquired
g. Mixed registers, dialects, lgs. | Separate knowledge of each
INACCESSIBLE INFORMATION
h. No comparisons across inputs; no minimal pairs or paradigms | Knowledge of facts not evident in single inputs, e.g. freedom of word order
3
Poverty of the Positive Stimulus (‘POPS’)
 Self-evident: Child hasn't already heard every sentence
s/he produces. Children generalize beyond their input.
 So POPS is at least trivially true: The input does not
exemplify the whole language that is acquired.
 Least interesting: Learners substitute new words into
observed constructions. The cat sat. → The dog sat.
 More interesting: Recursion: Learners extend the lg
by generating degree-n clauses like degree-1 clauses.
Mary said that John thinks that Susan hopes that....
 But not by generalizing degree-0 to degree-1: The
Penthouse Principle (What happens upstairs...).
 Most interesting: Learners deduce the existence
or properties of a novel construction.
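The recursion case can be made concrete with a small sketch. It assumes a hypothetical helper `embed` and an invented innermost clause ("it rained"), since the slide's example trails off; the point is only that one embedding pattern, once available, yields clauses of any degree.

```python
def embed(core, frames):
    """Wrap a degree-0 clause in one 'X verbs that ...' frame per embedding level."""
    sentence = core
    # Build from the innermost frame outwards.
    for subject, verb in reversed(frames):
        sentence = f"{subject} {verb} that {sentence}"
    return sentence

# Frames taken from the slide's example; the core clause is an assumption.
frames = [("Mary", "said"), ("John", "thinks"), ("Susan", "hopes")]
for degree in range(len(frames) + 1):
    print(degree, embed("it rained", frames[:degree]))
```

A learner who has the degree-1 pattern needs no further input to produce degree 2, degree 3, and so on.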
4
An early example of POPS: Aux sequences
 Child input: It may have rained. It has been raining.
 Mental representation of the input (Chomsky 1957):
Aux → Tns (M) (have) (be) + V
 Predicts the existence of: It may have been raining.
 Chomsky concludes: Children would assume that this is
in the language even if they had never heard it.
 Kimball (1973; an early corpus study) reported that
children don't reliably hear it.
 Pullum & Scholz (2002) dispute this, giving examples
from Moby Dick and Wuthering Heights. Also the Wizard
of Oz and Peter Pan.
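A minimal sketch of the prediction, assuming the three optional elements of the rule are independent on/off choices and using a toy spell-out for the subject "it" and the verb "rain" (the function names and the morphology table are illustrative assumptions, not part of Chomsky's analysis):

```python
from itertools import product

def aux_patterns():
    """All expansions of Aux -> Tns (M) (have) (be) + V: 2**3 = 8 patterns."""
    return list(product([False, True], repeat=3))

def spell_out(modal, have, be, verb="rain"):
    """Toy spell-out for the subject 'it' (simplified morphology, an assumption)."""
    words = []
    if modal:
        words.append("may")
    if have:
        words.append("have" if modal else "has")
    if be:
        # The form of 'be' depends on what immediately precedes it.
        words.append("been" if have else ("be" if modal else "is"))
    # The main verb's form depends on the last auxiliary present.
    if be:
        words.append(verb + "ing")
    elif have:
        words.append(verb + "ed")
    elif modal:
        words.append(verb)
    else:
        words.append(verb + "s")
    return "It " + " ".join(words) + "."

generated = [spell_out(*pattern) for pattern in aux_patterns()]
heard = {"It may have rained.", "It has been raining."}
novel = [s for s in generated if s not in heard]
print(novel)
```

Both heard sentences fall out of the rule, and so does "It may have been raining.", which was never in the input.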
5
Structure-dependent auxiliary inversion
 The most-discussed argument for POPS. Chomsky (1975)
The auxiliary-inversion transformation in English questions.
(a) The man is tall.
(b) Is the man tall?
 Two different generalizations compatible with (a) and (b).
 Move the structurally highest aux to the front. CORRECT
 Move the linearly first aux to the front. WRONG
 The linear generalization gives wrong results in complex
sentences. It predicts (d), instead of the correct (e).
(c) The man who is tall is in the room.
(d) * Is the man who tall is in the room?
(e) Is the man who is tall in the room?
 But there’s no way to tell that from the one-clause ex’s.
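The two generalizations can be sketched side by side. The clause representation below (a dict with `subject`, `aux`, `predicate`) and the function names are assumptions made for illustration; the structural rule is simply handed the main-clause auxiliary, which is exactly the information a linear learner lacks.

```python
AUXILIARIES = {"is"}

def flatten(clause):
    """The word string the learner actually hears."""
    return clause["subject"] + [clause["aux"]] + clause["predicate"]

def front_linear(clause):
    """Move the linearly first auxiliary to the front."""
    words = flatten(clause)
    i = next(k for k, w in enumerate(words) if w in AUXILIARIES)
    return [words[i]] + words[:i] + words[i + 1:]

def front_structural(clause):
    """Move the structurally highest (main-clause) auxiliary to the front."""
    return [clause["aux"]] + clause["subject"] + clause["predicate"]

def as_question(words):
    return " ".join(words).capitalize() + "?"

simple = {"subject": ["the", "man"], "aux": "is", "predicate": ["tall"]}
complex_ = {"subject": ["the", "man", "who", "is", "tall"],
            "aux": "is", "predicate": ["in", "the", "room"]}

for clause in (simple, complex_):
    print(as_question(front_linear(clause)), "|", as_question(front_structural(clause)))
```

On the one-clause sentence the two rules give the same output, so (a) and (b) cannot decide between them; they diverge only on (c), where the linear rule produces (d) and the structural rule produces (e).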
6
Structure-dependence of transformations
 Crain and Nakayama (1987) tested children 3-6 years.
‘Ask Jabba if the boy who is watching Mickey is happy.’
(f) Is [the boy who is watching Mickey] _ happy?
(g) *Is [the boy who _ watching Mickey] is happy?
The children made no linear-generalization errors like (g).
 Chomsky famously claimed “a person could go through
much or all of his life without ever having been exposed to”
the correct version; hence the structure-dependence of
transformational rules must be innately given.
 Many disagreements over 3 decades about whether children
do reliably hear such sentences. See slides below.
7
Strongest example of POPS: Parasitic gaps
 Which article did you file without reading it?
Which article did you file without reading e?
Both good. Same meaning. Overt pronoun is optional.
 John was killed by a rock falling on him.
*John was killed by a rock falling on e.
Overt pronoun cannot be omitted here.
 Chomsky claims (1983 i.a.) that constructions like these
“are so rare that it is quite likely that during the period a
child masters his native language (the first five or six
years of life), he never hears any of these constructions,
or he hears them very sporadically. Nonetheless, every
native speaker of English knows flawlessly when you can
and can't drop pronouns in these kinds of sentences.”
8
Parasitic gap constructions have
remarkable properties
 P-gaps occur inside extraction islands, such as adjunct
islands (without reading e), subject islands/RC islands:
Which linguist did [ everyone who met e ] admire t?
 Island constraints block extractions that create a single
‘normal’ empty category (a trace):
* Which linguist did [ everyone who met t ] admire Sue?
 Regardless of how children come to know about island
constraints (innately?), they also come to know that these
constraints are not applicable to parasitic gaps.
 (a) Not clear that they receive positive data for p-gaps.
(b) Even if they do, they must not overgeneralize island
constraint violations to ‘normal’ gaps. Somehow, they
know that p-gaps are grammatical but special.
9
Do p-gaps follow from independently
established principles of UG?
 Chomsky’s claim: The existence of parasitic gaps (and
the fine constraints on when and where they can occur)
aren’t acquired from experience.
 So, despite their curious properties, they must follow
from innate principles + positive evidence about other
kinds of empty categories.
 In “Some Concepts and Consequences…” (1982), he
argues that a p-gap is a null pronoun which becomes
A-bar-bound by the moved antecedent of the ‘real’ gap,
as long as the ‘real’ gap doesn’t c-command it.
 He claims that the existence of p-gaps is entailed by
the existence of null pronouns, plus the binding theory,
θ-criterion, and Projection Principle.
10
Later refinements of the linguistic analysis
 Chomsky’s (1982) analysis thus explained why these
constructions (not needed for communication!) exist in
many (all?) languages. But it didn’t cover all the facts.
 the person that John described t …
(a) …without examining [any pictures of e]. e inside R-branch
(b) * …without [any pictures of e] being on file. e inside L-branch
 Later proposals by Kayne (Connectedness,1983) and
Chomsky (Barriers, 1986).
 Kayne: the ‘government projection’ of the p-gap must
meet up with the g-projection of the trace. See trees 
 This blocks p-gaps inside left branches, as in (b), since
government is to the right in English – unless the left
branch satisfies Connectedness:
e inside L-branch
(c) a person who people that talk to e usually admire t.
Connectedness constraint on the
g-projections of the p-gap and the trace
Not connected.
Ungrammatical.
12
Connected g-projections
of the p-gap and the trace
Connected.
Grammatical, as long as
NP is phonologically null
( = the trace, which
licenses the p-gap).
Ungrammatical if NP is
overt. *him *Sam.
13
Are all instances of POPS evidence of
innate linguistic knowledge?
 For p-gaps we are inclined to conclude that what the
learner’s brain has to supply, to compensate for stimulus
poverty, is non-trivial and specifically linguistic.
 But in other cases, maybe a child’s extensions of the
patterns in the input sample are guided just by general
principles of induction, rather than specialized
language-specific generalization principles.
 Some recent such challenges to POPS arguments for UG.
 One suggestion: Any learner could arrive at these same
generalizations just by tracking the statistical properties
of input sentences.
14
Structure-dependent auxiliary-inversion
 Reali & Christiansen (2005) maintained that structure-dependent
aux-inversion can be learned just by tracking the
frequency of bigrams (two-word sequences).
 Their bigram-based model was 96% correct in choosing
between the ✓ and * versions of pairs like:
Is the boy [who is crying] t hurt?
* Is the boy [who t crying] is hurt?
 But Kam, Stoyneshka, Tornyova, Fodor & Sakas (2008)
showed that this was due entirely to the high corpus
frequency of the bigram ‘who is’ in all of R&C’s ✓ versions.
 The bigram model failed on all other ✓ variants of the same
general rule (object gap, do-support, main verb inversion in
Dutch) which don’t have ‘who is’. Did the boy who Sue likes…
 Conclusion: the R&C success was just a fluke.
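To see why one frequent bigram can carry the whole result, here is a toy add-one-smoothed bigram scorer. It is not Reali & Christiansen's model or corpus: the four-sentence corpus, the test pairs, and all function names are invented for illustration, and the corpus is deliberately built so that ‘who is’ is frequent.

```python
import math
from collections import Counter

def bigrams(words):
    return list(zip(words, words[1:]))

def train(corpus, extra_vocab=()):
    """Count bigrams; extra_vocab adds test-only words to the vocabulary size."""
    pair_counts, context_counts = Counter(), Counter()
    vocab = set(extra_vocab)
    for sentence in corpus:
        words = sentence.split()
        vocab.update(words)
        for w1, w2 in bigrams(words):
            pair_counts[(w1, w2)] += 1
            context_counts[w1] += 1
    return pair_counts, context_counts, len(vocab)

def log_prob(sentence, model):
    """Sum of add-one-smoothed log P(w2 | w1) over the sentence's bigrams."""
    pair_counts, context_counts, v = model
    return sum(
        math.log((pair_counts[(w1, w2)] + 1) / (context_counts[w1] + v))
        for w1, w2 in bigrams(sentence.split())
    )

def prefers_first(first, second, model):
    return log_prob(first, model) > log_prob(second, model)

# Invented toy corpus in which 'who is' happens to be frequent.
corpus = ["who is that", "who is crying", "the boy is hurt", "is the boy hurt"]
model = train(corpus, extra_vocab=["sue", "helping"])

# Pair containing 'who is': grammatical version first.
with_who_is = ("is the boy who is crying hurt", "is the boy who crying is hurt")
# Object-gap pair without 'who is': grammatical version first.
without_who_is = ("is the boy sue is helping hurt", "is the boy sue helping is hurt")

print(prefers_first(*with_who_is, model))     # True: 'who is' tips the balance
print(prefers_first(*without_who_is, model))  # False: no such bigram to lean on
```

In this toy setting the scorer picks the grammatical string only when ‘who is’ is present, which is the shape of the dependence Kam et al. report; the numbers themselves mean nothing beyond the example.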
15
Empirical evaluation of POPS –
a methodological challenge
 What would substantiate the claim that children acquire
some language facts without benefit of relevant positive
evidence?
 Psycholinguistic data showing the child knows X at age Y.
 Plus evidence that the child wasn’t exposed to X by age Y.
 Practical problems:
 Very difficult to prove a child was not exposed to X
(need day-and-night recordings for years!) Instead,
estimate, based on absence from CDS corpora.
 Does overhearing adult talk count? (See next slide.)
Does one instance count as exposure?
Do non-conversational genres like nursery rhymes?
16
Overheard language - incidental learning
 Saffran et al. (1997): 6 - 7 year olds overheard a
21-minute tape of nonsense words, while engaged in
an art project. No instruction to listen or remember.
 Two days in a row.
 Then tested on distinguishing words from non-words.
 Words: babupu bupada dutaba patubi pidabu tutibu
 Nonwords: batipa bidata dupitu pubati tipabu tapuba
 Children correct 68.3% (p<.01). Adults correct 73.1%.
 Conclusion: “incidental learning is a robust phenomenon
that may play a role in natural language acquisition”.
 If so, children may learn from much richer input than CDS.
17
Pullum & Scholz don’t agree that child-directed speech is a unique (limited) genre
 Because Hudson (1994) found approx same % of nouns
in texts of many genres (inc. some child lg, tho’ no CDS).
 Some findings show genre differences, e.g., reduced relative
clauses are much rarer in conversation than in newspaper.
 But P&S deem it appropriate to cite, as evidence against
POPS, examples of aux-inversion from The Wall Street
Journal and The Importance of Being Earnest (1895).
WSJ: Is a young professional who lives in a bachelor condo as
much a part of the middle class as a family in the suburbs?
Oscar Wilde: Who is that young person whose hand my nephew
Algernon is now holding in what seems to me a peculiarly
unnecessary manner?
 P&S do admit children’s sentences are short, and suggest
a 4-word Aux-example: Has whoever left returned?
18
Corpus estimates of exposure to positive input
 P&S cite just 3 ex’s from CDS. To Nina at 2+ yrs.
All where’s (JDF: possibly treated as a unit by the child)
Where’s the little blue crib that was in the house before?
Where’s the other dolly that was in here?
Where’s the other doll that goes in there?
 Sampson (1989) cites a children’s encyclopedia for
10-yr olds, and a William Blake poem (The Tyger, 1794;
see below) which also contains *main verb fronting:
 In what distant deeps or skies burnt the fire of thine eyes?
 A serious CDS corpus search (Legate & Yang, 2002):
For both +NullSubj and +V2, acquired at approx age 3;2
(=Crain & Nakayama’s age-of-acquisition data for aux-inversion),
relevant evidence made up about 1.2% of the CDS input.
 But CDS for aux-inversion: Nina: 0.07%, Adam: 0.05%.
19
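The size of the shortfall is easy to work out. The sketch below assumes the three percentages are directly comparable shares of the input; the variable names are illustrative.

```python
# Share of input (in %) that counts as relevant evidence, as given on the slide.
reference_rate = 1.2  # parameters acquired at about age 3;2
aux_inversion_rate = {"Nina": 0.07, "Adam": 0.05}

# How many times rarer the aux-inversion evidence is than the reference rate.
shortfall = {child: reference_rate / rate for child, rate in aux_inversion_rate.items()}
for child, factor in shortfall.items():
    print(f"{child}: about {factor:.0f} times rarer")
```

So the evidence for structure-dependent aux-inversion is roughly 17 to 24 times sparser than the evidence that suffices for parameters acquired at the same age.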
Other practical problems in substantiating POPS
 Exposure to construction X before the child has the
capacity to process construction X should not count,
but difficult to establish when that is.
 Production tends to be delayed relative to perception in
any domain (it's a more demanding task). So production
evidence that the child knows X by age Y may be absent
even if the child really does.
 Comprehension experiments are needed.
 Adults may anticipate readiness of a child to cope with X
(e.g., relative clauses). So it’s not unlikely that a child
who utters X has already heard X – even if the hearing
was not the cause.
 So, even if children do "invent" X on the basis of UG, as
Chomsky’s POS claim implies, could it be proven?
20
A novel form of argument for POPS
 Children sometimes invent ungrammatical forms –
which we can be pretty sure they didn’t hear.
 If that novel form is grammatical in other languages, it’s
arguably not a random invention or error, but is drawn
from the possibilities made available by UG. The child
just has a UG parameter set wrong.
 Thornton (1990): Long-distance Wh-extraction in English.
2-5 yr olds often insert an overt Wh-item into the
intermediate Comp. Ungrammatical in English, but ok in
other languages (Romani, some German dialects,...).
* Who do you think who Grover wants to hug?
 Conclusion: Children use UG to compensate for gaps in
their input evidence. (How can I form a long-distance Qn?)
 Thornton suggests they’re pronouncing the intermediate
trace – just a PF-level mistake.
21
Another example of inventing a
UG-compatible but wrong form
 Corpus study: Left-branch extractions in Dutch by several
children (approx 3-6yrs); van Kampen (1997). These are
grammatical in Latin & Polish, but not in adult Dutch.
*Welke wil jij liedje zingen? (Which want you t song sing?)
*Ik weet niet hoe het lang is. (I know not how it t long is.)
 These children are not copying their input. But unrelated
children behave alike, so not just random errors.
 Proposed explanation: They’re exercising a UG option,
when their input info (positive/negative?) is insufficient.
 vK argues that left-branch extraction is a learner’s default,
because it more closely reflects scope at LF.
 In general: This is a promising form of argument for
UG-guided learning. More cases would be welcome.
22
Joint implications of POPS and PONS
 Assuming incremental learning (= retain or change
grammar after each input), what size & shape is the
generalization a learner formulates based on a novel input?
 Due to POPS, the positive input merely sets a lower bound
on the generalization: it must license at least that example.
But which others as well?
 Due to PONS, negative data (if any) merely set a very
loose upper bound: little info about what not to license.
 In between POPS and PONS is a huge information gap,
allowing a host of alternative grammar hypotheses.
 Yet, consistency across all normally developing children.
 IN THIS GAP, grammar choice must be being made by
something internal to the learner (all learners) = UG or LM.
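The size of the gap can be illustrated with a deliberately tiny model. It assumes languages are finite sets over six hypothetical sentence types (`s1` to `s6`), which real grammars are not; the names and the example evidence are invented.

```python
from itertools import combinations

def all_languages(universe):
    """Every subset of the universe counts as a candidate language."""
    items = sorted(universe)
    for size in range(len(items) + 1):
        for subset in combinations(items, size):
            yield frozenset(subset)

def consistent(universe, heard, known_bad=frozenset()):
    """Languages that license all heard sentences and none known to be bad."""
    return [lang for lang in all_languages(universe)
            if heard <= lang and not (lang & known_bad)]

universe = {"s1", "s2", "s3", "s4", "s5", "s6"}
heard = frozenset({"s1", "s2"})   # positive input: the lower bound
known_bad = frozenset({"s6"})     # sparse negative evidence: a loose upper bound

print(len(consistent(universe, heard)))             # 16
print(len(consistent(universe, heard, known_bad)))  # 8
```

Even here, two positive examples and one negative one leave eight hypotheses standing. The data alone do not pick one, so whatever makes all learners converge has to come from the learner.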
23
Now, misleading positive input
INPUT | KNOWLEDGE ATTAINED
MISLEADING INPUT
d. Slips of the tongue; misperceived sentences | Correct generalizations; these forms not generated
e. Idioms and peripheral forms | Acquired as special forms; not generalized
f. Ambiguous input, structurally indeterminate | Correct structures acquired
g. Mixed registers, dialects, lgs. | Separate knowledge of each
24
Ungrammatical input may occur
(but is not the major problem)
 An early influential study. Adult speech to young children
“is unswervingly wellformed” (Newport, Gleitman &
Gleitman 1977).
 But there are many fragments in child-directed speech
(e.g., “The blue one”, “Over there”). Might a child mistake
these for whole sentences? (JDF: Ok if well-formed
in context. Children have to learn ellipsis too.)
 Children may misanalyze what they hear, which is
equivalent to hearing ungrammatical sentences.
 Misanalysis errors aren’t easy to document in syntax,
but are observed for morphology:
weather report → weathery port → weathery man
25
Misleading input is ignored – how?
 Learners evidently filter out some of their positive input.
Not just slips of the tongue, other speech errors, L2.
 They hear archaic forms in nursery rhymes and stories, but
they don’t adopt them into their own grammar. E.g.,
(i) Now I lay me down to sleep.
(ii) Did he who made the Lamb make thee?
(Wm Blake; cited by Sampson 1989 as a positive
source for correct aux-inversion, contra Chomsky)
 The same poems contain seriously misleading info:
(iii) I pray the Lord my soul to keep.
(iv) Did he smile his work to see?
*Topicalization in infinitival complement clause.
 How do children know which examples to ignore??
26
‘Peripheral’ input is not generalized – why/how?
 Children don’t mistake exceptional constructions for core.
 Children are exposed to (and use!) many idioms, some
of which resemble triggers for parameters. Danger! E.g.,
(1) Out popped the cuckoo.
(Could mis-trigger Verb Second for English main verbs.)
(2) I’m gonna have me a nice hot bath.
(Could mis-trigger local binding of pronouns, as in Maori.)
 How do children know what’s representative of the
language as a whole? It can’t be frequency; some idioms are
very frequent. (Here you are. Let go of me.)
 Could UG help learners distinguish core from periphery?
Perhaps Designated triggers for parameters (Fodor 1994).
The canonical instance: Mary_i pinched her_i → +LocalBinding
27
Can stimulus poverty reveal
the exact content of UG?
 POS can reveal what must follow from UG.
 POS rules out any linguistic theory too weak to provide
the missing input information or to represent the input in
a relevant way (e.g., finite state grammars; bigrams).
 Also all theories that wrongly predict which patterns of
generalization over the input are natural.
 But the 'subtraction method' cannot by itself deliver the
"psychologically real" grammar (the exact mix of
principles, rules, lexical entries, constraints on
derivations or representations, etc.) in people’s heads.
 Different linguistic theories assume different UGs,
which capture that information in different ways.
28
Summary of stimulus poverty
 Children’s input is seriously uninformative in some basic
respects, and potentially misleading.
 Yet learners quite reliably end up with (essentially) the
same grammar as each other, and as their adult models.
 Innate linguistic knowledge (UG) would be capable of
resolving some of the indeterminacies, though not all.
 Strategies of the learning mechanism may help. Innately
given procedures for coping with incomplete input info.
Uniqueness Principle; Subset Principle (Class 5)
 If not, that leaves a vacuum to be filled, perhaps by
more powerful data-driven / statistical / probabilistic /
neural network approaches. Many of these reject UG
entirely. But hybrid models may be developed.
29
Please read, for Friday (Class 5)
 1½ pages, on a retrenchment paradox due to the Subset
Principle.
 This is an excerpt from Fodor, J. D. ‘Syntax acquisition: An
evaluation measure after all?’ In Of Minds and Language:
The Basque Country Encounter with Noam Chomsky (2009)
30