Why memory matters in English grammar
Dick Hudson
Manchester, March 2009
Memory
• Long-term memory
• Short-term memory
– aka ‘working memory’ – used for thinking
– limited capacity: ‘7 ± 2’
• Maybe working memory is the currently
active area of long-term memory
Memory as a network
• Long-term memory is a network
• Evidence: activation spills onto neighbours
• Evidence of spillover:
– priming of neighbouring words
– speech errors are wrongly selected neighbours
• But the network’s not just language
– ‘cognitive linguistics’
Network activity
• Node activation
– activation takes energy, and is limited
– keeping a node active is expensive
• Node creation
– essential for processing experience
– also expensive
• Node binding
– expensive as it tends to confuse similar nodes
Activation
[Diagrams: a network node becoming active, marked ‘active!’]
Node building
[Diagram: a newly created node, marked ‘new!!’]
Node binding
[Diagrams: node binding]
Tokens and types
• Memory must include temporary tokens as
well as permanent types.
• Tokens are different from types
– different properties, e.g. time, speaker
– even conflicting properties, e.g. mispelings
• But tokens are also very expensive,
– because they’re the focus of attention.
Tokens in syntactic theory
• What tokens can we afford?
• At least one token per word
– e.g. five tokens here
• At least one dependency token per word
• But do we really need more?
– e.g. for ‘we’: a word, and a DP?
• Phrases are expensive
– so they need really strong evidence!
Dependency structure
• Just one token node per word
• And one per dependency
• e.g. “Dependency grammar is very ancient.”
[Diagram: dependency arcs labelled p, a, s, a over “Dependency grammar is very ancient”]
Phrase structure
• One token per word, plus:
• one token per phrase-mother.
• one part-whole relation per word or phrase.
• e.g. “Phrase structure is very young.”
The cost of phrase structure
VERY expensive!
[Diagram: phrase-structure tree for “Phrase structure is very young.”, with phrase nodes “phrase structure is very young”, “is very young”, “very young” and “phrase structure” above the five words]
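The cost comparison can be made concrete by counting tokens under the two sets of counting rules given on the preceding slides. The sketch below is only illustrative: the binary bracketing is read off the tree on this slide and is an assumption, as is counting one part-whole relation for every node that has a mother. The function names are hypothetical.

```python
# Bracketing assumed from the slide's tree: [[Phrase structure] [is [very young]]]
WORDS = ["Phrase", "structure", "is", "very", "young"]
TREE = (("Phrase", "structure"), ("is", ("very", "young")))


def count_phrase_structure(tree):
    """Return (word tokens, phrase tokens, part-whole relations) for a tree."""
    if isinstance(tree, str):
        return 1, 0, 0  # a word: one token, no phrase, no relation of its own yet
    words = phrases = relations = 0
    for child in tree:
        w, p, r = count_phrase_structure(child)
        words += w
        phrases += p
        relations += r + 1  # one part-whole link from this child to its mother
    return words, phrases + 1, relations  # +1 for the mother node itself


def count_dependency_structure(n_words):
    """Return (word tokens, dependency tokens): every word but the root has one parent."""
    return n_words, n_words - 1


ps = count_phrase_structure(TREE)
ds = count_dependency_structure(len(WORDS))
ps_total = sum(ps)
ds_total = sum(ds)
print("phrase structure:", ps, "total", ps_total)
print("dependency structure:", ds, "total", ds_total)
```

Under these assumptions the same five-word sentence needs noticeably more tokens in phrase structure than in dependency structure; a different bracketing or a different way of counting relations would change the exact figures.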
So what? (1)
• Tokens are expensive (for memory
resources) as long as they’re active.
• So the sooner they de-activate, the better.
• Tokens can de-activate sooner in
dependency structure than in phrase
structure.
• So dependency structure is psychologically
more plausible.
Dependency distance
• How long must a word token stay active?
• Till it’s linked as dependent to a ‘parent’.
• What’s the cost of keeping it active?
• The other tokens that are active at the same time.
• I.e. cost of W = number of words between
W and its parent = dependency distance
An example
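[Diagram: dependency arcs labelled p, a, s, a over “Dependency grammar is very ancient”]
dependency distance per word:
Dependency = 0, grammar = 0, is = N/A, very = 0, ancient = 1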
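The distances in this example follow directly from the definition (the number of words between a word and its parent). A minimal sketch, assuming each word's parent is given as a list index; the parent assignments below are this sketch's reading of the diagram, and the function names are hypothetical:

```python
def dependency_distances(parents):
    """parents[i] is the index of word i's parent, or None for the root word.

    Returns, for each word, the number of words standing between it and
    its parent (None for the root, which has no parent).
    """
    distances = []
    for i, parent in enumerate(parents):
        if parent is None:
            distances.append(None)
        else:
            distances.append(abs(i - parent) - 1)
    return distances


def max_dd(distances):
    """Largest dependency distance in a sentence, ignoring the root."""
    return max(d for d in distances if d is not None)


words = ["Dependency", "grammar", "is", "very", "ancient"]
# Assumed parents: Dependency -> grammar, grammar -> is, is = root,
# very -> ancient, ancient -> is
parents = [1, 2, None, 4, 2]

dd = dependency_distances(parents)
for word, d in zip(words, dd):
    print(word, "N/A" if d is None else d)
print("max dd =", max_dd(dd))
```

Only “ancient” is separated from its parent, by the single word “very”.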
Long subjects and dependency distance
This is the dog that chased the cat that caught the rat
that ate the cheese that lay in the house that Jack built.
max dd = 0
The dog that chased the cat that caught the rat that ate
the cheese that lay in the house that Jack built is this one.
max dd = 21
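How the cost of a subject grows with its length can be sketched by adding the relative clauses one at a time. The sketch assumes that the subject's head is its first word (“The”) and that the head depends on the verb immediately following the subject; that reading gives 21 for the full subject, matching the figure on the slide. The helper name is hypothetical.

```python
CLAUSES = [
    "that chased the cat",
    "that caught the rat",
    "that ate the cheese",
    "that lay in the house",
    "that Jack built",
]


def subject_head_distance(subject_words):
    """Words between the subject's first word and the verb right after the subject."""
    return len(subject_words) - 1


subject = ["The", "dog"]
distances = [subject_head_distance(subject)]
for clause in CLAUSES:
    subject = subject + clause.split()
    distances.append(subject_head_distance(subject))

for n_clauses, d in enumerate(distances):
    print(n_clauses, "relative clause(s): dd of subject head =", d)
```

Each added clause lengthens the time the subject's head must stay active, whereas in “This is the dog that …” the same clauses add nothing to the maximum distance.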
So what? (2)
• Long subjects are expensive because their
head competes for activation with all the
other words between it and the verb.
• Dependency distance measures this
precisely.
– Ed Gibson (MIT) has independently developed
a similar measure.
Learning syntax
• Dependency patterns can only be learned
from active tokens.
• Most words in casual speech have dd = 0.
– 74.2% in the Penn Treebank
– 63% for adults in CHILDES
– only 1-4% have dd > 4.
• Every English dependency allows dd = 0.
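Percentages of this kind come from tallying dependency distances over a parsed corpus. A minimal sketch of such a tally follows; the one-sentence “treebank” is a toy (the sentence “Dependency grammar is very ancient.” with assumed parent links), not the corpora cited above, and the function name is hypothetical.

```python
from collections import Counter


def dd_distribution(sentences):
    """Share of dependencies at each dependency distance.

    Each sentence is a list in which item i is the index of word i's
    parent, or None for the root word.
    """
    counts = Counter()
    for parents in sentences:
        for i, parent in enumerate(parents):
            if parent is not None:
                counts[abs(i - parent) - 1] += 1
    total = sum(counts.values())
    return {dd: n / total for dd, n in sorted(counts.items())}


# Toy data only: "Dependency grammar is very ancient."
toy_treebank = [
    [1, 2, None, 4, 2],
]

dist = dd_distribution(toy_treebank)
for dd, share in dist.items():
    print(f"dd = {dd}: {share:.0%}")
```

With a real treebank the input would be read from the corpus's own annotation format rather than typed in by hand.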
So what? (3)
• Learning dependency patterns is easy.
• Adjacent but non-dependent words are (by
definition) random, and have no lasting effect.
• Non-adjacent but dependent words don’t matter
because the same patterns can always be learned
from easier examples.
• So most of syntax is easy to learn as data.
– inducing generalizations is more tricky.
Typology
• Why are SVO languages so common?
– SOV = 45%, SVO = 35%, VSO = 10% (± 5%)
• Each order has some benefits.
• For SVO, it’s low dependency distance.
– SOV: min dd = 1
– VSO: min dd = 1
– SVO: min dd = 0
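These minimum distances can be checked by enumerating the possible orders. The sketch assumes a one-word subject and a one-word object that both depend directly on the verb, and reports the larger of the two distances for each order; the function name is hypothetical.

```python
from itertools import permutations


def worst_dd(order):
    """Larger of the S-V and O-V dependency distances for a given order."""
    v = order.index("V")
    return max(abs(order.index(dep) - v) - 1 for dep in "SO")


results = {"".join(order): worst_dd(order) for order in permutations("SVO")}
for name, dd in sorted(results.items()):
    print(name, "min dd =", dd)
```

Only the orders with the verb in the middle keep both dependents adjacent to it; with the verb at either edge, one dependent is always separated from it by the other.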
Moreover, ….
• noun: big book about linguistics
• adjective: very happy to see you
• preposition: just before Christmas
So what? (4)
• One of the pressures on languages is to
minimize dependency distances.
• If words allow two dependents, dd is 0 if
the dependents are on opposite sides.
• This is possible in all English word classes,
not just in verbs.
• Maybe SVO is part of a more general
pattern (‘consistently mixed’) which reduces memory load.
Long subjects
• Long subjects are hard to produce.
• The head word may de-activate before the
verb is produced, hence frequent non-agreement examples:
“… the accuracy of the quotes have not been
disputed.”
– nearest active N: quotes
• Long subjects are also hard to understand.
Why is it-extraposition helpful?
[Diagram: “That extraposed sentences are easier to process than their unextraposed equivalents is clear”, with dependency distances 10 and 1 marked, contrasted with the version beginning “It”]
The extraposed version is more complex but easier.
Dependency structures for it-extraposition
It’s clear that extraposed sentences are easier to
process than their unextraposed equivalents.
max dd = 2
That extraposed sentences are easier to
process than their unextraposed equivalents is clear.
max dd = 10
Other tactics to help memory
• Extraposition from NP
– Two people who were on the pavement died
• ‘Heavy NP shift’
– I saw something that would have made even you laugh yesterday
• Topicalisation
– we sat down to rest and have a light snack when we got there
[Diagram annotations: dependency distance and anaphoric distance values (8, 3, 3, 8, 13)]
Grammaticality and weight
• These special strategies override normal
rules.
• But they’re only allowed for ‘heavy’ (or
otherwise memory-heavy) structures.
– *I rang up her.
– I rang up the girl who …..
• So grammarians need a theory of memory.
Thank you
• The theory is called Word Grammar:
www.phon.ucl.ac.uk/home/dick/wg.htm
• This slide show can be found at
www.phon.ucl.ac.uk/home/dick/talks.htm
So what? (5)
• English grammar has evolved to minimize
demands on memory.
– basic word order (consistently mixed)
– special orders for overriding the basic order.
• Grammaticality depends on memory load as
well as on grammar.