Towards a Psycholinguistics

Download Report

Transcript Towards a Psycholinguistics

Computer Simulation of a
Serial Parsing Model
Doan-Nguyen Hai
[email protected]
CLaC (Computational Linguistics at Concordia)
Department of Computer Science
Concordia University (Montréal, Canada)
Introduction
Question: How about a computer simulation
for serial syntactic parsing, which achieves
a medium coverage and can deal with some
classes of long and difficult sentences?
We implemented such a computer simulation:
Based on the garden-path theory (Frazier, 1978;
Frazier & Clifton, 1996)
Parsing process is basically incremental and inputdriven (bottom-up), but some top-down mechanisms
are also included
2
Introduction (2)
Focus on reanalysis
Not based on context-free grammars. Using context
extensively. Linguistic knowledge is represented
procedurally in Prolog clauses
Considering characteristics of computer to obtain
high performance whenever psycholinguistic
knowledge is not available
Leaving space for non-structural information in future
extension (a human-simulating sentence
interpretation program)
3
Introduction (3)
Limitations: not yet considered:
Lexical ambiguities: words of multiple categories
Conjunctive structures, sentences with punctuations
4
State-of-the-Art
Computational implementations:
Marcus 1980: deterministic serial
McRoy & Hirst 1990: serial model, no
reanalysis
Lombardo 1998: top-down parsing + selective
reanalysis
Stevenson 1994: connectionism
Crocker & Brants 2000: parallel, probabilistic
5
The Algorithm
Basic loop: incremental
CurrentTree := NIL;
while not end of sentence do
InNode := next input word;
attach InNode to CurrentTree;
end
How to do attachment ? Where to attach InNode
into CurrentTree ?
In principle: examine all information from the tree and
the input word. Attachment may be costly!
6
First refinement: Attachment
with Waiting Nodes
 A node is waiting for some constituent(s)
obligatory: I look for …
expected: I am reading …
 Incoming element = expected constituent ? Quick checks!
I read … the (book) / the (teacher’s book) / good (books)
 First refinement:
attach(InNode) :waiting_node(WaitingNode), attach_wait(WaitingNode, InNode);
... . % more procedures
7
More refinements
Two cases:
InNode can be attached to CurrentTree: specific
attachment
InNode cannot be attached to CurrentTree:
reanalysis
8
Specific Attachment
 attach_spec(InNode) examines specifically the tree and
the input element for an attachment
I read it … loudly
I went to the theatre of the town... to see (her)
 Resolve attachment ambiguities and improve efficiency
with psycholinguistic preferences. Classical examples:
Late Closure: He said that John came … yesterday.
Minimal Attachment: The boy … found ...
PP-attachment: V preferred to N:
I found a bird … in (the garden)
9
Reanalysis
 Reanalysis
I knew the cat in the garden … was (smart)
The cat found in the garden … was (smart)
I told him the story … was (interesting)
Is the girl sitting there … intelligent?
 Recent focus in psycholinguistics
Frazier & Rayner 1982: Selective Reanalysis.
Fodor & Inoue 1994: Diagnosis model: repairing (not
reparsing!) after diagnosis.
Fodor & Inoue 1998: Attach Anyway + Adjust
Others: Lewis 1998 (NL-Soar), Sturt & Crocker 1996 (Tree
Lowering)
10
Reanalysis (2)
Reanalysis: symptoms + guesses + verifications
+ repair
The cat found in the garden … was (smart)
symptoms: 'was' has no subject but needs one
guesses: the potential subject may be now the
object of a verb, or the subject of a clause
verifications
repair the current tree
11
Reanalysis instead of easy
attachment
verbs with multiple subcategorizations
I will bring the children … some food
v - 1 object --> v - 2 objects (Not a relative!)
I tell the children … that …
I tell the children … she …
v - object --> v - object - clause (Not relative)
I want him … to come
v - object --> v - (CP (TP him to (VP come))) (Not a
purposive adjunct)
Although there is no breakdown and
attachment is possible, reanalysis should
be invoked instead!
12
Reanalysis instead of easy
attachment (2)
 Hypothesis: When a verb is satisfied by the incoming
element, the parser still remembers its other
subcategorizations by registering the verb as a
may_wait node.
 attach_may_wait(Elt) considers the possibility of
reanalysis before doing specific attachment
I want him
verb - obj: (VP want (DP him))
may_wait(want, other subcats of 'want')
… to come
attach_may_wait reanalyzes the current tree:
(VP want (CP (TP (DP him) to (VP come)))
13
Reanalysis instead of easy
attachment (3)
empty categories
The picturesi you have ei … seen
ei is put right after have (Minimal Chain Principle (De
Vincenzi 1991))
seen can begin a modifier, cf. The pictures you took seen
from this perspective are very surrealist
Other examples
The booki I want ei … to read
Whati do you want ei … to read?
The folderi which I keep ei … my files in
Whati do you see ei … the birds with?
14
Reanalysis instead of easy
attachment (4)
 Hypothesis: (Similar to multiple subcategorization verbs) An
object trace node is registered in may_wait state. During
parsing, when the top may_wait node is a trace node,
attach_may_wait will check whether InNode needs to be
attached to the verb, and if yes, move the trace node
somewhere else.
 The picturesi you have ei ...
ei is registered as may_wait
When seen arrives, attach_may_wait reanalyzes and gives:
The picturesi you have seen
The parser then looks for a new position for ei
The picturesi you have seen ei
15
Reanalysis instead of easy
attachment (5)
 If the parser cannot find a new position for ei, it hangs it
in the memory for future location
 The folderi which I keep ei … my files in is blue
The folderi which I keep ei …
The folderi which I keep my files -- ei is hung in memory
The folderi which I keep my files in ei -- ei is located
16
The algorithm
CurrentTree := NIL;
while not end of input do
N := next word;
attach N to CurrentTree;
end
attach(N) :waiting_node(W),
attach_wait(W, N);
may_wait_node(W),
attach_may_wait(W, N); % reanalysis
attach_spec(N);
reanalyze(N). % reanalysis
17
Structure postulation
the boy … who
DP adjunct
waiting node
the boy
CP
the boy … you
DPi
the boy
CP
waiting node
who
Conform to Frazier 1998: parsing is
not purely bottom-up, but also
has some top-down features
ei
TP
you
18
Forward parsing
 Examples
We need... disk… storage... management... software.
 I saw the … teacher’s son’s dog.
Reanalysis repeated??? Seems unlikely!
I go… to… school / see her.
They are… selected… people / by John.
Reanalysis seems not occur in either case
 Hypothesis: With normal pronouncing rhythm, reanalysis
does not occur here but forward parsing
19
Forward parsing (2)
 Forward parsing: the parser delays attachment and
continues to work with the input stream to gather
enough information for a good attachment
We need... disk… storage... management... software.
When disk arrives, the parser does not attach it
immediately, but continues parsing forward to
get the head of the DP
I go… to… school / see her.
When to arrives, the parser continues parsing
forward to know whether it will get a PP or an
infinitive VP
20
Implementation
Programmed in Prolog (Windows)
Coverage: many basic and 'advanced'
structures of English. See test examples.
Test:
Time complexity:
Theoretical ?
Practical: seems good (linear ?)
21
Test Results
(on  300 sentences)
Time (operations)
3000
Average
2500
Min
Max
2000
1500
1000
500
Length
25
23
21
19
17
15
13
11
9
7
5
3
1
0
22
Some test sentences







The folder I kept my letters in is blue.
Is the girl working in the house your daughter?
Is the girl working in the house your daughter bought recently?
What exactly do you want to see in this old house?
The big white horse raced quickly past the old barn fell behind the house.
The girl who I hear they say he thinks John loves is their daughter.
To drink with friends that you have not met for a long time is very
interesting.
 Do you notice the old tall woman who sits looking at your daughter's little
dog has spent a long time here?
 Are the three little boys in the computer group chosen recently by the
university where your friend is studying computer science intelligent
boys?
23
The folder I kept my letters in is blue.
'node(4,d:0:[sg,def],0-1,the)
node(2,n:[sg],1-2,folder)
- 'node(7,e:o,?,e(d,node(4,d:0:[sg,def],0-1,the)))
node(5,c:r,?,[])
'node(3,d:p:[sg,p(1)],2-3,i)
node(6,t:[],?,[])
node(8,v:[o,g]:[cj,past0,r(keep)],3-4,kept)
node(13,d:ps:[pl],4-5,my)
node(11,n:[pl],5-6,letters)
-node(12,p:[],6-7,in)
node(15,e:d,?,e(d,node(7,e:o,?,
e(d,node(4,d:0:[sg,def],0-1,the)))))
node(16,t:[cj,sg,p(3),pres,sg,def],?,[])
node(14,v:be:[cj,sg,p(3),pres],7-8,is)
node(17,a,8-9,blue)
24
Discussions
Future work:
Lexical ambiguities: words of multiple categories
Conjunctive structures, sentences with punctuations
Incorporating non-structural (semantic, real world)
information: sentence processing.
Incorporating other techniques: statistics.
Comparing the parser with traditional cfg-based and
statistics-based parsers.
Programming the parser on the computer may
help us understand more on the human process.
25
Conclusions
A serial parsing prototype has been
implemented with some good results
Particularities:
Reanalysis instead of easy attachment
Forward parsing
Still a lot of things to do
26
References
Crocker, M. & Brants, T., 2000. Wide-Coverage Probabilistic Sentence Processing. Journal of
Psycholinguistic Research, Vol. 29, No. 6, 2000.
De Vincenzi M., 1991. Syntactic parsing strategies in Italian (Dordrecht: Kluwer Academic, 1991).
Fodor, J.D. & Ferreira, F., 1998 (eds). Reanalysis in Sentence Processing. Kluwer Academic.
Fodor, J.D. & Inoue, A., 1994. The Diagnosis and Cure of Garden Paths. Journal of Psycholinguistic
Research, Vol. 23, No. 5, 1994.
Fodor, J.D. & Inoue, A., 1998. Attach Anyway. In Fodor & Ferreira (1998) 101-141.
Frazier, L., 1978. On Comprehending Sentences: Syntactic Parsing Strategies. Doctoral dissertation,
University of Connecticut.
Frazier, L., 1998. Getting There (Slowly). Journal of Psycholinguistic Research, Vol. 27, No. 2, 1998.
Frazier, L. & Clifton, C., 1996. Construal. Cambridge, MA: MIT Press.
Lombardo, V., 1998. A Computational Model of Recovery. In Fodor & Ferreira (1998) 287-325.
Marcus, M., 1980. A Theory of Syntactic Recognition for Natural Language. Cambridge, MA: MIT
Press.
McRoy, S. & Hirst, G., 1990. Race-based Parsing and Syntactic Disambiguation. Cognitive Science,
14.
Stevenson S., 1998. Parsing as Incremental Restructuring. In Fodor & Ferreira (1998) 327-363.
27