
Starting from Scratch in Semantic Role Labeling
Michael Connor, Yael Gertner, Cynthia Fisher, Dan Roth
Page 1
How do we acquire language?

Topid rivvo den marplox.
Page 2
The language-world mapping problem
“the world”
“the language”
[Topid rivvo den marplox.]
Page 3
Observe how words are distributed across situations
Scene 1: Smur! Rivvo della frowler.
Scene 3: Topid rivvo den marplox. Blert dor marplox, arno.
...
Scene n: Marplox dorinda blicket.
Page 4
Structure-mapping: A proposed starting point for syntactic bootstrapping

Children can learn the meanings of some nouns via cross-situational observation alone [Fisher, 1996; Gillette, Gleitman, Gleitman, & Lederer, 1999; Snedeker & Gleitman, 2005]

But how do they learn the meaning of verbs?

Sentence comprehension is grounded by the acquisition of an initial set of concrete nouns

These nouns yield a skeletal sentence structure (candidate arguments), a cue to the sentence's semantic predicate-argument structure

Represent the sentence in an abstract form that permits generalization to new verbs
[Johanna rivvo den sheep.]
Nouns identified
Page 5
Strong Predictions [Gertner & Fisher, 2006]


Test 21-month-olds on assigning arguments with novel verbs: how the order of nouns influences interpretation, transitive & intransitive (preferential looking paradigm)
  Transitive: The boy is daxing the girl!
  Agent-first: The boy and the girl are daxing!
  Agent-last: The girl and the boy are daxing!
The error disappears by 25 months
Page 6
Current Project: BabySRL

- Realistic computational model for syntactic bootstrapping via structure mapping:
  - Verb meanings are learned via their syntactic argument-taking roles
  - Semantic feedback improves the syntactic & meaning representations
- Develop a Semantic Role Labeling system (BabySRL) to experiment with theories of early language acquisition
  - SRL as a minimal level of language understanding: determine who does what to whom
- Inputs and knowledge sources: only those we can defend children have access to
Page 7
BabySRL: Key Components

- Representation:
  - Theoretically motivated representation of the input
  - Shallow, abstract sentence representation consisting of:
    - # of nouns in the sentence
    - Noun patterns (e.g., 1st of two nouns)
    - Relative position of nouns and predicates
- Learning:
  - Guided by knowledge kids have:
    - Classify words by part-of-speech
    - Identify arguments and predicates
    - Determine the roles arguments take
Page 8
BabySRL: Early Results
[Connor et al., CoNLL'08, '09]

- Fine-grained experiments with how language is represented
  - Test different levels of representation
- Hypothesis: the number and order of nouns are important
  - Once we know some nouns, we can use them to represent structure
- Primary focus on the noun-pattern (NPattern) feature
  - NPattern gives count and placement: first of two, second of three, etc.
- Alternative: verb position
  - Target argument is before or after the verb
- Key finding: NPattern reproduces errors in children
  - Promotes the A0-A1 interpretation in transitive, but also intransitive, sentences
  - Verb position does not make this error; incorporating it recovers the correct interpretation
- But: done with manually labeled data; feedback varies
Page 9
BabySRL: Key Components

- Representation:
  - Theoretically motivated representation of the input
  - Shallow, abstract sentence representation consisting of:
    - # of nouns in the sentence
    - Noun patterns (e.g., 1st of two nouns)
    - Relative position of nouns and predicates
- Learning:
  - Guided only by knowledge kids have:
    - Classify words by part-of-speech
    - Identify arguments and predicates
    - Determine the roles arguments take
Page 10
This work: Minimally Supervised BabySRL


- Goal: Unsupervised “parsing” for identifying arguments
- Provide little prior knowledge & high-level semantic feedback
  - Defensible from psycholinguistic evidence

Overview:
- Unsupervised parsing
  - Identifying part-of-speech states
- Argument identification
  - Identify argument states
  - Identify predicate states
- Argument role classification
  - Labeled training using unsupervised arguments
- Results and comparison to child experiments
Page 11
BabySRL Overview

Traditional approach:
- Parse input
- Identify arguments
- Classify arguments
- Global inference over arguments

Each stage has its own classifier/knowledge source; finely labeled training data throughout.

(Example: "She always has salad ." is parsed as [NP She] [VP always [V has] [NP salad]] and labeled [A0 She] ... [A1 salad].)
Page 12
BabySRL Overview

Unsupervised approach:
- Unsupervised HMM
- Identify argument states
- Classify arguments
- No global inference

Labeled training: only for the argument classifier; the rest is driven by simple background knowledge.

(Example: "She always has salad ." receives HMM states 46, 48, 26, 74, 2; states 46 and 74 are identified as argument states (A0, A1) and state 26 as the verb.)
Page 13
Unsupervised Parsing

- We want to generate a representation that permits generalization over word forms
  - Incorporate distributional similarity
  - Context sensitive
- Hidden Markov Model (HMM); a minimal training sketch follows below
  - Simple model
  - Essentially provides part-of-speech information
    - Without names for the states; we need to figure this out
- Train on child-directed speech
  - CHILDES repository
  - Around 1 million words, across multiple children
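To make this step concrete, here is a minimal sketch of training such an HMM, using the hmmlearn library and toy data as stand-ins; the slides do not specify the actual implementation.

```python
# Minimal sketch (assumes hmmlearn >= 0.2.8, where CategoricalHMM exists;
# the two toy sentences stand in for the CHILDES corpus).
import numpy as np
from hmmlearn import hmm

sentences = [["she", "always", "has", "salad", "."],
             ["you", "want", "lunch", "?"]]

vocab = {w: i for i, w in enumerate(sorted({w for s in sentences for w in s}))}
X = np.array([vocab[w] for s in sentences for w in s]).reshape(-1, 1)
lengths = [len(s) for s in sentences]

# EM (Baum-Welch) training; n_components plays the role of the unnamed POS states.
model = hmm.CategoricalHMM(n_components=5, n_iter=50, random_state=0)
model.fit(X, lengths)
states = model.predict(X, lengths)  # one cluster label per word token
```

The induced state IDs are the unnamed part-of-speech clusters referred to above; the later slides decide which of them behave like nouns and verbs.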
Page 14
Unsupervised Parsing (II)

- Standard way to train an unsupervised HMM
  - Simple EM produces uniform-size clusters
  - Solution: include priors for sparsity
    - Dirichlet prior (Variational Bayes, VB)
- Replace this with psycholinguistically plausible knowledge
  - Knowledge of function words
    - Function and content words have different statistics
    - Evidence that even newborns can make this distinction
    - We don't use prosody, but it may provide this
  - Technically: allocate a number of states to function words (sketched below)
    - Leave the rest of the states to the rest of the words
    - Done before parameter estimation; can be combined with EM or VB learning: EM+Func, VB+Func
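A small sketch of what allocating states to function words could look like in practice; this is my reconstruction of the idea, not necessarily the authors' exact procedure, and the word list and state counts are illustrative.

```python
# Function-word pre-clustering: reserve some HMM states for known function
# words by zeroing disallowed emissions before running EM or VB.
import numpy as np

vocab = {"the": 0, "a": 1, "is": 2, "dog": 3, "salad": 4, "has": 5}
function_words = {"the", "a", "is"}              # illustrative list
n_states, n_func_states = 6, 2                   # reserve 2 of 6 states

emission = np.random.rand(n_states, len(vocab))
for word, j in vocab.items():
    if word in function_words:
        emission[n_func_states:, j] = 0.0        # function words: reserved states only
    else:
        emission[:n_func_states, j] = 0.0        # content words: remaining states only
emission /= emission.sum(axis=1, keepdims=True)  # renormalize rows, then train as usual
```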
Page 15
Unsupervised Parsing Evaluation
Test as unsupervised POS tagging on a subset of hand-corrected CHILDES data.

(Plot: Variation of Information, lower is better (implementation sketched below), vs. number of training sentences, for four training methods:
- EM: HMM trained with EM
- VB: HMM trained with Variational Bayes & a Dirichlet prior
- EM+Funct, VB+Funct: the same training methods with function-word pre-clustering)

Incorporating function-word pre-clustering allows both EM & VB to achieve the same performance with an order of magnitude fewer sentences.
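For reference, a minimal implementation of the Variation of Information metric named on this slide; this is the standard definition, the deck itself shows no evaluation code.

```python
# Variation of Information between two labelings (e.g., induced HMM states
# vs. gold POS tags): VI = H(A|B) + H(B|A), lower is better.
import math
from collections import Counter

def variation_of_information(labels_a, labels_b):
    n = len(labels_a)
    pa, pb = Counter(labels_a), Counter(labels_b)
    pab = Counter(zip(labels_a, labels_b))
    vi = 0.0
    for (a, b), nab in pab.items():
        p = nab / n
        vi -= p * (math.log(p / (pa[a] / n)) + math.log(p / (pb[b] / n)))
    return vi

# Identical labelings (up to renaming) have VI = 0.
assert variation_of_information([1, 1, 2], [7, 7, 8]) == 0.0
```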
Page 16
Argument Identification

- Now we have a parser that gives us the state (cluster) each word belongs to
- Next: identify the states that correspond to arguments & predicates
- Knowledge: we provide a list of frequent nouns
  - As few as 10 nouns cover 60% of noun occurrences
    - Mostly pronouns
  - A lot of evidence that children know and recognize nouns early on
- Algorithm: states that appear more than half the time with known nouns are treated as argument states (sketched below)
  - Assumes that nouns = arguments
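A minimal sketch of this heuristic; the helper and variable names are mine.

```python
# A state whose tokens are known seed nouns more than half the time
# becomes an argument state.
from collections import Counter

seed_nouns = {"you", "it", "i", "what", "he", "me", "ya", "she", "we", "her"}

def argument_states(tagged_tokens):
    """tagged_tokens: iterable of (word, hmm_state) pairs over the corpus."""
    total, noun = Counter(), Counter()
    for word, state in tagged_tokens:
        total[state] += 1
        noun[state] += word.lower() in seed_nouns
    return {s for s in total if noun[s] / total[s] > 0.5}
```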
Page 17
Argument Identification
Knowledge: frequent nouns: You, it, I, what, he, me, ya, she, we, her, him, who, Ursula, Daddy, Fraser, baby, something, head, chair, lunch, ...

Word    State   Words that occurred with this state in CHILDES
She     46      {it he she who Fraser Sarah Daddy Eve ...}
always  48      {just go never better always only even ...}
has     26      {have like did has ...}
salad   74      {it you what them him me her something ...}
.       2       {. ? !}
Page 18
Predicate Identification

- Nouns are concrete, can be identified
  - Predicates are more difficult
  - Not learned easily via cross-situational observation
- Structure-mapping account: sentence comprehension is grounded in the learning of an initial set of nouns
  - Verbs are identified based on their argument-taking behavior
- Algorithm: identify predicates as those states that tend to appear with a given number of arguments (sketched below)
  - Assume one predicate per sentence
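A minimal sketch of this algorithm; the function name and data layout are my assumptions.

```python
# Per state, estimate how often it occurs in sentences with k identified
# arguments; per sentence, pick the non-argument state most associated
# with the observed argument count (one predicate per sentence).
def predicate_state(sentence_states, arg_states, state_arg_probs):
    """state_arg_probs[s][k]: fraction of sentences containing state s
    that have k identified arguments."""
    n_args = sum(s in arg_states for s in sentence_states)
    candidates = [s for s in sentence_states if s not in arg_states]
    return max(candidates, key=lambda s: state_arg_probs[s].get(n_args, 0.0))

# "She always has salad ." -> states [46, 48, 26, 74, 2]; 46 and 74 are arguments.
probs = {48: {1: 0.2, 2: 0.4, 3: 0.3}, 26: {1: 0.1, 2: 0.6, 3: 0.2}, 2: {}}
print(predicate_state([46, 48, 26, 74, 2], {46, 74}, probs))  # 26 ("has")
```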
Page 19
Predicate Identification
Example: "She always has salad ." with HMM states She=46, always=48, has=26, salad=74, .=2

State 48: 1 argument: 0.2; 2 arguments: 0.4; 3 arguments: 0.3; ...
State 26: 1 argument: 0.1; 2 arguments: 0.6; 3 arguments: 0.2; ...

With two argument states (46, 74) in the sentence, state 26 ("has") is the better predicate candidate: 0.6 vs. 0.4.
Page 20
Argument Identification Results


- Test compared to hand-labeled argument boundaries on CHILDES child-directed speech
- Vary the number of seed nouns; implications for the features' quality

(Plots, EM+Funct parsing: argument identification is good; predicate identification is bad. Differences between the parsing methods disappeared.)
Page 21
Finally: BabySRL Experiments

- Given potential arguments & predicates, train an argument classifier
  - Given an abstract representation of an argument and predicate, determine its role (Agent, Patient, etc.)
  - Roles are relative to the predicate
  - To train, apply true labels to the noisily identified arguments
  - Regularized perceptron
- Abstract representations (features) considered (see the sketch after this list):
  - (1) Lexical: target noun and target predicate
  - (2) NounPattern: first of two, second of three, etc.
    - Depends only on the number and order of arguments
    - "She always has salad": `She` is first of two, `salad` is second of two
  - (3) VerbPosition: `She` is before the verb, `salad` is after the verb
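A minimal sketch of the NounPattern and VerbPosition features as described above; the function names and feature strings are mine.

```python
# NounPattern: an argument's position among the sentence's nouns.
# VerbPosition: whether the argument precedes or follows the predicate.
def noun_pattern(arg_index, noun_indices):
    order = sorted(noun_indices).index(arg_index) + 1
    return f"NPattern={order}_of_{len(noun_indices)}"

def verb_position(arg_index, verb_index):
    return "VPos=before" if arg_index < verb_index else "VPos=after"

# "She always has salad .": nouns at positions 0 and 3, verb at position 2.
print(noun_pattern(0, [0, 3]), verb_position(0, 2))  # NPattern=1_of_2 VPos=before
print(noun_pattern(3, [0, 3]), verb_position(3, 2))  # NPattern=2_of_2 VPos=after
```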
Page 22
BabySRL Experiments: Test Data


- Unlike the previous experiments (on CHILDES), here we compare to psycholinguistic data
- Evaluate on constructed two-noun sentences with novel verbs
  - Test two-noun transitive vs. intransitive sentences: A krads B vs. A and B krads.
  - A and B are filled with known nouns
  - Tests generalization to an unknown verb
- Reproduces experiments on young children
  - At 21 months of age, children make a mistake: interpreting A as the agent and B as the patient in both cases
  - Hypothesis: children at this stage represent sentences in terms of the number and order of nouns, not the position of the verb
Page 23
BabySRL Experiments
(Plots: %A0A1 interpretation by parsing algorithm, for transitive and intransitive test sentences, 10 seed nouns.)

- Transitive: learns agent first, patient second
- Intransitive: reproduces the error; with the noisy representation the model does not recover even when VerbPos is available

NounPat features promote the Agent-Patient (A0-A1) interpretation for both transitive (correct) and intransitive (incorrect) sentences. VerbPos pushes the intransitive case in the other direction, but this works only with gold training.
Page 24
Summary

- BabySRL: a realistic computational model for verb meaning acquisition via the structure-mapping theory
  - Representational issues
  - Unsupervised learning driven by minimal plausible knowledge sources
- Even with noisy parsing and argument identification, the model is able to learn the abstract rule: agent first, patient second
  - Reproduces errors seen in children
- Difficult identification of predicates harms the usefulness of the superior representation (VerbPos)

Next steps:
- Use correct high-level semantic feedback to improve earlier identification and parsing decisions; improve the VerbPos feature
- Relax the correctness of the semantic feedback

Thank You
Page 25
How do we acquire language?
(Scene: a dog and some balls, labeled "Dog" and "Balls", with the caption "The Dog kradz the balls.")
Page 26
BabySRL Experiments
(Plots: transitive and intransitive results, with 10 vs. 365 seed nouns.)

Even with VerbPosition available, a noisy parse still produces the error on intransitive sentences.
Page 27