Learning TFC Meeting, SRI March 2005
On the Collective Classification of Email
“Speech Acts”
Vitor R. Carvalho & William W. Cohen
Carnegie Mellon University
Classifying Email into Acts
From EMNLP-04, “Learning to Classify Email into Speech Acts”, Cohen, Carvalho & Mitchell.

• An act is described as a verb-noun pair (e.g., propose meeting, request information). Not all pairs make sense. A single email message may contain multiple acts.

• The taxonomy tries to describe commonly observed behaviors, rather than all possible speech acts in English. It also includes non-linguistic uses of email (e.g., delivery of files).

[Figure: taxonomy of email acts. Verb labels: Commissive, Directive, Deliver, Request, Commit, Propose, Amend. Noun labels: Activity, Event, Ongoing, Meeting, Other, Delivery, Opinion, Data.]
Idea: Predicting Acts from Surrounding Acts
• Strong correlation with the previous and next messages' acts.
• An act has little or no correlation with other acts of the same message.

[Figure: example of an email sequence, a thread of messages linked by <<In-ReplyTo>> and labeled with acts such as Delivery, Request, Proposal and Commit.]
Related work on the Sequential Nature of Negotiations

• Winograd and Flores, 1986: “Conversation for Action Structure”
• Murakoshi et al., 1999: “Construction of Deliberation Structure in Email”
Data: CSPACE Corpus


• Few large, free, natural email corpora are available.
• CSPACE corpus (Kraut & Fussell)
  o Emails associated with a semester-long project for Carnegie Mellon MBA students in 1997.
  o 15,000 messages from 277 students, divided into 50 teams (4 to 6 students per team).
  o Rich in task negotiation.
  o More than 1500 messages (from 4 teams) were labeled in terms of “Speech Act”.
  o One of the teams was double-labeled; the inter-annotator agreement ranges from 72 to 83% (Kappa) for the most frequent acts.
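The agreement figures above are Kappa values. As a rough illustration of what such a number measures, here is a minimal sketch that assumes Cohen's kappa for two annotators (the slides only say "Kappa"). The function name `cohen_kappa` and the two label lists are hypothetical and are not taken from the CSPACE annotations.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items on which the annotators agree.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the marginal frequencies, summed over labels.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_chance = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_chance == 1.0:
        return 1.0  # both annotators used the same single label throughout
    return (p_observed - p_chance) / (1.0 - p_chance)

# Hypothetical binary judgments ("does this message contain a Request?").
annotator_1 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
annotator_2 = [1, 0, 0, 0, 1, 0, 1, 1, 0, 0]
kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.3f}")
```

In this toy case the annotators agree on 8 of 10 messages, but part of that agreement is expected by chance, so the kappa is noticeably lower than 0.8.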
Evidence of Sequential Correlation of Acts




[Figure: transition diagram for the most common verbs from the CSPACE corpus.]

• It is NOT a probabilistic DFA.
• Act sequence patterns: (Request, Deliver+), (Propose, Commit+, Deliver+), (Propose, Deliver+). The most common act was Deliver.
• Less regularity than expected (considering previous deterministic negotiation state diagrams).
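A transition diagram of this kind can be derived from counts of (parent act, child act) pairs along reply links. The sketch below shows one way to turn such counts into estimated conditional frequencies. The helper `transition_probabilities` and the example pairs are assumptions made for illustration; they are not CSPACE data.

```python
from collections import Counter, defaultdict

def transition_probabilities(pairs):
    """Estimate P(child act | parent act) from (parent_act, child_act) pairs."""
    counts = defaultdict(Counter)
    for parent_act, child_act in pairs:
        counts[parent_act][child_act] += 1
    probabilities = {}
    for parent_act, child_counts in counts.items():
        total = sum(child_counts.values())
        # Normalize each row so the child-act frequencies sum to 1.
        probabilities[parent_act] = {
            act: count / total for act, count in child_counts.items()
        }
    return probabilities

# Hypothetical (parent act, child act) pairs, not taken from CSPACE.
pairs = [
    ("Request", "Deliver"), ("Request", "Deliver"), ("Request", "Commit"),
    ("Propose", "Commit"), ("Propose", "Deliver"),
    ("Commit", "Deliver"),
]
probs = transition_probabilities(pairs)
for parent_act, row in probs.items():
    print(parent_act, row)
```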
Content versus Context




• Content: bag-of-words features only.
• Context: parent and child features only (table below).
• 8 MaxEnt classifiers, trained on the 3F2 team dataset and tested on the 1F3 team dataset.
• Only the 1st child message was considered (vast majority: more than 95%).
[Figure: a message whose acts are unknown (???), shown with its content and with its context: the acts of its parent message and of its child message.]
Set of Context Features (Relational)

Parent Boolean Features    Child Boolean Features
Parent_Request             Child_Request
Parent_Deliver             Child_Deliver
Parent_Commit              Child_Commit
Parent_Propose             Child_Propose
Parent_Directive           Child_Directive
Parent_Commissive          Child_Commissive
Parent_Meeting             Child_Meeting
Parent_dData               Child_dData

[Figure: bar chart of Kappa values on 1F3 using Relational (Context) features and Textual (Content) features, for the Commit, Deliver and Request acts.]
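The 16 Boolean context features can be built mechanically from the acts of a message's parent and first child. The following is a minimal sketch of that construction; the function `context_features` and its set-based inputs are illustrative assumptions, and only the feature names come from the slide.

```python
ACTS = ["Request", "Deliver", "Commit", "Propose",
        "Directive", "Commissive", "Meeting", "dData"]

def context_features(parent_acts, child_acts):
    """Build the 16 Boolean context features for one message.

    parent_acts and child_acts are sets of act names; pass an empty set
    when the message has no parent or no child.
    """
    features = {}
    for act in ACTS:
        features[f"Parent_{act}"] = act in parent_acts
        features[f"Child_{act}"] = act in child_acts
    return features

# Hypothetical message: its parent is a Request, its first child delivers data.
features = context_features(parent_acts={"Request"},
                            child_acts={"Deliver", "dData"})
active = sorted(name for name, value in features.items() if value)
print(active)
```

A message with no parent or no child simply gets all of the corresponding features set to False.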
Collective Classification using Dependency Networks

• Dependency networks are probabilistic graphical models in which the full joint distribution of the network is approximated with a set of conditional distributions that can be learned independently. The conditional probability distributions in a DN are calculated for each node given its neighboring nodes (its Markov blanket).

$$\Pr(X) \approx \prod_i \Pr\big(X_i \mid \mathrm{NeighborSet}(X_i)\big)$$

• No acyclicity constraint. Simple parameter estimation; approximate inference (Gibbs sampling).
• In this case, Markov blanket = parent message and child message.
• Heckerman et al., JMLR-2000. Neville & Jensen, KDD-MRDM-2003.
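To make the inference step concrete, here is a minimal sketch of Gibbs sampling over a thread in which each message's conditional depends on its parent and its first child. It is a simplification under stated assumptions, not the authors' implementation: a single binary act per message, a logistic conditional, and hand-picked weights (`w_parent`, `w_child`) and content scores (`content_logits`) stand in for the learned MaxEnt classifiers.

```python
import math
import random

def conditional_prob(content_logit, parent_label, child_label, w_parent, w_child):
    """P(X_i = 1 | neighbors): a logistic model over content and context."""
    z = content_logit + w_parent * parent_label + w_child * child_label
    return 1.0 / (1.0 + math.exp(-z))

def gibbs_collective(content_logits, parents, w_parent=1.5, w_child=1.5,
                     iterations=500, burn_in=100, seed=0):
    """Estimate P(X_i = 1) for every message in a thread by Gibbs sampling.

    parents[i] is the index of message i's parent, or None for a root.
    Only the first child of each message is used as its child neighbor.
    """
    rng = random.Random(seed)
    n = len(content_logits)
    first_child = [None] * n
    for i, p in enumerate(parents):
        if p is not None and first_child[p] is None:
            first_child[p] = i
    # Initialize the labels from the content-only scores.
    labels = [1 if logit > 0 else 0 for logit in content_logits]
    totals = [0] * n
    for it in range(iterations):
        for i in range(n):
            # A missing neighbor contributes 0, like an all-False feature.
            parent_label = labels[parents[i]] if parents[i] is not None else 0
            child_label = labels[first_child[i]] if first_child[i] is not None else 0
            p = conditional_prob(content_logits[i], parent_label, child_label,
                                 w_parent, w_child)
            labels[i] = 1 if rng.random() < p else 0
        if it >= burn_in:
            for i in range(n):
                totals[i] += labels[i]
    # Average the post-burn-in samples to estimate each marginal.
    return [t / (iterations - burn_in) for t in totals]

# Hypothetical 3-message chain: confident, ambiguous, confident.
content_logits = [2.0, -0.2, 2.0]
parents = [None, 0, 1]
with_context = gibbs_collective(content_logits, parents)
without_context = gibbs_collective(content_logits, parents,
                                   w_parent=0.0, w_child=0.0)
print(with_context, without_context)
```

In this toy chain the middle message is ambiguous on content alone, and its estimated probability rises once the confident neighbors are allowed to influence it.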
Collective Classification algorithm
(based on Dependency Networks Model)
Agreement versus Iteration
[Figure: Kappa versus iteration (0 to 50) on the 1F3 team dataset, using classifiers trained on 3F2 team data. Series: Deliver, Commissive, Request.]
Leave-one-team-out Experiments

• 4 teams: 1f3 (170 msgs), 2f2 (137 msgs), 3f2 (249 msgs) and 4f4 (165 msgs).
• x-axis = bag-of-words only; y-axis = collective classification results.
• Different teams present different styles for negotiations and task delegation.

[Figure: scatter plot of Kappa values, both axes from 0 to 80. Legend: 4f4, 1f3, 3f2, 2f2, Reference.]
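The leave-one-team-out protocol can be laid out from the team sizes given on the slide. The sketch below only enumerates the folds and their sizes; the generator name `leave_one_team_out` is a hypothetical helper, and no classifier is trained here.

```python
TEAM_SIZES = {"1f3": 170, "2f2": 137, "3f2": 249, "4f4": 165}

def leave_one_team_out(team_sizes):
    """Yield (test_team, train_teams, n_train, n_test) for each fold."""
    for test_team in sorted(team_sizes):
        train_teams = [t for t in sorted(team_sizes) if t != test_team]
        n_train = sum(team_sizes[t] for t in train_teams)
        yield test_team, train_teams, n_train, team_sizes[test_team]

folds = list(leave_one_team_out(TEAM_SIZES))
for test_team, train_teams, n_train, n_test in folds:
    print(f"test on {test_team} ({n_test} msgs), "
          f"train on {train_teams} ({n_train} msgs)")
```

Each fold trains on three teams and tests on the fourth, so the test messages always come from a team whose negotiation style was not seen in training.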
Leave-one-team-out Experiments
• Consistent improvement of Commissive, Commit and Meet acts.

[Figure: scatter plot of Kappa values, both axes from 0 to 70. Legend: Commiss/Commit/Meet, Direct/dData/Request, Proposal/Delivery, Reference.]
Leave-one-team-out Experiments

• Deliver and dData performance usually decreases.
• These acts are associated with data distribution, FYI, file sharing, etc.
• For “non-delivery” acts, the improvement in average Kappa is statistically significant (p=0.01 on a two-tailed t-test).

[Figure: scatter plot of Kappa values, both axes from 0 to 80. Legend: Non-delivery, Deliver/dData, Reference.]
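As an illustration of the significance test mentioned above, here is a minimal sketch of a t statistic over matched Kappa values. It assumes a paired test, which the slides do not state, and the two value lists are hypothetical rather than the reported results. Converting the statistic to a two-tailed p-value would additionally require the Student t distribution, which is left out here.

```python
import math
import statistics

def paired_t_statistic(baseline, collective):
    """Return (t, degrees_of_freedom) for a paired t-test on matched scores."""
    if len(baseline) != len(collective) or len(baseline) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    diffs = [c - b for b, c in zip(baseline, collective)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1

# Hypothetical Kappa values (%), NOT the results reported in the slides.
baseline_kappa = [30.0, 42.0, 35.0, 50.0, 44.0]
collective_kappa = [33.0, 44.0, 39.0, 52.0, 48.0]
t_value, dof = paired_t_statistic(baseline_kappa, collective_kappa)
print(f"t = {t_value:.3f} with {dof} degrees of freedom")
```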
Act by Act Comparative Results
[Figure: bar chart of Kappa values (%) for each act (dData, Deliver, Propose, Request, Directive, Meeting, Commit, Commissive), comparing Baseline and Collective. Bar value labels, in the order they appear: 43.44, 44.98, 38.69, 42.01, 40.72, 36.84, 49.55, 47.25, 58.37, 58.27, 47.81, 52.42, 32.77, 30.74, 37.66, 42.55.]

Kappa values with and without collective classification, averaged over the four test sets in the leave-one-team-out experiment.
Discussion and Conclusion

• Sequential patterns of email acts were observed in the CSPACE corpus.
• These patterns, when studied in an artificial experiment, were shown to contain valuable information for the email-act classification problem.
• Different teams present different styles for negotiations and task delegation.
• We proposed a collective classification scheme for the email speech acts of messages (based on a Dependency Network model).
Conclusion

• Modest improvements over the baseline (bag of words) were observed on acts related to negotiation (Request, Commit, Propose, Meet, etc.). A performance deterioration was observed for Delivery/dData (acts less associated with negotiations).
• This agrees with the general intuition on the sequential nature of negotiation steps.
• The degree of linkage in our dataset is small, which makes the observed results encouraging.