HLT-ACTS_2006 - Carnegie Mellon University
HLT-ACTS Workshop, June 2006, New York City
Improving Email Speech Acts Analysis via
N-gram Selection
Vitor R. Carvalho & William W. Cohen
Carnegie Mellon University
Outline
1. Email Speech Acts: Can we do it? What for?
   Introduction
   Data
   Applications
2. Language Cues
   Preprocessing
   N-grams
3. Results
Motivation
Email classification for
topic/folder identification
spam/non-spam
Speech-act classification in conversational speech (aka
dialog act classification)
email is new domain - multiple acts/msg
Winograd’s Coordinator (1987): users manually annotated
email with intent.
Extra work for (lazy) users
Murakoshi et al (1999): hand-coded rules for identifying
speech-act like labels in Japanese emails
“Email Acts” Taxonomy
An Act is described as a verb-noun pair (e.g., propose meeting, request information). Not all pairs make sense.
A single email message may contain multiple acts.
Try to describe commonly observed behaviors, rather than all possible speech acts in English.
Also include non-linguistic usage of email (e.g., delivery of files).
Example:
From: Benjamin Han
To: Vitor Carvalho
Subject: LTI Student Research Symposium
Hey Vitor
When exactly is the LTI SRS submission deadline? [Request - Information]
Also, don’t forget to ask Eric about the SRS webpage. [Reminder - Action/Task]
Thanks.
Ben
Classifying Email into Acts
[Cohen, Carvalho & Mitchell, EMNLP-04]
[Figure: taxonomy of email acts. Verb labels: Commissive, Directive, Deliver, Request, Commit, Propose, Amend. Noun labels: Activity, Event, Ongoing, Meeting, Other, Delivery, Opinion, Data.]
An Act is a verb-noun pair (e.g., propose meeting).
A single email message may contain multiple acts. Not all pairs make sense.
Try to describe commonly observed behaviors, rather than all possible speech acts.
Also include non-linguistic usage of email (delivery of files).
Most of the acts can be learned (EMNLP-04).
Email Acts - Applications
Improved email clients.
Negotiating/managing shared tasks is a central use of email
Tracking commitments, delegations, pending answers
Integrating to-do/task lists to email, etc.
Kushmerick et al, AAAI-06
Email overload
Iterative Learning of Email Tasks and Speech Acts
Kushmerick & Khoussainov, IJCAI-05, CEAS-05
Predicting Social Roles and Group Leadership.
Leuski, SIGIR-04
Carvalho et al. in progress
Data: CSPACE Corpus
Few large, free, natural email corpora are available
CSPACE corpus (Kraut & Fussell)
o Emails associated with a semester-long project for
Carnegie Mellon MBA students in 1997
o 15,000 messages from 277 students, divided into 50 teams
(4 to 6 students/team)
o Rich in task negotiation.
o 1500+ messages (5 teams) had their “Speech Acts”
labeled.
o One of the teams was double labeled, and the inter-annotator agreement (Kappa) ranges from 0.72 to 0.83
for the most frequent acts.
Inter-Annotator Agreement
Kappa Statistic
Kappa = (A - R) / (1 - R)
A = probability of agreement in a category
R = probability of agreement for 2 annotators labeling at random
Kappa range: -1…+1
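As a rough illustration of the statistic on this slide, the sketch below computes Kappa = (A - R) / (1 - R) for a single act that two annotators mark as present (1) or absent (0) on each message. The function name and the toy labels are hypothetical, and estimating R from each annotator's own rate of positive labels is an assumption; the slides do not say how chance agreement was computed.

```python
def kappa(labels_a, labels_b):
    """Kappa for two annotators giving binary labels (1 = act present)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)

    # A: observed probability that the two annotators agree.
    a = sum(x == y for x, y in zip(labels_a, labels_b)) / n

    # R: agreement expected if each annotator labeled at random,
    # using their own observed rate of positive labels (assumption).
    p1 = sum(labels_a) / n
    p2 = sum(labels_b) / n
    r = p1 * p2 + (1 - p1) * (1 - p2)

    if r == 1.0:
        # Degenerate case: both annotators always use the same single label.
        return 1.0
    return (a - r) / (1 - r)


# Toy labels for eight messages (made up for illustration).
annotator_1 = [1, 1, 0, 0, 1, 0, 0, 0]
annotator_2 = [1, 0, 0, 0, 1, 0, 0, 1]
score = kappa(annotator_1, annotator_2)
```

A score of 1 means perfect agreement, 0 means agreement no better than chance, and negative values mean agreement worse than chance.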
Inter-Annotator Agreement
Email Act    Kappa
Deliver      0.75
Commit       0.72
Request      0.81
Amend        0.83
Meeting      0.82
Propose      0.72
PreProcessing
Signature and quoted-text removal
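The slide does not say how signatures and quoted text were removed, so the following is only a heuristic stand-in, not the authors' method. It assumes signatures start at a conventional "-- " delimiter line and that quoted text is marked with a leading ">" or introduced by an "On ... wrote:" header; the function name and both patterns are hypothetical.

```python
import re

# Assumed conventions, not taken from the slides:
SIGNATURE_DELIMITER = re.compile(r"^--\s*$")     # a line holding only "--" or "-- "
QUOTE_HEADER = re.compile(r"^On .+ wrote:\s*$")  # reply header before quoted text


def strip_signature_and_quotes(body: str) -> str:
    """Return the message body without quoted lines and trailing signature."""
    kept = []
    for line in body.splitlines():
        if SIGNATURE_DELIMITER.match(line):
            break  # treat everything after the delimiter as signature
        if line.lstrip().startswith(">"):
            continue  # quoted text from an earlier message
        if QUOTE_HEADER.match(line):
            continue  # header announcing the quoted text
        kept.append(line)
    return "\n".join(kept).strip()
```

Removing this material keeps the classifier from picking up cues that belong to an earlier message or to boilerplate rather than to the sender's own text.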
Request Act: IG n-grams
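Assuming "IG" here stands for information gain, the sketch below shows one way to score n-gram features for a binary act label such as Request. The function names and the toy messages are hypothetical, presence/absence of an n-gram is the only feature value considered, and the real feature set and tokenization used in the talk are not given in this transcript.

```python
import math
from collections import Counter


def ngrams(tokens, max_n=3):
    """Set of all 1..max_n-grams in a token list, joined by spaces."""
    return {
        " ".join(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    }


def entropy(pos, total):
    """Binary entropy (bits) of a set with `pos` positives out of `total`."""
    if total == 0 or pos in (0, total):
        return 0.0
    p = pos / total
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def information_gain(docs, labels, max_n=3):
    """Map each n-gram to the information gain of its presence/absence."""
    n_docs, n_pos = len(docs), sum(labels)
    base = entropy(n_pos, n_docs)
    present, present_pos = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        for g in ngrams(tokens, max_n):
            present[g] += 1        # messages containing g
            present_pos[g] += y    # positive messages containing g
    scores = {}
    for g, with_g in present.items():
        without_g = n_docs - with_g
        conditional = (
            (with_g / n_docs) * entropy(present_pos[g], with_g)
            + (without_g / n_docs) * entropy(n_pos - present_pos[g], without_g)
        )
        scores[g] = base - conditional
    return scores


# Toy data (made up): label 1 = message contains a Request.
docs = [["could", "you", "send"], ["could", "you", "call"],
        ["here", "is", "it"], ["see", "attached", "file"]]
labels = [1, 1, 0, 0]
scores = information_gain(docs, labels)
top = sorted(scores, key=scores.get, reverse=True)[:5]
```

On this toy data the bigram "could you" separates the two classes perfectly and so receives the maximum gain of 1 bit, while a word seen in only one message scores lower.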
Error Rate Analysis
[Figure: bar chart of error rate (y-axis, 0.05 to 0.3) per act (Request, Commit, Deliver, Propose, Meet, dData) for five feature sets: 1g (1354 msgs: EMNLP04); 1g (1716 msgs); 1g+PreProcess; 1g+2g+3g+PreProcess; 1g+2g+3g+4g+5g+PreProcess.]
[Figure: precision (y-axis, 0.3 to 1) versus recall (x-axis, 0 to 1) curves for 1g (1716 msgs) and 1g+2g+3g+PreProcess.]
Idea: Predicting Acts from Surrounding Acts
• Strong correlation with previous and next message’s acts
• Act has little or no correlation with other acts of same message
Example of Email Thread Sequence
[Figure: messages in a thread linked by <<In-ReplyTo>>, labeled with the acts Delivery, Request, Request, Proposal, Delivery, Commit, Commit, Delivery, Commit.]
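One way to act on this idea is to give each message features describing the acts of its neighbors in the thread. The sketch below is only an illustration under stated assumptions: the thread layout (a dict with a "parent" link standing in for In-Reply-To and an "acts" set), the function name, and the feature naming are all hypothetical, and the neighbor acts are taken as already known or predicted.

```python
from collections import defaultdict


def neighbor_act_features(thread):
    """Build per-message features from the acts of the parent and child messages.

    thread: dict mapping msg_id -> {"parent": msg_id or None, "acts": set of str}
    """
    # Invert the parent links to find each message's replies.
    children = defaultdict(list)
    for msg_id, msg in thread.items():
        if msg["parent"] is not None:
            children[msg["parent"]].append(msg_id)

    features = {}
    for msg_id, msg in thread.items():
        feats = set()
        parent = msg["parent"]
        if parent in thread:  # previous message in the thread
            feats.update(f"parent:{act}" for act in thread[parent]["acts"])
        for child in children[msg_id]:  # next message(s) in the thread
            feats.update(f"child:{act}" for act in thread[child]["acts"])
        features[msg_id] = feats
    return features


# Toy thread (made up): a request, a commitment in reply, then a delivery.
thread = {
    "m1": {"parent": None, "acts": {"Request"}},
    "m2": {"parent": "m1", "acts": {"Commit"}},
    "m3": {"parent": "m2", "acts": {"Delivery"}},
}
features = neighbor_act_features(thread)
```

These neighbor features could be added to the text features of each message before training a per-act classifier. The message's own other acts are left out, in line with the observation that they carry little signal.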