Transcript ppt

Markov Model Based
Classification of Semantic Roles
A Final Project in Probabilistic Methods in AI
Course
Submitted By: Shlomit Tshuva, Libi Mann and Noam Ben Haim
The Problem




Different parts in the sentence denote
different semantic roles.
The team cars and publicity vehicles
will drive through the night
Automatically identify the
different roles
Good for Automatic Translation,
Question Answering and more
Self_Mover
Duration
The Graphical Model

Markov Chain:



Headwords (verbs and nouns, excluding adjectives
and determiners) as the Nodes.
Local Potentials – Estimated from FrameNet data
base, augmented with WordNet data, with nonzero probability for unseen data.
Transition Tables – From statistics on consecutive
Frame Elements.
Results




From 456 Sentences only 4 FE appeared
more than 3 times (GOAL, PATH,
SELF_MOVER and SOURCE).
Boundaries were not taken into account when
counting.
Both Precision and Recall measures are
~67%.
Major drawback is wrong boundaries for FEs,
and tendency of names to be attributed to
GOAL.
Problems

Sparse Data – Only 456 annotated sentences
in the largest annotated Frame, and not
statistically characteristic - usage based.





Not enough lemmas in the database.
Some Frame Elements (FE’s) appear only a few
times (Path, Source, Time).
Some words almost exclusively belong to a single
FE.
We tried to solve some of the lack of data w.r.t
lemmas by using WordNet for words relationships –
added some noise, but a good start.
A lot of sentences have large unmarked sections,
and when we have a word that appeared a lot in
some FE, it has a big prior for that FE.
Problems (Cont.)

Using only local dependencies





Hard to exit a FE – unless a significant headword
appears
The transition from FE to itself dominate the distribution
Treating ALL proper names the same – whether they
denote a Person (Usually SELF_MOVER) or a place (A
GOAL, SOURCE or AREA)
The information of the number of appearances of a
frame element in a sentence is lost.
Restricted usage of Syntax


Related to the local dependencies problem
But Syntax only is no good either (~69% with State of
the art systems)
Further Research

Add syntax in all levels.




Enhance data



Use syntactic constituents to estimate constituent specific
transition tables.
Use syntactic constituents to determine FE boundaries.
Larger windows.
More representative Local Potentials
Lemma specific transition tables
More extensive usage of WordNet


Differentiate between relations (we only used Hypernym
relation)
Wider search in the WordNet hierarchy (we only used siblings of
second order)