Language Resources and Tools For
Supporting The System Engineering
Process
Onditi V. O. et al.
Computing Department
Lancaster University
Overview


System Engineering is a collaborative process.
The process is characterised by decisions:



about the product
about the process
The decisions are used for estimating each
individual’s contribution, and for system
maintenance and training
NLDB 2004
Challenges/Solutions

Challenges

Decisions aren’t adequately recorded
Solution: use different strategies for recording
decisions, e.g. both minutes and audio
Decisions are unstructured and therefore
difficult to retrieve and use
Solution: use formal representations during capture
Drawback: formalism introduces cognitive overload
on decision makers
Challenges/Solutions (Cont’d)

Decisions are implicitly recorded


Discover decisions through actions
Solution

Use unstructured representation to create
and share a structured representation
The architecture
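Document stream
Tokenization, driven by structure/style rules
POS/semantic tagging, driven by POS/semantic rules
Extraction of action sentences, driven by a syntactic pattern
Storage of each action and its context in the database
Derived HTML document
Indicator words are a further input to the analysis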
Document tokenisation


Use document’s style and structure to
break a document into paragraphs.
Use string patterns to tokenise the
paragraphs into:

sentences, multi-word-expressions, and
words
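
The two steps above can be sketched in Python as follows. This is a minimal illustration, not the tool's implementation: blank lines stand in for the style and structure rules, the string patterns are deliberately naive, and the multi-word-expression list MWES is a hypothetical placeholder.

```python
import re

# Hypothetical multi-word expressions; the tool's own list is not given.
MWES = ["action point", "steering committee"]

def split_paragraphs(text):
    """Split on blank lines, a stand-in for style/structure rules."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]

def split_sentences(paragraph):
    """Split after '.', '!' or '?' followed by whitespace (a naive pattern)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", paragraph) if s]

def tokenise(sentence):
    """Return tokens, keeping each known multi-word expression as one token."""
    # Longest expressions first so they win over shorter overlapping ones.
    alternatives = [re.escape(m) for m in sorted(MWES, key=len, reverse=True)]
    pattern = "|".join(alternatives + [r"\w+(?:'\w+)?", r"[^\w\s]"])
    return re.findall(pattern, sentence, flags=re.IGNORECASE)

text = "The steering committee met. John to circulate the report.\n\nAny other business."
paragraphs = split_paragraphs(text)
sentences = split_sentences(paragraphs[0])
tokens = tokenise(sentences[0])
```

Placing the multi-word expressions ahead of the single-word pattern is what keeps "steering committee" together as one token.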
Analysing a document’s
content

Analysis is done at two levels:



surface and
deep (syntactic/semantic)
In surface analysis:


choose and constrain indicator words.
use indicator words for identifying agenda
items and minute items.
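
Surface analysis of this kind might look like the sketch below. The indicator words are hypothetical, and "constrain" is interpreted here as restricting an indicator word to the start of a line; both are assumptions made for illustration.

```python
import re

# Hypothetical indicator words, each constrained to the start of a line.
INDICATORS = {
    "agenda": re.compile(r"^\s*(agenda|item)\b", re.IGNORECASE),
    "minute": re.compile(r"^\s*(minute|matters arising)\b", re.IGNORECASE),
}

def classify_line(line):
    """Return 'agenda', 'minute' or None for one line of a minutes document."""
    for label, pattern in INDICATORS.items():
        if pattern.search(line):
            return label
    return None

lines = [
    "Agenda 1: Project status",
    "Minute 1.1: The schedule was reviewed.",
    "The agenda was long.",
]
labels = [classify_line(line) for line in lines]
```

The third line contains the word "agenda" but not at the start, so the constraint leaves it unclassified.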
Analysing a document’s
content (cont’d)

In deep analysis:



use part-of-speech (pos) attribute for
selecting content words (nouns, pronouns,
verbs, adjectives)
use a pos pattern for extracting action
statements (actions)
use semantic attribute to associate action
sentences
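
Selecting content words by their pos attribute could be sketched as below. The sentence is invented, and the filter assumes CLAWS-style tags in which nouns start with N, pronouns with P, verbs with V and adjectives with J.

```python
# (word, pos) pairs; the tags follow CLAWS-style naming, the sentence is invented.
tagged = [
    ("John", "NP1"), ("to", "TO"), ("circulate", "VVI"),
    ("the", "AT"), ("final", "JJ"), ("report", "NN1"), (".", "."),
]

# Assumed prefixes: N = noun, P = pronoun, V = verb, J = adjective.
CONTENT_PREFIXES = ("N", "P", "V", "J")

def content_words(tagged_sentence):
    """Keep only the words whose pos tag starts with a content-word prefix."""
    return [word for word, pos in tagged_sentence
            if pos.startswith(CONTENT_PREFIXES)]

selected = content_words(tagged)
```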
Template for action sentences

An action sentence comprises:



an object (action), a verb or verb phrase
a subject (agent), a noun or pronoun
Nouns and verb phrases are syntactically
arranged:


the subject appears at the head
the object appears at the tail
Template for action sentences
(cont’d)


a modal verb or function word ‘to’ ties the
subject and object
An action sentence template is defined
thus:


subject + modal verb/function word ‘to’ +
object
the template = NP* + VM/TO + V* in the
CLAWS (Constituent Likelihood Automatic
Word-tagging System) tag set.
Template for action sentences
(cont’d)




NP* matches all proper nouns (subject)
VM/TO matches all modal verbs or function
word ‘to’
V* matches all verbs (object)
There are other ways to arrange a
subject and an object in a sentence.

the subject can be at the tail instead of
the head
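
The template NP* + VM/TO + V* can be expressed as a regular expression over a sentence's sequence of pos tags. The sketch below assumes the three elements must be adjacent and covers only the subject-first arrangement; the function name is_action is illustrative.

```python
import re

# NP* + VM/TO + V*: a proper noun, then a modal verb or 'to', then a verb.
ACTION_TEMPLATE = re.compile(r"\bNP\w*\s+(?:VM\w*|TO)\s+V\w+")

def is_action(pos_tags):
    """Return True if the tag sequence contains the action template."""
    return ACTION_TEMPLATE.search(" ".join(pos_tags)) is not None

with_to = is_action(["NP1", "TO", "VVI"])                    # proper noun + 'to' + verb
with_modal = is_action(["NP1", "VM", "VVI", "AT", "NN1"])    # proper noun + modal + verb
no_subject = is_action(["AT", "NN1", "VBDZ", "VVN"])         # no proper noun at all
```

Matching on the tag string rather than on the words keeps the pattern independent of vocabulary, which is the point of using a syntactic template.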
Action template: An example
Action template: An example
(cont’d)

In the example, the elements are:

<s> = sentence, <w> = word
Element <w> has attributes:



id (identity) - identifies the ordinal number
of a sentence in a document and the
ordinal number of a word in a sentence
pos = part-of-speech
sem = semantic category
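
The annotated example itself is not reproduced in this transcript. The sketch below parses a hypothetical fragment in the format described: the words and the sem values are invented placeholders, and an id such as 37.5 is assumed to mean word 5 of sentence 37.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the words and sem values are placeholders.
fragment = """
<s>
  <w id="37.5" pos="NP1" sem="Z1">John</w>
  <w id="37.6" pos="TO" sem="Z5">to</w>
  <w id="37.7" pos="VVI" sem="Q1">circulate</w>
</s>
"""

def read_words(xml_text):
    """Return one dict per <w>, splitting id into sentence and word ordinals."""
    words = []
    for w in ET.fromstring(xml_text.strip()).iter("w"):
        sentence_no, word_no = (int(part) for part in w.get("id").split("."))
        words.append({
            "sentence": sentence_no,
            "word": word_no,
            "pos": w.get("pos"),
            "sem": w.get("sem"),
            "text": w.text,
        })
    return words

words = read_words(fragment)
pos_sequence = [w["pos"] for w in words]
```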
Action template: An example
(cont’d)

The pos sequence from id 37.5 to 37.7:

is NP1, TO, VVI
matches the action template NP* + VM/TO
+ V*
The sentence is therefore marked as an action
Action template: Results
Action template: Results
(cont’d)

Three sets (1,2,3) of minutes from four
organisations (A,B,C,D) were processed



Rel = relevant actions, Ret = actions
retrieved by the tool, RelRet = relevant
actions retrieved
Recall = RelRet/Rel, Precision = RelRet/Ret
Overall precision = 78%, overall recall = 62%
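
The two measures can be computed as in the sketch below. The counts are hypothetical and are not taken from the reported results.

```python
def recall(rel_ret, rel):
    """Share of the relevant actions that the tool retrieved."""
    return rel_ret / rel

def precision(rel_ret, ret):
    """Share of the retrieved actions that are relevant."""
    return rel_ret / ret

# Hypothetical counts for one set of minutes (not from the reported results).
rel, ret, rel_ret = 20, 16, 12

r = recall(rel_ret, rel)      # 12 / 20
p = precision(rel_ret, ret)   # 12 / 16
```

A tool that retrieves few but mostly correct actions scores high on precision and low on recall, which is the direction of the gap in the overall figures.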
Representing extracted
information

Extracted information is represented in
a structured format:



agenda items, minute items and actions
are represented as database objects
associations between the objects are
captured
associations between the objects and the
minute documents are captured
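
One possible shape for such a store is sketched below with Python's sqlite3 module. The slides do not specify the database or its schema, so relational tables stand in for the database objects, and every table and column name here is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE document    (id INTEGER PRIMARY KEY, html_path TEXT);
CREATE TABLE agenda_item (id INTEGER PRIMARY KEY, title TEXT,
                          document_id INTEGER REFERENCES document(id));
CREATE TABLE minute_item (id INTEGER PRIMARY KEY, body TEXT,
                          agenda_item_id INTEGER REFERENCES agenda_item(id));
CREATE TABLE action_item (id INTEGER PRIMARY KEY, agent TEXT, body TEXT,
                          minute_item_id INTEGER REFERENCES minute_item(id));
""")

# Invented sample rows, one per object type.
conn.execute("INSERT INTO document VALUES (1, 'minutes_1.html')")
conn.execute("INSERT INTO agenda_item VALUES (1, 'Project status', 1)")
conn.execute("INSERT INTO minute_item VALUES (1, 'The schedule was reviewed.', 1)")
conn.execute("INSERT INTO action_item VALUES (1, 'John', 'John to circulate the report', 1)")

# Follow the associations from an action back to its minute document.
row = conn.execute("""
    SELECT action_item.agent, agenda_item.title, document.html_path
    FROM action_item
    JOIN minute_item ON action_item.minute_item_id = minute_item.id
    JOIN agenda_item ON minute_item.agenda_item_id = agenda_item.id
    JOIN document    ON agenda_item.document_id = document.id
""").fetchone()
```

The foreign keys carry both kinds of association named above: those between the objects themselves and those between the objects and the minute documents.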
Retrieving actions

Actions can be retrieved through:



browsing
query
The context of the actions can be
retrieved by jumping into the minute
document
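
The two retrieval modes and the jump into the minute document might be sketched as follows. The Action fields, the sample data and the anchor scheme (a fragment identifier per sentence in the derived HTML document) are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Action:
    agent: str
    text: str
    document: str   # path of the HTML document derived from the minutes
    anchor: str     # hypothetical anchor marking the sentence in that document

ACTIONS = [
    Action("John", "John to circulate the report", "minutes_1.html", "s37"),
    Action("Mary", "Mary will book the room", "minutes_2.html", "s12"),
]

def browse(actions):
    """Browsing: list every action, grouped by source document."""
    grouped = {}
    for action in actions:
        grouped.setdefault(action.document, []).append(action.text)
    return grouped

def query(actions, keyword):
    """Query: return actions whose agent or text contains the keyword."""
    key = keyword.lower()
    return [a for a in actions
            if key in a.agent.lower() or key in a.text.lower()]

def context_link(action):
    """Build a link that jumps to the action's sentence in its document."""
    return f"{action.document}#{action.anchor}"

hits = query(ACTIONS, "report")
link = context_link(hits[0])
```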
Retrieving actions (cont’d)
Conclusion



Minute documents can be automatically
structured and efficiently shared.
Action sentences can be automatically
extracted from minute documents.
Process decisions can be tracked
through actions.