Putting Innovation to Work - Customer Projects from IBM's Emerging


SWG Strategy
P4 Task 2 Fact Extraction using a CNL
Current Status
David Mott, Dave Braines,
ETS, Hursley, IBM UK
Steve Poteet, Ping Xue, Anne Kao, Boeing
(C) Copyright IBM Corp. 2006, 2012. All Rights Reserved.
SWG Strategy – Emerging Technology Services, Hursley
Agenda
 Fact Extraction
 Extending ITA Controlled English
 Next Steps
Supporting the analyst
[Diagram labels: doc27 (documents), Requirements, Assumptions, NLP, Analysts Conceptual Model, CE Facts, Query, Uncertainty, Inference, Argumentation, Rationale, CE Tools, Product]
Project 6 Task 2 Research Objectives
 Improve extraction of facts (in CE) from documents (in Natural Language)
– unambiguous "semantics" of the document
– machine can assist analyst in inference of new conclusions
 Provide rationale for linguistic and analytic processing
– allow the human to be part of the NL processing
– reasoning, argumentation about ambiguities, incomplete parsing ...
 Define a model of linguistics, grammar, semantics
– facilitate configuration of NLP tools in a CNL
– human analyst can better understand the processing
 Improve Expressibility of CE
– interest in CE, but needs a more "stylistic" grammar
 How is the Natural Language Processing related to the "Analysts Conceptual
Model" (ACM)?
Team
 Dave Braines, David Mott
– IBM, Hursley
 Steve Poteet, Ping Xue, Anne Kao
– Boeing, Seattle
 Paul Smart, Antonio Penta, Ron Tasker
– University of Southampton
Fact Extraction
SYNCOIN sentences
 Corpus of sentences from Penn State University
 Based on a synthetic Counter Intelligence
scenario
– typical messages from reports
– several "hidden" narrative threads
 Representative of the problems we are trying to
solve
The Christian market in Abu Dasheer attacked today as a local Christian
congregation gathered for church services nearby.
BCT patrol in East Rashid discover a bomb-making facility on Abu Tajara
Street //MGRSCOORD: 38S MB 43655 78909//
Example processing (1)
BCT patrol in East Rashid discover a bomb-making facility on Abu
Tajara Street //MGRSCOORD: 38S MB 43655 78909//
the patrol unit '|BCT patrol|' finds the facility '|p6|' and is contained in
the place '|East Rashid|' and is located in the place '|East Rashid|'
and is a NATO military unit.
...
ISSUES:
 names are a bit strange
 unnecessary "contained in"
 missed the bomb-making and the "on ..."
 ignored the MGRS information
 "a NATO military unit" is unnecessary?
BUT:
 this is CE, fully conformant to the ACM
 this is machine-processable
 this has a defined semantics
 rationale for processing is available
Example processing (2)
The Christian market in Abu Dasheer attacked today as a local
Christian congregation gathered for church services nearby.
the attack '|te2|' has the christianentity '|c2|' as agent role and has the
christianentity '|c2|' as perpetrator.
the christianentity '|c2|' is contained in the place '|Abu Dasheer|' and
is located in the place '|Abu Dasheer|' and is a market and is a agent.
...
ISSUES:
 names are a bit strange
 unnecessary "as agent role" "is contained in"
 treated as ACTIVE voice; agent is wrong
 relation of two sentences not handled
BUT (again):
 this is CE, fully conformant to the ACM
 this is machine-processable
 this has a defined semantics
 rationale for processing is available
Current NL Processing
Just exploratory steps
[Pipeline diagram, labels in reading order: SYNCOIN Reports, Message PreProcessor, Proper Nouns (places, units), Stanford Parser, Entity Extractor, Situation Extractor, CEStore, Names, CE Aggregator, "Stylistic" CE, For Analysis; underpinned by the Conceptual Model (concepts, logical rules, linguistic expression)]
Conceptual Model(s)
 Meta Model
– concepts: Concept, Entity Concept, Relation Concept, Conceptual Model
– relations: belongs to, has as domain
 Semiotic Triangle
– concepts: Thing, Meaning, Symbol
– relations: stands for, expresses
 General
– concepts: Agent, Spatial Entity, Temporal Entity, Situation, Container
– relations: has as agent role, is contained in
 Linguistic
– concepts: Sentence, Phrase, Word, Noun, Linguistic Category, Linguistic Frame
– relations: has as modifier, is parsed from
 ACM
– concepts: Place, Church, Person, Village, IED, Facility, ....
– relations: is located in
[Diagram: triangle with corners meaning, symbol and thing, and edges labelled expresses, conceptualises and stands for]
"Our" Semiotic Triangle, based on the original [Ogden, C. K. and Richards, I. A. (1923)]
Stanford Parser
 Developed by Stanford University, available as
Open Source, with Java API
 Probabilistic context free grammar, trained on
standard corpus
 Produces a syntactic parse tree using Penn tags
BUT:
 Does not provide semantics
 Does not link to a conceptual model
 Does not always parse "correctly"
But the Stanford parser does more than we currently make use of
Stanford Parser Agent
We have wrapped the Stanford parser to produce CE
Parse tree:
the noun phrase ph14_21 has '(NP (DT a) (JJ bomb-making) (NN facility))' as raw parse tree.
the noun phrase ph14_21 has the noun '|facility|' as head and
has the adjective '|bomb-making|' as modifier and
has the determiner '|a|' as modifier and
has "a bomb-making facility" as covered text.
Semiotic triangle associations:
the noun phrase ph14_21 stands for the thing [14_21].
the verb phrase ph14_17 stands for the thing [14_17].
Proper names from noun phrases:
the noun phrase ph14_12 has the proper noun '|East Rashid|' as proper name head.
some linguistic commitments embedded in the code
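The wrapping step can be illustrated with a minimal Python sketch. It does not call the Stanford parser: it assumes a bracketed Penn-style parse string is already available. The function names, the "last noun is the head" heuristic, the small tag-to-category table and the exact CE wording are all assumptions made for illustration, not the agent's actual implementation.

```python
import re

# Hypothetical mapping from Penn tags to the category words used in the CE output.
CATEGORY = {"NN": "noun", "NNS": "noun", "JJ": "adjective", "DT": "determiner"}


def parse_penn(text):
    """Parse a bracketed Penn-style string into nested (label, children) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # consume "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])  # a leaf word
                pos += 1
        pos += 1                      # consume ")"
        return (label, children)

    return node()


def noun_phrase_to_ce(phrase_id, raw):
    """Emit CE-like text for a flat noun phrase (tag/word children only)."""
    _, children = parse_penn(raw)
    leaves = [(tag, kids[0]) for tag, kids in children]
    # Assumption: the last noun in the phrase is its head.
    head_index = max(i for i, (tag, _) in enumerate(leaves) if tag.startswith("NN"))
    head_tag, head_word = leaves[head_index]
    clauses = [f"has the {CATEGORY.get(head_tag, 'word')} '|{head_word}|' as head"]
    for i, (tag, word) in enumerate(leaves):
        if i != head_index:
            clauses.append(f"has the {CATEGORY.get(tag, 'word')} '|{word}|' as modifier")
    covered = " ".join(word for _, word in leaves)
    clauses.append(f'has "{covered}" as covered text')
    return f"the noun phrase {phrase_id} " + " and\n".join(clauses) + "."


ce = noun_phrase_to_ce("ph14_21", "(NP (DT a) (JJ bomb-making) (NN facility))")
print(ce)
```

The modifiers come out in textual order here, which differs slightly from the ordering shown on the slide; the facts expressed are the same.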
Entity Extraction Agent
Constructs CE to describe information about "things"
"Common names" from proper nouns:
the thing '[14_12]' has the proper noun '|East Rashid|' as common name.
"same as" relations from same common names, inferring new properties:
the thing '[14_12]' is the same as the place '2973'.
the place '2973' has the proper noun '|East Rashid|' as common name and has '34_234512,65_52063' as coordinates. (reference information)
the thing '[14_12]' is a place and has '34_234512,65_52063' as coordinates.
Conceptualisation from heads/modifiers:
there is a market named '[1_3]'.
Relationships from prepositional phrases:
the patrol unit '[14_3]' is contained in the container '[14_12]'.
Specialised relationships from ACM semantics:
the patrol unit '[14_3]' is located in the place '[14_12]'
Sample Entity Extraction Rules
the ACM defines the link between noun and concept:
if
( the noun phrase NP has the noun N as head and stands for the thing T )
and
( the noun N expresses the entity concept C )
then
( the thing T is categorised as the concept C )
.
the ACM provides specific information about the relationships:
if ( the thing T is contained in the place P )
then
( the thing T is located in the place P )
.
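To make the effect of these two rules concrete, here is a minimal Python sketch that forward-chains them over a handful of facts. It is not how the CE Store evaluates rules: the triple representation, the function name and the identifiers in `FACTS` (borrowed from the examples on earlier slides) are assumptions for illustration, and the "is a place" check stands in for the type constraint implied by "the place P".

```python
def apply_entity_rules(facts):
    """Forward-chain the two sample rules over (subject, relation, object) triples."""
    facts = set(facts)
    while True:
        derived = set()
        # Rule 1: head noun of a phrase + "expresses" link => categorise the thing.
        for phrase, rel, noun in facts:
            if rel != "has as head":
                continue
            for phrase2, rel2, thing in facts:
                if rel2 != "stands for" or phrase2 != phrase:
                    continue
                for word, rel3, concept in facts:
                    if rel3 == "expresses" and word == noun:
                        derived.add((thing, "is categorised as", concept))
        # Rule 2: containment in something known to be a place => location.
        for thing, rel, place in facts:
            if rel == "is contained in" and (place, "is a", "place") in facts:
                derived.add((thing, "is located in", place))
        if derived <= facts:          # nothing new: fixed point reached
            return facts
        facts |= derived


FACTS = {
    ("ph14_21", "has as head", "facility"),     # from the parser
    ("ph14_21", "stands for", "14_21"),         # semiotic triangle association
    ("facility", "expresses", "Facility"),      # word-to-concept link from the ACM
    ("14_3", "is contained in", "14_12"),       # from the prepositional phrase
    ("14_12", "is a", "place"),
}
result = apply_entity_rules(FACTS)
for fact in sorted(result - FACTS):
    print(fact)
```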
Situation Extraction Agent
Constructs CE to describe information about "situations"
Situations from verb phrases:
the verb phrase ph14_17 stands for the situation [14_17].
Situation concepts from phrase head:
the situation [14_17] is a discovery.
Semantic roles from phrase structures:
the discovery '[14_17]' has the patrol unit '[14_3]' as agent role.
Specialised roles from ACM semantics:
the discovery '[14_17]' has the patrol unit '[14_3]' as finder.
Specialist relationships from ACM roles:
the patrol unit '[14_3]' finds the facility '[14_20]'.
(Limited) timings from verb tense:
the utterance u1 occurs after the attack '[1_19]'.
Sample Situation Extraction Rules
Linguistic structure of verb phrase defines the "patient" role, eg "finds the facility":
if ( the verb phrase VP has the noun phrase NP as modifier and
stands for the situation VPT ) and
( the noun phrase NP stands for the thing NPT )
then
( the situation VPT has the thing NPT as patient role )
.
The ACM provides specific information mapping roles to relations:
if ( the discovery D has the agent A1 as agent role )
then
( the discovery D has the agent A1 as finder )
.
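Because rules like these are data rather than code, they can be run by a small generic matcher. The Python sketch below writes the two situation rules as premise/conclusion patterns and applies them to a fixed point. The "?x" variable notation, the triple encoding, the phrase identifier `ph14_20` and all function names are assumptions for illustration; the type restrictions in the CE (verb phrase, noun phrase, agent) are dropped for brevity.

```python
def match(pattern, fact, bindings):
    """Extend bindings so that pattern equals fact, or return None on a clash."""
    bindings = dict(bindings)
    for p, f in zip(pattern, fact):
        if p.startswith("?"):
            if bindings.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return bindings


def match_all(premises, facts, bindings=None):
    """Yield every set of bindings that satisfies all premises."""
    bindings = bindings or {}
    if not premises:
        yield bindings
        return
    for fact in facts:
        extended = match(premises[0], fact, bindings)
        if extended is not None:
            yield from match_all(premises[1:], facts, extended)


def forward_chain(rules, facts):
    """Apply rules repeatedly until no new fact is derived."""
    facts = set(facts)
    while True:
        derived = {
            tuple(b.get(term, term) for term in conclusion)
            for premises, conclusion in rules
            for b in match_all(premises, facts)
        }
        if derived <= facts:
            return facts
        facts |= derived


RULES = [
    # Linguistic rule: a noun-phrase modifier of a verb phrase fills the patient role.
    ([("?vp", "has as modifier", "?np"),
      ("?vp", "stands for", "?sit"),
      ("?np", "stands for", "?thing")],
     ("?sit", "has as patient role", "?thing")),
    # ACM rule: the agent of a discovery is its finder.
    ([("?d", "is a", "discovery"),
      ("?d", "has as agent role", "?a")],
     ("?d", "has as finder", "?a")),
]

FACTS = {
    ("ph14_17", "has as modifier", "ph14_20"),
    ("ph14_17", "stands for", "14_17"),
    ("ph14_20", "stands for", "14_20"),
    ("14_17", "is a", "discovery"),
    ("14_17", "has as agent role", "14_3"),
}
result = forward_chain(RULES, FACTS)
```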
Approximate Flow of conceptualisation
BCT patrol in East Rashid discover a bomb-making facility on Abu Tajara Street
[Diagram: the Parser (head/modifiers) splits the sentence into noun, verb and prepositional phrases, each standing for a thing or a situation. Remaining labels: generic linguistic model; "the word W expresses the concept C"; Common Names (|BCT patrol|, |East Rashid|); "X is the same as Y"; patrol unit, place, coordinates, discovery, facility; contained in, located in; agent role, patient role, finder, find, finds; ACM logical relationships]
CE Aggregation
 Agents generate "basic CE"
 Each rule contributes knowledge about the entities
 Complete knowledge must be presented in a "stylistic" way, within
the limits of CE syntax. Try to follow "literary devices", eg:
– combine into single sentences
– don't show superclass detail (is a person and is a thing)
– don't show super-relation detail (is contained in and is located in)
– don't show information already known (is a NATO military unit)
– prefer relation (X finds Y) to relational object (the discovery D has X as
finder)
– ...
 Want to improve CE syntax, eg:
– allow prepositional phrases to integrate more than two components of
a relationship
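Three of these devices (combining into one sentence, hiding super-relations, hiding already-known information) can be sketched in a few lines of Python. This is only an illustration of the idea, not the CE Aggregator itself: the `SUPER_RELATION` table, the function name and the (relation, object) encoding are assumptions, and the facts are taken from the first example-processing slide.

```python
# Assumed table: specialised relation -> the more general relation it implies.
SUPER_RELATION = {"is located in": "is contained in"}


def aggregate(subject, facts, already_known=frozenset()):
    """Combine (relation, object) facts about one subject into a single sentence."""
    kept = []
    for relation, obj in facts:
        if (relation, obj) in already_known:
            continue  # don't show information already known
        if any(SUPER_RELATION.get(r) == relation and o == obj for r, o in facts):
            continue  # don't show a super-relation when the specialised one is present
        if (relation, obj) not in kept:
            kept.append((relation, obj))
    return f"{subject} " + " and ".join(f"{r} {o}" for r, o in kept) + "."


basic_facts = [
    ("finds", "the facility '|p6|'"),
    ("is contained in", "the place '|East Rashid|'"),
    ("is located in", "the place '|East Rashid|'"),
    ("is", "a NATO military unit"),
]
sentence = aggregate(
    "the patrol unit '|BCT patrol|'",
    basic_facts,
    already_known={("is", "a NATO military unit")},
)
print(sentence)
```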
Demonstration/testing tool
Rationale
Experimenting to find the best way to show rationale
Rationale (another example)
Some Key Issues
 How is the conceptual model linked to the
linguistic information?
 What happens if the parse is incorrect?
 How do we handle more complex structures and
semantics?
 How do we handle ambiguity?
 How do we improve CE style?
Linking the conceptual model to linguistic information
 Fundamental relationship is the semiotic triangle:
– the symbol S expresses the meaning M
 Only the analyst knows what his/her concepts mean
– Must be ultimately responsible for linking the words and linguistic
expressions to the concepts, ie defining the "expresses"
 Don't want analyst to do too much work
– Provide tools to suggest "expresses"
– Provide predefined lexical models to choose from (since much of the
meaning will be common to normal language)
 The conceptual model should not contain significant additional
linguistic knowledge
– Each concept corresponds to one meaning
– The means of expression of the concepts in words should not be part of
the concept
• This allows the CE (defined in terms of the conceptual model) to be the
unambiguous and canonical semantic definition of a sentence
"Analysts Helper" to suggest linguistic expressions
A "synset" is a Wordnet representation of meaning (defined as a set of words)
[Diagram labels: wordnet, Conceptual Model, Analyst Helper, Analyst Choice, itanet]
Option1:
the concept C has the same meaning as the synset S1.
the noun W expresses the concept C.
the noun W1 expresses the concept C.
Option2:
the concept C has the same meaning as the synset S2.
the noun W2 expresses the concept C.
the noun W3 expresses the concept C.
Analyst Choice:
the word W expresses the concept C.
Mapping CE concepts to words via WordNet synsets
[Diagram: a "meeting of minds" between the lexicographer's meaning (the synset {tank, armoured combat vehicle}, with word senses tank/1 and armoured combat vehicle/1 and their words) and the analyst's meaning (the concept tank, declared with "conceptualise a ~ tank ~ T.")]
the synset {tank, armoured combat vehicle}
means the same as the concept tank.
the synset {tank, armoured combat vehicle}
has the word sense tank/1 as component.
the word |tank| expresses the concept tank.
The Analyst STILL has to decide the lexical relations, since only he knows what his concept is
Rationale for entity extraction
[Diagram: rationale graph. Sources shown: Document, Stanford Parser, wordnet (General Semantics), Analyst Helper, Wordnet Inference, Entity Extractor. Statements in the graph:]
the noun phrase NP has the word W as head/modifier
the noun phrase NP stands for the thing T.
the word W expresses the word sense WS
the word sense WS adds meaning to the wordnet synset S.
there is an ita synset named S.
the word sense WS adds meaning to the ita synset S.
the concept C has the same meaning as the synset S.
the word W expresses the concept C.
the thing T is categorised as the concept C
Extensions needed
 To handle verbs
 To define special ACM relationships:
if ( the discovery D has the agent A1 as agent role )
then
( the discovery D has the agent A1 as finder ).
 To suggest more "remote" alternative words based on
relationships in Wordnet
 To allow combinations of words from different synsets
(leading to a new ITA synset)
One concept – many linguistic "viewpoints"
the discovery d has the patrol p1 as finder and
has the facility f1 as find.
[Diagram: discovery:d linked to patrol:p1 by agent role and to facility:f1 by patient role]
Viewpoints:
the finder p1
the find f1
the patrol p1 finds the facility f1
the patrol p1 will find the facility f1
the patrol p1 found the facility f1
How does the analyst create these easily?
Use existing linguistic resources? (Wordnet, tables of verbs)
"Linguistic Viewpoints"
the linguistic frame v1
is based on the statement that
( the discovery D has the thing T1 as agent role and
has the thing T2 as patient role ) and
has the statement that
( the thing T1 is a finder ) as subject viewpoint and
has the statement that
( the thing T1 finds the thing T2 ) as relation viewpoint and
has the statement that
( the thing T1 found the thing T2 ) as past relation viewpoint.
We could generate ACM rules from this?
related to the CE parser linguistic frames?
Learning from the corpus?
 Could the NL parser flag up when a phrase is not turned into a
concept?
– the adjective |bomb-making| is an unprocessed word.
– the analyst is asked via "Helper" to choose an "expresses" link
 Could the NL parser "guess"?
– it is assumed to degree X that the adjective |bomb-making| expresses
the concept 'IED facility'.
– the system continues the reasoning but marks the consequences as
dependent upon the assumption
– later "argumentation" could review this assumption
– but what information could be used to guess: wordnet?
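One way to picture "marks the consequences as dependent upon the assumption" is to let every derived statement carry the set of assumptions behind it. The Python sketch below is a minimal illustration under that reading; the `Belief` class, the assumption name "A1" and the function names are hypothetical, and the "degree X" of the assumption is not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Belief:
    statement: str
    assumptions: frozenset = frozenset()  # names of assumptions this belief depends on


def conclude(statement, *premises):
    """A conclusion inherits every assumption of the premises it was derived from."""
    inherited = frozenset().union(*(p.assumptions for p in premises))
    return Belief(statement, inherited)


def retract(beliefs, assumption):
    """Drop every belief that depends on a rejected assumption."""
    return [b for b in beliefs if assumption not in b.assumptions]


# A parsed fact (no assumptions) and a guessed "expresses" link (assumption A1).
parsed = Belief("the noun phrase ph14_21 has the adjective |bomb-making| as modifier")
guess = Belief(
    "the adjective |bomb-making| expresses the concept 'IED facility'",
    frozenset({"A1"}),
)
conclusion = conclude(
    "the thing [14_21] is categorised as the concept 'IED facility'",
    parsed,
    guess,
)

# Later argumentation rejects A1: only the parsed fact survives.
remaining = retract([parsed, guess, conclusion], "A1")
```

Keeping the dependency set on each statement means a later review can withdraw a guess without re-running the whole extraction.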
Issues
 How is the conceptual model linked to the linguistic
information?
 What happens if the parse is incorrect?
– investigate Stanford parser generating alternatives?
 How do we handle ambiguity?
– could we use assumptions to maintain alternatives?
– could we use ACM semantics to rule out inconsistencies?
 How do we handle more complex structures and semantics?
 How do we improve stylistic expressiveness of CE?
– extend CE grammar
Extending ITA Controlled English
Strong need to improve stylistic expressiveness
 Allow "common name" identity handling
– the person John ...
 Prepositional phrases
– in, at, on
 Adjectives
 Reduce need to state the type explicitly
– John ...
 Collections
– the group of...
 Tense and aspect inflection in verbs
 ...
John met the group of US soldiers in East Rashid
Our design principles for CE enhancement
1. Retain existing principles of a CE conceptual model
2. Based on full English grammar
3. Chart parser for efficient syntax parsing
4. Formal semantics, based upon scientific theory
5. Higher level extensions handled in same theory
6. Parser configurable in CE, based on linguistic model
7. Modelling of Sentence or Dialog Context
8. Follow same principles in both NL and CNL processing
Parallel NL and CNL parsers
[Diagram labels: NLP, NL Parser, CNL Parser, stylistically expressive CE, Reference English Grammar, lexicon, Semantic Theory, conceptual model, basic CE or predicate logic or CE-in-Java]
Better understanding of linguistics
Increase stylistic expressibility of CE
The ambiguity barrier
[Diagram: a scale of increasing Ambiguity running from Basic CE to Full English. Features listed on the Full English side of the Ambiguity Barrier: domain specific syntax, sub clauses, anaphoric reference, verb inflections. Features listed on the Basic CE side: prepositional phrases, flexible identities]
 we start from basic CE and move towards full English
 Can we control the crossing of the ambiguity barrier?
Stanford parser as reference
there is a person named Joe.
[Diagram: the sentence is parsed by both the Stanford Parser and the CE parser]
 But only provides reference syntax
What about the semantics?
Linguistic Frame for semantics
there is a linguistic frame named vp0 that
has 'is the dog Fido' as example and
defines the verb phrase VP_vp0 and
has the sequence
( the copula BE_vp0 , and the noun phrase OBJ_vp0 )
as syntactic pattern and
is predicated on the thing T and
has the statement that
( the noun phrase OBJ_vp0 is predicated on the thing OBJ )
and
( the thing T is the same as the thing OBJ )
as semantic statement.
[Diagram: the syntax part of the frame splits the verb phrase "is the dog fido" into the copula "is" and the noun phrase "the dog fido"; the semantics part annotates the verb phrase with v(T) T=OBJ,... and the noun phrase with v(OBJ), dog(OBJ)..]
Linguistic Model:
the word |is| belongs to the linguistic category 'copula'.
the word |dog| is a noun.
Analyst's Conceptual Model:
the entity concept ce:Dog is expressed by the word |dog| and
has 'dog' as concept term.
We want exactly the same logic here as in the real NL processing
Hierarchy of linguistic frames
[Diagram: a stack of levels, with syntax and semantics mappings shown between each pair of adjacent levels]
specialist CE:
John attends a meeting with Jane.
domain CE:
the person John attends the meeting X and the person Jane
attends the meeting X.
linguistic CE:
there is a meeting X that
has the person John as agent role and
has the person Jane as agent role.
predicate CE:
the formula f3 has the statement that
( there is a meeting situation [123] that has the
person Jane as agent role and has the person John
as agent role ) as semantic expression
Is this only necessary for logicians?
Predicate Logic
Extended CE Parser
[Diagram labels: Stylistic CE, Chart Parser, syntactic pattern, lexicon of words, categories and syntactic features, mapping to concepts, Phrase structure grammar, lexical categories, annotations, lock-step Parse Trees (1-1), Semantic representation and combination, Logical Representation, Semantic processor, basic CE, semantic statement, linguistic frame, Linguistic Model, Analyst's Conceptual Model]
Currently parses some of basic CE, configured via linguistic frames
Output from CE parser
This slide shows old
output
Next Steps
 Extend Analyst Helper to handle verbs
 Sort out naming
 Start to extend CE (prepositional phrases)
 Extend coverage of SYNCOIN sentences
Backup
Extractor/Anaphor Agent
[Diagram labels: SYNCOIN sentences, Stanford Parser, Parse Tree, Java Agent (two), CE Store, Linguistic Model, Analysts Model, Anaphor Resolution Rules, Entity Extraction, Entities and "same as" relations, Linguistically Identified]
•Stanford Parser reads SYNCOIN data and generates parse trees
•Anaphor/Extractor Agent reads parse information and uses rules + models to:
• turn noun phrases into entities ("market")
• link noun phrases that are anaphoric references ("he")
Combining Linguistic and Analytic Rationale
 A fact extracted by a parser may lead to conclusions via analysts
reasoning
– may include assumptions and uncertainty
 The extraction of the fact may itself include assumptions and
uncertainty
 The total rationale graph of linguistic and analysts reasoning shows all
sources of uncertainty
– removing a linguistic assumption may lead to no support for the analysts
conclusions
 Argumentation may need to occur at both the linguistic and analytic
level
– but different skills (and people) needed for the different levels
CE rules to use WordNet to relate words to concepts
Analyst provides the link between his meaning and a standard meaning
if ( the synset S means the same as the concept C ) and
( the synset S has the word sense WS as component ) and
( the word sense WS has the word W as word )
then
( the word W expresses the concept C )
Now the parser can link words to concepts
Using WordNet to extend the linguistic mappings
[Diagram: the "meeting of minds" between the lexicographer's synset {tank, armoured combat vehicle} (word senses tank/1 and armoured combat vehicle/1) and the analyst's concept tank ("conceptualise a ~ tank ~ T."), now extended with the synset {military vehicle} and the word "military vehicle"]
the synset {tank, armoured combat vehicle}
means the same as the concept tank.
the synset {tank, armoured combat vehicle}
has the word sense tank/1 as component.
the synset '{tank,armoured combat vehicle}' is a hyponym of the
synset '{military vehicle}'.
the synset {military vehicle}
means the same as the concept tank.
the word |military vehicle|
expresses the concept tank.
CE rules to use WordNet to extend word-to-concept relations
if ( the synset S means the same as the concept C ) and
( the synset S is a hyponym of the synset Super )
then
( the synset Super means the same as the concept C )
.
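The hyponym rule above, combined with the earlier rule that turns a synset link into "expresses" links, can be sketched in Python over toy data. No WordNet library is used: the dictionaries stand in for WordNet, word senses are collapsed to their words, and every name is hypothetical. The `max_steps` bound is an addition of this sketch; the CE rule as written has no such bound and would keep climbing the hyponym chain.

```python
def extend_links(links, is_hyponym_of, max_steps=1):
    """Propagate (synset, concept) links up to max_steps levels along hyponym links."""
    links = set(links)
    frontier = set(links)
    for _ in range(max_steps):
        frontier = {
            (is_hyponym_of[synset], concept)
            for synset, concept in frontier
            if synset in is_hyponym_of
        } - links
        links |= frontier
    return links


def word_concept_pairs(links, components):
    """Every word of a linked synset 'expresses' the linked concept."""
    return {(word, concept) for synset, concept in links
            for word in components.get(synset, ())}


TANK = "{tank, armoured combat vehicle}"
MILITARY = "{military vehicle}"

components = {TANK: ["tank", "armoured combat vehicle"], MILITARY: ["military vehicle"]}
is_hyponym_of = {TANK: MILITARY}
analyst_links = {(TANK, "tank")}   # "the synset ... means the same as the concept tank."

expresses = word_concept_pairs(extend_links(analyst_links, is_hyponym_of), components)
for word, concept in sorted(expresses):
    print(f"the word |{word}| expresses the concept {concept}.")
```

Bounding the climb matters in practice, since each extra step links the concept to a more general set of words.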
Controlled English
 A Controlled Natural Language, being a subset of English
– limited syntax, but still readable as English
– meanings of the expressions unambiguously defined
 Avoids the complexity of a real Natural Language
– computer systems can read, interpret and apply it
 Retains the appearance of a real language
– humans can naturally use it, without learning "computer speak"
 The analyst may use Controlled English to
construct their Conceptual Model
the person John is married to the person Jane and has red as hair colour.
We have used CE to model:
• Collaborative Planning
• Analysis of IED activities and societal influences
• Matching Sensors to Missions
• Provenance
• Social Networks (Twitter)
• UK Government data (crimes, accidents, schools)
• NL processing itself
Simplistic Anaphor Rules in CE
Needs to handle more categories:
if ( the noun phrase NP has the personal pronoun PRP as head )
then
( the noun phrase NP is an anaphor )
.
Needs many more rules with selection constraints on the target NP:
if ( the noun phrase NPA is an anaphor ) and
( the noun phrase NPA follows the noun phrase NP ) and
( the noun phrase NP stands for the man T ) and
( the noun phrase NPA stands for the man TA )
then
( the noun phrase NPA is coreferent with the noun phrase NP )
.
if ( the noun phrase NP1 is coreferent with the noun phrase NP2 ) and
( the noun phrase NP1 stands for the thing T1 ) and
( the noun phrase NP2 stands for the thing T2 )
then
( the thing T1 is the same as the thing T2 )
.
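The three rules can be read as a single pass over the noun phrases in document order. The Python sketch below follows them literally, so a pronoun is linked to every earlier phrase that stands for a man, which is exactly the looseness the slide notes. The `NounPhrase` record, its field names and the sample identifiers are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class NounPhrase:
    ident: str
    head_tag: str   # Penn tag of the head word, e.g. "PRP" for a personal pronoun
    thing: str      # identifier of the thing the phrase stands for
    concept: str    # concept of that thing, e.g. "man"


def same_as_relations(phrases):
    """Return (thing, thing) 'is the same as' pairs; phrases are in document order."""
    pairs = []
    for i, anaphor in enumerate(phrases):
        if anaphor.head_tag != "PRP":
            continue                      # rule 1: only pronoun-headed phrases are anaphors
        for earlier in phrases[:i]:       # rule 2: the anaphor follows the target phrase
            if earlier.concept == anaphor.concept == "man":
                # rule 3: coreferent phrases stand for the same thing
                pairs.append((anaphor.thing, earlier.thing))
    return pairs


phrases = [
    NounPhrase("np1", "NN", "t1", "man"),      # "the man"
    NounPhrase("np2", "NN", "t2", "market"),   # "the market"
    NounPhrase("np3", "PRP", "t3", "man"),     # "he"
]
pairs = same_as_relations(phrases)
for a, b in pairs:
    print(f"the thing {a} is the same as the thing {b}.")
```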
Extended CE Parser
Full English Syntax (parse of "there is a person named Joe"):
(S (NP (EX there))
   (VP@copula (VBZ@be is)
      (NP@postmodifier (NP (DT a) (NN person))
         (VP@nonfinite (VBN named) (NNP Joe)))))
Semantics (based on Montague semantics), annotations attached to the tree nodes:
person(Joe); exists(A); v(A), A=Joe, person(A); v(A), A=Joe; v(A), person(A)
Extended CE Parser Agent
[Diagram labels: SYNCOIN sentences, CE parser, Grammar pattern, Lexicon, mapping to concepts, CE semantics, semantic statement, Linguistic Frame, Java Agent, Analysts Model, Entities, CE Store, SYNCOIN Model, Predicate Logic Model]
•CE Parser agent reads SYNCOIN data and runs simple CE linguistic frames
•Agent extracts "best" parse, turns into low level CE
•This is simple entity extraction
• when the noun phrase is at the start ("the man ...")
CE fact extraction framework
SYNCOIN Sentence as parsed by Stanford Parser + CE semantic extraction rules:
– Basic syntactic parse tree information from Stanford Parser
– Semantic information more general than the ACM
– Semantic information added from Analysts Conceptual Model
– CE facts extracted from sentence
SYNCOIN Sentence as parsed by CE Parser + CE semantic extraction rules:
– Basic syntactic parse tree information from CE Parser
– Semantic information more general than the ACM
– Semantic information added from Analysts Conceptual Model
– CE facts extracted from sentence
Prepositional phrase "in" as a container
Allowing analyst to define how words express concepts
[Diagram labels: Document, Stanford parser, Conceptual Model, Analyst Helper, Analyst, wordnet, Entity Extractor, itanet]
the noun phrase NP has the word W as
head/modifier and stands for the thing T.
the concept C has the same meaning
as the synset S.
the thing T is categorised as the
concept C.