Scalable Ontological Sense Matching

Dr. Geoffrey P Malafsky
TECHi2 LLC, Fairfax, VA
Need for Smarter Systems
• Enormous and ever-increasing amounts of data and information are available
• There is potential for significant gains in efficiency, effectiveness, and success in all fields IF the data and information can be harnessed
• The most common use case is information overload: too much to sift through in too little time, with too few resources and too little support or authority
Dr. Geoffrey P Malafsky, TECHi2, CapSci 2006,
ASTI
Major Challenges
• Value is subjective and context-based, i.e., not deterministic
• Metrics and decision criteria depend heavily on conditions, information uncertainty, decision/activity timelines, vulnerability to error, and risk capacity
• Rules and interaction mechanisms are usually nebulous, poorly defined, non-existent, or incorrect
• Information Technology approaches for handling this situation are immature, suffering from poor scalability (e.g., excessive computation, storage requirements, security) and/or untrustworthy results
Advanced Techniques
• Improve machine understanding of information by annotating meaning (e.g., ontologies)
• Compute the best scenario using domain models, built with Subject Matter Experts (SMEs) who define core knowledge, coupled to probabilistic fit calculations
• Extract patterns from very large-scale data sets
• Hire lots of people
Example: Knowledge Discovery & Dissemination (KDD)
• Seeks to find knowledge for practical purposes within large-scale data/information stores
  - Crosses organizational and functional boundaries
  - High relevance to the searcher
  - Uncertainty is assessed and used
  - Secure
  - Rules and domain model based
  - Automated
Knowledge is Not Just Information or Data
• Knowledge has:
  - Context: what is it about?
  - Confidence: is it right?
  - Relationships: what does it have to do with that?
  - Priorities: what is most important?
• Types
  - Explicit knowledge is codified and can be manipulated
  - Tacit knowledge is unspoken “know-how”
• Looks just like data when in an electronic system
  - It is data
  - Annotations on “about” and “how”, tied to intelligent application logic, make it knowledge for the user
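The annotations listed above can be pictured as fields attached to a plain data record. The following is a minimal Python sketch under that assumption; `KnowledgeItem`, `most_important`, the 0-to-1 confidence scale, and the sample items are hypothetical illustrations, not part of KORS or any system described here. It shows a record staying plain data until it carries context and confidence annotations that application logic can act on.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeItem:
    """A piece of data plus the annotations that turn it into knowledge (hypothetical schema)."""
    data: str                                               # the stored content: on its own, just data
    context: str = ""                                       # what is it about?
    confidence: float = 0.0                                 # is it right? (assumed 0.0 to 1.0 scale)
    relationships: list[str] = field(default_factory=list)  # what does it have to do with that?
    priority: int = 0                                       # what is most important? (higher wins)
    tacit: bool = False                                     # tacit "know-how" vs. explicit, codified knowledge

    def is_annotated(self) -> bool:
        """True once the item says what it is about and how far to trust it."""
        return bool(self.context) and self.confidence > 0.0


def most_important(items: list[KnowledgeItem], min_confidence: float = 0.5) -> list[KnowledgeItem]:
    """Application logic acting on the annotations: trusted, annotated items, highest priority first."""
    trusted = [i for i in items if i.is_annotated() and i.confidence >= min_confidence]
    return sorted(trusted, key=lambda i: i.priority, reverse=True)


# Illustrative items; their contents are made up for this example
items = [
    KnowledgeItem("Bridge 12 closed", context="route planning", confidence=0.9, priority=2),
    KnowledgeItem("Depot relocated", context="logistics", confidence=0.4, priority=5),
    KnowledgeItem("Fuel resupply at 0600", context="logistics", confidence=0.8, priority=4),
    KnowledgeItem("Sensor log 0x3F"),  # unannotated: stays plain data
]
top = most_important(items)  # the two trusted items, ordered by priority
```

The low-confidence item and the unannotated log are both excluded, which is the practical difference the slide draws between stored data and knowledge a user can act on.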
KDD Knowledge Map
• Analysis of the emphasis placed on scientific and technological areas in Knowledge Discovery and Dissemination (KDD)
• Gaps reveal technical vulnerabilities
[Figure: Total and TRL Rated Citations. Number of citations (axis 0 to 350) for the Knowledge, Discovery, and Dissemination portions of KDD, shown in total and by Technology Readiness Level (TRL) band: 1-3, 4-7, 8-9. Callout: Most research and development is concentrated in the discovery portion of KDD.]
Information & Data Mining Dominates KDD Focus
[Figure: Citations (axis 0 to 250) by KDD topic area: Security; Storage; On-demand (TPPU); Personalization; Agents; Distributed architecture; Real-time operations; Visualization; Feature extraction + analysis; Uncertainty mgmt + mitigation; Models; Info/data mining; Fusion; Lifecycle; Maintenance; Collection + Tacit capture; Rules formation + analysis; Representation.]
What is Missing to Make KDD Work
• Knowledge-based metadata architecture
• Predictive personalization algorithms
• Models of the knowledge lifecycle
• Computational knowledge techniques
• Real-time analysis with large data volumes and high uncertainty
• A conduit for feedback from end users to capture and evolve domain knowledge
Bringing Structure to Unbounded Knowledge
• Knowledge Mgmt systems have failed to meet operational requirements
  - Knowledge is inherently expansive and evolving
  - IT tends to collect and organize assets without relevance, context, ...
  - The level of effort needed to manually collect, map, and cleanse source data/information is too high
  - AI is not around the corner
Structured Knowledge
• Applying a structured framework creates repeatable, interoperable, consistent solutions
• Knowledge fidelity is maintained with a combination of human- and machine-processable representations
• Unknowns are discovered through triangulation from known analogies
• The universe of possible combinations of knowledge and user needs is reduced to engineering-scale solutions
Applying Quantum Mechanics to Knowledge Processing
• Even the small area of this room has an infinite number of possible combinations of things (macro, micro, atomic, subatomic)
• Representing this “knowledge” and these “states” with a domain model reduces the infinite possibilities to a tractable few: the Schrödinger equation, $\hat{H}\psi = E\psi$
  - Wavefunctions describe specific states (4 quantum numbers)
  - The connection between objects is defined by the overlap integral of their wavefunctions:
    $\int \psi_0^{*}\,\psi_1\,d\tau$ = degree of match
The KORS™ Framework
• Knowledge: collected from SMEs using templates
• Ontologies: conceptual models of domain knowledge
• Rules: business and technical rules are extracted and defined from domain knowledge and ontologies
• Semantic metadata: knowledge, ontology relationships, and rules are connected to and represented with data and information
KORS Structured Knowledge
• Knowledge is inherently expansive and evolving
• KM systems have failed to meet operational requirements
• IT collects & organizes assets without relevance, context, ...
• The level of effort needed to manually collect, map, and cleanse source data/information is too high
• The KORS™-pending framework creates repeatable, interoperable, consistent solutions
• Knowledge fidelity is maintained with a combination of human- and machine-processable representations
• Ontologies (concepts) are expressed with domain-specific and standard terms
• The universe of possible combinations of knowledge and concepts is reduced to engineering-scale solutions
KORS Ontologies
• Conventional wisdom
  - Multi-tiered, fully explicit, broad coverage
  = Too large; too difficult to maintain; too hard to implement
• KORS™ Ontologies
  - Cross-domain framework, domain-specific instances of classes, leverage of existing ontologies, concepts defined with domain-based uncontrolled vocabulary AND common controlled vocabulary
  = Smaller; easier to maintain; supports engineering processes
• Answers the broad question: “Do the concepts in this other ontology have semantic similarity to those in my ontology?”
• Semantic metadata is used to characterize domain ontologies
Conventional Wisdom
• Upper, middle, and lower ontologies
• Every concept made fully explicit
The KORS Engineering Framework
[Diagram with three stages: Structured knowledge capture; Incorporate within the engineered solution; Identify rules and metadata structures]
Cross-Concept Overlap Calculation
• Overlap integrals are calculated from semantic metadata and updated when ontologies change
S A B   On[domain A (termA )]  On[domainB (termB )]
• Overlap (S) is computed at:
  - The ontology-ontology level, using primary task-description pairs
  - The term level, using allowed and disallowed senses (not just synonyms)
  - Run time, using coarse-medium-fine concept matching:
    - Coarse = ontology-ontology
    - Medium = ontology-term
    - Fine = term-term
• With the metadata architecture implementation, the calculation uses whatever metadata is available
• Inherently scalable and distributed, with evolving improvements
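One way to read the coarse-medium-fine scheme is as a fallback chain: compare term senses when both are recorded, otherwise compare a term against the other ontology's profile, otherwise compare the two ontologies. The sketch below assumes each ontology or term is summarized as a sparse map from sense or concept labels to weights, and it substitutes a normalized dot product (cosine similarity) for the overlap S. The `Profile` alias, the `match` function, and all labels are hypothetical.

```python
import math
from typing import Optional

Profile = dict[str, float]  # hypothetical: sense or concept label -> weight


def overlap(a: Profile, b: Profile) -> float:
    """Normalized overlap (cosine) of two sparse weight vectors; 0.0 if either is empty."""
    dot = sum(w * b[k] for k, w in a.items() if k in b)
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def match(onto_a: Profile, onto_b: Profile,
          term_a: Optional[Profile] = None,
          term_b: Optional[Profile] = None) -> tuple[str, float]:
    """Use the finest level whose metadata is present: term-term, then ontology-term, then ontology-ontology."""
    if term_a and term_b:
        return "fine", overlap(term_a, term_b)
    if term_a:
        return "medium", overlap(term_a, onto_b)
    if term_b:
        return "medium", overlap(onto_a, term_b)
    return "coarse", overlap(onto_a, onto_b)


# Illustrative profiles; labels and weights are invented
onto_geo = {"imagery": 1.0, "terrain": 1.0, "targeting": 0.5}
onto_dip = {"imagery": 1.0, "negotiation": 1.0}
coarse = match(onto_geo, onto_dip)                           # no term senses recorded
medium = match(onto_geo, onto_dip, term_a={"imagery": 1.0})  # one side has term senses
fine = match(onto_geo, onto_dip, {"armed_rebel": 1.0}, {"armed_rebel": 1.0, "protester": 1.0})
```

Missing term metadata never blocks a comparison. It only drops the result to a coarser level, which matches the point that the calculation uses what is available.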
Semantic Variance Across Domains
• For example, the term “Insurgent” means:
  - Mission Planner: a person who takes part in an armed rebellion against the constituted authority
  - Geospatial Analyst: someone who participates in a peaceful public display of group feeling
  - Diplomatic Corps: someone who participates in a public display against an established government
Semantic Challenges
• Semantic Consistency
• Semantic Variance Across Domains
• Search and Discovery Requirements
  - Controlled vocabularies
  - Uncontrolled vocabularies
    - Have “local” variants: Navy fliers and Air Force pilots
    - Change to match changing reality: yesterday’s friend could be tomorrow’s foe
    - Change to match changes in policy (spin): today, “freedom fighter”; tomorrow, “insurgent”
KORS is Extensible, Scalable, Adaptive
• Ontology-level metadata describes the basic functional concepts and processes of the domain
  - Can be linked to Enterprise Architecture products
  - Enables a direct conceptual match between two functional domains
• Ontology term descriptions use:
  - Domain-specific (uncontrolled) expressions
  - Allowed senses from controlled vocabularies
  - Disallowed senses from controlled vocabularies
• True meaning is found from the combination of domain expression, allowed senses, and disallowed senses, as is done in real language
• Metadata architecture: values are used if present but are not required → scalability and extensibility
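The allowed/disallowed mechanism can be modeled as set operations over a controlled vocabulary's sense inventory. This minimal sketch assumes a hypothetical sense inventory built from the three “Insurgent” readings on the Semantic Variance slide; the sense labels, `TermDescription`, and `resolve` are illustrative assumptions, not KORS structures. Absent metadata simply leaves more senses in play, mirroring the point that values are used if present but not required.

```python
from dataclasses import dataclass, field

# Hypothetical controlled-vocabulary inventory; the sense labels are illustrative, not real WordNet IDs
CONTROLLED_SENSES = {
    "insurgent": {"armed_rebel", "peaceful_demonstrator", "antigovernment_protester"},
}


@dataclass
class TermDescription:
    term: str
    domain_expression: str = ""                        # uncontrolled wording, kept for human readers
    allowed: set[str] = field(default_factory=set)     # controlled senses this domain accepts
    disallowed: set[str] = field(default_factory=set)  # controlled senses this domain rules out


def resolve(desc: TermDescription) -> set[str]:
    """Senses that survive whatever allowed/disallowed metadata the description supplies."""
    senses = set(CONTROLLED_SENSES.get(desc.term, set()))
    if desc.allowed:                 # optional: narrow to the allowed senses only when given
        senses &= desc.allowed
    return senses - desc.disallowed  # explicitly disallowed senses are always removed


mission = TermDescription("insurgent", "armed rebel against constituted authority", allowed={"armed_rebel"})
geo = TermDescription("insurgent", disallowed={"armed_rebel", "antigovernment_protester"})
bare = TermDescription("insurgent")  # no sense metadata: every controlled sense remains possible
```

Here `resolve(mission) & resolve(geo)` is empty, so the same surface word is flagged as two different concepts across these domains.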
Example Domain: GEOINT
• Exploitation and analysis of imagery and geospatial information to describe, assess, and visually depict physical features and geographically referenced activities on the Earth.
• GEOINT encompasses all the activities involved in the collection, analysis, and exploitation of spatial information in order to gain knowledge about the national security environment, and the visual depiction of that knowledge.
MyGEOINT Ontological Architecture
Functional Application
• Semantic Expansion
  - Cross-domain commonality
  - Qualified synonym identification
  >> Discovery of potentially relevant knowledge
• Semantic Resolution
  - Allowed and disallowed alternate semantics
  - Binding of dynamic domain-specific semantics to controlled vocabularies
  >> Computable semantic comparisons and knowledge relevance ranking
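The outcome named last, computable comparisons feeding relevance ranking, can be illustrated with a direct score per source. This sketch assumes each candidate source has already been tagged with resolved senses and scores it by the fraction of the query's senses it covers; the scoring rule, `coverage`, `rank`, and the source names are hypothetical choices, not the KORS ranking function.

```python
def coverage(query: set[str], source: set[str]) -> float:
    """Fraction of the query's resolved senses that a source also carries."""
    return len(query & source) / len(query) if query else 0.0


def rank(query: set[str], sources: dict[str, set[str]]) -> list[tuple[str, float]]:
    """Score every source directly, drop non-matches, and order best first (ties broken by name)."""
    scored = [(name, coverage(query, senses)) for name, senses in sources.items()]
    return sorted((s for s in scored if s[1] > 0.0), key=lambda s: (-s[1], s[0]))


# Hypothetical sources tagged with resolved senses
sources = {
    "report_a": {"armed_rebel", "terrain"},
    "report_b": {"armed_rebel", "logistics"},
    "report_c": {"peaceful_demonstrator"},
}
ranking = rank({"armed_rebel", "terrain"}, sources)
```

Because the score is a plain set computation, the ordering is reproducible and explainable, with no inference step in between.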
Cross-domain commonality
• Answers the broad question: “Do the concepts in this other ontology have semantic similarity to those in my ontology?”
  - Semantic metadata is used to characterize domain ontologies
  - Overlap integrals are calculated from semantic metadata and updated when ontologies change
  - Run-time ontology-to-ontology matching is greatly simplified
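Because overlaps are computed from semantic metadata ahead of time and refreshed only when an ontology changes, the run-time step can reduce to a lookup. The sketch below assumes each ontology is summarized by a set of concept labels and uses Jaccard similarity as a simple stand-in for the overlap integral; `OverlapIndex` and the sample ontologies are hypothetical.

```python
def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Set overlap |A ∩ B| / |A ∪ B|, used here as a simple stand-in for the overlap integral."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class OverlapIndex:
    """Ontology-to-ontology overlaps computed ahead of time (hypothetical design)."""

    def __init__(self) -> None:
        self.profiles: dict[str, frozenset[str]] = {}
        self.scores: dict[frozenset[str], float] = {}  # key: unordered pair of ontology names
        self.recomputed = 0                             # counts pair computations, to show updates stay local

    def update(self, name: str, concepts: set[str]) -> None:
        """Add or change one ontology's semantic metadata; refresh only the pairs that involve it."""
        self.profiles[name] = frozenset(concepts)
        for other, profile in self.profiles.items():
            if other != name:
                self.scores[frozenset((name, other))] = jaccard(self.profiles[name], profile)
                self.recomputed += 1

    def lookup(self, a: str, b: str) -> float:
        """Run-time matching is a dictionary lookup, not a fresh comparison."""
        return self.scores.get(frozenset((a, b)), 0.0)


idx = OverlapIndex()
idx.update("GEOINT", {"imagery", "terrain", "mapping"})
idx.update("Mission", {"targeting", "terrain", "logistics"})
idx.update("Diplomatic", {"negotiation", "treaty"})
before = idx.lookup("GEOINT", "Diplomatic")                     # no shared concepts yet
idx.update("Diplomatic", {"negotiation", "treaty", "mapping"})  # ontology changes: 2 pairs refreshed
after = idx.lookup("GEOINT", "Diplomatic")
```

The cost of change is paid once per update, in proportion to the number of other ontologies, rather than on every query.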
Qualified synonym identification
• Knowledge discovery via synonym lists is well supported by standardized lexical tools (WordNet, etc.)
• Their broad-domain perspective limits the ability to isolate domain-specific usage
• KORS domain-specific ontologies and semantic metadata allow broad-spectrum vocabularies to be used to make fine-grained distinctions
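A qualified synonym list can be sketched as a broad, domain-neutral synonym list filtered through a domain's allowed and disallowed senses. The synonym data below is invented for illustration and tagged with hypothetical sense labels. A real system might draw candidates from a lexical resource such as WordNet, but no WordNet API is used or implied here; `BROAD_SYNONYMS` and `qualified_synonyms` are assumptions.

```python
from typing import Optional

# Hypothetical broad-spectrum synonym list (WordNet-like in spirit, not real WordNet data):
# each synonym is tagged with the sense it shares with the headword
BROAD_SYNONYMS = {
    "insurgent": [
        ("rebel", "armed_rebel"),
        ("guerrilla", "armed_rebel"),
        ("demonstrator", "peaceful_demonstrator"),
        ("protester", "antigovernment_protester"),
    ],
}


def qualified_synonyms(term: str, allowed: Optional[set[str]] = None,
                       disallowed: Optional[set[str]] = None) -> list[str]:
    """Keep synonyms whose shared sense the domain allows (if it says) and does not disallow."""
    allowed = allowed or set()
    disallowed = disallowed or set()
    return [
        word
        for word, sense in BROAD_SYNONYMS.get(term, [])
        if (not allowed or sense in allowed) and sense not in disallowed
    ]


mission = qualified_synonyms("insurgent", allowed={"armed_rebel"})
geo = qualified_synonyms("insurgent", disallowed={"armed_rebel", "antigovernment_protester"})
unqualified = qualified_synonyms("insurgent")  # no domain metadata: the full broad list
```

The broad list stays shared across domains; only the domain's sense metadata decides which entries count as synonyms locally.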
Normalizing Uncontrolled Vocabularies
• Semantic Homing
  - The binding of dynamic domain-specific semantics to controlled vocabularies
  - Allows domain-specific ontologies to evolve
  - Provides stable semantic anchors for knowledge computability
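Semantic homing can be sketched as a binding table from (domain, local term) pairs to controlled-vocabulary anchors. This minimal version assumes string anchor labels and reuses the Navy fliers / Air Force pilots example from the Semantic Challenges slide; `SemanticHome` and the anchor `"aviator"` are hypothetical. Renaming a local term changes the binding key but not its anchor, so cross-domain comparisons stay stable while the domain vocabulary evolves.

```python
class SemanticHome:
    """Binds evolving domain-specific terms to stable controlled-vocabulary anchors (hypothetical design)."""

    def __init__(self) -> None:
        self.bindings: dict[tuple[str, str], str] = {}  # (domain, local term) -> anchor

    @staticmethod
    def _key(domain: str, term: str) -> tuple[str, str]:
        return domain, term.lower()

    def bind(self, domain: str, term: str, anchor: str) -> None:
        self.bindings[self._key(domain, term)] = anchor

    def rename(self, domain: str, old: str, new: str) -> None:
        """The domain ontology evolves; the anchor the term points to stays put (KeyError if unbound)."""
        self.bindings[self._key(domain, new)] = self.bindings.pop(self._key(domain, old))

    def same_concept(self, a: tuple[str, str], b: tuple[str, str]) -> bool:
        """Compare two (domain, term) pairs through their anchors rather than their surface strings."""
        anchor_a = self.bindings.get(self._key(*a))
        return anchor_a is not None and anchor_a == self.bindings.get(self._key(*b))


home = SemanticHome()
home.bind("Navy", "flier", "aviator")  # "aviator" is an illustrative anchor label
home.bind("Air Force", "pilot", "aviator")
matched = home.same_concept(("Navy", "flier"), ("Air Force", "pilot"))
home.rename("Navy", "flier", "naval aviator")  # the local vocabulary changes
still_matched = home.same_concept(("Navy", "naval aviator"), ("Air Force", "pilot"))
stale = home.same_concept(("Navy", "flier"), ("Air Force", "pilot"))  # old local term no longer bound
```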
MyGEOINT: Ontology Knowledge Discovery
The ontology applies concept matching to make discoveries more relevant
Functional Impact
• Discovery of potentially relevant knowledge sources
  - Simpler solutions
  - Lower real-time computational requirements
  - Practical multi-domain solutions
• Computable semantic comparisons and knowledge relevance ranking
  - Directly computed rather than inferred
  - Higher domain-level precision without the overhead of extensive upper- and mid-level ontologies