Scalable Ontological Sense Matching
Dr. Geoffrey P Malafsky
TECHi2 LLC, Fairfax, VA
Need for Smarter Systems
Enormous and ever-increasing amounts of
data and information are available
Potential exists for significant increases in
efficiency, effectiveness, and success in all
fields IF the data and information can be
harnessed
Most common use case is information
overload: too much to sift through in too little
time and with too few resources/support/authority
Dr. Geoffrey P Malafsky, TECHi2, CapSci 2006,
ASTI
Major Challenges
Value is subjective and context-based, i.e. not
deterministic
Metrics and decision criteria are heavily dependent
on conditions, information uncertainty,
decision/activity timelines, vulnerability to error, risk
capacity
Rules and interaction mechanisms are usually
nebulous, poorly defined, non-existent, or incorrect
Information Technology approaches are immature
in handling this situation, with poor scalability (e.g.
too computationally intensive, storage requirements,
security) and/or untrustworthy results
Advanced Techniques
Improve machine understanding of
information by annotating meaning (e.g.
Ontologies)
Compute best scenario using domain models
built with Subject Matter Experts who define
the core knowledge, coupled to probabilistic fit
calculations
Extract patterns from very large scale data
sets
Hire lots of people
Example: Knowledge Discovery &
Dissemination (KDD)
Seeks to find knowledge for practical
purposes within large scale data/information
stores
Crosses organizational and functional boundaries
High relevance to searcher
Uncertainty is assessed and used
Secure
Rules and domain model based
Automated
Knowledge is Not Just Information or
Data
Knowledge has:
Context: what is it about?
Confidence: is it right?
Relationships: what does it have to do with that?
Priorities: what is most important?
Types
Explicit knowledge is codified and can be manipulated
Tacit knowledge is unspoken “know-how”
Looks just like data when in an electronic system
It is data
Annotations on “about” and “how” tied to intelligent
application logic make it knowledge for user
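One way to picture the annotations listed on this slide is as fields attached to a raw data value. This is a minimal sketch only: the class name `KnowledgeItem`, the field names, and the 0.5 confidence threshold are illustrative assumptions, not part of the presentation.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class KnowledgeItem:
    """A data value plus the annotations that let logic treat it as knowledge."""
    data: Any                      # the raw value as stored electronically
    context: str                   # what is it about?
    confidence: float              # is it right? (assumed scale: 0.0 to 1.0)
    relationships: Dict[str, str] = field(default_factory=dict)  # relation -> target
    priority: int = 0              # larger means more important


def most_important(items: List[KnowledgeItem],
                   min_confidence: float = 0.5) -> List[KnowledgeItem]:
    """Application logic: drop low-confidence items, then order by priority."""
    trusted = [item for item in items if item.confidence >= min_confidence]
    return sorted(trusted, key=lambda item: item.priority, reverse=True)


# Invented example items
items = [
    KnowledgeItem("bridge closed", context="route planning", confidence=0.9, priority=2),
    KnowledgeItem("fuel depot moved", context="logistics", confidence=0.3, priority=5),
    KnowledgeItem("road resurfaced", context="route planning", confidence=0.8, priority=1),
]
ranked = most_important(items)
```

Without the annotations the three items are indistinguishable strings; with them, the low-confidence item is filtered out and the rest are ordered for the user.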
KDD Knowledge Map
Analysis of scientific and technological areas of emphasis
in Knowledge Discovery and Dissemination (KDD)
Gaps reveal technical vulnerabilities
Figure: Total and TRL Rated Citations (number of citations, 0 to 350, by TRL band: Total, 1-3, 4-7, 8-9; series: Knowledge, Discovery, Dissemination)
Most research and development is concentrated in the discovery portion of KDD
Information & Data Mining Dominates KDD Focus
Figure: citations (0 to 250) by KDD topic: Security; Storage; On-demand (TPPU); Personalization; Agents; Distributed architecture; Real-time operations; Visualization; Feature extraction + analysis; Uncertainty mgmt + mitigation; Models; Info/data mining; Fusion; Lifecycle; Maintenance; Collection + Tacit capture; Rules formation + analysis; Representation
What is Missing to Make KDD Work
Knowledge-based metadata architecture
Predictive personalization algorithms
Models of knowledge lifecycle
Computational knowledge techniques
Real-time analysis with large data and high
uncertainty
Conduit to feedback from end-users to
capture and evolve domain knowledge
Bringing Structure to Unbounded
Knowledge
Knowledge Mgmt systems have failed to
meet operational requirements
Knowledge is inherently expansive and evolving
IT tends to collect and organize assets without
relevance, context, ...
Level of Effort to manually collect, map, cleanse
source data/information is too high
AI is not around the corner
Structured Knowledge
Applying a structured framework creates repeatable,
interoperable, consistent solutions
Knowledge fidelity is maintained with combination of
human and machine processable representations
Unknowns are discovered using known analogies
via triangulation
Reduces the universe of possible combinations of
knowledge and user needs to engineering-scale
solutions
Applying Quantum Mechanics to
Knowledge Processing
Even the small area of this room has an infinite number of
possible combinations of things (macro, micro, atomic,
subatomic)
Representing this “knowledge” and these “states” with a domain
model reduces the infinite possibilities to a tractable few:
Schrödinger Equation ($H\psi = E\psi$)
Wavefunctions describe specific states (4 quantum numbers)
Connection between objects defined by overlap integral of
wavefunctions
$\int \psi_0^{*}\,\psi_1\,d\tau$ = Degree of match
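As a rough illustration of the analogy, the overlap integral can be replaced by a discrete sum over features that two descriptions share. This is a minimal sketch, not the KORS calculation: the feature names and weights are invented, and normalizing the result to the range 0 to 1 is an assumption.

```python
import math
from typing import Mapping


def overlap(state_a: Mapping[str, float], state_b: Mapping[str, float]) -> float:
    """Discrete analogue of an overlap integral: a normalized dot product."""
    shared = set(state_a) & set(state_b)
    dot = sum(state_a[k] * state_b[k] for k in shared)
    norm_a = math.sqrt(sum(v * v for v in state_a.values()))
    norm_b = math.sqrt(sum(v * v for v in state_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty description matches nothing
    return dot / (norm_a * norm_b)


# Hypothetical "states": weights over descriptive features
imagery_report = {"imagery": 1.0, "terrain": 1.0}
terrain_study = {"terrain": 1.0, "hydrology": 1.0}
budget_memo = {"finance": 1.0}

partial = overlap(imagery_report, terrain_study)   # one shared feature of two
none = overlap(imagery_report, budget_memo)        # no shared features
```

Here `partial` comes out at about 0.5 and `none` is exactly 0, so the score behaves as a degree of match: identical descriptions approach 1 and unrelated ones give 0.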
The KORS™ Framework
Knowledge: collected using templates from SMEs
Ontologies: Conceptual models of domain
knowledge
Rules: Business and technical rules are extracted
and defined from domain knowledge and ontologies
Semantic metadata: knowledge, ontology
relationships, and rules are connected to and
represented with data and information
KORS Structured Knowledge
Knowledge is inherently expansive and evolving
KM systems have failed to meet operational requirements
IT collects & organizes assets without relevance, context, ...
Level of Effort is too high to manually collect, map, cleanse
source data/information
KORS™-pending framework creates repeatable, interoperable,
consistent solutions
Knowledge fidelity is maintained with combination of human
and machine processable representations
Ontologies (concepts) are expressed with domain specific and
standard terms
Reduces the universe of possible combinations of knowledge
and concepts to engineering-scale solutions
KORS Ontologies
Conventional wisdom
Multi-tiered, fully explicit, broad coverage
= Too large; Too difficult to maintain; Too hard to implement
KORS™ Ontologies
Cross-domain framework, domain specific instances of
classes, leverage existing ontologies, concepts defined
with domain-based uncontrolled vocabulary AND common
controlled vocabulary
= Smaller; easier to maintain; supports engineering processes
Answers the broad question: “Do the concepts in this other
ontology have semantic similarity to those in my ontology?”
Semantic metadata used to characterize domain ontologies
Conventional Wisdom
Upper, middle and lower ontologies.
Every concept made fully explicit
The KORS Engineering Framework
Structured knowledge capture
Incorporate within the engineered solution
Identify rules and metadata structures
Cross-Concept Overlap Calculation
Overlap integrals calculated from semantic metadata – updated
when ontologies change
$S_{AB} = \int \mathrm{On}[\mathrm{domain}_A(\mathrm{term}_A)]\,\mathrm{On}[\mathrm{domain}_B(\mathrm{term}_B)]$
Overlap (S) is computed at:
Ontology-ontology level using primary task-description pairs
Term level using allowed and disallowed senses (not just synonyms)
Real-time determination using coarse-medium-fine concept match
Coarse= ontology-ontology
Medium=Ontology-term
Fine=term-term
With metadata architecture implementation, calculation uses what
is available
Inherently scalable, distributed with evolving improvements
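The coarse-medium-fine scheme above, together with "calculation uses what is available", can be sketched as a function that returns the finest-grained overlap the supplied metadata supports. Everything here is an assumption for illustration: Jaccard set overlap stands in for S, task descriptors and senses are assumed to share one vocabulary, and the `Ontology` class and example values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Set overlap in [0, 1]; an assumed stand-in for the overlap S."""
    return len(a & b) / len(a | b) if (a or b) else 0.0


@dataclass
class Ontology:
    name: str
    tasks: Set[str]                                            # ontology-level metadata
    terms: Dict[str, Set[str]] = field(default_factory=dict)   # term -> senses


def match(a: Ontology, b: Ontology,
          term_a: Optional[str] = None,
          term_b: Optional[str] = None) -> Tuple[str, float]:
    """Return (level, score) at the finest level the available metadata allows."""
    senses_a = a.terms.get(term_a) if term_a else None
    senses_b = b.terms.get(term_b) if term_b else None
    if senses_a and senses_b:
        return "fine", jaccard(senses_a, senses_b)      # term-term
    if senses_a:
        return "medium", jaccard(senses_a, b.tasks)     # ontology-term
    if senses_b:
        return "medium", jaccard(a.tasks, senses_b)     # ontology-term
    return "coarse", jaccard(a.tasks, b.tasks)          # ontology-ontology


planning = Ontology("planning", {"route", "threat", "terrain"},
                    {"insurgent": {"rebel", "threat"}})
geoint = Ontology("geoint", {"terrain", "imagery", "threat"},
                  {"hostile": {"threat", "rebel", "enemy"}})

coarse = match(planning, geoint)
medium = match(planning, geoint, "insurgent")
fine = match(planning, geoint, "insurgent", "hostile")
```

A term with no recorded senses simply falls back to a coarser level rather than failing, which is one way to read the slide's point that the calculation works with whatever metadata exists.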
Semantic Variance Across Domains
For example, the term “Insurgent” means:
Mission Planner:
person who takes part in an armed rebellion against the
constituted authority
Geospatial Analyst:
someone who participates in a peaceful public display of
group feeling
Diplomatic Corps:
someone who participates in a public display against an
established government
Semantic Challenges
Semantic Consistency
Semantic Variance Across Domains
Search and Discovery Requirements
Controlled
Uncontrolled Vocabularies
Have “local” variants
Navy fliers and Air Force pilots
Change to match changing reality
Yesterday’s friend could be tomorrow’s foe
Change to match changes in policy (spin)
Today, “freedom fighter;” tomorrow, “insurgent”
KORS is Extensible, Scalable, Adaptive
Ontology level metadata describes the basic
functional concepts and processes of the domain
Can be linked to Enterprise Architecture products
Direct conceptual match between two functional domains
Ontology term descriptions use:
domain specific (uncontrolled) expressions
Allowed senses from controlled vocabularies
Disallowed senses from controlled vocabularies
True meaning is found from combination of domain,
allowed, disallowed as is done in real language
Metadata architecture: values are used if present but
are not required, which gives scalability and extensibility
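A term description built from a domain expression plus allowed and disallowed senses might look like the sketch below. The sense labels ("rebel", "demonstrator", and so on), the hypothetical `targeting` entry, and the rule that a disallowed sense forces the score to zero are all assumptions made for illustration; the slides do not specify how the three parts are combined.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class TermDescription:
    term: str
    domain_expression: str                    # uncontrolled, domain-specific wording
    allowed: FrozenSet[str] = frozenset()     # optional: senses the domain accepts
    disallowed: FrozenSet[str] = frozenset()  # optional: senses the domain rejects


def sense_match(a: TermDescription, b: TermDescription) -> float:
    """Score in [0, 1]; zero when one side allows a sense the other disallows."""
    if a.allowed & b.disallowed or b.allowed & a.disallowed:
        return 0.0
    union = a.allowed | b.allowed
    return len(a.allowed & b.allowed) / len(union) if union else 0.0


planner = TermDescription(
    "insurgent",
    "person who takes part in an armed rebellion against the constituted authority",
    allowed=frozenset({"rebel", "combatant"}),
    disallowed=frozenset({"demonstrator"}),
)
diplomat = TermDescription(
    "insurgent",
    "someone who participates in a public display against an established government",
    allowed=frozenset({"demonstrator", "protester"}),
)
targeting = TermDescription(  # hypothetical third domain
    "guerrilla",
    "irregular armed fighter",
    allowed=frozenset({"rebel", "combatant", "irregular"}),
)

same_word_score = sense_match(planner, diplomat)
different_word_score = sense_match(planner, targeting)
```

The same word "insurgent" scores zero across the two domains because the planner disallows the sense the diplomat allows, while the different word "guerrilla" scores well. Synonym matching alone would miss both results.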
Example Domain: GEOINT
Exploitation and analysis of imagery and geospatial
information to describe, assess, and visually depict
physical features and geographically referenced
activities on the Earth.
GEOINT encompasses all the activities involved in
the collection, analysis, and exploitation of spatial
information in order to gain knowledge about the
national security environment, and the visual
depiction of that knowledge.
MyGEOINT Ontological Architecture
Functional Application
Semantic Expansion
Cross domain commonality
Qualified synonym identification
>> discovery of potentially relevant knowledge
Semantic Resolution
Allowed and disallowed alternate semantics
Binding of dynamic domain-specific semantics to controlled
vocabularies.
>> computable semantic comparisons and knowledge
relevance ranking
Cross domain commonality
Answers the broad question:
“Do the concepts in this other ontology have
semantic similarity to those in my ontology?”
Semantic metadata used to characterize domain
ontologies
Overlap integrals calculated from semantic
metadata – updated when ontologies change
Run-time ontology-to-ontology comparison is greatly simplified.
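The idea of computing overlaps from metadata ahead of time and refreshing them only when an ontology changes can be sketched as a small cache. This is an assumed design, not the presented system: `OverlapCache`, its method names, and the use of Jaccard overlap on descriptor sets are all hypothetical.

```python
from typing import Callable, Dict, FrozenSet, Iterable


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Set overlap in [0, 1]; an assumed stand-in for the overlap integral."""
    return len(a & b) / len(a | b) if (a or b) else 0.0


class OverlapCache:
    """Stores pairwise ontology overlaps so run-time matching is a lookup."""

    def __init__(self, overlap_fn: Callable[[FrozenSet[str], FrozenSet[str]], float]):
        self._fn = overlap_fn
        self._ontologies: Dict[str, FrozenSet[str]] = {}
        self._scores: Dict[FrozenSet[str], float] = {}
        self.computations = 0  # counts overlap calculations, for illustration

    def register(self, name: str, descriptors: Iterable[str]) -> None:
        """Add or update an ontology; recompute only the pairs that involve it."""
        self._ontologies[name] = frozenset(descriptors)
        for other, other_desc in self._ontologies.items():
            if other != name:
                pair = frozenset((name, other))
                self._scores[pair] = self._fn(self._ontologies[name], other_desc)
                self.computations += 1

    def lookup(self, a: str, b: str) -> float:
        """Run-time query: no overlap calculation happens here."""
        return self._scores[frozenset((a, b))]


cache = OverlapCache(jaccard)
cache.register("planning", {"route", "threat", "terrain"})
cache.register("geoint", {"terrain", "imagery", "threat"})
cache.register("logistics", {"route", "fuel"})
before_update = cache.lookup("planning", "geoint")

cache.register("geoint", {"terrain", "imagery"})   # the geoint ontology changes
after_update = cache.lookup("planning", "geoint")
```

Updating one ontology recomputes only its own pairs, so the cost of a change grows with the number of registered ontologies rather than with the number of pairs.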
Qualified synonym identification
Knowledge discovery via synonym lists is
well supported by standardized lexical tools
(WordNet, etc)
Broad-domain perspective limits ability to
isolate domain-specific usage
KORS domain-specific ontologies and
semantic metadata allow the use of broad-spectrum
vocabularies to make fine-grained distinctions.
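Qualifying a broad synonym list with domain metadata can be sketched as a filter over sense-tagged synonyms. The table below is a hand-written stand-in for a broad lexical resource and is not real WordNet data; the sense tags and the function name `qualified_synonyms` are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hand-written stand-in for a broad lexical resource (not real WordNet data):
# each word maps to (synonym, sense tag) pairs.
BROAD_SYNONYMS: Dict[str, List[Tuple[str, str]]] = {
    "strike": [
        ("attack", "military_action"),
        ("walkout", "labor_action"),
        ("hit", "physical_contact"),
    ],
}


def qualified_synonyms(word: str,
                       allowed: FrozenSet[str],
                       disallowed: FrozenSet[str] = frozenset()) -> List[str]:
    """Keep only synonyms whose sense the domain allows and does not disallow."""
    return [
        synonym
        for synonym, sense in BROAD_SYNONYMS.get(word, [])
        if sense in allowed and sense not in disallowed
    ]


mission_planning = qualified_synonyms("strike", frozenset({"military_action"}))
labor_relations = qualified_synonyms("strike", frozenset({"labor_action"}))
```

The same broad vocabulary yields different expansions for the two domains, which is the fine-grained distinction the slide describes.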
Normalizing Uncontrolled Vocabularies
Semantic Homing
The binding of dynamic domain-specific semantics to
controlled vocabularies
Allows domain-specific ontologies to evolve
Provides stable semantic anchors for knowledge
computability.
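Semantic homing as described here can be pictured as a binding table from (domain, term) pairs to controlled-vocabulary anchors. This is a minimal sketch under stated assumptions: the class `SemanticHome`, its methods, and the anchor names "aviator" and "armed_rebel" are invented for illustration.

```python
from typing import Dict, Optional, Tuple


class SemanticHome:
    """Binds evolving domain terms to stable controlled-vocabulary anchors."""

    def __init__(self) -> None:
        self._bindings: Dict[Tuple[str, str], str] = {}

    def bind(self, domain: str, term: str, anchor: str) -> None:
        """Attach a domain term to an anchor; terms can be added at any time."""
        self._bindings[(domain, term.lower())] = anchor

    def anchor_of(self, domain: str, term: str) -> Optional[str]:
        return self._bindings.get((domain, term.lower()))

    def same_concept(self, domain_a: str, term_a: str,
                     domain_b: str, term_b: str) -> bool:
        """Two terms match when both are bound to the same anchor."""
        anchor = self.anchor_of(domain_a, term_a)
        return anchor is not None and anchor == self.anchor_of(domain_b, term_b)


home = SemanticHome()
home.bind("navy", "flier", "aviator")
home.bind("air_force", "pilot", "aviator")

# Domain vocabulary evolves; the anchor stays put.
home.bind("press", "freedom fighter", "armed_rebel")
home.bind("press", "insurgent", "armed_rebel")

local_variants_match = home.same_concept("navy", "flier", "air_force", "pilot")
old_and_new_match = home.same_concept("press", "freedom fighter", "press", "insurgent")
```

Because comparisons go through the anchors, a domain can rename or add terms without invalidating earlier matches.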
MyGEOINT: Ontology Knowledge Discovery
Ontology applies concept matching to
make discoveries more relevant
Functional Impact
Discovery of potentially relevant knowledge
sources.
Simpler solutions,
lower real-time computational requirements,
practical multi-domain solutions.
Computable semantic comparisons and knowledge
relevance ranking
Directly computed rather than inferred
Higher domain-level precision without the overhead of
extensive upper and mid-level ontologies