Technologies


Open Sources – Intelligence

The Good
The Bad
The Ugly Challenges
Reading



The Good: Rasmus Rosenqvist Petersen and Uffe Kock Wiil. 2011. Hypertext structures for investigative teams. In Proceedings of the 22nd ACM Conference on Hypertext and Hypermedia (HT '11). ACM, New York, NY, USA, 123-132.
The Bad: C. Farkas and A. Stoica. 2003. Correlated Data Inference in Ontology Guided XML Security Engine. In Proceedings of the IFIP 17th WG 11.3 Working Conference on Data and Application Security.
The Challenges: Joseph V. Treglia and Joon S. Park. 2009. Towards trusted intelligence information sharing. In Proceedings of the ACM SIGKDD Workshop on CyberSecurity and Intelligence Informatics (CSI-KDD '09).
The Good: Support for Data Integration and Analysis
Intelligence Analysis

• Law enforcement investigations
• Investigative teams:
  – Collect, process, and analyze information related to a specific target
  – Disseminate findings
• Need for automated tools to support these activities
Application Areas

• Policing
  – Reactive nature
  – Incident-driven response
• Counterterrorism
  – National security
  – Proactive
  – Covert operations
• Investigative journalism
  – "Wrongdoing" of organizations, influential individuals, etc.
Knowledge Management

• Acquisition: collect and process data
  – Traditional methods
  – Artificial intelligence: machine learning
• Synthesis: create a model of the target
• Sense making: extract useful information
• Dissemination of findings: appropriate representation for the appropriate audience
Case Study

• Kidnapping of Daniel Pearl, Wall Street Journal bureau chief, in 2002
• The Pearl Project, Georgetown University, http://pearlproject.georgetown.edu/press_pearlrelease.html
• Complex mapping of people, situations, locations, etc. to find the kidnappers
Technology Involved in Visualizing Information

• White board – link chart
  – Useful to model entities, their attributes, and relationships
  – Complex data types (text, images, symbols, etc.)
  – Becomes complex
  – Difficult to share
Computer Support – Functionality

• Acquisition: import, drag-and-drop, cut/paste
• Synthesis: add/modify/delete entities and relations, restructure, group, collapse/expand, brainstorming
• Sense making: retracing, creating hypotheses and alternative interpretations, prediction, exploring perspectives, decision making
• Dissemination: storytelling, report generation
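To make these operations concrete, here is a minimal sketch of a link-chart data model in Python. It is an illustration only: the class and method names (LinkChart, add_entity, group, neighbors) are invented here, not the API of CrimeFighter Investigator or any other tool.

# Minimal link-chart sketch covering a few of the synthesis and
# sense-making operations above. All names are illustrative.

class LinkChart:
    def __init__(self):
        self.entities = {}        # name -> attribute dict
        self.relations = set()    # (source, label, target) triples
        self.groups = {}          # group name -> set of member names

    def add_entity(self, name, **attributes):
        # Synthesis: add an entity with arbitrary attributes.
        self.entities[name] = attributes

    def add_relation(self, source, label, target):
        # Synthesis: link two entities.
        self.relations.add((source, label, target))

    def group(self, group_name, members):
        # Synthesis: collapse several entities into a named composite.
        self.groups[group_name] = set(members)

    def neighbors(self, name):
        # Sense making: retrace everything directly linked to an entity.
        return ({t for s, _, t in self.relations if s == name} |
                {s for s, _, t in self.relations if t == name})

chart = LinkChart()
chart.add_entity("Person A", role="suspect")
chart.add_entity("Hotel X", type="location")
chart.add_relation("Person A", "stayed_at", "Hotel X")
chart.group("Cell 1", ["Person A"])
print(chart.neighbors("Person A"))   # {'Hotel X'}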
Hypertext Structuring Mechanisms

• Associative structures – extended to handle composites (supports synthesis)
• Spatial structures – handle emerging and dynamic structures over time
• Taxonomy structures – support classification tasks
• Issue-based structures – support argumentation and reasoning
• Annotation and metadata structures – add semantics
Past
[Figure. Copyright: Rasmus Rosenqvist Petersen and Uffe Kock Wiil. 2011. Hypertext structures for investigative teams. In Proceedings of the 22nd ACM Conference on Hypertext and Hypermedia (HT '11). ACM, New York, NY, USA, 123-132.]
With the CrimeFighter Investigator
[Figure. Copyright: Rasmus Rosenqvist Petersen and Uffe Kock Wiil. 2011. Hypertext structures for investigative teams. In Proceedings of the 22nd ACM Conference on Hypertext and Hypermedia (HT '11). ACM, New York, NY, USA, 123-132.]
What Would Be Better?

• Automated data collection
• Semantic-based data integration
• Intelligent data analysis
• Assurance of results
The Bad: Unauthorized Disclosure
The Bad: A. Stoica and C. Farkas. 2004. Ontology Guided Security Engine. Journal of Intelligent Information Systems 23(3): 209-223. (http://www.cse.sc.edu/~farkas/publications/j5.pdf)
Semantic Web

• Open, dynamic environment
• Large number of users, agents, and resources
• Semantic tools
  – Autonomous agents
  – Machine-understandable data semantics
  – Computers exchange information transparently on behalf of the user
DOES INFERENCING ON THE SEMANTIC WEB CREATE A SECURITY PROBLEM?


Motivation 1: Simulation Exploitation Using Open Source Information

• Objective: The US Government would like to share limited simulation software with friendly countries.
  – Can this software be used to explore the capabilities of US weaponry?
  – Can sufficient information be found from public sources to create such a simulation?
• Findings:
  – Most of the information needed for the simulation was available on the Internet.
  – Human aid was needed to combine the available information.
Motivation 2: Homeland Security

• Objective: Hide the location of water reservoirs supplying military bases to limit terrorist activities.
  – Can the location of a military base's reservoir be found from public data on the Internet?
• Findings:
  – The location of a military base and the water reservoirs of that region are available on the Web.
  – Human aid was needed to combine the available information.
The Inference Problem

General-purpose database:
  Non-confidential data + Metadata → Undesired inferences

Semantic Web:
  Non-confidential data + Metadata + Computational power + Connectivity → Undesired inferences
The Inference Problem

• Given:
  – a set of confidential information,
  – a large amount of public data, and
  – the semantic relationships of the public data.
• Is it possible to deduce the confidential information from the semantically enhanced public data?
• Security violation = disallowed data can be deduced from public data
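As a toy illustration of this definition (the facts, rule, and secret below are invented, not from the paper): treat public data as facts, semantic relationships as inference rules, and report a violation if forward chaining over the public facts derives anything confidential.

# Toy forward-chaining check for the inference problem.
# Facts, rule, and the confidential triple are invented examples.

public_facts = {("FortX", "located_in", "DistrictY"),
                ("BasinZ", "located_in", "DistrictY")}

confidential = {("FortX", "supplied_by", "BasinZ")}

def rule_colocation(facts):
    # Assumed semantic relationship: a basin located in the same
    # district as a base plausibly supplies that base.
    derived = set()
    for a, r1, d1 in facts:
        for b, r2, d2 in facts:
            if r1 == r2 == "located_in" and d1 == d2 and a != b:
                derived.add((a, "supplied_by", b))
    return derived

def violates(facts, rules, secret):
    # Apply all rules until fixpoint, then test for leaked secrets.
    known = set(facts)
    while True:
        new = set().union(*(rule(known) for rule in rules)) - known
        if not new:
            return bool(known & secret)
        known |= new

print(violates(public_facts, [rule_colocation], confidential))  # True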
Ontology Guided XML Security Engine (Oxegin)

[Architecture figure: a public user accesses public organizational data; Oxegin combines an ontology with Web data to detect replicated data inferences and correlated data inferences against confidential data.]
Correlated Data Inference

• Finds confidential information from public data (sensitive associations)
• Inference guidance:
  – Ontology concept hierarchy
  – Structural similarity of public data
• Features of similarity:
  – Level of abstraction of each node
  – Distance of associated nodes from the association root
    • Similarity of the distances
    • Length of the distances
  – Similarity of sub-trees originating from correlated nodes
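A rough sketch of how these three features might be combined into a single score. The formulas and the equal 1/3 weighting are my assumptions; the paper's actual metric may differ.

# Illustrative association-similarity score from the three features
# above. Formulas and the equal 1/3 weighting are assumptions.

def subtree_similarity(tree_a, tree_b):
    # Crude sub-tree similarity: Jaccard overlap of node labels.
    a, b = set(tree_a), set(tree_b)
    return len(a & b) / len(a | b) if a | b else 1.0

def association_similarity(depths_a, depths_b, subtrees_a, subtrees_b):
    # depths_*: distance of each associated node from its association
    # root; subtrees_*: label lists of the sub-trees beneath those nodes.
    spread_a = abs(depths_a[0] - depths_a[1])
    spread_b = abs(depths_b[0] - depths_b[1])
    distance_sim = 1 / (1 + abs(spread_a - spread_b))          # similar spreads?
    length_sim = 1 / (1 + abs(sum(depths_a) - sum(depths_b)))  # similar lengths?
    sub_sim = (sum(subtree_similarity(x, y)
                   for x, y in zip(subtrees_a, subtrees_b))
               / len(subtrees_a))
    return (distance_sim + length_sim + sub_sim) / 3

# (fort, address) vs (basin, district): same depths, matching sub-trees.
print(association_similarity((1, 2), (1, 2),
                             [["name"], ["street"]],
                             [["name"], ["street"]]))   # 1.0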
Associated Nodes

• Association similarity:
  – Distance of each node from the association root
  – Difference of the distances of the nodes from the association root
  – Similarity of the sub-trees originating at the nodes
• Example:
[Figure: an XML document ("air show", Public) and its inference association graph (Public, AC) with correlated fort and address nodes.]
Concept Generalization

• Ontology concept hierarchy:
  – Normalized weight of concepts (the more specific the concept, the higher the weight)
  – Concept abstraction level
  – Range of allowed abstractions
• Example (OAL = abstraction level, WGT = weight, OP = normalized weight = WGT/50):

  Concept                  OAL   WGT   OP
  Object                    0      1   1/50
  waterSource :: Object     1     15   15/50
  basin :: waterSource      2      1   1/50
  place :: Object           1     15   15/50
  district :: place         2      1   1/50
  address :: place          2      1   1/50
  base :: Object            1     15   15/50
  fort :: base              2      1   1/50
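The OP column is just each weight divided by the total weight of the hierarchy (50 in this example); a quick check in Python:

# Normalized concept weights: OP = WGT / sum(WGT). Values from the
# table above; the 50 in the denominator is the sum of all weights.

weights = {"Object": 1,
           "waterSource": 15, "basin": 1,
           "place": 15, "district": 1, "address": 1,
           "base": 15, "fort": 1}

total = sum(weights.values())                  # 50
op = {concept: w / total for concept, w in weights.items()}
print(op["waterSource"])                       # 0.3  = 15/50
print(op["fort"])                              # 0.02 = 1/50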
Correlated Inference

[Figure: public associations (fort, address) and (basin, district), shown against the concept hierarchy (waterSource :: Object, basin :: waterSource, place :: Object, district :: place, address :: place, base :: Object, fort :: base). Question: do they reveal the confidential association (base, water source)?]
Correlated Inference (cont.)

[Figure: the public associations (fort, address) and (basin, district) are generalized step by step along the concept hierarchy.]
Correlated Inference (cont.)

[Figure: fort generalizes to base and address to place; basin generalizes to water source and district to place. The correlated generalizations expose the confidential association (base, water source).]
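A compact sketch of the generalization step these three figures walk through. The concept hierarchy and the associations come from the slides; the matching logic below is an illustrative paraphrase, not the Oxegin algorithm itself.

# Ontology-guided generalization of public associations, matched
# against a confidential association. Hierarchy and associations are
# from the slides; the matching code is an illustrative paraphrase.

parent = {"basin": "waterSource", "waterSource": "Object",
          "district": "place", "address": "place", "place": "Object",
          "fort": "base", "base": "Object"}

def ancestors(concept):
    # A concept together with all of its generalizations.
    out = {concept}
    while concept in parent:
        concept = parent[concept]
        out.add(concept)
    return out

# Each public association is generalized along the hierarchy.
g1 = {(x, y) for x in ancestors("fort") for y in ancestors("address")}
g2 = {(x, y) for x in ancestors("basin") for y in ancestors("district")}

# (fort, address) -> (base, place) and (basin, district) ->
# (waterSource, place): both reach the shared context 'place', so the
# correlated members base and waterSource form an inferred association.
if ("base", "place") in g1 and ("waterSource", "place") in g2:
    print("Inferable confidential association:", ("base", "waterSource"))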
Inference Removal

• Relational databases:
  – Database design time: redesign the database
  – Query processing time: modify/refuse the answer
• Web inferences:
  – Problems:
    • Cannot redesign public data outside the protection domain
    • Cannot modify/refuse an answer on an already published web page
  – Possible solutions:
    • Withhold data: do not publish any public data that may lead to inferences
    • Publish confusing data: publish data that creates confusion in contrast with existing publicly available data
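The withhold-data strategy can be read as a release check: before publishing a new fact, test whether the combined public knowledge would let the inference rules derive anything confidential. A sketch, reusing violates(), rule_colocation, and confidential from the earlier inference-problem example (again illustrative, not the paper's release-control mechanism):

# Withhold-data check: refuse to publish a fact if doing so would make
# a confidential inference possible. Reuses violates(), rule_colocation,
# and confidential from the inference-problem sketch above.

def safe_to_publish(candidate, published, rules, secret):
    return not violates(published | {candidate}, rules, secret)

published = {("FortX", "located_in", "DistrictY")}
candidate = ("BasinZ", "located_in", "DistrictY")

print(safe_to_publish(candidate, published,
                      [rule_colocation], confidential))  # False: withhold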
The Ugly Challenges: Technology Support
Technical Influences – Interoperability

• Heterogeneous data
  – Unstructured, semi-structured, and structured data
• Representation of data semantics
  – Schema languages, taxonomies, ontologies
• Policy compliance
  – Policy languages, expressive power, implementation
Technical Influences – Availability

• Survivability
  – Response
  – Critical environments
• Open vs. protected
• Redundancy
Technical Influences – Control

• Control, monitor, and manage all usage
• Track dissemination of information
• Workflow management
  – Policy, trust, efficiency, local vs. global properties
Social Influences

• Trust
  – People and agencies
• Shadow networks
  – Conflict of interest
  – Self-interest
• Criticality
  – The greater the threat, the greater the likelihood of information sharing
Legal Influences

• Policy conflict and competition
  – Agency policies
  – Agencies may compete for the same resources and need to maintain advantage
• Governance
  – No universal policy on information sharing (federal, state, local, tribal, etc.)
  – International law