Open Sources -- Intelligence
The Good
The Bad
The Ugly Challenges
Reading
The Good: Rasmus Rosenqvist Petersen and Uffe Kock Wiil. 2011.
Hypertext structures for investigative teams. In Proceedings of the
22nd ACM conference on Hypertext and hypermedia (HT '11). ACM,
New York, NY, USA, 123-132.
The Bad: C. Farkas and A. Stoica, “Correlated Data Inference in
Ontology Guided XML Security Engine,” Proc. of IFIP 17th WG
11.3 working conference on Data and Application Security, 2003.
The Challenges: Joseph V. Treglia and Joon S. Park. 2009. Towards
trusted intelligence information sharing. In Proceedings of the ACM
SIGKDD Workshop on CyberSecurity and Intelligence Informatics
(CSI-KDD '09)
CSCE 727 - Farkas
The Good:
Support for Data Integration
and Analysis
Intelligence Analysis
Law Enforcement Investigations
Investigative teams:
– Collect, process, analyze information related to
a specific target
– Disseminate findings
Need automated tools to support these activities
Application Areas
Policing
– Reactive nature
– Incident-driven response
Counterterrorism
– National security
– Proactive
– Covert operations
Investigative journalism
– “Wrongdoing” of organizations, influential individuals,
etc.
Knowledge Management
Acquisition: collect and process data
– Traditional methods
– Artificial Intelligence: Machine Learning
Synthesis: create model of the target
Sense making: extract useful information
Disseminate findings: appropriate
representation for the appropriate audience
Case Study
Kidnapping of Daniel Pearl, Wall Street
Journal bureau chief in 2002
The Pearl Project, Georgetown University,
http://pearlproject.georgetown.edu/press_pearlrelease.html
Complex mapping of people, situations,
locations, etc. to find kidnappers
Technology Involved in
Visualizing Information
White Board – link chart
– Useful to model entities, their attributes, and
relationships
– Complex data types (text, images, symbols,
etc.)
– Becomes complex
– Difficult to share
Computer Support –
Functionality
Acquisition: import, drag-drop, cut/paste
Synthesis: add/modify/delete entities, relations, restructure, group, collapse/expand, brainstorming
Sense-making: retracing, creating hypothesis and
alternative interpretations, prediction, exploring
perspectives, decision making
Dissemination: storytelling, report generation
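The synthesis operations listed above (add, modify, and delete entities and relations) can be illustrated with a small link-chart data structure. This is only a minimal sketch, not the design of any tool named in these slides: the class names `Entity` and `LinkChart`, their methods, and the sample entities are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """A node on the link chart, e.g. a person, place, or event."""
    name: str
    kind: str
    attributes: dict = field(default_factory=dict)


class LinkChart:
    """Hypothetical link chart: entities plus labeled relations."""

    def __init__(self):
        self.entities = {}      # name -> Entity
        self.relations = set()  # (source name, label, target name)

    def add_entity(self, name, kind, **attributes):
        self.entities[name] = Entity(name, kind, dict(attributes))

    def relate(self, source, label, target):
        # Only link entities that are already on the chart.
        if source not in self.entities or target not in self.entities:
            raise KeyError("both entities must exist before they are related")
        self.relations.add((source, label, target))

    def delete_entity(self, name):
        # Removing an entity also removes every relation that touches it.
        del self.entities[name]
        self.relations = {r for r in self.relations if name not in (r[0], r[2])}

    def neighbors(self, name):
        """Entities directly linked to `name`, in either direction."""
        linked = set()
        for source, _, target in self.relations:
            if source == name:
                linked.add(target)
            elif target == name:
                linked.add(source)
        return linked


# Invented sample data, for illustration only.
chart = LinkChart()
chart.add_entity("Person A", "person", role="suspect")
chart.add_entity("Person B", "person")
chart.add_entity("Location X", "place")
chart.relate("Person A", "met", "Person B")
chart.relate("Person A", "seen at", "Location X")
```

Keeping relations as plain tuples makes restructuring cheap: deleting an entity is a single filter over the relation set, which is the kind of edit that is hard to do on a physical white board.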
Hypertext Structuring
Mechanism
Associative structures – extended to handle
composites (supports synthesis)
Spatial structure – handle emerging and dynamic
structures over time
Taxonomy structures – supports classification tasks
Issue-based structures – support argumentation and
reasoning
Annotation and metadata structure – add semantics
Past
Copyright: Rasmus Rosenqvist Petersen and Uffe Kock Wiil. 2011. Hypertext structures for investigative teams. In Proceedings of the
22nd ACM conference on Hypertext and hypermedia (HT '11). ACM, New York, NY, USA, 123-132.
With the CrimeFighter Investigator
Copyright: Rasmus Rosenqvist Petersen and Uffe Kock Wiil. 2011. Hypertext structures for investigative teams. In Proceedings of the
22nd ACM conference on Hypertext and hypermedia (HT '11). ACM, New York, NY, USA, 123-132.
What Would be Better?
Automated Data Collection
Semantic-based Data Integration
Intelligent Data Analysis
Assurance of results
The Bad:
Unauthorized Disclosure
The Bad: A. Stoica and C. Farkas,
“Ontology guided Security Engine,”
Journal of Intelligent Information
Systems, 23(3): 209-223, 2004.
(http://www.cse.sc.edu/~farkas/publications/j5.pdf)
Computer Science and Engineering
Semantic Web
• Open, dynamic environment
• Large number of users, agents,
resources
• Semantic tools
• Autonomous agents
• Machine-understandable data semantics
• Computers exchange information
transparently on behalf of the user
DOES INFERENCING ON THE
SEMANTIC WEB CREATE
A
SECURITY PROBLEM?
Motivation 1: Simulation Exploitation Using
Open Source Information
Objective: The US Government would like to share limited
simulation software with friendly countries.
– Can this software be used to explore the capabilities
of US weaponry?
– Can sufficient information be found from public
sources to create such simulation?
Findings:
– Most of the information needed for the simulation
was available on the Internet.
– Needed human aid to combine available information
Motivation 2: Homeland Security
Objective: Hide location of water reservoirs supplying
military bases to limit terrorist activities.
– Can location of a reservoir of a military base be found
from public data on the Internet?
Findings:
– Location of a military base and water reservoirs of
that region are available on the Web.
– Needed human aid to combine available information
The Inference Problem
General Purpose Database:
Non-confidential data + Metadata
→ Undesired Inferences
Semantic Web:
Non-confidential data + Metadata +
Computational Power + Connectivity
→ Undesired Inferences
The Inference Problem
• Given
• a set of confidential
information,
• a large amount of public data, and
• the semantic relationships among the public
data.
• Is it possible to deduce the
confidential information from the
semantically enhanced public data?
• Security violation = disallowed data
can be deduced from public data
Ontology Guided XML Security
Engine (Oxegin)
[Figure: Oxegin architecture diagram. Labels: public organizational data, public user, ontology, web data, confidential data, replicated data inference, correlated data inference.]
Correlated Data Inference
• Finds confidential information from public data
(sensitive associations)
• Inference guidance:
– Ontology concept hierarchy
– Structural similarity of public data
• Features of similarity
– Levels of abstraction for each node
– Distance of associated nodes from association root
• Similarity of the distances
• Length of the distance
– Similarity of sub-trees originating from correlated
nodes
Associated Nodes
Association similarity
– Distance of each node from the association
root
– Difference of the distance of the nodes from
the association root
– Similarity of the sub-trees originating at nodes
Example:
[Figure: an XML document for an air show, marked Public, and its inference association graph (Public, AC) linking fort and address.]
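The association similarity features listed above can be computed directly on an XML tree. The following is a minimal sketch under stated assumptions, not the Oxegin implementation: the document content is invented, the association root is taken to be the document root, and the Jaccard overlap of descendant tag names is an assumed stand-in for sub-tree similarity, which the slides do not define. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: element names echo the slide, the content is invented.
DOC = """
<airShow>
  <fort><name>Fort Example</name></fort>
  <location>
    <address><street>1 Example Rd</street><city>Exampleville</city></address>
  </location>
</airShow>
"""


def depth_of(root, tag):
    """Number of edges from `root` down to the first element named `tag`."""
    def walk(node, depth):
        if node.tag == tag:
            return depth
        for child in node:
            found = walk(child, depth + 1)
            if found is not None:
                return found
        return None
    return walk(root, 0)


def subtree_tags(node):
    """Tag names of all elements strictly below `node`."""
    return {d.tag for d in node.iter() if d is not node}


def association_features(root, tag_a, tag_b):
    """Distances from the association root, their difference, sub-tree overlap."""
    node_a, node_b = root.find(f".//{tag_a}"), root.find(f".//{tag_b}")
    dist_a, dist_b = depth_of(root, tag_a), depth_of(root, tag_b)
    tags_a, tags_b = subtree_tags(node_a), subtree_tags(node_b)
    union = tags_a | tags_b
    return {
        "distance_a": dist_a,
        "distance_b": dist_b,
        "distance_difference": abs(dist_a - dist_b),
        # Assumed metric: Jaccard overlap of descendant tag names.
        "subtree_similarity": len(tags_a & tags_b) / len(union) if union else 1.0,
    }


features = association_features(ET.fromstring(DOC), "fort", "address")
```

In this invented document, `fort` sits one level below the root and `address` two levels below, so the distance difference is 1, and the two sub-trees share no tag names.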
Concept Generalization
Ontology concept hierarchy
– Normalized weight of concepts (more specific
concept, higher weight)
– Concept abstraction level
– Range of allowed abstractions
Example:
Concept                  Abstraction level   Weight    Normalized weight
Object[].                OAL=0               WGT=1     OP=1/50
waterSource :: Object    OAL=1               WGT=15    OP=15/50
basin :: waterSource     OAL=2               WGT=1     OP=1/50
place :: Object          OAL=1               WGT=15    OP=15/50
district :: place        OAL=2               WGT=1     OP=1/50
address :: place         OAL=2               WGT=1     OP=1/50
base :: Object           OAL=1               WGT=15    OP=15/50
fort :: base             OAL=2               WGT=1     OP=1/50
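The abstraction levels and normalized weights in this example can be reproduced with a few lines of code. This is a sketch under assumptions rather than the published algorithm: it takes the abstraction level (OAL) to be the depth of a concept below `Object` and the normalized weight (OP) to be the concept's WGT divided by the sum of all WGT values. The variable and function names are hypothetical.

```python
from fractions import Fraction

# Concept -> parent, taken from the example hierarchy; Object is the root.
PARENT = {
    "Object": None,
    "waterSource": "Object",
    "basin": "waterSource",
    "place": "Object",
    "district": "place",
    "address": "place",
    "base": "Object",
    "fort": "base",
}

# WGT values as listed on the slide.
WEIGHTS = {
    "Object": 1, "waterSource": 15, "basin": 1, "place": 15,
    "district": 1, "address": 1, "base": 15, "fort": 1,
}


def abstraction_level(concept):
    """Depth of the concept below the root (assumed meaning of OAL)."""
    level = 0
    while PARENT[concept] is not None:
        concept = PARENT[concept]
        level += 1
    return level


# Assumed normalization: each weight divided by the sum of all weights.
TOTAL = sum(WEIGHTS.values())
NORMALIZED = {concept: Fraction(weight, TOTAL) for concept, weight in WEIGHTS.items()}

for concept in PARENT:
    print(concept, abstraction_level(concept), WEIGHTS[concept], NORMALIZED[concept])
```

The weights sum to 50, which matches the denominator used for OP on the slide. `Fraction` prints in lowest terms, so 15/50 is shown as 3/10.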
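Correlated Inference
Public association: fort, address
Public association: basin, district
Confidential association: base, water source
Can the two public associations be combined to infer the confidential one?
Ontology:
Object[].
waterSource :: Object
basin :: waterSource
place :: Object
district :: place
address :: place
base :: Object
fort :: base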
Correlated Inference (cont.)
Public association: fort, address
Public association: basin, district
Correlated Inference (cont.)
Generalization using the ontology:
fort (public) generalizes to base
address (public) generalizes to place
district (public) generalizes to place
basin (public) generalizes to water source
Confidential association reached: base, water source
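The fort and basin example can be traced in code. The sketch below is an illustration under assumptions, not the Oxegin algorithm: two public associations are treated as correlated when one concept from each generalizes to a shared non-root concept (here `place`), and the confidential association is considered revealed when the remaining concepts generalize to its two members. The function names and the `allowed_levels` parameter are hypothetical; the latter loosely mirrors the "range of allowed abstractions" from the concept generalization slide.

```python
# Concept -> parent, from the ontology used in the example.
PARENT = {
    "Object": None,
    "waterSource": "Object",
    "basin": "waterSource",
    "place": "Object",
    "district": "place",
    "address": "place",
    "base": "Object",
    "fort": "base",
}


def ancestors(concept):
    """The concept itself followed by its ancestors up to the root."""
    chain = [concept]
    while PARENT.get(chain[-1]) is not None:
        chain.append(PARENT[chain[-1]])
    return chain


def generalize(concept, allowed_levels=1):
    """Concepts reachable by moving up at most `allowed_levels` steps."""
    return set(ancestors(concept)[: allowed_levels + 1])


def may_reveal(public_a, public_b, confidential, allowed_levels=1):
    """True if the two public pairs correlate and generalize to the confidential pair."""
    secret_1, secret_2 = confidential
    for i in (0, 1):
        for j in (0, 1):
            link_a, rest_a = public_a[i], public_a[1 - i]
            link_b, rest_b = public_b[j], public_b[1 - j]
            shared = generalize(link_a, allowed_levels) & generalize(link_b, allowed_levels)
            shared.discard("Object")  # every concept generalizes to the root
            if not shared:
                continue
            gen_a = generalize(rest_a, allowed_levels)
            gen_b = generalize(rest_b, allowed_levels)
            if (secret_1 in gen_a and secret_2 in gen_b) or (
                secret_1 in gen_b and secret_2 in gen_a
            ):
                return True
    return False


revealed = may_reveal(("fort", "address"), ("basin", "district"), ("base", "waterSource"))
```

With one level of generalization, `address` and `district` both reach `place`, `fort` reaches `base`, and `basin` reaches `waterSource`, so the check reports that the confidential pair can be reached. With `allowed_levels=0` no correlation is found.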
Inference Removal
Relational databases
– Database design time: redesign database
– Query processing time: modify/refuse answer
Web inferences
– Problems:
Cannot redesign public data outside of protection domain
Cannot modify/refuse answer to already published web page
– Possible solutions:
Withhold data: do not publish any public data that may lead
to inferences.
Publish confusing data: publish data that creates confusion
in contrast with existing publicly available data
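The "withhold data" option can be pictured as a check that runs before publication. The following is a deliberately simple sketch, not a method from the cited papers: it assumes the sensitive combinations are already known as sets of items that must never all be public at once, whereas a real system would have to derive them with an inference engine. The function name and the sample items are hypothetical.

```python
# Hypothetical sensitive combinations: each set must never become fully public.
SENSITIVE_COMBINATIONS = [
    frozenset({"base location", "regional reservoir map"}),
]


def safe_to_publish(candidate, already_public, sensitive=SENSITIVE_COMBINATIONS):
    """False if publishing `candidate` would complete a sensitive combination.

    `already_public` includes data outside the protection domain, which
    cannot be changed or withdrawn.
    """
    after = set(already_public) | {candidate}
    return not any(candidate in combo and combo <= after for combo in sensitive)


# Invented sample data: the base location is already public elsewhere.
public_now = {"base location"}
print(safe_to_publish("regional reservoir map", public_now))
print(safe_to_publish("air show schedule", public_now))
```

The check only looks at combinations the candidate item would complete, since data that is already published cannot be modified or refused.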
The Ugly Challenges
Technology Support
Technical Influences –
Interoperability
Heterogeneous data
– Unstructured data, semi-structured data,
structured data
Representation of data semantics
– Schema languages, taxonomies, ontologies
Policy compliance
– Policy languages, expressive power,
implementation
Technical Influences –
Availability
Survivability
– Response
– Critical environments
Open vs. protected
Redundancy
Technical Influences – Control
Control, monitor, and manage all usage
Track dissemination of information
Workflow management
– Policy, trust, efficiency, local vs. global
properties
Social Influences
Trust
– People and agencies
Shadow network
– Conflict of interest
– Self-interest
Criticality
– The greater the threat, the greater the likelihood
of information sharing
Legal Influences
Policy Conflict and Competition
– Agency policy
– Agencies may compete for the same resources
– need to maintain advantage
Governance
– No universal policy on information sharing
(federal, state, local, tribal, etc.)
– International law