Bhavani-presesntation-for-IARPA - The University of Texas at Dallas


Sample of Data Security and
Knowledge Discovery
Research at the
University of Texas at Dallas
Dr. Bhavani Thuraisingham
Dr. Latifur Khan
Dr. Murat Kantarcioglu
Dr. Kevin Hamlen
September 20, 2007
4/9/2016 16:35
2
Outline
0 Data and Applications Security
- Information sharing, Geospatial data management,
Surveillance, Secure web services, Privacy, Dependable
information management, Intrusion detection
0 Data Mining and Knowledge Discovery
- Data Mining for Security Applications, Data Mining for
Bioinformatics, Data Mining for Data and Software Quality
Research Group:
Data and Applications Security
0 Core Group
- Prof. Bhavani Thuraisingham (Professor & Director,
Cyber Security Research Center)
- Prof. Latifur Khan (Director, Data Mining Laboratory)
- Prof. Murat Kantarcioglu (Joined Fall 2005, PhD,
Purdue)
- Prof. Kevin Hamlen (Peer to Peer systems Security,
Joined 2006 from Cornell U.)
0 Students and Funding
- Over 20 PhD Students, 40 MS students (combined)
- Research grants: Air Force Office of Scientific
Research, NSF, NGA, Raytheon, - - - -
Vision 1: Assured Information Sharing
[Figure elements: Data/Policy for Coalition; three Components, holding the Data/Policy for Agency A, Agency B, and Agency C, each publishing Data/Policy]
Partner types:
1. Friendly partners
2. Semi-honest partners
3. Untrustworthy partners
Research funded by two grants from AFOSR
Vision 2: Secure Geospatial Data Management
[Figure elements: Data Source A, Data Source B, Data Source C; Semantic Metadata Extraction; Decision Centric Fusion; Geospatial data interoperability through web services; Geospatial data mining; Geospatial semantic web; Tools for Analysts; SECURITY/QUALITY]
Research supported by Raytheon on one grant; working on robust prototypes on a
second grant
Vision 3: Surveillance and Privacy
[Figure elements: Raw video surveillance data; Face Detection and Face Derecognizing system (faces of trusted people derecognized to preserve privacy); Suspicious Event Detection System; Manual Inspection of video data; Suspicious people found; Suspicious events found; Report of security personnel; Comprehensive security report listing suspicious events and people detected]
Example Projects
0 Assured Information Sharing
- Secure Semantic Web Technologies
- Social Networks and game playing
- Privacy Preserving Data Mining
0 Geospatial Data Management
- Secure Geospatial semantic web
- Geospatial data mining
0 Surveillance
- Suspicious Event Detection
- Privacy preserving Surveillance
- Automatic Face Detection, RFID technologies
0 Cross Cutting Themes
- Data Mining for Security Applications (e.g., Intrusion detection, Mining
Arabic Documents); Dependable Information Management
Social Networks
0 Individuals engaged in suspicious or undesirable behavior rarely
act alone
0 We can infer that those associated with a person positively
identified as suspicious have a high probability of being either:
- Accomplices (participants in suspicious activity)
- Witnesses (observers of suspicious activity)
0 Making these assumptions, we create a context of association
between users of a communication network
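The association idea above can be sketched as a lookup over a contact graph. This is only an illustration under stated assumptions: the message log, the user names, and the helper functions `build_contacts` and `associates_of` are hypothetical and do not come from the slides, and a direct-contact rule is just one simple reading of "context of association".

```python
from collections import defaultdict


def build_contacts(messages):
    """Build an undirected contact graph from (sender, receiver) pairs."""
    contacts = defaultdict(set)
    for sender, receiver in messages:
        contacts[sender].add(receiver)
        contacts[receiver].add(sender)
    return contacts


def associates_of(contacts, suspicious):
    """Return users directly linked to any flagged user, excluding the flagged users."""
    linked = set()
    for user in suspicious:
        linked |= contacts.get(user, set())
    return linked - set(suspicious)


# Hypothetical communication log: each pair is one message between two users.
messages = [("alice", "bob"), ("bob", "carol"), ("dave", "erin")]
contacts = build_contacts(messages)

# Users associated with a flagged user are candidate accomplices or witnesses.
flagged = {"bob"}
candidates = associates_of(contacts, flagged)
```

Here `candidates` holds the users who exchanged messages with a flagged user; deciding whether each one is an accomplice or a witness would need further evidence that this sketch does not model.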
Privacy Preserving Data Mining
0 Prevent useful results from mining
- Introduce “cover stories” to give “false” results
- Only make a sample of data available so that an adversary is
unable to come up with useful rules and predictive functions
0 Randomization and Perturbation
- Introduce random values into the data and/or results
- Challenge is to introduce random values without significantly
affecting the data mining results
- Give range of values for results instead of exact values
0 Secure Multi-party Computation
- Each party knows its own inputs; encryption techniques used to
compute final results
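A minimal sketch of the randomization and perturbation idea, assuming additive zero-mean uniform noise and a mean as the mining result. The data, the noise scale, and the function names `perturb` and `mean_range` are hypothetical and chosen only for illustration; the slides do not specify a noise model.

```python
import random


def perturb(values, noise_scale, rng):
    """Add zero-mean uniform noise in [-noise_scale, noise_scale] to each value."""
    return [v + rng.uniform(-noise_scale, noise_scale) for v in values]


def mean(values):
    return sum(values) / len(values)


def mean_range(perturbed, noise_scale):
    """Report a range for the mean instead of an exact value.

    Every value moved by at most noise_scale, so the true mean lies within
    noise_scale of the mean of the perturbed values.
    """
    m = mean(perturbed)
    return (m - noise_scale, m + noise_scale)


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical sensitive attribute, e.g. salaries.
salaries = [50_000 + 10 * i for i in range(1000)]

# Individual values are distorted before release...
released = perturb(salaries, noise_scale=5_000, rng=rng)

# ...while an aggregate result is still available, as a range.
low, high = mean_range(released, noise_scale=5_000)
```

Because the noise is zero-mean, the mean of the released values stays close to the true mean as the data set grows, which is the trade-off the slide describes: individual records are masked while the aggregate result remains usable.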
Framework for Geospatial Data Security
[Figure: layered architecture; elements listed in the order they appear in the extracted slide]
DATA PRESENTATION COMPONENTS: Open Geospatial Consortium Framework; Traditional GIS; GIS Web Services Wrapper
SECURITY LAYER: Core & Application Schemas; Geospatial Features; Geography Markup Language; Authentic Data Publication; DAC/RBAC Policy Specification; Policy Reasoning Engine; Access Control Module; Trust & Privacy Management; Auditing; Misuse Detection; Metadata
DATA ACCESS LAYER: Geospatial Data Registration (spatial and temporal registration of geospatial data); Geospatial Data Repositories; Data Integration Services & Data Repository Access
Data Mining for Surveillance
0 We define an event representation measure based on low-level
features
0 This allows us to define “normal” and “suspicious” behavior and
classify events in unlabeled video sequences appropriately
0 A visualization tool can then be used to enable more efficient
browsing of video data
Data Mining for Intrusion Detection / Worm Detection
[Figure elements: Training Data; Hierarchical Clustering (DGSOT); SVM Class Training; Classification; Testing Data; Testing]
DGSOT: Dynamically growing self organizing tree
SVM: Support Vector Machine
Intrusion Detection: Results
Training Time, FP and FN Rates of Various Methods

Methods                  Average Accuracy   Total Training Time   Average FP Rate (%)   Average FN Rate (%)
Random Selection         52%                0.44 hours            40                    47
Pure SVM                 57.6%              17.34 hours           35.5                  42
SVM + Rocchio Bundling   51.6%              26.7 hours            44.2                  48
SVM + DGSOT              69.8%              13.18 hours           37.8                  29.8
Information Assurance Education
0 Current Courses
- Introduction to Information Security: Prof. Sha
- Trustworthy Computing: Prof. Sha
- Cryptography: Profs. Sudborough, Kantarcioglu
- Information Assurance: Prof. Yen
- Data and Applications Security: Prof. Bhavani Thuraisingham
- Biometrics: Prof. Bhavani Thuraisingham
- Privacy: Prof. Murat Kantarcioglu
- Secure Language: Prof. Kevin Hamlen
- Digital Forensics: Prof. Bhavani Thuraisingham
0 Future Courses
- Network Security: Profs. Venkatesan, Sarac
- Security Engineering: Profs. Bastani, Cooper
- Intrusion Detection: Profs. Khan, Thuraisingham
- Digital Watermarking: Prof. Prabhakaran
0 Courses at AFCEA and AF Bases
- Knowledge Management, Data Mining for Counter-terrorism, Data Security
(preparing a course on SOA and NCES with Prof. Alex Levis - GMU and Prof.
Hal Sorenson - UCSD)
Knowledge Discovery in Images
0 Goal: Find unusual changes
Process:
- Use data mining to model
normal differences between
images
- Find places where differences
don’t match model
0 Questions to be answered:
- What are the right mining
techniques?
- Can we get useful results?
Change Detection:
0 Trained Neural Network to predict “new” pixel from “old” pixel
- Neural Networks good for multidimensional continuous data
- Multiple nets give a range of “expected values”
0 Identified pixels where the actual value is substantially outside the range
of expected values
- Anomaly if three or more bands (of seven) out of range
0 Identified groups of anomalous pixels
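The per-pixel anomaly rule above can be sketched as follows. This is a hedged illustration, not the group's implementation: the trained neural networks are replaced by hard-coded stand-in predictions, the pixel values are invented, and the `margin` parameter is a hypothetical way to express "substantially outside".

```python
def expected_range(predictions):
    """Per-band (min, max) over the predictions of several models."""
    bands = zip(*predictions)  # regroup model outputs by spectral band
    return [(min(b), max(b)) for b in bands]


def out_of_range_bands(actual, ranges, margin=0.0):
    """Count bands whose actual value falls outside the expected range."""
    return sum(
        1
        for value, (low, high) in zip(actual, ranges)
        if value < low - margin or value > high + margin
    )


def is_anomaly(actual, predictions, min_bands=3, margin=0.0):
    """Flag a pixel when at least min_bands bands are out of range."""
    ranges = expected_range(predictions)
    return out_of_range_bands(actual, ranges, margin) >= min_bands


# Stand-in outputs of three models predicting the "new" pixel (seven bands).
predictions = [
    [10, 20, 30, 40, 50, 60, 70],
    [12, 22, 28, 41, 49, 62, 71],
    [11, 19, 31, 39, 52, 61, 69],
]

unchanged = [11, 21, 29, 40, 50, 61, 70]  # every band inside its range
changed = [11, 21, 29, 40, 90, 95, 20]    # three bands outside their ranges

flags = [is_anomaly(unchanged, predictions), is_anomaly(changed, predictions)]
```

Grouping flagged pixels into regions, the last step on the slide, would follow as a connected-component pass over the resulting per-pixel flags.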
Multimedia/Image Mining
Automatically annotate images then retrieve based on the textual annotations.
[Figure elements: Images; Segments; Blob-tokens]
Web Page Prediction: Problem Description
[Figure elements: Office of admission (P1); VIP web page (P2); Financial Aid Information (P3); an unknown page marked "?"]
What page is next?
Web Page Prediction: Architecture
[Figure elements: User sessions; Feature Extraction; SVM, SVM output, Sigmoid mapping, SVM prediction; ANN, ANN output, Sigmoid mapping, ANN Prediction; Markov Model, Markov prediction; fusion with Dempster’s Rule; Final Prediction]
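The fusion step in the architecture above can be sketched with Dempster's rule of combination. This is an illustration under stated assumptions: the raw classifier scores, the sigmoid parameters `a` and `b`, the Markov probability, and the helper `evidence` (which puts the leftover mass on the full page set) are all hypothetical and not taken from the slides.

```python
import math

ALL_PAGES = frozenset({"P1", "P2", "P3"})  # frame of discernment


def sigmoid(score, a=1.0, b=0.0):
    """Map a raw classifier score to (0, 1); a and b are assumed fitted parameters."""
    return 1.0 / (1.0 + math.exp(-(a * score + b)))


def evidence(page, confidence):
    """Mass function: confidence on one page, the remainder left uncommitted."""
    return {frozenset({page}): confidence, ALL_PAGES: 1.0 - confidence}


def dempster_combine(m1, m2):
    """Combine two mass functions (dict: frozenset -> mass) with Dempster's rule."""
    combined = {}
    conflict = 0.0
    for s1, v1 in m1.items():
        for s2, v2 in m2.items():
            inter = s1 & s2
            if inter:
                combined[inter] = combined.get(inter, 0.0) + v1 * v2
            else:
                conflict += v1 * v2  # mass assigned to contradictory outcomes
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Renormalize by the non-conflicting mass.
    return {s: v / (1.0 - conflict) for s, v in combined.items()}


# Hypothetical predictor outputs for one user session.
m_svm = evidence("P2", sigmoid(1.0))   # SVM raw score mapped through the sigmoid
m_ann = evidence("P3", sigmoid(0.0))   # ANN raw score mapped through the sigmoid
m_markov = evidence("P2", 0.6)         # Markov model already yields a probability

fused = dempster_combine(dempster_combine(m_svm, m_ann), m_markov)

# Final prediction: the single page with the largest fused mass.
best = max((s for s in fused if len(s) == 1), key=fused.get)
next_page = next(iter(best))
```

The sigmoid mapping matters because SVM and ANN outputs are unbounded scores, while Dempster's rule needs masses in [0, 1] that sum to one for each source.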
Misuse/Misinformation/Insider threat
0 50% of corporate breaches or losses of information that were
made public in the past year were insider attacks
0 50% of those insider attacks were thefts of information by
employees
0 It is hard to model individuals!
0 Role-based access control provides tools to model given
roles
0 Challenge: How to develop models for predicting normal
usage of a role vs. misuse?
0 Challenge: How to integrate misuse, auditing and access
control systems?
0 Current Status: We are developing a misuse detection system
based on clustering; risk-based analysis
Time Constrained KDD: Proposal to
AFOSR with UIUC
0 The military must continually carry out the following operations:
- Surveillance: monitor the behavior of people or objects to see if they
are deviating from the norm
- Maneuver: place the enemy in a position of disadvantage through the
flexible application of combat power
- Mass: concentrate the effects of overwhelming combat power at the
decisive place and time
- Attack: an attempt to actively strike at the enemy, as opposed to a
defensive plan
0 Track the enemy and DETER him during the surveillance and maneuver
stages through
- Knowledge Discovery: extract concepts from the stream data arriving
from the sensors
- Time Constrained Activity Analysis: extract knowledge from the enemy
activities arriving in the form of streams
- Ontology Management: develop ontologies, then conduct multi-modal
analysis of the captured multimedia data and resolve conflicts and
uncertainty
- Resource Allocation: use the knowledge discovered, apply decision
theories, and determine resource allocation
Some Experiences with Tools
0 Tools developed in-house
- Image mining tool, Data Sharing Tool,
- Intrusion detection/Malicious code detection tools, Web page
prediction tool
- Multimedia mining/Image extraction including MPEG7 feature
descriptors
- Cluster visualization tool
0 External tools
- Oracle data mining product
- IDIS data mining tool
- WEKA data mining tool
- XML SPIE and QUIP
- INTEL OpenCV
Technical and Professional Accomplishments
0 Publications of research in top journals and conferences, books
- IEEE Transactions, ACM Transactions, 8 books published and 2 books
in preparation, including one on UTD research (Data Mining Applications,
Awad, Khan and Thuraisingham)
0 Member of Editorial Boards/Editor in Chief
- Journal of Computer Security, ACM Transactions on Information and
System Security, IEEE Transactions on Dependable and Secure
Computing, IEEE Transactions on Knowledge and Data Engineering,
Computer Standards and Interfaces - -
0 Advisory Boards / Memberships / Other
- Purdue University CS Department, invitations to write articles in
Encyclopedia Britannica on data mining, keynote addresses, talks at
DFW NAFTA and Chamber of Commerce, commercialization
discussions of data mining tools for security
0 Awards and Fellowships
- IEEE Fellow, AAAS Fellow, BCS Fellow, IEEE Technical Achievement
Award, IEEE Senior Member
Our Model: R&D, Technology Transfer,
Standardization and Commercialization
0 Basic Research (6-1 Type)
- Funding agencies such as NSF, AFOSR, NGA, - - - -, etc.; publish our
research in top journals (ACM and IEEE Transactions)
0 Applied Research
- Some federal funding (e.g., from government programs) and
commercial corporations (e.g., Raytheon); our current collaboration
with AFRL-ARL
0 Technology Transfer / Development
- Work with corporations such as Raytheon to showcase our research
to sponsors (e.g., GEOINT) and transfer research to operational
programs such as DCGS
0 Standardization
- Our collaborations with OGC, OASIS and standardization of our
research (e.g., GRDF)
0 Commercialization
- Patents, work with VCs, corporations, SBIR, STTR for
commercialization of our tools (e.g., our work on data mining tools)
Our Vision for Assured Information Sharing/KDD
[Figure: technology areas arranged around a central "Assured Information Sharing/KDD" node, with parenthesized labels as on the slide]
- Time constrained KDD (Future)
- Link Analysis (AFOSR, Texas)
- Privacy Preserving data mining (Texas)
- Game Theory (AFOSR)
- Incentive based Knowledge management (Future)
- Dependable Information Management (Texas)
- Misinformation/Misuse (AFOSR)
- Geospatial (NGA, Raytheon)
- Semantic Web (NSF, AFOSR)
Technologies will contribute to Assured Information Sharing
Our Collaborations in Assured Information Sharing and KDD
[Figure: collaboration areas arranged around a central "Assured Information Sharing/KDD" node, with partner institutions in parentheses]
- Game Theory (UTD Management School)
- Time Constrained KDD (UIUC)
- Link Analysis (UGA, UAZ)
- Privacy Preserving data mining (Purdue)
- Dependable Information Management (UCR, UTSA)
- Misinformation/Misuse (Purdue)
- Geospatial (UMN, UCD, Purdue, WVU, UCF)
- Knowledge management (SUNY Buffalo)
- Semantic Web (UMBC, UTSA)