Language-based Security Dr. Kevin W. Hamlen


Cyber Security Research at the
University of Texas at Dallas
Sample Projects
Prof. Bhavani Thuraisingham, PhD, CISSP
Prof. Latifur Khan, PhD
Prof. Murat Kantarcioglu, PhD
Prof. Kevin Hamlen, PhD
Prof. Edwin Sha, PhD
August 2010
Data Mining for Malicious Traffic
Dr. Latifur Khan (NASA, AFOSR)
Motivation
• Network traffic is a continuous flow of data, which is evolving with time
• How can we detect intrusion by mining the network traffic when
  – the intrusions themselves evolve?
  – only a small fraction of the traffic is analyzed and labeled by human experts?
  – new kinds of intrusions appear?
Technical Approach
• Idea: Build a classification model from past data and predict intrusions using the model.
• The model must be able to
  – keep itself up-to-date so that it can detect intrusions even if their characteristics change over time
  – use the limited amount of labeled data to efficiently update itself
  – detect new kinds of intrusions in the traffic
• Strategy:
  – Semi-supervised learning to compensate for the shortage of labeled training data
  – Ensemble classification technique to cope with the changes in the traffic
  – Novel class detection to detect new kinds of intrusions in the traffic
System Architecture
[Diagram labels: Network traffic (older chunks, newer chunks); Last partially labeled chunk; Last unlabeled chunk; Training; New model; Ensemble of models; Classification; Intrusion?; Update; Refinement. Steps are numbered 1 to 4 on the slide.]
FEARLESS engineering
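The chunk-based ensemble strategy on this slide can be illustrated with a small sketch. This is not the project's implementation: the nearest-centroid base model, the distance-based novelty test, and the names `ChunkEnsemble`, `max_models`, and `novelty_radius` are all assumptions made for illustration, and the semi-supervised step is omitted (unlabeled points are simply skipped during training). Each chunk is assumed to contain at least one labeled point.

```python
import math
from collections import Counter


def train_centroids(chunk):
    """Fit a nearest-centroid model on the labeled points of one chunk.

    chunk is a list of (features, label) pairs; label None means unlabeled.
    """
    sums, counts = {}, Counter()
    for x, y in chunk:
        if y is None:
            continue  # unlabeled points are ignored in this simplified sketch
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] += 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict(model, x):
    """Return the label whose centroid is closest to x."""
    return min(model, key=lambda y: math.dist(model[y], x))


def accuracy(model, chunk):
    """Fraction of the chunk's labeled points that the model gets right."""
    labeled = [(x, y) for x, y in chunk if y is not None]
    return sum(predict(model, x) == y for x, y in labeled) / len(labeled)


class ChunkEnsemble:
    """Hypothetical ensemble that keeps at most max_models chunk models."""

    def __init__(self, max_models=3, novelty_radius=5.0):
        self.max_models = max_models
        self.novelty_radius = novelty_radius  # assumed novelty threshold
        self.models = []

    def classify(self, x):
        # Flag x as a possible new class if it is far from every known centroid.
        nearest = min(math.dist(c, x) for m in self.models for c in m.values())
        if nearest > self.novelty_radius:
            return "novel"
        votes = Counter(predict(m, x) for m in self.models)
        return votes.most_common(1)[0][0]  # majority vote

    def update(self, chunk):
        # Train on the newest chunk, then drop the model that does worst on it.
        self.models.append(train_centroids(chunk))
        if len(self.models) > self.max_models:
            worst = min(self.models, key=lambda m: accuracy(m, chunk))
            self.models.remove(worst)
```

Calling `update` once per partially labeled chunk and `classify` on each point of the latest unlabeled chunk mirrors the train, classify, and update loop named in the architecture diagram.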
Reactively Adaptive Malware
Dr. Kevin W. Hamlen and Dr. Latifur Khan (AFOSR)
Motivation
• Design and study malware immune to conventional antivirus technologies
• Important for AF active defense project
• Important for developing adequate defenses in anticipation of next-generation attacks
Technical Approach
• Data Mining
  – use machine learning to discover signatures dynamically
  – adapt to new malware in the field
  – share learned signatures amongst mutually trusting attackers
• Reactively Adaptive Malware
  – discover false negatives in protection system
  – self-obfuscate to defeat defenses
[Diagram labels: Antivirus Signature Database; Signature Query Interface; Signature Inference Engine; Signature Approximation Model; Obfuscation Generation; Obfuscation Function; Malware Binary; Obfuscated Binary]
AFOSR: Assured Information Sharing:
2005-2008 (Dr. Bhavani Thuraisingham)
[Diagram labels: Data/Policy for Coalition; Export Data/Policy (three instances); Component Data/Policy for Agency A; Component Data/Policy for Agency B; Component Data/Policy for Agency C]
Trustworthy Partners
• Integrate the Medicaid claims data and mine the data; then enforce policies and determine how much information has been lost
• Prototype system; application of Semantic Web technologies
Semi-Trustworthy Partners
• Apply game theory and probing to extract information from semi-trustworthy partners
Untrustworthy Partners
• Conduct active defence and determine the actions of an untrustworthy partner
• Defend ourselves from our partners using data mining techniques
• Conduct active defence: find out what our partners are doing by monitoring them so that we can defend ourselves from dynamic situations
Trust for Peer to Peer Networks (Infrastructure security)
Incentive Issues in Assured Information Sharing
Dr. Murat Kantarcioglu (DoD MURI Project 2008-2013, AFOSR)
Motivation
•Misaligned incentives could be a significant problem in Information Security.
—Software bugs vs. Software companies’ incentives
•Incentive issues in information sharing have been explored to some extent
—Incentive issues in file sharing p2p networks
•Assured information sharing creates new challenges
—Security considerations vs. Utility
Technical Approach
• Verify that the other participants do not lie about their data.
  – If the data is revealed as it is
    • Trust but verify (Our initial results: DKE ’08 paper)
  – If the data is not revealed (e.g., SMC techniques are used)
    • Non-cooperative computing
    • Mechanism design
    • SMC with rational adversaries.
Scalable Social Network Mining
Dr. Murat Kantarcioglu (NSF)
Motivation
•Mining social network data could provide important
insights.
•Recently many different data mining techniques have
been suggested for mining social network data.
•These techniques require many iterations (e.g., collective
inference techniques) and expensive computations (e.g.,
maximum likelihood methods) over the large social
networks.
Technical Approach
•Our goal is to scale the existing social network mining
techniques to very large social network data by using
cloud computing.
•To achieve this goal, we are exploring
  – Intelligent data partition techniques based on social network concepts
  – Caching of some important queries
  – Efficient update of cached query results using cloud computing
Initial Results
•Partitioning techniques based on various social network centrality metrics have been implemented
  – Degree centrality (DC)
  – Clustering coefficient (CC)
  – Closeness centrality (CloC)
  – Betweenness centrality (BC)
  – Random partitioning
  – Domain specific
•Our initial results indicate that intelligent partitioning can increase accuracy and reduce running time.
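As an illustration of two of the metrics listed above, the following sketch computes degree centrality and the clustering coefficient for a small undirected graph and then splits the nodes into partitions by score. The slides do not say how the scores drive the partitioning, so the round-robin scheme in `partition_by_score` is purely an assumption, as are all function names here. The graph is assumed to be a dict mapping each node to the set of its neighbours, with at least two nodes.

```python
def degree_centrality(adj):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}


def clustering_coefficient(adj):
    """Fraction of a node's neighbour pairs that are themselves connected."""
    cc = {}
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            cc[v] = 0.0  # fewer than two neighbours: no pairs to connect
            continue
        links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
        cc[v] = 2 * links / (k * (k - 1))
    return cc


def partition_by_score(scores, parts):
    """Assumed scheme: deal nodes out round-robin in descending score order,
    so that every partition receives a mix of high- and low-scoring nodes."""
    ranked = sorted(scores, key=lambda v: (-scores[v], v))
    return [ranked[i::parts] for i in range(parts)]


# Small example graph: a triangle a-b-c with a pendant node d attached to c.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
dc = degree_centrality(graph)
partitions = partition_by_score(dc, parts=2)
```

Closeness and betweenness centrality need shortest-path computations and are left out to keep the sketch short.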
Language-based Security
Dr. Kevin W. Hamlen (AFOSR)
Motivation
• Mobile code security (web scripts, patches, etc.)
• How to enforce application-specific security policies over these untrusted software extensions?
  – Policy #1: Untrusted code must not create or modify any file whose name ends in “.exe”
  – Policy #2: Untrusted code must not access the network after reading a confidential file
  – Policy #3: Untrusted code must relinquish the thread after at most 1000 instruction cycles
Technical Approach
• Idea: Automatically rewrite the code prior to execution
• Two constraints on rewritten code:
  – rewritten code must satisfy security policy
  – rewritten code behaves exactly like original (except with regard to policy violations)
• One simple rewriting strategy:
  – insert guard instructions before every potentially dangerous instruction
• Use compiler optimizations to eliminate or streamline unnecessary guards
System Architecture
[Diagram labels: untrusted code; security policy; Rewriter; self-monitoring code + proof; verifier; Trusted Computing Base; accept; reject]
Example Code
(inserted guard marked with a comment; shown in green on the original slide)
…
eax := “filename.exe”
if (eax == “*.exe”) abort();   // inserted guard
call System.open(eax, “w”);
…
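The guard-insertion idea can be sketched at a higher level in Python. This is an illustration only, not the rewriter described on the slide: the `Monitor` class, its method names, and the `rewritten_program` function are hypothetical, and a real rewriter would insert the guards automatically into the untrusted binary or bytecode. The sketch covers Policy #1 (a stateless check) and Policy #2 (a check that depends on history, so the monitor keeps one bit of state).

```python
class PolicyViolation(Exception):
    """Raised by a guard when the next operation would violate a policy."""


class Monitor:
    """Hypothetical in-lined monitor holding the state the guards consult."""

    def __init__(self, confidential=()):
        self.confidential = set(confidential)  # assumed list of secret files
        self.read_confidential = False

    def guard_open(self, name, mode="r"):
        writes = any(c in mode for c in "wax+")
        reads = "r" in mode or "+" in mode
        # Policy #1: no creating or modifying files whose name ends in ".exe".
        if writes and name.endswith(".exe"):
            raise PolicyViolation("write to " + name)
        # Remember confidential reads so Policy #2 can be checked later.
        if reads and name in self.confidential:
            self.read_confidential = True

    def guard_connect(self, host):
        # Policy #2: no network access after reading a confidential file.
        if self.read_confidential:
            raise PolicyViolation("network access to " + host)


def rewritten_program(monitor, open_file):
    """The slide's example after rewriting: the guard precedes the call."""
    name = "filename.exe"
    monitor.guard_open(name, "w")   # inserted guard
    return open_file(name, "w")     # original, potentially dangerous call
```

Passing `open_file` in as a parameter keeps the sketch free of real file-system side effects. In the example above the guard raises `PolicyViolation` before `open_file` is ever called, which corresponds to the `abort()` in the slide's pseudocode.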
Privacy-preserving Distributed Data Mining
Dr. Murat Kantarcioglu (NSF)
Motivation
• Privacy sensitive data that is needed for many critical tasks is distributed among different organizations.
  – Statistical analysis of hospital discharge data for detecting biological weapons attacks.
• Privacy concerns may hinder sharing such data for legitimate purposes.
• Our goal is to develop techniques to enable distributed data mining without sacrificing individual privacy
Technical Approach
• Idea: Combine sanitization and cryptographic techniques to enable efficient and accurate privacy-preserving distributed data mining.
  – Each data source sanitizes its own data.
  – Sanitized data is shared directly.
  – Cryptographic algorithms use sanitized data along with original data to get the data mining results.
• Our initial results indicate that this idea is more efficient than pure cryptographic approaches and more accurate than pure sanitization approaches.
[Diagram labels: Source Data 1 (Private); Source Data 2 (Private); Data Sanitization (two instances); Sanitized Data 1 (Public); Sanitized Data 2 (Public); Sanitized Data Processing; Cryptographic Protocols; Result]
• WWW problems as a source of geo-information
  – Geographic context embedded in natural language descriptions
  – Place names ambiguous and confused with names of organisations, people, buildings and streets
  – Web queries depend on exact match of text terms
• Applications:
  – Location-based services
  – Locally targeted web advertising
  – Mining geographic properties
  – Market research
• Geo-Tagging = Geo-parsing + Geo-coding
  – Geo-parsing: recognising geographic references (ignoring non-geographic uses of place terminology)
  – Geo-coding: attaching a unique quantitative location (footprint) to geographic references
• Example:
  – Geo-Geo ambiguity
    {City}Columbia/{S_C}California/U.S.
    {City}Columbia/{S_C}Pennsylvania/U.S.
  – Geo/non-Geo ambiguity
    e.g. “Samuel Lancaster”
    Lancaster > Last name.
    {City}Lancaster/Texas/U.S.
[Diagram labels: Webpage; Text; Info. Retrieval; NN, NNS, NNP, NNPS; NNP; gazetteer; Update; Ranking Based Disambiguation; Geo-Information Web services]
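The two stages of geo-tagging can be sketched with a toy gazetteer. Everything here is an assumption made for illustration: the `GAZETTEER` contents (coordinates are rough, rounded values), the `GIVEN_NAMES` list used to filter out personal names, and the ranking rule that prefers the candidate whose parent region is also mentioned in the text. The slide's diagram suggests part-of-speech tags and a ranking-based disambiguation step, but it gives no details, so this is only one simple way the pieces could fit together.

```python
import re

# Hypothetical gazetteer: place name -> candidate locations with footprints.
GAZETTEER = {
    "Columbia": [
        {"path": "Columbia/California/U.S.", "footprint": (38.0, -120.4)},
        {"path": "Columbia/Pennsylvania/U.S.", "footprint": (40.0, -76.5)},
    ],
    "Lancaster": [
        {"path": "Lancaster/Texas/U.S.", "footprint": (32.6, -96.8)},
    ],
}
GIVEN_NAMES = {"Samuel"}  # assumed list used to spot "first name + surname"


def geo_parse(text):
    """Geo-parsing: find place names, skipping non-geographic uses."""
    tokens = re.findall(r"[A-Za-z]+", text)
    hits = []
    for i, tok in enumerate(tokens):
        is_surname = i > 0 and tokens[i - 1] in GIVEN_NAMES
        if tok in GAZETTEER and not is_surname:
            hits.append(tok)
    return hits


def geo_code(name, text):
    """Geo-coding: rank candidates by how many parent regions the text names."""
    words = set(re.findall(r"[A-Za-z]+", text))

    def score(candidate):
        parents = candidate["path"].split("/")[1:]
        return sum(p in words for p in parents)

    return max(GAZETTEER[name], key=score)  # ties keep the first candidate


def geo_tag(text):
    """Geo-tagging = geo-parsing followed by geo-coding."""
    return [(name, geo_code(name, text)) for name in geo_parse(text)]


tags = geo_tag("Samuel Lancaster moved to Columbia, Pennsylvania.")
```

In this example "Lancaster" is skipped because it follows a given name, and "Columbia" resolves to the Pennsylvania candidate because the state name appears in the same sentence.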
Other Projects
• Secure Cloud Computing
  – http://www.wpafb.af.mil/news/story.asp?id=123209377
• Secure Social and Private Networks
• Security and Privacy preserving ontology alignment
• Secure Peer to Peer Data Management
• Risk modeling and analysis of Botnets
• Policy interoperability of geospatial data
• Data provenance and Attribution of Attacks
• Accountability of Secure Systems
